{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Problem Sheet 8\n", "\n", "## Convexity and Stochastic Gradient Descent (SGD) method" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Task (a)\n", "Consider $\\boldsymbol\\theta=(\\theta_1,\\theta_2) \\in \\mathbb{R}^2$ and the function\n", "$$\n", "L(\\boldsymbol\\theta) = \\theta_1^2 + (\\theta_1 - \\theta_2)^2.\n", "$$\n", "- Prove that $L(\\boldsymbol\\theta)$ is convex on $\\mathbb{R}^2$.\n", "\n", "_Hint: use Lemma 4.24 and that_ $2ab \\le a^2+b^2$ _for any_ $a,b \\in \\mathbb{R}$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Task (b) (Warm-up)\n", "Consider the same function as in Task (a).\n", "- Find $m \\in \\mathbb{N}$, $\\mathbf{x}_i \\in \\mathbb{R}^2$ and $y_i \\in \\mathbb{R}$, $i=1,\\ldots,m$, such that $L(\\boldsymbol\\theta)$ is equal to the loss $L(\\boldsymbol\\theta) = \\frac{1}{m} \\sum_{i=1}^{m}(\\langle \\boldsymbol\\theta, \\mathbf{x}_i \\rangle - y_i)^2$ of the linear regression problem." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Task (c)\n", "Consider the same function as in Task (a) and (b).\n", "- Prove that $L(\\boldsymbol\\theta)$ is $\\lambda$-strongly convex on $\\mathbb{R}^2$ and suggest a value of $\\lambda>0$.\n", " \n", "_Hint: use Task (b) and Theorem 4.32._" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Task 1: SGD for polynomial regression\n", "\n", "- Write a Python function `sgd_poly(V, Y, theta0, t0, iters=10000)` which implements the Stochastic Gradient Descent method for the polynomial regression problem considered in Problem Sheet 7, taking \n", "$$\n", "\\mathbf{v}_k = \\nabla_{\\boldsymbol\\theta} \\tilde \\ell_{x_i,y_i}(\\boldsymbol\\theta_k) = 2 \\mathbf{v}_{i}^\\top (\\mathbf{v}_{i} \\boldsymbol\\theta - y_i),\n", "$$\n", "where $i=0,\\ldots,m-1$ is sampled uniformly at random in each iteration, and $V \\in \\mathbb{R}^{m \\times (d+1)}$ is as previously the Vandermonde matrix, and $\\mathbf{v}_i \\in \\mathbb{R}^{n+1}$ is the $i$-th row of $V$. Once again, the code to compute $V$ and `Y`$=[y_1,\\ldots,y_m]$ is as follows:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from matplotlib import pyplot as plt \n", "\n", "def TrueSample():\n", " x = np.random.uniform(-1,1)\n", " y = x - x**3 + 0.1*np.random.randn()\n", " return x,y\n", "\n", "def features(x,n):\n", " powers = np.arange(n+1) # [0,1,2,...,n]\n", " powers = np.reshape(powers, (1, -1)) # Make it explicitly a row vector\n", " x = np.reshape(x, (-1, 1)) # Make it explicitly a column vector\n", " return x**powers # Python automatically broadcasts the vectors to each others' shapes \n", " # and takes the power between the resulting matrices elementwise\n", "\n", "Nsamples = 30\n", "Y = np.zeros(Nsamples)\n", "X = np.zeros(Nsamples)\n", "for i in range(Y.size):\n", " X[i],Y[i] = TrueSample()\n", "\n", "n = 3\n", "V = features(X,n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The other inputs to `sgd_poly` are the initial guess `theta0`$=\\boldsymbol\\theta_0$, the initial learning rate `t0`$=t_0$, which defines the learning rate $t_k = t_0/(k+1)$ in the $k$-th iteration, and the fixed number of iterations `iters`. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Task 2\n", "\n", "- Write Python code that applies `sgd_poly` to compute $\\boldsymbol\\theta_k \\approx \\boldsymbol\\theta^* = \\arg\\min L(\\boldsymbol\\theta)$ for $L(\\boldsymbol\\theta)$ defined in Task 1, starting from $\\boldsymbol\\theta_0 = \\mathbf{0}$, using the default `iters`, and a $t_0$ which you should select from the set $\\{1/4, 1/2, 1, 2, 4\\}$ such that the algorithm produces the smallest loss on average over a few (up to 10) restarts. What is this $t_0$, and what is the corresponding loss? A sketch of such a comparison loop is given after the answer cell below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] },
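{ "cell_type": "markdown", "metadata": {}, "source": [ "A possible sketch of the comparison loop follows, assuming the `sgd_poly` and `L` defined above. The printed averages are illustrative; because SGD is randomized, the best $t_0$ may vary between runs." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "theta0 = np.zeros(n + 1)  # start from the zero vector\n", "restarts = 10             # up to 10 restarts, as suggested in the task\n", "for t0 in [1/4, 1/2, 1, 2, 4]:\n", "    losses = [sgd_poly(V, Y, theta0, t0)[1] for r in range(restarts)]\n", "    print('t0 =', t0, 'average loss =', np.mean(losses))" ] } ], "metadata": { "kernelspec": { "display_name": "base", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 2 }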