{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Problem Sheet 8\n", "\n", "## Convexity and Stochastic Gradient Descent (SGD) method" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Task (a)\n", "Consider $\\boldsymbol\\theta=(\\theta_1,\\theta_2) \\in \\mathbb{R}^2$ and the function\n", "$$\n", "L(\\boldsymbol\\theta) = \\theta_1^2 + (\\theta_1 - \\theta_2)^2.\n", "$$\n", "- Prove that $L(\\boldsymbol\\theta)$ is convex on $\\mathbb{R}^2$.\n", "\n", "_Hint: use Lemma 4.24 and that_ $2ab \\le a^2+b^2$ _for any_ $a,b \\in \\mathbb{R}$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Task (b) (Warm-up)\n", "Consider the same function as in Task (a).\n", "- Find $m \\in \\mathbb{N}$, $\\mathbf{x}_i \\in \\mathbb{R}^2$ and $y_i \\in \\mathbb{R}$, $i=1,\\ldots,m$, such that $L(\\boldsymbol\\theta)$ is equal to the loss $L(\\boldsymbol\\theta) = \\frac{1}{m} \\sum_{i=1}^{m}(\\langle \\boldsymbol\\theta, \\mathbf{x}_i \\rangle - y_i)^2$ of the linear regression problem." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Task (c)\n", "Consider the same function as in Task (a) and (b).\n", "- Prove that $L(\\boldsymbol\\theta)$ is $\\lambda$-strongly convex on $\\mathbb{R}^2$ and suggest a value of $\\lambda>0$.\n", " \n", "_Hint: use Task (b) and Theorem 4.32._" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Task 1: SGD for polynomial regression\n", "\n", "- Write a Python function `sgd_poly(V, Y, theta0, t0, iters=10000)` which implements the Stochastic Gradient Descent method for the polynomial regression problem considered in Problem Sheet 7, taking \n", "$$\n", "\\mathbf{v}_k = \\nabla_{\\boldsymbol\\theta} \\tilde \\ell_{x_i,y_i}(\\boldsymbol\\theta_k) = 2 \\mathbf{v}_{i}^\\top (\\mathbf{v}_{i} \\boldsymbol\\theta - y_i),\n", "$$\n", "where $i=0,\\ldots,m-1$ is sampled uniformly at random in each iteration, and $V \\in \\mathbb{R}^{m \\times (d+1)}$ is as previously the Vandermonde matrix, and $\\mathbf{v}_i \\in \\mathbb{R}^{n+1}$ is the $i$-th row of $V$. Once again, the code to compute $V$ and `Y`$=[y_1,\\ldots,y_m]$ is as follows:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from matplotlib import pyplot as plt \n", "\n", "def TrueSample():\n", " x = np.random.uniform(-1,1)\n", " y = x - x**3 + 0.1*np.random.randn()\n", " return x,y\n", "\n", "def features(x,n):\n", " powers = np.arange(n+1) # [0,1,2,...,n]\n", " powers = np.reshape(powers, (1, -1)) # Make it explicitly a row vector\n", " x = np.reshape(x, (-1, 1)) # Make it explicitly a column vector\n", " return x**powers # Python automatically broadcasts the vectors to each others' shapes \n", " # and takes the power between the resulting matrices elementwise\n", "\n", "Nsamples = 30\n", "Y = np.zeros(Nsamples)\n", "X = np.zeros(Nsamples)\n", "for i in range(Y.size):\n", " X[i],Y[i] = TrueSample()\n", "\n", "n = 3\n", "V = features(X,n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The other inputs to `sgd_poly` are the initial guess `theta0`$=\\boldsymbol\\theta_0$, the initial learning rate `t0`$=t_0$, which defines the learning rate $t_k = t_0/(k+1)$ in the $k$-th iteration, and the fixed number of iterations `iters`. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Task 2\n", "\n", "- Write Python code that applies `sgd_poly` to compute $\\boldsymbol\\theta_k \\approx \\boldsymbol\\theta^* = \\arg\\min L(\\boldsymbol\\theta)$ for $L(\\boldsymbol\\theta)$ defined in Task 1, starting from $\\boldsymbol\\theta_0 = \\mathbf{0}$, using the default `iters`, and a $t_0$ which you should select from the set $\\{1/4, 1/2, 1, 2, 4\\}$ such that the algorithm produces the smallest loss on average over a few (up to 10) restarts. What is this $t_0$, and what is the corresponding loss? A sketch of such a comparison loop is given after the answer cell below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] },
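{ "cell_type": "markdown", "metadata": {}, "source": [ "A possible sketch of the comparison loop follows, assuming the `sgd_poly` and `L` defined above. The printed averages are illustrative; because SGD is randomized, the best $t_0$ may vary between runs." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "theta0 = np.zeros(n + 1)  # start from the zero vector\n", "restarts = 10             # up to 10 restarts, as suggested in the task\n", "for t0 in [1/4, 1/2, 1, 2, 4]:\n", "    losses = [sgd_poly(V, Y, theta0, t0)[1] for r in range(restarts)]\n", "    print('t0 =', t0, 'average loss =', np.mean(losses))" ] } ], "metadata": { "kernelspec": { "display_name": "base", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 2 }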