{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Problem Sheet 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bias-variance decomposition\n", "\n", "## Task (a) (Warm-up)\n", "Assume that $h_{\\theta}(x) = \\theta x$, and $(X,Y) \\sim \\mathbb{P}$, where $Y = \\theta^{true} X + \\xi$ for some fixed \"ground truth\" parameter $\\theta^{true} \\in \\mathbb{R}$, and $\\xi$ is a random variable independent of $X$ with $\\mathbb{E}[\\xi] = 0$ and $\\mathrm{Var}(\\xi)<\\infty$. Let the dataset $D=\\{(x_1,y_1),\\ldots,(x_m,y_m)\\}$ contain $m$ independent samples of $(X,Y)$. Assume the quadratic pointwise loss $\\ell(y,\\hat y) = (y-\\hat y)^2$.\n", "- Find the empirical risk minimizer $\\theta^* = \\arg\\min_{\\theta \\in \\mathbb{R}} L_{D}(\\theta)$.\n", "- Since the data are sampled at random, the entire dataset $D$, and hence $\\theta^*$, which depends only on $D$, can also be seen as realisations of random variables $\\Delta$ and $\\Theta^*$, respectively. Let $\\mathbb{E}_{\\Delta}$ denote the expectation with respect to the distribution of $\\Delta$ only. Prove that $\\mathbb{E}_{\\Delta}[\\Theta^*] = \\theta^{true}$." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Task (b)\n", "- Under the assumptions of Task (a), **prove** that $\\theta^{best} = \\theta^{true}$, where $\\theta^{best} = \\mathbb{E}[XY]/\\mathbb{E}[X^2]$ is the expected risk minimizer from Task (a) of Problem Sheet 2.\n", "- Hence prove that $L_{bias} = L(\\mathbb{E}_{\\Delta}[\\Theta^*])$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Task (c): why the variance loss is a variance\n", "- Under the assumptions of Task (a), **prove** that $\\mathbb{E}_{\\Delta} [L_{var}(\\Theta^*)] = \\mathrm{Var}_{\\Delta}[\\Theta^*] \\mathbb{E}[X^2]$, where $\\mathrm{Var}_{\\Delta}$ is the variance with respect to the distribution of $\\Delta$.\n", "\n", "_Hint: start by adding and subtracting_ $\\mathbb{E}_{\\Delta}[\\Theta^*]X$ _inside the loss,_\n", "$$\n", "L_{var}(\\Theta^*) + L_{bias} = L(\\Theta^*) = \\mathbb{E}[(\\Theta^* X - Y)^2] = \\mathbb{E}[((\\Theta^* X - \\mathbb{E}_{\\Delta}[\\Theta^*]X) + (\\mathbb{E}_{\\Delta}[\\Theta^*]X - Y))^2],\n", "$$\n", "_expand the square under the expectation, and finally take_ $\\mathbb{E}_{\\Delta}$ _of both sides. Note that the_ **precomputed** $\\Delta$ _is independent of the_ **new** $(X,Y)$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Task 0: simulating the \"true\" distribution\n", "- **Copy** over the solution of Task 1 of Problem Sheet 2 (generation of synthetic data)." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Task 1: K-fold cross validation\n", "- **Copy** over functions `features`, `optimise_loss` and `split_data` from Problem Sheet 2.\n", "- **Write a Python function** `cv(X,Y,K,n)` that takes as input the arrays `X` and `Y`, the number of folds `K`, and the polynomial degree `n`, and computes the cross validation loss as follows:\n", "  - create an integer array `ind` containing a random shuffle of the index sequence $0,\\ldots,N-1$, where $N$ is the size of `X` and `Y`.\n", "  - for each $k=0,\\ldots,K-1$,\n", "    - let $X_{train}, Y_{train}, X_{test},Y_{test}$ be the training and test arrays produced by `split_data` with `K` folds applied to the shuffled arrays `X[ind]`, `Y[ind]`, such that $X_{test},Y_{test}$ are the $k$-th chunks of `X[ind]`, `Y[ind]`, respectively.\n", "    - compute $\\boldsymbol\\theta^{(k)} = \\arg\\min_{\\boldsymbol\\theta\\in\\mathbb{R}^{n+1}}L_{D_{train}}(\\boldsymbol\\theta)$, where $D_{train} = (X_{train}, Y_{train})$, using `features` to compute the Vandermonde matrix and `optimise_loss` to solve the resulting equations.\n", "    - compute the $k$-th test loss $L_k:=L_{D_{test}^{(k)}}(\\boldsymbol\\theta^{(k)})$, where $D_{test}^{(k)}:=(X_{test},Y_{test})$.\n", "  - return the cross validation loss $L_{cv} = \\frac{1}{K} \\sum_{k=0}^{K-1}L_k$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Task 2: model selection\n", "- **Vary** $n$ from 0 to 8 and **plot** the $5$-fold cross validation loss for the datasets created in Task 0 as a function of $n$.\n", "- **Find** which value of $n$ gives the smallest cross validation loss. Is this the value you would expect, knowing how the datasets were produced in Task 0?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Task 3: convergence of the cross validation loss\n", "- **Vary** the number of samples in Task 0 over the values 30, 100, 300, 1000, 3000, and for each corresponding realisation of `X` and `Y` repeat the cross validation loss plot from Task 2. What do you observe as the number of samples grows?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.1" } }, "nbformat": 4, "nbformat_minor": 2 }