Introduction
The goal is to find the linear model $y = \beta_0 + \beta_1 x$ such that the sum of squared errors between the predicted values and the actual data is minimized.
Linear Model
The form of the linear model is:
$$ y_i = \beta_0 + \beta_1 x_i + \epsilon_i $$
where $y_i$ is the observed value, $x_i$ is the independent variable, $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\epsilon_i$ is the error term.
We wish to find $\beta_0$ and $\beta_1$ such that the predicted values $\hat{y}_i = \beta_0 + \beta_1 x_i$ minimize the sum of squared errors between $\hat{y}_i$ and the observed values $y_i$.
Design Matrix and Observation Vector
To make the problem concrete, suppose we have four observations $(x_i, y_i)$: $(1, 2)$, $(2, 3)$, $(3, 5)$, and $(4, 7)$. To make the computations more convenient, we represent the problem using vectors and matrices.
Design Matrix
Define the design matrix $\mathbf{X}$ as:
$$ \mathbf{X} = \begin{bmatrix} 1 & 1 \newline 1 & 2 \newline 1 & 3 \newline 1 & 4 \end{bmatrix} $$
The first column contains only 1s, representing the constant term $\beta_0$, and the second column contains the values of the independent variable $x_i$.
Observation Vector
Define the observation vector $\mathbf{y}$ as:
$$ \mathbf{y} = \begin{bmatrix} 2 \newline 3 \newline 5 \newline 7 \end{bmatrix} $$
This vector contains all the observed values $y_i$.
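As a quick concreteness check, these two objects can be assembled in a few lines of NumPy (NumPy is not part of the derivation itself; the array names `x`, `y`, and `X` below are chosen purely for illustration):

```python
import numpy as np

# Example data: x values and observed responses y
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 7.0])

# Design matrix: a column of ones for the intercept, then the x values
X = np.column_stack([np.ones_like(x), x])
print(X)
# [[1. 1.]
#  [1. 2.]
#  [1. 3.]
#  [1. 4.]]
print(y)
# [2. 3. 5. 7.]
```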
Parameter Vector
Define the parameter vector $\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \newline \beta_1 \end{bmatrix}$.
Sum of Squared Errors Objective Function
In regression, our goal is to find the parameters $\boldsymbol{\beta}$ such that the predicted values $\hat{\mathbf{y}} = \mathbf{X} \boldsymbol{\beta}$ are as close as possible to the observed values $\mathbf{y}$, by minimizing the sum of squared errors (SSE):
$$ S(\beta_0, \beta_1) = \sum_{i=1}^n (y_i - \hat{y}_i)^2 = (\mathbf{y} - \mathbf{X} \boldsymbol{\beta})^T (\mathbf{y} - \mathbf{X} \boldsymbol{\beta}) $$
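This objective is easy to evaluate for any candidate $\boldsymbol{\beta}$. A minimal sketch (the setup repeats the `X` and `y` arrays from the earlier snippet so the block runs on its own):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 7.0])
X = np.column_stack([np.ones_like(x), x])

def sse(beta):
    """Sum of squared errors S(beta) = (y - X beta)^T (y - X beta)."""
    r = y - X @ beta
    return r @ r

# A rough guess versus a value close to the least squares solution
print(sse(np.array([0.0, 1.0])))  # 15.0
print(sse(np.array([0.0, 1.7])))  # approximately 0.3
```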
Deriving the Normal Equation
The key idea of least squares is to choose $\boldsymbol{\beta}$ so that the residual vector $\mathbf{y} - \mathbf{X} \boldsymbol{\beta}$ is as short as possible, i.e., its squared length is minimized. Geometrically, this means that the residual must be orthogonal to the column space of the design matrix $\mathbf{X}$, which leads to the normal equation:
$$ \mathbf{X}^T (\mathbf{y} - \mathbf{X} \hat{\boldsymbol{\beta}}) = 0 $$
Rearranging this gives:
$$ \mathbf{X}^T \mathbf{y} = \mathbf{X}^T \mathbf{X} \hat{\boldsymbol{\beta}} $$
This is the normal equation, which can be solved to find the least squares estimate $\hat{\boldsymbol{\beta}}$.
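In code, the normal equation is just a $2 \times 2$ linear system and can be solved directly; a minimal sketch using `np.linalg.solve` (again with the example `X` and `y`):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 7.0])
X = np.column_stack([np.ones_like(x), x])

# Solve (X^T X) beta_hat = X^T y without forming an explicit inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # expected: [0, 1.7] for this data set (up to floating-point rounding)
```

For larger or ill-conditioned problems, `np.linalg.lstsq` or a QR factorization is usually preferred over the normal equations, since forming $\mathbf{X}^T \mathbf{X}$ squares the condition number of $\mathbf{X}$; for this small example the direct solve is fine.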
Solving the Normal Equation
Now, let’s compute the parts of the normal equation.
Compute $\mathbf{X}^T \mathbf{X}$
$$ \mathbf{X}^T \mathbf{X} = \begin{bmatrix} 1 & 1 & 1 & 1 \newline 1 & 2 & 3 & 4 \end{bmatrix} \begin{bmatrix} 1 & 1 \newline 1 & 2 \newline 1 & 3 \newline 1 & 4 \end{bmatrix} = \begin{bmatrix} 4 & 10 \newline 10 & 30 \end{bmatrix} $$
Compute $\mathbf{X}^T \mathbf{y}$
$$ \mathbf{X}^T \mathbf{y} = \begin{bmatrix} 1 & 1 & 1 & 1 \newline 1 & 2 & 3 & 4 \end{bmatrix} \begin{bmatrix} 2 \newline 3 \newline 5 \newline 7 \end{bmatrix} = \begin{bmatrix} 2 + 3 + 5 + 7 \newline 2 + 6 + 15 + 28 \end{bmatrix} = \begin{bmatrix} 17 \newline 51 \end{bmatrix} $$
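Both products are easy to confirm numerically:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 7.0])
X = np.column_stack([np.ones_like(x), x])

print(X.T @ X)
# [[ 4. 10.]
#  [10. 30.]]
print(X.T @ y)
# [17. 51.]
```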
Solve the Normal Equation
Now we solve the normal equation:
$$ \begin{bmatrix} 4 & 10 \newline 10 & 30 \end{bmatrix} \hat{\boldsymbol{\beta}} = \begin{bmatrix} 17 \newline 51 \end{bmatrix} $$
To solve this, we first compute the inverse of $\mathbf{X}^T \mathbf{X}$:
$$ (\mathbf{X}^T \mathbf{X})^{-1} = \frac{1}{(4)(30) - (10)(10)} \begin{bmatrix} 30 & -10 \newline -10 & 4 \end{bmatrix} = \frac{1}{20} \begin{bmatrix} 30 & -10 \newline -10 & 4 \end{bmatrix} $$
Next, we compute $\hat{\boldsymbol{\beta}}$:
$$ \hat{\boldsymbol{\beta}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y} $$
$$ \hat{\boldsymbol{\beta}} = \frac{1}{20} \begin{bmatrix} 30 & -10 \newline -10 & 4 \end{bmatrix} \begin{bmatrix} 17 \newline 51 \end{bmatrix} = \frac{1}{20} \begin{bmatrix} (30)(17) + (-10)(51) \newline (-10)(17) + (4)(51) \end{bmatrix} $$
$$ \hat{\boldsymbol{\beta}} = \frac{1}{20} \begin{bmatrix} 510 - 510 \newline -170 + 204 \end{bmatrix} = \frac{1}{20} \begin{bmatrix} 0 \newline 34 \end{bmatrix} = \begin{bmatrix} 0 \newline 1.7 \end{bmatrix} $$
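The same arithmetic can be mirrored in code, using the explicit $2 \times 2$ inverse only to match the hand calculation above:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 7.0])
X = np.column_stack([np.ones_like(x), x])

A = X.T @ X   # [[ 4, 10], [10, 30]]
b = X.T @ y   # [17, 51]

det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]      # 4*30 - 10*10 = 20
A_inv = np.array([[ A[1, 1], -A[0, 1]],
                  [-A[1, 0],  A[0, 0]]]) / det    # adjugate divided by determinant

beta_hat = A_inv @ b
print(beta_hat)  # approximately [0, 1.7]
```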
Least Squares Estimate
By solving the normal equation, we find $\hat{\beta}_0 = 0$ and $\hat{\beta}_1 = 1.7$; for this data set the fitted intercept happens to be exactly zero. Thus, the regression model is:
$$ \hat{y} = 1.7x $$
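The fitted values can be checked against the observations, and the whole calculation cross-checked with NumPy's built-in least-squares routine:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 7.0])
X = np.column_stack([np.ones_like(x), x])

beta_hat = np.array([0.0, 1.7])
y_hat = X @ beta_hat
print(y_hat)                      # approximately [1.7 3.4 5.1 6.8]
print(y - y_hat)                  # residuals: approximately [ 0.3 -0.4 -0.1  0.2]
print(((y - y_hat) ** 2).sum())   # minimized SSE, approximately 0.3

# Cross-check with np.linalg.lstsq, which solves the same least squares problem
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_lstsq)                 # [0, 1.7] up to floating-point rounding
```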
Conclusion
Using the projection approach, we see that the vector of fitted values $\hat{\mathbf{y}} = \mathbf{X} \hat{\boldsymbol{\beta}}$ is the projection of the observation vector $\mathbf{y}$ onto the space spanned by the columns of the design matrix $\mathbf{X}$. By solving the normal equation, we found the parameters $\hat{\beta}_0 = 0$ and $\hat{\beta}_1 = 1.7$, which minimize the sum of squared errors.
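This projection view can also be verified numerically: the fitted values equal $\mathbf{H} \mathbf{y}$, where $\mathbf{H} = \mathbf{X} (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T$ is the projection (hat) matrix, and the residual is orthogonal to each column of $\mathbf{X}$. A short sketch of that check:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 7.0])
X = np.column_stack([np.ones_like(x), x])

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat

# Hat matrix: orthogonal projection onto the column space of X
H = X @ np.linalg.inv(X.T @ X) @ X.T

print(np.allclose(H @ y, y_hat))            # True: fitted values are the projection of y
print(np.allclose(X.T @ (y - y_hat), 0.0))  # True: residual is orthogonal to the columns of X
```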