Properties of Least Squares Estimate \(\hat{\beta}\)

  1. Expectation of \(\hat{\beta}\)

By the matrix form of the simple linear regression model, the least squares estimate \(\hat{\beta}\) is

\[ \hat{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \mathbf{Y} =(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T(\mathbf{X}\beta+\varepsilon)=\beta+(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \varepsilon\] Hence \[\mathrm{E}\hat{\beta}=\beta +\mathrm{E}\{(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \varepsilon\}=\beta+(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\,\mathrm{E}\varepsilon=\beta\] Hence the least squares estimate \(\hat{\beta}\) is an unbiased estimate of \(\beta\), with \(\mathrm{E}\hat{\beta}_0=\beta_0\) and \(\mathrm{E}\hat{\beta}_1=\beta_1\).
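As a quick numerical illustration of unbiasedness, here is a minimal Monte Carlo sketch (the true values \(\beta_0=1\), \(\beta_1=2\), \(\sigma=0.5\) and the uniform design are hypothetical choices, not part of the notes): averaging \(\hat{\beta}\) over many simulated data sets should recover \(\beta\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_rep = 50, 5000
beta = np.array([1.0, 2.0])           # hypothetical true (beta_0, beta_1)
sigma = 0.5
x = rng.uniform(0, 10, size=n)        # fixed design points
X = np.column_stack([np.ones(n), x])  # design matrix with intercept column

est = np.empty((n_rep, 2))
for r in range(n_rep):
    eps = rng.normal(0, sigma, size=n)
    Y = X @ beta + eps
    # least squares estimate: (X^T X)^{-1} X^T Y
    est[r] = np.linalg.solve(X.T @ X, X.T @ Y)

print("Monte Carlo mean of beta_hat:", est.mean(axis=0))  # close to [1.0, 2.0]
```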

 

  2. Variance of \(\hat{\beta}\)

For the simple regression model \[\begin{eqnarray*} \mbox{cov}(\hat{\beta}) &\hat{=}&\left(\begin{array}{cc} \mbox{var}(\hat{\beta}_0) & \mbox{cov}(\hat{\beta}_0,\hat{\beta}_1) \\ \mbox{cov}(\hat{\beta}_0,\hat{\beta}_1) & \mbox{var}(\hat{\beta}_1) \end{array}\right) =\left(\begin{array}{cc} \mbox{E}(\hat{\beta}_0-\mbox{E}\hat{\beta}_0)^2 & \mbox{E}(\hat{\beta}_0-\mbox{E}\hat{\beta}_0)(\hat{\beta}_1-\mbox{E}\hat{\beta}_1) \\ \mbox{E}(\hat{\beta}_0-\mbox{E}\hat{\beta}_0)(\hat{\beta}_1-\mbox{E}\hat{\beta}_1) & \mbox{E}(\hat{\beta}_1-\mbox{E}\hat{\beta}_1)^2 \end{array}\right)\\ &=& \mbox{E}(\hat{\beta}-\mbox{E}\hat{\beta})(\hat{\beta}-\mbox{E}\hat{\beta})^T=\mathrm{E}\left\{(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \varepsilon\right\}\left\{(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \varepsilon\right\}^T\\ &=& \mbox{E} (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \varepsilon \varepsilon^T \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1} \\ &=& (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T(\mbox{E}\{\varepsilon \varepsilon^T\}) \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1} \\ &=& (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T(\sigma^2 I_{n\times n}) \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1} = \sigma^2 (\mathbf{X}^T\mathbf{X})^{-1} \end{eqnarray*}\] Hence \[\begin{eqnarray*} \mbox{var}(\hat{\beta}_0)&=&\sigma^2 \{(\mathbf{X}^T\mathbf{X})^{-1}\}_{11} =\frac{\sigma^2\sum\limits_{i=1}^n x^2_i}{n\sum\limits_{i=1}^n x_i^2-\left(\sum\limits_{i=1}^n x_i\right)^2} \\ &=& \sigma^2\left(\frac{\sum\limits_{i=1}^n x^2_i-n\bar{x}^2+n\bar{x}^2}{n\sum\limits_{i=1}^n x_i^2-n^2 \bar{x}^2}\right)=\sigma^2\left(\frac{1}{n}+\frac{\bar{x}^2}{\sum\limits_{i=1}^n(x_i-\bar{x})^2}\right) \\ &=& \sigma^2\left(\frac{1}{n}+\frac{\bar{x}^2}{C_{XX}}\right), \end{eqnarray*}\] \[ \mbox{var}(\hat{\beta}_1)=\sigma^2 \{(\mathbf{X}^T\mathbf{X})^{-1}\}_{22}=\frac{\sigma^2\cdot n}{n\sum\limits_{i=1}^n x_i^2-\left(\sum\limits_{i=1}^n x_i\right)^2}=\frac{\sigma^2}{C_{XX}} \]

Notice that \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are weighted linear combinations of \(\varepsilon_1,\ldots,\varepsilon_n\). If \(\varepsilon_1,\ldots,\varepsilon_n\) are normal random variables, then \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are also normal random variables, and the means and variances above completely determine their distributions.
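The covariance formula can be checked by simulation. Below is a minimal sketch (again with hypothetical values \(\beta_0=1\), \(\beta_1=2\), \(\sigma=0.5\) and a fixed design) comparing the empirical covariance of \(\hat{\beta}\) with \(\sigma^2(\mathbf{X}^T\mathbf{X})^{-1}\) and with the closed-form expressions for \(\mbox{var}(\hat{\beta}_0)\) and \(\mbox{var}(\hat{\beta}_1)\).

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_rep, sigma = 50, 20000, 0.5
beta = np.array([1.0, 2.0])           # hypothetical true coefficients
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])

est = np.empty((n_rep, 2))
for r in range(n_rep):
    Y = X @ beta + rng.normal(0, sigma, size=n)
    est[r] = np.linalg.solve(X.T @ X, X.T @ Y)

# theoretical covariance matrix: sigma^2 (X^T X)^{-1}
print("empirical cov of beta_hat:\n", np.cov(est, rowvar=False))
print("sigma^2 (X^T X)^{-1}:\n", sigma**2 * np.linalg.inv(X.T @ X))

# closed forms for the simple linear model
Cxx = np.sum((x - x.mean()) ** 2)     # C_XX = sum (x_i - xbar)^2
print("var(beta_0_hat):", sigma**2 * (1 / n + x.mean() ** 2 / Cxx))
print("var(beta_1_hat):", sigma**2 / Cxx)
```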

  3. Distribution of \(\frac{n-2}{\sigma^2}\hat{\sigma}^2=\frac{SSE}{\sigma^2}\)

\[SSE = \sum\limits_{i=1}^n (Y_i-\hat{Y}_i)^2 = \varepsilon^T (I_{n\times n}-P_{\mathbf{x}})\varepsilon, \] since \(\mathbf{Y}-\hat{\mathbf{Y}}=(I_{n\times n}-P_{\mathbf{x}})\mathbf{Y}=(I_{n\times n}-P_{\mathbf{x}})\varepsilon\) and \(I_{n\times n}-P_{\mathbf{x}}\) is symmetric and idempotent. Since the rank of \(P_{\mathbf{x}}\) is 2, by the properties of projection matrices \[I_{n\times n}-P_{\mathbf{x}}=Q^T\left[\begin{array}{cc} I_{(n-2)\times (n-2)} & \mathbf{0} \\ \mathbf{0} & \mathbf{0}_{2\times 2} \end{array} \right]Q\] where \(Q\) is an orthogonal matrix. Define \(\epsilon \hat{=} Q\varepsilon= (\epsilon_1, \ldots, \epsilon_n)^T\). We know that \[\mathrm{E} \epsilon= \mathbf{0}_{n\times 1}, \quad \mbox{cov}(\epsilon)=Q\,\mathrm{cov}(\varepsilon)\,Q^T=Q\cdot \sigma^2 I_{n\times n}\cdot Q^T =\sigma^2 QQ^T=\sigma^2 I_{n\times n}. \] Hence \(\epsilon_1,\ldots,\epsilon_n\) are independent mean-zero normal random variables with variance \(\sigma^2\), and \[\frac{SSE}{\sigma^2}=\frac{1}{\sigma^2}\varepsilon^T (I_{n\times n}-P_{\mathbf{x}})\varepsilon =\frac{1}{\sigma^2}\varepsilon^T Q^T\left[\begin{array}{cc} I_{(n-2)\times (n-2)} & \mathbf{0} \\ \mathbf{0} & \mathbf{0}_{2\times 2} \end{array} \right]Q\varepsilon =\frac{1}{\sigma^2}\sum\limits_{i=1}^{n-2}\epsilon^2_i=\sum\limits_{i=1}^{n-2} \left(\frac{\epsilon_i}{\sigma} \right)^2 =\sum\limits_{i=1}^{n-2} \epsilon_{\ast i}^2 \] where \(\epsilon_{\ast i}=\epsilon_i/\sigma, i=1,\ldots, n-2\), are independent standard normal random variables. By the definition of the \(\chi^2\) distribution, we have \[\frac{n-2}{\sigma^2} \hat{\sigma}^2=\frac{SSE}{\sigma^2} \sim \chi^2_{n-2}\]
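A simulation can make this concrete. The sketch below (hypothetical \(\beta\), \(\sigma\), and design, as before) computes \(SSE/\sigma^2\) over repeated samples and compares its mean and variance with those of a \(\chi^2_{n-2}\) variable, namely \(n-2\) and \(2(n-2)\).

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_rep, sigma = 30, 10000, 0.5
beta = np.array([1.0, 2.0])           # hypothetical true coefficients
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])
P = X @ np.linalg.inv(X.T @ X) @ X.T  # projection (hat) matrix P_x

stat = np.empty(n_rep)
for r in range(n_rep):
    Y = X @ beta + rng.normal(0, sigma, size=n)
    sse = np.sum((Y - P @ Y) ** 2)    # SSE = sum (Y_i - Yhat_i)^2
    stat[r] = sse / sigma**2

# a chi^2_{n-2} random variable has mean n-2 and variance 2(n-2)
print("mean of SSE/sigma^2:", stat.mean(), " vs n-2 =", n - 2)
print("var  of SSE/sigma^2:", stat.var(), " vs 2(n-2) =", 2 * (n - 2))
```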

For multiple regression models with \(p\) predictors, a similar argument gives \[ \frac{n-p-1}{\sigma^2}\hat{\sigma}^2=\frac{SSE}{\sigma^2} \sim \chi^2_{n-p-1}\]

  4. \(\hat{\beta}\) and \(\hat{\sigma}^2\) are independent if \(\varepsilon_i, i=1,\ldots, n\), are i.i.d. normal random variables
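The following sketch is consistent with this property (though of course not a proof): across repeated simulated samples under normal errors, with the same hypothetical \(\beta\), \(\sigma\), and design as above, the sample correlation between \(\hat{\beta}_1\) and \(\hat{\sigma}^2\) should be close to zero.

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_rep, sigma = 50, 20000, 0.5
beta = np.array([1.0, 2.0])           # hypothetical true coefficients
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])
A = np.linalg.inv(X.T @ X) @ X.T      # maps Y to beta_hat

b1 = np.empty(n_rep)
s2 = np.empty(n_rep)
for r in range(n_rep):
    Y = X @ beta + rng.normal(0, sigma, size=n)
    bhat = A @ Y
    b1[r] = bhat[1]
    s2[r] = np.sum((Y - X @ bhat) ** 2) / (n - 2)   # sigma_hat^2 = SSE / (n-2)

# independence implies zero correlation; the sample correlation should be near 0
print("corr(beta_1_hat, sigma_hat^2):", np.corrcoef(b1, s2)[0, 1])
```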