Anyone who runs a two-variable linear regression nowadays probably clicks a button or types a command in some application (R, S-PLUS, Gretl, or Microsoft Excel, to name a few) and lets the method of ordinary least squares (OLS) do its magic. Seldom do we remember why we can use OLS to approximate our dependent variable. This post discusses the why.
The math behind the following was made available to me through Damodar N. Gujarati's Basic Econometrics.
First, some background: Let variables $Y$ and $X$ be given such that their population regression function (PRF) is of the form $E(Y \mid X_i) = \beta_1 + \beta_2 X_i$, where $\beta_1$ is the y-intercept of the line and $\beta_2$ is the slope. For each value of $X_i$, $Y_i$ is found using the PRF for these two variables: $Y_i = E(Y \mid X_i) + u_i = \beta_1 + \beta_2 X_i + u_i$. That is to say, the value of $Y_i$ is its expected value conditioned on $X_i$ (positioned on the line), plus its residual value $u_i$ (the distance from the population line to the actual value).
In reality, we do not deal with entire populations of data. Rather, we spend our time working with samples of populations. Our task then is to estimate the PRF using the sample regression function (SRF), $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$, where $\hat{Y}_i$ is an estimate of $E(Y \mid X_i)$, $\hat{\beta}_1$ is an estimate of $\beta_1$, and $\hat{\beta}_2$ is an estimate of $\beta_2$.
Similarly, $Y_i$ can be found: $Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i$, where $\hat{u}_i$ is the residual value between the SRF and the value of $Y_i$.
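Notice that $\hat{u}_i = Y_i - \hat{Y}_i$. Thus our goal is to make $\hat{u}_i$ as small as possible to come as close as possible to the actual value of $Y_i$. It turns out, however, that we cannot simply make the sum of all residuals $\sum \hat{u}_i$ as small as possible, because positive and negative residuals cancel each other and a badly fitting line can still have a sum near zero. But we gain a pretty good estimate of the PRF if $\sum \hat{u}_i^2$ is as small as possible. Realize that $\sum \hat{u}_i^2 = \sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i)^2$. Therefore, $\sum \hat{u}_i^2 = f(\hat{\beta}_1, \hat{\beta}_2)$; that is, $\sum \hat{u}_i^2$ is some function of $\hat{\beta}_1$ and $\hat{\beta}_2$. So informally, the key to getting a good SRF lies in choosing good values for $\hat{\beta}_1$ and $\hat{\beta}_2$.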
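To make the point about summed residuals concrete, here is a minimal Python sketch. The five data points and the helper name `residuals` are invented for illustration and are not taken from Gujarati. It compares a flat line through the mean of Y with a better-fitting line (the one least squares picks for this data): both have residuals that sum to zero, but their sums of squared residuals differ.

```python
import numpy as np

# Hypothetical sample, invented for illustration only.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

def residuals(b1, b2):
    """Residuals Y_i - (b1 + b2 * X_i) for a candidate intercept and slope."""
    return Y - (b1 + b2 * X)

# Candidate 1: a flat line at the mean of Y, which ignores X entirely.
flat = residuals(Y.mean(), 0.0)

# Candidate 2: intercept 2.2 and slope 0.6, the least-squares line for this data.
fitted = residuals(2.2, 0.6)

# Both sums of residuals are (numerically) zero, so this criterion cannot
# tell the two lines apart.
print("sum of residuals:        ", flat.sum(), fitted.sum())

# The sums of squared residuals do separate them.
print("sum of squared residuals:", (flat**2).sum(), (fitted**2).sum())
```

The squared criterion penalizes every miss regardless of sign, which is why it is the quantity treated as a function of the intercept and slope.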
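For reasons omitted from this post (which can be found in Gujarati's Basic Econometrics), the $\hat{\beta}_1$ and $\hat{\beta}_2$ values we choose are: $\hat{\beta}_2 = \frac{\sum x_i y_i}{\sum x_i^2}$ and $\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}$, where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$ are deviations from the sample means. But how do we know these are the best estimates, the ones that yield the best SRF? This is the "why" aspect I want to illustrate today. The Gauss-Markov theorem answers the question.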
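As a quick sanity check, the estimators can be computed by hand in a few lines of Python. This is a sketch under assumptions: it uses the standard deviation-form OLS expressions (slope = sum of $x_i y_i$ over sum of $x_i^2$, intercept = $\bar{Y}$ minus slope times $\bar{X}$), the function name `ols_two_variable` is hypothetical, and the data are made up. NumPy's `polyfit` serves only as an independent cross-check.

```python
import numpy as np

def ols_two_variable(X, Y):
    """Return (intercept, slope) from the deviation-form OLS formulas."""
    x = X - X.mean()                    # deviations of X from its mean
    y = Y - Y.mean()                    # deviations of Y from its mean
    b2 = (x * y).sum() / (x**2).sum()   # slope estimate
    b1 = Y.mean() - b2 * X.mean()       # intercept estimate
    return b1, b2

# Hypothetical sample, invented for illustration only.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

b1, b2 = ols_two_variable(X, Y)

# Cross-check against NumPy's degree-1 least-squares polynomial fit,
# which returns coefficients from highest degree to lowest.
slope, intercept = np.polyfit(X, Y, 1)

print("by hand:", b1, b2)
print("polyfit:", intercept, slope)
```

The two results should agree up to floating-point error, which is all the "button click" in a statistics package is doing for the two-variable case.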
Gauss-Markov Theorem: Under the assumptions below, the OLS estimators $\hat{\beta}_1$ and $\hat{\beta}_2$ are the best linear unbiased estimators (BLUE) of $\beta_1$ and $\beta_2$ respectively. That is, within the class of linear unbiased estimators, they have minimum variance.
The assumptions are as follows:
- The regression model is linear in the parameters.
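- $X$ values are fixed in repeated sampling.
- Given the value of $X_i$, the expected value of the random disturbance term $u_i$ is zero: $E(u_i \mid X_i) = 0$.
- Given the value of $X_i$, there is homoscedasticity of $u_i$: $\operatorname{var}(u_i \mid X_i) = \sigma^2$, where var is variance.
- There is no autocorrelation between the disturbances: $\operatorname{cov}(u_i, u_j \mid X_i, X_j) = 0$ for $i \neq j$, where cov is covariance.
- There is zero covariance between $u_i$ and $X_i$: $\operatorname{cov}(u_i, X_i) = 0$.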
- The number of observations must be greater than the number of the parameters to be estimated.
- There is variability in the values of $X$: $0 < \operatorname{var}(X) < \infty$.
- The regression model is correctly specified—without specification bias or error.
- There is no perfect linear relationship among the explanatory variables (no perfect multicollinearity).
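The phrase "fixed in repeated sampling" can feel abstract, so here is a small simulation sketch of what it means. Everything in it is an assumption chosen for illustration: the true parameter values, the X grid, the number of replications, and the use of normally distributed disturbances (the theorem itself does not require normality). The X values stay the same in every sample; only the disturbances, and therefore the Y values, are redrawn.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "population" parameters, chosen only for this illustration.
beta1, beta2, sigma = 1.0, 2.0, 1.0

# X is held fixed across all samples (fixed in repeated sampling).
X = np.arange(10, dtype=float)
x = X - X.mean()

def slope_estimate(Y):
    """Deviation-form OLS slope for the fixed X values above."""
    return (x * (Y - Y.mean())).sum() / (x**2).sum()

n_samples = 5000
estimates = np.empty(n_samples)
for s in range(n_samples):
    # Fresh disturbances with mean zero and constant variance, drawn
    # independently of each other and of X.
    u = rng.normal(0.0, sigma, size=X.size)
    Y = beta1 + beta2 * X + u
    estimates[s] = slope_estimate(Y)

mean_estimate = estimates.mean()
# Known variance of the OLS slope under these assumptions: sigma^2 / sum(x_i^2).
theoretical_var = sigma**2 / (x**2).sum()

print("average slope estimate:", mean_estimate, "(true slope:", beta2, ")")
print("empirical variance:    ", estimates.var())
print("theoretical variance:  ", theoretical_var)
```

Across many samples the slope estimates should center on the true slope, which is the unbiasedness half of the theorem seen empirically; a simulation like this illustrates the claim but does not prove it.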
Proof: I illustrate the $\hat{\beta}_2$ case (the $\hat{\beta}_1$ case follows similar reasoning). That is, I show that $\hat{\beta}_2$ is linear, unbiased, and has minimum variance. This proof will be broken into multiple posts. I prove linearity today:
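Recall that $\hat{\beta}_2 = \frac{\sum x_i y_i}{\sum x_i^2}$, where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$. Since $\sum x_i = 0$, we have $\sum x_i y_i = \sum x_i Y_i - \bar{Y} \sum x_i = \sum x_i Y_i$. Define $k_i = \frac{x_i}{\sum x_i^2}$. It follows then that $\hat{\beta}_2 = \sum k_i Y_i$. Clearly, $\hat{\beta}_2$ is linear since it is a linear function of the $Y_i$.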
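EDIT: I realize that the reason why $\hat{\beta}_2$ is linear is actually not so clear. I wish to explain this a little further: We can determine what the value of each $k_i$ will be, since we know $X_i$ and $\bar{X}$, and the $X$ values are fixed in repeated sampling. So the $k_i$ are non-random weights. But $Y_i$ is assumed to be random; thus, taken together, we can think of $\hat{\beta}_2$ as a weighted sum of the $Y_i$ with fixed weights, that is, a linear function of $Y$.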
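The linearity claim can also be checked numerically. The sketch below assumes the weights are $k_i = x_i / \sum x_i^2$ with $x_i = X_i - \bar{X}$; the function names and both Y vectors are hypothetical. It confirms that the weighted sum matches the usual slope formula, and that the estimator respects linear combinations of Y.

```python
import numpy as np

# Fixed X values (hypothetical); the weights k_i depend on these alone.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x = X - X.mean()
k = x / (x**2).sum()

def slope_direct(Y):
    """Slope from the deviation-form formula sum(x_i * y_i) / sum(x_i^2)."""
    return (x * (Y - Y.mean())).sum() / (x**2).sum()

def slope_weighted(Y):
    """Slope written as a weighted sum of the Y_i with fixed weights k_i."""
    return (k * Y).sum()

# Two hypothetical response vectors observed at the same X values.
Y_a = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
Y_b = np.array([1.0, 0.0, 3.0, 2.0, 6.0])

# 1) Both expressions give the same slope.
print(slope_direct(Y_a), slope_weighted(Y_a))

# 2) Linearity in Y: the slope of a combination is the combination of slopes.
combined = slope_weighted(2.0 * Y_a + 3.0 * Y_b)
separate = 2.0 * slope_weighted(Y_a) + 3.0 * slope_weighted(Y_b)
print(combined, separate)

# 3) The weights sum to zero, which is why replacing y_i by Y_i changes nothing.
print(k.sum())
```

Because `k` is computed before any Y is seen, the estimator is a fixed linear map applied to whatever Y values the sample delivers.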
I will continue this proof in my next post!
 Familiarity with the basics of linear regression is assumed for this post.
 Gujarati, Damodar N. Basic Econometrics, 4th ed. McGraw-Hill, 2003.