Sunday, March 2, 2008

Lecture 8 - Residual Analysis - Checking the Normality Assumption

The next assumption in the LINE mnemonic after Linearity is Independence of Errors. We skipped that one for now because it's a bit more complex than the others, so we saved it for last. In the meantime, we looked at the assumption that follows it: Normality of Errors.

Checking the Normality Assumption
This assumption states that the error in the observation is distributed normally at each x-value. A larger error is less likely than a smaller error and the distribution of errors at any x follows the normal distribution.

Although we typically only have one observation at each x, if we assume that the distribution of the errors is the same at each x, we can simply plot all the errors (residuals) and check if they follow the normal distribution. We do this by running a normal probability plot of the residuals. Fortunately for us, Minitab has a built-in normal probability plot function.
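To see what a normal probability plot is actually plotting, here is a minimal Python sketch. It is not how Minitab does it internally; the plotting position (i - 0.5)/n is one common convention that I am assuming here, and the residual values are made up for illustration.

```python
from statistics import NormalDist


def normal_plot_points(residuals):
    """Pair each sorted residual with the standard normal quantile
    expected at its rank. If the residuals are roughly normal, these
    pairs fall close to a straight line."""
    ordered = sorted(residuals)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, r in enumerate(ordered, start=1):
        p = (i - 0.5) / n  # assumed plotting position, not Minitab's
        points.append((std_normal.inv_cdf(p), r))
    return points


# Hypothetical residuals, for illustration only
example_residuals = [-1.2, 0.4, -0.3, 0.9, 0.1, -0.6, 1.5, -0.8]
points = normal_plot_points(example_residuals)
for z, r in points:
    print(f"{z:6.2f}  {r:5.1f}")
```

Plotting the second column against the first gives the same kind of picture as the normal probability plot: the straighter the pattern, the more plausible the normality assumption.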

Checking Normality Using Minitab
1. Open up your data worksheet. As usual, we'll use the site.mtw file for our example.
2. Select Stat-Regression-Regression from the menu bar.
3. Put Annual Sales in the Response box since it's the dependent (response) variable and put Square Feet in the Predictors box since it's the independent (predictor) variable.
4. Click the Graphs button and, under Residual Plots, check the Normal plot of residuals checkbox.
5. Click OK in the Graphs and the Regression dialogs.
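If you don't have Minitab handy, the residuals themselves are easy to compute. The sketch below fits a simple least-squares line and subtracts the fitted values from the observations. The numbers are hypothetical stand-ins, not the values in site.mtw, and the function name fit_line is my own.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data, NOT the site.mtw values
square_feet = [1.0, 2.0, 3.0, 4.0, 5.0]    # predictor
annual_sales = [2.1, 3.9, 6.2, 7.8, 10.0]  # response

b0, b1 = fit_line(square_feet, annual_sales)

# Residual = observed value minus fitted value
residuals = [y - (b0 + b1 * x) for x, y in zip(square_feet, annual_sales)]
print(residuals)
```

These residuals are what the normal probability plot is drawn from.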

Minitab creates the normal probability plot of the residuals. The y-axis of this graph is scaled so that if the data are distributed normally, they will fall on a straight line on the graph. Minitab even draws a reference line through the residuals for us (presumably using the method of least squares).

Drawing a conclusion from the graph
Review this graph and ask yourself: Do the residual points fall more or less on a straight line in the normal probability plot? If they do, you can conclude that the errors are distributed normally and the normality of errors assumption is valid. In our example, the normal probability plot of the residuals is pretty much linear, but I would be concerned about the upward trend at the far right end of the graph.
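"More or less on a straight line" is a judgment call. One way to put a rough number on it, which is not something Minitab's plot reports and not part of the lecture, is the correlation between the sorted residuals and their expected normal quantiles. The sketch below assumes the (i - 0.5)/n plotting position and uses made-up data; treat it as a supplement to looking at the graph, not a formal test.

```python
from math import sqrt
from statistics import NormalDist


def normal_scores(n):
    """Expected standard normal quantiles for ranks 1..n
    (assumed plotting position: (i - 0.5) / n)."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def plot_correlation(residuals):
    """Correlation between sorted residuals and their normal scores.
    Values near 1 mean the probability plot is close to a straight line."""
    ordered = sorted(residuals)
    z = normal_scores(len(ordered))
    z_bar = sum(z) / len(z)
    r_bar = sum(ordered) / len(ordered)
    szr = sum((zi - z_bar) * (ri - r_bar) for zi, ri in zip(z, ordered))
    szz = sum((zi - z_bar) ** 2 for zi in z)
    srr = sum((ri - r_bar) ** 2 for ri in ordered)
    return szr / sqrt(szz * srr)


# Hypothetical samples: one built to lie exactly on a line,
# one with a single large value pulling the right end upward
perfect = [2.0 + 3.0 * z for z in normal_scores(10)]
skewed = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 9.0]

print(plot_correlation(perfect))
print(plot_correlation(skewed))
```

The second sample mimics an upward bend at the right end of the plot, and its correlation comes out noticeably lower than the first.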
