Wednesday, March 12, 2008

Final Exam Study Guide - Practice Questions - Part 2

In this post, I'll go over the answers to the "regular" questions from last quarter's final and note which chapter each question comes from.

Question 1 (Chapter 12): You would like to estimate the income of a person based on his age. The following data shows the yearly income (in $1,000) and age of a sample of seven individuals.
Income (in $1,000)   Age
20                   18
24                   20
24                   23
25                   34
26                   24
27                   27
34                   27
a. Develop the least squares regression equation.
b. Estimate the yearly income of a 30-year-old individual.
Answer:
a. To calculate b0 and b1, we first need the means of X (age) and Y (income). I got xbar = 24.71 and ybar = 25.71. To calculate b1, we then need xi-xbar and yi-ybar for each observation i:
Income  Age  xi-xbar  yi-ybar  (xi-xbar)(yi-ybar)  (xi-xbar)²
20      18   -6.71    -5.71    38.31               45.02
24      20   -4.71    -1.71     8.05               22.18
24      23   -1.71    -1.71     2.92                2.92
25      34    9.29    -0.71    -6.60               86.30
26      24   -0.71     0.29    -0.21                0.50
27      27    2.29     1.29     2.95                5.24
34      27    2.29     8.29    18.98                5.24
The sum of the (xi-xbar)(yi-ybar) column is 64.4, and the sum of the (xi-xbar)² column is 167.4. Therefore, b1 = 64.4/167.4 = 0.385.
We can then calculate b0 = ybar - b1xbar = 25.71 - (0.385)(24.71) = 16.2. Keep three decimal places of b1 in this step; rounding it to 0.38 first would give 16.3 instead.
Therefore, the regression equation is y = 16.2 + 0.385x.

b. Use the equation to estimate y for x = 30:
y = 16.2 + 0.385(30) = 27.75, which is about $27,750 in annual income.
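If you want to check the arithmetic for Question 1, here's a minimal Python sketch (my own check, not part of the exam key) that applies the same deviation-from-the-mean formulas to the seven data points. It carries full precision, so it gives b1 ≈ 0.385, b0 ≈ 16.20, and an estimate of about 27.75 (roughly $27,750) at age 30. The function name least_squares is just an illustrative helper.

```python
# Least squares fit of income (in $1,000) on age, using the deviation formulas:
#   b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
#   b0 = ybar - b1 * xbar
ages = [18, 20, 23, 34, 24, 27, 27]       # x
incomes = [20, 24, 24, 25, 26, 27, 34]    # y, in $1,000


def least_squares(x, y):
    """Return (b0, b1) for the simple linear regression of y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


b0, b1 = least_squares(ages, incomes)
prediction = b0 + b1 * 30   # estimated income for a 30-year-old, in $1,000
print(f"y = {b0:.2f} + {b1:.3f}x; income at age 30 = {prediction:.2f}")
# prints: y = 16.20 + 0.385x; income at age 30 = 27.75
```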
Question 2 (Chapter 12): Below you are given a partial computer output based on a sample of 8 observations, relating an independent variable (x) and a dependent variable (y).
            Coefficient   Standard Error
Intercept   13.251        10.77
X           0.803         0.385

Analysis of Variance
SOURCE             SS
Regression
Error (Residual)   41.674
Total              71.875
a. Develop the estimated regression line.
b. At α = 0.05, test for the significance of the slope.
c. Determine the coefficient of determination (R²).
Answer:
a. This one's a lot easier than #1. No calculations necessary, just the ability to pull b0 and b1 out of the computer output. They're the coefficients of the intercept and X. So the regression equation becomes:
y = 13.251 + 0.803x

b. The t score for the slope is t = b1/sb1.
From part a, we know that b1 = 0.803.
sb1 is given in the computer output as the standard error of the X coefficient: sb1 = 0.385.
Therefore, t = 0.803/0.385 = 2.086.
Looking at the t distribution table for n-2 = 6 degrees of freedom and α/2 = 0.025, we find a critical t value of 2.447. Since the t score of 2.086 is less than 2.447, we do not reject the null hypothesis H0: β1 = 0. At α = 0.05, there is not enough evidence of a linear relationship between x and y.
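The decision rule for the slope test is easy to script. This minimal Python sketch uses only the numbers from the computer output; the critical value 2.447 is typed in from the t table, the same way the answer above looks it up, rather than computed.

```python
# t test for the slope in Question 2: t = b1 / sb1, compared with t(6, 0.025)
b1 = 0.803          # slope coefficient from the output
s_b1 = 0.385        # standard error of the slope from the output
n = 8
t_critical = 2.447  # t table value for n - 2 = 6 df and alpha/2 = 0.025

t_stat = b1 / s_b1
reject_h0 = abs(t_stat) > t_critical   # two-sided test
print(f"t = {t_stat:.3f}, df = {n - 2}, reject H0: {reject_h0}")
# prints: t = 2.086, df = 6, reject H0: False
```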

c. r² = SSR/SST, but SSR was conveniently left out of the computer output. We can recover it from SSR = SST - SSE = 71.875 - 41.674 = 30.201.
Therefore, r² = 30.201/71.875 = 0.42, so the regression explains about 42% of the variation in y.
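The same two-step calculation in a short Python sketch: recover the missing SSR from SST and SSE, then divide by SST.

```python
# Coefficient of determination from the ANOVA sums of squares in Question 2
sst = 71.875   # total sum of squares
sse = 41.674   # error (residual) sum of squares
ssr = sst - sse            # regression sum of squares, missing from the output
r_squared = ssr / sst
print(f"SSR = {ssr:.3f}, r^2 = {r_squared:.2f}")
# prints: SSR = 30.201, r^2 = 0.42
```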
Question 3 (Chapter 9): A sample of 81 account balances of a credit company showed an average balance of $1,200 with a standard deviation of $126.
a. Formulate the hypotheses that can be used to determine whether the mean of all account balances is significantly different from $1,150.
b. Let α = 0.05. Using the critical value approach, what is your conclusion?
Answer:
a. Since we want to know whether the mean is "significantly different" from $1,150, the null hypothesis is that it equals $1,150, and the alternative is two-sided:
H0: μ = 1150
H1: μ ≠ 1150

b. Since we don't have the population standard deviation, use the t test statistic.
t = (xbar-μ0)/(s/√n)
= (1200-1150)/(126/√81)
= 50/14
= 3.57
The critical value of t for 80 degrees of freedom and α/2 = 0.025 is 1.990.
Since the t-value=3.57 is greater than the critical value of 1.990, we reject H0 and conclude that the mean is significantly different from $1,150.
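A minimal Python sketch of the Question 3 test statistic and decision. As in the answer, the critical value 1.990 is taken from the t table for 80 degrees of freedom rather than computed.

```python
import math

# One-sample t test for Question 3: H0: mu = 1150 vs H1: mu != 1150
xbar, mu0, s, n = 1200, 1150, 126, 81
t_critical = 1.990   # t table value for 80 df and alpha/2 = 0.025

t_stat = (xbar - mu0) / (s / math.sqrt(n))   # 50 / 14
print(f"t = {t_stat:.2f}, reject H0: {abs(t_stat) > t_critical}")
# prints: t = 3.57, reject H0: True
```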
Question 4 (Chapter 8): A statistician selected a sample of 16 accounts receivable and determined the mean of the sample to be $5,000 with a sample standard deviation of $400. He reported that the sample information indicated the mean of the population ranges from $4,739.80 to $5,260.20. He neglected to report what confidence level (1-α) he had used. Based on the above information, determine the confidence level that was used.
Answer: The statistician is reporting a confidence interval of 5000 ± 260.20. He only mentions the sample standard deviation (not the population standard deviation), so he must be using the t distribution and the formula xbar ± t(n-1, α/2) × (s/√n).

So we have:
260.2 = t(s/√n)
260.2 = t (400/√16)
260.2 = 100t
t = 2.602

We look in the t distribution table and find that t(15, α/2) = 2.602 when α/2 = 0.01. So α = 0.02, and the confidence level is 1 - 0.02 = 0.98 = 98%.
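Python's standard library has no t distribution, so this sketch swaps in a numerical method for the table lookup: it integrates the t density with Simpson's rule to get the upper-tail area beyond t = 2.602 with 15 degrees of freedom, then doubles it to get α. The helpers t_pdf and upper_tail are my own illustrative names, not library functions; a package like SciPy (scipy.stats.t.sf) would give the tail area directly.

```python
import math

# Recover the confidence level in Question 4 from the reported interval.
xbar, s, n = 5000, 400, 16
lower, upper = 4739.80, 5260.20


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the density integrated from 0 to t (Simpson's rule)."""
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


half_width = (upper - lower) / 2            # 260.20
t_value = half_width / (s / math.sqrt(n))   # 260.20 / 100 = 2.602
alpha = 2 * upper_tail(t_value, n - 1)      # two-sided interval
confidence = 1 - alpha
print(f"t = {t_value:.3f}, confidence level = {confidence:.1%}")
# prints: t = 2.602, confidence level = 98.0%
```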
Question 5 (Chapter 12): The director of graduate studies at a college of business would like to predict the grade point index (GPI) of students in an MBA program based on their GMAT scores. A sample of 20 students is selected. The result of the regression is summarized in the following Minitab output.
Regression Analysis: GPI versus GMAT

The regression equation is
GPI = 0.300 + 0.00487 GMAT

Predictor   Coef        SE Coef   T
Constant    0.3003      0.3616    0.83
GMAT        0.0048702   [ N ]     [ M ]

S = 0.155870 R-Sq = 79.8%

Analysis of Variance

Source           DF   SS       MS       F       P
Regression        1   1.7257   1.7257   71.03   0.000
Residual Error   18   0.4373   0.0243
Total            19   2.1631
a) Given that Σ(Xi-xbar)² = 72757.2, where X = GMAT, compute N.
b) Compute M and interpret the result. In particular, do we reject the underlying hypothesis (which hypothesis?) or not?
Answer:
a. N is what we usually call the standard error of the slope, sb1. (This is the hardest part of the problem: figuring out what's missing in the Minitab output.) From the formula sheet, we know:
sb1 = SXY/√SSX

We're given SSX, but we need to calculate SXY, the standard error of the estimate, from the formula:
SXY = √(SSE/(n-2)).

We have SSE from the output: SSE = 0.4373. So,
SXY = √(0.4373/18) = 0.156, which matches S = 0.155870 in the Minitab output.

Therefore,
sb1 = 0.156/√72757.2 = 0.156/269.7 = 0.00058
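Here's a minimal Python check of part a, using the SSE, n, and SSX values from the problem. The variable std_err_estimate is the quantity the text calls SXY.

```python
import math

# Standard error of the slope for Question 5: sb1 = SXY / sqrt(SSX)
sse = 0.4373       # residual error SS from the ANOVA table
n = 20
ssx = 72757.2      # sum of (Xi - xbar)^2, given in the problem

std_err_estimate = math.sqrt(sse / (n - 2))   # SXY in the text, S in the output
s_b1 = std_err_estimate / math.sqrt(ssx)
print(f"SXY = {std_err_estimate:.4f}, sb1 = {s_b1:.5f}")
# prints: SXY = 0.1559, sb1 = 0.00058
```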

b. M is the t-score for the slope which is given by:
t = b1/sb1
= 0.0048702/0.00058
= 8.4

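The question doesn't specify α, so let's use α = 0.01. The critical value of t for 18 degrees of freedom and α/2 = 0.005 is 2.878. Since our t score of 8.4 is greater than the critical t value, we reject the null hypothesis H0: β1 = 0 (the slope is zero, meaning GMAT score has no linear relationship with GPI). GMAT score is a significant predictor of GPI, which agrees with the P value of 0.000 in the output.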
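A short Python sketch of part b, assuming α = 0.01 as in the answer, with the critical value 2.878 typed in from the t table. It also checks a handy fact for simple regression: the slope's t statistic squared equals the F statistic in the ANOVA table, up to rounding in the output.

```python
import math

# t test for the slope in Question 5, H0: beta1 = 0, alpha = 0.01 (two-sided)
b1 = 0.0048702
s_b1 = math.sqrt(0.4373 / 18) / math.sqrt(72757.2)   # standard error of the slope
t_critical = 2.878   # t table value for 18 df and alpha/2 = 0.005
f_stat = 71.03       # F from the ANOVA table

t_stat = b1 / s_b1
print(f"t = {t_stat:.2f}, reject H0: {abs(t_stat) > t_critical}")
# In simple regression, t^2 equals F (up to rounding in the output).
print(f"t^2 = {t_stat ** 2:.2f}, F = {f_stat}")
```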
