GSB420 - Business Statistics

Sunday, November 6, 2011

The New GSB420 Blog

After several months, even years, of inactivity, I've decided to revive and revise this blog and restructure it in a logical sequence, rather than continue with the reverse chronological order which is the standard format for blogs. My goal is to make it easier for readers to find the material that they're looking for.

Therefore, this will be the top-most entry in the blog and it will contain a table of contents with links to the articles in the logical order in which you will probably want to learn them - starting from basic principles and working towards more complex concepts. It will take me some time to complete this restructuring, so please be patient.

I've also decided to update the individual entries in the blog and remove references that are specific to the class when I took it in 2008. Those comments are no longer relevant.

I hope you enjoy this blog and benefit from it. If you have any questions, please feel free to email me at eliezerappleton at gmail dot com.

Lecture 1
Basic Definitions
Presentational Statistics
Descriptive Statistics
Quiz #1
Lecture 1 - Additional Notes and Research

Lecture 2
Using Standard Deviation
Shape of the Distribution
Correlating Two Sets of Data
More on Covariance and Correlation Coefficient
Sample vs. Population
Basic Probability
Conditional Probability
Independency
Pop quiz #2
Post-lecture 2 notes and research

Lecture 3
Bayes's Theorem
Counting Rules
Discrete Random Variables
Binomial Distribution
Poisson Distribution

Lecture 4
Continuous Random Variables
The Normal Distribution
The Standard Normal Distribution

Lecture 5
The Standard Normal Distribution (continued)
Checking for Normality
Sampling Distributions
Confidence Interval Estimation

Lecture 6
Confidence Interval for the Mean with Known Std Dev
Confidence Interval for the Mean with Unknown Std Dev
Confidence Interval for the Mean - Examples and Minitab
Determining Sample Size
Hypothesis Testing
One-Tail Hypothesis Testing

Lecture 7 - Linear Regression
Simple Linear Regression
The Least Squares Method
Assumptions in the Method of Least Squares
Coefficient of Determination

Lecture 8 - Residual Analysis
Definition of Residual Analysis
Checking Linearity
Checking the Normality Assumption
Checking the Equal Variance Assumption
Checking Independence of Errors
Inferences About the Regression Slope - Part 1
Inferences About the Regression Slope - Part 2
Confidence Interval for Ŷ

Friday, March 28, 2008

ECO 509 - Spring Quarter 2008

Next quarter (Spring 2008) I'll be taking ECO 509 - Business Conditions Analysis (aka Macroeconomics) with Professor Jaejoon Woo (Wed night section).

You can find the blog for ECO 509 at http://eco509.blogspot.com.

Friday, March 14, 2008

Final Exam Recap

Well, it's finally over! Here's a recap of my random thoughts on the final exam:

In general, the final was harder than I expected. I'm pretty sure I did well, but it was definitely harder than the midterm and harder than I thought it would be.
He hit us with 10 straight questions from Chapter 12 right out of the block! I expected to ease into it. The previous quarter's final was pretty linear - starting at chapter 7, then chapter 8, etc and not hitting chapter 12 until the last questions. I knew chapter 12 would be a big chunk of the exam, but I didn't expect him to lead off with it.
I think we were all pretty stumped by that one question (was it 9 or 10?) that had us calculate the intercept, b₀. I kept coming up with 100 for an answer, but it wasn't one of the choices. I saw Azzam go up and ask about something and figured it might be that. So I went up and asked also. I think we all breathed a sigh of relief when he made the change to the last two choices. BTW, I think he could have just changed Σ Y to be 100 and that would have made the intercept 40, which was one of the choices.
Probability of Type I errors? Sheesh! I didn't see that coming. The answer is that it's alpha, α.
I was surprised at the "regular" question that had us work out the regression from the raw data. I'm almost positive I made some arithmetic mistakes in calculating the Σ(X_i-Xbar)(Y_i-Ybar) or Σ(X_i-Xbar)² or one of the other calculations.
My answer for the "west nile" question was that we do not reject the null hypothesis, H₀ that the average # of cases is different than 3.
On the last question, part A, you had to assume (or somehow know through ESP or divine vision) that the level of confidence to use is 95%. You can calculate the t-score easily enough (I think I got something like 2.414), but to draw any conclusions, you need the critical value t_{α/2, n-1}, which requires the level of confidence.
I knew there would be a Durbin-Watson question on the exam! My answer for that one was that there was no evidence of autocorrelation since the DW stat was greater than d_U and less than 2+d_L. I'm not sure I was using the right d_U because I wasn't sure if I should use α or α/2 on the DW table. I used α (which I think was 0.05).
As predicted, there were a few questions with Minitab output. No big surprises there.
In at least 2 of the questions (one multiple choice and one "regular"), he gave us the variance rather than the standard deviation. Tricky! I almost fell for that one.
One question asked us to determine sample size, given a confidence level, margin of error and standard deviation (or maybe variance). Using the formula, you calculate n=74.3 (something like that - not a round number). You had to know to round up, not truncate the decimal.
Higher confidence levels need wider intervals. You had to figure he would ask about that.
If p is low, H₀ must go! You could use that to answer one of the multiple choice questions on the p-value approach in hypothesis testing.
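The round-up rule for sample size is easy to sketch in Python. This assumes the usual formula n = (zσ/E)². The inputs (95% confidence, σ = 22, margin of error 5) are made up for illustration and are not the exam's actual numbers; they were picked only because they land near the n = 74.3 mentioned above.

```python
import math
from statistics import NormalDist

def required_sample_size(confidence, sigma, margin_of_error):
    """Return (exact n, rounded-up n) so that z * sigma / sqrt(n) <= margin_of_error."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical z value
    n_exact = (z * sigma / margin_of_error) ** 2
    return n_exact, math.ceil(n_exact)        # always round UP, never truncate

# Hypothetical inputs, not the exam's actual values
n_exact, n = required_sample_size(0.95, sigma=22, margin_of_error=5)
print(round(n_exact, 1), n)
```

Truncating would leave the margin of error slightly larger than requested, which is why the answer is the next whole number up.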

All in all, not a terrible test. Just harder than last quarter's final, IMHO. I'm anxious to see how I did. I think he said he may have them graded by Monday. The multiple choice is easy to grade. I think he gives the "regular" questions to a Teaching Assistant. HW5 grades have not been posted to Blackboard yet.

Thursday, March 13, 2008

Final Exam Study Guide - Last Minute Notes

Just a couple of last minute thoughts:

There were no example problems that used the Durbin-Watson statistic. That doesn't mean it won't be on the exam! The D-W table is part of the formula sheet, so I'm expecting a question on it.
Remember that if DW is less than d_L, there's autocorrelation. If it's between d_L and d_U, it's inconclusive. If it's between d_U and 2+d_L, there's no autocorrelation. I doubt we'll be asked about the range from 2-4.
Review how to read those Minitab outputs! There's bound to be at least one on the exam. Remember that in Minitab output, SS stands for Sum of Squares. S stands for S_YX. The Coefficient of the Intercept is b₀. The Coefficient of the other (independent) variable is b₁. SE stands for standard error.
There weren't any practice problems on the confidence interval for mean/individual Y. I would expect one of those since the formulas are on the sheet. He'll probably give us h_i and S_XY. Remember to use n-2 when looking up the value for t in this case.
Remember that most of the answers can be derived from the data in the question and the formula sheet. You're not really expected to memorize very much. If you can't figure it out, look at the formula sheet.
Remember to bring a copy of the formula sheet, a calculator and a #2 pencil. Don't laugh! I forgot a pencil for the midterm and ran out to Walgreen's a half hour before the exam.
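The Durbin-Watson decision rule from the notes above can be sketched as a small function. It only covers the lower part of the scale (the test for positive autocorrelation), since the notes skip the upper range. The d_L and d_U values below are illustrative placeholders; the real ones come from the D-W table for your n and α.

```python
def durbin_watson_verdict(dw, d_lower, d_upper):
    """Classify a Durbin-Watson statistic for the positive-autocorrelation test."""
    if dw < d_lower:
        return "autocorrelation"
    if dw <= d_upper:
        return "inconclusive"
    return "no evidence of positive autocorrelation"

# Illustrative bounds only; look up d_L and d_U in the D-W table
d_lower, d_upper = 1.08, 1.36
for dw in (0.95, 1.20, 1.85):
    print(dw, durbin_watson_verdict(dw, d_lower, d_upper))
```

Values well above 2 would need the mirrored bounds for negative autocorrelation, which this sketch deliberately leaves out.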

Wednesday, March 12, 2008

Final Exam Study Guide - Practice Questions - Part 2

In this post, I'll go over the answers to the "regular" questions from the last quarter's final. I'll also note which chapter the question is from.

Question 1 (Chapter 12): You would like to estimate the income of a person based on his age. The following data shows the yearly income (in $1,000) and age of a sample of seven individuals.
Income (in $1,000) Age
20                 18
24                 20
24                 23
25                 34
26                 24
27                 27
34                 27
a. Develop the least squares regression equation.
b. Estimate the yearly income of a 30-year-old individual.

Answer:
a. In order to calculate b₀ and b₁, we need to first calculate the mean of X (age) and Y (income). For xbar, I calculated 24.71 and for ybar, I got 25.71. To calculate b₁, we need to calculate x_i-xbar and y_i-ybar for each i:

Income  Age  x_i-xbar  y_i-ybar  (x_i-xbar)(y_i-ybar)  (x_i-xbar)²
20      18   -6.71     -5.71     38.31                 45.02
24      20   -4.71     -1.71      8.05                 22.18
24      23   -1.71     -1.71      2.92                  2.92
25      34    9.29     -0.71     -6.60                 86.30
26      24   -0.71      0.29     -0.21                  0.50
27      27    2.29      1.29      2.95                  5.24
34      27    2.29      8.29     18.98                  5.24

The sum of the (x_i-xbar)(y_i-ybar) is 64.4. The sum of the (x_i-xbar)² is 167.4. Therefore, b₁ is 64.4/167.4 = 0.38.
We can also calculate b₀ = ybar - b₁xbar = 25.71 - (0.3848)(24.71) = 16.2. (I carried the unrounded slope here; plugging in the rounded 0.38 would give 16.3.)
Therefore, the regression equation is y = 16.2 + 0.38x.

b. Use the equation to estimate y for x=30:
y = 16.2 + 0.38(30) = 27.6, which is $27,600 annual income.
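As a cross-check on the hand arithmetic, here is a minimal Python sketch of the same least squares calculation on the Question 1 data. The function name is my own; nothing here comes from Minitab or the formula sheet beyond the b₀ and b₁ formulas.

```python
# Data from Question 1: age is X, income (in $1,000) is Y
ages = [18, 20, 23, 34, 24, 27, 27]
incomes = [20, 24, 24, 25, 26, 27, 34]

def least_squares(xs, ys):
    """Return (b0, b1) for the simple linear regression of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = least_squares(ages, incomes)
predicted = b0 + b1 * 30   # estimated income (in $1,000) at age 30
print(f"b0 = {b0:.2f}, b1 = {b1:.3f}, income at age 30 = {predicted:.2f}")
```

Without intermediate rounding the estimate at age 30 comes out a little above the 27.6 from the rounded coefficients, which is the kind of small difference you'd expect on an exam.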

Question 2 (Chapter 12): Below you are given a partial computer output based on a sample of 8 observations, relating an independent variable (x) and a dependent variable (y).
              Coefficient Standard Error
Intercept     13.251      10.77
X             0.803       0.385

Analysis of Variance
SOURCE            SS
Regression
Error (Residual)  41.674
Total             71.875
a. Develop the estimated regression line.
b. At α = 0.05, test for the significance of the slope.
c. Determine the coefficient of determination (R²).

Answer:
a. This one's a lot easier than #1. No calculations necessary, just the ability to pull b₀ and b₁ out of the computer output. They're the coefficients of the intercept and X. So the regression equation becomes:
y = 13.251 + 0.803x

b. The t score for the slope is t = b₁/s_b₁.
From part a, we know that b₁ = 0.803.
s_b₁ is given in the computer output as the standard error of x = 0.385.
Therefore, t = 0.803/0.385 = 2.086.
Looking at the t distribution table for n-2=6 and α/2=0.025, we find a critical t value of 2.447. Since the t score of 2.086 is less than 2.447, we do not reject the null hypothesis that there is no linear relationship.

c. r² = SSR/SST. But SSR was conveniently removed from the computer output. We need to calculate it from SSR = SST-SSE = 71.875-41.674 = 30.201.
Therefore, r² = 30.201/71.875 = 0.42.
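The Question 2 calculations are short enough to check in a few lines of Python. The critical value is copied from the t table as in the answer above rather than computed, and the variable names are just my labels for the numbers in the output.

```python
b1 = 0.803          # slope coefficient from the output
se_b1 = 0.385       # standard error of the slope
sse = 41.674        # Error (Residual) sum of squares
sst = 71.875        # Total sum of squares
t_critical = 2.447  # table value for 6 degrees of freedom, alpha/2 = 0.025

t_stat = b1 / se_b1
ssr = sst - sse                        # the missing Regression SS
r_squared = ssr / sst
reject_h0 = abs(t_stat) > t_critical   # two-tailed test on the slope

print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
print(f"SSR = {ssr:.3f}, r^2 = {r_squared:.2f}")
```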

Question 3 (Chapter 9): A sample of 81 account balances of a credit company showed an average balance of $1,200 with a standard deviation of $126.
a. Formulate the hypotheses that can be used to determine whether the mean of all account balances is significantly different from $1,150.
b. Let α = .05. Using the critical value approach what is your conclusion?

Answer:
a. Since we want to know if the mean is "significantly different" from $1,150, the null hypothesis is that it is $1,150.
H₀: μ = 1150
H₁: μ ≠ 1150

b. Since we don't have the population standard deviation, use the t test statistic.
t = (xbar-μ₀)/(s/√n)
= (1200-1150)/(126/√81)
= 50/14
= 3.57
The critical value for t for 80 degrees of freedom and α/2=0.025 is 1.990.
Since the t-value=3.57 is greater than the critical value of 1.990, we reject H₀ and conclude that the mean is significantly different from $1,150.
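The critical value approach in part b can be sketched in Python like this. The helper name is made up, and the critical value is the table number quoted above, not something the code derives.

```python
import math

def one_sample_t(x_bar, mu_0, s, n):
    """t test statistic for a sample mean when sigma is unknown."""
    return (x_bar - mu_0) / (s / math.sqrt(n))

t_stat = one_sample_t(x_bar=1200, mu_0=1150, s=126, n=81)
t_critical = 1.990   # table value for 80 degrees of freedom, alpha/2 = 0.025

# Two-tailed test: reject when the statistic falls in either tail
reject_h0 = abs(t_stat) > t_critical
print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```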

Question 4 (Chapter 8): A statistician selected a sample of 16 accounts receivable and determined the mean of the sample to be $5,000 with a sample standard deviation of $400. He reported that the sample information indicated the mean of the population ranges from $4,739.80 to $5,260.20. He neglected to report what confidence level (1-α) he had used. Based on the above information, determine the confidence level that was used.

Answer: The statistician is reporting a confidence interval of 5000 ± 260.20. He only mentions the sample standard deviation (not the population std dev), so he must be using the t-distribution and the formula: xbar ± t_{n-1, α/2}(s/√n).

So we have:
260.2 = t(s/√n)
260.2 = t (400/√16)
260.2 = 100t
t = 2.602

We look to the t distribution table and find that t_{15, α/2} = 2.602 is true for α/2 = 0.01. So α = 0.02 and the confidence level is 1-0.02 = 0.98 = 98%.
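Working backwards from the interval to the confidence level can be sketched as a table lookup in Python. The small dictionary below is my own transcription of the 15 degrees of freedom row of a standard t table (upper-tail areas), so treat it as an assumption and check it against your own table.

```python
import math

half_width = (5260.20 - 4739.80) / 2      # margin of error
s, n = 400, 16
t_value = half_width / (s / math.sqrt(n))

# Upper-tail areas and critical values for 15 degrees of freedom,
# copied from a standard t table
t_table_df15 = {0.10: 1.341, 0.05: 1.753, 0.025: 2.131, 0.01: 2.602, 0.005: 2.947}

# Pick the tail area whose critical value is closest to our t
tail_area = min(t_table_df15, key=lambda a: abs(t_table_df15[a] - t_value))
confidence = 1 - 2 * tail_area
print(f"t = {t_value:.3f}, alpha/2 = {tail_area}, confidence = {confidence:.0%}")
```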

Question 5 (Chapter 12): The director of graduate studies at a college of business would like to predict the grade point index (GPI) of students in an MBA program based on their GMAT scores. A sample of 20 students is selected. The result of the regression is summarized in the following Minitab output.
Regression Analysis: GPI versus GMAT

The regression equation is
GPI = 0.300 + 0.00487 GMAT

Predictor         Coef         SE Coef         T
Constant        0.3003          0.3616      0.83
GMAT         0.0048702           [ N ]     [ M ]

S = 0.155870 R-Sq = 79.8%

Analysis of Variance

Source             DF         SS         MS         F         P
Regression          1     1.7257     1.7257     71.03     0.000
Residual Error     18     0.4373     0.0243
Total              19     2.1631
a) Given that Σ(X_i-xbar)² = 72757.2 , where X = GMAT, compute N.
b) Compute M and interpret the result. In particular do we reject the underlying hypothesis (which hypothesis) or not?

Answer:
a. N is what we usually call the standard error of the slope, s_b₁. (This is the hardest part of the problem - figuring out what's missing in the Minitab output.) From the formula sheet, we know:
s_b₁ = S_XY/√SSX

We're given SSX, but we need to calculate S_XY from the formula:
S_XY = √(SSE/(n-2)).

We have SSE from the output: SSE = 0.4373. So,
S_XY = √(0.4373/18) = 0.156

Therefore,
s_b₁ = 0.156/√72757.2 = 0.156/269.7 = 0.00058

b. M is the t-score for the slope which is given by:
t = b₁/s_b₁
= 0.0048702/0.00058
= 8.4

The critical value for t for 18 degrees of freedom and α/2=0.005 is 2.878. Therefore, since our t-score is greater than the critical t-value, we would reject the null hypothesis, H₀: β₁=0, that there is no linear relationship between GMAT and GPI.
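Here is a short Python sketch of how the two missing Minitab values can be rebuilt from the rest of the output. The variable names are mine; the formulas are the ones used in the answer above.

```python
import math

sse = 0.4373          # Residual Error SS from the Minitab output
n = 20                # sample size
ssx = 72757.2         # sum of (X_i - xbar)^2, given in the question
b1 = 0.0048702        # GMAT coefficient

s_xy = math.sqrt(sse / (n - 2))      # standard error of the estimate
s_b1 = s_xy / math.sqrt(ssx)         # N: standard error of the slope
t_stat = b1 / s_b1                   # M: t-score for the slope

print(f"S_XY = {s_xy:.4f}, N = {s_b1:.5f}, M = {t_stat:.2f}")
```

Carrying full precision puts M a bit above the 8.4 from the rounded hand calculation, which lines up with the square root of the F value (71.03) in the same output.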

Monday, March 10, 2008

Final Exam Study Guide - Practice Questions

Question 1: A population has a standard deviation of 16. If a sample of size 64 is selected from this population, what is the probability that the sample mean will be within ±2 of the population mean?
a. 0.6826
b. 0.3413
c. -0.6826
d. Since the mean is not given, there is no answer to this question.

Answer:
We need to calculate the z-score for the ±2 interval. In order to do that, we need the standard error of the mean, σ/√n = 16/sqrt(64) = 2.
So when we're asked for the probability that the sample mean is ±2 from the population mean, it's asking for the probability of the mean being within 1 standard error. Even without looking it up in the table, we know that the answer must be A - both from our experience that 68% of the data fall within 1 std dev, and because the other answers are unreasonable.
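If you want to confirm the 68% figure without the z table, here is a small sketch using Python's standard library normal distribution. It assumes the sample mean is approximately normal, which is reasonable here with n = 64.

```python
import math
from statistics import NormalDist

sigma, n = 16, 64
standard_error = sigma / math.sqrt(n)     # 16 / 8 = 2
z = 2 / standard_error                    # +/-2 expressed in standard errors

std_normal = NormalDist()                 # mean 0, standard deviation 1
prob_within = std_normal.cdf(z) - std_normal.cdf(-z)
print(f"SE = {standard_error}, z = {z}, P = {prob_within:.4f}")
```

The result differs from choice A only in the last digit, because the table value 0.3413 is itself rounded before doubling.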

Question 2: The fact that the sampling distribution of sample means can be approximated by a normal probability distribution whenever the sample size is large is based on the
a. central limit theorem
b. fact that we have tables of areas for the normal distribution
c. assumption that the population has a normal distribution
d. None of these alternatives is correct.

Answer: There's not much to say here. The statement is essentially the definition of the Central Limit Theorem (see page 213), so answer A is correct. As a rule of thumb, a sample size of about 30 or more is enough for this to hold for most population distributions.

Question 3: A population has a mean of 53 and a standard deviation of 21. A sample of 49 observations will be taken. The probability that the sample mean will be greater than 57.95 is
a. 0
b. .0495
c. .4505
d. .9505

Answer: Find the z-score of this mean: (57.95-53)/(21/sqrt(49)) = 4.95/3 = 1.65. So the question becomes: What's the probability of the sample mean being more than 1.65 standard errors above the population mean? You know it can't be much. It's greater than 0. Answer B is the only logical one. Of course, when we go to the cumulative normal distribution table, we find that 1.65 has 0.9505 area, so the area to the right of 1.65 is 0.0495.
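The same upper-tail lookup can be sketched in Python instead of using the cumulative normal table. This is just a check on the table reading, with variable names of my own choosing.

```python
import math
from statistics import NormalDist

mu, sigma, n = 53, 21, 49
x_bar = 57.95

z = (x_bar - mu) / (sigma / math.sqrt(n))
upper_tail = 1 - NormalDist().cdf(z)      # area to the right of z
print(f"z = {z:.2f}, P(sample mean > {x_bar}) = {upper_tail:.4f}")
```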

Question 4: Suppose a sample of n = 50 items is drawn from a population of manufactured products and the weight, X, of each item is recorded. Prior experience has shown that the weight has a probability distribution with mu = 6 ounces and sigma = 2.5 ounces. Which of the following is true about the sampling distribution of the sample mean if a sample of size 50 is selected?
a) The mean of the sampling distribution is 6 ounces.
b) The standard deviation of the sampling distribution is 2.5 ounces.
c) The shape of the sample distribution is approximately normal.
d) All of the above are correct.

Answer:
A is true. A single sample's mean is not necessarily equal to the population mean, but the mean of the sampling distribution (over all possible samples) equals the population mean of 6 ounces.
B is not true. The standard deviation of the sampling distribution (the standard error) is the population standard deviation divided by √n, which here is 2.5/√50 ≈ 0.35 ounces, not 2.5 ounces.
C is not true as worded. The central limit theorem tells us that when the sample size is ≥30, the sampling distribution of the sample mean is approximately normal. However, the shape of the sample distribution itself (the 50 recorded weights) is not necessarily normal.
D is clearly not true since B and C are not true. Answer A is correct.

Question 5: The owner of a fish market has an assistant who has determined that the weights of catfish are normally distributed, with mean of 3.2 pounds and standard deviation of 0.8 pound. If a sample of 25 fish yields a mean of 3.6 pounds, what is the Z-score for this observation?
a) 18.750
b) 2.500
c) 1.875
d) 0.750

Answer:
When evaluating the sample mean,
z = (xbar-μ)/(σ/√n) Note: This formula is not on the sheet.
= (3.6-3.2)/(0.8/√25)
= 0.4/0.16
= 2.5
So, answer B is correct.

Question 6: A 95% confidence interval for a population mean is determined to be 100 to 120. If the confidence coefficient is reduced to 0.90, the interval for mu
a. becomes narrower
b. becomes wider
c. does not change
d. becomes 0.1

Answer: No calculations are necessary here. It's completely conceptual. The general rule is: A higher level of confidence requires a wider confidence interval. Therefore, if we reduce the level of confidence to 90%, the confidence interval can be narrower. Answer A is the correct answer.

Exhibit 8-3
The manager of a grocery store has taken a random sample of 100 customers. The average length of time it took these 100 customers to check out was 3.0 minutes. It is known that the standard deviation of the population of checkout times is 1 minute.

Question 7: Refer to Exhibit 8-3. The standard error of the mean equals
a. 0.001
b. 0.010
c. 0.100
d. 1.000

Answer: The standard error of the mean is:
σ/√n = 1/√100 = 1/10 = 0.1
The correct answer is C.

Question 8: Refer to Exhibit 8-3. With a .95 probability, the sample mean will provide a margin of error of
a. 1.96
b. 0.10
c. 0.196
d. 1.64

Answer: The margin of error is the plus/minus term in the confidence interval. In this case, since we know the population standard deviation, the margin of error term is:
z_{α/2}(σ/√n)
From the z-table, we find that z_0.025 = 1.96
Therefore,
margin of error, E = 1.96(1/√100) = 0.196
Answer C is correct.
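The margin of error for Exhibit 8-3 can be sketched in Python as below. I've also run it at 90% to illustrate the point from Question 6 that a lower confidence level gives a narrower interval. The function name is hypothetical, and the sketch assumes a known population standard deviation (the z case).

```python
import math
from statistics import NormalDist

def margin_of_error(confidence, sigma, n):
    """Half-width of a z-based confidence interval for the mean."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    return z * sigma / math.sqrt(n)

e_95 = margin_of_error(0.95, sigma=1, n=100)
e_90 = margin_of_error(0.90, sigma=1, n=100)
print(f"95%: {e_95:.3f}, 90%: {e_90:.3f}")
```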

Question 12: When the following hypotheses are being tested at a level of significance of α
H₀: μ ≥ 100 H_a: μ < 100
the null hypothesis will be rejected if the p-value is
a. < α
b. > α
c. > α/2
d. < α/2

Answer: First, we notice that this is a one-tailed hypothesis test. The rejection region is entirely to one side of the mean.
Our general rule is If p is low, H₀ must go. So, if p is less than α, we reject the null hypothesis. Answer A is correct.

Question 13: In order to test the following hypotheses at an α level of significance
H₀: μ ≤ 100 H_a: μ > 100
the null hypothesis will be rejected if the test statistic Z is
a. > Z_α
b. < Z_α
c. < -Z_α
d. > Z_α/2

Answer: We've got a one-tailed hypothesis again. This time, the rejection region is in the right-hand tail. Therefore, we reject H₀ if the test statistic is more extreme (i.e. further to the right) than the Z_α. So answer A is correct.

Question 14: Your investment executive claims that the average yearly rate of return on the stocks she recommends is more than 10.0%. She takes a sample to prove her claim. The correct set of hypotheses is
a. H₀: μ = 10.0% H_a: μ ≠ 10.0%
b. H₀: μ ≤ 10.0% H_a: μ > 10.0%
c. H₀: μ ≥ 10.0% H_a: μ < 10.0%

Answer: I don't really like this question because it sounds like she's making a claim based on a status quo of the return rate being > 10%. Since the null hypothesis is about the status quo, I'm tempted to pick answer C. Unfortunately, that's not the right way to look at it in this case.

Rather, since her claim is that the return is greater than 10%, which does not contain an equal sign, that must be the alternative hypothesis, H_a. Therefore, the null hypothesis, H₀, is μ ≤ 10%. Answer B is correct.

Question 15: A soft drink filling machine, when in perfect adjustment, fills the bottles with 12 ounces of soft drink. Any over filling or under filling results in the shutdown and readjustment of the machine. To determine whether or not the machine is properly adjusted, the correct set of hypotheses is
a. H₀: μ > 12 H_a: μ ≤ 12
b. H₀: μ ≤ 12 H_a: μ > 12
c. H₀: μ = 12 H_a: μ ≠ 12

Answer: This one's a gimme. The null hypothesis H₀ is that the machine is continuing to work properly and μ = 12. The alternative hypothesis, H_a is that it is filling with some other mean volume and μ ≠ 12. Correct answer is C.

Question 16: A two-tailed test is performed at 95% confidence. The p-value is determined to be 0.11.
The null hypothesis
a. must be rejected
b. should not be rejected
c. could be rejected, depending on the sample size
d. has been designed incorrectly

Answer: Since the level of significance is 5%, the combined area of the two-tailed rejection region is 0.05, i.e., 0.025 in each tail. The p-value is 0.11. We remember our mantra: If p is low, H₀ must go! But p is not lower than 0.05. Therefore, we do not reject H₀ and answer B is correct.

Question 17: For a one-tailed hypothesis test (upper tail) the p-value is computed to be 0.034. If the test is being conducted at 95% confidence, the null hypothesis
a. could be rejected or not rejected depending on the sample size
b. could be rejected or not rejected depending on the value of the mean of the sample
c. is not rejected
d. is rejected

Answer: Level of significance is 5% = 0.05. p is 0.034. Repeat after me: If p is low, H₀ must go! In this case, yes, p is lower than the level of significance and therefore H₀ is rejected. Answer D is correct.

Note: If this had been a two-tailed test, then the 0.05 rejection region would have been split between the two tails, each having 0.025. In that case, it's not clear whether p = 0.034 is lower than 0.025 unless we know whether p was calculated on one side (as we did in class) or on both sides (as is done in the textbook). I asked Prof. Selcuk about this in an email and he replied that he would avoid such ambiguous cases on the final exam.
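The "if p is low, H₀ must go" rule from Questions 16 and 17 boils down to one comparison. Here is a minimal sketch; it assumes the p-value has already been computed for the right number of tails, which is exactly the ambiguity the note above is about.

```python
def reject_null(p_value, confidence=0.95):
    """Apply the p-value rule: reject H0 when p is below alpha."""
    alpha = 1 - confidence
    return p_value < alpha

print(reject_null(0.11))    # Question 16
print(reject_null(0.034))   # Question 17
```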

Exhibit 9-1
n = 36
xbar = 24.6
S = 12
H₀: μ ≤ 20
H_a: μ > 20

Question 18: Refer to Exhibit 9-1. The test statistic (t-score of xbar) is
a. 2.3
b. 0.38
c. -2.3
d. -0.38

Answer: The formula (on the formula sheet) for the t test statistic is:
t = (xbar - μ₀)/(s/√n)
= (24.6-20)/(12/√36)
= 4.6/2 = 2.3
A is the correct answer.

Question 19: Refer to Exhibit 9-1. If the test is done at 95% confidence, the null hypothesis should
a. not be rejected
b. be rejected
c. Not enough information is given to answer this question.
d. None of these alternatives is correct.

Answer: This question looks tricky because it doesn't say outright whether it's a one-tail or two-tail test. The hypotheses in Exhibit 9-1 (H_a: μ > 20) point to a one-tail test, i.e. the entire rejection region is in the upper tail. Refer to the t distribution table and look up the t value for 35 degrees of freedom and a 0.05 area in the tail. We find that t value to be approximately 1.69. Our t test statistic is 2.3 which is greater than 1.69, indicating that we should reject the null hypothesis, H₀.

Just to be sure, let's assume that it's a two-tail test, so the rejection region is only 0.025 on each side. Referring to the t distribution table again, we find the t value for 35 degrees of freedom and a 0.025 area is approximately 2.03. Again, our t test statistic is more extreme than the critical t value. Therefore, reject the null hypothesis, H₀.

Answer B is correct.

Question 20: In regression analysis if the dependent variable is measured in dollars, the independent variable
a. must also be in dollars
b. must be in some units of currency
c. can be any units
d. can not be in dollars

Answer: This is entirely conceptual. The units of the dependent and independent variables are entirely independent of each other. Think of the site.mtw example that we were using extensively in class. The dependent variable was store sales (measured in dollars) and the independent variable was the size of the store (measured in square feet). The correct answer is C - the independent variable can be in any units.

Question 21: In a regression analysis, if SST=4500 and SSE=1575, then the coefficient of determination (R²) is
a. 0.35
b. 0.65
c. 2.85
d. 0.45

Answer: Since SST=SSE+SSR, SSR=4500-1575=2925. And R²=SSR/SST=2925/4500=0.65. Therefore, answer B is correct.

Question 22: Regression analysis was applied between sales (Y in $1,000) and advertising (X in $100), and the following estimated regression equation was obtained.
Y-hat = 80 + 6.2 X
Based on the above estimated regression line, if advertising is $10,000, then the point estimate for sales (in dollars) is
a. $62,080
b. $142,000
c. $700
d. $700,000

Answer: When a question is this easy, you know there's some sort of trick. Watch your units!! Since X is in hundreds of dollars, plug in 100 in the regression equation. Y = 80 + 6.2(100) = 700. Y is in thousands of dollars. Therefore, the point estimate for sales in dollars is $700,000 - answer D.
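The unit conversions are the whole trick in Question 22, so here is a tiny Python sketch that makes them explicit. The function name is made up for illustration.

```python
def predicted_sales_dollars(advertising_dollars):
    """Point estimate of sales in dollars from Y-hat = 80 + 6.2 X."""
    x = advertising_dollars / 100        # X is measured in $100
    y_hat = 80 + 6.2 * x                 # Y-hat is measured in $1,000
    return y_hat * 1000                  # convert back to dollars

sales = predicted_sales_dollars(10_000)
print(sales)
```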

Question 23: If the coefficient of correlation is a positive value, then
a. the intercept must also be positive
b. the coefficient of determination (R²) can be either negative or positive, depending on the value of the slope
c. the regression equation could have either a positive or a negative slope
d. the slope of the line must be positive

Answer: We learned about the coefficient of correlation way back in Chapter 3. It's a measure of the strength of the linear relationship between x and y. Its values range from -1 to 1. Values close to -1 or 1 indicate a strong linear relationship, either negative or positive.

Answer A is incorrect because the coefficient of correlation tells us nothing about the intercept.
Answer B is incorrect because the coefficient of determination (r²) can never be negative. r² = SSR/SST and neither SSR nor SST can be negative (since they're both sums of squares), so r² can't be negative either.
Answer C is incorrect because a positive coefficient of correlation indicates a positive relationship which would be modeled with a positive slope.
Answer D is correct.

Exhibit 14-10
The following information regarding a dependent variable Y and an independent variable X is provided.
∑ X = 16    ∑ (x-xbar)(y-ybar) = -8
∑ Y = 28    ∑ (x-xbar)² = 8
n = 4

Question 24: Refer to Exhibit 14-10. The slope of the regression function is
a. -1
b. 1.0
c. 11
d. 0.0

Answer: On the formula sheet we have the formula for the regression slope, b₁:
b₁ = ∑ (x-xbar)(y-ybar) / ∑ (x-xbar)² = -8/8 = -1.
So answer A is correct.

Question 25: Refer to Exhibit 14-10. The intercept of the regression line is
a. -1
b. 1.0
c. 11
d. 0.0

Answer: Again, the formula sheet gives us the computation for the intercept, b₀:
b₀ = ybar - b₁xbar = (28/4) - (-1)(16/4) = 7 + 4 = 11.
So answer C is correct.
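Questions 24 and 25 only need the summary sums from Exhibit 14-10, so the slope and intercept can be checked with a few lines of Python. The variable names are my own shorthand for the sums in the exhibit.

```python
sum_x, sum_y, n = 16, 28, 4
sxy = -8      # sum of (x - xbar)(y - ybar)
sxx = 8       # sum of (x - xbar)^2

b1 = sxy / sxx                          # slope
b0 = sum_y / n - b1 * (sum_x / n)       # intercept = ybar - b1 * xbar
print(b1, b0)
```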

More answers to sample problems to come. (I'm kinda jumping around for now.)

Final Exam Study Guide - Analysis of Prior Exam Questions

Looking at last quarter's exam gives us some insight as to what to expect on our final. The most important piece is that it provides practice questions at the level we'll be expected to perform. It is extremely worthwhile to do these problems on your own and make sure you understand the answer.*

Another interesting insight that we gain from the sample exam is the distribution of questions. Here's what I came up with for the number of questions per chapter and the number of points associated with those questions:


          Mult    Short   Total
Chapter   Choice  Answer  Points
7         6       0       12
8         5       1       20
9         8       1       26
12        6       3       42

We'll probably have a few questions from Chapter 6 thrown in, but those will probably be relatively easy compared to the more advanced material. These numbers tell me one thing for sure: Chapter 12 is really important!

*For the record: If you noticed that I didn't stick around for the in-class review of the sample final on Thursday, it's not because I think I know all this stuff! Just the opposite. Almost all of this material is new to me and I wanted to work through all the questions on my own without having heard the answer already solved by someone else.
