Sunday, February 10, 2008

Lecture 5 - Ch 7 - Sampling Distributions

This chapter starts us down the road to inferential statistics, i.e. making inferences about the entire population when we only have observations from one sample or several samples.

Our first step here is the Central Limit Theorem. The Central Limit Theorem says that if we take many samples, each of a large size (usually n >= 30), then

  1. the distribution of the sample means is approximately normal
  2. the mean of the sample means will be equal to the mean of the population, and
  3. the standard deviation of the sample means will be sigma/sqrt(n), where sigma is the standard deviation of the population and n is the sample size.
The Central Limit Theorem holds for any population distribution, whatever its shape, as long as it has a finite mean and standard deviation.
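
To see the three claims above in action, here is a minimal simulation sketch. The choice of population (an exponential distribution with mean 1 and standard deviation 1, which is clearly not normal), the sample size of 36, the 5000 repetitions, and the helper name simulate_sample_means are all my own assumptions for illustration; none of them come from the textbook.

```python
import math
import random
import statistics


def simulate_sample_means(n, num_samples, seed=0):
    """Draw num_samples samples of size n from an exponential population
    (mean 1, standard deviation 1) and return the list of sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(num_samples):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


n = 36  # hypothetical sample size, chosen to be >= 30
means = simulate_sample_means(n, num_samples=5000)

mean_of_means = statistics.fmean(means)   # claim 2: should be close to 1
sd_of_means = statistics.stdev(means)     # claim 3: should be close to 1/sqrt(36)
predicted_sd = 1.0 / math.sqrt(n)

# Claim 1 (rough check): for a normal distribution, about 68% of values
# fall within one standard deviation of the mean.
within_one_sd = sum(
    abs(m - mean_of_means) <= sd_of_means for m in means
) / len(means)

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {predicted_sd:.3f})")
print(f"fraction within 1 sd: {within_one_sd:.3f}")
```

Try changing n to something small like 2 or 5 and the "within one sd" fraction and a histogram of the means should look noticeably less normal.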

Therefore, if you don't know anything about the shape of the population distribution, but you do know its mean and standard deviation, you can't say much about the probability of individual observations. You can, however, make inferences about the mean of a sample by assuming (by the CLT) that the sample mean follows the normal distribution with mean mu and standard deviation sigma/sqrt(n), written N(mu,sigma/sqrt(n)).
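
Here is a short worked sketch of that kind of inference. The numbers are made up for illustration (a population with mu = 100 and sigma = 15, a sample of n = 36, and the question "what is the probability that the sample mean exceeds 105?"); they are not from the homework. It uses Python's statistics.NormalDist, which needs Python 3.8 or later.

```python
import math
from statistics import NormalDist

# Hypothetical population parameters and sample size (assumptions for this example).
mu = 100.0
sigma = 15.0
n = 36

# By the CLT, the sample mean is approximately N(mu, sigma/sqrt(n)).
standard_error = sigma / math.sqrt(n)          # 15 / 6 = 2.5
sample_mean_dist = NormalDist(mu, standard_error)

# z-score of a sample mean of 105, and the probability of exceeding it.
x_bar = 105.0
z = (x_bar - mu) / standard_error              # (105 - 100) / 2.5 = 2.0
p_exceeds = 1.0 - sample_mean_dist.cdf(x_bar)  # about 0.0228

print(f"standard error: {standard_error}")
print(f"z-score:        {z}")
print(f"P(sample mean > {x_bar}): {p_exceeds:.4f}")
```

Note that the same question about a single observation, P(X > 105), can't be answered this way unless we know the population itself is normal.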

Here are two videos from Kent Murdick, instructor of mathematics at the University of South Alabama. In the first, he gives a concise, 2-minute overview of the Central Limit Theorem.


In the second video (4 mins), he works through a nice example problem.



Questions for research and thought:
1. What's the proof of the Central Limit Theorem? Who discovered it? It's not something that's intuitively obvious at all. See Wolfram's MathWorld, which mentions a six-line proof by Kallenberg but does not reproduce it. Their "elementary" proof is considerably longer and involves inverse Fourier transforms. Oh boy!
2. Prof. Murdick added another stipulation in his video - that the sample size, n, must be small (<5%) relative to the population size. There is also a note in the Wikipedia article (http://en.wikipedia.org/wiki/Central_limit_theorem#_note-nsize) which references a few scholarly articles that appear to indicate that n>=30 may not be sufficient for the sample size.

Note: We will not be covering section 7.3 - Sampling distribution of the proportion. Sections 7.4 and 7.5 - Types of survey sampling methods and Evaluating survey worthiness - were not mentioned.

2 comments:

lowkeydad said...

I feel like I'm missing something... perhaps it's because I'm tired right now... :) But, on some of these problems, where you are asked to find the probability of 3 separate possible #'s... i.e.: P(X < 1), P(1 < X < 3), P(X > 3)... shouldn't the final tallies equal 1? Either I'm doing something way off base here, or the values DON'T equal 1. Maybe I'm getting lost in the z-score twilight zone... There are a couple problems like this in the homework.

Eliezer said...

Yes, you're right. If the 3 cases cover the entire sample space (as in your example), the sum of the probabilities would equal 1. However, none of the homework problems have cases that cover the entire sample space. There's always either overlap between the cases or portions of the sample space that are not covered, or both.

Look at them again and let me know if you think a particular hw problem covers the sample space but the total probability doesn't equal 1.