Monday, February 4, 2008

Getting ready for the Central Limit Theorem

Since the material on sampling and the Central Limit Theorem (CLT) is more or less new to me, I started reading ahead over the weekend. After reading the lecture notes, our textbook and another text that I have (Understandable Statistics), I realize that the CLT and the whole discussion of the sampling distribution of the mean involve meta-statistics. That is, we're not examining the distribution of some random variable itself, but rather we're taking a bunch of samples of a random variable, computing the mean of each sample, and looking at the distribution of those statistics.

Every time I think of meta-analysis, I think of the term I first read in Douglas Hofstadter's Pulitzer Prize-winning magnum opus Gödel, Escher, Bach: An Eternal Golden Braid - JOOTSing. JOOTS stands for "Jump Out Of The System". When you "joots", you leave the system itself, take a step back and look at how the system works. (BTW, that book and Hofstadter's monthly Metamagical Themas columns opened up my way of thinking to new angles and loops. It's not light reading, but it is very enjoyable. If you have a chance, you should settle down with it and learn about how the works of a great mathematician, a unique artist and a genius composer all share a common thread.)

So here, we're taking a step back and instead of analyzing the observations themselves, we're analyzing the means of a bunch of samples and talking about what the mean of those means or the standard deviation of those means will be.

It also seems analogous to the basic idea of the first and second derivative in calculus. That is, the first derivative is the slope of the curve. The second derivative is the slope of the curve defined by the first derivative. There too, we're using the same method (differentiation) and applying it to the result of the first application to get to a new level.

So what does the Central Limit Theorem say? Well, to sum it up with 3 points:

  1. If you have a population with any distribution (as long as it has a finite mean and standard deviation), and you take samples of size n and look at the mean of each sample, the distribution of those sample means looks more and more like the normal distribution as n gets larger.
  2. The mean of the distribution of sample means equals the mean of the population.
  3. The standard deviation of the distribution of sample means equals the standard deviation of the population divided by the square root of n.
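
To convince myself of the three points above, here is a small simulation sketch in Python. Everything specific in it is my own assumption rather than something from the lecture notes: the choice of an exponential population (picked because it is clearly skewed and has mean 1 and standard deviation 1), the sample size of 30, the 10,000 repetitions, the seed, and the helper name `sample_means`.

```python
import random
import statistics

def sample_means(draw, n, num_samples, rng):
    """Draw num_samples samples of size n and return the mean of each one."""
    return [
        statistics.fmean([draw(rng) for _ in range(n)])
        for _ in range(num_samples)
    ]

rng = random.Random(42)  # fixed seed so the run is repeatable

# Assumed population: exponential with rate 1 (skewed, mean = 1, sd = 1).
pop_mean, pop_sd = 1.0, 1.0
n = 30

means = sample_means(lambda r: r.expovariate(1.0), n, 10_000, rng)

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
predicted_sd = pop_sd / n ** 0.5  # point 3: sigma / sqrt(n)

# Rough normality check: a normal distribution has about 68% of its
# values within one standard deviation of its mean.
within_one_sd = sum(
    abs(m - mean_of_means) <= sd_of_means for m in means
) / len(means)

print(f"mean of sample means: {mean_of_means:.3f} (population mean {pop_mean})")
print(f"sd of sample means:   {sd_of_means:.3f} (predicted {predicted_sd:.3f})")
print(f"share within 1 sd:    {within_one_sd:.3f} (normal: about 0.683)")
```

If the theorem holds up, the mean of the sample means should land very near 1, their standard deviation near 1 divided by the square root of 30 (about 0.183), and roughly 68% of them within one standard deviation of the center, even though the underlying population looks nothing like a bell curve.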
