Friday, February 8, 2008

Lecture 5 - Ch 6b - Standard Normal Distribution

This week we did four things:
1. Reviewed the midterm briefly
2. Completed chapter 6 - normal distribution
3. Chapter 7 - Central Limit Theorem
4. Started chapter 8 - Confidence intervals

Midterm Review
As I suspected, there were questions on short-answer question #3b. You can see my explanation in a previous post. Note that the answer in the answer key posted on Blackboard is slightly incorrect. Rather than 0.117143, the correct answer is 0.1171145.

End of Chapter 6 - Standard Normal Distribution Examples
We pretty much completed chapter 6 in Lecture 4.

The two key steps in dealing with normally distributed data are:
1. Compute the z-score
2. Read the table and process the data.

Computing the z-score of any given x should be simple enough: z = (x - mu)/sigma, where mu is the mean and sigma is the standard deviation. Note the parentheses: subtract the mean first, then divide by the standard deviation.
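
Here is a minimal Python sketch of that formula. The function name z_score and the sample numbers (mean 100, standard deviation 15) are made up for illustration; they are not from the lecture.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Subtract the mean first, then divide by the standard deviation.
    return (x - mu) / sigma


# Hypothetical data: mean 100, standard deviation 15.
print(z_score(130, 100, 15))  # 2.0
print(z_score(85, 100, 15))   # -1.0
```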

Reading the table is also straightforward once you know how. First, calculate the z-score to two decimal places. (If the z-score is negative, ignore the negative sign.) Then find the row for the z-score truncated after the first decimal place, and go across that row to the column for the second decimal digit. For example, for z = 1.23, use row 1.2 and column 0.03. I had never worked with these tables before and it wasn't explained in class, so maybe this is obvious to you but it wasn't to me.
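
If you don't have the table handy, the lookup can be imitated in Python. This is only a sketch: table_position and table_value are names I made up, and the value is computed with the standard library's math.erf rather than read from the textbook's table. I'm assuming the table is rounded to four decimal places, as most printed tables are.

```python
import math


def table_position(z):
    """Return the (row, column) you would use in a 0-to-z table."""
    hundredths = round(abs(z) * 100)   # ignore the sign, keep two decimals
    row = hundredths // 10 / 10        # z truncated after the first decimal
    col = hundredths % 10 / 100        # the second decimal digit
    return row, col


def table_value(z):
    """Area under the standard normal curve between 0 and |z|."""
    z = round(abs(z), 2)
    # Assumption: the printed table shows four decimal places.
    return round(0.5 * math.erf(z / math.sqrt(2)), 4)


print(table_position(1.23))  # (1.2, 0.03)
print(table_value(1.23))     # 0.3907
print(table_value(-1.23))    # 0.3907 (the sign is ignored)
```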

Solving single z-score problems
The table gives you the area under the curve from 0 to the z-score. Let's call that value tz. It represents the probability that a given value is between the mean (0) and z. But that's not always what you want. For a single z-score (call it z), there are 4 possible scenarios for which you might want to know the probability:
1. P(x<z) where z is greater than 0
2. P(x>z) where z is greater than 0
3. P(x<z) where z is less than 0
4. P(x>z) where z is less than 0

1. P(x<z) where z is greater than 0. You want to know the area under the curve to the left of z, i.e. from -infinity to z. Well, the green area from 0 to z is the value you just looked up in the table, call it tz. The red area to the left of 0 is 0.5. [Remember that the area under the entire curve is 1 and it's symmetric around 0, so the areas to the left and right of 0 are both 0.5.] Therefore, the entire area to the left of z is simply the sum of those two areas. P(x<z) = 0.5 + tz

2. P(x>z) where z is greater than 0. In this case you want to find the area under the curve from z to infinity. I.e. the right hand tail. Since the area from 0 to infinity is 0.5 and the area from 0 to z is tz, you can just subtract the two to get your answer. P(x>z) = 0.5 - tz

3. P(x<z) where z is less than 0. Now we're dealing with z values less than the mean. In this case you want to know the area under the curve to the left of z, i.e. from -infinity to z. You do this similarly to the way you did #2 above. The area from -infinity to 0 is 0.5 and from z to 0 is tz. Subtract the two to get the area under the left hand tail. P(x<z) = 0.5-tz

4. P(x>z) where z is less than 0. Here you need to find the area from z to +infinity. Well, you know the area from z to 0 is tz and the area from 0 out to +infinity is 0.5. So just add them up to get the full area. P(x>z) = 0.5 + tz.
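
The four cases above can be sketched in Python as follows. The function names are mine, and tz is computed with math.erf as a stand-in for the table lookup, so the results may differ from table-based answers in the last decimal place.

```python
import math


def tz(z):
    """Stand-in for the table: area between 0 and |z|."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))


def prob_less_than(z):
    """P(x < z): cases 1 and 3."""
    if z >= 0:
        return 0.5 + tz(z)   # case 1: left half plus the area from 0 to z
    return 0.5 - tz(z)       # case 3: left half minus the area from z to 0


def prob_greater_than(z):
    """P(x > z): cases 2 and 4."""
    if z >= 0:
        return 0.5 - tz(z)   # case 2: right half minus the area from 0 to z
    return 0.5 + tz(z)       # case 4: right half plus the area from z to 0


print(round(prob_less_than(1.0), 4))      # 0.8413
print(round(prob_greater_than(1.0), 4))   # 0.1587
print(round(prob_less_than(-1.0), 4))     # 0.1587
print(round(prob_greater_than(-1.0), 4))  # 0.8413
```

Notice that for any z the two probabilities add up to 1, which is a handy sanity check on your answer.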

Solving problems with two z-scores
There are 3 possible scenarios where you might be given two values and asked to compute the probability that a given value falls between the two. First, compute the z-scores for each value, z1 and z2.

Now you'll be in one of the following situations:
1. z1>0 and z2>0
2. z1<0 and z2<0
3. z1<0 and z2>0

1. & 2. Both of the first two scenarios can be solved in the same way. Find tz1 and tz2 from the table. Since you're interested in the area under the curve between the two z-scores, just calculate the difference between tz1 and tz2. That's your answer! (If the answer is negative, just ignore the negative sign.) P(z1<x<z2) = |tz1 - tz2|

3. If z1 and z2 are on opposite sides of the mean, you need to find the total area from z1 to 0 and from 0 to z2. All you need to do here is to find tz1 and tz2 from the table and add them up. P(z1<x<z2) = tz1+tz2
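
Here is a small Python sketch of the two-value cases. As before, the names are hypothetical and tz is computed with math.erf in place of the table lookup.

```python
import math


def tz(z):
    """Stand-in for the table: area between 0 and |z|."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))


def prob_between(z1, z2):
    """P(z1 < x < z2) using the 0-to-z table values."""
    if z1 * z2 >= 0:
        # Scenarios 1 and 2: same side of the mean, so take the
        # difference and ignore its sign.
        return abs(tz(z1) - tz(z2))
    # Scenario 3: opposite sides of the mean, so add the two areas.
    return tz(z1) + tz(z2)


print(round(prob_between(1.0, 2.0), 4))    # 0.1359
print(round(prob_between(-2.0, -1.0), 4))  # 0.1359
print(round(prob_between(-1.0, 1.0), 4))   # 0.6827
```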

I think these seven cases (4 with a single value and 3 with two values) cover just about every case that I've seen in examples. If you can think of an example that these don't cover, let me know.

Note: Make sure you understand how each one of these scenarios works. I find it easiest to draw out a graph (on paper or in my head) of what I'm trying to solve in order to figure out how to solve it. The hardest part of the problems is often figuring out which one of the scenarios you're dealing with and which formula to use. Once you've done that, it's just arithmetic.


lowkeydad said...

Intuitively I always want to go from left to right, which makes the negative z-scores problematic. This helped keep me straight. :) Thanks!

Eliezer said...

If you want to go from left to right, you can use the cumulative distribution function (CDF) to solve these problems. In fact, the textbook uses it in several of the examples. Use table e.2 instead of e.11 to get your values and you're set.

I blogged it this way because it's the way that was presented in class, but I agree that the left-to-right/CDF method is more user-friendly.
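
For anyone who wants to try the left-to-right approach, here is a rough Python sketch. It computes the CDF with math.erf instead of reading it from the textbook table, and the function names are made up.

```python
import math


def phi(z):
    """Standard normal CDF: area from -infinity up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))


def prob_less_than(z):
    return phi(z)


def prob_greater_than(z):
    return 1.0 - phi(z)


def prob_between(z1, z2):
    low, high = sorted((z1, z2))
    return phi(high) - phi(low)


print(round(prob_less_than(-1.0), 4))     # 0.1587
print(round(prob_greater_than(-1.0), 4))  # 0.8413
print(round(prob_between(-1.0, 1.0), 4))  # 0.6827
```

With the CDF, the sign of z takes care of itself, so there are no separate cases to remember.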