Tuesday, January 8, 2008

Lecture 1 - Ch 1 - Basic Definitions

Here are some basic definitions of terms used in the study of statistics:

When discussing statistics about a group, it's important to distinguish between the entire group and a subset of it that we can examine in detail. We analyze that subset and then extrapolate from the detailed analysis to the larger group. The entire group is called the population; the subset we analyze is called the sample.

For example, we may want to know the voting preferences of voters in the US. The population is everyone who is registered to vote in the upcoming election. Since we can't feasibly ask every one of them, we ask a sample, perhaps just a few hundred or a few thousand voters, about their preferences, and we extrapolate from the sample's answers to the entire population.
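To make the population/sample distinction concrete, here is a minimal Python sketch. It builds a synthetic population of 100,000 hypothetical voters (the 52% share for candidate "A" and the helper function names are made up for illustration), draws a simple random sample of 1,000 of them with `random.Random.sample`, and compares the share preferring "A" in each. Simple random sampling is assumed here for simplicity.

```python
import random

def build_population(size: int, share_for_a: float, seed: int = 0) -> list[str]:
    """Create a hypothetical population of voters, each preferring 'A' or 'B'."""
    rng = random.Random(seed)
    return ["A" if rng.random() < share_for_a else "B" for _ in range(size)]

def draw_sample(population: list[str], n: int, seed: int = 1) -> list[str]:
    """Draw a simple random sample of n voters, without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def share_preferring(group: list[str], choice: str) -> float:
    """Fraction of the group that prefers the given choice."""
    return group.count(choice) / len(group)

# Hypothetical population: roughly 52% prefer "A".
population = build_population(size=100_000, share_for_a=0.52)
# We only "interview" 1,000 of them.
sample = draw_sample(population, n=1_000)

print(f"Population share for A: {share_preferring(population, 'A'):.3f}")
print(f"Sample share for A:     {share_preferring(sample, 'A'):.3f}")
```

Changing the `seed` passed to `draw_sample` gives a slightly different sample share each time; that wobble is the uncertainty the extrapolation carries.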

Of course, this extrapolation involves a degree of uncertainty. Quantifying that uncertainty will be the subject of a later lecture.

Population - the entire collection under discussion
Sample - a portion of the population selected for analysis

Parameter - a numerical measure that describes a characteristic of a population.
Statistic - a numerical measure that describes a characteristic of a sample.
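A parameter and a statistic can be computed side by side. The sketch below assumes a hypothetical population of 10 household sizes: the mean of all 10 is a parameter, while the mean of a random sample of 4 households is a statistic that estimates it.

```python
import random
import statistics

# Hypothetical population: number of people in each of 10 households.
population = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6]

# Parameter: a numerical measure describing the whole population.
population_mean = statistics.mean(population)

# Statistic: the same kind of measure, computed only from a sample.
rng = random.Random(7)
sample = rng.sample(population, 4)
sample_mean = statistics.mean(sample)

print("Parameter (population mean):", population_mean)  # 3.3
print("Statistic (sample mean):", sample_mean)
```

In practice the parameter is usually unknown, which is why we compute the statistic in the first place.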

Descriptive statistics - focuses on collecting, summarizing, and presenting a set of data.
Inferential statistics - uses sample data to draw conclusions about a population.
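Descriptive statistics of this kind are straightforward to compute with Python's standard `statistics` module. A minimal sketch, using a hypothetical set of eight exam scores:

```python
import statistics

# Hypothetical exam scores for a class of eight students.
scores = [72, 85, 90, 85, 60, 78, 95, 85]

# Descriptive statistics: summarize this data set without generalizing beyond it.
summary = {
    "count": len(scores),
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "min": min(scores),
    "max": max(scores),
}

for name, value in summary.items():
    print(f"{name:>6}: {value}")
```

Inferential statistics would go one step further and treat these eight scores as a sample, using them to draw conclusions about a larger population of students.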

Categorical data - consist of responses that fall into categories, such as yes/no, agree/disagree, or Sunday/Monday/Tuesday/etc.
Numerical data - consist of numerical responses. Numerical data can be either discrete or continuous (see below).

Discrete data - consist of separate, countable values, usually integer responses such as 0, 1, 2, 3, etc.
Continuous data - consist of responses that can take any numerical value within a range, such as height or weight.
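As a rough illustration of these kinds of data, the sketch below labels each column of a small, made-up survey by its Python type. The `data_type` helper is hypothetical and only a heuristic: the type a value is stored in does not always match its statistical type (a ZIP code stored as an integer is still categorical).

```python
def data_type(values: list) -> str:
    """Hypothetical heuristic: label a column of values by its Python types."""
    # Text (and yes/no booleans) are treated as categorical responses.
    if all(isinstance(v, (str, bool)) for v in values):
        return "categorical"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "numerical (discrete)"
    # Any mix of ints and floats is treated as continuous.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "numerical (continuous)"
    return "mixed/unknown"

# Made-up survey responses, one list per question.
survey = {
    "agrees": ["yes", "no", "yes", "yes"],
    "day": ["Sunday", "Monday", "Tuesday", "Monday"],
    "children": [0, 2, 1, 3],
    "height_cm": [172.4, 165.0, 180.2, 158.7],
}

for column, values in survey.items():
    print(f"{column}: {data_type(values)}")
```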
