Introduction to Educational Statistics Joseph Stevens, Ph.D., University of Oregon (541) 346-2445, stevensj@uoregon.edu
WHAT IS STATISTICS? Statistics is a group of methods used to collect, analyze, present, and interpret data and to make decisions.
POPULATION VERSUS SAMPLE A population consists of all elements – individuals, items, or objects – whose characteristics are being studied. The population that is being studied is also called the target population.
POPULATION VERSUS SAMPLE cont. The portion of the population selected for study is referred to as a sample.
POPULATION VERSUS SAMPLE cont. A study that includes every member of the population is called a census. The technique of collecting information from a portion of the population is called sampling.
POPULATION VERSUS SAMPLE cont. A sample drawn in such a way that each element of the population has an equal chance of being selected is called a simple random sample.
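As a rough illustration of drawing a simple random sample, the sketch below uses Python's standard `random` module. The population of 500 student ID numbers, the sample size of 25, and the seed are all hypothetical values chosen for the example.

```python
import random

# Hypothetical population: ID numbers for 500 students (illustrative only).
population = list(range(1, 501))

rng = random.Random(42)  # fixed seed so the draw is repeatable

# Random.sample draws without replacement, giving every element
# of the population the same chance of being selected.
sample = rng.sample(population, k=25)

print(len(sample))  # 25
```

Studying all 500 IDs instead of a sample would correspond to a census.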
TYPES OF STATISTICS Descriptive Statistics consists of methods for organizing, displaying, and describing data by using tables, graphs, and summary measures.
TYPES OF STATISTICS Inferential Statistics consists of methods that use information from samples to make predictions, decisions or inferences about a population.
Basic Definitions A variable is a characteristic under study that assumes different values for different elements. A variable on which every element has exactly the same value is a constant.
Basic Definitions The value of a variable for an element is called an observation or measurement.
Basic Definitions A data set is a collection of observations on one or more variables. A distribution is a collection of observations or measurements on a particular variable.
TYPES OF VARIABLES Quantitative Variables: Discrete Variables, Continuous Variables. Qualitative or Categorical Variables.
Quantitative Variables cont. A variable whose values are countable is called a discrete variable. In other words, a discrete variable can assume only certain values with no intermediate values.
Quantitative Variables cont. A variable that can assume any numerical value over a certain interval or intervals is called a continuous variable.
Categorical Variables A variable that cannot assume a numerical value but can be classified into two or more categories is called a categorical variable.
Scales of Measurement How much information is contained in the numbers? Operational definitions and measurement procedures. Types of scales: Nominal, Ordinal, Interval, Ratio.
Descriptive Statistics Variables can be summarized and displayed using: tables; graphs and figures; statistical summaries (Measures of Central Tendency, Measures of Dispersion, Measures of Skew and Kurtosis).
Measures of Central Tendency Mode: the most frequent score in a distribution. Median: the score that divides the distribution into two groups of equal size. Mean: the center of gravity or balance point of the distribution.
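A small sketch of finding the mode, using `statistics.multimode` from the Python standard library (Python 3.8 or later). The five scores are hypothetical and chosen so that one value repeats.

```python
from statistics import multimode

# Hypothetical scores (illustrative only); 82 occurs most often.
scores = [82, 95, 67, 92, 82]

# multimode returns every value tied for the highest frequency,
# so it also handles distributions with more than one mode.
modes = multimode(scores)
print(modes)  # [82]
```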
Median The calculation of the median consists of the following two steps: (1) Rank the data set in increasing order. (2) Find the middle number in the data set such that half of the scores are above and half below. The value of this middle number is the median. When the number of scores is even, the median is the average of the two middle values.
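The two steps for the median can be sketched in Python as follows. The function name `median` and the example scores are chosen for illustration; for an even number of scores the sketch averages the two middle values, which is a common convention.

```python
def median(scores):
    """Return the median: the middle value of the ranked scores."""
    ranked = sorted(scores)          # step 1: rank in increasing order
    n = len(ranked)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2                     # step 2: locate the middle position
    if n % 2 == 1:                   # odd count: a single middle value
        return ranked[mid]
    # even count: average the two middle values (a common convention)
    return (ranked[mid - 1] + ranked[mid]) / 2

odd_median = median([67, 82, 95])        # 82
even_median = median([82, 95, 67, 92])   # (82 + 92) / 2 = 87.0
```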
Arithmetic Mean The mean is obtained by dividing the sum of all values by the number of values in the data set. Mean for sample data: $\bar{X} = \frac{\sum X}{N}$
Example: Calculation of the mean Four scores: 82, 95, 67, 92. Mean = (82 + 95 + 67 + 92) / 4 = 336 / 4 = 84.
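The same calculation as a short Python sketch. The helper name `mean` is chosen here for illustration; the scores are the four from the example.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)

scores = [82, 95, 67, 92]   # the four scores from the example
print(sum(scores))          # 336
print(mean(scores))         # 84.0
```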
The Mean is the Center of Gravity Figure: the four scores 95, 92, 82, and 67.
The Mean is the Center of Gravity Deviations from the mean, $X - \bar{X}$, with $\bar{X} = 84$: 82 - 84 = -2; 95 - 84 = +11; 67 - 84 = -17; 92 - 84 = +8. The deviations sum to zero: $\sum (X - \bar{X}) = 0$.
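A quick check of the balance-point property in Python, using the same four scores: the deviations from the mean cancel out.

```python
scores = [82, 95, 67, 92]
mean = sum(scores) / len(scores)          # 84.0

# Deviation of each score from the mean.
deviations = [x - mean for x in scores]
print(deviations)       # [-2.0, 11.0, -17.0, 8.0]
print(sum(deviations))  # 0.0
```

With other data the sum may differ from zero by a tiny floating-point rounding error, so comparing against zero with a small tolerance is the safer check in general.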
Comparison of Measures of Central Tendency
Measures of Dispersion Range, Variance, Standard Deviation
Range (Highest value in the distribution minus the lowest value in the distribution) + 1. This is the inclusive range, which counts both end points.
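A brief sketch of the range in Python, applied to the four scores from the earlier mean example. The variable names are illustrative; the version without the + 1 is shown only for comparison.

```python
scores = [82, 95, 67, 92]

# Difference between the extremes, without the + 1 (for comparison only).
exclusive_range = max(scores) - min(scores)       # 95 - 67 = 28

# Range as defined here: highest minus lowest, plus 1.
inclusive_range = max(scores) - min(scores) + 1   # 29

print(exclusive_range, inclusive_range)  # 28 29
```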
Variance Measure of how different scores are on average in squared units: $\sum (X - \bar{X})^2 / N$
Standard Deviation Returns the variance to the original scale units. sd = square root of the variance.
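The variance and standard deviation can be sketched in Python as below, reusing the four scores from the mean example. The sketch divides by N, matching the variance formula given here; many textbooks divide by N - 1 when estimating a population variance from a sample, which is not shown.

```python
import math

scores = [82, 95, 67, 92]
n = len(scores)
mean = sum(scores) / n                                   # 84.0

# Squared deviations from the mean: 4, 121, 289, 64.
squared_deviations = [(x - mean) ** 2 for x in scores]

variance = sum(squared_deviations) / n                   # 478 / 4 = 119.5
sd = math.sqrt(variance)                                 # back to score units

print(variance)      # 119.5
print(round(sd, 2))  # 10.93
```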
Other Descriptors of Distributions Skew: how symmetrical the distribution is. Kurtosis: how flat or peaked the distribution is.
Kinds of Distributions Uniform; Skewed; Bell-shaped or Normal; Ogive or S-shaped
Figure: Normal distribution with mean μ and standard deviation σ.
Figure: Total area under a normal curve. The shaded area is 1.0 or 100%.
Figure: A normal curve is symmetric about the mean. Each of the two shaded areas is .5 or 50%.
Figure: Areas of the normal curve beyond μ ± 3σ. Each of the two shaded areas is very close to zero.
Figure: Three normal distribution curves with the same mean (μ = 50) but different standard deviations (σ = 5, σ = 10, σ = 16).
Figure: Three normal distributions with different means (μ = 20, μ = 30, μ = 40) but the same standard deviation (σ = 5).
Areas under a normal curve For a normal distribution, approximately: 68% of the observations lie within one standard deviation of the mean; 95% of the observations lie within two standard deviations of the mean; 99.7% of the observations lie within three standard deviations of the mean.
Figure: Normal curve with the horizontal axis marked from μ - 3σ to μ + 3σ. The areas within one, two, and three standard deviations of the mean are labeled 68%, 95%, and 99.7%.
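The three percentages can be checked numerically. The sketch below relies on a standard identity that is not part of these slides: for a normal distribution, the proportion within k standard deviations of the mean equals erf(k / √2). The function name `area_within` is chosen for illustration.

```python
import math

def area_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The result does not depend on μ or σ, which is why the same percentages apply to every normal distribution.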
Score Scales Raw Scores, Percentile Ranks, Grade Equivalents (GE), Standard Scores, Normal Curve Equivalents (NCE), Z-scores, T-scores, College Board Scores
Converting an X Value to a z Value For a normal random variable X, a particular value of x can be converted to its corresponding z value by using the formula $z = \frac{x - \mu}{\sigma}$, where μ and σ are the mean and standard deviation of the normal distribution of x, respectively.
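A minimal sketch of the conversion in Python. The function name `z_score` and the example values (a score of 65 on a scale with μ = 50 and σ = 10) are hypothetical, chosen only to illustrate the arithmetic.

```python
def z_score(x, mu, sigma):
    """Convert a value x to a z value: how many standard deviations x lies from the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Hypothetical example: a score of 65 where mu = 50 and sigma = 10.
z = z_score(65, mu=50, sigma=10)
print(z)  # 1.5
```

A positive z value lies above the mean and a negative one lies below it; a value equal to the mean gives z = 0.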
The Logic of Inferential Statistics Population: the entire universe of individuals we are interested in studying. Sample: the selected subgroup that is actually observed and measured (with sample size N). Sampling Distribution of a Statistic: the distribution of the values a statistic takes across all possible samples like ours (same size N, drawn from the same population).
The Three Distributions Used in Inferential Statistics I. Population II. Sample III. Sampling Distribution of the Statistic
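The three distributions can be illustrated with a small simulation. Everything numeric here is an assumption made for the sketch: a hypothetical population of 10,000 scores, a sample size of N = 25, and 2,000 repeated samples standing in for "all possible samples". The statistic is the sample mean. The last line compares the spread of the sample means with the population standard deviation divided by √N, a standard result that is not stated in these slides.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# I. Population: 10,000 hypothetical scores (mean near 50, sd near 10).
population = [rng.gauss(50, 10) for _ in range(10_000)]

# II. Sample: one subgroup of size N that is actually observed.
N = 25
sample = rng.sample(population, N)

# III. Sampling distribution of the mean, approximated by the means
#      of many repeated samples of the same size N.
sample_means = [statistics.fmean(rng.sample(population, N)) for _ in range(2_000)]

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

print("population mean:", round(pop_mean, 2))
print("one sample mean:", round(statistics.fmean(sample), 2))
print("mean of sample means:", round(statistics.fmean(sample_means), 2))
print("sd of sample means:", round(statistics.pstdev(sample_means), 2),
      "vs pop_sd / sqrt(N):", round(pop_sd / N ** 0.5, 2))
```

A single sample mean can miss the population mean by a few points, while the sample means as a group center on it; that contrast is what inference from one sample relies on.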