Statistics: An Introduction
Learning Objectives
  1. Define Statistics
  2. Describe the Uses of Statistics
  3. Distinguish Descriptive & Inferential Statistics
  4. Define Population, Sample, Parameter, & Statistic
  5. Identify Data Types
What is Statistics?
  The practice (science?) of data analysis
  Summarizing data and drawing inferences about the larger population from which the data were drawn
Statistical Methods
  Descriptive Statistics
  Inferential Statistics
Descriptive Statistics
  1. Involves
     Collecting Data
     Presenting Data
     Characterizing Data
  2. Purpose
     Describe Data
  [Figure: bar chart of $ by quarter (Q1 to Q4), annotated with \( \bar{X} = 30.5 \) and \( S^2 \)]
Inferential Statistics
  1. Involves
     Estimation
     Hypothesis Testing
  2. Purpose
     Make Decisions About Population Based on Sample Characteristics
Key Terms
  1. Population (Universe): All Items of Interest
  2. Sample: Portion of Population
  3. Parameter: Summary Measure about Population
  4. Statistic: Summary Measure about Sample
  Memory aid: P in Population & Parameter; S in Sample & Statistic
Data Types
  Quantitative
     Discrete
     Continuous
  Qualitative
     Nominal (categorical)
     Ordinal (rank-ordered categories)
Sampling
  Representative sample: same characteristics as the population
  Random sample: every subset of the population of a given size has an equal chance of being selected
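
The key terms and the idea of a random sample can be sketched in a few lines of Python. This is only an illustration: the population below is made up, and the names `population`, `sample`, `population_mean`, and `sample_mean` are hypothetical.

```python
import random
from statistics import mean

# Hypothetical population: ages of 1,000 people (made-up values for illustration).
rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = [rng.randint(18, 65) for _ in range(1000)]

# Parameter: a summary measure computed from the whole population.
population_mean = mean(population)

# Simple random sample: draws n items without replacement, so every
# subset of size n is equally likely to be chosen.
n = 50
sample = rng.sample(population, n)

# Statistic: the same summary measure computed from the sample only.
sample_mean = mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The statistic will usually be close to the parameter but not equal to it; inferential statistics is about reasoning across that gap.
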
Review
  Descriptive vs. Inferential Statistics
  Vocabulary
     Population
     (Random, representative) sample
     Parameter
     Statistic
  Data types
Methods for Describing Data
Learning Objectives
  1. Describe Qualitative Data Graphically
  2. Describe Numerical Data Graphically
  3. Create & Interpret Graphical Displays
  4. Explain Numerical Data Properties
  5. Describe Summary Measures
  6. Analyze Numerical Data Using Summary Measures
Data Presentation
Presenting Qualitative Data
Student Specializations
  Specialization | Freq.  Percent  Cum.
  HCI            |
  IEMP           |
  LIS            |
  Undecided      |
  Total          |
Undergrad Majors
  UG major                  | Freq.  Percent  Cum.
  American Studies          |
  Cog Sci                   |
  Comp Sci                  |
  Economics                 |
  English                   |
  Environmental Engineering |
  Graphic Design            |
  Math                      |
  Mechanical Engineering    |
  Nutrition                 |
  Sci and Tech Policy       |
  Telecommunications        |
  Total                     |
Favorite Colors
  color  | Freq.  Percent  Cum.
  black  |
  blue   |
  green  |
  orange |
  purple |
  red    |
  white  |
  Total  |
Calculus Knowledge
  integrals | Freq.  Percent  Cum.
  Total     |
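
A frequency table with Freq., Percent, and Cum. columns like the ones above can be built as in the following sketch. The responses are hypothetical, not the class data, and `frequency_table` is an illustrative helper, not the tool that produced the original tables.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (category, freq, percent, cumulative percent), sorted by category."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    for category in sorted(counts):
        freq = counts[category]
        percent = 100.0 * freq / total
        cumulative += percent
        rows.append((category, freq, percent, cumulative))
    return rows

# Hypothetical responses (made up for illustration).
responses = ["HCI", "LIS", "HCI", "IEMP", "Undecided", "LIS", "HCI", "LIS"]

print(f"{'Specialization':<15}| {'Freq.':>5} {'Percent':>8} {'Cum.':>7}")
for category, freq, percent, cum in frequency_table(responses):
    print(f"{category:<15}| {freq:>5} {percent:>8.2f} {cum:>7.2f}")
print(f"{'Total':<15}| {len(responses):>5} {100.0:>8.2f}")
```

The cumulative column is most meaningful for ordinal data, where the category order carries information; for nominal data such as favorite colors it depends only on the sort order.
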
Presenting Numerical Data
Student Age (Reported) Data
  [Figure: stem-and-leaf plot for age, stems 2* through 7*]
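
How a stem-and-leaf plot is assembled can be sketched as follows. The ages are hypothetical, and the `stem*` row label simply imitates the notation of the plot above; `stem_and_leaf` is an illustrative helper.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return 'stem* | leaves' lines for non-negative integers."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # tens digit(s) = stem, units digit = leaf
        groups[stem].append(leaf)
    lines = []
    # Include empty stems so that gaps in the data remain visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem}* | {leaves}")
    return lines

# Hypothetical ages (made up for illustration).
ages = [22, 23, 24, 24, 25, 27, 27, 28, 30, 31, 35, 42]
for line in stem_and_leaf(ages):
    print(line)
```

Each row keeps the individual values readable while the row lengths show the shape of the distribution, much like a histogram turned on its side.
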
Histogram
Starting Salaries (in $K)
  [Figure: stem-and-leaf plot, stems 3* through 8*; recoverable rows: 3* | 8, 7* | 5, 8* | 0]
Numerical Data Properties
Thinking Challenge
  Employees cite low pay: most workers earn only $20,000.
  President claims average pay is $70,000!
  [Figure: salary levels $400,000, $70,000, $50,000, $30,000, $20,000]
Standard Notation
  Measure       Sample          Population
  Mean          \( \bar{x} \)   \( \mu \)
  Stand. Dev.   \( s \)         \( \sigma \)
  Variance      \( s^2 \)       \( \sigma^2 \)
  Size          \( n \)         \( N \)
Numerical Data Properties
  Central Tendency (Location)
  Variation (Dispersion)
  Shape
Numerical Data Properties & Measures
  Central Tendency: Mean, Median, Mode
  Variation: Range, Interquartile Range, Variance, Standard Deviation
  Shape: Skew
Central Tendency
What’s wrong with this?
  Measurements
  Middle measurement is 2, so that’s the median
  \[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n} \]
Ages
  Mean = 29, Median = 27
  [Figure: stem-and-leaf plot for age]
Summary of Central Tendency Measures
  Measure | Equation                        | Description
  Mean    | \( \sum X_i / n \)              | Balance Point
  Median  | Position \( \frac{n+1}{2} \)    | Middle Value When Ordered
  Mode    | none                            | Most Frequent
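
The three measures can disagree sharply, which is the point of the pay dispute in the Thinking Challenge. The sketch below uses a hypothetical payroll, chosen only so that the mean is $70,000 and the most common salary is $20,000; it is not the data behind the slide.

```python
from statistics import mean, median, mode

# Hypothetical payroll (made up): one very large salary and many small ones.
payroll = [400_000, 50_000, 50_000, 30_000] + [20_000] * 5

n = len(payroll)
ordered = sorted(payroll)        # the median is the middle value of ORDERED data
median_position = (n + 1) // 2   # 1-based position (n + 1) / 2; n is odd here

print("mean:  ", mean(payroll))                  # balance point
print("median:", ordered[median_position - 1])   # middle value when ordered
print("mode:  ", mode(payroll))                  # most frequent value
```

Taking the middle element of the unordered list would give a different and meaningless answer, which is why the ordering step comes first. With this payroll the president's "average" (the mean) and the employees' "most workers" (the mode) are both correct descriptions of the same data.
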
Shape
Shape
  1. Describes How Data Are Distributed
  2. Measures of Shape
     Skew (Symmetry)
  Left-Skewed:  Mean < Median < Mode
  Symmetric:    Mean = Median = Mode
  Right-Skewed: Mode < Median < Mean
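
The ordering of mean and median suggests a quick check of skew direction. The sketch below is a rough heuristic under that assumption, not a formal skewness measure; `skew_direction` is a hypothetical helper and the data sets are made up.

```python
import math
from statistics import mean, median

def skew_direction(values):
    """Classify skew from the relative position of mean and median (heuristic only)."""
    m, md = mean(values), median(values)
    if math.isclose(m, md):
        # Equal mean and median is consistent with symmetry but does not prove it.
        return "symmetric"
    return "right-skewed" if m > md else "left-skewed"

print(skew_direction([1, 2, 3, 4, 10]))   # long tail toward large values
print(skew_direction([1, 7, 8, 9, 10]))   # long tail toward small values
print(skew_direction([1, 2, 3, 4, 5]))
```

The mean is pulled toward the long tail, while the median is not, so the gap between them points in the direction of the skew.
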
Variation
Quartiles
  1. Measure of Noncentral Tendency
  2. Split Ordered Data into 4 Quarters
     25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
  3. Position of i-th Quartile
     \[ \text{Positioning Point of } Q_i = \frac{i\,(n+1)}{4} \]
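
The positioning-point formula can be applied as in the sketch below. The slide does not say what to do when the position is not a whole number; linear interpolation between the two neighbouring values is assumed here, and textbooks differ on this rule. The data are hypothetical.

```python
import math

def quartile(ordered, i):
    """i-th quartile (i = 1, 2, 3) at 1-based position i * (n + 1) / 4.

    Assumption: a fractional position is resolved by linear interpolation
    between the two neighbouring ordered values.
    """
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least 3 ordered values")
    position = i * (n + 1) / 4
    lower = math.floor(position)
    fraction = position - lower
    below = ordered[lower - 1]
    if fraction == 0:
        return below
    return below + fraction * (ordered[lower] - below)

# Hypothetical ages, already ordered (n = 11, so the positions are 3, 6, 9).
ages = [21, 22, 24, 25, 26, 27, 28, 29, 30, 33, 41]
print([quartile(ages, i) for i in (1, 2, 3)])
```

Python's standard library offers the same positioning rule through `statistics.quantiles(data, n=4, method="exclusive")`, which can serve as a cross-check.
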
Ages
  Range, Quartiles
  [Figure: stem-and-leaf plot for age]
Box Plots - Age and Salary
  Age
     Quartiles: 24, 27, 30
     Inner fences: (15, 39)
     Outer fences: (6, 48)
  Salary
     Quartiles: 41K, 50K, 60K
     Inner fences: ??
     Outer fences: ??
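
The age fences are consistent with the usual convention of placing inner fences 1.5 interquartile ranges beyond Q1 and Q3, and outer fences 3 interquartile ranges beyond them. Assuming the same rule is intended for salary, a sketch of the calculation:

```python
def fences(q1, q3):
    """Return (inner, outer) fences using the 1.5 * IQR and 3 * IQR convention."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)
    return inner, outer

# Age quartiles from the slide: Q1 = 24, Q3 = 30 (IQR = 6).
age_inner, age_outer = fences(24, 30)
print("Age:   ", age_inner, age_outer)

# Salary quartiles from the slide, in $K: Q1 = 41, Q3 = 60 (IQR = 19).
salary_inner, salary_outer = fences(41, 60)
print("Salary:", salary_inner, salary_outer)
```

A fence may fall below zero, as the lower outer fence for salary does; that only means no low value can be flagged as an extreme outlier on that side.
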
Variance & Standard Deviation
  1. Measures of Dispersion
  2. Most Common Measures
  3. Consider How Data Are Distributed
  4. Show Variation About Mean (\( \bar{X} \) or \( \mu \))
  [Figure annotation: \( \bar{X} = 8.3 \)]
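Sample Variance Formula
  \[ S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1} \]
  n - 1 in denominator! (Use N if Population Variance)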
Equivalent Formula
Another Equivalent Formula
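
The equivalent formulas themselves do not appear in the text of these slides. Two shortcut forms commonly given in textbooks, assumed here rather than taken from the slides, are (ΣX² - (ΣX)²/n) / (n - 1) and (ΣX² - n·X̄²) / (n - 1). The sketch below checks on hypothetical data that both agree with the definition.

```python
import math

def var_definition(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def var_shortcut_sums(xs):
    """Assumed shortcut: (sum of X^2 - (sum of X)^2 / n) / (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

def var_shortcut_mean(xs):
    """Assumed shortcut: (sum of X^2 - n * mean^2) / (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    return (sum(x * x for x in xs) - n * xbar ** 2) / (n - 1)

data = [4, 6, 8, 10, 12]  # hypothetical measurements
results = [f(data) for f in (var_definition, var_shortcut_sums, var_shortcut_mean)]
print(results)
assert all(math.isclose(r, results[0]) for r in results)
```

The shortcut forms need only running totals of X and X², which made them convenient for hand calculation; in floating-point arithmetic the definitional form is usually the more accurate of the three.
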
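Empirical Rule
  If x has a “symmetric, mound-shaped” distribution:
     about 68% of values lie within 1 standard deviation of the mean
     about 95% of values lie within 2 standard deviations of the mean
     about 99.7% of values lie within 3 standard deviations of the mean
  Justification: Known properties of the “normal” distribution, to be studied later in the course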
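
The justification can be checked numerically. The sketch below uses the normal distribution from Python's standard library to compute the share of values within 1, 2, and 3 standard deviations of the mean; `share_within` is a hypothetical helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Probability that a normal value lies within k standard deviations of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

These are the familiar 68%, 95%, and 99.7% figures. Real data that are only roughly mound-shaped will match them only approximately.
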
Preview of Statistical Inference
  You observe one data point
  Make hypothesis about mean and standard deviation from which it was drawn
  Empirical Rule tells you how (un)likely the data point is
  If very unlikely, you are suspicious of the hypothesis about mean and standard deviation, and reject it
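
The reasoning above can be sketched in code. The sketch assumes a normal model, hypothetical numbers, and an arbitrary cut-off of 0.05 for "very unlikely"; none of these come from the slides, and `two_sided_tail` is an illustrative helper.

```python
from statistics import NormalDist

def two_sided_tail(observation, hypothesized_mean, hypothesized_sd):
    """Chance of a value at least this far from the mean if the hypothesis holds.

    Assumption: the data point comes from a normal distribution.
    """
    z = (observation - hypothesized_mean) / hypothesized_sd
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical example: the hypothesis says mean 29 and standard deviation 5,
# and the single observed data point is 45 (3.2 standard deviations away).
p = two_sided_tail(45, 29, 5)
suspicious = p < 0.05  # the cut-off is an assumption, not part of the slides

print(f"probability of a point this extreme: {p:.4f}")
print("reject hypothesis" if suspicious else "no reason to reject hypothesis")
```

A point within about two standard deviations of the hypothesized mean would not raise suspicion under this cut-off, in line with the Empirical Rule.
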
Summary of Variation Measures
  Measure                         | Equation                                            | Description
  Range                           | \( X_{largest} - X_{smallest} \)                    | Total Spread
  Interquartile Range             | \( Q_3 - Q_1 \)                                     | Spread of Middle 50%
  Standard Deviation (Sample)     | \( \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} \)   | Dispersion about Sample Mean
  Standard Deviation (Population) | \( \sqrt{\frac{\sum (X_i - \mu)^2}{N}} \)           | Dispersion about Population Mean
  Variance (Sample)               | \( \frac{\sum (X_i - \bar{X})^2}{n - 1} \)          | Squared Dispersion about Sample Mean
Z-scores Number of standard deviations from the mean
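
For a sample, the usual z-score is the distance from the sample mean divided by the sample standard deviation. A minimal sketch under that assumption, with hypothetical data and a hypothetical helper `z_scores`:

```python
from statistics import mean, stdev

def z_scores(xs):
    """Number of sample standard deviations each value lies from the sample mean."""
    xbar = mean(xs)
    s = stdev(xs)  # sample standard deviation (n - 1 in the denominator)
    return [(x - xbar) / s for x in xs]

data = [4, 6, 8, 10, 12]  # hypothetical measurements
for x, z in zip(data, z_scores(data)):
    print(f"x = {x:>2}  z = {z:+.2f}")
```

Values below the mean get negative z-scores and values above it get positive ones, so z-scores from different data sets can be compared on a common scale.
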
Conclusion
  1. Described Qualitative Data Graphically
  2. Described Numerical Data Graphically
  3. Created & Interpreted Graphical Displays
  4. Explained Numerical Data Properties
  5. Described Summary Measures
  6. Analyzed Numerical Data Using Summary Measures