What is Statistics? Day 2.
Statistics Collecting data Organizing data Analyzing data Drawing conclusions
Population All individuals/objects we want to study
Sample Subset of population Reasonable size to study
Population/Sample After a major earthquake in California, insurance agents want to estimate the monetary value of damage to single-family homes in San Francisco. One hundred single-family homes in San Francisco were randomly selected for inspection. Describe the population and sample for this study. Population: All single-family homes in San Francisco Sample: 100 homes selected for inspection
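The random selection in the earthquake example can be imitated in a few lines. This is only a sketch: the population size of 5000 and the ID numbering are invented for illustration, since the slide does not say how many single-family homes San Francisco has.

```python
import random

# Hypothetical stand-in for the population: one ID number per single-family home.
# The population size of 5000 is invented for illustration only.
population = list(range(1, 5001))

random.seed(1)  # fixed seed so the sketch gives the same sample each run
sample = random.sample(population, 100)  # 100 distinct homes, each equally likely

print("Population size:", len(population))
print("Sample size:", len(sample))
```

Every home in the list has the same chance of being chosen, which is what "randomly selected" means here.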
Variable Characteristic we're studying
Data Observations of variable(s)
Let's Make Some Graphs!
Favorite Music Genre
Univariate Data: One variable
Genre        Frequency
Alternative  16
Classical    4
Country      15
Pop          35
Rap          21
Rock         18
Other        16
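As a quick check on the genre frequencies, the sketch below turns each count into a relative frequency and picks out the most and least common categories. The variable names are illustrative; only the counts come from the slide.

```python
# Frequencies copied from the Favorite Music Genre slide.
genre_counts = {
    "Alternative": 16, "Classical": 4, "Country": 15, "Pop": 35,
    "Rap": 21, "Rock": 18, "Other": 16,
}

total = sum(genre_counts.values())
# Relative frequency = count for the category / total number of responses.
relative = {genre: count / total for genre, count in genre_counts.items()}

most_common = max(genre_counts, key=genre_counts.get)
least_common = min(genre_counts, key=genre_counts.get)

for genre, count in genre_counts.items():
    print(f"{genre:<12}{count:>4}{relative[genre]:>8.1%}")
print("Most common:", most_common, "| Least common:", least_common)
```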
U of M vs. MSU by Gender
Bivariate Data: Two variables (Multivariate Data: More than 2 variables)
Male Female
U of M 20 16
MSU 10
Neither 3 1
Bar Graph Bars do not touch Describe: Which category occurred most or least often? For bivariate data: double or segmented bar graph
Pie Chart Slice angle = proportion (%) × 360° Describe: Which category occurred most or least often? End of day 1
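To make the slice-angle rule concrete, here is a small sketch that multiplies each genre's proportion by 360°, using the Favorite Music Genre counts from the earlier slide. It computes the angles only; drawing the chart is left out.

```python
# Frequencies copied from the Favorite Music Genre slide.
genre_counts = {
    "Alternative": 16, "Classical": 4, "Country": 15, "Pop": 35,
    "Rap": 21, "Rock": 18, "Other": 16,
}

total = sum(genre_counts.values())
# Slice angle = proportion of the whole * 360 degrees.
angles = {genre: count / total * 360 for genre, count in genre_counts.items()}

for genre, angle in angles.items():
    print(f"{genre:<12}{angle:>6.1f} degrees")
print(f"All slices together: {sum(angles.values()):.1f} degrees")
```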
Fastest Speed Driven 85 90 93 85 95 90 130 81 95 80 140 92 90 90 100 102 80 80 80 50 120 75 75 75 105 75 75 85 72 90 100 100 75 80 110 92 80 95 85 80 85 95 150 0 90 95 100 70 77 90 90 103 75 98 75 100 80 80
Categorical Variables Data is in categories Also called qualitative Ex.: Gender, Type of Car Use a bar graph or pie chart
Numerical Variables Data is in numbers Also called quantitative Ex: Speed, shoe size Must make sense to find the average Phone number: Not numerical!
Dotplot
Dots (or other markings, like X's or *'s) on a number line
Comparative dotplot: One number line, multiple plots
Describing a univariate numerical graph Have students do Features of Distributions in groups first, then go through this
What strikes you as the most distinctive difference among the distributions of exam scores in classes A, B, & C ?
Center Where is the middle of the data (roughly)?
What strikes you as the most distinctive feature(s) of the distribution of exam scores in class K?
Unusual things Gaps, clusters, anything else unusual Outliers: values that lie far away from the rest of the data CUSS! (Center, Unusual things, Spread, Shape)
What strikes you as the most distinctive difference among the distributions of scores in classes D, E, & F?
Spread How spread out is the data? How much variability is there? Range = maximum – minimum
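The range formula can be tried on the Fastest Speed Driven responses from the earlier slide. The second calculation, which drops the response of 0, is an added illustration (not part of the slide) of how strongly one extreme value can change the range.

```python
# Responses copied from the Fastest Speed Driven slide.
speeds = [
    85, 90, 93, 85, 95, 90, 130, 81, 95, 80, 140, 92, 90, 90, 100, 102,
    80, 80, 80, 50, 120, 75, 75, 75, 105, 75, 75, 85, 72, 90, 100, 100,
    75, 80, 110, 92, 80, 95, 85, 80, 85, 95, 150, 0, 90, 95, 100, 70,
    77, 90, 90, 103, 75, 98, 75, 100, 80, 80,
]

# Range = maximum - minimum.
data_range = max(speeds) - min(speeds)

# The response of 0 sits far from the rest; drop it and recompute.
without_zero = [s for s in speeds if s != 0]
range_without_zero = max(without_zero) - min(without_zero)

print("Range of all responses:", data_range)
print("Range without the 0 response:", range_without_zero)
```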
What strikes you as the most distinctive difference among the distributions of exam scores in classes G, H, & I ?
Shape What overall shape is the distribution? Distributions Activity
Shapes of Distributions Day 3 – Intro with results of distributions activity
Symmetrical Sides are (more or less) mirror images Special type: bell curves
Uniform Every value has (more or less) equal frequency
Skewed (left or right)
One side (tail) is longer than the other
Skewness is fewness!
Skewed left (negatively skewed): the longer tail is on the left
Skewed right (positively skewed): the longer tail is on the right
Show example of skewed left and skewed right
Bimodal (or multi-modal) Two (or more) separate peaks Go back to distributions activity & sort into the four types
***CONTEXT*** Descriptions must: Include the context Use statistical vocabulary "Bell curve"
Fastest Speed Driven 85 90 93 85 95 90 130 81 95 80 140 92 90 90 100 102 80 80 80 50 120 75 75 75 105 75 75 85 72 90 100 100 75 80 110 92 80 95 85 80 85 95 150 0 90 95 100 70 77 90 90 103 75 98 75 100 80 80
Height (inches) Females: 70 66 65 68 68 68 67 64 61 66 66 64 71 70 70 66 65 68 68 68 67 64 61 66 66 64 71 70 63 63 67 63 67 70 64 66 62 69 66 66 Males: 70 73 73 65 75 67 73 73 71 66 68 68 74 65 73 75 71 70 71 72 70 71 75 72 72 60 71 69 69 74 71 71 66
More numerical graphs
Stemplot (stem & leaf plot)
Stem = 1st digit, Leaves = rest of digits
Leaves in increasing order
Commas only with double-digit leaves
Include a key
Can split stems with long leaves
Back-to-back stemplot: two sets of data
Show example of split stems.
Price per ounce for various brands of shampoo at a grocery store: 0.32 0.21 0.29 0.54 0.17 0.28 0.36 0.23 Do on board
PLAN test scores for a sample of sophomores: 12 13 22 21 15 13 18 28 19 21 23 17 16 19 12 20 27 21 13 25 14 25 14 20 Do on board
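For checking the board work, the PLAN scores can be sorted into stems and leaves with a short script. This is one possible sketch, using the tens digit as the stem and the ones digit as the leaf; it does not split stems.

```python
from collections import defaultdict

# Scores copied from the PLAN test scores slide.
scores = [12, 13, 22, 21, 15, 13, 18, 28, 19, 21, 23, 17,
          16, 19, 12, 20, 27, 21, 13, 25, 14, 25, 14, 20]

# Stem = tens digit, leaf = ones digit; sorting first keeps leaves in increasing order.
stems = defaultdict(list)
for score in sorted(scores):
    stems[score // 10].append(score % 10)

lines = [
    f"{stem} | {''.join(str(leaf) for leaf in leaves)}"
    for stem, leaves in sorted(stems.items())
]

print("\n".join(lines))
print("Key: 1 | 2 = 12")
```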
Total tobacco exposure time (in seconds) for Disney G-rated movies: 223 176 548 37 158 51 299 37 11 165 74 9 2 6 23 206 9 Total tobacco exposure time (in seconds) for other studios’ G-rated movies: 205 162 6 1 117 5 91 155 24 55 17 Do on board
Histogram
Bar graph for numerical data
Bars touch
Data is grouped into classes (intervals)
y-axis options:
  frequency (how many data points in each class)
  relative frequency (percent of data in each class)
Two types:
  Discrete: Bars are centered over discrete values
  Continuous: Bars cover a class (interval) of values
Draw a picture of each type with survey data
Discrete (numerical) data Listable set of numbers We're counting
Continuous (numerical) data Any value in the variable's domain is possible We're measuring
Identify the type of variable:
Income of adults in your city: Numerical (Continuous)
Color of M&M candies selected at random from a bag: Categorical
Number of speeding tickets each student in AP Statistics has received: Numerical (Discrete)
Area code of an individual: Categorical
Birth weight of female babies born at a large hospital over the course of a year: Numerical (Continuous)
Number of Pieces of Gum Chewed Per Day: 0 3 0 1 3 3 1 1 2 1 1 0 5 0 3 1 1 0 2 2 4 0 2 3 1 0 0 1 3 4 2 4 0 2 1 3 2 1 2 0 2 1 7 1 1 1 3 3 2 2 6 2 0 2 1 Do on board
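Because the gum data is discrete, a histogram needs only a count for each whole-number value. The sketch below tallies those counts and prints a rough text version with one X per response; it is a stand-in for the drawing done on the board.

```python
from collections import Counter

# Responses copied from the Number of Pieces of Gum Chewed Per Day slide.
gum = [0, 3, 0, 1, 3, 3, 1, 1, 2, 1, 1, 0, 5, 0, 3, 1, 1, 0, 2, 2,
       4, 0, 2, 3, 1, 0, 0, 1, 3, 4, 2, 4, 0, 2, 1, 3, 2, 1, 2, 0,
       2, 1, 7, 1, 1, 1, 3, 3, 2, 2, 6, 2, 0, 2, 1]

counts = Counter(gum)

# One row per possible value from min to max, so empty values still show up.
for value in range(min(gum), max(gum) + 1):
    print(f"{value} | " + "X" * counts[value])
```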
2011 Life Expectancies at Birth in Each of the 54 Countries in Africa: 50 64 62 57 76 39 48 53 53 52 71 74 61 49 78 63 60 49 49 55 55 75 64 58 50 52 60 73 58 74 75 48 54 53 59 64 58 52 52 61 61 50 54 52 63 61 59 56 57 63 56 52 55 53 Do on board
Frequency Table for Life Expectancies:
Class    Frequency    Relative    Cumulative
30-39
40-49
50-59
60-69
70-79
Do on board
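The table can be filled in by hand on the board; the sketch below is one way to check the result. It assumes the Cumulative column means cumulative relative frequency, which matches the ogive slide but is not stated on this one.

```python
# Values copied from the 2011 African life expectancies slide.
life_expectancy = [
    50, 64, 62, 57, 76, 39, 48, 53, 53, 52, 71, 74, 61, 49, 78, 63, 60, 49,
    49, 55, 55, 75, 64, 58, 50, 52, 60, 73, 58, 74, 75, 48, 54, 53, 59, 64,
    58, 52, 52, 61, 61, 50, 54, 52, 63, 61, 59, 56, 57, 63, 56, 52, 55, 53,
]
classes = [(30, 39), (40, 49), (50, 59), (60, 69), (70, 79)]

n = len(life_expectancy)
# Frequency: how many values fall inside each class (both endpoints included).
frequencies = [sum(low <= x <= high for x in life_expectancy) for low, high in classes]
relative = [f / n for f in frequencies]

# Cumulative relative frequency: running total of the relative frequencies.
cumulative = []
running = 0.0
for r in relative:
    running += r
    cumulative.append(running)

print(f"{'Class':<8}{'Freq':>6}{'Rel':>8}{'Cum':>8}")
for (low, high), f, r, c in zip(classes, frequencies, relative, cumulative):
    print(f"{low}-{high:<5}{f:>6}{r:>8.3f}{c:>8.3f}")
```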
Cumulative Relative Frequency Plot
Ogive ("oh-jive")
Adds up the percent of data you've counted as you move left to right
Shows percentile: Percent of individuals at or below a certain value
Quartile: Every 25% of the data
1st Quartile (Q1) = 25th percentile
2nd Quartile (Median) = 50th percentile
3rd Quartile (Q3) = 75th percentile
Interquartile Range (IQR) = Q3 - Q1
Example on notes outline: life expectancy in African nations
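The quartiles and IQR for the life expectancy data can also be computed directly from the sorted values instead of being read off the ogive. The sketch below uses the "median of each half" rule, which is an assumption here; other quartile conventions exist and can give slightly different answers on other data sets.

```python
from statistics import median

# Values copied from the 2011 African life expectancies slide.
life_expectancy = [
    50, 64, 62, 57, 76, 39, 48, 53, 53, 52, 71, 74, 61, 49, 78, 63, 60, 49,
    49, 55, 55, 75, 64, 58, 50, 52, 60, 73, 58, 74, 75, 48, 54, 53, 59, 64,
    58, 52, 52, 61, 61, 50, 54, 52, 63, 61, 59, 56, 57, 63, 56, 52, 55, 53,
]

data = sorted(life_expectancy)
half = len(data) // 2

# Split into lower and upper halves; skip the middle value when the count is odd.
lower = data[:half]
upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]

q1 = median(lower)   # 25th percentile
q2 = median(data)    # 50th percentile (the median)
q3 = median(upper)   # 75th percentile
iqr = q3 - q1        # Interquartile range = Q3 - Q1

print("Q1:", q1, "Median:", q2, "Q3:", q3, "IQR:", iqr)
```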