What is Statistics?
Statistics
• Working with data
• Collecting, analyzing, drawing conclusions
Descriptive statistics
• Organizing & summarizing data
• Graphs, tables, etc.
Inferential statistics
• Making conclusions based on a sample
Population
• All individuals/objects we want to study
Sample
• Subset of population
• Reasonable size to study
Population/Sample
After a major earthquake in California in 1994, representatives of the insurance industry wanted to estimate the monetary loss due to damage to single-family homes in Northridge, CA. From the set of all single-family homes in Northridge, 100 homes were selected for inspection. Describe the population and sample for this study.
Population: All single-family homes in Northridge
Sample: The 100 homes selected for inspection
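The selection step in this example can be sketched in Python. This is only an illustration under stated assumptions: the population size of 5,000 homes and the `home_N` labels are made-up placeholders, and simple random sampling is assumed because the example does not say how the 100 homes were chosen.

```python
import random

# Hypothetical population: one ID label per single-family home.
# The count of 5000 is a placeholder, not a figure from the study.
population = [f"home_{i}" for i in range(1, 5001)]

# Draw a simple random sample of 100 homes without replacement.
rng = random.Random(1994)  # fixed seed so the sketch is repeatable
sample = rng.sample(population, 100)

print(len(population), "homes in the population;", len(sample), "in the sample")
```

Every home in the sample also belongs to the population, which is what makes the sample a subset.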
Variable
• Characteristic we're studying
• Value varies from person to person
Data
• Observations of variable(s)
Types of variables
"To be is to be the value of a variable." – Willard Van Orman Quine
Categorical variables
• Categories of the population
• Also called qualitative
  – Ex.: Type of car you drive
Numerical variables
• Data are numbers
• Also called quantitative
  – Ex.: Shoe size
• Must make sense to find the average!
  – Phone number: Not numerical!
Discrete (numerical)
• Listable set of values
• Usually counts of items
Continuous (numerical)
• Any value in the variable's domain is possible
• Usually measurements
Identify the type of variable:
1. Income of adults in your city: Numerical (Continuous)
2. Color of M&M candies selected at random from a bag: Categorical
3. Number of speeding tickets each student in AP Statistics has received: Numerical (Discrete)
4. Area code of an individual: Categorical
5. Birth weight of female babies born at a large hospital over the course of a year: Numerical (Continuous)
Classification by number of variables
• Univariate Data: Describes a single characteristic
• Bivariate Data: Describes two characteristics
• Multivariate Data: Describes more than two characteristics (beyond the scope of AP Stats)
Graphs for categorical data
Bar Graph
• Bars do not touch
• Categorical variable is typically on the x-axis
• To describe: Which category occurred most/least often?
• Bivariate categorical data sets: Can make a double bar graph or segmented bar graph
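A bar graph starts from a count per category. The sketch below tallies made-up responses to "type of car you drive" (the data are invented for illustration) and prints a rough text bar graph.

```python
from collections import Counter

# Hypothetical survey responses (invented for illustration).
cars = ["sedan", "SUV", "truck", "sedan", "SUV", "sedan", "hatchback"]

counts = Counter(cars)  # category -> frequency
most_common_category, most_count = counts.most_common(1)[0]

# One bar per category, tallest first; bars are separate rows (they do not touch).
for category, n in counts.most_common():
    print(f"{category:<10} {'#' * n}")

print("Most frequent category:", most_common_category)
```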
Using class survey data, graph:
• Handedness & Shoes: double bar graph
• Speed & Gender: segmented bar graph
Pie chart / Circle graph
• To make: Each slice's angle = proportion × 360°
• To describe: Which category occurred most/least often?
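The slice rule (proportion × 360°) can be checked with a short calculation. The M&M color counts below are made up for illustration.

```python
# Hypothetical counts of M&M colors in one bag (invented data).
counts = {"red": 12, "blue": 6, "green": 6}

total = sum(counts.values())

# Each slice's central angle is its proportion of the whole times 360 degrees.
angles = {color: n / total * 360 for color, n in counts.items()}

for color, angle in angles.items():
    print(f"{color}: {angle:.1f} degrees")
```

The angles always add up to 360°, since the proportions add up to 1.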
Graphs for numerical data
Dotplot
• Put dots on a number line
• Comparative dotplots: use the same axis for multiple groups
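A dotplot can be roughed out in plain text. This sketch assumes small non-negative integer data; the `dotplot` helper and the speeding-ticket counts are hypothetical, written only for this illustration.

```python
from collections import Counter

def dotplot(values):
    """Return the lines of a text dotplot for small non-negative integers."""
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)
    lines = []
    # Build rows from the tallest stack of dots down to the number line.
    for level in range(max(counts.values()), 0, -1):
        row = "".join(" * " if counts[x] >= level else "   " for x in axis)
        lines.append(row.rstrip())
    # The last line is the number line itself.
    lines.append("".join(f"{x:^3}" for x in axis).rstrip())
    return lines

# Hypothetical number of speeding tickets for ten students.
tickets = [0, 0, 0, 1, 1, 2, 0, 3, 1, 0]
print("\n".join(dotplot(tickets)))
```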
Describing a univariate numerical graph
What strikes you as the most distinctive difference among the distributions of exam scores in classes A, B, & C?
Center
• Where does the middle of the data fall?
• 3 measures of central tendency:
  – mean, median, mode
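The three measures of center can be computed with Python's standard `statistics` module. The exam scores below are made up for illustration.

```python
import statistics

# Hypothetical exam scores (invented data).
scores = [70, 75, 80, 80, 85, 90, 94]

mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequent value

print("mean:", mean, "median:", median, "mode:", mode)
```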
What strikes you as the most distinctive feature(s) of the distribution of exam scores in class K?
Unusual things
• Outliers: values that lie far away from the rest of the data
• Gaps, clusters, anything else unusual
What strikes you as the most distinctive difference among the distributions of scores in classes D, E, & F?
Spread
• How spread out is the data?
• 3 measures of variability:
  – range, standard deviation, IQR
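The three measures of spread can be sketched the same way. Two assumptions are worth flagging: `statistics.stdev` computes the sample standard deviation (dividing by n − 1), and `statistics.quantiles` uses its default "exclusive" method, which may differ slightly from a textbook's quartile rule on other data sets. The scores are made up.

```python
import statistics

# Hypothetical exam scores (invented data).
scores = [60, 70, 75, 80, 85, 90, 100]

data_range = max(scores) - min(scores)  # largest value minus smallest value
sd = statistics.stdev(scores)           # sample standard deviation (n - 1)

# Quartiles split the sorted data into four parts; IQR = Q3 - Q1.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

print("range:", data_range)
print("standard deviation:", round(sd, 2))
print("IQR:", iqr)
```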
What strikes you as the most distinctive difference among the distributions of exam scores in classes G, H, & I?
Shape
• What is the overall shape of the distribution?
• 4 options
Shapes of Distributions
Symmetrical
• Sides are (more or less) mirror images
  – Special type: bell-shaped
Uniform
• Every value has (more or less) equal frequency (height)
Skewed (left or right)
• One side (tail) is longer than the other
• Skewness is fewness!
  – Skewed left = negatively skewed
  – Skewed right = positively skewed
Bimodal (multi-modal)
• Two (or more) separate peaks
***CONTEXT***
• Descriptions must:
  – Include the context
  – Use statistical vocabulary
"Bell curve"
More numerical graphs
Stemplot (stem & leaf plot)
• Stem = 1st digit, Leaves = rest of digits
  – Leaves in increasing order
  – Commas with double-digit leaves
• Include a key
• Can split stems when you have long leaves
• Comparative stemplot shows two sets of data back to back
Would a stemplot be a good graph for the number of pieces of gum chewed per day by AP Stats students?
Would a stemplot be a good graph for the number of pairs of shoes owned by GBHS students?
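The construction rules above can be sketched for the simplest case: whole numbers below 100, with the tens digit as the stem and the ones digit as the leaf. The `stemplot` helper and the shoe counts are hypothetical, written only for this illustration; split stems and double-digit leaves are not handled.

```python
from collections import defaultdict

def stemplot(values):
    """Return the lines of a stemplot for non-negative integers below 100."""
    leaves = defaultdict(list)
    for v in sorted(values):  # sorting keeps leaves in increasing order
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    lines = []
    # List every stem from lowest to highest, including stems with no leaves,
    # so that gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem} | {row}".rstrip())
    lines.append("Key: 2 | 5 means 25")
    return lines

# Hypothetical pairs of shoes owned by eleven students (invented data).
shoes = [12, 5, 8, 25, 31, 14, 17, 22, 9, 40, 18]
print("\n".join(stemplot(shoes)))
```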
1. Price per ounce for various brands of dandruff shampoo at a local grocery store: Can we make a stemplot with this data?
2. Tobacco use in G-rated movies: Total tobacco exposure time (in seconds) for Disney movies: Total tobacco exposure time (in seconds) for other studios’ movies: Can we make a stemplot with both sets of data at once?
Histogram
• Bar graph for numerical data
• Bars touch
• Shows frequency (how many data values) or relative frequency (percent of data)
• Two types:
  – Discrete: Bars are centered over discrete values
  – Continuous: Bars cover a class (interval) of values
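For continuous data, the bar heights come from counting how many values fall in each class. The sketch below assumes equal-width classes that include their lower endpoint but not their upper one (a common convention, not one stated on the slide). The `frequency_table` helper and the birth weights are hypothetical.

```python
def frequency_table(values, start, width, n_classes):
    """Return (low, high, frequency, relative frequency) for each class.

    Classes are [start, start + width), [start + width, start + 2 * width), ...
    """
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        if not 0 <= index < n_classes:
            raise ValueError(f"{v} falls outside the classes")
        counts[index] += 1
    total = len(values)
    return [
        (start + i * width, start + (i + 1) * width, c, c / total)
        for i, c in enumerate(counts)
    ]

# Hypothetical birth weights in pounds (invented data).
weights = [5.2, 6.1, 6.8, 7.0, 7.3, 7.4, 7.9, 8.1, 8.6, 9.4]

for low, high, freq, rel in frequency_table(weights, 5, 1, 5):
    print(f"[{low}, {high})  {'#' * freq:<5} freq={freq}  rel={rel:.0%}")
```

The relative frequencies add up to 1 (100% of the data), whichever class width is chosen.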
Cumulative Relative Frequency Plot
• Also called ogive ("oh-jive")
• Adds up the percent of data you've covered as you move left to right
• Shows percentile: Percent of individuals at or below a certain value
• Quartile: Every 25% of the data
  – 1st Quartile (Q1) = 25th percentile
  – 3rd Quartile (Q3) = 75th percentile
  – Special name for Q2: Median
• Interquartile Range (IQR) = Q3 – Q1
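The running total behind an ogive can be sketched as follows. Reading a quartile as "the smallest value with at least that percent of the data at or below it" is one simple convention assumed here; other quartile rules can give slightly different answers. Both helper functions and the scores are hypothetical.

```python
from itertools import accumulate

def cumulative_relative_frequency(values):
    """Return (value, percent of data at or below value) pairs."""
    data = sorted(values)
    n = len(data)
    distinct = sorted(set(data))
    counts = [data.count(v) for v in distinct]
    # accumulate() produces the running total of the counts, left to right.
    return [(v, 100 * c / n) for v, c in zip(distinct, accumulate(counts))]

def percentile_value(table, p):
    """Smallest value with at least p percent of the data at or below it."""
    for value, cumulative_percent in table:
        if cumulative_percent >= p:
            return value

# Hypothetical exam scores (invented data).
scores = [60, 70, 70, 80, 80, 80, 90, 90, 100, 100]

table = cumulative_relative_frequency(scores)
q1 = percentile_value(table, 25)      # 25th percentile
median = percentile_value(table, 50)  # Q2
q3 = percentile_value(table, 75)      # 75th percentile
iqr = q3 - q1

for value, percent in table:
    print(f"{value:>3}: {percent:.0f}% at or below")
print("Q1:", q1, "Median:", median, "Q3:", q3, "IQR:", iqr)
```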