When You See (This), You Think (That) AP Stats Exam Review
When you see…. Describe the distribution
You think…. SOCS! Shape (say what the shape is). Outliers (note: report results with and without outliers if any exist). Center: mean and median (and maybe mode, if it's important in context). Spread: range, standard deviation, IQR (REMEMBER HOW TO CALCULATE THEM!)
When you see…. Five Number Summary
You think…. Minimum Lower Quartile (Q1) Median Upper Quartile (Q3) Maximum
When you see…. Are there any outliers?
You think…. THE OUTLIER TEST! A value is an outlier if it is below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR.
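A minimal Python sketch of the 1.5 × IQR outlier test. The data set `scores` is hypothetical, and the quartile convention (`method="inclusive"`) is an assumption; your calculator or textbook may compute Q1 and Q3 slightly differently.

```python
import statistics

def outlier_fences(data):
    """Return (lower fence, upper fence) from the 1.5 * IQR rule."""
    # Assumption: "inclusive" quartiles; other conventions give slightly different values.
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data):
    """List every value that falls outside the fences."""
    low, high = outlier_fences(data)
    return [x for x in data if x < low or x > high]

scores = [2, 4, 5, 6, 7, 8, 9, 10, 30]  # hypothetical data
print(outlier_fences(scores))
print(find_outliers(scores))
```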
When you see…. Transforming a distribution
You think…. Adding/subtracting a constant to every value of a distribution: shifts measures of center by that amount; does not change measures of spread; does not change the shape of the distribution. Multiplying/dividing every value of a distribution by a constant: multiplies/divides measures of center by that constant; multiplies/divides most measures of spread (range, IQR, standard deviation) by that constant; does not change the shape of the distribution.
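A quick Python check of these rules on a small hypothetical data set (`data`, the shift of 10, and the scale factor of 3 are all made up for illustration).

```python
import statistics

def summary(values):
    """Center and spread measures for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),
    }

data = [2, 4, 4, 6, 9]            # hypothetical data
shifted = [x + 10 for x in data]  # add a constant to every value
scaled = [x * 3 for x in data]    # multiply every value by a constant

original = summary(data)
after_shift = summary(shifted)    # center moves by 10, spread unchanged
after_scale = summary(scaled)     # center and spread are both multiplied by 3
print(original, after_shift, after_scale, sep="\n")
```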
When you see…. Normally distributed OR Normal Distribution
You think…. DRAW A PICTURE!
When you see…. Describe the association in the scatterplot
You think…. Form, Strength, Direction. Are there any outliers?
When you see…. Residual
You think…. Residual = Actual - Predicted
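A short Python sketch of Residual = Actual - Predicted for a least-squares line. The points `xs` and `ys` are hypothetical, and the line is fit with the usual least-squares formulas rather than any particular calculator routine.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

xs = [1, 2, 3, 4, 5]  # hypothetical explanatory values
ys = [2, 4, 5, 4, 5]  # hypothetical response values

intercept, slope = least_squares(xs, ys)
predicted = [intercept + slope * x for x in xs]
residuals = [actual - pred for actual, pred in zip(ys, predicted)]  # actual - predicted
print(residuals)
```

A positive residual means the line underpredicted that point; a negative residual means it overpredicted.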
When you see…. Interpret the slope of the least-squares line
You think…. For every (unit increase in the x variable), the predicted (y variable) increases/decreases by (coefficient for slope).
When you see…. Residual Plot for a Linear Regression
You think…. NO PATTERN MEANS NO PROBLEM!
When you see…. Coefficient of Determination (R-squared)
You think…. Two possible interpretations: (1) _____% of the variation in (response variable) can be explained by the variation in (predictor variable). (2) _____% of the variation in (response variable) can be accounted for by the linear relationship with (predictor variable).
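To see where the percentage comes from, here is a small Python sketch that computes R-squared as 1 - SSE/SST for a least-squares line. The data are hypothetical, and `r_squared` is an illustrative helper, not a calculator command.

```python
def r_squared(xs, ys):
    """Fraction of the variation in y accounted for by the least-squares line on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    # SSE: variation left over after using the line; SST: total variation in y.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst

xs = [1, 2, 3, 4, 5]  # hypothetical predictor values
ys = [2, 4, 5, 4, 5]  # hypothetical response values
print(f"{r_squared(xs, ys):.0%} of the variation in y is accounted for by the linear relationship with x")
```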
When you see…. Perform A Simulation
You think…. Specify how to model a component outcome using equally likely/random digits. Specify how to simulate trials. Pull it all together to run the simulation. Analyze the response variable.
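The four steps can be sketched in Python. Everything here is a hypothetical setup: a 70% free-throw shooter takes 5 shots per trial, the digits 0-6 model a make and 7-9 a miss, and the response variable is the number of makes.

```python
import random

def simulate_trial(rng, shots=5):
    """One trial: count the makes in `shots` free throws."""
    makes = 0
    for _ in range(shots):
        digit = rng.randint(0, 9)  # component: one random digit models one shot
        if digit <= 6:             # digits 0-6 (7 of the 10) count as a make
            makes += 1
    return makes                   # response variable for this trial

def run_simulation(trials=10_000, seed=1):
    """Run many trials and report the average number of makes."""
    rng = random.Random(seed)      # fixed seed so the run is repeatable
    results = [simulate_trial(rng) for _ in range(trials)]
    return sum(results) / trials

average_makes = run_simulation()
print(average_makes)  # should land near 5 * 0.7 = 3.5
```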
When you see…. Are the two events independent?
You think…. Show that P(A and B) = P(A) × P(B), or equivalently that P(A | B) = P(A).
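A small Python sketch of the independence check P(A and B) = P(A) × P(B), using one roll of a fair die as a hypothetical sample space. The event names are made up for illustration.

```python
from fractions import Fraction

SAMPLE_SPACE = {1, 2, 3, 4, 5, 6}  # one roll of a fair die (equally likely outcomes)

def prob(event):
    """Probability of an event (a set of outcomes) under equally likely outcomes."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))

def are_independent(a, b):
    """True when P(A and B) equals P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)

even = {2, 4, 6}
at_most_four = {1, 2, 3, 4}
at_most_three = {1, 2, 3}

print(are_independent(even, at_most_four))   # 1/3 equals 1/2 * 2/3
print(are_independent(even, at_most_three))  # 1/6 does not equal 1/2 * 1/2
```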
When you see…. Probability Distribution OR Probability Model
You think…. List of every outcome and its probability
When you see…. Expected Value of a Probability Distribution
You think…. E(X) = μ = Σ x · P(x): multiply each outcome by its probability, then add.
When you see…. Standard Deviation of a Probability Distribution
You think…. SD(X) = σ = √( Σ (x - μ)² · P(x) )
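A minimal Python sketch of both calculations for a discrete probability model: the expected value is the sum of each outcome times its probability, and the standard deviation is the square root of the probability-weighted squared deviations from that mean. The model `dist` is hypothetical.

```python
import math

def expected_value(dist):
    """Sum of outcome * probability over a {outcome: probability} model."""
    return sum(x * p for x, p in dist.items())

def standard_deviation(dist):
    """Square root of the probability-weighted squared deviations from the mean."""
    mu = expected_value(dist)
    variance = sum((x - mu) ** 2 * p for x, p in dist.items())
    return math.sqrt(variance)

dist = {0: 0.2, 1: 0.5, 2: 0.3}  # hypothetical probability model
print(expected_value(dist), standard_deviation(dist))
```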
When you see…. Transforming a Random Variable
You think…. Adding/subtracting a constant to a random variable: shifts the expected value by that amount; does not change measures of spread (both standard deviation and variance). Multiplying/dividing a random variable by a constant: multiplies/divides measures of center by that constant; multiplies/divides most measures of spread (range, IQR, standard deviation) by that constant; multiplies/divides variance by the squared constant.
When you see…. Adding/Subtracting Random Variables
You think…. Variances always add, whether you add or subtract the random variables, as long as they are independent! (Standard deviations do not add.)
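A Python sketch that checks the variance rule exactly for two independent fair dice (a hypothetical example). Note the independence assumption: `combine` multiplies probabilities, which is only valid for independent random variables.

```python
from fractions import Fraction
from itertools import product

die = {face: Fraction(1, 6) for face in range(1, 7)}  # one fair die

def variance(dist):
    """Variance of a {outcome: probability} model."""
    mean = sum(x * p for x, p in dist.items())
    return sum((x - mean) ** 2 * p for x, p in dist.items())

def combine(dist_x, dist_y, op):
    """Distribution of op(X, Y), assuming X and Y are independent."""
    out = {}
    for (x, px), (y, py) in product(dist_x.items(), dist_y.items()):
        value = op(x, y)
        out[value] = out.get(value, 0) + px * py
    return out

var_one = variance(die)
var_sum = variance(combine(die, die, lambda x, y: x + y))
var_diff = variance(combine(die, die, lambda x, y: x - y))
print(var_one, var_sum, var_diff)  # the sum and the difference have the same variance
```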
When you see…. Binomial Distribution
You think…. Check assumptions/conditions (BINS!) Use binompdf(n,p,x) or binomcdf(n,p,x) in your calculator
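If you want to see what the calculator is doing, here is a Python sketch of the same two calculations. The function names `binom_pdf` and `binom_cdf` are illustrative and simply mirror the calculator's argument order (n, p, x).

```python
from math import comb

def binom_pdf(n, p, x):
    """P(X = x): exactly x successes in n trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binom_cdf(n, p, x):
    """P(X <= x): at most x successes in n trials."""
    return sum(binom_pdf(n, p, k) for k in range(x + 1))

# Hypothetical example: 10 fair coin flips.
print(binom_pdf(10, 0.5, 3))  # exactly 3 heads
print(binom_cdf(10, 0.5, 3))  # at most 3 heads
```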
When you see…. “What is the probability that the first success happens before the _____th trial?”
You think…. Geometric probability distribution! Use geometpdf(p, x) or geometcdf(p, x) in your calculator
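A Python sketch of the geometric calculations, where x counts the trial on which the first success occurs. The function names are illustrative, and p = 0.2 is a hypothetical success probability.

```python
def geom_pdf(p, x):
    """P(first success happens on trial x): x - 1 failures, then a success."""
    return (1 - p) ** (x - 1) * p

def geom_cdf(p, x):
    """P(first success happens on or before trial x)."""
    return 1 - (1 - p) ** x

p = 0.2  # hypothetical probability of success on each trial
print(geom_pdf(p, 3))  # first success exactly on trial 3
print(geom_cdf(p, 3))  # first success on trial 1, 2, or 3
```

Read the question carefully: "before the 4th trial" means on or before trial 3, so it matches `geom_cdf(p, 3)`.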
When you see…. “Can you approximate this binomial distribution with a normal distribution?”
You think…. Check np and nq (they should be at least 10)
When you see…. Sampling Distribution
You think….
When you see…. Construct a confidence interval
You think…. Estimate ± Margin of Error
When you see…. Margin of Error
You think…. Margin of Error = Critical Value x Standard Error
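A Python sketch of Margin of Error = Critical Value × Standard Error, shown for a one-proportion z-interval. The sample values (a sample proportion of 0.6 from n = 100) and the 95% confidence level are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence=0.95):
    """Critical value times standard error for a one-proportion z-interval."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    return z_star * standard_error

p_hat, n = 0.6, 100  # hypothetical sample proportion and sample size
me = margin_of_error(p_hat, n)
interval = (p_hat - me, p_hat + me)
print(me, interval)
```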
When you see…. “Test the claim that…” OR “Is there evidence that…”
You think…. Hypothesis testing! Create a null and an alternative hypothesis, and be sure to define all variables.
When you see…. What does a p-value even mean in the first place?
You think…. A p-value is the probability of getting a result at least as extreme as the observed result GIVEN THAT THE NULL HYPOTHESIS IS TRUE (not the other way around).
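A Python sketch of a p-value for a one-sided, one-proportion z-test. The scenario is hypothetical (null hypothesis p = 0.5, alternative p > 0.5, 60 successes in 100 trials), and the calculation assumes the normal model is appropriate.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_p_value(successes, n, p0):
    """P(sample proportion at least this large), computed as if the null p = p0 is true."""
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)  # uses p0 because we assume the null is true
    z = (p_hat - p0) / standard_error
    return 1 - NormalDist().cdf(z)

p_value = one_sided_p_value(60, 100, 0.5)  # hypothetical data
print(p_value)
```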
When you see…. Type I and Type II Errors
You think…. Type I Error: the null hypothesis is true but you reject it. Also known as a false positive. α is the Type I error probability. Type II Error: the null hypothesis is false but you fail to reject it. β is the Type II error probability.
When you see…. Power of a Hypothesis Test
You think…. Power = 1 - β. Increasing α will decrease β (and vice versa). Things that increase power: increasing sample size; increasing α; decreasing β.
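A rough Python sketch of how sample size and α affect power, using a one-sided, one-proportion z-test with a normal approximation. The null value 0.5 and the "true" proportion 0.6 are hypothetical; power always depends on which alternative value you assume.

```python
from math import sqrt
from statistics import NormalDist

def power(n, alpha, p0=0.5, p_true=0.6):
    """Approximate P(reject H0: p = p0 in favor of p > p0) when the true proportion is p_true."""
    z_star = NormalDist().inv_cdf(1 - alpha)
    cutoff = p0 + z_star * sqrt(p0 * (1 - p0) / n)   # reject when the sample proportion exceeds this
    se_true = sqrt(p_true * (1 - p_true) / n)        # spread of the sample proportion under p_true
    return 1 - NormalDist().cdf((cutoff - p_true) / se_true)

base = power(n=100, alpha=0.05)
bigger_sample = power(n=200, alpha=0.05)  # larger n gives more power
bigger_alpha = power(n=100, alpha=0.10)   # larger alpha gives more power (and smaller beta)
print(base, bigger_sample, bigger_alpha)
```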
When you see…. Degrees of Freedom for Inferences
You think…. One-Proportion and Two-Proportion: None! One-Sample: n - 1. Two-Sample: (from calculator or technology). Paired Data: # of differences - 1. Chi-Square Goodness-of-Fit: # of categories - 1. Chi-Square Test for Independence/Homogeneity: (R - 1)(C - 1). Regression: n - 2.
When you see…. Assumptions/Conditions for One-Proportion Inferences
You think…. Independent. Random/Representative. 10% Condition. At least 10 successes and 10 failures (np ≥ 10 and nq ≥ 10).
When you see…. Assumptions/Conditions for Two-Proportion Inferences
You think…. Independent: Are data values independent of each other? Are the two groups independent of each other? Randomization (check either): data are sampled or generated at random OR data are representative of the population. 10% Condition: total sample size is no more than 10% of the population size. Enough successes and failures: at least 10 successes and 10 failures for each group, based on the pooled proportion (but OK if each group has at least 10 successes/failures).
When you see…. Hypothesis Test for the Difference of Two Proportions
You think…. I should use a pooled proportion! I should calculate the standard error by using the pooled proportion for both groups! I shouldn’t use the pooled proportion if I want to calculate a confidence interval. I should NOT flip out right now.
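A Python sketch contrasting the pooled standard error (for the hypothesis test) with the unpooled standard error (for a confidence interval). The counts are hypothetical: 45 successes out of 100 in group 1 and 30 out of 100 in group 2.

```python
from math import sqrt

def pooled_z(x1, n1, x2, n2):
    """z statistic for the null hypothesis p1 = p2, using the pooled proportion in the standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se_pooled

def unpooled_se(x1, n1, x2, n2):
    """Standard error for a confidence interval: each group keeps its own proportion."""
    p1, p2 = x1 / n1, x2 / n2
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

z = pooled_z(45, 100, 30, 100)        # hypothetical counts
se_ci = unpooled_se(45, 100, 30, 100)
print(z, se_ci)
```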
When you see…. Assumptions/Conditions for One-Sample Inferences for a Mean
You think…. Independent. Random/Representative. 10% Condition. Nearly Normal Condition (check histogram).
When you see…. Hypothesis Test or Confidence Interval for the Difference of Two Means
You think…. Should I pair data?
When you see…. Assumptions/Conditions for Two-Sample Inferences for a Difference in Means
You think…. Data are independent: data in each group are independent of each other, and the two groups are independent of each other. Random/Representative: data in each group are selected at random or are representative of the population. 10% Condition: data are no more than 10% of the population. Nearly Normal Condition: data in each group come from a distribution that is unimodal and symmetric.
When you see…. Assumptions/Conditions for Paired Data (Mean of Differences)
You think…. Data are independent: differences (usually people) are independent of each other. Random/Representative: pairs are selected at random or are representative of the population. 10% Condition: data are no more than 10% of the population. Nearly Normal Condition: the histogram of differences is unimodal and symmetric.
When you see…. Assumptions/Conditions for Chi-Square Goodness-of-Fit Test
You think…. Data are counts for the categories of a categorical variable. Counts in cells are independent of each other. Data are either randomly selected or representative of the population of interest. Expected counts should be at least 5 individuals in each cell (use null model to get expected counts). Data are less than 10% of the population.
When you see…. Assumptions/Conditions for Chi-Square Test for Independence/Homogeneity
You think…. Data are counts for the categories of a categorical variable. Counts in cells are independent of each other. Data are either randomly selected or representative of the population of interest. Expected counts should be at least 5 individuals in each cell (use null model to get expected counts). Data are less than 10% of the population.
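A Python sketch of getting expected counts from the null model for a two-way table (row total × column total ÷ grand total), checking the "at least 5" condition, and computing the chi-square statistic and degrees of freedom. The table `observed` is hypothetical.

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square(table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

observed = [[10, 20, 30],
            [20, 20, 20]]  # hypothetical counts: 2 rows, 3 columns

expected = expected_counts(observed)
condition_met = all(e >= 5 for row in expected for e in row)  # expected counts at least 5
df = (len(observed) - 1) * (len(observed[0]) - 1)             # (R - 1)(C - 1)
print(expected, condition_met, chi_square(observed), df)
```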
When you see…. Assumptions/Conditions for Inferences with Regression
You think…. Linear Enough: check if the scatterplot is linear enough; if not, consider re-expressing the data to make the scatterplot more nearly linear. Independence: individual observations are independent of each other; check the 10% condition when sampling without replacement. Normal Population Assumption: residuals follow a normal model (make sure the residuals are unimodal and symmetric); can also check a normal probability plot for normality. Equal Variance Assumption: check the residual plot for patterns. NO PATTERN, NO PROBLEM! (But don't get too crazy with looking for patterns.) Random: data come from a well-designed random sample or randomized experiment.