Statistics: Unlocking the Power of Data Lock 5 STAT 250 Dr. Kari Lock Morgan Synthesis and Review for Exam 1.

Slides:



Advertisements
Similar presentations
Panel at 2013 Joint Mathematics Meetings
Advertisements

STAT 101 Dr. Kari Lock Morgan
Hypothesis Testing: Intervals and Tests
Sampling: Final and Initial Sample Size Determination
Hypothesis Testing I 2/8/12 More on bootstrapping Random chance
Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Simple Linear Regression SECTION 2.6, 9.1 Least squares line Interpreting.
Describing Data: One Variable
Agresti/Franklin Statistics, 1 of 63  Section 2.4 How Can We Describe the Spread of Quantitative Data?
Statistics: Unlocking the Power of Data Lock 5 Inference Using Formulas STAT 101 Dr. Kari Lock Morgan Chapter 6 t-distribution Formulas for standard errors.
Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan 9/6/12 Describing Data: One Variable SECTIONS 2.1, 2.2, 2.3, 2.4 One categorical.
Describing Data: One Quantitative Variable
Stat 301 – Day 15 Comparing Groups. Statistical Inference Making statements about the “world” based on observing a sample of data, with an indication.
Monday, 4/29/02, Slide #1 MA 102 Statistical Controversies Monday, 4/29/02 Today: CLOSING CEREMONIES!  Discuss HW #3  Review for final exam  Evaluations.
Statistical Analysis SC504/HS927 Spring Term 2008 Week 17 (25th January 2008): Analysing data.
Stat 301 – Day 9 Fisher’s Exact Test Quantitative Variables.
STAT 101 Dr. Kari Lock Morgan Exam 2 Review.
Simple Linear Regression Least squares line Interpreting coefficients Prediction Cautions The formal model Section 2.6, 9.1, 9.2 Professor Kari Lock Morgan.
Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Simple Linear Regression SECTIONS 9.3 Confidence and prediction intervals.
Statistics: Unlocking the Power of Data Lock 5 1 in 8 women (12.5%) of women get breast cancer, so P(breast cancer if female) = in 800 (0.125%)
Statistics: Unlocking the Power of Data Lock 5 Inference for Proportions STAT 250 Dr. Kari Lock Morgan Chapter 6.1, 6.2, 6.3, 6.7, 6.8, 6.9 Formulas for.
Statistics: Unlocking the Power of Data Lock 5 Hypothesis Testing: Hypotheses STAT 101 Dr. Kari Lock Morgan SECTION 4.1 Statistical test Null and alternative.
STAT 250 Dr. Kari Lock Morgan
● Midterm exam next Monday in class ● Bring your own blue books ● Closed book. One page cheat sheet and calculators allowed. ● Exam emphasizes understanding.
Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 250 Dr. Kari Lock Morgan Chapter 5 Normal distribution Central limit theorem Normal.
Statistics: Unlocking the Power of Data Lock 5 STAT 250 Nathaniel Cannon Describing Data: Categorical Variables SECTIONS 2.1 One categorical variable Two.
Confidence Intervals I 2/1/12 Correlation (continued) Population parameter versus sample statistic Uncertainty in estimates Sampling distribution Confidence.
6.1 What is Statistics? Definition: Statistics – science of collecting, analyzing, and interpreting data in such a way that the conclusions can be objectively.
Essential Synthesis SECTION 4.4, 4.5, ES A, ES B
Statistics: Unlocking the Power of Data Lock 5 Synthesis STAT 250 Dr. Kari Lock Morgan SECTIONS 4.4, 4.5 Connecting bootstrapping and randomization (4.4)
Using Lock5 Statistics: Unlocking the Power of Data
1 Excursions in Modern Mathematics Sixth Edition Peter Tannenbaum.
Ch 8 Estimating with Confidence. Today’s Objectives ✓ I can interpret a confidence level. ✓ I can interpret a confidence interval in context. ✓ I can.
Statistics: Unlocking the Power of Data Lock 5 Afternoon Session Using Lock5 Statistics: Unlocking the Power of Data Patti Frazer Lock University of Kentucky.
Estimation: Sampling Distribution
Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan 9/18/12 Confidence Intervals: Bootstrap Distribution SECTIONS 3.3, 3.4 Bootstrap.
Ch 8 Estimating with Confidence. Today’s Objectives ✓ I can interpret a confidence level. ✓ I can interpret a confidence interval in context. ✓ I can.
● Final exam Wednesday, 6/10, 11:30-2:30. ● Bring your own blue books ● Closed book. Calculators and 2-page cheat sheet allowed. No cell phone/computer.
Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan 9/11/12 Describing Data: Two Variables SECTIONS 2.1, 2.4, 2.5 Two categorical.
Statistics: Unlocking the Power of Data Lock 5 Normal Distribution STAT 101 Dr. Kari Lock Morgan 10/18/12 Chapter 5 Normal distribution Central limit theorem.
Confidence Intervals: Bootstrap Distribution
Determination of Sample Size: A Review of Statistical Theory
Statistics: Unlocking the Power of Data Lock 5 Bootstrap Intervals Dr. Kari Lock Morgan PSU /12/14.
Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan Estimation: Confidence Intervals SECTION 3.2 Confidence Intervals (3.2)
Statistics: Unlocking the Power of Data Lock 5 STAT 250 Dr. Kari Lock Morgan Describing Data: Categorical Variables SECTIONS 2.1 One categorical variable.
MMSI – SATURDAY SESSION with Mr. Flynn. Describing patterns and departures from patterns (20%–30% of exam) Exploratory analysis of data makes use of graphical.
Statistics: Unlocking the Power of Data Lock 5 Exam 2 Review STAT 101 Dr. Kari Lock Morgan 11/13/12 Review of Chapters 5-9.
Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan 12/6/12 Synthesis Big Picture Essential Synthesis Bayesian Inference (continued)
Review Lecture 51 Tue, Dec 13, Chapter 1 Sections 1.1 – 1.4. Sections 1.1 – 1.4. Be familiar with the language and principles of hypothesis testing.
Statistics: Unlocking the Power of Data Lock 5 STAT 250 Dr. Kari Lock Morgan Describing Data: One Quantitative Variable SECTIONS 2.2, 2.3 One quantitative.
Describing Data: Two Variables
Statistics: Unlocking the Power of Data Lock 5 STAT 250 Dr. Kari Lock Morgan Synthesis and Review for Exam 2.
Ch 8 Estimating with Confidence 8.1: Confidence Intervals.
Statistics: Unlocking the Power of Data Lock 5 Inference for Means STAT 250 Dr. Kari Lock Morgan Sections 6.4, 6.5, 6.6, 6.10, 6.11, 6.12, 6.13 t-distribution.
Synthesis and Review 2/20/12 Hypothesis Tests: the big picture Randomization distributions Connecting intervals and tests Review of major topics Open Q+A.
Statistics: Unlocking the Power of Data Lock 5 STAT 250 Dr. Kari Lock Morgan Estimation: Confidence Intervals SECTION 3.2 Confidence Intervals (3.2)
AP Statistics Review Day 1 Chapters 1-4. AP Exam Exploring Data accounts for 20%-30% of the material covered on the AP Exam. “Exploratory analysis of.
Chapter 9 Sampling Distributions 9.1 Sampling Distributions.
Midterm Review IN CLASS. Chapter 1: The Art and Science of Data 1.Recognize individuals and variables in a statistical study. 2.Distinguish between categorical.
Howard Community College
Review.
Describing Data: Two Variables
Synthesis and Review for Exam 1
Statistics 200 Objectives:
Confidence Intervals: Sampling Distribution
Georgi Iskrov, MBA, MPH, PhD Department of Social Medicine
Introduction to Inference
Inferences and Conclusions from Data
Stat 217 – Day 28 Review Stat 217.
Georgi Iskrov, MBA, MPH, PhD Department of Social Medicine
Introductory Statistics
Presentation transcript:

Statistics: Unlocking the Power of Data Lock 5 STAT 250 Dr. Kari Lock Morgan Synthesis and Review for Exam 1

Statistics: Unlocking the Power of Data Lock 5 DISCLAIMER This review session will NOT cover everything you need to know for the exam; it will just hit on the main points

Statistics: Unlocking the Power of Data Lock 5 The Big Picture Population Sample Sampling Statistical Inference Interval estimation Descriptive statistics

Statistics: Unlocking the Power of Data Lock 5 Sampling Population Sample GOAL: Select a sample that is similar to the population HOW? RANDOM SAMPLE!

Statistics: Unlocking the Power of Data Lock 5 Observational Studies A third variable that is associated with both the explanatory variable and the response variable is called a confounding variable There are almost always confounding variables in observational studies Observational studies can almost never be used to establish causation

Statistics: Unlocking the Power of Data Lock 5 Randomized Experiments In a randomized experiment the explanatory variable for each unit is determined randomly, before the response variable is measured Because the explanatory variable is randomly assigned, it is not associated with any other variables. Confounding variables are eliminated!!! Randomized experiments make it possible to infer causation!

Statistics: Unlocking the Power of Data Lock 5 Was the sample randomly selected? Possible to generalize to the population Yes Should not generalize to the population No Was the explanatory variable randomly assigned? Possible to make conclusions about causality Yes Can not make conclusions about causality No Chapter 1: Data Collection

Statistics: Unlocking the Power of Data Lock 5 Chapter 2: Descriptive Statistics In order to make sense of data, we need ways to summarize and visualize it Summarizing and visualizing variables and relationships between two variables is often known as descriptive statistics (also known as exploratory data analysis) Type of summary statistics and visualization methods depend on the type of variable(s) being analyzed (categorical or quantitative)

Statistics: Unlocking the Power of Data Lock 5 Variable(s)VisualizationSummary Statistics Categoricalbar chart, pie chart frequency table, relative frequency table, proportion Quantitativedotplot, histogram, boxplot mean, median, max, min, standard deviation, z-score, range, IQR, five number summary Categorical vs Categorical side-by-side bar chart, segmented bar chart two-way table, difference in proportions, odds ratio Quantitative vs Categorical side-by-side boxplotsstatistics by group, difference in means Quantitative vs Quantitative scatterplotcorrelation

Statistics: Unlocking the Power of Data Lock 5 Chapter 3: Confidence Intervals A confidence interval for a parameter is an interval computed from sample data by a method that will capture the parameter for a specified proportion of all samples A 95% confidence interval will contain the true parameter for 95% of all samples

Statistics: Unlocking the Power of Data Lock 5 Confidence Intervals The parameter is fixed The statistic is random (depends on the sample) The interval is random (depends on the statistic) 95% of 95% confidence intervals will capture the truth

Statistics: Unlocking the Power of Data Lock 5 Confidence Intervals Sample Bootstrap Sample... Calculate statistic for each bootstrap sample Bootstrap Distribution Standard Error (SE): standard deviation of bootstrap distribution Margin of Error (ME) (95% CI: ME = 2×SE) statistic ± ME Bootstrap Sample

Statistics: Unlocking the Power of Data Lock 5 Percentile Method

Statistics: Unlocking the Power of Data Lock 5 Question of the Day What happens when you switch to organic food?

Statistics: Unlocking the Power of Data Lock 5 The Organic Effect The Organic Effect: What Happens When you Switch to Organic FoodThe Organic Effect: What Happens When you Switch to Organic Food. 9/15/15.

Statistics: Unlocking the Power of Data Lock 5 The Organic Effect A study took a Swedish family who ate conventionally (non-organic) and had them eat only organic food for 2 weeks The resulting video publicizing the study has had over 30 million views between Youtube, Facebook, and other sites Here we look at data from the original study Magner, J., Wallberg, P., Sandberg, J., and Cousins, A.P. (2015). Human exposure to pesticides from food: A pilot study. IVL Swedish Environmental Research Institute. January Human exposure to pesticides from food: A pilot study

Statistics: Unlocking the Power of Data Lock 5 Data Measurements on pesticide levels taken the week before switching to organic (while on regular non-organic diet) Ate only organic for two weeks Measurements on pesticide levels taken again the second of those two weeks Explanatory variable: before or after switching to organic food Response variables:  Pesticide detected or not  Pesticide concentration (μg / g crt)

Statistics: Unlocking the Power of Data Lock 5 Generalize? Can we generalize to a population of all people? a) Yes b) No

Statistics: Unlocking the Power of Data Lock 5 Study Design Is this an experiment or an observational study? a) Experiment b) Observational study

Statistics: Unlocking the Power of Data Lock 5 Causality? Can we make conclusions about causality? a) Yes b) No

Statistics: Unlocking the Power of Data Lock 5 Variables We first want to analyze detection of pesticides (pesticide detected or not) before and after eating organic. This is a question involving a) One categorical variable b) Two categorical variables c) One quantitative variable d) Two quantitative variables e) One quantitative and one categorical variable

Statistics: Unlocking the Power of Data Lock 5 Visualization In analyzing detection of pesticides (pesticide detected or not) before and after eating organic, how might we visualize this data? a) Bar chart b) Histogram c) Side-by-side boxplots d) Segmented bar charts e) Scatterplot

Statistics: Unlocking the Power of Data Lock 5 Detection of Pesticides There were 12 pesticides measured 20 times each, before and after eating organic, yielding 240 measurements in each group. Before eating organic, 111 measurements detected pesticide. After eating organic, only 24 measurements detected pesticide. Create a two-way table summarizing this data.

Statistics: Unlocking the Power of Data Lock 5 Detection of Pesticides

Statistics: Unlocking the Power of Data Lock 5 Parameter and Statistic In analyzing detection of pesticides (pesticide detected or not) before and after eating organic, what would be an appropriate parameter and statistic for inference? a) Single proportion b) Difference in proportions c) Single mean d) Difference in means e) Correlation

Statistics: Unlocking the Power of Data Lock 5 Detection of Pesticides p 1 – p 2 : Proportion detecting pesticides before eating organic – proportion detecting pesticides after eating organic. Calculate the relevant statistic. DetectedNot DetectedTOTAL Before eating organic After eating organic TOTAL

Statistics: Unlocking the Power of Data Lock 5 Bootstrap Distribution This is a bootstrap distribution based on 1000 simulations. Approximate a 99% confidence interval. a) (0.33, 0.4) b) (0.3, 0.42) c) (0.27, 0.45) d) (0.25, 0.5)

Statistics: Unlocking the Power of Data Lock 5 Interpretation Interpret the confidence interval in context.

Statistics: Unlocking the Power of Data Lock 5 Variables We’ll next analyze the response variable concentration of pesticide (measured in μg / g creatinine) to see if it differs before and after eating organic. This is a question involving a) One categorical variable b) Two categorical variables c) One quantitative variable d) Two quantitative variables e) One quantitative and one categorical variable

Statistics: Unlocking the Power of Data Lock 5 Visualization In analyzing concentration of pesticides before and after eating organic, how might we visualize this data? a) Bar chart b) Histogram c) Side-by-side boxplots d) Segmented bar charts e) Scatterplot

Statistics: Unlocking the Power of Data Lock 5 Pesticides

Statistics: Unlocking the Power of Data Lock 5

Parameter and Statistic In analyzing pesticide concentration (for any one pesticide) before and after eating organic, what would be an appropriate parameter and statistic for inference? a) Single proportion b) Difference in proportions c) Single mean d) Difference in means e) Correlation

Statistics: Unlocking the Power of Data Lock 5 Difference in Means Pesticide 24D PBA DC CCC ETU Mepiquat Propamocarb TCP Which one do you want a confidence interval for?

Statistics: Unlocking the Power of Data Lock 5 95% Confidence Interval Use bootstrap distribution from StatKey to create a 95% confidence interval using the standard error method. Interpret the confidence interval in context.

Statistics: Unlocking the Power of Data Lock 5 Variables Actually, this data is paired (before and after measurements on the same people), so it makes sense to look at the differences in pesticide concentration, before – after. This is a question involving a) One categorical variable b) Two categorical variables c) One quantitative variable d) Two quantitative variables e) One quantitative and one categorical variable

Statistics: Unlocking the Power of Data Lock 5 Visualization In analyzing the differences in concentration of pesticides before and after eating organic, how might we visualize this data? a) Bar chart b) Histogram c) Side-by-side boxplots d) Segmented bar charts e) Scatterplot

Statistics: Unlocking the Power of Data Lock 5 Histograms

Statistics: Unlocking the Power of Data Lock 5 Parameter and Statistic In analyzing the differences in concentration of each pesticide before and after eating organic, what would be an appropriate parameter and statistic for inference? a) Single proportion b) Difference in proportions c) Single mean d) Difference in means e) Correlation

Statistics: Unlocking the Power of Data Lock 5 3-PBA Insecticide found in grains, fruits, vegetables Difference in 3-PBA concentration before and after:

Statistics: Unlocking the Power of Data Lock 5 Bootstrap for 3-PBA Based on this bootstrap distribution with 1000 bootstrap statistics, the standard error is closest to… a) 1 b) 4 c) 7 d) 10

Statistics: Unlocking the Power of Data Lock 5 Confidence Interval Using the statistic and standard error, calculate a 95% confidence interval and interpret in context.

Statistics: Unlocking the Power of Data Lock 5 EXAM 1 In-class on Friday, 10/9 Covers everything we have done in class or lab so far (Chapters 1 – 3 in book, except not 2.6) Bring a non-cell phone calculator and a one- sided page of notes (8 ½ x 11 paper) Optional review problems posted on WileyPlus (doing lots of problems is the best way to study!)

Statistics: Unlocking the Power of Data Lock 5 Exam Topics: Ch 1-3 Chapter 1: Data Collection  Data (cases, variables, etc.)  Sampling  Observational studies and confounding  Randomized experiments Chapter 2: Descriptive Statistics (except not 2.6)  Summary statistics for one or two variable(s)  Graphical displays for one or two variable(s)  Conditional probability Chapter 3: Confidence Intervals  Sampling distributions  Confidence intervals  Margin of error  Standard error and 95% confidence intervals  Bootstrapping  Percentile method for any level of confidence