STAT 101 Dr. Kari Lock Morgan Exam 2 Review.

Exam Details
- Wednesday, 4/2
- Closed to everything except two double-sided pages of notes and a non-cell-phone calculator
- Pages of notes should be prepared by you; no sharing
- Okay to use materials from class for your pages of notes
- Best ways to prepare:
  - #1: WORK LOTS OF PROBLEMS!
  - Make a good page of notes
  - Read sections you are still confused about
  - Come to office hours and clarify confusion
- Cumulative, but emphasis is on material since Exam 1 (Chapters 5-9; we skipped 8.2 and 9.2)

Practice Problems
- Practice exam online (under resources)
- Solutions to odd essential synthesis and review problems online (under resources)
- Solutions to all odd problems in the book on reserve at Perkins

Office Hours and Help
- Monday 3–4pm: Prof Morgan, Old Chem 216
- Monday 4–6pm: Stephanie Sun, Old Chem 211A
- Tuesday 3–5pm (extra): Prof Morgan, Old Chem 216
- Tuesday 5–7pm: Wenjing Shi, Old Chem 211A
- Tuesday 7–9pm: Mao Hu, Old Chem 211A
- REVIEW SESSION: 5–6pm Tuesday, Social Sciences 126

Stat Education Center Reminder: the Stat Education Center in Old Chem 211A is open Sunday – Thurs 4pm – 9pm with stat majors and stat PhD students available to answer questions

Two Options for p-values
We have learned two ways of calculating p-values. The only difference is how to create a distribution of the statistic, assuming the null is true:
- Simulation (Randomization Test): Directly simulate what would happen, just by random chance, if the null were true
- Formulas and Theoretical Distributions: Use a formula to create a test statistic for which we know the theoretical distribution when the null is true, if sample sizes are large enough
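The simulation option can be sketched in a few lines of Python. This is a minimal illustration for a difference in means, not the course's own software; the function name `randomization_p_value` and the data are made up for the example.

```python
import random
from statistics import mean

def randomization_p_value(group_a, group_b, n_sims=10_000, seed=0):
    """Two-sided randomization p-value for a difference in means."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_sims):
        # Reassign group labels at random, as if the null were true.
        rng.shuffle(pooled)
        simulated = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(simulated) >= abs(observed):
            extreme += 1
    # Proportion of simulated statistics at least as extreme as the observed one.
    return extreme / n_sims

# Made-up data, for illustration only.
treatment = [12, 15, 14, 18, 20, 17]
control = [10, 11, 13, 9, 12, 14]
p_value = randomization_p_value(treatment, control)
print(p_value)
```

More simulations give a more stable p-value; the seed is fixed only so the sketch is reproducible.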

Two Options for Intervals
We have learned two ways of calculating intervals:
- Simulation (Bootstrap): Assess the variability in the statistic by creating many bootstrap statistics
- Formulas and Theoretical Distributions: Use a formula to calculate the standard error of the statistic, and use the normal or t-distribution to find z* or t*, if sample sizes are large enough
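A bootstrap interval for a mean might look like the sketch below. The data and the helper name `bootstrap_means` are hypothetical; the two intervals shown (statistic ± 2 standard errors, and the middle 95% of the bootstrap statistics) are common textbook choices and are assumed here rather than taken from the slide.

```python
import random
from statistics import mean, stdev

def bootstrap_means(sample, n_boot=5_000, seed=0):
    """Means of many resamples drawn with replacement from the original sample."""
    rng = random.Random(seed)
    n = len(sample)
    return [mean(rng.choices(sample, k=n)) for _ in range(n_boot)]

# Made-up data, for illustration only.
sample = [23, 19, 31, 27, 22, 35, 28, 24, 30, 26]
boot = bootstrap_means(sample)

# Standard error = standard deviation of the bootstrap statistics.
se = stdev(boot)
ci_se = (mean(sample) - 2 * se, mean(sample) + 2 * se)

# Percentile method: cut off 2.5% in each tail of the bootstrap distribution.
boot_sorted = sorted(boot)
ci_percentile = (boot_sorted[int(0.025 * len(boot_sorted))],
                 boot_sorted[int(0.975 * len(boot_sorted)) - 1])

print(se, ci_se, ci_percentile)
```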

Pros and Cons: Simulation Methods
PROS:
- Methods tied directly to concepts, emphasizing conceptual understanding
- Same procedure for every statistic
- No formulas or theoretical distributions to learn and distinguish between
- Minimal math needed
CONS:
- Need entire dataset (if quantitative variables)
- Need a computer
- Newer approach

Pros and Cons: Formulas and Theoretical Distributions
PROS:
- Only need summary statistics
- Only need a calculator
- More commonly used
CONS:
- Plugging numbers into formulas does little for conceptual understanding
- Many different formulas and distributions to learn and distinguish between
- Harder to see the big picture when the details are different for each statistic
- Doesn’t work for small sample sizes
- Requires more math and background knowledge

Accuracy
- The accuracy of simulation methods depends on the number of simulations (more simulations = more accurate)
- The accuracy of formulas and theoretical distributions depends on the sample size (larger sample size = more accurate)
- If the sample size is large and you have generated many simulations, the two methods should give essentially the same answer

Data Collection
Was the sample randomly selected?
- Yes: Possible to generalize to the population
- No: Should not generalize to the population
Was the explanatory variable randomly assigned?
- Yes: Possible to make conclusions about causality
- No: Cannot make conclusions about causality

Variable(s) | Visualization | Summary Statistics
Categorical | bar chart, pie chart | frequency table, relative frequency table, proportion
Quantitative | dotplot, histogram, boxplot | mean, median, max, min, standard deviation, z-score, range, IQR, five number summary
Categorical vs Categorical | side-by-side bar chart, segmented bar chart | two-way table, difference in proportions
Quantitative vs Categorical | side-by-side boxplots | statistics by group, difference in means
Quantitative vs Quantitative | scatterplot | correlation, simple linear regression

Confidence Interval
- A confidence interval for a parameter is an interval computed from sample data by a method that will capture the parameter for a specified proportion of all samples
- A 95% confidence interval will contain the true parameter for 95% of all samples
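The "95% of all samples" interpretation can be checked by simulation. The sketch below assumes a normal population with a known mean (so we can see whether each interval captures it) and hard-codes t* = 2.045, the table value for 95% confidence with 29 degrees of freedom; all names and settings are illustrative.

```python
import random
from statistics import mean, stdev

def coverage_rate(n_samples=2_000, n=30, mu=50.0, sigma=10.0,
                  t_star=2.045, seed=0):
    """Fraction of 95% t-intervals for the mean that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        se = stdev(sample) / n ** 0.5
        lower = mean(sample) - t_star * se
        upper = mean(sample) + t_star * se
        if lower <= mu <= upper:
            hits += 1
    return hits / n_samples

coverage = coverage_rate()
print(coverage)  # should land close to 0.95
```

Any single interval either contains the parameter or does not; the 95% describes how often the method succeeds across samples.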

Hypothesis Testing
How unusual would it be to get results as extreme as (or more extreme than) those observed, if the null hypothesis is true?
- If it would be very unusual, then the null hypothesis is probably not true!
- If it would not be very unusual, then there is no evidence against the null hypothesis

p-value
- The p-value is the probability of getting a statistic as extreme as (or more extreme than) that observed, just by random chance, if the null hypothesis is true
- The p-value measures evidence against the null hypothesis

Hypothesis Testing
1. State hypotheses
2. Calculate a test statistic, based on your sample data
3. Create a distribution of this test statistic, as it would be observed if the null hypothesis were true
4. Use this distribution to measure how extreme your test statistic is

Distribution of the Sample Statistic
- Sampling Distribution: distribution of the statistic based on many samples from the population
- Bootstrap Distribution: distribution of the statistic based on many samples with replacement from the original sample
- Randomization Distribution: distribution of the statistic assuming the null hypothesis is true
- Normal, t, χ², F: theoretical distributions used to approximate the distribution of the statistic

Sample Size Conditions
- For large sample sizes, either simulation methods or theoretical methods work
- If sample sizes are too small, only simulation methods can be used

Using Distributions
- For confidence intervals, you find the desired percentage in the middle of the distribution, then find the corresponding value on the x-axis
- For p-values, you find the value of the observed statistic on the x-axis, then find the area in the tail(s) of the distribution
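Both directions can be illustrated with the standard normal distribution from Python's standard library. This is only a sketch; the observed statistic z = 2.1 is a made-up value.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Confidence interval direction: middle 95% -> value on the x-axis.
# The middle 95% leaves 2.5% in each tail, so look up the 97.5th percentile.
z_star = std_normal.inv_cdf(0.975)

# p-value direction: value on the x-axis -> area in the tail(s).
z = 2.1  # hypothetical standardized test statistic
p_upper_tail = 1 - std_normal.cdf(z)
p_two_tailed = 2 * (1 - std_normal.cdf(abs(z)))

print(round(z_star, 3), round(p_upper_tail, 4), round(p_two_tailed, 4))
```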

Confidence Intervals

Confidence Intervals
Return to the original scale with
$$\text{statistic} \pm z^* \times SE$$

Hypothesis Testing

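General Formulas
When performing inference for a single parameter (or difference in two parameters), the following formulas are used:
$$\text{sample statistic} \pm z^* \times SE$$
$$z = \frac{\text{sample statistic} - \text{null value}}{SE}$$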
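As a rough sketch, the general pattern (statistic plus or minus a critical value times the standard error for an interval, and the statistic minus the null value divided by the standard error for a test) can be written as two small functions. The function names and the sample proportion example are assumptions made for illustration.

```python
from math import sqrt

def confidence_interval(statistic, critical_value, se):
    """statistic +/- critical value (z* or t*) times the standard error."""
    margin = critical_value * se
    return statistic - margin, statistic + margin

def standardized_statistic(statistic, null_value, se):
    """How many standard errors the statistic lies from the null value."""
    return (statistic - null_value) / se

# Hypothetical example: sample proportion 0.6 from n = 100, testing p = 0.5.
p_hat, n, p_null = 0.6, 100, 0.5

# For the interval, the standard error is estimated from the sample proportion.
ci = confidence_interval(p_hat, 1.96, sqrt(p_hat * (1 - p_hat) / n))

# For the test, the standard error is computed assuming the null is true.
z = standardized_statistic(p_hat, p_null, sqrt(p_null * (1 - p_null) / n))

print(ci, z)
```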

General Formulas
- For proportions (categorical variables) with only two categories, the normal distribution is used
- For inference involving any quantitative variable (means, correlation, slope), if categorical variables only have two categories, the t distribution is used

Standard Error
- The standard error is the standard deviation of the sample statistic
- The formula for the standard error depends on the type of statistic (which depends on the type of variable(s) being analyzed)

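Standard Error Formulas
Parameter | Distribution | Standard Error
Proportion | Normal | $\sqrt{\frac{p(1-p)}{n}}$
Difference in Proportions | Normal | $\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
Mean | t, df = n – 1 | $\frac{s}{\sqrt{n}}$
Difference in Means | t, df = min(n1, n2) – 1 | $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
Correlation | t, df = n – 2 | $\sqrt{\frac{1-r^2}{n-2}}$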
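For practice, the usual textbook standard error formulas for these five parameters can be collected as small functions. The formulas below are the standard ones and are an assumption about what the slide showed; the function names are made up for this sketch.

```python
from math import sqrt

def se_proportion(p, n):
    return sqrt(p * (1 - p) / n)

def se_diff_proportions(p1, n1, p2, n2):
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def se_mean(s, n):
    # s is the sample standard deviation
    return s / sqrt(n)

def se_diff_means(s1, n1, s2, n2):
    return sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

def se_correlation(r, n):
    # r is the sample correlation
    return sqrt((1 - r ** 2) / (n - 2))

# Made-up inputs, for illustration only.
print(se_proportion(0.5, 100))          # proportion 0.5, n = 100
print(se_mean(10, 25))                  # s = 10, n = 25
print(se_diff_means(10, 25, 12, 36))    # two groups
print(se_correlation(0.6, 18))          # r = 0.6, n = 18
```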

Multiple Categories
- These formulas do not work for categorical variables with more than two categories, because there are multiple parameters
- For one or two categorical variables with multiple categories, use χ² tests (goodness of fit for one categorical variable, test for association for two)
- For testing for a difference in means across multiple groups, use ANOVA

Chi-Square Test for Goodness of Fit
1. State null hypothesized proportions for each category, $p_i$. The alternative is that at least one of the proportions is different from that specified in the null.
2. Calculate the expected counts for each cell as $n p_i$. Make sure they are all greater than 5 to proceed.
3. Calculate the χ² statistic:
$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
4. Compute the p-value as the area in the tail above the χ² statistic, for a χ² distribution with df = (# of categories – 1)
5. Interpret the p-value in context.
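The goodness-of-fit calculation can be sketched as below, using the usual sum of (observed − expected)² / expected. The function name and the counts are hypothetical. The p-value step is left to a χ² table or software (for example `scipy.stats.chi2.sf(stat, df)`, if SciPy happens to be available).

```python
def chi_square_gof(observed, null_props):
    """Return (chi-square statistic, df, expected counts) for goodness of fit."""
    n = sum(observed)
    expected = [n * p for p in null_props]
    # Condition from the slide: every expected count should be greater than 5.
    if min(expected) <= 5:
        raise ValueError("expected counts too small for the chi-square test")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df, expected

# Made-up counts for three categories, with null proportions 0.25, 0.5, 0.25.
stat, df, expected = chi_square_gof([30, 50, 20], [0.25, 0.5, 0.25])
print(stat, df, expected)
```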

Chi-Square Test for Association
1. State hypotheses. H0: The two variables are not associated. Ha: The two variables are associated.
2. Calculate the expected counts for each cell:
$$\text{expected} = \frac{\text{row total} \times \text{column total}}{n}$$
Make sure they are all greater than 5 to proceed.
3. Calculate the χ² statistic:
$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
4. Compute the p-value as the area in the tail above the χ² statistic, for a χ² distribution with df = (r – 1) × (c – 1)
5. Interpret the p-value in context.
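For a two-way table, the expected counts come from the row and column totals (row total × column total / n, the standard formula, assumed here). A minimal sketch, with a made-up 2 × 2 table and a hypothetical function name:

```python
def chi_square_association(table):
    """Return (chi-square statistic, df, expected counts) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for each cell if the two variables were not associated.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected

# Made-up counts: rows are groups, columns are outcomes.
stat, df, expected = chi_square_association([[20, 30], [30, 20]])
print(stat, df, expected)
```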

Analysis of Variance
Analysis of Variance (ANOVA) compares the variability between groups to the variability within groups:
Total Variability = Variability Between Groups + Variability Within Groups

ANOVA Table
Source | df | Sum of Squares | Mean Square | F Statistic | p-value
Groups | k-1 | SSG | MSG = SSG/(k-1) | MSG/MSE | Use $F_{k-1,\,n-k}$
Error | n-k | SSE | MSE = SSE/(n-k) | |
Total | n-1 | SST | | |
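The entries of the ANOVA table can be computed directly from the data. The sketch below uses three small made-up groups and a hypothetical function name; the p-value would then come from the F distribution with k-1 and n-k degrees of freedom, which is not computed here.

```python
from statistics import mean

def anova_table(groups):
    """Compute the sums of squares, mean squares, and F statistic."""
    all_values = [v for g in groups for v in g]
    n, k = len(all_values), len(groups)
    grand_mean = mean(all_values)
    # Variability between groups: group means around the grand mean.
    ssg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability within groups: values around their own group mean.
    sse = sum((v - mean(g)) ** 2 for g in groups for v in g)
    # Total variability: values around the grand mean.
    sst = sum((v - grand_mean) ** 2 for v in all_values)
    msg = ssg / (k - 1)
    mse = sse / (n - k)
    return {"df": (k - 1, n - k, n - 1), "SSG": ssg, "SSE": sse,
            "SST": sst, "MSG": msg, "MSE": mse, "F": msg / mse}

# Made-up data, for illustration only.
result = anova_table([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(result)
```

In this example SSG + SSE equals SST, matching the decomposition of total variability.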

Simple Linear Regression
Simple linear regression estimates the population model
$$y = \beta_0 + \beta_1 x + \epsilon$$
with the sample model
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$

Inference for the Slope
- Confidence intervals and hypothesis tests for the slope can be done using the familiar formulas
- Population Parameter: $\beta_1$, Sample Statistic: $\hat{\beta}_1$
- Use the t-distribution with n – 2 degrees of freedom
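In the course, the standard error of the slope would normally be read from software output. As a rough sketch of what that output contains, the code below fits the least squares line and uses the standard formula SE = (residual standard error) / sqrt(sum of squared x deviations), which is an assumption not shown on the slide. The data, the function name, and the hard-coded t* = 3.182 (95% confidence, 3 degrees of freedom) are for illustration only.

```python
from math import sqrt
from statistics import mean

def slope_inference(x, y):
    """Least squares slope with its standard error, t statistic, and df."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Sum of squared residuals around the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_slope = sqrt(sse / (n - 2)) / sqrt(sxx)
    t_stat = slope / se_slope  # tests the null hypothesis that the slope is 0
    return slope, intercept, se_slope, t_stat, n - 2

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept, se_slope, t_stat, df = slope_inference(x, y)

t_star = 3.182  # table value for 95% confidence with df = 3
ci = (slope - t_star * se_slope, slope + t_star * se_slope)
print(slope, intercept, se_slope, t_stat, df, ci)
```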

Intervals
- A confidence interval has a given chance of capturing the mean y value at a specified x value (the point on the line)
- A prediction interval has a given chance of capturing the y value for a particular case at a specified x value (the actual point)

Conditions for SLR
Inference based on the simple linear model is only valid if the following conditions hold:
- Linearity
- Constant Variability of Residuals
- Normality of Residuals

Inference Methods http://prezi.com/c1xz1on-p4eb/stat-101/