Statistics: Unlocking the Power of Data Lock 5 STAT 250 Dr. Kari Lock Morgan Simple Linear Regression SECTION 9.1 Inference for correlation Inference for.

Presentation transcript:

Statistics: Unlocking the Power of Data Lock 5 STAT 250 Dr. Kari Lock Morgan Simple Linear Regression SECTION 9.1 Inference for correlation Inference for slope Conditions for inference

Question of the Day: Is the size of certain regions of your brain associated with the size of your social network?

Social Networks and the Brain
Data from 40 students at City College London
How to measure brain size?
How to measure social network size?
Source: R. Kanai, B. Bahrami, R. Roylance and G. Rees (2011). Online social network size is reflected in human brain structure, Proceedings of the Royal Society B: Biological Sciences. 10/19/11.

Measuring Brain Size
Structural Magnetic Resonance Imaging (MRI)
Voxel-based morphometry (VBM) to compute regional grey matter volume based on T1-weighted anatomical MRI scans
Brain regions found significant in initial study:
• Amygdala (emotion and emotional memory)
• Middle temporal gyrus (social perception)
• Entorhinal cortex (memory and navigation)
• Superior temporal sulcus (perception of others)
Response: normalized z-score of grey matter density for these brain regions

Brain Regions
Image from “Do our Brains Determine our Facebook Friend Count?”

Social Networks and the Brain
How to measure size of social network?
• How many were present at your 18th or 21st birthday party?
• If you were going to have a party now, how many people would you invite?
• What is the total number of friends in your phonebook?
• Write down the names of the people to whom you would send a text message marking a celebratory event. How many people is that?
• Write down the names of people in your phonebook you would meet for a chat in a small group (one to three people). How many people is that?
• How many friends have you kept from school and university whom you could have a friendly conversation with now?
• How many friends do you have on ‘Facebook’?
• How many friends do you have from outside school or university?
• Write down the names of the people of whom you feel you could ask a favor and expect to have it granted. How many people is that?

Social Networks and the Brain
r =
Is the association significant?

Standard Error Formulas
Parameter                   Distribution                 Standard Error
Proportion                  Normal
Difference in Proportions   Normal
Mean                        t, df = n – 1
Difference in Means         t, df = min(n1, n2) – 1
Correlation                 t, df = n – 2
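For reference, the standard errors usually paired with these five parameters can be sketched as follows. These are the common textbook forms with sample statistics plugged in; the notation (hats on proportions, s for sample standard deviations, r for the sample correlation) is an assumption here and may differ from what the slide displayed.

```latex
\begin{align*}
\text{Proportion:} \quad & SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \\
\text{Difference in proportions:} \quad & SE = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \\
\text{Mean:} \quad & SE = \frac{s}{\sqrt{n}} \\
\text{Difference in means:} \quad & SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \\
\text{Correlation:} \quad & SE = \sqrt{\frac{1-r^2}{n-2}}
\end{align*}
```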

Social Networks and the Brain
Is the grey matter volume of these regions of the brain significantly correlated with number of Facebook friends? From n = 40 people, we find r = 0.436. Is this significant?

Social Networks and the Brain
1. State hypotheses:
2. Check conditions:
3. Calculate test statistic:
4. Compute p-value: p-value =
5. Interpret in context: This provides strong evidence that the grey matter density of these regions of the brain and number of Facebook friends are positively correlated.
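The test statistic and p-value steps can be sketched in Python using only r = 0.436 and n = 40 from the slides. This is a minimal sketch under stated assumptions: the standard error is taken to be the usual sqrt((1 - r^2)/(n - 2)), the alternative is assumed two-sided, and the helper names (t_pdf, t_upper_tail, correlation_t_test) are made up for illustration. The t-distribution tail area is computed by numerical integration so that no external library is needed.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def correlation_t_test(r, n):
    """t-statistic, df, and two-sided p-value for H0: rho = 0 (assumed form)."""
    df = n - 2
    se = math.sqrt((1 - r ** 2) / df)  # assumed standard error of r
    t = r / se
    p_two_sided = 2 * t_upper_tail(abs(t), df)
    return t, df, p_two_sided

# Values from the slide: r = 0.436, n = 40
t_stat, df, p_value = correlation_t_test(r=0.436, n=40)
print(f"t = {t_stat:.2f}, df = {df}, two-sided p-value = {p_value:.4f}")
```

With these inputs the t-statistic comes out near 2.99 on 38 degrees of freedom, and the two-sided p-value is below 0.01. A one-sided test for a positive correlation would halve the p-value.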

Social Networks and the Brain
Should you go out and add more Facebook friends to increase the size of your brain?
a) Yes
b) No

R²
R² is the proportion of the variability in the response variable, Y, that is explained by the explanatory variable, X.
For simple linear regression, R² = r² (R² is just the sample correlation squared).
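A quick numerical check of the identity R² = r², as a sketch. The data below are hypothetical values invented for illustration (the study's raw data are not in these slides), and R² is computed as 1 - SSE/SST, which is one common definition.

```python
import math

# Hypothetical data, invented purely for illustration (not from the study).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 3.2, 4.8, 5.1, 5.9]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)  # total variability in y (SST)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Sample correlation
r = sxy / math.sqrt(sxx * syy)

# Least squares line and the variability left over after using x (SSE)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Proportion of variability in y explained by x
r_squared = 1 - sse / syy

print(f"r^2 = {r ** 2:.4f}, R^2 = {r_squared:.4f}")
```

Applied to the correlation reported in the slides, 0.436² is about 0.19, so roughly 19% of the variability in the response is explained by the explanatory variable.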

R²
How much does the variability in Y decrease if you know X?

Regression in Minitab
Stat -> Regression -> Fitted Line Plot
= 0.19

Prediction
Should you use this equation to predict the normalized size of these regions of your brain?
a) Yes
b) No

Sample to Population
Everything we have done so far with regression is based solely on sample data.
Now, we will extend from the sample to the population.
Statistical inference!

Simple Linear Model
Labeled terms: intercept, slope, random error
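The three labels on this slide (intercept, slope, random error) match the usual way of writing the simple linear model. A sketch in standard notation, where the symbols are assumed rather than taken from the slide:

```latex
\[
y = \underbrace{\beta_0}_{\text{intercept}} + \underbrace{\beta_1}_{\text{slope}}\, x + \underbrace{\varepsilon}_{\text{random error}}
\]
```

Here the betas are population parameters; the least squares line fitted to a sample gives estimates of them.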

Inference for the Slope
Test for whether the slope is significantly different from 0 (whether there is any linear relationship between x and y):
Confidence interval for the true slope:

Inference for the Slope
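The test and interval for the slope can be sketched in Python. This is an illustration under stated assumptions, not the course's method: it uses the usual textbook forms t = b1 / SE(b1) and b1 ± t* × SE(b1) with n - 2 degrees of freedom, hypothetical data invented for the example, and made-up helper names (t_pdf, t_cdf, t_quantile, slope_inference). In practice software such as Minitab reports these values directly.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t), via Simpson's rule on [0, |t|] and symmetry about 0."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def t_quantile(p, df):
    """Value t* with P(T <= t*) = p, for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def slope_inference(x, y, conf=0.95):
    """Slope estimate, its standard error, t-statistic, and confidence interval."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))   # residual standard error
    se_b1 = s_e / math.sqrt(sxx)     # standard error of the slope
    t_stat = b1 / se_b1              # test statistic for H0: slope = 0
    t_star = t_quantile(1 - (1 - conf) / 2, n - 2)
    return b1, se_b1, t_stat, (b1 - t_star * se_b1, b1 + t_star * se_b1)

# Hypothetical data, invented purely for illustration (not from the study).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 3.2, 4.8, 5.1, 5.9]

b1, se_b1, t_stat, ci = slope_inference(x, y)
print(f"slope = {b1:.3f}, SE = {se_b1:.3f}, t = {t_stat:.2f}")
print(f"95% CI for slope: ({ci[0]:.3f}, {ci[1]:.3f})")
```

If the interval excludes 0, the slope is significantly different from 0 at the matching significance level, which ties the interval and the test together.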

Regression in Minitab
Stat -> Regression -> Regression -> Fit Regression Model

Inference for Slope
n = 40
Is the slope significantly different from 0?
(a) Yes
(b) No
Give a 95% confidence interval for the true slope.

Hypothesis Test

Regression in Minitab
Stat -> Regression -> Regression -> Fit Regression Model

Two Quantitative Variables
The t-statistic (and p-value) for a test for a non-zero slope and a test for a non-zero correlation are identical!
They are equivalent ways of testing for a linear association between two quantitative variables.
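The equivalence can be checked numerically. The sketch below computes both t-statistics on hypothetical data (invented for illustration, not the study's data), assuming the usual forms t = r × sqrt(n - 2) / sqrt(1 - r²) for the correlation and t = b1 / SE(b1) for the slope.

```python
import math

# Hypothetical data, invented purely for illustration (not from the study).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 3.2, 4.8, 5.1, 5.9]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# t-statistic from the correlation
r = sxy / math.sqrt(sxx * syy)
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# t-statistic from the slope
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
t_slope = b1 / se_b1

print(f"t from correlation = {t_corr:.4f}")
print(f"t from slope       = {t_slope:.4f}")
```

Both statistics use n - 2 degrees of freedom, so identical t-statistics give identical p-values.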

Confidence Interval

Multiple Testing?

False Positive (Type I Error) Protection
To further protect against Type I errors, they performed two independent analyses on two separate samples (n = 125, then n = 40).
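A rough sense of how much protection replication gives, as a sketch. It assumes each analysis is tested at significance level 0.05 (an assumed level, not stated on the slide) and that the two samples are independent. If there is truly no association, the chance that both analyses produce a false positive is then

```latex
\[
P(\text{both significant} \mid H_0 \text{ true}) = \alpha \cdot \alpha = 0.05^2 = 0.0025
\]
```

compared with 0.05 for a single analysis.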

Real-World Network Size
What about real-world network size?

Conditions
Inference based on the simple linear model is only valid if the following conditions hold:
1) Linearity
2) Constant Variability of Residuals
3) Normality of Residuals
4) Independence

Linearity
The relationship between x and y is linear (it makes sense to draw a line through the scatterplot).

Dog Years
1 dog year = 7 human years
Linear: human age = 7 × dog age
Charlie
From “The old rule-of-thumb that one dog year equals seven years of a human life is not accurate. The ratio is higher with youth and decreases a bit as the dog ages.”
Plot labels: LINEAR, ACTUAL
A linear model can still be useful, even if it doesn’t perfectly fit the data.

“All models are wrong, but some are useful” - George Box

Residuals (errors)
Conditions for residuals:
• The errors are normally distributed: check with a histogram
• The average of the errors is 0 (always true for least squares regression)
• The standard deviation of the errors is constant for all cases: constant spread of points around the line
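The claim that the residuals from a least squares line always average to 0 can be seen directly. The sketch below fits a line to hypothetical data (invented for illustration) and computes the residuals; the same list of residuals is what would go into a histogram to check normality.

```python
import math

# Hypothetical data, invented purely for illustration (not from the study).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 3.2, 4.8, 5.1, 5.9]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares estimates of slope and intercept
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual = observed response minus predicted response
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
mean_resid = sum(residuals) / n

# Residual standard error: typical size of a residual, using n - 2 df
s_e = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

print(f"mean residual = {mean_resid:.2e}")
print(f"residual standard error = {s_e:.3f}")
```

The mean residual is 0 up to floating-point rounding, whatever data are used. Constant variability and normality are not guaranteed and still need to be checked with plots.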

Regression in Minitab
Is the association approximately linear?
a) Yes
b) No
Is the spread of the points around the line approximately constant?
a) Yes
b) No

Histogram of Residuals
Are the residuals approximately normally distributed?
a) Yes
b) No

Non-Constant Variability

Non-Normal Residuals

Independence
Cases must be independent of each other (one case’s values do not affect another case’s values).
Most common violation of this: data over time
What would make the independence condition satisfied or violated in the social network and brain size data?

Conditions not Met?
If the association isn’t linear: don’t use simple linear regression
If variability is not constant, residuals are not normal, or cases are not independent: the model itself is still valid, but inference may not be accurate
If you want to do something more fancy so the conditions are met… take STAT 462!

Simple Linear Regression
1) Plot your data!
   • Association approximately linear?
   • Outliers?
   • Constant variability?
2) Fit the model (least squares)
3) Use the model
   • Interpret coefficients
   • Make predictions
4) Look at histogram of residuals (normal?)
5) Inference (extend to population)
   • Inference on slope (interval and test)

To Do
Read Section 9.1
Do HW 9.1 (due Friday, 12/4)