Statistics for Social and Behavioral Sciences Session #6: The Regression Line C’ted (Agresti and Finlay, Chapter 9) Prof. Amine Ouazad.



Statistics Course Outline
Part I. Introduction and Research Design
Part II. Describing Data
Part III. Drawing Conclusions from Data: Inferential Statistics
Part IV. Correlation and Causation: Regression Analysis
Week 1 / Weeks 2-4 / Weeks 5-9 / Weeks
This is where we talk about ZMapp and Ebola! Firenze or Lebanese Express? Where we are right now! Describing associations between two variables

Last Session
From a scatter plot to a linear relationship
– A linear relationship is a model, and an imperfect one.
– A linear relationship implies a constant gradient.
– A linear relationship helps predict (extrapolate) and interpolate to fill in missing statistics.
Finding the regression line
– The regression line minimizes the sum of squared errors.
– The formulas for a and b are essential to learn.

Outline
1. The Regression Line (C’ted)
– Last time’s recap
– Why we call it regression
2. Warning: Correlation is not causation
– Spurious relationships
– Being agnostic about causality: correlation
3. How well does the linear model perform?
Next session: Bivariate analysis
Chapter 9 of A&F, continued

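Finding the regression line
Which line is the right one? A line is entirely determined by the choice of α and β.
An essential formula: b = Σ (x_i − x̄)(y_i − ȳ) / Σ (x_i − x̄)², and a = ȳ − b x̄.
Notice the difference between b and β, between a and α: a and b are the estimates computed from the data, α and β are the unknown parameters of the line.
x is the explanatory variable; y is the response variable.
If y increases when x increases, then b > 0.
If y decreases when x increases, then b < 0.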
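To make the least-squares formulas concrete, here is a minimal sketch in Python. It assumes the standard formulas b = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)² and a = ȳ − b x̄; the function name `fit_line` and the five data points are hypothetical, made up purely for illustration.

```python
from statistics import mean

def fit_line(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    # Numerator: how x and y move together around their means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: how much x varies around its mean
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b

# Hypothetical data, not from the course
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)
print(f"a = {a:.2f}, b = {b:.2f}")
```

Here y tends to increase with x, so the slope b comes out positive, as the rule on the slide says.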

Why do we call this regression? “Regression towards Mediocrity in Hereditary Stature”, Sir Francis Galton. What are y, x, and b here?

Understanding Galton: Questions
A little exercise to understand Sir Francis:
1. What is the data? How many observations? What is y? What is x?
2. Write the assumed linear relationship between y and x.
3. Can you express the mean of y (as a function of the mean of x)?
4. Take the difference between child i’s height and children’s mean height.
5. How does it relate to the difference between child i’s parents’ midheight and the mean of parents’ midheight?
I use mean and average interchangeably in this course: same formula.
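One possible way through questions 2 to 5 is sketched below. It assumes y_i is child i's height, x_i is the midheight of child i's parents, and e_i is the error of the least-squares line (these errors average to zero); the symbols e_i and \hat{y}_i are notation introduced here, not taken from the slides.

```latex
\begin{align*}
y_i &= a + b\,x_i + e_i
  && \text{(assumed linear relationship)} \\
\bar{y} &= a + b\,\bar{x}
  && \text{(average over all children; the errors average to zero)} \\
y_i - \bar{y} &= b\,(x_i - \bar{x}) + e_i
  && \text{(subtract the second line from the first)} \\
\hat{y}_i - \bar{y} &= b\,(x_i - \bar{x})
  && \text{(the same statement for the predicted height } \hat{y}_i = a + b\,x_i\text{)}
\end{align*}
```

If 0 < b < 1, the predicted deviation of a child from the children's mean is smaller than the deviation of the parents from the parents' mean. That shrinking toward the average is the "regression" in the name.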


“More than a fifth of people on unemployment benefits have a criminal record, government figures have revealed. The new data showed an estimated 22 per cent of all out-of-work benefit claims - such as Jobseeker’s Allowance - were made by people who had been to prison or convicted of an offence in the previous 12 years.” Chris Grayling, the Justice Secretary, is pushing through reforms which aim to provide more support to offenders who are released from jail back into the community. Jeremy Wright, the justice minister, said: “We are committed to delivering long-needed changes that will see all offenders released from prison receive targeted support to finally turn themselves around and start contributing to society.”

Unemployment and Crime “The figures also showed 44 per cent of offenders were claiming benefits a month after being convicted, cautioned or released from jail.” “More than half of offenders - 54 per cent - released from prison were claiming out-of-work benefits one month later, gradually decreasing to 42 per cent two years after.” “In all, 214,000 people claiming out-of-work benefits had been to prison at least once in the previous 12 years, or 4 per cent of the total.” “Previous data published in 2011 estimated the proportion of criminal claimants was slightly higher, at 26 per cent, but a Ministry of Justice spokesman said the sets of figures were not directly comparable.” Chris Grayling, Justice Secretary (UK)

Association is not causation What Drives Obesity? Is higher obesity due to the rise in driving? Perhaps. It’s an intriguing hypothesis. But our friends at The Economist should know better than to report nonsensical correlations. Here’s the evidence they cite (drawn from this entirely unconvincing research paper published in Transport Policy): Looks impressive, right? (Well, apart from putting the explanatory variable on the vertical axis.) But before concluding that there’s anything here, let’s try a different variable, instead—my age:

Reading is an important skill, and elementary school teachers have observed that the reading ability of their students tends to increase with their shoe size. To help boost reading skills, should policymakers offer prizes to scientists to devise methods to increase the shoe size of elementary school children? Obviously, the tendency for shoe size and reading ability to increase together does not mean that big feet cause improvements in reading skills. Older children have bigger feet, but they also have more developed brains. This natural development of children explains the simple observation that shoe size and reading ability have a tendency to increase together, that is, they are positively correlated. But clearly there is no causal relationship: bigger shoe size does not cause better reading ability.
In economics, correlations are common. But identifying whether the correlation between two or more variables represents a causal relationship is rarely so easy. Countries that trade more with the rest of the world also have higher income levels, but does this mean that trade raises income levels? People with more education tend to have higher earnings, but does this imply that education results in higher earnings?
Knowing precise answers to these questions is important. If additional years of schooling caused higher earnings, then policymakers could reduce poverty by providing more funding for education. If an extra year of education resulted in a $20,000 a year increase in earnings, then the benefits of spending on education would be a lot larger than if an extra year of education caused only a $2 a year increase.
Economists need statisticians
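The shoe-size story can be mimicked with a small, fully made-up data set. In the sketch below, age drives both shoe size and reading score, and within each age group the two are constructed to be unrelated. All numbers, units, and names (`children`, `pearson_r`) are hypothetical and chosen only to illustrate a spurious correlation created by a third variable.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical children aged 6 to 11, four per age.
# Shoe size and reading score both grow with age (arbitrary units);
# the deviations around the age trend are balanced so that, at a given age,
# a bigger foot says nothing about reading.
children = [
    (age, age + shoe_dev, 10 * age + read_dev)
    for age in range(6, 12)
    for shoe_dev in (-0.5, 0.5)
    for read_dev in (-5, 5)
]

shoe = [c[1] for c in children]
reading = [c[2] for c in children]
overall_r = pearson_r(shoe, reading)

# Same correlation, but computed separately inside each age group
within_r = {}
for age in range(6, 12):
    group = [c for c in children if c[0] == age]
    within_r[age] = pearson_r([c[1] for c in group], [c[2] for c in group])

print(f"all children pooled: r = {overall_r:.2f}")
for age, r in within_r.items():
    print(f"age {age}: r = {r:.2f}")
```

Pooled over all ages the correlation is strong, yet it vanishes once age is held fixed. In this toy example the association is entirely due to the omitted factor.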

Association is not causation
The response variable may be the explanatory variable and vice versa (reverse causation).
There may be other factors that affect the response variable, other than the explanatory variable. ☞ Multivariate statistics coming up in week 12.
Univariate statistics (Weeks 1 and 2): Inspecting the distribution of one variable. Am I taller than the average? Than the median? What percentile of the distribution do I belong to?
Bivariate statistics (now and next week): Discovering associations between 2 variables. What is the relationship between parents’ height and children’s height? What is the relationship between unemployment and crime?
Multivariate statistics (Week 12): Uncovering causality: looking at the impact of multiple explanatory variables on one response variable. What factors cause crime? Poverty, unemployment, guns, police headcounts?

The correlation of two variables
The correlation of two variables is: r = Σ (x_i − x̄)(y_i − ȳ) / √[ Σ (x_i − x̄)² · Σ (y_i − ȳ)² ]
This is a sum over N observations: fortunately a computer will usually do it (Stata).
The correlation does not make an assumption about the direction of causality (the slope does).
It is, however, related to the slope: r = b × (s_x / s_y), where b is the slope, r is the correlation, s_x is the standard deviation of x, and s_y is the standard deviation of y.
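As a quick check of the link between the correlation and the slope, the sketch below computes r from its definition and again as b × (s_x / s_y). The data are hypothetical; any data set with some variation in x and y would do.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical data, not from the course
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

r = sxy / sqrt(sxx * syy)                 # correlation, from the definition
b = sxy / sxx                             # least-squares slope of y on x
r_from_slope = b * stdev(x) / stdev(y)    # slope rescaled by the standard deviations

print(f"r = {r:.4f}, b * s_x / s_y = {r_from_slope:.4f}")
```

Swapping x and y changes the slope but leaves r unchanged, which is one way to see that the correlation is agnostic about which variable is the explanatory one.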

An Example: Unemployment and Murders – The Sequel Standard deviation of Unemployed Persons: 5, Standard deviation of Murders:20.44 Regression line: we find b = and a = The correlation r(Unemployed, Murders) is: Self-check?

Properties of the correlation
The correlation is a number between -1 and 1, sometimes (but rarely) expressed as a percentage.
If two variables have a correlation of 1, we say that they are perfectly correlated…
– Example: student expenses in USD are perfectly correlated with student expenses in AED.
– y is exactly a + b x, with b > 0.
If two variables have a correlation of -1, the two variables are exactly such that y = a + b x, with b < 0.
– Example: Number of days to New Year’s Eve, Number of days from New Year’s Eve.
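Both examples can be checked numerically. The sketch below assumes a fixed exchange rate of 3.6725 AED per USD and a 365-day year; the expense amounts and the chosen days are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Student expenses: AED = 3.6725 * USD, so y = a + b*x with a = 0 and b > 0
usd = [100, 250, 400, 800]                 # hypothetical amounts
aed = [3.6725 * v for v in usd]            # assumed fixed exchange rate
r_expenses = pearson_r(usd, aed)

# Days to and from New Year's Eve: y = 365 - x, so b = -1 < 0
days_to = [0, 30, 90, 200, 364]            # hypothetical days of the year
days_from = [365 - d for d in days_to]     # assumes a 365-day year
r_days = pearson_r(days_to, days_from)

print(f"USD vs AED: r = {r_expenses:.3f}")
print(f"days to vs days from: r = {r_days:.3f}")
```

The size of the slope plays no role here: any exact line with b > 0 gives r = 1, and any exact line with b < 0 gives r = -1.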


How “good” are our predictions?
[Figure: the regression line, y against x]
Ouch: we make errors. There is the actual y_i and the predicted y_i, noted ŷ_i.
The regression line minimizes the sum of the squared errors: Σ (y_i − ŷ_i)². Remember the formulas for b and a.
When does a model predict y perfectly? When does the model have no predictive power?

Playing with the R Squared
The R Squared is: R² = variance of the predicted y / variance of the actual y.
It answers the question(s):
– “What fraction of the variance of the response variable is explained by the explanatory variable?”
– “What percentage of the variance of the response variable is explained by the explanatory variable?”
It measures the fit of the linear model.
The R squared is also the square of the correlation between x and y: R² = r².
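The claim that R² equals r² can be verified on a small example. The sketch below assumes R² is defined as the variance of the predictions divided by the variance of the actual values, and uses hypothetical data.

```python
from math import sqrt
from statistics import mean, pvariance

# Hypothetical data, not from the course
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

b = sxy / sxx
a = y_bar - b * x_bar
predicted = [a + b * xi for xi in x]        # predictions from the regression line

r_squared = pvariance(predicted) / pvariance(y)   # share of variance explained
r = sxy / sqrt(sxx * syy)                         # correlation between x and y

print(f"R squared = {r_squared:.3f}, r squared = {r ** 2:.3f}")
```

With these numbers the line explains 60% of the variance of y. Using the sample variance instead of the population variance gives the same ratio, since the divisor cancels.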

An Example: Unemployment and Murders – The Sequel

The variance of the predicted number of murders is:
The variance of the actual number of murders is:
The R Squared is:
Not bad!
Side question: what is the variance of the errors (residuals)? Follow my lead, it’s easier. Remember: variance(y) = variance(prediction) + variance(error)
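One way to answer the side question, written out symbolically. Here \hat{y} stands for the prediction and e for the error (residual); this notation is introduced for the sketch, and R² is taken to be the ratio of the two variances.

```latex
\begin{align*}
\operatorname{Var}(y) &= \operatorname{Var}(\hat{y}) + \operatorname{Var}(e) \\
\operatorname{Var}(e) &= \operatorname{Var}(y) - \operatorname{Var}(\hat{y}) \\
  &= \operatorname{Var}(y)\left(1 - \frac{\operatorname{Var}(\hat{y})}{\operatorname{Var}(y)}\right)
   = (1 - R^2)\,\operatorname{Var}(y)
\end{align*}
```

So the variance of the errors is the share of the variance of y that the model leaves unexplained. It is zero when R² = 1 and equal to the full variance of y when R² = 0.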

Wrap up
Finding the regression line (Sir Galton)
– The regression line minimizes the sum of squared errors.
– The formulas for a and b are essential.
Association is not causation
– Does x cause y or does y cause x?
– Is there any other factor that may cause y?
– Being agnostic about the direction of causality: the correlation r.
How good are my predictions? How good is my model?
– Use the R Squared, know its formula.
– The variance is the square of the standard deviation.

Next session: Minority Report continues Don’t forget: Midterm 1 coming up in week 5 (exact date coming soon from the Registrar Mary Downes). Online Quiz #3 starting tonight at 9pm, due Tuesday at 9am. Sunday recitation on: “The Regression Line: ‘Education and Economic Growth.’” In chapter 9, read everything except Section 9.5 (Inferences for the Slope) For help: Amine Ouazad Office 1135, Social Science building Office hour: Wednesday from 4 to 5pm. GAF: Irene Paneda Sunday recitations. At the Academic Resource Center, Monday from 2 to 4pm.