Correlation and Regression History 5011. Sir Francis Galton 1822-1911 Geographer, meteorologist, tropical explorer, inventor of fingerprint identification,

Slides:



Advertisements
Similar presentations
A Brief Introduction to Spatial Regression
Advertisements

Kin 304 Regression Linear Regression Least Sum of Squares
Correlation and regression Dr. Ghada Abo-Zaid
Correlation & Regression Chapter 10. Outline Section 10-1Introduction Section 10-2Scatter Plots Section 10-3Correlation Section 10-4Regression Section.
Chapter Thirteen McGraw-Hill/Irwin © 2006 The McGraw-Hill Companies, Inc., All Rights Reserved. Linear Regression and Correlation.
Chapter 3 Bivariate Data
Scatter Diagrams and Linear Correlation
AP Statistics Chapters 3 & 4 Measuring Relationships Between 2 Variables.
LSRL Least Squares Regression Line
Elementary Statistics Larson Farber 9 Correlation and Regression.
Chapter 2: Looking at Data - Relationships /true-fact-the-lack-of-pirates-is-causing-global-warming/
2.1Scatterplots A scatterplot displays two quantitative variables for a set of data.
Characteristics of Scatterplots
Regression and Correlation
Bivariate Regression CJ 526 Statistical Analysis in Criminal Justice.
Fall 2006 – Fundamentals of Business Statistics 1 Chapter 13 Introduction to Linear Regression and Correlation Analysis.
Linear Regression and Correlation Analysis
Lecture 24: Thurs., April 8th
Correlation A correlation exists between two variables when one of them is related to the other in some way. A scatterplot is a graph in which the paired.
Correlation and Regression. Correlation What type of relationship exists between the two variables and is the correlation significant? x y Cigarettes.
Chapter 13 Introduction to Linear Regression and Correlation Analysis
RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.
Multiple Linear Regression
Ch 2 and 9.1 Relationships Between 2 Variables
Lecture 17: Correlations – Describing Relationships Between Two Variables 2011, 11, 22.
Chapter 7 Scatterplots, Association, Correlation Scatterplots and correlation Fitting a straight line to bivariate data © 2006 W. H. Freeman.
Regression and Correlation
Correlation and Linear Regression
Correlation and Regression
Chapters 8, 9, 10 Least Squares Regression Line Fitting a Line to Bivariate Data.
Linear Regression and Correlation
Simple Linear Regression
Multiple Regression. In the previous section, we examined simple regression, which has just one independent variable on the right side of the equation.
ASSOCIATION: CONTINGENCY, CORRELATION, AND REGRESSION Chapter 3.
LECTURE UNIT 7 Understanding Relationships Among Variables Scatterplots and correlation Fitting a straight line to bivariate data.
Bivariate Distributions Overview. I. Exploring Data Describing patterns and departures from patterns (20%-30%) Exploring analysis of data makes use of.
Notes Bivariate Data Chapters Bivariate Data Explores relationships between two quantitative variables.
Correlation and Regression PS397 Testing and Measurement January 16, 2007 Thanh-Thanh Tieu.
1 Examining Relationships in Data William P. Wattles, Ph.D. Francis Marion University.
When trying to explain some of the patterns you have observed in your species and community data, it sometimes helps to have a look at relationships between.
Correlation Analysis. A measure of association between two or more numerical variables. For examples height & weight relationship price and demand relationship.
Correlation & Regression
Notes Bivariate Data Chapters Bivariate Data Explores relationships between two quantitative variables.
By: Amani Albraikan.  Pearson r  Spearman rho  Linearity  Range restrictions  Outliers  Beware of spurious correlations….take care in interpretation.
CHAPTER 3 INTRODUCTORY LINEAR REGRESSION. Introduction  Linear regression is a study on the linear relationship between two variables. This is done by.
Objective: Understanding and using linear regression Answer the following questions: (c) If one house is larger in size than another, do you think it affects.
STATISTICS 12.0 Correlation and Linear Regression “Correlation and Linear Regression -”Causal Forecasting Method.
Chapter 11 Correlation and Simple Linear Regression Statistics for Business (Econ) 1.
Examining Bivariate Data Unit 3 – Statistics. Some Vocabulary Response aka Dependent Variable –Measures an outcome of a study Explanatory aka Independent.
CHAPTER 5 Regression BPS - 5TH ED.CHAPTER 5 1. PREDICTION VIA REGRESSION LINE NUMBER OF NEW BIRDS AND PERCENT RETURNING BPS - 5TH ED.CHAPTER 5 2.
CORRELATION. Correlation key concepts: Types of correlation Methods of studying correlation a) Scatter diagram b) Karl pearson’s coefficient of correlation.
April 1 st, Bellringer-April 1 st, 2015 Video Link Worksheet Link
Correlation – Recap Correlation provides an estimate of how well change in ‘ x ’ causes change in ‘ y ’. The relationship has a magnitude (the r value)
Least Squares Regression Lines Text: Chapter 3.3 Unit 4: Notes page 58.
1 Simple Linear Regression and Correlation Least Squares Method The Model Estimating the Coefficients EXAMPLE 1: USED CAR SALES.
Simple Linear Regression The Coefficients of Correlation and Determination Two Quantitative Variables x variable – independent variable or explanatory.
Chapter 11 Linear Regression and Correlation. Explanatory and Response Variables are Numeric Relationship between the mean of the response variable and.
Chapter 3 LSRL. Bivariate data x – variable: is the independent or explanatory variable y- variable: is the dependent or response variable Use x to predict.
Chapter 5 LSRL. Bivariate data x – variable: is the independent or explanatory variable y- variable: is the dependent or response variable Use x to predict.
Unit 4 LSRL.
LSRL.
Least Squares Regression Line.
Chapter 5 LSRL.
Chapter 4 Correlation.
Chapter 3.2 LSRL.
Least Squares Regression Line LSRL Chapter 7-continued
Chapter 5 LSRL.
Chapter 5 LSRL.
Chapter 5 LSRL.
Presentation transcript:

Correlation and Regression History 5011

Sir Francis Galton Geographer, meteorologist, tropical explorer, inventor of fingerprint identification, eugenicist, half- cousin of Charles Darwin and best- selling author.

Galton Obsessed with measurement Tried to measure everything from the weather to female beauty Invented correlation and regression

Sweet Peas Galton’s experiment with sweet peas (1875) led to the development of initial concepts of linear regression. Sweet peas could self-fertilize: “daughter plants express genetic variations from mother plants without contribution from a second parent.”

Sweet Peas (2) Distributed packets of seeds to 7 friends Uniformly distributed sizes, split into 7 size groups with 10 seeds per size. There was substantial variation among packets. 7 sizes 10 seeds per size 7 friends = 490 seeds Friends were to harvest seeds from the new generation of friends and return them to Galton.

Sweet Peas (3) Plotted the weights of daughter seeds against weights of mother seeds. Hand fitted a line to the data Slope of the line connecting the means of different columns is equivalent to regression slope.

Sweet Pea Scatterplot

Sweet Pea Scatterplot with Regression Line

Slope indicates strength of association Galton drew his line by hand, and estimated that for every thousandth of an inch of increased size of parents, the daughters size was affected by.33 thousandths.

Things Galton Noticed Mother plants of a given seed size tended to have daughter seeds of pretty similar sizes Extremely large mother seeds grew into plants whose daughter seeds were generally not so large Extremely small mother seeds grew into plants whose daughter seeds were generally not so small Galton put a name on the loss of extremity: “Regression to the mean”

Karl Pearson ( ) Formalized Galton’s method Invented least squares method of determining regression line Pearson with Galton, c. 1900

Least squares idea: Choose the line that minimizes the sum of the squares of the deviations of each observation from the regression line

Scatterplots Example 1 Example 2 Example 3 Example 4 Example 5

Correlations: Positive Large values of X associated with large values of Y, small values of X associated with small values of Y. e.g. IQ and SAT Large values of X associated with small values of Y & vice versa e.g. SPEED and ACCURACY Negative

Correlation does not imply causality  Two variables might be associated because they share a common cause.  For example, SAT scores and College Grade are highly associated, but probably not because scoring well on the SAT causes a student to get high grades in college.  Being a good student, etc., would be the common cause of the SATs and the grades.

Intervening and confounding factors There is a positive correlation between ice cream sales and drownings. There is a strong positive association between Number of Years of Education and Annual Income  In part, getting more education allows people to get better, higher-paying jobs.  But these variables are confounded with others, such as socio-economic status

Regression and correlation will not capture nonlinear relationships

SPEED (KM/H) MILEAGE* * Mileage: liters / 100 km FUEL CONSUMPTION & SPEED Speed And Mileage Have A Curved Relationship. Using r would be inappropriate r =

Bivariate regression equation Y=bx+c b=slope c=intercept

Adding another variable

Multiple regression equation Y=b 1 x 1 +b 2 x 2 + c b 1 =slope of 1 st variable X 1 =1 st variable B 2 =slope of 2 nd variable X 2 =second variable c=intercept

Another view

Dummy variables All examples have been continuous variables, but most historical data is categorical We can recode categorical variables into dummy variables Race becomes codes white (0,1) black (0,1) Asian (0,1) One category must be omitted

Dummy variable scatter plot (FEV is forced expiratory volume)

Bivariate regression results ß Variable (coefficient F-test SMOKE Y-Intercept The slope coefficient of 0.71 for SMOKE and FEV suggests hat FEV increases 0.71 units (on the average) as we go from SMOKE = 0 (non- smoker) to SMOKE = 1 (smoker). This is surprisingly given what we know about smoking. How can a positive relation between SMOKE and FEV exist given what we know about the physiological effects of smoking? The answer lies in understanding the confounding effects of AGE on SMOKE and FEV. In this child and adolescent population, nonsmokers are younger than nonsmokers (mean ages: 9.5 years vs years, respectively). It is therefore not surprising that AGE confounds the relation between SMOKE and FEV. So what are we to do?

Dummy variable scatterplot (3-D)

Multiple regression results

Intergenerational coresidence of the aged, : Independent variables

State-level regression results