Correlation, linear regression

Presentation transcript:

Correlation, linear regression 1

Scatterplot: relationship between two continuous variables 2

3

Scatterplot Other examples 4

Example II. Imagine that 6 students are given a battery of tests by a vocational guidance counsellor with the results shown in the following table: Variables measured on the same individuals are often related to each other. 5

Let us draw a graph called a scattergram to investigate relationships. Scatterplots show the relationship between two quantitative variables measured on the same cases. In a scatterplot, we look for the direction, form, and strength of the relationship between the variables. The simplest relationship is linear in form and reasonably strong. Scatterplots also reveal deviations from the overall pattern. 6

Creating a scatterplot When one variable in a scatterplot explains or predicts the other, place it on the x-axis. Place the variable that responds to the predictor on the y-axis. If neither variable explains or responds to the other, it does not matter which axes you assign them to. 7

Possible relationships: positive correlation, negative correlation, no correlation. 8

Describing a linear relationship with a number: the coefficient of correlation (r), also called the Pearson coefficient of correlation. Correlation is a numerical measure of the strength of a linear association. The formula for the coefficient of correlation treats x and y identically: there is no distinction between explanatory and response variable. Let us denote the two samples by $x_1, x_2, \dots, x_n$ and $y_1, y_2, \dots, y_n$; the coefficient of correlation can be computed according to the following formula: $r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2 \, \sum_{i=1}^{n}(y_i-\bar{y})^2}}$, where $\bar{x}$ and $\bar{y}$ are the sample means. 9
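As a minimal sketch, the standard Pearson product-moment formula (sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations) can be computed directly. The function name `pearson_r` and the data are hypothetical and are not taken from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (n >= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Note: undefined (division by zero) if either sample is constant.
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)  # swapping the roles of x and y gives the same value
print(round(r_xy, 4), r_xy == r_yx)
```

Swapping the arguments leaves the result unchanged, which is the sense in which the formula treats x and y identically.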

Karl Pearson Karl Pearson (27 March 1857 – 27 April 1936) established the discipline of mathematical statistics. /wiki/Karl_Pearson 10

Properties of r. Correlations are between -1 and +1: $-1 \le r \le 1$. Either extreme indicates a perfect linear association. a) If r is near +1 or -1, we say that we have high correlation. b) If r=1, we say that there is perfect positive correlation. If r=-1, we say that there is perfect negative correlation. c) A correlation of zero indicates the absence of linear association. When there is no tendency for the points to lie on a straight line, we say that there is no correlation (r=0) or that we have low correlation (r is near 0). 11

Calculated values of r: positive correlation, r=0.9989; negative correlation, r= [value missing]; no correlation, r= [value missing]. 12

Scatterplot: other examples (r=0.873 and r=0.018) 13

Correlation and causation: a correlation between two variables does not show that one causes the other. 14

Correlation by eye. This applet lets you estimate the regression line and guess the value of Pearson's correlation. Five possible values of Pearson's correlation are listed. One of them is the correlation for the data displayed in the scatterplot. Guess which one it is. To see the correct value, click on the "Show r" button. 15

Effect of outliers. Even a single outlier can change the correlation substantially. Outliers can create an apparently strong correlation where none would be found otherwise, or hide a strong correlation by making it appear to be weak. (Figure panels: r=-0.21, r=0.74, r=0.998, r=-0.26.) 16
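The effect of a single outlier can be illustrated with a small sketch. The data below are hypothetical (they are not the data behind the slide's figure): five points with no linear trend, to which one extreme point is added.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: five points without any linear trend.
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 1, 3]
r_without = pearson_r(x, y)

# The same data plus one extreme point far away from the rest.
r_with = pearson_r(x + [20], y + [20])

print(f"without outlier: r = {r_without:.3f}")
print(f"with outlier:    r = {r_with:.3f}")
```

One added point is enough here to turn an essentially zero correlation into an apparently strong one, which is why the scatterplot should be inspected before r is interpreted.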

Correlation and linearity. Two variables may be closely related and still have a small correlation if the form of the relationship is not linear. (Figure panels: r=2.8E-15 (= 0.0000000000000028), r=0.157.) 17
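A sketch of this point with hypothetical data: y is determined exactly by x through y = x², yet the correlation coefficient is zero because the relationship is not linear and the x values are symmetric around zero.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a perfect, but quadratic, relationship.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

r_quadratic = pearson_r(x, y)
print(r_quadratic)
```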

Correlation and linearity Four sets of data with the same correlation of

Coefficient of determination. The square of the correlation coefficient, $r^2$, is called the coefficient of determination; multiplied by 100, it gives the percentage of the total variation explained by the linear regression. Example: the correlation between math aptitude and language aptitude was found to be r=0.9989. The coefficient of determination is $r^2 = 0.9989^2 \approx 0.998$, so about 99.8% of the total variation of Y is explained by its linear relationship with X. 19
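The interpretation of r² as the explained share of variation can be checked numerically. The sketch below uses hypothetical data and compares r² with 1 - SSE/SST, where SSE is the sum of squared residuals around the least-squares line and SST is the total sum of squares of y; these two abbreviations are introduced here and do not appear on the slides.

```python
import math

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)  # total variation of y (SST)

r = sxy / math.sqrt(sxx * syy)

# Least-squares line y = a + b*x and its sum of squared residuals (SSE).
b = sxy / sxx
a = mean_y - b * mean_x
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

r_squared = r ** 2
explained_share = 1 - sse / syy
print(round(r_squared, 4), round(explained_share, 4))
```

For a simple linear regression fitted by least squares the two quantities coincide, so r² multiplied by 100 can be read as the percentage of variation explained.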

When is a correlation "high"? What is considered to be a high correlation varies with the field of application. The statistician must decide when a sample value of r is sufficiently far from zero to reflect the correlation in the population. 20

Testing the significance of the coefficient of correlation The statistician must decide when a sample value of r is far enough from zero to be significant, that is, when it is sufficiently far from zero to reflect the correlation in the population. (details: lecture 8.) 21

Prediction based on linear correlation: the linear regression. When the form of the relationship in a scatterplot is linear, we usually want to describe that linear form more precisely with numbers. We can rarely hope to find data values lined up perfectly, so we fit lines to scatterplots with a method that compromises among the data values. This method is called the method of least squares. The key to finding, understanding, and using least squares lines is an understanding of their failures to fit the data: the residuals. 22

Residuals, example 1. 23

Residuals, example 2. 24

Residuals, example 3. 25

Prediction based on linear correlation: the linear regression. A straight line that best fits the data, $y = bx + a$ or $y = a + bx$, is called the regression line. Geometrical meaning of a and b: b is called the regression coefficient and is the slope of the best-fitting (regression) line; a is the y-intercept of the regression line. The principle of finding the values of a and b, given $x_1, x_2, \dots, x_n$ and $y_1, y_2, \dots, y_n$: minimise the sum of squared residuals, i.e. $\sum_{i=1}^{n} \left( y_i - (a + b x_i) \right)^2 \to \min$. 26
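A minimal sketch of the least-squares principle, assuming the well-known closed-form solution of the minimisation (slope = sum of cross-products of deviations divided by the sum of squared x deviations, intercept = mean of y minus slope times mean of x). The function names and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) minimising sum((y_i - (a + b*x_i))**2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # regression coefficient (slope)
    a = mean_y - b * mean_x  # y-intercept
    return a, b

def sum_squared_residuals(xs, ys, a, b):
    """Sum of squared vertical distances between the data and the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)
best = sum_squared_residuals(x, y, a, b)

# Any other line, e.g. one with a slightly different slope, fits worse.
worse = sum_squared_residuals(x, y, a, b + 0.1)
print(a, b, best, worse)
```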

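Residuals, example: for the point $(x_1, y_1)$ the fitted value is $b x_1 + a$ and the residual is $y_1 - (b x_1 + a)$; the other residuals are $y_2 - (b x_2 + a)$, ..., $y_6 - (b x_6 + a)$.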

28

Equation of the regression line for the data of Example 1: y=1.016·x+15.5; the slope of the line is 1.016. Prediction based on the equation: what is the predicted language score for a student having 400 points in math? $y_{\text{predicted}} = 1.016 \cdot 400 + 15.5 = 421.9$
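The prediction step is plain arithmetic with the fitted equation y = 1.016·x + 15.5 quoted on the slide. A short sketch, in which the function name `predict_language_score` is hypothetical:

```python
SLOPE = 1.016      # regression coefficient b from the slide's equation
INTERCEPT = 15.5   # y-intercept a from the slide's equation

def predict_language_score(math_score):
    """Predicted language score from the regression line y = a + b*x."""
    return INTERCEPT + SLOPE * math_score

predicted = predict_language_score(400)
print(round(predicted, 1))
```

Predictions of this kind are only meaningful for x values within (or close to) the range of the data used to fit the line.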

Computation of the correlation coefficient from the regression coefficient. There is a relationship between the correlation and the regression coefficient: $r = b \cdot \frac{s_x}{s_y}$, where $s_x$, $s_y$ are the standard deviations of the samples. From this relationship it can be seen that r and b have the same sign: if there exists a negative correlation between the variables, the slope of the regression line is also negative. 30
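The standard identity r = b · s_x / s_y (correlation equals slope times the ratio of the sample standard deviations) can be verified numerically. The sketch below uses hypothetical data and the sample standard deviation from Python's `statistics` module.

```python
import math
import statistics

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = statistics.mean(x)
mean_y = statistics.mean(y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

b = sxy / sxx                         # slope of the regression line
r_direct = sxy / math.sqrt(sxx * syy)  # correlation from its definition

# Correlation recovered from the slope and the two standard deviations.
r_from_slope = b * statistics.stdev(x) / statistics.stdev(y)
print(round(r_direct, 4), round(r_from_slope, 4))
```

Because both standard deviations are positive, the sign of r is always the sign of b.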

SPSS output for the relationship between age and body mass. Coefficient of correlation: r=0.018. Equation of the regression line: y=0.078x 31

SPSS output for the relationship between body mass at present and 3 years ago. Coefficient of correlation: r=0.873. Equation of the regression line: y=0.795x 32

Regression using transformations Sometimes, useful models are not linear in parameters. Examining the scatterplot of the data shows a functional, but not linear relationship between data. 33

Example. A fast food chain opened in … Each year from 1974 to 1988, the number of steakhouses in operation was recorded. The scatterplot of the original data suggests an exponential relationship between x (year) and y (number of steakhouses) (first plot). Taking the logarithm of y, we get a linear relationship (plot at the bottom). 34

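Performing the linear regression procedure on x and log y (natural logarithm), we get an equation of the form $\log y = a + bx$, that is, $y = e^{a + bx} = e^{a} e^{bx} = 1.293\,e^{bx}$ is the equation of the best-fitting curve to the original data. The numeric values of a and b are missing from this transcript; only $e^{a} = 1.293$ survives. 35

[Figure: fitted line $\log y = a + bx$ and fitted curve $y = 1.293\,e^{bx}$] 36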
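The procedure of fitting an exponential curve through a regression on the logarithm of y can be sketched as follows. The data are hypothetical (generated from y = 2·e^(0.3x), not the steakhouse counts, which are not reproduced in the transcript), and `fit_line` is an illustrative helper.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Hypothetical data following an exact exponential law y = 2 * exp(0.3 * x).
x = list(range(1, 11))
y = [2.0 * math.exp(0.3 * xi) for xi in x]

# Step 1: transform y with the natural logarithm; ln y is linear in x.
ln_y = [math.log(yi) for yi in y]

# Step 2: ordinary linear regression of ln y on x.
a, b = fit_line(x, ln_y)

# Step 3: transform back, y = exp(a) * exp(b * x).
multiplier = math.exp(a)
print(f"y = {multiplier:.3f} * exp({b:.3f} * x)")
```

With real, noisy data the recovered curve minimises the squared residuals on the log scale, which is not the same as minimising them on the original scale.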

Types of transformations. Some non-linear models can be transformed into a linear model by taking logarithms of either or both sides. Either the base-10 logarithm (denoted log) or the natural (base e) logarithm (denoted ln) can be used. If a>0 and b>0, applying a logarithmic transformation to the model 37

Exponential relationship -> take log y. Model: $y = a \cdot 10^{bx}$. Take the logarithm of both sides: $\lg y = \lg a + bx$, so lg y is linear in x. 38

Logarithmic relationship -> take log x. Model: $y = a + b \lg x$, so y is linear in lg x. 39

Power relationship -> take log x and log y. Model: $y = a x^{b}$. Take the logarithm of both sides: $\lg y = \lg a + b \lg x$, so lg y is linear in lg x. 40
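A sketch of the power-law case: regressing lg y on lg x gives the exponent b as the slope and lg a as the intercept. The data are hypothetical (generated from y = 3·x^1.5), and `fit_line` is an illustrative helper.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data following an exact power law y = 3 * x**1.5.
x = [1, 2, 3, 4, 5, 6]
y = [3.0 * xi ** 1.5 for xi in x]

# Base-10 logarithm of both variables: lg y = lg a + b * lg x.
lg_x = [math.log10(xi) for xi in x]
lg_y = [math.log10(yi) for yi in y]

lg_a, b = fit_line(lg_x, lg_y)
a = 10 ** lg_a  # back-transform the intercept
print(f"y = {a:.3f} * x^{b:.3f}")
```

The same helper covers the exponential and logarithmic models: only the choice of which variable is log-transformed changes.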

Base-10 logarithmic scale

Logarithmic papers: semilogarithmic paper, log-log paper. 42

Reciprocal relationship -> take the reciprocal of x. Model: $y = a + \frac{b}{x} = a + b \cdot \frac{1}{x}$, so y is linear in 1/x. 43
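For the reciprocal model, a sketch with hypothetical data (generated from y = 2 + 5/x): the new variable u = 1/x is introduced and y is regressed on u. The name `u` and the helper `fit_line` are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Hypothetical data following y = 2 + 5/x exactly (x must be non-zero).
x = [1, 2, 4, 5, 10]
y = [2 + 5 / xi for xi in x]

# Transform the explanatory variable: u = 1/x, so that y = a + b*u.
u = [1 / xi for xi in x]
a, b = fit_line(u, y)
print(f"y = {a:.3f} + {b:.3f} / x")
```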

Example from the literature 44

45

Example 2. El Hadj Othmane Taha et al.: Osteoprotegerin: a regulátor, a protektor és a marker [Osteoprotegerin: the regulator, the protector and the marker]. Orvosi Hetilap 2008, volume 149, issue 42, 1971–1980. 46

Useful web pages (links truncated): =related; statistics/#Correlationsb; ear/LogarithmicCurve.htm

The origin of the word "regression". Galton: Regression towards mediocrity in hereditary stature. Journal of the Anthropological Institute 1886 Vol.15,

49