Correlation and linear regression


Lecture 12: Correlation and linear regression. The least squares method of Carl Friedrich Gauß: ordinary least squares regression of y on x (OLRy) fits the line y = ax + b by minimizing the sum of the squared vertical deviations Σ(Δy)².

Covariance: cov(x, y) = Σ(x − x̄)(y − ȳ) / (n − 1). Variance: s² = Σ(x − x̄)² / (n − 1). Correlation coefficient: r = cov(x, y) / (sx · sy). The regression slope a = cov(x, y) / sx² and the coefficient of correlation r are both zero if the covariance is zero. The coefficient of determination is R² = r².
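As a sketch, the quantities above can be computed in a few lines of Python. The function names and the toy data are illustrative, not from the lecture:

```python
# Covariance, variance, Pearson's r, the OLS slope, and R^2,
# built up exactly as in the formulas above.

def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def variance(xs):
    # the variance is just the covariance of a variable with itself
    return covariance(xs, xs)

def pearson_r(xs, ys):
    return covariance(xs, ys) / (variance(xs) ** 0.5 * variance(ys) ** 0.5)

def ols_slope(xs, ys):
    # slope a of y = ax + b; zero whenever the covariance is zero
    return covariance(xs, ys) / variance(xs)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 3), round(r ** 2, 3))  # R^2 = r^2, the coefficient of determination
```

For this toy data R² = cov² / (var(x) · var(y)) = 2.25 / 3.75 = 0.6, so 60% of the variance in y is explained.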

Relationships between macropterous, dimorphic and brachypterous ground beetles on 17 Mazurian lake islands. Brachypterous vs. macropterous species richness: positive correlation, r = 0.41, R² = 0.17. The regression is weak: macropterous species richness explains only 17% of the variance in brachypterous species richness. Some islands have no brachypterous species at all. We do not really know which variable is the independent one; there is no clear-cut logical connection. Dimorphic vs. macropterous species richness: positive correlation, r = 0.67, R² = 0.45. The regression is moderate: macropterous species richness explains 45% of the variance in dimorphic species richness. The relationship appears to be non-linear, so a log-transformation is indicated (there are no zero counts). Again, we do not know which variable is the independent one; there is no clear-cut logical connection.

Negative correlation; r =r2= -0.48 The regression is weak. Island isolation explains only 23% of the variance in brachypterous species richness. We have two apparent outliers. Without them the whole relationship would vanish, it est R2  0. Outliers have to be eliminated fom regression analysis. We have a clear hypothesis about the logical relationships. Isolation should be the predictor of species richness. No correlation; r =r2= 0.06 The regression slope is nearly zero. Area explains less than 1% of the variance in brachypterous species richness. We have a clear hypothesis about the logical relationships. Area should be the predictor of species richness.

The matrix perspective: the model is Y = Xβ + ε. X is not square, so it does not possess an inverse; multiplying both sides by Xᵀ gives the normal equations, whose solution is β = (XᵀX)⁻¹XᵀY.

Variance and covariance. The covariance matrix is square and symmetric: its diagonal holds the variances, its off-diagonal cells the covariances.

Non-linear relationships: ground beetles on Mazurian lake islands. Candidate models for the species–individuals relationship: a linear function y = ax + b, a logarithmic function y = a ln x + b, and a power function y = bxᵃ (intercept b, slope a). The relationship is obviously non-linear. The power function has the highest R² and therefore explains most of the variance in species richness. The coefficient of determination serves as the measure of goodness of fit.
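A sketch of this model comparison in Python, under stated assumptions: the beetle counts are not in the transcript, so invented (individuals, species) pairs are used, and `ols` and `r2` are illustrative helper names. The power function S = b·nᵃ is fitted by ordinary least squares on the log–log scale, then scored by R² on the original scale:

```python
import math

def ols(xs, ys):
    # ordinary least squares slope a and intercept b of y = ax + b
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx
    return a, my - a * mx

def r2(ys, yhat):
    # coefficient of determination as a goodness-of-fit measure
    my = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, yhat))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

n = [10, 50, 100, 500, 1000]   # individuals (synthetic)
S = [3, 7, 10, 22, 31]         # species (synthetic, roughly S = n^0.5)

# Power function S = b * n^a becomes linear after taking logs of both variables
a, logb = ols([math.log(x) for x in n], [math.log(y) for y in S])
pred = [math.exp(logb) * x ** a for x in n]
print(round(a, 3), round(r2(S, pred), 3))
```

The same `ols`/`r2` pair fits the logarithmic model (log-transform x only) and the linear model (no transform), so the three R² values can be compared directly.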

Having more than one predictor: describe species richness as a function of the number of individuals, the area, and the isolation of the islands. We need a clear hypothesis about the dependent variable and the independent predictors; a block diagram helps: Individuals, Area and Isolation all point to Species.

Collinearity: the predictors are not independent; the number of individuals depends on area and degree of isolation. We also need linear relationships, so we use ln-transformed values of species, area, and individuals, and we check for non-linearities using plots. Check for multicollinearity using a correlation matrix: the correlation between area and individuals is highly significant (the probability of H0 is 0.004), so of the predictors, area and individuals are highly correlated. In linear regression analysis, correlations between predictors below 0.7 are generally considered acceptable.
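Such a multicollinearity check can be sketched as a pairwise correlation matrix of the predictors. The island data are not given in the transcript, so the values below are invented for illustration:

```python
def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

predictors = {
    "ln_area":        [0.5, 1.1, 1.8, 2.4, 3.0, 3.9],
    "ln_individuals": [1.0, 1.9, 2.6, 3.5, 4.1, 5.2],  # tracks area closely
    "isolation":      [4.0, 1.5, 3.2, 0.8, 2.5, 1.0],
}

# Print every pairwise correlation and flag those above the 0.7 rule of thumb
names = list(predictors)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        r = pearson_r(predictors[a], predictors[b])
        flag = "  <-- above the 0.7 rule of thumb" if abs(r) > 0.7 else ""
        print(f"{a} ~ {b}: r = {r:.2f}{flag}")
```

With these made-up values, area and individuals come out almost perfectly correlated, mirroring the situation described on the slide.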

The final data for our analysis: the vector Y contains the response variable; the matrix X contains the effect (predictor) variables. The predictor variables have to contain different information: if X is singular, no inverse exists. The multiple linear regression model is Y = Xβ + ε.
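The normal-equations solution β = (XᵀX)⁻¹XᵀY can be sketched in pure Python (solving XᵀXβ = XᵀY rather than inverting; the data are invented, with Y constructed as exactly 1 + 2·x1 + 0·x2 so the expected coefficients are known):

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def solve(a, b):
    # Gaussian elimination with partial pivoting; a is n x n, b is a vector
    n = len(a)
    m = [row[:] + [bv] for row, bv in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

# Design matrix X: a column of ones for the intercept, then two predictors
X = [[1, 1.0, 4.0], [1, 2.0, 3.0], [1, 3.0, 3.5], [1, 4.0, 1.0], [1, 5.0, 0.5]]
Y = [3.0, 5.0, 7.0, 9.0, 11.0]   # exactly 1 + 2*x1 + 0*x2

Xt = transpose(X)
XtX = matmul(Xt, X)                                    # square and symmetric
XtY = [sum(v * y for v, y in zip(row, Y)) for row in Xt]
beta = solve(XtX, XtY)                                 # [intercept, b1, b2]
print([round(b, 6) for b in beta])
```

If two predictor columns carried the same information, XᵀX would be singular and the elimination would divide by zero, which is the matrix-algebra face of the collinearity problem above.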

The probability that R² is zero is only 0.01%; with more than 99.9% confidence R² > 0, hence the model is statistically significant. The model explains 78.6% of the variance in species richness; 21.4% of the variance remains unexplained. The output also gives the probabilities that the individual coefficients deviate from zero: isolation is not a significant predictor.

What distance to minimize? OLRy minimizes the squared vertical deviations Σ(Δy)²; OLRx minimizes the squared horizontal deviations Σ(Δx)². Both are model I regressions.

Reduced major axis (RMA) regression minimizes the products Δx·Δy; its slope is the geometric average of aOLRy and aOLRx. This is a model II regression.
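The three slopes can be sketched on made-up data: the RMA slope is the geometric mean of the OLRy slope and the OLRx slope (the latter expressed on the y-vs-x scale), which reduces to sy/sx with the sign of r:

```python
def sums_of_squares(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy, sxx, syy

xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 5.2, 8.1, 9.8, 12.3]   # invented, positively correlated

sxy, sxx, syy = sums_of_squares(xs, ys)
a_olry = sxy / sxx                 # model I: regression of y on x
a_olrx = syy / sxy                 # regression of x on y, as a y-vs-x slope
a_rma = (a_olry * a_olrx) ** 0.5   # model II: geometric mean of the two
print(round(a_olry, 3), round(a_olrx, 3), round(a_rma, 3))
```

Because |r| < 1, the OLRy slope is always the shallowest and the OLRx slope the steepest, with RMA in between: the "steeper, often intuitively better" slope mentioned on the next slide.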

Standard output of linear regression (reduced major axis) in Past: parameters and standard errors, the parametric probability for r = 0, and a permutation test for statistical significance. Both tests indicate that Brach and Macro are not significantly correlated, and the RMA regression slope is insignificant. We do not have a clear hypothesis about the causal relationships; in such a case RMA is indicated.

Permutation test for statistical significance, and calculating confidence limits: randomize x or y 1000 times and calculate r each time. Plot the resulting statistical distribution and calculate the lower and upper confidence limits: rank all 1000 coefficients of correlation and take the values at rank positions 25 and 975 (the 2.5% tails, N·2.5% = 25). Compare the observed r with these limits.
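The recipe above can be sketched directly in Python; the data are invented for illustration (a fixed seed makes the run reproducible):

```python
import random

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

random.seed(1)
xs = list(range(20))
ys = [x + random.gauss(0, 2) for x in xs]   # strongly correlated by construction

observed = pearson_r(xs, ys)

# Randomize y 1000 times and recompute r each time to build the null distribution
null_rs = []
for _ in range(1000):
    shuffled = ys[:]
    random.shuffle(shuffled)
    null_rs.append(pearson_r(xs, shuffled))

null_rs.sort()
lower, upper = null_rs[24], null_rs[974]    # rank positions 25 and 975
significant = observed < lower or observed > upper
print(round(observed, 2), round(lower, 2), round(upper, 2), significant)
```

An observed r outside the interval [lower, upper] is one that fewer than 5% of random pairings would produce, so the correlation is judged significant at the 5% level.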

The RMA regression has a much steeper slope; this slope is often intuitively better. The coefficient of correlation is independent of the regression method. In OLRy regression, insignificance of the slope also means insignificance of r and R². The 95% confidence limits of the regression slope mark the range within which the true slope lies with 95% probability; here the lower CL is negative, hence a zero slope is within the 95% CL.

Outliers have an overproportional influence on correlation and regression, and should be eliminated from regression analysis. Here rPearson = 0.79. Alternatively, instead of the Pearson coefficient of correlation use Spearman's rank-order correlation, i.e. the normal (Pearson) correlation computed on ranked data: rSpearman = 0.77.
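Spearman's coefficient really is just Pearson's r on ranks, which is what makes it robust to outliers. A minimal sketch with invented data containing one extreme point (ties are deliberately not handled here):

```python
def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def ranks(xs):
    # rank 1 for the smallest value; this sketch assumes no tied values
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_r(xs, ys):
    # Spearman = Pearson correlation of the ranked data
    return pearson_r(ranks(xs), ranks(ys))

xs = [1, 2, 3, 4, 5, 6, 7]
ys = [2, 3, 4, 5, 6, 7, 100]   # one extreme outlier

print(round(pearson_r(xs, ys), 2), round(spearman_r(xs, ys), 2))
```

The ranks discard the outlier's magnitude: the monotone ordering is perfect, so Spearman's coefficient is 1.0, while the raw-scale Pearson coefficient is dragged down by the single extreme value.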

Home work and literature. Refresh: coefficient of correlation, Pearson correlation, Spearman correlation, linear regression, non-linear regression, model I and model II regression, RMA regression. Prepare for the next lecture: F-test, F-distribution, variance. Literature: Łomnicki, Statystyka dla biologów; http://statsoft.com/textbook/