Linear correlation and linear regression + summary of tests

Dr. Omar Al Jadaan, Assistant Professor – Computer Science & Mathematics
Week 12: Regression Theory

Recall: Covariance. The sample covariance is cov(X,Y) = Σ(xi – x̄)(yi – ȳ) / (n – 1).

Interpreting Covariance: cov(X,Y) > 0 means X and Y are positively correlated; cov(X,Y) < 0 means X and Y are inversely correlated; cov(X,Y) = 0 means X and Y are uncorrelated (zero covariance does not by itself imply independence).

Correlation coefficient. Pearson's correlation coefficient is standardized covariance (unitless): r = cov(X,Y) / (SD(X) · SD(Y)).
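As a sketch (with made-up data), dividing the covariance by the product of the standard deviations reproduces NumPy's built-in correlation coefficient:

```python
import numpy as np

# Hypothetical paired data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Standardized covariance: divide cov(X, Y) by the product of the SDs.
cov_xy = np.cov(x, y, ddof=1)[0, 1]
r = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

# Matches NumPy's built-in correlation coefficient.
assert np.isclose(r, np.corrcoef(x, y)[0, 1])
print(round(r, 4))
```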

Recall dice problem… Let X = the roll of the first die and Y = the sum of the two dice. Then Var(X) = 2.9167, Var(Y) = 5.8333, and Cov(X,Y) = 2.9167, so r = 2.9167/√(2.9167 × 5.8333) ≈ 0.707 and R² = r² (the "coefficient of determination" = SSexplained/TSS) = 0.50. Interpretation of R²: 50% of the total variation in the sum of the two dice is explained by the roll of the first die. Makes perfect intuitive sense!
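A quick simulation (not from the slides) confirms the dice result: with X = the first die and Y = the sum of both dice, R² comes out near 0.5:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# X = roll of the first die, Y = sum of both dice.
die1 = rng.integers(1, 7, size=n)
die2 = rng.integers(1, 7, size=n)
x = die1
y = die1 + die2

r = np.corrcoef(x, y)[0, 1]
r_squared = r ** 2
print(round(r_squared, 3))  # close to 0.5
```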

Correlation Measures the relative strength of the linear relationship between two variables Unit-less Ranges between –1 and 1 The closer to –1, the stronger the negative linear relationship The closer to 1, the stronger the positive linear relationship The closer to 0, the weaker any positive linear relationship

Scatter Plots of Data with Various Correlation Coefficients: six panels of Y versus X, showing r = –1, r = –.6, r = 0, r = +1, r = +.3, and r = 0. Slide from: Statistics for Managers Using Microsoft® Excel, 4th Edition, 2004 Prentice-Hall.

Linear Correlation: example scatter plots of linear relationships versus curvilinear relationships.

Linear Correlation: example scatter plots of strong relationships versus weak relationships.

Linear Correlation: example scatter plots showing no relationship.

Linear regression http://www.math.csusb.edu/faculty/stanton/m262/regress/regress.html In correlation, the two variables are treated as equals. In regression, one variable is considered the independent (=predictor) variable (X) and the other the dependent (=outcome) variable (Y).

What is “Linear”? Remember this: Y = mX + B, where m is the slope and B is the intercept.

What’s Slope? A slope of 2 means that every 1-unit change in X yields a 2-unit change in Y.

Simple linear regression. The linear regression model: Love of Math = 5 + .01*(math SAT score), where 5 is the intercept and .01 is the slope (P = .22; not significant).
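A fit like the slide's can be reproduced in miniature. The data below are hypothetical (the original SAT/love-of-math data are not shown); np.polyfit with degree 1 returns the least-squares slope and intercept:

```python
import numpy as np

# Hypothetical data standing in for the slide's example:
# math SAT scores and a "love of math" rating.
sat = np.array([450, 500, 550, 600, 650, 700, 750], dtype=float)
love = np.array([5.2, 5.8, 5.5, 6.4, 6.1, 6.9, 7.3])

# np.polyfit(x, y, 1) gives [slope, intercept] for the least-squares line.
slope, intercept = np.polyfit(sat, love, 1)
predicted = intercept + slope * sat
print(f"love of math ~ {intercept:.2f} + {slope:.4f} * SAT")
```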

Prediction If you know something about X, this knowledge helps you predict something about Y. (Sound familiar?…sound like conditional probabilities?)

Linear Regression Model. The Y's are modeled as Yi = 100*Xi + random errori. The random error follows a normal distribution; the fixed part (100*Xi) lies exactly on the line.

Assumptions (or the fine print) Linear regression assumes that… 1. The relationship between X and Y is linear 2. Y is distributed normally at each value of X 3. The variance of Y at every value of X is the same (homogeneity of variances) Why? The math requires it—the mathematical process is called “least squares” because it fits the regression line by minimizing the squared errors from the line (mathematically easy, but not general—relies on above assumptions).
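A minimal sketch of what "least squares" means, using hypothetical data: the closed-form slope and intercept minimize the sum of squared errors, so perturbing either one can only increase it:

```python
import numpy as np

# Hypothetical data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Least squares has a closed form:
# slope = cov(x, y) / var(x),  intercept = mean(y) - slope * mean(x).
slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
intercept = y.mean() - slope * x.mean()

def sse(b0, b1):
    """Sum of squared errors for the line y = b0 + b1*x."""
    return np.sum((y - (b0 + b1 * x)) ** 2)

# Any other line has a larger sum of squared errors.
assert sse(intercept, slope) <= sse(intercept + 0.1, slope)
assert sse(intercept, slope) <= sse(intercept, slope + 0.1)
```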

Expected value of y at a given level of x: E(Y | xi) = α + β·xi.

Residual. We fit the regression coefficients such that the sum of the squared residuals is minimized (least squares regression).

Residual. Residual = observed value – predicted value.

Residual Analysis: check assumptions. The residual for observation i, ei, is the difference between its observed and predicted value. Check the assumptions of regression by examining the residuals: examine for the linearity assumption; examine for constant variance at all levels of X (homoscedasticity); evaluate the normal distribution assumption; evaluate the independence assumption. Graphical analysis of residuals: plot residuals vs. X.
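Two of these checks can be backed by arithmetic as well as plots. A small sketch with hypothetical data: for any least-squares fit that includes an intercept, the residuals sum to zero and are uncorrelated with X, so a residual-vs-X plot should be centered on zero with no trend:

```python
import numpy as np

# Hypothetical data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 2.1, 2.8, 4.3, 4.9, 6.2])

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# Properties of least-squares residuals (with an intercept in the model):
# they sum to zero and are orthogonal to x.
print(np.allclose(residuals.sum(), 0))
print(np.allclose(np.dot(residuals, x), 0))
```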

Residual Analysis for Linearity: residuals plotted against x scatter randomly around zero when the relationship is linear, and show a systematic (curved) pattern when it is not linear.

Residual Analysis for Homoscedasticity: residuals plotted against x show constant spread when the variance is constant, and a fanning pattern when the variance is non-constant.

Residual Analysis for Independence: residuals plotted against X show no trend when the observations are independent, and a systematic trend when they are not independent.

As a linear regression… The slope represents the difference in means between the odd and even groups; the difference is significant. The intercept represents the mean value in the even-day group; it is significantly different from 0, so the average Eng SAT score is not 0.

Parameter   Estimate      Standard Error   t Value   Pr > |t|
Intercept   657.5000000   23.66105065      27.79     <.0001
OddDay       81.7307692   32.81197359       2.49     0.0204
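The slide's claim can be checked numerically. With hypothetical scores for the two groups (the original SAT data are not reproduced here), regressing the outcome on a 0/1 odd-day indicator recovers the even-day mean as the intercept and the mean difference as the slope:

```python
import numpy as np

# Hypothetical scores for an even-day group and an odd-day group.
even = np.array([610.0, 640.0, 655.0, 680.0, 700.0])
odd = np.array([700.0, 720.0, 745.0, 760.0, 790.0])

y = np.concatenate([even, odd])
odd_day = np.concatenate([np.zeros(len(even)), np.ones(len(odd))])  # dummy

slope, intercept = np.polyfit(odd_day, y, 1)

# Intercept = mean of the reference (even) group;
# slope = difference in group means.
assert np.isclose(intercept, even.mean())
assert np.isclose(slope, odd.mean() - even.mean())
```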

Multiple Linear Regression. More than one predictor: ŷ = α + β1*X + β2*W + β3*Z. Each regression coefficient is the amount of change in the outcome variable that would be expected per one-unit change of that predictor, if all other variables in the model were held constant.
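A multiple regression with predictors X, W, and Z can be sketched with np.linalg.lstsq on simulated data (the coefficient values below are made up for illustration); the fit recovers them:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100

# Hypothetical predictors and an outcome built from known coefficients.
X = rng.normal(size=n)
W = rng.normal(size=n)
Z = rng.normal(size=n)
y = 2.0 + 1.5 * X - 0.5 * W + 3.0 * Z + rng.normal(scale=0.1, size=n)

# Design matrix with a leading column of ones for the intercept.
design = np.column_stack([np.ones(n), X, W, Z])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
alpha, b1, b2, b3 = coef
print(np.round(coef, 2))
```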

ANOVA is linear regression! A categorical variable with more than two groups, e.g. groups 1, 2, and 3 (mutually exclusive): ŷ = α (= mean value for group 1) + β1*(1 if in group 2) + β2*(1 if in group 3). This is called "dummy coding": multiple binary variables are created to represent being in each category (or not) of a categorical variable.
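The same idea can be checked with a minimal sketch on hypothetical three-group data: the fitted intercept is group 1's mean and each dummy coefficient is that group's difference from group 1:

```python
import numpy as np

# Hypothetical outcomes for three mutually exclusive groups.
g1 = np.array([10.0, 12.0, 11.0, 13.0])
g2 = np.array([15.0, 16.0, 14.0, 17.0])
g3 = np.array([20.0, 19.0, 21.0, 22.0])

y = np.concatenate([g1, g2, g3])
in_g2 = np.concatenate([np.zeros(4), np.ones(4), np.zeros(4)])  # dummy for group 2
in_g3 = np.concatenate([np.zeros(8), np.ones(4)])               # dummy for group 3

design = np.column_stack([np.ones(12), in_g2, in_g3])
alpha, b1, b2 = np.linalg.lstsq(design, y, rcond=None)[0]

# alpha = mean of group 1; b1, b2 = how groups 2 and 3 differ from group 1.
assert np.isclose(alpha, g1.mean())
assert np.isclose(b1, g2.mean() - g1.mean())
assert np.isclose(b2, g3.mean() - g1.mean())
```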

Other types of multivariate regression Multiple linear regression is for normally distributed outcomes Logistic regression is for binary outcomes Cox proportional hazards regression is used when time-to-event is the outcome

Overview of statistical tests. The following tables give the appropriate choice of a statistical test or measure of association for various types of data (outcome variables and predictor variables) by study design. For example, blood pressure = pounds + age + treatment (1/0) is a model with a continuous outcome (blood pressure), continuous predictors (pounds, age), and a binary predictor (treatment).

Alternative summary: statistics for various types of outcome data.

Continuous outcome (e.g. pain scale, cognitive function):
- Independent observations: t-test, ANOVA, linear correlation, linear regression
- Correlated observations: paired t-test, repeated-measures ANOVA, mixed models/GEE modeling
- Assumptions: outcome is normally distributed (important for small samples); outcome and predictor have a linear relationship

Binary or categorical outcome (e.g. fracture yes/no):
- Independent observations: difference in proportions, relative risks, chi-square test, logistic regression
- Correlated observations: McNemar's test, conditional logistic regression, GEE modeling
- Assumptions: chi-square test assumes sufficient numbers in each cell (>= 5)

Time-to-event outcome (e.g. time to fracture):
- Independent observations: Kaplan-Meier statistics, Cox regression
- Correlated observations: n/a
- Assumptions: Cox regression assumes proportional hazards between groups

Continuous outcome (means), e.g. pain scale, cognitive function.

Independent observations:
- t-test: compares means between two independent groups
- ANOVA: compares means between more than two independent groups
- Pearson's correlation coefficient (linear correlation): shows linear correlation between two continuous variables
- Linear regression: multivariate regression technique used when the outcome is continuous; gives slopes

Correlated observations:
- Paired t-test: compares means between two related groups (e.g., the same subjects before and after)
- Repeated-measures ANOVA: compares changes over time in the means of two or more groups (repeated measurements)
- Mixed models/GEE modeling: multivariate regression techniques to compare changes over time between two or more groups; gives rate of change over time

Non-parametric alternatives if the normality assumption is violated (and the sample size is small):
- Wilcoxon signed-rank test: non-parametric alternative to the paired t-test
- Wilcoxon rank-sum test (= Mann-Whitney U test): non-parametric alternative to the t-test
- Kruskal-Wallis test: non-parametric alternative to ANOVA
- Spearman rank correlation coefficient: non-parametric alternative to Pearson's correlation coefficient

Binary or categorical outcomes (proportions), e.g. fracture, yes/no.

Independent observations:
- Chi-square test: compares proportions between two or more groups
- Relative risks: odds ratios or risk ratios
- Logistic regression: multivariate technique used when the outcome is binary; gives multivariate-adjusted odds ratios

Correlated observations:
- McNemar's chi-square test: compares a binary outcome between correlated groups (e.g., before and after)
- Conditional logistic regression: multivariate regression technique for a binary outcome when groups are correlated (e.g., matched data)
- GEE modeling: multivariate regression technique for a binary outcome when groups are correlated (e.g., repeated measures)

Alternatives to the chi-square test if cells are sparse:
- Fisher's exact test: compares proportions between independent groups when there are sparse data (some cells < 5)
- McNemar's exact test: compares proportions between correlated groups when there are sparse data (some cells < 5)

Time-to-event outcome (survival data), e.g., time to fracture.

Independent observations:
- Kaplan-Meier statistics: estimates survival functions for each group (usually displayed graphically); compares survival functions with the log-rank test
- Cox regression: multivariate technique for time-to-event data; gives multivariate-adjusted hazard ratios

Correlated observations: n/a (already over time).

Modification to Cox regression if the proportional-hazards assumption is violated: time-dependent predictors or time-dependent hazard ratios (tricky!).