Correlation and Linear Regression


Correlation and Linear Regression Microbiology 3053 Microbiological Procedures

Correlation
- Correlation analysis is used when you have measured two continuous variables and want to quantify how consistently they vary together.
- The stronger the correlation, the more accurately the value of one variable can be estimated from the other.
- The direction and magnitude of the correlation are quantified by Pearson’s correlation coefficient, r:
  - perfectly negative (-1.00) to perfectly positive (1.00)
  - no relationship (0.00)

Correlation
- The closer |r| is to 1, the stronger the relationship.
- r = 0 means that there is no linear relationship: knowing the value of one variable tells us nothing about the value of the other.
- Correlation analysis uses data that have already been collected:
  - archival data
  - data not produced by experimentation
- Correlation does not show cause and effect but may suggest such a relationship.
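To make the definition of r concrete, here is a minimal sketch of Pearson's correlation coefficient in plain Python. The function name `pearson_r` and the data values are made up for illustration; they are not from the slides. The sketch assumes both samples vary (a constant sample would cause a division by zero).

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired samples xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # increases in step with hours
falling = [10, 8, 6, 4, 2]   # decreases in step with hours

print(pearson_r(hours, rising))
print(pearson_r(hours, falling))
```

Swapping the two arguments gives the same value, which matches the point that correlation has no independent or dependent variable.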

Correlation ≠ Causation
There is a strong, positive correlation between
- the number of churches and the number of bars in a town
- smoking and alcoholism (consider the relationship between smoking and lung cancer)
- eating breakfast and school performance among students
- marijuana usage and heroin addiction (vs. heroin addiction and marijuana usage)

Visualizing Correlation
- Scatterplots are used to illustrate correlation analysis.
- Assignment of axes does not matter (there are no independent and dependent variables).
- The order in which data pairs are plotted does not matter.
- In strict usage, lines are not drawn through correlation scatterplots.

Correlations

Linear Regression
- Used to measure the relationship between two variables
  - prediction and a cause-and-effect relationship
  - Does one variable change in a consistent manner with another variable?
- x = independent variable (cause)
- y = dependent variable (effect)
- If it is not clear which variable is the cause and which is the effect, linear regression is probably an inappropriate test.

Linear Regression
- Calculated from experimental data
- The independent variable is under the control of the investigator (exact value).
- The dependent variable is normally distributed.
- Differs from correlation, where both variables are normally distributed and selected at random by the investigator
- Regression analysis with more than one independent variable is termed multiple (linear) regression.

Linear Regression
- The best-fit line is the one that minimizes the sum of the squared vertical distances between the data points and the predicted values (on the line).

Linear Regression
- y = a + bx, where
  - a = y-intercept (the point where x = 0 and the line crosses the y-axis)
  - b = slope of the line, (y2 - y1)/(x2 - x1)
- The slope indicates the nature of the correlation:
  - positive: y increases as x increases
  - negative: y decreases as x increases
  - 0: no correlation (the same as a Pearson’s correlation of 0; no relationship between the variables)
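The least-squares idea and the line y = a + bx can be sketched together. The code below uses the standard closed-form estimates (slope = Sxy/Sxx, intercept = mean of y minus slope times mean of x). The helper names and the data are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # y-intercept
    return a, b

def sum_squared_residuals(xs, ys, a, b):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical measurements, for illustration only
xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

a, b = fit_line(xs, ys)
best = sum_squared_residuals(xs, ys, a, b)
print(f"y = {a:.2f} + {b:.2f}x, sum of squared residuals = {best:.4f}")

# Nudging the slope or intercept away from the fitted values
# should never give a smaller sum of squares.
print(sum_squared_residuals(xs, ys, a, b + 0.1) >= best)
print(sum_squared_residuals(xs, ys, a + 0.1, b) >= best)
```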

Correlation Coefficient (r)
- Shows the strength of the linear relationship between two variables
- The closer the data points are to the line, the closer r is to 1 or -1.
- r varies from -1 (perfect negative correlation) to 1 (perfect positive correlation).
- Interpreting the magnitude of r:
    0.0 - 0.2    no or very weak association
    0.2 - 0.4    weak association
    0.4 - 0.6    moderate association
    0.6 - 0.8    strong association
    0.8 - 1.0    very strong to perfect association
- The null hypothesis is no association (r = 0).
Salkind, N. J. (2000) Statistics for people who think they hate statistics. Thousand Oaks, CA: Sage
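The interpretation bands above translate directly into a lookup. This is a sketch only: the function name is hypothetical, and because the slide's ranges share their endpoints (0.2, 0.4, and so on), the code assumes a boundary value belongs to the stronger band. The bands are applied to the magnitude of r, so the sign is ignored.

```python
def describe_strength(r):
    """Map a correlation coefficient to the verbal bands listed on the slide."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    magnitude = abs(r)
    # Assumption: a value exactly on a boundary falls into the stronger band.
    if magnitude < 0.2:
        return "no or very weak association"
    if magnitude < 0.4:
        return "weak association"
    if magnitude < 0.6:
        return "moderate association"
    if magnitude < 0.8:
        return "strong association"
    return "very strong to perfect association"

for r in (0.05, -0.35, 0.5, -0.7, 0.95):
    print(r, "->", describe_strength(r))
```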

Coefficient of Determination (r²)
- Used to estimate the extent to which the dependent variable (y) is under the influence of the independent variable (x)
- r² is the square of the correlation coefficient and varies from 0 to 1.
- r² = 1 means that the value of y is completely dependent on x (no error or other contributing factors).
- r² < 1 indicates that the value of y is influenced by more than the value of x.

Coefficient of Determination
- A measure of the proportion of the variance of y explained by its dependence on x
- The remainder (1 - r²) is the variance of y that is not explained by x (i.e., error or other factors).
- e.g., if r² = 0.84, it shows a strong relationship between the variables (r² alone does not give the direction) and shows that the value of x predicts 84% of the variability of y (16% is due to other factors).
- r² can also be calculated for correlation analysis by squaring r, but there it is
  - not a measure of the variation of y explained by variation in x
  - a statement that variation in y is associated with variation in x (and vice versa)
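As a sketch of "proportion of variance explained", r² can be computed as 1 minus the ratio of unexplained variation (sum of squared residuals) to total variation in y. For a simple linear regression this equals the square of Pearson's r, which the example checks numerically. Function names and data are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def r_squared(xs, ys):
    """Proportion of the variation in y explained by the fitted line."""
    a, b = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                   # total
    return 1 - ss_res / ss_tot

def pearson_r(xs, ys):
    """Pearson's correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only
xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

r2 = r_squared(xs, ys)
print(f"r^2 from the regression:  {r2:.4f}")
print(f"r^2 from squaring r:      {pearson_r(xs, ys) ** 2:.4f}")
print(f"unexplained share (1-r^2): {1 - r2:.4f}")
```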

Assumptions of Linear Regression
- The independent variable (x) is selected by the investigator (not random) and has no associated variance.
- For every value of x, the values of y have a normal distribution.
- Observed values of y differ from the value predicted by the line (the mean value of y at that x) by an amount called a residual. (Residuals are normally distributed.)
- The variances of y are equal for all values of x (homoscedasticity).
- Observations are independent (each individual in the sample is measured only once).

Linear Regression Data
- The numbers alone do not guarantee that the data have been fitted well!
Anscombe, F. J. 1973. Graphs in Statistical Analysis. The American Statistician 27(1):17-21.

Linear Regression Data

Linear Regression Data
- Figure 1: Acceptable regression model, with observations distributed evenly around the regression line
- Figure 2: Strong curvature suggests that linear regression may not be appropriate (an additional variable may be required)

Linear Regression Data
- Figure 3: A single outlier alters the slope of the line. The point may be erroneous, but if it is not, a different test may be necessary.
- Figure 4: Effectively a regression line connecting only two points. If the rightmost point were different, the regression line would shift.
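The four cases described above correspond to the four data sets in the Anscombe (1973) paper cited earlier. As a sketch of why "the numbers alone" are not enough, the code below fits a line to each set and shows that the slope and intercept come out nearly the same even though the scatterplots look very different. The values are the commonly reproduced ones from that paper and should be checked against the original before reuse; the helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Anscombe's quartet (sets I-III share the same x values)
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  (x4,   [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

fits = {name: fit_line(xs, ys) for name, (xs, ys) in quartet.items()}
for name, (a, b) in fits.items():
    print(f"Set {name}: y = {a:.2f} + {b:.2f}x")
```

Plotting each set (not done here) is what reveals the curvature, the outlier, and the single influential point.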

What if we’re not sure if linear regression is appropriate?

Residuals
- Homoscedastic: variance appears random; indicates a good regression model
- Heteroscedastic: the residual plot is “funnel” shaped and may be bowed; suggests that a transformation and inclusion of additional variables may be warranted
Helsel, D.R., and R.M. Hirsch. 2002. Statistical Methods in Water Resources. USGS (http://water.usgs.gov/pubs/twri/twri4a3/)
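A residual plot is the proper tool here, but a rough numeric version of the same idea can be sketched: compute the residuals, then compare their average squared size in the upper half of the x range with the lower half. A ratio far from 1 hints at a funnel shape. This is a crude heuristic invented for illustration, not a formal test, and the function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def residuals(xs, ys):
    """Observed y minus the value predicted by the fitted line."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def spread_ratio(xs, ys):
    """Mean squared residual for the largest x values divided by that for the smallest."""
    ordered = sorted(zip(xs, residuals(xs, ys)))   # sort residuals by x
    half = len(ordered) // 2
    low = [r for _, r in ordered[:half]]
    high = [r for _, r in ordered[-half:]]
    mean_sq = lambda rs: sum(r * r for r in rs) / len(rs)
    return mean_sq(high) / mean_sq(low)

# Hypothetical data: same trend, different scatter patterns
xs = list(range(1, 11))
even = [2 * x + 0.5 * (-1) ** x for x in xs]        # scatter of constant size
funnel = [2 * x + 0.1 * x * (-1) ** x for x in xs]  # scatter grows with x

print(f"even scatter:   ratio = {spread_ratio(xs, even):.2f}")
print(f"funnel scatter: ratio = {spread_ratio(xs, funnel):.2f}")
```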

Outliers
- Values that appear very different from others in the data set
- Rule of thumb: an outlier is more than three standard deviations from the mean.
- Three causes:
  - measurement or recording error
  - observation from a different population
  - a rare event from within the population
- Outliers need to be considered and not simply dismissed.
  - They may indicate an important phenomenon.
  - e.g., ozone hole data (outliers were removed automatically by the analysis program, delaying the observation by about 10 years)
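The three-standard-deviation rule of thumb can be sketched with the standard library. The function name and the readings are hypothetical. The function only flags values for inspection; in line with the slide, it does not remove them. Note that in very small samples no value can lie three sample standard deviations from the mean, so the rule needs a reasonable number of observations.

```python
import statistics

def flag_outliers(values, k=3.0):
    """Return the values lying more than k sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample standard deviation
    return [v for v in values if abs(v - mean) > k * sd]

# Hypothetical readings: twenty values near 10 and one suspicious value
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9,
            10.0, 10.2, 9.8, 10.1, 10.0, 9.9, 10.0, 10.1, 9.9, 10.0, 25.0]

print(flag_outliers(readings))
```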

Outliers
Helsel, D.R., and R.M. Hirsch. 2002. Statistical Methods in Water Resources. USGS (http://water.usgs.gov/pubs/twri/twri4a3/)

When is Linear Regression Appropriate?
- Data should be interval or ratio.
- The dependent and independent variables should be identifiable.
- The relationship between the variables should be linear (if not, a transformation might be appropriate).
- Have you chosen the values of the independent variable?
- Does the residual plot show a random spread (homoscedastic), and does the normal probability plot display a straight line (or does a histogram of residuals show a normal distribution)?

(Normal Probability Plot of Residuals) The normal probability plot indicates whether the residuals follow a normal distribution, in which case the points will follow a straight line. Expect some moderate scatter even with normal data. Look only for definite patterns like an "S-shaped" curve, which indicates that a transformation of the response may provide a better analysis. (from Design Expert 7.0 from Stat-Ease)

(Histogram of Residuals Distribution)

Lineweaver-Burk Plot
The Michaelis-Menten equation to describe enzyme activity:
  vo = Vmax[S] / (Km + [S])
is linearized by taking its reciprocal:
  1/vo = 1/Vmax + (Km/Vmax)(1/[S])
which has the form y = a + bx, where:
  y = 1/vo
  x = 1/[S]
  a = 1/Vmax
  b = Km/Vmax
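A minimal sketch of the Lineweaver-Burk approach: take reciprocals of substrate concentration and initial velocity, fit a straight line, and recover Vmax from the intercept and Km from the slope. The substrate concentrations and the "true" Vmax and Km below are invented for illustration and are not the data from the mock experiment slides. The velocities are generated without noise so that the fit should return the values that were put in.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def michaelis_menten(s, vmax, km):
    """Initial velocity at substrate concentration s."""
    return vmax * s / (km + s)

def lineweaver_burk_fit(substrate, velocity):
    """Estimate (Vmax, Km) from a straight-line fit of 1/vo against 1/[S]."""
    xs = [1 / s for s in substrate]   # x = 1/[S]
    ys = [1 / v for v in velocity]    # y = 1/vo
    a, b = fit_line(xs, ys)           # a = 1/Vmax, b = Km/Vmax
    vmax = 1 / a
    km = b * vmax
    return vmax, km

# Hypothetical, noise-free data generated from Vmax = 100 and Km = 2
substrate = [0.5, 1.0, 2.0, 5.0, 10.0]
velocity = [michaelis_menten(s, vmax=100.0, km=2.0) for s in substrate]

vmax_est, km_est = lineweaver_burk_fit(substrate, velocity)
print(f"Vmax = {vmax_est:.3f}, Km = {km_est:.3f}")
```

With real, noisy measurements the reciprocal transformation magnifies error at low substrate concentrations, so the residual checks described earlier still apply.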

Mock Enzyme Experiment
