1
SIMPLE LINEAR REGRESSION AND CORRELATION
By Mpembeni RNM, School of Public Health and Social Sciences, Dept of Epidemiology and Biostatistics MUHAS
2
LEARNING OBJECTIVES After successful completion of this session, you should be able to: describe the correlation coefficient; describe the linear regression model; understand and check the model assumptions; understand the meaning of regression coefficients.
3
ANALYSING RELATIONSHIPS BETWEEN TWO OR MORE QUANTITATIVE VARIABLES
Two commonly used methods are: correlation; linear regression (simple and multiple linear regression).
4
CORRELATION The (Pearson's) correlation coefficient, r, measures the closeness (strength) of the linear association, i.e. how closely the points lie along a straight line. r is a bivariate correlation coefficient summarizing the magnitude and direction of the relationship between two variables.
5
Characteristics of r: r ranges between -1 and +1.
r = 0: no linear relationship; r = 1: perfect positive relationship; r = -1: perfect negative relationship.
6
Interpretation of r If r > 0, the variables are positively correlated, i.e. as x increases, y tends to increase, and as x decreases, y tends to decrease. If r < 0, the variables are negatively correlated, i.e. as x increases, y tends to decrease, and as x decreases, y tends to increase.
7
Rule of thumb for r
Correlation | Strong | Weak
Positive (points run up and to the right) | 0.7 to 1.0 | 0.3 to 0.7
Negative (points run down and to the right) | -1.0 to -0.7 | -0.7 to -0.3
Little or no correlation: -0.3 to 0.3
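The rule of thumb can be written as a small helper. The function name `describe_r` is hypothetical, and because the ranges in the table share their end points, the sketch assumes that |r| = 0.7 counts as strong and |r| = 0.3 as weak.

```python
def describe_r(r: float) -> str:
    """Classify Pearson's r with the rule of thumb (hypothetical helper).

    Assumed cut-offs: |r| < 0.3 little or no correlation,
    0.3 <= |r| < 0.7 weak, |r| >= 0.7 strong.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size < 0.3:
        return "little or no correlation"
    direction = "positive" if r > 0 else "negative"
    strength = "strong" if size >= 0.7 else "weak"
    return f"{strength} {direction}"


for value in (0.76, -0.45, 0.1, -0.9):
    print(value, describe_r(value))
```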
8
SCATTER DIAGRAM The scatter diagram is the first step in investigating the relationship between two variables. The two related variables are plotted on a graph as points or dots, and each point represents a pair of values, one read on the X-scale and the other on the Y-scale. The X-scale refers to the explanatory (independent) variable and the Y-scale to the response (dependent) variable. The diagram shows visually the shape of the relationship and how closely the points follow it.
9
Head circumference and Gestational age of 100 LBW babies
10
Scatter Plot The scatter plot shows a tendency for head circumference to increase with increasing gestational age.
11
Strong positive correlation
12
Weak negative correlation
13
No correlation
14
CORRELATION COEFFICIENT
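$$r = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sqrt{\sum (X-\bar{X})^2 \sum (Y-\bar{Y})^2}} = \frac{\sum XY - (\sum X)(\sum Y)/n}{\sqrt{\left[\sum X^2 - (\sum X)^2/n\right]\left[\sum Y^2 - (\sum Y)^2/n\right]}}$$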
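Both forms of the formula give the same value. The sketch below implements each one in plain Python on a small hypothetical data set (invented for illustration); the second, computational form is the one used for hand calculation in the body weight and plasma volume example.

```python
import math


def pearson_r_deviation(x, y):
    """First form: sums of products of deviations from the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def pearson_r_computational(x, y):
    """Second form: raw sums, as used for hand calculation."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    sxx = sum(xi * xi for xi in x) - sum_x ** 2 / n
    syy = sum(yi * yi for yi in y) - sum_y ** 2 / n
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r1 = pearson_r_deviation(x, y)
r2 = pearson_r_computational(x, y)
print(f"deviation form: {r1:.4f}, computational form: {r2:.4f}")
```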
15
Example: Association between Body weight and Plasma volume
16
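Calculation of r
$$S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 1615.295 - \frac{535 \times 24.02}{8} = 8.96$$
$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 35983.5 - \frac{535^2}{8} = 205.38$$
$$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 72.798 - \frac{24.02^2}{8} = 0.678$$
$$r = \frac{8.96}{\sqrt{205.38 \times 0.678}} = 0.76$$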
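The body weight and plasma volume values are only shown as an image on the slides, so the eight data pairs below are an assumption: they are used because they reproduce the sums quoted in the calculation (Σx = 535, Σy = 24.02, Σxy = 1615.295). The sketch repeats the hand calculation step by step.

```python
import math

# ASSUMED data (the slide's table is an image): body weight (kg) and
# plasma volume (litres) of eight healthy men, matching the quoted sums
weight = [58.0, 70.0, 74.0, 63.5, 62.0, 70.5, 71.0, 66.0]   # x
plasma = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]   # y
n = len(weight)

sum_x, sum_y = sum(weight), sum(plasma)
sum_xy = sum(x * y for x, y in zip(weight, plasma))
sum_x2 = sum(x * x for x in weight)
sum_y2 = sum(y * y for y in plasma)

s_xy = sum_xy - sum_x * sum_y / n   # numerator of r
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
r = s_xy / math.sqrt(s_xx * s_yy)

print(f"Sxy = {s_xy:.2f}, Sxx = {s_xx:.2f}, Syy = {s_yy:.3f}, r = {r:.2f}")
```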
17
STRENGTH OF THE ASSOCIATION BETWEEN WEIGHT AND PLASMA VOLUME
How strong is the association? With r = 0.76, the rule of thumb classes it as a strong positive correlation.
18
Simple Linear Regression
The two quantitative variables should be defined: y refers to the dependent variable (AKA response or outcome variable) and x to the independent variable (AKA explanatory or predictor variable).
19
Simple linear regression
The objective of the analysis is to see whether a change in the independent variable, x, is associated with a change in the dependent variable, y, and to be able to predict the value of the dependent variable given the value of the independent variable. E.g. age and weight of a child under five years of age.
20
EXAMPLE Data on body weight and plasma volume of eight healthy men.
The objective of the analysis is to see whether a change in body weight is associated with a change in plasma volume.
21
ASSOCIATION BETWEEN QUANTITATIVE VARIABLES
22
Scatter Diagram of Body weight and Plasma Volume
23
Body weight and plasma volume
Plasma volume tends to increase with increasing body weight.
24
LINEAR REGRESSION When a linear relationship exists, we can summarize it by a straight line drawn through the scatter of points. Any straight line drawn on a graph can be represented by the equation y = a + bx, where y refers to the values of the response (dependent) variable and x to the values of the explanatory (independent) variable.
25
LINEAR REGRESSION The constant a is the intercept, the point at which the line crosses the y-axis, that is, the value of y when x = 0. The coefficient of x, b, is the slope of the line: it gives the average change (increase or decrease) in y associated with a one-unit increase in x. b is also called the regression coefficient.
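To see what the two coefficients mean in practice, the sketch below uses a hypothetical line y = 1.5 + 0.8x (the coefficients are invented for illustration). The value of the line at x = 0 is the intercept a, and moving one unit along the x-axis changes y by exactly b, whatever the starting value of x.

```python
A = 1.5  # hypothetical intercept: value of y when x = 0
B = 0.8  # hypothetical slope: change in y per one-unit increase in x


def line(x: float) -> float:
    """Value of y on the line y = a + bx."""
    return A + B * x


print("y at x = 0:", line(0))  # equals the intercept
for x in (2, 5, 10):
    print(f"change from x = {x} to x = {x + 1}: {line(x + 1) - line(x):.2f}")
```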
26
METHOD OF LEAST SQUARES
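A mathematical technique for fitting a straight line to a set of points, i.e. it is used to estimate a and b. It chooses the values of a and b that minimize the sum of the squared vertical distances (residuals) between the observed values of y and the line.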
27
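LINEAR REGRESSION The least squares estimates are
$$b = \frac{S_{xy}}{S_{xx}}, \qquad a = \bar{y} - b\bar{x}$$
where the numerator is
$$S_{xy} = \sum (x-\bar{x})(y-\bar{y}) = \sum xy - \frac{(\sum x)(\sum y)}{n}$$
and the denominator is
$$S_{xx} = \sum (x-\bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n}$$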
28
LINEAR REGRESSION The resultant line is called the regression line, which estimates the average value of y for a given value of x.
29
Calculating the least squares estimates. Example: data on plasma volume and body weight
30
Example
31
Example Regression line: Plasma volume = 0.0857 + 0.0436 × body weight
32
ESTIMATION Once you have the values of a and b, you can substitute values of x into the equation of the line and solve for the corresponding values of y. E.g. what would be the plasma volume of an adult weighing 62 kg? 77 kg?
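A minimal sketch of the substitution, assuming the rounded estimates a = 0.0857 and b = 0.0436 obtained by least squares from the assumed data for the eight men. The function name `predict_plasma` is hypothetical.

```python
# Rounded least squares estimates for the plasma volume example
# (ASSUMED: computed from the assumed data for the eight men)
A = 0.0857   # intercept, litres
B = 0.0436   # slope, litres per kg


def predict_plasma(weight_kg: float) -> float:
    """Estimated average plasma volume (litres) for a given body weight."""
    return A + B * weight_kg


for w in (62, 77):
    print(f"{w} kg -> {predict_plasma(w):.2f} litres")
```

This gives about 2.79 litres at 62 kg and 3.44 litres at 77 kg. In the assumed data the heaviest man weighs 74 kg, so the 77 kg estimate is an extrapolation beyond the observed range and should be treated with caution.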
33
Regression line
34
INFERENCES FOR REGRESSION COEFFICIENTS
Just like any other estimate, the regression coefficient has a standard error that can be calculated. We can test whether b differs significantly from a hypothesized value b₀ (usually 0, meaning no linear relationship) using a t test with n - 2 degrees of freedom, t = (b - b₀)/SE(b). The t value and the corresponding p-value are shown in the output table.
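A sketch of the t test for the slope in the plasma volume example, using the standard formulas SE(b) = s/√Sxx, where s² = Σ(y - a - bx)²/(n - 2) is the residual variance. The data are the assumed values for the eight men, and, to avoid external libraries, the sketch compares t with a tabulated critical value instead of computing a p-value.

```python
import math

# ASSUMED data for the eight men (the slide's own table is an image)
weight = [58.0, 70.0, 74.0, 63.5, 62.0, 70.5, 71.0, 66.0]   # x, kg
plasma = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]   # y, litres
n = len(weight)

x_bar, y_bar = sum(weight) / n, sum(plasma) / n
s_xx = sum((x - x_bar) ** 2 for x in weight)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weight, plasma))
b = s_xy / s_xx
a = y_bar - b * x_bar

# Residual standard deviation s, on n - 2 degrees of freedom
ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(weight, plasma))
df = n - 2
s = math.sqrt(ss_residual / df)

se_b = s / math.sqrt(s_xx)   # standard error of the slope
b0 = 0.0                     # hypothesized slope: no linear relationship
t = (b - b0) / se_b

# Two-sided 5% critical value of t on 6 df, taken from t tables;
# valid only for this n = 8 example
T_CRIT_6DF = 2.447
print(f"b = {b:.4f}, SE(b) = {se_b:.4f}, t = {t:.2f} on {df} df")
print("significant at the 5% level" if abs(t) > T_CRIT_6DF else "not significant at 5%")
```

Here t is about 2.86 on 6 degrees of freedom, beyond the critical value 2.447, so the slope differs significantly from zero at the 5% level. If SciPy is available, `2 * scipy.stats.t.sf(abs(t), df)` gives the two-sided p-value of the kind printed in software output.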
35
Evaluation of the model
The coefficient of determination, R², which in simple linear regression equals the square of Pearson's correlation coefficient r, is used to assess how well the model fits the data. It is the proportion of the variability among the observed values of y that is explained by the linear regression of y on x.
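A sketch showing that, for a simple linear regression, the proportion of variability explained, R² = 1 - (residual sum of squares)/(total sum of squares of y), equals r². It reuses the assumed data for the eight men from the plasma volume example.

```python
import math

# ASSUMED data for the eight men (the slide's own table is an image)
weight = [58.0, 70.0, 74.0, 63.5, 62.0, 70.5, 71.0, 66.0]   # x, kg
plasma = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]   # y, litres
n = len(weight)

x_bar, y_bar = sum(weight) / n, sum(plasma) / n
s_xx = sum((x - x_bar) ** 2 for x in weight)
s_yy = sum((y - y_bar) ** 2 for y in plasma)   # total variability of y
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weight, plasma))

b = s_xy / s_xx
a = y_bar - b * x_bar
ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(weight, plasma))

r_squared = 1 - ss_residual / s_yy   # proportion of variability explained
r = s_xy / math.sqrt(s_xx * s_yy)

print(f"R^2 = {r_squared:.3f}; r^2 = {r ** 2:.3f}")
```

With these assumed data R² is about 0.58, so roughly 58% of the variability in plasma volume is explained by its linear relationship with body weight, consistent with r = 0.76 (0.76² ≈ 0.58).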
36
Model Evaluation If, for example, R² is 0.61, this implies that about 61% of the variation among the observed values of y is explained by its linear relationship with the independent variable.
37
EXERCISE Using the provided data set (LBW babies):
Correlate birth weight with gestational age. What is the correlation coefficient between bweight and gestage? Regress birth weight on gestational age. What is the equation of the line? What is the estimated birth weight for a baby at 42 weeks of gestation? At 36 weeks? What proportion of the variability of birth weight is explained by gestational age?
38
Coefficients(a), SPSS output for Model 1 with rows (Constant) and gestage; columns: Unstandardized Coefficients (B, Std. Error), Standardized Coefficients (Beta), t, Sig.
a. Dependent Variable: birthwt