Statistics for clinicians Biostatistics course by Kevin E. Kip, Ph.D., FAHA Professor and Executive Director, Research Center University of South Florida,


Statistics for clinicians Biostatistics course by Kevin E. Kip, Ph.D., FAHA Professor and Executive Director, Research Center University of South Florida, College of Nursing Professor, College of Public Health Department of Epidemiology and Biostatistics Associate Member, Byrd Alzheimer’s Institute Morsani College of Medicine Tampa, FL, USA

SECTION 6.1 Correlation versus linear regression

Learning Outcome: Distinguish between correlation and linear regression

Correlation and regression are both measures of association.
Some terms for "association" variables:
- Variable 1: "x" variable, independent variable, predictor variable, exposure variable
- Variable 2: "y" variable, dependent variable, outcome variable

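Correlation Coefficient: Computation Form
Pearson correlation ("r"):
$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{(n-1)\,s_x s_y}$$
where $\bar{x}$ and $\bar{y}$ are the sample means of X and Y, and $s_x$ and $s_y$ are the sample standard deviations of X and Y. The numerator, the sum of products of deviations from the means, measures the co-variation of X and Y.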
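The computation form can be checked with a short sketch. It is a minimal illustration in plain Python; the function name `pearson_r` and the five data pairs are hypothetical and chosen only so the arithmetic is easy to follow by hand.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: co-variation divided by (n - 1) * s_x * s_y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample standard deviations (divisor n - 1)
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    # Co-variation: sum of products of deviations from the means
    co_variation = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return co_variation / ((n - 1) * s_x * s_y)

# Hypothetical toy data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 3))  # 0.775
```

Here the co-variation is 6, and the result equals $6/\sqrt{10 \times 6}$; like any Pearson r it always lies between −1 and +1.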

Introduction to Linear Regression
Like correlation, the data are pairs of independent ("X") and dependent ("Y") variables, $\{(x_i, y_i): i = 1, \dots, n\}$. Here, however, we seek to predict values of Y from X. The fitted equation is written
$$\hat{y} = b_0 + b_1 x$$
where $\hat{y}$ is the predicted value of the response (e.g., blood pressure) obtained from the equation. This line best represents the association between the independent variable and the dependent variable. The residuals are the differences between the observed and the predicted values: $\{(y_i - \hat{y}_i): i = 1, \dots, n\}$.

Introduction to Linear Regression
[Figure: scatter plot (r = 0.76) with the best-fitting line, which minimizes the distance between predicted and actual values]

Introduction to Linear Regression
$$\hat{y} = b_0 + b_1 x$$
- $\hat{y}$ = predicted value of the response (outcome) variable
- $b_0$ = constant: the intercept (the value of $\hat{y}$ when $x = 0$)
- $b_1$ = constant: the slope of the regression line, i.e., the expected change in $y$ for a one-unit change in $x$
- $x_i$ = value of the independent (predictor) variable for subject $i$

Note: unlike the correlation coefficient, which is bounded between −1 and +1, the slope $b_1$ is unbounded.

SECTION 6.2 Least squares regression and predicted values

Learning Outcomes:
- Describe the theoretical basis of least squares regression
- Calculate and interpret predicted values from a linear regression model

Introduction to Linear Regression
$$\hat{y} = b_0 + b_1 x$$
In the above equation, the values of the slope ($b_1$) and intercept ($b_0$) define the line that best predicts Y from X. More precisely, regression chooses $b_0$ and $b_1$ to minimize the sum of the squared vertical distances of the points from the line, i.e., to minimize $\sum_i (y_i - \hat{y}_i)^2$. This criterion gives the method its name: "least squares" regression.
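A minimal sketch of the least squares idea on raw data follows. It assumes the standard closed-form solution $b_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $b_0 = \bar{y} - b_1\bar{x}$ (algebraically the same as the $r\,s_y/s_x$ form used on the next slide); the function names and toy data are hypothetical.

```python
def fit_least_squares(x, y):
    """Return (b0, b1) minimizing the sum of squared vertical distances to the line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares of x, both about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def sum_squared_residuals(x, y, b0, b1):
    """Sum of (y_i - y-hat_i)^2 for the line y-hat = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical toy data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_least_squares(x, y)
best = sum_squared_residuals(x, y, b0, b1)
print(round(b0, 2), round(b1, 2), round(best, 2))  # 2.2 0.6 2.4

# Moving the slope or the intercept away from the least squares values increases the sum
assert sum_squared_residuals(x, y, b0, b1 + 0.1) > best
assert sum_squared_residuals(x, y, b0 + 0.1, b1) > best
```

The two assertions at the end illustrate the defining property: any other line through these points has a larger sum of squared residuals.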

Least squares estimates:
$$b_1 = r\,\frac{s_y}{s_x}, \qquad b_0 = \bar{Y} - b_1\bar{X}$$
Example: We wish to estimate total cholesterol level (y) from BMI (x). Assume $r_{xy} = 0.78$; $\bar{Y} = 205.9$, $s_y = 30.8$; $\bar{X} = 27.4$, $s_x = 3.7$.
$$b_1 = r\,\frac{s_y}{s_x} = 0.78 \times \frac{30.8}{3.7} = 6.49$$
$$b_0 = \bar{Y} - b_1\bar{X} = 205.9 - 6.49(27.4) = 28.07$$
The equation of the regression line is: $\hat{y} = 28.07 + 6.49(\text{BMI})$
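The two formulas can be applied directly to summary statistics. This is a sketch; the function name is hypothetical, and the slope is rounded to two decimals before computing the intercept to mirror the hand calculation above (using the unrounded slope would shift the intercept slightly).

```python
def slope_intercept_from_summary(r, mean_x, sd_x, mean_y, sd_y, ndigits=2):
    """Least squares (b0, b1) from r, the means, and the standard deviations."""
    # Slope: b1 = r * (s_y / s_x); rounded first, as in the hand calculation
    b1 = round(r * sd_y / sd_x, ndigits)
    # Intercept: b0 = mean(Y) - b1 * mean(X), so the line passes through (mean X, mean Y)
    b0 = round(mean_y - b1 * mean_x, ndigits)
    return b0, b1

# Total cholesterol (y) from BMI (x), using the summary statistics in the example
print(slope_intercept_from_summary(0.78, 27.4, 3.7, 205.9, 30.8))  # (28.07, 6.49)
```

The same call with the systolic blood pressure practice values (r = 0.46, X̄ = 26.6, s_x = 3.5, Ȳ = 133.8, s_y = 18.4) gives (69.43, 2.42).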

Least squares estimates: (Practice)
$$b_1 = r\,\frac{s_y}{s_x}, \qquad b_0 = \bar{Y} - b_1\bar{X}$$
Example: We wish to estimate systolic blood pressure (y) from BMI (x). Assume $r_{xy} = 0.46$; $\bar{Y} = 133.8$, $s_y = 18.4$; $\bar{X} = 26.6$, $s_x = 3.5$.
$b_1 = r\,s_y/s_x =$ ______
$b_0 = \bar{Y} - b_1\bar{X} =$ ______
The equation of the regression line is: $\hat{y} =$ ______

Least squares estimates: (Practice)
$$b_1 = r\,\frac{s_y}{s_x}, \qquad b_0 = \bar{Y} - b_1\bar{X}$$
Example: We wish to estimate systolic blood pressure (y) from BMI (x). Assume $r_{xy} = 0.46$; $\bar{Y} = 133.8$, $s_y = 18.4$; $\bar{X} = 26.6$, $s_x = 3.5$.
$$b_1 = r\,\frac{s_y}{s_x} = 0.46 \times \frac{18.4}{3.5} = 2.42$$
$$b_0 = \bar{Y} - b_1\bar{X} = 133.8 - 2.42(26.6) = 69.43$$
The equation of the regression line is: $\hat{y} = 69.43 + 2.42(\text{BMI})$

Least squares estimates: (Practice)
The equation of the regression line is: $\hat{y} = 69.43 + 2.42(\text{BMI})$
Predict systolic blood pressure for the following 3 individuals:
- Person 1 has a BMI of 26.4
- Person 2 has a BMI of 28.9
- Person 3 has a BMI of 34.8

$\hat{y}_1 =$ ______
$\hat{y}_2 =$ ______
$\hat{y}_3 =$ ______

Least squares estimates: (Practice)
The equation of the regression line is: $\hat{y} = 69.43 + 2.42(\text{BMI})$
Predict systolic blood pressure for the following 3 individuals:
- Person 1 has a BMI of 26.4
- Person 2 has a BMI of 28.9
- Person 3 has a BMI of 34.8

$\hat{y}_1 = 69.43 + 2.42(26.4) = 133.3$
$\hat{y}_2 = 69.43 + 2.42(28.9) = 139.4$
$\hat{y}_3 = 69.43 + 2.42(34.8) = 153.6$
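Predicted values are a single multiply-and-add. The sketch below assumes the fitted line $\hat{y} = 69.43 + 2.42(\text{BMI})$, with the intercept computed as 133.8 − 2.42 × 26.6; the function name `predict_y` is hypothetical.

```python
def predict_y(b0, b1, x):
    """Predicted value y-hat = b0 + b1 * x from a fitted simple linear regression."""
    return b0 + b1 * x

# Systolic blood pressure regressed on BMI (practice example)
b0, b1 = 69.43, 2.42
for bmi in (26.4, 28.9, 34.8):
    print(bmi, round(predict_y(b0, b1, bmi), 1))
# 26.4 133.3
# 28.9 139.4
# 34.8 153.6

# A one-unit increase in BMI changes the prediction by the slope b1 (about 2.42 mm Hg)
print(round(predict_y(b0, b1, 27.4) - predict_y(b0, b1, 26.4), 2))  # 2.42
```

The last line restates the interpretation of $b_1$: the expected change in y for a one-unit change in x.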

SECTION 6.3 Assumptions and sources of variation in linear regression

Learning Outcomes:
- Describe the assumptions required for valid use of the linear regression model
- Describe the partitioning of sum of squares in the linear regression model

Introduction to Linear Regression
Some assumptions for linear regression:
- The dependent variable Y has a linear relationship to the independent variable X.
- The errors are approximately normally distributed; in practice this is checked by examining whether the residuals (or the dependent variable) are approximately normally distributed.
- The errors are independent (no serial correlation).
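One common way to check the "no serial correlation" assumption, not named on the slide, is the Durbin–Watson statistic computed from residuals listed in the order the data were collected. This is a minimal sketch; the residual values are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals in collection (e.g., time) order.

    Ranges from 0 to 4: values near 2 suggest no lag-1 serial correlation,
    values near 0 suggest positive and values near 4 negative serial correlation.
    """
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0 (alternating signs: negative correlation)
print(durbin_watson([0.5, 0.5, 0.5, 0.5]))    # 0.0 (identical neighbors: positive correlation)
```

Statistical packages usually report this statistic alongside the model fit; the sketch only shows what the number measures.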

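[Figure: Y plotted against age with the fitted regression line; R = 0.597]

[Figure: data table (ID, X, Y) with scatter plot; R = 0.573]

[Figure: data table (ID, X, Y) with log-transformed Y (LOG_Y)]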

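Fundamental Equations for Regression
Coefficient of determination ($R^2$): the proportion of variation in Y "explained" by the regression on X.
$$R^2 = \frac{\text{explained variation}}{\text{total variation}} = \frac{\text{SSR}}{\text{SST}} = 1 - \frac{\text{SSE}}{\text{SST}}$$
where $\text{SST} = \sum_i (y_i - \bar{y})^2$ is the total sum of squares, $\text{SSR} = \sum_i (\hat{y}_i - \bar{y})^2$ is the regression sum of squares, and $\text{SSE} = \sum_i (y_i - \hat{y}_i)^2$ is the error (residual) sum of squares, so that $\text{SST} = \text{SSR} + \text{SSE}$.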
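The partition of the total sum of squares can be reproduced with a short sketch. It fits the least squares line to hypothetical toy data and then computes the three sums exactly as defined above; the function name is an assumption.

```python
def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) after fitting the least squares line y-hat = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    b1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
          / sum((xi - mean_x) ** 2 for xi in x))
    b0 = mean_y - b1 * mean_x
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total variation
    ssr = sum((yh - mean_y) ** 2 for yh in y_hat)           # explained by the regression
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # error (residual)
    return sst, ssr, sse

# Hypothetical toy data (n = 5, so df_T = 4, df_R = 1, df_E = 3)
sst, ssr, sse = sums_of_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(sst, 2), round(ssr, 2), round(sse, 2))  # 6.0 3.6 2.4
print(round(ssr / sst, 2), round(1 - sse / sst, 2))  # 0.6 0.6
```

Both expressions for $R^2$ agree because SST = SSR + SSE for a least squares fit with an intercept; in simple linear regression this $R^2$ also equals $r^2$.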

Example: Fundamental Equations for Regression
[Table: ID, Y, and X for each subject, with N, mean, SD, and sum]
Fitted line: $\hat{y} = b_0 + b_1 x$; $r = 0.42$

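Example: Fundamental Equations for Regression
[Table: ID, Y, X, predicted $\hat{Y}$, and the squared deviations for the total (T), regression (R), and error (E) components, with N, mean, SD, and sum]
- SST = 132, $df_T = 11$
- SSR = 23, $df_R = 1$
- SSE = 109, $df_E = 10$
- $R = 0.42$; $R^2 = \text{SSR}/\text{SST} \approx 0.18$ (equal to $0.42^2$, up to rounding of the sums of squares)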

Practice: Fundamental Equations for Regression
Complete the entries in the table below to determine SST, SSR, SSE, and $R^2$.
[Table: ID, Y, X, predicted $\hat{Y}$, and the squared deviations for the total (T), regression (R), and error (E) components, with N, mean, and sum]
- SST = ______, $df_T$ = ______
- SSR = ______, $df_R$ = ______
- SSE = ______, $df_E$ = ______
- $R^2 = \text{SSR}/\text{SST}$ = ______

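Practice: Fundamental Equations for Regression
[Table: ID, Y, X, predicted $\hat{Y}$, and the squared deviations for the total (T), regression (R), and error (E) components, with N, mean, and sum]
- SST = 80.5, $df_T = 9$
- SSR = 19.1, $df_R = 1$
- SSE = 61.4, $df_E = 8$
- $R^2 = \text{SSR}/\text{SST} = 19.1/80.5 = 0.24$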

SECTION 6.4 Multiple linear regression model

Learning Outcome: Calculate and interpret predicted values from the multiple regression model

Multiple Linear Regression
- Extension of simple linear regression to assess the association between 2 or more independent variables and a single continuous dependent variable.
- The multiple linear regression equation is $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$.
- Each regression coefficient $b_j$ represents the expected change in y for a one-unit change in the corresponding independent variable $x_j$, holding the remaining independent variables constant.
- The $R^2$ from the multiple linear regression model represents the proportion (percentage) of variation in the dependent variable "explained" by the set of predictors.
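To make the multiple regression equation concrete, here is a dependency-free sketch that estimates $b_0, \dots, b_p$ by least squares, solving the normal equations $(A^\top A)\,b = A^\top y$ with Gauss–Jordan elimination. The function names, the singularity tolerance, and the toy data (generated exactly from $y = 1 + 2x_1 + 3x_2$) are all hypothetical; in practice a statistics package does this step.

```python
def fit_multiple_regression(X, y):
    """Least squares coefficients [b0, b1, ..., bp] for y-hat = b0 + b1*x1 + ... + bp*xp.

    X is a list of rows; each row holds the p predictor values for one subject.
    """
    # Design matrix: a leading column of 1s carries the intercept b0
    A = [[1.0] + [float(v) for v in row] for row in X]
    n, p = len(A), len(A[0])
    # Augmented normal-equation matrix [A^T A | A^T y]
    M = [
        [sum(A[r][i] * A[r][j] for r in range(n)) for j in range(p)]
        + [sum(A[r][i] * y[r] for r in range(n))]
        for i in range(p)
    ]
    for col in range(p):
        # Partial pivoting: bring the row with the largest entry in this column up
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        if abs(M[col][col]) < 1e-12:  # hypothetical tolerance for a singular system
            raise ValueError("predictors are collinear; the normal equations are singular")
        # Eliminate this column from every other row
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    # M is now diagonal; divide through to read off each coefficient
    return [M[i][p] / M[i][i] for i in range(p)]


def predict_multiple(b, x):
    """y-hat = b0 + b1*x1 + ... + bp*xp for one subject's predictor values x."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))


# Hypothetical toy data generated exactly from y = 1 + 2*x1 + 3*x2
X = [[1, 0], [2, 1], [3, 1], [4, 3], [5, 2], [6, 5]]
y = [3, 8, 10, 18, 17, 28]
b = fit_multiple_regression(X, y)
print([round(c, 6) for c in b])  # [1.0, 2.0, 3.0]
```

Because each coefficient is "per unit, holding the others constant", `predict_multiple(b, [4, 2]) - predict_multiple(b, [3, 2])` equals $b_1 = 2$ regardless of the value at which $x_2$ is held.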

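Multiple Linear Regression Example: Predictors of systolic blood pressure
[Table: regression coefficient, t statistic, and p-value for each independent variable: intercept ($b_0$), BMI per 1 unit ($b_1$), age in years ($b_2$), male gender ($b_3$), treatment for hypertension ($b_4$)]
$$\hat{y} = b_0 + b_1(\text{BMI}) + b_2(\text{age}) + b_3(\text{male}) + b_4(\text{tx-hypertension})$$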

Practice: Estimate systolic blood pressure for the following persons:
[Table: regression coefficient, t statistic, and p-value for the intercept, BMI (per 1 unit), age (in years), male gender (1=yes), and treatment for hypertension (1=yes)]
- Person 1: BMI=27.9; age=54; female; on treatment for hypertension
- Person 2: BMI=34.9; age=66; male; on treatment for hypertension
- Person 3: BMI=24.8; age=47; female; not on treatment for hypertension

$\hat{y}_1 =$ ______
$\hat{y}_2 =$ ______
$\hat{y}_3 =$ ______

Practice: Estimate systolic blood pressure for the following persons:
[Table: regression coefficients $b_0$ (intercept), $b_1$ (BMI, per 1 unit), $b_2$ (age, in years), $b_3$ (male gender, 1=yes), and $b_4$ (treatment for hypertension, 1=yes), with t statistics and p-values]
- Person 1: BMI=27.9; age=54; female; on treatment for hypertension
- Person 2: BMI=34.9; age=66; male; on treatment for hypertension
- Person 3: BMI=24.8; age=47; female; not on treatment for hypertension

$\hat{y}_1 = b_0 + b_1(27.9) + b_2(54) + b_3(0) + b_4(1)$
$\hat{y}_2 = b_0 + b_1(34.9) + b_2(66) + b_3(1) + b_4(1)$
$\hat{y}_3 = b_0 + b_1(24.8) + b_2(47) + b_3(0) + b_4(0) = 113.1$
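The practice hinges on turning each description into the 0/1 indicator values that enter the equation. The sketch below shows that encoding; the function names are hypothetical, and no coefficient values are supplied, since those must come from the fitted model's output table.

```python
def encode_person(bmi, age, sex, on_treatment):
    """Predictor values [BMI, age, male, tx-hypertension] with 0/1 indicators.

    male = 1 for male, 0 for female; tx-hypertension = 1 if on treatment, else 0.
    """
    male = 1 if sex == "male" else 0
    tx = 1 if on_treatment else 0
    return [bmi, age, male, tx]

def predict_sbp(coefs, person):
    """coefs = [b0, b_bmi, b_age, b_male, b_tx], taken from the fitted model output."""
    b0, *slopes = coefs
    return b0 + sum(b * v for b, v in zip(slopes, person))

persons = [
    encode_person(27.9, 54, "female", True),
    encode_person(34.9, 66, "male", True),
    encode_person(24.8, 47, "female", False),
]
print(persons)  # [[27.9, 54, 0, 1], [34.9, 66, 1, 1], [24.8, 47, 0, 0]]
```

With a reference group of female, untreated subjects, the male and treatment coefficients simply add on (or not) depending on the indicator values.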

Framingham Risk Calculation (10-Year Risk):
- Dependent variable: 10-year risk of CVD
- Independent variables: age, gender, total cholesterol, HDL cholesterol, smoker, systolic BP, on medication for BP

SECTION 6.5 SPSS for linear regression analysis

Learning Outcome: Analyze and interpret linear regression models using SPSS

SPSS: Analyze > Regression > Linear
Dependent Variable; Independent Variable(s)
Statistics:
---Estimates
---Confidence intervals
---Model fit
---Partial correlations
---Descriptives
Example: Dependent variable: HDL cholesterol; Independent variable: BMI

$\hat{y} = b_0 - 0.442(\text{BMI})$, where $b_0$ is the intercept from the SPSS output

SPSS: Analyze > Regression > Linear
Dependent Variable; Independent Variable(s)
Statistics:
---Estimates
---Confidence intervals
---Model fit
---Partial correlations
---Descriptives
Example: Dependent variable: HDL cholesterol; Independent variable(s): BMI, gender (1=male, 2=female)

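$\hat{y} = b_0 - 0.481(\text{BMI}) + b_2(\text{female})$, where $b_0$ and $b_2$ are the intercept and the female-gender coefficient from the SPSS output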

SPSS: Analyze > Regression > Linear
Dependent Variable; Independent Variable(s)
Statistics:
---Estimates
---Confidence intervals
---Model fit
---Partial correlations
---Descriptives
Example: Dependent variable: HDL cholesterol; Independent variable(s): BMI, gender, age

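$\hat{y} = b_0 - 0.464(\text{BMI}) + b_2(\text{female}) + b_3(\text{age})$, where $b_0$, $b_2$, and $b_3$ are the intercept, female-gender, and age coefficients from the SPSS output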

Practice: Estimate HDL cholesterol levels for the following persons:
- Person 1: BMI=25.7; female; age=60
- Person 2: BMI=36.9; male; age=66
- Person 3: BMI=31.8; female; age=51

$\hat{y}_1 =$ ______
$\hat{y}_2 =$ ______
$\hat{y}_3 =$ ______

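Practice: Estimate HDL cholesterol levels for the following persons:
- Person 1: BMI=25.7; female; age=60
- Person 2: BMI=36.9; male; age=66
- Person 3: BMI=31.8; female; age=51

Using $\hat{y} = b_0 - 0.464(\text{BMI}) + b_2(\text{female}) + b_3(\text{age})$, with $b_0$, $b_2$, and $b_3$ taken from the SPSS output:
$\hat{y}_1 = b_0 - 0.464(25.7) + b_2(1) + b_3(60) = 51.8$
$\hat{y}_2 = b_0 - 0.464(36.9) + b_2(0) + b_3(66) = 36.9$
$\hat{y}_3 = b_0 - 0.464(31.8) + b_2(1) + b_3(51) = 47.5$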