Biostatistics course Part 16 Linear regression Dr. Sc. Nicolas Padilla Raygoza Department of Nursing and Obstetrics, Division of Health Sciences and Engineering.

Biostatistics course Part 16 Linear regression Dr. Sc. Nicolas Padilla Raygoza Department of Nursing and Obstetrics, Division of Health Sciences and Engineering, Campus Celaya Salvatierra, University of Guanajuato

Biosketch Medical Doctor, Autonomous University of Guadalajara. Pediatrician certified by the Mexican Council of Certification in Pediatrics. Postgraduate Diploma in Epidemiology, London School of Hygiene and Tropical Medicine, University of London. Master of Science with emphasis in Epidemiology, Atlantic International University. Doctor of Science with emphasis in Epidemiology, Atlantic International University. Associate Professor B, Department of Nursing and Obstetrics, Division of Health Sciences and Engineering, University of Guanajuato, Campus Celaya Salvatierra, Mexico.

Competencies The reader will know how to plot a regression line, will apply a hypothesis test to a regression line, and will know how to perform an analysis of variance (ANOVA).

Introduction When we think that one variable depends on another, we need to quantify the relationship between them. Having done so, we can estimate the value of one variable if we know the value of the other. This method is called regression.

Linear regression The scatter plot shows the relationship between age and systolic blood pressure in 37 women. Blood pressure changes with age.

Plotting a regression line Our objective is to draw the line that best describes the relationship between X and Y. You could draw a line through the points by eye with a ruler, but you are unlikely to get a unique line, and each line gives a different description of the relationship between X and Y.

Plotting a regression line Each vertical distance is the difference between the observed value of the dependent variable (on the y-axis) and the value on the line at the corresponding x value. The vertical distance between an observed point and the fitted line is known as a residual. We label each residual e, for example e1 for the first point.

Plotting a regression line The line that best describes the data is known as the regression line. It gives an estimate of the average value of y for each value of x. In general, we say that it is a regression of y on x. We can think of the regression line as a line joining the mean values of y for each value of x.

Plotting a regression line The equation of the regression line is: y = α + βx, where α is the intercept (the point where the line crosses the y-axis) and β is the slope of the line. The least-squares regression line is the best-fitting line: its intercept and slope are the values that minimize the sum of the squared residuals.
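The least-squares fit described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the software used in the course: the function names `least_squares_line` and `residuals` and the four data points are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def least_squares_line(xs, ys):
    """Return (a, b), the intercept and slope of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-products of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return a, b


def residuals(xs, ys, a, b):
    """Vertical distances between each observed y and the fitted line."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical data, for illustration only (not taken from the slides).
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 9]

a, b = least_squares_line(xs, ys)
res = residuals(xs, ys, a, b)
print(f"a = {a:.2f}, b = {b:.2f}")
print("residuals:", [round(e, 2) for e in res])
```

A useful check on any least-squares fit with an intercept is that the residuals sum to zero, apart from rounding error.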

Plotting a regression line We can work out the slope of the line by taking two points on the line and dividing the change in y by the change in x. For example, take points 1 and 2 in the chart. Point 1 has the values x = 4, y = 16. Point 2 has the values x = 8, y = […].

Plotting a regression line The first graph shows three lines with a fixed intercept, a = 10, and different values of the slope b. The second graph shows lines with a fixed value of b and different values of a.

Interpreting a regression line Once we have obtained the regression line, we can use it to summarize the relationship between the explanatory (independent) and response (dependent) variables. We can say: for a one-unit increase in x, y increases by a certain value (the value of b). y = a + bx

Interpreting a regression line y = […] + […]x

Inferences from a regression line So far we have only described the relationship between two variables with a regression line, where a (the intercept) and b (the slope) are estimated from the sample data. The regression equation describing the relationship between the two variables in the population is written: y = α + βx. Thus, a is an estimate of α and b is an estimate of β.
            Population   Sample
Intercept   α            a
Slope       β            b

Inferences from a regression line The regression line gives an estimate of the relationship between two variables, x and y, in the population. In the same way that we used sample findings to make inferences about means and proportions, we use the regression line to draw conclusions about the relationship between two variables in the population. If we take several samples from the population, for each sample we can obtain a regression line by the method of least squares. In the population there is one linear relationship between the two variables, and the line obtained from each sample can be slightly different.

Inferences from a regression line In the sample, y = a + bx. In the population, y = α + βx. Three assumptions underlie the linear regression method: 1. The response variable, y, has a normal distribution for each value of x. 2. The variability of y is the same for all values of x. 3. The relationship between x and y is linear.

Inferences from a regression line The slope b is of fundamental interest in regression analysis. It gives us the most important information about the relationship between x and y, that is, the average change in y for a one-unit change in x. Once we have obtained the standard error of b, we can calculate confidence intervals and test hypotheses about the slope.
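The slides do not give a formula for the standard error of b. One standard way to obtain it in simple linear regression is SE(b) = sqrt(residual mean square / Sxx), where the residual mean square is the residual sum of squares divided by n - 2 and Sxx is the sum of squared deviations of x from its mean. The sketch below assumes that formula; the function name `slope_and_standard_error` and the data are hypothetical.

```python
import math


def slope_and_standard_error(xs, ys):
    """Return (b, se_b): the least-squares slope and its standard error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual sum of squares around the fitted line.
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Residual mean square uses n - 2 degrees of freedom.
    residual_mean_square = rss / (n - 2)
    se_b = math.sqrt(residual_mean_square / sxx)
    return b, se_b


# Hypothetical data, for illustration only (not taken from the slides).
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 9]

slope, se_slope = slope_and_standard_error(xs, ys)
print(f"b = {slope:.3f}, SE(b) = {se_slope:.3f}")
```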

Example The regression equation for the relationship between height and gestational age is: Height = 97.9 + 0.215 x gestational age at birth

Example When these data were analyzed with a computer program, the following values for the intercept, the slope and their standard errors were obtained: a = 97.9, b = 0.215, SE(a) = 3.20, SE(b) = […]. Note that the equation gives a height of 97.9 cm when gestational age is 0. Is this possible?

Confidence intervals for b The graph suggests a reasonably linear relationship between height and gestational age at birth. Is this just a consequence of the value of b that we happened to obtain in these 21 children? We can calculate a confidence interval for b to obtain a range of values that we can be confident contains the true slope β. A 95% confidence interval for the slope is computed using the t distribution: b ± t0.05 x SE(b), where t0.05 is the value from the table of the t distribution with n - 2 degrees of freedom at the 0.05 level.

Confidence intervals for b For the relationship between height and gestational age: b = 0.215, n - 2 = 21 - 2 = 19, t19, 0.05 = 2.093, SE(b) = […]. The 95% confidence interval for b is then: […] to […]. This suggests that the true slope in the population is not zero.
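The interval b ± t x SE(b) is simple to compute once the three quantities are known. The sketch below uses b = 0.215 and t = 2.093 from the example. The value of SE(b) does not survive in this transcript, so `se_b = 0.08` is an assumed value used only to show the arithmetic, and the resulting limits should not be read as the ones on the original slide. The function name is hypothetical.

```python
def slope_confidence_interval(b, se_b, t_crit):
    """Return (lower, upper) limits of the interval b ± t_crit * se_b."""
    margin = t_crit * se_b
    return b - margin, b + margin


b = 0.215       # slope from the example
t_crit = 2.093  # t table value for 19 degrees of freedom at the 0.05 level
se_b = 0.08     # ASSUMED for illustration; the value is not given in the transcript

lower, upper = slope_confidence_interval(b, se_b, t_crit)
print(f"95% CI for b: {lower:.3f} to {upper:.3f}")
print("interval excludes zero:", lower > 0 or upper < 0)
```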

Hypothesis test for b We can also test hypotheses about the true slope β, the slope of the linear relationship between the two variables in the population. Null hypothesis: the slope in the population is zero. This is what we imply when we say that there is no linear relationship between height and gestational age. H0: β = 0. Alternative hypothesis: the slope in the population is not zero. If this is true, we can say that there is a linear relationship between height and gestational age. H1: β ≠ 0.

Hypothesis test for b To test the null hypothesis, we divide the estimate b by its standard error and refer the result to the t distribution with n - 2 degrees of freedom: t = b / SE(b). In this example, b = 0.215 and SE(b) = […]. Referring to the table of the t distribution with (n - 2) = (21 - 2) = 19 degrees of freedom, the p-value is 0.01 < P < 0.02. What do we conclude from this result? We reject the null hypothesis and say that there is evidence that the slope of the relationship between height and gestational age in the population is not zero.
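The test statistic and the table lookup can be sketched as follows. The slope b = 0.215 and the 19 degrees of freedom come from the example. `se_b = 0.08` is an assumed value, because SE(b) is not given in this transcript. The two critical values are the standard two-sided t table entries for 19 degrees of freedom at the 0.02 and 0.01 levels. The function name is hypothetical.

```python
def slope_t_statistic(b, se_b):
    """t statistic for H0: beta = 0, referred to t with n - 2 degrees of freedom."""
    return b / se_b


# Two-sided critical values of the t distribution with 19 degrees of freedom.
T19_AT_P_002 = 2.539
T19_AT_P_001 = 2.861

b = 0.215    # slope from the example
se_b = 0.08  # ASSUMED for illustration; the value is not given in the transcript

t_stat = slope_t_statistic(b, se_b)
# If t lies between the two table values, the p-value lies between 0.01 and 0.02.
p_between_001_and_002 = T19_AT_P_002 < t_stat < T19_AT_P_001
print(f"t = {t_stat:.4f}; 0.01 < P < 0.02: {p_between_001_and_002}")
```

Bracketing the statistic between two table values is how a printed t table is read. Statistical software would report the exact p-value instead.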

Analysis of variance (ANOVA) Evaluating a regression analysis involves comparing the variation of the residuals with the variation in the data explained by the regression line. This comparison can be displayed in an analysis of variance table. The analysis is called ANOVA.

Analysis of variance (ANOVA) Regression The graph shows the relationship between x and y for four points. We draw the regression line and examine the different parts of the variation in order to evaluate the regression. [Figure: four points with the regression line, the line of the null hypothesis, and the residuals used for the total sum of squares.]

Analysis of variance (ANOVA) The difference between the total sum of squares and the sum of squares of the residuals (the variation that remains after a line has been drawn through the points) is the variation explained by the regression of y on x. In the example, the sum of squares of the residuals is 4 and the total sum of squares is 49.

Analysis of variance (ANOVA) What is the regression sum of squares? The fitted regression line explains a proportion of the variability in the response variable, while the residuals show the amount of variability that remains unexplained. A regression line that describes the data well and explains most of the variation is preferable.

Analysis of variance (ANOVA) The sums of squares show how much of the variation is explained by the regression line and how much is left unexplained in the residuals. This is set out in an ANOVA table.

Analysis of variance (ANOVA) table
Source       Sum of squares   Degrees of freedom   Mean sum of squares   F      p-value
Regression   45               1                    45                    22.5   […]
Residual     4                2                    2
Total        49               3
The approach of analysis of variance is to compare the two sources of variation (regression and residual) to find out which better explains the variation in the response variable. To do this, we use a test that compares the regression variation with the residual variation, known as the F test.

Analysis of variance (ANOVA) The reason for using an F test is that the ratio of two variances has a sampling distribution known as the F distribution. The sum of squares due to the regression line has 1 degree of freedom. The sum of squares due to the residual (unexplained) variation has n - 2 degrees of freedom. To take the degrees of freedom into account, we calculate the mean sum of squares by dividing each sum of squares by its degrees of freedom. Mean sum of squares = sum of squares / degrees of freedom

Analysis of variance (ANOVA) We calculate F as the ratio of the mean sums of squares: F = mean sum of squares (regression) / mean sum of squares (residual) = 45 / 2 = 22.5. The F test based on the ANOVA is an alternative way to test the null hypothesis β = 0. The F statistic equals the square of the t statistic for the slope b, so the F test and the t test both test the null hypothesis that x has no linear relationship with y. The value of F is referred to tables of the F distribution with 1 and n - 2 degrees of freedom to obtain the corresponding p-value. p = […]
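The ANOVA arithmetic for the four-point example can be reproduced as a short sketch. The inputs (total sum of squares 49, residual sum of squares 4, n = 4) come from the example; the function name `regression_anova` is hypothetical. The p-value line relies on a closed form that holds only for 1 and 2 degrees of freedom, which happens to be the case here: because F is the square of a t statistic with 2 degrees of freedom, p = 1 - sqrt(F / (F + 2)). For other degrees of freedom, tables or statistical software are needed.

```python
import math


def regression_anova(total_ss, residual_ss, n):
    """Build the ANOVA quantities for a simple linear regression."""
    regression_ss = total_ss - residual_ss  # variation explained by the line
    regression_df = 1
    residual_df = n - 2
    regression_ms = regression_ss / regression_df
    residual_ms = residual_ss / residual_df
    return {
        "regression_ss": regression_ss,
        "regression_df": regression_df,
        "residual_df": residual_df,
        "regression_ms": regression_ms,
        "residual_ms": residual_ms,
        "f": regression_ms / residual_ms,
    }


table = regression_anova(total_ss=49, residual_ss=4, n=4)
f_value = table["f"]

# Closed form valid ONLY for an F distribution with 1 and 2 degrees of freedom.
assert table["regression_df"] == 1 and table["residual_df"] == 2
p_value = 1 - math.sqrt(f_value / (f_value + 2))

print(f"regression SS = {table['regression_ss']}, F = {f_value}")
print(f"p = {p_value:.3f}")
```

With only four points the residual degrees of freedom are very small, so even a large F gives a p-value that is only modestly below 0.05.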

Analysis of variance (ANOVA) What do we conclude from the value of p? The p-value tells us the probability of observing a linear relationship at least as strong as the one in the sample if the null hypothesis were true and there were no linear relationship in the population. Thus, for a low p-value we reject the null hypothesis and say that there is a linear relationship in the population and that the regression line fits the data well.

Analysis of variance (ANOVA) R² We have worked through almost all the terms of an ANOVA table. It only remains to calculate the percentage of the total variation explained by the regression line. This is a general way of assessing how well a regression line fits the data: how much of the total variation in the response variable can be explained by the regression line? We call this value R², and it is calculated as the regression sum of squares divided by the total sum of squares: R² = (regression sum of squares / total sum of squares) x 100
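Applied to the four-point example, the R² formula gives the following. The regression sum of squares, 45, is the total sum of squares (49) minus the residual sum of squares (4); the function name is hypothetical.

```python
def r_squared_percent(regression_ss, total_ss):
    """Percentage of the total variation explained by the regression line."""
    return regression_ss / total_ss * 100


total_ss = 49
residual_ss = 4
regression_ss = total_ss - residual_ss  # 45

r2 = r_squared_percent(regression_ss, total_ss)
print(f"R² = {r2:.1f}%")
```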

When is it valid to use regression? Assumptions for the regression Remember the assumptions underlying the linear regression method: the response variable must be normally distributed for each value of x; the variability in y should be the same across all values of x; there should be a linear relationship between x and y.

When is it valid to use regression? Precautions It is possible to fit a regression line to any scatter of points, but linear regression should be applied only where there is a linear relationship. A linear association between two variables does not mean that one causes the other. It may be necessary to adjust for potential confounders.
