Lecture 3: Bivariate Data & Linear Regression

Presentation transcript:

Lecture 3: Bivariate Data & Linear Regression
1. Introduction
2. Bivariate Data
3. Linear Analysis of Data
   a) Freehand Linear Fit
   b) Least Squares Fit
   c) Interpolation/Extrapolation
4. Correlation

1. Introduction: Motivation
Visual analysis of data, though useful, is limited. We use other methods to determine whether the relationship between two measurements is strong enough that we might not need to make both measurements.
– In many fish, there is a very strong relationship between length and body mass, so measuring the length alone can provide a very good estimate of the individual’s weight.
– Organisms with determinate growth patterns may also have a very strong relationship between their size and age, so measurements of individual size may be useful in estimating an individual’s age.
– In these cases, an objective is to determine an equation that allows you to estimate one measurement from another measurement that might be easier to make.

1. Introduction: Linear Relationships
A linear relationship between two measurements is the simplest description of a strong relationship.
– Knowing one measurement allows you to find the other by looking at the associated point on the line.
Linear relationships arise in many biological situations, particularly if the data are rescaled (see ch. 4) to account for factors that interact in a non-linear manner.
Recall the slope-intercept form for a line, $y = mx + b$, where $m$ is the slope and $b$ is the y-intercept. Given an equation we can draw its line; here we will need to go in the other direction, from points to the equation.

2. Bivariate Data
Bivariate data are measurements made on two variables per observation. They are often displayed as ordered pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ or in a table:

x | $x_1$ | $x_2$ | … | $x_n$
y | $y_1$ | $y_2$ | … | $y_n$

A regression is a formula that describes the relationship between variables. A linear regression is one in which the relationship can be expressed as the equation of a line. In this case, we find a line of best fit to describe the relationship.

3. Linear Analysis of Data: Freehand Linear Fit
1. See if the data appear to be linearly related.
– Make a scatter plot of the data. Choose your axes so that their range is approximately the same as the range of your data.
– Which is the horizontal axis and which is the vertical? If one measurement depends on the other (e.g. body weight depends on age, rather than the reverse), put the dependent one (weight) on the vertical axis and the independent one (age) on the horizontal axis. If you have no reason to expect that one of the measurements depends on the other, then choose one and go with it.
– Eyeball the data and see if there appears to be any relationship. If the points in the scatter plot might be described approximately by a line, then it is reasonable to proceed with fitting a line.

3. Linear Analysis of Data: Freehand Linear Fit
2. Draw a freehand line that estimates the line of best fit; that is, draw a line through the data, trying to minimize the vertical distance between the line and the points.
[Figure: the vertical distance from a point to the line, contrasted with the orthogonal distance.]
3. Using two points ON THE LINE (not necessarily data points), find the equation of the line.
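The two distances contrasted in step 2 can be computed directly. The sketch below assumes a line written as y = mx + b; the line, the point, and the function names are hypothetical and chosen only for illustration (they are not from the lecture data).

```python
import math

def vertical_distance(m, b, x0, y0):
    """Vertical gap between the point (x0, y0) and the line y = m*x + b."""
    return abs(y0 - (m * x0 + b))

def orthogonal_distance(m, b, x0, y0):
    """Perpendicular (shortest) distance from (x0, y0) to the line y = m*x + b."""
    return abs(m * x0 - y0 + b) / math.sqrt(m * m + 1)

# Hypothetical line y = 0.75x + 1 and point (4, 6.5), for illustration only.
m, b = 0.75, 1.0
v = vertical_distance(m, b, 4.0, 6.5)
d = orthogonal_distance(m, b, 4.0, 6.5)
print(v, d)
```

The orthogonal distance is never larger than the vertical one. The fitting methods in this lecture work with the vertical distance.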

3. Linear Analysis of Data: Example 3.1 (Freehand Linear Fit)
Given the following set of bivariate data, use the freehand method to find a linear equation that fits the data.
[Data table with rows x and y; only the y row survives: 4, 7, 5, 8, 11]
Step 1: Make a scatterplot.
[Figure: scatterplot of the data]
The data seem reasonably linearly related, so we go to the next step.

3. Linear Analysis of Data: Example 3.1 (Freehand Linear Fit)
Step 2: Draw a line through the data that attempts to minimize the (vertical) distance between the line and the points.
[Figure: scatterplot with the freehand line]
Notice that our line contains none of the data points!

3. Linear Analysis of Data: Example 3.1 (Freehand Linear Fit)
Step 3: Next we pick two points ON THE LINE to determine the equation of the line. Let’s choose (3, 6) and (6, 10). Then:
$m = \frac{10 - 6}{6 - 3} = \frac{4}{3}$ and $b = 6 - \frac{4}{3}(3) = 2$, so $y = \frac{4}{3}x + 2$.
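Step 3 can be checked with a few lines of code. This is a minimal sketch using the two points chosen above, (3, 6) and (6, 10); the function name `line_through` is hypothetical, and exact fractions are used so the slope is not rounded.

```python
from fractions import Fraction

def line_through(p, q):
    """Return (slope, intercept) of the line through integer points p and q."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        raise ValueError("a vertical line has no slope-intercept form")
    m = Fraction(y2 - y1, x2 - x1)   # rise over run
    b = y1 - m * x1                  # solve y1 = m*x1 + b for b
    return m, b

m, b = line_through((3, 6), (6, 10))
print(m, b)  # 4/3 2
```

So the freehand line through these two points is y = (4/3)x + 2.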

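3. Linear Analysis of Data: Least Squares Fit
We want to find the slope $m$ and intercept $b$ of the line $y = mx + b$ that minimize the sum of errors squared:
$E(m, b) = \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)^2$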
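To make the quantity being minimized concrete, the sketch below evaluates the sum of squared errors for two candidate lines. It assumes the line is written y = m*x + b (the names m and b are an assumption), and the data are hypothetical, not the lecture's dataset.

```python
def sum_squared_errors(m, b, xs, ys):
    """Sum over all points of (observed y - predicted y)^2 for the line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

sse_a = sum_squared_errors(1.0, 1.0, xs, ys)  # candidate line y = x + 1
sse_b = sum_squared_errors(0.6, 2.2, xs, ys)  # candidate line y = 0.6x + 2.2
print(sse_a, sse_b)
```

The second line has the smaller sum, so by the least squares criterion it is the better fit of the two.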

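3. Linear Analysis of Data: Least Squares Fit
Using some calculus, one can show that the best values of the slope $m$ and intercept $b$ are given by
$m = \frac{S_{xy}}{S_{xx}}$ and $b = \bar{y} - m\bar{x}$,
where
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, $S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$, and $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$.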
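For readers who want the calculus step spelled out, here is one way the derivation can go. The symbol names are assumptions: $m$ and $b$ denote the slope and intercept, $E(m, b)$ the sum of squared errors, $\bar{x}$ and $\bar{y}$ the arithmetic means, and $S_{xy}$, $S_{xx}$ the sums of centered products and centered squares.

```latex
\begin{align*}
E(m,b) &= \sum_{i=1}^{n} \bigl(y_i - m x_i - b\bigr)^2 \\
\frac{\partial E}{\partial b} &= -2\sum_{i=1}^{n} \bigl(y_i - m x_i - b\bigr) = 0
  \;\Longrightarrow\; b = \bar{y} - m\bar{x} \\
\frac{\partial E}{\partial m} &= -2\sum_{i=1}^{n} x_i\bigl(y_i - m x_i - b\bigr) = 0 \\
&\Longrightarrow\; \sum_{i=1}^{n} x_i\bigl((y_i - \bar{y}) - m(x_i - \bar{x})\bigr) = 0
  \quad\text{(substituting } b = \bar{y} - m\bar{x}\text{)} \\
&\Longrightarrow\; \sum_{i=1}^{n} (x_i - \bar{x})\bigl((y_i - \bar{y}) - m(x_i - \bar{x})\bigr) = 0
  \quad\text{(since } \textstyle\sum_i \bigl((y_i - \bar{y}) - m(x_i - \bar{x})\bigr) = 0\text{)} \\
&\Longrightarrow\; S_{xy} - m\,S_{xx} = 0
  \;\Longrightarrow\; m = \frac{S_{xy}}{S_{xx}}
\end{align*}
```

Here $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$ and $S_{xx} = \sum (x_i - \bar{x})^2$. The slope is defined only when the x-values are not all equal, so that $S_{xx} > 0$.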

3. Linear Analysis of Data: Example 3.2 (Least Squares Linear Regression)
Given the same set of data as before, use the Least Squares method to find a linear equation that fits the data.
[Data table as in Example 3.1; only the y row survives: 4, 7, 5, 8, 11]
Step 1: Find the arithmetic means for x and y, $\bar{x}$ and $\bar{y}$.
Step 2: Find $S_{xy}$ and $S_{xx}$.

3. Linear Analysis of Data: Example 3.2 (Least Squares Linear Regression)
Step 3: Find the slope $m$ and the intercept $b$.
Step 4: Write down the equation.
Compare with our freehand approximation, $y = \frac{4}{3}x + 2$.
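The four steps of this example can be sketched in code as follows. This assumes the standard closed form (slope = S_xy / S_xx, intercept = mean of y minus slope times mean of x); the function name and the data are hypothetical, since the example's own data table is not reproduced here.

```python
def least_squares_fit(xs, ys):
    """Return (m, b) for the least squares line y = m*x + b."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    # Step 1: arithmetic means
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Step 2: S_xy and S_xx
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x-values are equal; the slope is undefined")
    # Step 3: slope and intercept
    m = s_xy / s_xx
    b = y_bar - m * x_bar
    return m, b

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, b = least_squares_fit(xs, ys)
# Step 4: the fitted equation
print(f"y = {round(m, 3)}x + {round(b, 3)}")  # y = 0.6x + 2.2
```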

3. Linear Analysis of Data: Interpolation & Extrapolation
Our linear least squares fitted line is a simple mathematical model for the dataset. Models can be used to describe relationships and make predictions about some physical system. As an example, we can use the linear equation to predict y-values for certain x-values:
– If the given x-value lies within the range of x-values for our dataset, then the predicted y-value is an interpolation.
– If the given x-value lies outside the range of x-values for our dataset, then the predicted y-value is an extrapolation.

3. Linear Analysis of Data: Example 3.3 (Interpolation & Extrapolation)
Using the data and best fit line found in Example 3.2, estimate the y-value that corresponds to each of the following x-values: (a) x = 5 and (b) x = 10. Which is an interpolation and which is an extrapolation?
Solution:
a) x = 5: interpolation
b) x = 10: extrapolation
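The distinction depends only on whether the query x-value falls inside the range of observed x-values. Below is a minimal sketch; the function names, the observed x-values, and the fitted line are all hypothetical and are not the values from Example 3.2.

```python
def predict(m, b, x):
    """Predicted y-value on the line y = m*x + b."""
    return m * x + b

def kind_of_prediction(x, xs):
    """Classify a prediction at x relative to the observed x-values xs."""
    return "interpolation" if min(xs) <= x <= max(xs) else "extrapolation"

# Hypothetical observed x-values and fitted line, for illustration only.
xs = [1, 2, 3, 4, 5]
m, b = 0.6, 2.2

for x in (3.5, 10):
    print(x, round(predict(m, b, x), 2), kind_of_prediction(x, xs))
```

Extrapolated values deserve more caution, since the data give no evidence that the linear trend continues outside the observed range.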

4. Correlation
We’d like to know if our least squares regression (LSR) is good.
– Two sets of measurements are “correlated” if there appears to be some relationship between them.
– In this section we present formulae applicable only to linear correlation.
– Correlated does not imply causally related; e.g. leaf length and width.

4. Correlation: Correlation Coefficient
The correlation coefficient $\rho$ (rho) is given by:
$\rho = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}$, where $S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$.
This value will always be between -1 and 1.
– If $\rho = -1$, data are perfectly negatively correlated; if close, highly negatively correlated.
– If $\rho = 0$, data are uncorrelated; note: this only means they are not linearly related.
– If $\rho = 1$, data are perfectly positively correlated; if close, highly positively correlated.
$\rho^2 \times 100$ tells us the percent of the variance accounted for by the LSR.
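A minimal sketch of the correlation coefficient follows, assuming the standard definition ρ = S_xy / sqrt(S_xx · S_yy) with S_yy the sum of squared deviations of the y-values from their mean. The function name and the data are hypothetical.

```python
import math

def correlation(xs, ys):
    """Linear correlation coefficient rho = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data, for illustration only.
rho = correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(rho, 4), round(100 * rho ** 2, 1))

# Points lying exactly on a line give rho = 1 or rho = -1.
rho_up = correlation([1, 2, 3], [2, 4, 6])
rho_down = correlation([1, 2, 3], [6, 4, 2])
```

The function assumes neither variable is constant; otherwise S_xx or S_yy is zero and ρ is undefined.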

4. Correlation
Figure 3.1: Various sets of data with their corresponding correlation coefficient shown above each data set. Notice in the bottom row that there can be a relationship in the data even though the correlation coefficient ρ is zero, because the relationship is not linear.
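The point made about the bottom row of the figure can be reproduced with a small hypothetical dataset: y is completely determined by x (here y = x²), yet the linear correlation coefficient is zero. The sketch assumes the standard formula ρ = S_xy / sqrt(S_xx · S_yy); the data are illustrative and not taken from the figure.

```python
import math

def correlation(xs, ys):
    """Linear correlation coefficient rho = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data: x-values symmetric about 0, with y = x^2 exactly.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]
rho = correlation(xs, ys)
print(rho)
```

The positive and negative products in S_xy cancel, so ρ is zero even though the relationship between x and y is exact.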

4. Correlation: Example 3.4
How correlated are the data presented in Example 3.1?
In order to use the formula, we need to calculate $S_{yy}$.
We have, then, the correlation coefficient $\rho = 0.92$: highly correlated.
Finally, $\rho^2 = 0.8464$, so about 85% of the variance in the dataset is accounted for by the LSR line.
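The reading of ρ² as the fraction of variance accounted for by the LSR line can be checked numerically: for a least squares line, ρ² equals 1 minus the ratio of the leftover sum of squared errors to S_yy. The sketch below uses hypothetical data (not the data of Example 3.1) and assumes the standard formulas for the fitted line and for ρ.

```python
# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)

# Least squares line y = m*x + b
m = s_xy / s_xx
b = y_bar - m * x_bar

# Variation left unexplained by the line
sse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

rho_squared = s_xy ** 2 / (s_xx * s_yy)
explained = 1 - sse / s_yy
print(round(100 * rho_squared, 1), round(100 * explained, 1))
```

Both numbers agree, which is why ρ² × 100 is reported as a percentage of variance.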

Homework
Chapter 1:
Chapter 2: 2.2–2.4, 2.7, 2.9, 2.10
Chapter 3: 3.3, 3.4, 3.9

Homework: Exercise 3.4 (d)
The average length and width of various bird eggs are given in the following table.
[Data table with rows x and y]
Step 1: Find the arithmetic means for x and y, $\bar{x}$ and $\bar{y}$.
Step 2: Find $S_{xy}$ and $S_{xx}$.

Homework: Exercise 3.4 (d)
Step 3: Find the slope $m$ and the intercept $b$.
Step 4: Write down the equation.

Homework: Exercise 3.4 (e)
In order to use the formula, we need to calculate $S_{yy}$.
We have, then, the correlation coefficient: highly positively correlated.
Finally, $\rho^2 = 0.9975$, so 99.75% of the variance in the dataset is accounted for by the LSR line.