Data Analysis: Linear Regression. Ernesto A. Diaz, Department of Mathematics, Redwood High School.

2 Let us pause for a few moments… What are we working on in this chapter?

3 Problem Statement If we have a scatter plot that seems “linear”, can we find an equation that generates similar data? How accurate will it be?

4 Regression One important branch of inferential statistics, called regression analysis, is used to compare quantities or variables, to discover relationships that exist between them, and to formulate those relationships in useful ways.

5 Linear Regression Once a scatter diagram has been produced, we can draw a curve that best fits the pattern exhibited by the sample points. The best-fitting curve for the sample points is called an estimated regression curve. If the points in the scatter diagram seem to lie approximately along a straight line, the relationship is assumed to be linear, and the line that best fits the data points is called the estimated regression line.

6 Linear Regression Linear regression is the process of determining the linear relationship between two variables. If we assume that the best-fitting curve is a line, then the equation of that line will take the form y = ax + b, where a is the slope of the line and b is the y-coordinate of the y-intercept. To identify the estimated regression line, we must find the values of the “regression coefficients” a and b.

7 Regression, 1st Approach

8 2nd Approach: Med-Med Line
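The slide gives only the method's name. The following is a minimal sketch of the common textbook construction, written under stated assumptions. Sort the points by x and split them into three groups. Summarize each group by its median x and median y. Take the slope from the two outer summary points, then average the three intercepts that this slope gives through the summary points. Two details are assumptions of this sketch rather than something the slides state: the group-size rule used when n is not divisible by 3 (a convention found in many school texts and graphing calculators), and the function name `med_med_line`.

```python
from statistics import median


def med_med_line(points):
    """Return (slope, intercept) of the median-median (resistant) line.

    Assumed grouping rule: n % 3 == 0 -> equal thirds;
    n % 3 == 1 -> the extra point goes to the middle group;
    n % 3 == 2 -> each outer group gets one extra point.
    """
    pts = sorted(points)  # sort by x (ties broken by y)
    n = len(pts)
    if n < 3:
        raise ValueError("need at least 3 points")
    k, r = divmod(n, 3)
    sizes = {0: (k, k, k), 1: (k, k + 1, k), 2: (k + 1, k, k + 1)}[r]
    left = pts[:sizes[0]]
    middle = pts[sizes[0]:sizes[0] + sizes[1]]
    right = pts[sizes[0] + sizes[1]:]

    def summary(group):
        # Median x and median y are taken separately for each group.
        return median(p[0] for p in group), median(p[1] for p in group)

    (x1, y1), (x2, y2), (x3, y3) = summary(left), summary(middle), summary(right)
    slope = (y3 - y1) / (x3 - x1)  # fails if the outer medians share an x-value
    # Averaging the three intercepts shifts the line through the outer
    # summary points one third of the way toward the middle summary point.
    intercept = ((y1 - slope * x1) + (y2 - slope * x2) + (y3 - slope * x3)) / 3
    return slope, intercept


# Nine points on y = 2x + 1, except that the last y-value is an outlier.
points = [(x, 2 * x + 1) for x in range(1, 9)] + [(9, 60)]
print(med_med_line(points))  # (2.0, 1.0)
```

Even with the outlier at (9, 60), this sketch still recovers slope 2 and intercept 1. The outlier moves only the right group's summary y from its largest value to its median. This is why the Med-Med line is described as resistant.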

9 How do we evaluate accuracy? Two common measures are the root mean square error (RMS) and the sum of squares of residuals ($SS_{\text{res}}$).
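The following is a minimal sketch of the two accuracy measures named on the slide, for any candidate line y = mx + b. A residual is the observed y minus the y predicted by the line. $SS_{\text{res}}$ adds up the squared residuals. RMS is taken here as the square root of $SS_{\text{res}}$ divided by the number of points n. Some texts divide by n − 2 instead, so treat that choice as an assumption. The function name `residual_stats` is also hypothetical. For both measures, smaller values mean the line fits the data more closely.

```python
import math


def residual_stats(points, slope, intercept):
    """Return (SS_res, RMS) for the line y = slope * x + intercept.

    Assumption: RMS = sqrt(SS_res / n); some texts use n - 2 instead.
    """
    residuals = [y - (slope * x + intercept) for x, y in points]
    ss_res = sum(e * e for e in residuals)  # sum of squared residuals
    rms = math.sqrt(ss_res / len(points))   # root mean square error
    return ss_res, rms


# Residuals against the line y = x are 1, 0 and 1.
ss, rms = residual_stats([(0, 1), (1, 1), (2, 3)], 1.0, 0.0)
print(ss, round(rms, 4))  # 2.0 0.8165
```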

10 3rd Approach: Least Squares For each x-value in the data set, the corresponding y-value usually differs from the value it would have if the data point were exactly on the line. These differences are shown in the figure by vertical line segments. The most common procedure is to choose the line where the sum of the squares of all these differences is minimized. This is called the method of least squares, and the resulting line is called the least squares line.

11 Linear Regression Linear regression is the process of determining the linear relationship between two variables. The line of best fit (regression line or least squares line) is the line such that the sum of the squares of the vertical distances from the line to the data points (on a scatter diagram) is a minimum.

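12 Linear Regression Formulas The least squares line (regression line) that provides the best fit to the data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ has the equation $y = mx + b$, where $m = \dfrac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}$ and $b = \dfrac{\sum y - m\sum x}{n}$.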
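The standard closed-form least squares formulas are m = (nΣxy − ΣxΣy)/(nΣx² − (Σx)²) and b = (Σy − mΣx)/n. They translate directly into a few running sums. The sketch below is illustrative only: the function name `least_squares_line` is hypothetical, and the check for equal x-values is an added safeguard.

```python
def least_squares_line(points):
    """Return (m, b) for the least squares line y = m * x + b."""
    n = len(points)
    sx = sum(x for x, _ in points)       # sum of x
    sy = sum(y for _, y in points)       # sum of y
    sxx = sum(x * x for x, _ in points)  # sum of x^2
    sxy = sum(x * y for x, y in points)  # sum of xy
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("all x-values are equal; the slope is undefined")
    m = (n * sxy - sx * sy) / denom
    b = (sy - m * sx) / n
    return m, b


# Same nine points as y = 2x + 1, but with the outlier (9, 60).
points = [(x, 2 * x + 1) for x in range(1, 9)] + [(9, 60)]
print(tuple(round(v, 3) for v in least_squares_line(points)))  # (4.733, -8.111)
```

The single outlier more than doubles the slope, from 2 to about 4.73. The Med-Med line of these same nine points stays at slope 2 and intercept 1. This is the sensitivity to outliers that distinguishes the two methods.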

13 Med-Med vs. Least Squares The Median-Median Line is sometimes called the resistant line because it is not strongly influenced by one or two “bad” data points. The Least Squares Line uses every point in its calculation, so it is affected by outliers.

14 Example 1: Regression Suppose that we wish to get an idea of how the number of hours spent preparing for a final exam relates to the score on the exam. Data on hours studied (Hours) and exam score (Score) are collected.

15 Linear Regression The first step in analyzing these data is to graph the results as shown in the scatter diagram on the next slide.

16 Scatter Diagram

17 Linear Regression If we let x denote hours studying and y denote exam score in the data of the previous slide and assume that the best-fitting curve is a line, then the equation of that line will take the form y = mx + b, where m is the slope of the line and b is the y-coordinate of the y-intercept. To identify the estimated regression line, we must find the values of the “regression coefficients” m and b.

18 Example 1: Computing a Least Squares Line Solution: The equation is

19 Estimated Regression Line

20 Example: Med-Med vs. Best Fit Using Dobbie, find the estimated regression line for the Hours and Score data with both methods.

21 Example 2: Predicting from a Regression Line Use the results from the previous example to predict the exam score for a student who studied 6.5 hours. I) Med-Med: Substitute x = 6.5 into the Med-Med equation. Based on the given data, the student should score about 82%. II) Best Fit: Substitute x = 6.5 into the least squares equation. Based on the given data, the student should score about 81%.

Copyright © 2005 Pearson Education, Inc Linear Correlation and Regression

Linear Correlation Linear correlation is used to determine whether there is a relationship between two quantities and, if so, how strong the relationship is. The linear correlation coefficient, r, is a unitless measure that describes the strength and direction of the linear relationship between two variables. If r is positive, one variable tends to increase as the other increases. If r is negative, one variable tends to decrease as the other increases. The value of r always lies between –1 and 1, inclusive.

Scatter Diagrams A visual aid used with correlation is the scatter diagram, a plot of paired data points (bivariate data). The independent variable, x, is generally a quantity that can be controlled. The dependent variable, y, is the other variable. The value of r measures how closely a set of points clusters around a straight line: the greater the spread, the weaker the correlation and the closer r is to 0.

Correlation

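Linear Correlation Coefficient The formula to calculate the correlation coefficient (r) is as follows: $$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$$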

Example: Words Per Minute versus Mistakes There are five applicants applying for a job as a medical transcriptionist. The following table shows the results when each applicant was asked to type a chart. Determine the correlation coefficient between the number of words typed per minute and the number of mistakes.

Applicant | Words per Minute | Mistakes
Ellen | 24 | 8
George | 67 | 11
Phillip | 53 | 12
Kendra | 41 | 10
Nancy | 34 | 9

Solution We will call the words typed per minute x and the mistakes y. List the values of x and y and calculate the necessary sums.

WPM (x) | Mistakes (y) | x² | y² | xy
24 | 8 | 576 | 64 | 192
67 | 11 | 4,489 | 121 | 737
53 | 12 | 2,809 | 144 | 636
41 | 10 | 1,681 | 100 | 410
34 | 9 | 1,156 | 81 | 306
Σx = 219 | Σy = 50 | Σx² = 10,711 | Σy² = 510 | Σxy = 2,281

Solution continued The n in the formula represents the number of data pairs. Here n = 5, so $r = \dfrac{5(2{,}281) - (219)(50)}{\sqrt{5(10{,}711) - 219^2}\,\sqrt{5(510) - 50^2}} = \dfrac{455}{\sqrt{5{,}594}\,\sqrt{50}} \approx 0.86$.
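As a sketch for checking the hand computation, the same computational formula for r can be applied to the five applicants' data. The function name `correlation_coefficient` is hypothetical. The formula is the usual sums-based form of the Pearson correlation coefficient.

```python
import math

# Typing data from the example: applicant -> (words per minute x, mistakes y)
data = {
    "Ellen": (24, 8),
    "George": (67, 11),
    "Phillip": (53, 12),
    "Kendra": (41, 10),
    "Nancy": (34, 9),
}


def correlation_coefficient(pairs):
    """Linear correlation coefficient r via the computational (sums) formula."""
    n = len(pairs)
    sx = sum(x for x, _ in pairs)
    sy = sum(y for _, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    syy = sum(y * y for _, y in pairs)
    sxy = sum(x * y for x, y in pairs)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt(n * sxx - sx**2) * math.sqrt(n * syy - sy**2)
    return numerator / denominator


r = correlation_coefficient(list(data.values()))
print(round(r, 2))  # 0.86
```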

Solution continued Since 0.86 is fairly close to 1, there is a fairly strong positive correlation. This result suggests that applicants who type more words per minute tend to make more mistakes.

Linear Regression Linear regression is the process of determining the linear relationship between two variables. The line of best fit (line of regression or least squares line) is the line such that the sum of the squares of the vertical distances from the line to the data points is a minimum.

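The Line of Best Fit Equation: $y = mx + b$, where $m = \dfrac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}$ and $b = \dfrac{\sum y - m\sum x}{n}$.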

Example Use the data in the previous example to find the equation of the line that relates the number of words per minute to the number of mistakes made while typing a chart. Graph the line of best fit on a scatter diagram of the bivariate data.

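Solution From the previous results, we know that n = 5, Σx = 219, Σy = 50, Σx² = 10,711, and Σxy = 2,281, so $m = \dfrac{5(2{,}281) - (219)(50)}{5(10{,}711) - 219^2} = \dfrac{455}{5{,}594} \approx 0.081$. Now we find the y-intercept: $b = \dfrac{50 - 0.081(219)}{5} \approx 6.452$. Therefore the line of best fit is y = 0.081x + 6.452.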
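The following is a short sketch that reproduces the slope and intercept from the typing data using the same sums as the hand solution. It is only a check on the arithmetic. Unlike the hand solution, it keeps full precision for m instead of rounding it to 0.081 first.

```python
# Typing data: (words per minute x, mistakes y)
pairs = [(24, 8), (67, 11), (53, 12), (41, 10), (34, 9)]

n = len(pairs)
sx = sum(x for x, _ in pairs)       # 219
sy = sum(y for _, y in pairs)       # 50
sxx = sum(x * x for x, _ in pairs)  # 10711
sxy = sum(x * y for x, y in pairs)  # 2281

m = (n * sxy - sx * sy) / (n * sxx - sx**2)  # 455 / 5594
b = (sy - m * sx) / n                        # y-intercept
print(f"y = {m:.3f}x + {b:.3f}")  # y = 0.081x + 6.437
```

The intercept differs slightly from 6.452 only because the hand solution rounds m to 0.081 before computing b. Keeping m unrounded gives b ≈ 6.437. Either line rounds to about y = 0.081x + 6.4 at one decimal place for the intercept.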

Solution continued To graph y = 0.081x + 6.452, plot at least two points that satisfy the equation and draw the line through them.

Solution continued [Figure: scatter diagram of words per minute (x) versus mistakes (y) with the line of best fit]