Prediction and Accuracy


Prediction and Accuracy Lesson 3.5

The median-median line method gives you a process by which anyone will get the same line of fit for a set of data. However, unless your data are perfectly linear, even the median-median line will not be perfect. In some cases you can find a more satisfactory model if you draw a line by hand. Having a line that “looks better” is not a convincing argument that it really is better.

You need a way to evaluate how accurately your model describes the data. One excellent method is to look at the residuals, or the vertical differences between the points in your data set and the points generated by your line of fit.

Similar to the deviation from the mean that you learned about in Chapter 2, a residual is a signed distance.

Here, a positive residual indicates that the point is above the line, and a negative residual indicates that the point is below the line.

A line of good fit should have about as many points above it as below. This means that the sum of the residuals should be near zero. In other words, if you connect each point to the line with a vertical segment, the sum of the lengths of the segments above the line should be about equal to the sum of those below the line.

Example The manager of Big K Pizza must order supplies for the month of November. The numbers of pizzas sold in November during the past four years were 512, 603, 642, and 775, respectively. How many pizzas should she plan for this November?

Let x represent the past four years, 1 through 4, and let y represent the number of pizzas. Graph the data. The median-median method gives the linear model ŷ = 417.3 + 87.7x.
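
As a rough check on that model, the median-median fit for the four pizza data points can be sketched in Python. This is a minimal sketch, assuming the usual textbook convention (three groups with equal-sized outer groups, slope from the outer summary points, line shifted one third of the way toward the middle summary point); the function name `median_median_line` is hypothetical and not part of the lesson.

```python
from statistics import median


def median_median_line(xs, ys):
    """Return (intercept, slope) of a median-median line.

    Assumed convention: sort by x, split into three groups whose outer
    groups have equal size, summarize each group by (median x, median y),
    take the slope from the two outer summary points, then shift the line
    one third of the way toward the middle summary point.
    """
    points = sorted(zip(xs, ys))
    n = len(points)
    if n < 3:
        raise ValueError("need at least three data points")

    # Outer groups get the extra point only when n leaves remainder 2.
    outer = n // 3 + (1 if n % 3 == 2 else 0)
    groups = [points[:outer], points[outer:n - outer], points[n - outer:]]
    summary = [
        (median([x for x, _ in g]), median([y for _, y in g])) for g in groups
    ]
    (x1, y1), (x2, y2), (x3, y3) = summary

    slope = (y3 - y1) / (x3 - x1)
    intercept_outer = y1 - slope * x1   # line through the outer summary points
    intercept_middle = y2 - slope * x2  # parallel line through the middle one
    intercept = (2 * intercept_outer + intercept_middle) / 3
    return intercept, slope


years = [1, 2, 3, 4]
pizzas = [512, 603, 642, 775]
intercept, slope = median_median_line(years, pizzas)
print(f"y = {intercept:.1f} + {slope:.1f}x")
```

For these data the unrounded coefficients are 1252/3 and 263/3, which round to the 417.3 and 87.7 used in the example.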

Calculating the residuals is one way to evaluate the accuracy of this model before using it to make a prediction. Evaluate the median-median line when x is 1, 2, 3, and 4. Subtract the two y-values to calculate the residuals.

Year                            1      2      3      4
Number of pizzas (y)            512    603    642    775
y-value on median-median line   505.0  592.7  680.3  768.0
Residual                        7.0    10.3   -38.3  7.0

The sum of the residuals is 7.0 + 10.3 + (-38.3) + 7.0, or -14.0 pizzas, which is fairly close to zero in relation to the large number of pizzas sold. The linear model is therefore a pretty good fit. The sum is negative, which means that all together the points below the line are a little farther away than those above.

If the manager uses the median-median line and plans for 417.3 + 87.7(5), or about 856 pizzas, she will be very close to the linear pattern established over the past four years.

Because the residuals range from -38.3 to 10.3, she may want to adjust her prediction higher or lower depending on factors such as whether or not the supplies are likely to spoil or whether or not she can easily order more supplies.
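
The residual check and the year-5 forecast from this example can be reproduced with a short Python sketch. It assumes the unrounded coefficients 1252/3 and 263/3 (which round to the 417.3 and 87.7 shown in the slides); the names `predict`, `residuals`, and `forecast` are illustrative only.

```python
years = [1, 2, 3, 4]
pizzas = [512, 603, 642, 775]

# Assumed unrounded coefficients of the median-median line
# (approximately 417.3 and 87.7).
INTERCEPT = 1252 / 3
SLOPE = 263 / 3


def predict(x):
    """y-value on the median-median line for year x."""
    return INTERCEPT + SLOPE * x


# Residual = observed y minus the y-value on the line.
residuals = [y - predict(x) for x, y in zip(years, pizzas)]
residual_sum = sum(residuals)
forecast = predict(5)

for x, y, r in zip(years, pizzas, residuals):
    print(f"year {x}: sold {y}, line {predict(x):.1f}, residual {r:+.1f}")
print(f"sum of residuals: {residual_sum:.1f}")
print(f"forecast for year 5: {forecast:.0f} pizzas")
```

A positive residual means the year's sales sit above the line, and a negative one means they sit below it.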

Spring Experiment In this investigation you will collect data on how a spring responds to various weights. You will create a model using a median-median line and then you will use residuals to judge the accuracy of any predictions made using your model.

Procedural Note Attach the mass holder to the spring. Hang the spring from a support, and the mass holder from the spring. Measure the length of the spring (in centimeters) from the first coil to the last coil.

Place different amounts of mass on the mass holder, recording the corresponding length of the spring each time. Collect about 10 data points of the form (mass, spring length). Plot the data and find the median-median line. Give the real-world meanings of the slope and the y-intercept.

145 8.65

Use a table to organize your data and calculate the residuals. Then answer these questions. a) What is the sum of the residuals? Does it appear that your linear model is a good fit for the data? Explain. Sum of the residuals: -0.92, which is very small relative to the size of the y-values, so the linear model is a very good fit.

b) Find the greatest positive and negative residuals. What could the magnitude of these residuals indicate? The greatest positive residual is 0.171 for 90 g, and the greatest negative residual is -0.221 for 220 g. These could indicate an error in measuring for these two masses.

c) If you used your model to predict the length of the spring for a particular weight and then measured the spring for that weight, would you be surprised if the predicted length and the measured length were different? Why or why not? Use your model to predict the length of your spring for a weight 2 units larger than your heaviest weight. Now give an estimate in the form ___ ± ___ that you believe would contain the actual measured length.

The sum of the residuals for your line of fit should be close to zero. However, as shown at right, you could find a poorly fitting line and still get zero for the sum of the residuals. A line of good fit should also follow the direction of the points. This means that the individual residuals should be as close to zero as possible.
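
One way to see this is to fit a deliberately poor line to the pizza data: a horizontal line at the mean number of pizzas. The sketch below is an illustration chosen for this purpose, not the line pictured in the slide. The residuals cancel exactly, yet the individual residuals are large because the line ignores the upward direction of the points.

```python
pizzas = [512, 603, 642, 775]

# A horizontal line at the mean: it ignores the trend in the data entirely.
mean_pizzas = sum(pizzas) / len(pizzas)

flat_residuals = [y - mean_pizzas for y in pizzas]
flat_sum = sum(flat_residuals)
largest_miss = max(abs(r) for r in flat_residuals)

print("residuals:", flat_residuals)
print("sum of residuals:", flat_sum)
print("largest individual residual (absolute value):", largest_miss)
```

The sum is zero, but the worst single-year miss is well over a hundred pizzas, so the sum alone cannot certify a good fit.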

The sum of the residuals does not give a complete picture. We need some other way to judge how accurate predictions from a model will be. One useful measure of accuracy starts by squaring the residuals to make them all positive.

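The pizza data and the linear model in Example A are illustrated in the table below. The sum of the squares is quite large: 1674.22 pizzas².

Year                            1      2      3       4
Number of pizzas (y)            512    603    642     775
y-value on median-median line   505.0  592.7  680.3   768.0
Residual                        7.0    10.3   -38.3   7.0
Residual squared                49.0   106.8  1469.4  49.0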

Some kind of average would give a better indication of the error in the predictions from this model. You might divide by 4 since there are four data points. But this is not necessarily the best thing to do. Remember it takes a minimum of two points to make any equation of a line. So two of the points can be thought of as defining the line. The other points determine the spread. So you should actually divide by 2 less than the number of data points.

The error should be expressed in numbers of pizzas, rather than pizzas², so take the square root of this number. This means that generally this line should predict values for the number of pizzas that are within 28.9 pizzas of the actual data. This value is called the root mean square error.
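
Written out for the pizza data, with the sum of squared residuals 1674.22 and four data points, the arithmetic is as follows (the symbol s for the result is the one the lesson uses for the root mean square error):

```latex
s = \sqrt{\frac{1674.22}{4 - 2}} = \sqrt{837.11} \approx 28.9 \text{ pizzas}
```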

Root Mean Square Error The root mean square error, s, is a measure of the spread of data points from a model: $s = \sqrt{\dfrac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}}$, where $y_i$ represents the y-values of the individual data pairs, $\hat{y}_i$ represents the respective y-values predicted from the model, and n is the number of data pairs.
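
The definition translates directly into a small Python function. This is a minimal sketch: the function name `root_mean_square_error` and the use of the unrounded pizza-model coefficients (1252/3 and 263/3, about 417.3 and 87.7) are assumptions made for illustration.

```python
from math import sqrt


def root_mean_square_error(actual, predicted):
    """Square root of the sum of squared residuals divided by n - 2."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    n = len(actual)
    if n < 3:
        raise ValueError("need at least three data pairs to divide by n - 2")
    sum_of_squares = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    return sqrt(sum_of_squares / (n - 2))


# Pizza data with the median-median model (assumed unrounded coefficients).
pizzas = [512, 603, 642, 775]
fitted = [1252 / 3 + (263 / 3) * x for x in (1, 2, 3, 4)]

s = root_mean_square_error(pizzas, fitted)
print(f"root mean square error: {s:.1f} pizzas")
```

Dividing by n - 2 rather than n follows the lesson's reasoning that two of the points can be thought of as defining the line.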

Example B A scientist measures the current in milliamps through a circuit with constant resistance as the voltage in volts is varied. What is the root mean square error for the model ŷ = 0.47x? What is the real-world meaning of the root mean square error? Predict the current when the voltage is 25.000 volts. How accurate is the prediction?

A graph of the data with the linear model shows a good fit. To calculate the root mean square error, first calculate the residual for each data point, take the sum of the squares of the residuals, divide by n - 2, and take a square root.

This means that values predicted by this equation will generally be within 0.0034 milliamp of the actual current.

For a voltage of 25.000 volts, the model gives 0.47(25.000), or 11.750 milliamps, as the current. Considering the root mean square error, the scientist could expect a reading between 11.750 - 0.0034 (11.7466) milliamps and 11.750 + 0.0034 (11.7534) milliamps.
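
The same prediction-plus-or-minus-error reasoning can be sketched in Python. The slope 0.47 and the root mean square error 0.0034 come from the example; the variable names are illustrative.

```python
SLOPE = 0.47    # milliamps per volt, from the model in Example B
RMSE = 0.0034   # root mean square error in milliamps, from Example B

voltage = 25.000
current = SLOPE * voltage  # predicted current in milliamps

# Expected range: prediction minus and plus the root mean square error.
low = current - RMSE
high = current + RMSE

print(f"predicted current: {current:.3f} mA")
print(f"expected reading between {low:.4f} mA and {high:.4f} mA")
```

The range is an informal guide to typical error, not a guarantee that every measurement will fall inside it.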