Linear Modeling - Trendlines


Linear Modeling - Trendlines
 The Problem - Last time we discussed linear equations (models) where the data is perfectly linear. Using the slope-intercept formula, we derived linear equations/models. In the “real world,” most data is not perfectly linear. How do we handle this type of data?
 The Solution - We use trendlines (also known as the line of best fit or the least-squares line).
 Why - If we find a trendline that is a good fit, we can use its equation to make predictions. Generally we predict into the future (and occasionally into the past), which is called extrapolation. Estimating values between existing data points is referred to as interpolation.
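As a concrete illustration (not from the original slides), here is a minimal sketch of fitting a least-squares trendline and using it to extrapolate and interpolate, assuming Python with NumPy and made-up data:

    import numpy as np

    # Hypothetical data: year vs. some measured quantity (made-up values).
    years = np.array([2015, 2016, 2017, 2018, 2019, 2020, 2021])
    values = np.array([10.2, 11.1, 11.9, 13.0, 13.8, 15.1, 15.9])

    # Fit a degree-1 polynomial (a straight line) by least squares.
    slope, intercept = np.polyfit(years, values, 1)
    trendline = np.poly1d([slope, intercept])

    # Extrapolation: predicting beyond the range of the observed data.
    print("Predicted value for 2025:", trendline(2025))

    # Interpolation: estimating a value between existing data points.
    print("Estimated value for 2018.5:", trendline(2018.5))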

Is the trendline a good fit for the data?  To answer this question, you need to address the following five guidelines:

Guideline 1: Do you have at least 7 data points? For the datasets that we use in this class, you should use at least 7 of the most recent data points available.

Guideline 2: Does the R-squared value indicate a relationship? R² is a standard measure of how well the line fits the data. If R² is very low, it tells us the model is not very good and probably shouldn't be used. The R-squared value is also called the coefficient of determination and can be written as r² or R².
 If R² = 1, there is a perfect match between the line and the data points. If R² = 0, there is no linear relationship between the x and y values.
 If the R² value is between 0.7 and 1.0, there is a strong linear relationship.
 If the R² value is between 0.4 and 0.7, there is a moderate linear relationship.
 If the R² value is below 0.4, the relationship is weak and you should not use this data to make predictions.
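As a small illustration (not part of the original slides), the cutoffs above can be written as a helper function; the thresholds are exactly the ones listed, and the function name is made up:

    def describe_r_squared(r_squared: float) -> str:
        # Classify an R-squared value using the cutoffs from Guideline 2.
        if not 0.0 <= r_squared <= 1.0:
            raise ValueError("R-squared must be between 0 and 1")
        if r_squared >= 0.7:
            return "strong linear relationship"
        if r_squared >= 0.4:
            return "moderate linear relationship"
        return "weak relationship - do not use for predictions"

    print(describe_r_squared(0.85))  # strong linear relationship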

More facts...
 R² is a measure that allows us to determine how confident we can be in making predictions from a given model/graph.
 The coefficient of determination satisfies 0 ≤ r² ≤ 1 and denotes the strength of the linear association between x and y.
 The coefficient of determination represents the percent of the total variation in y that is explained by the line of best fit. For example, if r = 0.922, then r² = 0.850, which means that 85% of the total variation in y can be explained by the linear relationship between x and y (as described by the regression equation). The other 15% of the total variation in y remains unexplained.

Calculating the coefficient of determination
 The mathematical formula for computing r is

    r = [ n Σxy − (Σx)(Σy) ] / √( [ n Σx² − (Σx)² ] [ n Σy² − (Σy)² ] )

where n is the number of pairs of data.
 To compute R², just square the result from the above formula.
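A minimal sketch (not from the original slides) that evaluates this formula on a small made-up data set; the function name and numbers are illustrative:

    import math

    def pearson_r(xs, ys):
        # Computational formula for r, where n is the number of data pairs.
        n = len(xs)
        sum_x, sum_y = sum(xs), sum(ys)
        sum_xy = sum(x * y for x, y in zip(xs, ys))
        sum_x2 = sum(x * x for x in xs)
        sum_y2 = sum(y * y for y in ys)
        numerator = n * sum_xy - sum_x * sum_y
        denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
        return numerator / denominator

    x = [1, 2, 3, 4, 5, 6, 7]
    y = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.2]
    r = pearson_r(x, y)
    print("r =", round(r, 3), "and R-squared =", round(r ** 2, 3))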

Having a high R-squared value is not enough. When does a model fit well? A relatively high R-squared value does not guarantee that the model is a good one. There are some other factors you should look for.

A good model will have a fairly random distribution of data points above and below the line. For example: the lean of the Leaning Tower of Pisa…

[Data table not reproduced: Year vs. lean of the Leaning Tower of Pisa (tenths of a millimeter in excess of 2.9 meters).]
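The sketch below (not from the original slides) uses illustrative year/lean values to show one way to check that points fall fairly randomly above and below the trendline: count the signs of the residuals.

    import numpy as np

    # Illustrative values, not necessarily the ones from the original table.
    years = np.array([1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982])
    lean = np.array([642.0, 644.0, 656.0, 667.0, 673.0, 688.0, 696.0, 698.0])

    slope, intercept = np.polyfit(years, lean, 1)
    residuals = lean - (slope * years + intercept)

    # A good fit scatters residuals on both sides of the line with no obvious pattern.
    print("points above the line:", int(np.sum(residuals > 0)))
    print("points below the line:", int(np.sum(residuals < 0)))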

In contrast, consider the following data: North Sea plaice length vs. weight, where the points do not scatter randomly above and below a straight trendline (a fish's weight grows much faster than its length, so the relationship is curved rather than linear).

[Data table not reproduced: North Sea plaice length (cm) vs. weight (g).]

Guidelines continued …  Guideline 3: Verify that your trendline fits the shape of your graph. For example, if your trendline continues upward, but the data makes a downward turn during the last few years, verify that the “higher” prediction makes sense (use practical knowledge). In some cases it is obvious that you have a localized trend. Localized trends will be discussed at a later date.

A related situation occurs when there is a consistent long-term uptrend with an abrupt change toward the end: US murder rate (per 100,000 population).

[Data table not reproduced: Year vs. US murder rate (per 100,000 population).]

Guideline 4: Look for outliers.
 Outliers should be investigated carefully. Often they contain valuable information about the process under investigation or about the data gathering and recording process. Before considering eliminating these points from the data, try to understand why they appeared and whether similar values are likely to continue to appear. Of course, outliers are often simply bad data points. If the data was entered incorrectly, it is important to find the correct information and update it. In other cases the data is correct and an anomaly occurred that particular year. The outlier can be removed if removal is justified, and the removal must be documented.
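As an illustration (not part of the original slides), here is a minimal sketch of flagging potential outliers by how far each point sits from the trendline, using made-up data and an arbitrary two-standard-deviation cutoff:

    import numpy as np

    # Hypothetical data with one suspicious point (x = 6).
    x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
    y = np.array([2.0, 4.1, 5.9, 8.2, 9.8, 25.0, 14.1, 16.0])

    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)

    # Flag points whose residual is more than 2 standard deviations from zero.
    cutoff = 2 * residuals.std()
    for xi, yi, res in zip(x, y, residuals):
        if abs(res) > cutoff:
            print(f"possible outlier at x={xi}: y={yi} (residual {res:.1f})")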

Individual points can have very strong effects on models. Watch out for them. Consider the following data: length of animal vs. running speed.

Animal           Length (cm)   Running speed (cm/sec)
Clover mite
Anyestid mite
Argentine ant
Ant
Deer mouse       9             250
Lizard           15            720
Chipmunk         16            480
Iguana           24            730
Squirrel         25            760
Fox
Cheetah
Ostrich

(Values for the rows shown without numbers were not preserved.)

[Scatterplot not reproduced: the slide highlights the ostrich, which lies far from the other animals in the data.]
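To illustrate how much one extreme point can matter (this sketch is not from the original slides), the comparison below fits a trendline to the animal rows whose values appear above, with and without one hypothetical large, fast animal standing in for the ostrich (its actual length and speed are not given above):

    import numpy as np

    # Rows with values from the table above.
    length = [9, 15, 16, 24, 25]         # cm
    speed = [250, 720, 480, 730, 760]    # cm/sec

    # Hypothetical stand-in for the ostrich (made-up values).
    length_with_outlier = length + [200]
    speed_with_outlier = speed + [1700]

    slope_without, _ = np.polyfit(length, speed, 1)
    slope_with, _ = np.polyfit(length_with_outlier, speed_with_outlier, 1)

    print("slope without the extreme point:", round(slope_without, 1))
    print("slope with the extreme point:   ", round(slope_with, 1))

A single point far from the rest can change the slope (and the R² value) of the fit substantially, which is why outliers deserve scrutiny before they are kept or removed.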

Guideline 5: Practical knowledge
 How many years out can we predict? Based on what you know about the topic, does it make sense to go ahead with the prediction? Use your subject knowledge, not your mathematical knowledge, to address this guideline.