TODAY we will review what we have learned so far about regression and develop the ability to use residual analysis to assess whether a model (LSRL) is appropriate.


TODAY we will:
- Review what we have learned so far about regression
- Develop the ability to use residual analysis to assess whether a model (LSRL) is appropriate for predictions
- Understand how the standard error (Se) is used in regression analysis

Review  How to describe a scatterplot  Correlation Coefficient ( r )  Math Vs. Stats  Equation of Line vs. LSRL  Interpret Slope and y-intercept  What is a residual (or error)?

Review: How to describe a scatterplot
- Trend: positive or negative
- Form: linear or non-linear
- Strength: weak, moderate, or strong
Correlation coefficient (r): -1 ≤ r ≤ 1
- Strength: r close to 1 or -1 indicates a strong linear association; r close to 0 indicates a weak or no linear association.
- Trend: a positive association means that as the x variable increases, the y variable also increases; a negative association means that as the x variable increases, the y variable decreases.
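As a small illustration of the correlation coefficient, here is a minimal sketch that computes r for one positive and one negative association. The data values and the helper name `pearson_r` are made up for this example and do not come from the slides.

```python
from math import sqrt

def pearson_r(x, y):
    """Correlation coefficient r between two equal-length lists."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data (assumption, not from the slides)
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]      # tends to increase with x
y_down = [5, 4, 5, 4, 2]    # tends to decrease with x

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
print(f"positive association: r = {r_up:.3f}")
print(f"negative association: r = {r_down:.3f}")
```

The sign of r gives the trend and its distance from 0 gives the strength, but r only describes linear association.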

Review  Math vs. Stats  Equation of Line vs. LSRL Line  Math  y = mx + b Line  Stats 

Review: Interpret slope and y-intercept
- Slope: for every one-unit increase in x, the predicted y increases (or decreases) on average by the value of the slope.
- Y-intercept: when x = 0, the predicted value of y is "a".

Review: What is a residual (or error)?
Residual (error) = observed y value - predicted y value
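The residual definition can be sketched in a few lines. The study-hours data and the helper name `fit_lsrl` are assumptions made for illustration; the line is fitted with the usual least-squares formulas.

```python
def fit_lsrl(x, y):
    """Return (a, b) for the least-squares line y_hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data (assumption): hours studied and test score
x = [1, 2, 3, 4, 5]
y = [52, 60, 61, 70, 77]

a, b = fit_lsrl(x, y)
predicted = [a + b * xi for xi in x]
# residual = observed y - predicted y
residuals = [yi - pi for yi, pi in zip(y, predicted)]

for xi, yi, pi, ri in zip(x, y, predicted, residuals):
    print(f"x={xi}  observed={yi}  predicted={pi:.1f}  residual={ri:+.1f}")
```

A positive residual means the point lies above the line; a negative residual means it lies below. For a least-squares line the residuals always sum to zero.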

Use Residual Analysis to assess if the model (LSRL) is appropriate for making predictions

Correlation, Linearity, and Outliers
- Only use linear correlation to interpret the data when there is a linear relationship.
- An outlier can strongly influence the correlation.
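To see how strongly one outlier can move the correlation, here is a minimal sketch with made-up data: six points on a perfect line, then the same points plus one outlier. The helper name `pearson_r` and all values are assumptions for this example.

```python
from math import sqrt

def pearson_r(x, y):
    """Correlation coefficient r between two equal-length lists."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data (assumption): six points exactly on a line
x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 3, 4, 5, 6]
r_without = pearson_r(x, y)

# Same data plus a single outlier at (7, 0)
r_with = pearson_r(x + [7], y + [0])

print(f"r without outlier: {r_without:.2f}")
print(f"r with outlier:    {r_with:.2f}")
```

One point is enough to turn a perfect linear association into a weak one, which is why the scatterplot should be checked before r is interpreted.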

Fitting a Model for Prediction (or Fitting the LSRL for Prediction)
Messages:
- All models are wrong, but some are useful.
- A model is not the reality.
- Deterministic part: the signal.
- Stochastic part: the noise. Allow random variation.
- Residual analysis addresses directly the problem of signal and noise.

Signal and Noise

Types of residual plots
Different plots can highlight different departures or problems in the prediction model.
1) Residual vs. fitted
2) Histogram
3) P-P plot
4) Residual vs. order
Note: these plots are from software output (Minitab).

Residual vs. fitted value plot
Three common defects may be revealed by plotting residuals vs. fitted values:
1) Outliers
2) Progressive change in the variance: a band of uniform width indicates equal variance; a funnel shape indicates unequal variance, which calls for a transformation.
3) Inadequacy of the model: curvature suggests the wrong model; a linear trend going up suggests a wrong calculation.
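The remark that a linear trend in this plot points to a calculation error follows from a property of least squares: correctly computed residuals are uncorrelated with the fitted values. The sketch below checks this numerically and then repeats the check with a deliberately wrong slope. The data and the helper names `fit_lsrl` and `pearson_r` are assumptions for illustration.

```python
from math import sqrt

def fit_lsrl(x, y):
    """Return (a, b) for the least-squares line y_hat = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b

def pearson_r(u, v):
    """Correlation coefficient between two equal-length lists."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    suv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    suu = sum((ui - u_bar) ** 2 for ui in u)
    svv = sum((vi - v_bar) ** 2 for vi in v)
    return suv / sqrt(suu * svv)

# Hypothetical data (assumption, not from the slides)
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

a, b = fit_lsrl(x, y)
fitted = [a + b * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]
r_ok = pearson_r(fitted, resid)          # essentially zero

# Deliberate mistake: use only half of the correct slope
wrong_fitted = [a + 0.5 * b * xi for xi in x]
wrong_resid = [yi - fi for yi, fi in zip(y, wrong_fitted)]
r_bad = pearson_r(wrong_fitted, wrong_resid)  # clearly positive

print(f"correct fit:  r(fitted, residual) = {r_ok:.3f}")
print(f"wrong slope:  r(fitted, residual) = {r_bad:.3f}")
```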

Residual vs. Fitted

Let's look at an example to see what a "well-behaved" residual plot looks like.

Scatterplot: Some researchers (Urbano-Marquez et al., 1989) were interested in determining whether or not alcohol consumption was linearly related to muscle strength. The researchers measured the total lifetime consumption of alcohol (x) on a random sample of n = 50 alcoholic men. They also measured the strength (y) of the deltoid muscle in each person's nondominant arm. A fitted line plot of the resulting data (alcoholarm.txt) looks like:

[Figures: scatterplot; residual plot (residual vs. fitted)]

Let's look at an example to see what a "not so well-behaved" residual plot looks like.

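What do you notice in this scatterplot?
[Figures: scatterplot with an outlier marked; residual plot against the predicted or fitted foot length]

[Figure: residual plot against the predicted or fitted value]

[Figure: residual plot against the predicted or fitted value, with the outlier removed]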

Let's look at an example to see what a "not well-behaved" residual plot looks like.


Heteroscedasticity  When the requirement of a constant variance is violated we have a condition of heteroscedasticity.  Diagnose heteroscedasticity by plotting the residual against the predicted y The spread increases with y ^ y ^ Residual ^ y

Signal and Noise

Residual plots (fitted vs. residuals): homoscedasticity vs. heteroscedasticity
A residual plot is a scatterplot of the standardized residuals against the fitted values.

Let's look at an example to see what a "not well-behaved" residual plot looks like.

How does a non-linear regression function show up on a residual vs. fits plot? The answer: The residuals depart from 0 in some systematic manner, such as being positive for small x values, negative for medium x values, and positive again for large x values. Any systematic (non-random) pattern is sufficient to suggest that the regression function is not linear.
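The positive, negative, positive pattern described above can be reproduced with a minimal sketch: fit a straight line to data that actually follow a curve and look at the residuals. The data (y = x squared) and the helper name `fit_lsrl` are assumptions chosen for illustration.

```python
def fit_lsrl(x, y):
    """Return (a, b) for the least-squares line y_hat = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Hypothetical curved data (assumption): y = x^2
x = [1, 2, 3, 4, 5, 6, 7]
y = [xi ** 2 for xi in x]

a, b = fit_lsrl(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

for xi, ri in zip(x, residuals):
    print(f"x={xi}  residual={ri:+.1f}")
```

The residuals are positive at the ends and negative in the middle, a systematic pattern that suggests a straight line is the wrong model for these data.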

2) The random errors are normally distributed and centered at zero
Histograms and P-P plots check the normality assumption. The histogram shows whether the residuals are centered at zero and bell shaped. Q-Q plots are better for judging the normal shape, because histogram bins can be manipulated and the normal shape may therefore be difficult to see in some cases.
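The idea behind a normal probability (Q-Q) plot can be sketched numerically: sort the residuals, pair them with the quantiles a normal distribution would produce, and see how close the pairs are to a straight line. The residual values, the plotting positions (i + 0.5)/n, and the use of a correlation as a summary are assumptions for this illustration, not the method used by the software in the slides.

```python
from math import sqrt
from statistics import NormalDist

def pearson_r(u, v):
    """Correlation coefficient between two equal-length lists."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    suv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    suu = sum((ui - u_bar) ** 2 for ui in u)
    svv = sum((vi - v_bar) ** 2 for vi in v)
    return suv / sqrt(suu * svv)

# Hypothetical residuals (assumption), roughly symmetric around zero
residuals = [-2.1, -1.3, -0.8, -0.4, -0.1, 0.2, 0.5, 0.9, 1.2, 1.9]

n = len(residuals)
ordered = sorted(residuals)
# Standard normal quantiles at the plotting positions (i + 0.5) / n
theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]

qq_r = pearson_r(theoretical, ordered)
print(f"mean residual: {sum(residuals) / n:.3f}")
print(f"correlation of Q-Q points: {qq_r:.3f}")
```

Points that fall close to a straight line (a correlation near 1) are consistent with normally distributed residuals; strong curvature in the Q-Q points would suggest skewness or heavy tails.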

Histograms of residuals
- What to look for? Centered at zero, bell shaped, no outliers.
- How strict?
- What does it mean when the histogram is skewed?

R, R-squared, Se: four-in-one residual plots

Look at this graph: are the residuals normal?

Here's the corresponding normal probability plot of the residuals:

Residuals vs. order plot
We use a "residuals vs. order plot" as a way of detecting a particular form of non-independence of the error terms, namely serial correlation. If the data are obtained in a time (or space) sequence, a residuals vs. order plot helps to see if there is any correlation between the error terms that are near each other in the sequence.
The plot is only appropriate if you know the order in which the data were collected! Highlight this, underline this, circle this,..., er, on second thought, don't do that if you are reading it on a computer screen. Do whatever it takes to remember it though, because it is a very common mistake made by people new to regression analysis.
So, what is this residuals vs. order plot all about? As its name suggests, it is a scatter plot with residuals on the y axis and the order in which the data were collected on the x axis. Here's an example of a well-behaved residuals vs. order plot:

Residual vs. order: The residuals bounce randomly around the residual = 0 line, as we would hope. In general, residuals exhibiting normal random noise around the residual = 0 line suggest that there is no serial correlation.

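Residual vs. order: A residuals vs. order plot that exhibits a (positive) trend, as the following plot does, suggests that the error terms are not independent of the order in which the data were collected.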
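A numeric companion to the residuals vs. order plot is the Durbin-Watson statistic. It is not mentioned in the slides and is brought in here only as one common summary of serial correlation: values near 2 are consistent with no serial correlation, and values well below 2 point to positive serial correlation. Both residual sequences and the helper name `durbin_watson` are invented for illustration.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residuals listed in collection order (assumption)
bouncing = [1, -1, -0.5, 1.5, 0.5, -2, 0.5]   # no visible pattern
trending = [-3, -2, -1, 0, 1, 2, 3]           # steady upward trend

dw_bouncing = durbin_watson(bouncing)
dw_trending = durbin_watson(trending)

print(f"bouncing residuals: DW = {dw_bouncing:.2f}")
print(f"trending residuals: DW = {dw_trending:.2f}")
```

As with the plot itself, this check only makes sense when the residuals are listed in the order in which the data were collected.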

R-squared (R²) and residual standard error (Se)
Residual analysis is more important than a high R².
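Both summaries come straight from the residuals. The sketch below uses the standard formulas for simple linear regression, R² = 1 - SSE/SST and Se = sqrt(SSE/(n - 2)), on made-up study-hours data; the data and the helper name `fit_lsrl` are assumptions for this example.

```python
from math import sqrt

def fit_lsrl(x, y):
    """Return (a, b) for the least-squares line y_hat = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Hypothetical data (assumption): hours studied and test score
x = [1, 2, 3, 4, 5]
y = [52, 60, 61, 70, 77]

n = len(x)
a, b = fit_lsrl(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

y_bar = sum(y) / n
sse = sum(r ** 2 for r in residuals)        # unexplained variation
sst = sum((yi - y_bar) ** 2 for yi in y)    # total variation in y

r_squared = 1 - sse / sst
se = sqrt(sse / (n - 2))   # typical size of a residual, in units of y

print(f"R-squared = {r_squared:.3f}")
print(f"Se = {se:.2f}")
```

A high R² alone does not show that the line is appropriate: curved or funnel-shaped residuals can still produce a large R², which is why the residual plots are checked first.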

Residual Activity