Lecture 22: Thurs., April 1
– Outliers and influential points for simple linear regression
– Multiple linear regression
  – Basic model
  – Interpreting the coefficients

Outliers and Influential Observations
– An outlier is an observation that lies outside the overall pattern of the other observations. A point can be an outlier in the x direction, in the y direction, or in the direction of the scatterplot (i.e., far from the trend of the other points).
– For regression, the outliers of concern are those in the x direction and in the direction of the scatterplot. A point that is an outlier in the direction of the scatterplot will have a large residual.
– An observation is influential if removing it markedly changes the least squares regression line. A point that is an outlier in the x direction will often be influential.
– The least squares method is not resistant to outliers.
– Follow the outlier examination strategy in Display 3.6 for dealing with outliers in the x direction and outliers in the direction of the scatterplot.

Outliers Example
– Does the age at which a child begins to talk predict a later score on a test of mental ability? gesell.JMP contains data on each child's age at first word (x) and Gesell Adaptive Score (y), an ability test taken much later.
– Child 18 is an outlier in the x direction and potentially influential. Child 19 is an outlier in the direction of the scatterplot.
– To assess whether a point is influential, fit the least squares line with and without the point (excluding the row to fit the line without the point) and see how much of a difference it makes; a sketch of this check appears below. Child 18 is influential.
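A minimal sketch of the fit-with-and-without check in Python. It assumes the gesell.JMP data have been exported to a file "gesell.csv" with hypothetical column names "age" and "score", and that child 18 sits in row 17 of the file (0-indexed); those names and that ordering are assumptions, not part of the original data set.

```python
# Influence check: fit the least squares line with and without one point
# and compare the fitted lines. File name, column names, and the row index
# for child 18 are all assumptions about how the data were exported.
import numpy as np
import pandas as pd

gesell = pd.read_csv("gesell.csv")
x = gesell["age"].to_numpy()
y = gesell["score"].to_numpy()

def ls_line(x, y):
    """Return (intercept, slope) of the least squares line."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return intercept, slope

keep = np.arange(len(x)) != 17            # drop child 18 (assumed row)
print("with child 18:    b0=%.1f, b1=%.2f" % ls_line(x, y))
print("without child 18: b0=%.1f, b1=%.2f" % ls_line(x[keep], y[keep]))
# A marked change in the slope indicates that child 18 is influential.
```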

Will You Take Mercury With Your Fish?
– Too much mercury in one's body results in memory loss, depression, irritability, and anxiety – the "mad hatter" syndrome.
– Rivers and oceans contain small amounts of mercury, which can accumulate in fish over their lifetimes.
– The concentration of mercury in fish tissue can be obtained, at considerable expense, by catching fish and sending samples to a lab for analysis.
– It is important to understand the relationship between mercury concentration and measurable characteristics of a fish, such as length and weight, in order to develop safety guidelines about how much fish to eat.

Data Set
– mercury.JMP contains data from a study of largemouth bass in the Wacamaw and Lumber rivers in North Carolina.
– At several stations along each river, a group of fish were caught, weighed, and measured. In addition, a fillet from each fish caught was sent to a lab so that the tissue concentration of mercury could be determined for each fish.
– We want to predict Y = mercury concentration per weight (measured in parts per million) based on $X_1$ = length (centimeters) and $X_2$ = weight (grams).

Multiple Regression Model
– Multiple regression seeks to estimate the mean of Y given multiple explanatory variables $X_1, \dots, X_p$, denoted by $\mu\{Y \mid X_1, \dots, X_p\}$.
– Assumptions of the ideal multiple linear regression model:
  – $\mu\{Y \mid X_1, \dots, X_p\} = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$ (linearity)
  – $SD\{Y \mid X_1, \dots, X_p\} = \sigma$ (constant variance)
  – The distribution of Y for each subpopulation $X_1, \dots, X_p$ is normal.
  – The selection of an observation from any of the subpopulations is independent of the selection of any other observation.

Multiple Regression Model: Another Representation
– Data: we observe $(Y_i, X_{i1}, \dots, X_{ip})$ for observations $i = 1, \dots, n$.
– Ideal multiple regression model:
  – $Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_p X_{ip} + \epsilon_i$
  – $\epsilon_1, \dots, \epsilon_n$ have normal distributions with mean 0 and SD $\sigma$
  – $\epsilon_1, \dots, \epsilon_n$ are independent
– $\epsilon_i$ = "error" = the error from predicting $Y_i$ by its subpopulation mean $\mu\{Y \mid X_{i1}, \dots, X_{ip}\}$.
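To make this error representation concrete, here is a minimal simulation sketch; the coefficient values, predictor ranges, and sample size are all invented for illustration and are not from the mercury study.

```python
# Simulate n observations from the ideal model Y = b0 + b1*X1 + b2*X2 + e,
# where the errors e are independent N(0, sigma). All numbers are made up.
import numpy as np

rng = np.random.default_rng(0)
n = 100
b0, b1, b2, sigma = 1.0, 0.05, 0.002, 0.3     # hypothetical coefficients

x1 = rng.uniform(20, 60, size=n)              # e.g., lengths in cm
x2 = rng.uniform(100, 2000, size=n)           # e.g., weights in grams
eps = rng.normal(0.0, sigma, size=n)          # independent errors, mean 0
y = b0 + b1 * x1 + b2 * x2 + eps              # subpopulation mean plus error
```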

Estimation of Multiple Linear Regression Model
– The coefficients are estimated by choosing $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_p$ to make the sum of squared prediction errors as small as possible, i.e., to minimize $\sum_{i=1}^n \left( Y_i - \hat\beta_0 - \hat\beta_1 X_{i1} - \cdots - \hat\beta_p X_{ip} \right)^2$.
– Predicted value of y given $x_1, \dots, x_p$: $\hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \cdots + \hat\beta_p x_p$.
– $\sigma = SD(Y \mid X_1, \dots, X_p)$ is estimated by $\hat\sigma$ = root mean square error.
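A minimal sketch of this minimization in Python, using numpy's least squares solver; the data are regenerated from the made-up simulation above so the block stands alone.

```python
# Estimate the coefficients by minimizing the sum of squared prediction
# errors, then compute the root mean square error (made-up data).
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.uniform(20, 60, size=n)
x2 = rng.uniform(100, 2000, size=n)
y = 1.0 + 0.05 * x1 + 0.002 * x2 + rng.normal(0, 0.3, size=n)

X = np.column_stack([np.ones(n), x1, x2])          # design matrix + intercept
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # minimizes sum of squares
y_hat = X @ beta_hat                               # predicted values
p = 2                                              # number of explanatory vars
rmse = np.sqrt(np.sum((y - y_hat) ** 2) / (n - p - 1))  # estimates sigma
print("beta_hat:", beta_hat, "RMSE:", rmse)
```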

Multiple Linear Regression in JMP
– Analyze, Fit Model
– Put the response variable in Y
– Click on the explanatory variables and then click Add under Construct Model Effects
– Click Run Model
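For readers working outside JMP, a rough equivalent of these steps in Python's statsmodels, assuming the mercury data have been exported to "mercury.csv" with hypothetical column names "mercury", "length", and "weight":

```python
# Fit the multiple regression of mercury on length and weight.
# The file name and column names are assumptions about the exported data.
import pandas as pd
import statsmodels.formula.api as smf

fish = pd.read_csv("mercury.csv")
fit = smf.ols("mercury ~ length + weight", data=fish).fit()
print(fit.summary())   # coefficient estimates, RMSE, t-statistics, R^2
```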

Multiple Regression for Mercury Data

Residuals and Root Mean Square Error from Multiple Regression
– Residual for observation i: $\widehat{res}_i = Y_i - (\hat\beta_0 + \hat\beta_1 X_{i1} + \cdots + \hat\beta_p X_{ip})$
– Root mean square error: $\hat\sigma = \sqrt{\frac{\sum_{i=1}^n \widehat{res}_i^2}{n - p - 1}}$
– As with simple linear regression, under the ideal multiple linear regression model:
  – Approximately 68% of predictions of a future Y based on $\hat\beta_0 + \hat\beta_1 X_1 + \cdots + \hat\beta_p X_p$ will be off by at most one root mean square error ($\hat\sigma$).
  – Approximately 95% of predictions of a future Y based on $\hat\beta_0 + \hat\beta_1 X_1 + \cdots + \hat\beta_p X_p$ will be off by at most two root mean square errors ($2\hat\sigma$).
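A quick sketch of checking this 68%/95% rule of thumb on residuals, again on made-up data so the block is self-contained:

```python
# Fit by least squares on simulated data, then see what fraction of the
# residuals lie within one and two RMSEs of zero.
import numpy as np

rng = np.random.default_rng(3)
n = 200
x1, x2 = rng.uniform(20, 60, n), rng.uniform(100, 2000, n)
y = 1.0 + 0.05 * x1 + 0.002 * x2 + rng.normal(0, 0.3, n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
res = y - X @ beta_hat                                # residuals
rmse = np.sqrt(np.sum(res ** 2) / (n - 2 - 1))        # p = 2 predictors
print("within 1 RMSE:", np.mean(np.abs(res) <= rmse))       # approx. 0.68
print("within 2 RMSE:", np.mean(np.abs(res) <= 2 * rmse))   # approx. 0.95
```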

Interpreting the Coefficients
– $\beta_1$ = the increase in the mean of Y associated with a one unit (1 cm) increase in length, holding weight fixed.
– $\beta_2$ = the increase in the mean of Y associated with a one unit (1 gram) increase in weight, holding length fixed.
– The interpretation of multiple regression coefficients depends on what other explanatory variables are in the model; see the handout, and the small numerical illustration below.
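A small sketch of that last point, with made-up correlated predictors: the coefficient on x1 changes when x2 enters the model, because it then measures the association with Y holding x2 fixed rather than the overall association.

```python
# With correlated predictors, the coefficient on x1 alone differs from its
# coefficient when x2 is also in the model. All numbers are invented.
import numpy as np

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(40, 8, n)                      # e.g., length
x2 = 30 * x1 + rng.normal(0, 100, n)           # weight, correlated with length
y = 0.02 * x1 + 0.001 * x2 + rng.normal(0, 0.2, n)

def coefs(*cols):
    """Least squares coefficients of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(n), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

print("y ~ x1 alone: slope on x1 =", coefs(x1)[1])       # near 0.05 here
print("y ~ x1 + x2:  slope on x1 =", coefs(x1, x2)[1])   # near 0.02 here
# The x1 coefficient shrinks in the full model because x2, which is
# correlated with x1, now absorbs part of the association with y.
```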