Daniela Stan Raicu School of CTI, DePaul University

Presentation transcript:

CSC 323, Quarter: Spring 02/03. Daniela Stan Raicu, School of CTI, DePaul University. 11/10/2018 Daniela Stan - CSC323

Outline
- Chapter 2: Looking at Data – Relationships between two or more variables
- Remarks on correlation (last slides from the previous lecture)
- Linear regression
- Least-squares regression line
- Residual analysis
- Cautions about regression and correlation
- SAS procedures for univariate data, scatterplots, correlation and regression

Correlation
Suppose we have n pairs of observations of two quantitative variables X and Y: (x1, y1), (x2, y2), …, (xn, yn). The correlation r measures the direction and strength of the linear relationship between X and Y:
$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$
where $\bar{x}$, $\bar{y}$ are the means and $s_x$, $s_y$ are the standard deviations of X and Y.
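
The formula can be checked directly: standardize each value, multiply the pairs, add them up and divide by n - 1. This is a minimal Python sketch with plain lists; the five data points are made up for illustration and do not come from the slides.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Standard deviation with the n - 1 divisor, matching s_x and s_y."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r = (1 / (n - 1)) * sum of products of standardized x and y values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)


# Made-up illustration data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(correlation(xs, ys), 4))  # 0.7746
```

Points lying exactly on an increasing line, such as (1, 2), (2, 4), (3, 6), give r = 1.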

More on Correlation
- Correlation makes no distinction between explanatory and response variables: swapping x and y leaves r unchanged.
- Correlation requires that both variables be quantitative.
- Correlation is not affected by changes in the unit of measurement of either variable.
- Correlation measures the strength of linear relationships only.
- Correlation is not a resistant measure, so outliers can greatly change the value of r.
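
Two of these properties are easy to verify numerically: swapping the roles of x and y, or converting both variables to other units, leaves r unchanged. The heights and weights below are hypothetical values chosen only for this sketch, and r is computed with the algebraically equivalent form sum(dx·dy) / sqrt(sum(dx²)·sum(dy²)).

```python
from math import isclose, sqrt


def correlation(xs, ys):
    """Pearson r as sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2)).
    Algebraically equal to the standardized-product formula."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Hypothetical heights (inches) and weights (pounds), for illustration only
height_in = [60, 62, 65, 68, 70, 72]
weight_lb = [110, 125, 130, 150, 165, 170]

r = correlation(height_in, weight_lb)
r_swapped = correlation(weight_lb, height_in)  # explanatory/response roles exchanged
height_cm = [2.54 * h for h in height_in]      # change of units for x
weight_kg = [w / 2.2046 for w in weight_lb]    # change of units for y
r_metric = correlation(height_cm, weight_kg)

print(isclose(r, r_swapped), isclose(r, r_metric))  # True True
```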

Not All Relationships Are Linear
Example: miles per gallon versus speed.
- The relationship is curved, so r is misleading.
- Speed varies from 20 mph to 60 mph.
- MPG varies from trial to trial, even at the same speed: this is a statistical relationship, not an exact one.
- Correlation measures the strength of linear relationships only.
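
A strong but curved relationship can have a correlation of zero. In this sketch, MPG is a made-up exact parabola that peaks at 40 mph over the slide's speed range of 20 to 60 mph (no trial-to-trial noise); because the curve is symmetric about the mean speed, the linear correlation comes out numerically 0 even though MPG is completely determined by speed.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson r as sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


speeds = list(range(20, 61, 5))                  # 20, 25, ..., 60 mph
# Hypothetical MPG curve peaking at 40 mph (made-up values)
mpg = [40 - (s - 40) ** 2 / 25 for s in speeds]  # 24, 31, 36, 39, 40, 39, ...

r = correlation(speeds, mpg)
print(f"r = {r:.3f}")
```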

Problems with Correlations
- Outliers can inflate or deflate correlations.
- Groups combined inappropriately may mask relationships (a third variable); the groups may have different relationships when separated.
- Always plot the data: correlation is not a resistant measure, so outliers can greatly change the value of r.
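
A single point is enough to change the sign of r. This sketch uses the same made-up five points as before and then adds one hypothetical outlier at (10, 0); the correlation drops from about 0.77 to about -0.55.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson r as sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Made-up data with a moderate positive association
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r_clean = correlation(xs, ys)

# One hypothetical outlier: far to the right and low
r_with_outlier = correlation(xs + [10], ys + [0])

print(f"without outlier: r = {r_clean:.2f}")         # 0.77
print(f"with outlier:    r = {r_with_outlier:.2f}")  # -0.55
```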

Linear Regression
Objective: to quantify the linear relationship between an explanatory variable and a response variable by fitting a line to the data (that is, drawing a line that comes as close as possible to the points).
[Figure: example of a regression line fitted to a scatterplot.]

Linear Regression
A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes.
Linear regression equation: $\hat{y} = a + bx$, where b is the slope (the rate of change of $\hat{y}$ per unit increase in x) and a is the intercept (the value of $\hat{y}$ when x = 0).
Example: height = a + b × age.

Prediction
Use of regression: to predict the value of y for any value of x, substitute this x into the equation of the regression line.
Example (prediction via regression line, husband and wife ages): the regression equation is $\hat{y} = 3.6 + 0.97x$, where $\hat{y}$ is the average age of all husbands whose wives are of age x.
- For all women aged 30, we predict the average husband age to be 3.6 + (0.97)(30) = 32.7 years.
- Suppose we know that an individual wife's age is 30. What would we predict her husband's age to be? The regression line gives the same prediction, 32.7 years, although an individual husband's age will vary around it.
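
Prediction is just substitution into $\hat{y} = a + bx$. A minimal Python sketch using the husband/wife coefficients from the slide (the function name `predict` is our own):

```python
def predict(a, b, x):
    """Value on the regression line y-hat = a + b * x."""
    return a + b * x


# Husband/wife ages example: y-hat = 3.6 + 0.97 x
print(round(predict(3.6, 0.97, 30), 1))  # 32.7
```

The slope 0.97 means each additional year of the wife's age raises the predicted husband age by 0.97 years.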

Least-Squares Regression
Least squares is used to determine the "best" line. We want the line to be as close as possible to the data points in the vertical (y) direction, since y is what we are trying to predict.
The least-squares regression line of y on x is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.
[Figure: for a given x, the observed value y, the predicted value $\hat{y}$ on the line, and the error between them.]
A residual is the difference between an observed value of the response variable y and the value predicted by the regression line: residual = y − $\hat{y}$.
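
The sketch below computes residuals and the sum of their squares (SSE), the quantity least squares minimizes. It finds the line with the standard closed-form slope $\sum (x-\bar{x})(y-\bar{y}) / \sum (x-\bar{x})^2$; the data are the same made-up five points used earlier. Nudging the intercept or slope away from the least-squares values raises the SSE.

```python
def least_squares_fit(xs, ys):
    """Return (a, b) for the line y-hat = a + b*x that minimizes the
    sum of squared vertical distances (standard closed-form solution)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return a, b


def residuals(xs, ys, a, b):
    """Observed y minus predicted y-hat for each point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sum_sq_residuals(xs, ys, a, b):
    """The quantity that least squares minimizes."""
    return sum(e * e for e in residuals(xs, ys, a, b))


# Made-up illustration data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_fit(xs, ys)
best = sum_sq_residuals(xs, ys, a, b)
print(f"least-squares line: a={a:.2f}, b={b:.2f}, SSE={best:.4f}")

# Moving the intercept or the slope away from the least-squares values
# increases the sum of squared residuals.
for da, db in [(0.1, 0.0), (0.0, 0.05)]:
    sse = sum_sq_residuals(xs, ys, a + da, b + db)
    print(f"a={a + da:.2f}, b={b + db:.2f}: SSE={sse:.4f}")
```

Here the least-squares line is $\hat{y} = 2.2 + 0.6x$ with SSE = 2.4, while the two nudged lines give SSE = 2.45 and 2.5375. The least-squares residuals also sum to zero.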

Least-Squares Regression
The least-squares regression line makes the prediction errors (residuals) as small as possible, in the sense that the sum of their squares is minimal.

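Least-Squares Regression (cont.)
How is the least-squares regression line calculated?
$$\hat{y} = a + bx, \qquad b = r\,\frac{s_y}{s_x}, \qquad a = \bar{y} - b\,\bar{x}$$
where $\hat{y}$ is the predicted value, r is the correlation, $s_x$, $s_y$ are the standard deviations, and $\bar{x}$, $\bar{y}$ are the means of x and y.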
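
The same line can be obtained from the summary statistics alone. This sketch computes r, $s_x$, $s_y$ and the means, then applies $b = r\,s_y/s_x$ and $a = \bar{y} - b\,\bar{x}$ to the made-up five-point data set; the result matches the direct least-squares fit, $\hat{y} = 2.2 + 0.6x$.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Standard deviation with the n - 1 divisor."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r = (1 / (n - 1)) * sum of products of standardized values."""
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (len(xs) - 1)


def regression_line(xs, ys):
    """Slope b = r * s_y / s_x and intercept a = mean(y) - b * mean(x)."""
    b = correlation(xs, ys) * sample_sd(ys) / sample_sd(xs)
    a = mean(ys) - b * mean(xs)
    return a, b


# Made-up illustration data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = regression_line(xs, ys)
print(round(a, 4), round(b, 4))  # 2.2 0.6
```

Because $a = \bar{y} - b\bar{x}$, the least-squares line always passes through the point $(\bar{x}, \bar{y})$.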

Coefficient of Determination (R²)
R² measures the usefulness of the regression prediction. R² (or r², the square of the correlation) measures the fraction of the variation in the values of the response variable y that is explained by the regression line.
Examples:
- r = 1: R² = 1; the regression line explains (captures) all (100%) of the variation in y.
- r = 0.7: R² = 0.49; the regression line explains almost half (49%) of the variation in y.
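
"Fraction of variation explained" can be made concrete as $R^2 = 1 - \text{SSE}/\text{SST}$, where SST is the total sum of squares of y around its mean. For a least-squares line this equals r². A minimal sketch on the made-up five-point data set:

```python
from math import sqrt

# Made-up illustration data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)

# Least-squares line
b = sxy / sxx
a = my - b * mx

ss_total = syy                                                # variation of y around its mean
ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # variation left unexplained
r_squared = 1 - ss_resid / ss_total
r = sxy / sqrt(sxx * syy)

print(round(r_squared, 4), round(r ** 2, 4))  # 0.6 0.6
```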

A Caution: Beware of Extrapolation
Extrapolation is the use of a regression line for prediction outside the range of values of the explanatory variable x that you used to obtain the line. Such predictions are often not accurate.
Example: Sarah's height was plotted against her age (in months).
- Can you predict her height at age 42 months?
- Can you predict her height at age 30 years (360 months)?

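A Caution: Beware of Extrapolation (cont.)
Regression line: $\hat{y} = 71.95 + 0.383x$ (height in cm, age x in months).
- Height at age 42 months? $\hat{y}$ = 71.95 + 0.383(42) ≈ 88 cm.
- Height at age 30 years? $\hat{y}$ = 71.95 + 0.383(360) ≈ 209.8 cm.
She is predicted to be about 6' 10.6" tall at age 30, which is clearly unrealistic.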
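
The arithmetic on this slide can be reproduced with a short sketch. It assumes height is in centimeters (the unit that the conversion to feet and inches implies) and age is in months; the function name `predict_height_cm` is our own.

```python
def predict_height_cm(age_months):
    """Sarah's fitted line from the slide; assumes height in cm, age in months."""
    return 71.95 + 0.383 * age_months


for age in (42, 360):
    height_cm = predict_height_cm(age)
    feet, inches = divmod(height_cm / 2.54, 12)  # 2.54 cm per inch, 12 in per foot
    print(f"{age:>3} months: {height_cm:.1f} cm = {int(feet)} ft {inches:.1f} in")

# Output:
#  42 months: 88.0 cm = 2 ft 10.7 in
# 360 months: 209.8 cm = 6 ft 10.6 in
```

The 42-month prediction lies within the range of ages used to fit the line and is plausible; the 360-month prediction extrapolates far beyond it.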

Accuracy of the Predictions
One possible measure of the accuracy of the regression predictions is the root mean square error (r.m.s. error), defined as the square root of the average of the squared residuals:
$$\text{r.m.s. error} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
In large data sets, the r.m.s. error is approximately equal to
$$\sqrt{1 - r^2}\; s_y$$
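
The approximation can be checked numerically. If the standard deviation of y is computed with divisor n, the r.m.s. error equals $\sqrt{1-r^2}$ times that SD exactly. With the usual $s_y$ (divisor n - 1), the two differ by a factor $\sqrt{(n-1)/n}$, which is close to 1 only in large data sets. This sketch uses the made-up five-point data set, where n = 5 is deliberately small so the gap is visible.

```python
from math import sqrt

# Made-up illustration data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)

b = sxy / sxx          # least-squares slope
a = my - b * mx        # least-squares intercept
r = sxy / sqrt(sxx * syy)

resid = [y - (a + b * x) for x, y in zip(xs, ys)]
rms = sqrt(sum(e * e for e in resid) / n)  # r.m.s. error (divide by n)

sd_y_pop = sqrt(syy / n)        # SD of y with divisor n
s_y = sqrt(syy / (n - 1))       # SD of y with divisor n - 1

print(f"r.m.s. error         = {rms:.4f}")                          # 0.6928
print(f"sqrt(1 - r^2) * SD_y = {sqrt(1 - r ** 2) * sd_y_pop:.4f}")  # 0.6928
print(f"sqrt(1 - r^2) * s_y  = {sqrt(1 - r ** 2) * s_y:.4f}")       # 0.7746
```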

Confounding Factor
A confounding factor is a variable that has an important effect on the relationship among the variables in a study but is not included in the study.
Example: the mathematics department of a large university must plan the timetable for the following year. Data are collected on the enrollment year, the number x of first-year students, and the number y of students enrolled in elementary math courses. The fitted regression line has equation $\hat{y} = 2491.69 + 1.0663x$, with R² = 0.694.
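
Using the fitted line from this example is ordinary substitution. The first-year class size of 4,500 below is a hypothetical value chosen only for illustration, not data from the slide. Since R² = 0.694, about 30.6% of the variation in math enrollment is not explained by x, and factors left out of the study could account for part of it.

```python
from math import sqrt

INTERCEPT, SLOPE = 2491.69, 1.0663  # fitted line from the slide
R_SQUARED = 0.694


def predicted_math_enrollment(first_year_students):
    """y-hat = 2491.69 + 1.0663 * x for the math department example."""
    return INTERCEPT + SLOPE * first_year_students


x_new = 4500  # hypothetical first-year class size (not from the slide)
print(round(predicted_math_enrollment(x_new), 2))  # 7290.04

r = sqrt(R_SQUARED)  # the slope is positive, so r is the positive root
print(f"r = {r:.3f}, unexplained share = {1 - R_SQUARED:.1%}")
```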

Influential Point
An observation is influential for the regression line if removing it would considerably change the fitted line. An influential point pulls the regression line towards itself.
[Figure: scatterplot with an influential point/outlier, showing the regression line fitted with and without that point.]
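
One way to see influence is to fit the least-squares line with and without the suspect point and compare the slopes. In this sketch, adding one hypothetical point at (10, 0), far from the other x values, to the made-up five-point data set pulls the slope from 0.6 to about -0.34, reversing its sign.

```python
def least_squares_fit(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


# Made-up illustration data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a_without, b_without = least_squares_fit(xs, ys)

# Add one hypothetical point far from the others in the x direction
a_with, b_with = least_squares_fit(xs + [10], ys + [0])

print(f"slope without the point: {b_without:.3f}")  # 0.600
print(f"slope with the point:    {b_with:.3f}")     # -0.341
```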

Summary: Warnings
- Correlation measures linear association; the regression line should be used only when the association is linear.
- Extrapolation: do not use the regression line to predict values outside the observed range of x; such predictions are not reliable.
- Correlation and the regression line are sensitive to influential/extreme points.

Data Mining
Exploring really large databases in the hope of finding useful patterns is called data mining.
[Figure: the data mining process: Domain Understanding, Data Selection, Cleaning & Preprocessing, Discovering Patterns, Evaluation & Interpretation, Knowledge.]
The entire process is iterative and interactive.