Chapter 4: Statistical Approaches to Estimation and Prediction


Univariate: Point Estimation, Confidence Interval Estimation
Bivariate: Linear Regression
Multivariate: Multiple Regression

Review: Data Mining Tasks

Descriptive models

Introduction

Statistical analysis of numerical variables has, in effect, been performing data mining for over a century. Univariate methods are based on a single variable:
– Point estimation: measures of center (mean, median, mode) and measures of location (percentiles, quantiles)
– Measures of spread (measures of variability)
– Confidence interval estimation

Measure of Spread

Consider the slide's two portfolios, A and B. For both, the mean P/E ratio is 10, the median is 11, and the mode is 11, yet the measures of variability for portfolio A should be larger than those for portfolio B.

Measure of Spread

– Range (maximum − minimum)
– Standard deviation
– Mean absolute deviation
– Interquartile range: IQR = Q3 − Q1
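As a sketch, these four measures can be computed with NumPy. The P/E values below are hypothetical, chosen only so that the mean is 10 and the median and mode are 11, matching the portfolio example above:

```python
import numpy as np

# Hypothetical P/E ratios (mean 10, median 11, mode 11,
# as in the portfolio example on the earlier slide)
pe_ratios = np.array([5, 9, 11, 11, 14])

value_range = pe_ratios.max() - pe_ratios.min()        # range = max - min
std_dev = pe_ratios.std(ddof=1)                        # sample standard deviation
mad = np.mean(np.abs(pe_ratios - pe_ratios.mean()))    # mean absolute deviation
q1, q3 = np.percentile(pe_ratios, [25, 75])
iqr = q3 - q1                                          # interquartile range Q3 - Q1

print(value_range, std_dev, mad, iqr)
```

Note `ddof=1` for the sample (rather than population) standard deviation, the usual choice in inference.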

Statistical Inference

Statistical inference consists of methods for estimating and testing hypotheses about population characteristics based on the information contained in the sample.

How confident are we in our estimates?

Confidence interval: point estimate ± margin of error. For example, the t-interval is used when the population is normal or the sample size is large.

T-interval
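The interval itself appears only as an image on the original slide; the standard form of the t-interval for the population mean is:

```latex
\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}
```

where \(\bar{x}\) is the sample mean, \(s\) the sample standard deviation, \(n\) the sample size, and \(t_{\alpha/2,\,n-1}\) the critical value of the t distribution with \(n-1\) degrees of freedom.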

T-interval Example: Customer Service Calls

We are 95% confident that the population mean number of customer service calls for all customers falls between ___ and ___ calls (the interval endpoints appear only in the slide image).
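A minimal sketch of computing such an interval with SciPy, using made-up call counts (the slide's actual data and interval endpoints are not in the transcript):

```python
import numpy as np
from scipy import stats

# Hypothetical numbers of customer service calls for ten customers
calls = np.array([1, 0, 2, 1, 3, 0, 1, 2, 1, 4])

n = len(calls)
mean = calls.mean()
sem = calls.std(ddof=1) / np.sqrt(n)   # estimated standard error of the mean

# 95% t-interval: point estimate +/- margin of error
lo, hi = stats.t.interval(0.95, df=n - 1, loc=mean, scale=sem)
print(f"95% CI for the mean number of calls: ({lo:.2f}, {hi:.2f})")
```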

TOC

Univariate
– Point Estimation
– Confidence Interval Estimation
Bivariate: Linear Regression
Multivariate: Multiple Regression

Bivariate Method: Linear Regression

Bivariate: using the value of one variable to estimate the value of a different variable. Linear regression: an approach for modeling the linear relationship between a predictor (x) variable and a response (y) variable.

Cereal dataset (from the Data and Story Library)

Scatter plot of nutritional rating versus sugar content

Regression Analysis: Rating versus Sugars

Regression Analysis: Rating versus Sugars

The estimated regression equation (ERE): ŷ = 59.4 − 2.42(sugars).
– Coefficients: the y-intercept (constant) b0 = 59.4 and the slope b1 = −2.42
– SE Coef: the standard errors of the coefficients, a measure of the variability of the coefficients
– S, the standard error of the estimate, measures the size of the "typical" error in prediction
– R-squared measures how closely the linear regression model fits the data (90 to 100%: a very nice fit)
– Fit: the rating estimated using the regression equation
– Residual (prediction error, or estimation error): y − ŷ
– CI: confidence interval
– PI: prediction interval
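The cereal data themselves are not reproduced in the transcript, but the least-squares fit can be sketched on simulated data generated around the slide's fitted line (ŷ = 59.4 − 2.42·sugars); everything below is illustrative, not the textbook's output:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in for the cereal data: points scattered around
# the transcript's fitted line, rating = 59.4 - 2.42 * sugars
sugars = rng.uniform(0, 15, size=77)
rating = 59.4 - 2.42 * sugars + rng.normal(0, 5, size=77)

# Least-squares fit: slope b1 and intercept b0
b1, b0 = np.polyfit(sugars, rating, deg=1)

fitted = b0 + b1 * sugars
residuals = rating - fitted                               # prediction errors y - y_hat
s = np.sqrt(np.sum(residuals**2) / (len(sugars) - 2))     # standard error of the estimate
r_squared = 1 - np.sum(residuals**2) / np.sum((rating - rating.mean())**2)

print(f"y_hat = {b0:.1f} + ({b1:.2f}) * sugars, s = {s:.2f}, R-sq = {r_squared:.2f}")
```

With enough points and moderate noise, the recovered slope and intercept land close to the generating values.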

Confidence Intervals for the Mean Value of Y Given X

Prediction Intervals for a Randomly Chosen Value of Y Given X

It is "easier" to predict the class average on an exam than it is to predict a randomly chosen student's score. Prediction intervals are used to estimate the value of a randomly chosen y, given x. Because this is a more difficult task, prediction intervals are wider (lower precision) than confidence intervals for the mean.

Regression Analysis: Rating versus Sugars

Danger of Extrapolation

Extrapolation: making predictions for x values lying outside the range of the observed data. Example: sugars = 30 grams. Using the estimated regression equation, ŷ = 59.4 − 2.42(30) = −13.2, a negative rating, even though the minimum rating in the dataset is 18.
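The arithmetic is easy to check; the function below is just the slide's fitted equation:

```python
# Fitted equation from the regression slide: y_hat = 59.4 - 2.42 * sugars
def predicted_rating(sugars_grams: float) -> float:
    return 59.4 - 2.42 * sugars_grams

# Extrapolating to sugars = 30 g, far outside the observed range,
# yields an impossible negative rating
rating_at_30 = predicted_rating(30)
print(rating_at_30)  # about -13.2, below the dataset's minimum rating of 18
```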

TOC

Univariate
– Point Estimation
– Confidence Interval Estimation
Bivariate: Linear Regression
Multivariate: Multiple Regression

Scatter plot of rating versus calories, protein, fat, and sodium

Scatter plot of rating versus fiber, carbohydrates, sugars, potassium, and vitamins

Cross correlation

– Negative correlation: negative slope in the regression line
– Zero correlation: uncorrelated
– Correlation near one: highly correlated (multicollinearity)

Multicollinearity

Definition: a condition in which some of the predictor variables are correlated with each other. It leads to:
– instability in the solution space, leading to possibly incoherent results
– overemphasis of a particular component of the model, as if it were double-counted
Solutions:
– Simple elimination of a redundant predictor (potassium in the cereal dataset)
– Supervised dimension reduction approaches such as LDA
– Unsupervised dimension reduction approaches such as PCA, ICA, …
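A quick way to spot multicollinearity is to inspect the correlation matrix of the predictors. The data below are synthetic, built so that one predictor is nearly a multiple of another (loosely mimicking fiber and potassium in the cereal data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic predictors: potassium is constructed to track fiber closely,
# while sodium is generated independently
fiber = rng.uniform(0, 10, size=77)
potassium = 30 * fiber + rng.normal(0, 15, size=77)
sodium = rng.uniform(0, 300, size=77)

X = np.column_stack([fiber, potassium, sodium])
corr = np.corrcoef(X, rowvar=False)   # 3x3 correlation matrix of the columns

print(np.round(corr, 2))
```

A fiber-vs-potassium entry close to 1 flags multicollinearity; dropping one of the pair is the simple elimination mentioned above.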

Multiple regression
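A sketch of fitting a multiple regression by least squares with NumPy; the predictors, coefficients, and noise level below are invented for illustration, not taken from the textbook's cereal output:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 77

# Synthetic stand-ins for two cereal predictors (hypothetical coefficients)
sugars = rng.uniform(0, 15, size=n)
fiber = rng.uniform(0, 10, size=n)
rating = 55.0 - 2.0 * sugars + 3.0 * fiber + rng.normal(0, 1, size=n)

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones(n), sugars, fiber])
coef, *_ = np.linalg.lstsq(X, rating, rcond=None)
b0, b_sugars, b_fiber = coef

print(f"y_hat = {b0:.1f} + ({b_sugars:.2f})*sugars + ({b_fiber:.2f})*fiber")
```

The same design-matrix pattern extends to any number of predictors, which is exactly what distinguishes multiple from bivariate regression.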

Comparisons

                         R-sq     SE
Bivariate Regression     58.0%    9.162
Multiple Regression      99.5%    ___ (value in slide image)

The uncorrelated variable (carbohydrates) was not eliminated; eliminating carbohydrates leads to an increase in SE and a decrease in R-sq.

Verifying Model Assumptions

Assumptions such as:
– Linearity
– Independence
– Normality
– Constant variance
– …
Examples:
– Checking the normality assumption with a normal plot of the residuals
– Checking the linearity and constant variance assumptions by plotting the standardized residuals versus the fitted (predicted) values
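A numerical sketch of the normality check on simulated residuals; a Shapiro-Wilk test stands in here for eyeballing the normal plot, and the data are synthetic:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Simulated regression with well-behaved (normal, constant-variance) errors
x = rng.uniform(0, 15, size=77)
y = 59.4 - 2.42 * x + rng.normal(0, 5, size=77)

b1, b0 = np.polyfit(x, y, deg=1)
residuals = y - (b0 + b1 * x)
standardized = residuals / residuals.std(ddof=2)   # standardized residuals

# Numerical stand-in for the normal plot: Shapiro-Wilk test for normality
stat, p_value = stats.shapiro(standardized)
print(f"Shapiro-Wilk p-value: {p_value:.3f}")
```

A large p-value gives no evidence against normality; in practice one would also plot the standardized residuals against the fitted values to inspect linearity and constant variance, as the next slides describe.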

Normal Plot of the Residuals

We detect no systematic deviations from linearity in the normal plot of the standardized residuals, so our normality assumption is intact.

Checking the Linear Relationship and Constant Variance

If obvious curvature exists in the scatter plot, the linearity assumption is violated. If the vertical spread of the points in the plot is systematically nonuniform, the constant variance assumption is violated. We detect no such patterns in the figure, so the linearity and constant variance assumptions are intact.