
Presentation transcript:

• Consumer Research Organization.
• Commissions surveys and publishes reports & ratings for automobiles.
• Maintains online discussion forums where consumers can post questions/experiences related to cars and driving.

• Focus on cars that are 10 to 25 years old.
• Studying the mileage of cars in this age range.
• Aiming to quantify the impact of a set of variables on mileage and thereby provide decision-making help to a potential customer looking to buy a car in this age range.

• Multiple Linear Regression
• Least Squares Approach.
• Enables quantification of the strength of the relationship between the response and the predictor variables.
• Allows prediction of response values based upon knowledge of relationships between response and predictor variables.
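As a brief sketch of what the slide refers to, the multiple linear regression model and the least squares criterion can be written as follows. The notation (p predictors, coefficients β, error term ε) is standard textbook notation assumed here, not taken from the slides.

```latex
% Model: response y_i as a linear function of p predictors plus an error term
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon_i,
\qquad i = 1, \dots, n

% Least squares: choose the coefficients that minimise the residual sum of squares
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n}
  \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^{2}
```

Here the response y is MPG, and each estimated coefficient describes the expected change in MPG per unit change in one predictor with the others held fixed.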

• R
• Version   free download.
• Provides inbuilt ‘dummy’ variable creation for factor variables.
• The generated model contains fitted values and residuals, which can be easily accessed and isolated for analysis.

 392 observations *  The response variable is MPG (Miles per Gallon)  The predictor variables are:  cylinders: 3,4,5,6 or 8 cylinders.  displacement (cu. inches): the total air displaced by the pistons in all of an engine’s cylinders. It is a measure of engine size and power.  horsepower (hp)  weight (lbs): Vehicle weight.  acceleration(seconds): time to accelerate from 0 to 60 mph.  origin: 1. American, 2. European,3. Japanese  age(years): years lapsed since year of manufacture. 6 * Revised from the StatLib library at Carnegie Mellon University

• Dataset available in the form of a .csv file.
• “carID” is a unique identifier for each row and carries no other information.
• All the variables are in numeric format.
• The variables “cylinders” and “origin” will need to be converted to factor variables.
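A minimal sketch of the factor conversion in base R is shown below. The small data frame is a hypothetical stand-in for the .csv file (in practice it would be loaded with `read.csv()`); only the column names follow the slides, and the values are made up for illustration.

```r
# Hypothetical stand-in for the .csv file; values are illustrative only.
cars <- data.frame(
  carID     = 1:6,
  mpg       = c(18, 15, 24, 27, 26, 31),
  cylinders = c(8, 8, 4, 4, 4, 4),
  origin    = c(1, 1, 3, 2, 2, 3)
)

# Treat the numeric codes as categories rather than quantities.
cars$cylinders <- factor(cars$cylinders)
cars$origin    <- factor(cars$origin, levels = 1:3,
                         labels = c("American", "European", "Japanese"))

# R expands factors into 0/1 dummy columns when it builds the design matrix.
# carID is left out of the formula because it is only a row identifier.
X <- model.matrix(mpg ~ cylinders + origin, data = cars)
print(colnames(X))
```

The first level of each factor (here 4 cylinders and American origin) becomes the baseline, so each dummy coefficient is read as a difference from that baseline.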

• A “base” regression model was built to predict “mpg” using all the variables.
• Residual plots were created and visually inspected for the zero mean, constant variance and independence assumptions.
• The normality assumption was verified by generating a histogram and QQ-plots.
• An R² value of was obtained.
• An improved model (i.e., containing one fewer predictor variable) was identified by statistical selection.
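The workflow on this slide could look roughly like the sketch below, using base R only. The data are synthetic and generated inside the script, so the numbers it produces say nothing about the actual study. Backward elimination with `step()` is an assumed choice; the slides do not name the selection method that was used.

```r
set.seed(1)
n <- 100

# Synthetic data for illustration only; not the survey dataset.
cars <- data.frame(
  weight       = runif(n, 1600, 5000),
  horsepower   = runif(n, 50, 230),
  acceleration = runif(n, 8, 25),
  age          = sample(10:25, n, replace = TRUE),
  origin       = factor(sample(c("American", "European", "Japanese"),
                               n, replace = TRUE))
)
cars$mpg <- 50 - 0.006 * cars$weight - 0.03 * cars$horsepower -
  0.4 * cars$age + rnorm(n, sd = 2)

# "Base" model: mpg regressed on every other column.
base_model <- lm(mpg ~ ., data = cars)

# Residuals and fitted values are stored with the model.
res <- residuals(base_model)
fit <- fitted(base_model)

# Visual checks: residuals vs fitted (zero mean, constant variance),
# histogram and normal QQ-plot (normality).
plot(fit, res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
hist(res, main = "Histogram of residuals")
qqnorm(res)
qqline(res)

# Coefficient of determination of the base model.
r2 <- summary(base_model)$r.squared

# One possible selection method (an assumption): backward elimination by AIC.
reduced_model <- step(base_model, direction = "backward", trace = 0)
print(formula(reduced_model))
```

Comparing `summary(base_model)` with `summary(reduced_model)` shows which predictors were dropped and how the fit changed.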

• Client feedback:
• “Can this model be applied to a dataset with different values for the same variables? Is it re-usable?”
• (This model can be re-used for prediction, provided the values of the variables are within the ranges seen in the dataset used to generate this model. The assumptions of linear regression may not hold once the data is outside this range, and hence linear regression may not be applicable.)
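One simple way to act on this answer is to check new observations against the training ranges before predicting. The sketch below assumes a hypothetical helper, `in_range()`, and uses synthetic training data; it checks each variable separately, which is a rough guard rather than a full test for extrapolation.

```r
set.seed(2)

# Synthetic training data for illustration only.
train <- data.frame(
  weight = seq(1600, 5000, length.out = 50),
  age    = rep(10:25, length.out = 50)
)
train$mpg <- 45 - 0.006 * train$weight - 0.4 * train$age + rnorm(50)

model <- lm(mpg ~ weight + age, data = train)

# Hypothetical helper: TRUE for rows whose values all lie within the
# minimum and maximum seen in the training data for the given variables.
in_range <- function(new, train, vars) {
  ok <- rep(TRUE, nrow(new))
  for (v in vars) {
    r <- range(train[[v]])
    ok <- ok & new[[v]] >= r[1] & new[[v]] <= r[2]
  }
  ok
}

# The second car is heavier than anything in the training data.
new_cars <- data.frame(weight = c(3000, 6500), age = c(15, 15))
ok <- in_range(new_cars, train, c("weight", "age"))

# Predict only where the inputs are inside the observed ranges.
pred <- ifelse(ok, predict(model, newdata = new_cars), NA)
print(data.frame(new_cars, ok = ok, predicted_mpg = pred))
```

Rows flagged as out of range get `NA` instead of a prediction, which makes the limitation visible to whoever uses the results.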

• Client feedback:
• “Is there a need to download/install special R packages to carry out the necessary charting and analysis?”
• (No packages need to be downloaded or installed. The basic R functionality is more than enough to carry out regression analysis, work with the residuals and fitted values, and generate the necessary visualizations.)

• Ideas for further analysis:
• Include car name (e.g. “Chevrolet Malibu”) as a variable in the analysis.
• Generate a side-by-side comparison of different statistical selection methods for improving the model.

• Data gathered from nationwide surveys over a period of 7 months.
• Analysis and review carried out over an 8-week period.

• An analyst and a ‘domain’ expert were assigned to this project full time.
• This project involved a combined effort of about man-hours.
• The cost can be estimated keeping the above details in mind.