1 Chapter 9 Supplement Model Building

2 Introduction
Regression analysis is one of the most commonly used techniques in statistics. It is considered powerful for several reasons:
–It can cover a variety of mathematical models: linear relationships, non-linear relationships, and nominal independent variables.
–It provides efficient methods for model building.

3 Polynomial Models
There are models where the independent variables (xᵢ) may appear as functions of a smaller number of predictor variables. Polynomial models are one such example.

4 Polynomial Models with One Predictor Variable
The multiple regression model: y = β₀ + β₁x₁ + β₂x₂ + … + βₚxₚ + ε
The polynomial model with one predictor variable: y = β₀ + β₁x + β₂x² + … + βₚxᵖ + ε

5 Polynomial Models with One Predictor Variable
First order model (p = 1): y = β₀ + β₁x + ε
Second order model (p = 2): y = β₀ + β₁x + β₂x² + ε; the curve opens downward when β₂ < 0 and upward when β₂ > 0.

6 Polynomial Models with One Predictor Variable
Third order model (p = 3): y = β₀ + β₁x + β₂x² + β₃x³ + ε; the direction of the curvature in the tails depends on whether β₃ < 0 or β₃ > 0.
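As an illustration (not part of the original example), polynomial models of increasing order can be estimated by adding powers of x as extra columns in the design matrix. A minimal Python sketch, using simulated data:

# Fitting polynomial models of order p = 1, 2, 3 by least squares.
# The data below are simulated purely for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 50)
y = 5 + 2 * x - 0.3 * x**2 + rng.normal(0, 2, 50)   # the true relationship is quadratic

for p in (1, 2, 3):
    # Design matrix: an intercept column plus x, x^2, ..., x^p
    X = sm.add_constant(np.column_stack([x**k for k in range(1, p + 1)]))
    fit = sm.OLS(y, X).fit()
    print(f"order {p}: R^2 = {fit.rsquared:.3f}, coefficients = {np.round(fit.params, 3)}")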

7 Polynomial Models with Two Predictor Variables
First order model: y = β₀ + β₁x₁ + β₂x₂ + ε
The response surface is a plane over the (x₁, x₂) space; its tilt along each axis depends on the signs of the coefficients (β₁ < 0 or β₁ > 0, β₂ < 0 or β₂ > 0).

8 Polynomial Models with Two Predictor Variables
First order model: y = β₀ + β₁x₁ + β₂x₂ + ε. The effect of one predictor variable on y is independent of the effect of the other predictor variable on y: for x₂ = 1, 2, 3 the lines [β₀ + β₂(1)] + β₁x₁, [β₀ + β₂(2)] + β₁x₁ and [β₀ + β₂(3)] + β₁x₁ are parallel.
First order model with interaction: y = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂ + ε. The two variables interact to affect the value of y: for x₂ = 1, 2, 3 the lines [β₀ + β₂(1)] + [β₁ + β₃(1)]x₁, [β₀ + β₂(2)] + [β₁ + β₃(2)]x₁ and [β₀ + β₂(3)] + [β₁ + β₃(3)]x₁ have different slopes.
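The difference between the two specifications shows up directly in the estimated coefficients. A brief sketch with simulated data and hypothetical column names:

# First-order models with two predictors, without and with an interaction term.
# Simulated data; the column names x1, x2, y are invented for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
df = pd.DataFrame({"x1": rng.uniform(0, 10, 100), "x2": rng.integers(1, 4, 100)})
df["y"] = 3 + 1.5 * df.x1 + 2 * df.x2 + 0.8 * df.x1 * df.x2 + rng.normal(0, 1, 100)

no_interaction = smf.ols("y ~ x1 + x2", data=df).fit()
with_interaction = smf.ols("y ~ x1 + x2 + x1:x2", data=df).fit()   # x1:x2 adds the product term

# In the first model the slope of y on x1 is a single number; in the second it equals
# beta1 + beta3*x2, so the fitted lines for different x2 values are no longer parallel.
print(no_interaction.params)
print(with_interaction.params)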

9 Polynomial Models with Two Predictor Variables
Second order model: y = β₀ + β₁x₁ + β₂x₂ + β₃x₁² + β₄x₂² + ε. For x₂ = 1, 2, 3 the curves y = [β₀ + β₂(1) + β₄(1²)] + β₁x₁ + β₃x₁², y = [β₀ + β₂(2) + β₄(2²)] + β₁x₁ + β₃x₁² and y = [β₀ + β₂(3) + β₄(3²)] + β₁x₁ + β₃x₁² differ only in their intercepts.
Second order model with interaction: y = β₀ + β₁x₁ + β₂x₂ + β₃x₁² + β₄x₂² + β₅x₁x₂ + ε.

10 Selecting a Model Several models have been introduced. How do we select the right model? Selecting a model: –Use your knowledge of the problem (variables involved and the nature of the relationship between them) to select a model. –Test the model using statistical techniques.

11 Selecting a Model; Example
Example: The location of a new restaurant
–A fast food restaurant chain tries to identify new locations that are likely to be profitable.
–The primary market for such restaurants is middle-income adults and their children (between the ages of 5 and 12).
–Which regression model should be proposed to predict the profitability of new locations?

12 Selecting a Model; Example
Solution
–The dependent variable will be Gross Revenue.
–Quadratic relationships between Revenue and each predictor variable should be expected. Why?
Members of middle-class families are more likely to visit a fast food restaurant than members of poor or wealthy families, so Revenue rises and then falls as Income moves from Low through Middle to High.
Families with very young or older kids will not visit the restaurant as frequently as families with mid-range ages of kids, so Revenue rises and then falls as the mean Age of children moves from Low through Middle to High.

13 Selecting a Model; Example
Solution
–The proposed quadratic regression model is
SALES = β₀ + β₁INCOME + β₂AGE + β₃INCOME² + β₄AGE² + β₅(INCOME)(AGE) + ε
where SALES = annual gross sales, INCOME = median annual household income in the neighborhood, and AGE = mean age of children in the neighborhood.
–Include the interaction term when in doubt, and test its relevance later.
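This model can be estimated directly with statistical software. A sketch in Python: the workbook name Xm9-01.xls comes from the next slide, while the column names Sales, Income and Age (and the .xls extension) are assumptions about its layout.

# Full second-order model with interaction for the restaurant example.
import pandas as pd
import statsmodels.formula.api as smf

data = pd.read_excel("Xm9-01.xls")   # assumed columns: Sales, Income, Age

model = smf.ols(
    "Sales ~ Income + Age + I(Income**2) + I(Age**2) + Income:Age",
    data=data,
).fit()
print(model.summary())   # F-test for overall validity, t-tests for the individual terms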

14 Selecting a Model; Example
To verify the validity of the proposed model for recommending the location of a new fast food restaurant, 25 areas with fast food restaurants were randomly selected.
–Each area included one of the firm’s restaurants and three competing restaurants.
–Data collected (Xm9-01.xls) included: previous year’s annual gross sales, mean annual household income, and mean age of children.

15 Selecting a Model; Example
Xm9-01.xls contains the collected data and the added data.

16 The Quadratic Relationships – Graphical Illustration

17 Model Validation This is a valid model that can be used to make predictions. But…

18 Model Validation
The model can be used to make predictions... but multicollinearity is a problem!
The t-tests may be distorted; therefore, do not interpret the coefficients or test them individually.
To see the problem, examine the correlation matrix of the predictors (in Excel: Tools > Data Analysis > Correlation). Reducing multicollinearity is taken up later (see stepwise regression).
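An equivalent check can be done outside Excel. A Python sketch computing the correlation matrix and variance inflation factors for the restaurant predictors; the column names are again assumed:

# Checking the predictors of the restaurant model for multicollinearity.
# The squared and interaction terms are rebuilt from the (assumed) Income and Age columns.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

data = pd.read_excel("Xm9-01.xls")
X = pd.DataFrame({
    "Income": data["Income"],
    "Age": data["Age"],
    "Income2": data["Income"] ** 2,
    "Age2": data["Age"] ** 2,
    "IncomeAge": data["Income"] * data["Age"],
})

print(X.corr().round(2))    # the correlation matrix, as produced by Excel's Correlation tool

Xc = sm.add_constant(X)     # VIFs are computed with an intercept in the design matrix
vifs = {col: variance_inflation_factor(Xc.values, i)
        for i, col in enumerate(Xc.columns) if col != "const"}
print(vifs)                 # large values (well above 10) point to multicollinearity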

19 Nominal Independent Variables
In many real-life situations one or more independent variables are nominal. Including nominal variables in a regression model is done via indicator variables.
An indicator variable (I) can assume one of two values, “zero” or “one”:
I = 1 if a first condition out of two is met; I = 0 if a second condition out of two is met.
For example:
–I = 1 if the data were collected before a given date; 0 if they were collected after it.
–I = 1 if the temperature was below 50°; 0 if the temperature was 50° or more.
–I = 1 if a degree earned is in Finance; 0 if a degree earned is not in Finance.
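In software an indicator is simply a 0/1 column built from the condition. A tiny sketch using the temperature example from this slide (the temperatures are invented):

# Building an indicator variable from a condition; the data are invented.
import pandas as pd

temps = pd.Series([42, 55, 61, 47, 50])
I = (temps < 50).astype(int)   # 1 if the temperature was below 50, 0 otherwise
print(I.tolist())              # [1, 0, 0, 1, 0]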

20 Nominal Independent Variables; Example: Auction Price of Cars
A car dealer wants to predict the auction price of a car (Xm9-02a_supp).
–The dealer now believes that the odometer reading and the car color are variables that affect a car’s price.
–Three color categories are considered: White, Silver, Other colors.
Note: Color is a nominal variable.

21 Nominal Independent Variables; Example: Auction Price of Cars
Data – revised (Xm9-02b_supp):
I₁ = 1 if the color is white; 0 if the color is not white.
I₂ = 1 if the color is silver; 0 if the color is not silver.
The category “Other colors” is defined by I₁ = 0 and I₂ = 0.

22 How Many Indicator Variables?
Note: To represent the situation of three possible colors we need only two indicator variables.
Conclusion: To represent a nominal variable with m possible categories, we must create m − 1 indicator variables.
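The m − 1 rule is exactly what dummy-coding routines implement. A minimal sketch with an invented data frame:

# Coding a nominal variable with m categories as m - 1 indicator variables.
import pandas as pd

cars = pd.DataFrame({"Color": ["White", "Silver", "Other", "White", "Other"]})

# drop_first=True keeps m - 1 dummy columns; with these labels the alphabetically first
# category ("Other") is dropped, so "Other colors" becomes the baseline represented by
# all dummies equal to 0 (the same coding as I1 and I2 on the previous slide).
dummies = pd.get_dummies(cars["Color"], prefix="I", drop_first=True, dtype=int)
print(dummies)   # columns I_Silver and I_White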

23 Nominal Independent Variables; Example: Auction Car Price
Solution
–The proposed model is y = β₀ + β₁(Odometer) + β₂I₁ + β₃I₂ + ε.
–The data contain white cars, silver cars, and cars of other colors.
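A sketch of how this model would be estimated in Python; the workbook name comes from the slides, while the column names (Price, Odometer, I-1, I-2) and the file extension are assumptions:

# Estimating y = b0 + b1*Odometer + b2*I1 + b3*I2 for the auction price example.
import pandas as pd
import statsmodels.api as sm

data = pd.read_excel("Xm9-02b_supp.xls")
X = sm.add_constant(data[["Odometer", "I-1", "I-2"]])
fit = sm.OLS(data["Price"], X).fit()
print(fit.params)    # intercept, odometer slope, and the two color premiums
print(fit.pvalues)   # t-tests for the individual coefficients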

24 Example: Auction Car Price – The Regression Equation
From Excel we get the estimated regression equation
PRICE = b₀ − 0.0555(Odometer) + 90.48 I₁ + b₃ I₂
–A white car sells, on average, for $90.48 more than a car in the “Other colors” category.
–A silver car sells, on average, for more than a car in the “Other colors” category.
–For each additional mile, the auction price decreases by 5.55 cents.

25 Example: Auction Car Price – The Regression Equation
From Excel (Xm9-02b_supp) we get the estimated regression equation
PRICE = b₀ − 0.0555(Odometer) + 90.48 I₁ + b₃ I₂
Plotting Price against Odometer gives three parallel lines:
–The equation for an “other colors” car is obtained by setting I₁ = 0, I₂ = 0.
–The equation for a white car is obtained by setting I₁ = 1, I₂ = 0.
–The equation for a silver car is obtained by setting I₁ = 0, I₂ = 1.

26 Example: Auction Car Price – The Regression Equation (Xm9-02b_supp)
There is insufficient evidence to infer that a white car and a car of the “Other colors” category sell for different auction prices.
There is sufficient evidence to infer that a silver car sells for a larger price than a car of the “Other colors” category.

27 Nominal Independent Variables; Example: MBA Program Admission (MBA II)
The Dean wants to evaluate applications for the MBA program by predicting the future performance of the applicants. The following three predictors were suggested:
–Undergraduate GPA
–GMAT score
–Years of work experience
It is now believed that the type of undergraduate degree should be included in the model.
Note: The undergraduate degree is nominal data.

28 Nominal Independent Variables; Example: MBA Program Admission
I₁ = 1 if B.A.; 0 otherwise.
I₂ = 1 if B.B.A.; 0 otherwise.
I₃ = 1 if B.Sc. or B.Eng.; 0 otherwise.
The category “Other” is defined by I₁ = 0, I₂ = 0, I₃ = 0.
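Instead of hand-coding I₁–I₃, a formula interface can expand the nominal degree variable automatically. A sketch; the file name and the column names (MBAGPA, UnderGPA, GMAT, Work, Degree) are hypothetical placeholders for the MBA II data:

# Letting the formula create the m - 1 indicator variables for the undergraduate degree.
import pandas as pd
import statsmodels.formula.api as smf

mba = pd.read_excel("MBA-II.xls")   # hypothetical file layout

# C(Degree) expands a nominal column with m categories into m - 1 dummies,
# with the first category used as the baseline.
fit = smf.ols("MBAGPA ~ UnderGPA + GMAT + Work + C(Degree)", data=mba).fit()
print(fit.summary())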

29 Nominal Independent Variables; Example: MBA Program Admission MBA-II

30 Applications in Human Resources Management: Pay-Equity Pay-equity can be handled in two different forms: –Equal pay for equal work –Equal pay for work of equal value. Regression analysis is extensively employed in cases of equal pay for equal work.

31 Human Resources Management: Pay-Equity
Example (Xm9-03_supp)
–Is there sex discrimination against female managers in a large firm?
–A random sample of 100 managers was selected and data were collected on: annual salary, years of education, years of experience, and gender.

32 Human Resources Management: Pay-Equity
Solution
–Construct the following multiple regression model:
y = β₀ + β₁Education + β₂Experience + β₃Gender + ε
–Note the nature of the variables: Education – interval; Experience – interval; Gender – nominal (Gender = 1 if male; 0 otherwise).
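A sketch of the corresponding fit in Python; the workbook name comes from the slides, and the column names (Salary, Education, Experience, Gender coded 1 = male, 0 = female) are assumptions:

# Pay-equity regression with a gender indicator.
import pandas as pd
import statsmodels.formula.api as smf

pay = pd.read_excel("Xm9-03_supp.xls")
fit = smf.ols("Salary ~ Education + Experience + Gender", data=pay).fit()
print(fit.summary())

# The t-test on the Gender coefficient addresses the discrimination question:
# a coefficient that is not significantly different from 0 provides no evidence of a
# salary difference between men and women after adjusting for education and experience.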

33 Human Resources Management: Pay-Equity
Solution – Continued (Xm9-03)
Analysis and interpretation:
–The model fits the data quite well, and the model is very useful.
–Experience is a variable strongly related to salary.
–There is no evidence of sex discrimination.

34 Human Resources Management: Pay-Equity
Solution – Continued (Xm9-03)
Analysis and interpretation. Studying the data further, we find:
–Average experience (years) for women is 12; for men it is 17.
–Average salary for female managers is $76,189; for male managers it is $97,832.

35 Stepwise Regression Multicollinearity may prevent the study of the relationship between dependent and independent variables. The correlation matrix may fail to detect multicollinearity because variables may relate to one another in various ways. To reduce multicollinearity we can use stepwise regression. In stepwise regression variables are added to or deleted from the model one at a time, based on their contribution to the current model.
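Many packages implement stepwise selection directly; a minimal forward-selection sketch (one common form of stepwise regression) can also be written by hand. The data file and column names in the usage comment are placeholders:

# Forward selection: at each step add the candidate predictor with the smallest p-value,
# stopping when no remaining candidate is significant at the chosen level.
import pandas as pd
import statsmodels.api as sm

def forward_select(df, response, alpha=0.05):
    remaining = [c for c in df.columns if c != response]
    chosen = []
    while remaining:
        pvals = {}
        for cand in remaining:
            X = sm.add_constant(df[chosen + [cand]])
            pvals[cand] = sm.OLS(df[response], X).fit().pvalues[cand]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:      # no remaining variable contributes enough
            break
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Example use with the (assumed) restaurant columns:
# data = pd.read_excel("Xm9-01.xls")
# print(forward_select(data, response="Sales"))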

36 Model Building Identify the dependent variable, and clearly define it. List potential predictors. –Bear in mind the problem of multicollinearity. –Consider the cost of gathering, processing and storing data. –Be selective in your choice (try to use as few variables as possible).

37 Identify several possible models.
–A scatter diagram of the dependent variable against each potential predictor can be helpful in formulating the right model.
–If you are uncertain, start with first order and second order models, with and without interaction.
–Try other relationships (transformations) if the polynomial models fail to provide a good fit.
Gather the required observations (have at least six observations for each independent variable).
Use statistical software to estimate the model.
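Such scatter diagrams are easy to produce in software. A sketch for the restaurant data, with the usual caveat that the column names are assumed:

# Scatter diagrams of the dependent variable against each candidate predictor,
# to help choose between first- and second-order terms.
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_excel("Xm9-01.xls")
fig, axes = plt.subplots(1, 2, figsize=(8, 3))
for ax, col in zip(axes, ["Income", "Age"]):
    ax.scatter(data[col], data["Sales"], s=10)
    ax.set_xlabel(col)
    ax.set_ylabel("Sales")
plt.tight_layout()
plt.show()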

38 Determine whether the required conditions are satisfied. If not, attempt to correct the problem. Select the best model. –Use the statistical output. –Use your judgment!!