DECISION MODELING WITH MICROSOFT EXCEL Chapter 13 Copyright 2001 Prentice Hall Publishers and Ardith E. Baker Part 1.


Many important decisions made by individuals and organizations crucially depend on an assessment of the future. There are a few “____” sayings that illustrate the promise and frustration of forecasting:
“It is difficult to forecast, especially in regards to the future.”
“It isn’t difficult to forecast, just to forecast correctly.”
“Numbers, if tortured enough, will confess to just about anything.”

Forecasting is playing an increasingly important role in the _______________. Economic forecasts influence government policies and business decisions, insurance companies’ investment decisions in mortgages and bonds, and service industries’ (such as airlines, hotels, rental cars, cruise lines, etc.) forecasts of demand as input for revenue management. There is clearly a steady increase in the use of quantitative forecasting models at many levels in industry and government. The many types of forecasting models will be divided into two major techniques: quantitative and qualitative.

Quantitative forecasting models possess two important and attractive features:
1. They are expressed in mathematical notation. Thus, they establish an unambiguous record of how the forecast is made.
2. With the use of spreadsheets and computers, quantitative models can be based on an amazing quantity of data.
Two types of quantitative forecasting models that will be discussed in the next two sections are causal models and time-series models.

In a causal forecasting model, the forecast for the quantity of interest “rides piggyback” on another quantity or set of quantities. In other words, our knowledge of the value of one variable (or perhaps several variables) enables us to forecast the value of another variable. In this model, let \(y\) denote the true value of some variable of interest and \(\hat{y}\) denote a predicted or forecast value for that variable.

Then, in a causal model,
\[ \hat{y} = f(x_1, x_2, \ldots, x_n) \]
where \(f\) is a forecasting rule, or function, and \(x_1, x_2, \ldots, x_n\) is a set of variables. In this representation, the \(x\) variables are often called independent variables, whereas \(\hat{y}\) is the dependent or response variable. We either know the independent variables in advance or can forecast them more easily than \(\hat{y}\). Then the independent variables will be used in the forecasting model to forecast the dependent variable.

Companies often find by looking at past performance that their monthly sales are directly related to the monthly GDP, and thus figure that a good forecast could be made using next month’s GDP figure. The only problem is that this quantity is not known, or it may just be a forecast and thus not a truly independent variable. Using a causal forecasting model requires two conditions:
1. There must be a relationship between values of the independent and dependent variables such that the former provides information about the latter.

2. The values for the independent variables must be known and available to the forecaster at the time the forecast is made.
Simply because there is a mathematical relationship does not guarantee that there is really cause and effect. One commonly used approach in creating a causal forecasting model is called curve fitting.
CURVE FITTING: AN OIL COMPANY EXPANSION
Consider an oil company that is planning to expand its network of modern self-service gasoline stations.

The company plans to use traffic flow (measured in the average number of cars per hour) to forecast sales (measured in average dollar sales per hour). The firm has had five stations in operation for more than a year and has used historical data to calculate the following averages:

The averages are plotted in a scatter diagram.

Now, these data will be used to construct a function that will be used to forecast sales at any proposed location by measuring the traffic flow at that location and plugging its value into the constructed function.
Least Squares Fits
The method of least squares is a formal procedure for curve fitting. It is a two-step process.
1. Select a specific functional form (e.g., a straight line or quadratic curve).
2. Within the set of functions specified in step 1, choose the specific function that minimizes the sum of the squared deviations between the data points and the function values.

To demonstrate the process, consider the sales-traffic flow example.
1. Assume a straight line; that is, functions of the form \(y = a + bx\).
2. Draw the line in the scatter diagram and indicate the deviations between observed points and the function as \(d_i\).
For example,
\[ d_1 = y_1 - [a + b x_1] = 220 - [a + 150b] \]
where
\(y_1\) = actual sales/hr at location 1
\(x_1\) = actual traffic flow at location 1
\(a\) = y-axis intercept for the function
\(b\) = slope for the function

The value \(d_1^2\) is one measure of how close the value of the function \([a + b x_1]\) is to the observed value, \(y_1\); that is, it indicates how well the function fits at this one point.
[Figure: scatter diagram of the five data points with the line \(y = a + bx\) and the deviations \(d_1, \ldots, d_5\); horizontal axis \(x\), vertical axis \(y\).]

One measure of how well the function fits overall is the sum of the squared deviations:
\[ \sum_{i=1}^{5} d_i^2 \]
Consider a general model with \(n\) as opposed to five observations. Since each \(d_i = y_i - (a + b x_i)\), the sum of the squared deviations can be written as:
\[ \sum_{i=1}^{n} \left(y_i - [a + b x_i]\right)^2 \]
Using the method of least squares, select \(a\) and \(b\) so as to minimize the sum in the equation above.

Now, take the partial derivative of the sum with respect to \(a\) and set the resulting expression equal to zero.
\[ \sum_{i=1}^{n} -2\left(y_i - [a + b x_i]\right) = 0 \]
A second equation is derived by following the same procedure with \(b\).
\[ \sum_{i=1}^{n} -2 x_i \left(y_i - [a + b x_i]\right) = 0 \]
Recall that the values for \(x_i\) and \(y_i\) are the observations, and our goal is to find the values of \(a\) and \(b\) that satisfy these two equations.

The solution is:
\[ b = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n} \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left(\sum_{i=1}^{n} x_i\right)^2} \]
\[ a = \frac{1}{n} \sum_{i=1}^{n} y_i - b \, \frac{1}{n} \sum_{i=1}^{n} x_i \]
The next step is to determine the values for:
\[ \sum_{i=1}^{n} x_i^2, \quad \sum_{i=1}^{n} y_i, \quad \sum_{i=1}^{n} x_i, \quad \sum_{i=1}^{n} x_i y_i \]
Note that these quantities depend only on observed data and can be found with simple arithmetic operations or automatically using Excel’s predefined functions.
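As a rough sketch of the closed-form solution, the same four sums can be computed outside Excel. The five (traffic, sales) pairs below are an assumption for illustration only: the slide's data table is not reproduced in this transcript, so the values were chosen to be consistent with the figures the slides do quote (first point (150, 220), R Square of about 69.4%). The function name `least_squares_line` is likewise hypothetical.

```python
# Illustrative data (ASSUMED; the original table is missing from the transcript).
traffic = [150, 55, 220, 130, 95]   # x: average cars per hour
sales = [220, 75, 250, 145, 200]    # y: average dollar sales per hour


def least_squares_line(x, y):
    """Return (a, b) that minimize sum((y_i - (a + b*x_i))**2)."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope and intercept from the two first-order conditions.
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n
    return a, b


a, b = least_squares_line(traffic, sales)
print(f"intercept a = {a:.3f}, slope b = {b:.5f}")
```

Only the four sums enter the calculation, which is why a spreadsheet or a few lines of arithmetic are enough.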

Using Excel, click on Tools – Data Analysis … In the resulting dialog, choose Regression.

In the Regression dialog, enter the Y-range and X-range. Choose to place the output in a new worksheet called Results. Select ___________ and Normal Probability Plots to be created along with the output.

Click OK to produce the following results: Note that a (Intercept) and b (X Variable 1) are reported as ____ and ____, respectively.

To add the resulting regression line, first click on the worksheet Chart 1 which contains the original scatter diagram. Next, click on the data points so that they are highlighted and then choose Add Trendline … from the Chart pull-down menu.

Choose Linear Trend in the resulting dialog and click OK.

A linear trend is fit to the data:

One of the other summary output values that is given in Excel is: R Square = 69.4%. This is a “goodness of fit” measure which represents the \(R^2\) statistic discussed in introductory statistics classes. \(R^2\) ranges in value from 0 to 1 and gives an indication of how much of the total variation in Y from its mean is explained by the new trend line. In fact, there are three different sums of errors:
TSS (Total Sum of Squares)
ESS (Error Sum of Squares)
RSS (Regression Sum of Squares)

The basic relationship between them is: TSS = ESS + RSS. They are defined as follows:
\[ \text{TSS} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \]
\[ \text{ESS} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \]
\[ \text{RSS} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 \]
Essentially, the ESS is the amount of variation that can’t be explained by the regression. The RSS quantity is effectively the amount of the original, total variation (TSS) that could be removed using the regression line.

\(R^2\) is defined as:
\[ R^2 = \frac{\text{RSS}}{\text{TSS}} \]
If the regression line fits perfectly, then ESS = 0 and RSS = TSS, resulting in \(R^2 = 1\). In this example, \(R^2 = .694\), which means that approximately 70% of the variation in the Y values is explained by the one independent variable (X), cars per hour.
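The decomposition TSS = ESS + RSS can be checked numerically. The sketch below follows the slides' naming (ESS for the error sum of squares, RSS for the regression sum of squares) and again uses five assumed data pairs, since the original table is not part of this transcript; the values were picked to agree with the R Square of 69.4% quoted in the slides.

```python
# Illustrative data (ASSUMED; not taken from the transcript).
traffic = [150, 55, 220, 130, 95]   # x: cars per hour
sales = [220, 75, 250, 145, 200]    # y: dollar sales per hour

n = len(traffic)
x_bar = sum(traffic) / n
y_bar = sum(sales) / n

# Least squares slope and intercept for y = a + b*x.
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(traffic, sales))
     / sum((x - x_bar) ** 2 for x in traffic))
a = y_bar - b * x_bar
fitted = [a + b * x for x in traffic]

tss = sum((y - y_bar) ** 2 for y in sales)                # total variation
ess = sum((y - f) ** 2 for y, f in zip(sales, fitted))    # unexplained (error)
rss = sum((f - y_bar) ** 2 for f in fitted)               # explained (regression)
r_squared = rss / tss

print(f"TSS = {tss:.1f}, ESS = {ess:.1f}, RSS = {rss:.1f}")
print(f"R^2 = {r_squared:.3f}")
```

Note that other textbooks swap the meanings of ESS and RSS, so it is worth checking which convention a given source uses.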

Now, returning to the original question: Should we build a station at Buffalo Grove where traffic is 183 cars/hour? The best guess at what the corresponding sales volume would be is found by placing this X value into the new regression equation, \(\hat{y} = a + b \cdot x\):
Sales/hour = a + b * (183 cars/hour) = $227.29
However, it would be nice to be able to state a 95% confidence interval around this best guess.

We can get the information to do this from Excel’s Summary Output. Excel reports that the standard error (\(S_e\)) is 44.18. This quantity represents the amount of scatter in the actual data around the regression line. The formula for \(S_e\) is:
\[ S_e = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - k - 1}} \]
where \(n\) is the number of data points (e.g., 5) and \(k\) is the number of independent variables (e.g., 1).

This equation is equivalent to:
\[ S_e = \sqrt{\frac{\text{ESS}}{n - k - 1}} \]
Once we know \(S_e\), and based on the normal distribution, we can state that:
We have 68% confidence that the actual value of sales/hour is within ±1 \(S_e\) of the predicted value ($227.29).
We have 95% confidence that the actual value of sales/hour is within ±2 \(S_e\) of the predicted value ($227.29).
The 95% confidence interval is:
[227.29 – 2(44.18); 227.29 + 2(44.18)] = [$138.93; $315.65]
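The interval arithmetic can be reproduced in a few lines. This is a sketch under two assumptions: the five data pairs are illustrative stand-ins for the table missing from this transcript (chosen to agree with the quoted forecast and standard error), and the interval uses the slides' "plus or minus two standard errors" rule of thumb rather than an exact t-based prediction interval.

```python
import math

# Illustrative data (ASSUMED; not taken from the transcript).
traffic = [150, 55, 220, 130, 95]   # x: cars per hour
sales = [220, 75, 250, 145, 200]    # y: dollar sales per hour

n = len(traffic)   # number of data points
k = 1              # number of independent variables
x_bar = sum(traffic) / n
y_bar = sum(sales) / n
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(traffic, sales))
     / sum((x - x_bar) ** 2 for x in traffic))
a = y_bar - b * x_bar

# Error sum of squares and the standard error of the regression.
ess = sum((y - (a + b * x)) ** 2 for x, y in zip(traffic, sales))
s_e = math.sqrt(ess / (n - k - 1))

# Point forecast at the proposed site and the rough 95% band around it.
forecast = a + b * 183
low, high = forecast - 2 * s_e, forecast + 2 * s_e
print(f"S_e = {s_e:.2f}, forecast = {forecast:.2f}")
print(f"approximate 95% interval: [{low:.2f}; {high:.2f}]")
```

Small differences in the last digit of the interval endpoints come from rounding the forecast and the standard error before combining them.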

Another value of interest in the Summary report is the t-statistic for the X variable and its associated values. The t-statistic is 2.61 and the P-value is ____. A P-value less than 0.05 means that we have at least 95% confidence that the slope parameter (b) is statistically significantly different than 0 (zero). A slope of 0 results in a flat trend line and indicates no relationship between Y and X. The 95% confidence limit for b is [-0.205; 2.064]. Thus, we can’t rule out the possibility that the true value of b might be 0.

Also given in the Summary report is the F-significance. Since there is only one independent variable, the F-significance is identical to the P-value for the t-statistic. In the case of more than one X variable, the F-significance tests the hypothesis that all the X variable parameters as a group are statistically significantly different than zero.

Concerning multiple regression models, as you add other X variables, the \(R^2\) statistic will always increase (strictly, it can never decrease), meaning the RSS has increased. In this case, the adjusted \(R^2\) statistic is a reliable indicator of the true goodness of fit because it compensates for the reduction in the ____ due to the addition of more independent variables. Thus, it may report a decreased adjusted \(R^2\) value even though \(R^2\) has increased, unless the improvement in ____ is more than compensated for by the addition of the new independent variables.
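The slides do not give the adjustment formula, so the sketch below assumes the standard textbook definition, adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1), where n is the number of data points and k the number of independent variables. The second call uses a made-up R^2 of 0.70 purely to illustrate how a small gain in R^2 can still lower the adjusted value.

```python
def adjusted_r_squared(r_squared, n_points, n_independent):
    """Standard adjusted R^2 (an assumed formula, not quoted in the slides)."""
    return 1 - (1 - r_squared) * (n_points - 1) / (n_points - n_independent - 1)


# One X variable with the R^2 quoted in the slides (n = 5, k = 1).
one_variable = adjusted_r_squared(0.694, 5, 1)

# Hypothetical: a second X variable lifts R^2 only slightly, to 0.70 (k = 2).
two_variables = adjusted_r_squared(0.70, 5, 2)

print(f"adjusted R^2 with one variable:  {one_variable:.3f}")
print(f"adjusted R^2 with two variables: {two_variables:.3f}")
```

With only five observations, each extra variable uses up a large share of the available degrees of freedom, which is why the penalty is so visible here.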

Fitting a Quadratic Function
The method of least squares can be used with any number of independent variables and with any functional form (not just linear). Suppose that we wish to fit a quadratic function of the form
\[ y = a_0 + a_1 x + a_2 x^2 \]
to the previous data with the method of least squares. The goal is to select \(a_0\), \(a_1\), and \(a_2\) in order to minimize the sum of squared deviations, which is now
\[ \sum_{i=1}^{5} \left(y_i - [a_0 + a_1 x_i + a_2 x_i^2]\right)^2 \]

Proceed by setting the partial derivatives with respect to \(a_0\), \(a_1\), and \(a_2\) equal to zero. This gives the equations
\[ 5a_0 + \left(\sum x_i\right) a_1 + \left(\sum x_i^2\right) a_2 = \sum y_i \]
\[ \left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 + \left(\sum x_i^3\right) a_2 = \sum x_i y_i \]
\[ \left(\sum x_i^2\right) a_0 + \left(\sum x_i^3\right) a_1 + \left(\sum x_i^4\right) a_2 = \sum x_i^2 y_i \]
This is a simple set of three linear equations in three unknowns. Thus, the general name for this least squares curve fitting is “linear regression.” The term linear comes from the fact that simultaneous linear equations are being solved.
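The three normal equations can also be solved directly rather than with Solver. The sketch below builds the 3 x 3 system from power sums and solves it by Gauss-Jordan elimination using exact fractions, which sidesteps rounding problems caused by the large values of x^4. The data pairs are assumed for illustration (the original table is missing from this transcript), and the helper names `power_sum` and `solve_linear_system` are hypothetical.

```python
from fractions import Fraction

# Illustrative data (ASSUMED; not taken from the transcript).
traffic = [150, 55, 220, 130, 95]   # x: cars per hour
sales = [220, 75, 250, 145, 200]    # y: dollar sales per hour


def power_sum(x, p, y=None):
    """Sum of x_i**p, or of x_i**p * y_i when y is supplied."""
    if y is None:
        return sum(Fraction(xi) ** p for xi in x)
    return sum(Fraction(xi) ** p * yi for xi, yi in zip(x, y))


def solve_linear_system(matrix, rhs):
    """Solve matrix * unknowns = rhs by Gauss-Jordan elimination."""
    size = len(rhs)
    rows = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(size):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, size) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [v / pivot_value for v in rows[col]]
        # Eliminate this column from every other row.
        for r in range(size):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


# Entry (i, j) of the normal matrix is sum(x**(i + j)); the first entry is n.
normal_matrix = [[power_sum(traffic, i + j) for j in range(3)] for i in range(3)]
normal_rhs = [power_sum(traffic, i, sales) for i in range(3)]
a0, a1, a2 = solve_linear_system(normal_matrix, normal_rhs)

sse = sum((y - (a0 + a1 * x + a2 * x * x)) ** 2 for x, y in zip(traffic, sales))
print(f"a0 = {float(a0):.4f}, a1 = {float(a1):.5f}, a2 = {float(a2):.7f}")
print(f"sum of squared deviations = {float(sse):.1f}")
```

Because the unknowns enter the equations linearly, no iterative search is needed even though the fitted curve itself is not a straight line.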

Solver will be used to find the coefficients in Excel. Consider the following worksheet:

Now, to find the optimal values for the parameters (\(a_0\), \(a_1\), and \(a_2\)) using Solver, first click on Tools – Solver.

In the resulting Solver Parameter dialog, specify the following settings:
Click Solve to solve the unconstrained, nonlinear optimization model. In this model, the objective function is to minimize the sum of squared deviations.

Here are the Solver results. The parameter values are: This formula calculates the sum of squared errors directly.

Use Excel’s Chart Wizard to plot the original data and the resulting quadratic function. First, highlight the original range of data, then click on the Chart Wizard button.

Use Excel’s Chart Wizard to plot the original data as a scatter plot and specify a quadratic function via the Chart – Add Trendline … option.

Comparing the Linear and Quadratic Fits
In the method of least squares, the sum of the squared deviations was selected as the measure of “goodness of fit.” Thus, the linear and quadratic fits can be compared with this criterion. In order to make this comparison, go back to the linear regression “________” spreadsheet and make the corresponding calculation in the original “______” spreadsheet.

Note that the sum of the squared deviations for the quadratic function is indeed smaller than that for the linear function (i.e., 4954 < 5854). Indeed, the quadratic gives roughly a 15% decrease in the sum of squared deviations. It follows, then, that the best quadratic function must be at least as good as the best linear function: a linear function is a special type of quadratic function in which \(a_2 = 0\).

WHICH CURVE TO FIT?
If a quadratic function is at least as good as a linear function, why not choose a more complex form, thereby getting an even better fit? In practice, functions of the form (with only a single independent variable for illustrative purposes)
\[ y = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n \]
are often suggested. Such a function is called a polynomial of degree n, and it represents a broad and flexible class of functions.
n = 2: quadratic
n = 3: cubic
n = 4: quartic
…

One must proceed with caution when fitting data with a polynomial function. For example, it is possible to find a (k – 1)-degree polynomial that will perfectly fit k data points (provided the x values are distinct). To be more specific, suppose we have seven historical observations, denoted
\[ (x_i, y_i), \quad i = 1, 2, \ldots, 7 \]
It is possible to find a sixth-degree polynomial
\[ y = a_0 + a_1 x + a_2 x^2 + \cdots + a_6 x^6 \]
that exactly passes through each of these seven data points.
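A short sketch makes the point concrete. It uses Lagrange interpolation, one standard way (not necessarily the one the slides have in mind) to construct the polynomial of degree at most k - 1 through k points with distinct x values. The seven observations and the names `months` and `demand` are hypothetical.

```python
def interpolating_polynomial(xs, ys):
    """Return p(x), the polynomial of degree at most len(xs) - 1 that passes
    through every point (xs[i], ys[i]); the xs must be distinct."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            # Lagrange basis term: equals yi at xi and 0 at every other node.
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p


# Seven HYPOTHETICAL historical observations.
months = [1, 2, 3, 4, 5, 6, 7]
demand = [12, 15, 13, 18, 17, 21, 20]

p = interpolating_polynomial(months, demand)
sse = sum((y - p(x)) ** 2 for x, y in zip(months, demand))
print("sum of squared deviations on the history:", sse)
print("extrapolated value for month 8: ", p(8))
print("extrapolated value for month 10:", p(10))
```

Running it shows a zero sum of squared deviations on the seven historical points, while the extrapolated values move far away from the range of the data, which is the hazard the following slides describe.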

A perfect fit gives zero for the sum of squared deviations. However, this is deceptive, for it does not imply much about the predictive value of the model for use in future forecasting.

Despite the perfect fit of the polynomial function, the forecast is very inaccurate. The linear fit might provide more realistic forecasts. Also, note that the polynomial fit has hazardous extrapolation properties (i.e., the polynomial “blows up” at its extremes).

One way of finding which fit is truly “better” is to use a different standard of comparison, the “mean squared error” or MSE.
\[ \text{MSE} = \frac{\text{sum of squared errors}}{\text{\# of points} - \text{\# of parameters}} \]
For the linear fit, the number of parameters estimated is 2 (a, b):
MSE = 5854 / (5 – 2) = 1951.3
For the quadratic fit, the number of parameters estimated is 3:
MSE = 4954 / (5 – 3) = 2477.0
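The comparison is simple arithmetic; the sketch below just applies the MSE definition to the two sums of squared errors quoted in the slides (5854 for the linear fit, 4954 for the quadratic fit). The function name `mean_squared_error` is a hypothetical helper.

```python
def mean_squared_error(sum_squared_errors, n_points, n_parameters):
    """Sum of squared errors divided by (points - estimated parameters)."""
    return sum_squared_errors / (n_points - n_parameters)


mse_linear = mean_squared_error(5854, n_points=5, n_parameters=2)      # a, b
mse_quadratic = mean_squared_error(4954, n_points=5, n_parameters=3)   # a0, a1, a2

print(f"MSE, linear fit:    {mse_linear:.1f}")
print(f"MSE, quadratic fit: {mse_quadratic:.1f}")
```

The quadratic fit has the smaller sum of squared errors but the larger MSE, because the third parameter uses up one of only five data points.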

So, the MSE gets worse in this case even though the total sum of squares will always be less or the same for a higher-order fit. When there is a perfect fit, both the total sum of squares and the MSE will be zero. Because of this, most forecasting programs will fit only up through a cubic polynomial, since higher degrees don’t reflect the general trend of actual data.

What is a Good Fit?
A good historical fit may have poor predictive power. So what is a good fit? It depends on whether one has some idea about the underlying real-world process that relates the y’s and x’s. To be an effective forecasting device, the forecasting function must to some extent capture important features of that process. The more one knows, the better one can do. However, knowledge of the underlying process is typically phrased in statistical language. For example, linear curve fitting, in the statistical context, is called linear regression.

If the statistical assumptions about the linear regression model are precisely satisfied (e.g., errors are normally distributed around the regression line), then in a precise and well-defined sense, statisticians can prove that the linear fit is the “best possible fit.” In the real world one can never be completely certain about the underlying process. The question then becomes: How much confidence can we have that the underlying process is one that satisfies a particular set of statistical assumptions? Fortunately, statistical analysis can reveal how well the historical data do indeed satisfy those assumptions.

And if the data do not satisfy the assumptions, then try a different model. Remember, there is an underlying real-world problem and the model is a selective representation of that problem. How good is that model? Ideally, to test the goodness of a model, one would like to have considerable experience with its use. If, in repeated use, it is observed that the model performs well, then our confidence is high. However, what confidence can we have at the outset, without experience?

Validating Models
One approach is to ask the question: Suppose the model had been used to make past decisions; how well would the firm have fared? This approach “creates” experience by simulating the past. This is often referred to as validation of the model. Typically, one uses only a portion of the historical data to create the model, for example, to fit a polynomial of a specified degree. One can then use the remaining data to see how well the model would have performed.
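A minimal sketch of this idea, assuming a straight-line model and a simple split in which the earliest observations are used for fitting and the most recent ones are held back. The eight data pairs, the split point, and the names `fit_line`, `history_x`, and `history_y` are all hypothetical.

```python
def fit_line(x, y):
    """Least squares intercept and slope for y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b * x_bar, b


# HYPOTHETICAL history: eight past observations of (x, y).
history_x = [10, 20, 30, 40, 50, 60, 70, 80]
history_y = [25, 41, 62, 79, 104, 118, 142, 160]

split = 6  # fit on the first six points, hold back the last two
a, b = fit_line(history_x[:split], history_y[:split])

# Errors on the data used for fitting.
train_residuals = [y - (a + b * x)
                   for x, y in zip(history_x[:split], history_y[:split])]

# Errors the model would have made on data it never saw.
holdout_errors = [y - (a + b * x)
                  for x, y in zip(history_x[split:], history_y[split:])]
holdout_mse = sum(e * e for e in holdout_errors) / len(holdout_errors)

print(f"fitted line: y = {a:.2f} + {b:.3f} x")
print(f"hold-out errors: {[round(e, 2) for e in holdout_errors]}")
print(f"hold-out mean squared error: {holdout_mse:.2f}")
```

Comparing the hold-out errors with the fitting errors gives a rough indication of how the model might behave on future data, before any real experience with it has accumulated.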

End of Part 1 Please continue to Part 2