1 James R. Black Qing Qing Wu 17 Feb 2016 Modeling Prediction Intervals using Monte Carlo Simulation Software 2016 ICEAA Professional Development & Training.

Presentation transcript:

1 James R. Black Qing Qing Wu 17 Feb 2016 Modeling Prediction Intervals using Monte Carlo Simulation Software 2016 ICEAA Professional Development & Training Workshop

2 Presenter Bios James “Jay” Black has 12 years of cost estimating experience and currently works as senior operations research analyst for the Administration for Children and Families within the U.S. Department of Health and Human Services. In this role, he supports the Grants Center of Excellence software suite used to administer 1200 grant programs in eight Federal departments. Jay has a Masters in Systems Engineering from Johns Hopkins University and holds a current CCE/A certification. Qing Qing “Q” Wu is a cost analyst for the Cost Effectiveness Branch at the Naval Surface Warfare Center Carderock Division. She supports the Naval Sea Systems Command 05C Cost Engineering & Industrial Analysis Division in their Weapon Systems Division. She has a Bachelor’s degree in Mathematics from the Macaulay Honors College at The City College of New York.

3 Presentation Summary
References/Acknowledgements:
– 2014 ICEAA Workshop presentation prepared by Dr. Christian Smart (MDA) and Marc Greenberg (NASA)
– Joint Agency Cost Schedule Risk and Uncertainty Handbook (CSRUH, Feb 2014)
Abstract:
– The use of a prediction interval (PI) is a simple method of quantifying risk and uncertainty for a Cost Estimating Relationship (CER) derived from an Ordinary Least Squares (OLS) regression
– Yet few cost estimators implement PIs in their estimates, despite their frequent use of CERs
– This presentation provides a step-by-step tutorial for modeling a PI for an example CER using Monte Carlo simulation software and identifies the beneficial impact on the coefficient of variation (CV)

4 Cost Estimating Relationships (CERs)
Definition: A Cost Estimating Relationship (CER) is a mathematical expression of cost as a function of one or more independent variables.
CERs are often developed using regression analysis to fit an equation to a data set.
Examples of equations used for CERs include:
– Linear CER: y = a + bx
– Nonlinear CERs: y = ax^b, y = ab^x, y = a + bx^c
where y = Cost and x = Technical Parameter
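
As a small illustration of the CER forms listed on this slide, the sketch below writes each one as a Python function. The function names and the coefficient values in the usage lines are hypothetical, chosen only to show the shape of each equation; they are not taken from the presentation.

```python
# Hypothetical helper functions for the four CER forms named on the slide.
# y is cost, x is the technical parameter; a, b, c are fitted coefficients.

def linear_cer(x, a, b):
    """Linear CER: y = a + b*x."""
    return a + b * x


def power_cer(x, a, b):
    """Nonlinear (power) CER: y = a * x^b."""
    return a * x ** b


def exponential_cer(x, a, b):
    """Nonlinear (exponential) CER: y = a * b^x."""
    return a * b ** x


def power_with_offset_cer(x, a, b, c):
    """Nonlinear CER with a fixed offset: y = a + b * x^c."""
    return a + b * x ** c


if __name__ == "__main__":
    # Illustrative coefficients only (assumptions, not values from the deck).
    print(linear_cer(5000, a=1000.0, b=0.25))
    print(power_cer(5000, a=30.0, b=0.5))
```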

5 Modeling Uncertainty
CERs do not perfectly fit the historical data upon which they are based.
This results in an underlying uncertainty distribution about an estimate:
– The outcome of a CER represents only one point on an uncertainty distribution (typically the mean or median)
This brief will model this uncertainty.

6 Modeling Uncertainty (cont.)
Model uncertainty is variation about the dependent variable, i.e., cost.
For a linear CER (often used to create weight-based estimates): Y = a + bX + ε
For a nonlinear CER (often used to model learning curves): Y = aX^b · e^ε, i.e., ln(Y) = ln(a) + b·ln(X) + ε
where ε represents the error between the estimated cost and the actual cost Y; the estimate uncertainty is captured by the prediction interval.

7 Example: Modeling Uncertainty for a Linear CER
For example, consider a linear CER: Y = a + bX + ε
Using Monte Carlo simulation software (e.g., Oracle Crystal Ball), define a distribution for ε:
– ε = normal(mean = 0, std dev = prediction error)
– OR
– ε = student-t(midpoint = 0, scale = prediction error, degrees of freedom)
So how do you define the prediction error?
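
The two error-term choices on this slide can be sketched outside a commercial Monte Carlo tool. The example below is a minimal stdlib-only illustration, not the presenters' implementation: the function names are hypothetical, and the Student-t draw is built from a standard normal divided by the square root of a scaled chi-square variate, which is the textbook construction.

```python
import math
import random


def sample_normal_error(rng, prediction_error):
    """Draw the error term from normal(mean=0, std dev=prediction error)."""
    return rng.gauss(0.0, prediction_error)


def sample_student_t_error(rng, prediction_error, dof):
    """Draw the error term from student-t(midpoint=0, scale=prediction error, dof).

    A standard t variate is Z / sqrt(chi2 / dof); a chi-square variate with
    `dof` degrees of freedom is Gamma(shape=dof/2, scale=2).
    """
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(dof / 2.0, 2.0)
    return prediction_error * z / math.sqrt(chi2 / dof)


if __name__ == "__main__":
    rng = random.Random(42)
    # prediction_error=100.0 and dof=7 are placeholder inputs for illustration.
    print(sample_normal_error(rng, 100.0))
    print(sample_student_t_error(rng, 100.0, 7))
```

The Student-t option has heavier tails than the normal for small samples, which is why it widens the simulated cost range when the CER is based on few data points.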

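8 Prediction Interval Equation
$PI = \hat{y} \pm t_{\alpha/2,\,n-2} \cdot SEE \cdot \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$
where
$\hat{y}$ = calculated value from the regression line
$t_{\alpha/2,\,n-2}$ = t critical value (T.INV.2T function in Excel)
$SEE$ = standard error of the estimate (STEYX function in Excel)
$n$ = number of observations
$\bar{x}$ = average of X
$\sum_i (x_i - \bar{x})^2$ = sum of squared deviations of X from its mean (DEVSQ function in Excel)
Prediction Error = $SEE \cdot \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$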
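
The prediction error and interval defined on this slide translate directly into a few lines of code. This is a sketch under stated assumptions: the function and parameter names are hypothetical, the t critical value is passed in by the caller (for example, looked up with Excel's T.INV.2T), and the inputs in the usage lines are made-up numbers rather than values from the example dataset.

```python
import math


def prediction_error(x, see, n, x_mean, devsq):
    """SEE * sqrt(1 + 1/n + (x - x_mean)^2 / devsq).

    Equivalent to the Excel form SEE*SQRT((n+1)/n + (X-Avg)^2/Devsq).
    """
    return see * math.sqrt(1.0 + 1.0 / n + (x - x_mean) ** 2 / devsq)


def prediction_interval(y_hat, t_crit, pred_err):
    """Return (lower, upper) bounds: y_hat -/+ t_crit * prediction error."""
    half_width = t_crit * pred_err
    return y_hat - half_width, y_hat + half_width


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    pe = prediction_error(x=5000.0, see=500.0, n=9, x_mean=7000.0, devsq=6.5e8)
    # 2.365 is the two-sided 95% t critical value for 7 degrees of freedom.
    print(prediction_interval(y_hat=2200.0, t_crit=2.365, pred_err=pe))
```

Note that the interval is narrowest when x equals the mean of the fitted X values and widens as the estimate moves away from the center of the data.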

9 Evaluating the Prediction Error
Example Dataset:
Development $ in BY12$M    Weight in Lbs.
$1,000                     1,000
$2,000                     3,000
$1,600                     2,500
$1, (truncated)            (missing)
$2,000                     3,500
$3,500                     9,000
$5,000                     30,000
$4,000                     10,000
$1,600                     4,000
Set Up the Inputs
Evaluate Prediction Error:
Prediction Error =SEE*(SQRT(((n+1)/n)+(((X-Avg)^2)/Devsq)))

10 Example: Modeling Uncertainty for a Linear CER
Example Dataset: the nine Development $ (BY12$M) and Weight (lbs) pairs from slide 9.
OLS Regression*: Y = a + bx + ε, with a and b taken from the fitted trendline
Define two distributions and look at the resulting effect on the CV:
– On the independent variable: x = triangular(4000, 5000, 7000)
– On the error term: ε = student-t(midpoint = 0, scale = prediction error, degrees of freedom)
* Note: An Excel trendline is used here for presentation brevity. When running a regression on your own, make sure you consider the t- and F-statistics, adjusted R^2, and other fit measures.
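
The two-distribution experiment on this slide can be sketched in plain Python. Everything numeric below other than the triangular(4000, 5000, 7000) weight range and the 7 degrees of freedom is an assumption: the intercept, slope, and prediction error are hypothetical placeholders, because the fitted coefficients are not recoverable from the transcript. The sketch also holds the prediction error fixed instead of recomputing it for each sampled x, which is a simplification.

```python
import math
import random
import statistics


def student_t(rng, scale, dof):
    """Scaled Student-t draw: scale * Z / sqrt(chi2 / dof)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(dof / 2.0, 2.0)
    return scale * z / math.sqrt(chi2 / dof)


def simulate_cv(intercept=1000.0, slope=0.25, pred_err=700.0, dof=7,
                trials=10000, seed=0, include_error=True):
    """Return the coefficient of variation (std dev / mean) of simulated cost.

    intercept, slope and pred_err are HYPOTHETICAL values for illustration.
    """
    rng = random.Random(seed)
    costs = []
    for _ in range(trials):
        # random.triangular takes (low, high, mode).
        x = rng.triangular(4000.0, 7000.0, 5000.0)
        e = student_t(rng, pred_err, dof) if include_error else 0.0
        costs.append(intercept + slope * x + e)
    return statistics.pstdev(costs) / statistics.mean(costs)


if __name__ == "__main__":
    print("CV, risk on weight only:     ", simulate_cv(include_error=False))
    print("CV, risk on weight and error:", simulate_cv(include_error=True))
```

With any reasonable inputs the second CV is larger than the first, which is the qualitative effect the presentation demonstrates; the specific percentages depend on the actual regression.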

11 Status Quo Example: Risk Only on Independent Variable
Risk only on independent variable (Weight):
- Only the weight (triangular) distribution is applied
- Low CV of 0.07 (7.0%)
Basis and Values of Risk Parameters:
Risk Parameter    Min                        Most Likely                 Max
Weight Dist       Weight Low (10%): 4000     Weight Most Likely: 5000    Weight High (90%): 7000
Deciles (90% down to 10%): values truncated in the transcript, running from $2,… down to $1,…
Note: Regression of the original dataset had an R² = (value missing in the transcript)

12 Prediction Interval Example: Risk on Independent Variable & Error Term
Risk on independent variable and error term:
- Weight (triangular) distribution and PI (Student-t) distribution
- High CV of 0.40 (39.9%)
Basis and Values of Risk Parameters:
Risk Parameter    Min                        Most Likely                 Max
PI Dist           (values missing in the transcript)
Weight Dist       Weight Low (10%): 4000     Weight Most Likely: 5000    Weight High (90%): 7000
Student-t Distribution Parameters: Midpoint = 0, Scale = Prediction Error (value missing in the transcript), Degrees of Freedom = 7 (n-2)
Deciles (90% down to 10%): values truncated in the transcript, running from $3,… down to $1,…
Note: Regression of the original dataset had an R² = (value missing in the transcript)

13 Summary
Implementing risk on the error term using the prediction interval is not difficult.
Even for regressions with reasonable fit statistics, implementing risk on the error term can produce more desirable CVs (in the linear example, the CV rose from 7.0% to 39.9%).

14 BACKUP

15 Example: Linear
Example: X = 5000
Worksheet columns and formulas:
– x: 5000
– Prediction Error: =SEE*(SQRT(((n+1)/n)+(((X-Avg)^2)/Devsq)))
– e: 0
– y: =Slope*X+Intercept
– y + e: =y+e
(The second table on the slide, with the evaluated values, is not captured in the transcript.)
Student-t Distribution (on e): Midpoint = 0, Scale = Prediction Error, Degrees of Freedom = n-k-1
Triangular Distribution (on x): 10% = 4,000, Likeliest = 5,000, 90% = 7,000

16 Student-t Distribution Explained
Inputs to the Student-t distribution:
– Midpoint: 0
– Scale: Prediction Error
– Deg. Freedom: n-k-1

17 Prediction Interval Equation
The prediction interval, built from the prediction error, depends on:
– the standard error of the CER
– the CER sample size (i.e., the number of data points used to derive the CER)
– the desired confidence level
– the distance from the center of the CER’s independent variables to the location of the independent variable of the point being estimated

18 Generating the S-Curve from the Prediction Interval
The S-curve can be generated by varying the critical value of the t distribution in the prediction interval equation while holding the CER input(s) constant:
cost at cumulative probability p = $\hat{y} + t_{p,\,n-k-1} \cdot \text{Prediction Error}$
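
The S-curve construction described on this slide can be sketched by stepping through t quantiles at a fixed point estimate. The sketch assumes 7 degrees of freedom and uses a short lookup of standard t-table quantiles for that case, since the Python standard library has no inverse-t function; the point estimate and prediction error in the usage lines are hypothetical.

```python
# One-sided Student-t quantiles for 7 degrees of freedom, from a standard
# t-table (cumulative probability -> t value). Lower-tail values follow by symmetry.
T_QUANTILES_DF7 = {0.50: 0.000, 0.75: 0.711, 0.90: 1.415, 0.95: 1.895, 0.975: 2.365}


def s_curve(y_hat, pred_err, quantiles=T_QUANTILES_DF7):
    """Return sorted (cumulative probability, cost) points for the S-curve.

    Each point is y_hat + t_p * prediction error, with the CER input held fixed.
    """
    points = []
    for p, t in quantiles.items():
        points.append((p, y_hat + t * pred_err))
        if p > 0.5:
            # Mirror the upper-tail quantile to get the matching lower-tail point.
            points.append((round(1.0 - p, 3), y_hat - t * pred_err))
    return sorted(points)


if __name__ == "__main__":
    # y_hat and pred_err are placeholder values for illustration.
    for prob, cost in s_curve(y_hat=2000.0, pred_err=100.0):
        print(f"{prob:6.1%}  {cost:10.1f}")
```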

19 Example: Non-Linear
Dataset: the nine Development $ (BY12$M) and Weight (lbs) pairs from slide 9, with ln(Dev $) and ln(Weight) columns added (log values not captured in the transcript).
Inputs (Excel formula, then value):
– n =COUNT(ln(x)) → 9
– Slope =SLOPE(ln(y), ln(x)) → 0.50
– Intercept =INTERCEPT(ln(y), ln(x)) → 3.47
– SEE =STEYX(ln(y), ln(x)) → 0.15
– Avg =AVERAGE(ln(x)) → 8.29
– Devsq =DEVSQ(ln(x)) → 10.03
Prediction Error =SEE*(SQRT(((n+1)/n)+(((X-Avg)^2)/Devsq)))

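20 Example: Non-Linear
Example: x = 12,000
Worksheet columns and formulas (in the formulas, X is ln(x), since the regression is fit in log space):
– x: 12000
– ln(x): =LN(x)
– Prediction Error: =SEE*(SQRT(((n+1)/n)+(((X-Avg)^2)/Devsq)))
– e: 0
– y: =Slope*X+Intercept
– y with e: =y+e
– Anti-log of Y: =EXP(y+e)
(The second table on the slide, with the evaluated values, is not captured in the transcript.)
Student-t Distribution (on e): Midpoint = 0, Scale = Prediction Error, Degrees of Freedom = n-k-1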
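
The worksheet steps for the non-linear example can be reproduced with the rounded inputs listed on slide 19 (n = 9, Slope = 0.50, Intercept = 3.47, SEE = 0.15, Avg = 8.29, Devsq = 10.03) at x = 12,000. This is a sketch, not the presenters' workbook; the function names are hypothetical, and because the inputs are rounded to two decimals the results will only approximate the values on the original slides.

```python
import math

# Rounded regression inputs from slide 19 (log-log fit of cost on weight).
N = 9
SLOPE = 0.50
INTERCEPT = 3.47
SEE = 0.15
AVG_LN_X = 8.29
DEVSQ_LN_X = 10.03


def log_space_estimate(x):
    """Return (ln(x), prediction error, ln(y)) for a weight x in pounds."""
    ln_x = math.log(x)
    # Same form as the Excel formula, with X = ln(x).
    pred_err = SEE * math.sqrt((N + 1) / N + (ln_x - AVG_LN_X) ** 2 / DEVSQ_LN_X)
    ln_y = SLOPE * ln_x + INTERCEPT
    return ln_x, pred_err, ln_y


def cost_with_error(x, e=0.0):
    """Anti-log of (y + e): converts the log-space estimate back to dollars."""
    _, _, ln_y = log_space_estimate(x)
    return math.exp(ln_y + e)


if __name__ == "__main__":
    ln_x, pred_err, ln_y = log_space_estimate(12000)
    print(f"ln(x) = {ln_x:.4f}, prediction error = {pred_err:.4f}, y = {ln_y:.4f}")
    print(f"point estimate (e = 0): {cost_with_error(12000):.0f}")
```

With these rounded inputs the prediction error comes out between 0.16 and 0.17, in line with the scale of 0.16 quoted for the Student-t distribution on slide 21.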

21 Prediction Interval Example: Non-Linear
Risk on error term:
- PI (Student-t) distribution
- CV of 0.20
Basis and Values of Risk Parameters:
Risk Parameter    Min    Most Likely    Max
PI Dist           (values missing in the transcript)
Student-t Distribution Parameters: Midpoint = 0, Scale = 0.16 (Prediction Error), Degrees of Freedom = 7 (n-2)
Note: Regression of the original dataset had an R² = (value missing in the transcript)

22 Example 2: Non-Linear Learning Curve Example
Dataset: six lots, with columns x (Lot midpoint), y (Unit Cost), ln(x), and ln(y). Only fragments survive in the transcript: the first lot midpoint is 1.87 with a unit cost of $1,… (truncated), and the last unit cost is $750.
Inputs (Excel formula, then value):
– n =COUNT(ln(x)) → 6
– Slope =SLOPE(ln(y), ln(x)) → -0.13
– Intercept =INTERCEPT(ln(y), ln(x)) → 7.13
– SEE =STEYX(ln(y), ln(x)) → 0.05
– Avg =AVERAGE(ln(x)) → 2.71
– Devsq =DEVSQ(ln(x)) → 9.53
Prediction Error =SEE*(SQRT(((n+1)/n)+(((X-Avg)^2)/Devsq)))

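23 Example 2: Non-Linear Learning Curve Example
Example: x = 20
Worksheet columns and formulas (in the formulas, X is ln(x), since the regression is fit in log space):
– x: 20
– ln(x): =LN(x)
– Prediction Error: =SEE*(SQRT(((n+1)/n)+(((X-Avg)^2)/Devsq)))
– e: 0
– y: =Slope*X+Intercept
– y with e: =y+e
– Anti-log of Y: =EXP(y+e)
(The second table on the slide, with the evaluated values, is not captured in the transcript.)
Student-t Distribution (on e): Midpoint = 0, Scale = Prediction Error, Degrees of Freedom = n-k-1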
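
The learning-curve example can be run end to end as a small Monte Carlo simulation using the rounded inputs from slide 22 (n = 6, Slope = -0.13, Intercept = 7.13, SEE = 0.05, Avg = 2.71, Devsq = 9.53) at a lot midpoint of x = 20. This is an illustrative sketch with hypothetical function names, a fixed seed, and a hand-built Student-t sampler; it is not the presenters' Crystal Ball model, and the rounded inputs give a scale near 0.054 where the original slides quote 0.06.

```python
import math
import random
import statistics

# Rounded regression inputs from slide 22 (log-log learning curve fit).
N = 6
SLOPE = -0.13
INTERCEPT = 7.13
SEE = 0.05
AVG_LN_X = 2.71
DEVSQ_LN_X = 9.53
DOF = N - 2  # n - k - 1 with one independent variable


def prediction_error(x):
    """Prediction error in log space at lot midpoint x."""
    ln_x = math.log(x)
    return SEE * math.sqrt((N + 1) / N + (ln_x - AVG_LN_X) ** 2 / DEVSQ_LN_X)


def student_t(rng, scale, dof):
    """Scaled Student-t draw: scale * Z / sqrt(chi2 / dof)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(dof / 2.0, 2.0)
    return scale * z / math.sqrt(chi2 / dof)


def simulate_unit_cost(x, trials=10000, seed=0):
    """Simulate unit cost as EXP(y + e), with e drawn from the Student-t."""
    rng = random.Random(seed)
    ln_y = SLOPE * math.log(x) + INTERCEPT
    scale = prediction_error(x)
    return [math.exp(ln_y + student_t(rng, scale, DOF)) for _ in range(trials)]


if __name__ == "__main__":
    costs = simulate_unit_cost(20)
    cv = statistics.pstdev(costs) / statistics.mean(costs)
    print(f"prediction error = {prediction_error(20):.4f}")
    print(f"median unit cost = {statistics.median(costs):.0f}, CV = {cv:.3f}")
```

Because the error is applied in log space and then exponentiated, the simulated cost distribution is skewed to the right, so the median is a more stable summary than the mean for small degrees of freedom.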

24 Prediction Interval Example 2: Non-Linear
Risk on error term:
- PI (Student-t) distribution
- CV of 0.09
Basis and Values of Risk Parameters:
Risk Parameter    Min    Most Likely    Max
PI Dist           (values missing in the transcript)
Student-t Distribution Parameters: Midpoint = 0, Scale = 0.06 (Prediction Error), Degrees of Freedom = 4 (n-2)
Note: Regression of the original dataset had an R² = (value missing in the transcript)

25 Linear Regression Example Fit Statistics

26 Nonlinear Regression Example 2 Fit Statistics