Hubbard Decision Research The Applied Information Economics Company Bootstrap Hints

Overview of Bootstrapping Hints
- The objective of a good bootstrap model is to model the judges' intuitive judgments realistically while being even more accurate than the judges themselves
- The measure of effectiveness in this area is R squared
- Roughly, R squared is the percentage of variance explained by the model
- These hints should help improve R squared
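
As a rough illustration of "percentage of variance explained", the sketch below computes R squared from two lists. It is a minimal sketch, not the deck's own worksheet: the function name `r_squared` and the sample scores are hypothetical, and the formula used is the common 1 minus residual variation over total variation.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` that `predicted` accounts for."""
    mean_actual = sum(actual) / len(actual)
    # Total variation of the judges' estimates around their mean
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    # Variation left unexplained by the model
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - ss_residual / ss_total


# Hypothetical data: judges' estimates and the bootstrap model's outputs
judge_scores = [0.2, 0.4, 0.6, 0.8]
model_scores = [0.25, 0.35, 0.65, 0.75]

print(round(r_squared(judge_scores, model_scores), 3))
```

A value near 1 means the model tracks the judges' estimates closely; a value near 0 means it explains little of their variation.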

Strategies for Improving R Squared
- Hints for choosing the right variables
- Hints for improving data gathering
- Hints for improving quantification
- Hints for finding higher-order variables

Hints for Choosing Variables
- For some commonly bootstrapped variables, such as Confidence Index and Cancellation Probability, these variables may be considered:
  - Project cost and/or duration
  - Is it a compliance project, and/or is the project a documented strategic requirement?
  - What is the scope of the business covered? (e.g., number of departments involved, number of users, etc.)
  - Sponsor characteristics such as level, whether the sponsor is business or IT, or the sponsor's success record on past projects
  - Whether the investment is new software development, package modification, upgrades to previous systems, hardware only, etc.
  - Technology risk such as proven track records, IT familiarity with the technology, and the maturity of the technology
- Watch how many variables are added: going much beyond 8 variables becomes unproductive and may degrade the accuracy of the model, so stick to the important ones

Data Gathering Hints
- You will probably get a higher R squared when averaging the estimates of larger groups
- Be sure to allow time for calibration
- Use a trial bootstrap list that the evaluators discuss as a group
- The evaluators can check results with "pair-wise comparisons": they pick pairs of investments at random, determine which of each pair they would prefer, then confirm that their scores reflect this
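
The pair-wise comparison check can be sketched as follows. This is only an illustration under stated assumptions: the function `pairwise_agreement`, the `prefers` callback, and the sample investments A, B, and C are all hypothetical, and the stated preference is assumed to be available as a simple ranking.

```python
import random


def pairwise_agreement(scores, prefers, n_pairs=10, seed=0):
    """Fraction of randomly drawn pairs whose scores match the stated preference.

    scores  : dict mapping investment name -> evaluator score
    prefers : function (a, b) -> name of the investment the evaluator prefers
    """
    rng = random.Random(seed)
    names = sorted(scores)
    agree = 0
    checked = 0
    for _ in range(n_pairs):
        a, b = rng.sample(names, 2)  # two distinct investments at random
        if scores[a] == scores[b]:
            continue  # tied scores imply no ranking to confirm
        higher = a if scores[a] > scores[b] else b
        checked += 1
        if prefers(a, b) == higher:
            agree += 1
    return agree / checked if checked else None


# Hypothetical scores and a hypothetical stated ranking (most preferred first)
scores = {"A": 0.9, "B": 0.4, "C": 0.7}
ranking = ["A", "C", "B"]


def prefers(a, b):
    return min(a, b, key=ranking.index)


print(pairwise_agreement(scores, prefers))
```

An agreement well below 1 suggests the scores do not yet reflect the evaluators' actual preferences and the list should be discussed again.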

Hints for Quantifying Variables
- Regression assumes that all variables are basically linear
- Reviewing each variable for non-linearity and finding a way to make it linear will improve R squared
- Variables that can be captured as 0 or 1 (binary) need no review
- Continuous variables need to be graphed to check for non-linearity
- Discrete variables that are not binary require pivot table analysis (see pivot table procedure for details)

Continuous Variables
- One way to improve R squared is to convert your non-linear variables into linear variables
- To check which variables are non-linear, make an XY graph with the continuous variable on the X axis and the bootstrapped variable (from the evaluators) on the Y axis
- If you find an obviously non-linear relationship, you can change the variable so that it becomes linear
- Depending on how the graph looks, you can take the appropriate steps

Linear
- This is an obviously linear relationship; leave it just as it is

Scattered Distribution
- If the XY plot is not obviously non-linear, then just leave it as it is
- If the Excel regression output indicates that this variable has little or no effect, consider removing it

Clustered Distribution
- Here, a “threshold” would be the best quantification of this variable
- Instead of being linear, this variable appears to make a difference only when it is above or below a certain value (in this case, about 6% on the horizontal scale)
- Try converting the continuous variable to a binary variable. In this case you would use “=if(x<.06, 1,0)”
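
For readers working outside Excel, the threshold conversion can be sketched in Python. The function name `to_threshold_flag` and the sample values are hypothetical; the 6% cutoff and the direction of the comparison are taken from the formula on the slide.

```python
def to_threshold_flag(x, cutoff=0.06):
    """Mirror of the spreadsheet formula =IF(x<.06, 1, 0)."""
    return 1 if x < cutoff else 0


# Hypothetical values of the continuous variable
values = [0.02, 0.05, 0.06, 0.11]
flags = [to_threshold_flag(v) for v in values]
print(flags)
```

The binary column then replaces the original continuous variable in the regression.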

Upward Sloping
- If the graph slopes upward, then you might try putting the scale of the X axis on “logarithmic”
- If this makes it look linear, then use the formula “=log(X)”
- If that doesn’t work, try “=X^.5” or some other power of X less than 1

Leveling Off
- Try setting the scale of the Y axis to “logarithmic”
- If this makes it look linear, then use “=exp(X)”
- If it doesn’t work, try “=X^2” or some other power of X

Downward Sloping
- Try setting the scale of the Y axis to “logarithmic”
- If this makes it look linear, then use “=exp(X)”
- If it doesn’t work, try “=1/X”
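
Instead of judging each transformation by eye, one option is to compute the R squared of a one-variable linear fit for each candidate and compare. The sketch below is an assumption-laden illustration, not part of the original procedure: the names `r_squared_simple`, `TRANSFORMS`, and `rank_transforms` are hypothetical, the data are made up, and all X values are assumed to be positive so that the log, square root, and reciprocal are defined.

```python
import math


def r_squared_simple(xs, ys):
    """R squared of a one-variable linear fit (squared Pearson correlation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Candidate transformations named after the spreadsheet formulas on the slides
TRANSFORMS = {
    "X": lambda x: x,
    "log(X)": math.log10,  # Excel's LOG uses base 10 by default
    "X^.5": math.sqrt,
    "exp(X)": math.exp,
    "X^2": lambda x: x * x,
    "1/X": lambda x: 1.0 / x,
}


def rank_transforms(xs, ys):
    """Return (name, R squared) pairs, best-fitting transformation first."""
    results = [
        (name, r_squared_simple([fn(x) for x in xs], ys))
        for name, fn in TRANSFORMS.items()
    ]
    return sorted(results, key=lambda item: item[1], reverse=True)


# Hypothetical data in which Y rises by a fixed step each time X doubles
xs = [1, 2, 4, 8, 16]
ys = [0.10, 0.20, 0.30, 0.40, 0.50]

for name, r2 in rank_transforms(xs, ys):
    print(f"{name:8s} {r2:.3f}")
```

With these sample points the log transformation ranks first, which matches the advice for a curve that looks linear on a logarithmic X axis. The ranking is only a screening aid; the XY graph should still be reviewed.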

Hints for Higher-Order Terms
- After your first attempt at a regression, you may improve your R squared by adding some “higher-order” variables
- Higher-order variables include products of other variables, conditional statements involving other variables, etc.
- To find potential candidates for higher-order terms, ask yourself whether the importance of some variables depends on the values of other variables
- Try several new terms and plot each one. If one shows an obvious linear relationship, add it
- If you make a higher-order variable, run a new regression, and the R squared is higher, it was probably a good choice

Continuous Higher-Order Terms
- If the importance of one variable depends on the value of another, and they are both continuous, try the following (we’ll call these two variables X and Y)
- If the bootstrapped variable should increase when both X and Y are high (or when both are low), then try “=X*Y”
- If the bootstrapped variable should increase when one variable is high and the other is low, then try “=X/Y”
- If X is especially important when Y is over/under a certain value N, then try “=if(Y>N, X, 0)”
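
The three candidate terms above translate directly into small functions. This is a minimal sketch with hypothetical function names and sample values; each function mirrors one spreadsheet formula from the slide, and the result would be added as a new column before rerunning the regression.

```python
def product_term(x, y):
    """Mirror of =X*Y."""
    return x * y


def ratio_term(x, y):
    """Mirror of =X/Y; y must be non-zero."""
    return x / y


def conditional_term(x, y, n):
    """Mirror of =IF(Y>N, X, 0): X counts only when Y exceeds N."""
    return x if y > n else 0


# Hypothetical rows of (X, Y) values and a hypothetical cutoff N
rows = [(2.0, 4.0), (3.0, 0.5), (1.0, 2.0)]
N = 1.0

for x, y in rows:
    print(product_term(x, y), ratio_term(x, y), conditional_term(x, y, N))
```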

Discrete Higher-Order Terms
- You might try a pivot table that compares the average bootstrapped output variable across combinations of the two variables: put one variable in the columns of the pivot table and the other in the rows
- You can then try a nested IF statement that allows you to put a separate discrete value on each combination of the two variables
- For example, suppose you found a compounding relationship between “strategic” (Y) and “multiple departments” (X)
- You might try “=if(X=1,if(Y=1,.41,.11),.5)”
[Pivot table: Strategic by Multiple Departments, showing averages. Callout: “These 2 are not significantly different so you can average them and use the same value”]
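
The pivot-table step and the nested IF can be sketched in Python as follows. The function names `pivot_means` and `combined_term` and the sample rows are hypothetical; only the values .41, .11, and .5 come from the formula on the slide.

```python
from collections import defaultdict


def pivot_means(rows):
    """Average score for each (x, y) combination, like a two-way pivot table.

    rows: iterable of (x, y, score) tuples, where x and y are discrete values.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (x, y) -> [sum of scores, count]
    for x, y, score in rows:
        totals[(x, y)][0] += score
        totals[(x, y)][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def combined_term(x, y):
    """Mirror of =IF(X=1, IF(Y=1, .41, .11), .5)."""
    if x == 1:
        return 0.41 if y == 1 else 0.11
    return 0.5  # one shared value whenever X is not 1, whatever Y is


# Hypothetical evaluator data: (multiple departments, strategic, score)
sample_rows = [
    (1, 1, 0.40),
    (1, 1, 0.42),
    (1, 0, 0.11),
    (0, 1, 0.52),
    (0, 0, 0.48),
]

for combo, mean in sorted(pivot_means(sample_rows).items()):
    print(combo, round(mean, 2))
```

Cells whose averages are close, such as the two X = 0 cells in this made-up sample, can share one value in the nested IF, which keeps the number of distinct levels small.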

Improvements Due to Bootstrap
- This chart shows the percentage reduction in error achieved by bootstrapped estimates relative to intuitive estimates
- Results vary depending on how objective and systematic the model was (as ours is)
[Chart: percentage reduction in error, axis from 0% to 40%, for each of the following: Cancer patient life-expectancy; Life-insurance sales rep performance; Graduate students' grades; Changes in stock prices; Mental illness using personality tests; Student ratings of teaching effectiveness; IQ scores using Rorschach tests; Psychology course grades; Business failures using financial ratios; Mean across many studies]

Actual Classification Plots
- An Illinois insurance company created a classification chart to help prioritize its current list of proposed investments
- They wanted to determine which investments could be accepted without more analysis and which needed more analysis
- 18 investments were plotted on the classification chart
- The results had a profound effect on investment priorities
- Some investments that were assumed to be beneficial now required analysis, and some that had required analysis could now be approved immediately

Classification of Example Projects
[Chart: projects plotted by Expected Investment Size ($000) and Confidence Index, divided into the regions below]
- No Classification Needed
- Do Abbreviated Risk-Return Analysis:
  - 6. DLSW Router Network Redesign
  - 9. Extended Hours
  - 18. Doc. Access Strategy
- Do Full Risk-Return Analysis:
  - 8. Pearl Indicator and Pearl I/O interface
  - 11. Richardson Data Center Consolidation
  - 15. MVS DB2 Tools
- Reject; Consider Other Options:
  - 1. Data Strategy
  - 2. Enterprise Security Strategy
  - 3. Remote Server Redundancy
  - 12. MQ Series: Base
  - 13. Development Environment 2000 (mf)
  - 14. “Source Control” Source Code Mgmt
  - 16. Enterprise InterNet
- Success Factor Adjustments:
  - 4. Network OS migration to Novell 5.x
  - 10. Optimize Single Code Base
- Accept without Further Analysis:
  - 5. Lucent switch upgrade
  - 7. Image Server Relocation
  - 17. Enterprise IntraNet to all sites

Bootstrapping Deliverables
- Final presentation including:
  - An XY chart showing correlation of original estimates to the bootstrap model
  - Any “solution space” that was developed, such as classification charts
- A worksheet for input of various values which uses the bootstrap model to estimate some output variable(s)
- Any customization to RAVI documentation for that client for proper use of the worksheets and solution spaces
- Any recommendations based on the bootstrap