
1 A Multi-Expert Scenario Analysis for Systematic Comparison of Expert Weighting Approaches * CEDM Annual Meeting Pittsburgh, PA May 20, 2012 Umit Guvenc, Mitchell Small, Granger Morgan Carnegie Mellon University *Work supported under a cooperative agreement between NSF and Carnegie Mellon University through the Center for Climate and Energy Decision Making (SES-0949710)

2 Multi-Expert Weighting: A Common Challenge in Public Policy
In the climate change context, many critical quantities and probability distributions are elicited from multiple experts (e.g., climate sensitivity)
There is no consensus on the best methodology for aggregating multiple, sometimes conflicting, expert opinions
It is critical to demonstrate the advantages and disadvantages of different approaches under different circumstances

3 General Issues Regarding Multi-Expert Weighting
1. Should we aggregate expert judgments at all?
2. If we do, should we use a differential weighting scheme?
3. If we do, should we use “seed questions” to assess expert skill?
4. If we do, how should we choose “appropriate” seed questions?
5. If we do, how do different weighting schemes perform under different circumstances?
– Equal weights
– Likelihood weights
– “Classical” (Cooke) weights

4 Presentation Outline
1. Alternative Weighting Methods: Likelihood, “Classical”, and Equal Weighting Schemes
2. Our Approach
3. Characterizing Experts: Bias, Precision, Confidence
4. Multi-Expert Scenario Analysis
5. Conclusions

5 Likelihood Weights
Traditional approach for multi-model aggregation in classical statistics; equivalent to Bayesian model aggregation with uninformative priors
Uses relative likelihoods for Prob[true value | expert estimate]
– We assume the expert’s actual likelihood depends on his/her skill (Bias and Precision)
– The expert’s self-perceived likelihood depends on his/her Confidence
A parametric error distribution function is required; a normal distribution is assumed in the analysis that follows (many risk-related quantities are approximately lognormal, so after a log transform the normal model applies directly to them)
“Micro” validation incorporated
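The deck gives no code, but a minimal sketch of this weighting step may help: each expert’s assumed-normal distribution is evaluated at a seed question’s known true value, and the resulting densities are normalized into weights. The function and parameter names below are hypothetical, not the authors’ code.

```python
import numpy as np
from scipy.stats import norm

def likelihood_weights(true_value, means, sds):
    """Weight each expert by the density of his/her (assumed normal)
    distribution evaluated at the known true value, then normalize."""
    densities = norm.pdf(true_value, loc=np.asarray(means), scale=np.asarray(sds))
    return densities / densities.sum()

# Three experts assessing a quantity whose true value is 0:
# unbiased and precise, unbiased but imprecise, and biased.
print(likelihood_weights(0.0, means=[0.0, 0.0, 0.5], sds=[0.3, 1.0, 0.3]))
```

With several seed questions and independent errors, the per-question densities would multiply before normalization.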

6 “Classical” Weights
Cooke RM (1991), Experts in Uncertainty, Oxford University Press, Oxford
Cooke RM and Goossens LHJ (2008), “TU Delft Expert Judgment Database”, Reliability Engineering and System Safety, v. 93, p. 657-674
Per study: 7-55 seeds, 6-47 “effective” seeds, 4-77 experts
Parameters chosen to maximize expert weights
Within-sample validation
“Macro” validation only: based on frequencies across percentiles across all questions
Non-parametric, based on the chi-square distribution

7 Our Approach
Monte Carlo simulation with 10 hypothetical questions
Experts characterized along three dimensions: Bias, Precision, Confidence
Multi-Expert Scenario Analysis

8 Characterizing Experts: Bias, Precision, Confidence
[Figure: two linked panels. Left, the expert thinks about the mean of X (i.e., the best estimate): a distribution centered at µ_mean with spread σ_mean (Precision), where Bias is the offset of µ_mean from the true value 0. Right, the expert thinks about the distribution of the variable X itself: a self-assessed density f_X with reported percentiles X_5%, X_50%, X_95% and spread σ_X (Confidence). The likelihood of the true value is L = f_X(0).]
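A hedged sketch of this three-parameter expert model (an assumption-laden reading of the figure, not the authors’ code): best estimates are drawn around the true value 0 with offset Bias and spread Precision, and the reported 5th/50th/95th percentiles use Confidence as the self-perceived spread. All names and the seed are hypothetical.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)  # hypothetical seed

def simulate_expert(bias, precision, confidence, n_questions=10):
    """Draw an expert's best estimates from Normal(bias, precision) around
    the true value 0, then report 5th/50th/95th percentiles using
    `confidence` as the self-perceived standard deviation."""
    medians = rng.normal(loc=bias, scale=precision, size=n_questions)
    z = norm.ppf(0.95)  # ~1.645, half-width of the central 90% interval
    return {"x05": medians - z * confidence,
            "x50": medians,
            "x95": medians + z * confidence}

estimates = simulate_expert(bias=0.0, precision=0.3, confidence=0.3)
```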

9 Multi-Expert Scenario Analysis
9 experts, characterized by Bias, Precision, Confidence
10 hypothetical questions (i = 1 to 10):
– True value X_True(i) = 0
– Expert estimate X_Estimate(i), reported as X_5%, X_50%, X_95%
– Predictive error(i) = X_True(i) - X_Estimate(i); summarized as MSE
Leave one question out at a time to predict (cross-validation): determine the expert weights using the other 9 questions, as sketched below.
Compare weights and predictive error for an assumed group of experts:
– Equal weights
– Likelihood weights
– “Classical” weights
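A sketch of that leave-one-out loop under the stated setup (true values all 0); `fit_weights` stands in for any of the three weighting schemes, and all names are hypothetical:

```python
import numpy as np

def loocv_mse(x50, fit_weights):
    """x50: (n_experts, n_questions) array of median estimates; the true
    value of every question is 0. `fit_weights(held_in)` returns one
    weight per expert (summing to 1), fit on the held-in questions."""
    n_questions = x50.shape[1]
    sq_errors = []
    for i in range(n_questions):
        held_in = np.delete(x50, i, axis=1)   # fit weights on the other 9
        w = fit_weights(held_in)
        combined = w @ x50[:, i]              # weighted combined estimate
        sq_errors.append((0.0 - combined) ** 2)
    return float(np.mean(sq_errors))

# Equal weighting as the simplest scheme:
equal_weights = lambda held_in: np.full(held_in.shape[0], 1 / held_in.shape[0])
```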

10 Multi-Expert Scenarios
1. Base Case
2. Impact of Bias
3. Impact of Precision
4. Impact of Confidence
5. Experts with bias, precision, and confidence all varying

11 Scenario #1: Base Case
Expert characteristics (nine experts in a 3x3 grid; row 1 = experts 1, 4, 7; row 2 = experts 2, 5, 8; row 3 = experts 3, 6, 9):
– Bias: 0 for all experts
– Precision: 0.3 for all experts
– Confidence: 0.3 for all experts (confidence/precision = 1)
Results (weights and predictive error, listed by grid row):
– Likelihood weights: 10.9%, 11.9%, 10.7% / 11.3%, 11.1%, 11.4% / 10.7%, 10.5%; MSE(L) = 0.03
– “Classical” weights: 10.8%, 11.3%, 11.1% / 11.5%, 11.0%, 11.1% / 11.0%, 11.1%; MSE(C) = 0.01
– Equal weights: 11.11% each; MSE(E) = 0.82
Model validation: equal weights to equal skills.

12 Scenario #2: Impact of Bias
Expert characteristics (same 3x3 layout as Scenario #1):
– Bias: 0 (row 1), 0.1 (row 2), 0.3 (row 3)
– Precision: 0.3 for all experts
– Confidence: 0.3 for all experts (confidence/precision = 1)
Results (by grid row):
– Likelihood weights: 17.2%, 18.1%, 16.9% / 14.0%, 13.9%, 14.0% / 2.2%, 1.8%; MSE(L) = 0.04
– “Classical” weights: 15.8%, 16.4%, 16.2% / 14.0%, 13.5%, 13.4% / 3.5%, 3.6%; MSE(C) = 0.02
– Equal weights: 11.11% each; MSE(E) = 2.26
When small and moderate bias is introduced for some experts, the weights change to penalize bias (more prominently under the likelihood method).

13 Scenario #3: Impact of Precision
Expert characteristics (same 3x3 layout):
– Bias: 0 for all experts
– Precision: 0.2, 0.3, 1 by column (experts 1-3, 4-6, 7-9)
– Confidence: 0.2, 0.3, 1 by column (confidence/precision = 1)
Results (by grid row):
– Likelihood weights: 30.7%, 2.2%, 0.0% / 31.9%, 2.1%, 0.0% / 31.2%, 1.9%, 0.0%; MSE(L) = 0.02
– “Classical” weights: 15.5%, 13.3%, 4.3% / 16.5%, 12.9%, 4.3% / 15.7%, 13.1%, 4.3%; MSE(C) = 0.02
– Equal weights: 11.11% each; MSE(E) = 3.42
When Bias = 0 for all and imprecision is introduced for some experts, the weights change to reward precision and penalize imprecision (more prominently under the likelihood method).

14 Scenario #4: Impact of Confidence
Expert characteristics (same 3x3 layout):
– Bias: 0 for all experts
– Precision: 0.3 for all experts
– Confidence: 0.15, 0.3, 0.6 by row (confidence/precision = 0.5, 1, 2: overconfident, calibrated, underconfident)
Results (by grid row):
– Likelihood weights: 10.3%, 11.5%, 10.2% / 22.2%, 21.3%, 22.1% / 0.8%; MSE(L) = 0.05
– “Classical” weights: 7.0%, 7.9%, 7.4% / 18.2%, 17.4%, 17.5% / 8.1%, 8.2%; MSE(C) = 0.02
– Equal weights: 11.11% each; MSE(E) = 0.82
When Bias = 0 for all and over- and under-confidence are introduced for some experts, the weights change to penalize inappropriate confidence (more prominently under the likelihood method for under-confidence).

15 Scenario #5a: Impact of Precision & Confidence (Bias = 0 for all)
Expert characteristics (same 3x3 layout):
– Bias: 0 for all experts
– Precision: 0.2, 0.3, 1 by column
– Confidence: 0.1, 0.15, 0.5 (row 1) / 0.2, 0.3, 1 (row 2) / 0.4, 0.6, 2 (row 3); confidence/precision = 0.5, 1, 2 by row
Results (by grid row):
– Likelihood weights: 17.7%, 6.1%, 0.0% / 62.5%, 7.4%, 0.0% / 6.1%, 0.2%, 0.0%; MSE(L) = 0.03
– “Classical” weights: 6.8%, 6.9%, 4.2% / 21.5%, 17.8%, 9.4% / 15.9%, 13.2%, 4.3%; MSE(C) = 0.03
– Equal weights: 11.11% each; MSE(E) = 3.42
When Bias = 0 and both imprecision and over- and under-confidence are introduced, the weights change to reward the “ideal” expert (more prominently under the likelihood method). For “Classical”, proper confidence can somewhat compensate for imprecision; not so for Likelihood (imprecise experts are penalized heavily, even if they know they are imprecise).

16 Scenario #5b: Impact of Precision & Confidence (Bias for all)
Expert characteristics (same 3x3 layout):
– Bias: 0.5 for all experts
– Precision: 0.2, 0.3, 1 by column
– Confidence: 0.1, 0.15, 0.5 / 0.2, 0.3, 1 / 0.4, 0.6, 2 by row (confidence/precision = 0.5, 1, 2)
Results (by grid row):
– Likelihood weights: 0.0%, 0.2%, 1.2% / 0.3%, 8.8%, 1.0% / 42.3%, 46.3%, 0.0%; MSE(L) = 0.30
– “Classical” weights: 0.0%, 0.1%, 12.7% / 0.0%, 1.9%, 46.5% / 2.0%, 8.3%, 28.6%; MSE(C) = 0.72
– Equal weights: 11.11% each; MSE(E) = 23.80
When all experts are biased, with varying precision and improper relative confidence, the likelihood weights shift to reward relatively precise but underconfident experts, while the “Classical” weights shift to reward imprecise experts.

17 Scenario #5c: Precision & Confidence (Bias for 3 Experts)
Expert characteristics (same 3x3 layout):
– Bias: 0.3, 0, 0 / 0, 0.3, 0 / 0.3, 0, 0 by row (moderate bias for three experts, 0 for the rest)
– Precision: 0.2, 0.3, 1 by column
– Confidence: 0.1, 0.15, 0.5 / 0.2, 0.3, 1 / 0.4, 0.6, 2 by row (confidence/precision = 0.5, 1, 2)
Results (by grid row):
– Likelihood weights: 0.0%, 8.2%, 0.0% / 86.7%, 2.0%, 0.0% / 2.4%, 0.6%, 0.0%; MSE(L) = 0.04
– “Classical” weights: 0.0%, 9.9%, 6.2% / 32.8%, 6.5%, 14.6% / 2.8%, 20.4%, 6.7%; MSE(C) = 0.06
– Equal weights: 11.11% each; MSE(E) = 4.26
When there is moderate bias in a subset of the “good” experts, and both imprecision and over- and under-confidence are introduced for all, the likelihood method rewards the “best” expert heavily, while the “Classical” method spreads the weights much more widely.

18 Conclusions (1)
Overall: The Likelihood and “Classical” methods perform similarly (both much better than equal weights), but assign very different weights to experts with different degrees of bias, precision, and relative confidence.
Model check: Both assign equal weights to experts with equal skill (equal bias, precision, and relative confidence).
Bias: Both penalize biased experts, with a stronger penalty under Likelihood.
Precision: Both penalize imprecise experts, again with a stronger penalty under Likelihood.
Confidence: “Classical” penalizes overconfidence and underconfidence equally; Likelihood penalizes overconfidence by a similar amount, but underconfidence much more.

19 Conclusions (2)
Precision & Confidence: For “Classical”, proper (or under-) confidence can compensate somewhat for imprecision; not so for the Likelihood weights (and over-confidence remains better under Likelihood weighting).
Future directions:
– Consider 3-parameter distributions fit to each expert’s 5th, 50th, and 95th percentile values, to enable a more flexible Likelihood approach.
– Conduct an elicitation in which 2- and 3-parameter likelihood functions are used and compared.
– Consider how new information affects experts’ performance on seed questions (explore the value of information for correcting experts’ biases, imprecision, and under- or overconfidence).

20 Thank you. Questions?

21 “Classical” Weights (2): Illustration
An expert’s assessed percentiles X(5th), X(50th), X(95th) divide the range [Min, Max] into 4 intervals, with nominal interval probabilities p1 = 5%, p2 = 45%, p3 = 45%, p4 = 5%; the true value X(True) falls into one of them.
Across ALL questions:
– 5% of true values should be in Interval 1
– 45% of true values should be in Interval 2
– 45% of true values should be in Interval 3
– 5% of true values should be in Interval 4
Expert 1 (perfect): s1 = 5%, s2 = 45%, s3 = 45%, s4 = 5%
Expert 2 (poor; tends to underestimate): s1 = 0%, s2 = 10%, s3 = 60%, s4 = 30%
Calibration statistic: r = 2N * Σ s_i ln(s_i / p_i) ~ ChiSq(df = 3)
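A minimal sketch of computing r from data, assuming arrays of per-question percentile assessments and known true values (names hypothetical); terms with s_i = 0 are taken as zero, following the x ln x → 0 limit:

```python
import numpy as np

P = np.array([0.05, 0.45, 0.45, 0.05])  # nominal interval probabilities

def calibration_statistic(x05, x50, x95, truths):
    """r = 2*N*sum_i s_i*ln(s_i/p_i), where s_i is the observed fraction
    of true values landing in interval i of the expert's assessments."""
    n = len(truths)
    counts = np.array([
        np.sum(truths < x05),
        np.sum((truths >= x05) & (truths < x50)),
        np.sum((truths >= x50) & (truths < x95)),
        np.sum(truths >= x95),
    ])
    s = counts / n
    safe = np.where(s > 0, s, 1.0)  # avoid log(0); those terms are zeroed below
    terms = np.where(s > 0, s * np.log(safe / P), 0.0)
    return 2 * n * float(terms.sum())
```

For the slide’s “poor” expert, s = (0, 0.10, 0.60, 0.30); with a hypothetical N = 10 seed questions this gives r ≈ 11.2, far out in the tail of ChiSq(df = 3).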

22 “Classical” Weights (3)
Relative “Cooke scores”: Cooke Score = Calibration * Binary * Information
– Calibration = p-value of r ~ ChiSq(df = 3) (a “fast” function), with r = 2N * Σ s_i ln(s_i / p_i); null hypothesis: “the expert is calibrated”
– Binary = 1 if Calibration > alpha, 0 otherwise
– Information = f(range) (a “slow” function); independent of the true value, it depends only on the width of the assessed range
“Macro” validation only: based on frequencies across percentiles across all questions; non-parametric, based on the chi-square distribution.
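A sketch combining the three factors named above: the calibration p-value comes from the chi-square survival function, the binary term applies the alpha cutoff, and the information term here is only an illustrative stand-in based on assessment width (Cooke’s actual information score is a relative entropy against a background measure over an intrinsic range):

```python
import math
from scipy.stats import chi2

def cooke_score(r, interval_width, background_width, alpha=0.05):
    """Cooke Score = Calibration * Binary * Information (per the slide).
    `interval_width` and `background_width` parameterize a simplified,
    hypothetical information term; narrower assessments score higher."""
    calibration = chi2.sf(r, df=3)  # p-value of the null "expert is calibrated"
    binary = 1.0 if calibration > alpha else 0.0
    information = math.log(background_width / interval_width)
    return calibration * binary * information
```

Experts’ weights would then be their scores normalized to sum to 1.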

