Probability Forecasting, Probability Evaluation, and Scoring Rules: Expanding the Toolbox
Robert L. Winkler, Duke University
Subjective Bayes Workshop – Warwick, December 2009

Outline of Presentation
- Probability Forecasting
- Why Probability Evaluation?
- Scoring Rules: Incentives and Evaluation
- Some Issues and Recent Developments:
  - Extended Families of Scoring Rules
  - Relative Evaluation
  - Taking Order into Account
  - Probability Assessment vs. Quantile Assessment
  - Calibration, Sharpness, etc. – What Matters?
  - Competition among Forecasters
- Concluding Thoughts

Probability Forecasting
- Formal representation of uncertainty
- Various sources of forecasts – all with some subjective element:
  - Subjective forecasts from "experts"
  - Probabilities from Bayesian analyses
  - Probabilities from other modeling

Why Probability Evaluation?
- Recognition of the importance of uncertainty
- Some increase in the use of probability forecasts
- Concern about levels of risk
- Importance of ex post evaluation:
  - Connection of forecasts with reality
  - Concern about the impact of "poor" forecasts
  - Improvement of future probability forecasts
  - Identification of better probability forecasters
  - Keeping forecasters honest (accountability!)

Scoring Rules
- As incentives:
  - Strictly proper scoring rules: maximizing expected score (ES) → honest reporting, incentive for "better" forecasts
- As evaluation measures:
  - Overall measures of accuracy
  - Decompositions for specific characteristics
- Commonly-used rules
- Some background, historical notes

Commonly-Used Rules
For a reported distribution p = (p_1, …, p_n) and observed event i, the standard strictly proper rules are:
- Quadratic: S(p, i) = 2p_i − Σ_j p_j²
- Spherical: S(p, i) = p_i / (Σ_j p_j²)^{1/2}
- Logarithmic: S(p, i) = ln p_i
[Slide also plots the score and expected score as functions of p for n = 2.]
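
As a quick illustration (not from the slides), here is a minimal Python sketch of the three rules above; the function names are ours, not the talk's.

```python
import numpy as np

# Minimal sketch of the three commonly-used strictly proper scoring rules.

def quadratic_score(p, i):
    """Quadratic (Brier-type) score: 2*p[i] - sum_j p[j]**2."""
    p = np.asarray(p, dtype=float)
    return 2.0 * p[i] - np.sum(p**2)

def spherical_score(p, i):
    """Spherical score: p[i] / ||p||_2."""
    p = np.asarray(p, dtype=float)
    return p[i] / np.sqrt(np.sum(p**2))

def log_score(p, i):
    """Logarithmic score: ln(p[i])."""
    p = np.asarray(p, dtype=float)
    return np.log(p[i])

p = [0.2, 0.7, 0.1]  # a forecast over three states
for rule in (quadratic_score, spherical_score, log_score):
    print(rule.__name__, [round(rule(p, i), 3) for i in range(3)])
```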

Some Issues & Recent Developments
- My own biased view!
- Based in part on recent work with Casey Lichtendahl, Victor Richmond Jose, Bob Nau, and others
- Influenced greatly by many years of work with Allan Murphy

Extended Families of Scoring Rules
- Power and pseudospherical families, each with a single parameter β (−∞ < β < ∞)
- The power family includes the quadratic rule (β = 2)
- The pseudospherical family includes the spherical rule (β = 2)
- Both families include the logarithmic rule (β → 1)
- Provide rich families of strictly proper rules

Power and Pseudospherical Families
In one standard parameterization (unique up to positive affine transformations):
- Power: S_β(p, i) = (β/(β−1)) p_i^{β−1} − Σ_j p_j^β
- Pseudospherical: S_β(p, i) = p_i^{β−1} / (Σ_j p_j^β)^{(β−1)/β}
Setting β = 2 recovers the quadratic and spherical rules, and the (suitably normalized) limit β → 1 of either family is the logarithmic rule.
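
A small Python sketch of these families under the parameterization above; the talk's normalization may differ by a positive affine transformation, which does not affect strict propriety.

```python
import numpy as np

def power_score(p, i, beta):
    p = np.asarray(p, dtype=float)
    if np.isclose(beta, 1.0):   # beta -> 1 limit (up to affine normalization)
        return np.log(p[i])
    return (beta / (beta - 1.0)) * p[i]**(beta - 1.0) - np.sum(p**beta)

def pseudospherical_score(p, i, beta):
    p = np.asarray(p, dtype=float)
    if np.isclose(beta, 1.0):   # beta -> 1 limit (up to affine normalization)
        return np.log(p[i])
    return p[i]**(beta - 1.0) / np.sum(p**beta)**((beta - 1.0) / beta)

p = [0.2, 0.7, 0.1]
# beta = 2 recovers the quadratic and spherical rules:
print(power_score(p, 1, 2.0))            # 2*0.7 - 0.54 = 0.86
print(pseudospherical_score(p, 1, 2.0))  # 0.7 / sqrt(0.54) ≈ 0.9526
```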

Relative Evaluation
- Standard scoring rules are symmetric: the maximum ES is smallest when the distribution is uniform, so forecasts are in effect rewarded for improvements over a uniform distribution
  - Is the uniform distribution a suitable baseline?
- Evaluation relative to a non-uniform baseline:
  - Often makes more sense
  - Facilitates comparability of scores from different situations – improvements over their own baselines
  - Motivated the "skill score" (which is not strictly proper); see the formula below
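
For reference, the conventional skill score rescales a mean score by baseline and perfect scores; this is the textbook meteorological definition, given here as background rather than taken from the slides:

\[
\mathrm{SS} \;=\; \frac{\bar{S} - \bar{S}_{\mathrm{baseline}}}{\bar{S}_{\mathrm{perfect}} - \bar{S}_{\mathrm{baseline}}}
\]

Roughly speaking, because the rescaling depends on quantities estimated from the realized outcomes, truthful reporting need not maximize its expectation, which is why it is not strictly proper.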

Scoring Rules with Baseline Distributions
- Strictly proper asymmetric rules
- Generalized power & pseudospherical families:
  - Strictly proper
  - Score = 0 when forecast = baseline
  - ES > 0 when forecast ≠ baseline

Power and Pseudospherical Families with Baselines
With a baseline distribution q, one standard parameterization is:
- Power: S_β(p; q, i) = [β (p_i/q_i)^{β−1} − (β−1) Σ_j q_j (p_j/q_j)^β − 1] / (β−1)
- Pseudospherical: S_β(p; q, i) = [(p_i/q_i)^{β−1} / (Σ_j q_j (p_j/q_j)^β)^{(β−1)/β} − 1] / (β−1)
Both are zero when p = q, reduce to the relative logarithmic rule ln(p_i/q_i) as β → 1, and with uniform q are positive affine transformations of the standard families.
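
A Python sketch of these baseline-weighted families, checking the two properties from the previous slide (score = 0 at the baseline, ES > 0 away from it); the talk's exact normalization may differ by a positive affine transformation.

```python
import numpy as np

def weighted_power_score(p, q, i, beta):
    p, q = np.asarray(p, float), np.asarray(q, float)
    if np.isclose(beta, 1.0):
        return np.log(p[i] / q[i])   # relative logarithmic rule
    r = p / q
    return (beta * r[i]**(beta - 1.0)
            - (beta - 1.0) * np.sum(q * r**beta) - 1.0) / (beta - 1.0)

def weighted_pseudospherical_score(p, q, i, beta):
    p, q = np.asarray(p, float), np.asarray(q, float)
    if np.isclose(beta, 1.0):
        return np.log(p[i] / q[i])   # relative logarithmic rule
    r = p / q
    norm = np.sum(q * r**beta)**((beta - 1.0) / beta)
    return (r[i]**(beta - 1.0) / norm - 1.0) / (beta - 1.0)

q = np.array([0.2, 0.7, 0.1])        # baseline distribution
p = np.array([0.3, 0.5, 0.2])        # forecast
# Score is identically 0 when the forecast equals the baseline...
print([round(weighted_power_score(q, q, i, 2.0), 6) for i in range(3)])
# ...and ES under the forecast is positive when p != q:
print(np.dot(p, [weighted_power_score(p, q, i, 2.0) for i in range(3)]))
```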

[Figure: expected-score plots (β = 2) over three states for the power score with baseline q = (0.2, 0.7, 0.1), the quadratic score, the spherical score, and the pseudospherical score with baseline q = (0.2, 0.7, 0.1).]

Taking Order Into Account
- What if the events of interest are ordered?
- Standard scoring rules ignore the order: for events other than the one that occurs, the probabilities matter but not "where they are"
- Do we want the scoring rule to reflect order? Then we want it to be "sensitive to distance": having more probability on events "close" to the event that happens than on events "far" from it results in a higher score

Scoring Rules that Reflect Order
- First such rule: the ranked probability score
  - Based on the quadratic scoring rule
  - Reflects order by using cumulative probabilities
- The same approach can be used with any basic scoring rule
- Can generate power and pseudospherical families that include baseline distributions and are sensitive to order

Sensitive to Distance Rules
With cumulative forecast probabilities P_k = p_1 + … + p_k and cumulative outcome indicators O_k (O_k = 1 if the observed event is among the first k categories, 0 otherwise), the ranked probability score in penalty form is
RPS = Σ_{k=1}^{n−1} (P_k − O_k)²
– the quadratic rule applied to each cumulative (binary) event. Substituting another base rule for the quadratic yields further sensitive-to-distance rules.
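
A minimal Python sketch of the ranked probability score in penalty form (lower is better), illustrating sensitivity to distance; event indexing is 0-based.

```python
import numpy as np

def ranked_probability_score(p, i):
    """Sum over the first n-1 ordered categories of the squared
    difference between cumulative forecast and cumulative outcome."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    cum_p = np.cumsum(p)[:-1]                    # P_1, ..., P_{n-1}
    cum_o = (np.arange(1, n) > i).astype(float)  # O_k = 1 once k passes i
    return np.sum((cum_p - cum_o)**2)

# Same probability on the realized event (state 0), but forecast `a`
# keeps its remaining mass closer to it, so `a` scores better:
a = [0.5, 0.4, 0.05, 0.05]
b = [0.5, 0.05, 0.05, 0.4]
print(ranked_probability_score(a, 0))  # 0.2625
print(ranked_probability_score(b, 0))  # 0.6125
```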

[Figure: expected-score plots (β = 2) over three ordered states for the quadratic score, the ranked probability score, the power score with baseline q = (1/3, 1/3, 1/3), and the power score with baseline q = (0.7, 0.2, 0.1).]

Probabilities vs. Quantiles
- Often we have ordered events or values of a variable, as noted earlier
- We might have quantile forecasts, or a mix of probability forecasts and quantile forecasts
- Why not just use the previously discussed scoring rules to evaluate quantiles?
  - They provide improper incentives for quantiles
  - They can be gamed to achieve perfect calibration artificially

Scoring Rules for Quantiles
- Scores are based on the quantiles and on the actual value of the variable
- Rules based on linear loss functions (see the sketch below)
- Can be used for multiple quantiles
- Special case: interval forecasts (2 quantiles)
- Strictly proper for quantile assessment
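
A minimal sketch of a linear-loss ("pinball") quantile score, the standard strictly proper rule for quantile assessment; the talk's exact form may differ by an affine transformation, and the interval helper below is our illustration of the two-quantile special case.

```python
def quantile_score(v, x, alpha):
    """Linear (pinball) loss for a reported alpha-quantile v and realized
    value x; lower is better, and expected loss is minimized by
    reporting the true alpha-quantile."""
    return (alpha - float(x < v)) * (x - v)

def interval_score(lo, hi, x, alpha=0.05):
    """Score a central (1 - alpha) interval as the sum of the alpha/2
    and 1 - alpha/2 quantile losses."""
    return quantile_score(lo, x, alpha / 2) + quantile_score(hi, x, 1 - alpha / 2)

# Example: evaluating a reported median (alpha = 0.5)
print(quantile_score(10.0, 12.0, 0.5))  # 1.0 (value above the median)
print(quantile_score(10.0, 8.0, 0.5))   # 1.0 (value below the median)
```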

Calibration & Sharpness
- How much should we care about calibration?
  - Evaluation of probability forecasts often focuses almost exclusively on calibration
  - Forecasters can game calibration measures
  - A good Bayesian should try to recalibrate probabilities (viewing them as new information)
  - A key issue: will the forecasts be taken at face value?

Calibration & Sharpness, cont.
- What about sharpness?
  - Sharpness is a better indicator of how informative the forecasts are (or can be)
  - Sharpness can often be improved with effort (more data gathering, improved models)
  - Limited exploration suggests that sharpness has a greater impact on overall scores than calibration
- Ideal: maximize sharpness subject to good calibration
- Pay more attention to sharpness measures
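
One concrete way to separate these ingredients, for binary forecasts grouped into probability bins, is Murphy's decomposition of the Brier score into reliability (calibration penalty), resolution (related to sharpness), and uncertainty; this is standard background rather than a formula from the slides, and the binning scheme here is our choice.

```python
import numpy as np

def murphy_decomposition(probs, outcomes, n_bins=10):
    """Brier score ≈ reliability - resolution + uncertainty for binary
    outcomes, grouping forecasts into equal-width probability bins
    (exact when forecasts within each bin are identical)."""
    probs, outcomes = np.asarray(probs, float), np.asarray(outcomes, float)
    base_rate = outcomes.mean()
    uncertainty = base_rate * (1.0 - base_rate)
    reliability = resolution = 0.0
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    for b in range(n_bins):
        mask = bins == b
        if not mask.any():
            continue
        w = mask.mean()                        # fraction of forecasts in bin
        p_bar = probs[mask].mean()             # mean forecast in bin
        o_bar = outcomes[mask].mean()          # observed frequency in bin
        reliability += w * (p_bar - o_bar)**2  # calibration penalty
        resolution += w * (o_bar - base_rate)**2
    return reliability, resolution, uncertainty

rng = np.random.default_rng(0)
p = rng.uniform(size=10_000)
y = rng.uniform(size=10_000) < p               # well-calibrated forecasts
rel, res, unc = murphy_decomposition(p, y)
print(rel, res, unc, rel - res + unc)          # last value ≈ Brier score
```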

Competition Among Forecasters
- Forecasters can have goals other than maximizing ES
- Utility could be nonlinear in the score:
  - Risk averse
  - Step function with the step at a "target score"
- The goal could be to outscore other forecasters:
  - Brings in game-theoretic considerations
  - Results in forecasts that are more extreme than the forecaster's probabilities
- The goal could be to be similar to other forecasters, or not to be near the bottom of the group in scores:
  - Leads to so-called "herding" behavior
- Tricky to model:
  - We don't know exactly what a forecaster's goals are
  - Can't recover the forecaster's probabilities from those that are reported

Concluding Thoughts/Recommendations
- Encourage greater use of probabilities
- Encourage consistent evaluation of probabilities
- Rich families of scoring rules:
  - Need better understanding of the characteristics of different rules
  - Choice of a baseline distribution and sensitivity to distance are more important than the value of β
- Tailor rules to other probability-related forecasts, such as quantiles, for proper incentives
- Pay more attention to sharpness, less to calibration
- Be aware of possible other goals and their impact on reported probabilities
- Other issues in probability forecasting & evaluation