1
Scoring Rules, Generalized Entropy and Utility Maximization
Victor Richmond R. Jose, Robert F. Nau, Robert L. Winkler
The Fuqua School of Business, Duke University, Durham, NC, USA
FUR XII Presentation, June 23, 2006
2
The general problem: how to measure information value
Suppose there is uncertainty about which of n states of the world will occur, either in a single occurrence or in repeated trials. An initial description of the uncertainty is represented by a “baseline” probability distribution q, while a forecaster or decision maker possesses a “true” distribution p based on additional information (e.g., experimental data or expert judgment). What is an appropriate measure of the value of the information that changes q to p?
3
Three strands of information-value literature
1. Decision analysis: information value = increase in expected utility obtained by using p rather than q to select among available acts.
2. Information theory: information value = decrease in expected number of bits needed to communicate the state which has occurred (Kullback-Leibler divergence between p and q).
3. Scoring rules: information value = expected score obtained when the distribution p is elicited via a proper scoring rule whose expectation is minimized at the baseline distribution q.
4
Historical perspective
All three strands of literature date back to pioneering work on subjective probability, expected utility, and information theory in the 1940s and 1950s (Shannon, Savage, Brier, Good...).
In recent years, scoring rules have received new attention in experimental economics and neuroeconomics, while generalized divergence measures have found application in machine learning, robust Bayesian statistics, and mathematical psychology.
A number of recent papers have explored other aspects of their interconnections (e.g., Grünwald-Dawid 2004, Ann. Stat.; Gneiting-Raftery 2005, working paper).
5
This paper presents a unification of the three approaches
We introduce generalized (weighted) versions of the power and pseudospherical scoring rules with power parameter β and baseline distribution q.
These scoring rule families are shown to correspond to two generalized divergence measures which converge to the KL divergence at β = 1. The cases β = 0, β = ½, and β = 2 are also of special interest.
They also correspond exactly to canonical decision analysis problems involving a risk averse decision maker whose risk tolerance coefficient is equal to β.
6
Part 1: scoring rules
Notation:
$p = (p_1, \ldots, p_n)$ is the forecaster’s true distribution
$r = (r_1, \ldots, r_n)$ is her reported distribution
$q = (q_1, \ldots, q_n)$ is a baseline (reference) distribution
$e_i$ = the i-th unit vector (point mass on state i)
A scoring rule is a function S with arguments r and p, linear in p and with parameters q, such that
$S(r, p; q)$ is the expected score for reporting r when the true distribution is p
$S(r, e_i; q)$ is the actual score yielded by r in state i
$S(p, p; q)$ is minimized at p = q
7
Proper scoring rules
S is [strictly] proper if it [strictly] encourages honesty in the sense that $S(p, p; q) \ge [>]\ S(r, p; q)$ for all $r \ne p$.
Classic proper scoring rules (for which q is uniform): the quadratic (Brier) score, the spherical score, and the logarithmic score.
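The three classic rules and the properness inequality can be sketched numerically. This is a minimal illustration, assuming the common normalizations $2r_i - \sum_j r_j^2$ (quadratic), $r_i / \lVert r \rVert_2$ (spherical), and $\ln r_i$ (logarithmic); the slide's own normalization may differ by a positive affine transformation, which does not affect properness. The function names are hypothetical.

```python
import math

def quadratic_score(r, i):
    # Assumed normalization: 2*r_i - sum_j r_j^2 (Brier score up to an affine map).
    return 2.0 * r[i] - sum(x * x for x in r)

def spherical_score(r, i):
    # r_i divided by the Euclidean norm of the reported distribution.
    return r[i] / math.sqrt(sum(x * x for x in r))

def log_score(r, i):
    # Natural log of the probability reported for the realized state.
    return math.log(r[i])

def expected_score(rule, r, p):
    # S(r, p): expectation under the true distribution p of the score of report r.
    return sum(p_i * rule(r, i) for i, p_i in enumerate(p))

if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]          # true distribution (illustrative numbers)
    r = [0.6, 0.3, 0.1]          # a dishonest report
    for rule in (quadratic_score, spherical_score, log_score):
        print(rule.__name__, expected_score(rule, p, p), expected_score(rule, r, p))
```

For each rule, the honest report p earns a strictly higher expected score than the alternative report r, which is the strict-properness inequality stated on the slide.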
8
Generalized families of scoring rules
The three classic scoring rules can be generalized by introducing a non-uniform baseline distribution q and by substituting an arbitrary real number β for the power of 2 in the quadratic and spherical rules. This leads to the weighted power score and weighted pseudospherical score.
The weighted power and pseudospherical scores depend on the state i as affine functions of $(r_i/q_i)^{\beta-1}$. They both converge to the weighted logarithmic score at β = 1.
9
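Weighted scoring rules
The two generalized families are the weighted power score and the weighted pseudospherical score, each indexed by the power parameter β and the baseline q.
At β = 1 both converge to the weighted log score, $\ln(r_i/q_i)$.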
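One way to write the two weighted rules is given below. These forms are an assumption chosen to be consistent with the properties stated on the slides (affine in $(r_i/q_i)^{\beta-1}$, convergence to $\ln(r_i/q_i)$ at β = 1); the exact normalization used in the presentation is not recoverable from the text. Here $E_r[(r/q)^{\beta-1}] = \sum_j r_j (r_j/q_j)^{\beta-1}$.

```latex
\[
S^{P}_{\beta}(r, e_i; q)
  = \frac{(r_i/q_i)^{\beta-1} - 1}{\beta - 1}
  - \frac{E_r\!\left[(r/q)^{\beta-1}\right] - 1}{\beta}
  \qquad \text{(weighted power score)}
\]
\[
S^{S}_{\beta}(r, e_i; q)
  = \frac{1}{\beta - 1}
    \left[
      \left(
        \frac{r_i/q_i}{\left(E_r\!\left[(r/q)^{\beta-1}\right]\right)^{1/\beta}}
      \right)^{\beta-1} - 1
    \right]
  \qquad \text{(weighted pseudospherical score)}
\]
\[
\lim_{\beta \to 1} S^{P}_{\beta}(r, e_i; q)
  = \lim_{\beta \to 1} S^{S}_{\beta}(r, e_i; q)
  = \ln\frac{r_i}{q_i}
\]
```

The limit follows because $E_r[(r/q)^{\beta-1}] \to 1$ as β → 1, so the normalizing terms vanish and $(x^{\beta-1} - 1)/(\beta - 1) \to \ln x$.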
12
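Corresponding expected score functions
Each rule has an expected score function $S(p, p; q)$: the weighted power expected score and the weighted pseudospherical expected score.
At β = 1 both converge to the weighted log expected score, $\sum_i p_i \ln(p_i/q_i)$.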
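A worked form of the two expected score functions is sketched below. It assumes the weighted power score $\frac{(r_i/q_i)^{\beta-1}-1}{\beta-1} - \frac{E_r[(r/q)^{\beta-1}]-1}{\beta}$ and the weighted pseudospherical score $\frac{1}{\beta-1}\big[\big((r_i/q_i)/E_r[(r/q)^{\beta-1}]^{1/\beta}\big)^{\beta-1}-1\big]$; these normalizations are assumptions, and $M$ is shorthand introduced here for $E_p[(p/q)^{\beta-1}] = \sum_i p_i (p_i/q_i)^{\beta-1}$.

```latex
\[
S^{P}_{\beta}(p, p; q)
  = \frac{M - 1}{\beta - 1} - \frac{M - 1}{\beta}
  = \frac{M - 1}{\beta(\beta - 1)}
\]
\[
S^{S}_{\beta}(p, p; q)
  = \frac{1}{\beta - 1}\left[\frac{M}{M^{(\beta-1)/\beta}} - 1\right]
  = \frac{M^{1/\beta} - 1}{\beta - 1}
\]
\[
\lim_{\beta \to 1} S^{P}_{\beta}(p, p; q)
  = \lim_{\beta \to 1} S^{S}_{\beta}(p, p; q)
  = \sum_i p_i \ln\frac{p_i}{q_i}
\]
```

Both expressions vanish at p = q, where $M = 1$, matching the requirement that $S(p, p; q)$ is minimized at the baseline.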
14
Part 2. Information-theoretic measures of divergence & distance
The measures used are the Kullback-Leibler divergence between p and q, the affinity between p and q, the squared Hellinger distance, and the chi-square divergence.
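The four measures can be computed directly for discrete distributions. The sketch below assumes the usual definitions: KL as $\sum_i p_i \ln(p_i/q_i)$, affinity as $\sum_i \sqrt{p_i q_i}$, squared Hellinger distance as $\sum_i (\sqrt{p_i} - \sqrt{q_i})^2$ (some texts include a factor of ½), and chi-square as $\sum_i (p_i - q_i)^2 / q_i$. The function names are hypothetical.

```python
import math

def kl_divergence(p, q):
    # Sum of p_i * ln(p_i / q_i); terms with p_i == 0 contribute zero.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def affinity(p, q):
    # Sum of sqrt(p_i * q_i); equals 1 exactly when p == q.
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

def squared_hellinger(p, q):
    # Assumed convention without the factor 1/2, so it equals 2 * (1 - affinity).
    return sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))

def chi_square(p, q):
    # Pearson chi-square divergence of p from q; equals sum(p_i^2 / q_i) - 1.
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))

if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]      # illustrative "true" distribution
    q = [1/3, 1/3, 1/3]      # illustrative uniform baseline
    print(kl_divergence(p, q), affinity(p, q),
          squared_hellinger(p, q), chi_square(p, q))
```

Under these conventions the squared Hellinger distance is a simple function of the affinity, which is why the two appear together on the slide.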
15
Parametric families of generalized divergence
Two families of order β are considered: the Havrda-Charvát (1967) divergence and the Arimoto (1971) divergence, each also studied by others.
Both converge to the KL divergence at β = 1.
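In one common normalization, assumed here and not necessarily the one shown on the slide, the two families can be written as follows.

```latex
\[
D^{HC}_{\beta}(p \,\|\, q)
  = \frac{\sum_i p_i \,(p_i/q_i)^{\beta-1} - 1}{\beta(\beta - 1)}
  \qquad \text{(Havrda-Charv\'at)}
\]
\[
D^{A}_{\beta}(p \,\|\, q)
  = \frac{\left(\sum_i p_i \,(p_i/q_i)^{\beta-1}\right)^{1/\beta} - 1}{\beta - 1}
  \qquad \text{(Arimoto)}
\]
\[
\lim_{\beta \to 1} D^{HC}_{\beta}(p \,\|\, q)
  = \lim_{\beta \to 1} D^{A}_{\beta}(p \,\|\, q)
  = \sum_i p_i \ln\frac{p_i}{q_i}
\]
```

With this normalization, β = ½ gives $D^{HC}_{1/2} = 4\,(1 - \sum_i \sqrt{p_i q_i})$, a multiple of the squared Hellinger distance, and β = 2 gives $D^{HC}_{2} = \tfrac{1}{2}\sum_i (p_i - q_i)^2/q_i$, half the chi-square divergence.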
16
First main result: scoring-rule/entropy link
Theorem 1: The Havrda-Charvát and Arimoto divergences of order β are identical to the weighted power and pseudospherical expected-score functions of order β, respectively, for all real β.
The special cases β = 0, β = ½, β = 1, and β = 2 are of particular interest.
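Theorem 1 can be checked numerically. The sketch below assumes particular normalizations of the weighted scores and of the two divergences (the slide formulas are not available, so these forms are assumptions), and it handles only β outside {0, 1}, where the closed forms apply directly rather than as limits. All function names are hypothetical.

```python
import math

def moment(p, r, q, beta):
    # E_p[(r/q)^(beta-1)]: expectation under p of the powered likelihood ratio.
    return sum(pi * (ri / qi) ** (beta - 1) for pi, ri, qi in zip(p, r, q))

def power_score(r, i, q, beta):
    # Assumed form of the weighted power score in state i.
    m = moment(r, r, q, beta)
    return ((r[i] / q[i]) ** (beta - 1) - 1) / (beta - 1) - (m - 1) / beta

def pseudospherical_score(r, i, q, beta):
    # Assumed form of the weighted pseudospherical score in state i.
    m = moment(r, r, q, beta)
    return (((r[i] / q[i]) / m ** (1 / beta)) ** (beta - 1) - 1) / (beta - 1)

def expected(score, r, p, q, beta):
    # S(r, p; q): expected score of report r under true distribution p.
    return sum(pi * score(r, i, q, beta) for i, pi in enumerate(p))

def havrda_charvat(p, q, beta):
    # Assumed normalization of the Havrda-Charvat divergence of order beta.
    return (moment(p, p, q, beta) - 1) / (beta * (beta - 1))

def arimoto(p, q, beta):
    # Assumed normalization of the Arimoto divergence of order beta.
    return (moment(p, p, q, beta) ** (1 / beta) - 1) / (beta - 1)

if __name__ == "__main__":
    p, q = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]   # illustrative distributions
    for beta in (-1.0, 0.5, 2.0, 3.0):
        print(beta,
              expected(power_score, p, p, q, beta), havrda_charvat(p, q, beta),
              expected(pseudospherical_score, p, p, q, beta), arimoto(p, q, beta))
```

Each printed row should show the honest expected score agreeing with the matching divergence. Evaluating near β = 1 (for example β = 1.0001) gives values close to the KL divergence, in line with the stated limit.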
17
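Part 3. Decision-analytic information value with exponential/log/power utility, i.e., linear risk tolerance
The standard LRT (HARA) utility function $g_\beta(y)$ has risk tolerance coefficient β.
Special cases include exponential utility (β = 0), logarithmic utility (β = 1), and reciprocal utility (β = ½).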
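The functional form of $g_\beta$ can be derived from the stated properties. The derivation below assumes the normalization $g_\beta(0) = 0$, $g_\beta'(0) = 1$ and a risk tolerance function $\tau(y) = -g_\beta'(y)/g_\beta''(y) = 1 + \beta y$; the symbol $\tau$ is introduced here for illustration.

```latex
\[
-\frac{g_\beta''(y)}{g_\beta'(y)} = \frac{1}{1 + \beta y}
\;\Longrightarrow\;
g_\beta'(y) = (1 + \beta y)^{-1/\beta}
\;\Longrightarrow\;
g_\beta(y) = \frac{(1 + \beta y)^{(\beta - 1)/\beta} - 1}{\beta - 1}
\]
\[
\begin{aligned}
\beta \to 0 &: \quad g_0(y) = 1 - e^{-y} && \text{(exponential)} \\
\beta \to 1 &: \quad g_1(y) = \ln(1 + y) && \text{(logarithmic)} \\
\beta = \tfrac{1}{2} &: \quad g_{1/2}(y) = 2\left(1 - \frac{1}{1 + y/2}\right) && \text{(reciprocal)} \\
\beta = 2 &: \quad g_2(y) = \sqrt{1 + 2y} - 1 && \text{(square root)} \\
\beta = -1 &: \quad g_{-1}(y) = \tfrac{1}{2}\left(1 - (1 - y)^2\right) && \text{(quadratic)}
\end{aligned}
\]
```

The exponent of $g_\beta$ is $(\beta-1)/\beta$ and that of $g_{1-\beta}$ is $\beta/(\beta-1)$, so the two are reciprocal.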
19
Properties of standard LRT utility functions
The graphs of the utility functions $\{g_\beta\}$ are mutually tangent at the origin for all β.
The risk tolerance function (reciprocal of the Pratt-Arrow measure) is linear with slope β and intercept 1, i.e., $1 + \beta y$.
$g_\beta(y)$ and $g_{1-\beta}(y)$ are power utility functions whose exponents are reciprocal to each other.
20
Canonical decision models for determining information value of p over q
Suppose a risk averse decision maker with utility function $g_\beta(y)$ and probability distribution p bets so as to maximize her own expected utility vs. a risk-neutral, non-strategic opponent with distribution q.
Equivalently, a risk neutral decision maker with probability distribution p bets so as to maximize her own expected utility vs. a risk-averse, non-strategic opponent with utility function $g_{1-\beta}(y)$ and distribution q.
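One natural way to formalize the two betting problems is sketched below. This formulation is an assumption based on the verbal description, not a transcription of the slide: $y_i$ denotes the decision maker's payoff in state i, the opponent receives $-y_i$, and a non-strategic opponent is taken to accept any bet that does not lower her own expected utility under q.

```latex
\[
\max_{y \in \mathbb{R}^n} \; \sum_i p_i \, g_\beta(y_i)
\quad \text{subject to} \quad \sum_i q_i \, y_i \le 0
\]
\[
\max_{y \in \mathbb{R}^n} \; \sum_i p_i \, y_i
\quad \text{subject to} \quad \sum_i q_i \, g_{1-\beta}(-y_i) \ge 0
\]
```

In the first problem the risk-neutral opponent requires a non-negative expected gain; in the second the risk-averse opponent requires non-negative expected utility, with $g_{1-\beta}(0) = 0$ as the no-bet benchmark.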
21
Second main result: decision analysis/scoring rule/entropy link
Theorem 2(a): The solution of Models Y and Z yields the same optimal utility payoffs as the weighted pseudospherical scoring rule with parameters q and β, and its expected utility is the Arimoto divergence of order β between p and q.
Note that risk tolerance is non-decreasing in both models Y and Z (only) when β is between 0 and 1. The interesting special case β = ½ corresponds to reciprocal utility in both models, and the special cases β = 0 and β = 1 correspond to exponential utility in one model and logarithmic utility in the other.
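Theorem 2(a) can be illustrated numerically for the risk-averse decision maker's problem. The sketch assumes the formulation max $\sum_i p_i g_\beta(y_i)$ subject to $\sum_i q_i y_i = 0$, the utility $g_\beta(y) = ((1+\beta y)^{(\beta-1)/\beta} - 1)/(\beta-1)$, and particular normalizations of the pseudospherical score and Arimoto divergence; all of these are assumptions, and the function names are hypothetical. The optimal bet comes from the first-order condition $p_i (1+\beta y_i)^{-1/\beta} = \lambda q_i$.

```python
def g(y, beta):
    # Assumed LRT utility with risk tolerance 1 + beta*y (beta not in {0, 1}).
    return ((1 + beta * y) ** ((beta - 1) / beta) - 1) / (beta - 1)

def moment(p, q, beta):
    # E_p[(p/q)^(beta-1)], which also equals sum_i q_i * (p_i/q_i)^beta.
    return sum(pi * (pi / qi) ** (beta - 1) for pi, qi in zip(p, q))

def optimal_bet(p, q, beta):
    # Solves the first-order condition with the budget constraint E_q[y] = 0:
    # 1 + beta*y_i = (p_i/q_i)^beta / moment.
    m = moment(p, q, beta)
    return [((pi / qi) ** beta / m - 1) / beta for pi, qi in zip(p, q)]

def pseudospherical_score(p, i, q, beta):
    # Assumed weighted pseudospherical score of the honest report p in state i.
    m = moment(p, q, beta)
    return (((p[i] / q[i]) / m ** (1 / beta)) ** (beta - 1) - 1) / (beta - 1)

def arimoto(p, q, beta):
    # Assumed normalization of the Arimoto divergence of order beta.
    return (moment(p, q, beta) ** (1 / beta) - 1) / (beta - 1)

def expected_utility(y, p, beta):
    # Decision maker's expected utility of bet y under her distribution p.
    return sum(pi * g(yi, beta) for pi, yi in zip(p, y))

if __name__ == "__main__":
    p, q, beta = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5], 0.5   # illustrative inputs
    y = optimal_bet(p, q, beta)
    print([g(yi, beta) for yi in y])
    print([pseudospherical_score(p, i, q, beta) for i in range(len(p))])
    print(expected_utility(y, p, beta), arimoto(p, q, beta))
```

The state-by-state utilities of the optimal bet should coincide with the pseudospherical scores, and the optimal expected utility with the Arimoto divergence.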
22
Alternative decision models that maximize the sum of expected utilities
Suppose a risk averse decision maker with utility function $g_\beta(y)$ and distribution p bets against a risk-neutral, non-strategic opponent with distribution q so as to maximize the sum of their expected utilities.
Equivalently, a risk neutral decision maker with distribution p bets against a risk-averse, non-strategic opponent with utility function $g_{1-\beta}(y)$ and distribution q so as to maximize the sum of their expected utilities.
23
Second main result, continued
Theorem 2(b): The solution of the alternative models, which maximize the sum of expected utilities, yields the same optimal utility payoffs as the weighted power scoring rule with parameters q and β, and its expected utility is the Havrda-Charvát divergence of order β between p and q.
Note that the alternative models are more “cooperative” in spirit than Models Y and Z, but also somewhat less natural. The sum of two persons’ expected utilities is maximized, each computed according to a different probability distribution for the same states.
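The link to the Havrda-Charvát divergence can be traced in a few lines for the first alternative model. The derivation assumes the formulation below (decision maker's payoff $y_i$, opponent's payoff $-y_i$), the utility $g_\beta(y) = ((1+\beta y)^{(\beta-1)/\beta} - 1)/(\beta-1)$ with $g_\beta'(y) = (1+\beta y)^{-1/\beta}$, and the shorthand $M = \sum_i p_i (p_i/q_i)^{\beta-1} = \sum_i q_i (p_i/q_i)^{\beta}$; none of these is taken verbatim from the slide.

```latex
\[
\max_{y} \; \sum_i p_i \, g_\beta(y_i) - \sum_i q_i \, y_i
\]
\[
p_i \,(1 + \beta y_i)^{-1/\beta} = q_i
\;\Longrightarrow\;
1 + \beta y_i = \left(\frac{p_i}{q_i}\right)^{\beta},
\qquad
g_\beta(y_i) = \frac{(p_i/q_i)^{\beta-1} - 1}{\beta - 1}
\]
\[
\sum_i p_i \, g_\beta(y_i) - \sum_i q_i \, y_i
  = \frac{M - 1}{\beta - 1} - \frac{M - 1}{\beta}
  = \frac{M - 1}{\beta(\beta - 1)}
\]
```

The last expression is the Havrda-Charvát divergence of order β in the normalization assumed here, and the two terms mirror the two terms of the weighted power score.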
24
Observations and conclusions
1. The pseudospherical rule & Arimoto divergence have a more compelling decision-theoretic basis than the power rule & Havrda-Charvát divergence, insofar as they arise from a more natural utility-maximization problem.
2. The most appropriate values of β for either rule appear to be those in the closed unit interval, rather than the more commonly used β = 2.
3. The special case β = ½ is of interest because of its symmetry properties and connection with the Hellinger distance measure (reciprocal utility!).
4. A well-chosen and not-necessarily-uniform baseline distribution q is the most important parameter of the scoring rule in any case.