
1 Validation of uncertain predictions against uncertain observations
Scott Ferson, William Oberkampf and Lev Ginzburg
20 February 2008, REC 2008, Savannah, Georgia

2 V & V Verification (checking the math) –Code testing –Interval analysis, probability bounds analysis –Units/dimension checking Validation (checking against data)

3 Goals
Objectively measure the conformance of predictions with empirical data
Use this measure to characterize the reliability of other predictions

4 Initial setting
The model is fixed, at least for the time being
– No changing it on the fly during validation
A prediction is a probability distribution
– Expressing stochastic uncertainty
Observations are precise (scalar) numbers
– Measurement uncertainty is negligible (relaxed later)

5 Validation metric
A measure of the mismatch between the observed data and the model's predictions
– Low value means a good match
– High value means they disagree
Distance between prediction and data

6 Desirable properties of a metric
Expressed in physical units
Generalizes deterministic comparisons
Reflects the full distribution
Not too sensitive to long tails
A mathematical metric
Unbounded (you can be really off)

7 How the data come
[Scatter plot of observations: Temperature [degrees Celsius], 200–400, versus Time [seconds], 600–1000]

8 How we look at them
[Empirical cumulative distribution: Probability (0–1) versus Temperature (200–450)]

9 One suggestion for a metric
Area, or average horizontal distance, between the empirical distribution S_n and the predicted distribution
[Figure: predicted CDF and empirical CDF S_n, Probability versus Temperature, with the area between them]

10 Area metric
Minkowski L1 metric between distributions:
d(F, S_n) = ∫ |F(x) − S_n(x)| dx = min E|X − Y|
The univariate version of the Wasserstein distance between the prediction F and the data distribution S_n, where the minimum is over all possible stochastic dependencies between X ~ F and Y ~ S_n
The smallest mean absolute difference of deviates
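A minimal numerical sketch of this area metric, assuming a scipy-style frozen distribution for the prediction; the example prediction N(300, 25) and the five observations are illustrative, not from the slides:

```python
import numpy as np
from scipy import stats

def area_metric(pred_dist, data, grid_points=10_000):
    """Approximate the integral of |F(x) - S_n(x)| dx on a grid spanning
    both the prediction and the observations."""
    data = np.sort(np.asarray(data, dtype=float))
    n = data.size
    lo = min(pred_dist.ppf(1e-6), data[0])
    hi = max(pred_dist.ppf(1 - 1e-6), data[-1])
    x = np.linspace(lo, hi, grid_points)
    F = pred_dist.cdf(x)                             # predicted CDF
    Sn = np.searchsorted(data, x, side="right") / n  # empirical CDF S_n
    return np.trapz(np.abs(F - Sn), x)

d = area_metric(stats.norm(300, 25), [280, 295, 310, 330, 290])
print(f"area metric d = {d:.1f} (same units as the data)")
```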

11 Reflects full distribution
[Three panels, prediction in black against an alternative in blue: a bimodal mixture that matches in mean only; a normal that matches in both mean and variance; a rescaled copy that matches well overall]

12 Single observation
A single datum can't match an entire distribution (unless it's degenerate)
[Figure: predicted CDF against the one-step empirical CDF of a single observation]

13 When the prediction is really bad
The metric degenerates to simple distance
Probability is dimensionless, so the units are the same
[Figure: prediction near one end of the axis, lone datum near the other; d ≈ 24]
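Reusing the hypothetical area_metric sketch from above, a lone datum far from the prediction reproduces this behavior: the metric collapses to roughly the plain distance between the datum and the bulk of the prediction.

```python
# Prediction concentrated near 2, single observation at 26:
# the area metric is approximately |26 - 2| = 24.
print(area_metric(stats.norm(2, 0.5), [26.0]))  # ~24
```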

14 Depends on the local scale
The metric depends on the units
Could standardize (divide by the standard deviation), but then the metric would no longer be in physical units
[Two figures with the same shapes on different scales: d ≈ 0.45 versus d ≈ 45]

15 Why physical units?
Distributions in the left graph don't overlap, but they seem closer than those on the right
[Two panels of CDF pairs plotted on the same axes]

16 Why an unbounded metric?
Neither pair overlaps, but the left is a better fit than the right
Smirnov's metric D_max considers these two cases indistinguishable (they're both just 'far')
[Two panels: data slightly beyond the prediction (0–4 scale) versus far beyond it (0–40 scale)]

17 The model says different things
[Scatter plot as before, Temperature [degrees Celsius] versus Time [seconds], now with a different predicted distribution at different times]

18 [Figure: the corresponding predicted CDFs, Probability versus Temperature, 200–450]

19 Pooling data comparisons
When data are to be compared against a single distribution, they're pooled into S_n
When data are compared against different distributions, this isn't possible
Conformance must be expressed on some universal scale

20 Universal scale
u_i = F_i(x_i), where the x_i are the data and the F_i are their respective predictions
[Figure: three different predictions — N(2, 0.6), exponential(1.7), and a scaled mixture — each observation x_i mapped to its u_i through its own CDF]
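A sketch of this transform, assuming scipy frozen distributions; the first two predictions follow the slide's N(2, 0.6) and exponential(1.7), while the third is a simpler stand-in for the slide's scaled mixture, and the observations are hypothetical:

```python
import numpy as np
from scipy import stats

predictions = [stats.norm(2, 0.6),      # N(2, 0.6)
               stats.expon(scale=1.7),  # exponential with mean 1.7
               stats.norm(15, 4)]       # stand-in for the scaled mixture
observations = [1.8, 0.9, 22.0]         # hypothetical data x_i

u = np.array([F.cdf(x) for F, x in zip(predictions, observations)])
print(u)  # each u_i should look uniform(0, 1) if the model is accurate
```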

21 Backtransforming to physical scale
[Figure: u values on [0, 1] mapped through the inverse of a common distribution G back to the physical axis]

22 Backtransforming to physical scale
The distribution of G⁻¹(F_i(x_i)) represents the empirical data (like S_n does) but on a common, transformed scale
Could pick any of many scales, and each leads to a different value for the metric
The distribution of interest is the one used for the regulatory statement
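A sketch of the backtransformation; the reference prediction G below is an assumed stand-in for whichever distribution the regulatory statement uses:

```python
import numpy as np
from scipy import stats

u = np.array([0.37, 0.41, 0.96, 0.12, 0.66])  # hypothetical u_i = F_i(x_i)
G = stats.norm(300, 25)                       # assumed reference prediction
physical = G.ppf(u)                           # G^{-1}(F_i(x_i)), physical units
print(physical)
```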

23 Number of function evaluations
Some models are difficult to evaluate
Extracting distributional predictions may be expensive in terms of function evaluations
Is the validation metric applicable when only very coarse predictions based on few function evaluations are available?

24 Coarse prediction
A prediction can be expressed as an 'empirical' distribution too
[Figure: coarse step-function prediction built from a few function evaluations]

25 Statistical test for model accuracy
Kolmogorov-Smirnov test of the distribution of the u_i's against the uniform distribution over [0, 1]
This tests whether the empirical data are as though they were drawn from the respective prediction distributions
The probability integral transform theorem (Angus 1994) says the u's will be distributed as uniform(0, 1) if x_i ~ F_i
Assumes the empirical data are independent of each other
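The test itself is a one-liner with scipy; the u values below are hypothetical:

```python
import numpy as np
from scipy import stats

u = np.array([0.37, 0.41, 0.96, 0.12, 0.66])     # hypothetical u_i
statistic, p_value = stats.kstest(u, "uniform")  # standard uniform on [0, 1]
print(f"KS statistic {statistic:.3f}, p = {p_value:.3f}")
# A small p-value suggests the data are inconsistent with the predictions.
```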

26 Epistemic uncertainty

27 How should we compare intervals?
[Figure: a prediction interval and a data interval, side by side]

28 Validation for intervals
The validation measure is the smallest difference between the prediction interval and the data interval (sketched below)
Overlapping intervals match perfectly
Validity is distinct from precision
– Otherwise there would be no value in an uncertainty analysis
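A minimal sketch of this interval measure; the example intervals are illustrative:

```python
def interval_mismatch(pred, obs):
    """Smallest distance between two intervals; zero when they overlap."""
    (p_lo, p_hi), (o_lo, o_hi) = pred, obs
    return max(0.0, o_lo - p_hi, p_lo - o_hi)

print(interval_mismatch((5.0, 8.0), (7.0, 12.0)))  # 0.0: overlap, perfect match
print(interval_mismatch((5.0, 6.0), (9.0, 9.0)))   # 3.0: gap between the intervals
```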

29 http://encarta.msn.com/map_701512318/English_Channel.html

30 Epistemic uncertainty about distributions
Probability boxes (p-boxes): left and right bounds on the uncertain CDF
[Figure: three panels of p-boxes, cumulative probability (0–1) versus value]

31 Epistemic uncertainty in predictions
On the left, the datum evidences no discrepancy at all (d = 0)
In the middle, the discrepancy is measured relative to the nearest edge of the p-box (d ≈ 4)
On the right, the discrepancy is even smaller (d ≈ 0.4)
[Three panels: a p-box prediction with a scalar datum inside its bounds, far outside, and just outside]

32 Epistemic uncertainty in both
Predictions in white, observations in blue
[Three panels comparing a prediction p-box against an empirical data p-box: d = 0, d ≈ 0.05, d ≈ 0.07]

33 Area and distribution of differences
[Figure: paired panels showing the area between uncertain numbers and the corresponding distribution of their differences]

34 Measure for uncertain numbers
Smallest possible expected absolute difference
– Infimum taken over all possible distributions and over all possible dependencies between them
– Not a metric
In general, hard to compute for imprecise probabilities
Quite easy for p-boxes (see the sketch below)
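One plausible numerical reading of this measure for p-boxes: the area (over x) where the observation p-box escapes the prediction p-box. This is an interpretation under stated assumptions, not the authors' algorithm, and the example p-boxes (normals with interval-valued means) are illustrative:

```python
import numpy as np
from scipy import stats

def pbox_gap_area(x, pred_lo, pred_hi, obs_lo, obs_hi):
    """CDF bounds sampled on grid x, with lo <= hi pointwise per p-box.
    Returns the area where the two p-boxes fail to overlap."""
    gap = np.maximum(0.0, np.maximum(pred_lo - obs_hi, obs_lo - pred_hi))
    return np.trapz(gap, x)

x = np.linspace(0, 15, 2001)
d = pbox_gap_area(x,
                  stats.norm(6, 1).cdf(x), stats.norm(5, 1).cdf(x),  # prediction N([5,6],1)
                  stats.norm(9, 1).cdf(x), stats.norm(8, 1).cdf(x))  # observation N([8,9],1)
print(d)  # ~2.0: the two bands are separated by about 2 units
```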

35 Three other schemes
Pompeiu's method
– Standard metric for possibly overlapping sets
Range of areas
– Natural approach for interval analysts
– Upper limit is hard to compute
Double metric
– Consider the left and right edges separately

36 Double metric
Д = (0, 0) only when the left edges coincide and the right edges coincide, i.e., when prediction and data match in both location and precision
[Three panels of interval pairs: Д = (5, 6), Д = (7, 12), Д = (9, 9)]
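A sketch of the double metric on intervals; the example intervals are hypothetical, not the slide's:

```python
def double_metric(pred, obs):
    """Compare the left edges and the right edges separately."""
    (p_lo, p_hi), (o_lo, o_hi) = pred, obs
    return (abs(p_lo - o_lo), abs(p_hi - o_hi))

print(double_metric((2.0, 6.0), (2.0, 6.0)))   # (0.0, 0.0): edges coincide
print(double_metric((2.0, 6.0), (7.0, 18.0)))  # (5.0, 12.0): mismatch in both edges
```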

37 Validation for imprecise probabilities

Measure             Scheme    Metric   Compute   Strictness
Shortest distance   Shape     No       Medium    Reasonable
Max-sup-inf         Element   Yes      Hard      Too strict
Range of areas      Element   No       Hard      Reasonable
Double metric       Shape     Yes      Easy      Too strict

38 Validation: summary
Both assessment and reliability of extrapolation
– How good is the model?
– Should we trust its pronouncements?
Updating is a separate activity
Need the metric to be both ad hoc and universal
Epistemic uncertainty introduces some wrinkles
– Full credit for being modest about predictions

39 End


41 Definition of a true metric
Positive: d(x, y) ≥ 0
Symmetric: d(x, y) = d(y, x)
Identicals indistinguishable: d(x, y) = 0 ⟺ x = y
Triangle inequality: d(x, y) + d(y, z) ≥ d(x, z)
Variants: quasi-, semi-, pseudo-, ultra-metric

42 Other metrics
Area is only one of many possible metrics
Area favors central tendency (the median)
Could also use the median distance from a datum to the distribution, or maybe the 95th percentile of distances (see the sketch below)
Might prefer conformance in the tails, or in one tail in particular
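A sketch of such quantile-based alternatives, estimating the median or 95th percentile of the datum-to-distribution distances by simulation; the example prediction is an assumption:

```python
import numpy as np
from scipy import stats

def distance_quantile(pred_dist, datum, q, n=100_000, seed=0):
    """q-th quantile of |X - datum| with X drawn from the prediction."""
    draws = pred_dist.rvs(size=n, random_state=seed)
    return np.quantile(np.abs(draws - datum), q)

pred = stats.norm(300, 25)                 # hypothetical prediction
print(distance_quantile(pred, 280, 0.5))   # median distance to the datum
print(distance_quantile(pred, 280, 0.95))  # 95th percentile of distances
```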

43 Degrees of impossibility
If a datum is completely outside the range of the prediction, it's 'impossible'
Transforming to the u scale makes it exactly 0 or 1
We'd like to preserve how far outside it is

44 Extended distribution functions
F*(x) = F_<(x) below the prediction's range, F(x) where 0 ≤ F(x) ≤ 1, and F_>(x) above the range, so F* can take values outside [0, 1]
Extension slopes can be set by the distribution's dispersion, to mimic tails, or as just relocated 45° lines
[Figure: a CDF F and its extension F*, continuing below probability 0 and above 1]

45 Using extensions in the metric
Extended functions F_i* can be used to get the u's (now no longer ranging only over [0, 1])
The common backtransformation scale can also be extended to G* to accept these u's
This allows values considered impossible by the prediction to be represented
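A sketch of one extension option from the previous slide, the relocated 45° lines; the unit slope and the uniform example prediction are assumptions:

```python
from scipy import stats

def extended_cdf(F, lo, hi):
    """F: ordinary CDF with F(lo) = 0 and F(hi) = 1 (range of the prediction).
    Returns F* continuing below 0 and above 1 as unit-slope lines, so a
    datum outside the prediction's range keeps a degree of impossibility."""
    def F_star(x):
        if x < lo:
            return x - lo          # negative: further below, 'more impossible'
        if x > hi:
            return 1.0 + (x - hi)  # above 1: further above, 'more impossible'
        return F(x)
    return F_star

F_star = extended_cdf(stats.uniform(0, 10).cdf, 0.0, 10.0)
print(F_star(12.0))  # 3.0: two units beyond the prediction's range
```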

46 Vector of outputs
Usually want to treat dimensions separately
Possible to unify (pool) prediction-observation pairs even if they're from different dimensions
– Degrees, seconds, pascals, meters, etc.
But there's no G for backcalculation, and so there can't be a physically meaningful scale

47 Comparing accuracies
Questions like "Is the match for temperature as good as the match for conductivity?" also require a universal scale to which all physical dimensions must be transformed
If we do this, the metric becomes a norm


Download ppt "Validation of uncertain predictions against uncertain observations Scott Ferson, William Oberkampf and Lev Ginzburg 20 February 2008, REC 2008, Savannah,"

Similar presentations


Ads by Google