Don't Compare Averages Holger Bast Max-Planck-Institut für Informatik (MPII) Saarbrücken, Germany joint work with Ingmar Weber WEA 2005 May 10 – May 13,

1 Don't Compare Averages
Holger Bast, Max-Planck-Institut für Informatik (MPII), Saarbrücken, Germany
Joint work with Ingmar Weber
WEA 2005, May 10 – May 13, Santorini Island, Greece

2 Two famous quotes
• "There are three kinds of lies: lies, damn lies, and statistics." Benjamin Disraeli, 1804 – 1881 (reported by Mark Twain)
• "Never believe any statistics you haven't forged yourself." Winston Churchill, 1874 – 1965

3 A typical figure
[Figure: two curves labeled "Theirs" and "Ours"; x-axis: input size; y-axis: some cost measure.]
• Each point represents an average over a number of iterations.

4 Changing the cost measure...
• ... by a monotone function, say from $c$ to $2^c$
• This is from authentic data!
[Figure: the same plot drawn once with y-axis $c$ and once with y-axis $2^c$.]

5 No deep mathematics here
• Even for strictly monotone $f$:
  – certainly $E f(X) \ne f(E X)$ in general,
  – but also $E X \le E Y$ does not in general imply $E f(X) \le E f(Y)$.
• Example:
  – $X$: 4, 4 → average 4
  – $Y$: 1, 5 → average 3
  – $2^X$: $2^4, 2^4$ → average 16
  – $2^Y$: $2^1, 2^5$ → average 17
  – so $E Y \le E X$, yet $E 2^Y > E 2^X$.
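A minimal Python sketch that recomputes the example above with exact fractions; the helper names `mean` and `f` are illustrative and not from the talk.

```python
from fractions import Fraction

def mean(values):
    """Exact arithmetic mean of a list of integers, as a Fraction."""
    return Fraction(sum(values), len(values))

def f(c):
    """The monotone cost transformation from the slide: c -> 2**c."""
    return 2 ** c

X = [4, 4]
Y = [1, 5]

mean_X, mean_Y = mean(X), mean(Y)                                     # 4 and 3
mean_fX, mean_fY = mean([f(x) for x in X]), mean([f(y) for y in Y])   # 16 and 17

print(f"E X = {mean_X}, E Y = {mean_Y}")
print(f"E 2^X = {mean_fX}, E 2^Y = {mean_fY}")
# Lower cost is better: Y wins under c, X wins under 2^c.
print("order flips:", (mean_Y < mean_X) and (mean_fY > mean_fX))   # order flips: True
```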

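6 Examples of multiple cost measures
• Language modeling:
  – for a given probability distribution $p_1, \dots, p_n$,
  – find a distribution $q_1, \dots, q_n$ from a constrained class that either minimizes the cross-entropy $\sum_i p_i \log_2 (p_i/q_i)$ or minimizes the perplexity $\prod_i (p_i/q_i)^{p_i} = 2^{\text{cross-entropy}}$ (strictly speaking, $\sum_i p_i \log_2 (p_i/q_i)$ is the relative entropy; it differs from the cross-entropy $-\sum_i p_i \log_2 q_i$ only by the entropy of $p$, which is fixed here).
• Algorithm A uses algorithm B as a subroutine:
  – B produces a result of average quality $q$,
  – the complexity of A depends on, say, $q^2$.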
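As a sanity check of the identity on this slide, the sketch below computes both language-modeling measures for one pair of distributions. The distributions `p` and `q` and the helper names are hypothetical; logs are base 2 so that the perplexity equals $2$ raised to the entropy term. Because $c \mapsto 2^c$ is monotone, the two measures rank candidate $q$'s identically for a single $p$; they can only disagree once each measure is averaged over many test cases, as in the previous example.

```python
import math

def relative_entropy(p, q):
    """sum_i p_i * log2(p_i / q_i); terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def perplexity(p, q):
    """prod_i (p_i / q_i) ** p_i, the slide's perplexity-style measure."""
    return math.prod((pi / qi) ** pi for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]   # hypothetical target distribution
q = [0.25, 0.5, 0.25]   # hypothetical model distribution

h = relative_entropy(p, q)
print(h, perplexity(p, q), 2 ** h)   # the last two agree up to rounding
```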

7 Can this also happen with error bars?
• Error bars for $c$ don't overlap, yet reversal for $f(c)$?
• Yes, this can also happen!
[Figure: two panels, cost $c$ and cost $f(c)$.]
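A small constructed instance (hypothetical numbers, not the talk's data) shows this is possible with $f(c) = 2^c$ and error bars $E Z \pm \delta_Z$, where $\delta_Z = E|Z - E Z|$ as defined on a later slide. The helper names are illustrative.

```python
from fractions import Fraction

def mean(values):
    return Fraction(sum(values), len(values))

def abs_dev(values):
    """delta_Z = E|Z - E Z|, the mean absolute deviation used for the error bars."""
    m = mean(values)
    return sum(abs(v - m) for v in values) / len(values)

def f(c):
    return 2 ** c

X = [7, 7]          # hypothetical costs of one algorithm
Y = [0, 0, 0, 10]   # hypothetical costs of the other
fX = [f(x) for x in X]
fY = [f(y) for y in Y]

# Under cost c the error bars do not overlap: Y's whole bar lies below X's.
bars_separate = mean(Y) + abs_dev(Y) <= mean(X) - abs_dev(X)   # 25/4 <= 7
# Under cost 2^c the averages are in the opposite order.
means_reversed = mean(fY) > mean(fX)                            # 1027/4 > 128
print(bars_separate, means_reversed)   # True True
```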

8 Can this also happen with error bars?
• Complete reversal with error bars?
[Figure: two panels, cost $c$ and cost $f(c)$.]


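10 Can this also happen with error bars?
• Complete reversal with error bars?
[Figure: in the $c$ panel, the bar of $X$ (bottom at $E X - \delta_X$) lies above the bar of $Y$ (top at $E Y + \delta_Y$); in the $f(c)$ panel, the bar of $f(Y)$ (bottom at $E f(Y) - \delta_{f(Y)}$) lies above the bar of $f(X)$ (top at $E f(X) + \delta_{f(X)}$).]
• Error bars: $\delta_Z = E|Z - E Z|$ (absolute deviation) $\le \sigma_Z = \sqrt{E (Z - E Z)^2}$ (standard deviation)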
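The two spreads can be compared numerically. This sketch (helper names hypothetical, samples treated as uniform distributions) computes $\delta_Z$ exactly and $\sigma_Z$ in floating point. The inequality $\delta_Z \le \sigma_Z$ is Jensen's inequality for the square: $(E|Z - E Z|)^2 \le E (Z - E Z)^2$.

```python
import math
from fractions import Fraction

def abs_dev(values):
    """delta_Z = E|Z - E Z| (mean absolute deviation), computed exactly."""
    m = Fraction(sum(values), len(values))
    return sum(abs(v - m) for v in values) / len(values)

def std_dev(values):
    """sigma_Z = sqrt(E (Z - E Z)^2) (population standard deviation)."""
    m = Fraction(sum(values), len(values))
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

samples = ([1, 5], [1, 2, 3, 10], [0, 0, 0, 10])
for sample in samples:
    d, s = abs_dev(sample), std_dev(sample)
    print(sample, float(d), s, d <= s)
```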

11 Can this also happen with error bars?
• Complete reversal with error bars?
• If $E X - \delta_X \ge E Y + \delta_Y$, then $E f(X) + \delta_{f(X)} \ge E f(Y) - \delta_{f(Y)}$.
• Theorem: for monotone $f$, complete reversal can never happen!

12 Can this also happen with error bars?
• Complete reversal with error bars?
• If $E X - \delta_X \ge E Y + \delta_Y$, then $E f(X) + \delta_{f(X)} \ge E f(Y) - \delta_{f(Y)}$.
• If only one of the four δ terms is dropped, the theorem no longer holds in general.
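A randomized check of the theorem, as a sketch under these assumptions: samples are small lists of integers treated as uniform distributions, $f(c) = 2^c$, error bars are $E Z \pm \delta_Z$, and a complete reversal means $E X - \delta_X \ge E Y + \delta_Y$ together with $E f(X) + \delta_{f(X)} < E f(Y) - \delta_{f(Y)}$. Exact `Fraction` arithmetic avoids spurious hits from rounding. The last lines use the constructed instance $X = (7, 7)$, $Y = (0, 0, 0, 10)$ (hypothetical numbers) to show that dropping $\delta_{f(Y)}$ alone breaks the statement. All helper names are illustrative.

```python
import random
from fractions import Fraction

def mean(values):
    return Fraction(sum(values), len(values))

def abs_dev(values):
    """delta_Z = E|Z - E Z| for a sample treated as a uniform distribution."""
    m = mean(values)
    return sum(abs(v - m) for v in values) / len(values)

def lower(values):
    """Bottom of the error bar, E Z - delta_Z."""
    return mean(values) - abs_dev(values)

def upper(values):
    """Top of the error bar, E Z + delta_Z."""
    return mean(values) + abs_dev(values)

def f(c):
    """Assumed monotone cost transformation."""
    return 2 ** c

def count_complete_reversals(trials=5000, seed=1):
    """Search random samples for a complete reversal; the theorem predicts 0."""
    rng = random.Random(seed)
    found = 0
    for _ in range(trials):
        X = [rng.randint(0, 12) for _ in range(rng.randint(1, 6))]
        Y = [rng.randint(0, 12) for _ in range(rng.randint(1, 6))]
        if lower(X) >= upper(Y):                  # bars for c: X entirely above Y
            fX, fY = [f(x) for x in X], [f(y) for y in Y]
            if upper(fX) < lower(fY):             # bars for f(c): f(Y) entirely above f(X)
                found += 1
    return found

print("complete reversals found:", count_complete_reversals())

# Dropping only delta_{f(Y)} would claim E f(X) + delta_{f(X)} >= E f(Y); it fails here.
X, Y = [7, 7], [0, 0, 0, 10]
fX, fY = [f(x) for x in X], [f(y) for y in Y]
print(lower(X) >= upper(Y), upper(fX) >= mean(fY))   # True False
```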

13 Our first proof

14 The canonical proof
1. The medians $M_X$ and $M_Y$ commute with $f$ ...
   • $\Pr(X \le M_X) = 1/2 = \Pr(f(X) \le f(M_X))$
   • $f(M_X) = M_{f(X)}$ and $f(M_Y) = M_{f(Y)}$
2. ... and hence cannot reverse their order:
   • $M_X \le M_Y \Rightarrow f(M_X) \le f(M_Y)$ because $f$ is monotone
   • $\Rightarrow M_{f(X)} \le M_{f(Y)}$ because $M$ and $f$ commute
3. Expectation and median are related as
   • $|E X - M_X| \le \delta_X = E|X - E X|$
   • $|E Y - M_Y| \le \delta_Y = E|Y - E Y|$
   (This is nothing new, but hardly any computer scientist seems to know it.)
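Steps 1 and 3 are easy to spot-check on data. This sketch (helper names hypothetical, $f(c) = c^3$ assumed as the monotone map) uses `statistics.median_low`, which always returns a data point, so $f(M_X) = M_{f(X)}$ holds exactly. The bound in step 3 follows from $|E X - M_X| = |E(X - M_X)| \le E|X - M_X| \le E|X - E X|$, where the last step uses that a median minimizes $c \mapsto E|X - c|$.

```python
import random
import statistics
from fractions import Fraction

def abs_dev(values):
    """delta_X = E|X - E X|, computed exactly."""
    m = Fraction(sum(values), len(values))
    return sum(abs(v - m) for v in values) / len(values)

def f(c):
    """Assumed strictly increasing transformation."""
    return c ** 3

def count_failures(trials=1000, seed=0):
    """Count samples where step 1 or step 3 of the proof fails (expected: none)."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        xs = [rng.randint(-20, 20) for _ in range(rng.randint(1, 9))]
        m = statistics.median_low(xs)                                   # a median of xs
        commutes = f(m) == statistics.median_low([f(x) for x in xs])    # step 1
        close = abs(Fraction(sum(xs), len(xs)) - m) <= abs_dev(xs)      # step 3
        if not (commutes and close):
            failures += 1
    return failures

print("failures:", count_failures())
```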

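15 The canonical proof
• Now assume a complete reversal happened:
[Figure: bars with $E Y + \delta_Y \le E X - \delta_X$ for $c$, but $E f(X) + \delta_{f(X)} < E f(Y) - \delta_{f(Y)}$ for $f(c)$.]
• Then $M_Y \le M_X$, yet $M_{f(Y)} > M_{f(X)}$.
• This contradicts the fact that the medians cannot reverse their order.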
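Written out, the contradiction chains the hypothesis, the assumed reversal, and the median bound $|E Z - M_Z| \le \delta_Z$ from step 3 of the previous slide:

```latex
\begin{align*}
M_Y &\le E Y + \delta_Y \le E X - \delta_X \le M_X,\\
M_{f(X)} &\le E f(X) + \delta_{f(X)} < E f(Y) - \delta_{f(Y)} \le M_{f(Y)}.
\end{align*}
```

So $M_Y \le M_X$ while $M_{f(Y)} > M_{f(X)}$, which step 2 rules out.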

16 Conclusion
• Average comparison is a deceptive thing:
  – even with error bars!
• There are more effects of this kind, e.g.
  – non-overlapping error bars do not make a particular order of the expectations (or medians) statistically significant;
  – for normally distributed $X$, $Y$: $\Pr(X + \delta_X \le Y - \delta_Y \mid E X > E Y)$ can be as large as 8%.
• Better: always look at the complete histogram, and at least check the maximum and minimum.
[Figure: histograms of $X$ and $Y$.]
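A minimal sketch of the recommended practice: report the whole distribution of costs, not only its mean. The helper `summarize` is hypothetical and uses only the standard library. Applied to the two samples from the averaging example, it shows at a glance that $Y$ has both the smallest and the largest cost.

```python
import statistics
from collections import Counter

def summarize(costs):
    """Report more than the average: extremes, median, mean, and a full histogram."""
    return {
        "min": min(costs),
        "median": statistics.median(costs),
        "mean": statistics.fmean(costs),
        "max": max(costs),
        "histogram": sorted(Counter(costs).items()),   # (value, count) pairs
    }

for name, costs in (("X", [4, 4]), ("Y", [1, 5])):
    print(name, summarize(costs))
```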

17 Thank you! (Greek: Ευχαριστώ!)

