Analysis of Uncertain Data: Smoothing of Histograms. Eugene Fink, Ankur Sarin, Jaime G. Carbonell.

Presentation transcript:

1 Analysis of Uncertain Data: Smoothing of Histograms
Eugene Fink, Ankur Sarin, Jaime G. Carbonell

2 Density estimate problem
Convert a set of numeric data points to a smoothed approximation of the underlying probability density.
Example points (plotted on an axis from 10 to 30): 11, 12, 17, 18, 19, 21, 22, 26, 27, 29

3 Techniques
- Manual estimates
- Histograms
- Curve fitting

4 Generalized histograms
Example (axis from 10 to 30):
0.2 chance: [11..12]
0.5 chance: [17..22]
0.3 chance: [26..29]
General form:
prob_1: [min_1..max_1]
prob_2: [min_2..max_2]
…
prob_n: [min_n..max_n]
Intervals do not overlap.
Probabilities sum to 1.0.
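The two constraints on this slide can be checked mechanically. The following is a minimal sketch, assuming a generalized histogram is stored as a list of (prob, min, max) tuples sorted by min; the representation, the function name validate, and the tolerance are illustrative assumptions, not taken from the slides.

```python
def validate(hist, tol=1e-9):
    """Check a generalized histogram given as [(prob, lo, hi), ...] sorted by lo.

    The tuple layout and the tolerance are assumptions made for this sketch.
    """
    for prob, lo, hi in hist:
        if prob < 0:
            raise ValueError("negative probability")
        if lo > hi:
            raise ValueError("interval with min > max")
    # Intervals do not overlap: each interval starts at or after the previous end.
    for (_, _, hi1), (_, lo2, _) in zip(hist, hist[1:]):
        if lo2 < hi1:
            raise ValueError("intervals overlap")
    # Probabilities sum to 1.0 (up to floating-point rounding).
    if abs(sum(prob for prob, _, _ in hist) - 1.0) > tol:
        raise ValueError("probabilities do not sum to 1.0")
    return hist


# The example from the slide.
example = validate([(0.2, 11, 12), (0.5, 17, 22), (0.3, 26, 29)])
```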

5 Special cases
- Standard histogram
- Set of points
- Weighted points
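One way to read this slide is that each special case can be written as a generalized histogram. The sketch below assumes the (prob, min, max) tuple representation, treats a point as a zero-width interval [x..x], and combines duplicate points into one interval; these choices and all function names are assumptions for illustration.

```python
from collections import Counter


def from_standard_histogram(start, bin_width, counts):
    """Equal-width, contiguous bins with counts -> generalized histogram."""
    total = sum(counts)
    return [(c / total, start + i * bin_width, start + (i + 1) * bin_width)
            for i, c in enumerate(counts)]


def from_weighted_points(weighted_points):
    """(value, weight) pairs -> zero-width intervals with normalized weights."""
    totals = Counter()
    for value, weight in weighted_points:
        totals[value] += weight  # duplicates are combined so intervals stay distinct
    total = sum(totals.values())
    return [(w / total, x, x) for x, w in sorted(totals.items())]


def from_points(points):
    """A plain point set is a weighted point set with equal weights."""
    return from_weighted_points((x, 1) for x in points)


# The example points from slide 2.
points = [11, 12, 19, 21, 17, 18, 22, 26, 27, 29]
point_hist = from_points(points)
```

Zero-width intervals have no finite density, so density-based distance computations need a separate convention for them.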

6 Smoothing problem
Given a generalized histogram, construct a coarser approximation of it.

7 Input
- Initial distribution: A point set or a fine-grained histogram
- Distance function: A measure of similarity between distributions
- Target size: The number of intervals in an approximation

8 Standard distance measures
- Simple difference: ∫ | p(x) − q(x) | dx
- Kullback-Leibler: ∫ p(x) · log (p(x) / q(x)) dx
- Jensen-Shannon: (Kullback-Leibler (p, (p+q)/2) + Kullback-Leibler (q, (p+q)/2)) / 2
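For piecewise-uniform distributions these integrals reduce to finite sums, because both densities are constant between consecutive interval endpoints. The sketch below assumes histograms stored as lists of (prob, min, max) tuples with positive-width intervals, and uses the usual conventions that 0 · log 0 = 0 and that Kullback-Leibler is infinite where p > 0 and q = 0. All names are illustrative.

```python
import math


def density(hist, x):
    """Density of a piecewise-uniform distribution at x (0 outside all intervals)."""
    for prob, lo, hi in hist:
        if lo <= x < hi:
            return prob / (hi - lo)
    return 0.0


def segments(p, q):
    """Yield (p density, q density, width) on each stretch where both are constant."""
    cuts = sorted({edge for _, lo, hi in list(p) + list(q) for edge in (lo, hi)})
    for a, b in zip(cuts, cuts[1:]):
        mid = (a + b) / 2
        yield density(p, mid), density(q, mid), b - a


def simple_difference(p, q):
    return sum(abs(dp - dq) * w for dp, dq, w in segments(p, q))


def kullback_leibler(p, q):
    total = 0.0
    for dp, dq, w in segments(p, q):
        if dp == 0:
            continue            # 0 * log 0 is taken as 0
        if dq == 0:
            return math.inf     # p has mass where q has none
        total += dp * math.log(dp / dq) * w
    return total


def jensen_shannon(p, q):
    total = 0.0
    for dp, dq, w in segments(p, q):
        m = (dp + dq) / 2       # density of the mixture (p + q) / 2
        for d in (dp, dq):
            if d > 0:
                total += 0.5 * d * math.log(d / m) * w
    return total


# Slide 4 example versus a coarser version with the first two intervals merged.
fine = [(0.2, 11, 12), (0.5, 17, 22), (0.3, 26, 29)]
coarse = [(0.7, 11, 22), (0.3, 26, 29)]
```

Note that kullback_leibler(coarse, fine) is infinite in this example because the coarse histogram covers the gap [12..17] where the fine one has zero density, while the Jensen-Shannon value stays finite and symmetric.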

9 Smoothing algorithm
Repeat:
  Merge two adjacent intervals
Until the histogram has the right size

10 Interval merging
Before: prob_1: [min_1..max_1] and prob_2: [min_2..max_2]
After: prob_1 + prob_2: [min_1..max_2]
For each potential merge, calculate the distance.
Perform the smallest-distance merge.
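The merging loop can be sketched directly from the two slides above. This is a naive version, not the authors' implementation: it assumes (prob, min, max) tuples with positive-width intervals, uses the simple-difference distance, and measures each candidate merge against the current histogram (the slides do not say whether the comparison is with the current or the initial distribution). The names merge_cost and smooth are hypothetical.

```python
def merge_cost(a, b):
    """Simple-difference distance introduced by merging adjacent intervals a and b.

    Only the span [min_1..max_2] changes, so the integral is computed there.
    """
    (p1, lo1, hi1), (p2, lo2, hi2) = a, b
    merged_density = (p1 + p2) / (hi2 - lo1)
    return (abs(p1 / (hi1 - lo1) - merged_density) * (hi1 - lo1)
            + merged_density * (lo2 - hi1)          # gap between the intervals
            + abs(p2 / (hi2 - lo2) - merged_density) * (hi2 - lo2))


def smooth(hist, target_size):
    """Repeatedly perform the smallest-distance merge until target_size is reached."""
    hist = list(hist)
    while len(hist) > max(target_size, 1):
        # Index of the left interval of the cheapest adjacent pair.
        i = min(range(len(hist) - 1),
                key=lambda k: merge_cost(hist[k], hist[k + 1]))
        (p1, lo1, _), (p2, _, hi2) = hist[i], hist[i + 1]
        hist[i:i + 2] = [(p1 + p2, lo1, hi2)]
    return hist


# A made-up fine-grained histogram with six unit-width bins.
fine = [(0.05, 0, 1), (0.05, 1, 2), (0.3, 2, 3),
        (0.3, 3, 4), (0.15, 4, 5), (0.15, 5, 6)]
coarse = smooth(fine, 3)
```

Each pass rescans all adjacent pairs, so this version takes quadratic time; it is meant to show the logic, not to match the running time reported later in the talk.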

11 Smoothing examples: Normal distribution
[Figure panels: 5000 points; 200 intervals; 50 intervals; 10 intervals]

12 Smoothing examples: Geometric distribution
[Figure panels: 5000 points; 200 intervals; 50 intervals; 10 intervals]

13 Running time
Theoretical: O(n · log n)
Practical: O(n)
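The slides do not say how the O(n · log n) bound is obtained. One standard way to reach it for a greedy merging loop is a priority queue of candidate merges with lazy deletion of stale entries; the sketch below shows that technique under the same assumptions as before ((prob, min, max) tuples, positive widths, simple-difference cost). It is an illustration, not the authors' C++ code, and all names are hypothetical.

```python
import heapq


def merge_cost(a, b):
    """Simple-difference distance introduced by merging adjacent intervals a and b."""
    (p1, lo1, hi1), (p2, lo2, hi2) = a, b
    merged = (p1 + p2) / (hi2 - lo1)
    return (abs(p1 / (hi1 - lo1) - merged) * (hi1 - lo1)
            + merged * (lo2 - hi1)
            + abs(p2 / (hi2 - lo2) - merged) * (hi2 - lo2))


def smooth_with_heap(hist, target_size):
    n = len(hist)
    items = list(hist)                    # items[i] becomes None once absorbed
    right = list(range(1, n)) + [None]    # index of the right neighbour
    left = [None] + list(range(n - 1))    # index of the left neighbour
    stamp = [0] * n                       # bumped whenever items[i] changes
    heap = [(merge_cost(items[i], items[i + 1]), i, i + 1, 0, 0)
            for i in range(n - 1)]
    heapq.heapify(heap)
    size = n
    while size > max(target_size, 1) and heap:
        _, i, j, si, sj = heapq.heappop(heap)
        # Skip entries that refer to an absorbed or since-modified interval.
        if items[i] is None or items[j] is None or stamp[i] != si or stamp[j] != sj:
            continue
        (p1, lo1, _), (p2, _, hi2) = items[i], items[j]
        items[i], items[j] = (p1 + p2, lo1, hi2), None
        stamp[i] += 1
        size -= 1
        k = right[j]
        right[i] = k
        if k is not None:
            left[k] = i
            heapq.heappush(heap, (merge_cost(items[i], items[k]), i, k, stamp[i], stamp[k]))
        h = left[i]
        if h is not None:
            heapq.heappush(heap, (merge_cost(items[h], items[i]), h, i, stamp[h], stamp[i]))
    return [item for item in items if item is not None]
```

Each merge pushes at most two new candidates, so the heap sees O(n) operations of O(log n) each.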

14 Running time
3.4 GHz Pentium, C++ code: (2.5 ± 0.5) · num-points microseconds
[Plot: time (microseconds) versus number of points; tick marks at 10^2, 10^4, and 10^6 on both axes]

15 Visual smoothing
We convert a piecewise-uniform distribution to a smooth curve by spline fitting.
The user usually prefers a smooth probability density.

16 Main results
- Density estimation
- Lossy compression of generalized histograms

17 Advantages
Explicit specification of
- Distance measure
- Compression level
Effective representation for automated reasoning

