Analysis of Uncertain Data: Smoothing of Histograms
Eugene Fink, Ankur Sarin, Jaime G. Carbonell

Density estimate problem
Convert a set of numeric data points to a smoothed approximation of the underlying probability density.
[Figure: example points]

Techniques
- Manual estimates
- Histograms
- Curve fitting

Generalized histograms
Example:
chance: [ ]
0.5 chance: [ ]
0.3 chance: [ ]
General form:
prob_1: [min_1 .. max_1]
prob_2: [min_2 .. max_2]
…
prob_n: [min_n .. max_n]
- Intervals do not overlap
- Probabilities sum to 1.0
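The general form above can be sketched as a small data structure with a check for the two stated constraints. This is only an illustration: the names `Interval` and `validate`, the tolerance, and the example endpoints and probabilities are assumptions, not taken from the slides. Intervals that merely touch at an endpoint are treated as non-overlapping.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Interval:
    """One entry 'prob : [lo .. hi]' of a generalized histogram (hypothetical name)."""
    prob: float
    lo: float
    hi: float


def validate(hist: List[Interval], tol: float = 1e-9) -> None:
    """Raise ValueError unless intervals do not overlap and probabilities sum to 1.0."""
    ordered = sorted(hist, key=lambda iv: iv.lo)
    for iv in ordered:
        if iv.lo > iv.hi:
            raise ValueError(f"interval bounds out of order: {iv}")
        if iv.prob < 0:
            raise ValueError(f"negative probability: {iv}")
    # After sorting by lower bound, only neighbours can overlap.
    for left, right in zip(ordered, ordered[1:]):
        if left.hi > right.lo:
            raise ValueError(f"intervals overlap: {left} and {right}")
    total = sum(iv.prob for iv in ordered)
    if abs(total - 1.0) > tol:
        raise ValueError(f"probabilities sum to {total}, not 1.0")


# Illustrative values only; they are not the numbers from the original slide.
example = [Interval(0.2, 0, 10), Interval(0.5, 10, 20), Interval(0.3, 20, 30)]
validate(example)
```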

Special cases
- Standard histogram
- Set of points
- Weighted points
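One way to read these special cases is that each maps to a list of `(prob, min, max)` entries, with a point represented as a zero-width interval. The sketch below follows that reading. The function names, the tuple layout, and the zero-width convention are assumptions made for illustration, and weighted points are assumed to have distinct positions.

```python
from collections import Counter


def from_points(points):
    """Set of points: each distinct value becomes a zero-width interval."""
    n = len(points)
    counts = Counter(points)
    return [(c / n, x, x) for x, c in sorted(counts.items())]


def from_weighted_points(weighted):
    """Weighted points: (weight, x) pairs, normalized so probabilities sum to 1."""
    total = sum(w for w, _ in weighted)
    return sorted(((w / total, x, x) for w, x in weighted), key=lambda iv: iv[1])


def from_standard_histogram(edges, counts):
    """Standard histogram: bin i spans edges[i]..edges[i + 1] and holds counts[i]."""
    total = sum(counts)
    return [(c / total, edges[i], edges[i + 1]) for i, c in enumerate(counts)]


points_hist = from_points([1, 2, 2, 4])
bins_hist = from_standard_histogram([0, 10, 20, 30], [2, 5, 3])
```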

Smoothing problem
Given a generalized histogram, construct its coarser approximation.

Input
- Initial distribution: A point set or a fine-grained histogram
- Distance function: A measure of similarity between distributions
- Target size: The number of intervals in an approximation

Standard distance measures
- Simple difference: ∫ | p(x) − q(x) | dx
- Kullback-Leibler: ∫ p(x) · log (p(x) / q(x)) dx
- Jensen-Shannon: (Kullback-Leibler (p, (p+q)/2) + Kullback-Leibler (q, (p+q)/2)) / 2
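For piecewise-uniform densities, these integrals reduce to finite sums over the common refinement of the two interval sets. The sketch below illustrates that under several assumptions that are not in the slides: histograms are lists of `(prob, min, max)` tuples, every interval has positive width (point masses are not handled), the logarithm is natural, and a term with p(x) = 0 contributes zero while q(x) = 0 with p(x) > 0 gives an infinite Kullback-Leibler value.

```python
import math


def densities(p_hist, q_hist):
    """Return (width, p, q) for every piece of the common refinement."""
    cuts = sorted({e for _, lo, hi in list(p_hist) + list(q_hist) for e in (lo, hi)})

    def density(hist, a, b):
        # Density of the interval that covers the piece [a, b], or 0 in a gap.
        for prob, lo, hi in hist:
            if lo <= a and b <= hi:
                return prob / (hi - lo)
        return 0.0

    return [(b - a, density(p_hist, a, b), density(q_hist, a, b))
            for a, b in zip(cuts, cuts[1:])]


def _kl_term(p, q):
    if p == 0:
        return 0.0
    if q == 0:
        return math.inf
    return p * math.log(p / q)


def simple_difference(p_hist, q_hist):
    return sum(w * abs(p - q) for w, p, q in densities(p_hist, q_hist))


def kullback_leibler(p_hist, q_hist):
    return sum(w * _kl_term(p, q) for w, p, q in densities(p_hist, q_hist))


def jensen_shannon(p_hist, q_hist):
    total = 0.0
    for w, p, q in densities(p_hist, q_hist):
        m = (p + q) / 2
        total += w * (_kl_term(p, m) + _kl_term(q, m)) / 2
    return total


fine = [(0.25, 0, 1), (0.75, 1, 2)]   # illustrative two-interval histogram
coarse = [(1.0, 0, 2)]                # its one-interval approximation
```

Unlike Kullback-Leibler, the Jensen-Shannon value stays finite even when one histogram has a gap where the other has mass, which may matter when comparing a coarse approximation with a sparse point-derived histogram.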

Smoothing algorithm
Repeat:
  Merge two adjacent intervals
Until the histogram has the right size

Interval merging
[Figure: adjacent intervals [min_1 .. max_1] with prob_1 and [min_2 .. max_2] with prob_2 are merged into [min_1 .. max_2] with prob_1 + prob_2]
- For each potential merge, calculate the distance
- Perform the smallest-distance merge
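The merge rule and the greedy loop can be sketched as follows. Several choices here are assumptions rather than the authors' implementation: histograms are `(prob, min, max)` tuples, the distance is the simple difference, each merge is scored against the current histogram rather than the initial one, and every merged interval has positive width. The sketch rescans all adjacent pairs on each step, so it runs in quadratic time and does not reproduce the O(n · log n) bound stated for the original.

```python
def merge(a, b):
    """Merge adjacent intervals: [min_1 .. max_2] with prob_1 + prob_2."""
    return (a[0] + b[0], a[1], b[2])


def merge_cost(a, b):
    """Simple difference between the pair (a, b) and its merged interval."""
    (p1, lo1, hi1), (p2, lo2, hi2) = a, b
    d = (p1 + p2) / (hi2 - lo1)          # uniform density after merging
    gap = lo2 - hi1                      # empty space between a and b, if any
    return abs(p1 - d * (hi1 - lo1)) + abs(p2 - d * (hi2 - lo2)) + d * gap


def smooth(hist, target_size):
    """Repeatedly perform the smallest-distance merge until target_size intervals remain."""
    hist = sorted(hist, key=lambda iv: iv[1])
    while len(hist) > max(target_size, 1):
        i = min(range(len(hist) - 1),
                key=lambda k: merge_cost(hist[k], hist[k + 1]))
        hist[i:i + 2] = [merge(hist[i], hist[i + 1])]
    return hist


# Illustrative input: two low-density and two high-density unit intervals.
fine = [(0.1, 0, 1), (0.1, 1, 2), (0.4, 2, 3), (0.4, 3, 4)]
coarse = smooth(fine, 2)
```

In this example the two equal-density pairs have zero merge cost, so they are merged first and the boundary at 2 between the low and high regions survives.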

Smoothing examples: Normal distribution
[Figure panels: 5000 points; 200 intervals; 50 intervals; 10 intervals]

Smoothing examples: Geometric distribution
[Figure panels: 5000 points; 200 intervals; 50 intervals; 10 intervals]

Running time
- Theoretical: O (n · log n)
- Practical: O (n)

Running time
- 3.4 GHz Pentium, C++ code
- (2.5 ± 0.5) · num-points microseconds
[Plot: time (microseconds) versus number of points]

Visual smoothing
We convert a piecewise-uniform distribution to a smooth curve by spline fitting. The user usually prefers a smooth probability density.

Main results
- Density estimation
- Lossy compression of generalized histograms

Advantages
Explicit specification of
- Distance measure
- Compression level
Effective representation for automated reasoning