Estimating the Unseen: Sublinear Statistics
Paul Valiant

Fisher's Butterflies, Turing's Enigma Codewords
Fisher: how many new butterfly species would I observe if I collected for another period of the same length? Estimate: F1 - F2 + F3 - F4 + F5 - …, where Fj is the number of species observed exactly j times (the "fingerprint" of the sample). Turing: what is the probability mass of unseen Enigma codewords? Estimate: F1/(number of samples).
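Both estimates can be read directly off a sample's fingerprint. A minimal sketch in Python (the function names are mine, not from the talk):

```python
from collections import Counter

def fingerprint(samples):
    """F[j] = number of distinct elements seen exactly j times."""
    counts = Counter(samples)          # element -> multiplicity
    return Counter(counts.values())    # multiplicity j -> F_j

def good_turing_unseen_mass(samples):
    """Turing's estimate of the probability mass of unseen elements: F1 / k."""
    F = fingerprint(samples)
    return F[1] / len(samples)

def fisher_new_species(samples):
    """Fisher's alternating-series estimate of new species in a second
    observation period: F1 - F2 + F3 - F4 + ..."""
    F = fingerprint(samples)
    return sum((-1) ** (j + 1) * F[j] for j in range(1, max(F) + 1))

samples = ["a", "a", "b", "c", "c", "c", "d"]
# Fingerprint: F1 = 2 (b, d), F2 = 1 (a), F3 = 1 (c).
print(good_turing_unseen_mass(samples))  # 2/7
print(fisher_new_species(samples))       # 2 - 1 + 1 = 2
```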

Characteristic Functions
For an element of probability p_i: Fisher's quantity is Pr[not seen in first period, but seen in second period], while Turing's is Pr[not seen] * p_i. These are estimated by the same fingerprint expressions as before: F1 - F2 + F3 - F4 + F5 - … and F1/(number of samples), respectively.
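The agreement between Turing's quantity, sum_i p_i * Pr[i not seen], and the fingerprint expression E[F1]/k can be checked numerically. A quick sketch on a made-up uniform distribution (my own toy parameters):

```python
# For an element of probability p, over k independent draws:
#   Pr[not seen]             = (1 - p)^k
#   E[F1] = sum_i k * p_i * (1 - p_i)^(k - 1)
# so Turing's unseen mass, sum_i p_i * (1 - p_i)^k, is close to E[F1] / k.

def unseen_mass(probs, k):
    return sum(p * (1 - p) ** k for p in probs)

def expected_F1_over_k(probs, k):
    return sum(p * (1 - p) ** (k - 1) for p in probs)

probs = [0.02] * 50   # toy uniform distribution over 50 elements
k = 100
print(unseen_mass(probs, k))         # ~0.133
print(expected_F1_over_k(probs, k))  # ~0.135
```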

Other Properties?
Entropy: sum of p_i log(1/p_i). Support size: sum of a step function of p_i (0 at p_i = 0, 1 for p_i > 0). Approximating the step function to O(1) accuracy for x = Ω(1) ⇒ linear samples; it is exponentially hard to approximate below 1/k. Easier case? The L2 norm, sum of p_i^2.
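Both properties are symmetric, i.e., sums of a fixed function of each p_i, so the naive "plug-in" estimates can be computed from the fingerprint alone. A sketch of the standard plug-in estimators (names are mine; these are the baselines the talk improves on, not its method):

```python
import math
from collections import Counter

def fingerprint(samples):
    """F[j] = number of distinct elements seen exactly j times."""
    return Counter(Counter(samples).values())

def plugin_entropy(samples):
    """Plug-in entropy: treat each empirical frequency j/k as a true p_i,
    giving sum_j F_j * (j/k) * log(k/j). Underestimates when much of the
    distribution is unseen."""
    k = len(samples)
    return sum(Fj * (j / k) * math.log(k / j)
               for j, Fj in fingerprint(samples).items())

def plugin_support(samples):
    """Plug-in support size: the number of distinct elements seen."""
    return sum(fingerprint(samples).values())

samples = ["a", "a", "b", "c", "c", "c", "d"]
print(plugin_support(samples))  # 4
print(plugin_entropy(samples))
```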

L2 Approximation
Works very well given a bound on the j's encountered (i.e., on how often any single element repeats). Yields one-sided testers for the L1 norm, for L1 distance to uniform, and for L1 distance to an arbitrary known distribution [Batu, Fortnow, Rubinfeld, Smith, White '00].
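The L2 norm sum of p_i^2 admits a simple unbiased estimator: the fraction of sample pairs that collide. A sketch of this standard collision statistic (not verbatim from the slides):

```python
from collections import Counter
from math import comb

def l2_norm_squared_estimate(samples):
    """Unbiased estimate of sum_i p_i^2: colliding pairs / total pairs.
    Each unordered pair of samples collides with probability sum_i p_i^2."""
    k = len(samples)
    collisions = sum(comb(c, 2) for c in Counter(samples).values())
    return collisions / comb(k, 2)

samples = ["a", "a", "b", "c", "c", "c", "d"]
# C(2,2) + C(3,2) = 1 + 3 = 4 colliding pairs among C(7,2) = 21 pairs.
print(l2_norm_squared_estimate(samples))  # 4/21
```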

Are good testers computationally trivial?

Maximum Likelihood Distributions [Orlitsky et al., Science, etc.]

Relaxing the Problem
Given the observed fingerprint {Fj}, find any distribution p whose expected fingerprint under k samples approximates {Fj}. By concentration bounds, the "right" distribution also satisfies this, so it lies in the feasible region of the resulting linear program. Yields n/log n-sample estimators for entropy, support size, L1 distance, and anything similar. Does the extra computational power help?
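Under the usual Poissonization, the expected fingerprint of a candidate distribution is linear in its histogram, which is exactly what makes the feasibility problem a linear program. A sketch of just the constraint coefficients (the probability grid and names are my own; a real implementation would hand these coefficients to an off-the-shelf LP solver):

```python
import math

def poisson_pmf(lam, j):
    """Poi(lam)(j) = e^-lam * lam^j / j!"""
    return math.exp(-lam) * lam ** j / math.factorial(j)

def expected_fingerprint(histogram, k, max_j):
    """histogram: dict mapping a probability value x to the number of domain
    elements having that probability. Under Poissonization, each such element
    is seen Poisson(k*x) times independently, so
        E[F_j] = sum_x histogram[x] * Poi(k*x)(j),
    which is linear in the histogram entries: the LP's constraint matrix."""
    return [sum(h * poisson_pmf(k * x, j) for x, h in histogram.items())
            for j in range(1, max_j + 1)]

# Toy histogram: 10 elements of probability 0.05 and 5 of probability 0.1
# (total mass 10*0.05 + 5*0.1 = 1).
hist = {0.05: 10, 0.1: 5}
print(expected_fingerprint(hist, k=20, max_j=3))
```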

Lower Bounds (the dual view)
Find two distributions with very different property values but almost identical fingerprint expectations. Needs: Theorem: close expected fingerprints ⇒ indistinguishable [Raskhodnikova, Ron, Shpilka, Smith '07].

"Roos's Theorem": Generalized Multinomial Distributions
Generalized multinomial distributions include fingerprint distributions, as well as binomial distributions, multinomial distributions, and any sums of such distributions. They appear all over CS, and characterizing them is central to many papers (for example, Daskalakis and Papadimitriou, "Discretized multinomial distributions and Nash equilibria in anonymous games," FOCS 2008).

Distributions of Rare Elements
Applies provided every element is rare, even among k samples. Yields the best known lower bounds for non-trivial one-sided testing problems: Ω(n^{2/3}) for L1 distance, Ω(n^{2/3} m^{1/3}) for independence. Note: it is impossible to confuse an element seen more than log n times with one seen o(1) times, so can we cut off above log n? This suggests these lower bounds are tight to within log n factors. Can we do better?

A Better Central Limit Theorem?
Roos's theorem: fingerprints are like Poissons (provided…). Poissons are a 1-parameter family; Gaussians are a 2-parameter family. New CLT: fingerprints are like Gaussians, provided the variance is high enough in every direction. How to ensure high variance? "Fatten" distributions by adding elements at many different probabilities ⇒ can't be used for one-sided bounds.

Results
All of these testers are linear expressions in the fingerprint.

Duality
The same construction, viewed through LP duality: it yields an estimator when d < ½ and a lower bound when d > ½.

Open Problems
- Dependence on ε (resolved for entropy)
- Beyond additive estimates: "case-by-case optimal"?
- We suspect linear programming is better than linear estimators
- Leveraging these results for non-symmetric properties
- Monotonicity, with respect to different posets
- Practical applications!