Probability and Measure September 2, 2011

Nonparametric Bayesian

Fundamental problem: estimating a distribution from a collection of data E (with X a distribution-valued random variable).

P(X | E) ∝ P(E | X) P(X)

- P(X | E) is the posterior distribution given the evidence E.
- P(E | X) is the likelihood of E given the distribution X.
- P(X) is the prior distribution of X.

For given data E, we don't assume any particular form for X. For example, X need not be Gaussian, or belong to any other particular family of distributions.

P(X | E) ∝ P(E | X) P(X)

As a simple example, consider a set with three elements {1, 2, 3}. Suppose we observe a sequence of occurrences {1, 1, 2, 2, 1, 2, 1, 2, 3, 1, 2, 2, 1, 2, 1}, and we believe that there is an unknown probability distribution X that drives this process. That is, the points are generated independently by sampling the distribution X. The goal is to estimate X. Clearly, one solution is X ≈ (7/15, 7/15, 1/15). This is the empirical estimate. For a small number of data points, this can be biased. Suppose we have a prior distribution on X (a distribution on distributions), say a Dirichlet distribution (with parameter alpha). How does the posterior change?
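To make that question concrete, here is a minimal sketch of the conjugate Dirichlet–multinomial update. The symmetric choice alpha = (1, 1, 1) is a hypothetical prior, not specified on the slide; note how it pulls the rare outcome 3 away from the bare empirical 1/15 toward uniform.

```python
import numpy as np

data = [1, 1, 2, 2, 1, 2, 1, 2, 3, 1, 2, 2, 1, 2, 1]
counts = np.array([data.count(k) for k in (1, 2, 3)])   # [7, 7, 1]

# Empirical (maximum-likelihood) estimate: normalized counts.
print(counts / counts.sum())                  # [0.4667, 0.4667, 0.0667]

# With a Dirichlet(alpha) prior, conjugacy gives a
# Dirichlet(alpha + counts) posterior; its mean is:
alpha = np.array([1.0, 1.0, 1.0])             # hypothetical symmetric prior
posterior_mean = (alpha + counts) / (alpha + counts).sum()
print(posterior_mean)                         # [0.4444, 0.4444, 0.1111]
```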

Nonparametric Bayesian

The goal is to do a similar thing with a more general base set than {1, 2, 3}. For example, with the set of real numbers R, or the unit interval [0, 1], as the base set.

Difficulties:
- How to define a distribution over the set of distributions?
- How to define an integral on the set of distributions?
- How to define convergence on the set of distributions?
- What is the appropriate (mathematical) language?

Probability as an Integral

We all know that, given a density p(x) over R, say a normal density, the probability for the corresponding random variable X to have a value inside an interval I is

P(X ∈ I) = ∫_I p(x) dx.

There are two problems:
- How does one define the integral (existence)? (Much more important.)
- How does one compute the integral?
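For a concrete instance of the computational side, here is a short sketch using SciPy for the standard normal; the interval [-1, 1] is just an illustrative choice.

```python
from scipy.stats import norm
from scipy.integrate import quad

# P(a <= X <= b) for X ~ N(0, 1), via the cumulative distribution function
a, b = -1.0, 1.0
print(norm.cdf(b) - norm.cdf(a))   # ~ 0.6827

# The same probability by numerical quadrature of the density
prob, err = quad(norm.pdf, a, b)
print(prob)                        # ~ 0.6827
```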

Riemann Integral

We divide the interval into small subintervals (a mesh) and compute the Riemann sum. The limit of the Riemann sums as the size of the mesh goes to zero gives us the integral. It can be proved that for a continuous function f, its Riemann integral always exists. What are the shortcomings? What if we can't define a mesh? And even some simple functions (such as the indicator of the rationals, discussed next) don't have Riemann integrals.
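A minimal sketch of the construction just described, for a continuous integrand where the sums do converge as the mesh shrinks (the integrand sin on [0, π] is an arbitrary test case):

```python
import numpy as np

def riemann_sum(f, a, b, n=100_000):
    """Left Riemann sum of f over [a, b] on a uniform mesh of n cells."""
    xs = np.linspace(a, b, n, endpoint=False)  # left endpoint of each cell
    dx = (b - a) / n
    return f(xs).sum() * dx

print(riemann_sum(np.sin, 0.0, np.pi))   # ~ 2.0 (the exact value)
```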

Recall that our goal (an ambitious one) is to do probability on complicated spaces (not just R^n!). In general, it is difficult to carry Riemann's definition over to more general spaces. Even in R^1, there are functions (although not continuous) that one should be able to integrate but can't. For example, consider the function on [0, 1]

F(x) = 1 if x is rational, 0 otherwise.

This function is not integrable in the Riemann sense (why? every mesh cell contains both rationals and irrationals, so the upper sums are all 1 while the lower sums are all 0). Therefore, one can't even talk about ∫_0^1 F(x) dx.
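A small sanity check of that claim (a sketch using SymPy for exact rationality tests, since floating-point numbers are all rational): the Riemann sums depend on which sample point tags each mesh cell, so they have no common limit.

```python
import sympy as sp

def F(x):
    """Indicator of the rationals (Dirichlet's function)."""
    return 1 if x.is_rational else 0

n = 100                   # number of mesh cells on [0, 1]
dx = sp.Rational(1, n)

# Tag each cell [k/n, (k+1)/n) with a rational sample point ...
sum_rational = sum(F(sp.Rational(k, n)) * dx for k in range(n))

# ... and with an irrational point in the same cell (sqrt(2)/(2n) < 1/n).
sum_irrational = sum(F(sp.Rational(k, n) + sp.sqrt(2) / (2 * n)) * dx
                     for k in range(n))

print(sum_rational, sum_irrational)   # 1 and 0: the sums never agree
```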

Modern Approach (Lebesgue Integral)

We want to integrate a (real-valued) function F(x) defined on some (abstract) space X:

F: X → R

Here, it is the range (R) that is the familiar space. The domain can be arbitrary. So if we are going to divide anything, it has to be the range, not the domain. What data do we need to specify on X in order to define the integral?
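To illustrate "divide the range, not the domain", here is a minimal sketch. Lebesgue measure on [0, 1] is approximated by the empirical measure of a fine grid, and the grid size and level count are arbitrary choices; the point is only that the sum runs over slices of the range of F.

```python
import numpy as np

def integral_by_range_partition(F, xs, n_levels=1_000):
    """Approximate the integral of F by slicing the RANGE of F into levels.

    xs is a fine grid of domain samples; the measure of a set is
    approximated by the fraction of grid points it contains."""
    vals = F(xs)
    levels = np.linspace(vals.min(), vals.max(), n_levels + 1)
    total = 0.0
    for lo, hi in zip(levels[:-1], levels[1:]):
        mass = np.mean((vals >= lo) & (vals < hi))  # measure of the preimage
        total += lo * mass
    total += levels[-1] * np.mean(vals == levels[-1])  # top level
    return total

xs = np.linspace(0.0, 1.0, 100_001)
print(integral_by_range_partition(lambda x: x**2, xs))   # ~ 1/3
```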

Measurable Spaces and Measurable Functions

This is the main component of the theory of the Lebesgue integral. It is a very general theory in the following sense: X can be any arbitrary set. On X, two things have been specified:

- Σ: a collection of subsets of X, called its sigma-algebra;
- μ: a measure, which is a function μ: Σ → R≥0 satisfying some properties (such as countable additivity).

If you are given the triple (X, Σ, μ), then you are at least able to talk about ∫_X F dμ.
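A toy instance of the triple (the three-point space and the specific masses below are hypothetical, chosen only to show that the triple is all the data you need). On a finite space, with Σ the power set, the Lebesgue integral reduces to a weighted sum.

```python
# X = {"a", "b", "c"}, Σ = power set of X (implicit here),
# and μ assigns a nonnegative mass to each point.
mu = {"a": 0.5, "b": 0.3, "c": 0.2}          # a probability measure on X

def lebesgue_integral(F, mu):
    """On a finite measure space, the integral of F w.r.t. mu is a weighted sum."""
    return sum(F(x) * mu[x] for x in mu)

values = {"a": 1.0, "b": 2.0, "c": 4.0}
print(lebesgue_integral(lambda x: values[x], mu))   # 0.5*1 + 0.3*2 + 0.2*4 = 1.9
```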