Statistical NLP: Lecture 4


Statistical NLP: Lecture 4
Mathematical Foundations I: Probability Theory
January 17, 2000

Notions of Probability Theory
- Probability theory deals with predicting how likely it is that something will happen.
- The process by which an observation is made is called an experiment or a trial.
- The collection of basic outcomes (or sample points) for our experiment is called the sample space.
- An event is a subset of the sample space.
- Probabilities are numbers between 0 and 1, where 0 indicates impossibility and 1 certainty.
- A probability function/distribution distributes a probability mass of 1 throughout the sample space.
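
As a concrete illustration (not in the original slides), here is a minimal Python sketch of these definitions, assuming a fair six-sided die as the experiment:

```python
# Minimal sketch: a probability function over a finite sample space.
# The sample space is the six outcomes of a fair die; an event is any
# subset of it; P distributes a total probability mass of 1.

from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
P = {outcome: Fraction(1, 6) for outcome in sample_space}  # uniform pmf

def prob(event):
    """Probability of an event, i.e., a subset of the sample space."""
    assert event <= sample_space
    return sum(P[outcome] for outcome in event)

print(prob({2, 4, 6}))   # 1/2 -- the event "an even number comes up"
print(sum(P.values()))   # 1 -- the whole mass is distributed
```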

Conditional Probability and Independence
- Conditional probabilities measure the probability of events given some knowledge: P(A|B) = P(A ∩ B) / P(B).
- Prior probabilities measure the probabilities of events before we consider our additional knowledge.
- Posterior probabilities are probabilities that result from using our additional knowledge.
- The chain rule relates intersection with conditionalization (important to NLP): P(A1 ∩ ... ∩ An) = P(A1) P(A2|A1) ... P(An | A1 ∩ ... ∩ An-1).
- Independence and conditional independence of events are two very important notions in statistics; A and B are independent iff P(A ∩ B) = P(A) P(B).
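
A short sketch of conditional probability and independence, with a two-dice experiment as my own assumed example:

```python
# Minimal sketch: conditional probability P(A|B) = P(A and B) / P(B)
# and independence, over the 36 equally likely outcomes of two dice.

from fractions import Fraction
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))
P = Fraction(1, len(sample_space))  # each pair has probability 1/36

def prob(event):
    return P * sum(1 for outcome in sample_space if event(outcome))

def cond_prob(a, b):
    """P(A|B), the probability of A given that B occurred."""
    return prob(lambda o: a(o) and b(o)) / prob(b)

def first_is_six(o):
    return o[0] == 6

def total_is_seven(o):
    return o[0] + o[1] == 7

# P(A|B) equals P(A) here, so these two events are independent.
print(cond_prob(first_is_six, total_is_seven))  # 1/6
print(prob(first_is_six))                       # 1/6
```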

Bayes’ Theorem
- Bayes’ theorem lets us swap the order of dependence between events: P(B|A) = P(A|B) P(B) / P(A).
- This is important when the former quantity is difficult to determine.
- P(A) is a normalization constant; by the law of total probability, P(A) = P(A|B) P(B) + P(A|¬B) P(¬B).
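
A worked example of the theorem, with assumed numbers for a diagnostic-test setting (not from the lecture):

```python
# Bayes' theorem with assumed numbers: B = "has condition",
# A = "test is positive". We want P(B|A) from P(A|B).

p_b = 0.01              # prior P(B)
p_a_given_b = 0.95      # P(A|B), the test's sensitivity
p_a_given_not_b = 0.05  # P(A|not B), the false-positive rate

# Normalization constant P(A) via the law of total probability.
p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)

# P(B|A) = P(A|B) P(B) / P(A)
p_b_given_a = p_a_given_b * p_b / p_a
print(round(p_b_given_a, 3))  # 0.161 -- still unlikely despite a positive test
```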

Random Variables
- A random variable is a function X: sample space --> R^n.
- A discrete random variable is a function X: sample space --> S, where S is a countable subset of R.
- If X: sample space --> {0,1}, then X is called a Bernoulli trial.
- The probability mass function (pmf) for a random variable X gives the probability that the random variable takes its different numeric values: p(x) = P(X = x).
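
A minimal sketch of a Bernoulli trial and its pmf; the bias theta = 0.7 is an assumed value:

```python
# Minimal sketch: a Bernoulli trial is a random variable mapping the
# sample space {"head", "tail"} to {0, 1}; its pmf is determined by theta.

theta = 0.7  # assumed P(heads)

def X(outcome):
    """The random variable: a function from outcomes to numbers."""
    return 1 if outcome == "head" else 0

def pmf(x):
    """p(x) = P(X = x) for X ~ Bernoulli(theta)."""
    return theta if x == 1 else 1 - theta

print(pmf(X("head")), pmf(X("tail")))  # 0.7 and 0.3 (up to float rounding)
```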

Expectation and Variance
- The expectation is the mean or average of a random variable: E(X) = Σ_x x p(x).
- The variance of a random variable is a measure of whether the values of the random variable tend to be consistent over trials or to vary a lot: Var(X) = E((X - E(X))^2) = E(X^2) - E(X)^2.
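
Both quantities follow directly from the pmf; a short sketch for a fair die:

```python
# Minimal sketch: expectation E(X) = sum_x x p(x) and variance
# Var(X) = E((X - E(X))^2) for the value shown by a fair die.

from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

print(mean)      # 7/2
print(variance)  # 35/12
```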

Joint and Conditional Distributions
- More than one random variable can be defined over a sample space; in this case, we talk about a joint or multivariate probability distribution.
- The joint probability mass function for two discrete random variables X and Y is p(x,y) = P(X=x, Y=y).
- The marginal probability mass function totals up the probability masses for the values of each variable separately: pX(x) = Σ_y p(x,y) and pY(y) = Σ_x p(x,y).
- Similar intersection rules hold for joint distributions as for events; for example, the conditional pmf is pX|Y(x|y) = p(x,y) / pY(y).
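
A sketch with an assumed toy joint pmf, showing marginalization and conditioning:

```python
# Minimal sketch: a joint pmf p(x, y) over two binary random variables,
# its marginals, and a conditional probability. The numbers are made up.

from fractions import Fraction
from collections import defaultdict

joint = {  # p(x, y) = P(X=x, Y=y); the masses sum to 1
    (0, 0): Fraction(1, 10), (0, 1): Fraction(3, 10),
    (1, 0): Fraction(2, 10), (1, 1): Fraction(4, 10),
}

p_x = defaultdict(Fraction)  # marginal pmf of X
p_y = defaultdict(Fraction)  # marginal pmf of Y
for (x, y), p in joint.items():
    p_x[x] += p
    p_y[y] += p

# Conditional pmf: p(x|y) = p(x, y) / p_Y(y)
print(dict(p_x))               # X marginal: 2/5 and 3/5
print(joint[(0, 1)] / p_y[1])  # P(X=0 | Y=1) = 3/7
```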

Estimating Probability Functions
- What is the probability that the sentence “The cow chewed its cud” will be uttered? It is unknown, so P must be estimated from a sample of data.
- An important measure for estimating P is the relative frequency of the outcome, i.e., the proportion of times a certain outcome occurs.
- Assuming that certain aspects of language can be modeled by one of the well-known distributions is called using a parametric approach.
- If no such assumption can be made, we must use a non-parametric approach.
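
A minimal sketch of relative-frequency estimation on a made-up toy corpus:

```python
# Minimal sketch: estimating a probability function by relative
# frequency from a (tiny, made-up) sample of data.

from collections import Counter

corpus = "the cow chewed its cud and the cow slept".split()
counts = Counter(corpus)
n = len(corpus)

# Relative-frequency estimate: P(w) ~ count(w) / N
p_hat = {word: count / n for word, count in counts.items()}

print(p_hat["cow"])                         # 2/9 = 0.2222...
print(abs(sum(p_hat.values()) - 1) < 1e-9)  # True: the estimates form a pmf
```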

Standard Distributions
- In practice, one commonly finds the same basic form of a probability mass function, but with different constants employed.
- Families of pmfs are called distributions, and the constants that define the different possible pmfs in one family are called parameters.
- Discrete distributions: the binomial distribution, with pmf b(k; n, p) = C(n,k) p^k (1-p)^(n-k); the multinomial distribution; the Poisson distribution.
- Continuous distributions: the normal distribution, the standard normal distribution.
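
As an illustration, the binomial pmf can be coded directly from its formula:

```python
# Minimal sketch: the binomial pmf b(k; n, p) = C(n, k) p^k (1-p)^(n-k),
# the probability of k successes in n independent Bernoulli(p) trials.

from math import comb

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 3 heads in 10 tosses of a fair coin.
print(round(binomial_pmf(3, 10, 0.5), 4))  # 0.1172
```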

Bayesian Statistics I: Bayesian Updating
- Assume that the data are coming in sequentially and are independent.
- Given an a priori probability distribution, we can update our beliefs when a new datum comes in by calculating the posterior distribution with Bayes’ theorem; the value that maximizes the posterior is the maximum a posteriori (MAP) estimate.
- The posterior becomes the new prior, and the process repeats on each new datum.
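
A minimal sketch of sequential updating; the Beta prior on the coin's heads-probability is my choice of a convenient (conjugate) prior, not something the slides specify:

```python
# Minimal sketch: Bayesian updating of theta = P(heads) for a coin,
# using a Beta(a, b) prior, which is conjugate to Bernoulli data.
# Prior choice and data are assumed for illustration.

a, b = 1.0, 1.0  # Beta(1, 1): a uniform prior over theta

for datum in [1, 1, 0, 1, 1, 0, 1]:  # sequential, independent observations
    # The posterior after one datum is Beta(a + datum, b + 1 - datum);
    # it then serves as the prior for the next datum.
    a += datum
    b += 1 - datum

posterior_mean = a / (a + b)          # 6 / 9 = 0.667
map_estimate = (a - 1) / (a + b - 2)  # posterior mode, 5 / 7 = 0.714
print(round(posterior_mean, 3), round(map_estimate, 3))
```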

Bayesian Statistics II: Bayesian Decision Theory
- Bayesian statistics can be used to evaluate which model or family of models better explains some data.
- We define two different models of the event and calculate the likelihood ratio between them: P(data | model 1) / P(data | model 2). A ratio greater than 1 favors model 1.
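
A sketch of such a comparison between two assumed Bernoulli models of a coin:

```python
# Minimal sketch: likelihood ratio between two assumed models of a coin,
# a fair one (p = 0.5) and a biased one (p = 0.75), on observed data.

def likelihood(data, p):
    """P(data | model) for independent Bernoulli observations."""
    result = 1.0
    for x in data:
        result *= p if x == 1 else 1 - p
    return result

data = [1, 1, 0, 1, 1, 0, 1, 1]  # 6 heads, 2 tails

ratio = likelihood(data, 0.75) / likelihood(data, 0.5)
print(round(ratio, 3))  # 2.848 > 1: the biased model explains the data better
```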