Statistical NLP Course for Master in Computational Linguistics 2nd Year 2013-2014 Diana Trandabat.


Intro to probabilities

Probability deals with prediction:
– Which word will follow in this …?
– How can parses for a sentence be ranked?
– Which meaning is more likely?
– Which grammar is more linguistically plausible?
– Seeing the phrase “more lies ahead”, how likely is it that “lies” is a noun?
– Seeing “Le chien est noir”, how likely is it that the correct translation is “The dog is black”?

Any rational decision can be described probabilistically.

Notations

Experiment (or trial): a repeatable process by which observations are made, e.g. tossing 3 coins. We observe a basic outcome from the sample space Ω, the set of all possible basic outcomes.

Examples of sample spaces:
– one coin toss: Ω = {H, T}
– three coin tosses: Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
– part-of-speech of a word: Ω = {N, V, Adj, …}
– next word in a Shakespeare play: |Ω| = size of the vocabulary
– number of words in your MSc thesis: Ω = {0, 1, …, ∞}

Notation

An event A is a set of basic outcomes, i.e., a subset of the sample space Ω.
Example:
– Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
– a basic outcome: THH
– an event: “has exactly 2 H’s”, A = {THH, HHT, HTH}
– A = Ω is the certain event: P(Ω) = 1
– A = ∅ is the impossible event: P(∅) = 0
– For “not A” we write Ā (the complement of A)

Intro to probabilities

For a finite sample space whose basic outcomes are equally likely, the probability of an event A is P(A) = |A| / |Ω|.

Intro to probabilities

The true probability of an event is often hard to compute directly. It is easy, however, to compute an estimate of the probability, written p̂(x), as a relative frequency over observations. As the number of observations |X| → ∞, p̂(x) → P(x).
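The convergence of the relative-frequency estimate p̂ toward P can be illustrated with a short simulation. This is a sketch, not part of the original slides; the event (“exactly 2 heads in 3 tosses”, whose true probability is 3/8) and the sample size are chosen for illustration:

```python
import random

random.seed(42)

# Event A: "exactly 2 heads in 3 coin tosses"; the true P(A) is 3/8.
def has_two_heads():
    return sum(random.randint(0, 1) for _ in range(3)) == 2

N = 100_000                                           # number of repeated experiments
p_hat = sum(has_two_heads() for _ in range(N)) / N    # relative-frequency estimate
```

With N = 100,000 trials, p_hat lands very close to the true value 0.375; increasing N shrinks the gap further.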

Intro to probabilities

“A coin is tossed 3 times. What is the likelihood of 2 heads?”
– Experiment: toss a coin three times
– Sample space: Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
– Event: basic outcomes with exactly 2 H’s, A = {THH, HTH, HHT}
– The likelihood of 2 heads is 3 out of 8 possible outcomes: P(A) = 3/8
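The same answer can be checked by exhaustively enumerating the sample space, a minimal sketch of the P(A) = |A| / |Ω| computation above:

```python
from itertools import product

# Sample space for three coin tosses: 2^3 = 8 basic outcomes
omega = list(product("HT", repeat=3))

# Event A: basic outcomes with exactly 2 H's
A = [outcome for outcome in omega if outcome.count("H") == 2]

p_A = len(A) / len(omega)   # |A| / |Omega| = 3/8
```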

Probability distribution

A probability distribution is an assignment of probabilities to a set of outcomes.
– A uniform distribution assigns the same probability to every outcome (e.g. a fair coin).
– A Gaussian distribution assigns probabilities following a bell curve over the outcomes.
– Many other distributions exist.
– Uniform and Gaussian distributions are popular in statistical NLP.

Joint probabilities

The joint probability of two events A and B, written p(A,B), is the probability that both occur: p(A,B) = p(A ∩ B).

Independent events

Two events are independent if: p(a,b) = p(a) · p(b)

Consider a fair die. Intuitively, each side (1, 2, 3, 4, 5, 6) comes up with probability 1/6. Consider the events X = “the number on the die is divisible by 2” and Y = “the number is divisible by 3”.

X = {2, 4, 6}, Y = {3, 6}
p(X) = p(2) + p(4) + p(6) = 1/6 + 1/6 + 1/6 = 3/6 = 1/2
p(Y) = p(3) + p(6) = 2/6 = 1/3
p(X,Y) = p(6) = 1/6 = 1/2 · 1/3 = p(X) · p(Y) ==> X and Y are independent

Conditioned events

Events that are not independent are called conditioned (dependent) events.
p(X|Y) denotes “the probability of X given that the event Y has occurred”:
p(X|Y) = p(X,Y) / p(Y)
p(X) is the a priori probability (the prior); p(X|Y) is the posterior probability.

Conditioned events

Are X and Y independent?
p(X) = 1/2, p(Y) = 1/3, p(X,Y) = 1/6, so p(X|Y) = (1/6) / (1/3) = 1/2 = p(X) ==> independent.

Consider Z, the event “the number on the die is divisible by 4”. Are X and Z independent?
p(Z) = p(4) = 1/6
p(X,Z) = 1/6, so p(X|Z) = p(X,Z) / p(Z) = (1/6) / (1/6) = 1 ≠ 1/2 = p(X) ==> not independent.
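The die example above can be verified mechanically. The following sketch (not from the slides) uses exact fractions to check both the independence of X and Y and the non-independence of X and Z:

```python
from fractions import Fraction

omega = set(range(1, 7))                      # faces of a fair die
X = {s for s in omega if s % 2 == 0}          # divisible by 2: {2, 4, 6}
Y = {s for s in omega if s % 3 == 0}          # divisible by 3: {3, 6}
Z = {s for s in omega if s % 4 == 0}          # divisible by 4: {4}

def p(event):
    """Probability of an event under the uniform distribution on omega."""
    return Fraction(len(event), len(omega))

def p_cond(a, b):
    """Conditional probability p(a|b) = p(a,b) / p(b)."""
    return p(a & b) / p(b)
```

Here p(X & Y) equals p(X) · p(Y) (both 1/6), so X and Y are independent, while p_cond(X, Z) = 1 ≠ p(X) = 1/2, so X and Z are not.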

Bayes’ Theorem

Bayes’ Theorem lets us swap the order of dependence between events:
P(A|B) = P(B|A) · P(A) / P(B)

Example

S: stiff neck, M: meningitis
P(S|M) = 0.5, P(M) = 1/50,000, P(S) = 1/20
I have a stiff neck, should I worry?
P(M|S) = P(S|M) · P(M) / P(S) = 0.5 · (1/50,000) / (1/20) = 1/5,000 = 0.0002
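The meningitis calculation is a direct application of Bayes’ Theorem with the numbers from the slide:

```python
# Given values from the slide
p_s_given_m = 0.5          # P(S|M): stiff neck given meningitis
p_m = 1 / 50_000           # P(M): prior probability of meningitis
p_s = 1 / 20               # P(S): probability of a stiff neck

# Bayes' Theorem: P(M|S) = P(S|M) * P(M) / P(S)
p_m_given_s = p_s_given_m * p_m / p_s   # posterior: 0.0002
```

The posterior 0.0002 (1 in 5,000) is far smaller than P(S|M) = 0.5, which is why a stiff neck alone is little cause for worry.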

Other useful relations

Total probability: p(x) = Σ_{y∈Y} p(x|y) · p(y), or p(x) = Σ_{y∈Y} p(x,y)

Chain rule:
p(x1, x2, …, xn) = p(x1) · p(x2|x1) · p(x3|x1,x2) · … · p(xn|x1, x2, …, xn-1)

The proof is easy, through successive reductions. Let y stand for the joint event (x1, x2, …, xn-1):
p(x1, x2, …, xn) = p(y, xn) = p(y) · p(xn|y) = p(x1, …, xn-1) · p(xn|x1, …, xn-1)
Similarly, letting z stand for (x1, …, xn-2):
p(x1, …, xn-1) = p(z, xn-1) = p(z) · p(xn-1|z) = p(x1, …, xn-2) · p(xn-1|x1, …, xn-2)
…
p(x1, x2, …, xn) = p(x1) · p(x2|x1) · p(x3|x1,x2) · … · p(xn|x1, …, xn-1)

The chain rule is the basis of bigram, trigram, and n-gram language models.
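The chain rule with a first-order Markov approximation (each word conditioned only on its predecessor) gives the bigram model mentioned on the slide. The following sketch scores a sentence under a bigram model; the toy corpus and all counts are hypothetical, chosen only to make the example self-contained:

```python
from collections import Counter

# Hypothetical toy corpus for illustration
corpus = "the dog is black . the dog barks .".split()

unigrams = Counter(corpus)                    # word counts
bigrams = Counter(zip(corpus, corpus[1:]))    # adjacent word-pair counts

def p_bigram(w, prev):
    """Estimate p(w | prev) as count(prev, w) / count(prev)."""
    return bigrams[(prev, w)] / unigrams[prev]

def p_sentence(words):
    """Chain rule with bigram approximation:
    p(w1..wn) ~= p(w1) * product of p(wi | wi-1)."""
    prob = unigrams[words[0]] / len(corpus)   # p(w1) from unigram frequency
    for prev, w in zip(words, words[1:]):
        prob *= p_bigram(w, prev)
    return prob

p = p_sentence(["the", "dog", "is", "black"])
```

On this corpus, p = (2/9) · 1 · (1/2) · 1 = 1/9; real systems add smoothing so unseen bigrams do not force the product to zero.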

Objections

People don’t compute probabilities. Why would computers? Or do they?
“John went to …” — of candidate continuations such as “the market”, “go”, “red”, “if”, “number”, speakers effortlessly judge some far more likely than others.

Objections

Statistics only count words and co-occurrences. But these are two different concepts:
– statistical model and statistical method
The first doesn’t need the second: a person who uses intuition to reason is using a statistical model without statistical methods. The objections refer mainly to the accuracy of statistical models.