Statistical NLP Course for Master in Computational Linguistics 2nd Year 2015-2016 Diana Trandabat

Intro to probabilities

Probability deals with prediction:
– Which word will follow in this …?
– How can the parses of a sentence be ranked?
– Which meaning is more likely?
– Which grammar is more linguistically plausible?
– We see the phrase “more lies ahead”. How likely is it that “lies” is a noun?
– We see “Le chien est noir”. How likely is it that the correct translation is “The dog is black”?

Any rational decision can be described probabilistically.

Notations

Experiment (or trial) – a repeatable process by which observations are made, e.g. tossing 3 coins.
We observe a basic outcome from the sample space Ω, the set of all possible basic outcomes.
Examples of sample spaces:
– one coin toss: Ω = {H, T}
– three coin tosses: Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
– part-of-speech of a word: Ω = {N, V, Adj, …}
– next word in a Shakespeare play: |Ω| = size of the vocabulary
– number of words in your MSc thesis: Ω = {0, 1, …, ∞}

Notation

An event A is a set of basic outcomes, i.e., a subset of the sample space Ω.
Example:
– Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
– basic outcome: THH
– event “has exactly 2 H’s”: A = {THH, HHT, HTH}
– A = Ω is the certain event: P(Ω) = 1
– A = ∅ is the impossible event: P(∅) = 0
– For “not A”, we write Ā

Intro to probabilities

Intro to probabilities

“A coin is tossed 3 times. What is the likelihood of 2 heads?”
– Experiment: toss a coin three times
– Sample space: Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
– Event A: basic outcomes with exactly 2 H’s, A = {THH, HTH, HHT}
– The likelihood of 2 heads is 3 out of 8 possible outcomes: P(A) = 3/8
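The count above can be checked mechanically. A minimal Python sketch (not part of the slides) that enumerates the sample space and the favourable outcomes:

```python
from itertools import product

# Sample space for three coin tosses: all strings over {H, T} of length 3.
omega = [''.join(t) for t in product('HT', repeat=3)]
assert len(omega) == 8

# Event A: outcomes with exactly two heads.
A = [o for o in omega if o.count('H') == 2]
p_A = len(A) / len(omega)
print(sorted(A), p_A)  # ['HHT', 'HTH', 'THH'] 0.375
```

The same enumeration pattern scales to any finite sample space where outcomes are equally likely.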

Probability distribution

A probability distribution is an assignment of probabilities to a set of outcomes.
– A uniform distribution assigns the same probability to all outcomes (e.g. a fair coin).
– A Gaussian distribution assigns a bell curve over the outcomes.
– Many others exist.
– Uniform and Gaussian distributions are popular in statistical NLP.
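As an illustration (not part of the slides), a short Python sketch contrasting the two distributions by sampling — uniform outcomes occur equally often, while Gaussian samples cluster around the mean:

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the sketch is reproducible

# Uniform distribution over a fair coin: both outcomes equally likely.
coin = Counter(random.choice("HT") for _ in range(10_000))

# Gaussian distribution: samples concentrate near the mean (bell curve).
gauss = [random.gauss(mu=0.0, sigma=1.0) for _ in range(10_000)]
near_mean = sum(abs(g) < 1.0 for g in gauss) / len(gauss)

print(coin)       # roughly 5000 H and 5000 T
print(near_mean)  # roughly 0.68 (fraction within one standard deviation)
```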

Joint probabilities

Probabilities as sets

P(A|B) = P(A∩B) / P(B)
P(A∩B) = P(A|B) * P(B)
P(B|A) = P(A∩B) / P(A)
P(B∩A) = P(A∩B) = P(B|A) * P(A) = P(A|B) * P(B)
(Venn diagram: A, A ∩ B, B)
Multiplication rule

Probabilities as sets

P(A) = P(A∩B) + P(A∩B̄)
P(A) = P(A|B) * P(B) + P(A|B̄) * P(B̄)
(Venn diagram: A, A ∩ B, B)
Additivity rule
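Both rules can be verified exactly on a small example. A sketch (the fair-die events are an assumption for illustration): A = even numbers, B = numbers divisible by 3, with exact rational arithmetic:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # fair die

def P(E):
    return Fraction(len(E), len(omega))  # uniform probability over omega

A = {2, 4, 6}       # even
B = {3, 6}          # divisible by 3
B_bar = omega - B   # complement of B

# Multiplication rule: P(A ∩ B) = P(A|B) * P(B)
p_A_given_B = P(A & B) / P(B)
assert P(A & B) == p_A_given_B * P(B)

# Additivity rule: P(A) = P(A|B)P(B) + P(A|B̄)P(B̄)
p_A_given_Bbar = P(A & B_bar) / P(B_bar)
p_total = p_A_given_B * P(B) + p_A_given_Bbar * P(B_bar)
assert P(A) == p_total
print(p_total)  # 1/2
```

Using `Fraction` keeps every probability exact, so the identities hold with `==` rather than up to floating-point error.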

Bayes’ Theorem

Bayes’ Theorem lets us swap the order of dependence between events.
We saw that P(B|A) = P(A∩B) / P(A).
Bayes’ Theorem: P(B|A) = P(A|B) * P(B) / P(A)
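A quick numeric check of the theorem, using two fair-die events as an illustration (the events are an assumption, not from the slides):

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # fair die

def P(E):
    return Fraction(len(E), len(omega))

A = {2, 4, 6}  # even
B = {3, 6}     # divisible by 3

# Bayes' theorem: P(B|A) = P(A|B) * P(B) / P(A)
p_A_given_B = P(A & B) / P(B)
p_B_given_A = p_A_given_B * P(B) / P(A)

# Matches the direct definition P(B|A) = P(A ∩ B) / P(A).
assert p_B_given_A == P(A & B) / P(A)
print(p_B_given_A)  # 1/3
```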

Independent events

Two events are independent if: P(A,B) = P(A) * P(B)
Consider a fair die. Intuitively, each side (1, 2, 3, 4, 5, 6) has a 1/6 chance of appearing.
Consider the events X, “the number on the die is divisible by 2”, and Y, “the number is divisible by 3”.
X = {2, 4, 6}, Y = {3, 6}
p(X) = p(2) + p(4) + p(6) = 1/6 + 1/6 + 1/6 = 3/6 = 1/2
p(Y) = p(3) + p(6) = 1/6 + 1/6 = 2/6 = 1/3
p(X,Y) = p(6) = 1/6 = 1/2 * 1/3 = p(X) * p(Y) ==> X and Y are independent

Independent events

Consider Z, the event “the number on the die is divisible by 4”.
Are X and Z independent?
p(Z) = p(4) = 1/6
p(X,Z) = 1/6
p(X|Z) = p(X,Z) / p(Z) = (1/6) / (1/6) = 1 ≠ 1/2 = p(X) ==> X and Z are not independent
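The two checks above can be sketched as one small Python program (not from the slides) that tests P(E∩F) = P(E)·P(F) directly:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # fair die

def P(E):
    return Fraction(len(E), len(omega))

def independent(E, F):
    # Definition of independence: P(E ∩ F) == P(E) * P(F)
    return P(E & F) == P(E) * P(F)

X = {2, 4, 6}  # divisible by 2
Y = {3, 6}     # divisible by 3
Z = {4}        # divisible by 4

print(independent(X, Y))  # True:  1/6 == 1/2 * 1/3
print(independent(X, Z))  # False: 1/6 != 1/2 * 1/6
```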

Other useful relations:

p(x) = Σ_{y∈Y} p(x|y) * p(y)   or   p(x) = Σ_{y∈Y} p(x,y)

Chain rule:
p(x₁, x₂, …, xₙ) = p(x₁) * p(x₂|x₁) * p(x₃|x₁,x₂) * … * p(xₙ|x₁, …, xₙ₋₁)

The proof is easy, by successive reductions.
Consider the event y as the conjunction of the events x₁, x₂, …, xₙ₋₁:
p(x₁, …, xₙ) = p(y, xₙ) = p(y) * p(xₙ|y) = p(x₁, …, xₙ₋₁) * p(xₙ|x₁, …, xₙ₋₁)
Similarly, for the event z = (x₁, …, xₙ₋₂):
p(x₁, …, xₙ₋₁) = p(z, xₙ₋₁) = p(z) * p(xₙ₋₁|z) = p(x₁, …, xₙ₋₂) * p(xₙ₋₁|x₁, …, xₙ₋₂)
…
p(x₁, x₂, …, xₙ) = p(x₁) * p(x₂|x₁) * p(x₃|x₁,x₂) * … * p(xₙ|x₁, …, xₙ₋₁)

The first factor is the prior; conditioning on one or two previous items gives the bigram and trigram models, and in general the n-gram model.
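The chain-rule factorisation can be verified on data. A minimal sketch (the toy corpus is an assumption for illustration) that estimates probabilities by relative frequency and checks that p(w₁)·p(w₂|w₁)·p(w₃|w₁,w₂) reproduces the joint estimate:

```python
from collections import Counter

# Toy corpus of word triples; probabilities estimated by relative frequency.
corpus = [
    ("the", "dog", "barks"),
    ("the", "dog", "sleeps"),
    ("the", "cat", "sleeps"),
    ("a", "dog", "barks"),
]
n = len(corpus)
joint = Counter(corpus)            # counts of (w1, w2, w3)
uni = Counter(t[:1] for t in corpus)   # counts of (w1,)
bi = Counter(t[:2] for t in corpus)    # counts of (w1, w2)

for t, c in joint.items():
    p_joint = c / n
    # Chain rule: p(w1) * p(w2|w1) * p(w3|w1,w2)
    p_chain = (uni[t[:1]] / n) * (bi[t[:2]] / uni[t[:1]]) * (c / bi[t[:2]])
    assert abs(p_joint - p_chain) < 1e-12
print("chain rule holds on all observed triples")
```

The intermediate counts cancel telescopically, which is exactly the successive-reduction argument on the slide.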

Objections

People don’t compute probabilities. Why would computers? Or do they?
“John went to …” – candidate continuations: “the market”, “go”, “red”, “if”, “number”. Some feel far more likely than others.

Objections

Statistics only counts words and co-occurrences.
These are two different concepts:
– a statistical model and a statistical method.
The first does not need the second: a person who uses intuition to reason is using a statistical model without statistical methods.
The objections refer mainly to the accuracy of statistical models.

Reference

Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing

Great! P(See you next time) = …