
Albert Gatt Corpora and statistical methods

In this lecture
Overview of the rules of probability:
- multiplication rule
- subtraction rule
Probability based on prior knowledge:
- conditional probability
- Bayes' theorem

Conditional probability and independence Part 1

Prior knowledge
Sometimes, an estimate of the probability of something is affected by what is already known (cf. the many linguistic examples in Jurafsky).
Example: part-of-speech tagging
- Task: assign a label indicating the grammatical category to every word in a corpus of running text
- One of the classic tasks in statistical NLP

Part-of-speech tagging example
Statistical POS taggers are first trained on data that has been previously annotated. This yields a language model. Language models vary based on the n-gram window:
- unigrams: probability based on single tokens (a lexicon)
  E.g. input = the_DET tall_ADJ man_NN
  The model represents the probability that the word man is a noun (NB: it could also be a verb).
- bigrams: probabilities across a span of 2 words
  Input = the_DET tall_ADJ man_NN
  The model represents the probability that a DET is followed by an adjective, that an adjective is followed by a noun, etc.
- Can also use trigrams, quadrigrams, etc.

POS tagging continued
Suppose we've trained a tagger on annotated data. It has:
- a lexicon of unigrams: P(the=DET), P(man=NN), etc.
- a bigram model: P(DET is followed by ADJ), etc.
Assume we've trained it on a large input sample. We now feed it a new phrase: the audacious alien.
Our tagger knows that the word the is a DET, but it's never seen the other words. It can:
- make a wild guess (not very useful!)
- estimate the probability that the is followed by an ADJ, and that an ADJ is followed by a NOUN
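The training step described above can be sketched in a few lines. This is a minimal illustration with an invented toy corpus, not a real tagger: the word/tag pairs are hypothetical.

```python
from collections import Counter

# Toy annotated corpus of (word, tag) pairs -- hypothetical training data
corpus = [("the", "DET"), ("tall", "ADJ"), ("man", "NN"),
          ("a", "DET"), ("simple", "ADJ"), ("plan", "NN"),
          ("the", "DET"), ("man", "NN")]

tags = [tag for _, tag in corpus]
unigram_counts = Counter(tags)                  # counts of each tag
bigram_counts = Counter(zip(tags, tags[1:]))    # counts of adjacent tag pairs

# P(next tag is ADJ | current tag is DET) = count(DET, ADJ) / count(DET)
p_adj_given_det = bigram_counts[("DET", "ADJ")] / unigram_counts["DET"]
print(p_adj_given_det)
```

With more data, the same counts would give the tagger exactly the bigram probabilities it needs for unseen words like audacious.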

Prior knowledge revisited
Given that I know that the is a DET, what's the probability that the following word audacious is an ADJ?
This is very different from asking what the probability is that audacious is an ADJ out of context. We have prior knowledge that a DET has occurred. This can significantly change the estimate of the probability that audacious is an ADJ.
We therefore distinguish:
- prior probability: "naïve" estimate based on long-run frequency
- posterior probability: probability estimate based on prior knowledge

Conditional probability
In our example, we were estimating:
- P(ADJ|DET) = probability of ADJ given DET
- P(NN|DET) = probability of NN given DET
- etc.
In general: the conditional probability P(A|B) is the probability that A occurs, given that we know that B has occurred.

Example continued
If I've just seen a DET, what's the probability that my next word is an ADJ? We need to take into account:
- occurrences of ADJ in our training data: VV+ADJ (was beautiful), PP+ADJ (with great concern), DET+ADJ, etc.
- occurrences of DET in our training corpus: DET+N (the man), DET+V (the loving husband), DET+ADJ (the tall man)

Venn diagram representation of the bigram training data
[Figure: two overlapping circles. One circle holds cases where w is an ADJ not preceded by a DET (is+tall, in+terrible, were+nice); the other holds cases where w is a DET not followed by an ADJ (the+man, the+woman, a+road); the intersection holds cases where a DET is followed by an ADJ (the+tall, a+simple, an+excellent).]

Estimation of conditional probability
Intuition: P(A|B) is the ratio of the chances that both A and B happen to the chances of B happening alone:
P(A|B) = P(A & B) / P(B)
In our example:
P(ADJ|DET) = P(DET+ADJ) / P(DET)

Another example
If we throw a die, what's the probability that the number we get is even, given that the number we get is larger than 4? This works out as the probability of getting the number 6:
P(even|>4) = P(even & >4) / P(>4) = (1/6) / (2/6) = 1/2 = 0.5
Note the difference from the simple, prior probability: using only frequency, P(6) = 1/6.
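The die example can be checked by enumerating the sample space directly, a sketch in Python using exact fractions:

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]  # sample space of one die throw

# P(>4) = 2/6 and P(even & >4) = 1/6, counted from the sample space
p_gt4 = Fraction(sum(1 for o in outcomes if o > 4), len(outcomes))
p_even_and_gt4 = Fraction(sum(1 for o in outcomes if o > 4 and o % 2 == 0),
                          len(outcomes))

# Conditional probability as a ratio: P(even | >4) = P(even & >4) / P(>4)
p_even_given_gt4 = p_even_and_gt4 / p_gt4
print(p_even_given_gt4)  # 1/2
```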

Mind the fallacies!
When we speak of "prior" and "posterior", we don't necessarily mean "in time" (e.g. the die example).
Monte Carlo fallacy: if 20 turns of the roulette wheel have fallen on black, what are the chances that the next turn will fall on red?
- In reality, prior experience here makes no difference at all.
- Every turn of the wheel is independent of every other.

The multiplication rule

Multiplying probabilities
Often, we're interested in switching the conditional probability estimate around. Suppose we know P(A|B) or P(B|A); we want to calculate P(A AND B).
For both A and B to occur, they must occur in some sequence (first A occurs, then B).

Estimating P(A AND B)
P(A & B) = P(A) × P(B|A)
where:
- P(A & B) = probability that both A and B occur
- P(A) = probability of A happening overall
- P(B|A) = probability of B happening given that A has happened

Multiplication rule: example 1
We have a standard deck of 52 cards. What's the probability of pulling out two aces in a row? (NB: a standard deck has 4 aces.)
Let A1 stand for "an ace on the first pick", A2 for "an ace on the second pick". We're interested in P(A1 AND A2).

Example 1 continued
P(A1 AND A2) = P(A1) × P(A2|A1)
P(A1) = 4/52 (since there are 4 aces in a 52-card pack)
If we do pick an ace on the first pick, then we diminish the odds of picking a second ace (there are now 3 aces left in a 51-card pack):
P(A2|A1) = 3/51
Overall: P(A1 AND A2) = (4/52) × (3/51) ≈ 0.0045

Example 2
We randomly pick two words, w1 and w2, out of a tagged corpus. What are the chances that both words are adjectives?
- Let ADJ be the set of all adjectives in the corpus (tokens, not types); |ADJ| = total number of adjectives.
- A1 = the event of picking out an ADJ on the first try
- A2 = the event of picking out an ADJ on the second try
P(A1 AND A2) is estimated in the same way as in the previous example: in the event of A1, the chances of A2 are diminished, and the multiplication rule takes this into account.

Some observations
In these examples, the two events are not independent of each other: the occurrence of one affects the likelihood of the other (e.g. drawing an ace first diminishes the likelihood of drawing a second ace). This is sampling without replacement.
If we put the ace back into the pack after we've drawn it, then we have sampling with replacement. In this case, the probability of one event doesn't affect the probability of the other.

Extending the multiplication rule
The logic of the "A AND B" rule is:
- both conditions, A and B, have to be met
- A is met a fraction of the time
- B is met a fraction of the times that A is met
This can be extended indefinitely. E.g. the chances of drawing 4 straight aces from a pack:
P(A1 & A2 & A3 & A4) = P(A1) × P(A2|A1) × P(A3|A1 & A2) × P(A4|A1 & A2 & A3)
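The four-ace chain above works out mechanically: after k aces have been drawn, 4-k aces remain in a pack of 52-k cards. A short sketch:

```python
from fractions import Fraction
from math import prod

# Chain rule for 4 straight aces, sampling without replacement:
# P(A1)P(A2|A1)P(A3|A1&A2)P(A4|A1&A2&A3)
factors = [Fraction(4 - k, 52 - k) for k in range(4)]
p_four_aces = prod(factors, start=Fraction(1))
print(p_four_aces)  # 1/270725
```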

The subtraction rule

Extending the addition rule It’s easy to extend the multiplication rule. Extending the addition rule isn’t so easy. We need to correct for double-counting events.

Example: P(A OR B OR C)
[Figure: three-way Venn diagram of sets A, B and C.]
P(A or B or C) = P(A) + P(B) + P(C) − P(A & B) − P(A & C) − P(B & C) + P(A & B & C)
Once we've discounted the 2-way intersections of A and B, A and C, and B and C, we need to re-add the 3-way intersection!

Subtraction rule
Fundamental underlying observation: P(A) = 1 − P(not A)
E.g. the probability of getting at least one head in 3 flips of a coin (a three-set addition problem) can be estimated using the observation that:
P(at least one head in 3 flips) = 1 − P(no heads) = 1 − P(3 tails) = 1 − (1/2)³ = 7/8
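The coin-flip example is small enough to check both ways, counting "at least one head" directly and via the subtraction rule:

```python
from itertools import product

# All 2^3 = 8 equally likely outcomes of three coin flips
flips = list(product("HT", repeat=3))

# Direct count of "at least one head"
p_direct = sum(1 for f in flips if "H" in f) / len(flips)

# Subtraction rule: 1 - P(no heads) = 1 - P(TTT)
p_subtract = 1 - (1 / 2) ** 3

print(p_direct, p_subtract)  # 0.875 0.875
```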

Bayes’ theorem Part 4

Switching conditional probabilities
Problem 1: We know the probability that a test will give us "positive" in case a person has a disease. We want to know the probability that there is indeed a disease, given that the test says "positive". Useful for finding "false positives".
Problem 2: We know the probability P(ADJ|DET) that some word w2 is an ADJ, given that the previous word w1 is a DET. We find a new word w'. We don't know its category; it might be a DET. We do know that the following word is an ADJ. We would therefore like to know the "reverse", i.e. P(DET|ADJ).

Deriving Bayes’ rule from the multiplication rule Given symmetry of intersection, multiplication rule can be written in two ways: Bayes’ rule involves the substitution of one equation into the other, to replace P(A and B)

Deriving P(A)
Often, it’s not clear where P(A) should come from: we start out from conditional probabilities! Given that we have two sets of outcomes of interest, A and B, P(A) can be derived from the following observation:
P(A) = P(A & B) + P(A & not-B)
i.e. the events in A are made up of those which are only in A (but not in B) and those which are in both A and B.

Finding P(A) -- I
[Figure: Venn diagram of sets A and B.]
Every outcome in A must be either in A & B or in A & not-B, since A is composed of these two sets.

Finding P(A) -- II
Step 1: Applying the addition rule:
P(A) = P(A & B) + P(A & not-B) = P(A|B) × P(B) + P(A|not-B) × P(not-B)
Step 2: Substituting into Bayes’ equation to replace P(A):
P(B|A) = P(A|B) × P(B) / [P(A|B) × P(B) + P(A|not-B) × P(not-B)]
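The two steps can be run on the disease-test scenario of Problem 1. The numbers below are invented purely for illustration; read A as "test positive" and B as "has the disease":

```python
# Hypothetical figures, invented for illustration only
p_b = 0.01             # P(B): prior probability of having the disease
p_a_given_b = 0.95     # P(A|B): probability of a positive test given disease
p_a_given_notb = 0.10  # P(A|not-B): false-positive rate of the test

# Step 1 (addition rule): P(A) = P(A|B)P(B) + P(A|not-B)P(not-B)
p_a = p_a_given_b * p_b + p_a_given_notb * (1 - p_b)

# Step 2 (Bayes' rule): P(B|A) = P(A|B)P(B) / P(A)
p_b_given_a = p_a_given_b * p_b / p_a
print(round(p_b_given_a, 3))  # 0.088
```

Note how a reliable-looking test combined with a rare disease yields a small posterior: most positives are false positives, exactly the situation Problem 1 was about.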

Summary
This ends our first foray into the rules of probability:
- addition rule
- subtraction and multiplication rules
- conditional probability
- Bayes’ theorem

Next up…
- Probability distributions
- Random variables
- Basic information theory