Methods in Computational Linguistics II Queens College Lecture 2: Counting Things.


Methods in Computational Linguistics II Queens College Lecture 2: Counting Things

Overview The role of probability and statistics in computational linguistics. Basics of probability. The nltk frequency distribution –how would this be implemented?

Role of probability in CL Empirical evaluation of linguistic hypotheses. Data analysis. Modeling communicative phenomena. “Computational Linguistics” and “Natural Language Processing”.

What is a probability? A degree of belief in a proposition. The likelihood of an event occurring. Probabilities range between 0 and 1. The probabilities of a set of mutually exclusive, exhaustive events sum to 1.

Random Variables A discrete random variable is a function that –takes discrete values from a countable domain and –maps them to a number between 0 and 1. –Example: Weather is a discrete (propositional) random variable with domain {sunny, rain, cloudy, snow}. sunny is an abbreviation for Weather = sunny. P(Weather=sunny)=0.72, P(Weather=rain)=0.1, etc. Can be written: P(sunny)=0.72, P(rain)=0.1, etc. Domain values must be exhaustive and mutually exclusive. Other types of random variables: –A Boolean random variable has the domain {true, false}, e.g., Cavity (a special case of a discrete random variable). –A continuous random variable has the domain of real numbers, e.g., Temperature.

Propositions Elementary propositions are constructed by assignment of a value to a random variable: –e.g., Weather = sunny, Cavity = false (abbreviated as ¬cavity) Complex propositions are formed from elementary propositions and standard logical connectives –e.g., Weather = sunny ∨ Cavity = false

Atomic Events Atomic event: –A complete specification of the state of the world about which the agent is uncertain –E.g., if the world consists of only two Boolean variables Cavity and Toothache, then there are 4 distinct atomic events: Cavity = false ∧ Toothache = false, Cavity = false ∧ Toothache = true, Cavity = true ∧ Toothache = false, Cavity = true ∧ Toothache = true. Atomic events are mutually exclusive and exhaustive.

Events and the Universe The universe consists of atomic events. An event is a set of atomic events. P: event → [0,1] Axioms of Probability –P(true) = 1 = P(U) –P(false) = 0 = P(∅) –P(A ∨ B) = P(A) + P(B) – P(A ∧ B) [Venn diagram: events A and B inside the universe U, with overlap A ∧ B.]

Some Axioms of Probability 0 ≤ P(A) ≤ 1 P(true) = 1 = P(U) P(false) = 0 = P(Ø) P(A ∨ B) = P(A) + P(B) – P(A ∧ B)

Prior Probability Prior (unconditional) probability –corresponds to belief prior to arrival of any (new) evidence –P(sunny)=0.72, P(rain)=0.1, etc. A probability distribution gives values for all possible assignments: –Vector notation: Weather is one of ⟨sunny, rain, cloudy, snow⟩ –P(Weather) = ⟨0.72, 0.1, 0.08, 0.1⟩ –Sums to 1 over the domain
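
A minimal sketch (not from the original slides) of representing such a prior distribution in Python; the cloudy and snow values are assumed for illustration, and the entries over an exhaustive, mutually exclusive domain must sum to 1:

    # Prior distribution over Weather as a plain dict (illustrative sketch).
    # 0.72 and 0.1 come from the slide; the cloudy and snow values are assumed.
    prior_weather = {"sunny": 0.72, "rain": 0.1, "cloudy": 0.08, "snow": 0.1}

    assert abs(sum(prior_weather.values()) - 1.0) < 1e-9  # distribution sums to 1
    print(prior_weather["sunny"])  # 0.72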

Joint Probability Probability assignment to all combinations of values of random variables. The sum of the entries in this table has to be 1. Every question about a domain can be answered by the joint distribution. The probability of a proposition is the sum of the probabilities of the atomic events in which it holds: –P(cavity) = 0.1 [add the elements of the cavity row] –P(toothache) = 0.05 [add the elements of the toothache column]

              Toothache   ¬Toothache
    Cavity       0.04        0.06
    ¬Cavity      0.01        0.89
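
As a small illustrative sketch (using the joint-table values reconstructed above), the joint distribution can be stored as a dict keyed by value pairs, and marginals are obtained by summing over the other variable:

    # Joint distribution over (Cavity, Toothache); cell values as in the table above.
    joint = {
        (True, True): 0.04,    # cavity and toothache
        (True, False): 0.06,   # cavity, no toothache
        (False, True): 0.01,   # no cavity, toothache
        (False, False): 0.89,  # neither
    }

    # Marginals: sum the probabilities of the atomic events in which a proposition holds.
    p_cavity = sum(p for (cav, _), p in joint.items() if cav)         # ~0.10
    p_toothache = sum(p for (_, tooth), p in joint.items() if tooth)  # ~0.05
    print(p_cavity, p_toothache)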

Joint Probability Table How could we calculate P(A)? –Add up P(A ∧ B) and P(A ∧ ¬B). Same for P(B). How about P(A ∨ B)? –Two options… –We can read P(A ∧ B) from the chart and find P(A) and P(B): P(A ∨ B) = P(A) + P(B) − P(A ∧ B). –Or just add up the proper three cells of the table. [2×2 table with rows A, ¬A and columns B, ¬B; each cell contains the ‘joint’ probability of both occurring, e.g. P(A ∧ B).]

Conditional Probability P(cavity)=0.1 and P(cavity ∧ toothache)=0.04 are both prior (unconditional) probabilities. Once the agent has new evidence concerning a previously unknown random variable, e.g., toothache, we can specify a posterior (conditional) probability –e.g., P(cavity | toothache) P(A | B) = P(A ∧ B) / P(B) [the probability of A with the universe limited to B] P(cavity | toothache) = 0.04 / 0.05 = 0.8 [Venn diagram: A = Cavity = true, B = Toothache = true, overlap A ∧ B inside the universe U; joint table with Toothache/¬Toothache columns and Cavity/¬Cavity rows.]
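
A quick sketch of the same calculation in code, reusing the assumed joint-table values from above:

    # P(cavity | toothache) = P(cavity AND toothache) / P(toothache)
    joint = {(True, True): 0.04, (True, False): 0.06,
             (False, True): 0.01, (False, False): 0.89}
    p_toothache = joint[(True, True)] + joint[(False, True)]  # 0.05
    p_cavity_given_toothache = joint[(True, True)] / p_toothache
    print(round(p_cavity_given_toothache, 2))  # 0.8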

Review of Notation What do these notations mean? A: a Boolean random variable. P(A): unconditional probability; the notation P(A) is a shortcut for P(A = true). P(¬A): the probability that A is false, 1 − P(A). P(A ∨ B): the probability of A or B, P(A) + P(B) − P(A ∧ B). P(A ∧ B): joint probability, the probability of A and B together. P(A | B): the probability of A given that we know B is true. H: a non-Boolean random variable. P(H = h): the probability that H has the value h.

Product Rule P(A ∧ B) = P(A|B) P(B), or equivalently P(A|B) = P(A ∧ B) / P(B). So, if we can find two of these values someplace (in a chart, from a word problem), then we can calculate the third one.

Using the Product Rule When there’s a fire, there’s a 99% chance that the alarm will go off: P(A | F) = 0.99. On any given day, there is some small chance of a fire starting in your house: P(F). What’s the chance of there being a fire and your alarm going off tomorrow? P(A ∧ F) = P(A | F) * P(F)
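
A small sketch of this calculation, assuming a hypothetical value for P(F) since the slide's exact figure is not shown here:

    p_alarm_given_fire = 0.99  # from the slide: 99% chance the alarm goes off given a fire
    p_fire = 1 / 10000         # assumed illustrative value for P(F)
    p_alarm_and_fire = p_alarm_given_fire * p_fire  # product rule: P(A AND F) = P(A|F) * P(F)
    print(p_alarm_and_fire)    # 9.9e-05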

Conditioning Sometimes we call the 2nd form of the product rule the “conditioning rule” because we can use it to calculate a conditional probability from a joint probability and an unconditional one: P(A|B) = P(A ∧ B) / P(B)

Conditioning Problem Out of the 1 million words in some corpus, we know that 9100 of those words are “to” being used as a PREPOSITION: P(PREP ∧ “to”) = 0.0091. Further, we know that 2.53% of all the words that appear in the whole corpus are the word “to”: P(“to”) = 0.0253. If we are told that some particular word in a sentence is “to” but we need to guess what part of speech it is, what is the probability the word is a PREPOSITION? What is P(PREP | “to”)? Just calculate: P(PREP|“to”) = P(PREP ∧ “to”) / P(“to”)
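
The same calculation as a short sketch, using the numbers given on the slide:

    p_prep_and_to = 9100 / 1_000_000  # P(PREP AND "to") = 0.0091
    p_to = 0.0253                     # P("to"): 2.53% of all tokens are "to"
    p_prep_given_to = p_prep_and_to / p_to
    print(round(p_prep_given_to, 2))  # about 0.36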

Marginalizing If we are told only joint probabilities involving a variable value H=h, is there a way to calculate the unconditional probability of H=h? Yes, when we’re told the joint probabilities involving every single value of the other variable…

Marginalizing Problem We have an AI weather forecasting program. We tell it the following information about this weekend… We want it to tell us the chance of rain. The probability that there will be rain and lightning is P(rain=true ∧ lightning=true) = 0.23. The probability that there will be rain and no lightning is P(rain=true ∧ lightning=false) = 0.14. What’s the probability that there will be rain, P(rain=true)? Lightning is only ever true or false, so P(rain=true) = 0.23 + 0.14 = 0.37

Chain Rule Is there a way to calculate a really big joint probability if we know lots of different conditional probabilities? P(f1 ∧ f2 ∧ f3 ∧ f4 ∧ … ∧ fn-1 ∧ fn) = P(f1) * P(f2 | f1) * P(f3 | f1 ∧ f2) * P(f4 | f1 ∧ f2 ∧ f3) * ... * P(fn | f1 ∧ f2 ∧ f3 ∧ f4 ∧ ... ∧ fn-1) You can derive this using repeated substitution of the “Product Rule”: P(A ∧ B) = P(A|B) P(B)
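
A sketch of the chain rule as code, assuming we are handed a hypothetical helper cond_prob(event, given) that returns P(event | given) for any prefix of earlier events (the slides do not define such a function):

    def chain_rule(events, cond_prob):
        """Return P(e1 AND e2 AND ... AND en) by multiplying conditional probabilities."""
        total = 1.0
        seen = []
        for e in events:
            total *= cond_prob(e, tuple(seen))  # P(e_i | e_1, ..., e_{i-1})
            seen.append(e)
        return total

For three events this expands to P(f1) * P(f2 | f1) * P(f3 | f1 ∧ f2), exactly as above.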

Chain Rule Problem If we have a white ball, the probability it is a baseball is P(baseball | white ∧ ball). If we have a ball, the probability it is white is P(white | ball). The probability we have a ball is P(ball). So, what’s the probability we have a white ball that is a baseball? P(white ∧ ball ∧ baseball) = 0.76 * 0.35 * …

Bayes’ Rule Bayes’ Rule relates conditional probability distributions: P(h | e) = P(e | h) * P(h) / P(e) or, with additional conditioning information: P(h | e ∧ k) = P(e | h ∧ k) * P(h | k) / P(e | k)

Bayes Rule Problem The probability that I think my cup of coffee tastes good is P(G) = .80. I add Equal to my coffee 60% of the time: P(E) = .60. I think that when coffee has Equal in it, it tastes good 50% of the time: P(G|E) = .50. If I sip my coffee and it tastes good, what is the probability that it has Equal in it? P(E|G) = P(G|E) * P(E) / P(G)
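
Plugging the slide's numbers into Bayes' rule as a quick check:

    p_good = 0.80              # P(G): the coffee tastes good
    p_equal = 0.60             # P(E): Equal was added
    p_good_given_equal = 0.50  # P(G | E)
    p_equal_given_good = p_good_given_equal * p_equal / p_good  # Bayes' rule
    print(round(p_equal_given_good, 3))  # 0.375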

Bayes’ Rule P(disease | symptom) = P(symptom | disease) * P(disease) / P(symptom) Assess diagnostic probability from causal probability: –P(Cause|Effect) = P(Effect|Cause) * P(Cause) / P(Effect) Prior, likelihood, posterior.

Bayes Example Imagine –disease = BirdFlu, symptom = coughing –P(disease | symptom) is different in a BirdFlu-affected country vs. the USA –P(symptom | disease) should be the same, so it is more useful to learn P(symptom | disease) –What about the denominator, P(symptom)? How do we determine this? Use conditioning (next slide).

Conditioning Idea: Use conditional probabilities instead of joint probabilities. P(A) = P(A ∧ B) + P(A ∧ ¬B) = P(A | B) * P(B) + P(A | ¬B) * P(¬B) Example: P(symptom) = P(symptom | disease) * P(disease) + P(symptom | ¬disease) * P(¬disease) More generally: P(Y) = Σz P(Y|z) * P(z) Marginalization and conditioning are useful rules for derivations involving probability expressions.
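
A short sketch of this conditioning step in code, with made-up illustrative numbers (the slides give none):

    # Assumed values for illustration only.
    p_disease = 0.01
    p_symptom_given_disease = 0.9
    p_symptom_given_no_disease = 0.05

    # P(symptom) = P(symptom | disease)P(disease) + P(symptom | not disease)P(not disease)
    p_symptom = (p_symptom_given_disease * p_disease
                 + p_symptom_given_no_disease * (1 - p_disease))
    print(round(p_symptom, 4))  # 0.0585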

Independence A and B are independent iff –P(A ∧ B) = P(A) * P(B) –P(A | B) = P(A) –P(B | A) = P(B) Independence is essential for efficient probabilistic reasoning: 32 entries reduced to 12; for n independent biased coins, O(2^n) → O(n). Absolute independence is powerful but rare. Dentistry is a large field with hundreds of variables, none of which are independent. What to do? The joint over Cavity, Toothache, Xray, Weather decomposes into a Cavity/Toothache/Xray part and a Weather part: P(T, X, C, W) = P(T, X, C) * P(W)
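
A tiny sketch of testing independence numerically, reusing the assumed cavity/toothache joint table from earlier (which, as the slide suggests for dentistry, is not independent):

    joint = {(True, True): 0.04, (True, False): 0.06,
             (False, True): 0.01, (False, False): 0.89}
    p_cavity = joint[(True, True)] + joint[(True, False)]     # 0.10
    p_toothache = joint[(True, True)] + joint[(False, True)]  # 0.05
    # Independence would require P(cavity AND toothache) == P(cavity) * P(toothache)
    print(joint[(True, True)], p_cavity * p_toothache)  # 0.04 vs 0.005: not independent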

Conditional Independence A and B are conditionally independent given C iff –P(A | B, C) = P(A | C) –P(B | A, C) = P(B | C) –P(A ∧ B | C) = P(A | C) * P(B | C) Toothache (T), Spot in Xray (X), Cavity (C): –None of these propositions is independent of the others –But: T and X are conditionally independent given C

Frequency Distribution Count up the number of occurrences of each member of a set of items. These counts can be used to estimate the probability of seeing any word.

nltk.FreqDist Let’s look at some code. Feel free to code along.
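
A minimal example of the kind of code this slide refers to, assuming NLTK is installed and the Brown corpus has been downloaded (nltk.download('brown')); any other list of tokens would work the same way:

    from nltk import FreqDist
    from nltk.corpus import brown  # assumes nltk.download('brown') has been run

    words = [w.lower() for w in brown.words()]
    fdist = FreqDist(words)

    print(fdist["the"])           # raw count of the word "the"
    print(fdist.freq("the"))      # relative frequency: count / total number of tokens
    print(fdist.most_common(10))  # the ten most frequent word types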

Next Time Counting *Some* Things: Conditional frequency distributions, conditional structuring, word tokenization, and n-gram modeling with FreqDist and ConditionalFreqDist.