INTRODUCTION TO ARTIFICIAL INTELLIGENCE Massimo Poesio LECTURE 11 (Lab): Probability reminder.


MACHINE LEARNING The next series of lectures will cover methods for learning how to solve problems from data (MACHINE LEARNING). Most methods of this type presuppose some knowledge of probability and statistics.

WHY PROBABILITY THEORY Suppose you’ve already texted the characters “There in a minu”. You’d like your mobile phone to guess MINUTE as the completion of “minu”, rather than MINUET or MINUS or MINUSCULE. In other words, you’d like your mobile phone to know that, given what you’ve texted so far, MINUTE is more likely than those other alternatives. PROBABILITY THEORY was developed to formalize the notion of LIKELIHOOD.

TRIALS (or EXPERIMENTS) A trial is anything that may have a certain OUTCOME (on which you can make a bet, say). Classic examples:
– Throwing a die (outcomes: 1, 2, 3, 4, 5, 6)
– A horse race (outcomes?)
In NLE:
– Looking at the next word in a text
– Having your NL system perform a certain task

(ELEMENTARY) OUTCOMES The results of an experiment:
– In a coin toss: HEADS or TAILS
– In a race: the names of the horses involved; or, if we are only interested in whether a particular horse wins, just WIN and LOSE
In NLE:
– When looking at the next word: the possible words
– In the case of a system performing a task: RIGHT or WRONG

EVENTS Often, we want to talk about the likelihood of getting one of several outcomes:
– E.g., with a die, the likelihood of getting an even number, or a number greater than 3
An EVENT is a set of possible OUTCOMES (possibly just a single elementary outcome):
– E1 = {4}
– E2 = {2,4,6}
– E3 = {3,4,5,6}

SAMPLE SPACES The SAMPLE SPACE is the set of all possible outcomes:
– For a die, the sample space is S = {1,2,3,4,5,6}
– For a coin toss, the sample space is S = {H,T}
For the texting case:
– Texting a word is a TRIAL
– The word texted is an OUTCOME
– EVENTS which result from this trial include: texting the word “minute”, texting a word that begins with “minu”, etc.
– The set of all possible words is the SAMPLE SPACE (NB: the sample space may be very large, or even infinite)

PROBABILITY FUNCTIONS The likelihood of an event is indicated using a PROBABILITY FUNCTION P. The probability of an event E is specified by a function P(E), with values between 0 and 1:
– P(E) = 1: the event is CERTAIN to occur
– P(E) = 0: the event is certain NOT to occur
Example: when casting a die,
– P(E′ = ‘getting a number between 1 and 6’) = P({1,2,3,4,5,6}) = 1
– P(E′′ = ‘getting 7’) = 0
The sum of the probabilities of all elementary outcomes is 1.

EXERCISES: ANALYTIC PROBABILITIES When we know the entire sample space, and we can assume that all outcomes are equally likely, we can compute the probability of events analytically, e.g. (see the sketch below):
– P(1)
– P(EVEN)
– P(>3)
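
A minimal Python sketch of this counting definition, assuming a fair six-sided die; the event sets below are one way of encoding the three exercises:

    from fractions import Fraction

    # Single fair die: all six outcomes equally likely.
    sample_space = {1, 2, 3, 4, 5, 6}

    def prob(event):
        # P(E) = |E| / |S| when all outcomes are equally likely.
        return Fraction(len(event & sample_space), len(sample_space))

    print(prob({1}))          # P(1)    = 1/6
    print(prob({2, 4, 6}))    # P(EVEN) = 3/6 = 1/2
    print(prob({4, 5, 6}))    # P(>3)   = 3/6 = 1/2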

PROBABILITIES AND RELATIVE FREQUENCIES In the case of a die, we know all of the possible outcomes ahead of time, and we also know a priori what the likelihood of a certain outcome is. But in many other situations in which we would like to estimate the likelihood of an event, this is not the case. For example, suppose that we would like to bet on horses rather than on dice. Harry is a race horse: we do not know ahead of time how likely it is for Harry to win. The best we can do is to ESTIMATE P(WIN) using the RELATIVE FREQUENCY of the outcome ‘Harry wins’. Suppose Harry raced 100 times, and won 20 races overall. Then:
– P(WIN) = NUMBER OF WINS / TOTAL NUMBER OF RACES = 20/100 = .2
– P(LOSE) = .8
The use of probabilities we are interested in (estimating the probability of certain sequences of words) is of this type.
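
As a sketch, the same estimate in Python, with the counts taken from the example above:

    # Relative-frequency estimate from Harry's record: 100 races, 20 wins.
    races, wins = 100, 20
    p_win = wins / races     # 0.2
    p_lose = 1 - p_win       # 0.8
    print(p_win, p_lose)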

LOADED DICE The assumption that all outcomes have equal probability is very strong. In most real situations (and with most real dice) the probabilities of the outcomes are slightly different, e.g.:
– P(1) = 1/4, P(2) = .15, P(3) = .15, P(4) = .15, P(5) = .15, P(6) = .15
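
A small simulation sketch, assuming the loaded probabilities above: sample many rolls and check that the relative frequency of a 1 approaches 1/4:

    import random

    # Loaded die from the slide: face 1 has probability 1/4, the rest 0.15 each.
    faces = [1, 2, 3, 4, 5, 6]
    weights = [0.25, 0.15, 0.15, 0.15, 0.15, 0.15]

    rolls = random.choices(faces, weights=weights, k=100_000)
    # The relative frequency of 1 approaches 0.25 as the number of rolls grows.
    print(rolls.count(1) / len(rolls))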

JOINT PROBABILITIES We are often interested in the probability of TWO events happening:
– When throwing a die TWICE, the probability of getting a 6 both times
– The probability of finding a sequence of two words: ‘the’ and ‘car’
We use the notation A&B to indicate the conjunction of two events, and P(A&B) to indicate the probability of that conjunction.
– Because events are SETS, this probability is often also written P(A ∩ B)
We use the same notation with WORDS: P(‘the’ & ‘car’)

JOINT PROBABILITIES: TOSSING A DIE TWICE Sample space = {(1,1), (1,2), (1,3), …, (6,5), (6,6)}: the 36 equally likely ordered pairs of outcomes.

EXERCISES: PROBABILITY OF TWO EVENTS
– P(first toss = 1 & second toss = 3)
– P(first toss = even & second toss = even)
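
Both can be computed by enumerating the 36 ordered pairs; a sketch:

    from fractions import Fraction
    from itertools import product

    # The 36 equally likely ordered pairs for two tosses of a fair die.
    pairs = list(product(range(1, 7), repeat=2))

    def prob(event):
        return Fraction(sum(1 for p in pairs if event(p)), len(pairs))

    print(prob(lambda p: p == (1, 3)))                      # 1/36
    print(prob(lambda p: p[0] % 2 == 0 and p[1] % 2 == 0))  # 9/36 = 1/4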

OTHER COMBINATIONS OF EVENTS A ∪ B: either event A or event B happens
– P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
– NB: if A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B)
¬A: event A does not happen
– P(¬A) = 1 − P(A)

EXERCISES: ADDITION RULE P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
– P(first toss = 1 ∪ second toss = 1)
– P(sum of two tosses = 6 ∪ sum of two tosses = 3)
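
A sketch checking the addition rule on the same 36-pair sample space (pairs and prob as in the previous sketch):

    from fractions import Fraction
    from itertools import product

    pairs = list(product(range(1, 7), repeat=2))
    prob = lambda e: Fraction(sum(1 for p in pairs if e(p)), len(pairs))

    a = prob(lambda p: p[0] == 1)            # P(first toss = 1)  = 6/36
    b = prob(lambda p: p[1] == 1)            # P(second toss = 1) = 6/36
    a_and_b = prob(lambda p: p == (1, 1))    # P(both = 1)        = 1/36
    print(a + b - a_and_b)                          # 11/36
    print(prob(lambda p: p[0] == 1 or p[1] == 1))   # 11/36, by direct count

    # sum = 6 and sum = 3 cannot happen together, so the probabilities just add:
    print(prob(lambda p: sum(p) == 6) + prob(lambda p: sum(p) == 3))  # 5/36 + 2/36 = 7/36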

PRIOR PROBABILITY VS CONDITIONAL PROBABILITY The prior probability P(WIN) is the likelihood of an event occurring irrespective of anything else we know about the world. Often, however, we DO have additional information that can help us make a more informed guess about the likelihood of a certain event. E.g., take again the case of Harry the horse. Suppose we know that it was raining during 30 of the races that Harry raced, and that Harry won 15 of these races. Intuitively, the probability of Harry winning when it’s raining is 15/30 = .5, HIGHER than the probability of Harry winning overall: we can make a more informed guess. We indicate the probability of an event A happening given that we know that event B happened as well (the CONDITIONAL PROBABILITY of A given B) as P(A|B).

Conditional probability (Venn diagram: the set of ALL RACES, containing two overlapping subsets, RACES WHEN IT RAINS and RACES WON BY HARRY)

Conditional probability Conditional probability is DEFINED as follows:
– P(A|B) = P(A & B) / P(B), for P(B) > 0
Intuitively, you RESTRICT the range of trials under consideration to those in which event B took place as well (most easily seen when thinking in terms of relative frequencies).

EXAMPLE Consider the case of Harry the horse again:
– P(WIN|RAIN) = P(WIN & RAIN) / P(RAIN)
where:
– P(WIN & RAIN) = 15/100 = .15
– P(RAIN) = 30/100 = .30
This gives P(WIN|RAIN) = .15/.30 = .5, in agreement with our intuitions.
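
The same computation as a quick Python check, using the counts above:

    # Harry's record: 100 races, 30 in the rain, 15 rainy wins.
    p_win_and_rain = 15 / 100          # 0.15
    p_rain = 30 / 100                  # 0.30
    print(p_win_and_rain / p_rain)     # P(WIN | RAIN) = 0.5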

EXERCISES
– P(sum of two dice = 3)
– P(sum of two dice = 3 | first die = 1)
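
A sketch of both computations by enumeration (pairs and prob as in the earlier sketches):

    from fractions import Fraction
    from itertools import product

    pairs = list(product(range(1, 7), repeat=2))
    prob = lambda e: Fraction(sum(1 for p in pairs if e(p)), len(pairs))

    print(prob(lambda p: sum(p) == 3))                   # P(sum = 3) = 2/36 = 1/18
    # P(sum = 3 | first = 1) = P(sum = 3 & first = 1) / P(first = 1)
    joint = prob(lambda p: sum(p) == 3 and p[0] == 1)    # 1/36
    print(joint / prob(lambda p: p[0] == 1))             # (1/36)/(1/6) = 1/6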

THE MULTIPLICATION RULE The definition of conditional probability can be rewritten as:
– P(A&B) = P(A|B) P(B)
– P(A&B) = P(B|A) P(A)

INDEPENDENCE Additional information does not always help. For example, knowing the color of a die usually doesn’t help us predict the result of a throw; knowing the name of the jockey’s girlfriend doesn’t help predict how well the horse he rides will do in a race; etc. When this is the case, we say that the two events are INDEPENDENT. The notion of independence is defined in probability theory using the definition of conditional probability. Consider again the basic form of the multiplication rule:
– P(A&B) = P(A|B) P(B)
We say that two events are INDEPENDENT if:
– P(A&B) = P(A) P(B)
– equivalently, P(A|B) = P(A)

EXERCISES
– P(H & H) (two fair coin tosses)
– P(sum of two tosses greater than 6 & first toss = 1)
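
A sketch, assuming the two coin tosses are fair and independent; note that the two dice events in the second exercise are NOT independent, so the product rule does not apply there:

    from fractions import Fraction
    from itertools import product

    # Two fair coin tosses are independent: P(H & H) = P(H) * P(H).
    print(Fraction(1, 2) * Fraction(1, 2))    # 1/4

    # For the dice events, enumerate instead of multiplying:
    pairs = list(product(range(1, 7), repeat=2))
    prob = lambda e: Fraction(sum(1 for p in pairs if e(p)), len(pairs))
    print(prob(lambda p: sum(p) > 6 and p[0] == 1))                # 1/36 (only (1,6))
    print(prob(lambda p: sum(p) > 6) * prob(lambda p: p[0] == 1))  # 7/72 ≠ 1/36: dependent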

THE CHAIN RULE The multiplication rule generalizes to the so-called CHAIN RULE:
– P(w1, w2, w3, …, wn) = P(w1) P(w2|w1) P(w3|w1,w2) … P(wn|w1 … wn−1)
The chain rule plays an important role in statistical NLE:
– P(the big dog) = P(the) P(big|the) P(dog|the big)
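
A sketch of the decomposition; the conditional probabilities below are made-up illustrative numbers, not estimates from any corpus:

    # Chain-rule decomposition of P(the big dog).
    p_the = 0.06                   # P(the)            (hypothetical)
    p_big_given_the = 0.01         # P(big | the)      (hypothetical)
    p_dog_given_the_big = 0.005    # P(dog | the big)  (hypothetical)
    print(p_the * p_big_given_the * p_dog_given_the_big)   # 3e-06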

Bayes’ theorem Suppose you’ve developed an IR system for searching a big database (say, the Web). Given any search, about 1 in 100,000 documents is relevant (REL). Suppose your system is pretty good:
– P(YES|REL) = .95
– P(YES|¬REL) = .005
What is the probability that a document is relevant when the system says YES, i.e. P(REL|YES)?

Bayes’ Theorem Bayes’ Theorem is a pretty trivial consequence of the definition of conditional probability, but it is very useful in that it allows us to use one conditional probability to compute another. We already saw that the definition of conditional probability can be rewritten equivalently as:
– P(A&B) = P(A|B) P(B)
– P(A&B) = P(B|A) P(A)
Since both right-hand sides equal P(A&B), equating them and dividing by P(B) gives Bayes’ theorem:
– P(A|B) = P(B|A) P(A) / P(B)

Application of Bayes’ theorem For the IR example: P(REL|YES) = P(YES|REL) P(REL) / P(YES), where P(YES) = P(YES|REL) P(REL) + P(YES|¬REL) P(¬REL) = .95 × .00001 + .005 × .99999 ≈ .005. This gives P(REL|YES) ≈ .0000095 / .005 ≈ .0019: fewer than 2 in 1,000 of the documents the system says YES to are actually relevant.
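
The same computation in Python, with the numbers from the IR example:

    # Bayes' theorem: P(REL | YES) = P(YES | REL) P(REL) / P(YES).
    p_rel = 1 / 100_000           # P(REL)
    p_yes_given_rel = 0.95        # P(YES | REL)
    p_yes_given_notrel = 0.005    # P(YES | not REL)

    # Total probability: P(YES) = P(YES|REL) P(REL) + P(YES|not REL) P(not REL).
    p_yes = p_yes_given_rel * p_rel + p_yes_given_notrel * (1 - p_rel)
    print(p_yes_given_rel * p_rel / p_yes)   # ~0.0019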

STATISTICS

READINGS