PROBABILITY
David Kauchak
CS451 – Fall 2013

Admin
- Midterm
- Grading
- Assignment 6
- No office hours tomorrow from 10-11am (though I’ll be around most of the rest of the day)

Basic Probability Theory: terminology
An experiment has a set of potential outcomes, e.g., roll a die, “look at” another sentence.
The sample space of an experiment is the set of all possible outcomes, e.g., {1, 2, 3, 4, 5, 6}.
For machine learning, the sample spaces can be very large.

Basic Probability Theory: terminology
An event is a subset of the sample space.
Dice rolls:
- {2}
- {3, 6}
- even = {2, 4, 6}
- odd = {1, 3, 5}
Machine learning:
- a particular feature has a particular value
- an example, i.e., a particular setting of feature values
- label = Chardonnay

Events
We’re interested in probabilities of events:
- p({2})
- p(label = survived)
- p(label = Chardonnay)
- p(parasitic gap)
- p(“Pinot” occurred)
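
To make the dice examples concrete, here is a minimal Python sketch (not from the slides) that treats an event as a subset of a uniform sample space:

from fractions import Fraction

# Uniform sample space for a single die roll.
sample_space = {1, 2, 3, 4, 5, 6}

def p(event):
    # An event is a subset of the sample space; under a uniform
    # distribution its probability is |event| / |sample space|.
    return Fraction(len(event & sample_space), len(sample_space))

print(p({2}))        # 1/6
print(p({3, 6}))     # 1/3
print(p({2, 4, 6}))  # even: 1/2
print(p({1, 3, 5}))  # odd: 1/2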

Random variables
A random variable is a mapping from the sample space to a number (think events).
It represents all the possible values of something we want to measure in an experiment.
For example, the random variable X could be the number of heads in three coin flips:

sample space: HHH  HHT  HTH  HTT  THH  THT  TTH  TTT
X:            3    2    2    1    2    1    1    0

Really it’s for notational convenience, since the event space can sometimes be irregular.

Random variables
We’re interested in the probability of the different values of a random variable.
The definition of probabilities over all of the possible values of a random variable defines a probability distribution:

sample space: HHH  HHT  HTH  HTT  THH  THT  TTH  TTT
X:            3    2    2    1    2    1    1    0

X   P(X)
3   P(X=3) = 1/8
2   P(X=2) = 3/8
1   P(X=1) = 3/8
0   P(X=0) = 1/8
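
As a quick sanity check, a short simulation sketch (not from the slides) recovers this distribution empirically:

import random
from collections import Counter

random.seed(0)
trials = 100_000
# X = number of heads in three fair coin flips, repeated many times.
counts = Counter(sum(random.random() < 0.5 for _ in range(3)) for _ in range(trials))

for x in (3, 2, 1, 0):
    print(f"P(X={x}) ~ {counts[x] / trials:.3f}")  # ~0.125, 0.375, 0.375, 0.125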

Probability distribution
To be explicit:
- A probability distribution assigns probability values to all possible values of a random variable.
- These values must be >= 0 and <= 1.
- These values must sum to 1 over all possible values of the random variable.
Neither table below is a valid distribution: the first sums to 2, and the second contains values outside [0, 1].

X   P(X)
3   P(X=3) = 1/2
2   P(X=2) = 1/2
1   P(X=1) = 1/2
0   P(X=0) = 1/2

X   P(X)
3   P(X=3) = -1
2   P(X=2) = 2
1   P(X=1) = 0
0   P(X=0) = 0
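
A tiny checker (a sketch, not part of the lecture) makes the two rules executable:

import math

def is_valid_distribution(probs, tol=1e-9):
    # probs maps each value of the random variable to its probability.
    in_range = all(0.0 <= p <= 1.0 for p in probs.values())           # rule 1
    sums_to_one = math.isclose(sum(probs.values()), 1.0, abs_tol=tol)  # rule 2
    return in_range and sums_to_one

print(is_valid_distribution({3: 1/8, 2: 3/8, 1: 3/8, 0: 1/8}))  # True
print(is_valid_distribution({3: 1/2, 2: 1/2, 1: 1/2, 0: 1/2}))  # False: sums to 2
print(is_valid_distribution({3: -1, 2: 2, 1: 0, 0: 0}))         # False: values outside [0, 1]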

Unconditional/prior probability
The simplest form of probability is P(X).
Prior probability: without any additional information, what is the probability?
- What is the probability of a heads?
- What is the probability of surviving the Titanic?
- What is the probability of a wine review containing the word “banana”?
- What is the probability of a passenger on the Titanic being under 21 years old?
- …

Joint distribution
We can also talk about probability distributions over multiple variables, P(X,Y):
- the probability of X and Y
- a distribution over the cross product of possible values

MLPass   P(MLPass)
true     0.89
false    0.11

EngPass  P(EngPass)
true     0.92
false    0.08

MLPass, EngPass   P(MLPass, EngPass)
true, true        0.88
true, false       0.01
false, true       0.04
false, false      0.07

Joint distribution

MLPass, EngPass   P(MLPass, EngPass)
true, true        0.88
true, false       0.01
false, true       0.04
false, false      0.07

What is P(EngPass)?
Still a probability distribution:
- all values between 0 and 1, inclusive
- all values sum to 1
All questions/probabilities of the two variables can be calculated from the joint distribution.

Joint distribution
Still a probability distribution:
- all values between 0 and 1, inclusive
- all values sum to 1
All questions/probabilities of the two variables can be calculated from the joint distribution.

MLPass, EngPass   P(MLPass, EngPass)
true, true        0.88
true, false       0.01
false, true       0.04
false, false      0.07

P(EngPass = true) = 0.92. How did you figure that out? Sum the joint entries where EngPass = true: 0.88 + 0.04 = 0.92.

Joint distribution
In general, we can marginalize out the other variable: p(x) = Σ_y p(x, y).

MLPass, EngPass   P(MLPass, EngPass)
true, true        0.88
true, false       0.01
false, true       0.04
false, false      0.07
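
Here is a small sketch (not from the slides) that stores this joint distribution as a Python dictionary and marginalizes out either variable:

# Joint distribution over (MLPass, EngPass), from the table above.
joint = {
    (True, True): 0.88,
    (True, False): 0.01,
    (False, True): 0.04,
    (False, False): 0.07,
}

def marginal(joint, axis):
    # p(x) = sum over y of p(x, y): add up joint entries that agree on `axis`.
    result = {}
    for outcome, prob in joint.items():
        result[outcome[axis]] = result.get(outcome[axis], 0.0) + prob
    return result

print(marginal(joint, axis=0))  # P(MLPass):  {True: ~0.89, False: ~0.11}
print(marginal(joint, axis=1))  # P(EngPass): {True: ~0.92, False: ~0.08}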

Conditional probability
As we learn more information, we can update our probability distribution.
P(X|Y) models this (read “the probability of X given Y”):
- What is the probability of a heads, given that both sides of the coin are heads?
- What is the probability the document is about Chardonnay, given that it contains the word “Pinot”?
- What is the probability of the word “noir”, given that the sentence also contains the word “pinot”?
Notice that it is still a distribution over the values of X.

Conditional probability
(Venn diagram: two overlapping events, x and y)
In terms of the prior and joint distributions, what is the conditional probability distribution?

Conditional probability
Given that y has happened, in what proportion of those events does x also happen?

p(x|y) = p(x, y) / p(y)

Conditional probability
Given that y has happened, in what proportion of those events does x also happen?

p(x|y) = p(x, y) / p(y)

What is p(MLPass = true | EngPass = false)?

MLPass, EngPass   P(MLPass, EngPass)
true, true        0.88
true, false       0.01
false, true       0.04
false, false      0.07

Conditional probability
What is p(MLPass = true | EngPass = false)?

MLPass, EngPass   P(MLPass, EngPass)
true, true        0.88
true, false       0.01
false, true       0.04
false, false      0.07

p(MLPass = true | EngPass = false) = 0.01 / (0.01 + 0.07) = 0.125

Notice this is very different than p(MLPass = true) = 0.89.
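
Continuing the dictionary sketch from above (again, not from the slides), conditioning is just filtering the joint and renormalizing:

joint = {
    (True, True): 0.88,
    (True, False): 0.01,
    (False, True): 0.04,
    (False, False): 0.07,
}

def conditional(joint, axis, value):
    # p(X | Y=value): keep entries where the conditioning variable equals
    # `value`, then divide by their total, which is p(Y=value).
    kept = {k: p for k, p in joint.items() if k[axis] == value}
    total = sum(kept.values())
    return {k[1 - axis]: p / total for k, p in kept.items()}

# p(MLPass | EngPass=false): {True: ~0.125, False: ~0.875}
print(conditional(joint, axis=1, value=False))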

Both are distributions over X.

Unconditional/prior probability:
MLPass   P(MLPass)
true     0.89
false    0.11

Conditional probability:
MLPass   P(MLPass | EngPass = false)
true     0.125
false    0.875

A note about notation
When talking about a particular assignment, you should technically write p(X = x), etc.
However, when it’s clear, we’ll often shorten it.
We may also write P(X) or p(x) to generically mean any particular value, i.e., P(X = x) = 0.125.

Properties of probabilities P(A or B) = ?

Properties of probabilities P(A or B) = P(A) + P(B) - P(A,B)
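
As a worked check using the earlier MLPass/EngPass tables: P(MLPass or EngPass) = 0.89 + 0.92 - 0.88 = 0.93, which agrees with reading it off the joint directly as 1 - P(false, false) = 1 - 0.07 = 0.93.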

Properties of probabilities
P(¬E) = 1 - P(E)
More generally, given mutually exclusive, exhaustive events E = e1, e2, …, en:
P(e1) = 1 - (P(e2) + P(e3) + … + P(en))
P(E1, E2) ≤ P(E1)

Chain rule (aka product rule)
We can view calculating the probability of X AND Y occurring as two steps:
1. Y occurs with some probability P(Y)
2. Then, X occurs, given that Y has occurred

p(X, Y) = p(X|Y) p(Y)

or you can just trust the math…

Chain rule
p(X, Y) = p(X|Y) p(Y) = p(Y|X) p(X)

Applications of the chain rule
We saw that we could calculate the individual prior probabilities using the joint distribution.
What if we don’t have the joint distribution, but do have conditional probability information:
- P(Y)
- P(X|Y)

p(x) = Σ_y p(y) p(x|y)

This is called “summing over” or “marginalizing out” a variable.
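
A minimal sketch of this computation (not from the slides), using numbers consistent with the earlier tables:

# p(x) = sum over y of p(y) * p(x|y)
p_eng = {True: 0.92, False: 0.08}                   # P(EngPass)
p_ml_given_eng = {True: 0.88 / 0.92, False: 0.125}  # P(MLPass=true | EngPass=y)

p_ml_true = sum(p_eng[y] * p_ml_given_eng[y] for y in (True, False))
print(p_ml_true)  # ~0.89, matching P(MLPass=true) computed from the joint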

Bayes’ rule (theorem)
From the chain rule applied in both orders, p(X, Y) = p(X|Y) p(Y) = p(Y|X) p(X), so:

p(Y|X) = p(X|Y) p(Y) / p(X)

Bayes’ rule
Allows us to talk about P(Y|X) rather than P(X|Y).
Sometimes this can be more intuitive.
Why?

Bayes’ rule
p(disease | symptoms)
- For everyone who had those symptoms, how many had the disease?
- p(symptoms | disease): for everyone who had the disease, how many had this symptom?

p(label | features)
- For all examples that had those features, how many had that label?
- p(features | label): for all the examples with that label, how many had this feature?

p(cause | effect) vs. p(effect | cause)

Gaps
I just won’t put these away.  (“put” is the verb; “these” is the direct object)
These, I just won’t put ___ away.  (“These” is the filler; ___ marks the gap)

Gaps
What did you put ___ away?  (gap)
The socks that I put ___ away.  (gap)

Gaps
Whose socks did you fold ___ and put ___ away?  (gaps)
Whose socks did you fold ___?  (gap)
Whose socks did you put ___ away?  (gap)

Parasitic gaps
These I’ll put ___ away without folding ___.
The sentence contains two gaps: “These I’ll put ___ away” and “without folding ___”.

Parasitic gaps
These I’ll put ___ away without folding ___.
1. They cannot exist by themselves (hence “parasitic”):
These I’ll put my pants away without folding ___.  (bad: there is no true gap to depend on)
2. They’re optional:
These I’ll put ___ away without folding them.

Parasitic gaps ugs-parasitic-gap/

Frequency of parasitic gaps
Parasitic gaps occur on average in 1 out of every 100,000 sentences.

Problem: Maggie Louise Gal (aka “ML” Gal) has developed a machine learning approach to identify parasitic gaps. If a sentence has a parasitic gap, it correctly identifies it 95% of the time. If it doesn’t, it will incorrectly say it does with some false-positive probability. Suppose we run it on a sentence and the algorithm says it is a parasitic gap; what is the probability it actually is?

Prob of parasitic gaps
Maggie Louise Gal (aka “ML” Gal) has developed a machine learning approach to identify parasitic gaps. If a sentence has a parasitic gap, it correctly identifies it 95% of the time. If it doesn’t, it will incorrectly say it does with some false-positive probability. Suppose we run it on a sentence and the algorithm says it is a parasitic gap; what is the probability it actually is?

G = gap
T = test positive
What question do we want to ask? p(G | T)

Prob of parasitic gaps
G = gap
T = test positive

By Bayes’ rule:

p(G|T) = p(T|G) p(G) / p(T)
       = p(T|G) p(G) / (p(T|G) p(G) + p(T|¬G) p(¬G))

where p(G) = 1/100,000 and p(T|G) = 0.95. Because the prior p(G) is so small, even a modest false-positive rate p(T|¬G) drives the posterior far below 95%.
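
A numeric sketch of the computation; note that the 0.005 false-positive rate below is an assumed illustrative value, not one given in the problem:

def posterior_gap(prior, sensitivity, false_positive_rate):
    # Bayes' rule: p(G|T) = p(T|G) p(G) / [p(T|G) p(G) + p(T|not G) p(not G)]
    p_t = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_t

# p(G) = 1/100,000 and p(T|G) = 0.95 come from the problem statement;
# the 0.005 false-positive rate is an assumption for illustration only.
print(posterior_gap(prior=1 / 100_000, sensitivity=0.95, false_positive_rate=0.005))
# ~0.0019: the detector is almost always wrong when it says "gap",
# because parasitic gaps are so rare.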