CSCI 5582 Artificial Intelligence, Fall 2006. Lecture 17. Jim Martin.


CSCI 5582 Fall 2006 CSCI 5582 Artificial Intelligence Lecture 17 Jim Martin

CSCI 5582 Fall 2006 Today 10/31: HMM Training (EM); Break; Machine Learning

CSCI 5582 Fall 2006 Urns and Balls
Π (initial probabilities): Urn 1: 0.9; Urn 2: 0.1
A (transition probabilities):
           Urn 1   Urn 2
  Urn 1     0.6     0.4
  Urn 2     0.3     0.7
B (emission probabilities):
           Urn 1   Urn 2
  Red       0.7     0.4
  Blue      0.3     0.6

CSCI 5582 Fall 2006 Urns and Balls Let’s assume the input (observables) is Blue Blue Red (BBR) Since both urns contain red and blue balls any path through this machine could produce this output Urn 1Urn

CSCI 5582 Fall 2006 Urns and Balls (observation sequence: Blue Blue Red)
  Path     Probability
  1 1 1    (0.9*0.3)*(0.6*0.3)*(0.6*0.7) = .0204
  1 1 2    (0.9*0.3)*(0.6*0.3)*(0.4*0.4) = .0077
  1 2 1    (0.9*0.3)*(0.4*0.6)*(0.3*0.7) = .0136
  1 2 2    (0.9*0.3)*(0.4*0.6)*(0.7*0.4) = .0181
  2 1 1    (0.1*0.6)*(0.3*0.7)*(0.6*0.7) = .0052
  2 1 2    (0.1*0.6)*(0.3*0.7)*(0.4*0.4) = .0020
  2 2 1    (0.1*0.6)*(0.7*0.6)*(0.3*0.7) = .0052
  2 2 2    (0.1*0.6)*(0.7*0.6)*(0.7*0.4) = .0070
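As a purely illustrative aside (not part of the original slides), here is a minimal Python sketch of where products like these come from, assuming the Π, A, and B values reconstructed in the parameter table above: a path's joint probability with the observations is the start probability times the first emission, times a transition and an emission for every later ball.

    from itertools import product

    # Model parameters as reconstructed above (illustrative sketch).
    pi = {1: 0.9, 2: 0.1}                            # initial urn probabilities
    A = {1: {1: 0.6, 2: 0.4}, 2: {1: 0.3, 2: 0.7}}   # urn-to-urn transitions
    B = {1: {'Red': 0.7, 'Blue': 0.3},               # ball-color emissions
         2: {'Red': 0.4, 'Blue': 0.6}}

    obs = ['Blue', 'Blue', 'Red']

    def path_probability(path, observations):
        """Joint probability of one state path and the observation sequence."""
        p = pi[path[0]] * B[path[0]][observations[0]]
        for t in range(1, len(observations)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][observations[t]]
        return p

    # Enumerate all 2^3 possible urn sequences for Blue Blue Red.
    for path in product([1, 2], repeat=len(obs)):
        print(path, round(path_probability(path, obs), 4))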

CSCI 5582 Fall 2006 Urns and Balls Baum-Welch Re-estimation (EM for HMMs) –What if I told you I lied about the numbers in the model (π, A, B)? –Can I get better numbers just from the input sequence?

CSCI 5582 Fall 2006 Urns and Balls Yup –Just count up and prorate the number of times a given transition was traversed while processing the inputs. –Use that number to re-estimate the transition probability

CSCI 5582 Fall 2006 Urns and Balls But… we don't know which path the input actually took; we're only guessing. –So prorate the counts from all the possible paths, based on the path probabilities the model gives you. But you said the numbers were wrong. –Doesn't matter; use the original numbers, then replace the old ones with the new ones.

CSCI 5582 Fall 2006 Urn Example Urn 1Urn Let’s re-estimate the Urn1->Urn2 transition and the Urn1->Urn1 transition (using Blue Blue Red as training data).

CSCI 5582 Fall 2006 Urns and Balls (observation sequence: Blue Blue Red)
  Path     Probability
  1 1 1    (0.9*0.3)*(0.6*0.3)*(0.6*0.7) = .0204
  1 1 2    (0.9*0.3)*(0.6*0.3)*(0.4*0.4) = .0077
  1 2 1    (0.9*0.3)*(0.4*0.6)*(0.3*0.7) = .0136
  1 2 2    (0.9*0.3)*(0.4*0.6)*(0.7*0.4) = .0181
  2 1 1    (0.1*0.6)*(0.3*0.7)*(0.6*0.7) = .0052
  2 1 2    (0.1*0.6)*(0.3*0.7)*(0.4*0.4) = .0020
  2 2 1    (0.1*0.6)*(0.7*0.6)*(0.3*0.7) = .0052
  2 2 2    (0.1*0.6)*(0.7*0.6)*(0.7*0.4) = .0070

CSCI 5582 Fall 2006 Urns and Balls That’s –(.0077*1)+(.0136*1)+(.0181*1)+(.0020*1) =.0414 Of course, that’s not a probability, it needs to be divided by the probability of leaving Urn 1 total. There’s only one other way out of Urn 1… go from Urn 1 to Urn 1

CSCI 5582 Fall 2006 Urn Example Urn 1Urn Let’s re-estimate the Urn1->Urn1 transition

CSCI 5582 Fall 2006 Urns and Balls (observation sequence: Blue Blue Red)
  Path     Probability
  1 1 1    (0.9*0.3)*(0.6*0.3)*(0.6*0.7) = .0204
  1 1 2    (0.9*0.3)*(0.6*0.3)*(0.4*0.4) = .0077
  1 2 1    (0.9*0.3)*(0.4*0.6)*(0.3*0.7) = .0136
  1 2 2    (0.9*0.3)*(0.4*0.6)*(0.7*0.4) = .0181
  2 1 1    (0.1*0.6)*(0.3*0.7)*(0.6*0.7) = .0052
  2 1 2    (0.1*0.6)*(0.3*0.7)*(0.4*0.4) = .0020
  2 2 1    (0.1*0.6)*(0.7*0.6)*(0.3*0.7) = .0052
  2 2 2    (0.1*0.6)*(0.7*0.6)*(0.7*0.4) = .0070

CSCI 5582 Fall 2006 Urns and Balls That’s just –(2*.0204)+(1*.0077)+(1*.0052) =.0537 Again not what we need but we’re closer… we just need to normalize using those two numbers.

CSCI 5582 Fall 2006 Urns and Balls The 1->2 transition probability is .0414/(.0414 + .0537) = .435. The 1->1 transition probability is .0537/(.0414 + .0537) = .565. So in re-estimation the 1->2 transition went from .4 to .435 and the 1->1 transition went from .6 to .565.
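To make the bookkeeping explicit, here is a small sketch (mine, not from the slides) that redoes this re-estimation from the path probabilities listed in the table above: count the 1->1 and 1->2 transitions on each path, weight each count by that path's probability, then normalize the two totals.

    # Path probabilities for Blue Blue Red, copied from the table above.
    path_probs = {
        (1, 1, 1): 0.0204, (1, 1, 2): 0.0077,
        (1, 2, 1): 0.0136, (1, 2, 2): 0.0181,
        (2, 1, 1): 0.0052, (2, 1, 2): 0.0020,
        (2, 2, 1): 0.0052, (2, 2, 2): 0.0070,
    }

    def expected_count(src, dst):
        """Expected number of src->dst transitions, prorated over all paths."""
        total = 0.0
        for path, prob in path_probs.items():
            uses = sum(1 for a, b in zip(path, path[1:]) if (a, b) == (src, dst))
            total += uses * prob
        return total

    c12 = expected_count(1, 2)              # 0.0414
    c11 = expected_count(1, 1)              # 0.0537
    print(round(c12 / (c11 + c12), 3))      # re-estimated 1->2: about 0.435
    print(round(c11 / (c11 + c12), 3))      # re-estimated 1->1: about 0.565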

CSCI 5582 Fall 2006 Urns and Balls As with Problems 1 and 2, you wouldn't actually compute it this way. The Forward-Backward algorithm re-estimates these numbers in the same dynamic programming way that Viterbi and Forward do.
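For comparison, here is a hedged sketch (again mine, not the course code) of how Forward-Backward produces the same kind of expected transition counts with dynamic programming instead of enumerating every path: an alpha pass accumulates prefix probabilities, a beta pass accumulates suffix probabilities, and their product around a transition gives that transition's prorated count.

    # Illustrative Forward-Backward sketch; states, pi, A, B, obs as before.
    states = [1, 2]
    pi = {1: 0.9, 2: 0.1}
    A = {1: {1: 0.6, 2: 0.4}, 2: {1: 0.3, 2: 0.7}}
    B = {1: {'Red': 0.7, 'Blue': 0.3}, 2: {'Red': 0.4, 'Blue': 0.6}}
    obs = ['Blue', 'Blue', 'Red']

    def forward(obs):
        """alpha[t][s]: probability of the first t+1 observations, ending in state s."""
        alpha = [{s: pi[s] * B[s][obs[0]] for s in states}]
        for t in range(1, len(obs)):
            alpha.append({s: B[s][obs[t]] *
                             sum(alpha[t - 1][r] * A[r][s] for r in states)
                          for s in states})
        return alpha

    def backward(obs):
        """beta[t][s]: probability of the observations after time t, given state s."""
        beta = [{s: 1.0 for s in states} for _ in obs]
        for t in range(len(obs) - 2, -1, -1):
            for s in states:
                beta[t][s] = sum(A[s][r] * B[r][obs[t + 1]] * beta[t + 1][r]
                                 for r in states)
        return beta

    def expected_transition_count(src, dst):
        """Unnormalized expected src->dst count, like the hand tally above."""
        alpha, beta = forward(obs), backward(obs)
        return sum(alpha[t][src] * A[src][dst] * B[dst][obs[t + 1]] * beta[t + 1][dst]
                   for t in range(len(obs) - 1))

    c12 = expected_transition_count(1, 2)
    c11 = expected_transition_count(1, 1)
    print(c12 / (c11 + c12), c11 / (c11 + c12))   # re-estimated 1->2 and 1->1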

CSCI 5582 Fall 2006 Speech And… in speech recognition applications you don’t actually guess randomly and then train. You get initial numbers from real data: bigrams from a corpus, and phonetic outputs from a dictionary, etc. Training involves a couple of iterations of Baum-Welch to tune those numbers.

CSCI 5582 Fall 2006 Break Start reading Chapter 18 for next time (Learning). Quiz 2 –I'll go over it as soon as the CAETE students get it done. Quiz 3 –We're behind schedule, so quiz 3 will be delayed. I'll update the schedule soon.

CSCI 5582 Fall 2006 Where we are Agents can –Search –Represent stuff –Reason logically –Reason probabilistically Left to do –Learn –Communicate

CSCI 5582 Fall 2006 Connections As we’ll see there’s a strong connection between –Search –Representation –Uncertainty You should view the ML discussion as a natural extension of these previous topics

CSCI 5582 Fall 2006 Connections More specifically –The representation you choose defines the space you search –How you search the space and how much of the space you search introduces uncertainty –That uncertainty is captured with probabilities

CSCI 5582 Fall 2006 Kinds of Learning Supervised Semi-Supervised Unsupervised

CSCI 5582 Fall 2006 What’s to Be Learned? Lots of stuff –Search heuristics –Game evaluation functions –Probability tables –Declarative knowledge (logic sentences) –Classifiers –Category structures –Grammars

CSCI 5582 Fall 2006 Supervised Learning: Induction General case: –Given a set of pairs (x, f(x)), discover the function f. Classifier case: –Given a set of pairs (x, y) where y is a label, discover a function that assigns the correct label to each x.

CSCI 5582 Fall 2006 Supervised Learning: Induction Simpler Classifier Case: –Given a set of pairs (x, y) where x is an object and y is either a + if x is the right kind of thing or a – if it isn't, discover a function that assigns the labels correctly.

CSCI 5582 Fall 2006 Error Analysis: Simple Case
                 Correct label: +    Correct label: –
  Chosen: +      Correct             False Positive
  Chosen: –      False Negative      Correct

CSCI 5582 Fall 2006 Learning as Search Everything is search… –A hypothesis is a guess at a function that can be used to account for the inputs. –A hypothesis space is the space of all possible candidate hypotheses. –Learning is a search through the hypothesis space for a good hypothesis.

CSCI 5582 Fall 2006 Hypothesis Space The hypothesis space is defined by the representation used to capture the function that you are trying to learn. The size of this space is the key to the whole enterprise.

CSCI 5582 Fall 2006 Kinds of Classifiers Tables Nearest neighbors Probabilistic methods Decision trees Decision lists Neural networks Genetic algorithms Kernel methods

CSCI 5582 Fall 2006 What Are These Objects? By object, we mean a logical representation. –Normally, simpler representations are used that consist of fixed lists of feature-value pairs. –This assumption places a severe restriction on the kind of stuff that can be learned. A set of such objects paired with answers constitutes a training set.

CSCI 5582 Fall 2006 The Simple Approach Take the training data and put it in a table along with the right answers. When you see one of them again, retrieve the answer.
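As a toy illustration (not from the slides), this rote learner is literally a dictionary lookup, assuming objects are fixed feature-value tuples like the training data shown later in the lecture:

    # Sketch of the table-based ("rote") approach: memorize the answers,
    # then look them up. It has no answer for anything it hasn't seen.
    table = {
        ('In', 'Veg', 'Red'): 'Yes',
        ('Out', 'Meat', 'Green'): 'Yes',
        ('In', 'Meat', 'Red'): 'Yes',
        ('Out', 'Meat', 'Red'): 'No',
        ('Out', 'Veg', 'Green'): 'No',
    }

    def rote_classify(obj):
        return table.get(obj, None)   # None: never seen, so no answer

    print(rote_classify(('In', 'Veg', 'Red')))     # 'Yes' (seen in training)
    print(rote_classify(('In', 'Meat', 'Green')))  # None (never seen before)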

CSCI 5582 Fall 2006 Neighbor-Based Approaches Build the table, as in the table-based approach. Provide a distance metric that allows you to compute the distance between any pair of objects. When you encounter something not seen before, return as an answer the label on the nearest neighbor.
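One way to realize this is sketched below with an illustrative mismatch-count distance (the metric is my choice, not the slides'): store the labeled objects and answer with the label of the closest one.

    # Sketch of a 1-nearest-neighbor classifier over feature-value tuples.
    training = [
        (('In', 'Veg', 'Red'), 'Yes'),
        (('Out', 'Meat', 'Green'), 'Yes'),
        (('In', 'Meat', 'Red'), 'Yes'),
        (('Out', 'Meat', 'Red'), 'No'),
        (('Out', 'Veg', 'Green'), 'No'),
    ]

    def distance(a, b):
        """Number of features on which two objects disagree."""
        return sum(1 for x, y in zip(a, b) if x != y)

    def nearest_neighbor_label(obj):
        closest, label = min(training, key=lambda pair: distance(obj, pair[0]))
        return label

    # An object never seen in training still gets an answer: the label of
    # whichever stored object it is closest to.
    print(nearest_neighbor_label(('In', 'Meat', 'Green')))   # 'Yes'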

CSCI 5582 Fall 2006 Naïve-Bayes Approach Choose the label given by argmax over Label of P(Label | Object), where P(Label | Object) = P(Object | Label) * P(Label) / P(Object) and Object is a feature vector.

CSCI 5582 Fall 2006 Naïve Bayes Ignore the denominator because of the argmax. P(Label) is just the prior for each class, i.e., the proportion of each class in the training set. P(Object|Label) = ??? –The number of times this exact object was seen in the training data with this label, divided by the number of things with that label.

CSCI 5582 Fall 2006 Nope Too sparse; you probably won't see enough examples to get numbers that work. Answer –Assume the parts of the object are independent given the label, so P(Object|Label) becomes the product of the per-feature probabilities: P(F1|Label) * P(F2|Label) * … * P(Fn|Label).

CSCI 5582 Fall 2006 Naïve Bayes So the final equation is to argmax over all labels: choose the Label that maximizes P(Label) * P(F1|Label) * P(F2|Label) * … * P(Fn|Label).

CSCI 5582 Fall 2006 Training Data
  #   F1 (In/Out)   F2 (Meat/Veg)   F3 (Red/Green/Blue)   Label
  1   In            Veg             Red                   Yes
  2   Out           Meat            Green                 Yes
  3   In            Veg             Red                   Yes
  4   In            Meat            Red                   Yes
  5   In            Veg             Red                   Yes
  6   Out           Meat            Green                 Yes
  7   Out           Meat            Red                   No
  8   Out           Veg             Green                 No

CSCI 5582 Fall 2006 Example
P(Yes) = 3/4, P(No) = 1/4
P(F1=In|Yes) = 4/6       P(F1=In|No) = 0
P(F1=Out|Yes) = 2/6      P(F1=Out|No) = 1
P(F2=Meat|Yes) = 3/6     P(F2=Meat|No) = 1/2
P(F2=Veg|Yes) = 3/6      P(F2=Veg|No) = 1/2
P(F3=Red|Yes) = 4/6      P(F3=Red|No) = 1/2
P(F3=Green|Yes) = 2/6    P(F3=Green|No) = 1/2
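These fractions are just relative frequencies from the training table. A short sketch (illustrative, not course code) that derives them directly from the eight rows:

    from collections import Counter, defaultdict
    from fractions import Fraction

    # The eight training rows: (F1, F2, F3) features and a Yes/No label.
    data = [
        (('In', 'Veg', 'Red'), 'Yes'), (('Out', 'Meat', 'Green'), 'Yes'),
        (('In', 'Veg', 'Red'), 'Yes'), (('In', 'Meat', 'Red'), 'Yes'),
        (('In', 'Veg', 'Red'), 'Yes'), (('Out', 'Meat', 'Green'), 'Yes'),
        (('Out', 'Meat', 'Red'), 'No'), (('Out', 'Veg', 'Green'), 'No'),
    ]

    label_counts = Counter(label for _, label in data)
    priors = {lab: Fraction(n, len(data)) for lab, n in label_counts.items()}

    feature_counts = defaultdict(Counter)      # per-label counts of (feature, value)
    for features, label in data:
        for i, value in enumerate(features, start=1):
            feature_counts[label][(i, value)] += 1

    def cond_prob(i, value, label):
        """P(Fi = value | label) as a raw relative frequency."""
        return Fraction(feature_counts[label][(i, value)], label_counts[label])

    print(priors['Yes'], priors['No'])                          # 3/4 and 1/4
    print(cond_prob(1, 'In', 'Yes'), cond_prob(1, 'In', 'No'))  # 2/3 (i.e. 4/6) and 0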

CSCI 5582 Fall 2006 Example In, Meat, Green –First note that you've never seen this exact object before. –So you can't use statistics on the whole tuple In, Meat, Green, since its count is zero and you'd get a zero for both Yes and No.

CSCI 5582 Fall 2006 Example: In, Meat, Green
P(Yes | In, Meat, Green) is proportional to P(In|Yes) P(Meat|Yes) P(Green|Yes) P(Yes)
P(No | In, Meat, Green) is proportional to P(In|No) P(Meat|No) P(Green|No) P(No)
Remember we're dumping the denominator since it can't change which label wins.
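Plugging in the numbers from the tables above (a quick illustrative check, not part of the slides): the No product is killed by P(In|No) = 0, so the classifier picks Yes.

    # Denominator-free Naive Bayes scores for the unseen object <In, Meat, Green>.
    score_yes = (4/6) * (3/6) * (2/6) * (3/4)   # P(In|Yes) P(Meat|Yes) P(Green|Yes) P(Yes)
    score_no = 0.0 * (1/2) * (1/2) * (1/4)      # P(In|No)  P(Meat|No)  P(Green|No)  P(No)
    print(round(score_yes, 3), score_no)        # about 0.083 versus 0.0
    print('Yes' if score_yes > score_no else 'No')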

CSCI 5582 Fall 2006 Naïve Bayes This technique is always worth trying first. –It's easy. –Sometimes it works well enough. –When it doesn't, it gives you a baseline to compare more complex methods to.

CSCI 5582 Fall 2006 Naïve Bayes This equation should ring some bells…