Machine Learning in Practice Lecture 11 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute
Plan for the Day
Announcements:
Quiz
Assignment 5 assigned (we’ll cover the concepts needed for it in this lecture and the next two lectures)
Finish up Evaluation
Start talking about text
Finishing Up Evaluation
Charts and Curves
Primarily designed for binary decision tasks.
Multi-class classification is a combination of binary decision tasks; you would evaluate each binary decision separately.
Lift Factor
A lift chart shows, for a specific model, how the number of successes increases with the percentage of the sample you select.
If your model is good at sorting instances by probability of success, you can safely chop off the instances at the bottom end without losing many successes.
The better your model, the more aggressively you can chop (higher lift factor).
Lift Factor
Related to cost. Lift factor = success rate of subset / success rate of whole set.
Say that normally 0.1% of the people you mail a survey to will respond. If you can use machine learning to pick out a subset of people for whom the probability of a response is 0.4%, that is a lift factor of 4 (0.4/0.1 = 4).
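The calculation above can be sketched in a few lines of Python (the function name is our own, for illustration, not part of any tool from the lecture):

```python
def lift_factor(subset_success_rate, overall_success_rate):
    """Lift factor: how much better the selected subset does
    than the whole set."""
    return subset_success_rate / overall_success_rate

# The survey example: response rate rises from 0.1% to 0.4%
print(lift_factor(0.004, 0.001))  # a lift factor of about 4
```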
Lift Factor * Overall success rate = .59
Lift Factor * Overall success rate = .59 * If we sort by Prob Return, then more 1s appear towards the top than towards the bottom
Lift Factor * Overall success rate = .59 * Success rate above threshold = 1 * Lift factor = 1/.59 = 1.69
Lift Factor * Overall success rate = .59 * Success rate above threshold = .88 * Lift factor = .88/.59 = 1.49
Lift Factor * Overall success rate = .59 * Success rate above threshold = .62 * Lift factor = .62/.59 = 1.05
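The threshold examples above can be reproduced with a minimal sketch: sort the instances by predicted probability, keep the top fraction, and compare its success rate to the overall rate (function and variable names here are illustrative, not from Weka):

```python
def lift_at_fraction(sorted_labels, fraction):
    """Lift when keeping the top `fraction` of instances.

    `sorted_labels` are the true 0/1 outcomes, already sorted by the
    model's predicted probability of success, highest first.
    """
    k = max(1, int(len(sorted_labels) * fraction))
    subset_rate = sum(sorted_labels[:k]) / k
    overall_rate = sum(sorted_labels) / len(sorted_labels)
    return subset_rate / overall_rate

# Toy data: a good model pushes most of the 1s toward the top
labels = [1, 1, 1, 0, 1, 0, 0, 0]      # overall success rate = 0.5
print(lift_at_fraction(labels, 0.25))  # top 2 instances are both 1s: lift = 2.0
```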
ROC Curves: Receiver Operating Characteristic ** You want to be above the diagonal (up and to the left); that means you win more than you lose. As you adjust your threshold, you get more true positives, but you get more false positives too.
Drawing the Curve
If your algorithm gives you a ranking by probability of success:
Sort your instances by the probability assigned to the correct prediction, then move your threshold down the list.
At every point, the model treats everything with a probability above the threshold as a positive. That’s not always correct, which is why you get both true positives and false positives.
At every position, compute the false positive rate and true positive rate, and place a dot on the graph for this pair.
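The procedure above can be sketched as follows (a minimal illustration; plotting the resulting points is left out):

```python
def roc_points(scores, labels):
    """One (false positive rate, true positive rate) point per threshold
    position, moving the threshold down the score-sorted list."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    pos = sum(labels)               # number of actual positives
    neg = len(labels) - pos         # number of actual negatives
    tp = fp = 0
    points = [(0.0, 0.0)]           # threshold above everything
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

# Toy scores: each step down the list adds one true or false positive
print(roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```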
Drawing the Curve Classify using Naïve Bayes Visualize Threshold Curve
Comparing Two Classifiers Based on precision/false-alarm trade-offs
Cost Curve: Each Line Assumes a Fixed Cost Matrix
You want your line to be close to the bottom (resample the data to manipulate this probability).
Cost Curves
pc[+] depends on the composition of the data and the cost matrix. Looks very similar to the previous image but means something subtly different.
Starting Text
Basic Idea
Represent text as a vector where each position corresponds to a term. This is called the “bag of words” approach.

                      Cheese  Cows  Eat  Hamsters  Make  Seeds
Cows make cheese.        1      1     0      0       1      0
Hamsters eat seeds.      0      0     1      1       0      1

But “Cheese makes cows.” gets the same representation!
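A minimal bag-of-words sketch in plain Python (Weka’s StringToWordVector, covered below, does this with many more options):

```python
def bag_of_words(texts):
    """Build a shared vocabulary, then one 0/1 vector per text."""
    vocab = sorted({word for text in texts for word in text.lower().split()})
    vectors = [[1 if word in text.lower().split() else 0 for word in vocab]
               for text in texts]
    return vocab, vectors

vocab, vectors = bag_of_words(["Cows make cheese", "Hamsters eat seeds"])
print(vocab)       # ['cheese', 'cows', 'eat', 'hamsters', 'make', 'seeds']
print(vectors[0])  # [1, 1, 0, 0, 1, 0] -- the slide's 110010
```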
Looking Ahead
Next week we’ll learn how to use TagHelper tools, which will make it easier to extract text features.
This week we will learn how to use Weka’s text-processing functionality, which offers some different capabilities you may eventually need.
We will also learn which features are useful to extract.
Need to strip out punctuation!
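One simple way to do that in plain Python, so that “seeds.” and “seeds” map to the same term (a sketch; StringToWordVector has its own tokenizer options for this):

```python
import string

def strip_punctuation(text):
    """Delete all ASCII punctuation characters from the text."""
    return text.translate(str.maketrans("", "", string.punctuation))

print(strip_punctuation("Hamsters eat seeds."))  # Hamsters eat seeds
```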
Using String-to-Word-Vector * Click here
Using String-to-Word-Vector * Scroll down and select StringToWordVector
Using String-to-Word-Vector * Now click here
Using String-to-Word-Vector * Click on Apply
What are good features for text categorization? What distinguishes Questions and Statements?
What are good features for text categorization? What distinguishes Questions and Statements? Not all questions end in a question mark.
What are good features for text categorization? What distinguishes Questions and Statements? “I” versus “you” is not a reliable predictor.
What are good features for text categorization? What distinguishes Questions and Statements? Not all WH words occur in questions
What can’t you conclude from “bag of words” representations? Causality: “X caused Y” versus “Y caused X” Roles and Mood: “Which person ate the food that I prepared this morning and drives the big car in front of my cat” versus “The person, which prepared food that my cat and I ate this morning, drives in front of the big car.” Who’s driving, who’s eating, and who’s preparing food?
X’ Structure
A complete phrase (X’’, sometimes called “a maximal projection”) is built around a head (X), together with a specifier, a pre-head modifier, and a post-head modifier.
Example: “The black cat in the hat”
Basic Anatomy: Layers of Linguistic Analysis
Phonology: the sound structure of language (basic sounds, syllables, rhythm, intonation)
Morphology: the building blocks of words; inflection (tense, number, gender) and derivation (building words from other words, transforming part of speech)
Syntax: structural and functional relationships between spans of text within a sentence (phrase and clause structure)
Semantics: literal meaning, propositional content
Pragmatics: non-literal meaning, language use, language as action, social aspects of language (tone, politeness)
Discourse Analysis: language in practice; relationships between sentences, interaction structures, discourse markers, anaphora and ellipsis
Wrap-Up
We examined evaluation methods that let us explore how the performance of algorithms changes with the composition of the data.
We looked at a very simple vector-based representation of text.
We started to think about the linguistic structure of language.