Machine Learning in Practice Lecture 11 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute
Plan for the Day
Announcements:
Quiz
Assignment 5 assigned (we’ll cover the concepts needed for it in this lecture and the next two lectures)
Finish up Evaluation
Start talking about text
Finishing Up Evaluation
Charts and Curves
Primarily designed for binary decision tasks.
Multi-class classification is a combination of binary decision tasks; you would evaluate each binary decision separately.
Lift Factor
A lift chart shows, for a specific model, how the number of successes increases with the percentage of the sample you select.
If your model is good at sorting instances by probability of success, you can safely chop off the instances at the bottom end without losing many successes.
The better your model, the more aggressively you can chop (higher lift factor).
Lift Factor
Related to cost. Lift factor = success rate of subset / success rate of whole set.
Say that normally 0.1% of the people you mail a survey to will respond. If you can use machine learning to pick out a subset of people for whom the probability of a response is 0.4%, that is a lift factor of 4 (0.4/0.1 = 4).
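The calculation above can be sketched in a few lines of Python (the function name is our own, for illustration, not part of any tool from the lecture):

```python
def lift_factor(subset_success_rate, overall_success_rate):
    """Lift factor: how much better the selected subset does
    than the whole set."""
    return subset_success_rate / overall_success_rate

# The survey example: response rate rises from 0.1% to 0.4%
print(lift_factor(0.004, 0.001))  # a lift factor of about 4
```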
Lift Factor * Overall success rate = .59
Lift Factor * Overall success rate = .59 * If we sort by Prob Return, then more 1s appear towards the top than towards the bottom
Lift Factor * Overall success rate = .59 * Success rate above threshold = 1 * Lift factor = 1/.59 = 1.69
Lift Factor * Overall success rate = .59 * Success rate above threshold = .88 * Lift factor = .88/.59 = 1.49
Lift Factor * Overall success rate = .59 * Success rate above threshold = .62 * Lift factor = .62/.59 = 1.05
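The threshold examples above can be reproduced with a minimal sketch: sort the instances by predicted probability, keep the top fraction, and compare its success rate to the overall rate (function and variable names here are illustrative, not from Weka):

```python
def lift_at_fraction(sorted_labels, fraction):
    """Lift when keeping the top `fraction` of instances.

    `sorted_labels` are the true 0/1 outcomes, already sorted by the
    model's predicted probability of success, highest first.
    """
    k = max(1, int(len(sorted_labels) * fraction))
    subset_rate = sum(sorted_labels[:k]) / k
    overall_rate = sum(sorted_labels) / len(sorted_labels)
    return subset_rate / overall_rate

# Toy data: a good model pushes most of the 1s toward the top
labels = [1, 1, 1, 0, 1, 0, 0, 0]      # overall success rate = 0.5
print(lift_at_fraction(labels, 0.25))  # top 2 instances are both 1s: lift = 2.0
```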
ROC Curves: Receiver Operating Characteristic ** You want to be above the diagonal (up and to the left); that means you win more than you lose. As you adjust your threshold, you get more true positives, but you get more false positives too.
Drawing the Curve
If your algorithm gives you a ranking by probability of success:
Sort your instances by the probability assigned to the correct prediction, then move your threshold down the list.
At every point, the model treats everything with a probability above the threshold as a positive. That’s not always correct, which is why you get both true positives and false positives.
At every position, compute the false positive rate and true positive rate, and place a dot on the graph for this pair.
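The procedure above can be sketched as follows (a minimal illustration; plotting the resulting points is left out):

```python
def roc_points(scores, labels):
    """One (false positive rate, true positive rate) point per threshold
    position, moving the threshold down the score-sorted list."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    pos = sum(labels)               # number of actual positives
    neg = len(labels) - pos         # number of actual negatives
    tp = fp = 0
    points = [(0.0, 0.0)]           # threshold above everything
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

# Toy scores: each step down the list adds one true or false positive
print(roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```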
Drawing the Curve Classify using Naïve Bayes Visualize Threshold Curve
Comparing Two Classifiers Based on precision/false-alarm trade-offs
Cost Curve: Each Line Assumes a Fixed Cost Matrix
You want your line to be close to the bottom (resample the data to manipulate this probability).
Cost Curves
pc[+] depends on the composition of the data and the cost matrix. Looks very similar to the previous image but means something subtly different.
Starting Text
Basic Idea
Represent text as a vector where each position corresponds to a term. This is called the “bag of words” approach.

                      Cheese  Cows  Eat  Hamsters  Make  Seeds
Cows make cheese.        1      1     0      0       1      0
Hamsters eat seeds.      0      0     1      1       0      1

But “Cheese makes cows.” gets the same representation!
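A minimal bag-of-words sketch in plain Python (Weka’s StringToWordVector, covered below, does this with many more options):

```python
def bag_of_words(texts):
    """Build a shared vocabulary, then one 0/1 vector per text."""
    vocab = sorted({word for text in texts for word in text.lower().split()})
    vectors = [[1 if word in text.lower().split() else 0 for word in vocab]
               for text in texts]
    return vocab, vectors

vocab, vectors = bag_of_words(["Cows make cheese", "Hamsters eat seeds"])
print(vocab)       # ['cheese', 'cows', 'eat', 'hamsters', 'make', 'seeds']
print(vectors[0])  # [1, 1, 0, 0, 1, 0] -- the slide's 110010
```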
Looking Ahead
Next week we’ll learn how to use TagHelper tools, which will make it easier to extract text features.
This week we will learn how to use Weka’s text-processing functionality, which offers some different capabilities you may eventually need.
We will also learn which features are useful to extract.
Need to strip out punctuation!
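One simple way to do that in plain Python, so that “seeds.” and “seeds” map to the same term (a sketch; StringToWordVector has its own tokenizer options for this):

```python
import string

def strip_punctuation(text):
    """Delete all ASCII punctuation characters from the text."""
    return text.translate(str.maketrans("", "", string.punctuation))

print(strip_punctuation("Hamsters eat seeds."))  # Hamsters eat seeds
```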
Using String-to-Word-Vector * Click here
Using String-to-Word-Vector * Scroll down and select StringToWordVector
Using String-to-Word-Vector * Now click here
Using String-to-Word-Vector * Click on Apply
What are good features for text categorization? What distinguishes Questions and Statements?
What are good features for text categorization? What distinguishes Questions and Statements? Not all questions end in a question mark.
What are good features for text categorization? What distinguishes Questions and Statements? “I” versus “you” is not a reliable predictor.
What are good features for text categorization? What distinguishes Questions and Statements? Not all WH words occur in questions
What can’t you conclude from “bag of words” representations? Causality: “X caused Y” versus “Y caused X” Roles and Mood: “Which person ate the food that I prepared this morning and drives the big car in front of my cat” versus “The person, which prepared food that my cat and I ate this morning, drives in front of the big car.” Who’s driving, who’s eating, and who’s preparing food?
X’ Structure
A complete phrase (X’’, sometimes called “a maximal projection”) is built around a head (X), together with a specifier, a pre-head modifier, and a post-head modifier.
Example: “The black cat in the hat”
Basic Anatomy: Layers of Linguistic Analysis
Phonology: the sound structure of language (basic sounds, syllables, rhythm, intonation)
Morphology: the building blocks of words; inflection (tense, number, gender) and derivation (building words from other words, transforming part of speech)
Syntax: structural and functional relationships between spans of text within a sentence (phrase and clause structure)
Semantics: literal meaning, propositional content
Pragmatics: non-literal meaning, language use, language as action, social aspects of language (tone, politeness)
Discourse Analysis: language in practice; relationships between sentences, interaction structures, discourse markers, anaphora and ellipsis
Wrap-Up
We examined evaluation methods that let us explore how the performance of algorithms changes with the composition of the data.
We looked at a very simple vector-based representation of text.
We started to think about the linguistic structure of language.