Introduction to Classification: Shallow Processing Techniques for NLP, Ling570, November 9, 2011

Roadmap Classification problems: definition, solutions, case studies. Based on slides by F. Xia.

Example: Text Classification Task: Given an article, predict its category. Categories: sports, entertainment, news, weather, …; spam/not spam. What kind of information is useful for this task?

Classification Task Task: C is a finite set of labels (aka categories, classes); given x, determine its category y in C. Instance: (x, y), where x is the thing to be labeled/classified and y is the label/class. Data: set of instances; labeled data: y is known; unlabeled data: y is unknown. Training data, test data.

Text Classification Examples Spam filtering; call routing; sentiment classification: positive/negative, or a score from 1 to 5.

POS Tagging Task: Given a sentence, predict the tag of each word. Is this a classification problem? Categories: N, V, Adj, … What information is useful? How do POS tagging and text classification differ? POS tagging is a sequence labeling problem.

Word Segmentation Task: Given a string, break it into words. Categories: B(reak), NB (no break); or B(eginning), I(nside), E(nd). e.g. c1 c2 || c3 c4 c5: c1/NB c2/B c3/NB c4/NB c5/B, or c1/B c2/E c3/B c4/I c5/E. What type of task? Also sequence labeling.
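As a concrete illustration of the two tagging schemes above, a minimal Python sketch (the function name and input format are illustrative, not from the course materials) that converts a segmented string into B/NB and B/I/E tag sequences:

```python
def to_tags(words):
    """Convert segmented words (each a list of characters) into per-character
    tag sequences: B/NB (is there a break after this character?) and
    B/I/E (position of the character within its word)."""
    break_tags, bie_tags = [], []
    for word in words:
        n = len(word)
        for i in range(n):
            break_tags.append("B" if i == n - 1 else "NB")
            bie_tags.append("B" if i == 0 else ("E" if i == n - 1 else "I"))
    return break_tags, bie_tags

# The slide's example: c1 c2 || c3 c4 c5
print(to_tags([["c1", "c2"], ["c3", "c4", "c5"]]))
# (['NB', 'B', 'NB', 'NB', 'B'], ['B', 'E', 'B', 'I', 'E'])
```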

Solving a Classification Problem

Two Stages Training: Learner: training data → classifier. Testing: Decoder: test data + classifier → classification output. Also: preprocessing, postprocessing, evaluation.
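The two stages suggest a simple interface; the sketch below is illustrative only (the class and method names are not from Mallet or any course code):

```python
# Illustrative learner/decoder interface for the two stages above.
class Learner:
    def train(self, labeled_instances):
        """Training: training data -> classifier."""
        raise NotImplementedError

class Classifier:
    def classify(self, instance):
        """Return (category, score) pairs for one instance."""
        raise NotImplementedError

def decode(classifier, test_instances):
    """Testing: test data + classifier -> classification output."""
    return [classifier.classify(x) for x in test_instances]
```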

Representing Input Potentially infinite values to represent. Represent input as a feature vector x = &lt;f1, f2, …, fn&gt;. What are good features?

Example I: Spam Tagging Classes: Spam/Not Spam. Input: email messages.

Doc1 Western Union Money Transfer One Bishops Square Akpakpa E1 6AO, Cotonou Benin Republic Website: info/selectCountry.asP Phone: Attention Beneficiary, This to inform you that the federal ministry of finance Benin Republic has started releasing scam victim compensation fund mandated by United Nation Organization through our office. I am contacting you because our agent have sent you the first payment of $5,000 for your compensation funds total amount of $ USD (Five hundred thousand united state dollar) We need your urgent response so that we shall release your payment information to you. You can call our office hot line for urgent attention( )

Doc2 Hello! my dear. How are you today and your family? I hope all is good, kindly pay Attention and understand my aim of communicating you today through this Letter, My names is Saif al-Islam al-Gaddafi the Son of former Libyan President. i was born on 1972 in Tripoli Libya,By Gaddafi’s second wive. I want you to help me clear this fund in your name which i deposited in Europe please i would like this money to be transferred into your account before they find it. the amount is ,000 million GBP British Pounds sterling through a

Doc3 from: Apply for loan at 3% interest Rate..Contact us for details.

Doc4 from: REMINDER: If you have not received a PIN number to vote in the elections and have not already contacted us, please contact either Drago Radev or Priscilla Rasmussen right away. Everyone who has not received a pin but who has contacted us already will get a new pin over the weekend. Anyone who still wants to join for 2011 needs to do this by Monday (November 7th) in order to be eligible to vote. And, if you do have your PIN number and have not voted yet, remember every vote

What are good features?

Possible Features Words! A feature for each word: Binary: presence/absence; Integer: occurrence count. Particular word types: money/sex/…: [Vv].*gr.* Errors: spelling, grammar. Images. Header info.
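A hedged Python sketch of how such features might be extracted from a message body; the feature-naming scheme (has(...), count(...), viagra_like) is invented for illustration, and the regex is the slide's example pattern:

```python
import re
from collections import Counter

VIAGRA_RE = re.compile(r"[Vv].*gr.*")  # the slide's example pattern

def extract_features(text):
    """Map a message body to a feature -> value dict (one sparse AVM row)."""
    tokens = text.split()
    counts = Counter(t.lower() for t in tokens)
    features = {}
    for word, count in counts.items():
        features["has(%s)" % word] = 1        # binary: presence/absence
        features["count(%s)" % word] = count  # integer: occurrence count
    # one invented "word type" feature driven by the regex above
    features["viagra_like"] = int(any(VIAGRA_RE.match(t) for t in tokens))
    return features
```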

Representing Input: Attribute-Value Matrix

            f1 Currency   f2 Country   …   fm Date   Label
x1 = Doc1        …             …        …     …      Spam
x2 = Doc2        …             …        …     …      Spam
…
xn = Doc4        0             0        …     2      NotSpam

Classifier Result of training on input data, with or without class labels. Formal perspective: f(x) = y, where x is the input and y is in C. More generally: f(x) = {(c_i, score_i)}, where x is the input, c_i is in C, and score_i is the score for that category assignment.

Testing Input: test data (e.g. an AVM) and a classifier. Output: a decision matrix; can assign the highest scoring class to each input.

        x1    x2    x3    …
c1      …     …     …
c2      …     …     …
c3      …     …     …
c4      …     …     …

Highest scoring class per input: x1 → c4, x2 → c2, x3 → c3.
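A small sketch of the assignment step, assuming the decision matrix is stored as nested dicts (an illustrative representation, not a required format):

```python
def assign_classes(decision_matrix):
    """decision_matrix: {class: {input_id: score}} (illustrative layout).
    Returns {input_id: highest-scoring class}."""
    inputs = next(iter(decision_matrix.values()))
    return {x: max(decision_matrix, key=lambda c: decision_matrix[c][x])
            for x in inputs}

# With suitable scores this reproduces the slide's picks:
# {'x1': 'c4', 'x2': 'c2', 'x3': 'c3'}
```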

Evaluation Confusion matrix: Precision: TP/(TP+FP). Recall: TP/(TP+FN). F-score: 2PR/(P+R). Accuracy: (TP+TN)/(TP+FP+FN+TN). Why F-score rather than accuracy?

              Gold +   Gold −
System +       TP       FP
System −       FN       TN
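These four metrics are easy to compute directly from the confusion-matrix counts; a minimal Python sketch:

```python
def evaluate(tp, fp, fn, tn):
    """Precision, recall, F-score, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f_score, accuracy

# Matches the worked example on the next slide:
# evaluate(1, 4, 5, 90) -> (0.2, 1/6, 2/11, 0.91)
```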

Evaluation Example Confusion matrix with TP = 1, FP = 4, FN = 5, TN = 90: Precision: 1/(1+4) = 1/5. Recall: TP/(TP+FN) = 1/(1+5) = 1/6. F-score: 2PR/(P+R) = (2 × 1/5 × 1/6)/(1/5 + 1/6) = 2/11. Accuracy: (1+90)/100 = 91%.

              Gold +   Gold −
System +        1        4
System −        5       90

Classification Problem Steps Input processing: split data into training/dev/test; convert data into an Attribute-Value Matrix (identify candidate features, perform feature selection, create the AVM representation). Then: training, testing, evaluation.

Classification Algorithms (covered in detail in 572): Nearest Neighbor, Naïve Bayes, Decision Trees, Neural Networks, Maximum Entropy.

Feature Design & Representation What sorts of information do we want to encode? Words, frequencies, ngrams, morphology, sentence length, etc. Issue: learning algorithms work on numbers; many work only on binary values (0/1), others work on any real-valued input. How can we represent different information numerically? In binary?

Representation Words/tags/ngrams/etc.: one feature per item; Binary: presence/absence; Real: counts. Binarizing numeric features: single threshold, multiple thresholds, or binning (1 binary feature per bin), as sketched below.
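A minimal sketch of the two binarization strategies (the threshold values and bin boundaries here are invented for illustration):

```python
def binarize_threshold(value, threshold):
    """Single threshold: one binary feature."""
    return int(value >= threshold)

def binarize_bins(value, boundaries):
    """Binning: one binary feature per bin; boundaries are sorted upper bounds."""
    features = [0] * (len(boundaries) + 1)
    for i, bound in enumerate(boundaries):
        if value < bound:
            features[i] = 1
            return features
    features[-1] = 1
    return features

# e.g. sentence length 17 with (invented) bins <10, 10-19, >=20:
# binarize_bins(17, [10, 20]) -> [0, 1, 0]
```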

Feature Template Example: Prevword (or w−1). A template corresponds to many features. e.g. time flies like an arrow: w−1=&lt;s&gt;, w−1=time, w−1=flies, w−1=like, w−1=an, … Shorthand for a set of binary features, each taking value 0 or 1: w−1=time 0 or w−1=time 1.

AVM Example Time flies like an arrow. Note: this is a compact form of the true sparse vector, in which each feature w−1=w is 0 or 1, for every w in V. A sketch instantiating these templates follows the table.

       w−1     w0      w−1 w0       w+1     label
x1     &lt;s&gt;     Time    &lt;s&gt; Time     flies   N
x2     Time    flies   Time flies   like    V
x3     flies   like    flies like   an      P
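A sketch instantiating the four templates (w−1, w0, w−1w0, w+1) at one position; the &lt;s&gt;/&lt;/s&gt; padding tokens are an assumption for sentence boundaries:

```python
def template_features(words, i, bos="<s>", eos="</s>"):
    """Instantiate the w-1, w0, w-1w0, w+1 templates at position i.
    bos/eos are assumed padding tokens for sentence boundaries."""
    prev_word = words[i - 1] if i > 0 else bos
    next_word = words[i + 1] if i < len(words) - 1 else eos
    return {
        "w-1=%s" % prev_word: 1,
        "w0=%s" % words[i]: 1,
        "w-1w0=%s %s" % (prev_word, words[i]): 1,
        "w+1=%s" % next_word: 1,
    }

sentence = "Time flies like an arrow".split()
print(template_features(sentence, 1))
# {'w-1=Time': 1, 'w0=flies': 1, 'w-1w0=Time flies': 1, 'w+1=like': 1}
```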

Example: NER Named Entity tagging: John visited New York last Friday → [person John] visited [location New York] [time last Friday]. As a classification problem: John/PER-B visited/O New/LOC-B York/LOC-I last/TIME-B Friday/TIME-I. Input? Categories?
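A sketch of how the bracketed annotation can be flattened into per-token tags in the slide's PER-B/LOC-I/O style (the segment-list input format is illustrative):

```python
def to_tagged_tokens(segments):
    """segments: list of (entity_type_or_None, [tokens]) pairs (illustrative).
    Returns token/tag pairs in the slide's PER-B / LOC-I / O style."""
    tagged = []
    for etype, tokens in segments:
        for i, tok in enumerate(tokens):
            if etype is None:
                tagged.append((tok, "O"))
            else:
                tagged.append((tok, "%s-%s" % (etype, "B" if i == 0 else "I")))
    return tagged

segments = [("PER", ["John"]), (None, ["visited"]),
            ("LOC", ["New", "York"]), ("TIME", ["last", "Friday"])]
print(to_tagged_tokens(segments))
# [('John', 'PER-B'), ('visited', 'O'), ('New', 'LOC-B'), ('York', 'LOC-I'),
#  ('last', 'TIME-B'), ('Friday', 'TIME-I')]
```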

Example: Coreference Queen Elizabeth set about transforming her husband, King George VI, into a viable monarch. Logue, a renowned speech therapist, was summoned to help the King overcome his speech impediment… Can be viewed as a classification problem. What are the inputs? What are the categories? What features would be useful?

HW#7 Viterbi! Implement the Viterbi algorithm. Use a standard, precomputed, presmoothed model, made available after HW#6 is handed in. Testing & evaluation: convert the output format to enable comparison with the gold standard; compare to the gold standard and produce a score.
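For reference, a generic Viterbi sketch for a bigram HMM in Python; this is a minimal illustration, not the required interface for the HW#7 model files:

```python
import math

def viterbi(observations, states, log_init, log_trans, log_emit):
    """Viterbi decoding for a bigram HMM.

    observations: list of emitted symbols (e.g. words)
    states:       list of hidden states (e.g. tags)
    log_init[s]:     log P(s at position 0)
    log_trans[s][t]: log P(t | s)
    log_emit[s][o]:  log P(o | s)
    Returns (best_log_prob, best_state_sequence).
    """
    V = [{}]     # V[i][s] = best log prob of any path ending in s at position i
    back = [{}]  # backpointers
    for s in states:
        V[0][s] = (log_init.get(s, -math.inf)
                   + log_emit[s].get(observations[0], -math.inf))
        back[0][s] = None
    for i in range(1, len(observations)):
        V.append({})
        back.append({})
        for t in states:
            # best previous state for reaching t at position i
            best_prev, best_score = None, -math.inf
            for s in states:
                score = V[i - 1][s] + log_trans[s].get(t, -math.inf)
                if score > best_score:
                    best_prev, best_score = s, score
            V[i][t] = best_score + log_emit[t].get(observations[i], -math.inf)
            back[i][t] = best_prev
    # follow backpointers from the best final state
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for i in range(len(observations) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return V[-1][last], path
```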