Ling 570: Day 8 Classification, Mallet

Presentation transcript:

Ling 570: Day 8 Classification, Mallet

Roadmap
- Open questions?
- Quick review of classification
- Feature templates

Classification Problem Steps
- Input processing:
  - Split data into training/dev/test
  - Convert data into a feature representation (aka Attribute Value Matrix)
- Training
- Testing
- Evaluation
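The split step above can be sketched in a few lines of Python. This is only an illustration: the function name `split_data`, the 80/10/10 proportions, and the fixed seed are assumptions, not something the slides prescribe.

```python
import random

def split_data(instances, train=0.8, dev=0.1, seed=0):
    """Shuffle a copy of the instances and cut it into train/dev/test lists.

    The proportions and the seed are illustrative assumptions.
    """
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = int(len(shuffled) * train)
    n_dev = int(len(shuffled) * dev)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_dev],
            shuffled[n_train + n_dev:])  # whatever is left becomes the test set

train_set, dev_set, test_set = split_data(range(10))
print(len(train_set), len(dev_set), len(test_set))  # prints: 8 1 1
```

Keeping the test portion untouched until evaluation is the point of the three-way split; the dev portion is what you tune against.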

Feature templates
- Problem: predict the POS tag distribution of an unknown word
  - Input: “unfrobulate”
  - Input: “turduckenly”
- Features might include:
  - Last three characters are “ate”
  - Last two characters are “ly”
- Feature templates generate features given an input
  - Template: Last three characters == XXX
  - Plug in XXX to get a binary-valued feature
- Templates generate many features

word         w[-3..-1]  w[-2..-1]  w[-3..-1]==ate  w[-3..-1]==nly  w[-2..-1]==te  w[-2..-1]==ly
unfrobulate  ate        te         1               0               1              0
turduckenly  nly        ly         0               1               0              1
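A minimal Python sketch of the idea: each template is a function that, given a word, fills in XXX and returns one feature name. The helper names (`suffix_template`, `extract_features`, `attribute_value_matrix`) are made up for illustration.

```python
def suffix_template(n):
    """Template 'last n characters == XXX'; XXX is filled in per word."""
    def template(word):
        return "w[-%d..-1]==%s" % (n, word[-n:])
    return template

# Two templates, as on the slide: last three and last two characters.
TEMPLATES = [suffix_template(3), suffix_template(2)]

def extract_features(word):
    """Apply every template; each generated feature is binary (present = 1)."""
    return {template(word): 1 for template in TEMPLATES}

def attribute_value_matrix(words):
    """Build one 0/1 row per word over all features the templates generated."""
    rows = {w: extract_features(w) for w in words}
    columns = sorted({f for feats in rows.values() for f in feats})
    matrix = {w: [rows[w].get(c, 0) for c in columns] for w in words}
    return columns, matrix

columns, matrix = attribute_value_matrix(["unfrobulate", "turduckenly"])
print(columns)
for word, row in matrix.items():
    print(word, row)
```

Two templates and two words already yield four distinct features; on a real vocabulary the same two templates generate thousands, which is what "templates generate many features" refers to.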

Machine learning

Classifiers
- Wide variety
- Differ on several dimensions
  - Supervision
  - Learning function
  - Input features

Supervision in Classifiers
- Supervised:
  - True label/class of each training instance is provided to the learner at training time
  - Naïve Bayes, MaxEnt, Decision Trees, Neural nets, etc.
- Unsupervised:
  - No true labels are provided for examples during training
  - Clustering: k-means; min-cut algorithms
- Semi-supervised (bootstrapping):
  - True labels are provided for only a subset of examples
  - Co-training, semi-supervised SVM/CRF, etc.

Inductive Bias
- What form of function is learned?
  - Function that separates members of different classes
    - Linear separator
    - Higher-order functions
    - Voronoi diagrams, etc.
- Graphically, decision boundary

Machine Learning Functions
- Problem: Can the representation effectively model the class to be learned?
- Motivates selection of learning algorithm
- For this function, a linear discriminant is GREAT!
- Rectangular boundaries (e.g. ID trees) are TERRIBLE!
- Pick the right representation!
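The slide's figure did not survive in this transcript, so the sketch below assumes a toy problem with a diagonal class boundary (class 1 when y > x). Under that assumption, a linear rule fits perfectly while the simplest rectangular boundary, a single axis-parallel threshold, cannot. The data and all names here are hypothetical.

```python
from itertools import product

# Hypothetical toy data: grid points, class 1 when y > x (diagonal boundary).
POINTS = [(x, y) for x, y in product(range(10), repeat=2) if x != y]
LABELS = [1 if y > x else 0 for x, y in POINTS]

def accuracy(predict):
    """Fraction of points for which predict(point) matches the gold label."""
    hits = sum(predict(p) == gold for p, gold in zip(POINTS, LABELS))
    return hits / len(POINTS)

def linear_rule(point):
    """Linear discriminant with weights (-1, +1) and zero bias."""
    x, y = point
    return 1 if (-1 * x + 1 * y) > 0 else 0

def best_stump_accuracy():
    """Best accuracy over every single axis-parallel threshold (a decision stump)."""
    best = 0.0
    for axis in (0, 1):
        for threshold in range(10):
            for positive_side in (0, 1):
                def stump(p, a=axis, t=threshold, s=positive_side):
                    return s if p[a] > t else 1 - s
                best = max(best, accuracy(stump))
    return best

print("linear rule:", accuracy(linear_rule))
print("best single axis-parallel split:", best_stump_accuracy())
```

A deeper tree can approximate the diagonal with a staircase of splits, but it needs many more parameters than the two weights of the linear rule.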

Machine Learning Features
- Inputs:
  - E.g. words, acoustic measurements, parts-of-speech, syntactic structures, semantic classes, ...
- Vectors of features:
  - E.g. word: letters
    - ‘cat’: L1 = c; L2 = a; L3 = t
  - Parts of syntax trees?
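The letter-position example can be written as a one-line feature extractor. A small sketch; the function name is an assumption.

```python
def letter_features(word):
    """Map a word to position-indexed letter features: L1, L2, L3, ..."""
    return {"L%d" % i: ch for i, ch in enumerate(word, start=1)}

print(letter_features("cat"))  # prints: {'L1': 'c', 'L2': 'a', 'L3': 't'}
```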

Machine Learning Toolkits
- Many learners, many tools/implementations
- Some broad tool sets
  - weka
    - Java, lots of classifiers, pedagogically oriented
  - mallet
    - Java, classifiers, sequence learners
    - More heavy duty

Mallet: intro and data prep

Mallet
- Machine learning toolkit
  - Developed at UMass Amherst by Andrew McCallum
  - Java implementation, open source
- Large collection of machine learning algorithms
  - Targeted to language processing
  - Naïve Bayes, MaxEnt, Decision Trees, Winnow, Boosting
  - Also clustering, topic models, sequence learners
- Widely used, but
  - Research software: some bugs/gaps; odd documentation

Installation
- Installed on patas
  - /NLP_TOOLS/tool_sets/mallet/latest/
- Directories:
  - bin/: script files
  - src/: java source code
  - class/: java classes
  - lib/: jar files
  - sample-data/: wikipedia docs for language id, etc.

Environment
- Should be set up on patas
- $PATH should include
  - /NLP_TOOLS/tool_sets/mallet/latest/bin
- $CLASSPATH should include
  - /NLP_TOOLS/tool_sets/mallet/latest/lib/mallet-deps.jar
  - /NLP_TOOLS/tool_sets/mallet/latest/lib/mallet.jar
- Check:
  - which text2vectors
  - /NLP_TOOLS/tool_sets/mallet/latest/bin

Mallet Commands
- Mallet command types:
  - Data preparation
  - Data/model inspection
  - Training
  - Classification
- Command line scripts
  - Shell scripts
  - Set up java environment
  - Invoke java programs
  - --help lists command line parameters for scripts

Mallet Data
- Mallet data instances:
  - Instance_id label f1 v1 f2 v2 ...
- Stored in internal binary format: “vectors”
  - Binary format used by learners, decoders
- Need to convert text files to binary format

Data Preparation
- Built-in data importers
  - One class per directory, one instance per file
    - bin/mallet import-dir --input IF --output OF
    - Label is directory name
    - (Also text2vectors)
  - One instance per line
    - bin/mallet import-file --input IF --output OF
    - Line: instance label text ...
    - (Also csv2vectors)
- Create binary representation of text feature counts
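To make "text feature counts" concrete, here is a rough Python sketch of what a one-instance-per-line importer computes conceptually. It is not Mallet's code: the function name and the lowercase-letters tokenizer are simplifying assumptions.

```python
import re
from collections import Counter

def parse_instance_line(line):
    """Split 'instance label text ...' and count the word features in the text.

    Tokenization here (runs of lowercase letters) is an assumption made for
    illustration, not Mallet's tokenizer.
    """
    instance_id, label, text = line.strip().split(None, 2)
    tokens = re.findall(r"[a-z]+", text.lower())
    return instance_id, label, Counter(tokens)

instance_id, label, counts = parse_instance_line("doc1 en the book the reader")
print(instance_id, label, dict(counts))
```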

Data Preparation
- bin/mallet import-svmlight --input IF --output OF
  - Allows import of user-constructed feature-value pairs
  - Format:
    - label f1:v1 f2:v2 ... fn:vn
    - Features can be strings or indexes
  - (Also bin/svmlight2vectors)
- If building test data separately from original
  - bin/mallet import-svmlight --input IF --output OF --use-pipe-from previously_built.vectors
  - Ensures consistent feature representation
- Note: can’t mix svmlight models with others
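If you build features yourself, writing them in the `label f1:v1 f2:v2 ...` format is straightforward. A small sketch, assuming string feature names; the function name is hypothetical, and the check for spaces and colons is a precaution added here because those characters would break the format.

```python
def to_svmlight_line(label, features):
    """Format one instance as 'label f1:v1 f2:v2 ... fn:vn'."""
    parts = [str(label)]
    for name, value in sorted(features.items()):
        # Spaces or colons inside a feature name would make the line ambiguous.
        if ":" in name or any(ch.isspace() for ch in name):
            raise ValueError("bad feature name: %r" % name)
        parts.append("%s:%s" % (name, value))
    return " ".join(parts)

print(to_svmlight_line("en", {"the": 2, "book": 1}))  # prints: en book:1 the:2
```

One line per instance, written to a text file, is what `import-svmlight` takes as `--input`.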

Accessing Binary Formats
- vectors2info --input IF
  - --print-labels TRUE
    - Prints list of category labels in data set
  - --print-matrix sic
    - Prints all features and values by string and number
    - Returns original text feature-value list
    - Possibly out of order
- vectors2vectors --input IF --training-file TNF --testing-file TTF --training-portion pct
  - Creates random training/test splits in some ratio

Building & Accessing Models
- bin/mallet train-classifier --input vector_data_file --trainer classifiertype --training-portion pct --output-classifier OF
  - Builds classifier model
  - Can also store model, produce scores, confusion matrix, etc.
- --trainer: MaxEnt, DecisionTree, NaiveBayes, etc.
- --report: train:accuracy, test:f1:en
- Can also use pre-split training & testing files
  - e.g. output of vectors2vectors
  - --training-file, --testing-file

Building & Accessing Models
- --report: train:accuracy, test:f1:en

Confusion Matrix, row=true, column=predicted  accuracy=1.0
label   0   1  |total
 0 de   1   .  |1
 1 en   .   1  |1

Summary. train accuracy mean = 1.0 stddev = 0 stderr = 0
Summary. test accuracy mean = 1.0 stddev = 0 stderr = 0
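To read a report like this, it helps to see how the matrix is computed. A minimal sketch with rows as true labels and columns as predicted labels, matching the report's convention; the function names and the two-instance data are illustrative, not Mallet's implementation.

```python
from collections import Counter

def confusion_matrix(gold, predicted):
    """Rows are true labels, columns are predicted labels."""
    labels = sorted(set(gold) | set(predicted))
    counts = Counter(zip(gold, predicted))
    matrix = [[counts[(g, p)] for p in labels] for g in labels]
    return labels, matrix

def accuracy(gold, predicted):
    """Fraction of instances whose predicted label equals the true label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

gold = ["de", "en"]
predicted = ["de", "en"]
labels, matrix = confusion_matrix(gold, predicted)
print(labels)                     # prints: ['de', 'en']
print(matrix)                     # prints: [[1, 0], [0, 1]]
print(accuracy(gold, predicted))  # prints: 1.0
```

Off-diagonal cells are the errors: a count in row `de`, column `en` would mean a German instance was labeled English.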

Accessing Classifiers
- classifier2info --classifier maxent.model
  - Prints out contents of model file

FEATURES FOR CLASS en
  book
  the
  i

Mallet: testing

Testing
- Use new data to test a previously built classifier
  - bin/mallet classify-svmlight --input testfile --output outputfile --classifier maxent.model
  - Also instance file, directories: classify-file, classify-dir
- Prints class,score matrix
  - Inst_id class1 score1 class2 score2

array:0  en  0.995  de
array:1  en  0.970  de
array:2  en  0.064  de  0.935
array:3  en  0.094  de
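The output only gives scores; picking a label means taking the highest-scoring class on each line. A small sketch, assuming the fields are whitespace-separated and each line is complete (id followed by class/score pairs). The function names are hypothetical.

```python
def parse_prediction(line):
    """Parse 'inst_id class1 score1 class2 score2 ...' into (id, {class: score})."""
    fields = line.split()
    instance_id, rest = fields[0], fields[1:]
    if len(rest) % 2 != 0:
        raise ValueError("expected class/score pairs: %r" % line)
    scores = {rest[i]: float(rest[i + 1]) for i in range(0, len(rest), 2)}
    return instance_id, scores

def best_class(scores):
    """Return the class with the highest score."""
    return max(scores, key=scores.get)

instance_id, scores = parse_prediction("array:2 en 0.064 de 0.935")
print(instance_id, best_class(scores))  # prints: array:2 de
```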

General Use
- bin/mallet import-svmlight --input svmltrain.vectors.txt --output svmltrain.vectors
  - Builds binary representation from feature:value pairs
- bin/mallet train-classifier --input svmltrain.vectors --trainer MaxEnt --output-classifier svml.model
  - Trains MaxEnt classifier and stores model
- bin/mallet classify-svmlight --input svmltest.vectors.txt --output - --classifier svml.model
  - Tests on the new data
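The three steps can be chained from a script. The sketch below only assembles the command lines shown on the slide; the wrapper functions are hypothetical, and actually running them assumes `bin/mallet` resolves from your working directory.

```python
import subprocess

MALLET = "bin/mallet"  # path as written on the slide; adjust for your install

def pipeline_commands(train_txt, test_txt, vectors, model):
    """Return the import, train, and classify command lines, in order."""
    return [
        [MALLET, "import-svmlight", "--input", train_txt, "--output", vectors],
        [MALLET, "train-classifier", "--input", vectors,
         "--trainer", "MaxEnt", "--output-classifier", model],
        [MALLET, "classify-svmlight", "--input", test_txt,
         "--output", "-", "--classifier", model],
    ]

def run_pipeline(commands):
    """Run each step in order; stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)

commands = pipeline_commands("svmltrain.vectors.txt", "svmltest.vectors.txt",
                             "svmltrain.vectors", "svml.model")
for argv in commands:
    print(" ".join(argv))
# Call run_pipeline(commands) on a machine where Mallet is installed.
```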

Other Information
- Website:
  - Download and documentation (such as it is)
- API tutorial:

Text Categorization
- Task:
  - Given a document, assign it to one of a finite set of classes
- What are the classes?
- What are the features?

Text 1
Several hundred protesters, some wearing goggles and gas masks, marched past authorities in a downtown street Sunday, hours after riot police forced Occupy Portland demonstrators out of a pair of weeks-old encampments in nearby parks.
Police moved in shortly before noon and drove protesters into the street after dozens remained in the camp in defiance of city officials. Mayor Sam Adams had ordered that the camp shut down Saturday at midnight, citing unhealthy conditions and the encampment’s attraction of drug users and thieves.
Anti-Wall Street protesters and their supporters flooded a city park area in Portland early Sunday in defiance of an eviction order, and authorities elsewhere stepped up pressure against the demonstrators, arresting nearly two dozen. (Nov. 13)
More than 50 protesters were arrested in the police action, but officers did not use tear gas, rubber bullets or other so-called non-lethal weapons, police said.
Washington Post, online 11/13/

Text 2
George Washington coach Mike Lonergan looked at the stat sheet, tried to muster a smile then clicked off the reasons why the Colonials lost to No. 24 California on Sunday night.
A piercing 21-0 run by the Golden Bears at the end of the first half was at the top of the list.
Not even a second straight 20-point effort from Tony Taylor was enough to dig George Washington out of the early hole, and the Colonials spent the rest of the night in a futile game of catch-up.
“I’ve never really been involved with a run quite like that,” Lonergan said after Cal’s win over George Washington. “I tried calling a couple timeouts. It was very disappointing that we just never really got our composure back the rest of that half. To end it that way and not even score any points, that was basically the game right there.”
Washington Post, online 11/13/

Text 3
‘Jersey Boys’ at the National Theatre
By Jane Horwitz, Sunday, November 13, 5:29 PM
“Jersey Boys” is irresistible, and the touring company now at the National Theatre gets it almost entirely right.
This Broadway hit (it has been running since fall 2005 and has played Washington before as well) rises well above the so-called jukebox show genre. Subtitled “The Story of Frankie Valli & the Four Seasons,” the musical tells a tale that transcends show business gossip to become a close character study of four talented but very different blue-collar guys from New Jersey — who just happen to have sung some of the best close-harmony rock/pop tunes of the late 1950s, the 1960s and into the 1970s.
Washington Post, online 11/13/

- What categories?
- What features?

Example: Coreference
Queen Elizabeth set about transforming her husband, King George VI, into a viable monarch. Logue, a renowned speech therapist, was summoned to help the King overcome his speech impediment...
- Can be viewed as a classification problem
- What are the inputs?
- What are the categories?
- What features would be useful?

Example: NER
- Named Entity tagging:
  - John visited New York last Friday
  - → [person John] visited [location New York] [time last Friday]
- As a classification problem
  - John/PER-B visited/O New/LOC-B York/LOC-I last/TIME-B Friday/TIME-I
- Input? Features? Classes?
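Turning bracketed entities into one class label per token is a mechanical step. A sketch using the slide's TYPE-B / TYPE-I / O labels; the function name and the (start, end, type) span encoding are assumptions made for illustration.

```python
def tag_tokens(tokens, spans):
    """Label each token TYPE-B, TYPE-I, or O.

    spans: list of (start, end, entity_type) with end exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end, entity_type in spans:
        tags[start] = entity_type + "-B"      # first token of the entity
        for i in range(start + 1, end):
            tags[i] = entity_type + "-I"      # remaining tokens of the entity
    return ["%s/%s" % (tok, tag) for tok, tag in zip(tokens, tags)]

tokens = "John visited New York last Friday".split()
spans = [(0, 1, "PER"), (2, 4, "LOC"), (4, 6, "TIME")]
print(" ".join(tag_tokens(tokens, spans)))
# prints: John/PER-B visited/O New/LOC-B York/LOC-I last/TIME-B Friday/TIME-I
```

With this encoding the classifier's input is a single token in context, and its classes are the tag set.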