Lecture 12 Classifiers Part 2 Topics: Classifiers, Maxent Classifiers, Maximum Entropy Markov Models, Information Extraction and chunking intro. Readings: Chapters 6, 7.1. February 25, 2013 CSCE 771 Natural Language Processing

– 2 – CSCE 771 Spring 2013 Overview
Last Time: Confusion Matrix, Brill Demo, NLTK Ch 6 - Text Classification
Today: Confusion Matrix, Brill Demo, NLTK Ch 6 - Text Classification
Readings: NLTK Ch 6

– 3 – CSCE 771 Spring 2013 Evaluation of classifiers again Last time: Recall, Precision, F value, Accuracy
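
As a reminder of how these are computed, here is a minimal sketch (illustrative counts, not from the lecture) that derives the four measures from the cells of a binary contingency table:

tp, fp, fn, tn = 40.0, 10.0, 20.0, 30.0   # hypothetical counts for one classifier
precision = tp / (tp + fp)                               # 0.80
recall = tp / (tp + fn)                                  # ~0.67
f_value = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
accuracy = (tp + tn) / (tp + fp + fn + tn)               # 0.70
print precision, recall, f_value, accuracy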

– 4 – CSCE 771 Spring 2013 Reuters data set: documents in 118 categories; a document can be in multiple classes, so 118 binary classifiers are trained (one per category)

– 5 – CSCE 771 Spring 2013 Confusion matrix C_ij – documents that are really in class C_i but are classified as C_j. C_ii – documents that are really in C_i and are correctly classified.

– 6 – CSCE 771 Spring 2013 Micro averaging vs Macro averaging Macro averaging – average the performance of the individual classifiers (an average of averages). Micro averaging – sum up all the true positives, false positives and false negatives across classifiers, then compute the measures from the pooled counts.
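
A small sketch (made-up per-class counts, not the Reuters numbers) showing how the two averages can come apart when class sizes differ:

counts = [(90, 10, 10), (5, 5, 5), (1, 9, 9)]   # (tp, fp, fn) for three binary classifiers

# Macro: compute precision per class, then average the averages
precisions = [float(tp) / (tp + fp) for (tp, fp, fn) in counts]
macro_p = sum(precisions) / len(precisions)      # (0.9 + 0.5 + 0.1) / 3 = 0.5

# Micro: pool all tp and fp first, then compute one precision
tp_all = sum(tp for (tp, fp, fn) in counts)
fp_all = sum(fp for (tp, fp, fn) in counts)
micro_p = float(tp_all) / (tp_all + fp_all)      # 96 / 120 = 0.8, dominated by the big class

print macro_p, micro_p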

– 7 – CSCE 771 Spring 2013 Training, Development and Test Sets

– 8 – CSCE 771 Spring 2013 nltk.tag
Classes: AffixTagger, BigramTagger, BrillTagger, BrillTaggerTrainer, DefaultTagger, FastBrillTaggerTrainer, HiddenMarkovModelTagger, HiddenMarkovModelTrainer, NgramTagger, RegexpTagger, TaggerI, TrigramTagger, UnigramTagger
Functions: batch_pos_tag, pos_tag, untag
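
For reference, a minimal usage sketch of the module-level functions listed above (pos_tag needs the default tagger model installed via nltk.download):

import nltk
sent = nltk.word_tokenize("The little yellow dog barked at the cat")
tagged = nltk.pos_tag(sent)     # list of (word, tag) pairs
print tagged
print nltk.tag.untag(tagged)    # strips the tags, giving back the plain word list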

– 9 – CSCE 771 Spring 2013 Module nltk.tag.hmm
Source code for module nltk.tag.hmm
import nltk
nltk.tag.hmm.demo()
nltk.tag.hmm.demo_pos()
nltk.tag.hmm.demo_pos_bw()

– 10 – CSCE 771 Spring 2013 HMM demo
import nltk
nltk.tag.hmm.demo()
nltk.tag.hmm.demo_pos()
nltk.tag.hmm.demo_pos_bw()

– 11 – CSCE 771 Spring 2013 Common Suffixes
from nltk.corpus import brown
suffix_fdist = nltk.FreqDist()
for word in brown.words():
    word = word.lower()
    suffix_fdist.inc(word[-1:])
    suffix_fdist.inc(word[-2:])
    suffix_fdist.inc(word[-3:])
common_suffixes = suffix_fdist.keys()[:100]
print common_suffixes
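
In the NLTK book this frequency distribution feeds a simple suffix-based feature extractor, roughly:

def pos_features(word):
    # one boolean feature per common suffix
    features = {}
    for suffix in common_suffixes:
        features['endswith(%s)' % suffix] = word.lower().endswith(suffix)
    return features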

– 12 – CSCE 771 Spring 2013
rtepair = nltk.corpus.rte.pairs(['rte3_dev.xml'])[33]
extractor = nltk.RTEFeatureExtractor(rtepair)
print extractor.text_words
set(['Russia', 'Organisation', 'Shanghai', …
print extractor.hyp_words
set(['member', 'SCO', 'China'])
print extractor.overlap('word')
set([ ])
print extractor.overlap('ne')
set(['SCO', 'China'])
print extractor.hyp_extra('word')
set(['member'])

– 13 – CSCE 771 Spring 2013
tagged_sents = list(brown.tagged_sents(categories='news'))
random.shuffle(tagged_sents)
size = int(len(tagged_sents) * 0.1)
train_set, test_set = tagged_sents[size:], tagged_sents[:size]

file_ids = brown.fileids(categories='news')
size = int(len(file_ids) * 0.1)
train_set = brown.tagged_sents(file_ids[size:])
test_set = brown.tagged_sents(file_ids[:size])

train_set = brown.tagged_sents(categories='news')
test_set = brown.tagged_sents(categories='fiction')

classifier = nltk.NaiveBayesClassifier.train(train_set)

– 14 – CSCE 771 Spring 2013
Traceback (most recent call last):
  File "C:\Users\mmm\Documents\Courses\771\Python771\ch06\ch06d.py", line 80, in <module>
    classifier = nltk.NaiveBayesClassifier.train(train_set)
  File "C:\Python27\lib\site-packages\nltk\classify\naivebayes.py", line 191, in train
    for featureset, label in labeled_featuresets:
ValueError: too many values to unpack
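
The error occurs because NaiveBayesClassifier.train expects a list of (featureset, label) pairs, but train_set above is a list of tagged sentences. A minimal fix, with a toy feature extractor of my own (not from the lecture), is to flatten the sentences into labeled featuresets first:

def word_features(word):
    return {'suffix(2)': word[-2:], 'is_title': word.istitle()}   # illustrative features only

train_pairs = [(word_features(w), t) for sent in train_set for (w, t) in sent]
test_pairs = [(word_features(w), t) for sent in test_set for (w, t) in sent]
classifier = nltk.NaiveBayesClassifier.train(train_pairs)
print nltk.classify.accuracy(classifier, test_pairs)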

– 15 – CSCE 771 Spring 2013
from nltk.corpus import brown
brown_tagged_sents = brown.tagged_sents(categories='news')
size = int(len(brown_tagged_sents) * 0.9)
train_sents = brown_tagged_sents[:size]
test_sents = brown_tagged_sents[size:]
t0 = nltk.DefaultTagger('NN')
t1 = nltk.UnigramTagger(train_sents, backoff=t0)
t2 = nltk.BigramTagger(train_sents, backoff=t1)
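
The standard TaggerI API gives a quick check of the combined tagger (the exact score depends on the corpus split):

print t2.evaluate(test_sents)                              # per-token accuracy on held-out news
print t2.tag(nltk.word_tokenize("The cat sat on the mat"))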

– 16 – CSCE 771 Spring 2013
def tag_list(tagged_sents):
    return [tag for sent in tagged_sents for (word, tag) in sent]

def apply_tagger(tagger, corpus):
    return [tagger.tag(nltk.tag.untag(sent)) for sent in corpus]

gold = tag_list(brown.tagged_sents(categories='editorial'))
test = tag_list(apply_tagger(t2, brown.tagged_sents(categories='editorial')))
cm = nltk.ConfusionMatrix(gold, test)
print cm.pp(sort_by_count=True, show_percents=True, truncate=9)

– 17 – CSCE 771 Spring 2013
Confusion matrix for the t2 tagger on the Brown 'editorial' sentences, truncated to the nine most frequent tags (NN, IN, AT, JJ, ., NNS, ,, VB, NP); row = reference, col = test. The largest off-diagonal entries are JJ, NNS, NP and VB tokens mis-tagged as NN (1.7%, 1.5%, 1.0% and 0.9% of tokens, respectively).

– 18 – CSCE 771 Spring 2013 Entropy
import math
def entropy(labels):
    freqdist = nltk.FreqDist(labels)
    probs = [freqdist.freq(l) for l in nltk.FreqDist(labels)]
    return -sum([p * math.log(p,2) for p in probs])

– 19 – CSCE 771 Spring 2013
print entropy(['male', 'male', 'male', 'male'])
-0.0
print entropy(['male', 'female', 'male', 'male'])
0.811
print entropy(['female', 'male', 'female', 'male'])
1.0
print entropy(['female', 'female', 'male', 'female'])
0.811
print entropy(['female', 'female', 'female', 'female'])
-0.0

– 20 – CSCE 771 Spring 2013 The Rest of NLTK Chapter 6
6.5 Naïve Bayes Classifiers
6.6 Maximum Entropy Classifiers
nltk.classify.maxent.BinaryMaxentFeatureEncoding(labels, mapping, unseen_features=False, alwayson_features=False)
6.7 Modeling Linguistic Patterns
6.8 Summary
But no more code?!?
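
Since the chapter stops showing code here, the sketch below trains NLTK's built-in maxent classifier on the book's name-gender featuresets; the algorithm choice and iteration cutoff are just reasonable defaults, not values from the slides:

import random
import nltk

def gender_features(name):
    return {'last_letter': name[-1].lower()}    # the book's simplest featureset

names = ([(n, 'male') for n in nltk.corpus.names.words('male.txt')] +
         [(n, 'female') for n in nltk.corpus.names.words('female.txt')])
random.shuffle(names)
featuresets = [(gender_features(n), g) for (n, g) in names]
train_set, test_set = featuresets[500:], featuresets[:500]

maxent = nltk.classify.MaxentClassifier.train(train_set, algorithm='IIS',
                                              trace=0, max_iter=10)
print nltk.classify.accuracy(maxent, test_set)
maxent.show_most_informative_features(5)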

– 21 – CSCE 771 Spring 2013 Maximum Entropy Models (again)
Features are elements of evidence that connect observations d with categories c.
f: C × D → R
Example feature: f(c, d) = { c = LOCATION & w-1 = IN & isCapitalized(w) }
An "input-feature" is a property of an unlabeled token.
A "joint-feature" is a property of a labeled token.
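
In code a joint-feature is just an indicator function over a (class, datum) pair; a sketch of the LOCATION feature above, where the datum's field names are made up for illustration:

def f_loc_in_cap(c, d):
    # fires (returns 1) only for the LOCATION class when the previous word is "in"
    # and the current word is capitalized
    if c == 'LOCATION' and d['prev_word'].lower() == 'in' and d['word'][0].isupper():
        return 1
    return 0

print f_loc_in_cap('LOCATION', {'word': 'Arcadia', 'prev_word': 'in'})   # 1
print f_loc_in_cap('DRUG', {'word': 'Arcadia', 'prev_word': 'in'})       # 0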

– 22 – CSCE 771 Spring 2013 Feature-Based Linear Classifiers
p(c | d, λ) = exp(Σi λi fi(c, d)) / Σc' exp(Σi λi fi(c', d))

– 23 – CSCE 771 Spring 2013 Maxent Model revisited

– 24 – CSCE 771 Spring 2013 Maximum Entropy Markov Models (MEMM) Repeatedly apply a Maxent classifier along a sequence, with each decision conditioned on the previous label(s).
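
A greedy sketch of that idea (along the lines of the NLTK book's classifier-based 'consecutive' tagger, with illustrative feature names): at each position the classifier sees the current word plus the tag it just assigned to the previous word.

def memm_features(words, i, prev_tag):
    return {'word': words[i].lower(),
            'suffix(2)': words[i][-2:],
            'prev_tag': prev_tag}

def greedy_tag(classifier, words):
    # left-to-right decoding: each decision is fed into the next one
    tags, prev_tag = [], '<START>'
    for i in range(len(words)):
        tag = classifier.classify(memm_features(words, i, prev_tag))
        tags.append(tag)
        prev_tag = tag
    return zip(words, tags)

Training builds (featureset, tag) pairs the same way, using the gold previous tag; a full MEMM would replace this greedy pass with Viterbi decoding over the classifier's probability distributions.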

– 25 – CSCE 771 Spring 2013

– 26 – CSCE 771 Spring 2013 Named Entity Recognition (NER)
entity –
1. a: being, existence; especially: independent, separate, or self-contained existence
   b: the existence of a thing as contrasted with its attributes
2. something that has separate and distinct existence and objective or conceptual reality
3. an organization (as a business or governmental unit) that has an identity separate from those of its members
A named entity is one of those with a name.

– 27 – CSCE 771 Spring 2013 Classes of Named Entities
Person (PERS), Location (LOC), Organization (ORG), DATE
Example: "Jim bought 300 shares of Acme Corp. in 2006."
Producing an annotated block of text, such as this one:
[Jim]PERS bought 300 shares of [Acme Corp.]ORG in [2006]DATE

– 28 – CSCE 771 Spring 2013 IOB tagging
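
As an illustration (my own example, echoing the NLTK book's "We saw the yellow dog"), every token carries its part-of-speech tag plus a chunk tag: B- marks the beginning of a chunk, I- a continuation, and O a token outside any chunk.

We      PRP   B-NP
saw     VBD   O
the     DT    B-NP
yellow  JJ    I-NP
dog     NN    I-NP

nltk.chunk.tree2conlltags() converts a chunk tree (like the parses on the following slides) into exactly this list of (word, pos, iob) triples.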

– 29 – CSCE 771 Spring 2013

– 30 – CSCE 771 Spring 2013 Chunking - partial parsing

– 31 – CSCE 771 Spring 2013 NLTK ch07.py
def ie_preprocess(document):
    sentences = nltk.sent_tokenize(document)
    sentences = [nltk.word_tokenize(sent) for sent in sentences]
    sentences = [nltk.pos_tag(sent) for sent in sentences]

sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"),
            ("dog", "NN"), ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
grammar = "NP: {<DT>?<JJ>*<NN>}"
cp = nltk.RegexpParser(grammar)
result = cp.parse(sentence)
print result

– 32 – CSCE 771 Spring 2013
(S
  (NP the/DT little/JJ yellow/JJ dog/NN)
  barked/VBD
  at/IN
  (NP the/DT cat/NN))
(S (NP money/NN market/NN) fund/NN)

– 33 – CSCE 771 Spring 2013
(CHUNK combined/VBN to/TO achieve/VB)
(CHUNK continue/VB to/TO place/VB)
(CHUNK serve/VB to/TO protect/VB)
(CHUNK wanted/VBD to/TO wait/VB)
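
These chunks look like the result of searching the tagged Brown corpus for verb-TO-verb patterns; a sketch of a RegexpParser grammar along the lines of the NLTK book's example that would produce them:

cp = nltk.RegexpParser('CHUNK: {<V.*> <TO> <V.*>}')
for sent in nltk.corpus.brown.tagged_sents():
    tree = cp.parse(sent)
    for subtree in tree.subtrees():
        if subtree.node == 'CHUNK':    # in NLTK 3 use subtree.label() instead
            print subtree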

– 34 – CSCE 771 Spring 2013
from nltk.corpus import conll2000
print conll2000.chunked_sents('train.txt')[99]
print " B********************************************"
print conll2000.chunked_sents('train.txt', chunk_types=['NP'])[99]
print " C********************************************"

from nltk.corpus import conll2000
cp = nltk.RegexpParser("")
test_sents = conll2000.chunked_sents('test.txt', chunk_types=['NP'])
print cp.evaluate(test_sents)

– 35 – CSCE 771 Spring 2013 Information extraction A step towards understanding: first find the named entities, then figure out what is being said about them (in practice, just the relations between named entities).

– 36 – CSCE 771 Spring 2013 Outline of natural language processing