
Slide 1: CSA2050 Natural Language Processing (February 2007)
Tagging 3 and Chunking: Transformation-Based Tagging; Chunking

Slide 2: Tagging 3 and Chunking
Lecture slides based on Mike Rosner and Marti Hearst's notes, with additions from the NLTK tutorials.

Slide 3: Three Approaches to Tagging
1. Rule-based tagger: the ENGTWOL tagger (Voutilainen 1995)
2. Stochastic tagger: HMM-based taggers
3. Transformation-based tagger: the Brill tagger (Brill 1995)

Slide 4: Transformation-Based Tagging
A combination of rule-based and stochastic tagging methodologies:
– like rule-based tagging, rules are used to specify tags in a certain environment;
– like stochastic tagging, machine learning is used to derive the rules: Transformation-Based Learning (TBL).

Slide 5: Transformation-Based Error-Driven Learning
[Diagram after Brill (1996): unannotated text passes through the initial-state annotator to become annotated text; the learner compares the annotated text against the truth and produces transformation rules.]

Slide 6: TBL Requirements
– Initial state annotator
– List of allowable transformations
– Scoring function
– Search strategy

Slide 7: Initial State Annotation
Input:
– corpus
– dictionary
– frequency counts for each entry
Output:
– corpus tagged with the most frequent tags
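A minimal sketch of such an annotator in Python with NLTK's UnigramTagger, using the Brown corpus purely as an illustrative source of frequency counts (assumes the corpus data has been downloaded):

```python
import nltk
from nltk.corpus import brown  # assumes nltk.download('brown') has been run

# Most-frequent-tag annotator: the frequency counts come from a tagged corpus.
train_sents = brown.tagged_sents(categories='news')
initial_annotator = nltk.UnigramTagger(train_sents)

# Words never seen in training come back tagged None; Brill's annotator
# also consulted a dictionary to cover such gaps.
print(initial_annotator.tag(['the', 'cat', 'sat', 'on', 'the', 'mat']))
```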

Slide 8: TBL Requirements (recap)
Next: the list of allowable transformations.

Slide 9: Transformations
Each transformation comprises:
– a source tag
– a target tag
– a triggering environment
Example: NN → VB when the previous tag is TO.

Slide 10: More Examples
Source tag   Target tag   Triggering environment
NN           VB           previous tag is TO
VBP          VB           one of the three previous tags is MD
JJR          RBR          next tag is JJ
VBP          VB           one of the two previous words is n't
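As an illustration only (not Brill's implementation), the rules in this table can be encoded as (source, target, trigger) triples, where each trigger is a predicate over the current tag sequence:

```python
# Hypothetical encoding of the transformations in the table above.
# Each trigger is a predicate over (tags, words, position).
rules = [
    ('NN',  'VB',  lambda tags, words, i: i > 0 and tags[i-1] == 'TO'),
    ('VBP', 'VB',  lambda tags, words, i: 'MD' in tags[max(0, i-3):i]),
    ('JJR', 'RBR', lambda tags, words, i: i + 1 < len(tags) and tags[i+1] == 'JJ'),
    ('VBP', 'VB',  lambda tags, words, i: "n't" in words[max(0, i-2):i]),
]

def apply_rule(rule, words, tags):
    """Rewrite source -> target at every position where the trigger fires."""
    source, target, trigger = rule
    return [target if tag == source and trigger(tags, words, i) else tag
            for i, tag in enumerate(tags)]
```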

Slide 11: Allowable Transforms Based on Fixed Schemas
A transformation's trigger must instantiate one of a fixed set of schemas; each schema specifies which of the surrounding tag positions t(i-3), t(i-2), t(i-1), t(i+1), t(i+2), t(i+3) may be tested. Brill's tagger uses nine such schemas. [Table of the nine schemas omitted.]

Slide 12: Set of Possible Transformations
The set of possible transformations is enumerated by allowing every possible tag or word in every possible slot of every possible schema. This set can get quite large.

Slide 13: TBL Requirements (recap)
Next: the scoring function.

Slide 14: Scoring Function
For a given tagging state of the corpus and a given transformation, for every word position in the corpus:
– if the rule applies and yields a correct tag, increment the score by 1;
– if the rule applies and yields an incorrect tag, decrement the score by 1.
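A sketch of this scoring function, reusing the hypothetical apply_rule helper from the sketch above; a corpus is assumed to be a list of (words, current_tags, gold_tags) triples:

```python
def score(rule, corpus):
    """Net benefit of a transformation over the whole corpus: +1 for every
    tag it corrects, -1 for every position where it yields a wrong tag."""
    total = 0
    for words, current, gold in corpus:
        proposed = apply_rule(rule, words, current)
        for old_tag, new_tag, gold_tag in zip(current, proposed, gold):
            if new_tag != old_tag:  # the rule applied at this position
                total += 1 if new_tag == gold_tag else -1
    return total
```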

Slide 15: TBL Requirements (recap)
Next: the search strategy.

Slide 16: The Basic Algorithm
1. Label every word with its most likely tag.
2. Repeat while the improvement exceeds a threshold:
– examine every possible transformation, selecting the one that results in the most improved tagging;
– retag the data according to this rule;
– append this rule to the output list.
3. Return the output list of transformations.
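Putting the pieces together, a sketch of this greedy loop, reusing the apply_rule and score helpers sketched above (a real implementation indexes rule applications rather than rescoring every candidate from scratch each round):

```python
def tbl_train(corpus, candidate_rules, threshold=1):
    """Greedily learn an ordered list of transformations."""
    learned = []
    while True:
        # Examine every candidate; keep the one with the best net score.
        best_rule = max(candidate_rules, key=lambda r: score(r, corpus))
        if score(best_rule, corpus) < threshold:
            break
        # Retag the whole corpus with the winning rule and record it.
        corpus = [(words, apply_rule(best_rule, words, tags), gold)
                  for words, tags, gold in corpus]
        learned.append(best_rule)
    return learned
```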

Slide 17: TBL: Remarks
– Execution speed: the TBL tagger is slower than the HMM approach.
– Learning speed is slow: Brill's implementation took over a day on 600k tokens.
BUT it learns a small number of simple, non-stochastic rules, and tagging can be made to run faster with finite-state transducers.

Slide 18: Tagging Unknown Words
New words are added to (newspaper) language at a rate of 20+ per month, plus many proper names; this increases error rates by 1-2%. Methods:
– assume the unknowns are nouns;
– assume the unknowns have a probability distribution similar to words occurring once in the training set;
– use morphological information, e.g. words ending in -ed tend to be tagged VBN.
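These heuristics map naturally onto NLTK's RegexpTagger used as a backoff for unknown words; the suffix-to-tag pairs below are illustrative guesses, not Brill's actual list:

```python
import nltk
from nltk.corpus import brown  # assumes the Brown corpus is downloaded

# Suffix heuristics for unknown words; the catch-all on the last line
# implements "assume the unknowns are nouns".
guesser = nltk.RegexpTagger([
    (r'.*ed$',  'VBN'),   # words ending in -ed tend to be tagged VBN
    (r'.*ing$', 'VBG'),
    (r'.*ly$',  'RB'),
    (r'.*s$',   'NNS'),
    (r'.*',     'NN'),
])

# Known words get their most frequent tag; unseen words fall back to it.
tagger = nltk.UnigramTagger(brown.tagged_sents(categories='news'),
                            backoff=guesser)
print(tagger.tag(['the', 'blorfed', 'cat']))  # 'blorfed' is unseen -> VBN
```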

Slide 19: Evaluation
The result is compared with a manually coded "Gold Standard":
– typically, accuracy reaches 95-97%;
– this may be compared with the result for a baseline tagger (one that uses no context).
Important: 100% accuracy is impossible even for human annotators.

Slide 20: A Word of Caution
– 95% accuracy: every 20th token wrong.
– 96% accuracy: every 25th token wrong. Is that really an "improvement of 25%" from 95% to 96%?
– 97% accuracy: every 33rd token wrong.
– 98% accuracy: every 50th token wrong.
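The arithmetic behind these figures: at accuracy a there is on average one error every 1/(1 - a) tokens, which a quick check confirms:

```python
# One error every 1 / (1 - accuracy) tokens, on average.
for accuracy in (0.95, 0.96, 0.97, 0.98):
    print(f"{accuracy:.0%} accuracy: one error every "
          f"{1 / (1 - accuracy):.0f} tokens")
```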

Slide 21: How Much Training Data Is Needed?
When working with the STTS tagset (50 tags) we observed:
– a strong increase in accuracy when training on 10,000, 20,000, ..., 50,000 tokens;
– a slight increase in accuracy when training on up to 100,000 tokens;
– hardly any increase thereafter.

Slide 22: Summary
– Tagging decisions are conditioned on a wider range of events than in the HMM models mentioned earlier; for example, left and right context can be used simultaneously.
– Learning and tagging are simple, intuitive and understandable.
– Transformation-based learning has also been applied to sentence parsing.

Slide 23: The Three Approaches Compared
Rule-based:
– hand-crafted rules;
– it takes too long to come up with good rules;
– portability problems.
Stochastic:
– finds the sequence with the highest probability (Viterbi);
– the result of training is not accessible to humans;
– large storage needs for intermediate results while training.
Transformation-based:
– rules are learned;
– small number of rules;
– rules can be inspected and modified by humans.

Slide 24: Shallow/Chunk Parsing
Goal: divide a sentence into a sequence of chunks.
– Chunks are non-overlapping regions of a text: [I] saw [a tall man] in [the park].
– Chunks are non-recursive: a chunk cannot contain other chunks.
– Chunks are non-exhaustive: not all words are included in chunks.

Slide 25: Chunk Parsing Examples
– Noun-phrase chunking: [I] saw [a tall man] in [the park].
– Verb-phrase chunking: The man who [was in the park] [saw me].
– Prosodic chunking: [I saw] [a tall man] [in the park].
– Question answering: What [Spanish explorer] discovered [the Mississippi River]?

Slide 26: Motivation
Locating information, e.g. text retrieval:
– index a document collection on its noun phrases.
Ignoring information:
– generalize in order to study higher-level patterns, e.g. phrases involving “gave” in the Penn Treebank: gave NP; gave up NP in NP; gave NP up; gave NP help; gave NP to NP;
– sometimes a full parse has too much structure: too nested, whereas chunks usually are not recursive.

Slide 27: Representation
Chunks can be represented either with BIO (or IOB) tags or as trees.
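A brief illustration of the two representations: the chunk tree for the running example built by hand, then converted to word/tag/IOB triples with NLTK's tree2conlltags:

```python
import nltk
from nltk import Tree

# The chunk structure of "[the little cat] sat on [the mat]" as a tree...
tree = Tree('S', [
    Tree('NP', [('the', 'DT'), ('little', 'JJ'), ('cat', 'NN')]),
    ('sat', 'VBD'),
    ('on', 'IN'),
    Tree('NP', [('the', 'DT'), ('mat', 'NN')]),
])

# ...and as IOB triples: B- opens a chunk, I- continues it, O is outside.
for word, tag, iob in nltk.chunk.tree2conlltags(tree):
    print(word, tag, iob)
# the DT B-NP / little JJ I-NP / cat NN I-NP / sat VBD O / on IN O / ...
```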

Slide 28: Comparison with Full Parsing
Parsing is usually an intermediate stage: it builds structures that are used by later stages of processing. Full parsing is a sufficient but not necessary intermediate stage for many NLP tasks; parsing often provides more information than we need. Shallow parsing is an easier problem:
– less word-order flexibility within chunks than between chunks;
– more locality: fewer long-range dependencies, less context-dependence, less ambiguity.

Slide 29: Chunks and Constituency
Constituents: [[a tall man] [in [the park]]]
Chunks: [a tall man] in [the park]
A constituent is part of some higher unit in the hierarchical syntactic parse. Chunks are not constituents:
– constituents are recursive.
But chunks are typically subsequences of constituents:
– chunks do not cross major constituent boundaries.

Slide 30: Chunk Parsing in NLTK
Chunk parsers usually ignore lexical content; they only need to look at part-of-speech tags. Possible steps in chunk parsing:
– chunking, unchunking;
– chinking;
– merging, splitting.
Evaluation:
– compare to a baseline;
– evaluate in terms of precision, recall and F-measure, counting missed chunks (false negatives) and incorrect chunks (false positives).

Slide 31: Chunk Parsing in NLTK
Define a regular expression that matches the sequences of tags in a chunk. A simple noun phrase chunk regexp, where <NN.*> matches any tag starting with NN:
<DT>? <JJ>* <NN.*>
Chunk all matching subsequences:
the/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN
[ the/DT little/JJ cat/NN ] sat/VBD on/IN [ the/DT mat/NN ]
If matching subsequences overlap, the first one gets priority.
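A runnable version of this example; note that current NLTK expresses chunk rules as a grammar string passed to nltk.RegexpParser rather than through the older rule-object API shown on these slides:

```python
import nltk

# NP chunk: optional determiner, any number of adjectives, then a noun tag.
parser = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>}")

tagged = [('the', 'DT'), ('little', 'JJ'), ('cat', 'NN'),
          ('sat', 'VBD'), ('on', 'IN'), ('the', 'DT'), ('mat', 'NN')]
print(parser.parse(tagged))
# (S (NP the/DT little/JJ cat/NN) sat/VBD on/IN (NP the/DT mat/NN))
```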

Slide 32: Unchunking
Remove any chunk with a given pattern, e.g. UnChunkRule('<NN|DT>+', 'Unchunk NNDT'), combined with the chunk rule <DT|JJ|NN>+. Chunk all matching subsequences:
– Input: the/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN
– Apply the chunk rule: [ the/DT little/JJ cat/NN ] sat/VBD on/IN [ the/DT mat/NN ]
– Apply the unchunk rule: [ the/DT little/JJ cat/NN ] sat/VBD on/IN the/DT mat/NN

Slide 33: Chinking
A chink is a subsequence of the text that is not a chunk. Define a regular expression that matches the sequences of tags in a chink. A simple chink regexp for finding NP chunks:
(<VB.*>|<IN>)+
First apply a chunk rule to chunk everything:
– Input: the/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN
– ChunkRule('<.*>+', 'Chunk everything'): [ the/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN ]
– Apply the chink rule above: [ the/DT little/JJ cat/NN ] sat/VBD on/IN [ the/DT mat/NN ]
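The same chunk-then-chink strategy in current NLTK grammar syntax, where }...{ marks a chink pattern (a sketch of the example above):

```python
import nltk

grammar = r"""
  NP:
    {<.*>+}          # first chunk everything...
    }<VB.*|IN>+{     # ...then chink out verb and preposition sequences
"""
parser = nltk.RegexpParser(grammar)

tagged = [('the', 'DT'), ('little', 'JJ'), ('cat', 'NN'),
          ('sat', 'VBD'), ('on', 'IN'), ('the', 'DT'), ('mat', 'NN')]
print(parser.parse(tagged))
# (S (NP the/DT little/JJ cat/NN) sat/VBD on/IN (NP the/DT mat/NN))
```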

Slide 34: Merging
Combine adjacent chunks into a single chunk: define a regular expression that matches the sequences of tags on both sides of the point to be merged. Example, merging a chunk ending in JJ with a chunk starting with NN:
MergeRule('<JJ>', '<NN>', 'Merge adjs and nouns')
[ the/DT little/JJ ] [ cat/NN ] sat/VBD on/IN the/DT mat/NN
[ the/DT little/JJ cat/NN ] sat/VBD on/IN the/DT mat/NN
Splitting is the opposite of merging.
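In current NLTK grammar syntax a merge rule is written as left{}right. A sketch that first chunks the determiner-adjective pair and the noun separately, then merges them across the JJ/NN boundary:

```python
import nltk

grammar = r"""
  NP:
    {<DT><JJ>}     # chunk determiner + adjective
    {<NN.*>}       # chunk bare nouns, leaving two adjacent chunks
    <JJ>{}<NN>     # merge a chunk ending in JJ with one starting with NN
"""
parser = nltk.RegexpParser(grammar)

print(parser.parse([('the', 'DT'), ('little', 'JJ'), ('cat', 'NN')]))
# (S (NP the/DT little/JJ cat/NN))
```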

Slide 36: Next Sessions
NLTK exercises.