EMNLP’01 19/11/2001

ML: Classical methods from AI
– Decision-Tree induction
– Exemplar-based Learning
– Rule Induction
– Transformation-Based Error-Driven Learning

TBEDL: Transformation-Based Error-Driven Learning (Brill 92, 93, 95)
The learning algorithm is a mistake-driven greedy procedure that iteratively acquires a set of transformation rules. First, unannotated text is passed through an initial-state annotator. Then, at each step, the algorithm adds the transformation rule that best repairs the current errors.
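To make the loop concrete, here is a minimal Python sketch of the greedy training procedure. All names (tbl_train, candidate_rules, rule.apply, min_gain) are hypothetical; Brill’s implementation generates and scores candidate rules far more efficiently.

```python
# Minimal sketch of the TBL training loop (illustrative; not Brill's code).
# Assumed interface: initial_annotator(tokens) -> tags,
# rule.apply(tokens, tags) -> new tags,
# candidate_rules(tokens, tags, truth) -> iterable of instantiated rules.

def tbl_train(tokens, truth, initial_annotator, candidate_rules, min_gain=1):
    current = initial_annotator(tokens)      # initial-state annotation
    learned = []
    while True:
        errors = sum(c != g for c, g in zip(current, truth))
        best_rule, best_errors = None, errors
        for rule in candidate_rules(tokens, current, truth):
            proposed = rule.apply(tokens, current)
            n = sum(c != g for c, g in zip(proposed, truth))
            if n < best_errors:              # rule that best repairs errors
                best_rule, best_errors = rule, n
        if best_rule is None or errors - best_errors < min_gain:
            break                            # stop when no rule improves enough
        current = best_rule.apply(tokens, current)
        learned.append(best_rule)            # rules are kept in acquisition order
    return learned
```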

TBEDL: Transformation-Based Error-Driven Learning (Brill 92, 93, 95)
Concrete rules are acquired by instantiating a predefined set of rule templates of the form conjunction_of_conditions → transformation. When annotating a new text, all the transformation rules are applied in order of generation, as sketched below.
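As an illustration of template instantiation and ordered application, here is a minimal sketch of one non-lexicalized template (“change tag A to B when the previous tag is Z”); the class and function names are assumptions of this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PrevTagRule:
    """Instantiation of the template: change from_tag to to_tag
    when the preceding word is tagged prev_tag."""
    from_tag: str
    to_tag: str
    prev_tag: str

    def apply(self, tokens, tags):
        out = list(tags)
        for i in range(1, len(out)):
            if out[i] == self.from_tag and out[i - 1] == self.prev_tag:
                out[i] = self.to_tag
        return out

def annotate(tokens, tags, rules):
    # Learned rules are applied in order of generation, each over the whole text.
    for rule in rules:
        tags = rule.apply(tokens, tags)
    return tags
```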

TBEDL: Transformation-Based Error-Driven Learning (Brill 92, 93, 95)
[Training diagram: unannotated text is tagged by the initial-state annotator; the learner compares the annotated text against the “truth” and outputs transformation rules.]

TB(ED)L Applied to POS Tagging (Brill 92, 93, 94, 95)
Initial_State_Annotator = Most_Frequent_Label (see the sketch below)
Three types of templates:
– Non-lexicalized conditions
– Lexicalized patterns
– Morphological conditions for dealing with unknown words
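A minimal sketch of the most-frequent-label baseline, assuming the training data is a list of (word, tag) pairs. The default tag for unseen words is an assumption of the sketch; Brill’s tagger handles unknown words with dedicated rules instead.

```python
from collections import Counter, defaultdict

def train_initial_annotator(tagged_corpus, default_tag="NN"):
    # Count tag frequencies per word form.
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word.lower()][tag] += 1
    most_frequent = {w: c.most_common(1)[0][0] for w, c in counts.items()}

    def annotate(tokens):
        # Known words get their most frequent training tag; unknown words
        # fall back to default_tag (an assumption of this sketch).
        return [most_frequent.get(t.lower(), default_tag) for t in tokens]

    return annotate
```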

TB(ED)L Applied to POS Tagging (Brill 92, 93, 94, 95)
Non-lexicalized conditions:
[Table of non-lexicalized rule templates, which condition on the tags of surrounding words; not included in the transcript.]

TBEDL: First implementation
[Figure not included in the transcript.]

TB(ED)L Applied to POS Tagging (Brill 92, 93, 94, 95)
Non-lexicalized conditions: best rules acquired
[Table of the highest-scoring rules; not included in the transcript.]

TB(ED)L Applied to POS Tagging (Brill 92, 93, 94, 95)
Lexicalized patterns:
[Table of lexicalized rule templates, which may also condition on specific words; not included in the transcript.]

TB(ED)L Applied to POS Tagging (Brill 92, 93, 94, 95)
Lexicalized patterns, examples:
– as/IN tall/JJ as/IN
– We do n’t eat / We did n’t usually drink

TB(ED)L Applied to POS Tagging (Brill 92, 93, 94, 95)
Morphological conditions for dealing with unknown words:
[Table of unknown-word templates, which condition on affixes and other word-form properties; not included in the transcript. A sketch of one such rule follows.]
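In the spirit of these templates (e.g., “change the tag of an unknown word to Y if it has suffix x”), here is a minimal sketch of one unknown-word rule; the class name and the known_vocab argument are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SuffixRule:
    """Change the tag of an unknown word to `to_tag` if it ends in `suffix`
    (e.g., suffix "ing" -> VBG). Sketch only; names are illustrative."""
    suffix: str
    to_tag: str

    def apply(self, tokens, tags, known_vocab):
        out = list(tags)
        for i, word in enumerate(tokens):
            # Only words unseen in training count as unknown here.
            if word.lower() not in known_vocab and word.endswith(self.suffix):
                out[i] = self.to_tag
        return out
```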

TB(ED)L Applied to POS Tagging (Brill 92, 93, 94, 95)
Unknown words: best rules acquired
[Table of the highest-scoring unknown-word rules; not included in the transcript.]

TB(ED)L Applied to POS Tagging (Brill 92, 93, 94, 95)
Tested on 600 Kw of the annotated Wall Street Journal corpus:
– Number of transformation rules: <500
– Accuracy: 97.0% (with no unknown words); the accuracy of an HMM trigram tagger is achieved using only 86 transformation rules
– 96.6% considering unknown words (82.2% on the unknown words themselves)

TB(ED)L Applied to POS Tagging (Brill 92, 93, 94, 95)
[Figure not included in the transcript.]

TB(ED)L and NLP
– POS Tagging (Brill 92, 94a, 95; Roche & Schabes 95; Aone & Hausman 96)
– PP-attachment disambiguation (Brill & Resnik, 1994)
– Grammar induction and Parsing (Brill, 1993)
– Context-sensitive Spelling Correction (Mangu & Brill, 1996)
– Word Sense Disambiguation (Dini et al., 1998)
– Dialogue Act Tagging (Samuel et al., 1998a, 1998b)
– Semantic Role Labeling (Higgins, 2004; Williams et al., 2004; CoNLL-2004)

TB(ED)L: Main Drawback
Computational cost: memory and time requirements (especially in training).
Some proposals:
– Ramshaw & Marcus (1994)
– LazyTBL (Samuel 98)
– μ-TBL (Lager 99)
– ICA (Hepple 00)
– FastTBL (Ngai & Florian, 01)

Extensions: LazyTBEDL (Samuel 98)
– Uses Brill’s TB(ED)L algorithm
– Applies a Monte Carlo strategy to randomly sample from the space of rules, rather than exhaustively analyzing all possible rules (see the sketch below)
– The memory and time costs of the TB(ED)L algorithm are drastically reduced without compromising accuracy on unseen data
– Application to Dialogue Act Tagging, with accuracy results of 75.5%, improving over state-of-the-art systems
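A minimal sketch of the sampling idea, reusing the hypothetical rule interface from the training-loop sketch above. Samuel’s actual strategy ties the sampling to the observed error sites, so this is only an approximation of the approach.

```python
import random

def sample_best_rule(tokens, current, truth, candidate_rules, n_samples=1000):
    # Score only a random sample of candidate rules instead of all of them.
    pool = list(candidate_rules(tokens, current, truth))
    if not pool:
        return None
    sample = random.sample(pool, min(n_samples, len(pool)))

    def errors(tags):
        return sum(c != g for c, g in zip(tags, truth))

    # Return the sampled rule that best repairs the current errors.
    return min(sample, key=lambda r: errors(r.apply(tokens, current)))
```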

Extensions: LazyTBEDL (Samuel 98)
[A sequence of figure slides illustrating the LazyTBEDL algorithm; not included in the transcript.]

Extensions: FastTBEDL (Ngai & Florian 01)
[Figure not included in the transcript.]

Extensions: FastTBEDL (Ngai & Florian 01)
Software available at: [URL missing from the transcript]

TB(ED)L: Summary
Advantages:
– General, simple, and understandable modeling
– Provides a very compact set of interpretable transformation rules
– High accuracy in many NLP applications
Drawbacks:
– Computational cost: high memory and time requirements. But some efficient variants of TBL have been proposed (FastTBL)
– Sequential application of rules

TB(ED)L: Summary
Others:
– A transformation list is a processor and not a classifier
– A comparison between Decision Trees and Transformation Lists can be found in (Brill, 1995)