EMNLP’01 19/11/2001

ML: Classical methods from AI
–Decision-Tree induction
–Exemplar-based Learning
–Rule Induction
–Transformation-Based Error-Driven Learning

TBEDL: Transformation-Based Error-Driven Learning (Brill 92,93,95)
The learning algorithm is a mistake-driven greedy procedure that iteratively acquires a set of transformation rules
First, unannotated text is passed through an initial-state annotator
Then, at each step, the algorithm adds the transformation rule that best repairs the current errors

TBEDL: Transformation-Based Error-Driven Learning (Brill 92,93,95)
Concrete rules are acquired by instantiating a predefined set of template rules:
    conjunction_of_conditions ⇒ transformation
When annotating a new text, all the transformation rules are applied in the order in which they were generated

TBEDL: Transformation-Based Error-Driven Learning (Brill 92,93,95)
[Figure: TRAINING diagram. Labels: Unannotated Text, Initial State, Annotated Text, “Truth”, Learner, Rules]

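The greedy training loop described on the preceding slides can be sketched as follows. This is a minimal illustration, not Brill's implementation: it assumes a single hypothetical template ("change tag A to B if the previous tag is C"), and the names `apply_rule`, `learn`, and `annotate` are invented for this example.

```python
def apply_rule(tags, rule):
    """Apply one rule (prev_tag, from_tag, to_tag) to a tag sequence.

    Conditions are tested against the input tags, so changes made
    during this pass do not trigger further changes in the same pass.
    """
    prev_tag, from_tag, to_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


def count_errors(tags, truth):
    return sum(t != g for t, g in zip(tags, truth))


def learn(initial_tags, truth, max_rules=10):
    """Greedily pick, at each step, the rule that repairs the most errors."""
    current = list(initial_tags)
    rules = []
    while len(rules) < max_rules:
        # Instantiate the template only at positions that are currently wrong.
        candidates = {
            (current[i - 1], current[i], truth[i])
            for i in range(1, len(current))
            if current[i] != truth[i]
        }
        base = count_errors(current, truth)
        best_rule, best_gain = None, 0
        for rule in sorted(candidates):
            gain = base - count_errors(apply_rule(current, rule), truth)
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None:  # no rule reduces the error count
            break
        rules.append(best_rule)
        current = apply_rule(current, best_rule)
    return rules


def annotate(initial_tags, rules):
    """Tag new text: apply the learned rules in the order they were acquired."""
    tags = list(initial_tags)
    for rule in rules:
        tags = apply_rule(tags, rule)
    return tags
```

For instance, with initial tags `["TO", "NN", "DT", "NN"]` and truth `["TO", "VB", "DT", "NN"]`, `learn` acquires the single rule `("TO", "NN", "VB")`, and `annotate` reproduces the truth from the initial tags.
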
TB(ED)L Applied to POS Tagging (Brill 92,93,94,95)
Initial_State_Annotator = Most_Frequent_Label
Three types of templates
–Non-lexicalized conditions
–Lexicalized patterns
–Morphological conditions for dealing with unknown words

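A most-frequent-label initial-state annotator can be sketched as below. The function names and the `default_tag="NN"` fallback for unseen words are assumptions made for this illustration, not details taken from the slides.

```python
from collections import Counter, defaultdict


def train_most_frequent(tagged_corpus):
    """Build a word -> most frequent tag lexicon from (word, tag) pairs."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def initial_annotate(words, lexicon, default_tag="NN"):
    """Assign each word its most frequent training tag.

    Unseen words receive default_tag (an assumption of this sketch).
    """
    return [lexicon.get(word, default_tag) for word in words]
```

The output of `initial_annotate` is what the transformation rules then repair.
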
TB(ED)L Applied to POS Tagging (Brill 92,93,94,95)
Non-lexicalized conditions:

TBEDL: First implementation

TB(ED)L Applied to POS Tagging (Brill 92,93,94,95)
Non-lexicalized conditions: best rules acquired

TB(ED)L Applied to POS Tagging (Brill 92,93,94,95)
Lexicalized patterns:

TB(ED)L Applied to POS Tagging (Brill 92,93,94,95)
Lexicalized patterns:
–as/IN tall/JJ as/IN
–We do n’t eat / We did n’t usually drink

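To illustrate how a lexicalized pattern differs from a tag-only condition, the sketch below applies a rule whose trigger is a specific word at a given offset. The function name, the rule encoding, and the two concrete rules in the usage note are assumptions chosen to match the examples above; they are not copied from the slides.

```python
def apply_lexicalized_rule(words, tags, from_tag, to_tag, trigger_word, offsets):
    """Change from_tag to to_tag where trigger_word occurs at one of the
    given relative offsets (negative = to the left, positive = to the right)."""
    out = list(tags)
    for i, tag in enumerate(tags):
        if tag != from_tag:
            continue
        for off in offsets:
            j = i + off
            if 0 <= j < len(words) and words[j] == trigger_word:
                out[i] = to_tag
                break
    return out
```

With a rule such as "change VBP to VB if one of the two previous words is n't", the call `apply_lexicalized_rule(["We", "did", "n't", "usually", "drink"], ["PRP", "VBD", "RB", "RB", "VBP"], "VBP", "VB", "n't", (-1, -2))` retags `drink` as `VB`. A rule such as "change IN to RB if the word two positions to the right is as" would retag only the first `as` in `as tall as`.
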
TB(ED)L Applied to POS Tagging (Brill 92,93,94,95)
Morphological conditions for dealing with unknown words:

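As a rough illustration of morphological conditions, the sketch below guesses a tag for an unknown word and then applies suffix-conditioned transformations in order. The initial guess (proper noun if capitalized, common noun otherwise), the function name, and the three rules in `EXAMPLE_RULES` are all hypothetical choices for this example; the rules actually acquired are not reproduced here.

```python
def tag_unknown_word(word, rules, default_tag="NN"):
    """Guess a tag for an unknown word, then apply suffix rules in order.

    Each rule is (from_tag, to_tag, suffix): change from_tag to to_tag
    if the word ends with suffix.
    """
    tag = "NNP" if word[:1].isupper() else default_tag
    for from_tag, to_tag, suffix in rules:
        if tag == from_tag and word.endswith(suffix):
            tag = to_tag
    return tag


# Hypothetical rules, for illustration only.
EXAMPLE_RULES = [
    ("NN", "NNS", "s"),
    ("NN", "VBG", "ing"),
    ("NN", "RB", "ly"),
]
```

Because the rules fire in sequence on the current tag, an early rule can block a later one, which is why rule order matters.
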
TB(ED)L Applied to POS Tagging (Brill 92,93,94,95)
Unknown words: best rules acquired

TB(ED)L Applied to POS Tagging (Brill 92,93,94,95)
Tested on 600 Kw of the Wall Street Journal annotated corpus
–Number of transformation rules: <500
–Accuracy: 97.0% (with no unknown words)
The accuracy of an HMM trigram tagger is achieved using only 86 transformation rules
Accuracy is 96.6% when unknown words are included (82.2% on the unknown words themselves)

TB(ED)L and NLP
–POS Tagging (Brill 92,94a,95; Roche & Schabes 95; Aone & Hausman 96)
–PP-attachment disambiguation (Brill & Resnik, 1994)
–Grammar induction and Parsing (Brill, 1993)
–Context-sensitive Spelling Correction (Mangu & Brill, 1996)
–Word Sense Disambiguation (Dini et al., 1998)
–Dialogue Act Tagging (Samuel et al., 1998a,1998b)
–Semantic Role Labeling (Higgins, 2004; Williams et al., 2004; CoNLL-2004)

TB(ED)L: Main Drawback
Computational cost
–Memory & Time (especially in training)
Some proposals
–Ramshaw & Marcus (1994)
–LazyTBL (Samuel 98)
–μ-TBL (Lager 99)
–ICA (Hepple 00)
–FastTBL (Ngai & Florian, 01)

Extensions: LazyTBEDL (Samuel 98)
Uses Brill’s TB(ED)L algorithm
Applies a Monte Carlo strategy to randomly sample from the space of rules, rather than exhaustively analyzing all possible rules
The memory and time costs of the TB(ED)L algorithm are drastically reduced without compromising accuracy on unseen data
Application to Dialogue Act Tagging
–Accuracy results: 75.5% over state-of-the-art systems

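The idea of sampling candidate rules instead of scoring all of them can be sketched as below. This is a simplified illustration under stated assumptions, not Samuel's exact procedure: it reuses a single hypothetical template ("change tag A to B if the previous tag is C"), and `sample_size`, `score`, and `pick_rule_sampled` are names invented for this example.

```python
import random


def score(rule, tags, truth):
    """Net number of errors repaired: corrections minus newly introduced errors."""
    prev_tag, from_tag, to_tag = rule
    gain = 0
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            if truth[i] == to_tag:
                gain += 1
            elif truth[i] == from_tag:
                gain -= 1
    return gain


def pick_rule_sampled(tags, truth, sample_size, rng):
    """Score only a random sample of the candidate rules, then keep the best."""
    candidates = sorted(
        {
            (tags[i - 1], tags[i], truth[i])
            for i in range(1, len(tags))
            if tags[i] != truth[i]
        }
    )
    if not candidates:
        return None
    sampled = rng.sample(candidates, min(sample_size, len(candidates)))
    best = max(sampled, key=lambda rule: score(rule, tags, truth))
    return best if score(best, tags, truth) > 0 else None
```

Passing a seeded generator, for example `random.Random(0)`, makes a run repeatable. The cost per iteration is bounded by `sample_size` rather than by the number of rules the templates can generate, at the price of possibly missing the best rule in a given step.
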
Extensions: FastTBEDL (Ngai & Florian 01)
Software available at:

TB(ED)L: Summary
Advantages
–General, simple and understandable modeling
–Provides a very compact set of interpretable transformation rules
–High accuracy in many NLP applications
Drawbacks
–Computational cost: high memory and time requirements. However, some efficient variants of TBL have been proposed (e.g., FastTBL)
–Sequential application of rules

TB(ED)L: Summary
Others
–A transformation list is a processor and not a classifier
–A comparison between Decision Trees and Transformation lists can be found in (Brill, 1995)