Deterministic Part-of-Speech Tagging with Finite-State Transducers 정 유 진 KLE Lab. CSE POSTECH 98. 10. 16 by Emmanuel Roche and Yves Schabes.


1 Deterministic Part-of-Speech Tagging with Finite-State Transducers 정 유 진 KLE Lab. CSE POSTECH 98. 10. 16 by Emmanuel Roche and Yves Schabes

2 CS730B Statistical NLP - Introduction
- Stochastic approaches to NLP have often been preferred to rule-based approaches
- Eric Brill (1992): a rule-based tagger that infers its rules from a training corpus
  - Rules are acquired automatically
  - Requires drastically less space than a stochastic tagger
  - But it is considerably slower
→ Deterministic finite-state transducer (subsequential transducer)

3 Overview of Brill's Tagger
- Structure of the tagger
  - Lexical tagger (initial tagger)
  - Unknown-word tagger
  - Contextual tagger
- Inefficiency
  - Each individual rule is compared against every token of the input (Fig. 3)
  - Potential interaction between rules (Fig. 1)
- Complexity: RKn
  - R: number of contextual rules
  - n: number of input words
  - K: maximum number of tokens that a rule requires
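To make the RKn bound concrete, here is a minimal Python sketch of a rule-by-rule contextual tagger. The rule encoding, the names `apply_rule` and `contextual_tagger`, and the two sample rules are assumptions chosen for illustration and are not taken from Brill's implementation. Every rule is scanned over the whole sentence, so the work grows with R times n (the context size K is 1 in this sketch).

```python
from typing import List, Tuple

# Hypothetical rule encoding: (from_tag, to_tag, context_kind, context_tag)
Rule = Tuple[str, str, str, str]

RULES: List[Rule] = [
    ("vbn", "vbd", "PREVTAG", "np"),  # change vbn to vbd if the previous tag is np
    ("vbd", "vbn", "NEXTTAG", "by"),  # change vbd to vbn if the next tag is by
]

def apply_rule(tags: List[str], rule: Rule) -> List[str]:
    """Scan the whole sentence once for a single rule (n steps).

    Assumption: the context is read from the tags as they were
    before this rule started, not from the partially rewritten output.
    """
    src, dst, kind, ctx = rule
    out = list(tags)
    for i, tag in enumerate(tags):
        if tag != src:
            continue
        if kind == "PREVTAG" and i > 0 and tags[i - 1] == ctx:
            out[i] = dst
        elif kind == "NEXTTAG" and i + 1 < len(tags) and tags[i + 1] == ctx:
            out[i] = dst
    return out

def contextual_tagger(tags: List[str], rules: List[Rule]) -> List[str]:
    """Apply the rules one after another: R full passes over the input."""
    for rule in rules:
        tags = apply_rule(tags, rule)
    return tags
```

In this sketch a later rule can undo the change made by an earlier one, which is the kind of rule interaction the slide lists as a source of inefficiency.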

4 Finite-State Transducer (1)
- Finite-state transducer $T = (\Sigma, Q, i, F, E)$
  - $\Sigma$: finite alphabet
  - $Q$: finite set of states
  - $i$: initial state
  - $F$: set of final states
  - $E$: set of transitions $(q, a, w, q')$ in $Q \times (\Sigma \cup \{\epsilon\}) \times \Sigma^* \times Q$
- Deterministic finite-state transducer $T = (\Sigma, Q, i, F, \otimes, *, \rho)$
  - $\otimes$: deterministic state transition function ($q \otimes a = q'$)
  - $*$: deterministic emission function ($q * a = w'$)
  - $\rho$: final emission function ($\rho(q) = w$ for $q \in F$)
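The deterministic definition can be read as a simple program: follow one transition per input symbol, append its emission, and append the final emission at the end. The sketch below assumes a toy transducer over single-character symbols that maps "ah" to "bh" and "ae" to "ce"; the class name `SubsequentialTransducer` and its field names are hypothetical.

```python
from typing import Dict, Optional, Tuple

class SubsequentialTransducer:
    """Minimal sketch of a deterministic transducer with a final emission."""

    def __init__(self,
                 initial: int,
                 next_state: Dict[Tuple[int, str], int],   # (q, a) -> q'
                 emit: Dict[Tuple[int, str], str],         # (q, a) -> w'
                 final_emit: Dict[int, str]):              # q in F -> w
        self.initial = initial
        self.next_state = next_state
        self.emit = emit
        self.final_emit = final_emit

    def transduce(self, text: str) -> Optional[str]:
        """Return the output string, or None if the input is rejected."""
        q, out = self.initial, ""
        for a in text:
            if (q, a) not in self.next_state:
                return None                 # no transition: input rejected
            out += self.emit[(q, a)]        # emission for this step
            q = self.next_state[(q, a)]     # exactly one successor state
        if q not in self.final_emit:
            return None                     # ended in a non-final state
        return out + self.final_emit[q]     # final emission

# Toy machine (an assumption for illustration): output is delayed on "a"
# until the next symbol tells the machine whether to write "b" or "c".
TOY = SubsequentialTransducer(
    initial=0,
    next_state={(0, "a"): 1, (1, "h"): 2, (1, "e"): 2},
    emit={(0, "a"): "", (1, "h"): "bh", (1, "e"): "ce"},
    final_emit={2: ""},
)
```

Because each step is a single table lookup, the running time depends only on the length of the input.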

5 Finite-State Transducer (2)
- State transition function: $d(q, a) = \{ q' \in Q \mid \exists w' \in \Sigma^* \text{ and } (q, a, w', q') \in E \}$
- Emission function: $\delta(q, a, q') = \{ w' \in \Sigma^* \mid (q, a, w', q') \in E \}$
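Both functions are plain set comprehensions over the transition set E. The following sketch assumes a small non-deterministic toy transducer (four hypothetical transitions) and stores each transition as a tuple `(q, a, w, q2)`; the function names `d` and `delta` simply mirror the symbols on the slide.

```python
from typing import Set, Tuple

Transition = Tuple[int, str, str, int]  # (q, a, w, q')

# Hypothetical toy transducer: reading "a" from state 0 may emit "b" or "c".
E: Set[Transition] = {
    (0, "a", "b", 1),
    (0, "a", "c", 2),
    (1, "h", "h", 3),
    (2, "e", "e", 3),
}

def d(transitions: Set[Transition], q: int, a: str) -> Set[int]:
    """States reachable from q by reading a, whatever the output."""
    return {q2 for (q1, x, w, q2) in transitions if q1 == q and x == a}

def delta(transitions: Set[Transition], q: int, a: str, q2: int) -> Set[str]:
    """Outputs that can be emitted when moving from q to q2 on input a."""
    return {w for (q1, x, w, q3) in transitions
            if q1 == q and x == a and q3 == q2}
```

Here `d(E, 0, "a")` contains two states, which is exactly what makes this toy transducer non-deterministic.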

6 Construction of the Finite-State Tagger (1)
1. Turn each contextual rule into a finite-state transducer
2. Local extension of the transducer (algorithm of Fig. 17)
[Figure: the rule "vbn vbd PREVTAG np" drawn as a transducer with states 0, 1, 2 and transitions np/np, vbn/vbd, followed by its local extension with states 0, 1 and transitions np/np, vbn/vbd, ?/?]

7 Construction of the Finite-State Tagger (2)
3. Combine all transducers into a single transducer (algorithm of Elgot and Mezei)
4. Transform the resulting transducer into an equivalent subsequential (deterministic) transducer (algorithm of Fig. 21)
- Advantages
  - Tagging a sentence of length n requires n steps, independently of the number of rules and the length of the context
  - Eliminates the inefficiencies of Brill's tagger
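Step 3 can be illustrated with a product construction, assuming that "combining" means composing the transducers so that the second one reads the output of the first. The sketch is deliberately restricted to deterministic transducers that emit exactly one tag per input tag, which is much narrower than the general algorithm cited on the slide. The helper `prevtag_rule`, the tag alphabet, and the second rule ("nn vb PREVTAG to") are hypothetical.

```python
def prevtag_rule(src, dst, ctx, alphabet):
    """Transducer for 'change src to dst if the previous tag is ctx'.

    State 1 means the previous input tag was ctx, state 0 means it was not.
    Each row maps an input tag to (output tag, next state).
    """
    def nxt(a):
        return 1 if a == ctx else 0
    return {
        0: {a: (a, nxt(a)) for a in alphabet},
        1: {a: (dst if a == src else a, nxt(a)) for a in alphabet},
    }

def compose(t1, t2):
    """Build one transducer whose states are pairs (state of t1, state of t2)."""
    start = (0, 0)
    table, todo = {}, [start]
    while todo:
        state = todo.pop()
        if state in table:
            continue
        q1, q2 = state
        row = {}
        for a, (b, n1) in t1[q1].items():
            if b in t2[q2]:              # t2 reads what t1 wrote
                c, n2 = t2[q2][b]
                row[a] = (c, (n1, n2))
                todo.append((n1, n2))
        table[state] = row
    return start, table

def run(start, table, tags):
    """One table lookup per input tag: n steps for n tags."""
    q, out = start, []
    for a in tags:
        w, q = table[q][a]
        out.append(w)
    return out

ALPHABET = ["np", "vbn", "vbd", "to", "nn", "vb"]
T1 = prevtag_rule("vbn", "vbd", "np", ALPHABET)
T2 = prevtag_rule("nn", "vb", "to", ALPHABET)
START, COMPOSED = compose(T1, T2)
```

Running `run(START, COMPOSED, tags)` applies both rules in a single left-to-right pass, so adding rules enlarges the table but does not add passes over the sentence.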

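8 Local Extension Algorithm
[Figures: Fig. 18, a sample transducer with states 0, 1, 2 and transitions a/b, b/c, b/d; Fig. 19, its local extension with states 0 to 4, built from the state sets {0} identity, {0,1} identity, {1} transd, {2} transd, {} transd, with transitions ?/?, a/a, a/b, b/b, b/d, b/c, ε/ε]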
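The slide shows only the figures, so the following is a sketch of what local extension computes, not of the automaton construction itself. It assumes the usual reading: the function f is applied to every occurrence of a pattern in its domain, and everything between occurrences is copied unchanged. The name `local_extension` and the brute-force enumeration are assumptions for illustration; the result is a set because overlapping patterns can allow more than one decomposition.

```python
def contains_factor(u, patterns):
    """True if some pattern occurs as a contiguous sub-sequence of u."""
    return any(u[s:s + len(v)] == v
               for v in patterns
               for s in range(len(u) - len(v) + 1))

def local_extension(f, w):
    """All outputs obtained by rewriting occurrences of dom(f) inside w.

    f maps non-empty tuples of tags to tuples of tags; w is a tuple of tags.
    The input is split as u0 v1 u1 ... vn un, where each vi is in dom(f)
    and no ui contains a pattern of dom(f); each vi is replaced by f(vi).
    """
    patterns = set(f)

    def outputs(i):
        results = set()
        for j in range(i, len(w) + 1):
            u = w[i:j]                      # stretch copied unchanged
            if contains_factor(u, patterns):
                break                       # longer stretches contain it too
            if j == len(w):
                results.add(u)              # input ends inside a copied stretch
            for v in patterns:
                if w[j:j + len(v)] == v:    # an occurrence of the pattern
                    for rest in outputs(j + len(v)):
                        results.add(u + f[v] + rest)
        return results

    return outputs(0)

# The rule "vbn vbd PREVTAG np" as a function on the two-tag pattern it matches.
RULE = {("np", "vbn"): ("np", "vbd")}
```

For example, `local_extension(RULE, ("pps", "vbd", "np", "vbn", "by", "np"))` rewrites the single occurrence of `np vbn` and leaves the other tags untouched.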

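9 Determinization Algorithm
[Figures: Fig. 13 and Fig. 22, showing a transducer with states 0 to 3 and transitions a/b, h/h, a/c, e/e, and a deterministic transducer with states 0, 1, 2 labeled with the pairs (0,ε), (1,b), (2,c), (2,ε) and transitions a/ε, h/bh, e/ce]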
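The idea behind determinization can be sketched as a subset construction in which each new state is a set of pairs (original state, output still owed). On every input symbol the longest common prefix of the candidate outputs is emitted and the remainders are kept in the pairs. The code below is a simplified sketch under several assumptions: no ε inputs, outputs are plain strings, each new state has at most one owed output for its final states, and the input transducer is one for which the construction terminates. The toy transitions and the names `determinize` and `transduce` are hypothetical.

```python
import os

def determinize(transitions, initial, finals):
    """Return (start, next_state, emit, final_emit) for a deterministic machine."""
    start = frozenset({(initial, "")})
    next_state, emit, final_emit = {}, {}, {}
    todo, seen = [start], {start}
    while todo:
        S = todo.pop()
        for q, owed in S:
            if q in finals:
                final_emit[S] = owed          # flushed when the input ends here
        inputs = {a for (p, a, w, r) in transitions
                  if any(p == q for q, _ in S)}
        for a in inputs:
            reached = {(r, owed + w)
                       for (q, owed) in S
                       for (p, x, w, r) in transitions
                       if p == q and x == a}
            # Emit what all alternatives agree on, keep the rest as owed output.
            prefix = os.path.commonprefix([out for _, out in reached])
            target = frozenset((r, out[len(prefix):]) for r, out in reached)
            emit[(S, a)] = prefix
            next_state[(S, a)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return start, next_state, emit, final_emit

def transduce(machine, text):
    """Run the deterministic machine; None means the input is rejected."""
    start, next_state, emit, final_emit = machine
    q, out = start, ""
    for a in text:
        if (q, a) not in next_state:
            return None
        out += emit[(q, a)]
        q = next_state[(q, a)]
    if q not in final_emit:
        return None
    return out + final_emit[q]

# Hypothetical non-deterministic input: "a" may emit "b" or "c".
TOY_E = {(0, "a", "b", 1), (0, "a", "c", 2), (1, "h", "h", 3), (2, "e", "e", 3)}
MACHINE = determinize(TOY_E, initial=0, finals={3})
```

On this toy input the first step emits nothing, because "b" and "c" share no prefix, and the choice is resolved one symbol later.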

10 Lexical Tagger
- The first step of the tagging process: look up each word in a dictionary (Fig. 9)
- To achieve high speed (Fig. 10):
  → Represent the dictionary as a deterministic finite-state automaton (algorithm of Revuz)
- Advantages
  - Fast access: 12,000 words / second
  - Small storage space: 742 Kb (ASCII form) → 360 Kb
- Unknown-word tagger
  - The same techniques are used
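A dictionary stored as a deterministic automaton is looked up one letter at a time, so the cost depends on the word length and not on the dictionary size. The sketch below uses a plain trie, which is deterministic but not minimized; the minimization step attributed to Revuz on the slide is not implemented here. The class name `TrieLexicon` and the sample entries are assumptions for illustration.

```python
from typing import Dict, List, Optional

class TrieLexicon:
    """Deterministic letter automaton (unminimized trie) mapping words to tags."""

    def __init__(self) -> None:
        self.children: List[Dict[str, int]] = [{}]  # state -> {letter: next state}
        self.tags: Dict[int, List[str]] = {}        # final state -> tag list

    def add(self, word: str, tags: List[str]) -> None:
        q = 0
        for ch in word:
            nxt = self.children[q].get(ch)
            if nxt is None:                 # create a fresh state on demand
                nxt = len(self.children)
                self.children.append({})
                self.children[q][ch] = nxt
            q = nxt
        self.tags[q] = list(tags)

    def lookup(self, word: str) -> Optional[List[str]]:
        q = 0
        for ch in word:                     # one transition per letter
            nxt = self.children[q].get(ch)
            if nxt is None:
                return None                 # word is not in the dictionary
            q = nxt
        return self.tags.get(q)

# Hypothetical entries; a real lexicon would be loaded from a file.
LEXICON = TrieLexicon()
LEXICON.add("bag", ["nn", "vb"])
LEXICON.add("bagged", ["vbn", "vbd"])
LEXICON.add("bids", ["nns"])
```

A minimized automaton would additionally share common suffixes such as "-ed" between words, which is where most of the space saving would come from.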

11 Implementation of Finite-State Transducer
- Represented by a two-dimensional table
  - Row: states
  - Column: alphabet of all possible input letters
  - Content: output of the transition
[Figure: table cell at row q_n, column a, holding the output w]
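A table-driven version might look like the sketch below. Rows are states and columns are input symbols, as on the slide; storing the next state next to the output in each cell is an assumption of this sketch. The table encodes a deterministic transducer for the rule "vbn vbd PREVTAG np" over a small hypothetical tag alphabet.

```python
from typing import List, Tuple

ALPHABET = ["np", "vbn", "vbd", "by"]
COLUMN = {tag: index for index, tag in enumerate(ALPHABET)}

# TABLE[state][column] = (output tag, next state)
# State 0: previous tag was not np.  State 1: previous tag was np.
TABLE: List[List[Tuple[str, int]]] = [
    [("np", 1), ("vbn", 0), ("vbd", 0), ("by", 0)],  # row for state 0
    [("np", 1), ("vbd", 0), ("vbd", 0), ("by", 0)],  # row for state 1
]

def tag_with_table(tags: List[str]) -> List[str]:
    """One array access per input tag, regardless of how many rules were compiled in."""
    state, out = 0, []
    for tag in tags:
        output, state = TABLE[state][COLUMN[tag]]
        out.append(output)
    return out
```

With this layout the inner loop is dominated by memory access to the table, which fits the observation in the evaluation that low-level storage access dominates the computation.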

12 Evaluation
- Overall performance comparison (Fig. 11)
  - Stochastic tagger: Church's trigram tagger (1988)
  - Rule-based tagger: Brill's tagger
  - All taggers were trained on the Brown corpus and used the same lexicon (Fig. 10)
- Speeds of the different parts of the finite-state tagger (Fig. 12)
  - Low-level factors (storage access) dominate the computation
