1
Transformation-based error-driven learning (TBL)
LING 572
Fei Xia
1/19/06
2
Outline
Basic concepts and properties
Relation between DT, DL, and TBL
Case study
3
Basic concepts and properties
4
TBL overview
Introduced by Eric Brill (1992)
Intuition:
–Start with some simple solution to the problem
–Then apply a sequence of transformations
Applications:
–Classification problems
–Other kinds of problems: e.g., parsing
5
TBL flowchart
6
Transformations
A transformation has two components:
–A trigger environment: e.g., the previous tag is DT
–A rewrite rule: e.g., change the current tag from MD to N
Example: if (prev_tag == DT) then MD → N
A transformation is similar to a rule in a decision tree, but the rewrite rule can be more complicated (e.g., it can change a parse tree), and a transformation list is a processor, not (just) a classifier.
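As a concrete illustration, a transformation can be represented as a trigger predicate paired with a rewrite rule. The sketch below is only illustrative; the class and field names are not from Brill's implementation:

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class Transformation:
    """A trigger environment paired with a rewrite rule."""
    trigger: Callable[[Sequence[str], Sequence[str], int], bool]  # (words, tags, i) -> fires?
    from_tag: str
    to_tag: str

    def applies(self, words, tags, i):
        """True if position i currently has from_tag and the trigger fires there."""
        return tags[i] == self.from_tag and self.trigger(words, tags, i)

# The slide's example: if (prev_tag == DT) then MD -> N
rule = Transformation(
    trigger=lambda words, tags, i: i > 0 and tags[i - 1] == "DT",
    from_tag="MD",
    to_tag="N",
)
```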
7
Learning transformations
1. Initialize each example in the training data with the initial classifier.
2. Consider all the possible transformations, and choose the one with the highest score.
3. Append it to the transformation list and apply it to the training corpus to obtain a "new" corpus.
4. Repeat steps 2-3.
Steps 2-3 can be expensive; there are various ways to reduce the cost.
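A minimal sketch of this greedy loop, assuming (purely for illustration) that each candidate transformation is a callable mapping (words, tags) to a new tag sequence for the whole corpus:

```python
def learn_transformation_list(words, gold_tags, initial_tags, candidates, min_gain=1):
    """Greedy TBL training loop (steps 1-4 above).

    candidates: callables, each applying one candidate transformation to (words, tags)
    and returning a new tag sequence. This interface is an assumption of the sketch.
    """
    candidates = list(candidates)

    def errors(tags):
        return sum(t != g for t, g in zip(tags, gold_tags))

    current = list(initial_tags)            # step 1: output of the initial annotator
    learned = []
    while True:
        # step 2: score every candidate by its error reduction on the current corpus
        scored = [(errors(current) - errors(t(words, current)), t) for t in candidates]
        best_gain, best = max(scored, key=lambda x: x[0], default=(0, None))
        if best is None or best_gain < min_gain:
            break
        learned.append(best)                # step 3: append to the transformation list...
        current = best(words, current)      # ...and apply it to get the "new" corpus
    return learned                          # step 4: repeat until no transformation helps
```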
8
Using TBL
The initial state-annotator
The space of allowable transformations:
–Rewrite rules
–Triggering environments
The objective function: minimize the error rate directly
–for comparing the corpus to the truth
–for choosing a transformation
9
Using TBL (cont)
Two more parameters:
–Whether the effect of a transformation is visible to following transformations
–If so, what is the order in which transformations are applied to the corpus: left-to-right or right-to-left?
10
The order matters
Transformation: if prev_label == A then change cur_label from A to B.
Input: A A A A A A
Output:
–Effects not immediate: A B B B B B
–Effects immediate, left-to-right: A B A B A B
–Effects immediate, right-to-left: A B B B B B
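These three behaviors can be reproduced mechanically; the sketch below applies "if prev_label == A then A → B" to the input under each policy (function and parameter names are illustrative):

```python
def apply_rule(labels, immediate=True, right_to_left=False):
    """Apply: if prev_label == 'A' then change the current label from 'A' to 'B'."""
    src = list(labels)                       # labels consulted when effects are not immediate
    out = list(labels)
    positions = range(len(labels))
    if right_to_left:
        positions = reversed(positions)
    for i in positions:
        context = out if immediate else src  # immediate: earlier rewrites are visible
        if i > 0 and context[i - 1] == "A" and out[i] == "A":
            out[i] = "B"
    return out

labels = ["A"] * 6
print(apply_rule(labels, immediate=False))                     # A B B B B B
print(apply_rule(labels, immediate=True))                      # A B A B A B
print(apply_rule(labels, immediate=True, right_to_left=True))  # A B B B B B
```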
11
Relation between DT, DL, and TBL
12
DT and TBL
DT is a subset of TBL.
(Proof sketch) Base case, depth(DT) = 1:
1. Label with S
2. If X then S → A
3. S → B
13
DT is a subset of TBL (cont)
Inductive step:
–Depth = n: subtree T1 is simulated by list L1, i.e., "Label with S'" followed by L1'; subtree T2 is simulated by list L2, i.e., "Label with S''" followed by L2'.
–Depth = n+1: combine the two lists as shown on the next slide.
14
DT is a subset of TBL (cont)
For a tree of depth n+1 with root question X:
–Label with S
–If X then S → S'
–S → S''
–L1' (renaming X with X')
–L2' (renaming X with X'')
15
DT is a proper subset of TBL
There exists a problem that can be solved by TBL but not by a DT, for a fixed set of primitive queries.
Example: given a sequence of characters,
–Classify each char based on its position: if pos % 4 == 0 then "yes" else "no"
–Input attributes available: the previous two chars
16
Transformation list:
–Label with S: A/S A/S A/S A/S A/S A/S A/S
–If there is no previous character, then S → F: A/F A/S A/S A/S A/S A/S A/S
–If the char two to the left is labeled with F, then S → F: A/F A/S A/F A/S A/F A/S A/F
–If the char two to the left is labeled with F, then F → S: A/F A/S A/S A/S A/F A/S A/S
–F → yes
–S → no
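This list can be checked mechanically; the sketch below applies the four transformations left-to-right, with each transformation's effects visible to the next (the function name and 0-based indexing are illustrative assumptions):

```python
def label_positions(n):
    """Apply the transformation list above to a sequence of n characters."""
    tags = ["S"] * n                                   # Label with S
    for i in range(n):                                 # no previous char: S -> F
        if i == 0 and tags[i] == "S":
            tags[i] = "F"
    for i in range(n):                                 # char two to the left is F: S -> F
        if i >= 2 and tags[i - 2] == "F" and tags[i] == "S":
            tags[i] = "F"
    for i in range(n):                                 # char two to the left is F: F -> S
        if i >= 2 and tags[i - 2] == "F" and tags[i] == "F":
            tags[i] = "S"
    return ["yes" if t == "F" else "no" for t in tags]  # F -> yes, S -> no

print(label_positions(7))   # 'yes' exactly at positions 0 and 4, i.e., pos % 4 == 0
```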
17
DT and TBL
TBL is more powerful than DT.
The extra power of TBL comes from:
–Transformations are applied in sequence
–Results of previous transformations are visible to following transformations
18
DL and TBL
DL is a proper subset of TBL.
In two-class TBL:
–(if q then y' → y) is equivalent to (if q then y)
–If multiple transformations apply to an example, only the last one matters.
19
Two-class TBL = DL?
Two-class TBL → DL:
–Replace "if q then y' → y" with "if q then y"
–Reverse the rule order
DL → two-class TBL:
–Replace "if q then y" with "if q then y' → y"
–Reverse the rule order
The equivalence does not hold for "dynamic" problems:
–Dynamic problem: the answers to questions are not static.
–Ex: in POS tagging, when the tag of a word is changed, it changes the answers to questions for nearby words.
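For the static two-class case, each construction is just a rewrite of every rule plus a reversal of the list. A sketch, encoding TBL rules as (trigger, from_label, to_label) triples and DL rules as (trigger, label) pairs; the encoding and the default label pair are assumptions of the sketch:

```python
def tbl_to_dl(transformations):
    """Two-class TBL -> DL: replace 'if q then y_from -> y' with 'if q then y'
    and reverse the order, so the transformation that would have fired last
    becomes the decision-list rule that fires first."""
    return [(trigger, to_label)
            for trigger, _from_label, to_label in reversed(transformations)]

def dl_to_tbl(rules, labels=("yes", "no")):
    """DL -> two-class TBL: replace 'if q then y' with 'if q then other(y) -> y'
    and reverse the order."""
    other = {labels[0]: labels[1], labels[1]: labels[0]}
    return [(trigger, other[label], label) for trigger, label in reversed(rules)]
```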
20
DT, DL, and TBL (summary)
k-DT is a proper subset of k-DL.
DL is a proper subset of TBL.
The extra power of TBL comes from:
–Transformations are applied in sequence
–Results of previous transformations are visible to following transformations
TBL transforms the training data; it does not split the training data.
TBL is a processor, not just a classifier.
21
Case study
22
TBL for POS tagging
The initial state-annotator: the most common tag for each word.
The space of allowable transformations:
–Rewrite rules: change cur_tag from X to Y
–Triggering environments (feature types): unlexicalized or lexicalized
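A sketch of such an initial-state annotator, assuming the training data is a list of sentences of (word, tag) pairs; the fallback tag for unknown words is an assumption here, not Brill's unknown-word handling:

```python
from collections import Counter, defaultdict

def train_initial_annotator(tagged_sentences):
    """Learn the most common tag for each word in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def annotate(words, most_common_tag, default_tag="NN"):
    """Initial state: tag each word with its most common tag (default for unknowns)."""
    return [most_common_tag.get(w, default_tag) for w in words]
```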
23
Unlexicalized features
t-1 is z
t-1 or t-2 is z
t-1 or t-2 or t-3 is z
t-1 is z and t+1 is w
…
24
Lexicalized features
w0 is w
w-1 is w
w-1 or w-2 is w
t-1 is z and w0 is w
…
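Both families of triggering environments can be written as predicates over a position in a tagged sentence; a sketch with illustrative names (these template constructors are not from Brill's code):

```python
def prev_tag_is(z):                      # unlexicalized: t-1 is z
    return lambda words, tags, i: i >= 1 and tags[i - 1] == z

def prev_two_tags_include(z):            # unlexicalized: t-1 or t-2 is z
    return lambda words, tags, i: z in tags[max(0, i - 2):i]

def cur_word_is(w):                      # lexicalized: w0 is w
    return lambda words, tags, i: words[i] == w

def prev_tag_and_cur_word(z, w):         # lexicalized: t-1 is z and w0 is w
    return lambda words, tags, i: i >= 1 and tags[i - 1] == z and words[i] == w
```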
25
TBL for POS tagging (cont)
The objective function: tagging accuracy
–for comparing the corpus to the truth
–for choosing a transformation: choose the one that results in the greatest error reduction
The order of applying transformations: left-to-right.
The results of applying a transformation are not visible to other transformations.
26
Learned transformations
27
Experiments
28
Uncovered issues
Efficient learning algorithms
Probabilistic TBL:
–Top-N hypotheses
–Confidence measures
29
TBL Summary
TBL is more powerful than DL and DT:
–Transformations are applied in sequence
–It can handle dynamic problems well
TBL is more than a classifier:
–Classification problems: POS tagging
–Other problems: e.g., parsing
TBL performs well because it minimizes errors directly.
Learning can be expensive; various methods exist to speed it up.