1
Transformation-based error-driven learning (TBL)
LING 572
Fei Xia
1/19/06
2
Outline
Basic concepts and properties
Relation between DT, DL, and TBL
Case study
3
Basic concepts and properties
4
TBL overview
Introduced by Eric Brill (1992)
Intuition:
–Start with some simple solution to the problem
–Then apply a sequence of transformations
Applications:
–Classification problems
–Other kinds of problems: e.g., parsing
5
TBL flowchart
6
Transformations
A transformation has two components:
–A trigger environment: e.g., the previous tag is DT
–A rewrite rule: e.g., change the current tag from MD to N
Example: if (prev_tag == DT) then MD → N
A transformation is similar to a rule in a decision tree, but the rewrite rule can be more complicated (e.g., it can change a parse tree), and a transformation list is a processor, not (just) a classifier.
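As a concrete illustration, a transformation can be represented as a trigger predicate paired with a rewrite rule. The sketch below is only illustrative; the class and field names are not from Brill's implementation:

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class Transformation:
    """A trigger environment paired with a rewrite rule."""
    trigger: Callable[[Sequence[str], Sequence[str], int], bool]  # (words, tags, i) -> fires?
    from_tag: str
    to_tag: str

    def applies(self, words, tags, i):
        """True if position i currently has from_tag and the trigger fires there."""
        return tags[i] == self.from_tag and self.trigger(words, tags, i)

# The slide's example: if (prev_tag == DT) then MD -> N
rule = Transformation(
    trigger=lambda words, tags, i: i > 0 and tags[i - 1] == "DT",
    from_tag="MD",
    to_tag="N",
)
```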
7
Learning transformations
1. Initialize each example in the training data with the initial classifier.
2. Consider all the possible transformations, and choose the one with the highest score.
3. Append it to the transformation list and apply it to the training corpus to obtain a "new" corpus.
4. Repeat steps 2-3.
Steps 2-3 can be expensive; there are various ways to reduce the cost.
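A minimal sketch of this greedy loop, assuming (purely for illustration) that each candidate transformation is a callable mapping (words, tags) to a new tag sequence for the whole corpus:

```python
def learn_transformation_list(words, gold_tags, initial_tags, candidates, min_gain=1):
    """Greedy TBL training loop (steps 1-4 above).

    candidates: callables, each applying one candidate transformation to (words, tags)
    and returning a new tag sequence. This interface is an assumption of the sketch.
    """
    candidates = list(candidates)

    def errors(tags):
        return sum(t != g for t, g in zip(tags, gold_tags))

    current = list(initial_tags)            # step 1: output of the initial annotator
    learned = []
    while True:
        # step 2: score every candidate by its error reduction on the current corpus
        scored = [(errors(current) - errors(t(words, current)), t) for t in candidates]
        best_gain, best = max(scored, key=lambda x: x[0], default=(0, None))
        if best is None or best_gain < min_gain:
            break
        learned.append(best)                # step 3: append to the transformation list...
        current = best(words, current)      # ...and apply it to get the "new" corpus
    return learned                          # step 4: repeat until no transformation helps
```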
8
Using TBL
The initial state-annotator
The space of allowable transformations:
–Rewrite rules
–Triggering environments
The objective function: minimize the error rate directly
–for comparing the corpus to the truth
–for choosing a transformation
9
Using TBL (cont)
Two more parameters:
–Whether the effect of a transformation is visible to following transformations
–If so, what is the order in which transformations are applied to the corpus: left-to-right or right-to-left?
10
The order matters
Transformation: if prev_label == A then change cur_label from A to B.
Input: A A A A A A
Output:
–Effects not immediate: A B B B B B
–Effects immediate, left-to-right: A B A B A B
–Effects immediate, right-to-left: A B B B B B
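These three behaviors can be reproduced mechanically; the sketch below applies "if prev_label == A then A → B" to the input under each policy (function and parameter names are illustrative):

```python
def apply_rule(labels, immediate=True, right_to_left=False):
    """Apply: if prev_label == 'A' then change the current label from 'A' to 'B'."""
    src = list(labels)                       # labels consulted when effects are not immediate
    out = list(labels)
    positions = range(len(labels))
    if right_to_left:
        positions = reversed(positions)
    for i in positions:
        context = out if immediate else src  # immediate: earlier rewrites are visible
        if i > 0 and context[i - 1] == "A" and out[i] == "A":
            out[i] = "B"
    return out

labels = ["A"] * 6
print(apply_rule(labels, immediate=False))                     # A B B B B B
print(apply_rule(labels, immediate=True))                      # A B A B A B
print(apply_rule(labels, immediate=True, right_to_left=True))  # A B B B B B
```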
11
Relation between DT, DL, and TBL
12
DT and TBL
DT is a subset of TBL.
(Proof sketch) Base case, depth(DT) = 1:
1. Label with S
2. If X then S → A
3. S → B
13
DT is a subset of TBL (cont)
Inductive step:
–Depth = n: subtree T1 is simulated by list L1, i.e., "Label with S'" followed by L1'; subtree T2 is simulated by list L2, i.e., "Label with S''" followed by L2'.
–Depth = n+1: combine the two lists as shown on the next slide.
14
DT is a subset of TBL (cont)
For a tree of depth n+1 with root question X:
–Label with S
–If X then S → S'
–S → S''
–L1' (renaming X with X')
–L2' (renaming X with X'')
15
DT is a proper subset of TBL
There exists a problem that can be solved by TBL but not by a DT, for a fixed set of primitive queries.
Example: given a sequence of characters,
–Classify each char based on its position: if pos % 4 == 0 then "yes" else "no"
–Input attributes available: the previous two chars
16
Transformation list:
–Label with S: A/S A/S A/S A/S A/S A/S A/S
–If there is no previous character, then S → F: A/F A/S A/S A/S A/S A/S A/S
–If the char two to the left is labeled with F, then S → F: A/F A/S A/F A/S A/F A/S A/F
–If the char two to the left is labeled with F, then F → S: A/F A/S A/S A/S A/F A/S A/S
–F → yes
–S → no
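This list can be checked mechanically; the sketch below applies the four transformations left-to-right, with each transformation's effects visible to the next (the function name and 0-based indexing are illustrative assumptions):

```python
def label_positions(n):
    """Apply the transformation list above to a sequence of n characters."""
    tags = ["S"] * n                                   # Label with S
    for i in range(n):                                 # no previous char: S -> F
        if i == 0 and tags[i] == "S":
            tags[i] = "F"
    for i in range(n):                                 # char two to the left is F: S -> F
        if i >= 2 and tags[i - 2] == "F" and tags[i] == "S":
            tags[i] = "F"
    for i in range(n):                                 # char two to the left is F: F -> S
        if i >= 2 and tags[i - 2] == "F" and tags[i] == "F":
            tags[i] = "S"
    return ["yes" if t == "F" else "no" for t in tags]  # F -> yes, S -> no

print(label_positions(7))   # 'yes' exactly at positions 0 and 4, i.e., pos % 4 == 0
```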
17
DT and TBL
TBL is more powerful than DT.
The extra power of TBL comes from:
–Transformations are applied in sequence
–Results of previous transformations are visible to following transformations
18
DL and TBL
DL is a proper subset of TBL.
In two-class TBL:
–(if q then y' → y) is equivalent to (if q then y)
–If multiple transformations apply to an example, only the last one matters.
19
Two-class TBL = DL?
Two-class TBL → DL:
–Replace "if q then y' → y" with "if q then y"
–Reverse the rule order
DL → two-class TBL:
–Replace "if q then y" with "if q then y' → y"
–Reverse the rule order
The equivalence does not hold for "dynamic" problems:
–Dynamic problem: the answers to questions are not static.
–Ex: in POS tagging, when the tag of a word is changed, it changes the answers to questions for nearby words.
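For the static two-class case, each construction is just a rewrite of every rule plus a reversal of the list. A sketch, encoding TBL rules as (trigger, from_label, to_label) triples and DL rules as (trigger, label) pairs; the encoding and the default label pair are assumptions of the sketch:

```python
def tbl_to_dl(transformations):
    """Two-class TBL -> DL: replace 'if q then y_from -> y' with 'if q then y'
    and reverse the order, so the transformation that would have fired last
    becomes the decision-list rule that fires first."""
    return [(trigger, to_label)
            for trigger, _from_label, to_label in reversed(transformations)]

def dl_to_tbl(rules, labels=("yes", "no")):
    """DL -> two-class TBL: replace 'if q then y' with 'if q then other(y) -> y'
    and reverse the order."""
    other = {labels[0]: labels[1], labels[1]: labels[0]}
    return [(trigger, other[label], label) for trigger, label in reversed(rules)]
```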
20
DT, DL, and TBL (summary)
k-DT is a proper subset of k-DL.
DL is a proper subset of TBL.
The extra power of TBL comes from:
–Transformations are applied in sequence
–Results of previous transformations are visible to following transformations
TBL transforms the training data; it does not split the training data.
TBL is a processor, not just a classifier.
21
Case study
22
TBL for POS tagging
The initial state-annotator: the most common tag for each word.
The space of allowable transformations:
–Rewrite rules: change cur_tag from X to Y
–Triggering environments (feature types): unlexicalized or lexicalized
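A sketch of such an initial-state annotator, assuming the training data is a list of sentences of (word, tag) pairs; the fallback tag for unknown words is an assumption here, not Brill's unknown-word handling:

```python
from collections import Counter, defaultdict

def train_initial_annotator(tagged_sentences):
    """Learn the most common tag for each word in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def annotate(words, most_common_tag, default_tag="NN"):
    """Initial state: tag each word with its most common tag (default for unknowns)."""
    return [most_common_tag.get(w, default_tag) for w in words]
```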
23
Unlexicalized features
t-1 is z
t-1 or t-2 is z
t-1 or t-2 or t-3 is z
t-1 is z and t+1 is w
…
24
Lexicalized features
w0 is w
w-1 is w
w-1 or w-2 is w
t-1 is z and w0 is w
…
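Both families of triggering environments can be written as predicates over a position in a tagged sentence; a sketch with illustrative names (these template constructors are not from Brill's code):

```python
def prev_tag_is(z):                      # unlexicalized: t-1 is z
    return lambda words, tags, i: i >= 1 and tags[i - 1] == z

def prev_two_tags_include(z):            # unlexicalized: t-1 or t-2 is z
    return lambda words, tags, i: z in tags[max(0, i - 2):i]

def cur_word_is(w):                      # lexicalized: w0 is w
    return lambda words, tags, i: words[i] == w

def prev_tag_and_cur_word(z, w):         # lexicalized: t-1 is z and w0 is w
    return lambda words, tags, i: i >= 1 and tags[i - 1] == z and words[i] == w
```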
25
TBL for POS tagging (cont)
The objective function: tagging accuracy
–for comparing the corpus to the truth
–for choosing a transformation: choose the one that results in the greatest error reduction
The order of applying transformations: left-to-right.
The results of applying a transformation are not visible to other transformations.
26
Learned transformations
27
Experiments
28
Uncovered issues
Efficient learning algorithms
Probabilistic TBL:
–Top-N hypotheses
–Confidence measures
29
TBL Summary
TBL is more powerful than DL and DT:
–Transformations are applied in sequence
–It can handle dynamic problems well
TBL is more than a classifier:
–Classification problems: POS tagging
–Other problems: e.g., parsing
TBL performs well because it minimizes errors directly.
Learning can be expensive; various methods exist to speed it up.