
Slide 1: Better MT Using Parallel Dependency Trees
Yuan Ding, University of Pennsylvania, 2003

Slide 2: Outline
Motivation; The alignment algorithm (algorithm at a glance, the framework, heuristics); Walking through an example; Evaluation; Conclusion

Slide 3: Motivation (1) Statistical MT Approaches
Statistical MT approaches: pioneered by (Brown et al., 1990, 1993); leverage a large training corpus; outperform traditional transfer-based approaches.
Major criticism: no internal representation of syntax or semantics.

Slide 4: Motivation (2) Hybrid Approaches
Hybrid approaches (Wu, 1997; Alshawi et al., 2000; Yamada and Knight, 2001, 2002; Gildea, 2003): apply statistical learning to structured data.
Problems with hybrid MT approaches: structural divergence (Dorr, 1994); the vagaries of loose translations in real corpora.

Slide 5: Motivation (3)
Holy grail: syntax-based MT that captures structural divergence.
Accomplished work: a new approach to the alignment of parallel dependency trees (paper published at MT Summit IX), allowing non-isomorphism of the dependency trees.

Slide 6: We are here…

Slide 7: Outline
Motivation; The alignment algorithm (algorithm at a glance, the framework, heuristics); Walking through an example; Evaluation; Conclusion

Slide 8: Define the Alignment Problem
In natural language: find word mappings between the English and foreign sentences.
In math: given an English sentence e_1 … e_l and a foreign sentence f_1 … f_m, for each foreign word f_j find an alignment label a_j ∈ {0, 1, …, l}, where e_{a_j} is the English word that f_j maps to and a_j = 0 denotes the empty (NULL) word.
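
For concreteness, here is a minimal sketch of such a labeling in Python, using the pseudo-translation example that appears later on slide 11 as the "foreign" side; the particular label values are illustrative, not taken from the talk.

```python
# Illustrative alignment labeling: index 0 plays the role of the empty (NULL) word.
english = ["NULL", "The", "girl", "kissed", "her", "kitty", "cat"]
foreign = ["The", "girl", "gave", "a", "kiss", "to", "her", "cat"]

# alignment[j] = i means foreign word j maps to English word i (0 = NULL).
alignment = [1, 2, 3, 0, 3, 0, 4, 6]

for j, i in enumerate(alignment):
    print(f"{foreign[j]} -> {english[i]}")
```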

Slide 9: The IBM Models
Model 1: word order does not matter, i.e. a "bag of words" model.
Model 2: condition the probabilities on sentence length and word position.
Models 3, 4, 5: (a) generate the fertility of each English word, (b) generate the identity of the foreign words, (c) generate their positions.
The models gradually add positioning information.
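
The slides contain no code; for reference, the following is a minimal, self-contained sketch of the kind of Model 1 EM training the talk builds on. The tiny corpus and the variable names are placeholders, not from the talk.

```python
from collections import defaultdict

def train_model1(corpus, iterations=5):
    """Minimal IBM Model 1 EM sketch. corpus is a list of
    (foreign_words, english_words) pairs; the English side gets an
    explicit NULL token. Returns the translation table t(f | e)."""
    t = defaultdict(lambda: 1e-3)                 # near-uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)                # expected counts c(f, e)
        total = defaultdict(float)                # expected counts c(e)
        for fs, es in corpus:                     # E-step
            es = ["NULL"] + es
            for f in fs:
                z = sum(t[(f, e)] for e in es)    # normalizer for this f
                for e in es:
                    p = t[(f, e)] / z
                    count[(f, e)] += p
                    total[e] += p
        for (f, e), c in count.items():           # M-step: renormalize
            t[(f, e)] = c / total[e]
    return t

# Placeholder corpus drawn from the example on slide 16.
corpus = [(["wo", "zhu", "zai", "zheli"], ["I", "have", "been", "here"])]
t = train_model1(corpus)
```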

Slide 10: Using Dependency Trees
Positioning information can be acquired from parse trees. Parsers: (Collins, 1999), (Bikel, 2002).
Problems with using parse trees directly: they have two types of nodes, and the unlexicalized non-terminals control the domain.
Using dependency trees instead: (Fox, 2002) shows they have the best* phrasal cohesion properties; (Xia, 2001) constructs dependency trees from parse trees using the Tree Adjoining Grammar.
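
A minimal sketch of the dependency-tree representation that the later sketches in this transcript operate on; the class name, its fields, and the particular analysis of the example sentence are assumptions, not taken from the talk.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DepNode:
    """One lexicalized node in a dependency tree: a word plus its dependents."""
    word: str
    children: List["DepNode"] = field(default_factory=list)
    aligned_to: Optional["DepNode"] = None   # set once a mapping is fixed

# One plausible (hand-built) analysis of the English sentence from slide 16.
tree_en = DepNode("have", [
    DepNode("I"),
    DepNode("been", [DepNode("here")]),
    DepNode("since", [DepNode("1947")]),
])
```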

Slide 11: The Framework (1)
Step 1: train IBM Model 1 for lexical mapping probabilities.
Step 2: find and fix high-confidence mappings according to a heuristic function h(f, e).
A pseudo-translation example: "The girl kissed her kitty cat" / "The girl gave a kiss to her cat".
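
A hedged sketch of Step 2 within one parallel treelet pair, reusing the DepNode class from the sketch above. The scoring interface h(f, e, candidates, t) and the confidence threshold are assumptions for illustration, not details from the talk.

```python
def fix_high_confidence(f_nodes, e_nodes, t, h, threshold=0.0):
    """Step 2 sketch: score every still-unaligned word pair in the treelet
    pair with the heuristic h and fix the best one if it clears the
    threshold (a placeholder; a real threshold would be tuned)."""
    e_words = [e.word for e in e_nodes]
    best = None
    for f in f_nodes:
        if f.aligned_to is not None:
            continue
        for e in e_nodes:
            if e.aligned_to is not None:
                continue
            score = h(f.word, e.word, e_words, t)
            if best is None or score > best[0]:
                best = (score, f, e)
    if best and best[0] >= threshold:
        _, f, e = best
        f.aligned_to, e.aligned_to = e, f         # fix the mapping
        return (f, e)
    return None
```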

Slide 12: The Framework (2)
Step 3: partition the dependency trees on both sides w.r.t. the fixed mappings. Each fixed mapping creates one new "treelet", yielding a new set of parallel dependency structures.
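
One simple reading of the partition step, again over DepNode trees: start a new treelet whenever the traversal reaches a node whose mapping has been fixed, so each fixed node roots its own treelet. The slides do not spell out the exact cutting rule, so treat this as an assumption.

```python
def partition(root, is_fixed):
    """Split a dependency tree into treelets at the fixed nodes."""
    treelets = []

    def collect(node, current):
        current.append(node)
        for child in node.children:
            if is_fixed(child):                # cut here: child roots a new treelet
                new_treelet = []
                treelets.append(new_treelet)
                collect(child, new_treelet)
            else:
                collect(child, current)

    first = []
    treelets.append(first)
    collect(root, first)
    return treelets
```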

Slide 13: The Framework (3)
Step 4: go back to Step 1 unless enough nodes have been fixed.
Algorithm properties: it is an iterative algorithm with time complexity O(n * T(h)), where T(h) is the time for the heuristic function in Step 2. Since P(f | e) in IBM Model 1 has a unique global maximum, convergence is guaranteed, and the results depend only on the heuristic function h(f, e).
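
Putting the earlier sketches together, a rough sketch of the whole loop (train_model1, partition, and fix_high_confidence are the sketches above). Pairing foreign and English treelets by position and retraining Model 1 on the treelet pairs are simplifying assumptions, not details confirmed by the slides.

```python
def paired_treelets(f_root, e_root, fixed):
    """Partition both trees at the fixed nodes and pair treelets by position."""
    is_f = lambda n: any(n is p[0] for p in fixed)
    is_e = lambda n: any(n is p[1] for p in fixed)
    return list(zip(partition(f_root, is_f), partition(e_root, is_e)))

def align(f_root, e_root, h, max_iterations=10):
    """Iterative loop from slides 11-13: retrain Model 1, fix one
    high-confidence mapping per treelet pair, re-partition, repeat."""
    fixed = []                                    # list of (f_node, e_node)
    for _ in range(max_iterations):
        pairs = paired_treelets(f_root, e_root, fixed)
        # Step 1: (re)train Model 1 on the current parallel treelets.
        corpus = [([n.word for n in fs], [n.word for n in es]) for fs, es in pairs]
        t = train_model1(corpus)
        # Step 2: fix at most one new mapping in each treelet pair.
        new = [m for fs, es in pairs if (m := fix_high_confidence(fs, es, t, h))]
        if not new:                               # Step 4: stop when nothing new is fixed
            break
        fixed.extend(new)                         # Step 3: partitions change next round
    return fixed
```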

Slide 14: Heuristics
Heuristic functions for Step 2. Objective: measure the confidence of a mapping between a pair of words.
First heuristic: entropy. Intuition: model the shape of the probability distribution.
Second heuristic: inside-outside probability, an idea borrowed from PCFG parsing.
Fertility threshold: rule out mappings with an unlikely fertility ratio (> 2.0).
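
The slide gives only the intuition for the entropy heuristic; the code below is one plausible reading (how peaked is t(f | .) over the English candidates in the current treelet, and how much of that mass sits on e), not the paper's exact formula. It matches the h(f, e, candidates, t) interface assumed in the Step 2 sketch above.

```python
import math

def entropy_confidence(f, e, e_candidates, t):
    """Hypothetical entropy heuristic: low entropy (a peaked distribution)
    over the candidates suggests a confident mapping."""
    probs = [t[(f, c)] for c in e_candidates]
    z = sum(probs)
    if z == 0:
        return 0.0
    norm = [p / z for p in probs]
    entropy = -sum(p * math.log(p) for p in norm if p > 0)
    max_entropy = math.log(len(norm)) if len(norm) > 1 else 1.0
    peakedness = 1.0 - entropy / max_entropy      # 1 = fully peaked, 0 = uniform
    return peakedness * (t[(f, e)] / z)           # weight toward this particular e
```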

Slide 15: Outline
Motivation; The alignment algorithm (algorithm at a glance, the framework, heuristics); Walking through an example; Evaluation; Conclusion

Slide 16: Walking through an Example (1)
[English] I have been here since 1947.
[Chinese] 1947 nian yilai wo yizhi zhu zai zheli.
Iteration 1: one dependency tree pair; align "I" and "wo".

Slide 17: Walking through an Example (2)
Iteration 2: partition and form two treelet pairs; align "since" and "yilai".

Slide 18: Walking through an Example (3)
Iteration 3: partition and form three treelet pairs; align "1947" and "1947", and "here" and "zheli".
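
For reference, the fixed mappings accumulated over the three iterations above, written out as data (just a bookkeeping convenience; the pairs themselves come from the slides):

```python
# Fixed (English, Chinese) word mappings after each iteration of the example.
fixed_after_iteration = {
    1: [("I", "wo")],
    2: [("I", "wo"), ("since", "yilai")],
    3: [("I", "wo"), ("since", "yilai"), ("1947", "1947"), ("here", "zheli")],
}
```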

Slide 19: Outline
Motivation; The alignment algorithm (algorithm at a glance, the framework, heuristics); Walking through an example; Evaluation; Conclusion

Slide 20: Evaluation
Training: the LDC Xinhua newswire Chinese-English parallel corpus, filtered by roughly 50%, with 60K+ sentence pairs used; the parser generated 53,130 parsed sentence pairs.
Evaluation: 500 sentence pairs provided by Microsoft Research Asia, word-aligned by hand.
F-score: with A the set of word pairs produced by the automatic alignment and G the set of word pairs aligned in the gold file, precision = |A ∩ G| / |A|, recall = |A ∩ G| / |G|, and F = 2 * precision * recall / (precision + recall).
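
A small sketch of the F-score computation as defined on this slide, with A the automatically produced word pairs and G the gold word pairs (the toy pairs in the usage line are illustrative):

```python
def f_score(auto_pairs, gold_pairs):
    """F-score over word-pair sets: A = automatic alignment, G = gold alignment."""
    a, g = set(auto_pairs), set(gold_pairs)
    if not a or not g:
        return 0.0
    precision = len(a & g) / len(a)
    recall = len(a & g) / len(g)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f_score([("I", "wo"), ("here", "zheli")],
              [("I", "wo"), ("here", "zheli"), ("since", "yilai")]))   # 0.8
```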

Slide 21: Results (1)
Results for IBM Models 1 through 4 (GIZA), bootstrapped from Model 1 up to Model 4. There are signs of overfitting, suspected to be caused by the difference between the genres of the training and test data.

F-score by iteration:

Itn#   IBM 1    IBM 2    IBM 3    IBM 4
 1     0.0000   0.5128   0.5082   0.5130
 2     0.2464   0.5288   0.5077   0.5245
 3     0.4607   0.5274   0.5106   0.5240
 4     0.4935   0.5275   0.5130   0.5247
 5     0.5039   0.5245   0.5138   0.5236
 6     0.5073   0.5215   0.5149   0.5220
 7     0.5092   0.5191   0.5142   0.5218
 8     0.5099   0.5160   0.5138   0.5212
 9     0.5111   0.5138            0.5195
10     0.5121   0.5127   0.5132   0.5195

Slide 22: Results (2)
Results for our algorithm with heuristic h1 (entropy) and heuristic h2 (inside-outside probability). The table shows results after one iteration; M1 = IBM Model 1. The overfitting problem is mainly caused by violation of the partition assumption in fine-grained dependency structures.

F-score by number of Model 1 iterations:

M1 Itn#   Model h1   Model h2
  1       0.5549     0.5151
  2       0.5590     0.5497
  3       0.5632     0.5515
  4       0.5615     0.5521
  5       0.5615     0.5540
  6       0.5603     0.5543
  7       0.5612     0.5539
  8       0.5604     0.5540
  9       0.5611     0.5542
 10       0.5622     0.5535

Slide 23: Outline
Motivation; Algorithm at a glance; The framework; Heuristics; Walking through an example; Evaluation; Conclusion

Slide 24: Conclusion
A model based on partitioning sentences according to their dependency structure, without the unrealistic isomorphism assumption.
It outperforms the unstructured IBM models on a large data set.
It is "orthogonal" to the IBM models: it uses syntactic structure but no linear ordering information.

Slide 25: Thank You!

