Joint Parsing and Alignment with Weakly Synchronized Grammars
David Burkett, John Blitzer, & Dan Klein

Presentation transcript:

1 Joint Parsing and Alignment with Weakly Synchronized Grammars David Burkett, John Blitzer, & Dan Klein

2 Statistical MT Training Pipeline
   1) Align sentence pairs (GIZA++)
   2) Parse English sentences (Berkeley parser); parse foreign sentences
   3) Extract rules (Galley et al. 2006)
   4) Tune discriminative parameters
   This work: a joint model for steps (1) and (2)
   [Figure: example sentence pair, foreign gloss "at office in read book" / English "read the book in the office"]

3 Data Setting for Joint Models [Figure: data sources shown are English WSJ; Chinese CTB; parallel, aligned CTB; unlabeled parallel text]

4 Word alignment grids [Figure: alignment grid for foreign gloss "at office in read book" / English "read the book in the office"]
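
A word alignment grid of the kind named on this slide can be sketched as a boolean matrix with one row per foreign word and one column per English word. The snippet below is only an illustration: the specific links are hypothetical guesses for the example sentence pair and are not taken from the slide's figure, and `alignment_grid` is a made-up helper name.

```python
# Foreign sentence given as an English gloss, as on the slide.
foreign = ["at", "office", "in", "read", "book"]
english = ["read", "the", "book", "in", "the", "office"]

# Hypothetical (foreign_index, english_index) links, for illustration only.
links = {(0, 3), (1, 5), (2, 3), (3, 0), (4, 2)}

def alignment_grid(n_foreign, n_english, links):
    """Return an n_foreign x n_english boolean grid; True marks an aligned pair."""
    return [[(i, j) in links for j in range(n_english)] for i in range(n_foreign)]

grid = alignment_grid(len(foreign), len(english), links)

# Print one row per foreign word: 'x' for a link, '.' for no link.
for word, row in zip(foreign, grid):
    print(f"{word:>6} " + " ".join("x" if cell else "." for cell in row))
```

English words with no link in their column (here both occurrences of "the") are simply left unaligned.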

5 Syntactic Correspondences: Build a model

6 Correspondence via Synchronous Grammars

7 Synchronous Derivation

8 Synchronous Derivation

9 Weakly Synchronized Example

10 Separate PCFGs

11 Weakly Synchronized Example ITG alignment

12 Weakly Synchronized Example Points for synchronization, but not required

13 Correspondence Model & Feature Types [HBDK09]
   Feature type 1: Word Alignment
   Feature type 2: Monolingual Parser
   Feature type 3: Correspondence
   [Figure labels: "office"; PP "in the office"]

14 Estimating the Parameters: Set the parameters to maximize the log-likelihood of the correct parses & alignments. The normalizer makes the model's distribution sum to 1.

15 Computing: Correspondence features tie the pieces together, so exact computation over the joint model is intractable. Individually, the parsing and alignment models have polynomial-time dynamic programming algorithms.

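16 Approximating the Joint Distribution: Mean Field
   Exploit tractability in the individual models with a factored approximation.
   Algorithm:
   1) Initialize the factors separately
   2) Iterate: set each factor in turn to minimize the KL divergence between the factored approximation and the joint model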
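
The initialize-then-iterate scheme on this slide can be illustrated on a toy problem. The sketch below is not the authors' parsing and alignment model: it assumes two coupled binary variables with unnormalized score exp(a*x + b*y + c*x*y), where the coupling weight `c` plays the role of a correspondence feature, and it applies the standard mean-field coordinate updates to a factored approximation q(x)q(y). All names and parameter values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean_field(a, b, c, iters=50):
    """Mean-field approximation q(x)q(y) to p(x, y) proportional to exp(a*x + b*y + c*x*y)."""
    # 1) Initialize each factor separately, ignoring the coupling term.
    qx, qy = sigmoid(a), sigmoid(b)
    # 2) Iterate: update one factor at a time, holding the other fixed.
    #    Each update is the minimizer of KL(q || p) with respect to that factor.
    for _ in range(iters):
        qx = sigmoid(a + c * qy)  # q(x = 1) given the expected value of y
        qy = sigmoid(b + c * qx)  # q(y = 1) given the expected value of x
    return qx, qy

def exact_marginals(a, b, c):
    """Exact marginals by enumerating all four joint states (only feasible for toy models)."""
    w = {(x, y): math.exp(a * x + b * y + c * x * y) for x in (0, 1) for y in (0, 1)}
    z = sum(w.values())
    return (w[1, 0] + w[1, 1]) / z, (w[0, 1] + w[1, 1]) / z

qx, qy = mean_field(0.5, -0.3, 1.0)
px, py = exact_marginals(0.5, -0.3, 1.0)
print(f"mean field: q(x=1)={qx:.3f}, q(y=1)={qy:.3f}")
print(f"exact:      p(x=1)={px:.3f}, p(y=1)={py:.3f}")
```

With the coupling set to zero the factored approximation is exact. With nonzero coupling it is only an approximation, which is the trade made to keep each update tractable.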

17 Large scale inference: We can approximate in polynomial time, but the sum over possible alignments is an O(n^6) algorithm. But computers are fast, right? Medium-length sentences are 50 words long. Small translation data sets are 250,000 sentences. That comes to ~4 quadrillion operations. (See [BBK10, HBDK09] for speedup details.)
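
The "~4 quadrillion" figure can be checked with a quick calculation. This sketch assumes the alignment sum costs on the order of n^6 operations per sentence pair (the usual bound for ITG alignment), ignores constant factors, and uses the sentence length and corpus size quoted on the slide.

```python
sentence_length = 50        # "medium-length" sentence, in words
num_sentences = 250_000     # "small" translation data set

# Assumed cost model: n^6 operations per sentence pair, constants ignored.
ops_per_sentence = sentence_length ** 6
total_ops = ops_per_sentence * num_sentences

print(f"{ops_per_sentence:,} operations per sentence pair")
print(f"{total_ops:,} operations in total")
```

The total is 3,906,250,000,000,000, or about 3.9 x 10^15, which rounds to the 4 quadrillion quoted above.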

18 Quantitative Results: Parsing

19 85.7% 83.6%

20 Quantitative Results: Parsing 81.2% 84.5%

21 Incorrect English PP Attachment

22 Corrected English PP Attachment

23 Quantitative Results: Translation. BLEU improvement from 29.4 to 30.6. [Chart values: 69.5%, 85.0%, 79.5%]

24 Better Translations with Bilingual Adaptation
   Source (gloss): Currently cause plane crash DE reason yet not clear, local civil aeronautics bureau will toward open investigations
   Reference: At this point the cause of the plane collision is still unclear. The local caa will launch an investigation into this.
   Baseline (GIZA++): The cause of planes is still not clear yet, local civil aviation department will investigate this.
   Bilingual Adaptation Model: The cause of plane collision remained unclear, local civil aviation departments will launch an investigation.

25 Thanks

