
1 Using Percolated Dependencies in PBSMT
Ankit K. Srivastava and Andy Way, Dublin City University
CLUKI XII: April 24, 2009

2 About
Using Percolated Dependencies in Phrase-Based Statistical Machine Translation
Outline: Parsing | PBSMT | System | Numbers | Analysis | End Note

3 Syntactic Parsing and Head Percolation

4 Parsing I: Constituency Structure
Vinken will join the board as a nonexecutive director Nov 29

(ROOT (S (NP (NNP Vinken)) (VP (MD will) (VP (VB join) (NP (DT the) (NN board)) (PP (IN as) (NP (DT a) (JJ nonexecutive) (NN director))) (NP (NNP Nov) (CD 29))))))

5 Parsing II: Dependency Structure
Vinken will join the board as a nonexecutive director Nov 29

Head | Dependent
join | Vinken
join | will
board | the
join | board
join | as
director | a
director | nonexecutive
as | director
29 | Nov
join | 29

6 Parsing III: Head Percolation
- It is straightforward to convert a constituency tree to an unlabeled dependency tree (Gaifman 1965)
- Use head percolation tables to identify the head child in a constituency representation (Magerman 1995)
- The dependency tree is obtained by recursively applying head-child and non-head-child heuristics (Xia & Palmer 2001), as sketched below
Example: for (NP (DT the) (NN board)), the table entry "NP right NN/NNP/CD/JJ" selects the NN as head child, giving (NP-board (DT the) (NN board)); "the" is dependent on "board".
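A minimal sketch of the recursive head-finding step described above. The percolation table here is a toy fragment invented to cover just this running example (the published tables, e.g. Magerman 1995, are larger and differ in detail), and NLTK is assumed only for reading the bracketed parse.

```python
# Toy head percolation (in the spirit of Magerman 1995; Xia & Palmer 2001).
# Assumes NLTK is installed; the table below is a made-up fragment for this example.
from nltk import Tree

# category -> (search direction, preferred head-child labels)
PERCOLATION_TABLE = {
    "S":  ("right", ["VP", "S"]),
    "VP": ("left",  ["VB", "VP", "MD"]),
    "NP": ("right", ["NN", "NNP", "CD", "JJ"]),
    "PP": ("left",  ["IN"]),
}

def head_word(tree):
    """Return the lexical head of a constituent by recursive percolation."""
    if isinstance(tree, str):                       # bare token
        return tree
    if len(tree) == 1 and isinstance(tree[0], str): # preterminal, e.g. (NN board)
        return tree[0]
    direction, preferred = PERCOLATION_TABLE.get(tree.label(), ("left", []))
    children = list(tree) if direction == "left" else list(reversed(tree))
    for label in preferred:                         # preferred labels, in order
        for child in children:
            if child.label() == label:
                return head_word(child)
    return head_word(children[0])                   # fallback: first child in search order

def dependencies(tree, deps=None):
    """Collect (head, dependent) pairs: every non-head child depends on the head."""
    if deps is None:
        deps = []
    if isinstance(tree, str) or (len(tree) == 1 and isinstance(tree[0], str)):
        return deps
    h = head_word(tree)
    for child in tree:
        ch = head_word(child)
        if ch != h:
            deps.append((h, ch))
        dependencies(child, deps)
    return deps

parse = Tree.fromstring("(S (NP (NNP Vinken)) (VP (MD will) (VP (VB join) "
                        "(NP (DT the) (NN board)))))")
print(dependencies(parse))
# [('join', 'Vinken'), ('join', 'will'), ('join', 'board'), ('board', 'the')]
```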

7 Parsing IV: Three Parses
- Constituency (phrase-structure) parses (CON): requires a CON parser
- Dependency (head-dependent) parses (DEP): requires a DEP parser
- Percolated (head-dependent) parses (PERC): requires a CON parser + heuristics

8 Phrase-Based Statistical Machine Translation

9 PBSMT I: Framework
- argmax_e p(e|f) = argmax_e p(f|e) p(e)  (illustrated below)
- Components: decoder, translation model, language model
- PBSMT framework in Moses (Koehn et al., 2007)
- Phrase table in the translation model := align words + extract phrases + score phrases
- Different methods to extract phrases
- Moses phrase extraction as the baseline system...
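A toy illustration of the noisy-channel argmax above (not of the Moses decoder itself); the probability tables are invented purely for illustration.

```python
# Score each candidate e by p(f|e) * p(e) and keep the argmax.
# All probabilities below are hypothetical toy values.
import math

translation_model = {   # p(f | e)
    ("la maison", "the house"): 0.7,
    ("la maison", "the home"):  0.3,
}
language_model = {      # p(e)
    "the house": 0.02,
    "the home":  0.005,
}

def decode(f, candidates):
    """Return argmax_e p(f|e) * p(e), computed in log space for stability."""
    def score(e):
        return math.log(translation_model[(f, e)]) + math.log(language_model[e])
    return max(candidates, key=score)

print(decode("la maison", ["the house", "the home"]))   # -> "the house"
```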

10 PBSMT II: Non-Syntactic Phrase Extraction
... the baseline Moses pipeline:
- Get word alignments (src2tgt, tgt2src)
- Apply the grow-diag-final heuristic (Koehn et al., 2003)
- Extract phrase pairs consistent with the word alignments (sketched below)
- String-based (non-syntactic) phrases: STR
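A simplified sketch of the consistency check behind phrase-pair extraction (Koehn et al., 2003). The grow-diag-final symmetrization itself is not reproduced; the word alignment below is supplied by hand, and boundary handling of unaligned words is omitted.

```python
# Extract phrase pairs consistent with a word alignment (simplified).
# alignment: set of (src_index, tgt_index) pairs produced upstream (e.g. by GIZA++
# plus grow-diag-final), here written by hand for the toy example.

def extract_phrases(src, tgt, alignment, max_len=4):
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # target positions aligned to the source span [i1, i2]
            tgt_points = [j for (i, j) in alignment if i1 <= i <= i2]
            if not tgt_points:
                continue
            j1, j2 = min(tgt_points), max(tgt_points)
            # consistency: every alignment point inside the target span [j1, j2]
            # must have its source side inside [i1, i2]
            consistent = all(i1 <= i <= i2
                             for (i, j) in alignment if j1 <= j <= j2)
            if consistent and (j2 - j1) < max_len:
                pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs

src = "the board".split()
tgt = "le conseil".split()
print(extract_phrases(src, tgt, {(0, 0), (1, 1)}))
# {('the', 'le'), ('board', 'conseil'), ('the board', 'le conseil')}
```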

11 PBSMT III: Syntactic Phrase Extraction
- Get word alignments (src2tgt, tgt2src)
- Parse the source sentences
- Parse the target sentences
- Use a tree aligner to align subtree nodes (Zhechev 2009)
- Extract surface-level chunks from the parallel treebanks (sketched below)
- Previous work: Tinsley et al., 2007 & Hearne et al., 2008
- Syntactic phrases: CON, DEP, PERC
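A hedged sketch of the final step only: reading surface phrase pairs off linked subtree nodes. The node links below are hand-written stand-ins for the statistical tree aligner's output, and NLTK is assumed for the tree representation.

```python
# Read surface (chunk) phrase pairs off aligned subtree nodes.
# The links are hypothetical, imitating what a tree aligner would produce.
from nltk import Tree

src_tree = Tree.fromstring("(NP (DT the) (NN board))")
tgt_tree = Tree.fromstring("(NP (DT le) (NN conseil))")

# hypothetical subtree links: (source node position, target node position)
links = [((), ()), ((1,), (1,))]          # NP <-> NP and NN <-> NN

def yield_of(tree, pos):
    """Surface string (the yield) of the subtree at the given node position."""
    return " ".join(tree[pos].leaves())

phrase_pairs = [(yield_of(src_tree, s), yield_of(tgt_tree, t)) for s, t in links]
print(phrase_pairs)   # [('the board', 'le conseil'), ('board', 'conseil')]
```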

12 System Design

13 System I: Tools and Resources
- English-French parallel corpora
- Phrase-structure parsers (En, Fr)
- Dependency-structure parsers (En, Fr)
- Head percolation tables (En, Fr)
- Statistical tree aligner
- Giza++ word aligner
- SRILM (language modeling) toolkit
- Moses decoder

Corpora | Train | Dev | Test
JOC | 7,723 | 400 | 599
Europarl | 100,000 | 1,889 | 2,000

14 System II: Number of Entries in Phrase Tables (Europarl)

Phrase types | Common to both | Unique in 1st type | Unique in 2nd type
DEP & PERC | 369K | 213K | 195K
CON & PERC | 492K | 171K | 72K
STR & PERC | 127K | 2,018K | 437K
CON & DEP | 391K | 271K | 191K
STR & DEP | 128K | 2,016K | 454K
STR & CON | 144K | 2,000K | 518K

Total entries: STR 2,145K | CON 663K | DEP 585K | PERC 565K

PERC is a unique knowledge source... but is it useful?

15 System III: Combinations
- Concatenate phrase tables and re-estimate probabilities
- 15 different systems: sum of C(4, r) for 1 ≤ r ≤ 4 (enumerated below)

Table | Uni | Bi | Tri | Quad
STR | S | SC, SD, SP | SCD, SCP, SDP | SCDP
CON | C | CD, CP | CDP | -
DEP | D | DP | - | -
PERC | P | - | - | -
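The count of 15 is just the number of non-empty subsets of the four phrase tables, which a few lines of Python can enumerate:

```python
# Enumerate all non-empty combinations of the four phrase tables:
# C(4,1) + C(4,2) + C(4,3) + C(4,4) = 4 + 6 + 4 + 1 = 15 systems.
from itertools import combinations

tables = ["S", "C", "D", "P"]          # STR, CON, DEP, PERC
systems = ["".join(c) for r in range(1, 5) for c in combinations(tables, r)]
print(len(systems), systems)
# 15 ['S', 'C', 'D', 'P', 'SC', 'SD', 'SP', 'CD', 'CP', 'DP',
#     'SCD', 'SCP', 'SDP', 'CDP', 'SCDP']
```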

16 MT Systems and Evaluation

17 Numbers I: Evaluation - JOC

18 Numbers II: Evaluation - Europarl

19 Numbers III: Uniquely Best
- Evaluate the MT systems STR, CON, DEP and PERC at the per-sentence level (Translation Error Rate) and count how often each system is uniquely best (sketched below)

JOC (440 sentences):
STR | CON | DEP | PERC
183 | 73 | 83 | 101

Europarl (2000 sentences):
STR | CON | DEP | PERC
248 | 1,120 | 301 | 331
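A sketch of how such a per-sentence count could be computed; the TER values below are invented, and crediting nobody on ties is an assumption, not something stated on the slide.

```python
# Count, per system, the sentences on which it has the strictly lowest TER.

def uniquely_best(per_sentence_ter):
    """per_sentence_ter: list of dicts {system_name: TER}, one dict per sentence."""
    counts = {}
    for scores in per_sentence_ter:
        best = min(scores.values())
        winners = [name for name, ter in scores.items() if ter == best]
        if len(winners) == 1:                     # credit only unique winners (assumption)
            counts[winners[0]] = counts.get(winners[0], 0) + 1
    return counts

example = [{"STR": 0.30, "CON": 0.25, "DEP": 0.28, "PERC": 0.25},   # tie -> no credit
           {"STR": 0.40, "CON": 0.35, "DEP": 0.30, "PERC": 0.45}]
print(uniquely_best(example))   # {'DEP': 1}
```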

20 Numbers IV: Adding +PERC: Europarl

21 Analysis of Results

22 Analysis I: STR
- Using the Moses baseline phrases (STR) is essential for coverage. Size matters!
- However, adding any system to STR increases the baseline score. Symbiotic!
- Hence, do not replace STR, but augment it.

23 Analysis II: CON
- Seems to be the best combination with STR (S+C appears to be the best-performing system)
- Has the most chunks in common with PERC
- Does PERC harm a CON system? Needs more analysis

24 Analysis III: DEP
- PERC chunks differ from DEP chunks, despite the two representations being formally equivalent
- PERC can substitute for DEP

25 Analysis IV: PERC
- Is a unique knowledge source
- Sometimes it helps
- Needs more work on finding its connection with CON / DEP

26 Conclusion & Future Work

27 Conclusion & Future Work
- Extended Hearne et al., 2008 by:
  - scaling up the data size from 7.7K to 100K sentences
  - introducing percolated dependencies into PBSMT
- Future work:
  - manual evaluation
  - more analysis of results
  - more combination strategies
  - determine whether each chunk type "owns" particular sentence types

28 Thanks

