Using Percolated Dependencies in PBSMT
Ankit K. Srivastava and Andy Way
Dublin City University
CLUKI XII: April 24, 2009
About
Using Percolated Dependencies in Phrase Based Statistical Machine Translation
Outline: Parsing, PBSMT, System, Numbers, Analysis, Endnote
Syntactic Parsing and Head Percolation
Parsing I: Constituency Structure
Vinken will join the board as a nonexecutive director Nov 29
(ROOT
  (S (NP (NNP Vinken))
     (VP (MD will)
         (VP (VB join)
             (NP (DT the) (NN board))
             (PP (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
             (NP (NNP Nov) (CD 29))))))
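For reference, a bracketed parse like the one above can be read into a simple nested-list structure. The sketch below is not part of the talk; the function name and the list encoding are illustrative choices, reused in the head-percolation sketch later on.

```python
# A small sketch (not from the talk) of reading a Penn-style bracketed parse into
# nested lists, e.g. "(NP (DT the) (NN board))" -> ['NP', ['DT', 'the'], ['NN', 'board']].

def parse_brackets(text):
    tokens = text.replace('(', ' ( ').replace(')', ' ) ').split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == '('
        pos += 1
        node = [tokens[pos]]               # constituent label, e.g. NP
        pos += 1
        while tokens[pos] != ')':
            if tokens[pos] == '(':
                node.append(read())        # nested constituent
            else:
                node.append(tokens[pos])   # terminal word
                pos += 1
        pos += 1                           # consume ')'
        return node

    return read()

print(parse_brackets('(NP (DT the) (NN board))'))
# ['NP', ['DT', 'the'], ['NN', 'board']]
```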
Parsing II: Dependency Structure
Vinken will join the board as a nonexecutive director Nov 29

HEAD     | DEPENDENT
join     | Vinken
join     | will
board    | the
join     | board
join     | as
director | a
director | nonexecutive
as       | director
29       | Nov
join     | 29
Parsing III: Head Percolation
- It is straightforward to convert a constituency tree into an unlabeled dependency tree (Gaifman, 1965)
- Use head percolation tables to identify the head child in a constituency representation (Magerman, 1995)
- The dependency tree is obtained by recursively applying head-child and non-head-child heuristics (Xia & Palmer, 2001)
Example: (NP (DT the) (NN board))
- Table entry: NP  right  NN/NNP/CD/JJ
- Head-marked tree: (NP-board (DT the) (NN board))
- Result: "the" is dependent on "board"
(A minimal sketch of this conversion follows.)
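The conversion described on this slide can be illustrated with a few lines of Python. This is not the authors' implementation: the nested-list tree encoding, the helper names, and every head-table entry except the NP rule quoted above are assumptions made for illustration.

```python
# A minimal sketch of dependency extraction via head percolation.
# Trees are nested lists such as ['NP', ['DT', 'the'], ['NN', 'board']].

HEAD_TABLE = {
    'NP': ('right', ['NN', 'NNP', 'CD', 'JJ']),   # from the slide: NP  right  NN/NNP/CD/JJ
    'VP': ('left',  ['VB', 'MD', 'VP']),          # illustrative entry, not from the slide
    'S':  ('left',  ['VP']),                      # illustrative entry, not from the slide
}

def is_leaf(node):
    # A preterminal like ['NN', 'board'] has a bare string as its only child.
    return len(node) == 2 and isinstance(node[1], str)

def head_word(node):
    """Return the lexical head of a subtree by percolating heads upward."""
    if is_leaf(node):
        return node[1]
    label, children = node[0], node[1:]
    direction, priorities = HEAD_TABLE.get(label, ('left', []))
    ordered = children if direction == 'left' else list(reversed(children))
    for cat in priorities:                 # prefer the categories listed in the table
        for child in ordered:
            if child[0] == cat:
                return head_word(child)
    return head_word(ordered[0])           # fall back to the first child in scan order

def dependencies(node, deps=None):
    """Collect (head, dependent) pairs: every non-head child depends on the head child."""
    if deps is None:
        deps = []
    if is_leaf(node):
        return deps
    h = head_word(node)
    for child in node[1:]:
        hw = head_word(child)
        if hw != h:                        # head child identified by its head word (toy-level check)
            deps.append((h, hw))
        dependencies(child, deps)
    return deps

print(dependencies(['NP', ['DT', 'the'], ['NN', 'board']]))   # [('board', 'the')]
```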
Parsing IV: Three Parses
- Constituency (phrase-structure) parses: CON (requires a CON parser)
- Dependency (head-dependent) parses: DEP (requires a DEP parser)
- Percolated (head-dependent) parses: PERC (requires a CON parser + heuristics)
Phrase-Based Statistical Machine Translation
PBSMT I: Framework
- argmax_e p(e|f) = argmax_e p(f|e) p(e)
- Decoder, Translation Model, Language Model
- PBSMT framework in Moses (Koehn et al., 2007)
- Phrase Table in the Translation Model := align words + extract phrases + score phrases
- Different methods to extract phrases
- Moses phrase extraction as the baseline system…
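As a toy illustration of the noisy-channel decomposition above, the sketch below scores a couple of invented candidate translations with hypothetical translation-model and language-model probabilities. Real Moses decoding searches over phrase segmentations rather than a fixed candidate list; everything here is an assumption for illustration only.

```python
import math

# Toy translation-model and language-model scores (invented numbers, not real data).
def tm_logprob(f, e):
    """log p(f|e): how well candidate e explains the source f (toy lookup)."""
    table = {('la maison', 'the house'): math.log(0.7),
             ('la maison', 'house the'): math.log(0.7)}
    return table.get((f, e), float('-inf'))

def lm_logprob(e):
    """log p(e): fluency of the candidate (toy lookup)."""
    table = {'the house': math.log(0.6), 'house the': math.log(0.01)}
    return table.get(e, float('-inf'))

def decode(f, candidates):
    # argmax_e p(e|f) = argmax_e p(f|e) * p(e), computed in log space.
    return max(candidates, key=lambda e: tm_logprob(f, e) + lm_logprob(e))

print(decode('la maison', ['the house', 'house the']))   # -> 'the house'
```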
PBSMT II: Non-syntactic Phrase Extraction (… baseline Moses)
- Get word alignments (src2tgt, tgt2src)
- Perform grow-diag-final heuristics (Koehn et al., 2003)
- Extract phrase pairs consistent with the word alignments (see the sketch below)
- String-based (non-syntactic) phrases: STR
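The "consistent with the word alignments" criterion can be sketched as follows, in the spirit of Koehn et al. (2003). The sentence pair, alignment points, and max_len value are invented, and the refinement that extends phrases over unaligned boundary words is omitted; grow-diag-final is assumed to have already produced the symmetrized alignment.

```python
# A minimal sketch of phrase-pair extraction consistent with a word alignment.

def extract_phrases(src, tgt, align, max_len=7):
    """Return (src_phrase, tgt_phrase) pairs whose alignment points stay inside the box."""
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to the chosen source span.
            tgt_pos = [j for (i, j) in align if i1 <= i <= i2]
            if not tgt_pos:
                continue
            j1, j2 = min(tgt_pos), max(tgt_pos)
            # Consistency: no alignment point may link the target span outside the source span.
            if any(j1 <= j <= j2 and not (i1 <= i <= i2) for (i, j) in align):
                continue
            if j2 - j1 < max_len:
                pairs.add((' '.join(src[i1:i2 + 1]), ' '.join(tgt[j1:j2 + 1])))
    return pairs

src = 'the blue house'.split()
tgt = 'la maison bleue'.split()
align = {(0, 0), (1, 2), (2, 1)}          # the-la, blue-bleue, house-maison
for pair in sorted(extract_phrases(src, tgt, align)):
    print(pair)
```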
PBSMT III: Syntactic Phrase Extraction
- Get word alignments (src2tgt, tgt2src)
- Parse src sentences; parse tgt sentences
- Use the Tree Aligner to align subtree nodes (Zhechev, 2009)
- Extract surface-level chunks from the parallel treebanks
- Previously: Tinsley et al. (2007) and Hearne et al. (2008)
- Syntactic phrases: CON, DEP, PERC
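A minimal sketch of the chunk-extraction step, under the assumption that the Tree Aligner has already produced links between source and target subtree nodes; the nested-list tree encoding and the example link are illustrative, not taken from the actual system.

```python
# Turning aligned subtree nodes into surface-level phrase pairs.

def yield_of(node):
    """Surface string (the terminal yield) of a subtree given as nested lists."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:                 # node[0] is the category label
        words.extend(yield_of(child))
    return words

def chunks_from_links(links):
    """Each aligned (src_node, tgt_node) pair contributes one surface-level chunk pair."""
    return [(' '.join(yield_of(s)), ' '.join(yield_of(t))) for s, t in links]

src_np = ['NP', ['DT', 'the'], ['NN', 'board']]
tgt_np = ['NP', ['DT', 'le'], ['NN', 'conseil']]
print(chunks_from_links([(src_np, tgt_np)]))   # [('the board', 'le conseil')]
```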
System Design
System I: Tools and Resources
- English-French parallel corpora
- Phrase-structure parsers (En, Fr)
- Dependency-structure parsers (En, Fr)
- Head percolation tables (En, Fr)
- Statistical Tree Aligner
- Giza++ word aligner
- SRILM (language modeling) toolkit
- Moses decoder

Corpus   | Train   | Dev   | Test
JOC      | 7,723   | 400   | 599
Europarl | 100,000 | 1,889 | 2,000
System II: Number of Entries in the Phrase Tables (Europarl)

Phrase types | Common to both | Unique in 1st type | Unique in 2nd type
DEP & PERC   | 369K           | 213K               | 195K
CON & PERC   | 492K           | 171K               | 72K
STR & PERC   | 127K           | 2,018K             | 437K
CON & DEP    | 391K           | 271K               | 191K
STR & DEP    | 128K           | 2,016K             | 454K
STR & CON    | 144K           | 2,000K             | 518K

Total entries: STR 2,145K | CON 663K | DEP 585K | PERC 565K

PERC is a unique knowledge source… but is it useful?
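The overlap counts in the table above can in principle be computed by treating each phrase table as a set of (source, target) phrase pairs and intersecting. The sketch below assumes Moses-style "src ||| tgt ||| …" files and hypothetical file names; it is not the authors' tooling.

```python
# Counting common and unique entries between two phrase tables.

def load_pairs(path):
    pairs = set()
    with open(path, encoding='utf-8') as fh:
        for line in fh:
            fields = line.split(' ||| ')
            if len(fields) >= 2:
                pairs.add((fields[0].strip(), fields[1].strip()))
    return pairs

def overlap(table_a, table_b):
    a, b = load_pairs(table_a), load_pairs(table_b)
    return len(a & b), len(a - b), len(b - a)   # common, unique in 1st, unique in 2nd

# Hypothetical file names, e.g. for the DEP & PERC row of the table above:
# common, uniq_dep, uniq_perc = overlap('phrase-table.dep', 'phrase-table.perc')
```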
System III: Combinations
- Concatenate phrase tables and re-estimate probabilities
- 15 different systems: ∑_{r=1}^{4} C(4, r) = 4 + 6 + 4 + 1 = 15

         | UNI | BI         | TRI           | QUAD
STR (S)  | S   | SC, SD, SP | SCD, SCP, SDP | SCDP
CON (C)  | C   | CD, CP     | CDP           | -
DEP (D)  | D   | DP         | -             | -
PERC (P) | P   | -          | -             | -
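A quick sanity check of the count: the 15 systems are exactly the non-empty subsets of the four phrase-table types, which the following sketch enumerates. The table names come from the slides; the code itself is illustrative.

```python
from itertools import combinations

tables = ['STR', 'CON', 'DEP', 'PERC']
systems = [combo for r in range(1, 5) for combo in combinations(tables, r)]
print(len(systems))                                   # 15
print([''.join(t[0] for t in combo) for combo in systems])
# ['S', 'C', 'D', 'P', 'SC', 'SD', 'SP', 'CD', 'CP', 'DP', 'SCD', 'SCP', 'SDP', 'CDP', 'SCDP']
```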
MT Systems and Evaluation
Numbers I: Evaluation - JOC
Numbers II: Evaluation - Europarl
Numbers III: Uniquely Best
Evaluate the MT systems STR, CON, DEP, and PERC at the per-sentence level using Translation Error Rate (TER).

JOC (440 sentences):
STR | CON | DEP | PERC
183 | 73  | 83  | 101

Europarl (2,000 sentences):
STR | CON   | DEP | PERC
248 | 1,120 | 301 | 331
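A sketch of how a per-sentence "uniquely best" count can be computed from TER scores. The slide does not say how ties were handled, so this version simply skips sentences with tied best scores, and the toy scores are invented.

```python
# Counting, for each system, the sentences on which it has the strictly lowest TER.

def uniquely_best_counts(per_sentence_ter):
    """per_sentence_ter: list of dicts mapping system name -> TER for one sentence."""
    counts = {name: 0 for name in per_sentence_ter[0]}
    for scores in per_sentence_ter:
        best = min(scores.values())
        winners = [name for name, ter in scores.items() if ter == best]
        if len(winners) == 1:              # only count a system when it wins outright
            counts[winners[0]] += 1
    return counts

toy = [{'STR': 0.41, 'CON': 0.38, 'DEP': 0.45, 'PERC': 0.38},   # tie: nobody counted
       {'STR': 0.30, 'CON': 0.33, 'DEP': 0.35, 'PERC': 0.34}]   # STR counted
print(uniquely_best_counts(toy))   # {'STR': 1, 'CON': 0, 'DEP': 0, 'PERC': 0}
```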
Numbers IV: Adding +PERC: Europarl
Analysis of Results
Analysis I: STR
- Using the Moses baseline phrases (STR) is essential for coverage: size matters!
- However, adding any other phrase type to STR increases the baseline score: symbiotic!
- Hence, do not replace STR; augment it.
Analysis II: CON
- Seems to be the best combination with STR (S+C appears to be the best-performing system)
- Has the most chunks in common with PERC
- Whether PERC harms a CON system needs more analysis
Analysis III: DEP
- PERC chunks differ from DEP chunks, despite the two representations being formally equivalent
- PERC can substitute for DEP
Analysis IV: PERC
- Is a unique knowledge source
- Sometimes it helps
- Needs more work on finding its connection with CON / DEP
Conclusion & Future Work
Conclusion & Future Work
- Extended Hearne et al. (2008) by:
  - scaling up the data size from 7.7K to 100K sentence pairs
  - introducing percolated dependencies into PBSMT
- Manual evaluation
- More analysis of the results
- More combining strategies
- Seek to determine whether each chunk type "owns" particular sentence types
Thanks