Presentation is loading. Please wait.

Presentation is loading. Please wait.

English-Hindi Neural machine translation and parallel corpus generation EKANSH GUPTA ROHIT GUPTA.

Similar presentations


Presentation on theme: "English-Hindi Neural machine translation and parallel corpus generation EKANSH GUPTA ROHIT GUPTA."— Presentation transcript:

1 English-Hindi Neural machine translation and parallel corpus generation EKANSH GUPTA ROHIT GUPTA

2 Advantages of Neural Machine Translation Models Require only a fraction of the memory needed by traditional statistical machine translation (SMT) models Deep Neural Nets out-perform previous state of the art methods assuming availability of large parallel corpora Can be combined with word-alignment approach to address the rare-word problem

3 Performance of NMT Models Source: T Luong et al, ACL 2015

4 Motivation Advantages of Neural Machine Translation Very large parallel English-Hindi corpora are unavailable However comparable corpora available

5 Encoding Scheme Using one-hot encoding for top-N words chosen from large monolingual corpora for each language Out Of Vocabulary (OOV) words represented by unk Monolingual corpora used: http://corpora.heliohost.org/

6 Encoding Scheme: Example (top 10 words) the 1000000000…... to 0100000000…... and 0010000000…... a 0001000000…... of 0000100000…... in 0000010000…... for 0000001000…... that 0000000100…... is 0000000010…... on 0000000001…... के 1000000000…... में 0100000000…... की 0010000000…... को 0001000000…... से 0000100000…... है 0000010000…... ने 0000001000…... का 0000000100…... और 0000000010…... कि 0000000001…...

7 Recurrent Neural Networks

8 Summary of Methodology Training weak translator using limited parallel corpus Weak translator and aligning heuristic (ex: Hunalign) used to create additional parallel corpus Neural translator re-trained on generated bigger parallel corpus

9 Pipeline for creating parallel corpora Source: K Wołk et al, 2014

10 Aligning sentences Original Hindi Translated English Original English Comparison Heuristics Basic Translation Engine Comparison Adapted From: K Wołk et al, 2014

11 Questions ?


Download ppt "English-Hindi Neural machine translation and parallel corpus generation EKANSH GUPTA ROHIT GUPTA."

Similar presentations


Ads by Google