Download presentation
Presentation is loading. Please wait.
Published byCorey Allen Modified over 8 years ago
1
English-Hindi Neural machine translation and parallel corpus generation EKANSH GUPTA ROHIT GUPTA
2
Advantages of Neural Machine Translation Models Require only a fraction of the memory needed by traditional statistical machine translation (SMT) models Deep Neural Nets out-perform previous state of the art methods assuming availability of large parallel corpora Can be combined with word-alignment approach to address the rare-word problem
3
Performance of NMT Models Source: T Luong et al, ACL 2015
4
Motivation Advantages of Neural Machine Translation Very large parallel English-Hindi corpora are unavailable However comparable corpora available
5
Encoding Scheme Using one-hot encoding for top-N words chosen from large monolingual corpora for each language Out Of Vocabulary (OOV) words represented by unk Monolingual corpora used: http://corpora.heliohost.org/
6
Encoding Scheme: Example (top 10 words) the 1000000000…... to 0100000000…... and 0010000000…... a 0001000000…... of 0000100000…... in 0000010000…... for 0000001000…... that 0000000100…... is 0000000010…... on 0000000001…... के 1000000000…... में 0100000000…... की 0010000000…... को 0001000000…... से 0000100000…... है 0000010000…... ने 0000001000…... का 0000000100…... और 0000000010…... कि 0000000001…...
7
Recurrent Neural Networks
8
Summary of Methodology Training weak translator using limited parallel corpus Weak translator and aligning heuristic (ex: Hunalign) used to create additional parallel corpus Neural translator re-trained on generated bigger parallel corpus
9
Pipeline for creating parallel corpora Source: K Wołk et al, 2014
10
Aligning sentences Original Hindi Translated English Original English Comparison Heuristics Basic Translation Engine Comparison Adapted From: K Wołk et al, 2014
11
Questions ?
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.