Neural Machine Translation


Neural Machine Translation
Omid Kashefi
omid.Kashefi@pitt.edu
Visual Languages Seminar
November 2016

Outline
- Machine Translation
- Deep Learning
- Neural Machine Translation

Machine Translation
- Machine translation: the use of software to translate from one language into another
- The oldest natural language processing problem
  - Late 1940s (Weaver 1949)
  - Cryptanalysis
- Rule-based approaches

Machine Translation
- Statistical Machine Translation
  - Parallel corpus
  - "The mathematics of statistical machine translation" (Brown et al. 1993)
    - Introduced five models
    - Word alignments
- Phrase-based Machine Translation (Koehn et al., 2003)
  - Phrase alignment

Deep Learning
- Good old neural networks
- Deep learning
  - Computation power
  - Data

Deep Learning
- Simplicity
  - Hand-crafted features and feature engineering are replaced by representation learning
- Does it work (remarkably) better?
  - Not necessarily
- When to use it?
  - When you have a lot of data

Neural Machine Translation
- Translation problem
  - Find the target sentence y that maximizes the conditional probability of y given the source sentence x: arg max_y p(y|x), written out below
- Encoder-Decoder (Sutskever et al., 2014)
  - Encode the source sentence x
  - Decode it into the target sentence y
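Written out, the slide's objective is (a restatement in standard notation, with the output factored by the chain rule):

```latex
\hat{y} = \operatorname*{arg\,max}_{y} \; p(y \mid x)
        = \operatorname*{arg\,max}_{y} \; \prod_{t=1}^{n} p(y_t \mid y_1, \dots, y_{t-1}, x)
```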

Neural Machine Translation
- RNN Encoder
  - Read the input sentence x = (x_1, x_2, …, x_n) into a vector c:

    h_t = f(x_t, h_{t−1})
    c = q({h_1, h_2, …, h_n})
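As a concrete illustration, here is a minimal NumPy sketch of such an encoder, assuming f is a plain tanh recurrence and taking q to be simply the last hidden state (the choice used by Sutskever et al., 2014); all parameter names and dimensions are illustrative:

```python
import numpy as np

def rnn_encoder(xs, W, U, b):
    """Encode a source sentence (a sequence of word vectors xs) into a
    fixed-length context vector c, with h_t = tanh(W x_t + U h_{t-1} + b)
    and c = h_n (the final hidden state)."""
    h = np.zeros(U.shape[0])              # h_0 = 0
    for x in xs:                          # read the sentence left to right
        h = np.tanh(W @ x + U @ h + b)    # h_t = f(x_t, h_{t-1})
    return h                              # c = q({h_1, ..., h_n}) = h_n

# Toy example: 4-dimensional word vectors, 8-dimensional hidden state.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
U = rng.normal(size=(8, 8))
b = np.zeros(8)
source = [rng.normal(size=4) for _ in range(5)]   # a 5-word "sentence"
c = rnn_encoder(source, W, U, b)                  # context vector, shape (8,)
```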

Neural Machine Translation
- RNN Decoder
  - Predict the next word y_t, given the context vector c and all previously predicted words (y_1, y_2, …, y_{t−1}):

    p(y|x) ≈ p(y) = ∏_{t=1}^{n} p(y_t | y_1, …, y_{t−1}, c)

  - With an RNN, where s_t is the decoder's hidden state at step t:

    p(y_t | y_1, …, y_{t−1}, c) = g(y_{t−1}, s_t, c)
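A matching sketch of one decoder step, under the assumption that g is an affine map of (y_{t−1}, s_t, c) followed by a softmax over the target vocabulary; the state update and all parameter names are illustrative, not the exact parameterization of any particular paper:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())               # numerically stabilized softmax
    return e / e.sum()

def decoder_step(y_prev, s_prev, c, W_y, W_s, W_c, b_s, V, b_o):
    """One RNN decoder step: update the hidden state s_t from
    (y_{t-1}, s_{t-1}, c), then emit the distribution
    P(y_t | y_1, ..., y_{t-1}, c) = g(y_{t-1}, s_t, c)."""
    s_t = np.tanh(W_y @ y_prev + W_s @ s_prev + W_c @ c + b_s)
    logits = V @ np.concatenate([y_prev, s_t, c]) + b_o
    return softmax(logits), s_t           # distribution over vocab, new state
```

In greedy decoding one would take y_t as the argmax of the returned distribution, look up its embedding, and feed that back as y_prev for the next step, stopping at an end-of-sentence symbol.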

Neural Machine Translation

Neural Machine Translation
- Compared to even the simplest model, IBM Model 1 (Brown et al. 1993)
  - Extensive domain knowledge
  - 20 slides of complex formulas
- Compared to the state of the art (Koehn et al., 2003)
  - Performs comparably well

Neural Machine Translation
- Improvements
  - Jointly train the encoder and decoder (Cho et al., 2014)
  - Variable-length context vector (Bahdanau et al., 2015); see the sketch below
- Hybrid models
  - Phrase-based translation
    - Score phrase pairs with an RNN (Cho et al., 2014)
    - Reorder translation candidates (Sutskever et al., 2014)
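To make the variable-length context idea concrete, here is a hedged sketch of Bahdanau-style additive attention: rather than compressing the source into one fixed c, the decoder recomputes a context c_t at every step as a weighted sum of all encoder hidden states, with weights derived from the previous decoder state. Parameter names are illustrative:

```python
import numpy as np

def attention_context(s_prev, hs, W_a, U_a, v_a):
    """Step-t context c_t = sum_j alpha_tj * h_j, where the weights come
    from additive scores e_tj = v_a . tanh(W_a s_{t-1} + U_a h_j),
    normalized with a softmax (Bahdanau et al., 2015)."""
    scores = np.array([v_a @ np.tanh(W_a @ s_prev + U_a @ h) for h in hs])
    alphas = np.exp(scores - scores.max())
    alphas = alphas / alphas.sum()        # attention weights over source words
    return alphas @ np.stack(hs)          # c_t: changes at every decoder step
```

Because c_t is recomputed per step, the encoder no longer has to squeeze an arbitrarily long sentence into one fixed-size vector, which is what the slide means by a variable-length context vector.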

Thank You