Neural Machine Translation


Neural Machine Translation
Omid Kashefi
omid.Kashefi@pitt.edu
Visual Languages Seminar
November 2016

Outline
- Machine Translation
- Deep Learning
- Neural Machine Translation

Machine Translation
- Machine translation: the use of software to translate from one language into another
- The oldest natural language processing problem
  - Late 1940s (Weaver 1949)
  - Cryptanalysis
- Rule-based approaches

Machine Translation
- Statistical Machine Translation
  - Parallel corpus
  - "The mathematics of statistical machine translation" (Brown et al. 1993)
    - Introduced five models
    - Word alignments
- Phrase-based Machine Translation (Koehn et al., 2003)
  - Phrase alignment

Deep Learning
- Good old neural networks
- Deep learning
  - Computation power
  - Data

Deep Learning
- Simplicity
  - Hand-crafted features and feature engineering are replaced by representation learning
- Does it work (remarkably) better?
  - Not necessarily
- When to use it?
  - When you have a lot of data

Neural Machine Translation
- Translation problem
  - Find the target sentence y that maximizes the conditional probability of y given the source sentence x: arg max_y p(y|x), written out below
- Encoder-Decoder (Sutskever et al., 2014)
  - Encode the source sentence x
  - Decode it into the target sentence y
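Written out, the slide's objective is (a restatement in standard notation, with the output factored by the chain rule):

```latex
\hat{y} = \operatorname*{arg\,max}_{y} \; p(y \mid x)
        = \operatorname*{arg\,max}_{y} \; \prod_{t=1}^{n} p(y_t \mid y_1, \dots, y_{t-1}, x)
```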

Neural Machine Translation
- RNN Encoder
  - Read the input sentence x = (x_1, x_2, …, x_n) into a vector c:

    h_t = f(x_t, h_{t−1})
    c = q({h_1, h_2, …, h_n})
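As a concrete illustration, here is a minimal NumPy sketch of such an encoder, assuming f is a plain tanh recurrence and taking q to be simply the last hidden state (the choice used by Sutskever et al., 2014); all parameter names and dimensions are illustrative:

```python
import numpy as np

def rnn_encoder(xs, W, U, b):
    """Encode a source sentence (a sequence of word vectors xs) into a
    fixed-length context vector c, with h_t = tanh(W x_t + U h_{t-1} + b)
    and c = h_n (the final hidden state)."""
    h = np.zeros(U.shape[0])              # h_0 = 0
    for x in xs:                          # read the sentence left to right
        h = np.tanh(W @ x + U @ h + b)    # h_t = f(x_t, h_{t-1})
    return h                              # c = q({h_1, ..., h_n}) = h_n

# Toy example: 4-dimensional word vectors, 8-dimensional hidden state.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
U = rng.normal(size=(8, 8))
b = np.zeros(8)
source = [rng.normal(size=4) for _ in range(5)]   # a 5-word "sentence"
c = rnn_encoder(source, W, U, b)                  # context vector, shape (8,)
```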

Neural Machine Translation
- RNN Decoder
  - Predict the next word y_t, given the context vector c and all previously predicted words (y_1, y_2, …, y_{t−1}):

    p(y|x) ≈ p(y) = ∏_{t=1}^{n} p(y_t | y_1, …, y_{t−1}, c)

  - With an RNN, where s_t is the decoder's hidden state at step t:

    p(y_t | y_1, …, y_{t−1}, c) = g(y_{t−1}, s_t, c)
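A matching sketch of one decoder step, under the assumption that g is an affine map of (y_{t−1}, s_t, c) followed by a softmax over the target vocabulary; the state update and all parameter names are illustrative, not the exact parameterization of any particular paper:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())               # numerically stabilized softmax
    return e / e.sum()

def decoder_step(y_prev, s_prev, c, W_y, W_s, W_c, b_s, V, b_o):
    """One RNN decoder step: update the hidden state s_t from
    (y_{t-1}, s_{t-1}, c), then emit the distribution
    P(y_t | y_1, ..., y_{t-1}, c) = g(y_{t-1}, s_t, c)."""
    s_t = np.tanh(W_y @ y_prev + W_s @ s_prev + W_c @ c + b_s)
    logits = V @ np.concatenate([y_prev, s_t, c]) + b_o
    return softmax(logits), s_t           # distribution over vocab, new state
```

In greedy decoding one would take y_t as the argmax of the returned distribution, look up its embedding, and feed that back as y_prev for the next step, stopping at an end-of-sentence symbol.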

Neural Machine Translation

Neural Machine Translation
- Compared to even the simplest model, IBM Model 1 (Brown et al. 1993)
  - Extensive domain knowledge
  - 20 slides of complex formulas
- Compared to the state of the art (Koehn et al., 2003)
  - Performs comparably well

Neural Machine Translation
- Improvements
  - Jointly train the encoder and decoder (Cho et al., 2014)
  - Variable-length context vector (Bahdanau et al., 2015); see the sketch below
- Hybrid models
  - Phrase-based translation
    - Score phrase pairs with an RNN (Cho et al., 2014)
    - Reorder translation candidates (Sutskever et al., 2014)
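To make the variable-length context idea concrete, here is a hedged sketch of Bahdanau-style additive attention: rather than compressing the source into one fixed c, the decoder recomputes a context c_t at every step as a weighted sum of all encoder hidden states, with weights derived from the previous decoder state. Parameter names are illustrative:

```python
import numpy as np

def attention_context(s_prev, hs, W_a, U_a, v_a):
    """Step-t context c_t = sum_j alpha_tj * h_j, where the weights come
    from additive scores e_tj = v_a . tanh(W_a s_{t-1} + U_a h_j),
    normalized with a softmax (Bahdanau et al., 2015)."""
    scores = np.array([v_a @ np.tanh(W_a @ s_prev + U_a @ h) for h in hs])
    alphas = np.exp(scores - scores.max())
    alphas = alphas / alphas.sum()        # attention weights over source words
    return alphas @ np.stack(hs)          # c_t: changes at every decoder step
```

Because c_t is recomputed per step, the encoder no longer has to squeeze an arbitrarily long sentence into one fixed-size vector, which is what the slide means by a variable-length context vector.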

Thank You