Introduction to Natural Language Processing Mamoru Komachi <komachi@tmu.ac.jp> Faculty of System Design Tokyo Metropolitan University 16 Feb 2018, 1:1 English presentation rehearsal
Short Bio
March 2005: The University of Tokyo. Majored in History and Philosophy of Science (B.A.)
March 2010: Nara Institute of Science and Technology. Majored in Natural Language Processing (Ph.D.)
March 2013: Nara Institute of Science and Technology. Started an academic career (Assistant Professor)
September 2011: Apple Japan. Developed Japanese input method (Software Engineer)
April 2013-present: Tokyo Metropolitan University. Opened a lab (Associate Professor)
Deep learning in a nutshell A paradigm for learning complex mathematical models with multi-layered neural networks. Achieves dramatic improvements in various kinds of pattern recognition tasks. Has become one of the standard approaches in vision and speech (continuous and dense feature representations). Not widely used in natural language processing until recently, due to the nature of language (discrete and sparse feature representations).
DL learns implicit features without supervision Lee et al., Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations. ICML 2009.
Deep learning for natural language processing Representation learning: traditional feature engineering requires huge annotation costs and/or hand-crafted features (feature templates); we need to learn implicit feature representations (without supervision). Architecture for deep neural networks: conventional statistical methods cannot produce fluent sentences; we need to generate fluent sentences (possibly with heavy supervision).
Example of implicit features in NLP What is the meaning of king? King Arthur is a legendary British leader of the late 5th and early 6th centuries Edward VIII was King of the United Kingdom and the Dominions of the British Empire, … The meaning of a word can be characterized by its contextual words (distributional hypothesis) “you shall know a word by the company it keeps.” (Firth, 1957)
Vector space model: Representing the meaning as a vector. king = (0.1, 0.7, -0.3, 0, -1)^T: how often does it co-occur with "leader"? How often does it co-occur with "empire"? How often does it co-occur with "dog"? ... Similarity of words (vectors): king · farmer = ||king|| ||farmer|| cos θ, so similarity = cos θ = (king · farmer) / (||king|| ||farmer||)
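As a minimal sketch of the similarity computation above (the vector values here are made-up toy numbers, not real co-occurrence counts):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy co-occurrence vectors (illustrative values only).
king   = np.array([0.1, 0.7, -0.3, 0.0, -1.0])
farmer = np.array([0.2, 0.1, -0.4, 0.3, -0.2])

print(cosine_similarity(king, farmer))  # similarity in [-1, 1]
```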
Semantic composition (additive operation) with word vectors (word2vec). king = (0.1, 0.7, -0.3, 0, -1)^T
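A small sketch of additive composition with word2vec vectors, assuming a pre-trained model loaded with gensim; the file name "vectors.bin" is a placeholder, and the words used must exist in the model's vocabulary:

```python
from gensim.models import KeyedVectors

# Load pre-trained word2vec vectors (file name is a placeholder).
model = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

# Additive composition: king - man + woman is close to queen (the classic word2vec example).
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# Phrase meaning approximated as the sum of its word vectors.
phrase = model["british"] + model["leader"]
print(model.similar_by_vector(phrase, topn=3))
```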
Quantitative evaluation of semantic similarity of words and phrases. Visualization of word embeddings: red shows word embeddings learned by word2vec; blue shows word embeddings optimized using the grammatical errors of English learners. Word embeddings can be learned so that they reflect how the words are actually used.
Other representations of semantic meaning Matrix: Baroni and Zamparelli. 2010. Nouns are vectors, adjectives are matrices: Representing adjective-noun constructions in semantic space. Tensor: Socher et al. 2013. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank.
Deep learning for natural language processing Representation learning: traditional feature engineering requires huge annotation costs and/or hand-crafted features (feature templates); we need to learn implicit feature representations (without supervision). Architecture for deep neural networks: conventional statistical methods cannot produce fluent sentences; we need to generate fluent sentences (possibly with heavy supervision).
Natural language generation: Machine translation. Translate a sentence in one language into another. Research questions: How to address the multi-class classification problem? How to model the alignment between words? https://twitter.com/haldaume3/status/732333418038562816
Traditional approaches in machine translation Bernard Vauquois' pyramid: generate an expression in the target language by understanding the source language. The mainstream approach until the 1990s. CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=683855
Recent advances in statistical and empirical methods in machine translation
1. Translation models: from the IBM models (1993) to phrase-based methods (2003)
2. Open-source software: GIZA++, SRILM, Pharaoh, Moses (2003-)
3. Automatic evaluation: BLEU (2002)
4. Optimization: minimum error rate training (2003)
5. Massive data: Europarl, patent corpus (2008)
Statistical machine translation: Learning mathematical models from a parallel corpus. Noisy channel model: ê = argmax_e P(f|e) P(e), where P(f|e) is the translation model and P(e) is the language model over the target language. [Diagram: ① alignment and ② rule extraction on a source-target parallel corpus give the translation model; a raw target-language corpus gives the language model; ③ decoding searches for the argmax; ④ optimization tunes toward BLEU on a development corpus (reference)]
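A toy sketch of the noisy-channel decision rule above: score each candidate target sentence e by P(f|e) · P(e) and take the argmax. The candidate sentences and the log-probability tables below are invented purely for illustration:

```python
# Hypothetical model scores for one source sentence f (log probabilities).
translation_model = {"deep learning is really cool": -2.1,   # log P(f | e)
                     "deep learning very dangerous": -1.8}
language_model    = {"deep learning is really cool": -3.0,   # log P(e)
                     "deep learning very dangerous": -6.5}

# Decode: argmax_e [ log P(f|e) + log P(e) ]
best = max(translation_model,
           key=lambda e: translation_model[e] + language_model[e])
print(best)
```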
The larger the data, the better the translation Brants et al. "Large Language Models in Machine Translation". EMNLP 2007. [Graph: translation quality (better vs. worse) plotted against data size (small vs. large): more data gives better translations]
From statistical models to a neural model Why factorize? → If the parallel corpus is small, we cannot estimate the translation model P(e|f) robustly. What if we have a large-scale parallel corpus? → No need to factorize the translation model: P(e|f) = ∏_i P(e_i | e_<i, f). Generating each word depends on the source words and on all the target words generated so far.
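A sketch of the direct model above: the log probability of the target sentence is the sum over positions i of log P(e_i | e_<i, f). The `next_word_distribution` function is a hypothetical stand-in for the neural network; the uniform toy distribution is only there to make the example runnable:

```python
import math

def sentence_log_prob(target_words, source_words, next_word_distribution):
    """log P(e|f) = sum_i log P(e_i | e_<i, f)."""
    log_prob = 0.0
    for i, word in enumerate(target_words):
        # Distribution over target words given the source f and the prefix e_<i.
        probs = next_word_distribution(source_words, target_words[:i])
        log_prob += math.log(probs[word])
    return log_prob

# Toy stand-in for the neural network: uniform over a tiny vocabulary.
vocab = ["DL", "is", "really", "cool", "</s>"]
uniform = lambda source, prefix: {w: 1.0 / len(vocab) for w in vocab}
print(sentence_log_prob(["DL", "is", "really", "cool", "</s>"],
                        ["深層学習", "マジ", "やばい"], uniform))
```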
Recurrent neural network (RNN) encodes a sequence of words. [Diagram: input layer → hidden layer → output layer; the RNN reads 深層学習 マジ やばい ("deep learning is really cool") one word at a time, updating its hidden state and predicting the next word at each step, ending with </s>]
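A minimal numpy sketch of the RNN encoding described above: each input word vector updates a hidden state that summarizes the sequence read so far. The weights and word vectors are random placeholders; in practice they are learned:

```python
import numpy as np

rng = np.random.default_rng(0)
embed_dim, hidden_dim = 4, 8

# Placeholder parameters; in practice these are learned from data.
W_xh = rng.normal(size=(hidden_dim, embed_dim))
W_hh = rng.normal(size=(hidden_dim, hidden_dim))
b_h  = np.zeros(hidden_dim)

def rnn_encode(word_vectors):
    """Read the word vectors one by one and return the final hidden state."""
    h = np.zeros(hidden_dim)
    for x in word_vectors:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
    return h

# Toy input: three word embeddings (e.g., for 深層学習 / マジ / やばい).
sentence = [rng.normal(size=embed_dim) for _ in range(3)]
print(rnn_encode(sentence))
```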
Generating a sentence by combining two RNN models (sequence to sequence): the encoder-decoder approach. [Diagram: the encoder builds a sentence vector from the word vectors of the source sentence 深層学習 マジ やばい; the decoder then generates the target sentence "DL is really cool </s>" word by word]
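A rough sketch of the encoder-decoder idea in the same style: the encoder's final hidden state serves as the sentence vector and initializes the decoder, which greedily emits target words until it produces </s>. All weights, embeddings, and the vocabulary are toy placeholders, so the output is meaningless; the point is only to show the data flow:

```python
import numpy as np

rng = np.random.default_rng(0)
embed_dim, hidden_dim = 4, 8
vocab = ["DL", "is", "really", "cool", "</s>"]          # toy target vocabulary
target_embed = {w: rng.normal(size=embed_dim) for w in vocab}
W_xh = rng.normal(size=(hidden_dim, embed_dim))
W_hh = rng.normal(size=(hidden_dim, hidden_dim))
W_hy = rng.normal(size=(len(vocab), hidden_dim))        # hidden -> vocabulary scores

def step(x, h):
    return np.tanh(W_xh @ x + W_hh @ h)

def encode(source_vectors):
    h = np.zeros(hidden_dim)
    for x in source_vectors:
        h = step(x, h)
    return h                                            # sentence vector

def decode(h, max_len=10):
    words, x = [], target_embed["</s>"]                 # stand-in for <s>
    for _ in range(max_len):
        h = step(x, h)
        word = vocab[int(np.argmax(W_hy @ h))]          # greedy word choice
        words.append(word)
        if word == "</s>":
            break
        x = target_embed[word]
    return words

source = [rng.normal(size=embed_dim) for _ in range(3)]  # e.g. 深層学習 マジ やばい
print(decode(encode(source)))
```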
Seq2seq models can learn complex models from simple features alone. Zhang and LeCun, Text Understanding from Scratch, arXiv 2015. → Learns a text classification model from characters only. Zaremba and Sutskever, Learning to Execute, arXiv 2015. → Learns a Python interpreter using only an RNN.
How to make alignments? → Attend to the source side during decoding. Attention = a weighted sum of the hidden states of the encoder. Instead of forming a single sentence vector, the decoder uses all the source word vectors through the attention mechanism. [Diagram: while generating "DL is really cool </s>", each decoder step attends to the encoder states for 深層学習 マジ やばい </s>]
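A minimal sketch of the attention computation named above: compare the current decoder state with every encoder hidden state (dot-product scores here), turn the scores into weights with a softmax, and take the weighted sum as the context vector. The random vectors are placeholders for real encoder and decoder states:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention(decoder_state, encoder_states):
    """Context vector = weighted sum of encoder hidden states."""
    scores = np.array([decoder_state @ h for h in encoder_states])
    weights = softmax(scores)               # soft alignment over source positions
    context = sum(w * h for w, h in zip(weights, encoder_states))
    return context, weights

# Toy example: 4 encoder states and one decoder state of dimension 8.
rng = np.random.default_rng(0)
encoder_states = [rng.normal(size=8) for _ in range(4)]
decoder_state = rng.normal(size=8)
context, weights = attention(decoder_state, encoder_states)
print(weights)   # which source positions the decoder attends to
```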
If there is a large-scale parallel corpus, neural machine translation outperforms statistical methods Luong et al. Effective Approaches to Attention-based Neural Machine Translation. EMNLP 2015.
Attention can be applied not only to sequences but also to tree structures. Eriguchi et al. Tree-to-Sequence Attentional Neural Machine Translation. ACL 2016. Attends to phrase structures in the encoder to take syntactic structure on the source side into account.
Deep learning enables language generation from multi-modal input. Generates a fluent caption from a single image alone. http://deeplearning.cs.toronto.edu/i2t http://googleresearch.blogspot.jp/2014/11/a-picture-is-worth-thousand-coherent.html
Summary: Deep learning for natural language processing Representation learning: can find implicit features; can compute the meaning of a sentence by semantic composition through mathematical modeling. Architectures for deep neural networks: can generate fluent sentences; open up broad possibilities for natural language generation.