Introduction to Natural Language Processing

Introduction to Natural Language Processing
Mamoru Komachi <komachi@tmu.ac.jp>
Faculty of System Design, Tokyo Metropolitan University
16 Feb 2018, 1:1 English presentation rehearsal

Short Bio
March 2005: The University of Tokyo, majored in History and Philosophy of Science (B.A.)
March 2010: Nara Institute of Science and Technology, majored in Natural Language Processing (Ph.D.)
March 2013: Nara Institute of Science and Technology, started an academic career (Assistant Professor)
September 2011: Apple Japan, developed a Japanese input method (Software Engineer)
April 2013-present: Tokyo Metropolitan University, opened a lab (Associate Professor)

Deep learning in a nutshell
A paradigm for learning complex mathematical models using multi-layered neural networks
Achieves dramatic improvements in various kinds of pattern recognition tasks
Now one of the standard approaches in vision and speech (continuous, dense feature representations)
Not widely used in natural language processing until recently, due to the nature of language (discrete, sparse feature representations)

DL learns implicit features without supervision
Lee et al., Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations. ICML 2009.

Deep learning for natural language processing
Representation learning
- Traditional feature engineering requires huge annotation cost and/or hand-crafted features (feature templates)
- Need to learn implicit feature representations (without supervision)
Architecture for deep neural networks
- Conventional statistical methods cannot produce fluent sentences
- Need to generate fluent sentences (possibly with heavy supervision)

Example of implicit features in NLP
What is the meaning of "king"?
- "King Arthur is a legendary British leader of the late 5th and early 6th centuries"
- "Edward VIII was King of the United Kingdom and the Dominions of the British Empire, …"
The meaning of a word can be characterized by its contextual words (distributional hypothesis):
"You shall know a word by the company it keeps." (Firth, 1957)
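As a toy illustration of the distributional hypothesis, a word can be characterized by counting the words that appear near it. The sketch below uses simplified versions of the two example sentences and a hypothetical two-word context window:

```python
from collections import Counter

sentences = [
    "king arthur is a legendary british leader".split(),
    "edward viii was king of the united kingdom".split(),
]

def context_counts(target, sentences, window=2):
    """Count the words appearing within `window` positions of `target`."""
    counts = Counter()
    for words in sentences:
        for i, w in enumerate(words):
            if w == target:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(words[lo:i] + words[i + 1:hi])
    return counts

print(context_counts("king", sentences))  # e.g. {'arthur': 1, 'is': 1, 'viii': 1, ...}
```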

Vector space model: Representing the meaning as a vector
king = (0.1, 0.7, -0.3, 0, -1)^T
- How often does it co-occur with "leader"?
- How often does it co-occur with "empire"?
- How often does it co-occur with "dog"?
- ……
Similarity of words (vectors): since king・farmer = ||king|| ||farmer|| cos θ,
similarity = cos θ = (king・farmer) / (||king|| ||farmer||)
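A minimal numpy sketch of this similarity computation; the vector for "king" is taken from the slide, while the vector for "farmer" is made up for illustration:

```python
import numpy as np

# Toy co-occurrence-style vectors ("farmer" is an invented example value).
king   = np.array([0.1, 0.7, -0.3, 0.0, -1.0])
farmer = np.array([0.2, 0.1,  0.4, 0.3, -0.2])

def cosine_similarity(u, v):
    """similarity = cos θ = (u · v) / (||u|| ||v||)"""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine_similarity(king, farmer))
```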

Semantic composition (additive operations) on word vectors (word2vec)
king = (0.1, 0.7, -0.3, 0, -1)^T
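A toy numpy sketch of additive composition on word vectors, using the well-known king - man + woman ≈ queen analogy as the example; the embeddings and the four-word vocabulary are invented values, not real word2vec output:

```python
import numpy as np

# Made-up low-dimensional embeddings; real word2vec vectors have 100-300 dimensions.
emb = {
    "king":  np.array([0.1, 0.7, -0.3, 0.0, -1.0]),
    "man":   np.array([0.0, 0.5, -0.2, 0.1, -0.8]),
    "woman": np.array([0.1, 0.6,  0.2, 0.1, -0.7]),
    "queen": np.array([0.2, 0.8,  0.3, 0.0, -0.9]),
}

def cos(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def analogy(a, b, c):
    """Find the word closest to vec(a) - vec(b) + vec(c), excluding a, b, c themselves."""
    query = emb[a] - emb[b] + emb[c]
    candidates = [w for w in emb if w not in {a, b, c}]
    return max(candidates, key=lambda w: cos(emb[w], query))

print(analogy("king", "man", "woman"))  # -> "queen" in this toy vocabulary
```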

Quantitative evaluation of semantic similarity of words and phrases
Visualization of word embeddings:
- Red: word embeddings learned by word2vec
- Blue: word embeddings optimized using the grammatical errors made by English learners
Word embeddings can be learned so that they reflect how words are actually used

Other representations of semantic meaning
- Matrix: Baroni and Zamparelli. 2010. Nouns are vectors, adjectives are matrices: Representing adjective-noun constructions in semantic space.
- Tensor: Socher et al. 2013. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank.

Deep learning for natural language processing
Representation learning
- Traditional feature engineering requires huge annotation cost and/or hand-crafted features (feature templates)
- Need to learn implicit feature representations (without supervision)
Architecture for deep neural networks
- Conventional statistical methods cannot produce fluent sentences
- Need to generate fluent sentences (possibly with heavy supervision)

Natural language generation: Machine translation
Translate a sentence in one language into another
Research questions:
- How to address the multi-class classification problem?
- How to model alignment between words?
https://twitter.com/haldaume3/status/732333418038562816

Traditional approaches in machine translation
Bernard Vauquois' pyramid: generate an expression in the target language by understanding the source language
The mainstream approach until the 1990s
CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=683855

Recent advances in statistical and empirical methods in machine translation
1. Translation models: from the IBM models (1993) to phrase-based methods (2003)
2. Open-source software: GIZA++, SRILM, Pharaoh, Moses (2003-)
3. Automatic evaluation: BLEU (2002)
4. Optimization: minimum error rate training (2003)
5. Massive data: Europarl, patent corpus (2008)

Statistical machine translation: Learning mathematical models from a parallel corpus
Noisy channel model: ê = argmax_e P(f | e) P(e)
- P(f | e): translation model, learned via ① alignment and ② rule extraction on a source-target parallel corpus
- P(e): language model of the target language, learned from a raw (monolingual) corpus
- ③ Decoding searches for the argmax; ④ optimization tunes the system toward BLEU on a development corpus (reference translations)
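A toy sketch of noisy-channel decoding, ranking candidate translations by log P(f | e) + log P(e); the two scoring functions below are hypothetical stand-ins for a translation model estimated from a parallel corpus and a language model estimated from raw target-language text:

```python
# Hypothetical stand-ins for the two models in the noisy channel decomposition.
def translation_logprob(f_sentence, e_sentence):
    # Toy adequacy score: prefer candidates whose length matches the source.
    return -abs(len(f_sentence) - len(e_sentence))

def language_model_logprob(e_sentence):
    # Toy fluency score standing in for an n-gram language model.
    return -0.5 * len(e_sentence)

def decode(f_sentence, candidates):
    """Noisy channel: ê = argmax_e P(f | e) P(e), i.e. the argmax of the summed log scores."""
    return max(candidates,
               key=lambda e: translation_logprob(f_sentence, e) + language_model_logprob(e))

source = ["深層学習", "マジ", "やばい"]
candidates = [["DL", "is", "really", "cool"], ["deep", "learning", "is", "really", "cool"]]
print(decode(source, candidates))
```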

The larger the data, the better the translation
Brants et al. "Large Language Models in Machine Translation". EMNLP 2007.
[Figure: translation quality improves steadily as the amount of training data grows]

From statistical models to a neural model
Why factorize? → If the parallel corpus is small, the translation model P(e | f) cannot be estimated robustly.
What if we have a large-scale parallel corpus? → No need to factorize the translation model!
P(e | f) = ∏_i P(e_i | e_<i, f)
Generating each word depends on the source words and on all the target words generated so far.
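A sketch of what this factorization means in code: the log-probability of a target sentence is the sum of per-word conditional log-probabilities, each conditioned on the source sentence and on the target words generated before it (the next-word model passed in below is a placeholder, not a trained neural network):

```python
import math

def sentence_logprob(source_words, target_words, next_word_logprob):
    """log P(e | f) = sum_i log P(e_i | e_<i, f)."""
    total = 0.0
    for i, word in enumerate(target_words):
        total += next_word_logprob(word, target_words[:i], source_words)
    return total

# Toy usage with a uniform "model" over a 10,000-word vocabulary.
uniform = lambda word, prefix, source: math.log(1.0 / 10000)
print(sentence_logprob(["深層学習", "マジ", "やばい"], ["DL", "is", "really", "cool"], uniform))
```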

Recurrent neural network (RNN) encodes a sequence of words
[Figure: an RNN reads the input words 深層学習 マジ やばい ("deep learning is really cool") one at a time, updating its hidden state at each step and outputting the next word (マジ, やばい, </s>) at each position]
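A minimal numpy sketch of a vanilla RNN reading a sentence word by word; the dimensions and random weights are illustrative, not the presenter's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)
embed_dim, hidden_dim = 4, 8

# Randomly initialized parameters; in practice these are learned by backpropagation.
W_xh = rng.normal(size=(hidden_dim, embed_dim))   # input -> hidden
W_hh = rng.normal(size=(hidden_dim, hidden_dim))  # hidden -> hidden
b_h  = np.zeros(hidden_dim)

def rnn_encode(word_vectors):
    """Read word vectors left to right, updating the hidden state at each step."""
    h = np.zeros(hidden_dim)
    for x in word_vectors:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
    return h  # the final hidden state summarizes the sentence

# Toy inputs standing in for embeddings of 深層学習, マジ, やばい.
sentence = [rng.normal(size=embed_dim) for _ in range(3)]
print(rnn_encode(sentence))
```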

Generating a sentence by combining two RNN models (sequence to sequence)
Encoder-decoder approach: the encoder builds a sentence vector from the word vectors of the source sentence, and the decoder generates the target sentence from that vector
[Figure: the encoder reads 深層学習 マジ やばい and the decoder outputs "DL is really cool </s>"]
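A sketch of the encoder-decoder control flow: encode the source into a single vector, then let a decoder emit one target word at a time until it produces the end-of-sentence symbol. The function names and the greedy decoding loop are illustrative assumptions, not the exact architecture from the slide:

```python
def greedy_decode(source_vectors, encode, decoder_step, start_token="<s>",
                  end_token="</s>", max_len=20):
    """Encode the source, then generate target words one at a time."""
    state = encode(source_vectors)              # sentence vector from the encoder RNN
    word, output = start_token, []
    for _ in range(max_len):
        word, state = decoder_step(word, state) # next word and updated decoder state
        if word == end_token:
            break
        output.append(word)
    return output

# Toy usage: a "decoder" that replays a fixed sentence, just to show the control flow.
canned = iter(["DL", "is", "really", "cool", "</s>"])
print(greedy_decode([], encode=lambda s: None,
                    decoder_step=lambda w, h: (next(canned), h)))
```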

Seq2seq models can learn complex models from only simple features
- Zhang and LeCun, Text Understanding from Scratch, arXiv 2015. → Learns text classification models from characters only
- Zaremba and Sutskever, Learning to Execute, arXiv 2015. → Learns to behave like a Python interpreter using only an RNN

How to make alignments? → Attend to the source side during decoding
Attention = a weighted sum of the hidden states of the encoder
The decoder does not form a single sentence vector; rather, it uses all the source word vectors through an attention mechanism
[Figure: an attention-based decoder translating 深層学習 マジ やばい into "DL is really cool </s>"]
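A minimal numpy sketch of the attention computation described above: score each encoder hidden state against the current decoder state, turn the scores into weights with a softmax, and take the weighted sum as the context vector. Dot-product scoring is used here as one common choice; the slide does not specify a particular scoring function:

```python
import numpy as np

def attention(decoder_state, encoder_states):
    """Weighted sum of encoder hidden states (dot-product attention)."""
    scores = encoder_states @ decoder_state     # one score per source position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax -> alignment weights
    context = weights @ encoder_states          # weighted sum of hidden states
    return context, weights

# Toy example: 3 source positions, hidden size 4.
enc = np.random.default_rng(0).normal(size=(3, 4))
ctx, w = attention(enc[1], enc)
print(w)  # alignment distribution over the source words
```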

If there is a large-scale parallel corpus, neural machine translation outperforms statistical methods
Luong et al. Effective Approaches to Attention-based Neural Machine Translation. EMNLP 2015.

Attention can be applied not only to sequences but also to tree structures
Eriguchi et al. Tree-to-Sequence Attentional Neural Machine Translation. ACL 2016.
Attends to phrase structure in the encoder so that the syntactic structure of the source side is taken into account

Deep learning enables language generation from multi-modal input
Generates a fluent caption from a single image alone
http://deeplearning.cs.toronto.edu/i2t
http://googleresearch.blogspot.jp/2014/11/a-picture-is-worth-thousand-coherent.html

Summary: Deep learning for natural language processing
Representation learning
- Can find implicit features
- Can compute the meaning of a sentence by semantic composition through mathematical modeling
Architecture for deep neural networks
- Can generate fluent sentences
- Opens up broad possibilities for natural language generation