Are End-to-end Systems the Ultimate Solutions for NLP?

Are End-to-end Systems the Ultimate Solutions for NLP?
Jing Jiang
CICLing, March 20, 2018

Background
Recent years have witnessed a fast-growing trend of using deep learning solutions, often end-to-end, for NLP tasks:
- Machine translation
- Information extraction
- Question answering
- Abstractive summarization
These solutions offer good performance with no feature engineering, but they require a large amount of training data and are hard to interpret.

Example: Question Answering

Question Answering
A traditional QA pipeline relies on components such as:
- named entity recognition
- question type analysis
- syntactic parsing
- semantic parsing
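To make the pipeline idea concrete, here is a minimal sketch in Python (hypothetical stage names, not the speaker's actual system) of how such hand-built components are chained; note how the output of an early stage feeds every later one, so errors propagate:

```python
# Minimal, hypothetical sketch of a traditional QA pipeline.
# Each stage enriches a shared state dict; an error in an early
# stage (e.g., a missed entity) propagates to all later stages.
from typing import Callable, Dict, List

Stage = Callable[[Dict], Dict]

def ner_stage(state: Dict) -> Dict:
    state["entities"] = []  # a real system would call an NER tagger here
    return state

def question_type_stage(state: Dict) -> Dict:
    q = state["question"].lower()
    state["qtype"] = "person" if q.startswith("who") else "other"
    return state

def run_pipeline(stages: List[Stage], question: str, passage: str) -> Dict:
    state = {"question": question, "passage": passage}
    for stage in stages:  # stages run strictly in order
        state = stage(state)
    return state

print(run_pipeline([ner_stage, question_type_stage],
                   "Who wrote Hamlet?",
                   "Hamlet was written by Shakespeare."))
```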

End-to-end Question Answering
- Starts from passages and questions as word sequences
- Uses deep neural networks for encoding, matching, and prediction
- Does not need named entity recognition, question type analysis, syntactic parsing, etc.
- On some benchmark datasets, the best performance is close to human performance
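As an illustration of this encode-match-predict recipe, here is a minimal sketch assuming PyTorch: a generic extractive reader with a BiLSTM encoder, attention-based matching, and span-boundary prediction. It is illustrative only, not any specific published model:

```python
# Generic end-to-end extractive QA sketch (hypothetical architecture).
import torch
import torch.nn as nn

class EndToEndQA(nn.Module):
    def __init__(self, vocab_size: int, dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.p_enc = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.q_enc = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.start = nn.Linear(4 * dim, 1)  # passage state + attended question
        self.end = nn.Linear(4 * dim, 1)

    def forward(self, passage_ids, question_ids):
        p, _ = self.p_enc(self.embed(passage_ids))   # (B, Tp, 2d): encoding
        q, _ = self.q_enc(self.embed(question_ids))  # (B, Tq, 2d)
        # Matching: attend from each passage token to the question.
        scores = torch.bmm(p, q.transpose(1, 2))     # (B, Tp, Tq)
        q_aligned = torch.bmm(torch.softmax(scores, dim=-1), q)
        h = torch.cat([p, q_aligned], dim=-1)        # (B, Tp, 4d)
        # Prediction: logits over answer-span start and end positions.
        return self.start(h).squeeze(-1), self.end(h).squeeze(-1)

# Toy usage: one passage of 20 tokens, one question of 6 tokens.
model = EndToEndQA(vocab_size=1000)
s_logits, e_logits = model(torch.randint(0, 1000, (1, 20)),
                           torch.randint(0, 1000, (1, 6)))
print(s_logits.shape, e_logits.shape)  # torch.Size([1, 20]) twice
```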

SQuAD Leaderboard

Other Examples
- Relation extraction
  - Traditionally, feature engineering involves POS tagging, constituency parsing, dependency parsing, etc.
  - Recent work uses LSTM or CNN models with position embeddings, without feature engineering (see the sketch below)
- Headline generation
  - Recent work uses sequence-to-sequence models trained on large amounts of automatically obtained training data
- Neural machine translation
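The position-embedding idea can be sketched as follows (assuming PyTorch; hypothetical names and dimensions, not a specific published model): each token receives learned embeddings for its relative distances to the two candidate entities, and a CNN over the concatenated vectors replaces parse-based features:

```python
# Hypothetical CNN relation extractor with position embeddings.
import torch
import torch.nn as nn

class CNNRelationExtractor(nn.Module):
    def __init__(self, vocab_size, n_relations, dim=100, pos_dim=10,
                 max_dist=50):
        super().__init__()
        self.word = nn.Embedding(vocab_size, dim)
        # One distance-embedding table per entity; offsets are shifted
        # by max_dist so all indices are non-negative.
        self.dist1 = nn.Embedding(2 * max_dist + 1, pos_dim)
        self.dist2 = nn.Embedding(2 * max_dist + 1, pos_dim)
        self.conv = nn.Conv1d(dim + 2 * pos_dim, 128, kernel_size=3, padding=1)
        self.out = nn.Linear(128, n_relations)
        self.max_dist = max_dist

    def forward(self, tokens, e1_pos, e2_pos):
        T = tokens.size(1)
        idx = torch.arange(T, device=tokens.device).unsqueeze(0)
        # Relative distance of each token to each entity, clamped.
        d1 = (idx - e1_pos.unsqueeze(1)).clamp(-self.max_dist, self.max_dist)
        d2 = (idx - e2_pos.unsqueeze(1)).clamp(-self.max_dist, self.max_dist)
        x = torch.cat([self.word(tokens),
                       self.dist1(d1 + self.max_dist),
                       self.dist2(d2 + self.max_dist)], dim=-1)
        h = torch.relu(self.conv(x.transpose(1, 2)))  # (B, 128, T)
        return self.out(h.max(dim=2).values)          # max-pool over time

# Toy usage: one 15-token sentence, entities at positions 2 and 9.
model = CNNRelationExtractor(vocab_size=1000, n_relations=5)
logits = model(torch.randint(0, 1000, (1, 15)),
               torch.tensor([2]), torch.tensor([9]))
print(logits.shape)  # torch.Size([1, 5])
```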

End-to-end Systems
Advantages:
- Eliminate the need to design subcomponents and features
- Reduce error propagation
- Results are good when sufficient training data is used

End-to-end Systems
Problems:
- Require a large amount of training data, which may not always be readily available
- Require careful tuning
- May not be adaptable to a different dataset or domain (overfitting)
- Hard to interpret

End-to-end Systems
Do you believe end-to-end systems will become the ultimate solutions to all NLP applications?
- If so, many intermediate steps, such as morphological analysis, syntactic analysis, or even discourse analysis, would no longer be useful.

End-to-end Systems
Or do you believe end-to-end systems have their limitations?
- E.g., how do we share knowledge across different tasks?

What Are Your Thoughts?

How do you typically build your systems?
- Feature engineering + traditional ML method (e.g., SVM)
- Mixture of traditional methods and NNs (e.g., incorporating POS tagging and parsing features into a neural network; see the sketch below)
- End-to-end (i.e., no feature engineering)
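The middle option can be sketched roughly as follows (assuming PyTorch; hypothetical names and sizes): the output of a traditional POS tagger is fed into a neural model by concatenating a learned tag embedding with each word embedding:

```python
# Hypothetical "mixture" encoder: word embeddings + POS-tag embeddings.
import torch
import torch.nn as nn

class WordPlusPOSEncoder(nn.Module):
    def __init__(self, vocab_size, n_tags, word_dim=100, tag_dim=25):
        super().__init__()
        self.word = nn.Embedding(vocab_size, word_dim)
        self.tag = nn.Embedding(n_tags, tag_dim)  # tags from an external tagger
        self.enc = nn.LSTM(word_dim + tag_dim, 128, batch_first=True)

    def forward(self, word_ids, tag_ids):
        # Concatenate the hand-engineered signal (POS) with learned
        # word representations before encoding.
        x = torch.cat([self.word(word_ids), self.tag(tag_ids)], dim=-1)
        out, _ = self.enc(x)
        return out  # downstream task heads attach here

# Toy usage: one 10-token sentence with precomputed POS-tag ids.
enc = WordPlusPOSEncoder(vocab_size=1000, n_tags=45)
h = enc(torch.randint(0, 1000, (1, 10)), torch.randint(0, 45, (1, 10)))
print(h.shape)  # torch.Size([1, 10, 128])
```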

Are NN models useful for your task?
- Have not tried them yet
- Tried, but not useful
- Useful through word embeddings only
- Useful through models such as CNNs and LSTMs

Interpretability
- Do you think interpretability is important?
- Do you find it hard to interpret your NN model?
- Does error analysis help you come up with ways to improve your NN model?

Tuning
- Is a model considered good if it requires heavy tuning?
- Should we report parameter sensitivity studies?

Benchmark datasets
- Are we just chasing the numbers?
- Should we run statistical significance tests? (A sketch of one such test follows.)
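One common choice is the paired bootstrap test. A minimal sketch in plain NumPy (hypothetical function name; per-example correctness scores for the two systems on the same test set are assumed as input):

```python
# Paired bootstrap significance test between two systems.
import numpy as np

def paired_bootstrap(scores_a, scores_b, n_resamples=10_000, seed=0):
    """One-sided p-value: fraction of resamples where A does NOT beat B."""
    rng = np.random.default_rng(seed)
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    n = len(a)
    wins = 0
    for _ in range(n_resamples):
        idx = rng.integers(0, n, size=n)  # resample examples with replacement
        if a[idx].mean() > b[idx].mean():
            wins += 1
    return 1.0 - wins / n_resamples  # small p => A's advantage is robust

# Toy usage: simulated per-example correctness (1/0) on 200 examples.
rng = np.random.default_rng(42)
sys_a = (rng.random(200) < 0.75).astype(int)  # ~75% accurate
sys_b = (rng.random(200) < 0.70).astype(int)  # ~70% accurate
print(paired_bootstrap(sys_a, sys_b))
```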

Challenges
What challenges do you face when adopting deep learning models for your NLP problem?