Are End-to-end Systems the Ultimate Solutions for NLP?


Jing Jiang
March 20, 2018, CICLing

Background
- Recent years have witnessed a fast-growing trend of using deep learning solutions, often end-to-end, for NLP tasks:
  - Machine translation
  - Information extraction
  - Question answering
  - Abstractive summarization
- Good performance
- No feature engineering
- Requires a large amount of training data
- Hard to interpret

Example: Question Answering

Question Answering
- Named entity recognition
- Question type analysis
- Syntactic parsing
- Semantic parsing

End-to-end Question Answering
- Starts from passages and questions as word sequences
- Uses deep neural networks for encoding, matching and prediction
- Does not need named entity recognition, question type analysis, syntactic parsing, etc.
- On some benchmark datasets, the best performance is close to human performance
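The encoding, matching and prediction stages named on this slide can be sketched as a toy forward pass. This is only an illustration of the data flow, not any particular SQuAD system: the function name `predict_span`, the random embedding lookup, the dot-product attention and the two linear scorers are all assumptions made for the example, and the weights are untrained, so the predicted span is arbitrary.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def predict_span(passage, question, dim=8, seed=0):
    """Toy encode -> match -> predict pass with random (untrained) weights."""
    rng = random.Random(seed)
    table = {}

    def encode(tokens):
        # Encoding: one vector per token. A real system would run an LSTM or
        # similar encoder over learned embeddings; this is a plain lookup.
        for tok in tokens:
            if tok not in table:
                table[tok] = [rng.uniform(-1, 1) for _ in range(dim)]
        return [table[tok] for tok in tokens]

    p_vecs = encode(passage)
    q_vecs = encode(question)

    # Matching: each passage token attends over the question tokens and is
    # concatenated with its attention-weighted summary of the question.
    matched = []
    for p in p_vecs:
        weights = softmax([dot(p, q) for q in q_vecs])
        summary = [sum(w * q[i] for w, q in zip(weights, q_vecs))
                   for i in range(dim)]
        matched.append(p + summary)

    # Prediction: two linear scorers give start and end distributions.
    w_start = [rng.uniform(-1, 1) for _ in range(2 * dim)]
    w_end = [rng.uniform(-1, 1) for _ in range(2 * dim)]
    start_probs = softmax([dot(w_start, m) for m in matched])
    end_probs = softmax([dot(w_end, m) for m in matched])

    # Pick the most likely start, then the most likely end at or after it.
    start = max(range(len(passage)), key=lambda i: start_probs[i])
    end = max(range(start, len(passage)), key=lambda i: end_probs[i])
    return start, end, start_probs, end_probs


passage = "the tower was completed in 1889 in paris".split()
question = "when was the tower completed".split()
start, end, _, _ = predict_span(passage, question)
print(passage[start:end + 1])
```

Note that nothing in the sketch consumes NER tags, question types or parse trees: the only inputs are the two word sequences, which is the point the slide makes.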

SQuAD Leaderboard

Other Examples
- Relation extraction
  - Traditionally, feature engineering involves POS tagging, constituency parsing, dependency parsing, etc.
  - Recent work uses LSTM or CNN and position embedding, without feature engineering
- Headline generation
  - Recent work uses a sequence-to-sequence model trained on a large amount of automatically obtained training data
- Neural machine translation
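The position embedding mentioned for relation extraction is usually driven by each token's distance to the two entity mentions. The slide does not say which formulation is meant, so the following is a sketch of one common variant, with a hypothetical helper name and an assumed clipping window `max_distance`: signed distances are clipped and shifted to non-negative ids, which would then index a learned embedding table.

```python
def relative_position_ids(num_tokens, entity_index, max_distance=3):
    """Map each token to an id encoding its clipped distance to an entity.

    Distances are clipped to [-max_distance, max_distance] and shifted by
    max_distance, so ids fall in the range 0 .. 2 * max_distance.
    """
    ids = []
    for i in range(num_tokens):
        distance = i - entity_index
        clipped = max(-max_distance, min(max_distance, distance))
        ids.append(clipped + max_distance)
    return ids


tokens = ["Jobs", "founded", "Apple", "in", "1976"]
# One id sequence per entity mention ("Jobs" at 0, "Apple" at 2).
print(relative_position_ids(len(tokens), entity_index=0))  # [3, 4, 5, 6, 6]
print(relative_position_ids(len(tokens), entity_index=2))  # [1, 2, 3, 4, 5]
```

In a model of this kind, the two position vectors looked up from these ids would typically be concatenated with the word embedding of each token before the CNN or LSTM layer.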

End-to-end Systems
Advantages:
- Eliminate the need to design subcomponents and features
- Reduce error propagation
- Results are good when sufficient training data is used

End-to-end Systems
Problems:
- Require a large amount of training data, which may not always be readily available
- Require careful tuning
- May not adapt to a different dataset or domain (overfitting)
- Hard to interpret

End-to-end Systems
- Do you believe end-to-end systems will become the ultimate solutions to all NLP applications?
  - If so, many intermediate steps such as morphological analysis, syntactic analysis or even discourse analysis would no longer be useful

End-to-end Systems
- Or do you believe end-to-end systems have their limitations?
  - E.g., how do we share knowledge across different tasks?

What Are Your Thoughts?

How do you typically build your systems?
- Feature engineering + traditional ML method (e.g., SVM)
- Mixture of traditional method and NN (e.g., incorporate POS tagging and parsing features into a neural network)
- End-to-end (i.e., no feature engineering)

Is an NN model useful for your task?
- Have not tried it yet
- Tried but not useful
- Useful through word embeddings only
- Useful through models such as CNN and LSTM

Interpretability
- Do you think interpretability is important?
- Do you find it hard to interpret your NN model?
- Does error analysis help you come up with ways to improve your NN model?

Tuning
- Is a model considered good if it requires heavy tuning?
- Should we report a parameter sensitivity study?
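A parameter sensitivity study of the kind raised here can be as simple as sweeping one hyperparameter and reporting the mean and spread of the score across random seeds. The sketch below assumes a caller-supplied `evaluate(value, seed)` function that trains and scores a model; both that function and the helper name `sensitivity_study` are hypothetical, and `toy_evaluate` is a made-up stand-in so the example runs.

```python
import statistics


def sensitivity_study(evaluate, values, seeds):
    """Return {value: (mean score, population std dev)} across seeds."""
    results = {}
    for value in values:
        scores = [evaluate(value, seed) for seed in seeds]
        results[value] = (statistics.mean(scores), statistics.pstdev(scores))
    return results


def toy_evaluate(learning_rate, seed):
    # Hypothetical stand-in for "train a model and return its dev score".
    return 0.9 - abs(learning_rate - 0.01) * 5 + 0.001 * seed


report = sensitivity_study(toy_evaluate, [0.001, 0.01, 0.1], seeds=[0, 1, 2])
for value, (mean, spread) in report.items():
    print(f"lr={value}: mean={mean:.3f} std={spread:.3f}")
```

A flat curve across values suggests the model is robust to that setting; a sharp peak suggests the reported number depends on heavy tuning.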

Benchmark datasets
- Are we just chasing the numbers?
- Do we run statistical significance tests?
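One widely used way to check whether a leaderboard gain is more than noise is a paired bootstrap over the test set. The slide does not name a specific test, so this is a sketch of that one option under stated assumptions: each system's output is reduced to a per-example 0/1 correctness list, and the function name and defaults are hypothetical.

```python
import random


def paired_bootstrap_win_rate(correct_a, correct_b, num_samples=1000, seed=0):
    """Fraction of bootstrap resamples in which system A beats system B.

    correct_a and correct_b are per-example 0/1 scores on the same test
    examples, in the same order.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both systems must be scored on the same examples")
    n = len(correct_a)
    rng = random.Random(seed)
    wins = 0
    for _ in range(num_samples):
        # Resample test examples with replacement, keeping the pairing.
        sample = [rng.randrange(n) for _ in range(n)]
        score_a = sum(correct_a[i] for i in sample)
        score_b = sum(correct_b[i] for i in sample)
        if score_a > score_b:
            wins += 1
    return wins / num_samples


system_a = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
system_b = [1, 0, 0, 1, 0, 0, 1, 1, 0, 0]
print(paired_bootstrap_win_rate(system_a, system_b))
```

A win rate of at least 0.95 is a common threshold for calling A significantly better than B; with only ten examples, as in this toy usage, the estimate is of course very coarse.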

Challenges
- What challenges do you face when adopting deep learning models for your NLP problem?