Mastering the Pipeline CSCI-GA.2590 Ralph Grishman NYU

The Pipeline
Extracting information from natural language text is a complex process. We have been able to make it manageable by dividing it into many separate stages, each realized by its own (relatively simple) model:
tokenizer → name tagger → POS tagger → parser → coref → IE

Introducing Errors
Each stage introduces some errors because
– the model is an oversimplification of the linguistic phenomenon
– the (hand-prepared) training data may be noisy
Typically about a 10% error rate per stage – error rates range from 3% (POS tagging) to 15% (name tagging)

Compounding Errors
Errors in the output of a stage = errors due to faulty input + errors introduced by that stage
Final output error rate > 50%
[Diagram: error rate accumulating across the pipeline – roughly 10%, 20%, 30%, 40%, 50%, 60% after the tokenizer, name tagger, POS tagger, parser, coref, and IE stages respectively]
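As a rough back-of-the-envelope check on that figure (an illustrative calculation, not from the slides): if each of the six stages independently mishandled about 10% of its decisions, the chance that an item made it through the whole pipeline correctly would be about (1 − 0.10)^6 ≈ 0.53, i.e. an end-to-end error rate approaching 50%; in practice later stages do even worse on faulty input, so the errors compound and the final rate exceeds 50%.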

Helpful Feedback
We will reduce the error rate by, in effect, providing feedback from later stages to earlier ones.
Example: “Roger Park began to work for IBM.”
– the NE tagger says “Roger Park” is most likely a location, but could also be a person
– a relation extraction pattern (in the IE stage) indicates a preference for “person works” over “location works”, fixing the error

Joint Inference
To perform joint inference between stages A and B:
– we define an objective function combining A and B
– we search the combined space of A and B, where possible B outputs may depend on the A output, seeking to maximize the combined objective
This is a much larger search space than with independent components.
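A minimal sketch of this idea in Python (the function names and scoring interface are hypothetical, not from the lecture): instead of committing to stage A's single best output, we score every (A output, B output) pair under a combined objective and keep the best pair.

```python
def joint_decode(a_hypotheses, b_hypotheses_given, score_a, score_b):
    """a_hypotheses: candidate outputs of stage A
       b_hypotheses_given(a): candidate outputs of stage B given A's output a
       score_a(a), score_b(a, b): log-probability-style scores (higher is better)"""
    best, best_score = None, float("-inf")
    for a in a_hypotheses:
        for b in b_hypotheses_given(a):
            s = score_a(a) + score_b(a, b)   # combined objective over both stages
            if s > best_score:
                best, best_score = (a, b), s
    return best, best_score
```

For “Roger Park began to work for IBM.”, stage A might propose (Roger Park, LOCATION) and (Roger Park, PERSON); even if LOCATION scores slightly higher on its own, the “person works for organization” relation gives the PERSON pairing the higher combined score.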

More Examples
“Meet me in front of the White House.”
– “White House” may refer to the building or the organization therein; the 1-token context used by the NE tagger doesn’t help resolve the ambiguity
– a relation pattern determines this is a reference to the building
“Ford employs 4000 in Detroit.”
– an event pattern determines that 4000 is of type people

Benefit
For 2 or 3 stages, reductions of 2-3% (absolute) in error rate are reported
– joint inference can only correct errors that turn what would have been a valid (more likely) input to the second stage into an invalid (less likely) one

Cost
Much larger space to search
– a full search of the product space is infeasible
– instead, do a joint token-by-token scan, updating multiple models (NE, relation, event) concurrently
Use beam search to limit the search space:
– follow only the top n hypotheses at each token, OR
– follow only hypotheses within m% of the best hypothesis, OR
– build a graphical model connecting the stages
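A minimal beam-search sketch for the joint token-by-token scan (the hypothesis representation and scoring interface are illustrative assumptions, not the lecture's implementation): at each token we extend every surviving hypothesis with each possible labeling decision, then keep only the top n.

```python
import heapq

def beam_search(tokens, extend, score, beam_width=5):
    """tokens: the input token sequence
       extend(hyp, token): possible one-token extensions of a partial hypothesis
       score(hyp): combined score over all models (NE, relation, event); higher is better"""
    beam = [()]                                    # start with a single empty hypothesis
    for token in tokens:
        candidates = [new_hyp
                      for hyp in beam
                      for new_hyp in extend(hyp, token)]
        # keep only the top n hypotheses at this token (the first pruning variant above)
        beam = heapq.nlargest(beam_width, candidates, key=score)
    return max(beam, key=score)
```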

Deep Learning
Instead of training separate models for each stage and then coupling them, can we train a unified model to perform the entire analysis, starting from a sequence of tokens?
→ deep learning using multi-layer neural networks

2 minute guide to neural nets
Building block = node (artificial neuron)
A node takes in multiple real-valued inputs x_1, ..., x_n and produces one real-valued output
f(x) = K( Σ_i w_i x_i )
where the w_i are weights (to be learned) and K is a (non-linear) activation function
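A concrete illustration of such a node (a minimal sketch; the sigmoid is just one common choice of K, not necessarily the one used in the course):

```python
import numpy as np

def sigmoid(z):
    # one common choice of the non-linear activation K
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, K=sigmoid):
    """x: real-valued inputs (x_1 ... x_n); w: learned weights (w_1 ... w_n)."""
    return K(np.dot(w, x))        # f(x) = K( sum_i w_i * x_i )

# example: neuron(np.array([0.5, -1.0, 2.0]), np.array([0.1, 0.4, 0.3]))
```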

Layers

Training
The network is a powerful system with a simple yet efficient training strategy termed backpropagation:
– we define a cost function (given an input x, the difference between the desired and current outputs)
– we compute the derivative of the cost function w.r.t. each of the weights in the network
– given an input/output pair, we update each of the weights (gradient descent)
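A minimal backpropagation sketch for a network with one hidden layer (illustrative assumptions: sigmoid activations, a squared-error cost, and plain gradient descent on a single input/output pair per update):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, y, W1, W2, lr=0.1):
    """x: input vector, y: desired output; W1, W2: weight matrices of the two layers."""
    # forward pass
    h = sigmoid(W1 @ x)          # hidden-layer activations
    out = sigmoid(W2 @ h)        # current network output

    # cost: C = 0.5 * ||y - out||^2  (difference between desired and current outputs)

    # backward pass: derivative of the cost w.r.t. each weight
    delta_out = (out - y) * out * (1 - out)        # error signal at the output layer
    delta_h = (W2.T @ delta_out) * h * (1 - h)     # propagated back to the hidden layer

    # gradient-descent update of each weight
    W2 -= lr * np.outer(delta_out, h)
    W1 -= lr * np.outer(delta_h, x)
    return W1, W2
```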

Input Data
We need to convert the text into a form which can be used effectively by the NN.
The natural input representation is a fixed-length vector of real values
– these are the word embeddings discussed earlier
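A minimal sketch of this conversion step (the vocabulary and embedding table below are hypothetical stand-ins; in practice the embeddings would be pre-trained or learned):

```python
import numpy as np

embedding_dim = 50
vocab = {"Roger": 0, "Park": 1, "began": 2, "to": 3, "work": 4, "for": 5, "IBM": 6, ".": 7}
embeddings = np.random.randn(len(vocab), embedding_dim)   # one fixed-length vector per word

def embed(tokens):
    """Map each token to its fixed-length real-valued vector."""
    return np.stack([embeddings[vocab[t]] for t in tokens])

x = embed(["Roger", "Park", "began", "to", "work", "for", "IBM", "."])  # shape (8, 50)
```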

Convolution Layer
In conventional feature-based systems, we program a set of features
– each feature is a function of a set of consecutive tokens: f(w_{i-1}, w_i, w_{i+1})
– the feature is applied starting at each token in succession
In a NN, we want the network to learn the appropriate features given only the input and the final output
– we do this through a convolution layer: a row of nodes with a common set of weights
– this has the potential to create features best suited to the main task
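A minimal sketch of such a convolution layer over token embeddings (the shapes and the tanh activation are illustrative assumptions): the same weights are applied to every window of consecutive tokens, so the network can learn its own window features f(w_{i-1}, w_i, w_{i+1}).

```python
import numpy as np

def conv_layer(X, W, b, window=3):
    """X: (num_tokens, embedding_dim) token embeddings
       W: (num_features, window * embedding_dim) weights shared across positions
       b: (num_features,) shared bias
       Returns one learned feature vector per window position."""
    n, d = X.shape
    outputs = []
    for i in range(n - window + 1):
        window_vec = X[i:i + window].reshape(-1)      # concatenate the window's embeddings
        outputs.append(np.tanh(W @ window_vec + b))   # same weights applied at every token
    return np.stack(outputs)
```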