Introduction to Computational Natural Language Learning. Linguistics 79400 (Under: Topics in Natural Language Processing), Computer Science 83000 (Under: Topics in Artificial Intelligence)


Introduction to Computational Natural Language Learning
Linguistics (Under: Topics in Natural Language Processing)
Computer Science (Under: Topics in Artificial Intelligence)
The Graduate School of the City University of New York, Fall 2001
William Gregory Sakas
Hunter College, Department of Computer Science
Graduate Center, PhD Programs in Computer Science and Linguistics
The City University of New York

Meeting 3: Notes
My Web page got a little messed up. Sorry about that! It should be OK now. There is a link to this course, but we will probably move to the new Blackboard system.
I got some questions about the details of how ANNs work. Yes, working out the math for a simple perceptron is fair game for a midterm question. A good link to check out: pris.comp.nus.edu.sg/ArtificialNeuralNetworks/perceptrons.html
I will be happy to arrange to meet with people to go over the math (as I will today at the beginning of class).

Artificial Neural Networks: A Brief Review
[Figure: three network architectures: a) fully recurrent, b) feedforward, c) multi-component.]

A Perceptron
[Figure: input activations and a bias node (a fixed activation) feed a threshold unit.]
If the sum of these inputs is great enough, the unit fires. That is to say, a positive activation occurs at the threshold unit's output.
How can we implement the AND function?

How can we implement the AND function? We want an artificial neuron to implement this function.
First we must decide on representation: possible inputs: 1, 0; possible outputs: 1, 0.
Boolean AND (unit inputs -> unit output):
0, 0 -> 0
0, 1 -> 0
1, 0 -> 0
1, 1 -> 1

[Figure: a threshold unit receiving three activations of 1; labels: unit inputs, net, unit output.]
Oooops.
net = Σ activations arriving at threshold node

STEP activation function:
f(x) = 1 if x >= 0
f(x) = 0 if x < 0
[Figure: plot of f(net) against net.]
net = Σ activations arriving at threshold node
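As a rough illustration of the step unit described above, here is a minimal Python sketch. The function names `step` and `net_input` are made up for this example and do not come from the slides.

```python
def step(x):
    """STEP activation: 1 if x >= 0, otherwise 0."""
    return 1 if x >= 0 else 0


def net_input(activations):
    """net = sum of the activations arriving at the threshold node."""
    return sum(activations)


# With 0/1 inputs and no weights, net is never negative,
# so this unit fires for every input pair.
for inputs in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(inputs, step(net_input(inputs)))
```

Since the unweighted unit fires for every input pair, it cannot be Boolean AND as it stands, which is one way to read the "Oooops" above.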

The picture gets a little more complicated when we add weights. A weight is simply a multiplier.
a_i = activation of node i, where a_0 is the bias node (fixed at 1)
w_ij = the weight between node a_i and node a_j
net_k = the result of summing all the products a_i(w_ik) that enter node a_k
Boolean AND
[Figure: input nodes a_7 and a_8 and the bias node a_0 feed unit a_9 through weighted connections (w_79, w_89 and the bias weight); a_9 = f(net_9).]
a_8(w_89) = 0(.87) = 0.0
net_9 = 0(.87) + 1(.76) + 1(-.92) = -.16
net_9 = -.16, which is less than 0, so a_9 = 0
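The worked sum above can be checked with a few lines of Python. This is only a sketch: the slide states w_89 = .87, but assigning .76 to w_79 and -.92 to the bias weight is an assumption made here, and the names `W_79`, `W_89`, `W_BIAS` and `unit_a9` are hypothetical.

```python
def step(x):
    """STEP activation: 1 if x >= 0, otherwise 0."""
    return 1 if x >= 0 else 0


# Assumed assignment of the slide's numbers (only w_89 = .87 is stated):
W_79 = 0.76     # weight from input node a_7 to a_9 (assumption)
W_89 = 0.87     # weight from input node a_8 to a_9
W_BIAS = -0.92  # weight from the bias node a_0 to a_9 (assumption)


def unit_a9(a7, a8, a0=1):
    """Return (net_9, a_9) for the weighted threshold unit."""
    net_9 = a7 * W_79 + a8 * W_89 + a0 * W_BIAS
    return net_9, step(net_9)


# The case worked on the slide: a_7 = 1, a_8 = 0, bias fixed at 1.
net_9, a_9 = unit_a9(1, 0)
print(round(net_9, 2), a_9)  # -0.16 0
```

Calling `unit_a9` with the other input pairs is a quick way to check hand calculations.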

Boolean AND
[Figure: the same weighted unit, with inputs a_7 and a_8 and bias a_0 feeding a_9 = f(net_9).]
Now work out on your own the resulting activations for inputs 1,1.
net_9 = ????, so a_9 = ????
For those who have had some exposure to this stuff: what is the bias node really doing?

The picture gets yet a little more complicated when we change the activation function to 1 / (1 + e^(-net)).
[Figure: input nodes a_7 and a_8 feed unit a_9 through weights w_79 and w_89.]
a_7(w_79) = 1(.75) = .75
a_8(w_89) = .3(1.6667) = .5
net_9 = Σ_j a_j(w_j9) = .3(1.6667) + 1(.75) = 1.25
a_9 = f(net_9) = 1 / (1 + e^(-1.25)) ≈ .777
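The same calculation in Python, as a small sketch using the activations and weights given above (the variable and function names are chosen for this example only):

```python
import math


def sigmoid(net):
    """Activation function 1 / (1 + e^(-net))."""
    return 1.0 / (1.0 + math.exp(-net))


a7, w79 = 1.0, 0.75
a8, w89 = 0.3, 1.6667

# net_9 is the sum over j of a_j * w_j9.
net_9 = a7 * w79 + a8 * w89
a9 = sigmoid(net_9)
print(round(net_9, 2), round(a9, 3))  # 1.25 0.777
```

Unlike the step function, this activation returns a graded value between 0 and 1 rather than a hard 0 or 1.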

[Figure: four points (x) plotted in a two-dimensional space.]
A hypothesis space for two inputs.

We can think of the perceptron as drawing a line in the hypothesis space that separates the points. All instances of the AND function fall into linearly separable "true" and "false" regions of the space.
[Figure: the four input points, with a line separating the "true" point from the "false" points for AND.]
But how do we divide the space for XOR?
Boolean XOR (unit inputs -> unit output):
0, 0 -> 0
0, 1 -> 1
1, 0 -> 1
1, 1 -> 0
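One informal way to see the difference between AND and XOR is to search a small grid of weights for a single step unit that reproduces each truth table. The sketch below does that; the grid range and step size are arbitrary choices made here, and an empty result on a grid is only suggestive, not a proof (the standard argument that XOR is not linearly separable is what rules out every weight setting).

```python
from itertools import product


def step(x):
    """STEP activation: 1 if x >= 0, otherwise 0."""
    return 1 if x >= 0 else 0


INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = [0, 0, 0, 1]
XOR = [0, 1, 1, 0]

# Candidate values for the two input weights and the bias weight
# (an arbitrary grid from -2.0 to 2.0 in steps of 0.25).
GRID = [i / 4 for i in range(-8, 9)]


def find_unit(targets, grid=GRID):
    """Return the first (w1, w2, bias) on the grid that matches targets, else None."""
    for w1, w2, b in product(grid, repeat=3):
        outputs = [step(x1 * w1 + x2 * w2 + b) for x1, x2 in INPUTS]
        if outputs == targets:
            return (w1, w2, b)
    return None


print("AND:", find_unit(AND))
print("XOR:", find_unit(XOR))
```

The search finds a weight setting for AND and comes back empty for XOR.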

The fact that a perceptron can't represent the XOR function stopped ANN research for years (Minsky and Papert's work in 1969 was particularly damaging). Interest picked up again with work on associative memory in the early 1980s (Hopfield, 1982), and in the mid-1980s a simple but important innovation was introduced: multilayer networks with backpropagation (Rumelhart, Hinton and Williams, 1986).

Boolean XOR
[Figure: a network with unit inputs, a hidden layer, and a unit output.]
Try to figure out the weights.

From: pris.comp.nus.edu.sg/ArtificialNeuralNetworks/perceptrons.html
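For readers who want to check their own answer to the XOR exercise, here is one possible set of weights, sketched in Python. These weights are hand-picked for this example and are not taken from the slides or from the linked page: one hidden unit behaves like OR, the other like AND, and the output unit fires when OR is on but AND is off.

```python
def step(x):
    """STEP activation: 1 if x >= 0, otherwise 0."""
    return 1 if x >= 0 else 0


def xor_net(x1, x2):
    """Two-layer network with hand-picked (assumed) weights and bias weights."""
    h_or = step(1.0 * x1 + 1.0 * x2 - 0.5)    # hidden unit 1: fires if either input is on
    h_and = step(1.0 * x1 + 1.0 * x2 - 1.5)   # hidden unit 2: fires only if both are on
    return step(1.0 * h_or - 1.0 * h_and - 0.5)


for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), xor_net(x1, x2))
```

Many other weight settings work equally well; the point is only that a hidden layer makes XOR representable.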

Now we have to talk about learning. Training simply means the process by which the weights of the ANN are calculated by exposure to training data.
Supervised learning:
[Figure: the training data and the supervisor's answers are fed to the learner one datum at a time.]
This is a bit simplified. In the general case, it is possible to feed the learner batch data. But in the models we will look at in this course, data is fed one datum at a time.

[Figure: an input from the training file, the ANN's prediction based on the current weights (which haven't converged yet), and the answer from the supervisor's file.]
Ooops! The prediction doesn't match the supervisor's answer. Gotta go back and increase the weights so that the output unit fires.

Boolean OR
[Figure: input nodes a_7 and a_8 and the bias node a_0 feed unit a_9 = f(net_9).]
Let's look at how we might train an OR unit.
First: set the weights to values picked out of a hat, and the bias activation to 1.
Then: feed in 1,1. What does the network predict? The prediction is fine (f(.3) = 1), so do nothing.

[Figure: the same OR unit, with inputs a_7 and a_8 and the bias node a_0 feeding a_9.]
Now: feed in 0,1. What does the network predict?
ANN's prediction = 0 = f(-.3), but the supervisor's answer = 1 (remember, we're doing Boolean OR).
So now we've got to adjust the weights. But by how much? The modeler picks a value: η = learning rate. (Let's pick .1 for this example.)

The weights are adjusted to minimize the error rate of the ANN.
Perceptron Learning Procedure:
w_ij = old w_ij + η (Supervisor's answer - ANN's prediction), where η is the learning rate.
So, for example, if the ANN predicts 0 and the supervisor says 1:
w_ij = old w_ij + .1 (1 - 0)
I.e., all weights are increased by .1.
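The update step can be sketched in Python as follows. This follows the procedure exactly as stated above, where every weight moves by the same amount; the starting weights are hypothetical values "picked out of a hat", and `update_weights` is a name invented for this example. Note that the perceptron rule as usually given in textbooks also multiplies the change by the input activation on each connection, so weights on inactive inputs stay put; that variant is not shown here.

```python
def update_weights(weights, supervisor, prediction, rate=0.1):
    """Apply: new w = old w + rate * (supervisor's answer - ANN's prediction)."""
    delta = rate * (supervisor - prediction)
    return [w + delta for w in weights]


# Hypothetical starting weights (not from the slides).
weights = [0.2, -0.4, 0.1]

# The ANN predicts 0 but the supervisor says 1: every weight goes up by .1.
weights = update_weights(weights, supervisor=1, prediction=0)
print([round(w, 2) for w in weights])  # [0.3, -0.3, 0.2]

# When the prediction is already right, nothing changes.
print(update_weights(weights, supervisor=1, prediction=1) == weights)  # True
```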

For multilayer ANNs, the error is backpropagated through the hidden layer.
[Figure: Backpropagated error. A network with weights w_1, w_2, w_3 and w_4 and error terms e_x, e_y and e_z.]
e_y approx = w_3 (Supervisor's answer - ANN's prediction)
e_x approx = w_4 (Supervisor's answer - ANN's prediction)
e_z = w_1 e_y + w_2 e_x
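A small Python sketch of these three error terms follows. Two things here are assumptions: the pairing of e_y with w_3 and e_x with w_4 is read off the order in which the terms appear on the slide, and the numeric weights in the usage lines are made up. This is the slide's simplified approximation, not a full backpropagation implementation (there are no activation-function derivatives).

```python
def backpropagated_errors(w1, w2, w3, w4, supervisor, prediction):
    """Return (e_x, e_y, e_z) using the simplified formulas above."""
    output_error = supervisor - prediction
    e_y = w3 * output_error        # assumed pairing: e_y uses w_3
    e_x = w4 * output_error        # assumed pairing: e_x uses w_4
    e_z = w1 * e_y + w2 * e_x      # error passed one layer further back
    return e_x, e_y, e_z


# Hypothetical weights; the ANN predicted 0 and the supervisor said 1.
e_x, e_y, e_z = backpropagated_errors(0.5, -0.5, 2.0, 1.0, supervisor=1, prediction=0)
print(e_x, e_y, e_z)  # 1.0 2.0 0.5
```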

In summary:
1) Multilayer ANNs are Universal Function Approximators: they can approximate any function a modern computer can represent.
2) They learn without explicitly being told any "rules"; they simply cut up the hypothesis space by inducing boundaries.
Importantly, they are "non-symbolic" computational devices. That is, they simply multiply activations by weights.

So, what does all of this have to do with linguistics and language?
Some assumptions of "classical" language processing (roughly from Elman (1995)):
1) symbols and rules that operate over symbols (S, VP, IP, etc.)
2) static structures of competence (e.g. parse trees)
3) structure is built up
More or less, the classical viewpoint is language as algebra. ANNs make none of these assumptions, so if an ANN can learn language, then perhaps the language-as-algebra view is wrong.
We're going to discuss the pros and cons of Elman's viewpoint in some depth next week, but for now, let's go over his variation of the basic feedforward ANN that we've been addressing.

Localist representation in a standard feedforward ANN
[Figure: a row of input nodes and a row of output nodes, each labeled boy, dog, run, book, rock, see, eat, ..., connected through a layer of hidden nodes.]
Localist = each node represents a single item. If more than one output node fires, then a group of items can be considered activated.
The basic idea is to activate a single input node (representing a word) and see which group of output nodes (words) is activated.
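A localist encoding is easy to sketch in Python. The vocabulary below uses the seven node labels visible in the figure; the helper names and the 0.5 firing threshold are assumptions made for this example.

```python
# One node per word, in a fixed order.
VOCAB = ["boy", "dog", "run", "book", "rock", "see", "eat"]


def localist(word):
    """Input pattern with exactly one active node: the one standing for `word`."""
    return [1 if w == word else 0 for w in VOCAB]


def active_words(output_activations, threshold=0.5):
    """Read off the group of words whose output nodes are firing."""
    return [w for w, a in zip(VOCAB, output_activations) if a >= threshold]


print(localist("dog"))                                   # [0, 1, 0, 0, 0, 0, 0]
print(active_words([0.1, 0.0, 0.9, 0.2, 0.0, 0.7, 0.8]))  # ['run', 'see', 'eat']
```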

Elman's Simple Recurrent Network
[Figure: input and output nodes labeled boy, dog, run, book, rock, see, eat, with a hidden layer and a context (copy) layer. Legend: 1-to-1 exact copy of activations; "regular" trainable weight connections.]
1) Activate from input to output as usual (one input word at a time), but copy the hidden activations to the context layer.
2) Repeat 1 over and over, but activate from the input AND context (copy) layers to the output layer.
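The two steps above can be sketched as a single function that is called once per input word. Everything specific here is an assumption for illustration: the sigmoid activation, the absence of bias nodes, the tiny layer sizes, and the weight values. Only the control flow (hidden layer sees input plus context, then hidden activations are copied 1-to-1 into the context) follows the description.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def srn_step(x, context, w_in, w_ctx, w_out):
    """One time step: returns (output activations, new context layer).

    w_in[j] and w_ctx[j] hold the trainable weights into hidden node j from
    the input and context layers; w_out[k] holds the weights into output node k.
    """
    hidden = [
        sigmoid(sum(xi * w for xi, w in zip(x, w_in[j]))
                + sum(ci * w for ci, w in zip(context, w_ctx[j])))
        for j in range(len(w_in))
    ]
    output = [sigmoid(sum(h * w for h, w in zip(hidden, row))) for row in w_out]
    new_context = list(hidden)  # 1-to-1 exact copy of the hidden activations
    return output, new_context


# Hypothetical 2-input, 2-hidden, 1-output network.
w_in = [[0.5, -0.5], [0.1, 0.2]]
w_ctx = [[0.3, 0.0], [0.0, 0.3]]
w_out = [[1.0, -1.0]]

context = [0.0, 0.0]  # empty context before the first word
for x in [[1, 0], [1, 0]]:
    output, context = srn_step(x, context, w_in, w_ctx, w_out)
    print(output, context)
```

Feeding the same input twice gives different outputs on the two steps, because the second step also sees the copied hidden activations from the first.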

From Elman (1990)
Templates were set up and lexical items were chosen at random from "reasonable" categories.

Templates for sentence generator:
NOUN-HUM     VERB-EAT      NOUN-FOOD
NOUN-HUM     VERB-PERCEPT  NOUN-INANIM
NOUN-HUM     VERB-DESTROY  NOUN-FRAG
NOUN-HUM     VERB-INTRAN
NOUN-HUM     VERB-TRAN     NOUN-HUM
NOUN-HUM     VERB-AGPAT    NOUN-INANIM
NOUN-HUM     VERB-AGPAT
NOUN-ANIM    VERB-EAT      NOUN-FOOD
NOUN-ANIM    VERB-TRAN     NOUN-ANIM
NOUN-ANIM    VERB-AGPAT    NOUN-INANIM
NOUN-ANIM    VERB-AGPAT
NOUN-INANIM  VERB-AGPAT
NOUN-AGRESS  VERB-DESTROY  NOUN-FRAG
NOUN-AGRESS  VERB-EAT      NOUN-HUM
NOUN-AGRESS  VERB-EAT      NOUN-ANIM
NOUN-AGRESS  VERB-EAT      NOUN-FOOD

Categories of lexical items:
NOUN-HUM      man, woman
NOUN-ANIM     cat, mouse
NOUN-INANIM   book, rock
NOUN-AGRESS   dragon, monster
NOUN-FRAG     glass, plate
NOUN-FOOD     cookie, sandwich
VERB-INTRAN   think, sleep
VERB-TRAN     see, chase
VERB-AGPAT    move, break
VERB-PERCEPT  smell, see
VERB-DESTROY  break, smash
VERB-EAT      eat
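A sentence generator along these lines might look like the following Python sketch. The templates and categories are transcribed from the slide; how the flattened template list splits into rows is a reading made here. The uniform random choice of template and word, and the names `TEMPLATES`, `LEXICON` and `generate_sentence`, are assumptions for this example, since the slide does not say how the sampling was done.

```python
import random

LEXICON = {
    "NOUN-HUM": ["man", "woman"],
    "NOUN-ANIM": ["cat", "mouse"],
    "NOUN-INANIM": ["book", "rock"],
    "NOUN-AGRESS": ["dragon", "monster"],
    "NOUN-FRAG": ["glass", "plate"],
    "NOUN-FOOD": ["cookie", "sandwich"],
    "VERB-INTRAN": ["think", "sleep"],
    "VERB-TRAN": ["see", "chase"],
    "VERB-AGPAT": ["move", "break"],
    "VERB-PERCEPT": ["smell", "see"],
    "VERB-DESTROY": ["break", "smash"],
    "VERB-EAT": ["eat"],
}

TEMPLATES = [
    ("NOUN-HUM", "VERB-EAT", "NOUN-FOOD"),
    ("NOUN-HUM", "VERB-PERCEPT", "NOUN-INANIM"),
    ("NOUN-HUM", "VERB-DESTROY", "NOUN-FRAG"),
    ("NOUN-HUM", "VERB-INTRAN"),
    ("NOUN-HUM", "VERB-TRAN", "NOUN-HUM"),
    ("NOUN-HUM", "VERB-AGPAT", "NOUN-INANIM"),
    ("NOUN-HUM", "VERB-AGPAT"),
    ("NOUN-ANIM", "VERB-EAT", "NOUN-FOOD"),
    ("NOUN-ANIM", "VERB-TRAN", "NOUN-ANIM"),
    ("NOUN-ANIM", "VERB-AGPAT", "NOUN-INANIM"),
    ("NOUN-ANIM", "VERB-AGPAT"),
    ("NOUN-INANIM", "VERB-AGPAT"),
    ("NOUN-AGRESS", "VERB-DESTROY", "NOUN-FRAG"),
    ("NOUN-AGRESS", "VERB-EAT", "NOUN-HUM"),
    ("NOUN-AGRESS", "VERB-EAT", "NOUN-ANIM"),
    ("NOUN-AGRESS", "VERB-EAT", "NOUN-FOOD"),
]


def generate_sentence(rng):
    """Pick a template at random, then a random word for each category slot."""
    template = rng.choice(TEMPLATES)
    return [rng.choice(LEXICON[category]) for category in template]


rng = random.Random(0)  # fixed seed so runs are repeatable
for _ in range(5):
    print(" ".join(generate_sentence(rng)))
```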

Resulting training and supervisor files:
Training data:        woman smash plate cat move man break car boy move girl eat bread dog ...
Supervisor's answers: smash plate cat move man break car boy move girl eat bread dog move ...
Files were 27,354 words long, made up of 10,000 two- and three-word "sentences."
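In the excerpt above, the supervisor's answers are the training data shifted by one word, so each input word is paired with the word that follows it. A minimal Python sketch of building the two streams that way (the function name `make_files` is made up here, and the word list is just the excerpt from the slide):

```python
def make_files(words):
    """Pair each word with its successor: returns (training, supervisor)."""
    training = words[:-1]
    supervisor = words[1:]
    return training, supervisor


stream = ("woman smash plate cat move man break car "
          "boy move girl eat bread dog move").split()

training, supervisor = make_files(stream)
for word, answer in list(zip(training, supervisor))[:4]:
    print(word, "->", answer)
```

The sentences are simply concatenated, so some pairs cross a sentence boundary (for example "plate" followed by "cat").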