CS-424 Gregory Dudek Today’s Lecture
Neural networks
– Backprop example
Clustering & classification: case study
– Sound classification: the tapper
Recurrent nets
– Nettalk
– Glovetalk (video)
Radial basis functions
Unsupervised learning
– Kohonen nets
Not in text

CS-424 Gregory Dudek Backprop
Consider sigmoid activation functions. We can examine the output of the net as a function of the weights.
– How does the output change with changes in the weights?
– Linear analysis: consider the partial derivative of the output with respect to the weight(s). We saw this last lecture.
– If we have multiple layers, consider the effect on each layer as a function of the preceding layer(s). We propagate the error backwards through the net (using the chain rule for differentiation).
Derivation on overheads [reference: DAA p. 212]
Review
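As a concrete illustration of the chain-rule bookkeeping (a minimal sketch, not the derivation from the overheads), here is one backprop step for a tiny 2-2-1 sigmoid network; the architecture, data, and learning rate are invented for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy network: 2 inputs -> 2 hidden sigmoid units -> 1 sigmoid output.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 2))   # input-to-hidden weights
W2 = rng.normal(size=(1, 2))   # hidden-to-output weights

x = np.array([0.5, -1.0])      # one training input
t = np.array([1.0])            # its target output

# Forward pass.
h = sigmoid(W1 @ x)            # hidden activations
y = sigmoid(W2 @ h)            # network output
E = 0.5 * np.sum((y - t) ** 2) # squared error

# Backward pass: chain rule, layer by layer.
delta_out = (y - t) * y * (1 - y)             # dE/d(net input of output unit)
grad_W2 = np.outer(delta_out, h)              # dE/dW2
delta_hid = (W2.T @ delta_out) * h * (1 - h)  # error propagated back to the hidden layer
grad_W1 = np.outer(delta_hid, x)              # dE/dW1

# One gradient-descent step on the weights.
lr = 0.5
W2 -= lr * grad_W2
W1 -= lr * grad_W1
```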

CS-424 Gregory Dudek Backprop observations
We can do gradient descent in weight space. What is the dimensionality of this space?
– Very high: each weight is a free variable. There are as many dimensions as weights. A “typical” net might have hundreds of weights.
Can we find the minimum?
– It turns out that for multi-layer networks, the error space (often called the “energy” of the network) is NOT CONVEX. [so?]
– Commonest approach: multiple-restart gradient descent, i.e. try learning from various random initial weight distributions.
Review
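To make the multiple-restart idea concrete, here is a small self-contained sketch: gradient descent on a deliberately non-convex one-dimensional "error" function (a stand-in for a network's error surface), restarted from several random initial weights, keeping the best local minimum found. The function, step size, and schedule are invented for illustration.

```python
import numpy as np

def error(w):
    # A non-convex error surface with many local minima.
    return np.sin(3 * w) + 0.1 * w ** 2

def grad(w):
    return 3 * np.cos(3 * w) + 0.2 * w

rng = np.random.default_rng(1)
best_w, best_E = None, float("inf")
for restart in range(10):
    w = rng.uniform(-5, 5)       # random initial "weight"
    for _ in range(200):         # plain gradient descent to a local minimum
        w -= 0.01 * grad(w)
    if error(w) < best_E:        # keep the best restart
        best_w, best_E = w, error(w)
print(best_w, best_E)
```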

CS-424 Gregory Dudek Success? Stopping?
We have a training algorithm (backprop). We might like to ask:
– 1. Have we done enough training (yet)?
– 2. How good is our network at solving the problem?
– 3. Should we try again to learn the problem (from the beginning)?
The first two questions have standard answers:
– Can’t just look at energy. Why not? Because we want to GENERALIZE across examples. “I understand multiplication: I know 3*6=18, 5*4=20.” What’s 7*3? Hmmmm.
– Must have additional examples to validate the training. Separate the input data into two sets: a training set and a testing set. Can also use cross-validation.
Review
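A minimal sketch of the hold-out idea, with made-up data and an 80/20 split chosen only for illustration: train on one subset, measure error only on examples the network never saw.

```python
import numpy as np

rng = np.random.default_rng(2)
inputs = rng.normal(size=(100, 4))            # toy data: 100 examples, 4 features
targets = (inputs.sum(axis=1) > 0).astype(float)

order = rng.permutation(len(inputs))          # shuffle, then split 80/20
train_idx, test_idx = order[:80], order[80:]
train_x, train_y = inputs[train_idx], targets[train_idx]
test_x, test_y = inputs[test_idx], targets[test_idx]
# ... train on (train_x, train_y), then report error on (test_x, test_y) only.
```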

CS-424 Gregory Dudek What can we learn?
Any mapping from input units to output units can be learned if we have enough hidden units with the right weights!
In practice, many weights means difficulty. The right representation is critical! Generalization depends on bias.
– The hidden units form an internal representation of the problem; we want them to learn something general.
– Bad example: one hidden unit learns exactly one training example. We want to avoid learning by table lookup.

CS-424 Gregory Dudek Representation
Much learning can be equated with selecting a good problem representation.
– If we have the right hidden layer, things become easy.
Consider the problem of face recognition from photographs. Or fingerprints.
– Digitized photos: a big array (256x256 or 512x512) of intensities.
– How do we match one array to another? (Either manually or by computer.)
– Key: measure important properties and use those as the criteria for estimating similarity.

CS-424 Gregory Dudek Faces (an example)
What is an important property to measure for faces?
– Eye distance?
– Average intensity? BAD!
– Nose width?
– Forehead height?
These measurements form the basis functions for describing faces.
– BUT NOT NECESSARILY photographs!!! We don’t need to reconstruct the photo. Some information is not needed.

CS-424 Gregory Dudek Radial basis functions
Use “blobs” summed together to create an arbitrary function.
– A good kind of blob is a Gaussian: circular, variable width, and easily generalized to 2D, 3D, ...
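A minimal sketch of this idea in 1-D: the approximation is just a weighted sum of Gaussian blobs. The centres, widths, and weights below are invented for illustration.

```python
import numpy as np

def rbf(x, centres, widths, weights):
    # Weighted sum of Gaussian "blobs": one centre/width/weight per blob.
    x = np.atleast_1d(x)[:, None]
    phi = np.exp(-((x - centres) ** 2) / (2 * widths ** 2))
    return phi @ weights

centres = np.array([-2.0, 0.0, 2.0])
widths  = np.array([0.8, 0.8, 0.8])
weights = np.array([1.0, -0.5, 2.0])
print(rbf(np.linspace(-4, 4, 5), centres, widths, weights))
```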

CS-424 Gregory Dudek Inductive bias?
Where’s the inductive bias?
– In the topology and architecture of the network.
– In the learning rules.
– In the input and output representation.
– In the initial weights.
Not in text

CS-424 Gregory Dudek Backprop demo
Consider the problem of learning to recognize handwritten digits.
– Each digit is a small picture.
– Sample locations in the picture: these are pixels (picture elements).
– A grid of pixels can be used as the input to a network.
– Each digit is a training example.
– Use multiple output units, one to classify each possible digit.
Not in text
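A sketch of this input/output encoding: each digit image is a grid of pixels flattened into one input vector, with one output unit per digit class. The 8x8 grid, hidden-layer size, and random "image" are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(4)
image = rng.random((8, 8))          # a toy 8x8 pixel grid standing in for a digit
x = image.flatten()                 # 64 input units, one per pixel

n_inputs, n_hidden, n_outputs = 64, 20, 10
W1 = rng.normal(size=(n_hidden, n_inputs)) * 0.1
W2 = rng.normal(size=(n_outputs, n_hidden)) * 0.1

y = sigmoid(W2 @ sigmoid(W1 @ x))   # one activation per digit class 0-9
predicted_digit = int(np.argmax(y)) # pick the most active output unit
print(predicted_digit)
```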

CS-424 Gregory Dudek Clustering & recognition
How do we recognize things?
– By learning salient features. That’s what the hidden layer is doing.
Another aspect of this is clustering together data that is associated.
1. What do you observe (what features do you extract)?
2. How do you measure similarity in “feature space”?
3. What do you measure with respect to?
One approach to part 3 is to define “exemplars”.
Not in text
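A minimal sketch of the exemplar idea: describe each object by a small feature vector and assign it to the nearest stored exemplar in feature space. The features, exemplars, and class names are invented for illustration.

```python
import numpy as np

# One stored exemplar (a feature vector) per category.
exemplars = {
    "class_A": np.array([0.2, 1.0]),
    "class_B": np.array([0.9, 0.1]),
}

def classify(features):
    # Euclidean distance in feature space; the closest exemplar wins.
    return min(exemplars, key=lambda k: np.linalg.norm(features - exemplars[k]))

print(classify(np.array([0.8, 0.2])))   # -> "class_B"
```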

CS-424 Gregory Dudek The tapper: a case study See overheads….. Not in text

CS-424 Gregory Dudek Temporal features?
Question: how do you deal with time-varying inputs? We can represent time explicitly or implicitly.
The tapper represented time explicitly by recording a signal as a function of time.
– Analysis is then static, on a signal f(t) where t happens to be time (or, in the discrete domain, on a vector).
– Shortcomings?
Not in text
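A small sketch of the explicit representation: slice the recorded signal into fixed-length windows and treat each window as one static input vector. The window length and sine-wave "signal" are made up for illustration.

```python
import numpy as np

signal = np.sin(np.linspace(0, 10, 200))    # a sampled signal f(t)
window = 25                                 # samples per input vector

# Each row is one static input vector presented to the network.
inputs = np.stack([signal[i:i + window]
                   for i in range(0, len(signal) - window, window)])
print(inputs.shape)
```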

CS-424 Gregory Dudek Difficulties of explicit time
It is storage intensive (more units; more of the signal must be stored and manipulated).
It is “weight intensive”: more units means more weights.
You must determine a priori how large the temporal sampling window must be!
So what? More weights -> harder training (less bias).
Not in text

CS-424 Gregory Dudek Topology changes
Can we get by with fewer connections?
When every neuron in one layer is connected to every neuron in the next layer, we call the network fully connected.
What if we allow signals to flow backwards to a preceding layer? Recurrent networks.

CS-424 Gregory Dudek Recurrent nets
Simplest instance:
– Allow signals to flow in various directions between layers.
Can now trivially compute f(t) = f(x(t)) + 0.5 f(t-1).
Time is implicit in the topology and weights. Difficult to avoid blurring in time, however.
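A toy sketch of a single recurrent unit computing a recurrence of the form on the slide, out(t) = f(x(t)) + 0.5 * out(t-1): the 0.5-weighted feedback edge carries the previous output back in, so time is implicit in the weights. The sigmoid activation and input sequence are invented for the example.

```python
import numpy as np

def f(x):
    return 1.0 / (1.0 + np.exp(-x))   # activation applied to the current input

xs = [0.3, -1.2, 0.7, 0.0]            # input sequence x(t)
out = 0.0                             # f(t-1) starts at zero
for x in xs:
    out = f(x) + 0.5 * out            # recurrence with feedback weight 0.5
    print(out)
```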

CS-424 Gregory Dudek Example: NETtalk
Learn to read English aloud: the output is audio speech.
First-grade text, before & after training; dictionary text, getting progressively better.

CS-424 Gregory Dudek Example: Glovetalk
Input: gestures. Output: speech.
Akin to reading sign language, but HIGHLY simplified.
– Input is encoded as electrical signals from, essentially, a Nintendo Power Glove: various joint angles as a function of time.
See video.

CS-424 Gregory Dudek Unsupervised learning
Given a set of input examples, can we categorize them into meaningful groups? This appears to be a common aspect of human intelligence.
– A precursor to recognition, hypothesis formation, etc.
Several mechanisms can be used.
– The tapper illustrated an essentially statistical approach.
Supervised?
– Nice methods based on Bayes rule.

CS-424 Gregory Dudek Kohonen nets
AKA Kohonen feature maps.
A graph G that encodes learning.
– Think of the vertices as embedded in a metric space, e.g. the plane.
– Inputs are represented as points in an N-dimensional space, where N is the number of degrees of freedom (number of parameters).
– Every node is connected to all inputs, with one weight on every edge.

CS-424 Gregory Dudek Kohonen learning
For each training example s_i:
– Compute the distance d between s_i and each node n_j (with weight vector w_j): d = sqrt( (s_i - w_j)(s_i - w_j)^T ), i.e. the Euclidean distance ||s_i - w_j||.
– Find the node n with minimum distance.
– For each node n’ close to n, i.e. D(n, n’) < K, update its weights: w_j = w_j + c(s_i - w_j).
Make c and K smaller over time.

CS-424 Gregory Dudek Kohonen parameters
Need two functions:
– (Weighted) distance from nodes to inputs
– Distance between nodes
Two constants:
– A threshold K on inter-node distance
– A learning rate c
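A minimal self-contained sketch of this update rule on a small 5x5 grid of nodes with 2-D inputs. The grid size, data, and decay schedule are invented; c is the learning rate and K the inter-node distance threshold from the slides.

```python
import numpy as np

rng = np.random.default_rng(3)
grid = np.array([(i, j) for i in range(5) for j in range(5)])  # node positions in the plane
W = rng.random((len(grid), 2))                                 # one weight vector per node
data = rng.random((500, 2))                                    # 2-D training inputs s_i

c, K = 0.5, 3.0
for epoch in range(20):
    for s in data:
        d = np.linalg.norm(W - s, axis=1)        # distance from s to every node's weights
        winner = np.argmin(d)                    # node with minimum distance
        near = np.linalg.norm(grid - grid[winner], axis=1) < K  # nodes within K of the winner
        W[near] += c * (s - W[near])             # pull their weights toward the input
    c *= 0.9                                     # make c and K smaller over time
    K *= 0.9
```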

CS-424 Gregory Dudek Kohonen example

CS-424 Gregory Dudek Kohonen: problems? We had to choose the features in advance.