Neural Nets: Something you can use and something to think about
Cris Koutsougeras
- What are neural nets
- What are they good for
- Pointers to some models and formulations
- Methods that you can use
- Fun and open issues

What are they good for
Given a system of interest (a black box) and a finite set of observations (samples) of its input-output behavior, can I figure out what function the box performs?
- System identification
- Prediction and forecasting
- Controls; auto-generating a simulator
- Dealing with cases where only samples of the function of interest are known

Why Neural Nets
- The target function is unknown except from samples
- The target is known but is very hard to describe in finite terms (e.g. a closed-form expression)
- The target function is non-deterministic
Something to think about

General Structure of a NN (figure: layers of nodes connecting the inputs to the outputs)

Function Approximation
- The output is a composition of the various node functions.
- The output is parametric on the inputs; the weights and thresholds are the parameters.
- Bottom line: neural nets are function approximators.

Nets perform functional compositions:
Net output = a nested composition of the node functions (spelled out on the next slide)

The output is a complex function of the inputs. The complexity comes from the deep nesting of the typical neuron functions:
Y = f(f(f(x_1, x_2, ..., x_n), f(x_1, x_2, ..., x_n), ...), f(f(x_1, x_2, ..., x_n), ...), ...)
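The composition above is just repeated function application. The sketch below, with an assumed tanh node function and arbitrary layer sizes and random weights (none of which come from the slides), shows a net output computed as a nesting of one reusable node function:

```python
# A minimal sketch of a feed-forward net as a nested composition of node
# functions. The layer sizes, tanh activation, and random weights are
# illustrative assumptions, not part of the original slides.
import numpy as np

def f(x, W, b):
    """One layer: an affine map followed by a nonlinearity."""
    return np.tanh(W @ x + b)

rng = np.random.default_rng(0)
x = rng.normal(size=3)                          # inputs x1, x2, x3

# Two hidden layers and one output node: Y = f3(f2(f1(x)))
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 4)), rng.normal(size=4)
W3, b3 = rng.normal(size=(1, 4)), rng.normal(size=1)

Y = f(f(f(x, W1, b1), W2, b2), W3, b3)          # deep nesting of node functions
print(Y)
```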

The net’s function is an “elastic” curve. By adjusting the weights (the curve’s parameters) the curve is made to fit the samples. Adjusting the weights is the key issue.

How does it work
- We have a sample set S = {(x_1, t_1), (x_2, t_2), ..., (x_n, t_n)}.
- We have the net producing y_i = f(x_i, W).
- We define a quality measure Q(W) that involves f and the targets t_i.
- We adjust W iteratively, ΔW = -a ∇_W Q, until Q is optimized.
- A convenient Q is usually the mean square error.
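As a concrete sketch of this loop, the snippet below trains a single linear neuron by gradient descent on a mean-square-error Q; the sample data, learning rate, and the neuron itself are illustrative assumptions rather than anything specified on the slides:

```python
# Gradient descent on Q(W) = mean square error over samples (x_i, t_i),
# updating W by Delta W = -a * grad_W Q. Data and model are made up.
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(50, 2))             # sample inputs x_i
t = 2.0 * X[:, 0] - 0.5 * X[:, 1] + 0.3          # sample targets t_i
W = np.zeros(3)                                  # two weights plus a threshold (bias)
a = 0.1                                          # learning rate

def y(X, W):                                     # net output y_i = f(x_i, W)
    return X @ W[:2] + W[2]

for _ in range(500):
    err = y(X, W) - t                            # residuals y_i - t_i
    Q = np.mean(err ** 2)                        # quality measure Q(W) = MSE
    grad = 2.0 * np.array([np.mean(err * X[:, 0]),   # analytic gradient of Q w.r.t. W
                           np.mean(err * X[:, 1]),
                           np.mean(err)])
    W -= a * grad                                # Delta W = -a * grad_W Q
print(W, Q)
```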

How does it work (figure: the target samples and the net’s fitted function) Something you can use

Nonlinear regression
The quality function is the sum of the individual errors. Minimizing the error is like stretching the curve to fit the samples.
Problem: How do we know that we are done?

Nonlinear regression

Problems
- Not enough non-linearity to fit, or
- Overfitting
Need for the minimal nonlinearity that can accomplish the fitting

Gradient descent can get stuck (figure: total error Q plotted over the weight space, showing local minima)

Simulated Annealing
Turn the step size a (from ΔW = -a ∇_W Q) into a function of time: start with very large values and gradually reduce it.
Theorem: if a is reduced at a slow enough rate, the probability of landing at the global minimum asymptotically tends to 1.
Something you can use

Simulated Annealing
By starting with lots of energy and reducing it slowly enough, the probe will eventually have enough energy to jump out of local minima but not out of the global one. If it remains long enough in that energy range, it will get trapped in the global minimum’s basin.
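A minimal sketch of this idea on a toy one-dimensional error surface is below; the surface, cooling schedule, and Metropolis-style acceptance rule are illustrative assumptions, but they show the probe escaping local minima while the temperature is high and settling near the global minimum as it cools:

```python
# Simulated annealing on a toy 1-D error surface with local minima.
# The "temperature" T starts large and is reduced slowly so the probe can
# escape local minima early on. Surface and schedule are made up.
import math, random

def Q(w):                                    # toy total-error surface
    return 0.1 * w * w + math.sin(3 * w)     # global minimum near w ~ -0.5

random.seed(0)
w, T = 5.0, 10.0                             # start far away, with high "energy"
while T > 1e-3:
    w_new = w + random.gauss(0, 0.5)         # random probe move
    dQ = Q(w_new) - Q(w)
    # Always accept downhill moves; accept uphill moves with prob exp(-dQ/T)
    if dQ < 0 or random.random() < math.exp(-dQ / T):
        w = w_new
    T *= 0.999                               # slow cooling schedule
print(w, Q(w))
```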

Let’s have some fun
What network structure do we need? In particular, how many nodes?

Let’s have some fun

(figure: network with inputs x_i, hidden-layer outputs V_rj, and output-layer weights W_rj feeding output node Y_j)
Y_j = F(Σ_r W_rj V_rj). So: Σ_r W_rj V_rj = F^{-1}(Y_j), i.e. r linear equations.
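To illustrate why this matters, the sketch below inverts an assumed sigmoid node function F and recovers the output-layer weights by solving the resulting linear system with ordinary least squares; the hidden activations and weights are synthetic assumptions:

```python
# If the output node computes Y = F(sum_r W_r * V_r), applying F^{-1} to the
# targets turns the output-layer weights into the solution of linear equations.
# The sigmoid F, the random hidden activations V, and least squares as the
# solver are illustrative assumptions.
import numpy as np

def F(z):                                      # node function (sigmoid)
    return 1.0 / (1.0 + np.exp(-z))

def F_inv(y):                                  # its inverse (logit)
    return np.log(y / (1.0 - y))

rng = np.random.default_rng(2)
V = rng.uniform(-1, 1, size=(100, 5))          # hidden-layer outputs (100 samples, r = 5)
W_true = rng.normal(size=5)
Y = F(V @ W_true)                              # outputs produced by some output node

# Sum_r W_r V_r = F^{-1}(Y): one linear equation per sample, solved for W
W_est, *_ = np.linalg.lstsq(V, F_inv(Y), rcond=None)
print(np.allclose(W_est, W_true))              # recovers the output weights
```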

Our Framework

New Class of Training Algorithms
We conclude that after proper training (by any method) all intermediate normalized vectors Y project to the same point in the direction of W. Thus all Y’s are aligned on a plane that is perpendicular to W.
New class of algorithms:
– Find weights for the hidden layer that align all Y’s on a plane
– W for the output layer is the normal to that plane

One such Algorithm (figure: hidden-layer vectors Y_i lying at distances d_i from the plane with normal W)
Minimize the sum of the distances d_i of the Y_i from the plane, which is parametric on all weights. Thus use it as the quality function and perform a gradient descent.
Something you can use
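The sketch below is one possible reading of this algorithm, not the authors’ exact formulation: the hidden vectors are augmented with the target (assuming a linear output node, so the target itself is the extra coordinate; with a sigmoid output one would use F^{-1}(t)), and an assumed finite-difference gradient descent shrinks the sum of squared distances of those vectors from their best-fitting plane:

```python
# Align the augmented hidden vectors Y_i on a common plane by minimizing the
# sum of squared distances d_i from the best-fitting plane. The tiny network,
# tanh hidden nodes, and the numerical gradient are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(40, 2))               # input samples
t = np.sin(X[:, 0]) + X[:, 1]                      # targets

def hidden(X, Wh):                                  # hidden-layer vectors
    return np.tanh(X @ Wh)

def Q(params):
    Wh = params.reshape(2, 3)
    Y = np.column_stack([hidden(X, Wh), t])        # append target as an extra coordinate
    Y = Y - Y.mean(axis=0)                         # distances measured from the centroid plane
    # Sum of squared distances to the best-fitting plane = smallest singular
    # value squared; the plane's normal plays the role of the output-layer W.
    _, s, _ = np.linalg.svd(Y, full_matrices=False)
    return s[-1] ** 2

params = rng.normal(size=6)
a, eps = 0.05, 1e-5
for _ in range(200):                               # gradient descent, numerical gradient
    grad = np.array([(Q(params + eps * e) - Q(params - eps * e)) / (2 * eps)
                     for e in np.eye(6)])
    params -= a * grad
print(Q(params))                                   # alignment error after training
```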

Open Questions:
- What is the minimum number of neurons needed?
- What is the minimum nontrivial rank that the system can assume? This determines the number of neurons in the intermediate layer.

Interesting Results
- The local activation functions must be nonlinear for the hidden layer but not for the output layer. We thus arrive at the same result as Kolmogorov’s theorem.
- The solvability of the linear system proves universal approximation with only one hidden layer being necessary.
- The minimum nontrivial rank of the matrix provides the number of hidden-layer neurons necessary for proper fitting.
- Problem: the matrix is parametric and we have no effective method for computing the lowest (nontrivial) rank.
- We came up with other characterizations based on the Vapnik-Chervonenkis dimension and PAC learning.
- However, the problem of a precise optimum number of hidden-layer neurons is by and large still open. (Something to think about)

Clustering Models (pattern recognition/classification) Neuron functions represent “discriminant functions” that can be used to construct borders among classes.

Linear neurons (thresholding)
Output = 1 if F(w_1 x_1 + w_2 x_2 + ... + w_n x_n) > T
Output = 0 if F(w_1 x_1 + w_2 x_2 + ... + w_n x_n) < T
(figure: node F with inputs x_1 ... x_n, weights W_1 ... W_n, and threshold T)
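A direct transcription of this thresholding neuron, with made-up example weights and threshold, might look like:

```python
# Thresholding (linear) neuron: output 1 when the weighted sum of the inputs
# exceeds the threshold T, otherwise 0. Weights and T are made-up examples.
import numpy as np

def linear_neuron(x, w, T):
    """Fire (output 1) iff w1*x1 + ... + wn*xn > T."""
    return 1 if np.dot(w, x) > T else 0

w = np.array([0.5, -0.2, 1.0])
T = 0.4
print(linear_neuron(np.array([1.0, 0.0, 0.5]), w, T))   # 0.5 + 0.0 + 0.5 = 1.0 > 0.4 -> 1
print(linear_neuron(np.array([0.0, 1.0, 0.2]), w, T))   # -0.2 + 0.2 = 0.0 <= 0.4 -> 0
```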

Radial Basis
Output = 1 if (w_1 - x_1)^2 + (w_2 - x_2)^2 + ... + (w_n - x_n)^2 > R^2
Output = 0 if (w_1 - x_1)^2 + (w_2 - x_2)^2 + ... + (w_n - x_n)^2 < R^2
(figure: node F with inputs x_1 ... x_n, centre weights W_1 ... W_n, and radius R)
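And the corresponding radial-basis neuron, keeping the slide’s output-1-outside convention (many RBF formulations use the opposite convention, or a smooth Gaussian of the distance), with made-up centre and radius:

```python
# Radial-basis neuron: compare the squared distance between input x and centre
# w against R^2. Convention from the slide: output 1 outside the ball.
import numpy as np

def radial_neuron(x, w, R):
    """Output 1 iff (w1-x1)^2 + ... + (wn-xn)^2 > R^2, else 0."""
    return 1 if np.sum((w - x) ** 2) > R ** 2 else 0

w = np.array([0.0, 0.0])          # centre
R = 1.0                           # radius
print(radial_neuron(np.array([0.3, 0.4]), w, R))   # distance 0.5 < 1 -> 0
print(radial_neuron(np.array([1.2, 0.9]), w, R))   # distance 1.5 > 1 -> 1
```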