Introduction to Neural Networks
Debrup Chakraborty
Pattern Recognition and Machine Learning, 2006


To be covered today: Introduction, Perceptron Algorithm, Multilayered Perceptrons.
Reference: Haykin, S., Neural Networks: A Comprehensive Foundation, Prentice Hall, Upper Saddle River, NJ, 1999.

The Biological Neuron
The human brain is made of about 100 billion such neurons.

Characteristics of Biological Neural Networks
1) Massive connectivity
2) Nonlinear, parallel, robust and fault tolerant
3) Capability to adapt to the surroundings
4) Ability to learn and generalize from known examples
5) Collective behavior differs from individual behavior
Artificial Neural Networks mimic some of the properties of biological neural networks.

Some Properties of Artificial Neural Networks
Assembly of simple processors
Information stored in the connections – no separate memory
Massively parallel
Massive connectivity
Fault tolerant
Learning and generalization ability
Robust
Individual dynamics different from group dynamics
All of these properties may not be present in a particular network.

Network Characteristics
A neural network is characterized by:
1) Architecture
2) Learning (update scheme of the weights and/or outputs)
Architecture:
Layered (single/multiple): feed-forward – MLP, RBF
Recurrent: at least one feedback loop – Hopfield network
Competitive: a p-dimensional array of neurons with a set of nodes supplying input to each element of the array – LVQ, SOFM

Learning
Supervised: in the presence of a teacher
Unsupervised or self-organized: no teacher
Reinforcement: trial and error; no teacher, but the network can assess the situation through reinforcement signals.

Model of an Artificial Neuron
Input vector: u^T = (u_1, u_2, …, u_N)
Weight vector: w^T = (w_1, w_2, …, w_N)
The neuron computes f(v), where v = w^T u is the net input and f is an activation function.
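A minimal sketch in Python of how such a neuron computes its output; the bias term b and the particular sigmoid used as the default activation are illustrative choices, not taken from the slide.

```python
import numpy as np

def neuron_output(u, w, b=0.0, f=lambda v: 1.0 / (1.0 + np.exp(-v))):
    """Compute f(w^T u + b) for input vector u and weight vector w."""
    v = np.dot(w, u) + b          # net input: weighted sum of the inputs (plus bias)
    return f(v)                   # activation function applied to the net input

# Example: a 3-input neuron with a sigmoid activation
u = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.3])
print(neuron_output(u, w))
```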

Activation Functions
1) Threshold function: f(v) = 1 if v ≥ 0; 0 otherwise
2) Piecewise-linear function: f(v) = 1 if v ≥ ½; v if −½ < v < ½; 0 otherwise
3) Sigmoid function: f(v) = 1/(1 + exp(−a·v)), etc.
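The three activation functions above, implemented directly as defined on the slide (a sketch; the function names are mine):

```python
import numpy as np

def threshold(v):
    """Threshold: f(v) = 1 if v >= 0, else 0."""
    return np.where(v >= 0, 1.0, 0.0)

def piecewise_linear(v):
    """Piecewise-linear: 1 if v >= 1/2, v if -1/2 < v < 1/2, 0 otherwise."""
    return np.where(v >= 0.5, 1.0, np.where(v > -0.5, v, 0.0))

def sigmoid(v, a=1.0):
    """Sigmoid: f(v) = 1 / (1 + exp(-a*v)); the parameter a controls the slope."""
    return 1.0 / (1.0 + np.exp(-a * v))

v = np.linspace(-2, 2, 5)
print(threshold(v), piecewise_linear(v), sigmoid(v), sep="\n")
```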

Perceptron Learning Algorithm
Assume we are given a data set X = {(x_1, y_1), …, (x_l, y_l)}, where x ∈ R^n and y ∈ {1, −1}.
Assume X is linearly separable, i.e. there exist w and b such that (w^T x_i + b) y_i > 0 for all i.
Classifying X means finding such a w and b, i.e. (w^T x_i + b) y_i > 0 for all i.
A perceptron can classify X in a finite number of steps.

Separating hyperplane

Linearly Separable
OR, AND and NOT are linearly separable Boolean functions.
XOR is not linearly separable.

Perceptron Learning Algorithm (Contd.)
net_i = w^T x_i
f(net_i) = 1 if net_i > 0; −1 otherwise
Starting with w(0) = 0, we follow the learning rule:
w(t+1) = w(t) + α y_i x_i for each misclassified point x_i
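A sketch of this learning rule in Python; absorbing the bias b into w by appending a constant 1 to each input, and the stopping criterion, are implementation choices of mine.

```python
import numpy as np

def train_perceptron(X, y, alpha=1.0, max_epochs=100):
    """Perceptron learning: w <- w + alpha * y_i * x_i for each misclassified x_i.
    X: (l, n) data matrix; y: labels in {+1, -1}."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # absorb the bias b into w
    w = np.zeros(Xb.shape[1])                        # w(0) = 0
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(Xb, y):
            if yi * np.dot(w, xi) <= 0:              # misclassified (or on the boundary)
                w += alpha * yi * xi
                errors += 1
        if errors == 0:                              # all points correctly classified
            break
    return w

# Example: the linearly separable AND function, with labels in {+1, -1}
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
print(train_perceptron(X, y))
```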

The Multilayered Perceptron
MLPs are layered feed-forward networks in which the n-th layer is fully connected to the (n+1)-th layer. They are widely used for learning input-output mappings from data, with applications across science and engineering. Each node in an MLP behaves like a perceptron with a sigmoidal activation function.

Multilayered Perceptrons (Contd.)
An MLP can, in principle, learn (approximate) any continuous input-output mapping.
Suppose we have a training set X = {(x_1, y_1), …, (x_n, y_n)}, where x ∈ R^p and y ∈ R^q. There is an unknown functional relationship between x and y, say y = F(x). Our objective is to learn F, given X.

Multilayered Perceptrons (Contd.)
When an input vector is presented to an MLP, the network computes a function of it. The function F* which the MLP computes has the weights and biases of all the nodes as parameters. Let W be a vector containing all the weights and biases of the MLP; the MLP then computes the function F*(W, x). Our objective is to find a W which minimizes
E = ½ Σ_i ||F*(W, x_i) − y_i||²

The Gradient Descent Algorithm
Let w = (w_1, …, w_N)^T be a vector of N adjustable parameters and let J(w) be a scalar cost function with the following properties:
1) Smoothness: J(w) is twice differentiable with respect to any pair (w_i, w_j) for 1 ≤ i, j ≤ N.
2) Existence of a solution: at least one parameter vector w_opt = (w_1,opt, …, w_N,opt)^T exists such that
a) the gradient of J vanishes at w_opt, i.e. ∂J/∂w_i = 0 at w = w_opt for all i, and
b) the N × N Hessian matrix H(w), with entries h_ij(w) = ∂²J/∂w_i ∂w_j, is positive definite at w = w_opt.

The Gradient Descent Algorithm (Contd.)
The minimizer of J can be found by the iteration
w(k+1) = w(k) − η(k) ∇J(w(k)),
where w(0) is any initial parameter vector and η(k) is a sequence of positive step sizes. This optimization procedure may only reach a local minimum of the cost function J.
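A minimal sketch of this iteration with a constant step size η (the slides allow a sequence η(k)); the example cost function is hypothetical.

```python
import numpy as np

def gradient_descent(grad_J, w0, eta=0.1, num_steps=200):
    """Iterate w(k+1) = w(k) - eta * grad J(w(k)) from the starting point w0."""
    w = np.asarray(w0, dtype=float)
    for _ in range(num_steps):
        w = w - eta * grad_J(w)
    return w

# Example: J(w) = ||w - c||^2 has gradient 2*(w - c) and minimizer w = c
c = np.array([1.0, -2.0])
print(gradient_descent(lambda w: 2.0 * (w - c), w0=[0.0, 0.0]))
```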

Training the MLP
The weights of an MLP which minimize the error E can also be found by the gradient descent algorithm. Applied to an MLP, this method is called backpropagation, which has two passes:
Forward pass: the output of the network is calculated.
Backward pass: the weights are updated according to the error.
Modes of update: batch update, online update.
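A sketch of batch-mode backpropagation for a single hidden layer of sigmoidal nodes, minimizing the error E defined earlier; the layer sizes, learning rate, and the XOR example are illustrative assumptions, not values from the slides.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def train_mlp(X, Y, q=4, eta=0.5, epochs=5000, seed=0):
    """Train a p -> q -> m MLP by batch backpropagation on E = 1/2 * sum ||F*(W,x_i) - y_i||^2."""
    rng = np.random.default_rng(seed)
    p, m = X.shape[1], Y.shape[1]
    W1 = rng.normal(scale=0.5, size=(p, q)); b1 = np.zeros(q)   # input -> hidden weights
    W2 = rng.normal(scale=0.5, size=(q, m)); b2 = np.zeros(m)   # hidden -> output weights
    for _ in range(epochs):
        # Forward pass: compute the network output
        H = sigmoid(X @ W1 + b1)
        O = sigmoid(H @ W2 + b2)
        # Backward pass: propagate the error and update the weights (batch mode)
        dO = (O - Y) * O * (1 - O)               # delta at the output layer
        dH = (dO @ W2.T) * H * (1 - H)           # delta at the hidden layer
        W2 -= eta * H.T @ dO; b2 -= eta * dO.sum(axis=0)
        W1 -= eta * X.T @ dH; b1 -= eta * dH.sum(axis=0)
    return W1, b1, W2, b2

# Example: learn XOR, which a single perceptron cannot represent.
# If training stalls, a different seed or eta may help (gradient descent can stop in a local minimum).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)
W1, b1, W2, b2 = train_mlp(X, Y)
print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))
```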

Multilayered Perceptron (Contd.)
Some important issues:
How big should my network be? No specific answer is known to date. The size of the network depends on the complexity of the problem at hand and on the training accuracy desired. Good training accuracy does not always mean a good network: if the number of free parameters of the network is almost the same as the number of data points, the network tends to memorize the data and generalizes badly.
How many hidden layers should be used? It has been proved that a single hidden layer (with enough nodes) is sufficient for any such mapping task, but experience shows that multiple hidden layers may sometimes simplify learning.

Can a trained network generalize to all data points? No, it can generalize only to data points which lie within the region covered by the training sample. The output of an MLP is never reliable on data points far away from the training sample.
Can I get the explicit functional form of the relationship in my data from the trained MLP? No. One may write out a functional form of nested sigmoids, but it will (in almost all cases) be far from useful. MLPs are black boxes: one cannot easily retrieve the rules which govern the input-output mapping from a trained MLP.

More on Generalization
A network is said to generalize well if it produces correct output (or nearly so) for input data points never used to train the network. The training of an MLP may be viewed as a curve-fitting problem; the network performs useful generalization (interpolation) because MLPs with continuous activation functions produce continuous outputs. If an MLP has too many free parameters compared to the diversity in the data, the network may tend to memorize the training data.
Generalization ability depends on:
1) Representativeness of the training set
2) The architecture of the network
3) The complexity of the problem

Some Applications
1) Function approximation
2) Classification, e.g. land cover classification of remotely sensed images, optical character recognition, and many more
3) Dimensionality reduction

Function Approximation
[Diagram: an input x enters a system S, which produces an output y.]
The system S can be any type of system with numerical input and output.

Classification
Classifiers are functions of a special type which do not produce numerical outputs but class labels: D: R^p → N_pc.
The class labels can be numerically coded, so an MLP may be used to learn a classification problem.
Example: three different classes may be coded as Class 1 – (1, 0, 0), Class 2 – (0, 1, 0), Class 3 – (0, 0, 1).
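A small sketch of such a numerical coding (one-of-c, or "one-hot", coding); the function name is mine.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Numerically code class labels as target vectors for an MLP output layer."""
    codes = np.zeros((len(labels), num_classes))
    codes[np.arange(len(labels)), labels] = 1.0
    return codes

# Three classes coded as (1,0,0), (0,1,0), (0,0,1)
print(one_hot([0, 1, 2, 1], num_classes=3))
```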

Dimensionality Reduction by MLP
Both the input and output layers contain p nodes and the hidden layer contains q nodes, with q < p. A pattern x = (x_1, …, x_p) is presented to the network with the same x as target. If the output of the hidden layer of the trained network is tapped, we get a transformed set of feature vectors y ∈ R^q. These feature vectors y are, however, not interpretable. There can be other approaches too!
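A minimal sketch of this idea, assuming a single hidden layer of q sigmoidal nodes and a linear output layer (the linear output is my simplification); after training with the input as its own target, the tapped hidden-layer outputs are returned as the reduced features.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def autoencoder_features(X, q=2, eta=0.1, epochs=5000, seed=0):
    """Train a p -> q -> p network with target equal to the input (q < p),
    then tap the hidden layer to obtain q-dimensional feature vectors."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    W1 = rng.normal(scale=0.1, size=(p, q)); b1 = np.zeros(q)
    W2 = rng.normal(scale=0.1, size=(q, p)); b2 = np.zeros(p)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)                 # hidden (compressed) representation
        O = H @ W2 + b2                          # reconstruction of the input
        dO = O - X                               # reconstruction error
        dH = (dO @ W2.T) * H * (1 - H)
        W2 -= eta * H.T @ dO / len(X); b2 -= eta * dO.mean(axis=0)
        W1 -= eta * X.T @ dH / len(X); b1 -= eta * dH.mean(axis=0)
    return sigmoid(X @ W1 + b1)                  # tapped hidden-layer outputs y in R^q

X = np.random.default_rng(1).normal(size=(50, 5))
print(autoencoder_features(X).shape)             # (50, 2)
```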

Online Feature Selection by MLP
Associate with each input node i a multiplier f_i, taking values in [0, 1]. The f_i take values near one for good features and near zero for bad/redundant ones. A good choice is f_i = f(λ_i) = 1/(1 + e^(−λ_i)), where the λ_i are learnable and need suitable initialization.
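A sketch of the input gating itself; in the actual method the λ_i would be updated by backpropagation along with the network weights, which is omitted here, and the initialization values shown are assumptions.

```python
import numpy as np

def gated_inputs(X, lam):
    """Attenuate each input feature by a gate f_i = 1 / (1 + exp(-lambda_i)).
    The lam values are the learnable parameters; here we only apply the gates."""
    gates = 1.0 / (1.0 + np.exp(-np.asarray(lam)))   # values in (0, 1)
    return X * gates, gates

# lam initialized so every gate starts near a small value (an assumed initialization)
X = np.random.default_rng(0).normal(size=(4, 3))
Xg, gates = gated_inputs(X, lam=[-2.0, -2.0, -2.0])
print(gates)
```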

Thank You