Presentation transcript:

October 5, 2010 Neural Networks Lecture 9: Applying Backpropagation 1 K-Class Classification Problem Let us denote the k-th class by C_k, with n_k exemplars or training samples, forming the set T_k for k = 1, …, K. The complete training set is T = T_1 ∪ … ∪ T_K. The desired output of the network for an input of class k is 1 for output unit k and 0 for all other output units, i.e., a vector (0, …, 0, 1, 0, …, 0) with a 1 at the k-th position if the sample is in class k.

October 5, 2010 Neural Networks Lecture 9: Applying Backpropagation 2 K-Class Classification Problem However, due to the sigmoid output function, the net input to the output units would have to be -∞ or +∞ to generate outputs 0 or 1, respectively. Because of the shallow slope of the sigmoid function at extreme net inputs, even approaching these values would be very slow. To avoid this problem, it is advisable to use desired outputs ε and (1 - ε) instead of 0 and 1, respectively. Typical values for ε range between 0.01 and 0.1. For ε = 0.1, a desired output vector would look like (0.1, …, 0.1, 0.9, 0.1, …, 0.1), with the 0.9 at the position of the sample's class.
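To make this concrete, here is a minimal sketch, not part of the original slides, of how such softened one-of-K target vectors could be built from class labels (it assumes NumPy and a hypothetical helper name make_targets):

```python
import numpy as np

def make_targets(class_indices, num_classes, eps=0.1):
    """Desired outputs: eps for the 'off' units, (1 - eps) for the unit of the correct class."""
    targets = np.full((len(class_indices), num_classes), eps)
    targets[np.arange(len(class_indices)), class_indices] = 1.0 - eps
    return targets

# Three samples from a four-class problem, belonging to classes 2, 0, and 3:
print(make_targets([2, 0, 3], num_classes=4))
# [[0.1 0.1 0.9 0.1]
#  [0.9 0.1 0.1 0.1]
#  [0.1 0.1 0.1 0.9]]
```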

October 5, 2010 Neural Networks Lecture 9: Applying Backpropagation 3 K-Class Classification Problem We should not "punish" more extreme values, though. To avoid punishment, we can define the error term l_p,j as follows: 1. If d_p,j = (1 - ε) and o_p,j ≥ d_p,j, then l_p,j = 0. 2. If d_p,j = ε and o_p,j ≤ d_p,j, then l_p,j = 0. 3. Otherwise, l_p,j = o_p,j - d_p,j.
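This rule is easy to vectorize. The following sketch is my own illustration (assuming NumPy arrays of outputs o_p,j and desired values d_p,j); it sets the error to zero whenever the output is already more extreme than its softened target:

```python
import numpy as np

def modified_error(output, desired, eps=0.1):
    """Error term l_pj: zero when the output is already beyond its softened target."""
    err = output - desired
    hi = np.isclose(desired, 1.0 - eps)                  # units whose target is (1 - eps)
    lo = np.isclose(desired, eps)                        # units whose target is eps
    err = np.where(hi & (output >= desired), 0.0, err)   # case 1: no punishment
    err = np.where(lo & (output <= desired), 0.0, err)   # case 2: no punishment
    return err                                           # case 3: ordinary difference

o = np.array([0.95, 0.30, 0.05, 0.02])
d = np.array([0.90, 0.10, 0.10, 0.10])
print(modified_error(o, d))   # [0.  0.2 0.  0. ]
```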

October 5, 2010 Neural Networks Lecture 9: Applying Backpropagation 4 NN Application Design Now that we have gained some insight into the theory of backpropagation networks, how can we design networks for particular applications? Designing NNs is basically an engineering task. As we discussed before, for example, there is no formula that would allow you to determine the optimal number of hidden units in a BPN for a given task.

October 5, 2010 Neural Networks Lecture 9: Applying Backpropagation 5 NN Application Design We need to address the following issues for a successful application design: choosing an appropriate data representation, performing an exemplar analysis, and training the network and evaluating its performance. We are now going to look into each of these topics.

October 5, 2010 Neural Networks Lecture 9: Applying Backpropagation 6 Data Representation Most networks process information in the form of input pattern vectors. These networks produce output pattern vectors that are interpreted by the embedding application. All networks process one of two types of signal components: analog (continuously variable) signals or discrete (quantized) signals. In both cases, signals have a finite amplitude; their amplitude has a minimum and a maximum value.

October 5, 2010 Neural Networks Lecture 9: Applying Backpropagation 7 Data Representation [Figure: an analog signal and a discrete signal, each bounded between a minimum and a maximum amplitude.]

October 5, 2010 Neural Networks Lecture 9: Applying Backpropagation 8 Data Representation The main question is: How can we appropriately capture these signals and represent them as pattern vectors that we can feed into the network? We should aim for a data representation scheme that maximizes the ability of the network to detect (and respond to) relevant features in the input pattern. Relevant features are those that enable the network to generate the desired output pattern.

October 5, 2010 Neural Networks Lecture 9: Applying Backpropagation 9 Data Representation Similarly, we also need to define a set of desired outputs that the network can actually produce. Often, a "natural" representation of the output data turns out to be impossible for the network to produce. We are going to consider internal representation and external interpretation issues as well as specific methods for creating appropriate representations.

October 5, 2010 Neural Networks Lecture 9: Applying Backpropagation 10 Internal Representation Issues As we said before, in all network types, the amplitude of input signals and internal signals is limited: analog networks: values usually between 0 and 1; binary networks: only values 0 and 1 allowed; bipolar networks: only values –1 and 1 allowed. Without this limitation, patterns with large amplitudes would dominate the network's behavior. A disproportionately large input signal can activate a neuron even if the relevant connection weight is very small.
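One common way to enforce these limits is to rescale each input signal into the allowed range before feeding it to the network. A minimal min-max scaling sketch, my own illustration rather than part of the lecture (assumes NumPy):

```python
import numpy as np

def rescale(x, lo=0.0, hi=1.0):
    """Min-max scale a signal so its amplitude fits the network's allowed range [lo, hi]."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:                               # constant signal: map to the middle of the range
        return np.full_like(x, (lo + hi) / 2.0)
    return lo + (hi - lo) * (x - x.min()) / span

temperatures = [12.0, 30.0, 21.0, 3.0]          # hypothetical raw input values
print(rescale(temperatures))                    # analog network: range [0, 1]
print(rescale(temperatures, -1.0, 1.0))         # bipolar network: range [-1, 1]
```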

October 5, 2010 Neural Networks Lecture 9: Applying Backpropagation 11 External Interpretation Issues From the perspective of the embedding application, we are concerned with the interpretation of input and output signals. These signals constitute the interface between the embedding application and its NN component. Often, these signals only become meaningful when we define an external interpretation for them. This is analogous to biological neural systems: The same signal takes on a completely different meaning when it is interpreted by different brain areas (motor cortex, visual cortex etc.).

October 5, 2010 Neural Networks Lecture 9: Applying Backpropagation 12 External Interpretation Issues Without any interpretation, we can only use standard methods to define the difference (or similarity) between signals. For example, for binary patterns x and y, we could treat them as binary numbers and compute their difference as | x – y |, treat them as vectors and use the cosine of the angle between them as a measure of similarity, or count the number of digits that we would have to flip in order to transform x into y (Hamming distance).
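For illustration, here is how these three interpretation-free comparisons could be computed for two small binary patterns (a sketch assuming NumPy; the patterns themselves are made up):

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y = np.array([1, 1, 0, 1, 0, 1, 1, 0])

hamming = int(np.sum(x != y))                                    # bits to flip to turn x into y
cosine = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))  # angle-based similarity
as_numbers = abs(int("".join(map(str, x)), 2)                    # treat both patterns as
                 - int("".join(map(str, y)), 2))                 # binary numbers: |x - y|

print(hamming, round(cosine, 3), as_numbers)                     # 3 0.671 36
```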

October 5, 2010 Neural Networks Lecture 9: Applying Backpropagation 13 External Interpretation Issues Example: Consider the two binary patterns x and y shown on the slide. These patterns seem to be very different from each other. However, given their external interpretation, x and y actually represent the same thing.

October 5, 2010 Neural Networks Lecture 9: Applying Backpropagation 14 Creating Data Representations The patterns that can be represented by an ANN most easily are binary patterns. Even analog networks "like" to receive and produce binary patterns – we can simply round values < 0.5 to 0 and values ≥ 0.5 to 1. To create a binary input vector, we can simply list all features that are relevant to the current task. Each component of our binary vector indicates whether one particular feature is present (1) or absent (0).
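For example, a presence/absence feature list could be turned into an input vector like this (a sketch; the feature names are hypothetical):

```python
FEATURES = ["has_fever", "has_cough", "has_headache", "has_rash"]   # hypothetical feature list

def to_binary_vector(present_features):
    """One component per feature: 1 if the feature is present, 0 if it is absent."""
    return [1 if f in present_features else 0 for f in FEATURES]

print(to_binary_vector({"has_cough", "has_rash"}))   # [0, 1, 0, 1]
```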

October 5, 2010 Neural Networks Lecture 9: Applying Backpropagation 15 Creating Data Representations With regard to output patterns, most binary-data applications perform classification of their inputs. The output of such a network indicates to which class of patterns the current input belongs. Usually, each output neuron is associated with one class of patterns. As you already know, for any input, only one output neuron should be active (1) and the others inactive (0), indicating the class of the current input.
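Interpreting such an output layer then amounts to picking the most active unit, or rounding if the network already produced a clean one-of-K pattern. A small sketch with made-up class labels and activations (not from the lecture):

```python
import numpy as np

CLASSES = ["cat", "dog", "bird"]              # hypothetical class labels
output = np.array([0.08, 0.91, 0.12])         # activations of the three output units

predicted = CLASSES[int(np.argmax(output))]   # winner-take-all interpretation
as_one_hot = (output >= 0.5).astype(int)      # rounded to the ideal binary pattern

print(predicted, as_one_hot)                  # dog [0 1 0]
```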

October 5, 2010 Neural Networks Lecture 9: Applying Backpropagation 16 Creating Data Representations In other cases, classes are not mutually exclusive, and more than one output neuron can be active at the same time. Another variant would be the use of binary input patterns and analog output patterns for "classification". In that case, again, each output neuron corresponds to one particular class, and its activation indicates the probability (between 0 and 1) that the current input belongs to that class.

October 5, 2010 Neural Networks Lecture 9: Applying Backpropagation 17 Creating Data Representations Ternary (and n-ary) patterns can cause more problems than binary patterns when we want to format them for an ANN. For example, imagine the tic-tac-toe game. Each square of the board is in one of three different states: occupied by an X, occupied by an O, or empty.

October 5, 2010 Neural Networks Lecture 9: Applying Backpropagation 18 Creating Data Representations Let us now assume that we want to develop a network that plays tic-tac-toe. This network is supposed to receive the current game configuration as its input. Its output is the position where the network wants to place its next symbol (X or O). Obviously, it is impossible to represent the state of each square by a single binary value.

October 5, 2010 Neural Networks Lecture 9: Applying Backpropagation 19 Creating Data Representations Possible solution: Use multiple binary inputs to represent non-binary states. Treat each feature in the pattern as an individual subpattern. Represent each subpattern with as many positions (units) in the pattern vector as there are possible states for the feature. Then concatenate all subpatterns into one long pattern vector.

October 5, 2010 Neural Networks Lecture 9: Applying Backpropagation 20 Creating Data Representations Example: X is represented by the subpattern 100, O is represented by the subpattern 010, and the empty square is represented by the subpattern 001. The squares of the game board are enumerated as follows: [figure: board diagram numbering the nine squares]

October 5, 2010 Neural Networks Lecture 9: Applying Backpropagation 21 Creating Data Representations Then consider the board configuration shown on the slide (a mix of X's, O's, and empty squares). It would be represented by a 27-bit binary string, obtained by concatenating the nine 3-bit subpatterns. Consequently, our network would need a layer of 27 input units.
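This encoding is straightforward to implement. Here is a minimal sketch; the particular board configuration is made up, since the slide's example is only shown graphically:

```python
# One-hot subpatterns for the three square states, as defined on the previous slide.
SUBPATTERN = {"X": [1, 0, 0], "O": [0, 1, 0], " ": [0, 0, 1]}

def encode_board(board):
    """Concatenate the 3-bit subpatterns of the nine squares into a 27-bit input vector."""
    vec = []
    for square in board:                 # board: nine squares in the slide's enumeration order
        vec.extend(SUBPATTERN[square])
    return vec

# Hypothetical configuration: X in squares 1 and 5, O in squares 3 and 9, rest empty.
board = ["X", " ", "O", " ", "X", " ", " ", " ", "O"]
vec = encode_board(board)
print(len(vec))     # 27 -> the network needs 27 input units
print(vec[:9])      # [1, 0, 0, 0, 0, 1, 0, 1, 0]  (first three squares)
```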

October 5, 2010 Neural Networks Lecture 9: Applying Backpropagation 22 Creating Data Representations And what would the output layer look like? Well, applying the same principle as for the input, we would use nine units to represent the nine possible output positions. Considering the same enumeration scheme, our output layer would have nine neurons, one for each position. To place a symbol in a particular square, the corresponding neuron, and no other neuron, would fire (1).

October 5, 2010 Neural Networks Lecture 9: Applying Backpropagation 23 Creating Data Representations But… would it not lead to a smaller, simpler network if we used a shorter encoding of the non-binary states? We do not need 3-digit strings such as 100, 010, and 001 to represent X, O, and the empty square, respectively. We can achieve a unique representation with 2-digit strings such as 10, 01, and 00.
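A sketch of this compact variant (my own illustration; the particular 2-bit code assignment is an assumption based on the order suggested above):

```python
# Compact 2-bit codes: 10 = X, 01 = O, 00 = empty square.
COMPACT = {"X": [1, 0], "O": [0, 1], " ": [0, 0]}

def encode_board_compact(board):
    """Concatenate the 2-bit codes of the nine squares into an 18-bit input vector."""
    vec = []
    for square in board:
        vec.extend(COMPACT[square])
    return vec

board = ["X", " ", "O", " ", "X", " ", " ", " ", "O"]
print(len(encode_board_compact(board)))   # 18 input units instead of 27
```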

October 5, 2010 Neural Networks Lecture 9: Applying Backpropagation 24 Creating Data Representations Similarly, instead of nine output units, four would suffice, using the following output patterns to indicate a square: [table: a distinct 4-bit binary code for each of the nine squares]

October 5, 2010 Neural Networks Lecture 9: Applying Backpropagation 25 Creating Data Representations The problem with such representations is that the meaning of the output of one neuron depends on the output of other neurons. This means that each neuron does not represent (detect) a certain feature, but groups of neurons do. In general, such functions are much more difficult to learn. Such networks usually need more hidden neurons and longer training, and their ability to generalize is weaker than for the one-neuron-per-feature-value networks.