Computer Vision Lecture 19: Object Recognition III
November 20, 2014

Slide 1: Linear Separability
By varying the weights and the threshold, we can realize any linear separation of the input space into a region that yields output 1 and another region that yields output 0. As we have seen, a two-dimensional input space can be divided by any straight line, and a three-dimensional input space can be divided by any two-dimensional plane. In general, an n-dimensional input space can be divided by an (n-1)-dimensional hyperplane. Of course, for n > 3 this is hard to visualize.
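As an illustration, here is a minimal sketch in Python of a single threshold neuron dividing a two-dimensional input space by a straight line (the weights and threshold are arbitrary choices for this example):

```python
import numpy as np

def threshold_neuron(x, w, theta):
    """Output 1 if the weighted sum of inputs reaches the threshold, else 0."""
    return 1 if np.dot(w, x) >= theta else 0

# Example: weights and threshold chosen so that the dividing line is x1 + x2 = 1.
w = np.array([1.0, 1.0])
theta = 1.0

print(threshold_neuron(np.array([0.8, 0.5]), w, theta))  # 1: above the line
print(threshold_neuron(np.array([0.2, 0.3]), w, theta))  # 0: below the line
```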
Slide 2: Capabilities of Threshold Neurons
What do we do if we need a more complex function? We can combine multiple artificial neurons to form networks with increased capabilities. For example, we can build a two-layer network with any number of neurons in the first layer giving input to a single neuron in the second layer. The neuron in the second layer could, for example, implement an AND function.
Slide 3: Capabilities of Threshold Neurons
What kind of function can such a network realize?
[Diagram: several first-layer threshold neurons, each receiving inputs x1 and x2, feed a single second-layer neuron.]
Slide 4: Capabilities of Threshold Neurons
Assume that the dotted lines in the diagram represent the input-dividing lines implemented by the neurons in the first layer:
[Diagram: dotted dividing lines in the input plane (1st component vs. 2nd component), enclosing a polygonal region.]
Then, for example, the second-layer neuron could output 1 if the input is within a polygon, and 0 otherwise.
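A hedged sketch of this construction in Python: three first-layer threshold neurons each implement one half-plane, and the second-layer neuron ANDs their outputs, so it fires only inside the triangle they enclose. The specific lines and thresholds are arbitrary illustrative choices, not taken from the slides.

```python
import numpy as np

def threshold(x, w, theta):
    """Threshold neuron: 1 if w.x >= theta, else 0."""
    return 1 if np.dot(w, x) >= theta else 0

# First layer: three half-planes whose intersection is a triangle.
half_planes = [
    (np.array([ 1.0,  0.0]), 0.0),   # x1 >= 0
    (np.array([ 0.0,  1.0]), 0.0),   # x2 >= 0
    (np.array([-1.0, -1.0]), -1.0),  # x1 + x2 <= 1
]

def second_layer(x):
    """AND of the first-layer outputs: 1 only inside the triangle."""
    outputs = [threshold(x, w, theta) for w, theta in half_planes]
    # The AND itself is a threshold neuron: all weights 1, threshold = number of inputs.
    return threshold(np.array(outputs), np.ones(len(outputs)), len(outputs))

print(second_layer(np.array([0.2, 0.2])))  # 1: inside the triangle
print(second_layer(np.array([0.8, 0.8])))  # 0: outside
```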
Slide 5: Capabilities of Threshold Neurons
However, we may still want to implement functions that are more complex than that. An obvious idea is to extend our network even further. Let us build a network with three layers, with arbitrary numbers of neurons in the first and second layers and one neuron in the third layer. The first and second layers are completely connected, that is, each neuron in the first layer sends its output to every neuron in the second layer.
Slide 6: Capabilities of Threshold Neurons
What type of function can a three-layer network realize?
[Diagram: inputs x1 and x2 feed the first-layer neurons, which feed the second-layer neurons, which feed a single third-layer neuron with output o_i.]
Slide 7: Capabilities of Threshold Neurons
Assume that the polygons in the diagram indicate the input regions for which each of the second-layer neurons yields output 1:
[Diagram: several polygons in the input plane (1st component vs. 2nd component), one per second-layer neuron.]
Then, for example, the third-layer neuron could output 1 if the input is within any of the polygons, and 0 otherwise.
Slide 8: Capabilities of Threshold Neurons
The more neurons there are in the first layer, the more vertices the polygons can have. With a sufficient number of first-layer neurons, the polygons can approximate any given shape. The more neurons there are in the second layer, the more of these polygons can be combined to form the output function of the network. With a sufficient number of neurons and appropriate weight vectors $w_i$, a three-layer network of threshold neurons can realize any (!) function $f: \mathbb{R}^n \to \{0, 1\}$.
Slide 9: Terminology
Usually, we draw neural networks in such a way that the input enters at the bottom and the output is generated at the top. Arrows indicate the direction of data flow. The first layer, termed input layer, just contains the input vector and does not perform any computations. The second layer, termed hidden layer, receives input from the input layer and sends its output to the output layer. After applying their activation function, the neurons in the output layer contain the output vector.
Slide 10: Terminology
Example: Network function $f: \mathbb{R}^3 \to \{0, 1\}^2$.
[Diagram: a three-layer network drawn bottom-up, with the input vector entering the input layer, a hidden layer above it, and the output layer producing the output vector.]
Slide 11: Sigmoidal Neurons
Sigmoidal neurons accept any vector of real numbers as input, and they output a real number between 0 and 1. Sigmoidal neurons are the most common type of artificial neuron, especially in learning networks. A network of sigmoidal units with m input neurons and n output neurons realizes a network function $f: \mathbb{R}^m \to (0,1)^n$.
Slide 12: Sigmoidal Neurons
The sigmoidal activation function has the form
$$f_i(\mathrm{net}_i(t)) = \frac{1}{1 + e^{-(\mathrm{net}_i(t) - \theta)/\tau}}$$
In backpropagation networks, we typically choose τ = 1 and θ = 0.
[Plot: f_i(net_i(t)) as a function of net_i(t), rising from 0 to 1, shown for τ = 1 (gentle slope) and τ = 0.1 (steep slope).]
Slide 13: Sigmoidal Neurons
This leads to a simplified form of the sigmoid function:
$$S(\mathrm{net}) = \frac{1}{1 + e^{-\mathrm{net}}}$$
We do not need a modifiable threshold θ, because we will use "dummy" (offset) inputs. The choice τ = 1 works well in most situations and results in a very simple derivative of S(net).
Slide 14: Sigmoidal Neurons
The derivative of the simplified sigmoid is
$$S'(\mathrm{net}) = \frac{d}{d\,\mathrm{net}}\,\frac{1}{1 + e^{-\mathrm{net}}} = \frac{e^{-\mathrm{net}}}{(1 + e^{-\mathrm{net}})^2} = S(\mathrm{net})\,(1 - S(\mathrm{net}))$$
This result will be very useful when we develop the backpropagation algorithm.
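A small Python check of this identity (a sketch; the numerical derivative is just a finite-difference approximation at an arbitrarily chosen point):

```python
import numpy as np

def sigmoid(net):
    """Simplified sigmoid S(net) = 1 / (1 + exp(-net))."""
    return 1.0 / (1.0 + np.exp(-net))

def sigmoid_derivative(net):
    """Analytic derivative: S(net) * (1 - S(net))."""
    s = sigmoid(net)
    return s * (1.0 - s)

net = 0.7
h = 1e-6
numerical = (sigmoid(net + h) - sigmoid(net - h)) / (2 * h)
print(sigmoid_derivative(net), numerical)  # both approximately 0.2217
```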
Slide 15: Feedback-Based Weight Adaptation
- Feedback from the environment (possibly a teacher) is used to improve the system's performance.
- Synaptic weights are modified to reduce the system's error in computing a desired function.
- For example, if increasing a specific weight increases the error, then the weight is decreased.
- Small adaptation steps are needed to find an optimal set of weights.
- The learning rate can vary during the learning process.
- This is typical for supervised learning.
Slide 16: Evaluation of Networks
- Basic idea: define an error function and measure the error on untrained data (the testing set).
- Typical choice: the squared error $E = \sum (d - o)^2$, summed over all test samples and output components, where d is the desired output and o is the actual output.
- For classification: E = number of misclassified samples / total number of samples.
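A minimal sketch of these two error measures in Python (the array shapes and the squared-error convention are assumptions for illustration):

```python
import numpy as np

def squared_error(desired, actual):
    """Sum of squared differences between desired and actual outputs."""
    return float(np.sum((desired - actual) ** 2))

def classification_error(desired_labels, predicted_labels):
    """Fraction of misclassified samples."""
    desired_labels = np.asarray(desired_labels)
    predicted_labels = np.asarray(predicted_labels)
    return float(np.mean(desired_labels != predicted_labels))

d = np.array([[1.0, 0.0], [0.0, 1.0]])   # desired outputs, one row per test sample
o = np.array([[0.9, 0.2], [0.3, 0.8]])   # actual network outputs
print(squared_error(d, o))               # approximately 0.18
print(classification_error([0, 1, 1], [0, 1, 0]))  # 0.333...
```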
Slide 17: Gradient Descent
Gradient descent is a very common technique for finding a minimum of a function; it is especially useful for high-dimensional functions. We will use it to iteratively minimize the network's (or neuron's) error by computing the gradient of the error surface in weight space and adjusting the weights in the opposite direction.
Slide 18: Gradient Descent
Gradient-descent example: finding the minimum of a one-dimensional error function f(x).
[Plot: f(x) with the slope f'(x0) drawn at the starting point x0.]
Starting at some point $x_0$, take a step against the slope:
$$x_1 = x_0 - \eta\, f'(x_0)$$
where η is the step size (learning rate). Repeat this iteratively until, for some $x_i$, $f'(x_i)$ is sufficiently close to 0.
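A runnable one-dimensional sketch of this procedure (the error function, step size, and starting point are arbitrary choices for illustration):

```python
def gradient_descent_1d(f_prime, x0, eta=0.1, tol=1e-6, max_steps=10_000):
    """Iterate x <- x - eta * f'(x) until the slope is close to zero."""
    x = x0
    for _ in range(max_steps):
        slope = f_prime(x)
        if abs(slope) < tol:
            break
        x = x - eta * slope
    return x

# Example error function f(x) = (x - 3)^2, whose derivative is 2 * (x - 3).
minimum = gradient_descent_1d(lambda x: 2 * (x - 3), x0=0.0)
print(minimum)  # approximately 3.0
```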
Slide 19: Gradient Descent
Gradients of two-dimensional functions:
[Figure: left, a two-dimensional function; right, its contour lines with arrows marking the gradient at different locations.]
The two-dimensional function in the left diagram is represented by contour lines in the right diagram, where arrows indicate the gradient of the function at different locations. Obviously, the gradient always points in the direction of the steepest increase of the function. In order to find the function's minimum, we should always move against the gradient.
Slide 20: Multilayer Networks
The backpropagation algorithm was popularized by Rumelhart, Hinton, and Williams (1986). This algorithm solved the "credit assignment" problem, i.e., crediting or blaming individual neurons across layers for particular outputs. The error at the output layer is propagated backwards to units at lower layers, so that the weights of all neurons can be adapted appropriately.
Slide 21: Terminology
Example: Network function $f: \mathbb{R}^3 \to \mathbb{R}^2$.
[Diagram: inputs x1, x2, x3 enter the input layer, pass through the hidden layer, and the output layer produces o1 and o2.]
Slide 22: Backpropagation Learning
The goal of the backpropagation learning algorithm is to modify the network's weights so that its output vector $o_p = (o_{p,1}, o_{p,2}, \dots, o_{p,K})$ is as close as possible to the desired output vector $d_p = (d_{p,1}, d_{p,2}, \dots, d_{p,K})$ for K output neurons and input patterns $p = 1, \dots, P$. The set of input-output pairs (exemplars) $\{(x_p, d_p) \mid p = 1, \dots, P\}$ constitutes the training set.
Slide 23: Backpropagation Learning
We need a cumulative error function that is to be minimized. We can choose the mean square error (MSE), where the 1/P factor does not matter for minimizing the error:
$$E = \frac{1}{P} \sum_{p=1}^{P} E_p, \qquad \text{where} \qquad E_p = \sum_{k=1}^{K} (d_{p,k} - o_{p,k})^2$$
Slide 24: Backpropagation Learning
For input pattern p, the i-th input-layer node holds $x_{p,i}$.
Net input to the j-th node in the hidden layer: $\mathrm{net}_j^{(1)} = \sum_i w_{j,i}^{(1,0)}\, x_{p,i}$
Output of the j-th node in the hidden layer: $o_j = S(\mathrm{net}_j^{(1)})$
Net input to the k-th node in the output layer: $\mathrm{net}_k^{(2)} = \sum_j w_{k,j}^{(2,1)}\, o_j$
Output of the k-th node in the output layer: $o_{p,k} = S(\mathrm{net}_k^{(2)})$
Network error for p: $E_p = \sum_k (d_{p,k} - o_{p,k})^2$
Slide 25: Backpropagation Learning
As E is a function of the network weights, we can use gradient descent to find those weights that result in minimal error. For individual weights in the hidden and output layers, we should move against the error gradient (omitting the index p):
Output layer: $\Delta w_{k,j}^{(2,1)} = -\eta\, \dfrac{\partial E}{\partial w_{k,j}^{(2,1)}}$ (derivative easy to calculate)
Hidden layer: $\Delta w_{j,i}^{(1,0)} = -\eta\, \dfrac{\partial E}{\partial w_{j,i}^{(1,0)}}$ (derivative difficult to calculate)
Slide 26: Backpropagation Learning
When computing the derivative with respect to $w_{k,j}^{(2,1)}$, we can disregard all output units except $o_k$, since the other outputs do not depend on this weight. Remember that $o_k$ is obtained by applying the sigmoid function S to $\mathrm{net}_k^{(2)}$, which is computed by:
$$\mathrm{net}_k^{(2)} = \sum_j w_{k,j}^{(2,1)}\, o_j$$
Therefore, we need to apply the chain rule twice.
Slide 27: Backpropagation Learning
Since $o_k = S(\mathrm{net}_k^{(2)})$, and we know that $S'(\mathrm{net}) = S(\mathrm{net})(1 - S(\mathrm{net}))$, applying the chain rule gives us the derivative of E with respect to $w_{k,j}^{(2,1)}$.
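A hedged reconstruction of the algebra behind this step (assuming the per-pattern error $E_p = \sum_k (d_{p,k} - o_{p,k})^2$ defined above; any constant factor can later be absorbed into the learning rate):
$$\frac{\partial E}{\partial w_{k,j}^{(2,1)}}
  = \frac{\partial E}{\partial o_k}\,
    \frac{\partial o_k}{\partial \mathrm{net}_k^{(2)}}\,
    \frac{\partial \mathrm{net}_k^{(2)}}{\partial w_{k,j}^{(2,1)}}
  = -2\,(d_k - o_k)\, o_k (1 - o_k)\, o_j$$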
Slide 28: Backpropagation Learning
For the derivative with respect to $w_{j,i}^{(1,0)}$, notice that E depends on it through $\mathrm{net}_j^{(1)}$, which influences each $o_k$ with $k = 1, \dots, K$. Using the chain rule of derivatives again, the contributions of all K output units have to be summed.
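A hedged sketch of that chain-rule expansion (same assumptions as above; $x_i$ denotes the i-th input):
$$\frac{\partial E}{\partial w_{j,i}^{(1,0)}}
  = \sum_{k=1}^{K} \frac{\partial E}{\partial o_k}\,
    \frac{\partial o_k}{\partial \mathrm{net}_k^{(2)}}\,
    \frac{\partial \mathrm{net}_k^{(2)}}{\partial o_j}\,
    \frac{\partial o_j}{\partial \mathrm{net}_j^{(1)}}\,
    \frac{\partial \mathrm{net}_j^{(1)}}{\partial w_{j,i}^{(1,0)}}
  = -2\, o_j (1 - o_j)\, x_i \sum_{k=1}^{K} (d_k - o_k)\, o_k (1 - o_k)\, w_{k,j}^{(2,1)}$$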
Slide 29: Backpropagation Learning
This gives us the following weight changes at the output layer (absorbing the constant factor from the squared error into the learning rate η):
$$\Delta w_{k,j}^{(2,1)} = \eta\, (d_k - o_k)\, o_k (1 - o_k)\, o_j$$
… and at the inner layer:
$$\Delta w_{j,i}^{(1,0)} = \eta\, o_j (1 - o_j)\, x_i \sum_{k=1}^{K} (d_k - o_k)\, o_k (1 - o_k)\, w_{k,j}^{(2,1)}$$
Slide 30: Backpropagation Learning
As you surely remember from a few minutes ago, $S'(\mathrm{net}) = S(\mathrm{net})(1 - S(\mathrm{net}))$. Then we can simplify the generalized error terms:
$$\delta_k = (d_k - o_k)\, o_k (1 - o_k)$$
And:
$$\delta_j = o_j (1 - o_j) \sum_{k=1}^{K} \delta_k\, w_{k,j}^{(2,1)}$$
Slide 31: Backpropagation Learning
The simplified error terms $\delta_k$ and $\delta_j$ use variables that are calculated in the feedforward phase of the network and can thus be calculated very efficiently. Now let us state the final equations again and reintroduce the subscript p for the p-th pattern:
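A hedged reconstruction of these final per-pattern update equations in their standard form (any constant factor from the squared error is assumed to be absorbed into the learning rate η):
$$\delta_{p,k} = (d_{p,k} - o_{p,k})\, o_{p,k} (1 - o_{p,k}), \qquad
\Delta w_{k,j}^{(2,1)} = \eta\, \delta_{p,k}\, o_{p,j}$$
$$\delta_{p,j} = o_{p,j} (1 - o_{p,j}) \sum_{k=1}^{K} \delta_{p,k}\, w_{k,j}^{(2,1)}, \qquad
\Delta w_{j,i}^{(1,0)} = \eta\, \delta_{p,j}\, x_{p,i}$$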
Slide 32: Backpropagation Learning
Algorithm Backpropagation:
  Start with randomly chosen weights;
  while MSE is above the desired threshold and computational bounds are not exceeded, do
    for each input pattern x_p, 1 ≤ p ≤ P,
      Compute hidden node inputs;
      Compute hidden node outputs;
      Compute inputs to the output nodes;
      Compute the network outputs;
      Compute the error between output and desired output;
      Modify the weights between hidden and output nodes;
      Modify the weights between input and hidden nodes;
    end-for
  end-while.
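A compact, runnable sketch of this algorithm in Python/NumPy. The toy XOR task, network sizes, learning rate, and stopping criteria are arbitrary illustrative choices, and constant factors are folded into the learning rate as noted above:

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

rng = np.random.default_rng(0)

# Toy training set: XOR. Constant "dummy" offset inputs play the role of thresholds.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
D = np.array([[0.0], [1.0], [1.0], [0.0]])

n_hidden = 4
W1 = rng.uniform(-1, 1, size=(n_hidden, X.shape[1] + 1))   # input (+offset) -> hidden
W2 = rng.uniform(-1, 1, size=(D.shape[1], n_hidden + 1))   # hidden (+offset) -> output
eta = 0.5

def forward(x):
    """Feedforward phase: returns hidden outputs (with offset) and network outputs."""
    h = sigmoid(W1 @ np.append(x, 1.0))
    h = np.append(h, 1.0)
    return h, sigmoid(W2 @ h)

for epoch in range(20000):
    mse = 0.0
    for x, d in zip(X, D):
        h, o = forward(x)
        error = d - o
        mse += float(np.sum(error ** 2))
        # Generalized error terms (deltas); constant factors folded into eta.
        delta_out = error * o * (1 - o)
        delta_hidden = (W2[:, :-1].T @ delta_out) * h[:-1] * (1 - h[:-1])
        # Move against the error gradient.
        W2 += eta * np.outer(delta_out, h)
        W1 += eta * np.outer(delta_hidden, np.append(x, 1.0))
    if mse / len(X) < 1e-3:
        break

for x in X:
    print(x, np.round(forward(x)[1], 2))   # outputs should approach 0, 1, 1, 0
```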
Slide 33: Supervised Function Approximation
There is a tradeoff between a network's ability to precisely learn the given exemplars and its ability to generalize (i.e., inter- and extrapolate). This problem is similar to fitting a function to a given set of data points. Let us assume that you want to find a fitting function $f: \mathbb{R} \to \mathbb{R}$ for a set of three data points. You try to do this with polynomials of degree one (a straight line), two, and nine.
Slide 34: Supervised Function Approximation
[Plot: the three data points with the fitted polynomials of degree 1, 2, and 9 plotted as f(x) over x.]
Obviously, the polynomial of degree 2 provides the most plausible fit.
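A small numerical companion to this example (the three data points are invented for illustration): degree 1 cannot pass through all three points, degree 2 interpolates them exactly, and degree 9 has far more free parameters than data, so the fit is underdetermined and NumPy issues a rank warning.

```python
import warnings
import numpy as np

x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 2.0])   # three invented data points

for degree in (1, 2, 9):
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")          # degree 9 is underdetermined
        coeffs = np.polyfit(x, y, degree)
    residual = np.sum((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree}: residual sum of squares = {residual:.4f}")
# Degree 1 leaves a residual; degrees 2 and 9 fit the three points (nearly) exactly,
# but the degree-9 model's extra degrees of freedom say nothing about unseen inputs.
```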
Slide 35: Supervised Function Approximation
The same principle applies to ANNs:
- If an ANN has too few neurons, it may not have enough degrees of freedom to precisely approximate the desired function.
- If an ANN has too many neurons, it will learn the exemplars perfectly, but its additional degrees of freedom may cause it to show implausible behavior for untrained inputs; it then generalizes poorly.
Unfortunately, there are no known equations that could tell you the optimal size of your network for a given application; there are only heuristics.
Slide 36: Creating Data Representations
The problem with some data representations is that the meaning of the output of one neuron depends on the output of other neurons. This means that each neuron does not represent (detect) a certain feature, but groups of neurons do. In general, such functions are much more difficult to learn. Such networks usually need more hidden neurons and longer training, and their ability to generalize is weaker than for the one-neuron-per-feature-value networks.
Slide 37: Creating Data Representations
On the other hand, sets of orthogonal vectors (such as (1,0,0), (0,1,0), (0,0,1)) representing individual features can be processed by the network more easily. This becomes clear when we consider that a neuron's net input signal is computed as the inner product of the input and weight vectors. The geometric interpretation of these vectors shows that orthogonal vectors are especially easy to discriminate for a single neuron.
Slide 38: Creating Data Representations
Another way of representing n-ary data in a neural network is using one neuron per feature, but scaling the (analog) value to indicate the degree to which a feature is present.
Good examples:
- the brightness of a pixel in an input image
- the distance between a robot and an obstacle
Poor examples:
- the letter (1 - 26) of a word
- the type (1 - 6) of a chess piece
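A hedged sketch contrasting the two representations discussed above (the feature names and scalings are invented for illustration): an analog quantity such as a distance is scaled to a single input value, while a categorical quantity such as a chess-piece type is better encoded as an orthogonal (one-hot) vector than as an arbitrary number from 1 to 6.

```python
import numpy as np

# Analog feature: scale a robot-to-obstacle distance (assumed range 0..5 m) into [0, 1].
def encode_distance(distance_m, max_range_m=5.0):
    return min(distance_m, max_range_m) / max_range_m

# Categorical feature: one-hot encode the chess-piece type instead of feeding "1..6".
PIECE_TYPES = ["pawn", "knight", "bishop", "rook", "queen", "king"]

def encode_piece(piece):
    vec = np.zeros(len(PIECE_TYPES))
    vec[PIECE_TYPES.index(piece)] = 1.0
    return vec

print(encode_distance(1.25))     # 0.25
print(encode_piece("bishop"))    # [0. 0. 1. 0. 0. 0.]
```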