
1 Topic 1 Neural Networks (Ming-Feng Yeh)

2 OUTLINE: Neural Networks; Cerebellar Model Articulation Controller (CMAC); Applications; References. Reference: C.L. Lin and H.W. Su, "Intelligent control theory in guidance and control system design: an overview," Proc. Natl. Sci. Counc. ROC (A), pp. 15-30.

3 1. Neural Networks: As you read these words you are using a complex biological neural network. You have a highly interconnected set of about 10^11 neurons to facilitate your reading, breathing, motion, and thinking. In an artificial neural network, the neurons are not biological. They are extremely simple abstractions of biological neurons, realized as elements in a program or perhaps as circuits made of silicon.

4 Biological Inspiration: The human brain consists of a large number (about 10^11) of highly interconnected elements (about 10^4 connections per element) called neurons. The three principal components of a neuron are the dendrites, the cell body, and the axon. The point of contact between neurons is called a synapse.

5 Biological Neurons: Dendrites (樹突) carry electrical signals into the cell body. The cell body (細胞體) sums and thresholds these incoming signals. The axon (軸突) carries the signal from the cell body out to other neurons. A synapse (突觸) is the point of contact between an axon of one cell and a dendrite of another cell.

6 Neural Networks: Neural networks are a promising new generation of information processing systems, usually operating in parallel, that demonstrate the ability to learn, recall, and generalize from training patterns or data. Artificial neural networks are collections of mathematical models that emulate some of the observed properties of biological nervous systems and draw on the analogies of adaptive biological learning.

7 Basic Model ~ 1: A neural network is composed of four pieces: nodes, connections between the nodes, nodal functions, and a learning rule for updating the information in the network. [Figure: a single node with inputs s1, s2, ..., sn and output y = f(s1, s2, ..., sn).]

8 Basic Model ~ 2:
Nodes: a number of nodes is required, each an elementary processor (EP).
Connectivity: this can be represented by a matrix that shows the connections between the nodes. The number of nodes plus the connectivity define the topology of the network. In the human brain, each neuron is connected to about 10^4 other neurons. Artificial nets can range from totally connected to a topology where each node is connected only to its nearest neighbors.

9 Basic Model ~ 3:
Elementary processor functions: a node has inputs s1, ..., sn and an output y, and the node generates the output y as a function of the inputs.
A learning rule: there are two types of learning. Supervised learning: you have to teach the network the "answers." Unsupervised learning: the network figures out the answers on its own. All the learning rules try to embed information by sampling the environment.

10 Perceptron Model: Suppose we have a two-class problem. If we can separate these classes with a straight line (decision surface), then they are linearly separable. The question is how we can find the best line, and what we mean by "best." In n dimensions, we have a hyperplane separating the classes. These are all decision surfaces. Another problem is that you may need more than one line to separate the classes.

11 Decision Surfaces: [Figure: two scatter plots, one showing linearly separable classes of x's and o's, the other a multi-line decision surface needed when a single line cannot separate them.]

12 Single Layer Perceptron Model: xi are the inputs to the node, y is the output, wi are the weights, and θ is the threshold value. The output y can be expressed as y = f(w1·x1 + w2·x2 + ... + wn·xn - θ). The function f is called the nodal (transfer) function and is not the same in every application.
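As a concrete illustration of this model, here is a minimal Python sketch of a single node with a hard-limiter transfer function; the particular weights, inputs, and threshold are made-up values, not taken from the slides.

```python
# Single-layer perceptron node: y = f(w1*x1 + ... + wn*xn - theta).
# Here f is a hard limiter returning +1 or -1 (one common choice of nodal function).

def perceptron_output(x, w, theta):
    """Return the node output for inputs x, weights w, and threshold theta."""
    net = sum(wi * xi for wi, xi in zip(w, x)) - theta
    return 1 if net >= 0 else -1

# Illustrative two-input node; its decision surface is the line w1*x1 + w2*x2 - theta = 0.
print(perceptron_output([1.0, 0.5], w=[0.8, -0.4], theta=0.2))
```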

13 Nodal Function: [Figure: three common nodal functions, the hard limiter, the threshold function, and the sigmoid function.]
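A sketch of these three transfer functions in Python; exact definitions (output ranges, slopes) vary between texts, so the versions below are common choices rather than the slide's own formulas.

```python
import math

def hard_limiter(n):
    """Outputs -1 or +1 depending on the sign of the net input."""
    return 1.0 if n >= 0 else -1.0

def threshold(n):
    """Saturating linear (threshold) function, clipped to the range [0, 1]."""
    return min(1.0, max(0.0, n))

def sigmoid(n):
    """Smooth S-shaped function with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))

for n in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(n, hard_limiter(n), threshold(n), sigmoid(n))
```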

14 Single Layer Perceptron Model: Two-input case: w1·x1 + w2·x2 - θ = 0. If we use the hard limiter, then we could say that if the output of the function is 1, the input vector belongs to class A; if the output is -1, the input vector belongs to class B. XOR: this problem caused the field of neural networks to lose credibility in the late 1960s, because the perceptron model could not draw a line to separate the two classes given by the exclusive OR.

15 Exclusive OR Problem: [Figure: the four XOR input points; (0,0) and (1,1) fall in one class, (0,1) and (1,0) in the other, and no single straight line separates the two classes.]

16 Two-layer Perceptron Model: each hidden node computes yj = f(w1j·x1 + w2j·x2 - θj), j = 1, 2, and the network output is z = f(w'1·y1 + w'2·y2 - θ).

17 Exclusive-OR Problem: [Figure: a two-layer solution with input units x and y, one hidden unit g, and one output unit f; the input weights are +1, the hidden-to-output weight is -2, and the thresholds are 1.5 for g and 0.5 for f.] Input patterns map to output patterns as follows: 00 -> 0, 01 -> 1, 10 -> 1, 11 -> 0.

18 Exclusive-OR Problem: g = sgn(1·x + 1·y - 1.5), f = sgn(1·x + 1·y - 2·g - 0.5). Input (0,0) -> g=0 -> f=0; input (0,1) -> g=0 -> f=1; input (1,0) -> g=0 -> f=1; input (1,1) -> g=1 -> f=0.
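The four cases above can be checked mechanically. The sketch below treats sgn as a unit step that outputs 1 for a positive argument and 0 otherwise, which is the reading implied by the listed values.

```python
# Check the two-layer XOR solution from the slide over all four input patterns.

def step(n):
    """Unit step: 1 if the argument is positive, else 0."""
    return 1 if n > 0 else 0

def xor_net(x, y):
    g = step(1 * x + 1 * y - 1.5)           # hidden unit
    f = step(1 * x + 1 * y - 2 * g - 0.5)   # output unit
    return g, f

for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    g, f = xor_net(x, y)
    print(f"input ({x},{y}) -> g={g} -> f={f}")
```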

19 Multilayer Network: [Figure: input patterns feed a layer of internal representation units, which in turn produce the output patterns; unit i connects to unit j through weight w_ji, and unit j connects to unit k through weight w_kj.]

20 Weight Adjustment: Adjust weights by w_ji(l+1) = w_ji(l) + Δw_ji, where w_ji(l) is the weight from unit i to unit j at time l (the lth iteration) and Δw_ji is the weight adjustment. The weight change may be computed by the delta rule: Δw_ji = η·δ_j·i_i, where η is a trial-independent learning rate and δ_j is the error at unit j: δ_j = t_j - o_j, where t_j is the desired output and o_j is the actual output at output unit j. Repeat iterations until convergence.
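A minimal sketch of this iteration for a single linear output unit; the training patterns, learning rate, and number of sweeps are illustrative assumptions, not values from the slides.

```python
# Delta-rule weight adjustment for one output unit:
#   w_ji(l+1) = w_ji(l) + eta * delta_j * i_i,  with delta_j = t_j - o_j.
# The training set, learning rate, and linear output unit are illustrative choices.

patterns = [([0.0, 1.0], 1.0), ([1.0, 0.0], 0.0), ([1.0, 1.0], 1.0)]  # (inputs, target)
w = [0.0, 0.0]
eta = 0.1

for sweep in range(50):                      # fixed number of sweeps as a simple
    for inputs, t in patterns:               # stand-in for "repeat until convergence"
        o = sum(wi * ii for wi, ii in zip(w, inputs))   # linear unit output
        delta = t - o                                    # error at the output unit
        w = [wi + eta * delta * ii for wi, ii in zip(w, inputs)]

print("learned weights:", w)
```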

21 Generalized Delta Rule:
t_pj: the target output for the jth component of the output pattern for pattern p.
o_pj: the jth element of the actual output pattern produced by the presentation of input pattern p.
i_pi: the value of the ith element of the input pattern.
Δ_p w_ji: the change to be made to the weight from the ith to the jth unit following presentation of pattern p.

22 Delta Rule and Gradient Descent: E_p = ½·Σ_j (t_pj - o_pj)² is the error on input/output pattern p, and E = Σ_p E_p is the overall measure of the error. We wish to show that the delta rule implements a gradient descent in E when the units are linear. We will proceed by simply showing that -∂E_p/∂w_ji = δ_pj·i_pi, which is proportional to Δ_p w_ji as prescribed by the delta rule.

23 Delta Rule & Gradient Descent: When there are no hidden units it is easy to compute the relevant derivative. For this purpose we use the chain rule to write the derivative as the product of two parts, ∂E_p/∂w_ji = (∂E_p/∂o_pj)·(∂o_pj/∂w_ji): the derivative of the error with respect to the output of the unit times the derivative of the output with respect to the weight. The first part tells how the error changes with the output of the jth unit, and the second part tells how much changing w_ji changes that output.

24 Delta Rule & Gradient Descent (no hidden units): The contribution of unit j to the error is simply proportional to δ_pj. Since we have linear units, o_pj = Σ_i w_ji·i_pi, from which we conclude that ∂o_pj/∂w_ji = i_pi. Thus, we have -∂E_p/∂w_ji = δ_pj·i_pi.

25 Delta Rule and Gradient Descent: Combining this with the observation that ∂E/∂w_ji = Σ_p ∂E_p/∂w_ji should lead us to conclude that the net change in w_ji after one complete cycle of pattern presentations is proportional to this derivative, and hence that the delta rule implements a gradient descent in E. In fact, this is strictly true only if the values of the weights are not changed during this cycle.
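The claim can be checked numerically for a single linear unit: the delta-rule step η·δ_pj·i_pi should match -η·∂E_p/∂w_ji estimated by finite differences. The weights, inputs, target, step sizes, and learning rate below are made up for illustration.

```python
# Numerical check that, for a linear unit, the delta-rule step
#   delta_p w_ji = eta * (t_pj - o_pj) * i_pi
# equals -eta * dE_p/dw_ji with E_p = 0.5 * (t_pj - o_pj)^2.

def output(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))      # linear unit

def E_p(w, x, t):
    return 0.5 * (t - output(w, x)) ** 2

w, x, t, eta, h = [0.3, -0.2], [1.0, 2.0], 0.5, 0.1, 1e-6

for i in range(len(w)):
    delta_rule_step = eta * (t - output(w, x)) * x[i]
    w_plus = list(w); w_plus[i] += h
    w_minus = list(w); w_minus[i] -= h
    numeric_grad = (E_p(w_plus, x, t) - E_p(w_minus, x, t)) / (2 * h)
    print(i, delta_rule_step, -eta * numeric_grad)   # the two columns should agree
```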

26 Delta Rule for Semilinear Activation Functions in Feedforward Networks: The standard delta rule essentially implements gradient descent in sum-squared error for linear activation functions. Without hidden units, the error surface is shaped like a bowl with only one minimum, so gradient descent is guaranteed to find the best set of weights. With hidden units, however, it is not so obvious how to compute the derivatives, and the error surface is not concave upwards, so there is the danger of getting stuck in a local minimum.

27 Delta Rule for Semilinear Activation Functions in Feedforward Networks: The main theoretical contribution is to show that there is an efficient way of computing the derivatives. The main empirical contribution is to show that the apparently fatal problem of local minima is irrelevant in a wide variety of learning tasks. A semilinear activation function is one in which the output of a unit is a non-decreasing and differentiable function of the net total input, net_pj = Σ_i w_ji·o_pi, where o_i = i_i if unit i is an input unit.

28 Delta Rule for Semilinear Activation Functions in Feedforward Networks: Thus, a semilinear activation function is one in which o_pj = f_j(net_pj) and f is differentiable and non-decreasing. To get the correct generalization of the delta rule, we must set Δ_p w_ji proportional to -∂E_p/∂w_ji, where E is the same sum-squared error function defined earlier.

29 Delta Rule for Semilinear Activation Functions in Feedforward Networks: As in the standard delta rule, it is useful to see this derivative as resulting from the product of two parts: one part reflecting the change in error as a function of the change in the net input to the unit, and one part representing the effect of changing a particular weight on the net input, i.e. ∂E_p/∂w_ji = (∂E_p/∂net_pj)·(∂net_pj/∂w_ji). The second factor is ∂net_pj/∂w_ji = o_pi.

30 Delta Rule for Semilinear Activation Functions in Feedforward Networks: Define δ_pj = -∂E_p/∂net_pj. Thus, -∂E_p/∂w_ji = δ_pj·o_pi. This says that to implement gradient descent in E we should make our weight changes according to Δ_p w_ji = η·δ_pj·o_pi, just as in the standard delta rule. The trick is to figure out what δ_pj should be for each unit u_j in the network.

31 Delta Rule for Semilinear Activation Functions in Feedforward Networks: Compute δ_pj = -∂E_p/∂net_pj = -(∂E_p/∂o_pj)·(∂o_pj/∂net_pj). The second factor is ∂o_pj/∂net_pj = f_j'(net_pj), which is simply the derivative of the function f_j for the jth unit, evaluated at the net input net_pj to that unit. To compute the first factor, we consider two cases.

32 Delta Rule for Semilinear Activation Functions in Feedforward Networks: First, assume that unit u_j is an output unit of the network. In this case, it follows from the definition of E_p that ∂E_p/∂o_pj = -(t_pj - o_pj). Thus, δ_pj = (t_pj - o_pj)·f_j'(net_pj) for any output unit u_j.

33 Delta Rule for Semilinear Activation Functions in Feedforward Networks: If u_j is not an output unit, we use the chain rule to write ∂E_p/∂o_pj = Σ_k (∂E_p/∂net_pk)·(∂net_pk/∂o_pj) = -Σ_k δ_pk·w_kj. Thus, δ_pj = f_j'(net_pj)·Σ_k δ_pk·w_kj whenever u_j is not an output unit.

34 Delta Rule for Semilinear Activation Functions in Feedforward Networks: If u_j is an output unit: δ_pj = (t_pj - o_pj)·f_j'(net_pj). If u_j is not an output unit: δ_pj = f_j'(net_pj)·Σ_k δ_pk·w_kj. These two equations give a recursive procedure for computing the δ's for all units in the network, which are then used to compute the weight changes in the network.
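A sketch of this recursion for a network with one hidden layer of logistic (sigmoid) units, for which f'(net_pj) = o_pj·(1 - o_pj). The layer sizes, weights, input, and target are made-up values for illustration; only the two δ equations themselves come from the slides.

```python
import math

def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

x = [0.0, 1.0]                       # input pattern
W_hid = [[0.5, -0.4], [0.3, 0.8]]    # W_hid[j][i]: weight from input i to hidden unit j
W_out = [[1.2, -0.7]]                # W_out[k][j]: weight from hidden unit j to output k
t = [1.0]                            # target pattern

# Forward pass through hidden and output layers.
o_hid = [sigmoid(sum(W_hid[j][i] * x[i] for i in range(2))) for j in range(2)]
o_out = [sigmoid(sum(W_out[k][j] * o_hid[j] for j in range(2))) for k in range(1)]

# Output units: delta_pk = (t_pk - o_pk) * f'(net_pk) = (t_pk - o_pk) * o_pk * (1 - o_pk).
delta_out = [(t[k] - o_out[k]) * o_out[k] * (1 - o_out[k]) for k in range(1)]

# Hidden units: delta_pj = f'(net_pj) * sum_k delta_pk * w_kj.
delta_hid = [o_hid[j] * (1 - o_hid[j]) * sum(delta_out[k] * W_out[k][j] for k in range(1))
             for j in range(2)]

print("output deltas:", delta_out)
print("hidden deltas:", delta_hid)
```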

35 Delta Rule for Semilinear Activation Functions in Feedforward Networks: The application of the generalized delta rule thus involves two phases. During the first phase the input is presented and propagated forward through the network to compute the output value o_pj for each unit. This output is then compared with the targets, resulting in an error signal δ_pj for each output unit.

36 Delta Rule for Semilinear Activation Functions in Feedforward Networks: The second phase involves a backward pass through the network (analogous to the initial forward pass) during which the error signal is passed to each unit in the network and the appropriate weight changes are made.
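Putting the two phases together, the sketch below trains a small sigmoid network on the XOR task with per-pattern (online) updates. The 2-2-1 architecture, bias handling, random initialization, learning rate, and epoch count are all illustrative choices; as slide 26 warns, some initializations may settle in a local minimum.

```python
import math, random

random.seed(0)
sig = lambda n: 1.0 / (1.0 + math.exp(-n))

n_in, n_hid = 2, 2
W1 = [[random.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hid)]  # last column = bias
W2 = [random.uniform(-1, 1) for _ in range(n_hid + 1)]                          # last entry  = bias
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
eta = 0.5

for epoch in range(5000):
    for x, t in data:
        # Forward phase: propagate the input to get hidden and output activities.
        xb = x + [1]
        h = [sig(sum(W1[j][i] * xb[i] for i in range(n_in + 1))) for j in range(n_hid)]
        hb = h + [1]
        o = sig(sum(W2[j] * hb[j] for j in range(n_hid + 1)))
        # Backward phase: error signals (deltas), then weight changes.
        d_o = (t - o) * o * (1 - o)
        d_h = [h[j] * (1 - h[j]) * d_o * W2[j] for j in range(n_hid)]
        W2 = [W2[j] + eta * d_o * hb[j] for j in range(n_hid + 1)]
        for j in range(n_hid):
            for i in range(n_in + 1):
                W1[j][i] += eta * d_h[j] * xb[i]

# After training, successful runs should print outputs near 0, 1, 1, 0.
for x, t in data:
    xb = x + [1]
    hb = [sig(sum(W1[j][i] * xb[i] for i in range(n_in + 1))) for j in range(n_hid)] + [1]
    print(x, t, round(sig(sum(W2[j] * hb[j] for j in range(n_hid + 1))), 3))
```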

37 Example: Function Approximation. [Figure: a 1-2-1 network is used for function approximation; for each input p the network output is compared with the target t, and the difference is the error e.]

38 Network Architecture. [Figure: the 1-2-1 network, mapping input p to output a.]

39 Initial Values. [The slide lists the initial weights and biases and plots the initial network response; the values are not reproduced here.]

40 Forward Propagation: the initial input is presented, the output of the 1st layer and then the output of the 2nd layer are computed, and the error between the target and the network output is formed. [The numeric values appeared as equations on the slide.]

41 Transfer Function Derivatives. [The slide gives the derivatives of the layer transfer functions used in the sensitivity calculations.]

42 Backpropagation: the second-layer sensitivity is computed first, then the first-layer sensitivity. [The corresponding equations appeared on the slide.]

43 Weight Update: the weights and biases are adjusted using the learning rate. [The numeric update appeared on the slide.]
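The worked example's actual numbers (initial weights and biases, the training point, and the learning rate) were carried by the slide images and are not recoverable here, so the sketch below reproduces the same sequence of steps with made-up values. It also assumes, as is common for this kind of 1-2-1 function-approximation example, a log-sigmoid hidden layer, a linear output layer, and sensitivities scaled so that the output-layer sensitivity is s2 = -2·e.

```python
import math

logsig = lambda n: 1.0 / (1.0 + math.exp(-n))

# Assumed (made-up) initial values for one training iteration.
W1, b1 = [-0.3, 0.4], [0.1, -0.2]     # first layer: two log-sigmoid neurons, scalar input p
W2, b2 = [0.2, -0.5], 0.3             # second layer: one linear neuron
p, t, alpha = 1.0, 1.5, 0.1           # input, target, learning rate

# Forward propagation through both layers.
n1 = [W1[i] * p + b1[i] for i in range(2)]
a1 = [logsig(n) for n in n1]
a2 = sum(W2[i] * a1[i] for i in range(2)) + b2
e = t - a2

# Transfer-function derivatives: d/dn logsig(n) = a*(1 - a); linear derivative = 1.
f1_prime = [a * (1 - a) for a in a1]

# Backpropagation of sensitivities: second layer first, then the first layer.
s2 = -2 * 1 * e
s1 = [f1_prime[i] * W2[i] * s2 for i in range(2)]

# Weight and bias update with learning rate alpha: W <- W - alpha * s * (layer input).
W2 = [W2[i] - alpha * s2 * a1[i] for i in range(2)]
b2 = b2 - alpha * s2
W1 = [W1[i] - alpha * s1[i] * p for i in range(2)]
b1 = [b1[i] - alpha * s1[i] for i in range(2)]

print("error before update:", e)
print("updated first layer:", W1, b1)
print("updated second layer:", W2, b2)
```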

44 Choice of Network Structure: Multilayer networks can be used to approximate almost any function, if we have enough neurons in the hidden layers. We cannot say, in general, how many layers or how many neurons are necessary for adequate performance.

45 Illustrated Example 1: a 1-3-1 network. [Figure not reproduced.]

46 Illustrated Example 2: 1-2-1, 1-3-1, 1-4-1, and 1-5-1 networks. [Figures not reproduced.]

47 Convergence: [Figure: two training trajectories, one converging to the global minimum and one converging to a local minimum; the numbers next to each curve indicate the sequence of iterations.]

