IE 585 History of Neural Networks & Introduction to Simple Learning Rules
Presentation transcript:

1 IE 585 History of Neural Networks & Introduction to Simple Learning Rules

2 Elements of Neural Networks

3 Types of Transfer Function
Linear (identity) function: y = Σ(wx)
Piecewise-linear function:
  y = 1 if Σ(wx) ≥ 1/(2θ)
  y = θ·Σ(wx) + 1/2 if |Σ(wx)| < 1/(2θ)
  y = 0 if Σ(wx) ≤ -1/(2θ)
(plot: y versus Σ(wx) for each function)

4 Types of Transfer Function
Step function
Binary transfer function: y = 1 if Σ(wx) ≥ 0, y = 0 if Σ(wx) < 0
Bipolar transfer function: y = 1 if Σ(wx) ≥ 0, y = -1 if Σ(wx) < 0
(plot: y versus Σ(wx) for each step function)

5 Types of Transfer Function
Sigmoid function
Binary transfer function: y = 1/(1 + exp(-Σ(wx)))
Bipolar transfer function: y = (1 - exp(-Σ(wx))) / (1 + exp(-Σ(wx)))
(plot: y versus Σ(wx) for each sigmoid)
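These functions are easy to state directly in code. Below is a minimal sketch in Python/NumPy, assuming the net input Σ(wx) has already been computed and is passed in as net; theta is the slope parameter of the piecewise-linear function. It is illustrative only, not part of the original slides.

import numpy as np

def linear(net):
    """Identity (linear) transfer function: y = net."""
    return net

def piecewise_linear(net, theta=1.0):
    """Ramp from 0 to 1: y = theta*net + 1/2 for |net| < 1/(2*theta), clipped outside."""
    return np.clip(theta * net + 0.5, 0.0, 1.0)

def binary_step(net):
    """Binary step: 1 if net >= 0, else 0."""
    return np.where(net >= 0, 1, 0)

def bipolar_step(net):
    """Bipolar step: 1 if net >= 0, else -1."""
    return np.where(net >= 0, 1, -1)

def binary_sigmoid(net):
    """Binary sigmoid: output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-net))

def bipolar_sigmoid(net):
    """Bipolar sigmoid, equal to tanh(net/2): output in (-1, 1)."""
    return (1.0 - np.exp(-net)) / (1.0 + np.exp(-net))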

6 Linear Separability
"OR" example:
x1  x2  t
 1   1   1
-1   1   1
 1  -1   1
-1  -1  -1
"AND" example:
x1  x2  t
 1   1   1
-1   1  -1
 1  -1  -1
-1  -1  -1
(plot: for each example, a straight line separates the + points from the - points)

7 Why Do We Need Biases?

8 Foundations
1943 - McCulloch (neurobiologist) & Pitts (logician)
– described a model of a biological neuron
– all-or-nothing activation (0, 1)
– threshold for activation
– fixed structure
– delay in transmitting signals
– no learning

9 McCulloch-Pitts Neuron Example
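The worked example from this slide is not preserved in the transcript. The following is a minimal sketch (not the original example) of a McCulloch-Pitts unit computing logical AND, assuming binary (0, 1) inputs, fixed weights of 1, and a threshold of 2, consistent with the fixed-structure, no-learning model described on the previous slide.

def mcculloch_pitts(inputs, weights, threshold):
    """All-or-nothing unit: fire (1) only if the weighted sum reaches the threshold."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= threshold else 0

# AND of two binary inputs: fixed weights (1, 1), threshold 2, no learning involved.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, mcculloch_pitts((x1, x2), (1, 1), 2))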

10 Donald Hebb
The Organization of Behavior, 1949
– first learning rule
– information stored in synapse weights
– weights grow in proportion to the activations of the neurons they connect
– weights are symmetric

11 Donald Hebb
"When the axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased." [D.O. Hebb, The Organization of Behavior]
In other words, in a neural net, the connections between neurons get larger weights if they are repeatedly used during training.

12 Hebb Training Algorithm
1. Set all w = 0
2. α = learning rate (0 < α ≤ 1), x = input, y = output, t = target
   Δw = α·x·y (unsupervised) or Δw = α·x·t (supervised), with the bias treated as an extra input fixed at 1
3. Train the net for 1 epoch (1 epoch: 1 pass through the training set)
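A minimal sketch of the supervised version in Python, assuming α = 1, the bipolar AND patterns used in the worked example on the next slides, and a bias treated as an extra input fixed at 1. Running it reproduces the final weights of the worked table.

import numpy as np

# Bipolar AND patterns: (x1, x2, target).
patterns = [( 1,  1,  1),
            (-1,  1, -1),
            ( 1, -1, -1),
            (-1, -1, -1)]

alpha = 1.0
w = np.zeros(3)                          # [w1, w2, wB], all start at 0

for x1, x2, t in patterns:               # one epoch = one pass through the training set
    x = np.array([x1, x2, 1.0])          # append the bias input
    w += alpha * t * x                   # supervised Hebb rule: delta_w = alpha * t * x

print(w)                                 # expected final weights: [ 2.  2. -2.]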

13 Hebb Net Example
Network: inputs x1 and x2 plus bias B, weights w1, w2, wB, output y
Transfer function: y = 1 if Σ(wx) ≥ 0, y = -1 if Σ(wx) < 0
"AND" example:
x1  x2  t
 1   1   1
-1   1  -1
 1  -1  -1
-1  -1  -1

14 Training (α = 1, supervised mode, substituting t for y)
Transfer function: y = 1 if Σ(wx) ≥ 0, y = -1 if Σ(wx) < 0
Δw = w_new - w_old

x1  x2  t  | w1  w2  wB | y  | Δw1 Δw2 ΔwB
 1   1   1 |  0   0   0 |  1 |  1   1   1
-1   1  -1 |  1   1   1 |  1 |  1  -1  -1
 1  -1  -1 |  2   0   0 |  1 | -1   1  -1
-1  -1  -1 |  1   1  -1 | -1 |  1   1  -1

Final weights: w1 = 2, w2 = 2, wB = -2

15 Linearly Separable
Decision boundary: net = 2x1 + 2x2 - 2 = 0, i.e. x2 = -x1 + 1
(plot: the single + point lies above the line, the three - points below it)

16 Frank Rosenblatt
The perceptron, 1958
– model of biological vision
– self-organized and supervised learning
– first model simulated on a computer (at Cornell)

17 The Perceptron

18 Perceptron Learning Rule Convergence Theorem
If weights exist that allow the net to respond correctly to all training patterns, then the rule's procedure for adjusting the weights will find such values in a finite number of training steps, so that the net responds correctly to all training patterns.

19 Perceptron Training
1. Set all w = 0
2. α = learning rate (0 < α ≤ 1), x = input, y = output, t = target
   if y ≠ t, update w_new = w_old + α·t·x (bias treated as an extra input fixed at 1); otherwise leave w unchanged
3. If no weights change in one epoch, stop (1 epoch: 1 pass through the training set)
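A minimal sketch of this procedure in Python, assuming α = 1, the bipolar step transfer function, a bias treated as an extra input fixed at 1, and the bipolar patterns from the worked example on the following slides.

import numpy as np

# Bipolar OR patterns from the worked example: (x1, x2, target).
patterns = [(-1, -1, -1),
            (-1,  1,  1),
            ( 1, -1,  1),
            ( 1,  1,  1)]

alpha = 1.0
w = np.zeros(3)                          # [w1, w2, wB]

changed = True
while changed:                           # stop when a full epoch makes no change
    changed = False
    for x1, x2, t in patterns:
        x = np.array([x1, x2, 1.0])
        y = 1 if w @ x >= 0 else -1      # bipolar step on the net input
        if y != t:                       # update only on a wrong response
            w += alpha * t * x
            changed = True

print(w)                                 # expected final weights: [1. 1. 1.], i.e. net = x1 + x2 + 1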

20 Perceptron Example
Network: inputs x1 and x2 plus bias B, weights w1, w2, wB, output y
Transfer function: y = 1 if Σ(wx) ≥ 0, y = -1 if Σ(wx) < 0
"OR" example:
x1  x2  t
 1   1   1
 1  -1   1
-1   1   1
-1  -1  -1

21 Training (α = 1)
Transfer function: y = 1 if Σ(wx) ≥ 0, y = -1 if Σ(wx) < 0
Δw = w_new - w_old

x1  x2  t  | w1  w2  wB | y  | Δw1 Δw2 ΔwB
-1  -1  -1 |  0   0   0 |  1 |  1   1  -1
-1   1   1 |  1   1  -1 | -1 | -1   1   1
 1  -1   1 |  0   2   0 | -1 |  1  -1   1
 1   1   1 |  1   1   1 |  1 |  0   0   0

Final weights: w1 = 1, w2 = 1, wB = 1

22 Linearly Separable
Decision boundary: net = x1 + x2 + 1 = 0, i.e. x2 = -x1 - 1
(plot: the single - point lies below the line, the three + points above it)

23 Widrow / Hoff
"Adaptive Switching Circuits", 1960
– electrical engineers
– hardware implementations
– the ADALINE (ADAptive LInear NEuron)
– least mean squares (LMS) error training
– delta rule

24 Widrow - Hoff
Discovery of the "LMS" or "Widrow-Hoff" Learning Algorithm
"The first doctoral student that I had [at Stanford] was a man named Ted Hoff.... the two of us began talking. I was telling Ted about research.... One day I had a session with him, and out of this session came the LMS [least mean squares] algorithm."

25 LMS Learning
The connection weights change in proportion to the error between the target and the output; for ADALINEs the output used in the error is the net input itself, before any thresholding.

26 Derivation of Learning Rule
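The derivation itself is not preserved in this transcript. A standard sketch for the ADALINE, assuming the squared error E = (1/2)·(t - net)² with net = Σ(wx), is gradient descent on E:

∂E/∂w_i = -(t - net)·x_i
Δw_i = -α·∂E/∂w_i = α·(t - net)·x_i

which is the delta (Widrow-Hoff) rule used in the LMS training steps on the next slide.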

27 LMS Training
1. Set all w to small random numbers
2. α = learning rate (0 < α ≤ 1), x = input, y = output, t = target
   Adjust w in the direction of maximum error decrease: Δw = α·(t - net)·x
3. Iterate until the Δw's are very small
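A minimal sketch of this procedure in Python, assuming α = 0.1, the bipolar AND patterns used earlier, a bias treated as an extra input fixed at 1, and the delta rule Δw = α·(t - net)·x; the stopping test looks at how much the weights move over a whole epoch.

import numpy as np

rng = np.random.default_rng(0)
patterns = [( 1,  1,  1),                # bipolar AND: (x1, x2, target)
            (-1,  1, -1),
            ( 1, -1, -1),
            (-1, -1, -1)]

alpha = 0.1
w = rng.uniform(-0.5, 0.5, size=3)       # small random start: [w1, w2, wB]

for epoch in range(1000):
    w_start = w.copy()
    for x1, x2, t in patterns:
        x = np.array([x1, x2, 1.0])
        net = w @ x                      # ADALINE error is measured on the net input itself
        w += alpha * (t - net) * x       # delta rule update
    if np.abs(w - w_start).max() < 1e-6: # stop once an epoch barely changes the weights
        break

print(w)                                 # settles close to the least-squares weights for bipolar AND, about (0.5, 0.5, -0.5)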

28 Rule of Thumb for α
0.1 ≤ n·α ≤ 1.0, where n is the number of inputs
If α is too large, training won't converge; if α is too small, the learning process is too slow.

29 LMS Example

30 The Party's Over
Minsky and Papert's Perceptrons, 1969
– looked at simple perceptrons
– simple perceptrons can only solve linearly separable problems (no XOR)
– thought multi-layer nets would be similarly useless
– funding for neural net research essentially stops

31 Not Linearly Separable - XOR
"XOR" example:
x1  x2  t
 1   1  -1
-1   1   1
 1  -1   1
-1  -1  -1
(plot: no single straight line separates the + points from the - points)

32 Monks Working Through the Dark Age
Stephen Grossberg (often with Gail Carpenter), BU - adaptive resonance theory (ART), sigmoid transfer function
Teuvo Kohonen, Finland - self-organizing map
John Hopfield, Caltech - Hopfield network for optimization and pattern recall

33 Things Get Rolling Again
Rumelhart & McClelland and the PDP Group, Parallel Distributed Processing, 1986
– popularized multi-layered perceptrons trained by backpropagation
DARPA (Defense Advanced Research Projects Agency), 1988, study on the most promising applications of neural nets - funding starts up in a big way

