Classification III. Lecturer: Dr. Bo Yuan. E-mail: yuanb@sz.tsinghua.edu.cn
Overview: Artificial Neural Networks
Biological Motivation
10^11: the number of neurons in the human brain
10^4: the average number of connections per neuron
10^-3 s: the fastest switching time of neurons
10^-10 s: the switching speed of computers
10^-1 s: the time required to visually recognize your mother
Biological Motivation
The power of parallelism: the information processing abilities of biological neural systems follow from highly parallel processes operating on representations that are distributed over many neurons. The motivation of ANN is to capture this kind of highly parallel computation based on distributed representations.
Sequential machines vs. parallel machines
Group A: using ANN to study and model biological learning processes.
Group B: obtaining highly effective machine learning algorithms, regardless of how closely these algorithms mimic biological processes.
Neural Network Representations
Robot vs. Human
Perceptrons
(Figure: a perceptron unit. The inputs x_1, ..., x_n with weights w_1, ..., w_n, together with a fixed bias input x_0 = 1 with weight w_0, feed a summation unit ∑; the perceptron outputs 1 when the weighted sum ∑ w_i x_i is positive, and 0 otherwise.)
Power of Perceptrons
AND (w_0 = -0.8, w_1 = 0.5, w_2 = 0.5)
  Input   ∑      Output
  0 0     -0.8   0
  0 1     -0.3   0
  1 0     -0.3   0
  1 1      0.2   1
OR (w_0 = -0.3, w_1 = 0.5, w_2 = 0.5)
  Input   ∑      Output
  0 0     -0.3   0
  0 1      0.2   1
  1 0      0.2   1
  1 1      0.7   1
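The AND and OR units above can be checked directly. A minimal Python sketch (the 0/1 step output and the variable names are illustrative, not from the slides):

    def perceptron(weights, x):
        # Threshold unit: output 1 if w0 + w1*x1 + ... + wn*xn > 0, else 0.
        net = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
        return 1 if net > 0 else 0

    AND_WEIGHTS = [-0.8, 0.5, 0.5]  # w0 (bias), w1, w2 from the slide
    OR_WEIGHTS = [-0.3, 0.5, 0.5]

    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, perceptron(AND_WEIGHTS, x), perceptron(OR_WEIGHTS, x))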
Error Surface
(Figure: the training error plotted as a surface over the weight space (w_1, w_2).)
Gradient Descent
Weights are moved in the direction that reduces the error over the whole training set: Δw_i = η ∑_d (t_d - o_d) x_id, where η is the learning rate (batch learning).
Delta Rule
For a single training example, the weight update is Δw_i = η(t - o)x_i.
Batch Learning
GRADIENT_DESCENT(training_examples, η)
  Initialize each w_i to some small random value.
  Until the termination condition is met, Do
    Initialize each Δw_i to zero.
    For each <x, t> in training_examples, Do
      Input the instance x to the unit and compute the output o.
      For each linear unit weight w_i, Do
        Δw_i ← Δw_i + η(t - o)x_i
    For each linear unit weight w_i, Do
      w_i ← w_i + Δw_i
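A minimal Python sketch of the batch procedure above for a linear unit (the function and variable names are illustrative assumptions, not part of the slides):

    import random

    def gradient_descent(training_examples, eta, n_inputs, epochs=100):
        # training_examples: list of (x, t) pairs, where x includes the bias input x0 = 1.
        w = [random.uniform(-0.05, 0.05) for _ in range(n_inputs)]
        for _ in range(epochs):  # termination condition: a fixed number of passes
            delta_w = [0.0] * n_inputs
            for x, t in training_examples:
                o = sum(wi * xi for wi, xi in zip(w, x))  # linear unit output
                for i in range(n_inputs):
                    delta_w[i] += eta * (t - o) * x[i]  # accumulate the update
            w = [wi + dwi for wi, dwi in zip(w, delta_w)]  # apply once per pass
        return w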
Stochastic Learning
In stochastic (incremental) learning, the weights are updated after each training example rather than after a full pass. For example, if x_i = 0.8, η = 0.1, t = 1 and o = 0:
Δw_i = η(t - o)x_i = 0.1 × (1 - 0) × 0.8 = 0.08
(Figure: positive (+) and negative (-) training examples in the input space.)
Stochastic Learning: NAND
Training a perceptron to compute NAND (bias input x_0 = 1, threshold = 0.5, learning rate = 0.1, all weights initialized to 0). Each row is one training step: the input, the target t, the current weights, the weighted sum S, the output o, the error E = t - o, and the updated weights.
  Input (x0 x1 x2)   t   w0   w1   w2    S     o   E   New w0  w1   w2
  1 0 0              1   0    0    0     0     0   1   0.1     0    0
  1 0 1              1   0.1  0    0     0.1   0   1   0.2     0    0.1
  1 1 0              1   0.2  0    0.1   0.2   0   1   0.3     0.1  0.1
  1 1 1              0   0.3  0.1  0.1   0.5   0   0   0.3     0.1  0.1
  ...
Training cycles through the four examples until the weights stop changing; in this trace they converge to approximately w_0 = 0.8, w_1 = -0.2, w_2 = -0.1, which computes NAND correctly for all inputs.
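The trace above can be reproduced with a short stochastic training loop. A Python sketch assuming the same setup (threshold 0.5, learning rate 0.1, zero initial weights):

    def output(w, x, threshold=0.5):
        return 1 if sum(wi * xi for wi, xi in zip(w, x)) > threshold else 0

    # NAND training set; x0 = 1 is the bias input
    examples = [((1, 0, 0), 1), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 0)]
    w, eta = [0.0, 0.0, 0.0], 0.1

    for epoch in range(20):  # repeated passes over the four examples
        for x, t in examples:
            o = output(w, x)
            w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]  # update after each example

    print(w, [output(w, x) for x, _ in examples])  # learned weights and NAND outputs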
Multilayer Perceptron
XOR
  p  q   Output
  0  0   0
  0  1   1
  1  0   1
  1  1   0
The positive and negative examples cannot be separated by a single line: XOR is not linearly separable.
XOR
XOR can be computed by combining linearly separable functions: p XOR q = (p OR q) AND (p NAND q). (Figure: a two-layer network in which an OR unit and a NAND unit take p and q as inputs and feed an AND unit.)
Hidden Layer Representations
  Input (p q)   Hidden (OR  NAND)   Output (AND)
  0 0           0  1                0
  0 1           1  1                1
  1 0           1  1                1
  1 1           1  0                0
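A quick Python check that the hidden OR and NAND units feeding an AND unit reproduce XOR (the OR and AND weights come from the earlier slide; the NAND weights are an illustrative choice):

    def unit(w, x):
        # Perceptron unit with bias weight w[0] and input weights w[1:].
        return 1 if w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)) > 0 else 0

    OR_W = [-0.3, 0.5, 0.5]
    AND_W = [-0.8, 0.5, 0.5]
    NAND_W = [0.8, -0.5, -0.5]  # illustrative NAND weights (negation of AND)

    for p, q in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        hidden = (unit(OR_W, (p, q)), unit(NAND_W, (p, q)))
        print((p, q), hidden, unit(AND_W, hidden))  # last column equals p XOR q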
The Sigmoid Threshold Unit
(Figure: like the perceptron, the unit computes net = ∑ w_i x_i over the inputs x_1, ..., x_n plus a bias input x_0 = 1, but its output is the smooth function o = σ(net) = 1 / (1 + e^-net) instead of a hard threshold.)
Backpropagation Rule
x_ji = the i-th input to unit j
w_ji = the weight associated with the i-th input to unit j
net_j = ∑_i w_ji x_ji (the weighted sum of inputs for unit j)
o_j = the output of unit j
t_j = the target output of unit j
σ = the sigmoid function
outputs = the set of units in the final layer
Downstream(j) = the set of units directly taking the output of unit j as inputs
Training Rule for Output Units
For a sigmoid output unit k, the error term is δ_k = o_k(1 - o_k)(t_k - o_k), and each incoming weight is updated as w_kj ← w_kj + η δ_k x_kj.
Training Rule for Hidden Units
For a hidden unit j, the error is propagated back from Downstream(j): δ_j = o_j(1 - o_j) ∑_{k ∈ Downstream(j)} w_kj δ_k, and each incoming weight is updated as w_ji ← w_ji + η δ_j x_ji.
BP Framework
BACKPROPAGATION(training_examples, η, n_in, n_out, n_hidden)
  Create a network with n_in inputs, n_hidden hidden units and n_out output units.
  Initialize all network weights to small random numbers.
  Until the termination condition is met, Do
    For each <x, t> in training_examples, Do
      Input the instance x to the network and compute the output o of every unit.
      For each output unit k, calculate its error term δ_k.
      For each hidden unit h, calculate its error term δ_h.
      Update each network weight w_ji.
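A compact Python sketch of the procedure above for one hidden layer of sigmoid units (structure and names are illustrative; the error terms follow the output and hidden unit rules from the previous slides):

    import math, random

    def sigmoid(y):
        return 1.0 / (1.0 + math.exp(-y))

    def backpropagation(training_examples, eta, n_in, n_out, n_hidden, epochs=5000):
        # training_examples: list of (x, t) pairs of lists; the bias x0 = 1 is added internally.
        w_h = [[random.uniform(-0.05, 0.05) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        w_o = [[random.uniform(-0.05, 0.05) for _ in range(n_hidden + 1)] for _ in range(n_out)]
        for _ in range(epochs):
            for x, t in training_examples:
                xb = [1.0] + list(x)  # forward pass
                h = [sigmoid(sum(wi * xi for wi, xi in zip(wj, xb))) for wj in w_h]
                hb = [1.0] + h
                o = [sigmoid(sum(wi * hi for wi, hi in zip(wk, hb))) for wk in w_o]
                delta_o = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, t)]  # output error terms
                delta_h = [hj * (1 - hj) * sum(w_o[k][j + 1] * delta_o[k] for k in range(n_out))
                           for j, hj in enumerate(h)]  # hidden error terms
                for k in range(n_out):  # weight updates
                    w_o[k] = [wk + eta * delta_o[k] * hi for wk, hi in zip(w_o[k], hb)]
                for j in range(n_hidden):
                    w_h[j] = [wj + eta * delta_h[j] * xi for wj, xi in zip(w_h[j], xb)]
        return w_h, w_o

For example, backpropagation([([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])], 0.5, 2, 1, 2) attempts to train a 2-2-1 network on XOR.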
More about BP Networks …
Convergence and Local Minima
  The search space is likely to be highly multimodal.
  Training may easily get stuck at a local solution.
  Multiple trials with different initial weights are needed.
Evolving Neural Networks
  Black-box optimization techniques (e.g., Genetic Algorithms)
  Usually better accuracy
  Can do some advanced training (e.g., structure + parameters).
  Xin Yao (1999), "Evolving Artificial Neural Networks", Proceedings of the IEEE, 87(9), pp. 1423-1447.
Representational Power
Deep Learning
More about BP Networks …
Overfitting
  Tends to occur during later iterations.
  Use a validation dataset to terminate training when necessary.
Practical Considerations
  Momentum
  Adaptive learning rate
    Small: slow convergence, easy to get stuck
    Large: fast convergence, unstable
(Figures: training vs. validation error over training time; error as a function of a weight.)
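The momentum heuristic mentioned above adds a fraction of the previous weight change to the current one, which smooths the search and can help the descent roll through small local minima. A minimal Python sketch (α = 0.9 and the names are illustrative assumptions):

    def momentum_update(w, grad_step, prev_step, alpha=0.9):
        # Combine the current gradient step with a fraction alpha of the previous step.
        step = [g + alpha * p for g, p in zip(grad_step, prev_step)]
        return [wi + si for wi, si in zip(w, step)], step

    w, prev = [0.1, -0.2], [0.02, 0.01]
    w, prev = momentum_update(w, [0.05, -0.03], prev)  # grad_step plays the role of eta * delta * x
    print(w, prev)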
Beyond BP Networks
Elman Network (a recurrent network)
Example task: temporal XOR. The input stream 0 1 1 0 0 0 1 1 0 1 0 1 … consists of pairs of bits, each pair followed by their XOR, and the network is trained to predict the next input; only every third bit (output: ? 1 ? ? 0 ? ? 0 ? ? 1 ? …) is predictable.
Beyond BP Networks
Hopfield Network
(Figures: a Hopfield network and the energy landscape of a Hopfield network.)
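A minimal Hopfield network sketch in Python (Hebbian weight storage and asynchronous +/-1 updates; the stored patterns and names are illustrative, not from the slides):

    import random

    def train_hopfield(patterns):
        # Hebbian weights: w[i][j] is the average of s_i * s_j over the stored +/-1 patterns.
        n = len(patterns[0])
        w = [[0.0] * n for _ in range(n)]
        for s in patterns:
            for i in range(n):
                for j in range(n):
                    if i != j:
                        w[i][j] += s[i] * s[j] / len(patterns)
        return w

    def recall(w, state, steps=100):
        # Asynchronous updates: each step moves one unit downhill in the energy landscape.
        s = list(state)
        for _ in range(steps):
            i = random.randrange(len(s))
            s[i] = 1 if sum(w[i][j] * s[j] for j in range(len(s))) >= 0 else -1
        return s

    stored = [[1, 1, -1, -1, 1, -1], [-1, 1, 1, -1, -1, 1]]
    w = train_hopfield(stored)
    print(recall(w, [1, 1, -1, -1, -1, -1]))  # a noisy cue settles at the nearest stored pattern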
Beyond BP Networks
When does ANN work?
  Instances are represented by attribute-value pairs. Input values can be any real values.
  The target output may be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes.
  The training samples may contain errors.
  Long training times are acceptable: training can range from a few seconds to several hours.
  Fast evaluation of the learned target function may be required.
  The ability to understand the learned function is not important; weights are difficult for humans to interpret.
Reading Materials
Text Book
  Richard O. Duda et al., Pattern Classification, Chapter 6, John Wiley & Sons, Inc.
  Tom Mitchell, Machine Learning, Chapter 4, McGraw-Hill.
  http://page.mi.fu-berlin.de/rojas/neural/index.html.html
Online Demo
  http://neuron.eng.wayne.edu/software.html
  http://www.cbu.edu/~pong/ai/hopfield/hopfield.html
Online Tutorial
  http://www.autonlab.org/tutorials/neural13.pdf
  http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/faces.html
Wikipedia & Google
Review
What is the biological motivation of ANN?
When does ANN work?
What is a perceptron? How to train a perceptron?
What is the limitation of perceptrons?
How does ANN solve non-linearly separable problems?
What is the key idea of the Backpropagation algorithm?
What are the main issues of BP networks?
What are the examples of other types of ANN?
Next Week’s Class Talk
Volunteers are required for next week’s class talk.
Topic 1: Applications of ANN
Topic 2: Recurrent Neural Networks
Hints: Robot Driving, Character Recognition, Face Recognition, Hopfield Network
Length: 20 minutes plus question time
Assignment
Topic: Training Feedforward Neural Networks
Technique: BP Algorithm
Task 1: XOR Problem
  4 input samples: 0 0, 0 1, 1 0, 1 1
Task 2: Identity Function
  8 input samples, each mapped to itself (e.g., 10000000 → 10000000, 00010000 → 00010000, …)
  Use 3 hidden units
Deliverables: Report and code (any programming language, with detailed comments)
Due: Sunday, 14 December
Credit: 15%