1
CS 182 Sections 101 - 102. Eva Mok (emok@icsi.berkeley.edu), Feb 11, 2004. (http://www2.hi.net/s4/strangebreed.htm) bad puns alert!
2
Announcements: a3 part 1 is due tonight (submit as a3-1). The second tester file is up, so please start part 2. The quiz is graded (get it after class).
3
Where we stand
Last week: backprop
This week: recruitment learning; color
Coming up: imaging techniques (e.g. fMRI)
4
The Big (and complicated) Picture
[Course-overview diagram: levels of abstraction running from Cognition and Language, through Computation, Structured Connectionism, and Computational Neurobiology, down to Biology. Topics placed along these levels include Neural Development, Triangle Nodes, Neural Nets & Learning, Spatial Relations, Motor Control, Metaphor, SHRUTI, Grammar, abstraction, the Regier, Bailey, Narayanan, and Chang models, the Visual System, and Psycholinguistics Experiments; the Quiz, Midterm, and Finals mark progress through the levels.]
5
Quiz
1. What is a localist representation? What is a distributed representation? Why are they both bad?
2. What is coarse-fine encoding? Where is it used in our brain?
3. What can Back-Propagation do that Hebb's Rule can't?
4. Derive the Back-Propagation Algorithm.
5. What (intuitively) does the learning rate do? How about the momentum term?
6
Distributed vs. Localist Rep'n

Distributed        Localist
John   1100        John   1000
Paul   0110        Paul   0100
George 0011        George 0010
Ringo  1001        Ringo  0001

What are the drawbacks of each representation?
7
Distributed vs. Localist Rep'n

Distributed (John 1100, Paul 0110, George 0011, Ringo 1001):
What happens if you want to represent a group?
How many persons can you represent with n bits? 2^n

Localist (John 1000, Paul 0100, George 0010, Ringo 0001):
What happens if one neuron dies?
How many persons can you represent with n bits? n
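To make the capacity contrast concrete, here is a minimal Python sketch (not from the slides; the four codes are the ones in the table above). It compares how many items n bits can represent under each scheme and shows what happens when one unit dies.

# Localist: one unit per item, so n bits can represent n items.
# Distributed: each item is a pattern over all n bits, so up to 2^n items.

localist = {"John": "1000", "Paul": "0100", "George": "0010", "Ringo": "0001"}
distributed = {"John": "1100", "Paul": "0110", "George": "0011", "Ringo": "1001"}

n = 4
print("localist capacity:   ", n)        # one item per unit
print("distributed capacity:", 2 ** n)   # one item per binary pattern

# If unit 0 dies (stuck at 0), the localist code loses John entirely,
# while every distributed pattern is merely degraded, not erased.
def kill_unit(code, i):
    return code[:i] + "0" + code[i + 1:]

print({name: kill_unit(code, 0) for name, code in localist.items()})
print({name: kill_unit(code, 0) for name, code in distributed.items()})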
8
Visual System
1000 x 1000 visual map. For each location, encode:
– orientation
– direction of motion
– speed
– size
– color
– depth
Blows up combinatorially!
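As a rough illustration of the blowup, here is a small Python calculation; the figure of 10 distinguishable values per feature is an assumption for illustration, not from the slide. Coding every conjunction of the six features at every location explodes, while coding each feature separately stays manageable.

# Hypothetical numbers for illustration: 10 distinguishable values per feature.
locations = 1000 * 1000
features = 6
values_per_feature = 10

conjunctive = locations * values_per_feature ** features   # one unit per full feature combination
factored = locations * features * values_per_feature       # one small population per feature

print(f"conjunctive units: {conjunctive:.1e}")   # ~1e12
print(f"factored units:    {factored:.1e}")      # ~6e7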
9
Coarse Coding
The information you can encode with one fine-resolution unit can also be encoded with a few coarse-resolution units. As long as we need fewer coarse units in total, we come out ahead.
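A minimal sketch of the idea in Python (the interval widths and positions are made up for illustration): three overlapping coarse units on the range 0-9 produce five distinct activity patterns, so a few broad units can match the resolution of a larger number of narrow ones.

# Three coarse units, each responding over a broad, overlapping interval.
coarse_units = {"A": range(0, 6), "B": range(2, 8), "C": range(4, 10)}

# Which units fire for each stimulus position 0..9?
signatures = {}
for pos in range(10):
    active = tuple(name for name, rf in coarse_units.items() if pos in rf)
    signatures.setdefault(active, []).append(pos)

for active, positions in signatures.items():
    print(active, "->", positions)

# 3 coarse units give 5 distinct patterns (resolution ~2 positions),
# which would take 5 non-overlapping fine units to match.
print("distinct patterns:", len(signatures))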
10
Coarse-Fine Coding
But we can run into ghost "images."
[Figure: two stimuli X and Y plotted against Feature 1 (e.g. orientation) and Feature 2 (e.g. direction of motion). One population is coarse in F2 and fine in F1, the other coarse in F1 and fine in F2. The combined responses are consistent not only with X and Y but also with the ghost combinations G.]
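Here is one way to see where the ghosts come from, sketched in Python (the specific orientation and direction values are invented): with two stimuli present, the fine-orientation map reports two orientations and the fine-direction map reports two directions, and any pairing of those values is consistent with the activity.

# Two real stimuli, each an (orientation, direction) pair. Values are made up.
X = ("vertical", "leftward")
Y = ("horizontal", "rightward")
stimuli = [X, Y]

# One map is fine in orientation, the other fine in direction.
active_orientations = {orient for orient, _ in stimuli}
active_directions = {direction for _, direction in stimuli}

# Any pairing of an active orientation with an active direction is consistent
# with the population response; the two extra pairings are the ghosts G.
consistent = {(o, d) for o in active_orientations for d in active_directions}
ghosts = consistent - set(stimuli)

print("consistent with the activity:", consistent)
print("ghost images:", ghosts)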
11
Back-Propagation Algorithm
We define the error term for a single node to be t_i - y_i, where t_i is the target.
Each node computes x_i = ∑_j w_ij y_j and outputs y_i = f(x_i).
Sigmoid: f(x) = 1 / (1 + e^-x)
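A minimal Python sketch of these definitions; the weights and inputs are the ones used in the worked example later in the deck.

import math

def sigmoid(x):
    # f(x) = 1 / (1 + e^-x)
    return 1.0 / (1.0 + math.exp(-x))

def forward(weights, inputs):
    # x_i = sum_j w_ij * y_j ;  y_i = f(x_i)
    x = sum(w * y for w, y in zip(weights, inputs))
    return sigmoid(x)

y_i = forward([0.8, 0.6, 0.5], [0.0, 0.0, 1.0])   # example values
print(y_i)                                        # the error term is then t_i - y_i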
12
Gradient Descent
[Figure: a 3-D error surface plotted over two axes labeled i1 and i2; the global minimum is your goal. It should really be 4-D (3 weights), but you get the idea.]
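A generic gradient-descent step sketched in Python on a toy one-dimensional error surface (not the network's actual error; just to show the update w <- w - eta * dE/dw rolling downhill toward the minimum):

# Toy error surface E(w) = (w - 3)^2, with its minimum at w = 3.
def dE_dw(w):
    return 2 * (w - 3)

w = 0.0          # arbitrary starting weight
eta = 0.1        # learning rate
for _ in range(50):
    w -= eta * dE_dw(w)   # step downhill along the gradient

print(w)   # converges toward 3.0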
13
The output layer
Network: input k -> hidden j -> output i, with weights w_jk and w_ij.
E = Error = ½ ∑_i (t_i - y_i)^2, where t_i is the target.
The derivative of the sigmoid is just f'(x_i) = y_i (1 - y_i).
For the output layer, with learning rate η:
Δw_ij = η (t_i - y_i) y_i (1 - y_i) y_j
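Sketched in Python (the numbers passed in are placeholders, chosen to match the worked example later in the deck):

def output_layer_update(w_ij, y_j, y_i, t_i, eta):
    # delta_i = (t_i - y_i) * f'(x_i), with f'(x_i) = y_i * (1 - y_i) for the sigmoid
    delta_i = (t_i - y_i) * y_i * (1.0 - y_i)
    # gradient-descent step: w_ij <- w_ij + eta * delta_i * y_j
    return w_ij + eta * delta_i * y_j, delta_i

new_w, delta = output_layer_update(w_ij=0.5, y_j=1.0, y_i=0.6224, t_i=0.0, eta=0.5)
print(new_w)   # ~0.427, matching the bias-weight update in the worked example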
14
The hidden layer
Same network (k -> j -> i), E = Error = ½ ∑_i (t_i - y_i)^2, t_i: target.
The error is propagated back through the output weights:
δ_i = (t_i - y_i) y_i (1 - y_i)
δ_j = y_j (1 - y_j) ∑_i w_ij δ_i
Δw_jk = η δ_j y_k
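The same rule sketched in Python (all the numbers in the call are placeholders, just to show the shape of the computation):

def hidden_layer_update(w_jk, y_k, y_j, w_ij_list, delta_i_list, eta):
    # delta_j = f'(x_j) * sum_i w_ij * delta_i, with f'(x_j) = y_j * (1 - y_j)
    delta_j = y_j * (1.0 - y_j) * sum(w * d for w, d in zip(w_ij_list, delta_i_list))
    # gradient-descent step: w_jk <- w_jk + eta * delta_j * y_k
    return w_jk + eta * delta_j * y_k, delta_j

new_w, delta_j = hidden_layer_update(
    w_jk=0.3, y_k=1.0, y_j=0.7, w_ij_list=[0.5], delta_i_list=[-0.146], eta=0.5)
print(new_w, delta_j)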
15
Let's just do an example
A single output node y_0 with inputs i_1, i_2 and a bias b = 1, with weights w_01, w_02, w_0b.
E = Error = ½ ∑_i (t_i - y_i)^2, here E = ½ (t_0 - y_0)^2.

Targets:
i_1  i_2  y_0
 0    0    0
 0    1    1
 1    0    1
 1    1    1

Initial weights: w_01 = 0.8, w_02 = 0.6, w_0b = 0.5. Learning rate: suppose η = 0.5.

For input (i_1, i_2) = (0, 0), target t_0 = 0:
x_0 = 0.8·0 + 0.6·0 + 0.5·1 = 0.5
y_0 = 1/(1 + e^-0.5) = 0.6224
E = ½ (0 - 0.6224)^2 = 0.1937
Updates: Δw_01 = Δw_02 = 0 (their inputs are 0), and w_0b becomes 0.4268.
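A runnable Python version of this single training step (it reproduces the slide's numbers up to rounding):

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# initial weights for inputs i1, i2 and the bias, as on the slide
w = {"w01": 0.8, "w02": 0.6, "w0b": 0.5}
i1, i2, bias = 0.0, 0.0, 1.0
t0 = 0.0          # target for input (0, 0)
eta = 0.5         # learning rate

x0 = w["w01"] * i1 + w["w02"] * i2 + w["w0b"] * bias
y0 = sigmoid(x0)
E = 0.5 * (t0 - y0) ** 2
print(x0, y0, E)                  # 0.5, ~0.6224, ~0.1937

# output-layer update: delta = (t - y) * y * (1 - y), dw = eta * delta * input
delta0 = (t0 - y0) * y0 * (1.0 - y0)
w["w01"] += eta * delta0 * i1     # stays 0.8 (its input is 0)
w["w02"] += eta * delta0 * i2     # stays 0.6 (its input is 0)
w["w0b"] += eta * delta0 * bias   # drops to ~0.4269
print(w)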