Neural Networks
Today's Class
- Neural Networks
- The Perceptron Model
- The Multi-layer Perceptron (MLP)
- Forward pass in an MLP (Inference)
- Backward pass in an MLP (Backpropagation)
Perceptron Model
Frank Rosenblatt (1957), Cornell University

The perceptron computes a weighted sum of its inputs and fires if the result is positive:

$$f(x) = \begin{cases} 1, & \text{if } \sum_{i=1}^{n} w_i x_i + b > 0 \\ 0, & \text{otherwise} \end{cases}$$

[Figure: inputs $x_1,\dots,x_4$ with weights $w_1,\dots,w_4$ feeding an activation function that produces the output.]
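A minimal NumPy sketch of this decision rule; the weights, bias, and input here are hypothetical, chosen only to mirror the slide's $x_1,\dots,x_4$ and $w_1,\dots,w_4$:

```python
import numpy as np

def perceptron(x, w, b):
    # Fires (outputs 1) iff the weighted sum of inputs plus bias is positive.
    return 1 if np.dot(w, x) + b > 0 else 0

# Hypothetical 4-input example.
x = np.array([1.0, 0.0, 1.0, 1.0])
w = np.array([0.5, -0.2, 0.3, 0.1])
b = -0.4
print(perceptron(x, w, b))  # 1, since 0.5 + 0.3 + 0.1 - 0.4 = 0.5 > 0
```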
Activation Functions
- Step(x) = 1 if x > 0, else 0
- Sigmoid(x) = 1 / (1 + e^{-x})
- Tanh(x) = (e^x - e^{-x}) / (e^x + e^{-x})
- ReLU(x) = max(0, x)
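As a quick reference, the four activations in NumPy (a sketch; the step threshold at 0 follows the perceptron rule above):

```python
import numpy as np

def step(x):
    return (x > 0).astype(float)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 2.0])
for f in (step, sigmoid, tanh, relu):
    print(f.__name__, f(x))
```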
Two-layer Multi-layer Perceptron (MLP)
[Figure: inputs $x_1,\dots,x_4$ feed a "hidden" layer of units $a_1,\dots,a_4$, whose outputs feed a final unit producing $\hat{y}_1$; a loss/criterion compares $\hat{y}_1$ to the target $y_1$.]
Linear Softmax

Input features and predicted class scores for example $i$ (three classes $c$, $d$, $b$):
$$x_i = [\,x_{i1}\; x_{i2}\; x_{i3}\; x_{i4}\,], \qquad \hat{y}_i = [\,f_c\; f_d\; f_b\,]$$

Each class gets a linear score:
$$\begin{aligned}
g_c &= w_{c1} x_{i1} + w_{c2} x_{i2} + w_{c3} x_{i3} + w_{c4} x_{i4} + b_c \\
g_d &= w_{d1} x_{i1} + w_{d2} x_{i2} + w_{d3} x_{i3} + w_{d4} x_{i4} + b_d \\
g_b &= w_{b1} x_{i1} + w_{b2} x_{i2} + w_{b3} x_{i3} + w_{b4} x_{i4} + b_b
\end{aligned}$$

Collecting the weights and biases,
$$w = \begin{bmatrix} w_{c1} & w_{c2} & w_{c3} & w_{c4} \\ w_{d1} & w_{d2} & w_{d3} & w_{d4} \\ w_{b1} & w_{b2} & w_{b3} & w_{b4} \end{bmatrix}, \qquad b = [\,b_c\; b_d\; b_b\,],$$
the scores become $g = w x^T + b^T$.

The softmax turns scores into probabilities:
$$f_c = \frac{e^{g_c}}{e^{g_c} + e^{g_d} + e^{g_b}}, \qquad f_d = \frac{e^{g_d}}{e^{g_c} + e^{g_d} + e^{g_b}}, \qquad f_b = \frac{e^{g_b}}{e^{g_c} + e^{g_d} + e^{g_b}},$$
i.e. $f = \mathrm{softmax}(g)$. The whole model is therefore
$$f = \mathrm{softmax}(w x^T + b^T)$$
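A NumPy sketch of this model; the 4-input, 3-class shapes match the slides, while the particular values are hypothetical:

```python
import numpy as np

def softmax(g):
    # Subtract the max for numerical stability; does not change the result.
    e = np.exp(g - g.max())
    return e / e.sum()

x = np.array([0.2, 0.5, 0.1, 0.9])   # x_i, shape (4,)
w = np.random.randn(3, 4)            # one row of weights per class (c, d, b)
b = np.random.randn(3)               # one bias per class

g = w @ x + b                        # scores g_c, g_d, g_b
f = softmax(g)                       # probabilities, sum to 1
print(f, f.sum())
```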
Two-layer MLP + Softmax
$$a_1 = \mathrm{sigmoid}(w_{[1]} x^T + b_{[1]}^T)$$
$$f = \mathrm{softmax}(w_{[2]} a_1^T + b_{[2]}^T)$$
(The second layer takes the hidden activations $a_1$ as input, not $x$.)
N-layer MLP + Softmax
$$a_1 = \mathrm{sigmoid}(w_{[1]} x^T + b_{[1]}^T)$$
$$a_2 = \mathrm{sigmoid}(w_{[2]} a_1^T + b_{[2]}^T)$$
$$\vdots$$
$$a_k = \mathrm{sigmoid}(w_{[k]} a_{k-1}^T + b_{[k]}^T)$$
$$\vdots$$
$$f = \mathrm{softmax}(w_{[n]} a_{n-1}^T + b_{[n]}^T)$$
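A sketch of this stack as a loop in NumPy; the layer sizes (4-8-8-3) are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(g):
    e = np.exp(g - g.max())
    return e / e.sum()

def mlp_forward(x, weights, biases):
    # weights[k], biases[k] play the role of w_[k+1], b_[k+1] on the slide.
    a = x
    for w, b in zip(weights[:-1], biases[:-1]):
        a = sigmoid(w @ a + b)                    # hidden layers
    return softmax(weights[-1] @ a + biases[-1])  # output layer

sizes = [4, 8, 8, 3]
ws = [np.random.randn(m, n) for n, m in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros(m) for m in sizes[1:]]
print(mlp_forward(np.random.randn(4), ws, bs))
```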
How to train the parameters?
(The model is the same N-layer stack as above, with parameters $w_{[k]}$ and $b_{[k]}$ at every layer.)
Forward pass (Forward-propagation)
$$z_j = \sum_{i=1}^{n} w_{1ij}\, x_i + b_{1j}, \qquad a_j = \mathrm{Sigmoid}(z_j)$$
$$p_1 = \sum_{i=1}^{n} w_{2i}\, a_i + b_2, \qquad \hat{y}_1 = \mathrm{Sigmoid}(p_1)$$
$$\mathrm{Loss} = L(y_1, \hat{y}_1)$$
[Figure: the two-layer MLP, with these quantities labeled on its nodes.]
How to train the parameters?
We can still use SGD, but we need the gradient of the loss with respect to every parameter:
$$\frac{\partial l}{\partial w_{[k]ij}}, \qquad \frac{\partial l}{\partial b_{[k]i}}, \qquad \text{where } l = \mathrm{loss}(f, y)$$
How to train the parameters?
Apply the chain rule: the gradient for a weight in layer $k$ factors through all the activations above it,
$$\frac{\partial l}{\partial w_{[k]ij}} = \frac{\partial l}{\partial a_{n-1}} \cdot \frac{\partial a_{n-1}}{\partial a_{n-2}} \cdots \frac{\partial a_{k+1}}{\partial a_k} \cdot \frac{\partial a_k}{\partial w_{[k]ij}}$$
Backward pass (Back-propagation)

Gradients with respect to intermediate values (GradInputs), from the output back to the input:
$$\frac{\partial L}{\partial \hat{y}_1} = \frac{\partial}{\partial \hat{y}_1} L(y_1, \hat{y}_1)$$
$$\frac{\partial L}{\partial p_1} = \Big(\frac{\partial}{\partial p_1} \mathrm{Sigmoid}(p_1)\Big) \frac{\partial L}{\partial \hat{y}_1}$$
$$\frac{\partial L}{\partial a_k} = \Big(\frac{\partial}{\partial a_k} \sum_{i=1}^{n} w_{2i} a_i + b_2\Big) \frac{\partial L}{\partial p_1} = w_{2k}\, \frac{\partial L}{\partial p_1}$$
$$\frac{\partial L}{\partial z_i} = \Big(\frac{\partial}{\partial z_i} \mathrm{Sigmoid}(z_i)\Big) \frac{\partial L}{\partial a_i}$$
$$\frac{\partial L}{\partial x_k} = \sum_j \frac{\partial z_j}{\partial x_k}\, \frac{\partial L}{\partial z_j} = \sum_j w_{1kj}\, \frac{\partial L}{\partial z_j}$$

Gradients with respect to parameters (GradParams):
$$\frac{\partial L}{\partial w_{2i}} = \frac{\partial p_1}{\partial w_{2i}}\, \frac{\partial L}{\partial p_1} = a_i\, \frac{\partial L}{\partial p_1}, \qquad \frac{\partial L}{\partial w_{1ij}} = \frac{\partial z_j}{\partial w_{1ij}}\, \frac{\partial L}{\partial z_j} = x_i\, \frac{\partial L}{\partial z_j}$$

[Figure: the same two-layer MLP, with gradients flowing backward through it.]
Softmax + Negative Log Likelihood
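The original slide's code did not survive extraction. Below is a minimal NumPy sketch of a combined softmax + negative-log-likelihood layer; it uses the standard result that the gradient of the combined layer with respect to the scores is $f - y$ (with $y$ one-hot). Function names are illustrative:

```python
import numpy as np

def softmax_nll_forward(g, target):
    # g: raw scores; target: index of the correct class.
    e = np.exp(g - g.max())
    f = e / e.sum()
    loss = -np.log(f[target])
    return f, loss

def softmax_nll_backward(f, target):
    # Gradient of NLL(softmax(g)) w.r.t. the scores g: f - y_onehot.
    grad = f.copy()
    grad[target] -= 1.0
    return grad

f, loss = softmax_nll_forward(np.array([2.0, 1.0, 0.1]), target=0)
print(loss, softmax_nll_backward(f, target=0))
```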
Linear layer
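Again the slide's code was an image; here is a sketch of a linear layer's forward pass and a backward pass that produces both the gradient for the input (GradInput) and for the parameters (GradParams), following the derivations above:

```python
import numpy as np

def linear_forward(x, w, b):
    return w @ x + b

def linear_backward(x, w, grad_out):
    # grad_out is dL/d(output).
    grad_x = w.T @ grad_out         # GradInput
    grad_w = np.outer(grad_out, x)  # GradParams: same shape as w
    grad_b = grad_out
    return grad_x, grad_w, grad_b
```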
ReLU layer
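A sketch of the corresponding ReLU layer; its backward pass simply masks the incoming gradient wherever the input was non-positive:

```python
import numpy as np

def relu_forward(x):
    return np.maximum(0.0, x)

def relu_backward(x, grad_out):
    # Gradient passes through only where the input was positive.
    return grad_out * (x > 0)
```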
Two-layer Neural Network – Forward Pass
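The slide's code was not preserved. A NumPy sketch of the two-layer forward pass from slide 16 follows; the squared-error criterion is an assumption, since the slides leave $L(y_1, \hat{y}_1)$ unspecified:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w1, b1, w2, b2, y):
    z = w1 @ x + b1                # first linear layer
    a = sigmoid(z)                 # hidden activations a_i
    p = w2 @ a + b2                # second linear layer (scalar output here)
    y_hat = sigmoid(p)             # prediction
    loss = 0.5 * (y_hat - y) ** 2  # assumed squared-error criterion
    return z, a, p, y_hat, loss
```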
Two-layer Neural Network – Backward Pass
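And a matching sketch of the backward pass, walking the chain rule from the loss back to every parameter (it reuses the intermediates returned by `forward` above, under the same assumed squared-error criterion):

```python
import numpy as np

def backward(x, w1, b1, w2, b2, y, z, a, p, y_hat):
    dL_dyhat = y_hat - y                    # d(0.5*(y_hat-y)^2)/dy_hat
    dL_dp = dL_dyhat * y_hat * (1 - y_hat)  # sigmoid'(p) = y_hat*(1-y_hat)
    dL_dw2 = dL_dp * a                      # GradParams, layer 2
    dL_db2 = dL_dp
    dL_da = dL_dp * w2                      # GradInput of layer 2
    dL_dz = dL_da * a * (1 - a)             # through the hidden sigmoid
    dL_dw1 = np.outer(dL_dz, x)             # GradParams, layer 1
    dL_db1 = dL_dz
    dL_dx = w1.T @ dL_dz                    # GradInput of layer 1
    return dL_dw1, dL_db1, dL_dw2, dL_db2, dL_dx
```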
Convolutional Layer
[Figure: a filter slides across the input, producing one output value per spatial position.]

Convolutional Layer Weights
[Figure: the layer's weights are the filter coefficients, reused at every spatial location.]
Convolutional Layer (with 4 filters)
weights: 4 x 1 x 9 x 9
Input: 1 x 224 x 224 → Output: 4 x 224 x 224 (with zero padding, stride = 1)
Convolutional Layer (with 4 filters)
weights: 4 x 1 x 9 x 9
Input: 1 x 224 x 224 → Output: 4 x 112 x 112 (with zero padding, stride = 2)
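The general rule is out = ⌊(in + 2·pad − k) / stride⌋ + 1; with a 9x9 kernel, padding of 4 preserves the spatial size at stride 1 and halves it at stride 2. A PyTorch check (the exact padding value of 4 is my assumption; the slides just say "zero padding"):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 224, 224)  # batch of one 1x224x224 input

# 4 filters, 9x9 kernel; padding=4 keeps 224x224 at stride 1.
conv1 = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=9, stride=1, padding=4)
print(conv1(x).shape)  # torch.Size([1, 4, 224, 224])

# Same filters at stride 2 halve the spatial size: (224 + 2*4 - 9)//2 + 1 == 112.
conv2 = nn.Conv2d(1, 4, kernel_size=9, stride=2, padding=4)
print(conv2(x).shape)  # torch.Size([1, 4, 112, 112])
```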
Convolutional Layer in Torch
Weights: nOutputPlane x nInputPlane x kW x kH
- nOutputPlane: the number of convolutional filters in this layer
- nInputPlane: the number of input channels (e.g. 3 for RGB inputs)
Convolutional Layer in Keras
Convolution2D(nOutputPlane, kW, kH, input_shape=(3, 224, 224), subsample=(2, 2), border_mode='valid')
Weights: nOutputPlane x nInputPlane x kW x kH
- nOutputPlane: the number of convolutional filters in this layer
- nInputPlane: the number of input channels (e.g. 3 for RGB inputs)
Convolutional Layer in PyTorch
Weights: out_channels x in_channels x kernel_size x kernel_size
- out_channels: the number of convolutional filters in this layer
- in_channels: the number of input channels (e.g. 3 for RGB inputs)
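A short sketch showing how `nn.Conv2d`'s weight tensor matches the shape described above (the specific channel and kernel numbers are illustrative):

```python
import torch.nn as nn

# e.g. RGB input (in_channels=3), 4 filters (out_channels=4), 9x9 kernels.
conv = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=9, stride=2, padding=4)
print(conv.weight.shape)  # torch.Size([4, 3, 9, 9]): out_channels x in_channels x kH x kW
print(conv.bias.shape)    # torch.Size([4])
```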
Automatic Differentiation
You only need to write code for the forward pass; the backward pass is computed automatically.
- PyTorch (developed mostly by Facebook)
- TensorFlow (developed mostly by Google)
- DyNet (team includes UVA Prof. Yangfeng Ji)
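A minimal PyTorch illustration of the point: define only the forward computation and call `backward()`; autograd fills in every gradient (the tiny linear-softmax model here is hypothetical):

```python
import torch

# Forward pass only; PyTorch records the graph and derives the backward pass.
w = torch.randn(3, 4, requires_grad=True)
b = torch.zeros(3, requires_grad=True)
x = torch.randn(4)
y = torch.tensor(0)  # correct class index

loss = torch.nn.functional.cross_entropy((w @ x + b).unsqueeze(0), y.unsqueeze(0))
loss.backward()  # gradients computed automatically
print(w.grad.shape, b.grad.shape)  # torch.Size([3, 4]) torch.Size([3])
```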
Questions?