
1 Neural Networks

2 Today's Class
Neural Networks
The Perceptron Model
The Multi-layer Perceptron (MLP)
The forward pass in an MLP (inference)
The backward pass in an MLP (backpropagation)

3 Perceptron Model
Frank Rosenblatt (1957), Cornell University
[Diagram: inputs x_1, x_2, x_3, x_4 with weights w_1, w_2, w_3, w_4 feeding the activation function]
f(x) = 1 if Σ_{i=1}^{n} w_i x_i + b > 0, and 0 otherwise

4 Perceptron Model (Frank Rosenblatt, 1957, Cornell University)
Same diagram and decision rule as the previous slide.

5 Perceptron Model (Frank Rosenblatt, 1957, Cornell University)
Same diagram and decision rule as slide 3.
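A minimal NumPy sketch of the decision rule above (the input, weight, and bias values are hypothetical, chosen only for illustration):

    import numpy as np

    def perceptron(x, w, b):
        # Output 1 when the weighted sum plus bias is positive, otherwise 0.
        return 1 if np.dot(w, x) + b > 0 else 0

    # Four inputs and weights, matching the slide's x_1..x_4 and w_1..w_4.
    x = np.array([0.5, -1.0, 2.0, 0.0])
    w = np.array([0.2, 0.4, -0.1, 1.5])
    b = -0.3
    print(perceptron(x, w, b))   # prints 1 or 0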

6 Activation Functions
Step(x)
Sigmoid(x) = 1 / (1 + e^{-x})
Tanh(x)
ReLU(x) = max(0, x)
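A NumPy sketch of these four activations (an illustration, not the lecture's own code):

    import numpy as np

    def step(x):
        return (x > 0).astype(float)       # 1 where x > 0, else 0

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))    # squashes values into (0, 1)

    def tanh(x):
        return np.tanh(x)                  # squashes values into (-1, 1)

    def relu(x):
        return np.maximum(0.0, x)          # max(0, x), element-wise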

7 Two-layer Multi-layer Perceptron (MLP)
[Diagram: inputs x_1 … x_4 feed a "hidden" layer a_1 … a_4, which produces the output ŷ_1; ŷ_1 is compared against the target y_1 by a loss / criterion]

8 Linear Softmax
x_i = [x_{i1}  x_{i2}  x_{i3}  x_{i4}]      y_i = [· · ·] (ground-truth label)      ŷ_i = [f_c  f_d  f_b]
g_c = w_{c1} x_{i1} + w_{c2} x_{i2} + w_{c3} x_{i3} + w_{c4} x_{i4} + b_c
g_d = w_{d1} x_{i1} + w_{d2} x_{i2} + w_{d3} x_{i3} + w_{d4} x_{i4} + b_d
g_b = w_{b1} x_{i1} + w_{b2} x_{i2} + w_{b3} x_{i3} + w_{b4} x_{i4} + b_b
f_c = e^{g_c} / (e^{g_c} + e^{g_d} + e^{g_b})
f_d = e^{g_d} / (e^{g_c} + e^{g_d} + e^{g_b})
f_b = e^{g_b} / (e^{g_c} + e^{g_d} + e^{g_b})

9 Linear Softmax
x_i = [x_{i1}  x_{i2}  x_{i3}  x_{i4}]      y_i = [· · ·] (ground-truth label)      ŷ_i = [f_c  f_d  f_b]
w = [ w_{c1} w_{c2} w_{c3} w_{c4} ;  w_{d1} w_{d2} w_{d3} w_{d4} ;  w_{b1} w_{b2} w_{b3} w_{b4} ]      b = [b_c  b_d  b_b]
g_c = w_{c1} x_{i1} + w_{c2} x_{i2} + w_{c3} x_{i3} + w_{c4} x_{i4} + b_c   (and similarly g_d, g_b)
f_c = e^{g_c} / (e^{g_c} + e^{g_d} + e^{g_b})   (and similarly f_d, f_b)

10 Linear Softmax
x_i = [x_{i1}  x_{i2}  x_{i3}  x_{i4}]      y_i = [· · ·]      ŷ_i = [f_c  f_d  f_b]
w = [ w_{c1} w_{c2} w_{c3} w_{c4} ;  w_{d1} w_{d2} w_{d3} w_{d4} ;  w_{b1} w_{b2} w_{b3} w_{b4} ]      b = [b_c  b_d  b_b]
g = w x^T + b^T
f_c = e^{g_c} / (e^{g_c} + e^{g_d} + e^{g_b})   (and similarly f_d, f_b)

11 Linear Softmax
x_i = [x_{i1}  x_{i2}  x_{i3}  x_{i4}]      y_i = [· · ·]      ŷ_i = [f_c  f_d  f_b]
w = (3×4 weight matrix, as above)      b = [b_c  b_d  b_b]
g = w x^T + b^T
f = softmax(g)

12 Linear Softmax
x_i = [x_{i1}  x_{i2}  x_{i3}  x_{i4}]      y_i = [· · ·]      ŷ_i = [f_c  f_d  f_b]
f = softmax(w x^T + b^T)
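A NumPy sketch of this linear + softmax classifier, with shapes matching the slides (4 features, 3 classes); the actual numbers are hypothetical:

    import numpy as np

    def softmax(g):
        e = np.exp(g - g.max())            # subtract the max for numerical stability
        return e / e.sum()

    x = np.array([0.2, -1.3, 0.4, 2.0])    # x_i: 4 features
    w = np.random.randn(3, 4)              # one row of weights per class (c, d, b)
    b = np.random.randn(3)                 # one bias per class

    g = w @ x + b                          # g = w x^T + b^T: the 3 class scores
    f = softmax(g)                         # f = [f_c, f_d, f_b], sums to 1
    print(f)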

13 Two-layer MLP + Softmax
x_i = [x_{i1}  x_{i2}  x_{i3}  x_{i4}]      y_i = [· · ·]      ŷ_i = [f_c  f_d  f_b]
a_1 = sigmoid(w^{[1]} x^T + b^{[1]T})
f = softmax(w^{[2]} a_1^T + b^{[2]T})
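A NumPy sketch of this two-layer forward pass (the layer sizes, 4 inputs, 5 hidden units, 3 classes, are hypothetical):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def softmax(g):
        e = np.exp(g - g.max())
        return e / e.sum()

    x  = np.random.randn(4)       # input x_i
    w1 = np.random.randn(5, 4)    # w^[1], b^[1]: first (hidden) layer
    b1 = np.random.randn(5)
    w2 = np.random.randn(3, 5)    # w^[2], b^[2]: second (output) layer
    b2 = np.random.randn(3)

    a1 = sigmoid(w1 @ x + b1)     # hidden activations
    f  = softmax(w2 @ a1 + b2)    # class probabilities [f_c, f_d, f_b]
    print(f)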

14 N-layer MLP + Softmax
x_i = [x_{i1}  x_{i2}  x_{i3}  x_{i4}]      y_i = [· · ·]      ŷ_i = [f_c  f_d  f_b]
a_1 = sigmoid(w^{[1]} x^T + b^{[1]T})
a_2 = sigmoid(w^{[2]} a_1^T + b^{[2]T})
…
a_k = sigmoid(w^{[k]} a_{k-1}^T + b^{[k]T})
…
f = softmax(w^{[n]} a_{n-1}^T + b^{[n]T})

15 How to train the parameters?
x_i = [x_{i1}  x_{i2}  x_{i3}  x_{i4}]      y_i = [· · ·]      ŷ_i = [f_c  f_d  f_b]
a_1 = sigmoid(w^{[1]} x^T + b^{[1]T})
a_2 = sigmoid(w^{[2]} a_1^T + b^{[2]T})
a_k = sigmoid(w^{[k]} a_{k-1}^T + b^{[k]T})
f = softmax(w^{[n]} a_{n-1}^T + b^{[n]T})

16 Forward pass (Forward-propagation)
[Diagram: inputs x_1 … x_4 feed hidden units a_1 … a_4, which produce the output ŷ_1 and the loss]
z_j = Σ_i w_{1ij} x_i + b_{1j}
a_j = Sigmoid(z_j)
p_1 = Σ_i w_{2i} a_i + b_2
ŷ_1 = Sigmoid(p_1)
Loss = L(ŷ_1, y_1)
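A NumPy sketch of this forward pass; the slide only writes L(ŷ_1, y_1), so the binary cross-entropy used below is an assumption:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x  = np.random.randn(4)        # inputs x_1..x_4
    w1 = np.random.randn(4, 4)     # w_{1ij}: weights into the hidden units
    b1 = np.random.randn(4)
    w2 = np.random.randn(4)        # w_{2i}: weights into the output unit
    b2 = 0.1
    y  = 1.0                       # target y_1

    z    = w1 @ x + b1             # z_j = Σ_i w_{1ij} x_i + b_{1j}
    a    = sigmoid(z)              # a_j = Sigmoid(z_j)
    p1   = w2 @ a + b2             # p_1 = Σ_i w_{2i} a_i + b_2
    yhat = sigmoid(p1)             # ŷ_1 = Sigmoid(p_1)
    loss = -(y * np.log(yhat) + (1 - y) * np.log(1 - yhat))   # assumed BCE loss
    print(loss)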

17 How to train the parameters?
a_1 = sigmoid(w^{[1]} x^T + b^{[1]T})
a_2 = sigmoid(w^{[2]} a_1^T + b^{[2]T})
a_k = sigmoid(w^{[k]} a_{k-1}^T + b^{[k]T})
f = softmax(w^{[n]} a_{n-1}^T + b^{[n]T})
We can still use SGD. We need the gradients ∂l/∂w^{[k]}_{ij} and ∂l/∂b^{[k]}_i.

18 How to train the parameters?
(same model as the previous slide)
We can still use SGD. We need ∂l/∂w^{[k]}_{ij} and ∂l/∂b^{[k]}_i, where l = loss(f, y).

19 How to train the parameters?
Same content as the previous slide.

20 How to train the parameters?
a_1 = sigmoid(w^{[1]} x^T + b^{[1]T}),  a_2 = sigmoid(w^{[2]} a_1^T + b^{[2]T}),  …,  a_k = sigmoid(w^{[k]} a_{k-1}^T + b^{[k]T}),  f = softmax(w^{[n]} a_{n-1}^T + b^{[n]T}),  l = loss(f, y)
By the chain rule:
∂l/∂w^{[k]}_{ij} = (∂l/∂a_{n-1}) (∂a_{n-1}/∂a_{n-2}) … (∂a_{k+1}/∂a_k) (∂a_k/∂w^{[k]}_{ij})

21 Backward pass (Back-propagation)
[Diagram: the same two-layer network as slide 16, with gradients flowing from the loss back toward the parameters (GradParams) and the inputs (GradInputs)]
∂L/∂ŷ_1 = ∂/∂ŷ_1 L(ŷ_1, y_1)
∂L/∂p_1 = (∂/∂p_1 Sigmoid(p_1)) · ∂L/∂ŷ_1
∂L/∂a_i = (∂/∂a_i (Σ_k w_{2k} a_k + b_2)) · ∂L/∂p_1 = w_{2i} · ∂L/∂p_1
∂L/∂w_{2i} = (∂p_1/∂w_{2i}) · ∂L/∂p_1 = a_i · ∂L/∂p_1      (GradParams)
∂L/∂z_i = (∂/∂z_i Sigmoid(z_i)) · ∂L/∂a_i
∂L/∂w_{1ij} = (∂z_j/∂w_{1ij}) · ∂L/∂z_j = x_i · ∂L/∂z_j      (GradParams)
∂L/∂x_k = Σ_j (∂z_j/∂x_k) · ∂L/∂z_j = Σ_j w_{1kj} · ∂L/∂z_j      (GradInputs)
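A NumPy sketch of this backward pass for the two-layer network of slide 16 (again assuming a binary cross-entropy loss, which the slides do not specify):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Forward pass (as on slide 16), with hypothetical sizes and values.
    x  = np.random.randn(4); y = 1.0
    w1 = np.random.randn(4, 4); b1 = np.random.randn(4)
    w2 = np.random.randn(4);    b2 = 0.1
    z    = w1 @ x + b1
    a    = sigmoid(z)
    p1   = w2 @ a + b2
    yhat = sigmoid(p1)

    # Backward pass: apply the chain rule from the loss back to the inputs.
    dL_dyhat = -(y / yhat) + (1 - y) / (1 - yhat)   # derivative of the assumed BCE loss
    dL_dp1   = yhat * (1 - yhat) * dL_dyhat         # through Sigmoid(p_1)
    dL_dw2   = a * dL_dp1                           # GradParams, layer 2
    dL_db2   = dL_dp1
    dL_da    = w2 * dL_dp1                          # ∂L/∂a_i = w_{2i} ∂L/∂p_1
    dL_dz    = a * (1 - a) * dL_da                  # through Sigmoid(z_j)
    dL_dw1   = np.outer(dL_dz, x)                   # GradParams, layer 1
    dL_db1   = dL_dz
    dL_dx    = w1.T @ dL_dz                         # GradInputs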

22 Softmax + Negative Log Likelihood
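The slide's equations are not captured in this transcript; as a sketch, the standard softmax + negative log-likelihood pairing and its gradient with respect to the scores (the predicted probabilities minus the one-hot target):

    import numpy as np

    def softmax(g):
        e = np.exp(g - g.max())
        return e / e.sum()

    g = np.array([2.0, -1.0, 0.5])   # class scores (hypothetical)
    y = 0                            # index of the correct class

    f   = softmax(g)
    nll = -np.log(f[y])              # negative log-likelihood of the correct class

    dL_dg = f.copy()                 # gradient w.r.t. the scores g:
    dL_dg[y] -= 1.0                  # softmax probabilities minus the one-hot target
    print(nll, dL_dg)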

23 Linear layer
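The slide's code is not captured in the transcript; a minimal sketch of a linear (fully connected) layer as a module with forward and backward methods:

    import numpy as np

    class Linear:
        def __init__(self, n_in, n_out):
            self.w = 0.01 * np.random.randn(n_out, n_in)   # weight matrix
            self.b = np.zeros(n_out)                       # bias vector

        def forward(self, x):
            self.x = x                      # cache the input for the backward pass
            return self.w @ x + self.b

        def backward(self, grad_out):
            self.grad_w = np.outer(grad_out, self.x)   # ∂L/∂w  (GradParams)
            self.grad_b = grad_out                     # ∂L/∂b  (GradParams)
            return self.w.T @ grad_out                 # ∂L/∂x  (GradInputs)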

24 ReLU layer
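Likewise, a sketch of a ReLU layer in the same modular style:

    import numpy as np

    class ReLU:
        def forward(self, x):
            self.mask = x > 0               # remember where the input was positive
            return np.maximum(0.0, x)

        def backward(self, grad_out):
            return grad_out * self.mask     # pass gradient through only where x > 0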

25 Two-layer Neural Network – Forward Pass
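Using the hypothetical Linear and ReLU sketches above (and the softmax from the slide 22 sketch), a two-layer forward pass might look like this; the sizes and the class index are made up for illustration:

    import numpy as np

    fc1, act, fc2 = Linear(4, 5), ReLU(), Linear(5, 3)

    x = np.random.randn(4)
    y = 2                                    # index of the correct class

    h      = act.forward(fc1.forward(x))     # first linear layer + ReLU
    scores = fc2.forward(h)                  # class scores
    probs  = softmax(scores)                 # from the earlier softmax sketch
    loss   = -np.log(probs[y])               # negative log-likelihood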

26 Two-layer Neural Network – Backward Pass
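And the matching backward pass, chaining each module's backward method from the loss gradient down to the input, followed by a plain SGD update (the learning rate is a hypothetical choice):

    dscores     = probs.copy()       # gradient of softmax + NLL w.r.t. the scores
    dscores[y] -= 1.0

    dh = fc2.backward(dscores)       # fills fc2.grad_w and fc2.grad_b
    dz = act.backward(dh)            # gate the gradient through the ReLU
    dx = fc1.backward(dz)            # fills fc1.grad_w and fc1.grad_b; dx is ∂L/∂x

    lr = 0.1
    for layer in (fc1, fc2):
        layer.w -= lr * layer.grad_w
        layer.b -= lr * layer.grad_b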

27 Convolutional Layer

28 Convolutional Layer

29 Convolutional Layer Weights

30 Convolutional Layer Weights
[Figure: worked example of applying the filter weights at one location of the input]

31 Convolutional Layer Weights
[Figure: the worked example continued at the next location]

32 Convolutional Layer (with 4 filters)
Weights: 4×1×9×9
Input: 1×224×224
Output: 4×224×224 (with zero padding and stride = 1)

33 Convolutional Layer (with 4 filters)
Weights: 4×1×9×9
Input: 1×224×224
Output: 4×112×112 (with zero padding and stride = 2)
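A quick PyTorch sketch to verify these two output shapes; padding = 4 is an assumption (it is what keeps a 9×9 kernel "same-size" at stride 1, while the slides only say "zero padding"):

    import torch
    import torch.nn as nn

    x = torch.randn(1, 1, 224, 224)    # a batch of one 1x224x224 input

    conv_s1 = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=9, stride=1, padding=4)
    conv_s2 = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=9, stride=2, padding=4)

    print(conv_s1(x).shape)   # torch.Size([1, 4, 224, 224])
    print(conv_s2(x).shape)   # torch.Size([1, 4, 112, 112])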

34 Convolutional Layer in Torch
Weights: nOutputPlane × nInputPlane × kW × kH
nOutputPlane: the number of convolutional filters in this layer
nInputPlane: the number of input channels (e.g. 3 for RGB inputs)

35 Convolutional Layer in Keras
Convolution2D(nOutputPlane, kW, kH, input_shape=(3, 224, 224), subsample=(2, 2), border_mode='valid')
Weights: nOutputPlane × nInputPlane × kW × kH
nOutputPlane: the number of convolutional filters in this layer
nInputPlane: the number of input channels (e.g. 3 for RGB inputs)

36 Convolutional Layer in PyTorch
Weights: out_channels × in_channels × kernel_size × kernel_size
out_channels: the number of convolutional filters in this layer
in_channels: the number of input channels (e.g. 3 for RGB inputs)
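For example, a sketch with hypothetical sizes, just to show the parameter names and the resulting weight shape:

    import torch.nn as nn

    # 3 input channels (RGB), 64 filters, 9x9 kernels, stride 2, zero padding of 4.
    conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=9, stride=2, padding=4)

    print(conv.weight.shape)   # torch.Size([64, 3, 9, 9]) = out_channels x in_channels x kH x kW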

37 Automatic Differentiation
You only need to write code for the forward pass; the backward pass is computed automatically.
PyTorch (mostly Facebook)
TensorFlow (mostly Google)
DyNet (team includes UVA Prof. Yangfeng Ji)
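A small PyTorch autograd sketch of the idea: only the forward computation is written, and the gradients come from .backward() (the sizes and values are hypothetical):

    import torch
    import torch.nn.functional as F

    w = torch.randn(3, 4, requires_grad=True)   # parameters we want gradients for
    b = torch.zeros(3, requires_grad=True)
    x = torch.randn(4)
    y = torch.tensor([1])                       # correct class index

    scores = w @ x + b                          # forward pass: linear scores
    loss = F.cross_entropy(scores.unsqueeze(0), y)   # softmax + NLL in one call

    loss.backward()                             # backward pass, computed automatically
    print(w.grad.shape, b.grad.shape)           # ∂loss/∂w and ∂loss/∂b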

38 Questions?

