Introduction to Neural Networks
Overview
- The motivation for NNs
- Neuron – the basic unit
- Fully-connected Neural Networks
  - Feedforward (inference)
  - The linear algebra behind
- Convolutional Neural Networks & Deep Learning
The Brain does Complex Tasks
[Slide examples: computing 3×4 = 12; judging a tiger as "Danger!" vs. "Fine".]
Inside the Brain
Real vs. Artificial Neuron
[Diagram: a real neuron with its inputs and outputs, next to an artificial neuron with inputs $I_1, I_2, I_3$, weights $w_1, w_2, w_3$, and an output computed by $f(\cdot)$.]
Neuron – General Model
Inputs $I_1, \ldots, I_N$ with weights $w_1, \ldots, w_N$ are summed ($\Sigma$) and passed through an activation function $\sigma$:
\[
a = \sigma\!\left(\sum_{j=1}^{N} I_j \cdot w_j\right) \quad \text{(activation)}
\]
Neuron Activation Functions
The most popular activation functions:
- Sigmoid was the first to be used; tanh came later.
- The Rectified Linear Unit (ReLU) is the most widely used today (and also the simplest function); it allows for faster network training (discussed later).
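To make the neuron model concrete, here is a minimal C sketch of the activation functions above and a single neuron's forward computation (the names sigmoid, relu, and neuron are illustrative, not taken from the slides):

    #include <math.h>

    float sigmoid(float z) { return 1.0f / (1.0f + expf(-z)); }
    float relu(float z)    { return z > 0.0f ? z : 0.0f; }
    /* tanh is available directly as tanhf() from <math.h>. */

    /* a = sigma( sum_{j=1..N} I[j] * w[j] ) for a single neuron with N inputs. */
    float neuron(const float I[], const float w[], int N, float (*sigma)(float)) {
        float z = 0.0f;
        for (int j = 0; j < N; j++)
            z += I[j] * w[j];
        return sigma(z);
    }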
Overview
- The motivation for NNs
- Neuron – the basic unit
- Fully-connected Neural Networks
  - Feedforward (inference)
  - The linear algebra behind
- Convolutional Neural Networks & Deep Learning
Feedforward Neural Network
[Diagram: an input layer, several hidden layers, and an output layer.]
Feedforward – How Does it Work?
[Diagram: inputs $I_1, I_2, I_3$ enter at the input layer, computation flows layer by layer, and outputs $O_1, O_2$ emerge at the output layer.]
Feedforward – What Can it Do?
Classification regions vs. the number of layers in a NN: more neurons allow more complex classification regions.
Indexing Conventions
$a^1_3$ (activation): the superscript is the layer index, the subscript is the neuron index within the layer.
$w^2_{1\,3}$ (weight): the superscript is the layer index, the first subscript is the neuron index, and the second subscript is the weight index within that neuron.
Feedforward – General Equations
Input $I = (I_1, I_2, \ldots, I_N)^T$. The weights matrix of layer $j$ is
\[
W^j_{[m,n]} =
\begin{pmatrix}
w^j_{1\,1} & \cdots & w^j_{1\,n} \\
\vdots & & \vdots \\
w^j_{k\,1} & \cdots & w^j_{k\,n} \\
\vdots & & \vdots \\
w^j_{m\,1} & \cdots & w^j_{m\,n}
\end{pmatrix},
\]
where row $k$ holds the weights of neuron $k$. The number of rows $m$ equals the number of neurons in layer $j$; the number of columns $n$ equals the number of inputs per neuron, i.e. the number of activations of layer $j-1$.
The Linear Algebra Behind Feedforward: Example
With inputs $I = (I_1, I_2, I_3)^T$, the 1st hidden layer weights are
\[
W^1_{[5,3]} =
\begin{pmatrix}
w^1_{1\,1} & w^1_{1\,2} & w^1_{1\,3} \\
\vdots & \vdots & \vdots \\
w^1_{5\,1} & w^1_{5\,2} & w^1_{5\,3}
\end{pmatrix},
\]
the 1st neuron's activation is $a^1_1 = \sigma\!\left(\sum_{j=1}^{3} w^1_{1\,j} \cdot I_j\right)$, and the whole layer is
\[
\mathbf{a}^1 = \sigma(W^1 \cdot I) =
\sigma\!\left(
\begin{pmatrix}
w^1_{1\,1} & w^1_{1\,2} & w^1_{1\,3} \\
\vdots & \vdots & \vdots \\
w^1_{5\,1} & w^1_{5\,2} & w^1_{5\,3}
\end{pmatrix}
\begin{pmatrix} I_1 \\ I_2 \\ I_3 \end{pmatrix}
\right)
=
\begin{pmatrix}
\sigma\!\left(\sum_j w^1_{1\,j} \cdot I_j\right) \\
\vdots \\
\sigma\!\left(\sum_j w^1_{5\,j} \cdot I_j\right)
\end{pmatrix}.
\]
The Linear Algebra Behind Feedforward
Given the layer-1 activations $a^1_{[5,1]} = \left(\sigma(\sum_j w^1_{1\,j} I_j), \ldots, \sigma(\sum_j w^1_{5\,j} I_j)\right)^T$, the 2nd hidden layer weights are
\[
W^2_{[4,5]} =
\begin{pmatrix}
w^2_{1\,1} & w^2_{1\,2} & \cdots & w^2_{1\,5} \\
\vdots & & & \vdots \\
w^2_{4\,1} & w^2_{4\,2} & \cdots & w^2_{4\,5}
\end{pmatrix},
\]
so that
\[
\mathbf{a}^2 = \sigma(W^2 \cdot a^1) =
\sigma\!\left(
\begin{pmatrix}
w^2_{1\,1} & w^2_{1\,2} & \cdots & w^2_{1\,5} \\
\vdots & & & \vdots \\
w^2_{4\,1} & w^2_{4\,2} & \cdots & w^2_{4\,5}
\end{pmatrix}
\begin{pmatrix} a^1_1 \\ a^1_2 \\ \vdots \\ a^1_5 \end{pmatrix}
\right)
=
\begin{pmatrix}
\sigma\!\left(\sum_j w^2_{1\,j} \cdot a^1_j\right) \\
\vdots \\
\sigma\!\left(\sum_j w^2_{4\,j} \cdot a^1_j\right)
\end{pmatrix}.
\]
The Linear Algebra Behind Feedforward
In general, for layer $k$ with $m$ neurons and $n$ activations coming from layer $k-1$:
\[
\mathbf{a}^k = \sigma(W^k \cdot a^{k-1}) =
\sigma\!\left(
\begin{pmatrix}
w^k_{1\,1} & \cdots & w^k_{1\,n} \\
\vdots & & \vdots \\
w^k_{m\,1} & \cdots & w^k_{m\,n}
\end{pmatrix}
\begin{pmatrix} a^{k-1}_1 \\ a^{k-1}_2 \\ \vdots \\ a^{k-1}_n \end{pmatrix}
\right)
=
\begin{pmatrix}
\sigma\!\left(\sum_j w^k_{1\,j} \cdot a^{k-1}_j\right) \\
\vdots \\
\sigma\!\left(\sum_j w^k_{m\,j} \cdot a^{k-1}_j\right)
\end{pmatrix},
\]
so the whole network unrolls as
\[
\mathbf{a}^k = \sigma(W^k \cdot a^{k-1}) = \sigma\!\left(W^k \cdot \sigma\!\left(W^{k-1} \cdot a^{k-2}\right)\right) = \ldots
\]
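A minimal C sketch of one feedforward layer, $\mathbf{a}^k = \sigma(W^k \cdot a^{k-1})$, computed as a dense matrix-vector product followed by the activation (the row-major array layout and the choice of sigmoid as $\sigma$ are illustrative assumptions):

    #include <math.h>

    float sigmoid(float z) { return 1.0f / (1.0f + expf(-z)); }

    /* One layer: a_out[i] = sigma( sum_j W[i*n + j] * a_in[j] ).
     * W is the m-by-n weight matrix of the layer, stored row by row:
     * m = neurons in this layer, n = activations of the previous layer. */
    void layer_forward(const float *W, const float *a_in, float *a_out, int m, int n) {
        for (int i = 0; i < m; i++) {
            float z = 0.0f;
            for (int j = 0; j < n; j++)
                z += W[i * n + j] * a_in[j];
            a_out[i] = sigmoid(z);
        }
    }

    /* A full forward pass chains such calls: a1 = layer(W1, I), a2 = layer(W2, a1), ... */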
Number of Computations
Given a NN with $k$ layers (hidden + output), where layer $i$ has $n_i$ neurons and $n_0$ is the number of inputs:
- Total DMV (Dense Matrix $\cdot$ Vector) multiplications = $k$; each layer $i$ multiplies an $n_i \times n_{i-1}$ weight matrix by the $n_{i-1}$ activations of the previous layer:
\[
\begin{pmatrix}
w^i_{1\,1} & \cdots & w^i_{1\,n_{i-1}} \\
\vdots & & \vdots \\
w^i_{n_i\,1} & \cdots & w^i_{n_i\,n_{i-1}}
\end{pmatrix}
\begin{pmatrix} a^{i-1}_1 \\ \vdots \\ a^{i-1}_{n_{i-1}} \end{pmatrix}
\]
- Time complexity = $O\!\left(\sum_{i=1}^{k} n_i \cdot n_{i-1}\right)$
- Memory complexity = $O\!\left(\sum_{i=1}^{k} n_i \cdot n_{i-1}\right)$
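For the example network above (3 inputs, a 5-neuron hidden layer, and a 4-neuron output layer), this works out to
\[
\sum_{i=1}^{2} n_i \cdot n_{i-1} = 5 \cdot 3 + 4 \cdot 5 = 35
\]
multiply-accumulate operations and the same number of stored weights (before adding biases).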
Small Leftover – The Bias
Add a bias by appending a constant activation of 1 to each layer's input (the "+1" nodes in the diagram); the bias weights then become an extra column in each weight matrix:
\[
\mathbf{a}^1 = \sigma(W^1 \cdot I) =
\sigma\!\left(
\begin{pmatrix}
w^1_{1\,1} & w^1_{1\,2} & w^1_{1\,3} & w^1_{1\,4} \\
\vdots & & & \vdots \\
w^1_{5\,1} & w^1_{5\,2} & w^1_{5\,3} & w^1_{5\,4}
\end{pmatrix}
\begin{pmatrix} I_1 \\ I_2 \\ I_3 \\ 1 \end{pmatrix}
\right),
\qquad
\mathbf{a}^2 = \sigma(W^2 \cdot a^1) =
\sigma\!\left(
\begin{pmatrix}
w^2_{1\,1} & w^2_{1\,2} & \cdots & w^2_{1\,5} & w^2_{1\,6} \\
\vdots & & & & \vdots \\
w^2_{4\,1} & w^2_{4\,2} & \cdots & w^2_{4\,5} & w^2_{4\,6}
\end{pmatrix}
\begin{pmatrix} a^1_1 \\ a^1_2 \\ \vdots \\ a^1_5 \\ 1 \end{pmatrix}
\right).
\]
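Written out for neuron $k$ of layer 1, the constant input simply turns the last weight of each row into an additive bias term:
\[
a^1_k = \sigma\!\left(\sum_{j=1}^{3} w^1_{k\,j} \cdot I_j + w^1_{k\,4}\right).
\]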
Classification – Softmax Layer
For classification, the desired output is one-hot: one output should be '1' and the rest '0'. A neuron's weighted sum, however, is not limited to that range. The solution: a softmax layer.
\[
z_k = \sum_{j=1\ldots N} w_{k\,j} \cdot I_j,
\qquad
a_k = \frac{e^{z_k}}{\sum_{j \in \text{output}} e^{z_j}},
\qquad
\forall k:\; a_k \in (0,1].
\]
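A minimal C sketch of the softmax computation; subtracting the maximum $z_j$ before exponentiating is a standard numerical-stability step added here, not something stated on the slide:

    #include <math.h>

    /* a[k] = exp(z[k]) / sum_j exp(z[j]); every a[k] lies in (0,1] and the a[k] sum to 1. */
    void softmax(const float *z, float *a, int n) {
        float zmax = z[0];
        for (int j = 1; j < n; j++)      /* find max(z) for numerical stability */
            if (z[j] > zmax)
                zmax = z[j];
        float sum = 0.0f;
        for (int j = 0; j < n; j++) {
            a[j] = expf(z[j] - zmax);
            sum += a[j];
        }
        for (int k = 0; k < n; k++)
            a[k] /= sum;
    }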
Overview
- The motivation for NNs
- Neuron – the basic unit
- Fully-connected Neural Networks
  - Feedforward (inference)
  - The linear algebra behind
- Convolutional Neural Networks & Deep Learning
Convolutional Neural Networks
A main building block of deep learning, called ConvNets / CNNs for short. Motivation: when the input data is an image (e.g., 1000×1000 pixels), spatial correlation is local, so rather than connecting every neuron to every pixel it is better to put the resources elsewhere.
Convolutional Layer
Reduce connectivity to local regions. Example: a 1000×1000 image with 100 different filters of size 10×10 requires only 10k parameters (every filter is different).
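The parameter count follows directly from these numbers:
\[
100 \text{ filters} \times (10 \times 10) \text{ weights per filter} = 10{,}000 \text{ parameters},
\]
independent of the 1000×1000 image size.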
Convnets
Sliding window computation: a 3×3 filter kernel with weights
\[
\begin{pmatrix}
W_{1,1} & W_{1,2} & W_{1,3} \\
W_{2,1} & W_{2,2} & W_{2,3} \\
W_{3,1} & W_{3,2} & W_{3,3}
\end{pmatrix}
\]
slides across the input, producing one output value per position.
Conv Layer – The Math
$w$ – filter kernel of size $K \times K$; $x$ – input. The output is
\[
a_{i,j} = (w * x)_{i,j} = \sum_{p=1}^{K} \sum_{q=1}^{K} w_{p,q} \cdot x_{i+p,\,j+q}.
\]
Conv Layer – Parameters
Zero padding – surround the input with zeros so that the output size equals the input size.
Stride – the number of pixels the filter moves between applications (e.g., stride = 2); see the sketch below.
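A minimal C sketch of a single-channel convolution with stride and zero padding (the sizes IN, K, S, P and the function name conv2d are illustrative assumptions, not taken from the slides):

    #define IN  6                            /* input height/width          */
    #define K   3                            /* filter kernel size          */
    #define S   2                            /* stride                      */
    #define P   1                            /* zero padding on each border */
    #define OUT ((IN + 2*P - K) / S + 1)     /* resulting output size       */

    /* a[i][j] = sum_{p,q} w[p][q] * x[i*S + p - P][j*S + q - P],
     * where positions outside the input are treated as zero (zero padding). */
    void conv2d(const float x[IN][IN], const float w[K][K], float a[OUT][OUT]) {
        for (int i = 0; i < OUT; i++)
            for (int j = 0; j < OUT; j++) {
                float sum = 0.0f;
                for (int p = 0; p < K; p++)
                    for (int q = 0; q < K; q++) {
                        int y = i * S + p - P;   /* input row    */
                        int z = j * S + q - P;   /* input column */
                        if (y >= 0 && y < IN && z >= 0 && z < IN)
                            sum += w[p][q] * x[y][z];
                    }
                a[i][j] = sum;
            }
    }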
Conv Layer – Multiple Inputs
Each output is the sum of convolutions over all input feature maps; the multiple outputs are themselves called feature maps. With M input feature maps (layer L-1), N output feature maps of size X×Y (layer L), and K×K filter kernels:

    for (n = 0; n < N; n++)            /* output feature map */
      for (m = 0; m < M; m++)          /* input feature map  */
        for (y = 0; y < Y; y++)        /* output row         */
          for (x = 0; x < X; x++)      /* output column      */
            for (p = 0; p < K; p++)    /* kernel row         */
              for (q = 0; q < K; q++)  /* kernel column      */
                AL[n][x][y] += ALm1[m][x+p][y+q] * w[m][n][p][q];
Pooling Layer
Maps multiple inputs to a single output, reducing the amount of data. Several types: max (the most common), average, and others.
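A minimal C sketch of 2×2 max pooling with stride 2 over one H×W feature map (the sizes and the name max_pool_2x2 are illustrative assumptions):

    #define H 4
    #define W 4

    /* Each output value is the maximum of a non-overlapping 2x2 block of the input. */
    void max_pool_2x2(const float in[H][W], float out[H/2][W/2]) {
        for (int i = 0; i < H / 2; i++)
            for (int j = 0; j < W / 2; j++) {
                float m = in[2*i][2*j];
                for (int p = 0; p < 2; p++)
                    for (int q = 0; q < 2; q++)
                        if (in[2*i + p][2*j + q] > m)
                            m = in[2*i + p][2*j + q];
                out[i][j] = m;
            }
    }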
Putting it All Together – AlexNet
The first CNN to win the ImageNet challenge. ImageNet: 1.2M 256×256 images, 1000 classes.
Krizhevsky et al., "ImageNet Classification with Deep CNNs," NIPS 2012.
Zeiler, Matthew D., and Rob Fergus, "Visualizing and Understanding Convolutional Networks," European Conference on Computer Vision, 2014.
AlexNet – Inside the Feature Maps Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." In European Conference on Computer Vision, pp. 818-833. Springer International Publishing, 2014.
Putting it All Together – AlexNet
Trained for 1 week on 2 GPUs (GeForce GTX 580, 3 GB memory each).
Krizhevsky et al., "ImageNet Classification with Deep CNNs," NIPS 2012.
Architecture for Classification (category prediction)
Total nr. params: 60M; total nr. flops: 832M. Per layer, from output (top) to input (bottom):
- LINEAR: 4M params, 4M flops
- FULLY CONNECTED: 16M params, 16M flops
- FULLY CONNECTED: 37M params, 37M flops
- MAX POOLING
- CONV: 442K params, 74M flops
- CONV: 1.3M params, 224M flops
- CONV: 884K params, 149M flops
- MAX POOLING, LOCAL CONTRAST NORM
- CONV: 307K params, 223M flops
- MAX POOLING, LOCAL CONTRAST NORM
- CONV: 35K params, 105M flops
(Figure after Ranzato; Krizhevsky et al., "ImageNet Classification with Deep CNNs," NIPS 2012.)
Convnet - Bigger is Better? GoogLeNet (2014) – ImageNet winner Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
Thank you