1
Understanding Convolutional Neural Networks for Object Recognition
Domen Tabernik, University of Ljubljana, Faculty of Computer and Information Science, Visual Cognitive Systems Laboratory
2
Visual object recognition
How do we capture a representation of objects in a computational/mathematical model?
- Impossible to model each one explicitly - too many objects, too many variations
- Let the machine learn the model from samples
- Inspiration from the biological (human/primate) visual system
Key element: a hierarchy
- Raw pixels (x, y, RGB/gray) → objects: a category [position or segmentation]
- Biological plausibility
- Part sharing between objects/categories → efficient representation
- Object/part as a composition of other parts → compositional interpretation
Kruger et al., Deep Hierarchies in the Primate Visual Cortex: What Can We Learn for Computer Vision?, PAMI 2012
3
Deep learning – a sigmoid neuron
Basic element: a sigmoid neuron (an improved perceptron from the 60s)
- Mathematical form: a weighted linear combination of inputs plus a bias, $z = \sum_i w_i x_i + b$
- Sigmoid function: $\sigma(z) = \frac{1}{1 + e^{-z}}$, so the output is $y = \sigma(z)$
Why sigmoid? It equals a smooth threshold function:
- Smoothness gives nice mathematical properties (derivatives)
- The threshold adds non-linearity when stacked, capturing more complex representations
Example from the slide: image pixels as input, output y = 0.78, the probability of a car.
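A minimal NumPy sketch of this neuron, assuming a toy 4-pixel input; the weight and bias values are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    # Smooth threshold: maps any real z to (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, b):
    # Weighted linear combination of inputs plus bias, then the sigmoid.
    z = np.dot(w, x) + b
    return sigmoid(z)

# Toy example: a flattened "image" of 4 pixel values.
x = np.array([0.2, 0.8, 0.5, 0.1])
w = np.array([0.4, 1.2, -0.3, 0.7])   # illustrative weights
b = -0.1
print(neuron_output(x, w, b))          # a probability-like value in (0, 1)
```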
4
Deep learning – a sigmoid neuron
Basic element: a sigmoid neuron (an improved perceptron from the 60s)
Learning process:
- Known values: x, y - the learning input values
- Unknown values: w, b - the learned output parameters
- Which w, b will produce the correct output y_n for ALL learning images x_n?
- The cost is basically the average difference between the neuron's outputs and the actual correct outputs
Example from the slide: image pixels as input, output y = 0.78, the probability of a car.
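A sketch of such a cost, assuming the common quadratic (mean squared error) form; the predicted and correct values below are invented for illustration:

```python
import numpy as np

def quadratic_cost(outputs, targets):
    # Average squared difference between the neuron's outputs and the
    # correct outputs over all training samples.
    outputs = np.asarray(outputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return np.mean((outputs - targets) ** 2) / 2.0

# Toy example: three training images, target 1 = "car", 0 = "not a car".
predicted = [0.78, 0.10, 0.55]
correct   = [1.00, 0.00, 1.00]
print(quadratic_cost(predicted, correct))
```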
5
Deep learning – optimization
The best solution is where the cost is lowest, so our goal is a minimal C(w, b).
Basic optimization problem:
- When is a function at a minimum? (a high-school math problem) When its derivative is ZERO.
- Find the zero derivative analytically? We would need N > num(w), and it is not possible when neurons are stacked.
Naive iterative approach:
- Start at a random position
- Find a small combination of ∆w that minimizes C
- If num(w) is big, there are too many combinations to check
Use gradient descent instead!
6
Deep learning – gradient descent
Iterative process:
- Start at a random position
- Compute the activations y_n for all samples
- Find the partial derivative/gradient for each parameter w (and b)
- Move each w_i (and b) in its gradient direction (actually in the negative gradient direction)
- Repeat until the cost is low enough
Stochastic gradient / mini-batch:
- Take a smaller subset of samples at each step
- It still carries enough gradient information
Not perfect - it has multiple solutions!
- Local minima
- Plateaus
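A minimal NumPy sketch of this loop for a single sigmoid neuron, assuming a quadratic cost and an invented toy dataset; the function names, learning rate, and batch size are illustrative choices, not the slide's exact setup:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_step(X, y, w, b, lr=0.5):
    # One mini-batch update: forward pass, gradient of the quadratic
    # cost w.r.t. w and b, then a move in the negative gradient direction.
    z = X @ w + b
    a = sigmoid(z)
    delta = (a - y) * a * (1 - a)            # dC/dz for the quadratic cost
    grad_w = X.T @ delta / len(y)
    grad_b = delta.mean()
    return w - lr * grad_w, b - lr * grad_b

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))
y = (X[:, 0] > 0).astype(float)              # toy labels
w, b = rng.normal(size=4), 0.0
for epoch in range(100):
    idx = rng.permutation(len(y))
    for batch in np.array_split(idx, 8):     # mini-batches instead of all data
        w, b = sgd_step(X[batch], y[batch], w, b)
```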
7
Deep learning – gradient descent
Heuristics to avoid local minima and plateaus:
Choose the step size carefully:
- Too small: slow convergence and unable to escape local minima
- Too big: will not converge!
Momentum:
- Considers gradients from previous steps to increase or decrease the step size
- Helps escape local minima without manually increasing the step-size parameter
Weight decay (regularization):
- Main goal: keep only a small number of weights active/big
- Primarily used to fight overfitting, but it helps to escape local minima as well
Second-order derivatives:
- The gradient for w_i considers the gradients of the other w_j (where i != j)
- Approximations to second-order derivatives: Nesterov's algorithm, AdaGrad, AdaDelta, …
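A sketch of the momentum and weight-decay updates described above, assuming classical momentum and L2 weight decay; the hyper-parameter values are illustrative:

```python
import numpy as np

def momentum_step(w, grad, velocity, lr=0.1, momentum=0.9, weight_decay=1e-4):
    # Classical momentum: accumulate a running direction from previous
    # gradients; weight decay pulls the weights towards zero.
    grad = grad + weight_decay * w
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# Toy usage on a 1-D quadratic cost C(w) = w^2 (gradient 2w).
w, v = np.array([5.0]), np.zeros(1)
for _ in range(50):
    w, v = momentum_step(w, 2 * w, v)
print(w)   # close to the minimum at 0
```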
8
Deep learning – back-propagation
Single neuron: use the chain rule for the derivative of a composition, $\frac{d}{dx} f(g(x)) = f'(g(x)) \, g'(x)$.
Stacked (deep) neurons:
- Keep repeating the chaining process from top to bottom
- Take into account all paths where w_i appears
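A worked sketch of this chaining for two stacked scalar sigmoid neurons, assuming a quadratic cost; the weight values are invented:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two stacked scalar sigmoid neurons: a1 = s(w1*x + b1), a2 = s(w2*a1 + b2).
x, y = 0.5, 1.0
w1, b1, w2, b2 = 0.3, 0.0, -0.7, 0.1

# Forward pass.
a1 = sigmoid(w1 * x + b1)
a2 = sigmoid(w2 * a1 + b2)

# Backward pass: repeatedly apply the chain rule from top to bottom.
dC_da2 = a2 - y                       # quadratic cost C = (a2 - y)^2 / 2
dC_dz2 = dC_da2 * a2 * (1 - a2)       # sigmoid'(z) = a * (1 - a)
dC_dw2 = dC_dz2 * a1
dC_da1 = dC_dz2 * w2                  # error propagated one layer down
dC_dz1 = dC_da1 * a1 * (1 - a1)
dC_dw1 = dC_dz1 * x                   # gradient for the bottom weight
print(dC_dw2, dC_dw1)
```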
9
Deep learning – convolutional net
The previous slides were all general (not computer-vision specific).
Applying a fully connected deep neural network directly to an image is not feasible:
- Image size 128x128 ≈ 16k pixels, so 16k input neurons
- First-layer neurons = 4k (say we want to reduce the dimensions by 2x at each layer)
- Number of weights = 16k * 4k ≈ 64 million (for the first layer alone!)
We can exploit the spatial locality of images:
- Features are local; only a small neighborhood of pixels is needed
- Features repeat throughout the image
Local connections and weight sharing:
- Divide neurons into sub-features; each RGB channel is a separate feature
- One neuron looks only at a small local neighborhood of pixels (3x3, 5x5, 7x7, …)
- Neurons of the same feature at different positions share weights
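A sketch of the parameter-count argument and of weight sharing, assuming an illustrative 5x5 kernel and 16 features; the naive nested-loop convolution is for clarity only, not efficiency:

```python
import numpy as np

# Parameter count: fully connected vs. convolutional first layer for a
# 128x128 grayscale image reduced to a 64x64 feature map.
inputs, outputs = 128 * 128, 64 * 64
fully_connected = inputs * outputs    # 67,108,864 weights (the slide rounds 16k x 4k to ~64 million)
conv_shared = 5 * 5 * 16              # 16 features with a shared 5x5 kernel each: 400 weights
print(fully_connected, conv_shared)

def conv2d_single(image, kernel):
    # One feature map: the same small kernel (shared weights) slides over
    # every local neighborhood of the image.
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.rand(128, 128)
kernel = np.random.randn(5, 5)            # 25 shared weights instead of millions
print(conv2d_single(image, kernel).shape)  # (124, 124)
```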
10
Deep learning - ReLU
How does the sigmoid function affect learning? It makes computing the derivative easy, but it has negative effects:
- The neuron never quite reaches 1 or 0 - it saturates
- The gradient reduces the magnitude of the error
This leads to two problems:
- Slow learning when neurons are saturated, i.e. for big z values
- The vanishing gradient problem (each layer passes back at most 25% of the error from the layer above!)
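A quick numerical check of these claims (the sigmoid derivative peaks at 0.25 and saturates for large z):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-10, 10, 1001)
dsig = sigmoid(z) * (1 - sigmoid(z))
print(dsig.max())     # 0.25 at z = 0: each layer keeps at most 25% of the error
print(dsig[-1])       # ~0 for large z: a saturated neuron learns very slowly
print(0.25 ** 5)      # after 5 sigmoid layers the error has shrunk below 0.1%
```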
11
Deep learning - ReLU
Alex Krizhevsky (AlexNet, 2012) proposed the Rectified Linear Unit instead of the sigmoid function.
Main purpose of ReLU: it reduces the saturation and vanishing-gradient issues.
Still not perfect:
- Stops learning at negative z values (a piecewise-linear variant, Parametric ReLU, He et al. 2015 from Microsoft, can be used instead)
- Bigger risk of activations growing without bound, since there is no upper saturation
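A minimal sketch of ReLU and Parametric ReLU, assuming a fixed illustrative slope a = 0.25 rather than a learned parameter:

```python
import numpy as np

def relu(z):
    # Rectified Linear Unit: no saturation for z > 0, gradient exactly 1 there.
    return np.maximum(0.0, z)

def prelu(z, a=0.25):
    # Parametric ReLU: a small (normally learned) slope a for z < 0,
    # so the unit keeps learning even from negative inputs.
    return np.where(z > 0, z, a * z)

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z), prelu(z))
```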
12
Deep learning - dropout
Too many weights cause overfitting issues.
- Weight decay (regularization) helps but is not perfect
- It also adds another hyper-parameter to set up manually
Srivastava et al. (2014) proposed a kind of "bagging" for deep nets (Alex Krizhevsky had already used it in AlexNet).
Main point: robustify the network by disabling neurons.
- Each neuron has a probability, usually around 0.4, of being disabled
- The remaining neurons must adapt to work without them
- Applied only to the fully connected layers; conv layers are less susceptible to overfitting
Srivastava et al., Dropout: A Simple Way to Prevent Neural Networks from Overfitting, JMLR 2014
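A sketch of the dropout scheme described above, assuming the common "inverted dropout" formulation so that nothing changes at inference time; the 0.4 drop probability follows the slide:

```python
import numpy as np

def dropout(activations, drop_prob=0.4, training=True, rng=np.random.default_rng()):
    # Training: disable each neuron with probability drop_prob and rescale
    # the rest. Inference: return the activations unchanged.
    if not training:
        return activations
    keep = rng.random(activations.shape) >= drop_prob
    return activations * keep / (1.0 - drop_prob)

a = np.ones((2, 8))                   # activations of a fully connected layer
print(dropout(a, drop_prob=0.4))      # roughly 40% of the units zeroed out
```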
13
Deep learning – batch norm
The input needs to be whitened, i.e. normalized (LeCun 1998, Efficient BackProp) - usually done on the first-layer input only.
The same reason for normalizing the first layer exists for the other layers as well.
Ioffe and Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015:
- Normalize the input to each layer to reduce internal covariate shift
- Too slow to normalize over all input data (>1M samples), so normalize within the mini-batch only
- Learning: normalize over the mini-batch data
- Inference: normalize with statistics gathered over the training data
Better results, while allowing a higher learning rate, higher decay, no dropout, and no LRN.
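A minimal sketch of batch normalization in these two modes, assuming a learnable scale/shift (gamma, beta) and exponentially averaged running statistics; the momentum and eps values are illustrative:

```python
import numpy as np

def batch_norm_train(x, gamma, beta, running_mean, running_var, momentum=0.9, eps=1e-5):
    # Training: normalize over the current mini-batch only, and keep
    # running statistics for later use at inference time.
    mean, var = x.mean(axis=0), x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    running_mean = momentum * running_mean + (1 - momentum) * mean
    running_var = momentum * running_var + (1 - momentum) * var
    return gamma * x_hat + beta, running_mean, running_var

def batch_norm_infer(x, gamma, beta, running_mean, running_var, eps=1e-5):
    # Inference: normalize with the statistics accumulated during training.
    x_hat = (x - running_mean) / np.sqrt(running_var + eps)
    return gamma * x_hat + beta

x = np.random.randn(32, 4) * 3 + 2            # a mini-batch of 32 samples, 4 features
gamma, beta = np.ones(4), np.zeros(4)
rm, rv = np.zeros(4), np.ones(4)
y, rm, rv = batch_norm_train(x, gamma, beta, rm, rv)
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))   # ~0 mean, ~1 std per feature
```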
14
Deep learning - residual learning
Current state of the art on ImageNet classification: a CNN with ~150 layers (by Microsoft Research Asia).
Key features:
- Requires a reduction of internal covariate shift (Batch Normalization)
- Only ~2M parameters (using many small kernels: 1x1, 3x3)
- A CNN with 1500 layers had ~20M parameters and had overfitting issues
Adds an identity bypass. Why a bypass?
- If a layer is not needed it can simply be ignored; it will just forward its input as its output
- By default the weights are very small, so F(x, {Wi}) is negligible compared to x
He et al., Deep Residual Learning for Image Recognition, CVPR 2016
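A sketch of a residual block with an identity bypass, assuming small dense layers instead of convolutions and omitting batch normalization for brevity; the shapes and init scale are illustrative:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def residual_block(x, W1, W2):
    # y = F(x, {Wi}) + x : the identity bypass adds the input back onto the
    # output of the stacked layers. If W1, W2 are near zero, F(x) is
    # negligible and the block simply forwards x.
    f = relu(x @ W1) @ W2
    return relu(f + x)

x = np.random.randn(1, 8)
W1 = np.random.randn(8, 8) * 0.01        # small initial weights
W2 = np.random.randn(8, 8) * 0.01
print(np.allclose(residual_block(x, W1, W2), relu(x), atol=1e-2))  # ~identity at init
```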
15
Deep learning - visualizing features
It is difficult to understand the internals of CNNs. There have been many visualization attempts, most quite complex:
- Zeiler and Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2013
- Simonyan et al., Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps, ICLR 2014
- Mahendran et al., Understanding Deep Image Representations by Inverting Them, CVPR 2015
- Yosinski et al., Understanding Neural Networks Through Deep Visualization, ICML 2015
Strange properties of CNNs: adversarial examples.
- Adding an invisible perturbation to the pixels produces a completely incorrect classification
- (Figure: the prediction flips between "a car" and "unknown", while the image difference is imperceptible)
Szegedy et al., Intriguing properties of neural networks, ICLR 2014
16
Deep learning - visualizing features
Zeiler and Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2013
17
Deep learning - visualizing features
Zeiler and Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2013
18
Deep learning - visualizing features
Zeiler and Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2013
19
Deep learning - visualizing features
Simonyan et al., Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps, ICLR 2014
20
Deep learning - visualizing features
Mahendran et al., Understanding Deep Image Representations by Inverting Them, CVPR 2015
21
Deep learning - visualizing features
Yosinski et al., Understanding Neural Networks Through Deep Visualization, ICML 2015
22
PART II
23
Convolutional neural networks
Trained filters for a part on the second layer:
- Which parts on the first layer are important?
- Can we deduce anything about the object/part modeled this way? A compositional interpretation?
- CNNs are hierarchical but not compositional
24
Our approach
CNNs might learn compositions, but the compositions are not explicit, so we cannot utilize their advantages.
Capture compositions as structure in the filters with a weight parametrization: use a Gaussian distribution as the model.
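The exact parametrization is not spelled out on the slide; the sketch below only illustrates the general idea, assuming each filter is built as a weighted sum of Gaussian components with learnable weights, means, and standard deviations (all names and values here are hypothetical):

```python
import numpy as np

def gaussian_filter(size, weights, means, sigmas):
    # Build a 2-D filter as a weighted sum of Gaussian components.
    # weights: (K,), means: (K, 2), sigmas: (K,) -- all treated as learnable.
    ys, xs = np.mgrid[0:size, 0:size].astype(float)
    f = np.zeros((size, size))
    for w, (my, mx), s in zip(weights, means, sigmas):
        f += w * np.exp(-((ys - my) ** 2 + (xs - mx) ** 2) / (2 * s ** 2))
    return f

# Illustrative 5x5 filter built from two Gaussian components.
filt = gaussian_filter(5,
                       weights=np.array([1.0, -0.8]),
                       means=np.array([[1.0, 1.0], [3.0, 3.0]]),
                       sigmas=np.array([0.8, 0.8]))
print(filt.round(2))
```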
25
Compositional neural network
Compositional deep nets + convolutional neural nets. Possible benefits:
- The model can be interpreted as a hierarchical composition!
- Reduced number of parameters (faster learning? fewer training samples?)
- Combine generative learning (co-occurrence statistics from compositional hierarchies) with discriminative optimization (gradient descent from CNNs)
- Visualizations based on compositions, without additional data or complex optimization
26
Compositional neural network
Back-propagation remains the same: minimize the loss function C with respect to the weights, the means, and the variances.
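A sketch of the extra derivatives this requires, assuming each weight is generated by a Gaussian component as in the previous sketch; back-propagation simply chains the loss gradient through these partials (symbols are illustrative):

```python
import numpy as np

def gaussian_component(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

def gaussian_component_grads(x, mu, sigma):
    # Partial derivatives of the component w.r.t. its mean and standard
    # deviation; back-propagation multiplies these by dC/d(filter value).
    g = gaussian_component(x, mu, sigma)
    dmu = g * (x - mu) / sigma ** 2
    dsigma = g * (x - mu) ** 2 / sigma ** 3
    return dmu, dsigma

x = np.arange(5.0)                    # positions inside a 1-D filter
print(gaussian_component_grads(x, mu=2.0, sigma=0.8))
```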
27
Compositional neural network
28
First layer - activations
(Figure: first-layer activations on a normalized input image.)
The first and last filters are only blob detectors (with positive or negative weights); the ones in the middle are edge detectors - this can also be deduced by looking at the filters!
29
Second layer - weights
(Figure: second-layer weights of the Gaussian CNN vs. a standard CNN; 16 different channel filters per feature, 16 different features at the second layer.)
30
Second layer - activations
(Figure: second-layer activations on a normalized input image.)
Some features are still just edges, but some already respond to corner points.