Introduction to Convolutional Neural Networks boris.ginsburg@gmail.com
Acknowledgments This course is heavily based on LeCun's, Ng's, and Bengio's tutorials: http://cs.nyu.edu/~fergus/pmwiki/pmwiki.php http://deeplearning.net/reading-list/tutorials/ http://deeplearning.net/tutorial/lenet.html http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial … and many other presentations, blogs, and papers.
Agenda Course overview Introduction to Deep Learning Classical Computer Vision vs. Deep Learning Basic CNN Architecture Large-Scale Image Classification How deep should ConvNets be? Deep Learning Applications
Course overview Introduction CNN Training Intro to Deep Learning Network topology, layers definition, forward propagation Caffe: Getting started and MNIST CNN Training Backward propagation Optimization for Deep Learning: SGD with momentum, rate adaptation, Adagrad and Nesterov Saddle points problem Caffe: CIFAR10
Course overview-2 GPGPU programming with CUDA Advanced CNN topics Regularization: Dropout, Stochastic pooling Tricks of the Trade: Data augmentation ImageNet training Localization and Detection with ConvNets: OverFeat, R-CNN, Spatial Pyramid Pooling with CNN CPU parallelization and performance optimization: OpenMP, BLAS/MKL, VTune
Course overview-3 Seminars Projects: Training of MNIST, CIFAR-10, and ImageNet Re-implementation of convolutional layer Projects: New layers and algorithms for Caffe New datasets Additional Topics: Unsupervised training with Auto-encoders, Siamese networks, Recurrent NN and LSTM Language Processing & Speech Recognition with DL, Reinforcement Learning and Games
Introduction to Deep Learning
Buzz…
Deep Learning – from Research to Technology Deep Learning - breakthrough in computer vision, speech recognition and language processing
Classical Computer Vision Pipeline
Classical Computer Vision Pipeline. CV experts Select / develop features: SURF, HoG, SIFT, RIFT, … Add on top of this Machine Learning for multi-class recognition and train classifier Feature Extraction: SIFT, HoG... Detection, Classification Recognition Classical CV feature definition is domain-specific and time-consuming
Deep Learning-based Vision Pipeline. Build features automatically based on training data Joint training of feature extraction and classification DL experts: define NN topology and train NN Deep NN... Classification “The battle between SIFT/HOG vs Convolutional NN based features for recognition is over. CNN have won” Prof. Malik, Berkeley
Computer Vision + Deep Learning + Machine Learning We want to combine Deep Learning + CV + ML Use deep learning for feature extraction; classical CV for region detection, pyramid pooling, etc. Use the best ML methods for multi-class recognition Deep NN... Spatial Pyramid Pooling ML AdaBoost … Deep Learning promise: train good features automatically, with the same method for different domains
Deep Learning Basics Deep Learning is a set of machine learning algorithms based on multi-layer networks CAT DOG OUTPUTS HIDDEN NODES INPUTS
Deep Learning Basics Deep Learning is a set of machine learning algorithms based on multi-layer networks CAT DOG Training
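The inputs / hidden nodes / outputs picture above can be sketched as a forward pass through a tiny two-layer network. This is only an illustrative sketch: the layer sizes, the random weights, the ReLU hidden activation, and the softmax output are assumptions chosen for the example, not details taken from the slides.

```python
import numpy as np

def relu(x):
    """Element-wise rectified linear unit."""
    return np.maximum(0.0, x)

def softmax(z):
    """Turn raw output scores into probabilities that sum to 1."""
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def forward(x, W1, b1, W2, b2):
    """Inputs -> hidden nodes -> outputs."""
    hidden = relu(W1 @ x + b1)
    return softmax(W2 @ hidden + b2)

# Hypothetical sizes: 4 inputs, 5 hidden nodes, 2 outputs (CAT, DOG).
rng = np.random.default_rng(0)
n_inputs, n_hidden, n_outputs = 4, 5, 2
W1 = rng.normal(scale=0.1, size=(n_hidden, n_inputs))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_outputs, n_hidden))
b2 = np.zeros(n_outputs)

x = rng.normal(size=n_inputs)            # one untrained example input
probs = forward(x, W1, b1, W2, b2)
print(dict(zip(["CAT", "DOG"], probs.round(3))))
```

With untrained random weights the two probabilities are close to 0.5 each; training adjusts W1, b1, W2, b2 so that the correct label gets the higher probability.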
Deep Learning Taxonomy Supervised: Convolutional NN (LeCun) Recurrent NN (Schmidhuber) Unsupervised: Deep Belief Nets / Stacked RBMs (Hinton) Autoencoders (Bengio, LeCun, A. Ng)
Convolutional Networks
Convolutional NN Modern Convolutional Neural Networks are an extension of the traditional Multi-layer Perceptron, based on 3 basic ideas: Local receptive fields with shared weights (“convolutional filter”) Spatial / temporal sub-sampling (“pooling”) A new type of non-linear activation function: ReLU LeCun paper (1998) on text recognition: http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
What is Convolutional NN? CNN: multi-layer NN architecture Convolutional + Non-Linear Layer Sub-sampling Layer Convolutional + Non-Linear Layer Fully connected layers Supervised Feature Extraction Classification
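The three ideas above (a shared-weight filter slid over local receptive fields, ReLU, and pooling) can be sketched in a few lines of NumPy. This is a minimal single-channel sketch, not a full layer: the 6x6 test image, the 3x3 edge kernel, and the 2x2 max-pooling window are assumptions made for the example. As in most CNN libraries, the "convolution" here is computed without flipping the kernel.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one shared kernel over every local receptive field (no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Non-linear activation: keep positive responses, zero out the rest."""
    return np.maximum(0.0, x)

def max_pool_2x2(x):
    """Sub-sample by taking the maximum of each non-overlapping 2x2 block."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Hypothetical input: dark left half, bright right half (a vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hypothetical filter that responds to dark-to-bright vertical edges.
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

feature_map = max_pool_2x2(relu(conv2d_valid(image, kernel)))
print(feature_map)
```

The same nine kernel weights are reused at every image position, which is why a convolutional layer needs far fewer parameters than a fully connected layer over the same input.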
What is Convolutional NN ? 2x2 Convolution + NL Sub-sampling Convolution + NL
CNN story: 1996 - MNIST LeNet-5 (1996): core of the NCR check reading system, used by US banks.
CNN story: 2012 - ILSVRC ImageNet database: 14 million labeled images, 20K categories
ILSVRC: Classification
Imagenet Classifications 2012
ILSVRC 2012: top rankers http://www.image-net.org/challenges/LSVRC/2012/results.html
N | Error-5 | Algorithm | Team | Authors
1 | 0.153 | Deep Conv. Neural Network | Univ. of Toronto | Krizhevsky et al.
2 | 0.262 | Features + Fisher Vectors + Linear classifier | ISI | Gunji et al.
3 | 0.270 | Features + FV + SVM | OXFORD_VGG | Simonyan et al.
4 | 0.271 | SIFT + FV + PQ + SVM | XRCE/INRIA | Perronnin et al.
5 | 0.300 | Color desc. + SVM | Univ. of Amsterdam | van de Sande et al.
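The "Error-5" column is the top-5 error: a prediction counts as correct if the true label is among the five highest-scoring classes. A small sketch of how such a metric could be computed is shown below; the function name `top_k_error` and the toy scores are assumptions for illustration and are not the official ILSVRC evaluation code.

```python
import numpy as np

def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is NOT among the k best scores.

    scores: array of shape (n_examples, n_classes), higher means more likely.
    labels: integer array of shape (n_examples,) with the true class ids.
    """
    # Indices of the k highest-scoring classes per example (order not needed).
    top_k = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    hits = (top_k == labels[:, None]).any(axis=1)
    return 1.0 - hits.mean()

# Toy data: 4 examples, 10 classes, class 9 always scores highest.
scores = np.tile(np.arange(10.0), (4, 1))
labels = np.array([9, 5, 4, 0])

top5 = top_k_error(scores, labels, k=5)
top1 = top_k_error(scores, labels, k=1)
print(top5, top1)
```

Top-5 error is always at most the top-1 error, which is one reason it is used for a 1000-class task where several labels can be plausible for one image.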
ImageNet 2013: top rankers http://www.image-net.org/challenges/LSVRC/2013/results.php
N | Error-5 | Algorithm | Team | Authors
1 | 0.117 | Deep Convolutional Neural Network | Clarifai | Zeiler
2 | 0.129 | Deep Convolutional Neural Networks | Nat. Univ. Singapore | Min Lin
3 | 0.135 | | NYU | Fergus
4 | | | Andrew Howard |
5 | 0.137 | | Overfeat | Pierre Sermanet et al.
Imagenet Classifications 2013
Conv Net Topology 5 convolutional layers 3 fully connected layers + soft-max 650K neurons, 60 million weights
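The figure of roughly 60 million weights can be checked with simple arithmetic. The layer shapes below are not given on the slide; they are assumptions taken from the published description of the Krizhevsky et al. (2012) network, including the reduced input channel counts (48 and 192) that result from splitting some layers across two GPUs.

```python
# Assumed layer shapes (not from the slide):
# (name, kernel_h, kernel_w, in_channels_per_filter, out_channels)
conv_layers = [
    ("conv1", 11, 11, 3, 96),
    ("conv2", 5, 5, 48, 256),
    ("conv3", 3, 3, 256, 384),
    ("conv4", 3, 3, 192, 384),
    ("conv5", 3, 3, 192, 256),
]
# (name, inputs, outputs); fc6 sees a 6x6x256 feature map.
fc_layers = [
    ("fc6", 6 * 6 * 256, 4096),
    ("fc7", 4096, 4096),
    ("fc8", 4096, 1000),
]

def count_parameters(conv_layers, fc_layers):
    """Return {layer name: number of weights plus biases}."""
    counts = {}
    for name, kh, kw, c_in, c_out in conv_layers:
        counts[name] = kh * kw * c_in * c_out + c_out
    for name, n_in, n_out in fc_layers:
        counts[name] = n_in * n_out + n_out
    return counts

counts = count_parameters(conv_layers, fc_layers)
total = sum(counts.values())
fc_share = sum(counts[name] for name, _, _ in fc_layers) / total

for name, n in counts.items():
    print(f"{name}: {n:,}")
print(f"total: {total:,}  (fully connected share: {fc_share:.1%})")
```

Under these assumptions almost all parameters sit in the three fully connected layers, while the five convolutional layers account for only a few percent.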
Why should ConvNets be deep? Rob Fergus, NIPS 2013
Deep Learning Applications
Machine Learning Workflow
Traditional Machine Learning Carrier flow http://blog.bigml.com/2013/06/13/matter-over-mind-in-machine-learning/
Deep Learning Carrier Flow Use pre-trained CNN for similar problem or re-train networks http://blog.bigml.com/2013/06/13/matter-over-mind-in-machine-learning/
CNN applications CNN is a big hammer. Plenty of low-hanging fruit. You just need the right nail!
Conv NN: Detection Sermanet, CVPR 2014
Conv NN: Scene parsing Farabet, PAMI 2013
CNN: indoor semantic labeling RGBD Farabet, 2013
Conv NN: Action Detection Taylor, ECCV 2010
Conv NN: Image Processing Eigen, ICCV 2013
Baidu Deep Speech: Scaling up end-to-end speech recognition ASR system developed using end-to-end (e2e) deep learning. The Baidu system is significantly simpler than traditional systems, which rely on laboriously engineered processing pipelines. Deep Speech does not need hand-designed components to model background noise, speaker variation, etc., but instead learns them directly.
RNN-based Language Models
Playing games Existing Go-playing computer programs are still not competitive with Go professionals on 19×19 boards