A shallow look at Deep Learning


A shallow look at Deep Learning. Computer Vision, James Hays. Many slides from the CVPR 2014 Deep Learning Tutorial (Honglak Lee and Marc'Aurelio Ranzato especially) and Rob Fergus. https://sites.google.com/site/deeplearningcvpr2014

Goal of this final course section: to understand deep learning as a computer vision practitioner; to understand the mechanics of a deep convolutional network at test time (the forward pass); to have an intuition for the learning mechanisms used to train CNNs (the backward pass / backpropagation); to know of recent success stories with learned representations in computer vision; and to gain an intuition for what is going on inside deep convolutional networks.

http://www.cc.gatech.edu/~zk15/DL2016/deep_learning_course.html

Traditional Recognition Approach: features are not learned but hand-crafted. Pipeline: input data (image pixels) → low-level vision features (edges, SIFT, HOG, etc.) → learning algorithm (e.g., SVM) → object detection / classification.

Computer vision features: SIFT, Spin image, Textons, HOG, and many others: SURF, MSER, LBP, Color-SIFT, Color histogram, GLOH, …

Motivation: features are key to recent progress in recognition, and a multitude of hand-designed features is currently in use. Where next? Better classifiers, or building better features? Felzenszwalb, Girshick, McAllester and Ramanan, PAMI 2010; Yan & Huang (winner of the PASCAL 2010 classification competition). Slide: R. Fergus

What Limits Current Performance? Ablation studies on the Deformable Parts Model (Felzenszwalb, Girshick, McAllester, Ramanan, PAMI'10). Replacing each part with humans (Amazon Mechanical Turk): Parikh & Zitnick, CVPR'10. Removing the part deformations also has only a small (<2%) effect: are "deformable parts" necessary in the Deformable Parts Model? Divvala, Hebert, Efros, ECCV 2012. Slide: R. Fergus

Mid-Level Representations: mid-level cues such as continuation, parallelism, junctions, and corners ("tokens" from Vision by D. Marr), as well as object parts. These are difficult to hand-engineer, so what about learning them? Slide: R. Fergus

Learning Feature Hierarchy: learn a hierarchy all the way from pixels to classifier, where each layer extracts features from the output of the previous layer and all layers are trained jointly. Pipeline: image/video pixels → Layer 1 → Layer 2 → Layer 3 → simple classifier. Slide: R. Fergus

Learning Feature Hierarchy: 1. Learn useful higher-level features from images: input data (pixels) → 1st layer ("edges") → 2nd layer ("object parts") → 3rd layer ("objects") → feature representation. 2. Fill in the representation gap in recognition. Lee et al., ICML 2009; CACM 2011

Learning Feature Hierarchy: better performance; applicability to other domains where it is unclear how to hand-engineer features (Kinect, video, multi-spectral); and feature computation time, since dozens of features are now regularly used [e.g., MKL], which is getting prohibitive for large datasets (tens of seconds per image). Slide: R. Fergus

Approaches to learning features. Supervised learning: end-to-end learning of deep architectures (e.g., deep neural networks) with back-propagation; works well when the amount of labels is large; the structure of the model is important (e.g., convolutional structure). Unsupervised learning: learn statistical structure or dependencies of the data from unlabeled data; layer-wise training; useful when the amount of labels is not large.

Taxonomy of feature learning methods:
Supervised, shallow: Support Vector Machine; Logistic Regression; Perceptron.
Supervised, deep: Deep Neural Net; Convolutional Neural Net; Recurrent Neural Net.
Unsupervised, shallow: Denoising Autoencoder; Restricted Boltzmann machines*; Sparse coding*.
Unsupervised, deep: Deep (stacked) Denoising Autoencoder*; Deep Belief Nets*; Deep Boltzmann machines*; Hierarchical Sparse Coding*.
(* supervised version exists)

Supervised Learning

Example: Convolutional Neural Networks (LeCun et al. 1989): a neural network with a specialized connectivity structure. Slide: R. Fergus

Convolutional Neural Networks (LeCun et al. 1998). Feed-forward pass: convolve the input with learned filters, apply a non-linearity (rectified linear), and pool (local max), producing feature maps. Supervised training: the convolutional filters are trained by back-propagating the classification error. Slide: R. Fergus
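
To make the forward pass concrete, here is a minimal NumPy sketch of one such layer: convolution, rectified linear non-linearity, then local max pooling. The input, the filter values, and all sizes are illustrative, not taken from the slides.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution (really cross-correlation, as in most CNN code)."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

def relu(x):
    """Rectified linear non-linearity."""
    return np.maximum(x, 0)

def max_pool(x, size=2):
    """Non-overlapping local max pooling over size x size regions."""
    H, W = x.shape
    H, W = H - H % size, W - W % size          # crop to a multiple of the pool size
    blocks = x[:H, :W].reshape(H // size, size, W // size, size)
    return blocks.max(axis=(1, 3))

# One feed-forward layer: convolve -> rectify -> pool.
image = np.random.rand(28, 28)                  # toy input "image"
edge_filter = np.array([[1., 0., -1.],          # illustrative (not learned) filter
                        [2., 0., -2.],
                        [1., 0., -1.]])
feature_map = max_pool(relu(conv2d(image, edge_filter)))
print(feature_map.shape)                        # (13, 13)
```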

Components of Each Layer: pixels / features → filter with dictionary (convolutional or tiled) + non-linearity → spatial/feature pooling (sum or max) → normalization between feature responses [optional] → output features. Slide: R. Fergus

Filtering: convolutional. Dependencies are local; translation equivariance; tied filter weights (few parameters); stride 1, 2, … (faster, less memory). Slide: R. Fergus
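
The stride trade-off can be seen from the standard output-size formula for a "valid" convolution; the numbers below are illustrative.

```python
def conv_output_size(input_size, filter_size, stride):
    """Spatial size of a 'valid' convolution output for a given stride."""
    return (input_size - filter_size) // stride + 1

# Same input width and filter width: larger strides shrink the feature map
# (less memory, faster), while the tied filter weights stay the same size.
for stride in (1, 2, 4):
    print(stride, conv_output_size(224, 7, stride))   # 218, 109, 55
```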

Non-Linearity: applied per element (independently). Options: tanh; sigmoid, 1/(1+exp(-x)); rectified linear, which simplifies backprop, makes learning faster, and avoids saturation issues, so it is the preferred option. Slide: R. Fergus
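
A tiny sketch comparing the three per-element non-linearities on the same inputs; the saturation behaviour of tanh and sigmoid (versus ReLU) is what the slide refers to.

```python
import numpy as np

def tanh(x):
    return np.tanh(x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(x, 0)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(tanh(x))      # saturates near -1 / +1 for large |x|
print(sigmoid(x))   # saturates near 0 / 1 for large |x|
print(relu(x))      # [0.  0.  0.  0.5 3. ]  -- no saturation for positive inputs
```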

Pooling: spatial pooling over non-overlapping or overlapping regions, taking the sum or the max; see Boureau et al. ICML'10 for a theoretical analysis. Slide: R. Fergus
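
A minimal sketch of sum versus max pooling over non-overlapping regions (the overlapping variant only changes how the regions are indexed); the feature-map values are illustrative.

```python
import numpy as np

def pool(fmap, size=2, mode="max"):
    """Spatial pooling over non-overlapping size x size regions."""
    H, W = fmap.shape
    blocks = fmap[:H - H % size, :W - W % size].reshape(H // size, size, W // size, size)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.sum(axis=(1, 3))

fmap = np.arange(16, dtype=float).reshape(4, 4)   # toy feature map
print(pool(fmap, mode="max"))   # [[ 5.  7.] [13. 15.]]
print(pool(fmap, mode="sum"))   # [[10. 18.] [42. 50.]]
```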

Contrast Normalization: contrast normalization (across feature maps) sets the local mean to 0 and the local std. to 1, where "local" means a 7x7 Gaussian window; this equalizes the feature maps. Slide: R. Fergus
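
A rough sketch of local contrast normalization on a single feature map (the slide applies it across feature maps); the Gaussian width and the feature-map values are illustrative, and scipy's gaussian_filter stands in for the 7x7 Gaussian window.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_contrast_normalize(fmap, sigma=1.0, eps=1e-8):
    """Subtract a local (Gaussian-weighted) mean and divide by the local std.,
    so each neighbourhood ends up with roughly mean 0 and std. 1."""
    local_mean = gaussian_filter(fmap, sigma)
    centered = fmap - local_mean
    local_std = np.sqrt(gaussian_filter(centered ** 2, sigma))
    return centered / np.maximum(local_std, eps)

fmap = np.random.rand(32, 32) * 10 + 5          # toy feature map with arbitrary offset/scale
normalized = local_contrast_normalize(fmap)
print(normalized.mean(), normalized.std())      # rough check: close to 0 and 1
```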

Compare: SIFT Descriptor. Image pixels → apply Gabor filters → spatial pool (sum) → normalize to unit length → feature vector. Slide: R. Fergus

Applications. Handwritten text/digits and simpler recognition benchmarks: MNIST (0.17% error [Ciresan et al. 2011]), Arabic & Chinese handwriting [Ciresan et al. 2012], CIFAR-10 (9.3% error [Wan et al. 2013]). Traffic sign recognition: 0.56% error vs. 1.16% for humans [Ciresan et al. 2011]. Slide: R. Fergus

Application: ImageNet. ~14 million labeled images, 20k classes; images gathered from the Internet; human labels via Amazon Mechanical Turk. [Deng et al. CVPR 2009]

Krizhevsky et al. [NIPS 2012]: the same kind of model as LeCun'98 but a bigger model (8 layers), more data (10^6 vs. 10^3 images), a GPU implementation (50x speedup over CPU), and better regularization (DropOut). 7 hidden layers, 650,000 neurons, 60,000,000 parameters; trained on 2 GPUs for a week.
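
A minimal sketch of the DropOut regularization mentioned above, in the common "inverted dropout" formulation; the drop rate and activation values are illustrative.

```python
import numpy as np

def dropout(activations, p_drop=0.5, train=True):
    """Randomly zero a fraction p_drop of activations during training,
    scaling the survivors so the expected activation is unchanged;
    at test time the layer is the identity."""
    if not train:
        return activations
    mask = (np.random.rand(*activations.shape) >= p_drop)
    return activations * mask / (1.0 - p_drop)

hidden = np.random.rand(4, 8)                # toy hidden-layer activations
print(dropout(hidden, p_drop=0.5))           # roughly half the units zeroed out
```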

ImageNet Classification 2012: Krizhevsky et al.: 16.4% error (top-5); next best (non-convnet): 26.2% error. [Chart: top-5 error rate (%) for the SuperVision, ISI, Oxford, INRIA, and Amsterdam entries]

ImageNet Classification 2013 Results: http://www.image-net.org/challenges/LSVRC/2013/results.php [Chart: test error (top-5), roughly 0.11 to 0.17 across entries]

Feature Generalization: pre-train on ImageNet, then retrain only the classifier on the target dataset (e.g., Caltech-256) using the CNN features. Demonstrated on Caltech-101/256, SUN, VOC 2012, and many other datasets: Zeiler & Fergus, arXiv 1311.2901, 2013; Girshick et al., CVPR'14; Oquab et al., CVPR'14; Razavian et al., arXiv 1403.6382, 2014. From Zeiler & Fergus, Visualizing and Understanding Convolutional Networks, arXiv 1311.2901, 2013; cf. Bo, Ren, Fox, CVPR 2013; Sohn, Jung, Lee, Hero, ICCV 2011. Slide: R. Fergus

Industry Deployment: used at Facebook, Google, and Microsoft for image recognition, speech recognition, and more; fast at test time. Taigman et al., DeepFace: Closing the Gap to Human-Level Performance in Face Verification, CVPR'14. Slide: R. Fergus

Unsupervised Learning

Unsupervised Learning: model the distribution of the input data; can use unlabeled data (unlimited); can be refined with standard supervised techniques (e.g., backprop); useful when the amount of labels is small.

Unsupervised Learning: the main idea is to model the distribution of the input data, either via a reconstruction error plus a regularizer (sparsity, denoising, etc.) or via the log-likelihood of the data. Models: basic (PCA, k-means), denoising autoencoders, sparse autoencoders, restricted Boltzmann machines, sparse coding, independent component analysis, …

Example: Auto-Encoder. An encoder (feed-forward / bottom-up path) maps the input (image/features) to output features, and a decoder (feed-back / generative / top-down path) reconstructs the input from them. Bengio et al., NIPS'07; Vincent et al., ICML'08. Slide: R. Fergus
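
A minimal sketch of a one-layer auto-encoder: the encoder maps the input to a feature code, the decoder reconstructs the input, and training would minimize the reconstruction error (here with an optional sparsity penalty). The weights, sizes, and penalty weight are illustrative placeholders, not trained values.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 64, 16
W_enc = rng.normal(scale=0.1, size=(n_hidden, n_in))   # encoder weights (placeholder)
W_dec = rng.normal(scale=0.1, size=(n_in, n_hidden))   # decoder weights (placeholder)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(x):
    """Bottom-up path: input -> feature code."""
    return sigmoid(W_enc @ x)

def decode(h):
    """Top-down path: feature code -> reconstructed input."""
    return W_dec @ h

x = rng.random(n_in)                 # toy input patch
h = encode(x)                        # feature code
x_hat = decode(h)                    # reconstruction
loss = np.mean((x - x_hat) ** 2) + 0.01 * np.mean(np.abs(h))  # reconstruction + sparsity
print(loss)
```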

Stacked Auto-Encoders: input image → encoder → features → encoder → features → … → class label, with a decoder paired with each encoder. Bengio et al., NIPS'07; Vincent et al., ICML'08; Ranzato et al., NIPS'07. Slide: R. Fergus

At Test Time: remove the decoders and use only the feed-forward path (input image → encoder → features → encoder → features → encoder → class label). This gives a standard (convolutional) neural network, which can be fine-tuned with backprop. Bengio et al., NIPS'07; Vincent et al., ICML'08; Ranzato et al., NIPS'07. Slide: R. Fergus
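
A sketch of this test-time path, assuming each kept encoder is simply a weight matrix followed by a non-linearity and a linear classifier sits on top; all weights and sizes are illustrative placeholders.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

rng = np.random.default_rng(1)
# Encoders kept from layer-wise training (the decoders are thrown away).
encoders = [rng.normal(scale=0.1, size=(64, 256)),
            rng.normal(scale=0.1, size=(32, 64)),
            rng.normal(scale=0.1, size=(16, 32))]
W_classifier = rng.normal(scale=0.1, size=(10, 16))     # linear classifier on top

def forward(x):
    """Feed-forward path only: stacked encoders, then a class prediction."""
    for W in encoders:
        x = relu(W @ x)
    scores = W_classifier @ x
    return np.argmax(scores)

print(forward(rng.random(256)))      # predicted class label for a toy input
```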

Learning basis vectors for images: from natural images, the learned bases are "edges". A test example x is approximated as a sparse combination of bases, e.g. x ≈ 0.8 * w36 + 0.3 * w42 + 0.5 * w65, so the coefficient vector [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, …] is the feature representation: compact and easily interpretable. [Olshausen & Field, Nature 1996; Ranzato et al., NIPS 2007; Lee et al., NIPS 2007; Lee et al., NIPS 2008; Jarrett et al., CVPR 2009; etc.]
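
A small sketch of that reconstruction: the test example is a sparse linear combination of learned bases, and the sparse coefficient vector serves as the feature representation. The bases here are random placeholders standing in for learned "edge" bases, and the patch size is illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n_bases, patch_dim = 128, 196                    # e.g. 14x14 image patches
bases = rng.normal(size=(n_bases, patch_dim))    # placeholder for learned "edge" bases

# Sparse coefficients: only bases 36, 42 and 65 are active (as in the slide).
coeffs = np.zeros(n_bases)
coeffs[36], coeffs[42], coeffs[65] = 0.8, 0.3, 0.5

x_hat = coeffs @ bases                 # reconstruction: 0.8*w36 + 0.3*w42 + 0.5*w65
print(np.count_nonzero(coeffs), x_hat.shape)   # 3 (196,)
```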

Learning Feature Hierarchy: input image (pixels) → first layer (edges) → higher layer (combinations of edges). [Olshausen & Field, Nature 1996; Ranzato et al., NIPS 2007; Lee et al., NIPS 2007; Lee et al., NIPS 2008; Jarrett et al., CVPR 2009; etc.]

Learning object representations: learning objects and parts in images. Large image patches contain interesting higher-level structures, e.g. object parts and full objects.

Unsupervised learning of feature hierarchy: input image → "filtering" (filter1, filter2, filter3, filter4) → "shrink" (max over 2x2) → output, e.g. an "eye detector". Advantage of shrinking: the filter size is kept small and the representation gains invariance. H. Lee, R. Grosse, R. Ranganath, A. Ng, ICML 2009; Comm. ACM 2011

Unsupervised learning of feature hierarchy: input image → Layer 1 (W1) → Layer 1 activation (coefficients) → Layer 2 (W2) → Layer 2 activation (coefficients) → Layer 3 (W3) → Layer 3 activation (coefficients), with filter visualizations and example images shown at each stage. H. Lee, R. Grosse, R. Ranganath, A. Ng, ICML 2009; Comm. ACM 2011

Unsupervised learning from natural images: first-layer bases are localized, oriented edges; second-layer bases are contours, corners, arcs, and surface boundaries. Related work: Zeiler et al., CVPR'10, ICCV'11; Kavukcuoglu et al., NIPS'09

Learning object-part decomposition: faces, cars, elephants, chairs. Applications: object recognition (Lee et al., ICML'09; Sohn et al., ICCV'11; Sohn et al., ICML'13), verification (Huang et al., CVPR'12), image alignment (Huang et al., NIPS'12). Cf. Convnet [Krizhevsky et al., 2012]; Deconvnet [Zeiler et al., CVPR 2010]

Large-scale unsupervised learning: a large-scale deep autoencoder (three layers), where each stage consists of local filtering, L2 pooling, and local contrast normalization. The three layers are trained jointly by reconstructing the input of each layer and enforcing sparsity on the code. Le et al., "Building high-level features using large-scale unsupervised learning", 2011. Slide: M. Ranzato
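
L2 pooling (used in each stage above) takes the square root of the sum of squares over a local region, rather than the max or sum. A minimal sketch over non-overlapping regions, with illustrative values:

```python
import numpy as np

def l2_pool(fmap, size=2):
    """L2 pooling: sqrt of the sum of squares over non-overlapping size x size regions."""
    H, W = fmap.shape
    blocks = fmap[:H - H % size, :W - W % size].reshape(H // size, size, W // size, size)
    return np.sqrt((blocks ** 2).sum(axis=(1, 3)))

fmap = np.arange(16, dtype=float).reshape(4, 4)   # toy feature map
print(l2_pool(fmap))   # e.g. top-left block -> sqrt(0^2 + 1^2 + 4^2 + 5^2) ≈ 6.48
```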

Large-scale unsupervised learning: the large-scale deep autoencoder discovers high-level features from large amounts of unlabeled data and achieved state-of-the-art performance on ImageNet classification with 10k categories. Le et al., "Building high-level features using large-scale unsupervised learning", 2011

Supervised vs. Unsupervised. Supervised models work very well with large amounts of labels (e.g., ImageNet); convolutional structure is important. Unsupervised models work well given limited amounts of labels, and promise to exploit a virtually unlimited amount of data without the need for labeling.

Summary: deep learning of feature hierarchies shows great promise for computer vision problems. More details are presented later: basics (supervised and unsupervised learning); libraries (Torch7, Theano/Pylearn2, CAFFE); advanced topics (object detection, localization, structured output prediction, learning from videos, multimodal/multitask learning).

Tutorial Overview: https://sites.google.com/site/deeplearningcvpr2014
Basics: Introduction (Honglak Lee); Supervised Learning (Marc'Aurelio Ranzato); Unsupervised Learning (Graham Taylor).
Libraries: Torch7 (Marc'Aurelio Ranzato); Theano/Pylearn2 (Ian Goodfellow); CAFFE (Yangqing Jia).
Advanced topics: Object detection (Pierre Sermanet); Regression methods for localization (Alex Toshev); Large scale classification and GPU parallelization (Alex Krizhevsky); Learning transformations from videos (Roland Memisevic); Multimodal and multi-task learning (Honglak Lee); Structured prediction (Yann LeCun).