Lecture: Deep Convolutional Neural Networks


Lecture: Deep Convolutional Neural Networks Shubhang Desai Stanford Vision and Learning Lab

Today’s agenda Deep convolutional networks, history of CNNs, CNN development, architecture search

Previously… Input Image → Conv Block → 32×32×10 Classification Output → argmax_c(pred) → Prediction ŷ. The Prediction ŷ and the Input Label y feed the Cross-Entropy Loss Function, giving the Loss Value: L = CE(ŷ, y). We 1) minimize this loss value… 2) by modifying the feature extractor and classifier… 3) using gradient descent!

One question you might be having: why only one convolution?
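That whole picture fits in a few lines of code. Below is a minimal PyTorch sketch of it; the tiny linear model, random batch, and learning rate are placeholder assumptions, not the course's actual setup:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the diagram's pieces (assumed, not the real model)
model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 10))
images = torch.randn(8, 3, 32, 32)     # a toy batch of 8 input images
labels = torch.randint(0, 10, (8,))    # toy input labels y

loss_fn = nn.CrossEntropyLoss()        # the CE loss function
opt = torch.optim.SGD(model.parameters(), lr=0.01)

scores = model(images)                 # classification output, 8x10
loss = loss_fn(scores, labels)         # 1) the loss value L we minimize
opt.zero_grad()
loss.backward()                        # gradients of L w.r.t. the weights
opt.step()                             # 2) modify the weights 3) by gradient descent
pred = scores.argmax(dim=1)            # argmax over classes c -> prediction ŷ
```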

Convolutions Convolutions = insights. More convolutions = more insights?

Recall Hubel and Wiesel… The thing has edges… The edges can be grouped into triangles and ovals… The triangles are ears, the oval is a body… It’s a mouse toy! This is a hierarchical approach to how the visual system sees things.

Can we make a computer do the same thing? Notice that we already know we can hand-design a Sobel filter (a very simple filter) to do the low-level tasks.
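To make the Sobel remark concrete, here is a small PyTorch sketch (the 8×8 toy image is an assumption made for illustration): the classic hand-designed 3×3 kernel fires exactly where the image intensity changes.

```python
import torch
import torch.nn.functional as F

# The horizontal Sobel kernel: a hand-designed 3x3 filter that
# responds to vertical edges (left-to-right intensity changes)
sobel_x = torch.tensor([[-1., 0., 1.],
                        [-2., 0., 2.],
                        [-1., 0., 1.]]).view(1, 1, 3, 3)

img = torch.zeros(1, 1, 8, 8)
img[..., 4:] = 1.0                         # toy image: dark left half, bright right half
edges = F.conv2d(img, sobel_x, padding=1)  # convolve the filter over the image
print(edges[0, 0, 4, :7])                  # interior of row 4: nonzero only at the boundary
```

No learning is involved here: the filter was designed by hand, which is exactly the point of the slide.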

Convolutions Across Channels A 28×28×3 image convolved with a 15×15×3 filter gives a 14×14×1 output.

Why would we want more? With a stack of four such filters (15×15×3×4), the same 28×28×3 image gives a 14×14×4 output. More output channels = more filters = more features we can learn!

For simplicity, we draw the whole 15×15×3×4 filter stack as a single “Conv Block”.
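A quick PyTorch sketch of that Conv Block, using the slide's exact shapes (note PyTorch puts channels before height and width):

```python
import torch
import torch.nn as nn

# A 28x28x3 image in PyTorch's (batch, channels, height, width) layout
x = torch.randn(1, 3, 28, 28)

# Four filters, each 15x15x3: the 15x15x3x4 "Conv Block" from the slide
conv = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=15)

print(conv(x).shape)  # torch.Size([1, 4, 14, 14]), since 28 - 15 + 1 = 14
```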

Stacking Convolutions 5×5×3×4 Conv Block → 15×15×4×6 Conv Block → 8×8×6×8 Conv Block → 7×7×8×10 Conv Block. The 32×32×3 input turns into a 28×28×4 output, then 14×14×6, then 7×7×8, then 1×1×10: the spatial dimensions shrink while the channels grow, giving more “features” to learn. The first few layers give edges, then shapes, then concepts, then the classification. This is a CONVOLUTIONAL NEURAL NETWORK!
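The slide's four blocks, chained directly in PyTorch to verify the shapes (a real network would also put nonlinearities between the layers, which the slide omits):

```python
import torch
import torch.nn as nn

# The four Conv Blocks from the slide, in order of application
net = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=5),    # 32x32x3 -> 28x28x4
    nn.Conv2d(4, 6, kernel_size=15),   # 28x28x4 -> 14x14x6
    nn.Conv2d(6, 8, kernel_size=8),    # 14x14x6 -> 7x7x8
    nn.Conv2d(8, 10, kernel_size=7),   # 7x7x8   -> 1x1x10
)

x = torch.randn(1, 3, 32, 32)
print(net(x).shape)  # torch.Size([1, 10, 1, 1]): one score per class
```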

Convolutional Neural Networks (ConvNets) Neural networks that stack multiple convolutional layers to produce an output. They oftentimes end in fully-connected layers as the “classifier”: the conv layers are the featurizer, the FC layers are the classifier.

History of ConvNets LeNet – 1998 Built at NYU in Yann LeCun’s group. It uses average pooling, and after the convolutions the feature map is stretched into a vector and fed to FC layers. MNIST is the digits 0–9; LeNet achieves a 0.7% test error rate on MNIST.
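A rough sketch of the LeNet-style layout in PyTorch. This is an approximation for illustration, not the exact 1998 network (activation functions and several original details are omitted):

```python
import torch
import torch.nn as nn

# LeNet-style layout: conv -> average pool -> conv -> average pool,
# then "stretch" (flatten) and classify with FC layers
lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5),   # 32x32x1  -> 28x28x6
    nn.AvgPool2d(2),                  # 28x28x6  -> 14x14x6  (average pooling)
    nn.Conv2d(6, 16, kernel_size=5),  # 14x14x6  -> 10x10x16
    nn.AvgPool2d(2),                  # 10x10x16 -> 5x5x16
    nn.Flatten(),                     # stretch to a 400-dim vector
    nn.Linear(400, 120),
    nn.Linear(120, 84),
    nn.Linear(84, 10),                # 10 classes: the digits 0-9
)
print(lenet(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10])
```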

History of ConvNets AlexNet – 2012 Trained on ImageNet (1000 fine-grained classes). Achieved a 16.4% top-5 test error rate on ImageNet, down from the previous 26% test error rate. This is the year when people started taking notice.

History of ConvNets NiN – 2013 Introduced 1×1 convolutions: the convolution acts across channels, so we learn how to agglomerate the channels of the previous layer in the next layer. In a sense, we have a fully-connected layer that is learned to produce an output from the previous channels. 8.8% test error rate on CIFAR-10.
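A short sketch of the 1×1 idea (the channel counts here are made up for illustration): at every spatial position, the layer takes a learned linear combination of the input channels, like a per-pixel FC layer.

```python
import torch
import torch.nn as nn

# A 1x1 convolution: mixes the 64 input channels into 16 learned
# combinations at each spatial location, leaving height/width alone
mix = nn.Conv2d(in_channels=64, out_channels=16, kernel_size=1)

x = torch.randn(1, 64, 14, 14)
print(mix(x).shape)  # torch.Size([1, 16, 14, 14]): spatial size unchanged
```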

History of ConvNets Inception Network – 2015 From Google; the official name is GoogLeNet. We have 1×1, 3×3, and 5×5 filters, so why are we constrained to picking only one and going with it? Instead, we do all of them and concatenate the outputs in “inception modules”. We also inject additional supervision into earlier layers, because deep nets are hard to train. Why is it called Inception? (“Network in Network”, or “we need to go deeper”.) 6.7% test error rate on ImageNet: from 16% down to 6% in just 3 years, and from 26% down to 6% in just 4.
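A simplified inception-module sketch: the real GoogLeNet module also adds 1×1 bottleneck convolutions and a pooling branch, and the channel counts below are invented for illustration.

```python
import torch
import torch.nn as nn

class NaiveInception(nn.Module):
    """Run several filter sizes in parallel and concatenate the
    outputs along the channel dimension (padding keeps the spatial
    sizes equal so concatenation is possible)."""
    def __init__(self, c_in):
        super().__init__()
        self.b1 = nn.Conv2d(c_in, 16, kernel_size=1)
        self.b3 = nn.Conv2d(c_in, 16, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(c_in, 16, kernel_size=5, padding=2)

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)

x = torch.randn(1, 32, 28, 28)
print(NaiveInception(32)(x).shape)  # torch.Size([1, 48, 28, 28])
```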

Why Do They Work So Well? This is the neural network’s “receptive field”: the region of the input it is able to see. Each layer sees a local region, thinks and learns, and then passes its result on. The network exploits spatial dependence: it learns from local regions instead of looking at the whole image at once, and builds local features into a hierarchical understanding.
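A quick way to see how stacking grows the receptive field, assuming stride-1 convolutions with no pooling (each k×k layer grows the field by k − 1):

```python
# Receptive field of stacked stride-1 convolutions with no pooling
def receptive_field(kernel_sizes):
    rf = 1
    for k in kernel_sizes:
        rf += k - 1  # each kxk layer lets a unit see k-1 more pixels
    return rf

print(receptive_field([3, 3]))         # 5: two 3x3 convs see a 5x5 region
print(receptive_field([5, 15, 8, 7]))  # 32: the earlier stack sees the whole 32x32 input
```

So by the last layer of the network stacked earlier, every output unit has seen the entire image, even though each individual filter is local.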

Great Applications of ConvNets Fine-grained recognition (“Staffordshire Bull Terrier”), segmentation, art generation, and facial recognition (“Ranjay Krishna”). Segmentation uses deconvolutions. Facial recognition is different from classification: we instead want to embed the image into a low-dimensional space, similar to eigenfaces, so that faces don’t all need to look super similar. We learn the function that does this!
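A hypothetical, untrained toy to show what “embed into a low-dimensional space” means in code; real systems train a network like this with metric-learning objectives so that embeddings of the same person land close together.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy embedding function f(image) -> R^128 (architecture invented here)
embed = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=5), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 128),
)

a = torch.randn(1, 3, 64, 64)  # two toy face crops
b = torch.randn(1, 3, 64, 64)
print(F.cosine_similarity(embed(a), embed(b)))  # high for the same person, once trained
```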

What is CNN Dev? Define the objective: what is the input/output? What is the loss/objective function? Create the architecture: how many conv layers? What size are the convolutions? How many fully-connected layers? Define hyperparameters: what is the learning rate? Train and evaluate: how did we do? How can we do better?

Can this be automated?

Neural Architecture Search Automatically finds the best architecture for a given task. It uses reinforcement learning with an RNN: the RNN is asked to produce the best hyperparameters, and the resulting accuracy gives the reward signal for the policy network. Before, we had a fixed classifier (k-NN, linear, SVM) and searched for the best featurizer; now we find the best classifier and the best featurizer in tandem (and learn the best way to formulate all of this at once)!
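The real method trains an RNN controller with policy gradients, using accuracy as the reward; the runnable toy below swaps in random search and fake data purely to show the shape of the search loop (every name and number here is invented for illustration).

```python
import random
import torch
import torch.nn as nn

# Build a tiny "child" network from sampled architecture choices
def build_network(kernel_size, channels):
    return nn.Sequential(
        nn.Conv2d(3, channels, kernel_size, padding=kernel_size // 2),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(channels, 10),
    )

def reward(net):
    # Hypothetical stand-in for "train the child network, then
    # measure validation accuracy" (random data, so meaningless values)
    x = torch.randn(16, 3, 32, 32)
    y = torch.randint(0, 10, (16,))
    return (net(x).argmax(dim=1) == y).float().mean().item()

best_arch, best_reward = None, -1.0
for _ in range(10):  # the controller would sample these instead
    arch = {"kernel_size": random.choice([1, 3, 5]),
            "channels": random.choice([8, 16, 32])}
    r = reward(build_network(**arch))
    if r > best_reward:
        best_arch, best_reward = arch, r
print(best_arch, best_reward)
```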

In summary… We can use convolutions as a basis to build powerful visual systems. We can leverage deep learning to automatically learn the best ways to do previously difficult tasks in computer vision. There are still lots of open questions: don’t think we’ve solved computer vision! If you’re interested in machine learning and/or deep learning, take: Machine Learning (CS 229), Deep Learning (CS 230), NLP with Deep Learning (CS 224n), Convolutional Neural Networks (CS 231n).