Lecture: Deep Convolutional Neural Networks


1 Lecture: Deep Convolutional Neural Networks
Shubhang Desai, Stanford Vision and Learning Lab

2 Today’s agenda
Deep convolutional networks, History of CNNs, CNN dev, Architecture search

3 Previously…
Input Image → Conv Block (Feature Extractor) → 32x32x10 Classification Output → argmax_c (Classifier) → Prediction ŷ; the prediction and the Input Label y go into the CE Loss Function to give the Loss Value L

4 Previously…
Same pipeline, now annotated: 1) Minimize this… (the Loss Value L)

5 Previously…
Same pipeline: 1) Minimize this… (the Loss Value L) 2) By modifying this… (the Conv Block’s weights)

6 Previously…
Same pipeline: 1) Minimize this… 2) By modifying this… 3) Using gradient descent! One question you might be having…

7 Previously…
Why only one convolution? (Same pipeline as before: a single Conv Block as the feature extractor, followed by the classifier, the CE loss function, and gradient descent.)
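A minimal sketch (assuming PyTorch) of the recap above: a conv block as the feature extractor, a cross-entropy loss, and a gradient descent step. The 32×32×3 input and 10 classes mirror the slides; the exact layer shapes and learning rate are illustrative, not the lecture's actual code.

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 10, kernel_size=32),   # one conv block: 32×32×3 image -> 1×1×10 scores
    nn.Flatten(),                       # 10 class scores; argmax over them is the prediction ŷ
)
loss_fn = nn.CrossEntropyLoss()                            # 1) minimize this (CE -> loss L)...
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)   # 3) ...using gradient descent

def train_step(images, labels):         # images: (N, 3, 32, 32), labels: (N,)
    scores = model(images)
    loss = loss_fn(scores, labels)      # loss value L from prediction vs. input label y
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                    # 2) ...by modifying the conv block's weights
    return loss.item()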

8 Convolutions
Convolutions = insights. More convolutions = more insights?

9 Recall Hubel and Wiesel…

10 Recall Hubel and Wiesel…
The thing has edges… The edges can be grouped into triangles and ovals… The triangles are ears, the oval is a body… It’s a mouse toy! A hierarchical approach to how it sees things.

11 Recall Hubel and Wiesel…
The thing has edges… The edges can be grouped into triangles and ovals… The triangles are ears, the oval is a body… It’s a mouse toy! Can we make a computer do the same thing? Note that we already know how to build a Sobel filter (a very simple filter) for low-level tasks like edge detection.

12 Convolutions Across Channels
28×28×3 Image * 15×15×3 Filter → 14×14×1 Output

13 Convolutions Across Channels
Why would we want this? 28×28×3 Image * 15×15×3×4 Filter → 14×14×4 Output

14 Convolutions Across Channels
28×28×3 Image * 15×15×3×4 Filter → 14×14×4 Output. More output channels = more filters = more features we can learn!
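A minimal sketch (assuming PyTorch) of the shapes on this slide: a 28×28×3 image convolved with a 15×15×3×4 filter bank gives a 14×14×4 output, one output channel per learned filter. The random input is just for illustration.

import torch
import torch.nn as nn

image = torch.randn(1, 3, 28, 28)                   # N×C×H×W: one 28×28 RGB image
conv = nn.Conv2d(in_channels=3, out_channels=4,     # 4 filters, each 15×15×3
                 kernel_size=15, bias=False)
out = conv(image)
print(out.shape)                                    # torch.Size([1, 4, 14, 14])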

15 Convolutions Across Channels
For simplicity, we draw the 15×15×3×4 filter bank as a single “Conv Block”.

16 Stacking Convolutions
32×32×3 Input → 5×5×3×4 Conv Block → 28×28×4 → 15×15×4×6 Conv Block → 14×14×6 → 8×8×6×8 Conv Block → 7×7×8 → 7×7×8×10 Conv Block → 1×1×10 Output
The input shrinks in its spatial dimensions while the channels grow to give more “features” to learn. The first few layers give edges, then shapes, then concepts, then the classification.

17 Stacking Convolutions
The same stack (32×32×3 Input → 5×5×3×4 → 15×15×4×6 → 8×8×6×8 → 7×7×8×10 Conv Blocks → 1×1×10 Output) is a CONVOLUTIONAL NEURAL NETWORK! Spatial dimensions shrink while channels grow to give more “features” to learn; the first few layers give edges, then shapes, then concepts, then the classification.
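A minimal sketch (assuming PyTorch) of the stack above. Each Conv2d stands in for one “Conv Block”, and the shapes follow the slide: 32×32×3 → 28×28×4 → 14×14×6 → 7×7×8 → 1×1×10. Leaving out nonlinearities and pooling is a simplification for illustration.

import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=5),    # 32×32×3 -> 28×28×4
    nn.Conv2d(4, 6, kernel_size=15),   # 28×28×4 -> 14×14×6
    nn.Conv2d(6, 8, kernel_size=8),    # 14×14×6 -> 7×7×8
    nn.Conv2d(8, 10, kernel_size=7),   # 7×7×8   -> 1×1×10 (one score per class)
)
x = torch.randn(1, 3, 32, 32)
print(cnn(x).shape)                    # torch.Size([1, 10, 1, 1])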

18 Convolutional Neural Networks (ConvNets)
Neural networks that stack multiple convolutional layers to produce an output. They often end in fully-connected (FC) layers as the “classifier”: the conv layers are the featurizer, the FC layers are the classifier.

19 History of ConvNets: LeNet – 1998
Built in Yann LeCun’s group. Uses average pooling; after the conv layers the feature map is stretched (flattened) and fed into FC layers. MNIST is the digits 0–9; LeNet reaches roughly 0.7% test error rate on MNIST.

20 History of ConvNets: AlexNet – 2012
16.4% top-5 test error rate on ImageNet (1000 fine-grained classes), down from a 26% error rate. 2012 is the year people started taking notice.

21 History of ConvNets: NiN – 2013
Introduces 1×1 convolutions: the convolution is applied channel-wise at each spatial position, so the next layer learns how to agglomerate the channels of the previous layer. In a sense, it is a learned fully-connected layer that produces an output from the previous channels. 8.8% test error rate on CIFAR-10.
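A minimal sketch (assuming PyTorch) of the 1×1 convolution idea: at every spatial position it mixes the input channels with a small learned fully-connected map, leaving height and width unchanged. The channel counts here are illustrative.

import torch
import torch.nn as nn

x = torch.randn(1, 6, 14, 14)                 # 14×14 feature map with 6 channels
one_by_one = nn.Conv2d(6, 3, kernel_size=1)   # learn to combine 6 channels into 3, per pixel
print(one_by_one(x).shape)                    # torch.Size([1, 3, 14, 14])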

22 History of ConvNets: Inception Network – 2015
Built at Google; the official name is GoogLeNet. We have 1×1, 3×3, and 5×5 convolutions, so why are we constrained to picking only one and going for it? Inception modules do everything and then concatenate the outputs. Additional supervision is injected into earlier layers because deep nets are hard to train. Why is it called Inception? A nod to “network in network” and “we need to go deeper.” 6.7% test error rate on ImageNet (16% down to 6% in just 3 years, 26% down to 6% in just 4).
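A minimal sketch (assuming PyTorch) of the idea behind an Inception module: run 1×1, 3×3, and 5×5 convolutions in parallel on the same input and concatenate their outputs along the channel dimension. The channel counts are made up for illustration, and the real GoogLeNet modules also include pooling branches and 1×1 bottleneck convolutions.

import torch
import torch.nn as nn

class TinyInception(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 8, kernel_size=1)
        self.b3 = nn.Conv2d(in_ch, 8, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, 8, kernel_size=5, padding=2)

    def forward(self, x):
        # every branch keeps the spatial size, so the channels can be concatenated
        return torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)

x = torch.randn(1, 16, 28, 28)
print(TinyInception(16)(x).shape)   # torch.Size([1, 24, 28, 28])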

23 Why Do They Work So Well?

24 Why Do They Work So Well?

25 Why Do They Work So Well?

26 Why Do They Work So Well?

27 Why Do They Work So Well?
This is the neural network’s “receptive field”: the region of the input it’s able to see. Each unit sees its local region, thinks and learns, and then passes the result on. The network exploits spatial dependence by learning over local regions instead of looking at the whole image at once, and it builds these local features into a hierarchical understanding.
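A small sketch of how the receptive field grows through the four-layer stack from the earlier slides (kernel sizes 5, 15, 8, 7, all assumed stride 1). With stride 1, each layer adds (kernel size − 1) pixels to the region of the input a unit can “see”.

def receptive_fields(kernel_sizes):
    r = 1
    fields = []
    for k in kernel_sizes:
        r += k - 1          # stride-1 convolution widens the receptive field by k - 1
        fields.append(r)
    return fields

print(receptive_fields([5, 15, 8, 7]))   # [5, 19, 26, 32]: the final unit sees the whole 32×32 image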

28 Great Applications of ConvNets
Fine-Grained Recognition (e.g. “Staffordshire Bull Terrier”), Segmentation, Art Generation, Facial Recognition (e.g. “Ranjay Krishna”). Segmentation uses deconvolutions. Facial recognition is different from classification: we instead want to embed the image into a low-dimensional space, similar to eigenfaces, except that here we learn the embedding function, and the faces don’t all need to look super similar.

29 What is CNN Dev?
Define the objective: What is the input/output? What is the loss/objective function?
Create the architecture: How many conv layers? What size are the convolutions? How many fully-connected layers?
Define hyperparameters: What is the learning rate?
Train and evaluate: How did we do? How can we do better?

30 What is CNN Dev?
The same four steps: define the objective, create the architecture, define hyperparameters, train and evaluate. Can this be automated?
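A minimal sketch (assuming PyTorch) of the manual CNN-dev loop described in the two slides above. The specific architecture, learning rate, and the train_loader/val_loader data loaders are illustrative assumptions, not the lecture’s actual setup.

import torch
import torch.nn as nn

def build_model():
    # 2) create the architecture: how many conv layers, what sizes, how many FC layers?
    return nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(16 * 16 * 16, 10),    # assumes 32×32×3 inputs and 10 classes
    )

def train_and_evaluate(train_loader, val_loader, lr=1e-2):
    model = build_model()
    loss_fn = nn.CrossEntropyLoss()                          # 1) define the objective
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)   # 3) define hyperparameters
    for images, labels in train_loader:                      # 4) train...
        loss = loss_fn(model(images), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # ...and evaluate: if accuracy is poor, change the architecture or
    # hyperparameters above and try again
    correct = total = 0
    for images, labels in val_loader:
        correct += (model(images).argmax(dim=1) == labels).sum().item()
        total += labels.numel()
    return correct / total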

31 Neural Architecture Search
Automatically finds the best architecture for a given task. Use RL and an RNN to learn the best way to do a task: the RNN controller is asked to produce the architecture and hyperparameters, and the resulting accuracy gives the reward signal for the policy network. Before, we had a fixed classifier (kNN, linear, SVM) and searched for the best features; now we find the best classifier and the best featurizer all at once, and NAS asks what the best way is to formulate all of this at once.
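A highly simplified sketch of the NAS outer loop. In the method described above the controller is an RNN trained with reinforcement learning, using each sampled network’s validation accuracy as the reward; here the controller is replaced by random sampling just to show the loop’s structure, and SEARCH_SPACE and build_and_train are hypothetical placeholders.

import random

SEARCH_SPACE = {
    "num_conv_layers": [2, 3, 4],
    "kernel_size": [3, 5, 7],
    "num_filters": [16, 32, 64],
    "learning_rate": [1e-1, 1e-2, 1e-3],
}

def sample_architecture():
    # the "controller": propose one point in the search space
    return {name: random.choice(choices) for name, choices in SEARCH_SPACE.items()}

def build_and_train(arch):
    # hypothetical placeholder: construct a CNN from `arch`, train it,
    # and return its validation accuracy as the reward
    raise NotImplementedError

best_arch, best_reward = None, 0.0
for _ in range(20):
    arch = sample_architecture()
    reward = build_and_train(arch)
    if reward > best_reward:            # an RL controller would instead be updated so that
        best_arch, best_reward = arch, reward   # high-reward architectures become more likely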

32 In summary…
We can use convolutions as a basis to build powerful visual systems. We can leverage deep learning to automatically learn the best ways to do previously difficult tasks in computer vision. There are still lots of open questions; don’t think we’ve solved computer vision! If you’re interested in machine learning and/or deep learning, take: Machine Learning (CS 229), Deep Learning (CS 230), NLP with Deep Learning (CS 224n), Convolutional Neural Networks (CS 231n).

