Lecture: Deep Convolutional Neural Networks


Lecture: Deep Convolutional Neural Networks Shubhang Desai Stanford Vision and Learning Lab

Today’s agenda Deep convolutional networks, history of CNNs, CNN development, architecture search

Previously… Input Image → Conv Block → 32×32×10 Classification Output → argmax_c(pred) → Prediction ŷ. The Prediction ŷ and the Input Label y feed the Cross-Entropy Loss Function, giving the Loss Value: L = CE(ŷ, y). We 1) minimize this loss value… 2) by modifying the feature extractor and classifier… 3) using gradient descent!

One question you might be having: why only one convolution?
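That whole picture fits in a few lines of code. Below is a minimal PyTorch sketch of it; the tiny linear model, random batch, and learning rate are placeholder assumptions, not the course's actual setup:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the diagram's pieces (assumed, not the real model)
model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 10))
images = torch.randn(8, 3, 32, 32)     # a toy batch of 8 input images
labels = torch.randint(0, 10, (8,))    # toy input labels y

loss_fn = nn.CrossEntropyLoss()        # the CE loss function
opt = torch.optim.SGD(model.parameters(), lr=0.01)

scores = model(images)                 # classification output, 8x10
loss = loss_fn(scores, labels)         # 1) the loss value L we minimize
opt.zero_grad()
loss.backward()                        # gradients of L w.r.t. the weights
opt.step()                             # 2) modify the weights 3) by gradient descent
pred = scores.argmax(dim=1)            # argmax over classes c -> prediction ŷ
```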

Convolutions Convolutions = insights. More convolutions = more insights?

Recall Hubel and Wiesel… The thing has edges… The edges can be grouped into triangles and ovals… The triangles are ears, the oval is a body… It’s a mouse toy! This is a hierarchical approach to how the visual system sees things.

Can we make a computer do the same thing? Notice that we already know we can hand-design a Sobel filter (a very simple filter) to do the low-level tasks.
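To make the Sobel remark concrete, here is a small PyTorch sketch (the 8×8 toy image is an assumption made for illustration): the classic hand-designed 3×3 kernel fires exactly where the image intensity changes.

```python
import torch
import torch.nn.functional as F

# The horizontal Sobel kernel: a hand-designed 3x3 filter that
# responds to vertical edges (left-to-right intensity changes)
sobel_x = torch.tensor([[-1., 0., 1.],
                        [-2., 0., 2.],
                        [-1., 0., 1.]]).view(1, 1, 3, 3)

img = torch.zeros(1, 1, 8, 8)
img[..., 4:] = 1.0                         # toy image: dark left half, bright right half
edges = F.conv2d(img, sobel_x, padding=1)  # convolve the filter over the image
print(edges[0, 0, 4, :7])                  # interior of row 4: nonzero only at the boundary
```

No learning is involved here: the filter was designed by hand, which is exactly the point of the slide.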

Convolutions Across Channels A 28×28×3 image convolved with a 15×15×3 filter gives a 14×14×1 output.

Why would we want more? With a stack of four such filters (15×15×3×4), the same 28×28×3 image gives a 14×14×4 output. More output channels = more filters = more features we can learn!

For simplicity, we draw the whole 15×15×3×4 filter stack as a single “Conv Block”.
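A quick PyTorch sketch of that Conv Block, using the slide's exact shapes (note PyTorch puts channels before height and width):

```python
import torch
import torch.nn as nn

# A 28x28x3 image in PyTorch's (batch, channels, height, width) layout
x = torch.randn(1, 3, 28, 28)

# Four filters, each 15x15x3: the 15x15x3x4 "Conv Block" from the slide
conv = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=15)

print(conv(x).shape)  # torch.Size([1, 4, 14, 14]), since 28 - 15 + 1 = 14
```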

Stacking Convolutions 5×5×3×4 Conv Block → 15×15×4×6 Conv Block → 8×8×6×8 Conv Block → 7×7×8×10 Conv Block. The 32×32×3 input turns into a 28×28×4 output, then 14×14×6, then 7×7×8, then 1×1×10: the spatial dimensions shrink while the channels grow, giving more “features” to learn. The first few layers give edges, then shapes, then concepts, then the classification. This is a CONVOLUTIONAL NEURAL NETWORK!
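The slide's four blocks, chained directly in PyTorch to verify the shapes (a real network would also put nonlinearities between the layers, which the slide omits):

```python
import torch
import torch.nn as nn

# The four Conv Blocks from the slide, in order of application
net = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=5),    # 32x32x3 -> 28x28x4
    nn.Conv2d(4, 6, kernel_size=15),   # 28x28x4 -> 14x14x6
    nn.Conv2d(6, 8, kernel_size=8),    # 14x14x6 -> 7x7x8
    nn.Conv2d(8, 10, kernel_size=7),   # 7x7x8   -> 1x1x10
)

x = torch.randn(1, 3, 32, 32)
print(net(x).shape)  # torch.Size([1, 10, 1, 1]): one score per class
```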

Convolutional Neural Networks (ConvNets) Neural networks that stack multiple convolutional layers to produce an output. They oftentimes end in fully-connected layers as the “classifier”: the conv layers are the featurizer, the FC layers are the classifier.

History of ConvNets LeNet – 1998 Built at NYU in Yann LeCun’s group. It uses average pooling, and after the convolutions the feature map is stretched into a vector and fed to FC layers. MNIST is the digits 0–9; LeNet achieves a 0.7% test error rate on MNIST.
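A rough sketch of the LeNet-style layout in PyTorch. This is an approximation for illustration, not the exact 1998 network (activation functions and several original details are omitted):

```python
import torch
import torch.nn as nn

# LeNet-style layout: conv -> average pool -> conv -> average pool,
# then "stretch" (flatten) and classify with FC layers
lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5),   # 32x32x1  -> 28x28x6
    nn.AvgPool2d(2),                  # 28x28x6  -> 14x14x6  (average pooling)
    nn.Conv2d(6, 16, kernel_size=5),  # 14x14x6  -> 10x10x16
    nn.AvgPool2d(2),                  # 10x10x16 -> 5x5x16
    nn.Flatten(),                     # stretch to a 400-dim vector
    nn.Linear(400, 120),
    nn.Linear(120, 84),
    nn.Linear(84, 10),                # 10 classes: the digits 0-9
)
print(lenet(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10])
```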

History of ConvNets AlexNet – 2012 Trained on ImageNet (1000 fine-grained classes). Achieved a 16.4% top-5 test error rate on ImageNet, down from the previous 26% test error rate. This is the year when people started taking notice.

History of ConvNets NiN – 2013 Introduced 1×1 convolutions: the convolution acts across channels, so we learn how to agglomerate the channels of the previous layer in the next layer. In a sense, we have a fully-connected layer that is learned to produce an output from the previous channels. 8.8% test error rate on CIFAR-10.
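A short sketch of the 1×1 idea (the channel counts here are made up for illustration): at every spatial position, the layer takes a learned linear combination of the input channels, like a per-pixel FC layer.

```python
import torch
import torch.nn as nn

# A 1x1 convolution: mixes the 64 input channels into 16 learned
# combinations at each spatial location, leaving height/width alone
mix = nn.Conv2d(in_channels=64, out_channels=16, kernel_size=1)

x = torch.randn(1, 64, 14, 14)
print(mix(x).shape)  # torch.Size([1, 16, 14, 14]): spatial size unchanged
```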

History of ConvNets Inception Network – 2015 From Google; the official name is GoogLeNet. We have 1×1, 3×3, and 5×5 filters, so why are we constrained to picking only one and going with it? Instead, we do all of them and concatenate the outputs in “inception modules”. We also inject additional supervision into earlier layers, because deep nets are hard to train. Why is it called Inception? (“Network in Network”, or “we need to go deeper”.) 6.7% test error rate on ImageNet: from 16% down to 6% in just 3 years, and from 26% down to 6% in just 4.
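A simplified inception-module sketch: the real GoogLeNet module also adds 1×1 bottleneck convolutions and a pooling branch, and the channel counts below are invented for illustration.

```python
import torch
import torch.nn as nn

class NaiveInception(nn.Module):
    """Run several filter sizes in parallel and concatenate the
    outputs along the channel dimension (padding keeps the spatial
    sizes equal so concatenation is possible)."""
    def __init__(self, c_in):
        super().__init__()
        self.b1 = nn.Conv2d(c_in, 16, kernel_size=1)
        self.b3 = nn.Conv2d(c_in, 16, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(c_in, 16, kernel_size=5, padding=2)

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)

x = torch.randn(1, 32, 28, 28)
print(NaiveInception(32)(x).shape)  # torch.Size([1, 48, 28, 28])
```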

Why Do They Work So Well? This is the neural network’s “receptive field”: the region of the input it is able to see. Each layer sees a local region, thinks and learns, and then passes its result on. The network exploits spatial dependence: it learns from local regions instead of looking at the whole image at once, and builds local features into a hierarchical understanding.
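A quick way to see how stacking grows the receptive field, assuming stride-1 convolutions with no pooling (each k×k layer grows the field by k − 1):

```python
# Receptive field of stacked stride-1 convolutions with no pooling
def receptive_field(kernel_sizes):
    rf = 1
    for k in kernel_sizes:
        rf += k - 1  # each kxk layer lets a unit see k-1 more pixels
    return rf

print(receptive_field([3, 3]))         # 5: two 3x3 convs see a 5x5 region
print(receptive_field([5, 15, 8, 7]))  # 32: the earlier stack sees the whole 32x32 input
```

So by the last layer of the network stacked earlier, every output unit has seen the entire image, even though each individual filter is local.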

Great Applications of ConvNets Fine-grained recognition (“Staffordshire Bull Terrier”), segmentation, art generation, and facial recognition (“Ranjay Krishna”). Segmentation uses deconvolutions. Facial recognition is different from classification: we instead want to embed the image into a low-dimensional space, similar to eigenfaces, so that faces don’t all need to look super similar. We learn the function that does this!
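A hypothetical, untrained toy to show what “embed into a low-dimensional space” means in code; real systems train a network like this with metric-learning objectives so that embeddings of the same person land close together.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy embedding function f(image) -> R^128 (architecture invented here)
embed = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=5), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 128),
)

a = torch.randn(1, 3, 64, 64)  # two toy face crops
b = torch.randn(1, 3, 64, 64)
print(F.cosine_similarity(embed(a), embed(b)))  # high for the same person, once trained
```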

What is CNN Dev? Define the objective: what is the input/output? What is the loss/objective function? Create the architecture: how many conv layers? What size are the convolutions? How many fully-connected layers? Define hyperparameters: what is the learning rate? Train and evaluate: how did we do? How can we do better?

Can this be automated?

Neural Architecture Search Automatically finds the best architecture for a given task. It uses reinforcement learning with an RNN: the RNN is asked to produce the best hyperparameters, and the resulting accuracy gives the reward signal for the policy network. Before, we had a fixed classifier (k-NN, linear, SVM) and searched for the best featurizer; now we find the best classifier and the best featurizer in tandem (and learn the best way to formulate all of this at once)!
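The real method trains an RNN controller with policy gradients, using accuracy as the reward; the runnable toy below swaps in random search and fake data purely to show the shape of the search loop (every name and number here is invented for illustration).

```python
import random
import torch
import torch.nn as nn

# Build a tiny "child" network from sampled architecture choices
def build_network(kernel_size, channels):
    return nn.Sequential(
        nn.Conv2d(3, channels, kernel_size, padding=kernel_size // 2),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(channels, 10),
    )

def reward(net):
    # Hypothetical stand-in for "train the child network, then
    # measure validation accuracy" (random data, so meaningless values)
    x = torch.randn(16, 3, 32, 32)
    y = torch.randint(0, 10, (16,))
    return (net(x).argmax(dim=1) == y).float().mean().item()

best_arch, best_reward = None, -1.0
for _ in range(10):  # the controller would sample these instead
    arch = {"kernel_size": random.choice([1, 3, 5]),
            "channels": random.choice([8, 16, 32])}
    r = reward(build_network(**arch))
    if r > best_reward:
        best_arch, best_reward = arch, r
print(best_arch, best_reward)
```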

In summary… We can use convolutions as a basis to build powerful visual systems. We can leverage deep learning to automatically learn the best ways to do previously difficult tasks in computer vision. There are still lots of open questions: don’t think we’ve solved computer vision! If you’re interested in machine learning and/or deep learning, take: Machine Learning (CS 229), Deep Learning (CS 230), NLP with Deep Learning (CS 224n), Convolutional Neural Networks (CS 231n).