Problems with CNNs and recent innovations 2/13/19


CIS 700-004, Lecture 5W: Problems with CNNs and recent innovations (2/13/19)

Today's Agenda Good inductive biases The capsule nets architecture The dynamic routing algorithm Capsule nets in PyTorch Resnets

Motivating better architectures

1. Local translational invariance is bad

The Picasso problem: the object is more than the sum of its parts. A CNN can detect all the right parts (eyes, nose, mouth) yet still accept them in a scrambled arrangement, because it checks for the presence of parts rather than their spatial relationships. (Silicon Valley reference: recognizing food from many angles.)

Equivariance rather than invariance We want equivariance: properties that change predictably under transformation. (Globally, at the level of the final classification, we still want invariance!)

2. Human perception

There are many aspects of pose. Pose: a vector of spatially equivariant properties: translation, rotation, scale, reflection. In today's context, pose also includes non-spatial features such as color and illumination.

3. Objects and their parts

Inverse graphics: spatiotemporal continuity Hinton's motivation: https://youtu.be/rTawFwUvnLE?t=1265

4. Routing: reusing knowledge.

What we want: intelligent routing We would like the forward pass to be dynamic. Lower-level neurons should be able to predict higher-level neurons (a bit).

What max pooling does instead Dynamically routes … only the loudest activation in a region. This ensures that information about exact location is erased.
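To make this concrete, here is a minimal PyTorch sketch (my own illustration, not from the slides): two feature maps whose only activation sits in different corners pool to identical outputs, so the exact location is lost.

```python
import torch
import torch.nn.functional as F

a = torch.zeros(1, 1, 4, 4)
b = torch.zeros(1, 1, 4, 4)
a[0, 0, 0, 0] = 1.0   # loudest activation in the top-left corner
b[0, 0, 3, 3] = 1.0   # loudest activation in the bottom-right corner

# Pooling over the whole 4x4 region keeps only the loudest value, not where it was.
print(F.max_pool2d(a, kernel_size=4))   # tensor([[[[1.]]]])
print(F.max_pool2d(b, kernel_size=4))   # tensor([[[[1.]]]])
```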

“The pooling operation used in convolutional neural networks is a big mistake and the fact that it works so well is a disaster.” -- Geoffrey Hinton

Given all these opportunities to improve CNNs, what might we hope for from a superior architecture?

Our wishlist for a new architecture
- Awesome priors
  - Translational equivariance
  - Hierarchical composition: the world is made up of objects that have properties.
  - Inverse graphics: objects move linearly in space (translation) and rotate.
- Information is properly routed to the appropriate neurons.
  - Routes by "agreement" rather than by "volume."
- Interpretable
  - Clear representation of learned features
  - Visualization of internal representation
- Learns with very few examples (fewer than 5 per class?)
- Outperforms CNNs in accuracy
- Runs blazingly fast

What capsule nets give us
- Awesome priors
  - Translational equivariance
  - Hierarchical composition: the world is made up of objects that have properties.
  - Inverse graphics: objects move linearly in space (translation) and rotate.
- Information is properly routed to the appropriate neurons.
  - Routes by "agreement" rather than by "volume."
- Interpretable
  - Clear representation of learned features
  - Visualization of internal representation
- Learns with very few examples (fewer than 5 per class?)
- Outperforms CNNs in accuracy
- Runs blazingly fast

Geoffrey Hinton English-Canadian cognitive psychologist and computer scientist. Popularized backpropagation; the "Godfather of Deep Learning." Co-invented Boltzmann machines and contributed to AlexNet. Advised Yann LeCun, Ilya Sutskever, Radford Neal, and Brendan Frey. Creator of capsule nets.

The architecture of capsule nets

What are capsules?

What are capsules? Capsules generalize the concept of neurons. Neurons map from a vector of scalars to a single scalar output. Capsules map from a vector of vectors to a vector output.

What are capsules? Capsules generalize the concept of neurons. Neurons map from a vector of scalars to a single scalar output. Capsules map from a vector of vectors to a vector output. A capsule semantically represents a feature: the length of its output vector is the probability that the feature is present in the input, and the direction of its output vector encodes the properties of the feature.
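A shape-only sketch of this contrast (my own illustration, with made-up dimensions):

```python
import torch

# A neuron: a vector of input scalars -> a single output scalar.
x = torch.randn(8)                        # 8 scalar inputs
w, b = torch.randn(8), torch.randn(())
neuron_out = torch.relu(w @ x + b)        # shape (), one scalar

# A capsule: a vector of input vectors -> a vector output.
u = torch.randn(8, 4)                     # 8 input vectors of dimension 4 (lower-level capsule outputs)
W = torch.randn(8, 16, 4)                 # one learned transform per input capsule
u_hat = torch.einsum('iod,id->io', W, u)  # each input's prediction for the output, shape (8, 16)
capsule_out = u_hat.mean(dim=0)           # combined into one 16-dim output (routing decides the real weights)
```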

Anatomy of a capsule f for faces (a diagram built up over several slides). Each lower-level feature (Feature 1 … Feature n, e.g. the nose feature) sends its output into the face capsule. The input from the nose feature (a feature vector of "nosiness") is multiplied by a learned affine transform, a static weight that relates the typical nose location to the face location, giving the nose feature's estimate for what the face should be like. These per-feature estimates are then weighted by prior probabilities, dynamic weights giving the estimated probability that noses relate to faces, and combined into an intermediate output.

Our nonlinearity σ: the squash function. What would we like to see from this nonlinearity?

Our nonlinearity σ: the squash function. Recall: each capsule's output semantically represents a feature. The vector length is the probability that the feature is present in the input; the vector direction encodes the properties of the feature. We would therefore like to bound the length of the output to [0, 1] while preserving its direction. The squash function does this: squash(s) = (||s||² / (1 + ||s||²)) · (s / ||s||), where the first factor is the scaling factor (a length in [0, 1)) and the second is the unit vector in the direction of s.
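A minimal PyTorch sketch of this squash function, assuming the formulation from the Sabour, Frosst and Hinton paper:

```python
import torch

def squash(s, dim=-1, eps=1e-8):
    squared_norm = (s ** 2).sum(dim=dim, keepdim=True)
    scale = squared_norm / (1.0 + squared_norm)     # scaling factor, in [0, 1)
    unit = s / torch.sqrt(squared_norm + eps)       # unit vector, preserves direction
    return scale * unit

v = squash(torch.randn(32, 10, 16))   # e.g. a batch of 32 inputs, 10 capsules, 16-dim outputs
print(v.norm(dim=-1).max())           # every output length is below 1
```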

Anatomy of a capsule f for faces (continued). The intermediate output is passed through the squash nonlinearity to give the capsule's output for iteration 1.

Goal of routing by agreement

Routing between capsules (v1) Clusters are a powerful signal in high dimensions. How might we detect clusters in the forward pass?

Routing between capsules (v1) Hinton's visualization https://youtu.be/rTawFwUvnLE?t=3106

Routing between capsules (v2): dynamic routing

Anatomy of a capsule f for faces, with dynamic routing. After each iteration, the prior probabilities are replaced by posterior probabilities: the dynamic weights are updated according to how well each feature's prediction agrees with the capsule's current output, and a new output is computed (output for iteration 2, iteration 3, and so on). After r iterations, the result is the final face feature.
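Putting the pieces together, a hedged sketch of the routing-by-agreement loop (following the algorithm in the Sabour, Frosst and Hinton paper; the tensor shapes and helper names here are my own choices):

```python
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    n2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (n2 / (1.0 + n2)) * s / torch.sqrt(n2 + eps)

def dynamic_routing(u_hat, num_iterations=3):
    # u_hat: (batch, num_in_capsules, num_out_capsules, out_dim) prediction vectors,
    # i.e. each lower-level capsule's estimate of each higher-level capsule's output.
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)   # routing logits, initially uniform
    for _ in range(num_iterations):
        c = F.softmax(b, dim=2)                              # dynamic weights (coupling coefficients)
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)             # weighted sum over lower-level predictions
        v = squash(s)                                        # squashed capsule outputs
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)         # reward predictions that agree with the output
    return v

# Hypothetical sizes: 1152 primary capsules routing to 10 digit capsules of dimension 16.
v = dynamic_routing(torch.randn(2, 1152, 10, 16))
print(v.shape)   # torch.Size([2, 10, 16])
```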

The overall capsule net architecture (dynamic routing; cf. Olshausen and Van Essen on dynamic routing of information).

Margin loss
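A sketch of the margin loss used to train capsule nets (following the formulation in the Sabour, Frosst and Hinton paper; m+, m-, and lambda below are the paper's defaults):

```python
import torch

def margin_loss(v, targets, m_plus=0.9, m_minus=0.1, lam=0.5):
    # v: (batch, num_classes, capsule_dim) class-capsule outputs
    # targets: (batch, num_classes) one-hot labels
    lengths = v.norm(dim=-1)                                        # ||v_k||, the predicted class probabilities
    present = targets * torch.clamp(m_plus - lengths, min=0) ** 2   # penalize short vectors for the true class
    absent = lam * (1 - targets) * torch.clamp(lengths - m_minus, min=0) ** 2  # penalize long vectors elsewhere
    return (present + absent).sum(dim=-1).mean()
```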

Reconstruction: visualizing the architecture's encoding
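A rough sketch of the reconstruction idea: the output of the correct class capsule is masked out and fed to a small fully connected decoder (layer sizes follow the paper's MNIST setup; everything else here is my own illustration):

```python
import torch
import torch.nn as nn

decoder = nn.Sequential(
    nn.Linear(10 * 16, 512), nn.ReLU(),
    nn.Linear(512, 1024), nn.ReLU(),
    nn.Linear(1024, 28 * 28), nn.Sigmoid(),   # pixel intensities in [0, 1]
)

def reconstruct(v, targets):
    # v: (batch, 10, 16) class-capsule outputs; targets: (batch, 10) one-hot labels
    masked = v * targets.unsqueeze(-1)         # zero out every capsule except the correct class's
    return decoder(masked.flatten(start_dim=1)).view(-1, 1, 28, 28)
```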

Interpretation

Interpreting a mistake Ordered triples are (true label, prediction, reconstructed capsule).

Results

Capsule networks are state-of-the-art.
- MNIST: 0.25% error (current record). Baseline CNN: 35.4 million parameters; capsule net: 6.8 million parameters. Capsule nets can also get 1.75% error using only 25 labeled examples.
- MultiMNIST: 5.2% error (current record)
- CIFAR10: 10.6% error
- smallNORB: 2.7% error (current record, tied with LeCun et al.)
- affNIST: 79% accuracy (compare to a CNN with 66% accuracy)

Capsule nets in PyTorch https://github.com/gram-ai/capsule-networks/blob/master/capsule_network.py

What capsule nets give us
- Awesome priors
  - Translational equivariance
  - Hierarchical composition: the world is made up of objects that have properties.
  - Inverse graphics: objects move linearly in space (translation) and rotate.
- Information is properly routed to the appropriate neurons.
  - Routes by "agreement" rather than by "volume."
- Interpretable
  - Clear representation of learned features
  - Visualization of internal representation
- Learns with very few examples (fewer than 5 per class?)
- Outperforms CNNs in accuracy
- Runs blazingly fast

Takeaways from capsule nets
- Thinking very carefully about your priors and biases can inform good architecture choices and lead to very good results.
- Interpretability is credibility for neural nets. This is probably the gold standard.
- Geoffrey Hinton is a badass.

A different problem: depth

Is depth good?
- Deeper networks can express more functions.
- Biased towards learning the functions we want.
- But hard to train, e.g. exploding / vanishing gradients.
- Deep learning => deeper nets, harder computations.
- Can we have very deep networks that are easy to train?

ResNets Residual networks (ResNets): skip connections between non-consecutive layers
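A minimal sketch of a residual block (my own illustration; batch norm and the original downsampling details are omitted for brevity):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        residual = self.conv2(self.relu(self.conv1(x)))
        return self.relu(x + residual)   # skip connection: output is input plus learned residual

x = torch.randn(1, 64, 32, 32)
print(ResidualBlock(64)(x).shape)        # torch.Size([1, 64, 32, 32])
```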

DenseNets If skip connections are a good thing, why don’t we do ALL of them?
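A rough sketch of the DenseNet idea, in which each layer receives the concatenation of all previous feature maps (sizes are made up; the real DenseNet also uses batch norm and transition layers):

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_channels, growth_rate, num_layers):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Conv2d(in_channels + i * growth_rate, growth_rate, kernel_size=3, padding=1)
            for i in range(num_layers)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = torch.relu(layer(torch.cat(features, dim=1)))  # "skip" from every earlier layer
            features.append(out)
        return torch.cat(features, dim=1)

x = torch.randn(1, 16, 32, 32)
print(DenseBlock(16, growth_rate=12, num_layers=4)(x).shape)     # torch.Size([1, 64, 32, 32])
```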

Question Are there functions that can be computed by a ResNet but not by a normal deep net? No! ResNets represent an inductive bias rather than greater expressive power.

Results

ResNets act like ensembles of shallow nets Veit et al. (2016)

Deleting layers doesn’t kill performance Veit et al. (2016)

Loss landscapes Li et al. (2018)