CSC 578 Neural Networks and Deep Learning

CSC 578 Neural Networks and Deep Learning Fall 2018/19 10. Capsule (Overview) Noriko Tomuro

Introduction to Capsules A Capsule Network (CapsNet) is an approach proposed by Geoffrey Hinton (although his original idea dates back to the 1990s). CapsNets are intended to overcome a weakness of CNNs, in particular MaxPooling. Downsampling by MaxPooling is effective for reducing the size of feature maps and for finding the important features present in the image.

Identified features are location invariant, and that is one of the strengths of MaxPooling. However, the identified features become independent of one another: the spatial relationships between them are lost. As a result, high-level features (composed of low-level features) are not robust to changes in pose (translation and rotation).
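The information loss described above can be seen in a few lines of numpy. This is a minimal sketch (the `maxpool2` helper and the toy feature maps are illustrative, not from the lecture): two maps with the same feature at different positions inside a pooling window produce identical pooled outputs, so the position is gone.

```python
import numpy as np

def maxpool2(x):
    # 2x2 max pooling with stride 2 over a 2D feature map.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.zeros((4, 4)); a[0, 0] = 1.0  # feature detected in the window's corner
b = np.zeros((4, 4)); b[1, 1] = 1.0  # same feature, shifted within the window

print(np.array_equal(maxpool2(a), maxpool2(b)))  # -> True: the shift is invisible
```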

Solution: Capsule Networks Hinton himself stated that the fact that max pooling works so well is a big mistake and a disaster: "The pooling operation used in convolutional neural networks is a big mistake and the fact that it works so well is a disaster." Hinton took inspiration from a field that had already solved this problem: 3D computer graphics. In 3D graphics, a pose matrix is a standard technique for representing the relationships between objects; poses are essentially matrices encoding translation plus rotation. Capsules also more closely mimic the human visual system, which creates a tree-like hierarchical structure for each focal point in order to recognize objects.

Part-Whole Hierarchies Any object is made of parts, which may themselves be viewed as objects. The parts have instantiation parameters, all the way down the parse tree. An object is defined not only by the set of parts that compose it, but also by the relationships among their instantiation parameters. [Parse-tree diagram: human -> arm, torso, leg; arm -> wrist, thumb, index finger]

https://www.slideshare.net/charlesmartin141/capsule-networks-84754653

https://www.slideshare.net/aureliengeron/introduction-to-capsule-networks-capsnets


https://twitter.com/KirkDBorne

Step 1: Input images. MNIST dataset. Step 2: Convolution. Apply 256 9x9 filters to the 28x28 input and obtain 256 20x20 feature maps (after ReLU). https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc

Step 3a: Primary Caps. Apply a 9x9x256 filter with stride 2. Each filter yields a 6x6 map (since floor((20 - 9)/2) + 1 = 6). Doing this 256 times gives a stack of 256 6x6 maps. https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc

We cut the stack into 32 decks with 8 cards per deck. Each deck is a "capsule layer." Each capsule layer has 36 "capsules" (one per 6x6 location), and each capsule holds an array of 8 values: a "vector." https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc
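The deck-cutting above is just a reshape. A minimal numpy sketch (array names and the random placeholder values are illustrative):

```python
import numpy as np

# Hypothetical PrimaryCaps output for one image: a stack of 256 6x6 maps,
# matching the shapes in the walkthrough above.
stack = np.random.randn(6, 6, 256)

# Cut the 256 maps into 32 decks of 8: each spatial location within a deck
# is one capsule, i.e. an 8-dimensional vector.
capsules = stack.reshape(6, 6, 32, 8)

print(capsules.shape)  # (6, 6, 32, 8)
print(32 * 6 * 6)      # 1152 capsules in total
```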

These "capsules" are our new pixels. A capsule stores 8 values (not just 1) per location, which lets us record more than just whether or not a shape was found at that spot. We can store the details needed to describe the shape: type of shape, position, rotation, color, and size. These are called "instantiation parameters." With more complex images we will end up needing more details; they can include pose (position, size, orientation), deformation, velocity, albedo, hue, texture, and so on.


Learning in Capsule Networks How, then, do we coax the network into actually learning these things? When training a traditional CNN, we only care about whether or not the model predicts the right classification. With a capsule network, we also have something called a "reconstruction": we take the vector the network produced and try to recreate the original input image from that vector alone. We then grade the model on how closely the reconstruction matches the original image.

https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc

Step 3b: Squashing. Apply the squashing function. This function rescales each capsule vector so that only its length changes, not its direction. The length is squashed into the range [0, 1) so it can be interpreted as a probability. https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc
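The squashing function can be sketched in numpy as follows (the small `eps` term is an assumption added here for numerical stability; it is not part of the slide):

```python
import numpy as np

def squash(v, eps=1e-8):
    # Rescale v: direction is preserved, length is mapped into [0, 1).
    # Short vectors shrink toward 0; long vectors approach length 1.
    sq = np.sum(v ** 2, axis=-1, keepdims=True)
    return (sq / (1.0 + sq)) * (v / np.sqrt(sq + eps))

v = np.array([3.0, 4.0])                   # length 5
s = squash(v)
print(round(float(np.linalg.norm(s)), 4))  # 0.9615, i.e. 25/26: close to 1
```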

https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc

This is what the lengths of the capsule vectors look like after squashing. At this point it is almost impossible to guess what each capsule is looking for. Keep in mind that each "pixel" is actually a vector of length 8. https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc

Step 4: Routing by Agreement. This step decides what information to send to the next level. Each capsule tries to predict the next layer's activations based on its own output: https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc
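The routing step can be sketched as the dynamic routing-by-agreement of the original CapsNet paper (a toy numpy version with random prediction vectors; the EM-based variant mentioned on the later slides works differently):

```python
import numpy as np

def squash(v, eps=1e-8):
    sq = np.sum(v ** 2, axis=-1, keepdims=True)
    return (sq / (1.0 + sq)) * (v / np.sqrt(sq + eps))

def route(u_hat, iters=3):
    # u_hat: prediction vectors from lower capsules, shape (n_lower, n_upper, dim).
    n_lower, n_upper, _ = u_hat.shape
    b = np.zeros((n_lower, n_upper))                          # routing logits
    for _ in range(iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # coupling coefficients
        s = np.einsum('ij,ijd->jd', c, u_hat)                 # weighted sum per upper capsule
        v = squash(s)                                         # upper-capsule outputs
        b = b + np.einsum('ijd,jd->ij', u_hat, v)             # agreement raises the logits
    return v

rng = np.random.default_rng(0)
u_hat = rng.standard_normal((1152, 10, 16)) * 0.1  # 1152 primary capsules, 10 digits
v = route(u_hat)
print(v.shape)  # (10, 16)
```

Predictions that agree with an upper capsule's output get their routing logits increased, so over the iterations information is routed toward the capsules that "agree."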

Mapping From Capsules in One Layer to the Next Michael Mozer, http://www.cs.colorado.edu/~mozer/Teaching/syllabi/DeepLearningFall2017/

Capsule Coupling Michael Mozer, http://www.cs.colorado.edu/~mozer/Teaching/syllabi/DeepLearningFall2017/

Capsule Coupling/Agreement Michael Mozer, http://www.cs.colorado.edu/~mozer/Teaching/syllabi/DeepLearningFall2017/

Routing Algorithm: probabilities of coupling capsules between layers. The algorithm is essentially an EM clustering algorithm. Michael Mozer, http://www.cs.colorado.edu/~mozer/Teaching/syllabi/DeepLearningFall2017/

Step 5: DigitCaps. After agreement, we end up with ten 16-dimensional vectors, one for each digit. This matrix is our final prediction: the length of each vector is the confidence that the corresponding digit is present; the longer, the better. https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc
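Reading off the prediction from DigitCaps is then just an argmax over vector lengths. A toy numpy sketch with made-up capsule values:

```python
import numpy as np

# Hypothetical DigitCaps output: ten 16-dimensional vectors, one per digit.
digit_caps = np.zeros((10, 16))
digit_caps[3, :4] = 0.45   # digit 3's capsule is the longest (length 0.9)
digit_caps[7, 0] = 0.20    # a weak response for digit 7

lengths = np.linalg.norm(digit_caps, axis=-1)  # confidence per digit
print(int(np.argmax(lengths)))  # -> 3
```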

Step 6: Reconstruction. The final activity vector is used to generate a reconstruction of the input image via a decoder consisting of 3 fully connected layers. The reconstruction loss minimizes the sum of squared differences between the outputs of the logistic units and the pixel intensities. https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc

https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc

The network is trained discriminatively, using iterative routing-by-agreement, by minimizing a margin loss on the digit capsules plus a reconstruction loss: the Euclidean distance between the image and the output of the decoder that reconstructs the input from the terminal capsules. https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc
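The two loss terms can be sketched in numpy, using the margin-loss constants from the CapsNet paper (m+ = 0.9, m- = 0.1, lambda = 0.5); the function names here are illustrative:

```python
import numpy as np

def margin_loss(lengths, target, m_plus=0.9, m_minus=0.1, lam=0.5):
    # The correct digit's capsule should be long (> m_plus);
    # every other capsule should be short (< m_minus).
    t = np.zeros_like(lengths)
    t[target] = 1.0
    present = t * np.maximum(0.0, m_plus - lengths) ** 2
    absent = lam * (1.0 - t) * np.maximum(0.0, lengths - m_minus) ** 2
    return float(np.sum(present + absent))

def reconstruction_loss(recon, image):
    # Sum of squared differences between decoder output and pixel intensities.
    return float(np.sum((recon - image) ** 2))

# A confident, correct prediction incurs zero margin loss:
print(margin_loss(np.array([0.05] * 9 + [0.95]), target=9))  # -> 0.0
```

In the paper the reconstruction loss is scaled down heavily (by 0.0005) before being added to the margin loss, so that it does not dominate training.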

Capsules: the Future of ANNs? The idea is widely regarded as appealing, but it has not been tested on other large datasets. The first results seem promising, but so far only a few datasets have been tried, and the implemented systems are very slow to train. [Quora] "At this moment in time it is not possible to say whether capsule networks are the future for neural AI. Other experiments besides image classification will need to be conducted to prove that the technique is robust for all the other kinds of learning that involve aspects of perception besides the visual one. ... Lots more work has to be done on the structure of these learning architectures."