CSCI 5922 Neural Networks and Deep Learning: Convolutional Nets For Image And Speech Processing Mike Mozer Department of Computer Science and Institute.

Presentation transcript:

CSCI 5922 Neural Networks and Deep Learning: Convolutional Nets For Image And Speech Processing. Mike Mozer, Department of Computer Science and Institute of Cognitive Science, University of Colorado at Boulder.

Recognizing Handprinted Digits: 2 vs. 3. How many hidden units are needed?

Recognizing An Image: The input is a 5x5 pixel array, processed by a simple backpropagation net. [Figure labels: output, hidden.]

Recognizing An Object With Unknown Location: The object can appear in either the left image or the right image. The output indicates the presence of the object regardless of position. What do we know about the weights? [Figure labels: output, hidden, hidden.]

Generalizing To Many Locations: Each possible location the object can appear in has its own set of hidden units. Each set detects the same features, except in a different location. Locations can overlap. [Figure labels: output, hidden, hidden, hidden.]

Convolutional Neural Net: Each patch of the image is processed by a different set of hidden units, but the mapping from patch to hidden units is the same everywhere. This achieves translation-invariant recognition. The net can be convolutional in 2D as well as 1D. [Figure labels: output, hidden, hidden, hidden.]
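As a minimal sketch of the weight sharing described on this slide, the 1D example below applies one set of weights to every patch of a signal. The function name `conv1d`, the weights, and the two test signals are illustrative assumptions, not taken from the lecture. The response moves with the pattern, and the strongest response has the same value wherever the pattern appears.

```python
def conv1d(signal, weights, bias=0.0):
    """Apply the same weights to every length-M patch of the signal."""
    m = len(weights)
    return [
        sum(w * x for w, x in zip(weights, signal[i:i + m])) + bias
        for i in range(len(signal) - m + 1)
    ]

# Hypothetical feature detector that responds most strongly to the pattern [1, 0, 1].
weights = [1.0, -1.0, 1.0]

left = [1, 0, 1, 0, 0, 0, 0]   # pattern at the left edge
right = [0, 0, 0, 0, 1, 0, 1]  # same pattern shifted to the right edge

left_map = conv1d(left, weights)
right_map = conv1d(right, weights)

# The peak response is identical; only its position differs.
print(max(left_map), left_map.index(max(left_map)))
print(max(right_map), right_map.index(max(right_map)))
```

Because the weights are shared, the detector has to be learned only once, no matter how many positions it is applied to.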

The Input Layer: The input layer typically represents the pixels present at a given (x, y) location of a particular color (R, G, B). It forms a 3D lattice: height × width × number of channels.

The Hidden Layer: Each hidden unit (i) is replicated across the (x, y) positions of every patch. Instead of drawing pools of hidden units, draw one (x, y) map for each hidden unit type (i). Hidden unit i at (x, y) gets input from a cube of activity from its corresponding input patch. Each hidden unit in a map has the same incoming weights. [Figure labels: output, hidden, hidden, hidden.]
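The "cube of activity" idea can be sketched as a plain-Python 2D convolution over a multi-channel input. Everything here is an illustrative assumption: the function name `conv2d`, the index order `image[y][x][c]` and `kernels[i][dy][dx][c]`, and the toy sizes. Each hidden unit type i gets one map, and every unit in that map uses the same weights.

```python
def conv2d(image, kernels, biases):
    """image[y][x][c]; kernels[i][dy][dx][c]; returns hidden[i][y][x]."""
    height, width = len(image), len(image[0])
    m = len(kernels[0])  # patches are M x M (x number of channels)
    maps = []
    for kernel, bias in zip(kernels, biases):
        fmap = []
        for y in range(height - m + 1):
            row = []
            for x in range(width - m + 1):
                total = bias
                # Sum over the M x M x C cube of input under this hidden unit.
                for dy in range(m):
                    for dx in range(m):
                        for c, w in enumerate(kernel[dy][dx]):
                            total += w * image[y + dy][x + dx][c]
                row.append(total)
            fmap.append(row)
        maps.append(fmap)
    return maps

# Toy 4 x 4 RGB input of all ones and two hypothetical 2 x 2 x 3 kernels.
image = [[[1.0] * 3 for _ in range(4)] for _ in range(4)]
kernels = [
    [[[1.0] * 3 for _ in range(2)] for _ in range(2)],
    [[[0.5] * 3 for _ in range(2)] for _ in range(2)],
]
hidden = conv2d(image, kernels, biases=[0.0, 0.0])
print(len(hidden), len(hidden[0]), len(hidden[0][0]))  # feature types, map height, map width
```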

Jargon: Each hidden unit channel is also called a map, feature, feature type, or dimension. The weights for each channel are also called kernels. The input patch to a hidden unit at (x, y) is also called its receptive field.

Convolution: The input has dimensions $X \times Y$ and is processed in $M \times M$ patches, with one hidden pool per complete patch in the image. The hidden dimensions are therefore $(X-M+1) \times (Y-M+1)$; with zero padding, they are $X \times Y$. If pools are offset by $S$ pixels in both the horizontal and vertical directions (versus $S=1$), where $S$ is the stride, the hidden dimensions are $\lceil (X-M+1)/S \rceil \times \lceil (Y-M+1)/S \rceil$.
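The size arithmetic on this slide can be checked with a short sketch. The helper names `hidden_size` and `count_patches` are hypothetical, and rounding up when the stride does not divide evenly is an assumption about how the slide's fraction is meant to be read. The second helper counts patch positions directly, as a cross-check on the formula.

```python
import math

def hidden_size(x, m, stride=1):
    """Hidden units along one axis: (X - M + 1) / S, rounded up."""
    return math.ceil((x - m + 1) / stride)

def count_patches(x, m, stride=1):
    """Count the start positions of complete M-wide patches directly."""
    return len(range(0, x - m + 1, stride))

print(hidden_size(5, 3))            # 5-pixel axis, 3-pixel patches, stride 1
print(hidden_size(7, 3, stride=2))  # stride 2

# Zero padding by (M - 1) / 2 on each side (odd M) restores the input size.
x, m = 32, 5
pad = (m - 1) // 2
print(hidden_size(x + 2 * pad, m))
```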

Pooling: If all the hidden units are detecting the same features, and we simply want to determine whether the object appeared in any location, we can combine the hidden representations. Sum pooling vs. max pooling. [Figure labels: output, sum of hidden, hidden, hidden, hidden.]
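A minimal sketch of sum and max pooling over one feature map follows. The function name `pool2d`, its parameters (window size `z`, spacing `s`), and the toy map are assumptions made for illustration. Pooling over the whole map answers "did the feature appear anywhere?", while smaller windows keep a coarse record of where.

```python
def pool2d(fmap, z, s, combine=max):
    """Combine each z x z window of fmap, with windows spaced s apart."""
    rows, cols = len(fmap), len(fmap[0])
    return [
        [
            combine(fmap[r + dr][c + dc] for dr in range(z) for dc in range(z))
            for c in range(0, cols - z + 1, s)
        ]
        for r in range(0, rows - z + 1, s)
    ]

# Toy 4 x 4 map of hidden activities (made-up values).
fmap = [
    [0, 1, 0, 0],
    [0, 3, 0, 0],
    [0, 0, 0, 2],
    [0, 0, 1, 0],
]

max_pooled = pool2d(fmap, z=2, s=2, combine=max)
sum_pooled = pool2d(fmap, z=2, s=2, combine=sum)
anywhere = pool2d(fmap, z=4, s=4, combine=max)  # one window covering the whole map
print(max_pooled, sum_pooled, anywhere)
```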

Transformation types: Each layer in a convolutional net has a 3D lattice structure: width × height × feature type. There are three types of transformations between layers: convolution, activation function (nonlinearity), and sum or max pooling. A full-blown convolutional net performs these transformations repeatedly, giving a deep net, with higher-order feature detectors after convolution and lower spatial resolution after pooling.
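The repeated convolution, nonlinearity, pooling cycle can be sketched in 1D with a single feature type per layer. This is a simplification, since real layers carry many feature types, and all names, weights, and signal values below are made up for illustration. The point is only that the spatial resolution shrinks at every stage.

```python
def conv1d(x, w):
    """Weighted sum over every length-M patch, with shared weights."""
    m = len(w)
    return [sum(wi * xi for wi, xi in zip(w, x[i:i + m])) for i in range(len(x) - m + 1)]

def relu(x):
    """Activation function (nonlinearity)."""
    return [max(0.0, v) for v in x]

def max_pool1d(x, z=2):
    """Non-overlapping max pooling over windows of width z."""
    return [max(x[i:i + z]) for i in range(0, len(x) - z + 1, z)]

def stage(x, w):
    """One convolution -> nonlinearity -> pooling stage."""
    return max_pool1d(relu(conv1d(x, w)))

signal = [0.0, 1.0, 0.0, 2.0, 1.0, 0.0, 3.0, 0.0, 1.0, 2.0, 0.0, 1.0, 0.0, 0.0]
w1 = [1.0, -1.0, 1.0]  # hypothetical first-layer kernel
w2 = [0.5, 0.5]        # hypothetical second-layer kernel

h1 = stage(signal, w1)
h2 = stage(h1, w2)
print(len(signal), len(h1), len(h2))  # resolution drops at each stage
```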

Putting It All Together. [Figure; source: benanne.github.io]

Architecture Of Primate Visual System: Visual hierarchy. Transformation from simple, low-order features to complex, high-order features. Transformation from position-specific features to position-invariant features. (Source: neuronresearch.net/vision) An example of a domain-appropriate bias, conv nets for vision: a hierarchy of visual layers that goes from simple to complex and from position-specific to position-invariant.

Domain-Appropriate Bias Built Into Convolutional Net: Spatial locality: features at nearby locations in an image are most likely to have joint causes and consequences. Spatial position homogeneity: features deemed significant in one region of an image are likely to be significant in others. Spatial scale homogeneity: locality and position homogeneity should apply across a range of spatial scales. (Sources: neuronresearch.net/vision, benanne.github.io) [END OF SLIDE] I love the Goodfellow, Bengio, & Courville text, but it says nothing about domain-appropriate bias and how to incorporate the knowledge you have into a model. I love tf/theano/etc., but like all simulation environments, they make it really easy to pick defaults and use generic domain-independent methods. ML is really about crafting models to the domain, not tossing unknown data into a generic architecture. Deep learning seems to mask this issue.

Videos And Demos: LeCun’s work: Yann’s early convolutional nets, LeNet-5. Karpathy demos: Javascript convolutional net demo.

ImageNet: > 15M high-resolution images in over 22K categories, labeled by Mechanical Turk workers. E.g., fungi.

ImageNet Large-Scale Visual Recognition Challenge (ILSVRC), 2010-2017: Multiple challenges: classification, classification and localization, segmentation. 2010 Classification Challenge: 1.2M training images, 1000 categories (general and specific), 200K test images. Systems output a list of 5 object categories in descending order of confidence. Two error rates are reported: top-1 and top-5.
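The two error rates can be sketched as follows: an image counts as correct at top-k if the true label is among the first k entries of the ranked list. The function name `top_k_error` and the four toy predictions are made up for illustration and are not ILSVRC data.

```python
def top_k_error(ranked_predictions, true_labels, k):
    """Fraction of images whose true label is not among the first k guesses."""
    misses = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth not in ranked[:k]
    )
    return misses / len(true_labels)

# Each row is a list of 5 categories in descending order of confidence (toy data).
predictions = [
    ["cat", "dog", "fox", "cow", "pig"],
    ["dog", "cat", "fox", "cow", "pig"],
    ["dog", "fox", "cow", "pig", "hen"],
    ["fox", "dog", "cat", "cow", "pig"],
]
truths = ["cat", "cat", "cat", "fox"]

top1 = top_k_error(predictions, truths, k=1)
top5 = top_k_error(predictions, truths, k=5)
print(top1, top5)
```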

AlexNet (Krizhevsky, Sutskever, & Hinton, 2012): Architecture: 5 convolutional layers, split across two GPUs; 2 fully connected layers; 1000-way softmax output layer. Trained with SGD for ~1 week. 650K neurons, 60M parameters, 630M connections.

AlexNet (Krizhevsky, Sutskever, & Hinton, 2012): Downsampled images: shorter dimension scaled to 256 pixels, longer dimension cropped about the center to 256 pixels. R, G, B channels. Mean subtraction from inputs. Data set augmentation (we’ll discuss in a subsequent class) includes variations in intensity, color, translation, and mirror reflection. 10 test images per input: patches anchored at the upper left, upper right, lower left, lower right, and center of the image (in the paper, each of the five patches is used together with its mirror reflection, giving 10).
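The test-time cropping can be sketched on a nested-list image. The function name `ten_crops` and the tiny 4 x 4 stand-in image are assumptions for illustration. In the paper the patches are 224 x 224 crops of the 256 x 256 image, and the network's predictions on the ten patches are averaged.

```python
def ten_crops(image, crop):
    """Four corner patches and the center patch, each with its mirror reflection."""
    h, w = len(image), len(image[0])
    origins = [
        (0, 0),                            # upper left
        (0, w - crop),                     # upper right
        (h - crop, 0),                     # lower left
        (h - crop, w - crop),              # lower right
        ((h - crop) // 2, (w - crop) // 2),  # center
    ]
    crops = []
    for top, left in origins:
        patch = [row[left:left + crop] for row in image[top:top + crop]]
        crops.append(patch)
        crops.append([row[::-1] for row in patch])  # horizontal mirror
    return crops

# Tiny stand-in image whose pixel values encode their position.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
crops = ten_crops(image, crop=2)
print(len(crops))
print(crops[0], crops[1])  # upper-left patch and its mirror
```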

Key Ideas: ReLU instead of logistic or tanh units. Training on multiple GPUs: cross talk only in certain layers, to balance speed vs. connectivity. Normalize the output of the ReLU: the output in map $i$ at $(x,y)$ is normalized based on the activity of features in adjacent maps at $(x,y)$. Overlapping pooling: pooling units spaced $s$ pixels apart, summing over a $z \times z$ neighborhood, with $s < z$. In the normalization, $a^i_{x,y}$ is the ReLU output, $b^i_{x,y}$ is the normalized output, and $k$, $n$, $\alpha$, $\beta$ are chosen by cross validation.
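The normalization formula itself did not survive in this transcript. The sketch below follows the form given in the Krizhevsky et al. paper, $b^i_{x,y} = a^i_{x,y} / \big(k + \alpha \sum_j (a^j_{x,y})^2\big)^{\beta}$, where $j$ runs over the $n$ maps adjacent to map $i$ (clipped at the first and last map). The default values are the ones reported in the paper ($k=2$, $n=5$, $\alpha=10^{-4}$, $\beta=0.75$); the function name and the list-based layout are assumptions.

```python
def local_response_norm(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Normalize ReLU outputs across adjacent feature maps at one (x, y).

    a: list of ReLU outputs, one per feature map, all at the same position.
    """
    num_maps = len(a)
    half = n // 2
    b = []
    for i in range(num_maps):
        lo = max(0, i - half)
        hi = min(num_maps - 1, i + half)
        # Summed squared activity of the neighboring maps (including map i).
        energy = sum(a[j] ** 2 for j in range(lo, hi + 1))
        b.append(a[i] / (k + alpha * energy) ** beta)
    return b

quiet = local_response_norm([1.0, 0.0])[0]
busy = local_response_norm([1.0, 100.0])[0]
print(quiet, busy)  # strong activity in an adjacent map suppresses the output
```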

Results: [Tables: 2010 competition data set; 2012 competition data set.]

Since 2012: [Figure 1: The top-5 error rate in the ImageNet Large Scale Visual Recognition Challenge has been rapidly reducing since the introduction of deep neural networks in 2012. Figure credit: devblogs.nvidia.com] 2015: Residual networks (coming soon).

Time-Delay Neural Networks: One-dimensional convolution (Waibel et al., 1990; Peddinti, Povey, & Khudanpur, 2015).

Using Convolutional Nets For Segmentation (Long, Shelhamer, & Darrell, 2015): Determine what objects are where. “What” depends on global information; “where” depends on local information. [Figure: the last image is the ground truth (GT).]

Fully Convolutional Networks: The higher layers of typical convolutional nets are nonspatial. If every layer remains spatial, then the output can specify where as well as what (albeit coarsely).

Upsampling: How do we connect the coarse output to dense pixels? Naïve approach: interpolation. Deconvolution approach: if you want to increase resolution by a factor f, use convolution with a fractional input stride of 1/f.
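A 1D sketch of convolution with fractional input stride 1/f: spread the coarse values f positions apart, fill the gaps with zeros, and convolve. The function name `upsample1d` and the triangular kernel are assumptions. With this fixed kernel and f = 2 the result is plain linear interpolation, which shows that the naïve approach is a special case; in Long et al. the upsampling filters are learned rather than fixed.

```python
def upsample1d(x, f, kernel):
    """Upsample x by a factor f using convolution with input stride 1/f."""
    # Place the inputs f positions apart, with zeros in between.
    stretched = []
    for v in x:
        stretched.append(v)
        stretched.extend([0.0] * (f - 1))
    stretched = stretched[: len(stretched) - (f - 1)]  # drop the trailing zeros
    # Zero-pad so the output has the same length as the stretched input.
    pad = len(kernel) // 2
    padded = [0.0] * pad + stretched + [0.0] * pad
    # The kernel here is symmetric, so correlation and convolution coincide.
    return [
        sum(k * p for k, p in zip(kernel, padded[i:i + len(kernel)]))
        for i in range(len(stretched))
    ]

coarse = [1.0, 3.0, 5.0]
dense = upsample1d(coarse, f=2, kernel=[0.5, 1.0, 0.5])
print(dense)
```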

Architecture: Note that pool 5 reflects a single pixel in the coarse heatmap.

Results: 20% relative improvement over the state of the art (SDS) … and runs faster.