Convolutional Neural Network

Slides:

Advertisements

Similar presentations

Rich feature Hierarchies for Accurate object detection and semantic segmentation Ross Girshick, Jeff Donahue, Trevor Darrell, Jitandra Malik (UC Berkeley)

Advertisements

Lecture 14 – Neural Networks

Generic object detection with deformable part-based models

Convolutional Neural Networks for Image Processing with Applications in Mobile Robotics By, Sruthi Moola.

Object detection, deep learning, and R-CNNs

Object Recognition Tutorial Beatrice van Eden - Part time PhD Student at the University of the Witwatersrand. - Fulltime employee of the Council for Scientific.

Convolutional Neural Network

Rich feature hierarchies for accurate object detection and semantic segmentation 2014 IEEE Conference on Computer Vision and Pattern Recognition Ross Girshick,

Deep Residual Learning for Image Recognition

Deep Learning Overview Sources: workshop-tutorial-final.pdf

Recent developments in object detection

Big data classification using neural network

The Relationship between Deep Learning and Brain Function

CS 6501: 3D Reconstruction and Understanding Convolutional Neural Networks Connelly Barnes.

Deep Learning Amin Sobhani.

Object detection with deformable part-based models

Data Mining, Neural Network and Genetic Programming

Data Mining, Neural Network and Genetic Programming

Data Mining, Neural Network and Genetic Programming

ECE 5424: Introduction to Machine Learning

DeepCount Mark Lenson.

Convolutional Neural Fabrics by Shreyas Saxena, Jakob Verbeek

Article Review Todd Hricik.

Applications of Deep Learning and how to get started with implementation of deep learning Presentation By : Manaswi Advisor : Dr.Chinmay.

Lecture 24: Convolutional neural networks

Classification with Perceptrons Reading:

ECE 6504 Deep Learning for Perception

Lecture 5 Smaller Network: CNN

Neural Networks 2 CS446 Machine Learning.

Training Techniques for Deep Neural Networks

Convolutional Networks

CS6890 Deep Learning Weizhen Cai

Machine Learning: The Connectionist

R-CNN region By Ilia Iofedov 11/11/2018 BGU, DNN course 2016.

Non-linear classifiers Neural networks

Dynamic Routing Using Inter Capsule Routing Protocol Between Capsules

By: Kevin Yu Ph.D. in Computer Engineering

Computer Vision James Hays

Introduction to Neural Networks

Image Classification.

Object Detection + Deep Learning

Smart Robots, Drones, IoT

network of simple neuron-like computing elements

8-3 RRAM Based Convolutional Neural Networks for High Accuracy Pattern Recognition and Online Learning Tasks Z. Dong, Z. Zhou, Z.F. Li, C. Liu, Y.N. Jiang,

Object Recognition Today we will move on to… April 12, 2018

CSC 578 Neural Networks and Deep Learning

Creating Data Representations

LECTURE 35: Introduction to EEG Processing

On Convolutional Neural Network

Lecture: Deep Convolutional Neural Networks

Outline Background Motivation Proposed Model Experimental Results

LECTURE 33: Alternative OPTIMIZERS

Analysis of Trained CNN (Receptive Field & Weights of Network)

RCNN, Fast-RCNN, Faster-RCNN

Explainable Machine Learning

Convolutional Neural Networks

Deep Learning Some slides are from Prof. Andrew Ng of Stanford.

CSC 578 Neural Networks and Deep Learning

Department of Computer Science Ben-Gurion University of the Negev

CS295: Modern Systems: Application Case Study Neural Network Accelerator Sang-Woo Jun Spring 2019 Many slides adapted from Hyoukjun Kwon‘s Gatech “Designing.

VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION

Learning and Memorization

Semantic Segmentation

Image recognition.

Object Detection Implementations

Debasis Bhattacharya, JD, DBA University of Hawaii Maui College

End-to-End Facial Alignment and Recognition

CSC 578 Neural Networks and Deep Learning

Presentation transcript:

Convolutional Neural Network Disclaimer: This PPT is modified based on Hung-yi Lee http://speech.ee.ntu.edu.tw/~tlkagk/courses_ML17.html Can the network be simplified by considering the properties of images?

Why CNN for Image Some patterns are much smaller than the whole image A neuron does not have to see the whole image to discover the pattern. Connecting to small region with less parameters Simples version of fully connected network “beak” detector

Why CNN for Image The same patterns appear in different regions. “upper-left beak” detector Do almost the same thing They can use the same set of parameters. Detect small pattern “middle beak” detector

Why CNN for Image Subsampling the pixels will not change the object bird bird subsampling We can subsample the pixels to make image smaller Less parameters for the network to process the image

Fully Connected Feedforward network The whole CNN cat dog …… Convolution Fully Connected Feedforward network Max Pooling Can repeat many times Convolution Max Pooling Flatten

The whole CNN Property 1 Some patterns are much smaller than the whole image Convolution Property 2 Max Pooling The same patterns appear in different regions. Can repeat many times Property 3 Convolution Subsampling the pixels will not change the object Max Pooling Flatten

Fully Connected Feedforward network The whole CNN cat dog …… Convolution Fully Connected Feedforward network Max Pooling Can repeat many times Convolution Max Pooling Flatten

CNN – Convolution …… Those are the network parameters to be learned. 1 -1 1 Filter 1 Matrix -1 1 Filter 2 Sliding window. window size? Multiple filters (similar to weights in DNN, need to learn through training) Different (filter) pattern will produce different patterns to be detected. Matrix …… 6 x 6 image Each filter detects a small pattern (3 x 3). Property 1

CNN – Convolution 1 -1 Filter 1 stride=1 1 3 -1 6 x 6 image 3 -1 Sliding window and filter  take inner product. Element products and then sum. Stride: how large to slide/move. 6 x 6 image

CNN – Convolution 1 -1 Filter 1 If stride=2 1 3 -3 3 -3 First row, done, then second row. Slide according to stride. We set stride=1 below 6 x 6 image

CNN – Convolution 1 -1 Filter 1 stride=1 1 3 -1 -3 -1 -3 1 -3 -3 -3 1 3 -1 -3 -1 -3 1 -3 This filter is to find diagonal with ones. All sliding windows with same patter will produce large outputs. Same pattern can be found in different regions. -3 -3 1 3 -2 -2 -1 6 x 6 image Property 2

CNN – Convolution Do the same process for every filter Feature Map -1 stride=1 Do the same process for every filter 1 3 -1 -3 -1 -1 -1 -1 -1 -3 1 -3 -1 -1 -2 1 Feature Map Feature map: can be taken as another lower dim of input images, with a 3*3 filter. 6*6  4*4. Original image is binary, 0/1. New image is with more different values, depending filter. -3 -3 1 -1 -1 -2 1 3 -2 -2 -1 6 x 6 image -1 -4 3 4 x 4 image

CNN – Colorful image 1 -1 -1 1 1 -1 -1 1 1 -1 -1 1 Filter 1 Filter 2 1 1 Color image filter is with size of 3*3*3 (red, blue, green).

Convolution v.s. Fully Connected 1 1 -1 -1 1 convolution image 1 Fully-connected …… ……

… … Less parameters! … 1: 1 -1 1 Filter 1 2: 3: 3 4: 1 7: 8: 1 9: 10: 3: 3 4: 1 … 7: 8: 1 9: 10: … First vectorized by row in whole input image: 6*6 13: 6 x 6 image 14: Less parameters! Only connect to 9 input, not fully connected 15: 1 16: 1 …

… … Less parameters! Even less parameters! … 1: 1 -1 1 2: Filter 1 3: 3: 3 4: 1 … 7: 8: 1 -1 9: 10: … Shared weights: same color. It is due to filters. Different input values. So different outputs. 13: 6 x 6 image 14: Less parameters! 15: 1 16: Shared weights Even less parameters! 1 …

Fully Connected Feedforward network The whole CNN cat dog …… Convolution Fully Connected Feedforward network Max Pooling Can repeat many times Convolution Max Pooling Flatten

CNN – Max Pooling 1 -1 -1 1 Filter 1 Filter 2 3 -1 -3 -1 -1 -1 -1 -1 -3 -1 -1 -2 1 Max-pooling: similar to Maxout. Grouping  insider groups, take max. Except Max-pooling, we can also use average-pooling. -3 -3 1 -1 -1 -2 1 3 -2 -2 -1 -1 -4 3

CNN – Max Pooling New image but smaller Conv Max Pooling Each filter 1 Conv 3 -1 1 Max Pooling 3 1 Filter smaller output image. 6*6  4*4. Max-pooling  even smaller output image. 4*4  2*2. 3 2 x 2 image 6 x 6 image Each filter is a channel

The whole CNN A new image Smaller than the original image 3 1 -1 Convolution Max Pooling Can repeat many times A new image Convolution Smaller than the original image Max Pooling The number of the channel is the number of filters

Fully Connected Feedforward network The whole CNN cat dog …… Convolution Fully Connected Feedforward network Max Pooling A new image Convolution Max Pooling A new image Flatten

Fully Connected Feedforward network 3 Flatten 1 3 1 -1 3 Fully Connected Feedforward network -1 Flatten Flatten: when output size is small enough from max-pooling outputs from different filters. First row-vectorized each max-pooling output from one filter. Then concatenate all such vectors, called flatten. Whole procedure: Choose filters; Consider sliding windows (called convolution) Consider max-pooling Flatten by vectorized and concatenate. 1 3

Conv2D in Keras keras.layers.Conv2D(filters, kernel_size, strides=(1, 1), padding='valid', data_format=None, dilation_rate=(1, 1), activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)

Only modified the network structure and input format (vector -> 3-D tensor) CNN in Keras input Convolution 1 -1 -1 1 There are 25 3x3 filters. …… Max Pooling Input_shape = ( 28 , 28 , 1) 28 x 28 pixels 1: black/white, 3: RGB Convolution 3 -1 3 Max Pooling -3 1

Only modified the network structure and input format (vector -> 3-D tensor) CNN in Keras input 1 x 28 x 28 Convolution How many parameters for each filter? 9 25 x 26 x 26 Max Pooling 25 x 13 x 13 Convolution How many parameters for each filter? First convolution: Input image size of 28*28. 25 filters. Filter size of 3*3. Output image size of 26*26. Max-pooling with grouping of 2*2. Output size of 13*13. Second convolution: Input image size of 13*13. 50 filters. Filter size of 3*3. Output image size of 11*11 Output size of 5*5. [discard extra pixels] Rule: smaller # of filters for layers closer to input layer; (simpler pattern) larger # of filters for layers closer to output layer; (complex pattern) 3*3=9 3*3*25=225 225 50 x 11 x 11 Max Pooling 50 x 5 x 5

Fully Connected Feedforward network Only modified the network structure and input format (vector -> 3-D tensor) CNN in Keras input 1 x 28 x 28 output Convolution Fully Connected Feedforward network 25 x 26 x 26 Max Pooling 25 x 13 x 13 Convolution 5*5*25=1250. MNist dataset: 10 digits with 10 outputs. 50 x 11 x 11 Max Pooling 1250 50 x 5 x 5 Flatten

First Convolution Layer Typical-looking filters on the trained first layer The color/grayscale features are clustered because the AlexNet contains two separate streams of processing, and an apparent consequence of this architecture is that one stream develops high-frequency grayscale features and the other low-frequency color features Typical-looking filters on the first CONV layer (left), and the 2nd CONV layer (right) of a trained AlexNet. Notice that the first-layer weights are very nice and smooth, indicating nicely converged network. Generally only take first layer out to take a deeper look at weights. Filter size: 11 x 11 (AlexNet) http://cs231n.github.io/understanding-cnn/

How about higher layers? Which images make a specific neuron activate From “rcnn” (a famous paper) Maximally activating images for some POOL5 (5th pool layer) neurons of an AlexNet. The activation values and the receptive field of the particular neuron are shown in white. (In particular, note that the POOL5 neurons are a function of a relatively large portion of the input image!) It can be seen that some neurons are responsive to upper bodies, text, or specular highlights. Higher level filter. White box: what to see in an input image # means activation rate??? One row one filter. Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation”, CVPR, 2014

What does CNN learn? 𝜕 𝑎 𝑘 𝜕 𝑥 𝑖𝑗 x 𝑎 𝑖𝑗 𝑘 𝜕 𝑎 𝑘 𝜕 𝑥 𝑖𝑗 The output of the k-th filter is a 11 x 11 matrix. x input 𝑎 𝑘 = 𝑖=1 11 𝑗=1 11 𝑎 𝑖𝑗 𝑘 Degree of the activation of the k-th filter: 25 3x3 filters Convolution 𝑥 ∗ =𝑎𝑟𝑔 max 𝑥 𝑎 𝑘 (gradient ascent) 11 Max Pooling 3 -1 -3 1 -2 …… 50 3x3 filters Convolution 𝑎 𝑖𝑗 𝑘 Use GD method by taking input as parameters. Fixed output, optimize input. This is different from loss function: with fixed input, optimize loss function. 11 50 x 11 x 11 Max Pooling

What does CNN learn? 𝜕 𝑎 𝑘 𝜕 𝑥 𝑖𝑗 𝜕 𝑎 𝑘 𝜕 𝑥 𝑖𝑗 The output of the k-th filter is a 11 x 11 matrix. input Degree of the activation of the k-th filter: 𝑎 𝑘 = 𝑖=1 11 𝑗=1 11 𝑎 𝑖𝑗 𝑘 25 3x3 filters Convolution 𝑥 ∗ =𝑎𝑟𝑔 max 𝑥 𝑎 𝑘 (gradient ascent) Max Pooling 50 3x3 filters Convolution Only display first 12 filter outputs, among 50 filters, Activate when each filter output/pattern is activated. 50 x 11 x 11 Max Pooling For each filter

What does CNN learn? 𝑎 𝑗 input Find an image maximizing the output of neuron: Convolution 𝑥 ∗ =𝑎𝑟𝑔 max 𝑥 𝑎 𝑗 Max Pooling Convolution Max Pooling flatten Analyze Fully-connected Network. First 9 neurous. Each figure corresponds to a neuron 𝑎 𝑗

What does CNN learn? 𝑦 𝑖 input 𝑥 ∗ =𝑎𝑟𝑔 max 𝑥 𝑦 𝑖 Can we see digits? Convolution Max Pooling 1 2 Convolution Max Pooling 3 4 5 flatten Inputted neurons for MNist. 6 7 8 Deep Neural Networks are Easily Fooled https://www.youtube.com/watch?v=M2IebCN9Ht4 𝑦 𝑖

What does CNN learn? Over all pixel values 𝑥 ∗ =𝑎𝑟𝑔 max 𝑥 𝑦 𝑖 − 𝑖,𝑗 𝑥 𝑖𝑗 𝑥 ∗ =𝑎𝑟𝑔 max 𝑥 𝑦 𝑖 1 2 1 2 Regularization to get right panel. Try Different initialization. 3 4 5 3 4 5 6 7 8 6 7 8

Dalmatian: 達爾馬提亞狗 Karen Simonyan, Andrea Vedaldi, Andrew Zisserman, “Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps”, ICLR, 2014

| 𝜕 𝑦 𝑘 𝜕 𝑥 𝑖𝑗 | 𝑦 𝑘 : the predicted class of the model Pixel 𝑥 𝑖𝑗 | 𝜕 𝑦 𝑘 𝜕 𝑥 𝑖𝑗 | 𝑦 𝑘 : the predicted class of the model Pixel 𝑥 𝑖𝑗 Derivative value: large is important; Karen Simonyan, Andrea Vedaldi, Andrew Zisserman, “Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps”, ICLR, 2014

Cover certain image position and DL can’t recognition the objects. Reference: Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833)

Deep Dream Given a photo, machine adds what it sees …… CNN Modify image Given a photo, machine adds what it sees …… 3.9 −1.5 2.3 ⋮ http://speech.ee.ntu.edu.tw/dokuwiki/lib/exe/fetch.php?media=speech:meeting:paper_report:paper_report_2015-08-12.pdf Make Bigger values even bigger; Smaller values even smaller. CNN exaggerates what it sees http://deepdreamgenerator.com/

Deep Dream Given a photo, machine adds what it sees …… http://deepdreamgenerator.com/

Deep Style Given a photo, make its style like famous paintings https://dreamscopeapp.com/

Deep Style Given a photo, make its style like famous paintings https://dreamscopeapp.com/

Deep Style ? CNN CNN content style CNN A Neural Algorithm of Artistic Style https://arxiv.org/abs/1508.06576 Find CNN such that to left for content; for right for style. CNN ?

More Application: Playing Go Network (19 x 19 positions) Next move 19 x 19 matrix (image) 19 x 19 vector 19 x 19 vector Taken as 19*19 image. Fully-connected feedforward network can be used Black: 1 white: -1 none: 0 But CNN performs much better.

More Application: Playing Go record of previous plays Training: 黑: 5之五白: 天元黑: 五之5 … Target: “天元” = 1 else = 0 CNN Output as target. Target: “五之 5” = 1 else = 0 CNN

Why CNN for playing Go? Some patterns are much smaller than the whole image The same patterns appear in different regions. Alpha Go uses 5 x 5 for first layer 5*5 for first layer in Alpha Go.

Alpha Go does not use Max Pooling …… Why CNN for playing Go? Subsampling the pixels will not change the object Max Pooling How to explain this??? Very good example for design your network Domain knowledge: 48 filters No max-pooling for Alpha Go. Alpha Go does not use Max Pooling ……

More Application: Speech The filters move in the frequency direction. CNN Frequency Image Time Spectrogram

More Application: Text ? Source of image: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.703.6858&rep=rep1&type=pdf

To learn more: CS231N class Consider: Class Website: http://cs231n.stanford.edu/ PPT: https://github.com/w-ww/cs231n/tree/master/slides YouTube playlist: https://www.youtube.com/playlist?list=PL3FW7Lu3i5JvHM8ljYj- zLfQRF3EO8sYv

To learn more… Lecture 5 | Convolutional Neural Networks http://cs231n.stanford.edu/slides/2018/cs231n_2018_lecture05.pdf https://github.com/w-ww/cs231n/tree/master/slides cs231n_2018_lecture05.pdf YouTube playlist: https://www.youtube.com/watch?v=bNb2fEVKeEo&t=2858s

To learn more… Lecture 6 | Training Neural Networks I http://cs231n.stanford.edu/slides/2018/cs231n_2018_lecture06.pdf https://github.com/w-ww/cs231n/tree/master/slides cs231n_2018_lecture06.pdf YouTube playlist: https://www.youtube.com/watch?v=wEoyxE0GP2M

To learn more… Lecture 7 | Training Neural Networks II http://cs231n.stanford.edu/slides/2018/cs231n_2018_lecture07.pdf https://github.com/w-ww/cs231n/tree/master/slides cs231n_2018_lecture07.pdf YouTube playlist: https://www.youtube.com/watch?v=_JB0AO7QxSA

To learn more… Lecture 9 | CNN Architectures http://cs231n.stanford.edu/slides/2018/cs231n_2018_lecture09.pdf https://github.com/w-ww/cs231n/tree/master/slides cs231n_2018_lecture09.pdf YouTube playlist: https://www.youtube.com/watch?v=DAOcjicFr1Y

To learn more… Lecture 13 | Visualizing and Understanding http://cs231n.stanford.edu/slides/2018/cs231n_2018_lecture13.pdf YouTube playlist: https://www.youtube.com/watch?v=6wcs6szJWMY