Lecture 5: CNN: Regularization


Lecture 5: CNN: Regularization boris.ginzburg@intel.com

Agenda
- Data augmentation
- Dropout (Hinton et al.)
- Stochastic pooling (Zeiler & Fergus)
- Maxout (I. Goodfellow)

Overfitting
AlexNet has 60 million parameters. Dataset (ImageNet): 1000 classes, 1.2 million training images, 50K validation images, 150K testing images. How can we reduce over-fitting? The easiest and most common method for image data is to artificially enlarge the dataset with label-preserving transformations:
- generating image translations and horizontal reflections
- altering the intensities of the RGB channels of training images
- elastic deformation (Simard, 2003)
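A rough NumPy sketch of the first two transformations (the function name, the 224x224 crop size, and the simplified color jitter are illustrative assumptions, not the exact AlexNet recipe):

```python
import numpy as np

def augment(image, crop_size=224, rng=np.random):
    """Label-preserving augmentation: random crop (translation),
    horizontal reflection, and a random shift of RGB intensities."""
    h, w, _ = image.shape
    # Random translation: take a random crop of the full image.
    top = rng.randint(0, h - crop_size + 1)
    left = rng.randint(0, w - crop_size + 1)
    out = image[top:top + crop_size, left:left + crop_size, :].astype(np.float32)
    # Horizontal reflection with probability 0.5.
    if rng.rand() < 0.5:
        out = out[:, ::-1, :]
    # Alter RGB channel intensities (simplified stand-in for
    # AlexNet's PCA-based color jitter).
    out += rng.normal(0.0, 0.1, size=3).astype(np.float32)
    return out
```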

AlexNet: Dropout
Technique proposed by Hinton et al.; see http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf. In AlexNet, dropout was used for training the fully connected layers.
Training: set the output of each hidden neuron to 0 with probability 0.5. The neurons which are "dropped out" in this way do not contribute to the forward pass and do not participate in back-propagation. So every time an input is presented, the neural network samples a different architecture, but all these architectures share weights.
Test: at test time all neurons are used, but their outputs are multiplied by 0.5.
Caffe: implemented as a "dropout" layer.
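A minimal NumPy sketch of this train/test behaviour (function and variable names are illustrative; this is not the Caffe implementation):

```python
import numpy as np

def dropout_forward(x, p_drop=0.5, train=True, rng=np.random):
    """Dropout as described above: during training, zero each activation
    with probability p_drop; at test time, keep all activations and
    multiply them by (1 - p_drop)."""
    if train:
        mask = (rng.rand(*x.shape) >= p_drop)  # 1 = keep, 0 = drop
        return x * mask, mask                  # the mask is reused in backprop
    return x * (1.0 - p_drop), None
```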

Learning rate and dropout
"Optimization proceeds very differently when using dropout than when using ordinary stochastic gradient descent. SGD usually works best with a small learning rate that results in a smoothly decreasing objective function, while dropout works best with a large learning rate, resulting in a constantly fluctuating objective function. Dropout rapidly explores many different directions and rejects the ones that worsen performance, while SGD moves slowly and steadily in the most promising direction." (source: Maxout Networks, http://arxiv.org/pdf/1302.4389.pdf)

Zeiler & Fergus: Stochastic Pooling
Similar to the dropout technique, but applied to pooling in convolutional layers: http://arxiv.org/pdf/1301.3557.pdf
Training:
1. Compute a probability for each element of the pooling region R by normalizing the activations inside the region: $p_i = \frac{a_i}{\sum_{k \in R} a_k}$
2. Pool by sampling one activation according to the probabilities from step 1.
Testing: weighted pooling, $s = \sum_{k \in R} p_k a_k$
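A minimal sketch of both rules for a single pooling region in NumPy (the function name is an illustrative assumption):

```python
import numpy as np

def stochastic_pool_region(a, train=True, rng=np.random):
    """Stochastic pooling over one pooling region `a` of non-negative
    activations (e.g. after ReLU), following the formulas above."""
    a = np.asarray(a, dtype=np.float64).ravel()
    total = a.sum()
    if total == 0.0:                 # all activations are zero
        return 0.0
    p = a / total                    # p_i = a_i / sum_{k in R} a_k
    if train:
        return rng.choice(a, p=p)    # sample one activation with probability p_i
    return np.dot(p, a)              # test: s = sum_{k in R} p_k a_k
```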

Zeiler & Fergus: Stochastic Pooling

Goodfellow: Maxout
http://www-etud.iro.umontreal.ca/~goodfeli/maxout.html
In a convolutional network, a maxout feature map can be constructed by taking the maximum across k affine feature maps (i.e., pooling across channels, in addition to spatial locations):
$h_i = \max_{j=1..k} z_{ij} = \max_{j=1..k} \left( w_{ij}^{T} v + b_{ij} \right)$
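A minimal dense-layer sketch of this computation in NumPy (the array shapes and the function name are illustrative assumptions; a convolutional maxout pools across k groups of channels in the same way):

```python
import numpy as np

def maxout(v, W, b):
    """Maxout unit: h_i = max_j (w_ij . v + b_ij).
    v: input vector of shape (d,)
    W: weights of shape (num_units, k, d) -- k affine maps per output unit
    b: biases of shape (num_units, k)"""
    z = np.einsum('ikd,d->ik', W, v) + b   # z_ij = w_ij . v + b_ij
    return z.max(axis=1)                   # pool across the k maps (channels)
```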

Maxout results

Exercises
- CIFAR-10: experiment with the Dropout layer.
- Implement Stochastic Pooling and Maxout layers.