Many slides and slide ideas thanks to Marc'Aurelio Ranzato and Michael Nielsen. http://www.cs.toronto.edu/~ranzato/files/ranzato_CNN_stanford2015.pdf http://neuralnetworksanddeeplearning.com/

http://illusionoftheyear.com/2009/05/the-illusion-of-sex/ http://public.gettysburg.edu/~rrussell/Russell_2009.pdf The left face appears female, the right appears male. The eyes and lips have not had their brightness/contrast adjusted, yet the face on the left seems to have darker lips. This is an example of simultaneous contrast. [Russell, 2009]

Simultaneous contrast http://pixgood.com/color-contrast-art-examples.html

In this one, the skin was left unchanged, but the lips and eyes were lightened on the left and darkened on the right.

No illusion – both faces appear androgynous. “A sex difference in facial contrast and its exaggeration by cosmetics”: female faces have greater luminance contrast between the eyes, lips, and surrounding skin than do male faces. This contrast difference in some way represents ‘femininity’. Cosmetics can be seen to increase the contrast of these areas, and so increase femininity.

Perceptron: This is convolution!

Shared weights: the same filter weights are applied at every image location. The output should be a little smaller than the input, as we aren’t padding.

Filter = ‘local’ perceptron. Also called kernel. This is the major trick. For each layer of the neural network, we’re going to learn multiple filters.
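To make the “local perceptron with shared weights” idea concrete, here is a minimal NumPy sketch (an illustration, not any particular library’s implementation) of sliding one filter over an image with no padding, which is why the output shrinks:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Apply one shared-weight filter at every location ('valid' mode:
    no padding, so the output is smaller than the input)."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kH, j:j + kW]   # local receptive field
            out[i, j] = np.sum(patch * kernel)  # same weights everywhere
    return out

image = np.random.rand(28, 28)
kernel = np.array([[1., 0., -1.],
                   [2., 0., -2.],
                   [1., 0., -1.]])              # a hand-crafted, Sobel-like edge filter
print(conv2d_valid(image, kernel).shape)        # (26, 26): smaller, since we didn't pad
```

A convolutional layer learns a bank of such filters, and each filter produces its own feature map.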

By pooling responses at different locations, we gain robustness to the exact spatial location of image features.
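A minimal sketch of 2x2 max pooling over one feature map (illustrative only): the strongest response in each window is kept, so shifting a feature by a pixel or two inside the window does not change the pooled output.

```python
import numpy as np

def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum response in each window; small spatial shifts of a
    feature inside a window leave the pooled output unchanged."""
    H, W = feature_map.shape
    out = np.zeros(((H - size) // stride + 1, (W - size) // stride + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            out[i, j] = window.max()
    return out

fm = np.random.rand(26, 26)
print(max_pool2d(fm).shape)  # (13, 13)
```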

FLOPS = the cost just to execute the network (one forward pass), not to train it.

Universality: a network with a single hidden layer can learn any function, so long as it is well-behaved (continuous), to some approximation; more perceptrons give a better approximation. Visual proof (Michael Nielsen): http://neuralnetworksanddeeplearning.com/chap4.html The visual proof uses sigmoid activation functions.
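The flavour of the visual proof can be sketched in a few lines of NumPy: pairs of steep sigmoid units form “bumps”, and enough bumps approximate a 1-D function. The weights below are constructed by hand rather than learned, and the target is just an arbitrary smooth example function.

```python
import numpy as np

def sigmoid(z):
    # clip to avoid overflow warnings for very steep units
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500.0, 500.0)))

def approximate(f, x, n_bumps=50):
    """Approximate f on [0, 1] with one hidden layer of sigmoid units: each
    'bump' is a steep sigmoid turning on at the left edge of an interval minus
    one turning off at the right edge, weighted by the function's value there."""
    steepness = 20.0 * n_bumps                  # steeper units as bumps get narrower
    edges = np.linspace(0.0, 1.0, n_bumps + 1)
    heights = f((edges[:-1] + edges[1:]) / 2.0)
    y = np.zeros_like(x)
    for left, right, h in zip(edges[:-1], edges[1:], heights):
        y += h * (sigmoid(steepness * (x - left)) - sigmoid(steepness * (x - right)))
    return y

x = np.linspace(0.05, 0.95, 1000)               # evaluate away from the interval endpoints
target = lambda t: 0.2 + 0.4 * t**2 + 0.3 * t * np.sin(15 * t)  # arbitrary smooth target
for n in (10, 50, 250):
    print(n, np.max(np.abs(approximate(target, x, n) - target(x))))
# more perceptrons = a better approximation: the error shrinks as n grows
```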

If a network with a single hidden layer can learn any function… …given enough parameters… …then why do we go deeper? Intuitively, composition is efficient because it allows reuse. Empirically, deep networks do a better job than shallow networks at learning hierarchies of knowledge.

The problem of fitting: too many parameters = overfitting; not enough parameters = underfitting; more data = less chance to overfit. How do we know what is required?

Regularization: attempt to guide the solution so it does not overfit, while still giving it the freedom of many parameters.

Data fitting problem [Nielsen]

Which is better? Which is better a priori? How do we know a priori how many parameters we need? Cross-validation is one way. [Figure: the same data fit with a 9th-order polynomial and with a 1st-order polynomial.] [Nielsen]
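As a sketch of using held-out data to choose model complexity (hypothetical data, not the figure’s), compare a 1st-order and a 9th-order polynomial fit:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noisy samples from an underlying linear trend.
x = rng.uniform(0.0, 1.0, 20)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.2, size=x.shape)
x_train, y_train = x[:14], y[:14]
x_val, y_val = x[14:], y[14:]            # held-out data, as in cross-validation

for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    val_err = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    print(f"degree {degree}: train MSE {train_err:.3f}, validation MSE {val_err:.3f}")
# The 9th-order fit drives the training error down by fitting the noise, but it
# typically does worse on the held-out points; that gap is the sign of overfitting.
```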

Regularization: attempt to guide the solution so it does not overfit, while still giving it the freedom of many parameters. Idea: penalize the use of parameters to prefer small weights.

Regularization: the idea is to add a cost to having high weights: C = C_0 + (λ/2n) Σ_w w², where C_0 is the original (unregularized) cost, λ is the regularization parameter, and n is the size of the training data. Intuitively, the effect of regularization is to make it so the network prefers to learn small weights, all other things being equal. Large weights will only be allowed if they considerably improve the first part of the cost function. Put another way, regularization can be viewed as a way of compromising between finding small weights and minimizing the original cost function. [Nielsen]

Both can describe the data… …but one is simpler. Why does this help reduce overfitting? Occam’s razor: “Among competing hypotheses, the one with the fewest assumptions should be selected.” For us: large weights cause large changes in behaviour in response to small changes in the input; simpler models (with smaller changes) are more robust to noise.

Regularization, written out for the cross-entropy loss: C = −(1/n) Σ_x [y ln a + (1 − y) ln(1 − a)] + (λ/2n) Σ_w w². The first term is the normal cross-entropy loss (binary classes); the second is the regularization term; as before, λ is the regularization parameter and n is the size of the training data. [Nielsen]
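A minimal sketch of what this looks like in code (illustrative names, not from any particular framework): the penalty is added to the loss, and its gradient, (λ/n)·w, appears as a small shrinking of the weights at every update, which is why L2 regularization is also called weight decay.

```python
import numpy as np

def regularized_cost(y_true, y_pred, weights, lam, n):
    """Binary cross-entropy plus the L2 penalty (lam / 2n) * sum of squared weights."""
    eps = 1e-12  # avoid log(0)
    cross_entropy = -np.mean(y_true * np.log(y_pred + eps)
                             + (1.0 - y_true) * np.log(1.0 - y_pred + eps))
    l2_penalty = (lam / (2.0 * n)) * sum(np.sum(w ** 2) for w in weights)
    return cross_entropy + l2_penalty

def sgd_step(w, grad_w, lr, lam, n):
    """The penalty contributes (lam / n) * w to the gradient, so every step
    also shrinks the weights a little ('weight decay')."""
    return w - lr * (grad_w + (lam / n) * w)
```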

Regularization: Dropout. Our networks typically start with random weights, so every time we train we get a slightly different outcome. Why random weights? If the weights are all equal, the responses across filters will be equivalent, and the network doesn’t train. http://stackoverflow.com/questions/20027598/why-should-weights-of-neural-networks-be-initialized-to-random-numbers [Nielsen]
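A tiny NumPy demonstration of why equal weights fail (a hand-rolled 4-3-1 network with a squared-error loss, purely illustrative): with constant initialization every hidden unit computes the same thing and receives the same gradient, so the units can never become different; random initialization breaks that symmetry.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x, target = rng.normal(size=4), 1.0          # one training example

for init in ("constant", "random"):
    W1 = np.full((3, 4), 0.5) if init == "constant" else rng.normal(scale=0.5, size=(3, 4))
    w2 = np.full(3, 0.5)
    h = sigmoid(W1 @ x)                      # hidden activations
    y = sigmoid(w2 @ h)                      # output
    dy = (y - target) * y * (1.0 - y)        # backprop, squared-error loss
    dW1 = np.outer(dy * w2 * h * (1.0 - h), x)
    print(init, "all hidden units get the same gradient:", np.allclose(dW1, dW1[0]))
# constant: True  -> the units stay identical forever (the network doesn't train)
# random:   False -> the units can specialize into different filters
```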

Regularization: our networks typically start with random weights, so every time we train we get a slightly different outcome. Why not train 5 different networks with random starts and vote on their outcome? It works fine, and it helps generalization because the error is averaged. Any problem here? It is slow: we have to train and evaluate several networks.

Regularization: Dropout [Nielsen]

Regularization: Dropout. At each mini-batch: randomly select a subset of neurons and ignore them. At test time: halve the outgoing weights to compensate for having trained on only half of the neurons. Effect: neurons become less dependent on the output of connected neurons, which forces the network to learn more robust features that are useful to more subsets of neurons. It is like averaging over many different trained networks with different random initializations, except cheaper to train. [Nielsen]
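A minimal sketch of dropout as described above (drop neurons while training, scale at test time; many libraries instead use “inverted dropout”, which rescales during training so test time is untouched):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p_keep=0.5, train=True):
    """Training: zero out a random subset of neurons. Test: keep every neuron
    but scale by p_keep, so the next layer sees the same expected input."""
    if train:
        mask = rng.random(activations.shape) < p_keep   # 1 = keep, 0 = drop
        return activations * mask
    return activations * p_keep

h = np.ones(10)
print(dropout(h, train=True))    # roughly half the activations are zeroed
print(dropout(h, train=False))   # all kept, scaled by 0.5
```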

More regularization: adding more data is a kind of regularization; pooling is a kind of regularization; data augmentation is a kind of regularization.
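As a sketch of the data-augmentation idea (hypothetical transforms; the right ones depend on the task), each training image can be randomly flipped and cropped on the fly, giving the network many label-preserving variants of the same example:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, crop=24):
    """Random horizontal flip plus a random crop. The label is unchanged, so
    this effectively enlarges the training set."""
    if rng.random() < 0.5:
        image = image[:, ::-1]               # horizontal flip
    H, W = image.shape
    top = rng.integers(0, H - crop + 1)
    left = rng.integers(0, W - crop + 1)
    return image[top:top + crop, left:left + crop]

example = np.random.rand(28, 28)
print(augment(example).shape)                # (24, 24)
```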

CNNs for Medical Imaging - Connectomics This is some of James’ work with Harvard Brain Science Center. The brain represented as a network: image obtained from MR connectomics. (Provided by Patric Hagmann, CHUV-UNIL, Lausanne, Switzerland) [Patric Hagmann]

Vision for understanding the brain: 1 mm cubed of brain, imaged at 5-30 nanometers. How much data? [Kaynig-Fittkau et al.]

Vision for understanding the brain: 1 mm cubed of brain, imaged at 5-30 nanometers. How much data? 1 petabyte = 1,000,000,000,000,000 bytes ≈ all the photos uploaded to Facebook per day. [Kaynig-Fittkau et al.]
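A rough back-of-the-envelope for where that number comes from, assuming roughly 5 nm x 5 nm in-plane resolution, 30 nm section thickness, and 1 byte per voxel (assumed values, not the slide’s exact figures):

```python
mm = 1e-3                        # 1 mm in metres
voxel_x = voxel_y = 5e-9         # 5 nm in-plane resolution (assumed)
voxel_z = 30e-9                  # 30 nm section thickness (assumed)

voxels = (mm / voxel_x) * (mm / voxel_y) * (mm / voxel_z)
petabytes = voxels * 1 / 1e15    # 1 byte per voxel
print(f"{voxels:.1e} voxels ~ {petabytes:.1f} PB")   # ~1.3e15 voxels, about a petabyte
```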

[Kaynig-Fittkau et al.]

Vision for understanding the brain Use computer vision to classify neurons in the brain. Attempt to build a connectivity graph of all neurons – ‘connectome’ – so that we can map and simulate brain function.

[Kaynig-Fittkau et al.]

https://arxiv.org/abs/1704.00848 [Haehn et al.]

Network architecture [Haehn et al.]: 170,000 parameters; 9,000 positive/negative example patches of 75 x 75; 80,000 image patches. https://arxiv.org/abs/1704.00848
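As a rough illustration of what a network at this scale looks like (a hypothetical PyTorch model for 75 x 75 patches; this is NOT the architecture from Haehn et al., it only shows that a small CNN lands in the same ballpark of roughly 10^5 parameters):

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),   # 75 -> 71 -> 35
    nn.Conv2d(8, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),  # 35 -> 31 -> 15
    nn.Flatten(),
    nn.Linear(16 * 15 * 15, 48), nn.ReLU(),
    nn.Linear(48, 2),                                             # positive / negative
)
print(sum(p.numel() for p in model.parameters()))  # about 1.8e5 parameters
```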

https://arxiv.org/abs/1704.00848 [Haehn et al.]

https://arxiv.org/abs/1704.00848 [Haehn et al.]