A shallow introduction to Deep Learning

A shallow introduction to Deep Learning. Zhiting Hu, 2014-4-1

Outline Motivation: why go deep? DL since 2006 Some DL Models Discussion

Definition: Deep Learning is a wide class of machine learning techniques and architectures, with the hallmark of using many layers of non-linear information processing that are hierarchical in nature. An example: Deep Neural Networks.

Example: a neural network (figure).

Example: a neural network. Input: x = (an image shown on the slide); Output: Y = (0, 0, 0, 0, 0, 1, 0, 0, 0, 0).

Deep Neural Network (DNN).
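
To make the example concrete, here is a minimal sketch (illustrative only, not from the slides; the layer sizes and random weights are invented) of the forward computation of a small fully connected network: each layer applies a linear map followed by a non-linearity, and the softmax output plays the role of the Y = (0, ..., 1, ..., 0) vector above, with the predicted class given by the largest entry.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, weights, biases):
    """Forward pass through a stack of fully connected layers."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.tanh(h @ W + b)                 # hidden layers: linear map + tanh
    logits = h @ weights[-1] + biases[-1]
    e = np.exp(logits - logits.max())          # softmax over the 10 output classes
    return e / e.sum()

# Hypothetical sizes: 784 inputs (e.g. a 28x28 image), two hidden layers, 10 classes.
sizes = [784, 100, 50, 10]
weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

x = rng.random(784)                            # stand-in for an input image
y = forward(x, weights, biases)
print("predicted class:", y.argmax())
```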

Parameter learning: back-propagation. Given a training dataset (X, Y), learn the parameters W. Two phases: (1) forward propagation; (2) backward propagation.
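
The two phases can be written out explicitly. Below is a minimal numpy sketch (an illustration with invented sizes, not the speaker's code) of one training step for a network with one hidden layer: the forward phase computes the activations, the backward phase propagates the error back through the layers to obtain the gradients of a squared-error loss, and the parameters W are then updated by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_step(x, y, W1, W2, lr=0.1):
    # (1) Forward propagation: compute activations layer by layer.
    h = np.tanh(x @ W1)               # hidden layer
    y_hat = h @ W2                    # linear output layer
    loss = 0.5 * np.sum((y_hat - y) ** 2)

    # (2) Backward propagation: push the error back through the layers.
    d_out = y_hat - y                 # dL/d(y_hat)
    dW2 = np.outer(h, d_out)          # gradient of the output weights
    d_h = (W2 @ d_out) * (1 - h**2)   # chain rule through tanh
    dW1 = np.outer(x, d_h)            # gradient of the hidden weights

    # Gradient-descent update of the parameters W.
    return W1 - lr * dW1, W2 - lr * dW2, loss

x, y = rng.random(4), np.array([0.0, 1.0])     # one toy training pair
W1, W2 = rng.normal(0, 0.5, (4, 8)), rng.normal(0, 0.5, (8, 2))
for _ in range(200):
    W1, W2, loss = train_step(x, y, W1, W2)
print("final loss:", loss)
```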

Motivation: why go deep? Brains have a deep architecture. Humans organize their ideas hierarchically, through composition of simpler ideas. Insufficiently deep architectures can be exponentially inefficient. Distributed representations are necessary to achieve non-local generalization. Intermediate representations allow sharing statistical strength.

Brains have a deep architecture. Deep Learning = learning hierarchical representations (features). [Lee, Grosse, Ranganath & Ng, 2009]

Deep architecture in our mind: humans organize their ideas and concepts hierarchically; humans first learn simpler concepts and then compose them to represent more abstract ones; engineers break up solutions into multiple levels of abstraction and processing.

Insufficiently deep architectures can be exponentially inefficient. Theoretical arguments: two layers of neurons already form a universal approximator, yet some functions that can be represented compactly with k layers may require exponential size with only 2 layers. Theorems on the advantage of depth: Håstad et al. 1986 & 1991, Bengio et al. 2007, Bengio & Delalleau 2011, Braverman 2011.
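
As a rough illustration of this depth/size trade-off (my example, not from the slides): the parity of n bits can be computed with only n - 1 two-input XOR units which, thanks to associativity, can be arranged as a tree of depth about log2(n), whereas any two-layer AND/OR (DNF) formula for the same function needs 2^(n-1) terms, one per odd-weight input pattern.

```python
from functools import reduce
from operator import xor
from math import log2

def parity_deep(bits):
    """Parity via n-1 two-input XORs; associativity lets them form a tree of depth ~log2(n)."""
    return reduce(xor, bits)

n = 20
bits = [1, 0, 1] * 6 + [1, 1]                      # an arbitrary 20-bit input
print("parity:", parity_deep(bits))
print("deep circuit:", n - 1, "gates, depth ~", round(log2(n), 1))
print("2-layer DNF: ", 2 ** (n - 1), "AND terms")  # exponential blow-up at depth 2
```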

Insufficiently deep architectures can be exponentially inefficient: compare a "shallow" computer program with a "deep" computer program (figures).

Outline Motivation: why go deep? DL since 2006 Some DL Models Discussion

Why now? The "Winter of Neural Networks" since the 90's: before 2006, training deep architectures was unsuccessful (except for convolutional neural nets). Main difficulty: local optima in the non-convex objective function of deep networks; back-propagation (local gradient descent from random initialization) often gets trapped in poor local optima. Other difficulties: too many parameters combined with small labeled datasets => overfitting; hard to do theoretical analysis; many tricks needed to make training work. So people turned to shallow models with convex loss functions (e.g., SVMs, CRFs).

What has changed? New methods for unsupervised pre-training have been developed (unsupervised: use unlabeled data; pre-training: better initialization => better local optima). GPUs and distributed systems enable large-scale learning.

Success in object recognition. Task: classify the 1.2 million images of the ImageNet LSVRC-2010 contest into 1000 different classes.

Success in speech recognition: Google uses DL in their Android speech recognizer (both server-side and on some phones with enough memory). Results from Google, IBM, and Microsoft.

Success in NLP: neural word embeddings. Use a neural network to learn a vector representation of each word; semantic relations then appear as linear relationships in the space of learned representations: King - Queen ~= Man - Woman; Paris - France + Italy ~= Rome.
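
A toy sketch of the analogy arithmetic (the 3-dimensional vectors below are hand-made purely for illustration; real systems learn embeddings with hundreds of dimensions from large corpora): if semantic relations are roughly linear, the nearest neighbour of vector(Paris) - vector(France) + vector(Italy) should be Rome.

```python
import numpy as np

# Hand-made toy embeddings, NOT learned vectors.
emb = {
    "king":   np.array([0.9, 0.8, 0.1]),
    "queen":  np.array([0.9, 0.1, 0.1]),
    "man":    np.array([0.1, 0.8, 0.0]),
    "woman":  np.array([0.1, 0.1, 0.0]),
    "paris":  np.array([0.2, 0.0, 0.9]),
    "france": np.array([0.8, 0.0, 0.9]),
    "italy":  np.array([0.8, 0.0, 0.2]),
    "rome":   np.array([0.2, 0.0, 0.2]),
}

def nearest(vec, exclude):
    """Most cosine-similar word to `vec`, ignoring the query words."""
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in emb if w not in exclude), key=lambda w: cos(vec, emb[w]))

query = emb["paris"] - emb["france"] + emb["italy"]
print(nearest(query, exclude={"paris", "france", "italy"}))   # -> rome
```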

DL in industry. Microsoft: first successful DL models for speech recognition, by MSR in 2009. Google: "Google Brain", led by Google fellow Jeff Dean; large-scale deep learning infrastructure (Le et al., ICML'12): 10 million 200x200 images, a network with 1 billion connections, trained on 1000 machines (16K cores) for 3 days. Facebook: hired an NYU deep learning expert to run its new AI lab (December 2013).

Outline Motivation: why go deep? DL since 2006 Some DL Models Convolutional Neural Networks Deep Belief Nets Stacked auto-encoders / sparse coding Discussion

Convolutional Neural Networks (CNNs). Proposed by LeCun et al. (1989), the "only" successful DL model before 2006; widely applied to image data (recently also to other tasks). Motivation: nearby pixels are more strongly correlated than distant pixels, and we want translation invariance. Key ingredients: local receptive fields; weight sharing (all units in a convolutional layer detect the same pattern, but at different locations in the input image); subsampling (makes the representation relatively insensitive to small shifts of the image). Training: back-propagation.
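
To make the three ingredients concrete, here is a minimal numpy sketch (illustrative only; the 3x3 filter is made up): one small filter is slid over every position of the image (local receptive fields with shared weights), and a 2x2 max-pooling step then subsamples the resulting feature map, so the response changes little under small shifts of the input.

```python
import numpy as np

def conv2d(image, kernel):
    """Apply one shared kernel at every location (valid convolution)."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)  # local receptive field
    return out

def max_pool(fmap, size=2):
    """Subsampling: keep the maximum of each size x size block."""
    H, W = (fmap.shape[0] // size) * size, (fmap.shape[1] // size) * size
    blocks = fmap[:H, :W].reshape(H // size, size, W // size, size)
    return blocks.max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((8, 8))                       # stand-in for an input image
kernel = np.array([[1.0, 0.0, -1.0],             # a hypothetical edge-like filter
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])
features = np.maximum(conv2d(image, kernel), 0)  # convolution + simple non-linearity
print(max_pool(features).shape)                  # (3, 3) pooled feature map
```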

CNNs on the MNIST handwritten digits benchmark: state-of-the-art error rate of 0.35% (IJCAI 2011).

Outline Motivation: why go deep? DL since 2006 Some DL Models Convolutional Neural Networks Deep Belief Nets Stacked auto-encoders / sparse coding Discussion

Restricted Boltzmann Machine (RBM): the building block of Deep Belief Nets (DBNs) and Deep Boltzmann Machines (DBMs). A bipartite undirected graphical model over visible units v and hidden units h, with model parameters W, b, c and (in the standard formulation) energy E(v, h) = -b^T v - c^T h - v^T W h, so that P(v, h) = exp(-E(v, h)) / Z. Parameter learning: maximize the likelihood of the training data by gradient descent, but use Contrastive Divergence (CD) to approximate the gradient.
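
A minimal sketch of one Contrastive Divergence (CD-1) update for a binary RBM, written from the standard formulation (the learning rate and toy data below are invented): instead of sampling from the model distribution, the negative statistics come from a single Gibbs step starting at the data.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr=0.05):
    """One CD-1 step for a binary RBM with visible bias b and hidden bias c."""
    ph0 = sigmoid(c + v0 @ W)                    # positive phase: P(h=1 | data)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    pv1 = sigmoid(b + h0 @ W.T)                  # one Gibbs step: reconstruct visibles
    ph1 = sigmoid(c + pv1 @ W)                   # hidden probabilities again
    W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))  # data stats minus reconstruction stats
    b += lr * (v0 - pv1)
    c += lr * (ph0 - ph1)
    return W, b, c

n_visible, n_hidden = 6, 3
W = rng.normal(0, 0.1, (n_visible, n_hidden))
b, c = np.zeros(n_visible), np.zeros(n_hidden)
v = np.array([1.0, 1.0, 0.0, 0.0, 1.0, 0.0])     # one toy binary training vector
for _ in range(100):
    W, b, c = cd1_update(v, W, b, c)
```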

Deep Belief Nets (DBNs).

DBNs: layer-wise pre-training (figures).

Supervised fine-tuning: after pre-training, the parameters W and c of each layer can be used to initialize a deep multi-layer neural network; these parameters are then fine-tuned using back-propagation on labeled data.
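
Putting the two stages together, here is a structural sketch (illustrative; the layer sizes, epochs, and toy data are invented, and CD-1 uses probabilities instead of samples for brevity): each RBM is trained on the hidden activities produced by the layer below, and the resulting (W, c) pairs then initialize a feed-forward network that is fine-tuned with back-propagation on labeled data, as in the earlier back-propagation sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def cd1(v, W, b, c, lr=0.05):
    """One CD-1 update for a binary RBM (mean-field, i.e. no sampling, for brevity)."""
    ph0 = sigmoid(c + v @ W)                     # positive phase
    pv1 = sigmoid(b + ph0 @ W.T)                 # one reconstruction step
    ph1 = sigmoid(c + pv1 @ W)
    W += lr * (np.outer(v, ph0) - np.outer(pv1, ph1))
    b += lr * (v - pv1)
    c += lr * (ph0 - ph1)
    return W, b, c

def pretrain_dbn(data, layer_sizes, epochs=20):
    """Greedy layer-wise pre-training: one RBM per layer, trained bottom-up."""
    stack, inputs = [], data
    for n_hidden in layer_sizes:
        n_visible = inputs.shape[1]
        W = rng.normal(0, 0.1, (n_visible, n_hidden))
        b, c = np.zeros(n_visible), np.zeros(n_hidden)
        for _ in range(epochs):
            for v in inputs:
                W, b, c = cd1(v, W, b, c)
        stack.append((W, c))                     # W and c later initialize the deep net
        inputs = sigmoid(c + inputs @ W)         # hidden activities feed the next RBM
    return stack

data = (rng.random((30, 6)) > 0.5).astype(float) # toy unlabeled binary data
dbn_layers = pretrain_dbn(data, layer_sizes=[4, 3])
```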

Outline Motivation: why go deep? DL since 2006 Some DL Models Convolutional Neural Networks Deep Belief Nets Stacked auto-encoders / sparse coding Discussion

Stacked auto-encoders / sparse coding. Building blocks: the auto-encoder and sparse coding (non-probabilistic); the stacked structure is similar to DBNs. Let's skip the details.
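
Even though the talk skips them, a minimal sketch of the building block that gets stacked may help (a one-hidden-layer auto-encoder with tied weights, trained by gradient descent on the squared reconstruction error; sizes and learning rate are invented). In a stacked auto-encoder, the code h produced by one trained auto-encoder becomes the input of the next, just as the RBM hidden activities do in a DBN.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def autoencoder_step(x, W, b, c, lr=0.1):
    """One gradient step on the reconstruction error, with tied encoder/decoder weights."""
    h = sigmoid(c + x @ W)                            # encoder: input -> code
    x_hat = b + h @ W.T                               # decoder: code -> reconstruction
    err = x_hat - x
    d_h = (err @ W) * h * (1 - h)                     # back-propagate through the encoder
    W -= lr * (np.outer(x, d_h) + np.outer(err, h))   # tied-weight gradient
    b -= lr * err
    c -= lr * d_h
    return W, b, c, 0.5 * np.sum(err ** 2)

n_in, n_code = 8, 3
W = rng.normal(0, 0.1, (n_in, n_code))
b, c = np.zeros(n_in), np.zeros(n_code)
x = rng.random(n_in)
for _ in range(500):
    W, b, c, loss = autoencoder_step(x, W, b, c)
print("reconstruction error:", round(loss, 4))
```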

Outline Motivation: why go deep? DL since 2006 Some DL Models Convolutional Neural Networks Deep Belief Nets Stacked auto-encoders / sparse coding Discussion

Discussion: Deep Learning = learning hierarchical features. (Figures: the pipeline of machine visual perception; hand-crafted features in NLP.)

Problems. DL removes the need for feature engineering, but training DL models still requires a significant amount of engineering, e.g., parameter tuning: number of layers, layer sizes, connectivity, learning rate. Computational scaling: recent breakthroughs in speech, object recognition, and NLP hinged on faster computing, GPUs, and large datasets. Lack of theoretical analysis.

Outline Motivation: why go deep? DL since 2006 Some DL Models Convolutional Neural Networks Deep Belief Nets Stacked auto-encoders / sparse coding Discussion

References