Deep Learning.


Deep Learning

Why?

Source: Huang et al., Communications of the ACM, 01/2014

Dedicated deep learning venues include: the 2013 International Conference on Learning Representations; the 2013 ICASSP special session on New Types of Deep Neural Network Learning for Speech Recognition and Related Applications; the 2013 ICML Workshop for Audio, Speech, and Language Processing; the 2010, 2011, and 2012 NIPS Workshops on Deep Learning and Unsupervised Feature Learning; the 2013 ICML Workshop on Representation Learning Challenges; the 2012 ICML Workshop on Representation Learning; the 2011 ICML Workshop on Learning Architectures, Representations, and Optimization for Speech and Visual Information Processing; the 2009 ICML Workshop on Learning Feature Hierarchies; the 2009 NIPS Workshop on Deep Learning for Speech Recognition and Related Applications; the 2012 ICASSP deep learning tutorial; the special section on Deep Learning for Speech and Language Processing in IEEE Trans. Audio, Speech, and Language Processing (January 2012); and the special issue on Learning Deep Architectures in IEEE Trans. Pattern Analysis and Machine Intelligence (2013).

Geoffrey Hinton, University of Toronto. Key papers: "A fast learning algorithm for deep belief nets" (Hinton et al., 2006) and "Reducing the dimensionality of data with neural networks" (Hinton & Salakhutdinov, 2006). Networks with several hidden layers were already introduced in 1985; today the work spans the University of Toronto, Microsoft Research, and Google. It also draws on neuroscience evidence for speech (Baker et al., 2009, 2009a; Deng, 1999, 2003) and vision (George, 2008; Bouvrie, 2009; Poggio, 2007).

How?

Shallow learning: SVMs, linear and kernel regression, Hidden Markov Models (HMMs), Gaussian Mixture Models (GMMs), single-hidden-layer MLPs, and so on. These models have limited capacity for representing complex concepts and cannot make use of unlabeled data.

Neural networks are a machine learning approach built from interconnected neurons. They extract knowledge from high-dimensional data and perform classification: the input is a vector of features of the data, and learning can be supervised (using labeled data) or unsupervised.

Multi-Layer Perceptron: multiple layers, feed-forward, fully connected weights, 1-of-N output coding. (Diagram: inputs [X1, X2, X3] feed through weights v_ij into hidden units j, which feed through weights w_jk into output units k producing [Y1, Y2].)
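As an illustration (not part of the original slides), a minimal NumPy sketch of a forward pass through such a network; the layer sizes and the sigmoid activation are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed sizes: 3 inputs, 4 hidden units, 2 outputs (1-of-N coded)
rng = np.random.default_rng(0)
V = rng.normal(scale=0.1, size=(3, 4))   # input-to-hidden weights v_ij
W = rng.normal(scale=0.1, size=(4, 2))   # hidden-to-output weights w_jk

def forward(x):
    """Feed-forward pass: input -> hidden -> output."""
    h = sigmoid(x @ V)   # hidden activations
    y = sigmoid(h @ W)   # output activations [y1, y2]
    return h, y

h, y = forward(np.array([0.2, -0.5, 1.0]))
```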

Backpropagation: minimize the error of the calculated output by adjusting the weights via a gradient descent procedure. Each update consists of a forward phase followed by backpropagation of the errors through the weights w_jk and v_ij, repeated for each sample over multiple epochs. A sketch of one such update follows below.
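A minimal sketch of one backpropagation update for the two-layer network above, assuming squared error and sigmoid units; the function and variable names are illustrative, not from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, target, V, W, lr=0.1):
    """One gradient-descent update on a single sample (squared error, sigmoid units)."""
    # Forward phase
    h = sigmoid(x @ V)
    y = sigmoid(h @ W)
    # Backpropagation of the errors
    delta_out = (y - target) * y * (1.0 - y)        # error signal at the output units
    delta_hid = (delta_out @ W.T) * h * (1.0 - h)   # error signal at the hidden units
    # Adjust the weights (gradient descent)
    W -= lr * np.outer(h, delta_out)
    V -= lr * np.outer(x, delta_hid)
    return V, W
```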

Best practice: normalize the inputs; prevent very large weights and oscillation; guard against overfitting (for better generalisation) with a validation set and early stopping; use mini-batch learning, i.e. update the weights using multiple input vectors combined.
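To make the mini-batch and early-stopping ideas concrete, a hedged training-loop sketch; the batch size, epoch limit, patience value, and callback names are all illustrative assumptions:

```python
import numpy as np

def train(update_on_batch, evaluate, X, y, X_val, y_val,
          batch_size=32, max_epochs=100, patience=5):
    """Mini-batch training with early stopping on a validation set."""
    best_val, bad_epochs = np.inf, 0
    for epoch in range(max_epochs):
        order = np.random.permutation(len(X))
        for start in range(0, len(X), batch_size):
            batch = order[start:start + batch_size]
            update_on_batch(X[batch], y[batch])   # one weight update from a whole mini-batch
        val_error = evaluate(X_val, y_val)        # monitor the error on held-out data
        if val_error < best_val:
            best_val, bad_epochs = val_error, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:            # early stopping: no recent improvement
                break
```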

Problems with backpropagation in networks with multiple hidden layers: it gets stuck in local optima because the weights start from random positions; convergence to the optimum is slow and a large training set is needed; and it can only use labeled data, although most available data is unlabeled. This motivates a generative approach.

Restricted Boltzmann Machines: an unsupervised model that finds complex regularities in the training data. The structure is a bipartite graph with a visible layer i and a hidden layer j connected by weights w_ij, built from binary stochastic units that switch on or off with some probability. One iteration: update the hidden units, then reconstruct the visible units. Training maximizes the likelihood of the training data.

RBM training goal: the most probable reproduction of the (unsupervised) data, i.e. finding the latent factors of the data set. The weights are adjusted so that the model assigns maximum probability to the input data.

Training: Contrastive Divergence. Start with a training vector on the visible units. Update all the hidden units in parallel. Update all the visible units in parallel to get a "reconstruction". Update the hidden units again. (Diagram: visible units i and hidden units j, at t = 0 on the data and at t = 1 on the reconstruction.)
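A hedged sketch of one CD-1 weight update for a binary RBM; the learning rate, the absence of bias terms, and the function name are simplifications, not from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(v0, W, lr=0.1, rng=np.random.default_rng()):
    """One contrastive-divergence (CD-1) step for a binary RBM without biases."""
    # t = 0: sample the hidden units from the training vector
    p_h0 = sigmoid(v0 @ W)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Reconstruct the visible units from the hidden sample
    p_v1 = sigmoid(h0 @ W.T)
    v1 = (rng.random(p_v1.shape) < p_v1).astype(float)
    # t = 1: update the hidden units again on the reconstruction
    p_h1 = sigmoid(v1 @ W)
    # Increment weights for pixel/feature pairs active on the data,
    # decrement for pairs active on the reconstruction
    W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
    return W
```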

Example: handwritten 2s. A 16 x 16 pixel image is connected to 50 binary neurons that learn features. When the network is driven by the data (reality), the weights between each active pixel and each active feature are incremented; when it is driven by its own reconstruction, they are decremented.

The final 50 x 256 weights: Each unit grabs a different feature

Example: reconstruction. Two cases are compared, each showing the data and the reconstruction from the activated binary features: a new test image from the digit class the model was trained on, and an image from an unfamiliar digit class. The network tries to see every image as a 2.

Deep architecture: backpropagation and RBMs as building blocks, stacked into multiple hidden layers. Motivation (why go deep?): approximate complex decision boundaries with fewer computational units for the same functional mapping, and learn hierarchically, building increasingly complex features. This works well in different domains such as vision and audio.

Hierarchical learning: a natural progression from low-level to high-level structure, as seen in natural complexity. It is also easier to monitor what is being learnt and to guide the machine towards better subspaces.

Stacked RBMs: first learn one layer at a time by stacking RBMs. Treat this as "pre-training" that finds a good initial set of weights, which can then be fine-tuned by a local search procedure; backpropagation can be used to fine-tune the model to be better at discrimination. (Diagram: train the first RBM on the data, copy the binary hidden state for each v, train the next RBM on those states, and compose the two RBM models into a single DBN model.) A rough sketch of this loop follows below.
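A rough sketch of greedy layer-wise pre-training, reusing the hypothetical cd1_update from the RBM sketch above; the layer sizes, epoch count, and initialization scale are illustrative assumptions:

```python
import numpy as np

def pretrain_stack(data, layer_sizes, epochs=10, rng=np.random.default_rng(0)):
    """Greedy layer-wise pre-training: train one RBM, then feed its hidden
    states as training data to the next RBM."""
    weights, layer_input = [], data
    for n_visible, n_hidden in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
        for _ in range(epochs):
            for v in layer_input:
                W = cd1_update(v, W)              # CD-1 step from the earlier sketch
        # Binary hidden states of this RBM become the data for the next RBM
        p_h = 1.0 / (1.0 + np.exp(-(layer_input @ W)))
        layer_input = (rng.random(p_h.shape) < p_h).astype(float)
        weights.append(W)
    return weights   # initial weights of the deep network, ready for fine-tuning
```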

Uses: Dimensionality reduction

Dimensionality reduction: use a stacked RBM as a deep auto-encoder. Train the network with the images as both input and output, and limit one layer to a few dimensions, so that all information has to pass through this middle layer.
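As a rough modern equivalent (not what the original work used), such a deep auto-encoder with a narrow middle layer could be sketched in Keras; the 625-pixel input and 30-dimensional bottleneck match the next slide, while the intermediate sizes and activations are assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Encoder squeezes the 625-pixel image down to a 30-dimensional code;
# the decoder mirrors it back to 625 pixels (intermediate sizes are assumptions).
autoencoder = keras.Sequential([
    keras.Input(shape=(625,)),
    layers.Dense(256, activation="sigmoid"),
    layers.Dense(30, activation="sigmoid"),    # narrow middle layer: all information passes here
    layers.Dense(256, activation="sigmoid"),
    layers.Dense(625, activation="sigmoid"),   # reconstruct the input image
])
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(images, images, ...)  # input == output: learn to reproduce the data
```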

Dimensionality reduction: Olivetti face data, 25x25-pixel images reconstructed from 30 dimensions (625 → 30). (Figure: original images, deep RBM auto-encoder reconstruction, PCA reconstruction.)

Dimensionality reduction: 804,414 Reuters news stories reduced to 2 dimensions. (Figure: PCA embedding vs. deep RBM auto-encoder embedding.)

Uses Classification

Unlabeled data is readily available, for example images from the web: download 10,000,000 images and train a 9-layer DNN; the DNN forms concepts on its own, 70% better than the previous state of the art. (Building High-level Features Using Large Scale Unsupervised Learning, Quoc V. Le, Marc'Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeffrey Dean, and Andrew Y. Ng)

Uses AI

Artificial intelligence: Enduro on the Atari 2600. Expert human player: 368 points; deep learning agent: 661 points. (Playing Atari with Deep Reinforcement Learning, Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller)

Uses Generative (Demo)

How to use it

How to use it: the home page of Geoffrey Hinton (https://www.cs.toronto.edu/~hinton/), the deeplearning.net portal (http://deeplearning.net/), and the Accord.NET framework (http://accord-framework.net/).