Presentation on theme: "Deep Learning."— Presentation transcript:

1 Deep Learning

2 Why?

3 Source: Huang et al., Communications ACM 01/2014

4

5

6 the 2013 International Conference on Learning Representations
the 2013 ICASSP special session on New Types of Deep Neural Network Learning for Speech Recognition and Related Applications
the 2013 ICML Workshop for Audio, Speech, and Language Processing
the 2013 ICML Workshop on Representation Learning Challenges
the 2012, 2011, and 2010 NIPS Workshops on Deep Learning and Unsupervised Feature Learning
the 2012 ICML Workshop on Representation Learning
the ICML Workshop on Learning Architectures, Representations, and Optimization for Speech and Visual Information Processing
the 2009 ICML Workshop on Learning Feature Hierarchies
the 2009 NIPS Workshop on Deep Learning for Speech Recognition and Related Applications
the ICASSP deep learning tutorial
the special section on Deep Learning for Speech and Language Processing in IEEE Trans. Audio, Speech, and Language Processing (January 2012)
the special issue on Learning Deep Architectures in IEEE Trans. Pattern Analysis and Machine Intelligence (2013)

7 Geoffrey Hinton, University of Toronto
"A fast learning algorithm for deep belief nets" -- Hinton et al., 2006
"Reducing the dimensionality of data with neural networks" -- Hinton & Salakhutdinov, 2006
Introduced networks with multiple hidden layers in 1985; University of Toronto, Microsoft Research, Google.
Neuroscience: speech (Baker et al., 2009, 2009a; Deng, 1999, 2003); visual (George, 2008; Bouvrie, 2009; Poggio, 2007).

8 How?

9 Shallow learning
SVM
Linear & kernel regression
Hidden Markov Models (HMM)
Gaussian Mixture Models (GMM)
Single-hidden-layer MLP
...
Limited modeling capability for complex concepts
Cannot make use of unlabeled data

10 Neural Networks
Machine learning: extracting knowledge from high-dimensional data
Classification: the input is a set of features describing the data
Supervised vs. unsupervised: labeled vs. unlabeled data
Neurons as the basic units

11 Multi-Layer Perceptron
Multiple layers, feed-forward, connected by weights
1-of-N output coding
[figure: input units i with [X1, X2, X3], hidden units j reached through weights v_ij, output units k reached through weights w_jk producing [Y1, Y2]]
(A minimal forward-pass sketch follows below.)
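
To make the structure above concrete, here is a minimal forward-pass sketch in numpy. The layer sizes, the weight names v and w, and all values are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
x = np.array([0.2, -0.4, 0.7])            # input vector [X1, X2, X3]
v = rng.normal(scale=0.1, size=(3, 4))    # input -> hidden weights v_ij (4 hidden units assumed)
w = rng.normal(scale=0.1, size=(4, 2))    # hidden -> output weights w_jk

h = sigmoid(x @ v)                        # hidden-layer activations
y = sigmoid(h @ w)                        # 1-of-N output [Y1, Y2]
print(y)
```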

12 Backpropagation
Minimize the error of the calculated output by adjusting the weights
Gradient descent procedure: forward phase, then backpropagation of errors
Applied for each sample, over multiple epochs
[figure: the same network, with error signals propagated back through w_jk and v_ij]
(One training step is sketched below.)
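
The following is a hedged sketch of one backpropagation step for the small network above, using a squared-error loss and plain gradient descent; the learning rate and the target vector are illustrative assumptions.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
x = np.array([0.2, -0.4, 0.7])            # input [X1, X2, X3]
t = np.array([1.0, 0.0])                  # 1-of-N target for this sample
v = rng.normal(scale=0.1, size=(3, 4))    # input -> hidden weights v_ij
w = rng.normal(scale=0.1, size=(4, 2))    # hidden -> output weights w_jk
eta = 0.5                                 # learning rate (assumed)

# Forward phase
h = sigmoid(x @ v)
y = sigmoid(h @ w)

# Backpropagation of errors (derivative of squared error through the sigmoids)
delta_out = (y - t) * y * (1 - y)             # error signal at the output units k
delta_hid = (delta_out @ w.T) * h * (1 - h)   # error signal pushed back to the hidden units j

# Gradient-descent weight adjustment
w -= eta * np.outer(h, delta_out)
v -= eta * np.outer(x, delta_hid)
```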

13 Best Practice
Normalization: prevents very high weights and oscillation
Overfitting/generalisation: use a validation set and early stopping
Mini-batch learning: update the weights with multiple input vectors combined
(A sketch of mini-batch updates with early stopping follows below.)
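
As a rough illustration of the last two points above, here is a sketch of mini-batch updates combined with early stopping on a validation set. A plain linear least-squares model is used so the batching and stopping logic stay visible; all sizes, the learning rate, and the patience value are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
true_w = rng.normal(size=8)
y = X @ true_w + 0.1 * rng.normal(size=1000)
X_train, y_train = X[:800], y[:800]
X_val, y_val = X[800:], y[800:]              # validation set used for early stopping

w = np.zeros(8)
eta, batch_size, patience = 0.01, 32, 5
best_val, best_w = np.inf, w.copy()

for epoch in range(200):
    order = rng.permutation(len(X_train))
    for start in range(0, len(order), batch_size):
        b = order[start:start + batch_size]
        err = X_train[b] @ w - y_train[b]
        w -= eta * X_train[b].T @ err / len(b)   # one update per mini-batch (inputs combined)
    val_error = np.mean((X_val @ w - y_val) ** 2)
    if val_error < best_val:
        best_val, best_w, patience = val_error, w.copy(), 5
    else:
        patience -= 1
        if patience == 0:                        # early stopping: validation error stopped improving
            break
w = best_w
```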

14 Problems with Backpropagation
With multiple hidden layers:
Gets stuck in local optima (weights start from random positions)
Slow convergence to the optimum; a large training set is needed
Uses only labeled data, but most data is unlabeled
→ motivates a generative approach

15 Restricted Boltzmann Machines
Unsupervised: finds complex regularities in the training data
Bipartite graph with a visible and a hidden layer
Binary stochastic units: on/off with some probability
One iteration: update the hidden units, then reconstruct the visible units
Goal: maximum likelihood of the training data
[figure: visible units i connected to hidden units j through weights w_ij]

16 Restricted Boltzmann Machines
Training goal: the most probable reproduction of the (unsupervised) data
Find the latent factors of the data set
Adjust the weights to maximize the probability of the input data
[figure: visible units i, hidden units j, weights w_ij]

17 Training: Contrastive Divergence
Start with a training vector on the visible units.
Update all the hidden units in parallel.
Update all the visible units in parallel to get a "reconstruction".
Update the hidden units again.
[figure: units i and j at t = 0 (data) and t = 1 (reconstruction)]
(A CD-1 sketch follows below.)
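
Below is a minimal sketch of one contrastive-divergence (CD-1) update following the four steps above, for an RBM with binary stochastic units. Bias terms are omitted, and the layer sizes and learning rate are illustrative assumptions.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
n_visible, n_hidden, eta = 256, 50, 0.1
W = rng.normal(scale=0.01, size=(n_visible, n_hidden))   # weights w_ij
v0 = (rng.random(n_visible) < 0.5).astype(float)         # a training vector on the visible units

# 1. Update all the hidden units in parallel (sample binary states)
p_h0 = sigmoid(v0 @ W)
h0 = (rng.random(n_hidden) < p_h0).astype(float)

# 2. Update all the visible units in parallel to get a "reconstruction"
p_v1 = sigmoid(h0 @ W.T)
v1 = (rng.random(n_visible) < p_v1).astype(float)

# 3. Update the hidden units again
p_h1 = sigmoid(v1 @ W)

# 4. Change the weights: increment for (data, feature) co-activation,
#    decrement for the (reconstruction, feature) co-activation
W += eta * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
```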

18 Example: Handwritten 2s
16 x 16 pixel images; 50 binary neurons that learn features
Data (reality): increment the weights between an active pixel and an active feature
Reconstruction: decrement the weights between an active pixel and an active feature

19

20

21

22

23

24 The final 50 x 256 weights: Each unit grabs a different feature

25 Example: Reconstruction
Data and its reconstruction from the activated binary features:
a new test image from the digit class that the model was trained on
an image from an unfamiliar digit class
The network tries to see every image as a 2.

26 Deep Architecture
Backpropagation and RBMs as building blocks; multiple hidden layers
Motivation (why go deep?):
Approximate complex decision boundaries
Fewer computational units for the same functional mapping
Hierarchical learning of increasingly complex features
Works well in different domains: vision, audio, …

27 Hierarchical Learning
Natural progression from low-level to high-level structure, as seen in natural complexity
Easier to monitor what is being learnt and to guide the machine to better subspaces

28 Stacked RBMs
Learn one layer at a time by stacking RBMs: train the first RBM, copy the binary states of its hidden units for each training vector v, train the next RBM on them, and compose the RBM models into a single DBN model.
Treat this as "pre-training" that finds a good initial set of weights, which can then be fine-tuned by a local search procedure.
Backpropagation can be used to fine-tune the model to be better at discrimination.
(A greedy layer-wise pre-training sketch follows below.)
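
As a rough sketch of this greedy layer-by-layer procedure, the snippet below trains one RBM with the CD-1 rule from the earlier sketch, then feeds its hidden activations upward as the data for the next RBM. The data, layer sizes, number of epochs, and learning rate are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_rbm(data, n_hidden, epochs=5, eta=0.1):
    """CD-1 training as in the earlier sketch (biases omitted); returns the weight matrix."""
    n_visible = data.shape[1]
    W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
    for _ in range(epochs):
        for v0 in data:
            p_h0 = sigmoid(v0 @ W)
            h0 = (rng.random(n_hidden) < p_h0).astype(float)
            v1 = sigmoid(h0 @ W.T)                 # reconstruction (probabilities)
            p_h1 = sigmoid(v1 @ W)
            W += eta * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
    return W

# Greedy stacking: the states of one RBM's hidden layer become the data for the next RBM.
data = (rng.random((100, 256)) < 0.5).astype(float)   # stand-in for real training images
layer_sizes = [128, 64, 30]
weights, layer_input = [], data
for n_hidden in layer_sizes:
    W = train_rbm(layer_input, n_hidden)
    weights.append(W)
    layer_input = sigmoid(layer_input @ W)             # copy activations up to train the next RBM

# `weights` now provides the initial DBN weights, to be fine-tuned with backpropagation.
```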

29 Uses Dimensionality reduction

30 Dimensionality reduction
Use a stacked RBM as a deep auto-encoder
Train the RBM with images as both input & output
Limit one layer to a few dimensions → information has to pass through the middle layer
(An encode/decode sketch follows below.)
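
A small sketch of the resulting auto-encoder path: the pre-trained weight matrices act as the encoder, and their transposes, applied in reverse order, act as the decoder, so every image is forced through the narrow middle layer. The weight matrices here are random stand-ins; in practice they would come from the stacked-RBM pre-training and backprop fine-tuning.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Encoder weights, narrowest layer last (e.g. 625 -> 256 -> 64 -> 30); random stand-ins here.
weights = [rng.normal(scale=0.01, size=shape) for shape in [(625, 256), (256, 64), (64, 30)]]

def encode(x, weights):
    for W in weights:
        x = sigmoid(x @ W)
    return x                                # low-dimensional code from the middle layer

def decode(code, weights):
    for W in reversed(weights):
        code = sigmoid(code @ W.T)          # decoder mirrors the encoder with transposed weights
    return code

image = rng.random(625)                     # e.g. a flattened 25x25 face image
reconstruction = decode(encode(image, weights), weights)
```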

31 Dimensionality reduction
Olivetti face data, 25x25 pixel images reconstructed from 30 dimensions (625 → 30)
[figure: original images vs. Deep RBN and PCA reconstructions]

32 Dimensionality reduction
804,414 Reuters news stories, reduced to 2 dimensions
[figure: 2-D embeddings from PCA vs. the Deep RBN]

33 Uses Classification

34 Unlabeled data
Unlabeled data is readily available, e.g. images from the web
Download 10,000,000 images and train a 9-layer DNN
Concepts are formed by the DNN → 70% better than the previous state of the art
"Building High-level Features Using Large Scale Unsupervised Learning" -- Quoc V. Le, Marc'Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeffrey Dean, and Andrew Y. Ng

35 Uses AI

36 Artificial intelligence
Enduro, Atari 2600: expert player 368 points, Deep Learning 661 points
"Playing Atari with Deep Reinforcement Learning" -- Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller

37 Uses Generative (Demo)

38 How to use it

39 How to use it
Home page of Geoffrey Hinton
Portal
Accord.NET

