Introduction to Deep Learning


1 Introduction to Deep Learning

2 What is DL? (1)
“...a class of machine learning techniques, developed mainly since 2006, where many layers of non-linear information processing stages or hierarchical architectures are exploited.” Deep learning is distinguished by its learning of multiple levels of features: a hierarchy of features. These multiple levels of representation correspond to multiple levels of abstraction.

3 What is DL? (2)

4 What is DL? (3)
So basically... we are interested in learning representations of data that are useful. We want to do this automatically, because designing features by hand is hard. Representation learning is the field of study concerned with automatically finding features from data.

5 What is DL? (4)

6 A (very brief) history (1)
The first artificial neuron was the Threshold Logic Unit (or binary threshold neuron) of McCulloch and Pitts. Hebbian learning: weights (synapses) adapt during learning; these are physical changes in the brain. This led to the Perceptron learning rule and the Widrow-Hoff learning rule. Early implementations at Stanford: Adaline and Madaline ([Many] Adaptive Linear Element[s]). 1960s: workable AI just 10 years away! Spoiler alert: it wasn't.

7 McCulloch Pitts Neuron Model
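The McCulloch-Pitts unit can be sketched in a few lines: it fires (outputs 1) exactly when the weighted sum of its binary inputs reaches a threshold. A minimal illustrative sketch (the function name and the example values are ours, not from the slide):

```python
# McCulloch-Pitts threshold logic unit: fire iff the weighted input sum
# reaches the threshold.
def tlu(inputs, weights, threshold):
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# With unit weights and threshold 2, the unit computes logical AND:
print(tlu([1, 1], [1, 1], 2))  # -> 1
print(tlu([1, 0], [1, 1], 2))  # -> 0
```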

8 A (very brief) history (2)
Problems: a single-layer network can't learn, or even represent, XOR and XNOR; it can't discriminate between classes that are not linearly separable [Minsky and Papert, 1969]. To overcome these limitations, we need an extra layer in the network.
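To see why the extra layer helps, here is a sketch with hand-picked (not learned) weights: two hidden threshold units compute OR and NAND, and an output unit ANDs them, which is exactly XOR. The weights are illustrative, chosen by hand.

```python
def step(z):
    return 1 if z >= 0 else 0

def xor_net(x1, x2):
    # Hidden layer: h1 = OR(x1, x2), h2 = NAND(x1, x2)
    h1 = step(x1 + x2 - 0.5)
    h2 = step(-x1 - x2 + 1.5)
    # Output: AND(h1, h2) = XOR(x1, x2)
    return step(h1 + h2 - 1.5)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_net(a, b))  # outputs 0, 1, 1, 0
```

No single threshold unit can produce this truth table, since the positive examples (0,1) and (1,0) cannot be separated from (0,0) and (1,1) by one line.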

9 A (very brief) history (3)
Backpropagation! Discovered in the 1970s, then forgotten. Rediscovered in the 1980s (Hinton and colleagues), bringing hype and excitement, followed by many failures in industry: the "AI Winter", and SVMs took over.

10 A (very brief) history (4)
Training deep architectures met mostly with failure. (Why should we care about deep architectures anyway?) Then, in 2006, came three key papers: "A Fast Learning Algorithm for Deep Belief Nets", "Efficient Learning of Sparse Representations with an Energy-Based Model", and "Greedy Layer-Wise Training of Deep Networks". Since these papers, hundreds more have been published.

11 Why do we want deep nets? (1)
Artificial neural nets with one hidden layer can approximate any* function. However, the number of nodes required to do so can grow very quickly. For some function classes, a network with (k-1) layers would need a number of nodes exponential in the number of inputs, whilst a k-layer network would need only polynomially many (the parity function is an example of this). *under some assumptions
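The parity example can be made concrete. An n-bit parity function is just a chain (or balanced tree) of two-input XOR gates, so a deep circuit needs only about n gates, while flattening it into two layers blows up in size. A minimal sketch of the deep computation:

```python
from functools import reduce
from operator import xor

def parity(bits):
    # Deep computation: fold XOR over the bits, one two-input gate at a time.
    return reduce(xor, bits, 0)

print(parity([1, 0, 1, 1]))        # three ones -> 1
print(parity([1, 0, 1, 0, 1, 1]))  # four ones  -> 0
```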

12 Why do we want deep nets? (2)
The brain has a deep architecture. Cognitive processes seem deep: humans organize their thoughts hierarchically, first learning simple concepts and then composing them into more difficult ones.

13 Why is training so difficult?
Before 2006, training deep networks yielded worse results than shallow ones (with the exception of convolutional neural nets). Until then, researchers were randomly initializing weights and then training on a labeled training set. This was unsuccessful, for several reasons: Scarcity of labeled data. Bad local optima: minimizing the error involves optimizing a highly non-convex function, with not only local minima but saddle points. Diffusion of gradients: when error derivatives are propagated back, the gradients rapidly diminish as the depth of the network increases.
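The "diffusion of gradients" point can be illustrated numerically. Under the simplifying assumption of a chain of sigmoid units with unit weights, each layer multiplies the backpropagated gradient by a sigmoid derivative, which is at most 0.25, so the gradient shrinks geometrically with depth:

```python
import math

def sigmoid_deriv(z):
    # Derivative of the logistic sigmoid: s(z) * (1 - s(z)), maximal (0.25) at z = 0.
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

grad = 1.0
for depth in range(1, 11):
    grad *= sigmoid_deriv(0.0)  # best case: 0.25 per layer
    print(f"depth {depth}: gradient factor ~ {grad:.2e}")
# last line: depth 10: gradient factor ~ 9.54e-07
```

Even in this best case the gradient reaching layer 10 is about a millionth of the output-layer gradient, which is why the lower layers of a randomly initialized deep sigmoid net barely learn.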

14 What to do?
2006: three important papers, spearheaded by Hinton et al.'s "A fast learning algorithm for deep belief nets". Three key principles in the papers: Unsupervised learning of representations is used to pre-train each layer. Unsupervised learning proceeds one layer at a time, learning representations from the features learned in the previous layer. Supervised learning is then used to fine-tune all the layers.

15 Greedy Layerwise learning
Unsupervised learning is about finding structure in data. Here it is used to initialize the parameters of the hidden layers. It turns out this procedure initializes the weights in a region near good local minima. We want to learn not only a good classifier, but also something about the structure of the input. Why "greedy"? Each layer is trained in turn on the output of the previous one (point 2 of the previous slide), without revisiting earlier layers.
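The greedy procedure can be sketched as follows. This is illustrative only: it stacks tiny linear autoencoders trained by SGD rather than the RBMs of the original papers, and all names, data, and hyperparameters are ours.

```python
import random

def encode(W, x):
    # Linear hidden code h = W x (kept linear only to keep the sketch short).
    return [sum(row[i] * x[i] for i in range(len(x))) for row in W]

def train_layer(data, n_hidden, lr=0.05, epochs=50, seed=0):
    """Fit a small linear autoencoder (encoder W, decoder V) by SGD on
    squared reconstruction error; stands in for one unsupervised stage."""
    rng = random.Random(seed)
    n_in = len(data[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    for _ in range(epochs):
        for x in data:
            h = encode(W, x)
            x_hat = [sum(V[j][i] * h[j] for j in range(n_hidden)) for i in range(n_in)]
            err = [x_hat[i] - x[i] for i in range(n_in)]
            for j in range(n_hidden):
                back = sum(err[i] * V[j][i] for i in range(n_in))
                for i in range(n_in):
                    V[j][i] -= lr * err[i] * h[j]   # decoder gradient
                    W[j][i] -= lr * back * x[i]     # encoder gradient
    return W, V

def reconstruction_loss(W, V, data):
    total = 0.0
    for x in data:
        h = encode(W, x)
        x_hat = [sum(V[j][i] * h[j] for j in range(len(h))) for i in range(len(x))]
        total += sum((x_hat[i] - x[i]) ** 2 for i in range(len(x)))
    return total

def greedy_pretrain(data, layer_sizes):
    """Train layers bottom-up; each new layer learns from the codes of the
    one below it. Supervised fine-tuning of the whole stack would follow."""
    encoders, inputs = [], data
    for n_hidden in layer_sizes:
        W, V = train_layer(inputs, n_hidden)
        encoders.append(W)
        inputs = [encode(W, x) for x in inputs]  # codes feed the next layer
    return encoders

data = [[0.5, -0.2, 0.1], [0.9, 0.4, -0.3], [-0.6, 0.2, 0.7], [0.1, -0.8, 0.4]]
encoders = greedy_pretrain(data, [2, 2])
print(len(encoders), "pretrained layers")
```

Only the stacking control flow matters here: each stage sees its input, never the labels, which is exactly the "greedy, layer-wise, unsupervised" part; the learned encoders then initialize a deep net for supervised fine-tuning.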

16 Deep Belief Networks
The idea of unsupervised pretraining came from Hinton's work on deep belief networks. These use Restricted Boltzmann Machines as the building blocks. Train two layers at a time and ignore the rest; use the features learned in the previous layer to train the next layer.
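The building block can be sketched as a hedged, illustrative CD-1 (one-step contrastive divergence) update for a binary RBM. Sizes, learning rate, and the toy patterns below are ours; the "restricted" structure (no within-layer connections) is what makes units in each layer conditionally independent given the other layer.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_probs(v, W, c):
    # p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i W[i][j])
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]

def visible_probs(h, W, b):
    # p(v_i = 1 | h) = sigmoid(b_i + sum_j h_j W[i][j])
    return [sigmoid(b[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(b))]

def sample(probs, rng):
    return [1 if rng.random() < p else 0 for p in probs]

def cd1_update(v0, W, b, c, lr, rng):
    """One CD-1 step: positive phase on the data, one Gibbs step for the
    negative phase, then move weights toward <v h>_data - <v h>_recon."""
    ph0 = hidden_probs(v0, W, c)
    h0 = sample(ph0, rng)
    pv1 = visible_probs(h0, W, b)
    v1 = sample(pv1, rng)
    ph1 = hidden_probs(v1, W, c)
    for i in range(len(v0)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
    for i in range(len(v0)):
        b[i] += lr * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])

rng = random.Random(0)
n_vis, n_hid = 4, 2
W = [[rng.uniform(-0.1, 0.1) for _ in range(n_hid)] for _ in range(n_vis)]
b = [0.0] * n_vis
c = [0.0] * n_hid
for _ in range(100):
    for v in ([1, 1, 0, 0], [0, 0, 1, 1]):
        cd1_update(v, W, b, c, 0.1, rng)
print(hidden_probs([1, 1, 0, 0], W, c))  # hidden code for a training pattern
```

In a DBN these hidden activations would then serve as the "data" for the next RBM up the stack.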

17 Different Architectures
The paper by Deng [1] outlines three broad classes of deep learning architectures: Generative Discriminative Hybrid [1] Li Deng. Three classes of deep learning architectures and their applications: A tutorial survey.

18 Generative “...are intended to characterize the high-order correlation properties of the observed or visible data for pattern analysis or synthesis purposes, and/or characterize the joint statistical distributions of the visible data and their associated classes...” A generative architecture can still be used as a discriminative one via Bayes' rule.

19 Discriminative “...are intended to directly provide discriminative power for pattern classification, often by characterizing the posterior distributions of classes conditioned on the visible data...”

20 Hybrid “...the goal is discrimination but is assisted with the outcomes of generative architectures, or discriminative criteria are used to learn the parameters in any of the deep generative models...”

21 Applications
ernrecognition.html “...our NN achieved 0.56% error rate in the IJCNN Traffic Sign Recognition Competition of INI/RUB [14,14b]. Humans got 1.16% on average (over 2 times worse; some humans will do better than that, though)...” Single-image, multi-class classification problem. More than 40 classes. More than 50,000 images in total. Large, lifelike database.

22 Applications
“...has already been put to use in services like Apple’s Siri virtual personal assistant, which is based on Nuance Communications’ speech recognition service, and in Google’s Street View, which uses machine vision to identify specific addresses” deep-learning-a-part-of-artificial-intelligence.html?hpw&pagewanted=all

