Deep Learning: Back To The Future

Hinton NIPS 2012 Talk Slide (More Or Less)
What was hot in 1987: neural networks.
What happened in ML since 1987: computers got faster; larger data sets became available.
What is hot 25 years later: neural networks … but they are informed by graphical models!

Brief History Of Machine Learning
1960s: Perceptrons
1969: Minsky & Papert book
Neural Nets and Back Propagation
Support-Vector Machines
Bayesian Models
Deep Networks

What My Lecture Looked Like In 1987

The Limitations Of Two-Layer Networks
Many problems can't be learned without a layer of intermediate or hidden units.
Problem: where does the training signal come from? The teacher specifies target outputs, not target hidden-unit activities.
If you could learn the input->hidden and hidden->output connections, you could learn new representations! But how do hidden units get an error signal?
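
To make the error-signal question concrete, here is a minimal sketch (illustrative only, not from the slides) of how backpropagation answers it: the output error is pushed back through the hidden-to-output weights by the chain rule, which is exactly the error signal the hidden units otherwise lack. All sizes and values below are arbitrary.

```python
# Sketch: how backprop gives hidden units an error signal in a
# one-hidden-layer network, via the chain rule.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy network: 3 inputs -> 4 hidden units -> 1 output
W1 = rng.normal(scale=0.5, size=(4, 3))
W2 = rng.normal(scale=0.5, size=(1, 4))

x = rng.normal(size=3)
target = np.array([1.0])

# Forward pass
h = sigmoid(W1 @ x)          # hidden activities
y = sigmoid(W2 @ h)          # output

# Backward pass: the teacher only specifies the output error ...
delta_out = (y - target) * y * (1 - y)           # output-layer error signal
# ... but the chain rule propagates it back to the hidden units.
delta_hidden = (W2.T @ delta_out) * h * (1 - h)  # hidden-layer error signal

# Gradients for both weight matrices
grad_W2 = np.outer(delta_out, h)
grad_W1 = np.outer(delta_hidden, x)
print("hidden-unit error signal:", delta_hidden)
```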

Why Stop At One Hidden Layer?
E.g., a vision hierarchy for recognizing handprinted text:
Word (output layer)
Character (hidden layer 3)
Stroke (hidden layer 2)
Edge (hidden layer 1)
Pixel (input layer)

Demos: Yann LeCun's LeNet-5

Why Deeply Layered Networks Fail
Credit assignment problem:
How is a neuron in layer 2 supposed to know what it should output until all the neurons above it do something sensible?
How is a neuron in layer 4 supposed to know what it should output until all the neurons below it do something sensible?
Mathematical manifestation: error gradients get squashed as they are passed back through a deep network.
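
A small numerical sketch of the squashing effect, assuming a plain sigmoid network with illustrative sizes (nothing here is from the slides): each layer multiplies the backpropagated error by the weights and by the sigmoid derivative (at most 0.25), so the error norm shrinks roughly geometrically with depth.

```python
# Sketch: gradients getting squashed as they are passed back
# through many sigmoid layers.
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_units, n_layers = 50, 10
weights = [rng.normal(scale=0.1, size=(n_units, n_units)) for _ in range(n_layers)]

# Forward pass, remembering every layer's activity
activations = [rng.normal(size=n_units)]
for W in weights:
    activations.append(sigmoid(W @ activations[-1]))

# Backward pass, watching the error signal shrink layer by layer
delta = rng.normal(size=n_units) * activations[-1] * (1 - activations[-1])
for i in range(n_layers - 1, 0, -1):
    delta = (weights[i].T @ delta) * activations[i] * (1 - activations[i])
    print(f"layer {i}: gradient norm {np.linalg.norm(delta):.3e}")
```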

Solution
Traditional method of training: random initial weights.
Alternative: do unsupervised learning layer by layer to get the weights into a sensible configuration for the statistics of the input. Then, when the net is trained in a supervised fashion, credit assignment will be easier.

Autoencoder Networks
Self-supervised training procedure: given a set of input vectors (no target outputs), map the input back to itself via a hidden-layer bottleneck.
How to achieve the bottleneck?
Fewer neurons
Sparsity constraint
Information transmission constraint (e.g., add noise to a unit, or shut it off randomly, a.k.a. dropout)
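
As a rough sketch of that self-supervised procedure (sizes, data, and learning rate are arbitrary assumptions), the following trains a single bottleneck autoencoder by backpropagating the reconstruction error, with the input serving as its own target.

```python
# Sketch: a minimal bottleneck autoencoder trained to reconstruct its input.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_in, n_hidden = 20, 5            # fewer hidden units than inputs = bottleneck
W_enc = rng.normal(scale=0.1, size=(n_hidden, n_in))
W_dec = rng.normal(scale=0.1, size=(n_in, n_hidden))
lr = 0.1

X = rng.random((500, n_in))       # unlabeled input vectors

for epoch in range(30):
    for x in X:
        h = sigmoid(W_enc @ x)            # encoder
        x_hat = sigmoid(W_dec @ h)        # decoder: reconstruct the input
        err = x_hat - x                   # the input is its own target
        # Backprop of the squared reconstruction error
        delta_dec = err * x_hat * (1 - x_hat)
        delta_enc = (W_dec.T @ delta_dec) * h * (1 - h)
        W_dec -= lr * np.outer(delta_dec, h)
        W_enc -= lr * np.outer(delta_enc, x)

recon = sigmoid(W_dec @ sigmoid(W_enc @ X.T))
print("mean squared reconstruction error:", np.mean((recon - X.T) ** 2))
```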

Autoencoder Combines An Encoder And A Decoder
[figure: encoder and decoder]

Stacked Autoencoders
Note that the decoders can be stacked to produce a generative model of the domain.
[figure: autoencoder layers stacked into a deep network]
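
A sketch of the greedy layer-wise recipe described above, under assumed sizes and hyperparameters: each layer is pretrained as an autoencoder on the representation produced by the layer below, and the resulting encoder weights then initialize the deep network before any supervised fine-tuning.

```python
# Sketch: greedy layer-wise pretraining with autoencoders, as an
# initialization for later supervised fine-tuning.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def pretrain_autoencoder(X, n_hidden, lr=0.1, epochs=20):
    """Train one autoencoder layer on X and return its encoder weights."""
    n_in = X.shape[1]
    W_enc = rng.normal(scale=0.1, size=(n_hidden, n_in))
    W_dec = rng.normal(scale=0.1, size=(n_in, n_hidden))
    for _ in range(epochs):
        for x in X:
            h = sigmoid(W_enc @ x)
            x_hat = sigmoid(W_dec @ h)
            d_dec = (x_hat - x) * x_hat * (1 - x_hat)
            d_enc = (W_dec.T @ d_dec) * h * (1 - h)
            W_dec -= lr * np.outer(d_dec, h)
            W_enc -= lr * np.outer(d_enc, x)
    return W_enc

X = rng.random((200, 30))         # unlabeled data
layer_sizes = [20, 10, 5]         # sizes of the stacked hidden layers

# 1) Pretrain each layer on the representation produced by the layer below.
weights, H = [], X
for n_hidden in layer_sizes:
    W = pretrain_autoencoder(H, n_hidden)
    weights.append(W)
    H = sigmoid(H @ W.T)          # feed the data upward to train the next layer

# 2) These weights now initialize a deep net; supervised backprop
#    (fine-tuning) would start from here instead of from random weights,
#    which makes credit assignment easier.
print("pretrained layer shapes:", [W.shape for W in weights])
```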

Neural Net Can Be Viewed As A Graphical Model
Deterministic neuron vs. stochastic neuron
[figure: inputs x1, x2, x3, x4 feeding a single unit y]
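
The distinction between the two neuron types fits in a couple of lines; this sketch uses arbitrary weights and inputs (the x1..x4 and y are just the labels from the figure).

```python
# Sketch: deterministic vs. stochastic neuron with the same weights and inputs.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

w = np.array([0.5, -1.0, 2.0, 0.3])   # weights from x1..x4 to y
x = np.array([1.0, 0.0, 1.0, 1.0])    # input vector

p = sigmoid(w @ x)

y_deterministic = p                    # output the real-valued activation
y_stochastic = rng.random() < p        # output 1 with probability p, else 0

print(f"P(y=1) = {p:.3f}, deterministic y = {y_deterministic:.3f}, "
      f"stochastic y = {int(y_stochastic)}")
```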

Boltzmann Machine (Hinton & Sejnowski, circa 1985)
Undirected graphical model: each node is a stochastic neuron, with a potential function defined on each pair of neurons.
Algorithms were developed for doing inference for special cases of the architecture, e.g., the Restricted Boltzmann Machine:
2 layers
Completely interconnected between layers
No connections within a layer

Punch Line
A deep network can be implemented as a multilayer restricted Boltzmann machine.
Sequential layer-to-layer training procedure.
Training requires probabilistic inference; update rule: contrastive divergence.
Different research groups prefer different neural substrates, but it doesn't really matter whether you use a deterministic neural net or an RBM.
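
For reference, here is a sketch of one contrastive-divergence (CD-1) update for a two-layer RBM, in its standard textbook form rather than anything taken from these slides; biases are omitted and the sizes are arbitrary.

```python
# Sketch: a single CD-1 weight update for a restricted Boltzmann machine.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_visible, n_hidden = 6, 3
W = rng.normal(scale=0.1, size=(n_hidden, n_visible))   # bipartite weights only
lr = 0.1

v0 = rng.integers(0, 2, size=n_visible).astype(float)   # a binary data vector

# Positive phase: sample the hidden units given the data
p_h0 = sigmoid(W @ v0)
h0 = (rng.random(n_hidden) < p_h0).astype(float)

# Negative phase: one step of Gibbs sampling
# (reconstruct the visibles, then recompute the hidden probabilities)
p_v1 = sigmoid(W.T @ h0)
v1 = (rng.random(n_visible) < p_v1).astype(float)
p_h1 = sigmoid(W @ v1)

# Contrastive-divergence update: <v h>_data - <v h>_reconstruction
W += lr * (np.outer(p_h0, v0) - np.outer(p_h1, v1))
```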

From Ng's group

Sutskever, Martens, & Hinton (2011): Generating Text From A Deep Belief Net
Wikipedia sample: The meaning of life is the tradition of the ancient human reproduction: it is less favorable to the good boy for when to remove her bigger. In the shows agreement unanimously resurfaced. The wild pasteured with consistent street forests were incorporated by the 15th century BE. In 1996 the primary rapford undergoes an effort that the reserve conditioning, written into Jewish cities, sleepers to incorporate the.St Eurasia that activates the population. María Nationale, Kelli, Zedlat-Dukastoe, Florendon, Ptus thought is. To adapt in most parts of North America, the dynamic fairy Dan please believes, the free speech are much related to the NYT while he was giving attention to the second advantage of school building a 2-for-2 stool killed by the Cultures saddled with a half-suit defending the Bharatiya Fernall s office. Ms. Claire Parters will also have a history temple for him to raise jobs until naked Prodiena to paint baseball partners, provided people to ride both of Manhattan in 1978, but what was largely directed to China in 1946, focusing on the trademark period is the sailboat yesterday and comments on whom they obtain overheard within the 120th anniversary, where many civil rights defined, officials said early that forms, said Bernard J. Marco Jr. of Pennsylvania, was monitoring New York

2013 News
No need to use unsupervised training or probabilistic models if…
you use clever tricks of the neural net trade, i.e., back propagation with deep networks plus:
rectified linear units
dropout
weight maxima
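
A sketch of two of those tricks in a single forward pass, with assumed layer sizes and a 50% drop rate: rectified linear units as the nonlinearity, and (inverted) dropout that randomly shuts units off during training.

```python
# Sketch: ReLU layers with dropout during training, no dropout at test time.
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def forward(x, weights, drop_prob=0.5, training=True):
    """Forward pass through ReLU layers with (inverted) dropout."""
    a = x
    for W in weights:
        a = relu(W @ a)
        if training:
            mask = rng.random(a.shape) > drop_prob
            # Scale surviving units so expected activity matches test time
            a = a * mask / (1.0 - drop_prob)
    return a

weights = [rng.normal(scale=0.1, size=(100, 100)) for _ in range(4)]
x = rng.normal(size=100)
print("train output norm:", np.linalg.norm(forward(x, weights, training=True)))
print("test output norm: ", np.linalg.norm(forward(x, weights, training=False)))
```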

Krizhevsky, Sutskever, & Hinton (2012)
ImageNet competition: 15M images in 22k categories; for the contest, 1.2M images in 1k categories.
Classification: can you name the object in 5 guesses?

2012 Results [figure]
2013: down to 11% error