Slides adapted from Geoffrey Hinton, Yann LeCun, Yoshua Bengio



Deep Learning. Sen Song. Slides adapted from Geoffrey Hinton, Yann LeCun, Yoshua Bengio, and G. A. Tagliarini.

Part 0 Success of Deep Learning

Lecture 10: Deep Learning. Sen Song, 2013/12/14. Adapted from slides by Geoffrey Hinton and Yann LeCun.

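The success of deep learning. In hindsight: brain-inspired algorithms discovered long ago + big data + big computers. But the real history was full of twists and turns...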

The object recognition problem: Selectivity vs Invariance. A key computational issue in the what pathway (object recognition) is the selectivity-invariance trade-off: recognition must be able to finely discriminate between different objects or object classes (such as faces) while being tolerant of object transformations (such as scaling, translation, illumination, changes in viewpoint, and clutter), as well as non-rigid transformations (such as variations in shape within a class), as in the change of facial expression in recognizing faces. Serre and Poggio, 2010

Hierarchy of visual areas. What versus Where pathways. What pathway: 32 cortical areas, 187 linkages. Felleman and Van Essen, 1991

Orientation selectivity of V1 Simple Cell Hubel and Wiesel, 1968

Selectivity emerging: Simple Cell Hubel and Wiesel, 1962

Invariance emerging: Complex Cell Hubel and Wiesel, 1962

The architecture of LeNet5. Handwriting recognition was achieved very early, so why was object recognition not solved sooner?

David Marr

Marr’s approach. 1. Computational theory: What is the goal of the computation, why is it appropriate, and what is the logic of the strategy by which it can be carried out? 2. Representation and algorithm: How can this computational theory be implemented? In particular, what is the representation for the input and output, and what is the algorithm for the transformation? 3. Hardware implementation: How can the representation and algorithm be realized physically? Marr places great importance on the first level: “To phrase the matter in another way, an algorithm is likely to be understood more readily by understanding the nature of the problem being solved than by examining the mechanism (and hardware) in which it is embodied.”

Part I The ups and downs of backpropagation

A brief history of backpropagation. The backpropagation algorithm for learning multiple layers of features was invented several times in the 70s and 80s: Bryson & Ho (1969, linear); Werbos (1974); Rumelhart et al. (1981); Parker (1985); LeCun (1985); Rumelhart et al. (1985). Backpropagation clearly had great promise for learning multiple layers of non-linear feature detectors. But by the late 1990s most serious researchers in machine learning had given up on it. It was still widely used in psychological models and in practical applications such as credit card fraud detection.

Why backpropagation failed. The popular explanation of why backpropagation failed in the 90s: It could not make good use of multiple hidden layers (except in convolutional nets). It did not work well in recurrent networks or deep auto-encoders. Support Vector Machines worked better, required less expertise, produced repeatable results, and had much fancier theory. The real reasons it failed: Computers were thousands of times too slow. Labeled datasets were hundreds of times too small. Deep networks were too small and not initialized sensibly. These issues prevented it from being successful for tasks where it would eventually be a big win.

A spectrum of machine learning tasks, from typical Statistics to Artificial Intelligence. Typical Statistics: Low-dimensional data (e.g. less than 100 dimensions). Lots of noise in the data. Not much structure in the data. The structure can be captured by a fairly simple model. The main problem is separating true structure from noise. Not ideal for non-Bayesian neural nets; try SVM or GP. Artificial Intelligence: High-dimensional data (e.g. more than 100 dimensions). The noise is not the main problem. There is a huge amount of structure in the data, but it's too complicated to be represented by a simple model. The main problem is figuring out a way to represent the complicated structure so that it can be learned. Let backpropagation figure it out.

Why Support Vector Machines were never a good bet for Artificial Intelligence tasks that need good representations. View 1: SVMs are just a clever reincarnation of Perceptrons. They expand the input to a (very large) layer of non-linear non-adaptive features. They only have one layer of adaptive weights. They have a very efficient way of fitting the weights that controls overfitting. View 2: SVMs are just a clever reincarnation of Perceptrons. They use each input vector in the training set to define a non-adaptive “pheature”: the global match between a test input and that training input. They have a clever way of simultaneously doing feature selection and finding weights on the remaining features.

Historical document from AT&T Adaptive Systems Research Dept., Bell Labs

Why we need backpropagation. Networks without hidden units are very limited in the input-output mappings they can model. More layers of linear units do not help: the result is still linear. Fixed output non-linearities are not enough. We need multiple layers of adaptive non-linear hidden units. This gives us a universal approximator. But how can we train such nets? We need an efficient way of adapting all the weights, not just the last layer. This is hard. Learning the weights going into hidden units is equivalent to learning features. Nobody is telling us directly what hidden units should do.
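
The claim that stacking linear layers stays linear can be checked with a few lines of arithmetic. The sketch below is illustrative only: the weight matrices `W1` and `W2` and the input `x` are made-up values, and the helper names are not from the slides. It shows that two linear layers give the same output as the single layer whose weights are the product `W2 W1`.

```python
# Sketch: two stacked linear layers equal one linear layer.
# All weights and inputs are hypothetical values chosen for illustration.

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

W1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]   # hypothetical 2 -> 3 linear layer
W2 = [[1.0, 0.0, 2.0], [-1.0, 1.0, 0.0]]     # hypothetical 3 -> 2 linear layer
W_combined = matmul(W2, W1)                  # one equivalent 2 -> 2 linear layer

x = [0.5, -2.0]
two_layers = matvec(W2, matvec(W1, x))
one_layer = matvec(W_combined, x)
print(two_layers, one_layer)
```

Because the extra layer adds no expressive power, the non-linearity in the hidden units is what matters.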

What perceptrons cannot do. The binary threshold output units cannot even tell if two single-bit numbers are the same! Same: (1,1) → 1; (0,0) → 1. Different: (1,0) → 0; (0,1) → 0. For a unit with weights w1, w2 and threshold θ, the following set of inequalities is impossible: w1 + w2 ≥ θ, 0 ≥ θ, w1 < θ, w2 < θ. [Figure: data space with the points (0,0), (0,1), (1,0), (1,1); a weight plane divides output = 1 from output = 0.] The positive and negative cases cannot be separated by a plane.
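
The impossibility claim above can be probed numerically. The sketch below searches a coarse grid of weights and thresholds for a single binary threshold unit that reproduces the "same" mapping, and for comparison does the same for logical AND. The grid range, the step size, and the function names are assumptions made for illustration. A grid search is not a proof (the inequality argument is), but it makes the contrast concrete.

```python
# Sketch: grid search for a binary threshold unit that outputs 1 when two bits match.
from itertools import product

def threshold_unit(x1, x2, w1, w2, theta):
    """Binary threshold unit: output 1 if the weighted sum reaches the threshold."""
    return 1 if w1 * x1 + w2 * x2 >= theta else 0

def find_weights(target, grid):
    """Return the first (w1, w2, theta) on the grid that reproduces target, else None."""
    for w1, w2, theta in product(grid, repeat=3):
        if all(threshold_unit(x1, x2, w1, w2, theta) == t
               for (x1, x2), t in target.items()):
            return (w1, w2, theta)
    return None

grid = [i / 2 for i in range(-8, 9)]   # illustrative grid: -4.0 to 4.0 in steps of 0.5
same = {(1, 1): 1, (0, 0): 1, (1, 0): 0, (0, 1): 0}
logical_and = {(1, 1): 1, (0, 0): 0, (1, 0): 0, (0, 1): 0}

print(find_weights(same, grid))         # None: no setting on the grid works
print(find_weights(logical_and, grid))  # a working (w1, w2, theta) triple
```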

XOR Problem
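
One layer of hidden threshold units is enough to compute XOR. The sketch below uses hand-chosen weights and thresholds (an OR unit and an AND unit feeding one output unit). These particular values are an illustrative assumption, not taken from the slides, and many other settings work equally well.

```python
# Sketch: XOR from two hidden threshold units with hand-set (hypothetical) weights.

def step(z, theta):
    """Binary threshold: 1 if the total input z reaches theta, else 0."""
    return 1 if z >= theta else 0

def xor_net(x1, x2):
    h_or = step(x1 + x2, 1)        # hidden unit 1: on if at least one input is on
    h_and = step(x1 + x2, 2)       # hidden unit 2: on only if both inputs are on
    return step(h_or - h_and, 1)   # output: "OR but not AND"

table = {(a, b): xor_net(a, b) for a in (0, 1) for b in (0, 1)}
print(table)
```

The hidden units here were designed by hand. Backpropagation addresses the harder problem of finding such features automatically.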

Learning by perturbing weights. Randomly perturb one weight and see if it improves performance. If so, save the change. This is very inefficient: we need to do multiple forward passes on a representative set of training data just to change one weight. Towards the end of learning, large weight perturbations will nearly always make things worse. We could randomly perturb all the weights in parallel and correlate the performance gain with the weight changes. This is not any better, because we need lots of trials to “see” the effect of changing one weight through the noise created by all the others. [Figure: network with input units, hidden units, and output units.] Learning the hidden-to-output weights is easy. Learning the input-to-hidden weights is hard.
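
A minimal sketch of the one-weight-at-a-time procedure, under stated assumptions: a single linear neuron, a made-up toy dataset generated by the target weights (2, -1), and Gaussian perturbations with an arbitrary scale. It is meant to show why the method is wasteful, since every candidate change costs full passes over the data.

```python
# Sketch: learning by randomly perturbing one weight at a time.
# The data, perturbation scale, and trial count are illustrative assumptions.
import random

def loss(weights, data):
    """Squared error of a linear neuron summed over the whole training set."""
    return sum((t - sum(w * x for w, x in zip(weights, xs))) ** 2 for xs, t in data)

def perturb_one_weight(weights, data, rng, scale=0.1):
    """Try a random change to one weight; keep it only if the loss goes down."""
    i = rng.randrange(len(weights))
    trial = list(weights)
    trial[i] += rng.gauss(0.0, scale)
    # Full passes over the data are needed just to judge this single change.
    return trial if loss(trial, data) < loss(weights, data) else weights

rng = random.Random(0)
data = [((x1, x2), 2 * x1 - x2) for x1 in (0, 1, 2) for x2 in (0, 1, 2)]
weights = [0.0, 0.0]
initial = loss(weights, data)
for _ in range(500):
    weights = perturb_one_weight(weights, data, rng)
final = loss(weights, data)
print(initial, final, weights)
```

With only two weights this still works. The cost grows quickly with the number of weights, which is the motivation for computing derivatives directly.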

The idea behind backpropagation. We don’t know what the hidden units ought to do, but we can compute how fast the error changes as we change a hidden activity. Instead of using desired activities to train the hidden units, use error derivatives w.r.t. hidden activities. Each hidden activity can affect many output units and can therefore have many separate effects on the error. These effects must be combined. We can compute error derivatives for all the hidden units efficiently. Once we have the error derivatives for the hidden activities, it’s easy to get the error derivatives for the weights going into a hidden unit.
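
The chain of derivatives described above can be sketched for a tiny network. The assumptions here are not from the slides: logistic units throughout, a squared error E = 0.5 * sum((y - t)^2), no biases, and arbitrary example weights. The last lines compare one backpropagated derivative with a finite-difference estimate as a sanity check.

```python
# Sketch: error derivatives for input-to-hidden weights in a 2-2-2 logistic network.
# Weights, input, and targets are hypothetical values chosen for illustration.
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W_hid, W_out):
    h = [logistic(sum(w * xi for w, xi in zip(row, x))) for row in W_hid]
    y = [logistic(sum(w * hi for w, hi in zip(row, h))) for row in W_out]
    return h, y

def error(x, t, W_hid, W_out):
    _, y = forward(x, W_hid, W_out)
    return 0.5 * sum((yj - tj) ** 2 for yj, tj in zip(y, t))

def input_to_hidden_gradients(x, t, W_hid, W_out):
    h, y = forward(x, W_hid, W_out)
    # Error derivative w.r.t. the total input of each output unit.
    dE_dz_out = [(yj - tj) * yj * (1 - yj) for yj, tj in zip(y, t)]
    # Each hidden activity affects every output unit, so its effects are summed.
    dE_dh = [sum(W_out[j][i] * dE_dz_out[j] for j in range(len(W_out)))
             for i in range(len(h))]
    # From hidden activities to the weights going into each hidden unit.
    dE_dz_hid = [d * hi * (1 - hi) for d, hi in zip(dE_dh, h)]
    return [[d * xk for xk in x] for d in dE_dz_hid]

x, t = [1.0, 0.5], [1.0, 0.0]
W_hid = [[0.1, -0.2], [0.4, 0.3]]
W_out = [[0.5, -0.6], [0.2, 0.7]]
grads = input_to_hidden_gradients(x, t, W_hid, W_out)

# Finite-difference check on the weight W_hid[0][1].
eps = 1e-5
W_plus = [row[:] for row in W_hid]
W_minus = [row[:] for row in W_hid]
W_plus[0][1] += eps
W_minus[0][1] -= eps
numeric = (error(x, t, W_plus, W_out) - error(x, t, W_minus, W_out)) / (2 * eps)
print(grads[0][1], numeric)
```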

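Non-linear neurons with smooth derivatives. For backpropagation, we need neurons that have well-behaved derivatives. Typically they use the logistic function, y = 1 / (1 + e^(-z)), where z is the neuron's total weighted input. The output is a smooth function of the inputs and the weights. [Figure: logistic curve; axis marks at 0.5 and 1.]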
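
A short sketch of the logistic function and its derivative. The function names are illustrative. The derivative with respect to the total input z has the convenient closed form y(1 - y), which is one reason the logistic unit is easy to use with backpropagation.

```python
# Sketch: the logistic function and its derivative dy/dz = y * (1 - y).
import math

def logistic(z):
    """Smooth squashing function with output strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_derivative(z):
    """Derivative of the logistic output with respect to its total input z."""
    y = logistic(z)
    return y * (1.0 - y)

for z in (-4.0, 0.0, 4.0):
    print(z, logistic(z), logistic_derivative(z))
```

The slope is largest at z = 0 and shrinks towards zero for large positive or negative inputs.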

Basic Neuron Model In A Feedforward Network. Inputs x_i arrive through pre-synaptic connections. Synaptic efficacy is modeled using real weights w_i. The response of the neuron is a nonlinear function f of its weighted inputs. Copyright G. A. Tagliarini, PhD 2019/2/18
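
The neuron model above amounts to a weighted sum followed by a non-linearity. In the sketch below the function name `neuron` and the example values are assumptions. The non-linearity `f` is passed in as a parameter to reflect that the slide does not fix a particular choice.

```python
# Sketch: response of one neuron as f applied to the weighted sum of its inputs.
import math

def neuron(inputs, weights, f):
    """Return f(sum_i w_i * x_i) for inputs x_i and real-valued weights w_i."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return f(net)

x = [1.0, 0.0, -1.0]   # hypothetical inputs
w = [0.5, 2.0, 0.5]    # hypothetical weights; the weighted sum here is 0
print(neuron(x, w, math.tanh))
print(neuron(x, w, lambda z: 1.0 / (1.0 + math.exp(-z))))
```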

Feedforward Network Training by Backpropagation: Process Summary
1. Select an architecture.
2. Randomly initialize weights.
3. While error is too large:
   a. Select training pattern and feedforward to find actual network output.
   b. Calculate errors and backpropagate error signals.
   c. Adjust weights.
4. Evaluate performance using the test set.
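
The process summary can be sketched end to end in plain Python. Everything specific below is an assumption made for illustration: a 2-3-1 architecture with logistic units, squared error, a learning rate of 0.5, an error tolerance of 0.01, a cap on the number of epochs, and a made-up toy task (label 1 when x1 + x2 > 1). It is one possible reading of the steps, not the authors' implementation.

```python
# Sketch: online backpropagation training loop for a small feedforward network.
import math
import random

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W_hid, W_out):
    """Feed one pattern forward; a constant 1.0 is appended as a bias input."""
    h = [logistic(sum(w * xi for w, xi in zip(row, x + [1.0]))) for row in W_hid]
    y = logistic(sum(w * hi for w, hi in zip(W_out, h + [1.0])))
    return h, y

def mean_error(data, W_hid, W_out):
    return sum(0.5 * (forward(x, W_hid, W_out)[1] - t) ** 2 for x, t in data) / len(data)

def train(train_set, n_hidden=3, rate=0.5, tolerance=0.01, max_epochs=2000, seed=0):
    rng = random.Random(seed)
    n_in = len(train_set[0][0])
    # Randomly initialize weights (the range is an assumption).
    W_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    W_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    initial_error = mean_error(train_set, W_hid, W_out)
    epochs = 0
    # While error is too large (with an epoch cap so the loop always ends).
    while mean_error(train_set, W_hid, W_out) > tolerance and epochs < max_epochs:
        for x, t in rng.sample(train_set, len(train_set)):  # select training patterns
            h, y = forward(x, W_hid, W_out)                 # feed forward
            delta_out = (y - t) * y * (1 - y)               # output error signal
            # Backpropagate the error signal to each hidden unit.
            delta_hid = [W_out[i] * delta_out * h[i] * (1 - h[i]) for i in range(n_hidden)]
            # Adjust weights.
            for i, hi in enumerate(h + [1.0]):
                W_out[i] -= rate * delta_out * hi
            for i in range(n_hidden):
                for k, xk in enumerate(x + [1.0]):
                    W_hid[i][k] -= rate * delta_hid[i] * xk
        epochs += 1
    return W_hid, W_out, initial_error, mean_error(train_set, W_hid, W_out)

def make_data(n, rng):
    """Hypothetical toy task: label 1.0 when x1 + x2 > 1, else 0.0."""
    points = [[rng.random(), rng.random()] for _ in range(n)]
    return [(p, 1.0 if p[0] + p[1] > 1.0 else 0.0) for p in points]

data_rng = random.Random(1)
train_set, test_set = make_data(40, data_rng), make_data(20, data_rng)
W_hid, W_out, initial_error, final_error = train(train_set)

# Evaluate performance using the test set.
correct = sum((forward(x, W_hid, W_out)[1] > 0.5) == (t == 1.0) for x, t in test_set)
test_accuracy = correct / len(test_set)
print(initial_error, final_error, test_accuracy)
```

Keeping the test set separate from the training patterns matters: the loop only ever sees `train_set`, so `test_accuracy` measures generalization rather than memorization.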