Martin Schrimpf & Jon Gauthier MIT BCS Peer Lectures

Presentation transcript:

Deep Learning
Martin Schrimpf & Jon Gauthier, MIT BCS Peer Lectures

Motivation: AlphaGo Zero improves over the best humans within 40 days of training

A neuron
[Diagram: dendrites carry inputs x1, x2, x3 with weights w1, w2, w3 into the cell body (∑, f); the axon/synapses carry the output y]
y = f(w1 • x1 + w2 • x2 + w3 • x3) = f(w • x)
x: input vector, w: weight vector, ∑: summation, f: activation function, y: output
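A minimal NumPy sketch of this neuron (illustration only, not from the slides; the specific numbers and the tanh activation are arbitrary choices):

```python
import numpy as np

def neuron(x, w, f=np.tanh):
    """A single neuron: weighted sum of the inputs, passed through activation f."""
    return f(np.dot(w, x))

x = np.array([0.5, -1.0, 2.0])   # inputs x1, x2, x3
w = np.array([0.1, 0.4, -0.3])   # weights w1, w2, w3
y = neuron(x, w)                 # y = f(w · x)
```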

A few neurons (a layer)
[Diagram: three neurons, each receiving all inputs x1, x2, x3 through its own weights wi,1, wi,2, wi,3 and producing outputs y1, y2, y3]
y1 = f(w1 • x); stacking the weight vectors into W = [w1, w2, w3] gives y = f(W • x), where W is now a weight matrix.
This is 1 layer, “fully connected”.

A few stacked layers
[Diagram: three fully connected layers fc1, fc2, fc3 of neurons mapping input X to output Y]
fc1 = f(W1 • X)
fc2 = f(W2 • fc1) = f(W2 • f(W1 • X))
Y = fc3 = f(W3 • fc2) = f(W3 • f(W2 • f(W1 • X)))
Each layer usually contains thousands to hundreds of thousands of neurons. Combine multiple input vectors x1, x2, …, xn into an input matrix X, which yields an output matrix Y.
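A short sketch of this stacked forward pass (the layer sizes and activation are made up for illustration):

```python
import numpy as np

f = np.tanh                      # activation function
rng = np.random.default_rng(0)

# Illustrative sizes: 3 inputs -> 5 hidden -> 4 hidden -> 2 outputs.
W1 = rng.normal(size=(5, 3))
W2 = rng.normal(size=(4, 5))
W3 = rng.normal(size=(2, 4))

X = rng.normal(size=(3, 10))     # 10 input vectors stacked into a matrix

fc1 = f(W1 @ X)                  # fc1 = f(W1 · X)
fc2 = f(W2 @ fc1)                # fc2 = f(W2 · fc1)
Y = f(W3 @ fc2)                  # Y = fc3 = f(W3 · fc2), shape (2, 10)
```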

A few more (the real world): AlexNet, VGG16, InceptionV3 / GoogLeNet, ResNet

How to train W? 1. Loss functions
Interchangeable terms: loss function = cost function = −(objective function)
In supervised learning, the loss function describes the deviation between predictions Ŷ and ground-truth outputs Y. Pick an objective that best describes prediction error for your task!
Regression: L2 loss / mean squared error
Classification: cross-entropy (relative entropy)
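A minimal NumPy sketch of the two losses mentioned above (illustration only; the toy numbers are arbitrary):

```python
import numpy as np

def mse_loss(y_pred, y_true):
    """Mean squared error (L2 loss), typical for regression."""
    return np.mean((y_pred - y_true) ** 2)

def cross_entropy_loss(probs, labels):
    """Cross-entropy for classification: probs holds predicted class
    probabilities (rows sum to 1), labels holds integer class indices."""
    eps = 1e-12  # avoid log(0)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + eps))

print(mse_loss(np.array([1.5, 0.2]), np.array([1.0, 0.0])))    # regression
print(cross_entropy_loss(np.array([[0.7, 0.3], [0.1, 0.9]]),
                         np.array([0, 1])))                    # classification
```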

How to train W? 2. Follow the gradient
If our network is completely linear, we can maximize the objective using basic calculus. (This yields linear regression or logistic regression.) Nonlinearities create non-convexity (saddle points and local optima) in the objective, but gradient descent, a popular convex-optimization tool, still works!

Gradient descent
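A minimal sketch of the gradient descent loop on a toy linear least-squares problem (illustration only; the data and learning rate are made up):

```python
import numpy as np

# Toy problem: find w that minimizes L(w) = mean((X @ w - y)^2).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)                              # initial guess
lr = 0.1                                     # learning rate (step size)
for step in range(200):
    grad = 2 * X.T @ (X @ w - y) / len(y)    # dL/dw
    w -= lr * grad                           # step against the gradient
print(w)                                     # approaches true_w
```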

How to train W with many layers? 3. Backpropagation! [Werbos+ 1982; Parker+ 1985; LeCun+ 1985; Rumelhart, Hinton+ 1986; …]
Considering a single weight j at neuron i (w_ij): given the error E, the neuron’s output o_i, and the learning rate η, the weight change is
Δw_ij = −η · ∂E/∂w_ij (step against the gradient to minimize E),
where the chain rule expands the gradient through the network, e.g. ∂E/∂w_ij = (∂E/∂o_i) · (∂o_i/∂w_ij); this is the backprop part.
The Wikipedia article on this is pretty good! https://en.wikipedia.org/wiki/Backpropagation
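To make the chain rule concrete, a hand-rolled sketch of backprop for one hidden layer (an illustration with made-up sizes and a sigmoid activation, not code from the lecture):

```python
import numpy as np

def f(z):                                     # sigmoid activation
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=3)                        # input vector
y = np.array([1.0])                           # target output
W1 = rng.normal(size=(4, 3)) * 0.1            # input -> hidden weights
W2 = rng.normal(size=(1, 4)) * 0.1            # hidden -> output weights
eta = 0.5                                     # learning rate

for step in range(1000):
    # Forward pass
    h = f(W1 @ x)                             # hidden outputs o_i
    y_hat = W2 @ h                            # prediction
    E = 0.5 * np.sum((y_hat - y) ** 2)        # error

    # Backward pass: apply the chain rule layer by layer
    dE_dyhat = y_hat - y                      # ∂E/∂ŷ
    dE_dW2 = np.outer(dE_dyhat, h)            # ∂E/∂W2
    dE_dh = W2.T @ dE_dyhat                   # error propagated back to hidden layer
    dE_dW1 = np.outer(dE_dh * h * (1 - h), x) # sigmoid derivative is h * (1 - h)

    # Weight change: Δw = −η ∂E/∂w
    W2 -= eta * dE_dW2
    W1 -= eta * dE_dW1
```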

Oodles of ways to follow the gradient
Big ideas:
Use the gradient to specify an acceleration rather than a velocity (Momentum, NAG, RMSProp; sketched below)
Optimize with a memory of past updates (Adagrad, Adadelta)
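To make the “velocity” idea concrete, a sketch of the classical momentum update (standard textbook form, not taken from the slides; the gradient here is a stand-in):

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.01, beta=0.9):
    """Momentum: the gradient nudges a running velocity, and the velocity
    (not the raw gradient) is what moves the weights."""
    velocity = beta * velocity - lr * grad
    return w + velocity, velocity

# Plain SGD for comparison would be: w = w - lr * grad
w, v = np.zeros(3), np.zeros(3)
grad = np.array([1.0, -0.5, 0.2])   # pretend this came from backprop
for _ in range(5):
    w, v = sgd_momentum_step(w, grad, v)
```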

Initialization: where do the weights come from? What happens if we initialize the weights to zero? What happens if we initialize the weights to the same number? What happens if we initialize the weights to numbers of different magnitude?
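One way to explore these questions empirically (a sketch with made-up layer sizes; the variance-scaled scheme is one common choice, not necessarily the one used in the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 256, 128

W_zero = np.zeros((n_out, n_in))                   # every neuron computes the same thing
W_const = np.full((n_out, n_in), 0.01)             # same problem: symmetry is never broken
W_tiny = rng.normal(0, 1e-6, (n_out, n_in))        # random but so small that signals shrink
W_scaled = rng.normal(0, np.sqrt(1.0 / n_in),      # variance-scaled random init keeps the
                      (n_out, n_in))               # activation magnitude roughly constant

x = rng.normal(size=n_in)
for name, W in [("zero", W_zero), ("const", W_const),
                ("tiny", W_tiny), ("scaled", W_scaled)]:
    print(name, np.std(W @ x))                     # spread of pre-activations per scheme
```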

Convolutions (ConvNets/CNNs)
Fully-connected layers can be hard to train due to the number of weights.
Idea: if a feature (e.g. an edge) at the bottom left of the image is important, it is also important at all other locations → apply the same weights (“filters”) across space.
Josh McDermott and Google also use this in audio.
http://cs231n.github.io/convolutional-networks/#conv
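A small PyTorch sketch of the weight-sharing argument (assumed sizes; illustration only): the conv layer reuses the same 3×3 filters at every location, so it needs far fewer weights than a fully-connected layer on the same image.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
fc = nn.Linear(3 * 224 * 224, 16)          # fully-connected layer on the same input

images = torch.randn(8, 3, 224, 224)       # batch of 8 RGB images
features = conv(images)                    # shape: (8, 16, 224, 224)

print(sum(p.numel() for p in conv.parameters()))   # 16*3*3*3 + 16 = 448 weights
print(sum(p.numel() for p in fc.parameters()))     # ~2.4 million weights
```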

Recurrence (RNNs)
Used to handle variable-length input (e.g. text in Natural Language Processing)
Basically fully-connected layers with weights shared over time
More advanced RNNs with explicit memory: LSTM, GRU
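A sketch of “fully-connected layers with weights shared over time” (made-up dimensions; the hand-rolled loop is for illustration, and the last lines show the ready-made PyTorch modules):

```python
import torch
import torch.nn as nn

W_xh = nn.Linear(50, 128)    # input -> hidden, reused at every time step
W_hh = nn.Linear(128, 128)   # previous hidden -> hidden, also reused

def rnn_forward(sequence):                  # sequence: (time, input_dim)
    h = torch.zeros(128)
    for x_t in sequence:                    # loop handles any sequence length
        h = torch.tanh(W_xh(x_t) + W_hh(h))
    return h                                # summary of the whole sequence

h_final = rnn_forward(torch.randn(37, 50))  # a 37-step sequence

# RNNs with explicit memory, as mentioned above:
lstm = nn.LSTM(input_size=50, hidden_size=128)
gru = nn.GRU(input_size=50, hidden_size=128)
```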

Attention
Shift focus by weighting the encoded input with a recurrent attention matrix
Relevant for images, text, ...
Pipeline: Input → Encoder → Attention → Decoder → Output
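A minimal NumPy sketch of attention-style weighting (generic dot-product form with made-up sizes, not necessarily the exact mechanism in the slides): the decoder state scores every encoder output, and a softmax over the scores decides where to focus.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
encoded = rng.normal(size=(10, 64))   # 10 encoder outputs (e.g. word positions)
query = rng.normal(size=64)           # current decoder state

scores = encoded @ query              # one relevance score per input position
weights = softmax(scores)             # attention weights, sum to 1
context = weights @ encoded           # weighted input that the decoder attends to
```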

Deep Reinforcement Learning Train deep nets with e.g. REINFORCE → deep RL

Generative Adversarial Networks (GANs)

Hyperparameters
Architecture (so many!)
Optimizer
Training details (so many!)
Automated architecture search: [diagram: a generator proposes a model, the model’s performance is evaluated, and the result feeds back to the generator as new knowledge]

Frameworks
Recommended (highly dependent on use case!):
Matlab (very high-level, low flexibility to develop new models, but perhaps suited for quick feature computation)
Keras (high-level, some flexibility; more flexibility when coding against the TensorFlow backend)
PyTorch (medium-level, full flexibility, good debugging)
Others: TensorFlow, Torch, Caffe, CNTK, MXNet, Theano

Open Questions
Few-shot learning: DL needs millions of examples to train successfully; humans can often learn from just a single example and generalize like crazy
Unsupervised learning
Multi-sensory/-domain integration
Memory? Do we need feedback loops? Importance of “sleep”
How to reason about decisions? What is actual “understanding”? How can we intuitively understand millions of weights?
Is backpropagation/SGD the right learning rule? Does the brain do backprop? Maybe. Rethink everything and use capsules?
...
If you have neuroscience-y ideas for any of these, let’s chat! There’s likely a project in it

Reading recommendations
Pattern Recognition and Machine Learning, Chris Bishop, 2006 (the classic ML book, good for understanding the maths)
Deep Learning, Yann LeCun, Yoshua Bengio & Geoffrey Hinton, Nature, 2015 (short overview)
Deep Learning in Neural Networks: An Overview, Jürgen Schmidhuber, Neural Networks, 2015 (historical overview)
Deep Learning, Ian Goodfellow, Yoshua Bengio & Aaron Courville, 2016 (many people recommend this book; fully available online)

Backup

DiCarlo, unpublished

Some training tips
Split data into dev (80%) and test (20%)
Split dev into:
dev-train (60%)
dev-validation (20%) ← avoid overfitting on dev-train
dev-test (20%) ← test generalization to optimize the model
Test ← test actual generalization of the model (we optimize on dev-test by hand)
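A sketch of this split (assuming the dev sub-fractions are taken within the dev set; the dataset size is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
indices = rng.permutation(n)

dev, test = indices[:int(0.8 * n)], indices[int(0.8 * n):]   # 80% dev / 20% test

n_dev = len(dev)
dev_train = dev[:int(0.6 * n_dev)]                    # fit the weights here
dev_val = dev[int(0.6 * n_dev):int(0.8 * n_dev)]      # watch for overfitting on dev-train
dev_test = dev[int(0.8 * n_dev):]                     # compare models / tune by hand

print(len(dev_train), len(dev_val), len(dev_test), len(test))
```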