Martin Schrimpf & Jon Gauthier MIT BCS Peer Lectures

Presentation transcript:

Deep Learning
Martin Schrimpf & Jon Gauthier, MIT BCS Peer Lectures

Motivation: AlphaGo Zero improves over the best humans within 40 days of training

A neuron
[Diagram: dendrites carry inputs x1, x2, x3 with weights w1, w2, w3 into the cell body (∑, f); the axon/synapses carry the output y]
y = f(w1 • x1 + w2 • x2 + w3 • x3) = f(w • x)
x: input vector, w: weight vector, ∑: summation, f: activation function, y: output
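A minimal NumPy sketch of this neuron (illustration only, not from the slides; the specific numbers and the tanh activation are arbitrary choices):

```python
import numpy as np

def neuron(x, w, f=np.tanh):
    """A single neuron: weighted sum of the inputs, passed through activation f."""
    return f(np.dot(w, x))

x = np.array([0.5, -1.0, 2.0])   # inputs x1, x2, x3
w = np.array([0.1, 0.4, -0.3])   # weights w1, w2, w3
y = neuron(x, w)                 # y = f(w · x)
```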

A few neurons (a layer)
[Diagram: three neurons, each receiving all inputs x1, x2, x3 through its own weights wi,1, wi,2, wi,3 and producing outputs y1, y2, y3]
y1 = f(w1 • x); stacking the weight vectors into W = [w1, w2, w3] gives y = f(W • x), where W is now a weight matrix.
This is 1 layer, “fully connected”.

A few stacked layers
[Diagram: three fully connected layers fc1, fc2, fc3 of neurons mapping input X to output Y]
fc1 = f(W1 • X)
fc2 = f(W2 • fc1) = f(W2 • f(W1 • X))
Y = fc3 = f(W3 • fc2) = f(W3 • f(W2 • f(W1 • X)))
Each layer usually contains thousands to hundreds of thousands of neurons. Combine multiple input vectors x1, x2, …, xn into an input matrix X, which yields an output matrix Y.
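A short sketch of this stacked forward pass (the layer sizes and activation are made up for illustration):

```python
import numpy as np

f = np.tanh                      # activation function
rng = np.random.default_rng(0)

# Illustrative sizes: 3 inputs -> 5 hidden -> 4 hidden -> 2 outputs.
W1 = rng.normal(size=(5, 3))
W2 = rng.normal(size=(4, 5))
W3 = rng.normal(size=(2, 4))

X = rng.normal(size=(3, 10))     # 10 input vectors stacked into a matrix

fc1 = f(W1 @ X)                  # fc1 = f(W1 · X)
fc2 = f(W2 @ fc1)                # fc2 = f(W2 · fc1)
Y = f(W3 @ fc2)                  # Y = fc3 = f(W3 · fc2), shape (2, 10)
```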

A few more (the real world): AlexNet, VGG16, InceptionV3 / GoogLeNet, ResNet

How to train W? 1. Loss functions
Interchangeable terms: loss function = cost function = −(objective function)
In supervised learning, the loss function describes the deviation between predictions Ŷ and ground-truth outputs Y. Pick an objective that best describes prediction error for your task!
Regression: L2 loss / mean squared error
Classification: cross-entropy (relative entropy)
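A minimal NumPy sketch of the two losses mentioned above (illustration only; the toy numbers are arbitrary):

```python
import numpy as np

def mse_loss(y_pred, y_true):
    """Mean squared error (L2 loss), typical for regression."""
    return np.mean((y_pred - y_true) ** 2)

def cross_entropy_loss(probs, labels):
    """Cross-entropy for classification: probs holds predicted class
    probabilities (rows sum to 1), labels holds integer class indices."""
    eps = 1e-12  # avoid log(0)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + eps))

print(mse_loss(np.array([1.5, 0.2]), np.array([1.0, 0.0])))    # regression
print(cross_entropy_loss(np.array([[0.7, 0.3], [0.1, 0.9]]),
                         np.array([0, 1])))                    # classification
```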

How to train W? 2. Follow the gradient
If our network is completely linear, we can maximize the objective using basic calculus. (This yields linear regression or logistic regression.) Nonlinearities create non-convexity (saddle points and local optima) in the objective, but gradient descent, a popular convex-optimization tool, still works!

Gradient descent
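A minimal sketch of the gradient descent loop on a toy linear least-squares problem (illustration only; the data and learning rate are made up):

```python
import numpy as np

# Toy problem: find w that minimizes L(w) = mean((X @ w - y)^2).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)                              # initial guess
lr = 0.1                                     # learning rate (step size)
for step in range(200):
    grad = 2 * X.T @ (X @ w - y) / len(y)    # dL/dw
    w -= lr * grad                           # step against the gradient
print(w)                                     # approaches true_w
```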

How to train W with many layers? 3. Backpropagation! [Werbos+ 1982; Parker+ 1985; LeCun+ 1985; Rumelhart, Hinton+ 1986; …]
Considering a single weight j at neuron i (w_ij): given the error E, the neuron’s output o_i, and the learning rate η, the weight change is
Δw_ij = −η · ∂E/∂w_ij (step against the gradient to minimize E),
where the chain rule expands the gradient through the network, e.g. ∂E/∂w_ij = (∂E/∂o_i) · (∂o_i/∂w_ij); this is the backprop part.
The Wikipedia article on this is pretty good! https://en.wikipedia.org/wiki/Backpropagation
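To make the chain rule concrete, a hand-rolled sketch of backprop for one hidden layer (an illustration with made-up sizes and a sigmoid activation, not code from the lecture):

```python
import numpy as np

def f(z):                                     # sigmoid activation
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=3)                        # input vector
y = np.array([1.0])                           # target output
W1 = rng.normal(size=(4, 3)) * 0.1            # input -> hidden weights
W2 = rng.normal(size=(1, 4)) * 0.1            # hidden -> output weights
eta = 0.5                                     # learning rate

for step in range(1000):
    # Forward pass
    h = f(W1 @ x)                             # hidden outputs o_i
    y_hat = W2 @ h                            # prediction
    E = 0.5 * np.sum((y_hat - y) ** 2)        # error

    # Backward pass: apply the chain rule layer by layer
    dE_dyhat = y_hat - y                      # ∂E/∂ŷ
    dE_dW2 = np.outer(dE_dyhat, h)            # ∂E/∂W2
    dE_dh = W2.T @ dE_dyhat                   # error propagated back to hidden layer
    dE_dW1 = np.outer(dE_dh * h * (1 - h), x) # sigmoid derivative is h * (1 - h)

    # Weight change: Δw = −η ∂E/∂w
    W2 -= eta * dE_dW2
    W1 -= eta * dE_dW1
```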

Oodles of ways to follow the gradient
Big ideas:
Use the gradient to specify an acceleration rather than a velocity (Momentum, NAG, RMSProp; sketched below)
Optimize with a memory of past updates (Adagrad, Adadelta)
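To make the “velocity” idea concrete, a sketch of the classical momentum update (standard textbook form, not taken from the slides; the gradient here is a stand-in):

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.01, beta=0.9):
    """Momentum: the gradient nudges a running velocity, and the velocity
    (not the raw gradient) is what moves the weights."""
    velocity = beta * velocity - lr * grad
    return w + velocity, velocity

# Plain SGD for comparison would be: w = w - lr * grad
w, v = np.zeros(3), np.zeros(3)
grad = np.array([1.0, -0.5, 0.2])   # pretend this came from backprop
for _ in range(5):
    w, v = sgd_momentum_step(w, grad, v)
```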

Initialization: where do the weights come from? What happens if we initialize the weights to zero? What happens if we initialize the weights to the same number? What happens if we initialize the weights to numbers of different magnitude?
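One way to explore these questions empirically (a sketch with made-up layer sizes; the variance-scaled scheme is one common choice, not necessarily the one used in the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 256, 128

W_zero = np.zeros((n_out, n_in))                   # every neuron computes the same thing
W_const = np.full((n_out, n_in), 0.01)             # same problem: symmetry is never broken
W_tiny = rng.normal(0, 1e-6, (n_out, n_in))        # random but so small that signals shrink
W_scaled = rng.normal(0, np.sqrt(1.0 / n_in),      # variance-scaled random init keeps the
                      (n_out, n_in))               # activation magnitude roughly constant

x = rng.normal(size=n_in)
for name, W in [("zero", W_zero), ("const", W_const),
                ("tiny", W_tiny), ("scaled", W_scaled)]:
    print(name, np.std(W @ x))                     # spread of pre-activations per scheme
```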

Convolutions (ConvNets/CNNs)
Fully-connected layers can be hard to train due to the number of weights.
Idea: if a feature (e.g. an edge) at the bottom left of the image is important, it is also important at all other locations → apply the same weights (“filters”) across space.
Josh McDermott and Google also use this in audio.
http://cs231n.github.io/convolutional-networks/#conv
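A small PyTorch sketch of the weight-sharing argument (assumed sizes; illustration only): the conv layer reuses the same 3×3 filters at every location, so it needs far fewer weights than a fully-connected layer on the same image.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
fc = nn.Linear(3 * 224 * 224, 16)          # fully-connected layer on the same input

images = torch.randn(8, 3, 224, 224)       # batch of 8 RGB images
features = conv(images)                    # shape: (8, 16, 224, 224)

print(sum(p.numel() for p in conv.parameters()))   # 16*3*3*3 + 16 = 448 weights
print(sum(p.numel() for p in fc.parameters()))     # ~2.4 million weights
```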

Recurrence (RNNs)
Used to handle variable-length input (e.g. text in Natural Language Processing)
Basically fully-connected layers with weights shared over time
More advanced RNNs with explicit memory: LSTM, GRU
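A sketch of “fully-connected layers with weights shared over time” (made-up dimensions; the hand-rolled loop is for illustration, and the last lines show the ready-made PyTorch modules):

```python
import torch
import torch.nn as nn

W_xh = nn.Linear(50, 128)    # input -> hidden, reused at every time step
W_hh = nn.Linear(128, 128)   # previous hidden -> hidden, also reused

def rnn_forward(sequence):                  # sequence: (time, input_dim)
    h = torch.zeros(128)
    for x_t in sequence:                    # loop handles any sequence length
        h = torch.tanh(W_xh(x_t) + W_hh(h))
    return h                                # summary of the whole sequence

h_final = rnn_forward(torch.randn(37, 50))  # a 37-step sequence

# RNNs with explicit memory, as mentioned above:
lstm = nn.LSTM(input_size=50, hidden_size=128)
gru = nn.GRU(input_size=50, hidden_size=128)
```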

Attention
Shift focus by weighting the encoded input with a recurrent attention matrix
Relevant for images, text, ...
Pipeline: Input → Encoder → Attention → Decoder → Output
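A minimal NumPy sketch of attention-style weighting (generic dot-product form with made-up sizes, not necessarily the exact mechanism in the slides): the decoder state scores every encoder output, and a softmax over the scores decides where to focus.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
encoded = rng.normal(size=(10, 64))   # 10 encoder outputs (e.g. word positions)
query = rng.normal(size=64)           # current decoder state

scores = encoded @ query              # one relevance score per input position
weights = softmax(scores)             # attention weights, sum to 1
context = weights @ encoded           # weighted input that the decoder attends to
```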

Deep Reinforcement Learning Train deep nets with e.g. REINFORCE → deep RL

Generative Adversarial Networks (GANs)

Hyperparameters
Architecture (so many!)
Optimizer
Training details (so many!)
Automated architecture search: [diagram: a generator proposes a model, the model’s performance is evaluated, and the result feeds back to the generator as new knowledge]

Frameworks
Recommended (highly dependent on use case!):
Matlab (very high-level, low flexibility to develop new models, but perhaps suited for quick feature computation)
Keras (high-level, some flexibility; more flexibility when coding against the TensorFlow backend)
PyTorch (medium-level, full flexibility, good debugging)
Others: TensorFlow, Torch, Caffe, CNTK, MXNet, Theano

Open Questions
Few-shot learning: DL needs millions of examples to train successfully; humans can often learn from just a single example and generalize like crazy
Unsupervised learning
Multi-sensory/-domain integration
Memory? Do we need feedback loops? Importance of “sleep”
How to reason about decisions? What is actual “understanding”? How can we intuitively understand millions of weights?
Is backpropagation/SGD the right learning rule? Does the brain do backprop? Maybe. Rethink everything and use capsules?
...
If you have neuroscience-y ideas for any of these, let’s chat! There’s likely a project in it

Reading recommendations
Pattern Recognition and Machine Learning, Chris Bishop, 2006 (the classic ML book, good for understanding the maths)
Deep Learning, Yann LeCun, Yoshua Bengio & Geoffrey Hinton, Nature, 2015 (short overview)
Deep Learning in Neural Networks: An Overview, Jürgen Schmidhuber, Neural Networks, 2015 (historical overview)
Deep Learning, Ian Goodfellow, Yoshua Bengio & Aaron Courville, 2016 (many people recommend this book; fully available online)

Backup

DiCarlo, unpublished

Some training tips
Split data into dev (80%) and test (20%)
Split dev into:
dev-train (60%)
dev-validation (20%) ← avoid overfitting on dev-train
dev-test (20%) ← test generalization to optimize the model
Test ← test actual generalization of the model (we optimize on dev-test by hand)
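A sketch of this split (assuming the dev sub-fractions are taken within the dev set; the dataset size is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
indices = rng.permutation(n)

dev, test = indices[:int(0.8 * n)], indices[int(0.8 * n):]   # 80% dev / 20% test

n_dev = len(dev)
dev_train = dev[:int(0.6 * n_dev)]                    # fit the weights here
dev_val = dev[int(0.6 * n_dev):int(0.8 * n_dev)]      # watch for overfitting on dev-train
dev_test = dev[int(0.8 * n_dev):]                     # compare models / tune by hand

print(len(dev_train), len(dev_val), len(dev_test), len(test))
```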