Deep Learning – An Introduction Aaron Crandall, 2015
What is Deep Learning? Architectures with more mathematical transformations from source to target Sparse representations Stacking-based learning approaches More focus on handling unlabeled data More complex nodes in the network (though it is debatable whether this last one is needed)
Motivations for Deep Learning Automatic feature extraction Less human effort Unsupervised learning Modern data sets are enormous Concept learning We want stable concept learners Learning from unlabeled data Not only unsupervised, but unlabeled
Why Deep Learning? Shallow models are not well suited to learning high-level abstractions Ensembles do not learn features first Graphical models could be deep nets, but mostly are not Unsupervised learning could be "local learning" Resembles boosting, with each layer acting like a weak learner
More of Why Learning is weak in directed graphical models with many hidden variables Sparsity and regularization Existing unsupervised learning methods often do not learn multiple levels of representation Layer-wise unsupervised learning Multi-task learning, transfer learning, and self-taught learning Other issues: scalability & parallelism for big data
Shallow vs. Deep Learning Most AI has used shallow architectures: 1-3 layers of transformation Deep architectures just do more: 4-7 layers (or more) of transformation Deep is also a comparative term
Depth Comparisons Different algorithms have different depths of transformation HMM: 2-3 Neural Nets: 2-3 Naive Bayes: 2 SVM: 3 Ensembles: base model's depth + 1 Bengio's work shows more depth is beneficial (if you can train it properly)
Depths of Deep Learning Convolutional Neural Networks
Feature Extraction Hinton's work centers on removing the need to hand-pick good features He argues that once you have the right features from the data, the algorithm you pick is relatively unimportant The usual process is very intuitive and requires significant hands-on work by AI developers Other approaches try to automatically determine the "best" features before passing them to the classifier, but often at a significant computational cost The goal is then to find algorithms (in both training and architecture) that do not explicitly do that feature discovery work, but build a system directly from the data itself
The Vanishing Gradient Problem The gradient becomes progressively more dilute Below the top few layers, the correction signal is minimal Training gets stuck in local minima Especially since the weights start out far from 'good' regions (i.e., random initialization) In the usual setting, we can use only labeled data Almost all data is unlabeled! The brain can learn from unlabeled data This has plagued backpropagation for 20+ years
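A minimal numpy sketch (not from the slides; the layer count, width, and weight scale are arbitrary choices) of why the correction signal dies out: each backward step multiplies by the sigmoid derivative a*(1-a) <= 0.25 and by modest weights, so the printed gradient norm collapses toward zero in the lower layers.

import numpy as np

rng = np.random.default_rng(0)
n_layers, width = 10, 50
weights = [rng.normal(0, 0.1, (width, width)) for _ in range(n_layers)]

# Forward pass through sigmoid layers, keeping activations for backprop.
x = rng.normal(size=(1, width))
activations = [x]
for W in weights:
    x = 1.0 / (1.0 + np.exp(-(x @ W)))   # sigmoid
    activations.append(x)

# Backward pass: the sigmoid derivative keeps scaling the error signal down,
# so the layers near the input see an almost-zero correction.
delta = np.ones((1, width))              # pretend error at the top layer
for W, a in zip(reversed(weights), reversed(activations[1:])):
    delta = (delta * a * (1 - a)) @ W.T
    print(f"gradient norm: {np.linalg.norm(delta):.2e}")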
Deep Network Training Use unsupervised learning (greedy layer-wise training) Allows abstractions to develop naturally from one layer to the next Helps the network initialize with good parameters Perform supervised top-down training as a final step Refines the features (intermediate layers) so that they become more relevant for the task Many papers call this a "smoothing" or "finishing" pass
Deep Belief Networks (DBNs) Probabilistic generative model Deep architecture – multiple layers Bidirectional layer interconnections Unsupervised pre-training provides a good initialization of the network Maximizes a lower bound on the log-likelihood of the data Supervised fine-tuning Generative: up-down algorithm Discriminative: backpropagation Hinton et al., 2006
DBN Greedy training First step: train the first RBM Construct an RBM with an input layer v and a hidden layer h Train it with one (or more) passes over each sample in the training set
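As a concrete illustration of that training step, here is a hedged numpy sketch of one contrastive-divergence (CD-1) update for a binary RBM; the layer sizes, learning rate, batch, and the single Gibbs step are illustrative choices, not anything prescribed by the slides.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_v, b_h, lr=0.05):
    """One CD-1 update on a batch of binary visible vectors v0."""
    # Positive phase: P(h=1 | v0) and a sample from it.
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: one step of Gibbs sampling back to v and h.
    p_v1 = sigmoid(h0 @ W.T + b_v)
    p_h1 = sigmoid(p_v1 @ W + b_h)
    # Approximate gradient of the log-likelihood: data minus reconstruction statistics.
    W   += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(v0)
    b_v += lr * (v0 - p_v1).mean(axis=0)
    b_h += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_v, b_h

# Toy usage: 784 visible units (e.g. MNIST pixels), 256 hidden units.
n_v, n_h = 784, 256
W, b_v, b_h = 0.01 * rng.standard_normal((n_v, n_h)), np.zeros(n_v), np.zeros(n_h)
batch = (rng.random((32, n_v)) < 0.1).astype(float)   # stand-in binary data
W, b_v, b_h = cd1_step(batch, W, b_v, b_h)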
DBN Greedy training Second step: Stack another hidden layer on top of the RBM to form a new RBM Fix W1, sample h1 from Q(h1 | v) as input. Train W2 as RBM.
DBN Greedy training Third step: continue to stack layers on top of the network, training each new layer as in the previous step, with samples drawn from Q(h2 | h1), and so on...
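A short sketch of the stacking loop itself, reusing sigmoid() and cd1_step() from the RBM sketch above; the hidden-layer widths, epoch count, and stand-in data are placeholders.

import numpy as np

rng = np.random.default_rng(1)

def train_rbm(data, n_hidden, epochs=5):
    """Train one RBM on `data` and return its weights and hidden biases."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        W, b_v, b_h = cd1_step(data, W, b_v, b_h)
    return W, b_h

# Greedy stacking: each new RBM is trained on the previous hidden layer's
# activations Q(h_k | h_{k-1}), with the earlier weights frozen.
layer_sizes = [256, 128, 64]                        # hypothetical hidden widths
X = (rng.random((500, 784)) < 0.1).astype(float)    # stand-in training data
stack, layer_input = [], X
for n_hidden in layer_sizes:
    W, b_h = train_rbm(layer_input, n_hidden)
    stack.append((W, b_h))
    layer_input = sigmoid(layer_input @ W + b_h)    # Q(h | previous layer)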
Why does greedy training work? An RBM specifies P(v,h) via P(v|h) and P(h|v) This implicitly defines P(v) and P(h) Key idea of stacking: keep P(v|h) from the 1st RBM, and replace P(h) with the distribution generated by the 2nd-level RBM
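For reference, the standard DBN argument behind this (not spelled out on the slide) is a variational lower bound, where Q(h|v) is the first RBM's posterior:

\log P(v) \;=\; \log \sum_h P(v,h)
           \;\ge\; \sum_h Q(h \mid v)\,\bigl[\log P(h) + \log P(v \mid h)\bigr]
           \;+\; H\!\bigl(Q(h \mid v)\bigr)

Greedy stacking freezes P(v|h) and Q(h|v) from the first RBM and trains the second RBM to model P(h) on samples h ~ Q(h|v); improving that model of P(h), starting from the first RBM's own P(h), does not lower this bound on log P(v).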
Summary of Predictive Sparse Coding (Supervised Deep Nets) Phase 1: train first layer using PSD Phase 2: use encoder + absolute value as feature extractor Phase 3: train the second layer using PSD Phase 4: use encoder + absolute value as 2nd feature extractor Phase 5: train a supervised classifier on top Phase 6 (optional): train the entire system with supervised back-propagation
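A rough numpy sketch of that pipeline; the inner training loop is a much simplified stand-in for real Predictive Sparse Decomposition (ISTA sparse coding alternated with dictionary updates, then a linear encoder fit to predict the codes), and the layer sizes, sparsity penalty, toy data, and nearest-centroid classifier are arbitrary choices.

import numpy as np

rng = np.random.default_rng(0)

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_codes(X, D, lam=0.1, n_iter=50):
    """ISTA: sparse codes Z minimizing 0.5*||X - Z D||^2 + lam*||Z||_1."""
    L = np.linalg.norm(D, 2) ** 2 + 1e-8            # Lipschitz constant of the gradient
    Z = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        Z = soft_threshold(Z - ((Z @ D - X) @ D.T) / L, lam / L)
    return Z

def train_psd_layer(X, n_units, lam=0.1):
    """Phases 1/3: learn a dictionary D and an encoder W_e with X @ W_e ~ codes."""
    D = 0.1 * rng.standard_normal((n_units, X.shape[1]))
    for _ in range(10):
        Z = sparse_codes(X, D, lam)
        D = np.linalg.lstsq(Z, X, rcond=None)[0]    # dictionary update
        D /= np.linalg.norm(D, axis=1, keepdims=True) + 1e-8
    Z = sparse_codes(X, D, lam)
    W_e = np.linalg.lstsq(X, Z, rcond=None)[0]      # the "predictive" linear encoder
    return W_e

def encode(X, W_e):
    return np.abs(X @ W_e)                          # Phases 2/4: encoder + absolute value

# Phases 1-4 on toy data, then Phase 5: any supervised classifier on top.
X = rng.standard_normal((200, 64))
y = (X[:, 0] > 0).astype(int)
W1 = train_psd_layer(X, 32)
H1 = encode(X, W1)
W2 = train_psd_layer(H1, 16)
H2 = encode(H1, W2)
centroids = np.array([H2[y == c].mean(axis=0) for c in np.unique(y)])
pred = ((H2[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(axis=1)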
Hierarchical Learning Mimics mammalian vision Natural progression from low to high level structure Easier to monitor what is being learned Lower level representations may be used for various tasks
Deep Boltzmann Machines Slide credit: R. Salakhutdinov
Deep Boltzmann Machines Pre-training: Can (must) initialize from stacked RBMs Generative fine-tuning: Positive phase: variational approximation (mean-field) This does resemble backprop in many ways. Negative phase: persistent chain (stochastic approximation) Estimates the statistics of the distribution currently represented by the Boltzmann machine Discriminative fine-tuning: backpropagation
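A hedged sketch of one generative fine-tuning update for a two-hidden-layer DBM: the positive phase runs mean-field fixed-point updates conditioned on the data, and the negative phase advances a persistent Gibbs chain of fantasy particles. Layer sizes, step counts, and the learning rate are illustrative, and biases are omitted for brevity.

import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
sample = lambda p: (rng.random(p.shape) < p).astype(float)

n_v, n_h1, n_h2, n_chain = 784, 256, 64, 32
W1 = 0.01 * rng.standard_normal((n_v, n_h1))
W2 = 0.01 * rng.standard_normal((n_h1, n_h2))
chain_v = sample(0.5 * np.ones((n_chain, n_v)))        # persistent fantasy particles

def dbm_update(v, W1, W2, chain_v, lr=0.01, mf_steps=10, gibbs_steps=5):
    # Positive phase: mean-field over both hidden layers, conditioned on data v.
    mu1 = sigmoid(v @ W1)
    mu2 = sigmoid(mu1 @ W2)
    for _ in range(mf_steps):
        mu1 = sigmoid(v @ W1 + mu2 @ W2.T)
        mu2 = sigmoid(mu1 @ W2)
    # Negative phase: keep advancing a persistent Gibbs chain of "fantasy" states.
    h1 = sample(sigmoid(chain_v @ W1))
    h2 = sample(sigmoid(h1 @ W2))
    for _ in range(gibbs_steps):
        h1 = sample(sigmoid(chain_v @ W1 + h2 @ W2.T))
        chain_v = sample(sigmoid(h1 @ W1.T))
        h2 = sample(sigmoid(h1 @ W2))
    # Stochastic-approximation gradient: data statistics minus model statistics.
    W1 += lr * (v.T @ mu1 / len(v) - chain_v.T @ h1 / n_chain)
    W2 += lr * (mu1.T @ mu2 / len(v) - h1.T @ h2 / n_chain)
    return W1, W2, chain_v

batch = (rng.random((64, n_v)) < 0.1).astype(float)    # stand-in binary data
W1, W2, chain_v = dbm_update(batch, W1, W2, chain_v)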
Examples of Success: Handwriting Classifier Learning to predict MNIST handwritten digits Stacked learning Core DBN implementation Hadoop execution https://www.paypal-engineering.com/2015/01/12/deep-learning-on-hadoop-2-0-2/
Experiments The problem is BM vs. DBN training time: 1000:1 iterations per sample Video of Hinton here: https://www.youtube.com/watch?feature=player_detailpage&v=AyzOUbkUf3M#t=1290
Deep Autoencoder Architecture Trained in layers Fixed input width The only input is the frequency of the 2000 most common words for each document 400k documents Input == output target With all data forced through 2 nodes
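A shape-only sketch of that bottleneck: the 2000-word input and the 2-unit code come from the slide, while the intermediate widths are guesses, and the randomly initialized weights stand in for layer-wise pretraining plus fine-tuning.

import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

widths = [2000, 500, 250, 2, 250, 500, 2000]    # input ... 2-unit code ... output
weights = [0.01 * rng.standard_normal((a, b)) for a, b in zip(widths, widths[1:])]

def reconstruct(doc_vec):
    """Forward pass: every document is squeezed through the 2-unit code layer."""
    h = doc_vec
    for W in weights:
        h = sigmoid(h @ W)
    return h   # trained so the output matches the input (reconstruction error)

docs = rng.random((4, 2000))       # stand-in word-frequency vectors
recon = reconstruct(docs)          # compared against `docs` during training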
PCA vs. DBN Autoencoder on Texts Hinton video #2 https://www.youtube.com/watch?feature=player_detailpage&v=AyzOUbkUf3M#t=1898
Denoising Autoencoder Input == Output training Data passes through reduced feature space, forcing compression through feature extraction
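A minimal single-layer denoising autoencoder sketch in numpy (sizes, corruption rate, and learning rate are arbitrary): the input is corrupted, but the reconstruction target is the clean input, which forces the hidden layer to extract features rather than copy values through.

import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def denoising_step(x_clean, W_enc, W_dec, corruption=0.3, lr=0.1):
    """One gradient step: corrupt the input, reconstruct the clean original."""
    mask = rng.random(x_clean.shape) > corruption
    x_noisy = x_clean * mask                       # randomly zeroed inputs
    h = sigmoid(x_noisy @ W_enc)                   # reduced feature space
    x_hat = sigmoid(h @ W_dec)                     # reconstruction
    # Backprop of squared reconstruction error against the *clean* input.
    d_out = (x_hat - x_clean) * x_hat * (1 - x_hat)
    d_hid = (d_out @ W_dec.T) * h * (1 - h)
    W_dec = W_dec - lr * h.T @ d_out / len(x_clean)
    W_enc = W_enc - lr * x_noisy.T @ d_hid / len(x_clean)
    return W_enc, W_dec, ((x_hat - x_clean) ** 2).mean()

n_in, n_hidden = 256, 64                           # illustrative sizes
W_enc = 0.1 * rng.standard_normal((n_in, n_hidden))
W_dec = 0.1 * rng.standard_normal((n_hidden, n_in))
batch = rng.random((32, n_in))                     # stand-in data in [0, 1]
for _ in range(20):
    W_enc, W_dec, err = denoising_step(batch, W_enc, W_dec)
print(err)                                         # error trends downward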
Denoising An Image It is never perfect, but… http://www.cs.nyu.edu/~ranzato/research/projects.html
Why Google Wanted This Google stole Hinton from the University of Toronto The primary need was for similarity analysis of documents Hinton's autoencoders were shown to compress documents into a binary representation where nearby codes correspond to neighboring documents in n-dimensional space https://www.youtube.com/watch?feature=player_detailpage&v=AyzOUbkUf3M#t=2034
Convolutional Neural Networks More complex initial layers Feed-forward only Stacked backpropagation training Focused on vision processing Overlapping neurons within the visual field Reduced interconnectivity, exploiting physically related sub-fields within the data Explicit pooling stages to bring the prior layer's independent processing units into the next stage Aims for minimal pre-processing http://deeplearning.net/tutorial/lenet.html
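A bare-bones sketch of two of the ingredients named above: a convolution over a small receptive field and a pooling stage that merges neighboring responses. It is written as plain numpy loops for clarity rather than speed, and the image size, hand-made kernel, and ReLU are illustrative choices (a real CNN learns its filters by backpropagation).

import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution: each output unit sees only a local patch."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Pooling stage: keep the strongest response in each size x size block."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((28, 28))                    # e.g. one MNIST-sized input
edge_filter = np.array([[1., 0., -1.]] * 3)     # hand-made vertical-edge kernel
features = np.maximum(conv2d(image, edge_filter), 0)   # convolution + ReLU
pooled = max_pool(features)                     # 26x26 -> 13x13 feature map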
An Alternative Architecture: NuPIC From a startup called Numenta: http://numenta.org/ http://numenta.org/htm-white-paper.html Very biologically inspired Hierarchical Temporal Memory (HTM) Designed to do real-time streaming of temporal data with sparse learning and multi-target functions in unsupervised situations Each level of the structure has multiple layers, where the training is randomly targeted Jeff Hawkins talk https://www.youtube.com/watch?v=1_eT5bsS4bQ#t=242
NuPIC Internals: HTM Hierarchical Temporal Memory Levels of stacked cells Temporal Operates over time-series data in an unsupervised manner Memory Columns of cells decide to activate based on input and the previous status of connected neighbors
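A loose, heavily simplified sketch of the "columns compete to activate" idea only, nothing like the full NuPIC/HTM algorithm (no temporal memory, no learning of the connections): columns with random input connections compute their overlap with the current input bits, and the top few winners form a sparse activation.

import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_columns, active_columns = 128, 64, 8

# Each column is connected to a random subset of the input bits.
connections = (rng.random((n_columns, n_inputs)) < 0.2).astype(int)

def activate(input_bits):
    overlap = connections @ input_bits          # how much of the input each column sees
    winners = np.argsort(overlap)[-active_columns:]
    sdr = np.zeros(n_columns, dtype=int)
    sdr[winners] = 1                            # sparse distributed representation
    return sdr

stream = (rng.random((5, n_inputs)) < 0.1).astype(int)   # fake temporal stream
for t, x in enumerate(stream):
    print(t, activate(x).nonzero()[0])          # active columns at each time step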
NuPIC Advantages Active open-source community Designed for temporal data Designed for feedback-loop control systems Strong prediction capabilities (Grok is used on power market data) Unsupervised Parallelizable for large data sets
An Overlooked Approach: NEAT NeuroEvolution of Augmenting Topologies Ken Stanley, UT Austin, 2002 Proposed as an alternative to backpropagation Uses genetic algorithms to evolve both the structure and the weights of ANNs Often increases the depth of the network many fold
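A toy sketch of the structural mutations that let NEAT grow network depth: genomes are lists of connection genes, and the add-node mutation splits an existing connection in two. Innovation numbers, crossover, speciation, and fitness evaluation from the real algorithm are all omitted, and the genome layout here is my own simplification.

import random

random.seed(0)

def new_genome(n_in, n_out):
    """Start minimal: every input wired directly to every output."""
    nodes = list(range(n_in + n_out))
    conns = [{"src": i, "dst": n_in + o, "w": random.uniform(-1, 1), "on": True}
             for i in range(n_in) for o in range(n_out)]
    return {"nodes": nodes, "conns": conns}

def mutate_add_node(genome):
    """Split a random enabled connection A->B into A->new and new->B."""
    old = random.choice([c for c in genome["conns"] if c["on"]])
    old["on"] = False
    new_id = len(genome["nodes"])
    genome["nodes"].append(new_id)
    genome["conns"].append({"src": old["src"], "dst": new_id, "w": 1.0, "on": True})
    genome["conns"].append({"src": new_id, "dst": old["dst"], "w": old["w"], "on": True})

def mutate_weights(genome, scale=0.1):
    """The GA also perturbs weights instead of using backpropagation."""
    for c in genome["conns"]:
        c["w"] += random.gauss(0, scale)

g = new_genome(n_in=3, n_out=1)
for _ in range(5):                 # repeated structural mutation deepens the net
    mutate_add_node(g)
    mutate_weights(g)
print(len(g["nodes"]), "nodes,", sum(c["on"] for c in g["conns"]), "enabled connections")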
NEAT In Operation NEAT is still under development: http://www.cs.ucf.edu/~kstanley/neat.html NEAT-based space fighting game: Galactic Arms Race -- the weapons available are evolved by the players
Dropout Training "Hiding" parts of the network during training Allows for greater multi-function learning Guards against overfitting Works across a wide range of dropout percentages, even 50% or more Applied to DBNs and convolutional ANNs Hinton, Geoffrey E., et al. "Improving neural networks by preventing co-adaptation of feature detectors." arXiv preprint arXiv:1207.0580 (2012). Ba, Jimmy, and Brendan Frey. "Adaptive dropout for training deep neural networks." Advances in Neural Information Processing Systems, 2013. Srivastava, Nitish. Improving Neural Networks with Dropout. Diss. University of Toronto, 2013.
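A small sketch of the idea, using the "inverted dropout" convention (rescale surviving units during training) rather than the test-time weight scaling in the original papers; sizes and drop probability are arbitrary.

import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(h, drop_prob=0.5, training=True):
    if not training or drop_prob == 0.0:
        return h                                   # test time: use every unit
    mask = rng.random(h.shape) >= drop_prob        # units kept this pass
    return h * mask / (1.0 - drop_prob)            # rescale the survivors

hidden = rng.random((4, 10))                       # a fake hidden-layer activation
print(dropout_forward(hidden))                     # about half the units zeroed
print(dropout_forward(hidden, training=False))     # unchanged at test time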
What is the Major Contribution of Deep Learning so far (IMO)? Boltzmann Machines/Restricted Boltzmann Machines More layers == Good Training algorithms (stacking approaches) Unsupervised learning algorithms Distributed representation Sparse learning (multi-target learning) Improved vision and NLP processing So… which one?
DeepMind Startup News Acquired by Google last year ($650m) Building general learners Primarily focused on game playing to evaluate AI approaches Plays Atari and some other early 1980s games Trying to add memory architectures to DBNs Seeks to handle streaming data through persistence across temporal events Very secretive, but hiring http://deepmind.com/
Other Deep Learning Startups Enlitic – Healthcare oriented Ersatz Labs – Data to prediction services MetaMind - NLP with recursive nets Nervana Systems – Deep nets on cloud 2 proc Skymind – Hadoop algorithms
Summary Deep Learning is the field of leveraging deeper models in AI Deep Belief Networks Unsupervised & supervised abilities NuPIC Handles unlabeled streaming temporal data Convolutional nets Primarily vision, but lots of other uses Deep systems are the current leaders in vision, NLP, audio, documents, and semantics If you want a job at Google (Bing, FB, etc.), either know deep learning or beat it
*THE* Resource http://deeplearning.net