Submitted by: Ankit Bhutani (Y9227094)
Supervised by: Prof. Amitabha Mukerjee and Prof. K S Venkatesh


 AUTO-ASSOCIATIVE NEURAL NETWORKS  OUTPUT SIMILAR TO INPUT

 BOTTLENECK CONSTRAINT  LINEAR ACTIVATION – PCA [Baldi et al., 1989]  NON-LINEAR PCA [Kramer, 1991] – 5-LAYERED NETWORK  ALTERNATE SIGMOID AND LINEAR ACTIVATIONS  EXTRACTS NON-LINEAR FACTORS
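For concreteness, below is a minimal sketch (not taken from the thesis) of the 5-layered "non-linear PCA" architecture of [Kramer, 1991]: a sigmoid mapping layer, a linear bottleneck, a sigmoid de-mapping layer, and a linear output. Layer widths and parameter names are illustrative assumptions; biases are omitted for brevity.

```python
# Sketch of Kramer's non-linear PCA network (5 layers counting input/output).
# Layer sizes and parameter names are placeholders, not values from the thesis.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def nlpca_forward(x, params):
    W1, W2, W3, W4 = params          # four weight matrices, biases omitted
    h1 = sigmoid(x @ W1)             # non-linear mapping layer
    code = h1 @ W2                   # linear bottleneck (the non-linear factors)
    h2 = sigmoid(code @ W3)          # non-linear de-mapping layer
    return h2 @ W4                   # linear reconstruction of x
```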

 ABILITY TO LEARN HIGHLY COMPLEX FUNCTIONS  TACKLES THE NON-LINEAR STRUCTURE OF THE UNDERLYING DATA  HIERARCHICAL REPRESENTATION  RESULTS FROM CIRCUIT THEORY – A SINGLE-LAYERED NETWORK WOULD NEED AN EXPONENTIALLY LARGE NUMBER OF HIDDEN UNITS

 DIFFICULTY IN TRAINING DEEP NETWORKS  NON-CONVEX NATURE OF THE OPTIMIZATION  GETS STUCK IN LOCAL MINIMA  VANISHING GRADIENTS DURING BACKPROPAGATION  SOLUTION – “INITIAL WEIGHTS MUST BE CLOSE TO A GOOD SOLUTION” [Hinton et al., 2006]  GENERATIVE PRE-TRAINING FOLLOWED BY FINE-TUNING

 PRE-TRAINING  INCREMENTAL LAYER-WISE TRAINING  EACH LAYER ONLY TRIES TO REPRODUCE THE HIDDEN-LAYER ACTIVATIONS OF THE PREVIOUS LAYER
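A hedged sketch of this layer-wise pre-training loop: each shallow tied-weight autoencoder is trained on the hidden activations of the layer below, and its own hidden activations become the training data for the next layer. Layer sizes, the learning rate, and the helper names are assumptions, not the thesis implementation.

```python
# Greedy layer-wise pre-training: each shallow autoencoder reconstructs the
# activations of the layer below it.  All names and sizes are illustrative.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_shallow_autoencoder(data, n_hidden, n_epochs=10, lr=0.1):
    """Train one sigmoid autoencoder with tied weights by gradient descent
    on squared reconstruction error; returns (W, b_hidden)."""
    n_vis = data.shape[1]
    rng = np.random.RandomState(0)
    W = rng.normal(0, 0.01, (n_vis, n_hidden))
    b_h, b_v = np.zeros(n_hidden), np.zeros(n_vis)
    for _ in range(n_epochs):
        h = sigmoid(data @ W + b_h)            # encode
        z = sigmoid(h @ W.T + b_v)             # decode with tied weights
        dz = (z - data) * z * (1 - z)          # backprop through decoder sigmoid
        dh = (dz @ W) * h * (1 - h)            # backprop through encoder sigmoid
        W -= lr * (data.T @ dh + dz.T @ h) / len(data)
        b_h -= lr * dh.mean(axis=0)
        b_v -= lr * dz.mean(axis=0)
    return W, b_h

def pretrain_stack(X, layer_sizes):
    """Incremental layer-wise training: input to layer k is the hidden
    activation of layer k-1."""
    weights, data = [], X
    for n_hidden in layer_sizes:
        W, b_h = train_shallow_autoencoder(data, n_hidden)
        weights.append((W, b_h))
        data = sigmoid(data @ W + b_h)         # feeds the next layer
    return weights

# e.g. stack = pretrain_stack(X_train, [1000, 500, 250, 30])   # sizes illustrative
```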

 INITIALIZE THE AUTOENCODER WITH THE WEIGHTS LEARNT BY PRE-TRAINING  PERFORM BACKPROPAGATION AS USUAL
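Continuing the pre-training sketch above (reusing its sigmoid helper and the stack it returns), the fine-tuning stage can be pictured as "unrolling" the pre-trained encoders into a deep autoencoder whose decoder is initialised with the transposed encoder weights (in the spirit of [Hinton & Salakhutdinov, 2006]); ordinary backpropagation then adjusts all layers jointly. This is a sketch under those assumptions, not the thesis code.

```python
# Continuation of the previous sketch: unroll the pre-trained stack into a
# deep autoencoder, then fine-tune the whole network by backpropagation.
def unroll(stack):
    """Encoder params from pre-training plus a mirrored decoder; decoder
    biases are simply initialised to zero here (an assumption)."""
    encoder = [(W, b_h) for (W, b_h) in stack]
    decoder = [(W.T, np.zeros(W.shape[0])) for (W, b_h) in reversed(stack)]
    return encoder + decoder

def forward(X, params):
    a = X
    for W, b in params:              # all layers sigmoid here, for simplicity
        a = sigmoid(a @ W + b)
    return a                         # reconstruction of X

# deep_params = unroll(stack)
# ...then run standard backpropagation on the reconstruction error between
# forward(X, deep_params) and X (e.g. the cross-entropy cost shown later).
```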

 STOCHASTIC – RESTRICTED BOLTZMANN MACHINES (RBMs)  HIDDEN-LAYER ACTIVATIONS IN (0, 1) ARE USED AS PROBABILITIES FOR SAMPLING BINARY STATES (0 OR 1)  MODEL LEARNS THE JOINT PROBABILITY OF TWO SETS OF BINARY VARIABLES – ONE IN THE INPUT LAYER AND THE OTHER IN THE HIDDEN LAYER  EXACT TRAINING – COMPUTATIONALLY INTRACTABLE  NUMERICAL APPROXIMATION – CONTRASTIVE DIVERGENCE
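As an illustration of the contrastive-divergence approximation mentioned above, here is a minimal CD-1 update for a binary RBM; hyper-parameters and variable names are assumptions rather than the thesis configuration.

```python
# One CD-1 (contrastive divergence) parameter update for a binary RBM.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cd1_update(v0, W, b_h, b_v, lr=0.1, rng=np.random):
    """v0: mini-batch of binary visible vectors, shape (N, n_vis)."""
    # positive phase: sample hidden states from the data
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random_sample(p_h0.shape) < p_h0).astype(float)
    # negative phase: one Gibbs step (reconstruct visibles, re-infer hiddens)
    p_v1 = sigmoid(h0 @ W.T + b_v)
    p_h1 = sigmoid(p_v1 @ W + b_h)
    # approximate gradient of the log-likelihood
    n = len(v0)
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / n
    b_h += lr * (p_h0 - p_h1).mean(axis=0)
    b_v += lr * (v0 - p_v1).mean(axis=0)
    return W, b_h, b_v
```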

 DETERMINISTIC – SHALLOW AUTOENCODERS  HIDDEN-LAYER ACTIVATIONS IN (0, 1) ARE USED DIRECTLY AS INPUT TO THE NEXT LAYER  TRAINED BY BACKPROPAGATION  DENOISING AUTOENCODERS  CONTRACTIVE AUTOENCODERS  SPARSE AUTOENCODERS
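A brief sketch of the denoising-autoencoder idea [Vincent et al., 2008]: corrupt the input with masking noise but reconstruct the clean version. The corruption level is an illustrative assumption.

```python
# Denoising autoencoder: encode a corrupted input, reconstruct the clean one.
import numpy as np

def corrupt(X, p=0.3, rng=np.random):
    """Masking noise: set a random fraction p of the input values to zero."""
    mask = rng.random_sample(X.shape) >= p
    return X * mask

# x_tilde = corrupt(X_train)                 # corrupted input
# h = sigmoid(x_tilde @ W + b_h)             # encode the *corrupted* input
# z = sigmoid(h @ W.T + b_v)                 # decode
# ...minimise the reconstruction error between z and the *clean* X_train,
# with the same gradient steps as the plain shallow autoencoder above.
```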

TASK \ MODEL | RBM | SHALLOW AE
CLASSIFIER | [Hinton et al., 2006] and many others since then | Investigated by [Bengio et al., 2007], [Ranzato et al., 2007], [Vincent et al., 2008], [Rifai et al., 2011], etc.
DEEP AE | [Hinton & Salakhutdinov, 2006] | No significant results reported in the literature – Gap

 MNIST  Big and Small Digits

 Square & Room  2D Robot Arm  3D Robot Arm

 Libraries used  NumPy, SciPy  Theano – takes care of parallelization  GPU specifications – Tesla C1060  Memory – 256 MB  Frequency – 33 MHz  Number of Cores – 240
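As a hint of how Theano is typically used for such models, here is a minimal tied-weight autoencoder layer expressed as a Theano graph; layer sizes and the learning rate are illustrative assumptions, not the thesis configuration. Theano compiles the same symbolic graph for either CPU or GPU execution.

```python
# A tied-weight sigmoid autoencoder layer as a Theano computation graph.
import numpy as np
import theano
import theano.tensor as T

rng = np.random.RandomState(0)
n_vis, n_hid = 784, 500          # e.g. MNIST input size; hidden size is illustrative

x = T.matrix('x')                # mini-batch of inputs
W = theano.shared(rng.normal(0, 0.01, (n_vis, n_hid)).astype(theano.config.floatX), name='W')
b_h = theano.shared(np.zeros(n_hid, dtype=theano.config.floatX), name='b_h')
b_v = theano.shared(np.zeros(n_vis, dtype=theano.config.floatX), name='b_v')

h = T.nnet.sigmoid(T.dot(x, W) + b_h)        # encoder
z = T.nnet.sigmoid(T.dot(h, W.T) + b_v)      # tied-weight decoder
cost = T.nnet.binary_crossentropy(z, x).sum(axis=1).mean()

params = [W, b_h, b_v]
grads = T.grad(cost, params)                 # symbolic gradients
lr = 0.1
updates = [(p, p - lr * g) for p, g in zip(params, grads)]

# Compiling the graph; Theano targets CPU or GPU transparently.
train_step = theano.function([x], cost, updates=updates)
```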

 REVERSE CROSS-ENTROPY  X – Original input  Z – Output (reconstruction)  Θ – Parameters (weights and biases)
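One standard form of this reconstruction cost, assuming sigmoid outputs z in (0, 1) and inputs x in [0, 1] (as used e.g. in [Hinton & Salakhutdinov, 2006]), is the cross-entropy summed over input dimensions; whether the thesis uses exactly this expression is an assumption.

```python
# Cross-entropy reconstruction cost between inputs X and reconstructions Z.
import numpy as np

def cross_entropy_cost(X, Z, eps=1e-10):
    """X: original inputs in [0, 1]; Z: sigmoid reconstructions in (0, 1)."""
    Z = np.clip(Z, eps, 1 - eps)                      # numerical safety
    return -np.mean(np.sum(X * np.log(Z) + (1 - X) * np.log(1 - Z), axis=1))
```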

 RESULTS FROM PRELIMINARY EXPERIMENTS

 TIME TAKEN FOR TRAINING  CONTRACTIVE AUTOENCODERS TAKE VERY LONG TO TRAIN
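For context on why contractive autoencoders are slow: they add the squared Frobenius norm of the encoder's Jacobian [Rifai et al., 2011] to the cost, which must be computed and differentiated for every example. A sketch of the penalty for a sigmoid encoder h = sigmoid(xW + b) follows; it is illustrative, not the thesis code.

```python
# Contractive penalty: squared Frobenius norm of dh/dx for a sigmoid encoder.
import numpy as np

def contractive_penalty(h, W):
    """h: hidden activations (N, n_hid); W: encoder weights (n_vis, n_hid).
    For sigmoid units dh_j/dx_i = h_j (1 - h_j) W_ij, so the squared norm
    factorises into the two sums below (summed over the batch)."""
    return np.sum(((h * (1 - h)) ** 2) @ np.sum(W ** 2, axis=0))
```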

 EXPERIMENT USING SPARSE REPRESENTATIONS  STRATEGY A – BOTTLENECK  STRATEGY B – SPARSITY + BOTTLENECK  STRATEGY C – NO CONSTRAINT + BOTTLENECK
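One common way of imposing the sparsity constraint mentioned above is a KL-divergence penalty that pushes each hidden unit's average activation toward a small target value; the thesis's exact sparsity formulation is not specified on this slide, so the sketch below is an assumption.

```python
# KL-divergence sparsity penalty toward a small target activation rho.
import numpy as np

def kl_sparsity_penalty(h, rho=0.05, eps=1e-10):
    """h: hidden activations (N, n_hid); penalise units whose mean
    activation over the batch differs from the target rho."""
    rho_hat = np.clip(h.mean(axis=0), eps, 1 - eps)   # mean activation per unit
    return np.sum(rho * np.log(rho / rho_hat) +
                  (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
```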

 MOMENTUM  INCORPORATES THE PREVIOUS UPDATE  CANCELS OUT COMPONENTS IN OPPOSITE DIRECTIONS – PREVENTS OSCILLATION  ADDS UP COMPONENTS IN THE SAME DIRECTION – SPEEDS UP TRAINING  WEIGHT DECAY  REGULARIZATION  PREVENTS OVER-FITTING
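A minimal sketch of the update rule these two heuristics describe: gradient descent where the previous update is carried in a velocity term (momentum) and an L2 penalty shrinks the weights (weight decay). Coefficient values are illustrative.

```python
# Gradient-descent update with momentum and L2 weight decay.
import numpy as np

def sgd_step(W, grad_W, velocity, lr=0.1, momentum=0.9, weight_decay=1e-4):
    """One parameter update; `velocity` carries the previous update."""
    velocity = momentum * velocity - lr * (grad_W + weight_decay * W)
    W = W + velocity
    return W, velocity
```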

 USING ALTERNATE-LAYER SPARSITY WITH MOMENTUM & WEIGHT DECAY YIELDS THE BEST RESULTS

 MOTIVATION