
Pattern Recognition and Machine Learning: Deep Alternative Architectures
Dipartimento di Ingegneria «Enzo Ferrari», Università di Modena e Reggio Emilia

UNSUPERVISED LEARNING

Motivation
• Most impressive results in deep learning have been obtained with purely supervised learning methods (see previous talk); in vision, typically classification (e.g. object recognition).
• Though progress has been slower, it is likely that unsupervised learning will be important to future advances in DL.
Image: Krizhevsky (2012), AlexNet, the “hammer” of DL.
(Slides in this part: G. Taylor, CVPR 2014 “DL for Vision” tutorial, Unsupervised Learning.)

Why Unsupervised Learning?
Reason 1: We can exploit unlabelled data; it is much more readily available and often free.

Why Unsupervised Learning?
Reason 2: We can capture enough information about the observed variables to ask new questions about them; questions that were not anticipated at training time.
Image: Features from a convolutional net, layers 1–5 (Zeiler and Fergus, 2013).

Why Unsupervised Learning?
Reason 3: Unsupervised learning has been shown to be a good regularizer for supervised learning; it helps generalization.
This advantage shows up in practical applications:
• transfer learning, domain adaptation
• unbalanced classes
• zero-shot, one-shot learning
Image: ISOMAP embedding of the functions represented by 50 networks with and without pre-training (Erhan et al., 2010).

Why Unsupervised Learning?
Reason 4: There is evidence that unsupervised learning can be achieved mainly through a level-local training signal; compare this to supervised learning, where the only signal driving parameter updates is available at the output and gets backpropagated.
Figure: propagating credit in supervised learning vs. a local learning signal.

Why Unsupervised Learning?
Reason 5: A recent trend in machine learning is to consider problems where the output is high-dimensional and has a complex, possibly multi-modal joint distribution. Unsupervised learning can be used in these “structured output” problems.
Figures: attribute prediction (animal, pet, furry, …, striped) and segmentation.

Learning Representations
• “Concepts” or “abstractions” that help us make sense of the variability in data.
• Often hand-designed to have desirable properties: e.g. sensitive to the variables we want to predict, less sensitive to other factors explaining variability.
• DL has leveraged the ability to learn representations; these can be task-specific or task-agnostic.

Supervised Learning of Representations
Learn a representation with the objective of selecting one that is best suited for predicting targets given the input: input → f() → prediction, compared with the target to produce an error.
Image: Features from a convolutional net (Zeiler and Fergus, 2013); (a) input image, (b) layer 5 strongest feature map, (c) layer 5 strongest feature map projections; true labels: Pomeranian, Car Wheel, Afghan Hound.

Unsupervised Learning of Representations
Same diagram: input → prediction → error, but there is no target; what should the prediction be compared against?

Unsupervised learning of representations
What is the objective?
- reconstruction error? (input → code → reconstruction)
- maximum likelihood?
- disentangling the factors of variation?
Image: learning identity and pose manifold coordinates from input images with fixed ID or fixed pose (Lee et al. 2014).

Principal Components Analysis
• PCA works well when the data is near a linear manifold in high-dimensional space.
• Project the data onto the subspace spanned by the principal components (the first principal component is the direction of greatest variance).
• In the dimensions orthogonal to the subspace the data has low variance.
Credit: Geoff Hinton
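As an added illustration (not part of the original slides), a minimal PCA sketch in NumPy, assuming a data matrix X of shape (n_samples, n_features):

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its first n_components principal components."""
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]          # (n_components, n_features)
    projected = X_centered @ components.T   # coordinates in the subspace
    return projected, components

# Example: 500 points near a 2-D linear manifold embedded in 10-D space.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 10)) + 0.01 * rng.normal(size=(500, 10))
Z, W = pca(X, n_components=2)
print(Z.shape, W.shape)  # (500, 2) (2, 10)
```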

An inefficient way to fit PCA
Train a neural network with a “bottleneck” hidden layer: input → code (bottleneck) → output (reconstruction).
• If the hidden and output layers are linear, and we minimize squared reconstruction error (try to make the output the same as the input):
  - the M hidden units will span the same space as the first M principal components,
  - but their weight vectors will not be orthogonal,
  - and they will have approximately equal variance.
Credit: Geoff Hinton
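A sketch of this linear bottleneck network in PyTorch (the layer sizes, data, and optimizer settings are illustrative assumptions):

```python
import torch
import torch.nn as nn

D, M = 10, 2  # input dimension, bottleneck size (assumed values)

# Linear encoder and decoder, trained with squared reconstruction error.
model = nn.Sequential(nn.Linear(D, M, bias=False),   # code (bottleneck)
                      nn.Linear(M, D, bias=False))   # output (reconstruction)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

X = torch.randn(500, 2) @ torch.randn(2, D)  # data near a 2-D linear manifold

for step in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(X), X)   # try to make the output the same as the input
    loss.backward()
    optimizer.step()

# The M hidden units now span (approximately) the space of the first M
# principal components, but their weight vectors need not be orthogonal.
```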

Why fit PCA inefficiently?
input → encoder h(x) → code → decoder x̂(h(x)) → reconstruction, trained on the reconstruction error.
• With nonlinear layers before and after the code, it should be possible to represent data that lies on or near a nonlinear manifold:
  - the encoder maps from data space to coordinates on the manifold,
  - the decoder does the inverse transformation.
• The encoder/decoder can be rich, multi-layer functions.

Auto-encoder
• Feed-forward architecture: input → encoder h(x) → code → decoder x̂(h(x)) → reconstruction.
• Trained to minimize reconstruction error.
• A bottleneck or some form of regularization is essential.
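A small non-linear auto-encoder in PyTorch as a sketch of this architecture (the layer sizes and ReLU activations are assumptions, not from the slide):

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, d_in=784, d_code=32):
        super().__init__()
        # Non-linear encoder h(x) and decoder x_hat(h(x)).
        self.encoder = nn.Sequential(nn.Linear(d_in, 256), nn.ReLU(),
                                     nn.Linear(256, d_code))
        self.decoder = nn.Sequential(nn.Linear(d_code, 256), nn.ReLU(),
                                     nn.Linear(256, d_in))

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code), code

model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(64, 784)                 # a dummy batch (e.g. flattened images)
x_hat, code = model(x)
loss = ((x_hat - x) ** 2).mean()        # squared reconstruction error
loss.backward()
optimizer.step()
```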

Regularized Auto-encoders
Same architecture: input → encoder h(x) → code → decoder x̂(h(x)) → reconstruction, trained on the reconstruction error.
• Permit the code to be higher-dimensional than the input.
• Capture the structure of the training distribution due to the predictive opposition between the reconstruction distribution and the regularizer.
• The regularizer tries to make the encoder/decoder as simple as possible.

Simple?
• Reconstruct the input from the code and make the code compact (PCA, auto-encoder with a bottleneck).
• Reconstruct the input from the code and make the code sparse (sparse auto-encoders).
• Add noise to the input or code and reconstruct the cleaned-up version (denoising auto-encoders).
• Reconstruct the input from the code and make the code insensitive to the input (contractive auto-encoders).

Sparse Auto-encoders
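The slide content is an image; as an illustrative sketch of the standard formulation (an assumption, not taken from the slide), a sparse auto-encoder adds a sparsity penalty, e.g. an L1 term, on an over-complete code:

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 512), nn.ReLU())   # over-complete code
decoder = nn.Linear(512, 784)
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

lambda_sparse = 1e-3          # weight of the sparsity penalty (arbitrary)
x = torch.rand(64, 784)

code = encoder(x)
x_hat = decoder(code)
loss = ((x_hat - x) ** 2).mean() + lambda_sparse * code.abs().mean()  # MSE + L1 on the code
loss.backward()
optimizer.step()
```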

Deconvolutional Networks
• Deep convolutional sparse coding.
• Trained to reconstruct the input from any layer (layers 1–4 in the figure).
• Fast approximate inference.
• Recently used to visualize the features learned by convolutional nets (Zeiler and Fergus 2013).

Denoising Auto-encoders (Vincent et al. 2008)
input → noise → noisy input x̃(x) → encoder → code h(x̃) → decoder → reconstruction x̂(h(x̃)), with the error measured against the clean input.
• The code can be viewed as a lossy compression of the input.
• Learning drives it to be a good compressor for training examples (and hopefully others as well) but not for arbitrary inputs.
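A minimal denoising auto-encoder sketch (added illustration; the Gaussian corruption and layer sizes are assumptions):

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
decoder = nn.Linear(256, 784)
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.rand(64, 784)                    # clean input
x_tilde = x + 0.3 * torch.randn_like(x)    # corrupted input x~(x)

code = encoder(x_tilde)                    # h(x~)
x_hat = decoder(code)                      # x^(h(x~))
loss = ((x_hat - x) ** 2).mean()           # error measured against the *clean* input
loss.backward()
optimizer.step()
```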

Contractive Auto-encoders (Rifai et al. 2011)
Same encoder/decoder architecture, with a penalty that makes the code insensitive to small changes of the input.
• Learn good models of high-dimensional data (Bengio et al. 2013).
• Can obtain good representations for classification.
• Can produce good-quality samples by a random walk near the manifold of high density (Rifai et al. 2012).
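A sketch of the contractive penalty (added illustration, assuming a single sigmoid encoder layer, for which the Jacobian of the code has a closed form):

```python
import torch
import torch.nn as nn

W = nn.Parameter(torch.randn(256, 784) * 0.01)   # encoder weights
b = nn.Parameter(torch.zeros(256))
decoder = nn.Linear(256, 784)

x = torch.rand(64, 784)
h = torch.sigmoid(x @ W.t() + b)                  # code h(x)
x_hat = decoder(h)

# Frobenius norm of the encoder Jacobian: J = diag(h * (1 - h)) W, so
# ||J||_F^2 = sum_j (h_j (1 - h_j))^2 * sum_i W_ji^2.
jacobian_penalty = ((h * (1 - h)) ** 2 @ (W ** 2).sum(dim=1)).mean()

lambda_c = 0.1                                    # penalty weight (arbitrary)
loss = ((x_hat - x) ** 2).mean() + lambda_c * jacobian_penalty
loss.backward()
```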

Resources
• Online courses
  - Andrew Ng’s Machine Learning (Coursera)
  - Geoff Hinton’s Neural Networks (Coursera)
• Websites
  - deeplearning.net
  - http://deeplearning.stanford.edu/wiki/index.php/UFLDL_Tutorial

Surveys and Reviews
• Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828, Aug 2013.
• Y. Bengio. Deep learning of representations: Looking forward. In Statistical Language and Speech Processing, pages 1–37. Springer, 2013.
• Y. Bengio, I. Goodfellow, and A. Courville. Deep Learning. 2014. Draft available at http://www.iro.umontreal.ca/~bengioy/dlbook/
• J. Schmidhuber. Deep learning in neural networks: An overview. arXiv preprint arXiv:1404.7828, 2014.
• Y. Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1–127, 2009.

Sequence modelling

Sequence modelling
When applying machine learning to sequences, we often want to turn an input sequence into an output sequence that lives in a different domain.
– E.g. turn a sequence of sound pressures into a sequence of word identities.
When there is no separate target sequence, we can get a teaching signal by trying to predict the next term in the input sequence.
– The target output sequence is the input sequence advanced by one step (see the sketch below).
– This seems much more natural than trying to predict one pixel in an image from the other pixels, or one patch of an image from the rest of the image: temporal sequences have a natural order for the predictions.
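An added sketch of this next-step-prediction setup (tensor shapes are arbitrary):

```python
import torch

seq = torch.randn(1, 100, 1)   # one sequence of 100 scalar observations
inputs  = seq[:, :-1]          # x_1 ... x_{T-1}
targets = seq[:, 1:]           # x_2 ... x_T: the input sequence advanced by one step
# A sequence model is trained so that its prediction at step t is compared
# with targets[:, t], i.e. the next term of the input sequence.
```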

Memoryless models for sequences
• Autoregressive models: predict the next term from a fixed number of previous terms (“delay taps”).
• Feed-forward networks: generalize autoregressive models by using one or more layers of non-linear hidden units.
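A sketch of such a memoryless, fixed-window predictor (added illustration; the window size, layer sizes, and data are assumptions):

```python
import torch
import torch.nn as nn

k = 5                                   # number of previous terms (delay taps)
model = nn.Sequential(nn.Linear(k, 32), nn.Tanh(), nn.Linear(32, 1))

seq = torch.randn(1000)                 # a scalar time series
# Build (window of k previous terms) -> (next term) training pairs.
X = torch.stack([seq[t - k:t] for t in range(k, len(seq))])
y = seq[k:].unsqueeze(1)

pred = model(X)                         # memoryless: no state is carried over
loss = ((pred - y) ** 2).mean()
loss.backward()
```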

Memory and Hidden State
If we give our generative model some hidden state, and if we give this hidden state its own internal dynamics, we get a much more interesting kind of model.
– It can store information in its hidden state for a long time.
– If the dynamics are noisy and the way it generates outputs from its hidden state is noisy, we can never know its exact hidden state. The best we can do is to infer a probability distribution over the space of hidden-state vectors.

RNN
RNNs are very powerful because they combine two properties:
– Distributed hidden state that allows them to store a lot of information about the past efficiently.
– Non-linear dynamics that allow them to update their hidden state in complicated ways.
With enough neurons and time, RNNs can compute anything that can be computed by your computer.
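A minimal vanilla RNN cell as an added sketch of these two properties (dimensions are arbitrary):

```python
import torch
import torch.nn as nn

class VanillaRNNCell(nn.Module):
    def __init__(self, d_in, d_hidden):
        super().__init__()
        self.W_xh = nn.Linear(d_in, d_hidden)
        self.W_hh = nn.Linear(d_hidden, d_hidden)

    def forward(self, x_t, h_prev):
        # Non-linear dynamics: the new hidden state depends on the previous
        # hidden state (distributed memory) and the current input.
        return torch.tanh(self.W_xh(x_t) + self.W_hh(h_prev))

cell = VanillaRNNCell(d_in=8, d_hidden=32)
x = torch.randn(16, 10, 8)            # batch of 16 sequences, 10 steps, 8 features
h = torch.zeros(16, 32)               # initial hidden state
for t in range(x.shape[1]):           # unroll over time, sharing the same weights
    h = cell(x[:, t], h)
```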

RNN Structure and Weight Sharing
Figures: Elman network (recurrence through the hidden state) and Jordan network (recurrence through the output); in both, the same weights are reused at every time step.
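A compact sketch contrasting the two recurrences (added illustration; layer sizes are assumptions): the Elman step feeds back the hidden state, the Jordan step feeds back the previous output, and both reuse the same weights at every time step.

```python
import torch
import torch.nn as nn

d_in, d_h, d_out = 8, 32, 4
W_x, W_h = nn.Linear(d_in, d_h), nn.Linear(d_h, d_h)      # Elman: hidden-state feedback
W_y, W_o = nn.Linear(d_out, d_h), nn.Linear(d_h, d_out)   # Jordan: output feedback

def elman_step(x_t, h_prev):
    h_t = torch.tanh(W_x(x_t) + W_h(h_prev))     # recurrence through the hidden state
    return h_t, W_o(h_t)

def jordan_step(x_t, y_prev):
    h_t = torch.tanh(W_x(x_t) + W_y(y_prev))     # recurrence through the previous output
    return h_t, W_o(h_t)

# The same weight matrices are applied at every time step (weight sharing).
x = torch.randn(16, 10, d_in)
h, y = torch.zeros(16, d_h), torch.zeros(16, d_out)
for t in range(x.shape[1]):
    h, _ = elman_step(x[:, t], h)
    _, y = jordan_step(x[:, t], y)
```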

Backpropagating through time