1 Pattern Recognition and Machine Learning: Deep Alternative Architectures
Dipartimento di Ingegneria «Enzo Ferrari» Università di Modena e Reggio Emilia

2 UNSUPERVISED LEARNING

3 Motivation
Most impressive results in deep learning have been obtained with purely supervised learning methods (see previous talk); in vision, this typically means classification (e.g. object recognition).
Though progress has been slower, it is likely that unsupervised learning will be important to future advances in DL.
Image: Krizhevsky (2012), AlexNet, the "hammer" of DL
Credit: CVPR DL for Vision Tutorial, Unsupervised Learning, G. Taylor, 23 June 2014

4 Why Unsupervised Learning?
Reason 1: We can exploit unlabelled data, which is much more readily available and often free.

5 Why Unsupervised Learning?
Reason 2: We can capture enough information about the observed variables to ask new questions about them, questions that were not anticipated at training time.
Image: features from layers 1-5 of a convolutional net (Zeiler and Fergus, 2013)

6 Why Unsupervised Learning?
Reason 3: Unsupervised learning has been shown to be a good regularizer for supervised learning; it helps generalize.
This advantage shows up in practical applications: transfer learning and domain adaptation; unbalanced classes; zero-shot and one-shot learning.
Image: ISOMAP embedding of the functions represented by 50 networks with and without pre-training (Erhan et al., 2010)

7 Why Unsupervised Learning?
Reason 4: There is evidence that unsupervised learning can be achieved mainly through a level-local training signal; compare this to supervised learning, where the only signal driving parameter updates is available at the output and gets backpropagated.
Diagram: supervised learning propagates credit from the output; local learning uses per-layer signals.

8 Why Unsupervised Learning?
Reason 5: A recent trend in machine learning is to consider problems where the output is high-dimensional and has a complex, possibly multi-modal joint distribution. Unsupervised learning can be used in these "structured output" problems.
Examples: attribute prediction (animal, pet, furry, …, striped), segmentation.

9 Learning Representations
"Concepts" or "abstractions" that help us make sense of the variability in data.
Often hand-designed to have desirable properties, e.g. sensitive to the variables we want to predict, less sensitive to other factors explaining variability.
DL has leveraged the ability to learn representations; these can be task-specific or task-agnostic.

10 Supervised Learning of Representations
Learn a representation with the objective of selecting one that is best suited for predicting targets given the input.
Diagram: input → f() → prediction, compared against the target to produce an error signal.
Image: features from a convolutional net (Zeiler and Fergus, 2013): (a) input image, (b) layer 5 strongest feature map, (c) layer 5 strongest feature map projections; true labels: Pomeranian, Car Wheel, Afghan Hound.

11 Unsupervised Learning of Representations
Diagram: input → prediction → error; but with no target, what plays the role of the teaching signal?

12 Unsupervised Learning of Representations
What is the objective?
- reconstruction error?
- maximum likelihood?
- disentangling factors of variation?
Image: identity and pose manifold coordinates learned from input images, with one factor held fixed while the other varies (Lee et al., 2014)

13 Principal Components Analysis
PCA works well when the data lies near a linear manifold in high-dimensional space.
Project the data onto the subspace spanned by the principal components; the first principal component is the direction of greatest variance.
In the dimensions orthogonal to that subspace the data has low variance.
Credit: Geoff Hinton
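As a concrete reference for the slide above, here is a minimal NumPy sketch of PCA via the SVD of the centred data matrix (the function name and the toy data are illustrative, not part of the original slides):

```python
import numpy as np

def pca(X, m):
    """Project X (n_samples x n_features) onto its first m principal components."""
    mean = X.mean(axis=0)
    X_centered = X - mean                             # center the data
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:m]                               # (m, n_features)
    Z = X_centered @ components.T                     # coordinates in the subspace
    X_hat = Z @ components + mean                     # reconstruction from the subspace
    return Z, X_hat, components

# Example: 500 points near a 2-D linear manifold embedded in 10-D space.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 10)) + 0.01 * rng.normal(size=(500, 10))
Z, X_hat, W = pca(X, m=2)
print(np.mean((X - X_hat) ** 2))                      # small residual: the data is near the subspace
```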

14 An inefficient way to fit PCA
Train a neural network with a "bottleneck" hidden layer: input → code (bottleneck) → output (reconstruction), trying to make the output the same as the input.
If the hidden and output layers are linear and we minimize squared reconstruction error:
- the M hidden units will span the same space as the first M principal components,
- but their weight vectors will not be orthogonal,
- and they will have approximately equal variance.
Credit: Geoff Hinton
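A hedged PyTorch sketch of this "inefficient PCA": a purely linear encoder and decoder with an M-unit bottleneck, trained to make the output the same as the input under squared reconstruction error (layer sizes, toy data, and optimizer settings are assumptions for illustration):

```python
import torch
import torch.nn as nn

D, M = 10, 2                                   # input dimension, bottleneck size
encoder = nn.Linear(D, M, bias=False)          # linear code
decoder = nn.Linear(M, D, bias=False)          # linear reconstruction
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-2)

X = torch.randn(500, 2) @ torch.randn(2, D)    # toy data lying in a 2-D subspace
for step in range(2000):
    x_hat = decoder(encoder(X))                # output = reconstruction of the input
    loss = ((X - x_hat) ** 2).mean()           # squared reconstruction error
    opt.zero_grad(); loss.backward(); opt.step()

# The M code units span (approximately) the same subspace as the first M
# principal components, but the learned weight vectors need not be orthogonal.
```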

15 Why fit PCA inefficiently?
Diagram: input x → encoder → code h(x) → decoder → reconstruction x̂(h(x)), with a reconstruction error.
With nonlinear layers before and after the code, it should be possible to represent data that lies on or near a nonlinear manifold:
- the encoder maps from data space to coordinates on the manifold,
- the decoder does the inverse transformation.
The encoder and decoder can be rich, multi-layer functions.

16 Auto-encoder
Feed-forward architecture: input → encoder → code h(x) → decoder → reconstruction x̂(h(x)), trained to minimize reconstruction error.
A bottleneck or regularization is essential.
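A minimal sketch of such a feed-forward auto-encoder in PyTorch, with a nonlinear encoder and decoder around a bottleneck code; the layer sizes, activations, and the 784-dimensional input are illustrative assumptions:

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, d_in=784, d_code=32):
        super().__init__()
        # encoder: data space -> coordinates on (an estimate of) the manifold
        self.encoder = nn.Sequential(nn.Linear(d_in, 256), nn.ReLU(),
                                     nn.Linear(256, d_code))
        # decoder: manifold coordinates -> data space
        self.decoder = nn.Sequential(nn.Linear(d_code, 256), nn.ReLU(),
                                     nn.Linear(256, d_in))

    def forward(self, x):
        h = self.encoder(x)           # code h(x)
        return self.decoder(h)        # reconstruction x_hat(h(x))

model = AutoEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)               # a toy batch standing in for real data
loss = ((model(x) - x) ** 2).mean()   # reconstruction error
opt.zero_grad(); loss.backward(); opt.step()
```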

17 Regularized Auto-encoders
Diagram: input → encoder → code h(x) → decoder → reconstruction x̂(h(x)), plus a regularizer on the code.
Permit the code to be higher-dimensional than the input.
Capture the structure of the training distribution through the predictive opposition between the reconstruction term and the regularizer.
The regularizer tries to make the encoder/decoder as simple as possible.

18 Simple?
- Reconstruct the input from the code and make the code compact (PCA, auto-encoder with bottleneck).
- Reconstruct the input from the code and make the code sparse (sparse auto-encoders).
- Add noise to the input or code and reconstruct the cleaned-up version (denoising auto-encoders).
- Reconstruct the input from the code and make the code insensitive to the input (contractive auto-encoders).

19 Sparse Auto-encoders
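One common way to realise a sparse auto-encoder is an L1 penalty on the code activations (a KL penalty towards a target activation rate is another option); the sketch below is an illustrative variant, with an over-complete code and an assumed penalty weight:

```python
import torch
import torch.nn as nn

d_in, d_code = 784, 1024                       # over-complete code: larger than the input
encoder = nn.Sequential(nn.Linear(d_in, d_code), nn.ReLU())
decoder = nn.Linear(d_code, d_in)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

sparsity_weight = 1e-3                         # illustrative trade-off coefficient
x = torch.rand(64, d_in)                       # toy batch
h = encoder(x)                                 # code
x_hat = decoder(h)                             # reconstruction
loss = ((x - x_hat) ** 2).mean() + sparsity_weight * h.abs().mean()  # reconstruction + sparsity
opt.zero_grad(); loss.backward(); opt.step()
```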

20 Deconvolutional Networks
Deep convolutional sparse coding.
Trained to reconstruct the input from any layer (layers 1-4 in the figure).
Fast approximate inference.
Recently used to visualize the features learned by convolutional nets (Zeiler and Fergus, 2013).
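Deconvolutional networks combine unpooling with learned filters; as a loose, hedged approximation of the reconstruct-from-features idea (not the exact architecture of Zeiler and Fergus), here is a convolutional encoder paired with a transposed-convolution decoder:

```python
import torch
import torch.nn as nn

# Convolutional encoder paired with a transposed-convolution decoder;
# the input is reconstructed from the top-layer feature maps.
encoder = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=5, stride=2, padding=2), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
)
decoder = nn.Sequential(
    nn.ConvTranspose2d(32, 16, kernel_size=5, stride=2, padding=2, output_padding=1), nn.ReLU(),
    nn.ConvTranspose2d(16, 1, kernel_size=5, stride=2, padding=2, output_padding=1),
)

x = torch.rand(8, 1, 28, 28)                   # toy image batch
x_hat = decoder(encoder(x))                    # reconstruct the input from layer-2 features
loss = ((x - x_hat) ** 2).mean()
loss.backward()
```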

21 Denoising Auto-encoders
(Vincent et al., 2008)
Diagram: input x → noise → noisy input x̃(x) → encoder → code h(x̃) → decoder → reconstruction x̂(h(x̃)), with the error measured against the clean input.
The code can be viewed as a lossy compression of the input.
Learning drives it to be a good compressor for training examples (and hopefully others as well), but not for arbitrary inputs.
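A minimal sketch of the denoising recipe: corrupt the input, encode the corrupted version, and measure the reconstruction error against the clean input. Gaussian corruption is assumed here; masking noise is another option discussed by Vincent et al.:

```python
import torch
import torch.nn as nn

d_in, d_code = 784, 256
encoder = nn.Sequential(nn.Linear(d_in, d_code), nn.ReLU())
decoder = nn.Linear(d_code, d_in)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.rand(64, d_in)                       # clean input
x_tilde = x + 0.3 * torch.randn_like(x)        # noisy input x~(x)
h = encoder(x_tilde)                           # code h(x~)
x_hat = decoder(h)                             # reconstruction x^(h(x~))
loss = ((x - x_hat) ** 2).mean()               # error against the CLEAN input
opt.zero_grad(); loss.backward(); opt.step()
```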

22 Contractive Auto-encoders
(Rifai et al., 2011)
Diagram: input → encoder → code h(x) → decoder → reconstruction x̂(h(x)), with a penalty that makes the code insensitive (contractive) with respect to the input.
- Learn good models of high-dimensional data (Bengio et al.).
- Can obtain good representations for classification.
- Can produce good-quality samples by a random walk near the manifold of high density (Rifai et al., 2012).
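A hedged sketch of the contractive penalty for a single sigmoid encoder layer, where the squared Frobenius norm of the Jacobian dh/dx factorizes in closed form; dimensions and the penalty weight are illustrative assumptions:

```python
import torch
import torch.nn as nn

d_in, d_code = 784, 256
W = nn.Parameter(0.01 * torch.randn(d_code, d_in))   # encoder weights
b = nn.Parameter(torch.zeros(d_code))                # encoder bias
decoder = nn.Linear(d_code, d_in)
opt = torch.optim.Adam([W, b] + list(decoder.parameters()), lr=1e-3)
lam = 0.1                                            # contractive penalty weight (illustrative)

x = torch.rand(64, d_in)                             # toy batch
h = torch.sigmoid(x @ W.t() + b)                     # code h(x)
x_hat = decoder(h)                                   # reconstruction
# For a sigmoid code, ||dh/dx||_F^2 = sum_j (h_j (1 - h_j))^2 * sum_i W_ji^2,
# so the penalty can be computed without building the full Jacobian.
jac_penalty = ((h * (1 - h)) ** 2 @ (W ** 2).sum(dim=1)).mean()
loss = ((x - x_hat) ** 2).mean() + lam * jac_penalty
opt.zero_grad(); loss.backward(); opt.step()
```

The analytic form of the penalty is what makes the method practical; computing the full Jacobian per example would be far more expensive.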

23 Resources
Online courses:
- Andrew Ng's Machine Learning (Coursera)
- Geoff Hinton's Neural Networks (Coursera)
Websites:
- deeplearning.net
- UFLDL_Tutorial

24 Surveys and Reviews
- Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798-1828, Aug 2013.
- Y. Bengio. Deep learning of representations: Looking forward. In Statistical Language and Speech Processing, pages 1-37. Springer, 2013.
- Y. Bengio, I. Goodfellow, and A. Courville. Deep Learning. Draft available online.
- J. Schmidhuber. Deep learning in neural networks: An overview. arXiv preprint, 2014.
- Y. Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1-127, 2009.

25 Sequence modelling

26 Sequence modelling
When applying machine learning to sequences, we often want to turn an input sequence into an output sequence that lives in a different domain.
– E.g. turn a sequence of sound pressures into a sequence of word identities.
When there is no separate target sequence, we can get a teaching signal by trying to predict the next term in the input sequence.
– The target output sequence is the input sequence with an advance of one step.
– This seems much more natural than trying to predict one pixel in an image from the other pixels, or one patch of an image from the rest of the image. For temporal sequences there is a natural order for the predictions.
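A small sketch of that shifted-target construction for next-step prediction (the helper name and toy sequence are illustrative):

```python
import torch

def next_step_pairs(sequence):
    """Split a 1-D sequence into (inputs, targets) for next-step prediction."""
    x = sequence[:-1]        # all but the last element
    y = sequence[1:]         # the same sequence advanced by one step
    return x, y

seq = torch.tensor([0.1, 0.2, 0.4, 0.8, 1.6])
x, y = next_step_pairs(seq)
# x = [0.1, 0.2, 0.4, 0.8], y = [0.2, 0.4, 0.8, 1.6]
```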

27 Memoryless models for sequences
Autoregressive models: predict the next term in a sequence from a fixed number of previous terms.
Feed-forward networks: generalize autoregressive models by adding one or more layers of non-linear hidden units between the delayed inputs and the prediction.
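A hedged sketch of such a memoryless model: a small feed-forward network that predicts the next value from a fixed window of the k previous values (window size, layer sizes, and the toy sine data are assumptions):

```python
import torch
import torch.nn as nn

k = 3                                             # fixed context window ("delay taps")
model = nn.Sequential(nn.Linear(k, 16), nn.Tanh(), nn.Linear(16, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

seq = torch.sin(torch.linspace(0, 12.0, 200))     # toy sequence
# Build (window, next value) training pairs.
X = torch.stack([seq[i:i + k] for i in range(len(seq) - k)])
y = seq[k:].unsqueeze(1)

for step in range(500):
    loss = ((model(X) - y) ** 2).mean()           # predict the next term from the window
    opt.zero_grad(); loss.backward(); opt.step()
```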

28 Memory and Hidden State
If we give our generative model some hidden state, and if we give this hidden state its own internal dynamics, we get a much more interesting kind of model. – It can store information in its hidden state for a long time. – If the dynamics is noisy and the way it generates outputs from its hidden state is noisy, we can never know its exact hidden state. The best we can do is to infer a probability distribution over the space of hidden state vectors.

29 RNN RNNs are very powerful, because they combine two properties:
– Distributed hidden state that allows them to store a lot of information about the past efficiently.
– Non-linear dynamics that allow them to update their hidden state in complicated ways.
With enough neurons and time, RNNs can compute anything that can be computed by your computer.
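A minimal sketch of the recurrence that provides the distributed hidden state: at each step the state is a nonlinear function of the current input and the previous state (dimensions and the tanh nonlinearity are illustrative choices):

```python
import torch
import torch.nn as nn

d_in, d_hid = 8, 32
W_xh = nn.Linear(d_in, d_hid)        # input -> hidden
W_hh = nn.Linear(d_hid, d_hid)       # hidden -> hidden (the recurrent weights)

def rnn_forward(inputs):
    """inputs: (seq_len, d_in). Returns the hidden state at every time step."""
    h = torch.zeros(d_hid)
    states = []
    for x_t in inputs:
        h = torch.tanh(W_xh(x_t) + W_hh(h))   # non-linear update of the hidden state
        states.append(h)
    return torch.stack(states)

hs = rnn_forward(torch.randn(10, d_in))        # hidden states for a length-10 sequence
```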

31 RNN Structure and Weight Sharing
Elman network: the hidden state at the previous time step is fed back into the hidden layer.
Jordan network: the output at the previous time step is fed back into the hidden layer.
In both cases the same weights are shared across all time steps.
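A schematic sketch of the two recurrences under that reading, with the same weight matrices shared across all time steps (dimensions are illustrative; this is a sketch, not a reference implementation):

```python
import torch
import torch.nn as nn

d_in, d_hid, d_out = 8, 32, 4
W_xh = nn.Linear(d_in, d_hid)        # input -> hidden (shared across time)
W_hh = nn.Linear(d_hid, d_hid)       # Elman: hidden -> hidden feedback
W_yh = nn.Linear(d_out, d_hid)       # Jordan: output -> hidden feedback
W_hy = nn.Linear(d_hid, d_out)       # hidden -> output

def elman_step(x_t, h_prev):
    h_t = torch.tanh(W_xh(x_t) + W_hh(h_prev))   # context = previous hidden state
    return h_t, W_hy(h_t)

def jordan_step(x_t, y_prev):
    h_t = torch.tanh(W_xh(x_t) + W_yh(y_prev))   # context = previous output
    return h_t, W_hy(h_t)
```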

33 Backpropagating through time
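A hedged sketch of backpropagation through time: unroll the recurrence over the sequence, sum a per-step loss, and call backward once so that gradients flow through every reuse of the shared weights (shapes and the toy next-step task are assumptions):

```python
import torch
import torch.nn as nn

d_in, d_hid = 4, 16
W_xh, W_hh = nn.Linear(d_in, d_hid), nn.Linear(d_hid, d_hid)
W_hy = nn.Linear(d_hid, d_in)
params = list(W_xh.parameters()) + list(W_hh.parameters()) + list(W_hy.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

inputs = torch.randn(20, d_in)                  # toy sequence of length 20
targets = inputs.roll(-1, dims=0)               # next-step prediction targets

h = torch.zeros(d_hid)
loss = 0.0
for x_t, y_t in zip(inputs[:-1], targets[:-1]): # unroll the network through time
    h = torch.tanh(W_xh(x_t) + W_hh(h))         # same weights reused at every step
    loss = loss + ((W_hy(h) - y_t) ** 2).mean() # accumulate the per-step loss

opt.zero_grad()
loss.backward()                                  # gradients flow back through all time steps
opt.step()
```

In practice the unrolled graph is often truncated to a fixed number of steps (truncated BPTT) to keep memory and gradient paths manageable.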

