1
Advanced topics
2
Outline
Self-taught learning
Learning feature hierarchies (Deep learning)
Scaling up
3
Self-taught learning
4
Supervised learning
[Figure: labeled training images of cars and motorcycles. Testing: What is this?]
Sometimes, the most data wins. So, how do we get more data? Even with Amazon Mechanical Turk (AMT), labeling is often slow and expensive.
5
Semi-supervised learning
[Figure: unlabeled images (all cars/motorcycles) plus labeled car and motorcycle examples. Testing: What is this?]
6
Self-taught learning
[Figure: unlabeled images (random internet images) plus labeled car and motorcycle examples. Testing: What is this?]
7
Self-taught learning
Learn features f1, f2, …, fk from the unlabeled images using sparse coding, LCC, etc.
Use the learned f1, f2, …, fk to represent the training/test sets: using f1, f2, …, fk, each car or motorcycle image is represented by its activations a1, a2, …, ak.
If the labeled training set is small, this can give a huge performance boost.
8
Learning feature hierarchies/Deep learning
9
Why feature hierarchies
[Figure: feature hierarchy, from bottom to top: pixels, edges, object parts (combination of edges), object models.]
10
Deep learning algorithms
Stack sparse coding algorithm
Deep Belief Network (DBN) (Hinton)
Deep sparse autoencoders (Bengio)
[Other related work: LeCun, Lee, Yuille, Ng …]
11
Deep learning with autoencoders
Logistic regression
Neural network
Sparse autoencoder
Deep autoencoder
12
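Logistic regression
Logistic regression has a learned parameter vector $\theta$. On input $x$, it outputs $h_\theta(x) = g(\theta^T x)$, where $g(z) = \frac{1}{1 + e^{-z}}$.
Draw a logistic regression unit as:
[Figure: a single unit with inputs x1, x2, x3 and a +1 bias input.]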
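A logistic unit can be sketched in a few lines of Python. This is a minimal illustration that assumes the usual sigmoid output, 1 / (1 + exp(-z)), applied to the weighted sum of the inputs and the +1 bias input; the parameter values below are hypothetical and not taken from the slides.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_unit(theta, x):
    """Output of one logistic unit; theta[-1] multiplies the +1 bias input."""
    inputs = list(x) + [1.0]
    z = sum(t * v for t, v in zip(theta, inputs))
    return sigmoid(z)

# Hypothetical parameters for inputs x1, x2, x3 and the +1 bias input.
theta = [0.5, -1.0, 2.0, 0.0]
h = logistic_unit(theta, [1.0, 2.0, 0.5])
```

With all parameters set to zero the unit outputs 0.5 for every input, which is a quick sanity check on the implementation.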
13
Neural Network
String a lot of logistic units together. Example 3-layer network:
[Figure: Layer 1 with inputs x1, x2, x3 and a +1 bias unit; Layer 2 with hidden units a1, a2, a3 and a +1 bias unit; Layer 3 with the output unit.]
14
Neural Network
Example 4-layer network with 2 output units:
[Figure: Layer 1 with inputs x1, x2, x3 and a +1 bias unit; Layers 2 and 3 as hidden layers, each with a +1 bias unit; Layer 4 with 2 output units.]
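Stringing logistic units together amounts to feeding each layer's outputs, plus a +1 bias input, into the next layer. The sketch below shows such a forward pass for a 4-layer network with 2 output units. The hidden layer sizes and all weight values are hypothetical, chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(weights, x):
    """Propagate x through the layers.

    weights[l][j] holds the incoming weights of unit j in layer l + 2;
    the last entry of each weight list multiplies the +1 bias input.
    """
    activations = list(x)
    for layer in weights:
        inputs = activations + [1.0]
        activations = [sigmoid(sum(w * v for w, v in zip(unit, inputs)))
                       for unit in layer]
    return activations

# Hypothetical network: 3 inputs -> 3 hidden -> 2 hidden -> 2 outputs.
weights = [
    [[0.1, 0.2, 0.3, 0.0], [0.0, -0.1, 0.2, 0.1], [0.3, 0.1, -0.2, 0.0]],
    [[0.2, -0.3, 0.1, 0.0], [0.1, 0.1, 0.1, -0.1]],
    [[0.5, -0.5, 0.0], [-0.5, 0.5, 0.1]],
]
outputs = forward(weights, [1.0, 0.0, -1.0])
```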
15
Neural Network example
[Courtesy of Yann LeCun]
16
Training a neural network
Given training set (x1, y1), (x2, y2), (x3, y3), …, adjust the parameters $\theta$ (for every node) to make $h_\theta(x_i) \approx y_i$.
(Use gradient descent, with gradients computed by the "backpropagation" algorithm. Susceptible to local optima.)
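The simplest instance of this training procedure is gradient descent on a single logistic unit, which is the one-node special case of backpropagation. The sketch below assumes a squared-error loss, a learning rate of 0.5, and a toy logical-OR training set; none of these choices come from the slides.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(theta, x):
    """Logistic unit output; theta[-1] multiplies the +1 bias input."""
    return sigmoid(sum(t * v for t, v in zip(theta, list(x) + [1.0])))

def train(data, n_inputs, alpha=0.5, epochs=2000):
    """Stochastic gradient descent on 0.5 * (h - y)^2 for one logistic unit."""
    theta = [0.0] * (n_inputs + 1)
    for _ in range(epochs):
        for x, y in data:
            h = predict(theta, x)
            # Chain rule: d(loss)/dz = (h - y) * g'(z), with g'(z) = h * (1 - h).
            delta = (h - y) * h * (1.0 - h)
            inputs = list(x) + [1.0]
            theta = [t - alpha * delta * v for t, v in zip(theta, inputs)]
    return theta

# Toy training set (logical OR), used only as an illustration.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
        ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
theta = train(data, n_inputs=2)
```

In a multi-layer network the same delta is propagated backwards through the weights to obtain the gradient for every node.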
17
Unsupervised feature learning with a neural network
Autoencoder. The network is trained to output its input (learn the identity function).
The solution is trivial unless we constrain the number of units in Layer 2 (learn a compressed representation), or constrain Layer 2 to be sparse.
[Figure: Layer 1 with inputs x1 … x6 and a +1 bias unit; Layer 2 with hidden units a1, a2, a3 and a +1 bias unit; Layer 3 with the reconstructed outputs.]
18
Unsupervised feature learning with a neural network
Training a sparse autoencoder. Given an unlabeled training set x1, x2, …, minimize the sum of a reconstruction error term and an L1 sparsity term on the hidden activations a1, a2, a3.
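One way to make the two terms concrete is to compute the cost directly. The sketch below assumes sigmoid units in both layers, a squared reconstruction error, and a hypothetical weight `lam` on the L1 term; the exact form used in the original slide is not recoverable from this transcript.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(weights, inputs):
    """Sigmoid layer; the last weight of each unit multiplies the +1 bias input."""
    ext = list(inputs) + [1.0]
    return [sigmoid(sum(w * v for w, v in zip(unit, ext))) for unit in weights]

def sparse_autoencoder_cost(enc, dec, xs, lam):
    """Sum over examples of (reconstruction error) + lam * (L1 norm of activations)."""
    total = 0.0
    for x in xs:
        a = layer(enc, x)        # hidden activations a1, a2, ...
        x_hat = layer(dec, a)    # reconstruction of the input
        reconstruction = sum((xh - xi) ** 2 for xh, xi in zip(x_hat, x))
        sparsity = sum(abs(aj) for aj in a)
        total += reconstruction + lam * sparsity
    return total

# Hypothetical all-zero weights: 6 inputs, 3 hidden units, 6 outputs.
enc = [[0.0] * 7 for _ in range(3)]
dec = [[0.0] * 4 for _ in range(6)]
cost = sparse_autoencoder_cost(enc, dec, [[0.5] * 6], lam=0.1)
```

With zero weights every unit outputs 0.5, so the reconstruction error on the all-0.5 input vanishes and only the sparsity term contributes.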
19
Unsupervised feature learning with a neural network
[Figure: autoencoder with inputs x1 … x6 and a +1 bias unit (Layer 1), hidden units a1, a2, a3 and a +1 bias unit (Layer 2), and reconstructed outputs x1 … x6 (Layer 3).]
20
Unsupervised feature learning with a neural network
[Figure: Layer 1 with inputs x1 … x6 and a +1 bias unit; Layer 2 with hidden units a1, a2, a3 and a +1 bias unit.]
New representation for input: [a1, a2, a3].
21
Unsupervised feature learning with a neural network
[Figure: Layer 1 with inputs x1 … x6 and a +1 bias unit; Layer 2 with hidden units a1, a2, a3 and a +1 bias unit.]
22
Unsupervised feature learning with a neural network
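[Figure: inputs x1 … x6, first hidden layer a1, a2, a3, second hidden layer b1, b2, b3, with +1 bias units.]
Train parameters so that $\hat{a} \approx a$ (the new layer reconstructs its input [a1, a2, a3]), subject to the bi's being sparse.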
25
Unsupervised feature learning with a neural network
[Figure: inputs x1 … x6, first hidden layer a1, a2, a3, second hidden layer b1, b2, b3, with +1 bias units.]
New representation for input: [b1, b2, b3].
26
Unsupervised feature learning with a neural network
[Figure: inputs x1 … x6, first hidden layer a1, a2, a3, second hidden layer b1, b2, b3, with +1 bias units.]
27
Unsupervised feature learning with a neural network
[Figure: inputs x1 … x6 and hidden layers a1, a2, a3; b1, b2, b3; c1, c2, c3, with +1 bias units.]
28
Unsupervised feature learning with a neural network
[Figure: inputs x1 … x6 and hidden layers a1, a2, a3; b1, b2, b3; c1, c2, c3, with +1 bias units.]
New representation for input: use [c1, c2, c3] as the representation to feed to the learning algorithm.
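The layer-by-layer procedure in the preceding slides can be sketched as a loop: train an autoencoder on the current representation, keep its encoder, re-encode the data, and repeat. The code below is a small illustration rather than the authors' implementation. It assumes sigmoid units, plain stochastic gradient descent, and hypothetical values for the sparsity weight `lam`, learning rate `alpha`, and number of epochs.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encode(W, x):
    """Sigmoid layer; the last weight in each row multiplies the +1 bias input."""
    ext = list(x) + [1.0]
    return [sigmoid(sum(w * v for w, v in zip(row, ext))) for row in W]

def train_autoencoder(data, n_hidden, lam=0.001, alpha=0.5, epochs=100, seed=0):
    """Train one sparse autoencoder by SGD and return its encoder weights."""
    rng = random.Random(seed)
    n_in = len(data[0])
    enc = [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    dec = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden + 1)] for _ in range(n_in)]
    for _ in range(epochs):
        for x in data:
            a = encode(enc, x)
            x_hat = encode(dec, a)
            # Output deltas for the loss 0.5 * ||x_hat - x||^2.
            d_out = [(xh - xi) * xh * (1 - xh) for xh, xi in zip(x_hat, x)]
            # Hidden deltas: backpropagated error plus the L1 term
            # (activations are positive, so the derivative of |a_j| is 1).
            d_hid = [(sum(d_out[i] * dec[i][j] for i in range(n_in)) + lam)
                     * a[j] * (1 - a[j]) for j in range(n_hidden)]
            a_ext, x_ext = a + [1.0], list(x) + [1.0]
            for i in range(n_in):
                dec[i] = [w - alpha * d_out[i] * v for w, v in zip(dec[i], a_ext)]
            for j in range(n_hidden):
                enc[j] = [w - alpha * d_hid[j] * v for w, v in zip(enc[j], x_ext)]
    return enc

def train_stack(data, layer_sizes):
    """Greedy layer-wise training: each layer is trained on the previous layer's output."""
    encoders, reps = [], data
    for n_hidden in layer_sizes:
        enc = train_autoencoder(reps, n_hidden)
        encoders.append(enc)
        reps = [encode(enc, r) for r in reps]
    return encoders, reps

# Hypothetical unlabeled data: 8 random 6-dimensional inputs.
rng = random.Random(1)
data = [[rng.random() for _ in range(6)] for _ in range(8)]
encoders, features = train_stack(data, [3, 3, 3])
```

Here `features` plays the role of [c1, c2, c3] for each input and would be passed to a supervised learning algorithm.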
29
Deep Belief Net
Deep Belief Net (DBN) is another algorithm for learning a feature hierarchy.
Building block: 2-layer graphical model (Restricted Boltzmann Machine).
Can then learn additional layers one at a time.
30
Restricted Boltzmann machine (RBM)
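[Figure: Layer 2 hidden units [a1, a2, a3] (binary-valued) connected to the input units [x1, x2, x3, x4].]
MRF with joint distribution: [Equation: joint distribution P(x, a)]
Use Gibbs sampling for inference.
Given observed inputs x, want maximum likelihood estimation: [Equation: maximum likelihood objective]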
31
Restricted Boltzmann machine (RBM)
[Figure: Layer 2 hidden units [a1, a2, a3] (binary-valued) connected to the input units [x1, x2, x3, x4].]
Gradient ascent on log P(x): [Equation: gradient ascent update]
$[x_i a_j]_{obs}$ comes from fixing x to its observed value and sampling a from P(a|x).
$[x_i a_j]_{prior}$ comes from running Gibbs sampling to convergence.
Adding a sparsity constraint on the ai's usually improves results.
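For reference, a common textbook form of the RBM equations is given below. It is a sketch under assumed conventions, not a reconstruction of the slide: the weight matrix $W_{ij}$, the learning rate $\alpha$, the sign convention of the exponent, and the omission of bias terms are all assumptions.

```latex
% Assumed convention: binary hidden units a_j, weights W_{ij}, bias terms omitted.
P(x, a) = \frac{1}{Z} \exp\Big( \sum_{i,j} x_i W_{ij} a_j \Big),
\qquad
P(a_j = 1 \mid x) = \sigma\Big( \sum_i W_{ij} x_i \Big)

% Gradient of the log-likelihood with respect to one weight.
\frac{\partial \log P(x)}{\partial W_{ij}}
  = [x_i a_j]_{\text{obs}} - [x_i a_j]_{\text{prior}}

% Gradient ascent update with learning rate \alpha.
W_{ij} := W_{ij} + \alpha \big( [x_i a_j]_{\text{obs}} - [x_i a_j]_{\text{prior}} \big)
```

Under this convention the first bracket is an expectation with x clamped to the data and the second is an expectation under the model, which matches the two sampling procedures described on the slide.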
32
Deep Belief Network
Similar to a sparse autoencoder in many ways. Stack RBMs on top of each other to get a DBN.
[Figure: Input [x1, x2, x3, x4]; Layer 2 [a1, a2, a3]; Layer 3 [b1, b2, b3].]
Train with approximate maximum likelihood (often with a sparsity constraint on the ai's): [Equation: training objective]
33
Deep Belief Network
[Figure: Input [x1, x2, x3, x4]; Layer 2 [a1, a2, a3]; Layer 3 [b1, b2, b3]; Layer 4 [c1, c2, c3].]
One of the challenges is scaling up. Most people work with inputs of 14x14 up to 32x32.
34
Deep learning examples
35
Convolutional DBN for audio
[Figure: spectrogram input, detection units, and max pooling unit.]
36
Convolutional DBN for audio
[Figure: spectrogram input and the time-invariant features computed from it.]
37
Probabilistic max pooling
Convolutional neural net: the pooling unit computes max {x1, x2, x3, x4}, where the xi are real numbers.
Convolutional DBN: the pooling unit computes max {x1, x2, x3, x4}, where the xi are in {0,1} and mutually exclusive. Thus, there are 5 possible cases: all four units off, or exactly one of x1, x2, x3, x4 on.
This collapses $2^n$ configurations into $n+1$ configurations, and permits bottom-up and top-down inference.
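The counting argument is easy to check by enumeration. The sketch below lists the n+1 allowed states of a pooling group and compares them with the 2^n unconstrained binary states. It also shows a softmax over the allowed states, which is how probabilistic max pooling is usually formulated in the convolutional DBN literature; that formula and the `bottom_up` input values are assumptions, not taken from the slide.

```python
import math
from itertools import product

def pooling_configurations(n):
    """The all-off state plus the n one-hot states allowed by mutual exclusivity."""
    configs = [tuple([0] * n)]
    for i in range(n):
        configs.append(tuple(1 if k == i else 0 for k in range(n)))
    return configs

def pooling_probabilities(bottom_up):
    """Softmax over the n+1 allowed states; entry 0 is the all-off state."""
    m = max(0.0, max(bottom_up))  # subtract the max for numerical stability
    weights = [math.exp(-m)] + [math.exp(v - m) for v in bottom_up]
    z = sum(weights)
    return [w / z for w in weights]

allowed = pooling_configurations(4)                 # 5 states
unconstrained = list(product([0, 1], repeat=4))     # 16 states
probs = pooling_probabilities([0.0, 0.0, 0.0, 0.0])
```

The pooling unit is on exactly when one of the detection units is on, so its activation probability is one minus the probability of the all-off state.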
38
Convolutional DBN for audio
[Figure: spectrogram.]
39
Convolutional DBN for audio
[Figure: two stacked CDBN layers. One CDBN layer consists of detection units followed by max pooling; the second CDBN layer repeats detection units and max pooling on top of the first.]
40
Learned first-layer bases
CDBNs for speech
For visual bases, one can look at them and check whether they make sense (e.g., whether they correspond to Gabors). Try to perform a similar analysis on the audio bases.
[Figure: learned first-layer bases.]
41
Convolutional DBN for Images
[Figure: input data V (visible nodes, binary or real); "filter" weights Wk (shared); detection layer H (hidden nodes, binary); max-pooling layer P ("max-pooling" node, binary).]
At most one hidden node in each pooling group is active.
42
Convolutional DBN on face images
[Figure: hierarchy learned from face images, from bottom to top: pixels, edges, object parts (combination of edges), object models.]
Note: Sparsity is important for these results.
43
Learning of object parts
Examples of learned object parts from four object categories: faces, cars, elephants, chairs.
44
Training on multiple objects
Trained on 4 classes (cars, faces, motorbikes, airplanes).
Second layer: shared features and object-specific features.
Third layer: more specific features.
[Figures: second-layer bases learned from 4 object categories; third-layer bases learned from 4 object categories; plot of H(class | neuron active).]
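The quantity H(class | neuron active) measures how class-specific a unit is: it is 0 when the unit fires for one class only, and 2 bits when it fires equally often for all 4 classes. A minimal sketch of the computation follows. It assumes the entropy is estimated from activation counts per class, which the slide does not specify, and the counts are hypothetical.

```python
import math

def conditional_entropy(counts):
    """Entropy (in bits) of the class label given that the neuron is active.

    counts[c] is the number of class-c examples on which the neuron fired.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("neuron was never active")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h

# Hypothetical activation counts over (cars, faces, motorbikes, airplanes).
object_specific = conditional_entropy([10, 0, 0, 0])
shared = conditional_entropy([5, 5, 5, 5])
```

Low values indicate object-specific features and high values indicate features shared across classes.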
45
Hierarchical probabilistic inference
Generating posterior samples from faces by "filling in" experiments (cf. Lee and Mumford, 2003). Combine bottom-up and top-down inference.
[Figure: input images; samples from feedforward inference (control); samples from full posterior inference.]
Aglioti et al., 1994; Halligan et al., 1993; Weinstein, 1969; Ramachandran, 1998; Sadato et al., 1996; Halligan et al., 1999
46
Key issue in feature learning: Scaling up
47
Scaling up with graphics processors
[Figure: peak GFlops, NVIDIA GPU (US$ 250) vs. Intel CPU. Source: NVIDIA CUDA Programming Guide.]
Tflops, $110 million, used to simulate nuclear weapon testing: like 13 graphics cards costing $250 each.
40 people with a US$250 graphics card each: #18 on the top supercomputers list 2 years back.
48
Scaling up with GPUs
[Figure: results by approximate number of parameters (millions), using GPU (Raina et al., 2009).]
49
Unsupervised feature learning: Does it work?
50
State-of-the-art task performance
Domain | Task | Prior art | Prior art accuracy | Stanford feature learning accuracy
Audio | TIMIT phone classification | Clarkson et al., 1999 | 79.6% | 80.3%
Audio | TIMIT speaker identification | Reynolds, 1995 | 99.7% | 100.0%
Images | CIFAR object classification | Yu and Zhang, 2010 | 74.5% | 75.5%
Images | NORB object classification | Ranzato et al., 2009 | 94.4% | 96.2%
Video | UCF activity classification | Kläser et al., 2008 | 86% | 87%
Video | Hollywood2 classification | Laptev, 2004 | 47% | 50%
Multimodal (audio/video) | AVLetters lip reading | Zhao et al., 2009 | 58.9% | 63.1%
51
Summary
Instead of hand-tuning features, use unsupervised feature learning!
Sparse coding, LCC.
Advanced topics: self-taught learning, deep learning, scaling up.
52
Other resources
Workshop page: http://ufldl.stanford.edu/eccv10-tutorial/
Code for sparse coding, LCC.
References.
Full online tutorial.