Building high-level features using large-scale unsupervised learning
Anh Nguyen, Bay-yuan Hsu
CS290D – Data Mining (Spring 2014)
University of California, Santa Barbara
Slides adapted from Andrew Ng (Stanford) and Nando de Freitas (UBC)
Agenda
1. Motivation
2. Approach
   1. Sparse Deep Auto-encoder
   2. Local Receptive Field
   3. L2 Pooling
   4. Local Contrast Normalization
   5. Overall Model
3. Parallelism
4. Evaluation
5. Discussion
1. MOTIVATION
Motivation
Feature learning:
- Supervised learning requires a large amount of labeled data.
- Unsupervised learning builds high-level features from unlabeled data.
Example: build a face detector without any labeled face images.
Motivation
Previous work: auto-encoders, sparse coding.
- Result: only low-level features are learned.
- Reason: computational constraints.
Approach of this work: scale up the dataset, the model, and the computational resources.
2. APPROACH
Sparse Deep Auto-encoder
Auto-encoder:
- A neural network trained with unsupervised learning.
- Learns to reconstruct its input, trained by back-propagation.
Sparse Deep Auto-encoder (cont'd)
Sparse coding:
- Input: images x^(1), x^(2), ..., x^(m).
- Learn: bases (features) f_1, f_2, ..., f_k so that each input x can be approximately decomposed as x = sum_j a_j f_j, subject to the a_j being mostly zero ("sparse").
Sparse Deep Auto-encoder (cont'd)
Sparse Coding Regularizer
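The regularizer itself did not survive the slide extraction; a standard sparse-coding objective with an L1 penalty on the coefficients, which is presumably what the slide showed (assumption), is:

```latex
\min_{a,\, f} \; \sum_{i=1}^{m} \Big\| x^{(i)} - \sum_{j=1}^{k} a_j^{(i)} f_j \Big\|_2^2
\;+\; \lambda \sum_{i=1}^{m} \sum_{j=1}^{k} \big| a_j^{(i)} \big|
```

The first term asks the bases to reconstruct each input; the L1 term drives most coefficients a_j to zero, which is what makes the code sparse.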
Sparse Deep Auto-encoder (cont'd)
Sparse deep auto-encoder: stack multiple hidden layers so that each layer learns features with a particular characteristic.
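A minimal sketch of a single sparse auto-encoder layer, assuming an L1 sparsity penalty on the hidden activations; the names (init_layer, autoencoder_loss, sparsity_weight) and sizes are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_hidden):
    # Small random encoding/decoding weights (untied in this sketch).
    W_enc = rng.normal(0.0, 0.01, (n_hidden, n_in))
    W_dec = rng.normal(0.0, 0.01, (n_in, n_hidden))
    return W_enc, W_dec

def autoencoder_loss(x, W_enc, W_dec, sparsity_weight=0.1):
    h = np.tanh(W_enc @ x)                          # hidden activations
    x_hat = W_dec @ h                               # reconstruction of the input
    recon = np.sum((x_hat - x) ** 2)                # reconstruction error
    sparsity = sparsity_weight * np.sum(np.abs(h))  # L1 penalty keeps most h near zero
    return recon + sparsity

W_enc, W_dec = init_layer(n_in=324, n_hidden=8)     # e.g. one 18x18 patch, 8 features
x = rng.random(324)
print(autoencoder_loss(x, W_enc, W_dec))
```

Training minimizes this loss over many inputs; "deep" simply means stacking several such layers, each trained on the outputs of the previous one.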
Local Receptive Field
Definition: each feature in the auto-encoder connects only to a small region of the layer below (see the sketch below).
Goal: learn features efficiently.
- Enables parallelism.
- Training operates on small image patches.
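As a concrete picture of what "local" means, the sketch below enumerates the small regions a locally connected feature would see; the 18x18 patch size matches the overall-model slide, but the non-overlapping stride is an assumption made for illustration:

```python
import numpy as np

def receptive_fields(image, patch=18, stride=18):
    """Yield (row, col, patch) for each local region a feature connects to."""
    H, W = image.shape
    for r in range(0, H - patch + 1, stride):
        for c in range(0, W - patch + 1, stride):
            yield r, c, image[r:r + patch, c:c + patch]

image = np.random.rand(200, 200)                    # one 200x200 frame
n_fields = sum(1 for _ in receptive_fields(image))
print(n_fields, "local receptive fields per image")
```

Because each feature touches only one such patch, different patches can be processed (and their weights stored) independently, which is what makes the model easy to parallelize.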
L2 Pooling
Goal: robustness to local distortions.
Approach: pool (group) similar features together to achieve invariance (see the sketch below).
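A minimal sketch of L2 pooling over non-overlapping 5x5 groups (the paper pools over overlapping neighborhoods, so this is a simplification): each pooled unit outputs the square root of the sum of squares of the units in its group, which changes little when the underlying responses shift slightly.

```python
import numpy as np

def l2_pool(features, group=5):
    """features: 2D map of first-sublayer responses; pools non-overlapping group x group blocks."""
    H, W = features.shape
    out = np.zeros((H // group, W // group))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            block = features[i * group:(i + 1) * group, j * group:(j + 1) * group]
            out[i, j] = np.sqrt(np.sum(block ** 2))   # L2 norm of the group
    return out

print(l2_pool(np.random.rand(20, 20)).shape)          # -> (4, 4)
```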
Local Contrast Normalization
Goal: robustness to variations in light intensity.
Approach: normalize the local contrast of the input (see the sketch below).
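A minimal sketch of local contrast normalization using a plain box window (the paper uses Gaussian weighting, so the window choice here is an assumption): each pixel has the local mean subtracted and is divided by the local standard deviation, which removes much of the variation due to lighting.

```python
import numpy as np

def local_contrast_normalize(x, size=5, eps=1e-2):
    H, W = x.shape
    out = np.zeros_like(x)
    r = size // 2
    for i in range(H):
        for j in range(W):
            window = x[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            mu, sigma = window.mean(), window.std()
            out[i, j] = (x[i, j] - mu) / max(sigma, eps)   # subtract local mean, divide by local std
    return out

print(local_contrast_normalize(np.random.rand(20, 20)).shape)
```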
Overall Model
3 stacked layers, each composed of three sublayers:
- Simple cells (local filtering): 18x18 px receptive fields, 8 neurons per patch
- Complex cells (L2 pooling): 5x5 px
- Local contrast normalization: 5x5 px
Overall Model (cont'd)
Training: each layer is trained to reconstruct its own input.
Optimization function: sketched below.
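The layer-wise objective in Le et al. (reference 1) combines reconstruction error with a pooled (topographic) sparsity term; written roughly as below, where W_e and W_d are the encoding and decoding weights and H_j selects the j-th fixed pooling group:

```latex
\min_{W_e,\, W_d} \; \sum_{i=1}^{m} \Big(
  \big\| W_d W_e x^{(i)} - x^{(i)} \big\|_2^2
  \;+\; \lambda \sum_{j} \sqrt{\epsilon + H_j \big( W_e x^{(i)} \big)^2}
\Big)
```

The first term is the auto-encoder reconstruction; the second term is the L2-pooled activation, whose minimization encourages grouped, sparse features.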
Overall Model (cont'd)
Why such a complex model?
3. PARALLELISM
Asynchronous SGD
Two recent lines of research in speeding up large learning problems:
- Parallel/distributed computing
- Online (and mini-batch) learning algorithms: stochastic gradient descent, perceptron, MIRA, stepwise EM
How can we bring together the benefits of parallel computing and online learning?
Asynchronous SGD (cont'd)
SGD (stochastic gradient descent):
- Choose an initial parameter vector W and learning rate α.
- Repeat until an approximate minimum is obtained:
  - Randomly shuffle the examples in the training set.
  - For each example i, update W ← W − α ∇Q_i(W).
A sketch of this loop follows below.
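A minimal sketch of the sequential SGD loop just described; in the asynchronous version, many model replicas run this same loop on different shards of the data and push their updates to shared parameters without waiting for each other. The toy least-squares gradient and all names here are illustrative assumptions, not the paper's setup.

```python
import random
import numpy as np

def sgd(W, examples, grad, lr=0.01, epochs=10):
    for _ in range(epochs):
        random.shuffle(examples)            # randomly shuffle the training set
        for example in examples:
            W = W - lr * grad(W, example)   # per-example parameter update
    return W

# Toy objective: sum_i (w . x_i - y_i)^2
def grad(W, example):
    x, y = example
    return 2.0 * (W @ x - y) * x

examples = [(np.array([1.0, 2.0]), 3.0), (np.array([2.0, 1.0]), 3.0)]
print(sgd(np.zeros(2), examples, grad))
```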
Model Parallelism
Weights are divided according to image locality and stored on different machines (see the sketch below).
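A minimal sketch of locality-based partitioning, assuming a 2x2 grid of machines over a 200x200 frame and 18x18 patches (the paper's actual partitioning is more involved; every name here is illustrative): weights whose receptive fields fall in the same image region are assigned to the same machine, so most computation stays local and only boundary activations need to be exchanged.

```python
def assign_patches_to_machines(image_size=200, patch=18, machines_per_side=2):
    """Map each patch's top-left corner to the machine that will hold its weights."""
    region = image_size // machines_per_side
    assignment = {}
    for r in range(0, image_size - patch + 1, patch):
        for c in range(0, image_size - patch + 1, patch):
            machine = (min(r // region, machines_per_side - 1) * machines_per_side
                       + min(c // region, machines_per_side - 1))
            assignment[(r, c)] = machine
    return assignment

counts = {}
for machine in assign_patches_to_machines().values():
    counts[machine] = counts.get(machine, 0) + 1
print(counts)   # roughly equal numbers of patches per machine
```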
4. EVALUATION
Evaluation
- Dataset: 10M unlabeled YouTube frames of size 200x200
- Model: 1B parameters
- Hardware: 1,000 machines (16,000 cores)
Experiment on Faces
Test set: 37,000 images, of which 13,026 are face images.
Report the classification performance of the best neuron.
Experiment on Faces (cont'd)
Visualization:
- Top stimuli (test images) for the face neuron
- Optimal stimulus for the face neuron
Experiment on Faces (cont'd)
Invariance properties.
Experiment on Cat/Human Body
Test sets:
- Cat: 10,000 positive, 18,409 negative images
- Human body: 13,026 positive, 23,974 negative images
Report accuracy.
ImageNet Classification
Task: recognizing object categories in images.
Dataset: 20,000 categories, 14M images.
Accuracy: 15.8% (previous state of the art: 9.3%).
5. DISCUSSION
Discussion
- Deep learning: unsupervised feature learning over multiple layers of representation.
- Accuracy is increased by invariance (pooling) and contrast normalization.
- Scalability: model and data parallelism make billion-parameter training feasible.
6. REFERENCES
References
1. Quoc Le et al., "Building High-level Features Using Large Scale Unsupervised Learning".
2. Nando de Freitas, "Deep Learning", URL:
3. Andrew Ng, "Sparse Autoencoder", URL: er.pdf
4. Andrew Ng, "Machine Learning and AI via Brain Simulations", URL:
5. Andrew Ng, "Deep Learning", URL: