Deep belief nets experiments and some ideas. Karol Gregor NYU/Caltech
Outline
- DBN
- Image database experiments
- Temporal sequences
Deep belief network
Figure: layer stack Input, H1, H2, H3, Labels; Backprop
Preprocessing: Bag of words of SIFT
With: Greg Griffin (Caltech)
Images -> Features (using SIFT) -> Group them (e.g. K-means) -> Bag of words

        Image1  Image2
Word1   23      11
Word2   12      55
Word3   92      33
…       …       …
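The preprocessing pipeline above can be sketched as follows. This is a minimal illustration, not the code used for the experiments: random 128-dimensional vectors stand in for real SIFT descriptors, and the names `kmeans`, `bag_of_words`, and `vocabulary`, as well as the vocabulary size of 3, are assumptions made for the example.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of cluster centres."""
    rng = np.random.default_rng(seed)
    centres = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every point to every centre: shape (n, k).
        d2 = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centre
                centres[j] = members.mean(axis=0)
    return centres

def bag_of_words(descriptors, centres):
    """Count how many descriptors of one image fall nearest to each word."""
    d2 = ((descriptors[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    return np.bincount(words, minlength=len(centres))

# Assumption: random vectors replace real SIFT descriptors (60 and 80 per image).
rng = np.random.default_rng(0)
image_descriptors = [rng.normal(size=(n, 128)) for n in (60, 80)]

# Group all descriptors into visual words, then histogram each image.
vocabulary = kmeans(np.vstack(image_descriptors), k=3)
counts = np.stack(
    [bag_of_words(d, vocabulary) for d in image_descriptors], axis=1
)
# counts[w, i] = number of descriptors in image i assigned to word w
```

Each column of `counts` plays the role of one image column in the word-count table, and it is this vector that would be fed to the network as input.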
13 Scenes Database – Test error
Train error
- Pre-training on larger dataset
- Comparison to SVM, SPM
Explicit representations?
Compatibility between databases
Pretraining: Corel database
Supervised training: 15 Scenes database
Conclusions
- Bag of words is not a good input for deep architectures.
- The networks can be pretrained on one database and then trained with supervision on another.
Other observations:
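The idea of pretraining on one database and doing supervised training on another can be sketched roughly as below. This is a heavily simplified stand-in, not the network from the experiments: it uses a single RBM layer trained with one step of contrastive divergence, trains only a softmax classifier on top (no backprop through the pretrained layer), and uses random data. All names (`pretrain_rbm`, `train_softmax`, `database_a`, `database_b`) and all sizes and learning rates are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_rbm(data, n_hidden, epochs=10, lr=0.05, seed=0):
    """Unsupervised CD-1 training of one RBM on unlabeled data in [0, 1]."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = 0.01 * rng.normal(size=(n_visible, n_hidden))
    b_v = np.zeros(n_visible)
    b_h = np.zeros(n_hidden)
    for _ in range(epochs):
        h_prob = sigmoid(data @ W + b_h)                  # positive phase
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        v_recon = sigmoid(h_sample @ W.T + b_v)           # reconstruction
        h_recon = sigmoid(v_recon @ W + b_h)              # negative phase
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / len(data)
        b_v += lr * (data - v_recon).mean(axis=0)
        b_h += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_h

def train_softmax(features, labels, n_classes, epochs=200, lr=0.5):
    """Supervised softmax regression by batch gradient descent."""
    V = np.zeros((features.shape[1], n_classes))
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = features @ V
        logits -= logits.max(axis=1, keepdims=True)       # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        V -= lr * features.T @ (p - onehot) / len(features)
    return V

# Assumption: random data stands in for the two image databases.
rng = np.random.default_rng(1)
database_a = rng.random((200, 20))          # unlabeled, used for pretraining
database_b = rng.random((50, 20))           # labeled, used for supervised training
labels_b = rng.integers(0, 3, size=50)

W, b_h = pretrain_rbm(database_a, n_hidden=8)
features_b = sigmoid(database_b @ W + b_h)  # reuse pretrained weights on B
V = train_softmax(features_b, labels_b, n_classes=3)
predicted = (features_b @ V).argmax(axis=1)
```

The point of the sketch is only the data flow: the weights `W` never see labels or database B during pretraining, yet they define the features on which the supervised stage runs.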
Temporal Sequences
Simple prediction: predict Y (frame t) from X (frames t-1, t-2, t-3) through weights W
Supervised learning
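A minimal sketch of this kind of prediction, assuming a purely linear map W fitted by least squares (the slide does not say how W is trained). The helper names `make_pairs` and `fit_linear_predictor` and the toy rotating-point sequence are illustrative assumptions.

```python
import numpy as np

def make_pairs(frames, order=3):
    """Build X[i] = frames t-1 .. t-order concatenated, Y[i] = frame t."""
    X = np.stack([
        np.concatenate([frames[t - k] for k in range(1, order + 1)])
        for t in range(order, len(frames))
    ])
    Y = frames[order:]
    return X, Y

def fit_linear_predictor(X, Y):
    """Least-squares weights W such that X @ W approximates Y."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W  # shape (order * d, d)

# Toy sequence (assumption): a point rotating in the plane, 50 frames.
angles = 0.3 * np.arange(50)
frames = np.stack([np.cos(angles), np.sin(angles)], axis=1)

X, Y = make_pairs(frames)
W = fit_linear_predictor(X, Y)
prediction = X @ W
max_error = np.abs(prediction - Y).max()
```

A rotation is a linear function of the previous frame, so the linear predictor fits this toy sequence essentially exactly; sequences that are not linear in the past frames are one motivation for adding hidden units.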
With hidden units (need them for several reasons)
Figure: hidden units G (t-1, t-2, t-3) and H (t); input X (t-1, t-2, t-3) and output Y (t)
Memisevic, R. F. and Hinton, G. E., Unsupervised Learning of Image Transformations. CVPR-07
Example: pred_xyh_orig.m
Additions
Figure: hidden units G (t-1) and H (t); input X (t-1) and output Y (t)
- Sparsity: when inferring H the first time, keep only the largest n units on
- Slow H change: after inferring H the first time, take H = (G+H)/2
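The two additions can be sketched as below. Two things are assumptions here: that "keeping a unit on" means keeping its inferred value and zeroing the rest (rather than setting it to 1), and that the slow-change averaging is applied after the sparsity step. The function names are illustrative.

```python
import numpy as np

def keep_largest(h, n):
    """Sparsity: zero every unit except the n largest activations."""
    sparse = np.zeros_like(h)
    top = np.argsort(h)[-n:]      # indices of the n largest entries
    sparse[top] = h[top]
    return sparse

def slow_update(g, h):
    """Slow H change: average the previous hidden state G with the new H."""
    return (g + h) / 2.0

# Toy values (assumption): first-pass inference of H at time t, and G from t-1.
h_first = np.array([0.1, 0.9, 0.3, 0.8, 0.2])
g_prev = np.array([0.0, 1.0, 0.0, 0.0, 0.0])

h_sparse = keep_largest(h_first, n=2)    # only units 1 and 3 stay on
h_slow = slow_update(g_prev, h_sparse)   # pulled halfway toward G
```

With these values the unit that was already active in G stays strongly on after averaging, while a newly active unit comes on at half strength, which is the intended smoothing effect of the second addition.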
Examples: pred_xyh.m, present_line.m, present_cross.m
Hippocampus
Cortex+Thalamus
Senses: e.g. Eye (through retina, LGN)
Muscles (through sub-cortical structures)
See e.g. Jeff Hawkins: On Intelligence
Cortical patch: Complex structure (not a single layer RBM)
Figure from Alex Thomson and Peter Bannister (see numenta.com)
Desired properties
1) Prediction A B C D E F G H J K L E F H
2) Explicit representations for sequences (figure: VISIONRESEARCH along a time axis)
3) Invariance discovery, e.g. complex cell (figure: time axis)
4) Sequences of variable length (figure: VISIONRESEARCH along a time axis)
5) Long sequences
Layer1: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ? ? 2 2 2 2 2 2 2 2 2 2
Layer2: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 1 2 3 5 8 13 21 34 55 89 144
6) Multilayer - Inferred only after some time (figure: VISIONRESEARCH along a time axis)
7) Smoother time steps
8) Variable speed - Can fit a knob with small speed range
9) Add a clock for actual time
Hippocampus
Cortex+Thalamus
Senses: e.g. Eye (through retina, LGN)
Muscles (through sub-cortical structures)
In addition:
- Top down attention
- Bottom up attention
- Imagination
- Working memory
- Rewards
Training data
- Videos
  - Of the real world
  - Simplified: Cartoons (Simpsons)
- A robot in an environment
  - Problem: Hard to grasp objects
- Artificial environment with 3D objects that are easy to manipulate (e.g. Grand Theft Auto IV with objects)