1 (Infinitely) Deep Learning in Vision. Max Welling (UCI). Collaborators: Ian Porteous (UCI), Evgeniy Bart (UCI/Caltech), Pietro Perona (Caltech).

2 Outline. Nonparametric Bayesian taxonomy models for object categorization; hierarchical representations from networks of HDPs.

3 Motivation
Building systems that learn for a lifetime, from "construction to destruction", e.g. unsupervised learning of object category taxonomies (with E. Bart, I. Porteous, and P. Perona).
Hierarchical models can help to:
- act as a prior that transfers information to new categories
- enable fast recognition
- classify at the appropriate level of abstraction (Fido → dog → mammal)
- define a similarity measure (kernel)
The nonparametric Bayesian framework allows models to grow their complexity without bound as the dataset grows.

4 Nonparametric Model for Visual Taxonomy. [Figure: a taxonomy tree in which each node carries a topic (topic 1, topic 2, …, topic k) with a word distribution over visual words; an image/scene is generated by visual-word detections along a root-to-leaf path; example path probabilities 0.7, 0.26, 0.04.] The prior over trees is the nested CRP (Blei et al. '04): a path is more popular if it has been traveled a lot.
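A minimal sketch of the nested CRP's "rich get richer" path choice described above. All names here (`crp_pick_child`, `counts`, `gamma`) are illustrative, not from the talk:

```python
import random

def crp_pick_child(counts, gamma=1.0):
    """One nested-CRP choice at a single node of the taxonomy tree.

    counts maps child id -> number of previous paths through that child
    (illustrative bookkeeping). gamma is the concentration parameter:
    larger gamma favors opening a new branch.
    """
    total = sum(counts.values())
    r = random.uniform(0.0, total + gamma)
    for child, n in counts.items():
        if r < n:          # prob. n / (total + gamma): rich get richer
            return child
        r -= n
    return "new"           # prob. gamma / (total + gamma): new branch

def ncrp_path(root, depth, gamma=1.0):
    """Draw a root-to-leaf path by applying the choice level by level."""
    path, node = [], root
    for _ in range(depth):
        child = crp_pick_child(node["counts"], gamma)
        path.append(child)
        node = node["children"].get(child, {"counts": {}, "children": {}})
    return path
```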

5 300 images from the Corel database. (Experiments and figures by E. Bart.)

6 Taxonomy of Quilts

7 Beyond Trees? Deep belief nets are a more powerful alternative to taxonomies (in a modeling sense): nodes in the hierarchy represent overlapping and increasingly abstract categories, giving more sharing of statistical strength. Proposal: stack LDA models.

8 LDA (Blei, Ng, Jordan '02)
x_ij = w: token i in doc. j was assigned to type w (observed).
z_ij = k: token i in image j was assigned to topic k (hidden).
θ_j: image-specific distribution over topics.
φ_k: topic-specific distribution over words.
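A toy sketch of the standard LDA generative model just described; hyperparameter values and function names are illustrative, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

def lda_generate(n_images, n_tokens, K, V, alpha=0.5, beta=0.1):
    """Sample a toy corpus from the LDA generative model.

    K topics over a vocabulary of V visual-word types; alpha and beta
    are symmetric Dirichlet hyperparameters (illustrative values).
    """
    phi = rng.dirichlet(np.full(V, beta), size=K)       # phi_k: words per topic
    corpus = []
    for _ in range(n_images):
        theta = rng.dirichlet(np.full(K, alpha))        # theta_j: topics per image
        z = rng.choice(K, size=n_tokens, p=theta)       # hidden topic assignments
        x = np.array([rng.choice(V, p=phi[k]) for k in z])  # observed words
        corpus.append((x, z))
    return corpus, phi

corpus, phi = lda_generate(n_images=5, n_tokens=20, K=3, V=50)
```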

9 Stage-wise LDA. Use Z1 as pseudo-data for the next layer. After the second LDA model is fit, we have two distributions over Z1 (one from each layer); we combine them by taking their mixture.
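A schematic of the stage-wise recipe, assuming some LDA inference routine `fit_lda` that returns topic assignments z plus (θ, φ); that routine is a placeholder, not an API from the talk:

```python
def stack_lda(visual_words, n_layers, fit_lda):
    """Stage-wise stacking: fit one LDA layer, then feed its topic
    assignments z as pseudo-data to the next layer.

    fit_lda(data) -> (z, theta, phi) is a placeholder for any LDA
    inference routine, e.g. collapsed Gibbs sampling.
    """
    layers, data = [], visual_words
    for _ in range(n_layers):
        z, theta, phi = fit_lda(data)   # fit one LDA module
        layers.append((theta, phi))
        data = z                        # z becomes pseudo-data for next layer
    return layers
```

Note this hard hand-off of z is the simplest reading of the slide; the mixture step (combining the two distributions over Z1) would replace it with a soft assignment.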

10 Special Words Layer. At the bottom layer we have an image-specific distribution over words. It filters out image idiosyncrasies that are not modeled well by topics; cf. the special-words topic model (Chemudugunta, Smyth, Steyvers '06).

11 Stage-wise Learning. [Figure: the model after stage 1, stage 2, and stage 3 of training.]

12 Model. At every level a switching variable picks either "stop at this level" or "continue upward"; the lowest level at which "stop" was picked disconnects the upstream variables. [Figure: the last layer that has any data assigned to it; a switching variable has picked this level, so all layers above are disconnected.]

13 Collapsed Gibbs Sampling. Marginalize out the parameters (θ, φ). Given X, perform an upward pass to compute posterior probabilities for each level. Sample a level. From that level, sample all downstream Z-variables (upstream Z-variables are ignored).
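A sketch of one such Gibbs move. The upward pass that produces the per-level log posteriors requires the collapsed count tables, so it is stubbed out; every name here is hypothetical:

```python
import numpy as np

def gibbs_move(log_post_levels, resample_downstream, rng):
    """One collapsed-Gibbs move following the slide's recipe:
    upward pass -> sample a level -> resample downstream Z's,
    leaving upstream Z's untouched.

    log_post_levels: per-level log posteriors from the upward pass
        (computing them needs the collapsed counts; stubbed out here).
    resample_downstream(level): callback that resamples all Z-variables
        at and below the chosen level.
    """
    p = np.exp(log_post_levels - np.max(log_post_levels))  # stable softmax
    p /= p.sum()
    level = int(rng.choice(len(p), p=p))   # sample the switching level
    resample_downstream(level)             # upstream Z's stay disconnected
    return level
```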

14 The Digits. All experiments done by I. Porteous (and finished two hours ago).

15 This level filters out image idiosyncrasies. No information from this level is "transferred" to test data.

16 [Figures: level-1 topic distributions; level-2 topic distributions.]

17 Assignment to Levels. [Figure: brightness = average level assignment.]

18 Properties
- Properties that are specific to an image/document are explained at lower levels of the hierarchy → they act as data filters for the higher layers.
- Higher levels become increasingly abstract, with larger "receptive fields" and higher variance (complex-cell property). Limitation?
- Higher levels therefore "own" less data → higher levels have larger plasticity.
- The more data, the more levels become populated → we infer the number of layers.
- By marginalizing out the parameters, all variables become coupled.

19 Conclusion
- Nonparametric Bayesian models are good candidates for "lifelong learning"; computational efficiency and memory requirements need to improve.
- An algorithm for growing object taxonomies as a function of observed data.
- A proposal for a deep belief net based on stacking LDA modules: a more flexible representation and more sharing of statistical strength than a taxonomy.
- Infinite extension: LDA → HDP; mixture over levels → Dirichlet process; the number of hidden variables per layer and the number of layers are inferred.
Demo?

