(Infinitely) Deep Learning in Vision Max Welling (UCI) Collaborators: Ian Porteous (UCI), Evgeniy Bart (UCI/Caltech), Pietro Perona (Caltech)

Outline
– Nonparametric Bayesian taxonomy models for object categorization
– Hierarchical representations from networks of HDPs

Motivation
Building systems that learn for a lifetime, from “construction to destruction”, e.g. unsupervised learning of object category taxonomies (with E. Bart, I. Porteous and P. Perona).
Hierarchical models can help to:
– act as a prior to transfer information to new categories
– enable fast recognition
– classify at the appropriate level of abstraction (Fido → dog → mammal)
– define a similarity measure (kernel)
The nonparametric Bayesian framework allows models to grow their complexity without bound as the dataset grows.

Nonparametric Model for Visual Taxonomy
(figure: a taxonomy over topics 1 … k; each topic k has a word distribution over visual words; each image/scene is a collection of visual-word detections)
The prior over trees is the nested CRP (Blei et al. 04): a path is more popular if it has been traveled a lot.
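
As an aside, a minimal sketch of how a nested-CRP path draw could look; the dictionary-of-path-counts representation, the concentration parameter gamma, and the fixed depth are illustrative assumptions, not the authors' implementation.

import numpy as np

def sample_ncrp_path(counts, gamma, depth, rng=np.random.default_rng()):
    """Draw one root-to-leaf path of length `depth` from a nested CRP.
    `counts` maps a node (tuple of child indices from the root) to the number
    of previous images whose path visited it; `gamma` controls how readily
    new branches are opened."""
    path, node = [], ()
    for _ in range(depth):
        # children of the current node and their visit counts
        children = {c: n for c, n in counts.items()
                    if len(c) == len(node) + 1 and c[:len(node)] == node}
        total = sum(children.values()) + gamma
        # "a path is more popular if it has been traveled a lot":
        # existing children in proportion to their counts, a new child w.p. gamma/total
        probs = [n / total for n in children.values()] + [gamma / total]
        idx = rng.choice(len(probs), p=probs)
        if idx == len(children):  # open a brand-new branch
            node = node + (max((c[-1] for c in children), default=-1) + 1,)
        else:
            node = list(children.keys())[idx]
        counts[node] = counts.get(node, 0) + 1
        path.append(node)
    return path

Calling sample_ncrp_path({}, gamma=1.0, depth=3) repeatedly with the same counts dictionary grows the tree lazily as images arrive.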

300 images from the Corel database (experiments and figures by E. Bart).

Taxonomy of Quilts

Beyond Trees?
Deep belief nets are more powerful alternatives to taxonomies (in a modeling sense): nodes in the hierarchy represent overlapping and increasingly abstract categories, and statistical strength is shared more widely.
Proposal: stack LDA models.

LDA (Blei, Ng, Jordan ’02)
Notation (from the graphical model): token i in image j is of word type w (observed); token i in image j is assigned to topic k (hidden); each image has its own distribution over topics; each topic has its own distribution over words.
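
For reference, a short generative-story sketch of plain LDA using the slide's notation (image-specific topic distributions, topic-specific word distributions); the hyperparameters alpha and beta and the fixed token count per image are illustrative.

import numpy as np

def generate_lda_corpus(n_images, n_tokens, n_topics, vocab_size,
                        alpha=0.5, beta=0.1, rng=np.random.default_rng(0)):
    """theta_j : image-specific distribution over topics
    phi_k     : topic-specific distribution over (visual) words
    z_ij      : hidden topic of token i in image j
    w_ij      : observed word type of token i in image j"""
    phi = rng.dirichlet(beta * np.ones(vocab_size), size=n_topics)
    corpus = []
    for j in range(n_images):
        theta = rng.dirichlet(alpha * np.ones(n_topics))
        z = rng.choice(n_topics, size=n_tokens, p=theta)                 # hidden topics
        w = np.array([rng.choice(vocab_size, p=phi[k]) for k in z])      # observed words
        corpus.append((z, w))
    return corpus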

Stage-wise LDA
Use Z1 (the first layer's topic assignments) as pseudo-data for the next LDA layer. After the second LDA model is fit, we have two distributions over Z1; we combine these distributions by taking their mixture.
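
A rough sketch of the stage-wise recipe; fit_lda is a hypothetical helper standing in for whatever single-layer inference is used, and the 50/50 mixture weight is an assumed choice.

import numpy as np

def stagewise_lda(corpus, n_layers, n_topics, fit_lda):
    """fit_lda(docs, K) is assumed to return (z, theta): a topic index for
    every token and per-document topic distributions. Each layer's topic
    assignments become the pseudo-documents fed to the next layer."""
    docs, layers = corpus, []
    for _ in range(n_layers):
        z, theta = fit_lda(docs, n_topics)
        layers.append((z, theta))
        docs = z                      # Z of this layer is pseudo-data for the next
    return layers

def combine_over_z1(p_from_layer1, p_from_layer2, w=0.5):
    """After the second LDA is fit there are two distributions over Z1
    (bottom-up and top-down); the slide combines them as a mixture."""
    return w * np.asarray(p_from_layer1) + (1.0 - w) * np.asarray(p_from_layer2)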

Special Words Layer
At the bottom layer we have an image-specific distribution over words. It filters out image idiosyncrasies which are not modeled well by topics. Special-words topic model (Chemudugunta, Steyvers, Smyth, ’06).
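
A minimal two-component sketch of the special-words idea for a single token (the cited model also includes a corpus-wide background distribution, omitted here); the switch probability lam and the distribution names are illustrative.

import numpy as np

def generate_token_swb(theta_j, phi, psi_j, lam, rng=np.random.default_rng()):
    """theta_j : topic distribution of image j
    phi       : (K, V) array of topic word distributions
    psi_j     : image-specific word distribution (soaks up idiosyncrasies)
    lam       : probability of taking the special-words route"""
    if rng.random() < lam:
        return rng.choice(len(psi_j), p=psi_j)        # special (image-specific) word
    k = rng.choice(len(theta_j), p=theta_j)           # topic assignment
    return rng.choice(phi.shape[1], p=phi[k])         # word from that topic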

Stage-wise Learning (figure: stage 1, stage 2, stage 3)

Model
(figure annotation: the last layer that has any data assigned to it – a switching variable has picked this level, so all layers above are disconnected)
At every level a switching variable picks either that level's distribution or defers to the level above. The lowest level at which the level's own distribution was picked disconnects the upstream variables.
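
One way to read the switching mechanism, as a sketch rather than the talk's exact model; the per-level stop probabilities are an assumption.

import numpy as np

def sample_disconnect_level(stop_probs, rng=np.random.default_rng()):
    """Starting at the bottom (special-words) layer and moving up, each
    level's switch either explains the token at that level (stop) or defers
    upward; the first level that stops is the token's level, and everything
    upstream is disconnected."""
    for level, p_stop in enumerate(stop_probs):
        if rng.random() < p_stop:
            return level
    return len(stop_probs) - 1    # if nothing stopped, the top level explains it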

Collapsed Gibbs Sampling
Given X, perform an upward pass to compute posterior probabilities for each level. Sample a level. From that level, sample all downstream Z-variables (the upstream Z-variables are ignored). Marginalize out the parameters.
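
A sketch of the per-token level draw in the collapsed sampler, assuming the upward pass supplies unnormalized log-posteriors over levels with the parameters integrated out; the caller would then resample the downstream Z-variables from the chosen level.

import numpy as np

def resample_level(level_logprobs, rng=np.random.default_rng()):
    """Normalize the per-level posterior (computed by the upward pass with
    parameters marginalized out) and sample the level at which this token
    disconnects the hierarchy."""
    logp = np.asarray(level_logprobs, dtype=float)
    p = np.exp(logp - logp.max())     # stabilize before exponentiating
    p /= p.sum()
    return rng.choice(len(p), p=p)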

The Digits... All experiments done by I. Porteous (and finished 2 hours ago). (I deeply believe in)

This level filters out image idiosyncrasies. No information from this level is “transferred” to test data.

(figure: level-1 topic distributions and level-2 topic distributions)

Assignment to Levels (figure: brightness = average level assignment)

Properties
– Properties which are more specific to an image/document are explained at lower levels of the hierarchy → they act as data filters for the higher layers.
– Higher levels become increasingly abstract, with larger “receptive fields” and higher variance (complex-cell property). Limitation?
– Higher levels therefore “own” less data → higher levels have larger plasticity.
– The more data, the more levels become populated → we infer the number of layers.
– By marginalizing out the parameters, all variables become coupled.

Conclusion
Nonparametric Bayesian models are a good candidate for “lifelong learning”
– need to improve computational efficiency & memory requirements
Algorithm for growing object taxonomies as a function of observed data
Proposal for a deep belief net based on stacking LDA modules
– more flexible representation & more sharing of statistical strength than a taxonomy
Infinite extension:
– LDA → HDP
– mixture over levels → Dirichlet process
– number of hidden variables per layer and number of layers inferred
Demo?