Informatics and Mathematical Modelling / Cognitive Systems Group. MLSP 2010, September 1st. Archetypal Analysis for Machine Learning. Morten Mørup, DTU Informatics.

Presentation transcript:

Archetypal Analysis for Machine Learning. Morten Mørup, DTU Informatics, Cognitive Systems Group, Technical University of Denmark. Joint work with Lars Kai Hansen, DTU Informatics, Cognitive Systems Group, Technical University of Denmark.

Archetypal Analysis (AA): $X \approx XCS$. AA is formed by two simplex constraints. Archetype: $Xc_k$, formed by a convex combination of the data points. Projection: $s_n$ gives the convex combination of archetypes forming each data point.
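The factorization above can be illustrated with a small NumPy sketch. It only builds matrices that satisfy the two simplex constraints and evaluates the reconstruction; it does not fit anything. The function names, the toy data, and the use of a squared Frobenius error as the measure of fit are assumptions made here for illustration, not taken from the slides.

```python
import numpy as np

def random_simplex_columns(rows, cols, rng):
    """Return a rows x cols matrix whose columns are non-negative and sum to 1."""
    m = rng.random((rows, cols))
    return m / m.sum(axis=0, keepdims=True)

def aa_reconstruct(X, C, S):
    """Reconstruct the data as X @ C @ S; the columns of X @ C are the archetypes."""
    archetypes = X @ C        # features x k: convex combinations of data points
    return archetypes @ S     # features x N: convex combinations of archetypes

def aa_loss(X, C, S):
    """Squared Frobenius error ||X - XCS||_F^2 (assumed measure of fit)."""
    return float(np.sum((X - aa_reconstruct(X, C, S)) ** 2))

rng = np.random.default_rng(0)
X = rng.random((2, 50))                  # toy data: 2 features, N = 50 points
C = random_simplex_columns(50, 3, rng)   # N x k, each column c_k on the simplex
S = random_simplex_columns(3, 50, rng)   # k x N, each column s_n on the simplex
archetypes = X @ C
loss = aa_loss(X, C, S)
```

Because every column of C lies on the simplex, each archetype stays inside the range spanned by the data, which is what ties the extracted features to the observations.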

The original paper of Cutler and Breiman considered three applications: Swiss army head shapes, Los Angeles Basin air pollution (1976), and Tokamak fusion data. Other applications: flame dynamics (Stone & Cutler, 1996), end-member extraction of galaxy spectra (Chan et al., 2003), and data-driven benchmarking (Porzio et al., 2008).

Archetypal analysis extracts the "principal convex hull" (PCH) of the data cloud. Convex hull: blue lines and light shaded region (dots indicate points in the convex set). Dominant convex hull: green lines and gray shaded region (dots indicate archetypes). While the convex set can be identified in linear time O(N) (McCallum & Avis, 1979), finding C and S is a non-convex (NP-hard) problem. (Dwyer, 1988) NB: One might think that AA is highly driven by outliers; however, "outliers" are only relevant if they reflect representative dynamics in the data!
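To make the notion of "points in the convex set" concrete, the sketch below computes the hull vertices of a small 2-D point set. It uses Andrew's monotone chain method, which runs in O(N log N) and is swapped in here for simplicity; it is not the linear-time algorithm of McCallum & Avis cited above. The point set and function names are made up for illustration.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return the hull vertices of 2-D points in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

# Four corners of a square plus two interior points (toy data).
points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0.5)]
hull = convex_hull(points)
```

Only the four corners end up in `hull`; the interior points can be written as convex combinations of them, which is the property that the projection matrix S exploits.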

Our (new) mathematical results:
1: The AA/PCH model is in general unique (see Theorem 1).
2: The AA/PCH model can be efficiently initialized by the proposed FurthestSum algorithm, which guarantees extraction of points in the convex set (see Theorem 2).
3: The AA/PCH model parameters can be efficiently optimized by normalization invariant projected gradient. For details on the derivation of the updates and their computational complexity, see Section 2.3.
These results enable large-scale applications.
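The slide names FurthestSum but does not spell out its steps, so the following is only a sketch of a FurthestSum-style initialization under stated assumptions: each new point is the one with the largest summed Euclidean distance to the points already chosen, and the arbitrary seed point is replaced at the end. The function name, the seed handling, and the toy data are illustrative choices; the paper should be consulted for the exact procedure.

```python
import numpy as np

def furthest_sum(X, k, seed_index=0):
    """Pick k column indices of X (features x N) that are mutually far apart."""
    n = X.shape[1]
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= number of data points")

    def dist_to(j):
        # Euclidean distance from column j to every column of X.
        return np.linalg.norm(X - X[:, [j]], axis=0)

    def pick(sum_dist, selected):
        # Best remaining candidate: largest summed distance, excluding chosen ones.
        scores = sum_dist.copy()
        scores[selected] = -np.inf
        return int(np.argmax(scores))

    selected = [seed_index]
    sum_dist = dist_to(seed_index)
    while len(selected) < k:
        j = pick(sum_dist, selected)
        selected.append(j)
        sum_dist = sum_dist + dist_to(j)

    # Assumed final step: discard the arbitrary seed and choose a replacement.
    seed = selected.pop(0)
    sum_dist = sum_dist - dist_to(seed)
    selected.append(pick(sum_dist, selected))
    return selected

# Toy data: column 0 is the centre of a square, columns 1-4 its corners,
# column 5 another interior point.
X = np.array([[1.0, 0.0, 2.0, 2.0, 0.0, 1.0],
              [1.0, 0.0, 0.0, 2.0, 2.0, 0.5]])
chosen = furthest_sum(X, k=4, seed_index=0)

# One-hot initial C: each archetype starts at one selected data point.
C0 = np.zeros((X.shape[1], 4))
C0[chosen, np.arange(4)] = 1.0
```

On this toy set the procedure starts from the interior seed and ends with the four corners, which matches the slide's claim that the initialization extracts points in the convex set.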

Our machine learning applications: computer vision, neuroimaging, text mining, collaborative filtering.

Computer Vision: CBCL face database. Face database: K=361 pixels, N=2429, so all images belong with probability 1 to the convex set. $X \approx XCS$. SVD/PCA: low to high frequency dynamics. NMF: part-based representation. AA: archetypes/freaks. K-means: centroids/prototypes.

Archetypal Analysis naturally bridges clustering methods with low-rank representations.

NeuroImaging: Positron Emission Tomography. $X \approx XCS$. Altanserin tracer is injected; the recorded signal is in theory a mixture of 3 underlying binding profiles (archetypes): low binding regions, high binding regions, and arteries/veins. Each voxel is a given concentration fraction of these tissue types. Figure panels: Low Binding, High Binding, Artery/Veins.

Text Mining: NIPS term-document (bag of words) data. $X \approx XCS$. Figure labels: $XC$ (distinct aspects); prototypical aspects.

Collaborative filtering: MovieLens. Medium-size and large-size MovieLens data. Medium size: 1,000,209 ratings of 3,952 movies by 6,040 users. Large size: 10,000,054 ratings of 10,677 movies given by 71,567 users. AA extracts features representing distinct user types; each user is represented as a given concentration fraction of the user types. AA appears to have less tendency to overfit.

Conclusion
Archetypal Analysis is unique in general (Theorem 1).
Archetypal Analysis can be efficiently initialized by the proposed FurthestSum algorithm (Theorem 2) and optimized through normalization invariant projected gradient.
Archetypal Analysis naturally bridges clustering with low-rank approximations.
Archetypal Analysis results in easily interpretable features that are closely related to the actual data.
Archetypal Analysis is useful for a large variety of machine learning problem domains within unsupervised learning (computer vision, neuroimaging, text mining, collaborative filtering).
Archetypal Analysis can be extended to kernel representations, finding the principal convex hull in a (potentially infinite) Hilbert space (see Section 2.4 of the paper).
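One way to see why a kernel extension is possible: if the fit is measured by the squared Frobenius error (an assumption here, since the slides only state $X \approx XCS$), the objective depends on the data only through inner products between data points. Writing $G = X^\top X$ for the matrix of inner products (the symbol $G$ is introduced here for illustration and does not appear in the slides), the error expands as follows.

```latex
\begin{aligned}
\|X - XCS\|_F^2
  &= \operatorname{tr}\!\big((X - XCS)^\top (X - XCS)\big) \\
  &= \operatorname{tr}(X^\top X)
     - 2\,\operatorname{tr}(X^\top X C S)
     + \operatorname{tr}(S^\top C^\top X^\top X C S) \\
  &= \operatorname{tr}(G)
     - 2\,\operatorname{tr}(G C S)
     + \operatorname{tr}(S^\top C^\top G C S).
\end{aligned}
```

Since only $G$ appears in the last line, replacing it by any valid kernel matrix gives the same optimization problem in the corresponding feature space. How the paper sets this up in detail is covered in its Section 2.4.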

Open problems and current research directions:
What is the optimal number of components? Cross-validation based on missing value prediction (see also the collaborative filtering example in the paper). Bayesian generative models for AA/PCH that automatically penalize model complexity.
What if "pure" archetypes cannot be well represented by the data available?

Selected references from the paper
[1] Adele Cutler and Leo Breiman, "Archetypal analysis," Technometrics, vol. 36, no. 4, pp. 338–347, Nov.
[2] D. S. Hochbaum and D. B. Shmoys, "A best possible heuristic for the k-center problem," Mathematics of Operations Research, vol. 10, no. 2, pp. 180–184.
[7] Emily Stone and Adele Cutler, "Introduction to archetypal analysis of spatio-temporal dynamics," Phys. D, vol. 96, no. 1-4, pp. 110–131, 1996.
[8] Giovanni C. Porzio, Giancarlo Ragozini, and Domenico Vistocco, "On the use of archetypes as benchmarks," Appl. Stoch. Model. Bus. Ind., vol. 24, no. 5, pp. 419–437, 2008.
[9] B. H. P. Chan, D. A. Mitchell, and L. E. Cram, "Archetypal analysis of galaxy spectra," Mon. Not. Roy. Astron. Soc., vol. 338, pp. 790, 2003.
[11] D. McCallum and D. Avis, "A linear algorithm for finding the convex hull of a simple polygon," Information Processing Letters, vol. 9, pp. 201–206, 1979.
[12] Rex A. Dwyer, "On the convex hull of random points in a polytope," Journal of Applied Probability, vol. 25, no. 4, pp. 688–699, 1988.