Scalable Training of Mixture Models via Coresets
Daniel Feldman, Matthew Faulkner, Andreas Krause (MIT)

Fitting Mixtures to Massive Data: EM on the full data set is generally expensive; weighted EM on a small importance sample is fast!
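The "weighted EM" step is the workhorse here: every sufficient statistic becomes a weighted sum, so EM on a small weighted coreset approximates EM on the full data. A minimal numpy sketch (mine, not the authors' code), assuming diagonal covariances:

```python
import numpy as np

def weighted_em_gmm(X, w, k, n_iter=50, seed=0, eps=1e-6):
    """EM for a diagonal-covariance GMM where point x_i carries weight w_i."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, k, replace=False)]       # init means from data points
    var = np.ones((k, d)) * X.var(axis=0)         # shared initial variances
    pi = np.full(k, 1.0 / k)                      # mixing weights
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] ∝ pi_j * N(x_i; mu_j, var_j)
        logp = (-0.5 * (((X[:, None, :] - mu[None]) ** 2 / var[None]).sum(-1)
                        + np.log(2 * np.pi * var).sum(-1)[None])
                + np.log(pi)[None])
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: the only change from plain EM is folding in the weights w
        rw = r * w[:, None]
        Nj = rw.sum(axis=0) + eps
        mu = rw.T @ X / Nj[:, None]
        var = rw.T @ (X ** 2) / Nj[:, None] - mu ** 2 + eps
        pi = Nj / Nj.sum()
    return pi, mu, var
```

With w = 1 everywhere this reduces to ordinary EM; on a coreset, w comes from the importance weights described below.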

Coresets for Mixture Models

Naïve Uniform Sampling

Sample a set U of m points uniformly: small clusters are missed, so the estimate has high variance.
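A quick numpy experiment (a toy construction of mine, not from the talk) makes the failure concrete: with a 100-point uniform sample from about 10,000 points, a 20-point cluster is usually absent entirely.

```python
import numpy as np

rng = np.random.default_rng(0)
# 10,000 points in a big cluster, 20 points in a tiny far-away cluster
big = rng.normal(0.0, 1.0, size=(10_000, 2))
small = rng.normal(50.0, 0.1, size=(20, 2))
X = np.vstack([big, small])

m = 100                                   # sampling budget
miss = 0
for _ in range(1_000):
    idx = rng.choice(len(X), size=m, replace=False)
    if (idx >= len(big)).sum() == 0:      # no point from the small cluster
        miss += 1
print(f"small cluster missed in {miss / 10:.0f}% of trials")
```

The miss rate is roughly (1 - 20/10020)^100 ≈ 82%, so any statistic involving the small cluster has huge variance.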

Sampling Distribution: bias sampling towards small clusters.

Importance Weights: reweight each sampled point by the inverse of its sampling probability.
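The standard importance-sampling correction (which is what these weights are) assigns each of the m draws weight 1/(m·p), making weighted sums unbiased estimates of full-data sums. A small self-check on a toy cost function of mine:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
cost = x ** 2                      # per-point contribution to some total cost
total = cost.sum()

p = cost + 1e-3                    # deliberately bias toward high-cost points
p /= p.sum()

m = 200
est = []
for _ in range(500):
    idx = rng.choice(len(x), size=m, p=p)     # biased sample, with replacement
    w = 1.0 / (m * p[idx])                    # importance weights undo the bias
    est.append((w * cost[idx]).sum())
est = np.array(est)
print(total, est.mean())           # the two agree closely
```

Biasing p toward high-cost points also shrinks the variance of the estimate, which is why the construction below aims mass at the points that matter.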

Creating a Sampling Distribution: iteratively find representative points. Sample a small set uniformly at random, remove the half of the remaining points nearest the samples, and repeat until few points remain. This way, small clusters end up represented.
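The halving procedure above can be sketched compactly (names mine; a simplification of the paper's construction):

```python
import numpy as np

def representative_points(X, batch=10, seed=0):
    """Iteratively pick representatives: sample a few points uniformly,
    discard the half of the remaining data nearest to them, repeat."""
    rng = np.random.default_rng(seed)
    reps = []
    remaining = X.copy()
    while len(remaining) > batch:
        pick = remaining[rng.choice(len(remaining), size=batch, replace=False)]
        reps.append(pick)
        # distance from every remaining point to its nearest sampled point
        d = np.linalg.norm(remaining[:, None, :] - pick[None], axis=-1).min(axis=1)
        keep = d > np.median(d)            # drop the nearer half
        remaining = remaining[keep]
    reps.append(remaining)                 # survivors are far from everything
    return np.vstack(reps)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (980, 2)), rng.normal(50, 0.1, (20, 2))])
B = representative_points(X)
```

Because far-away points survive every halving round, the tiny cluster at (50, 50) ends up among the representatives, exactly the behavior the slides illustrate.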

Creating a Sampling Distribution: partition the data via a Voronoi diagram centered at the representative points.

Creating a Sampling Distribution: points in sparse cells, and points far from their cell centers, get more sampling mass.

Importance Weights: points in sparse cells and points far from centers get more sampling mass, and correspondingly smaller importance weights.
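Putting the last three slides together: given representatives B, a simplified version of the sampling distribution and weights might look like this (a sketch of the idea, not the paper's exact formula; names are mine):

```python
import numpy as np

def coreset_sample(X, B, m, seed=0):
    """Importance-sample m weighted points given representatives B.

    A point gets more mass if it is far from its nearest representative
    or lies in a sparse Voronoi cell; weights are 1 / (m * p)."""
    rng = np.random.default_rng(seed)
    d = np.linalg.norm(X[:, None, :] - B[None], axis=-1)
    cell = d.argmin(axis=1)                    # Voronoi cell of each point
    dist = d.min(axis=1)                       # distance to nearest center
    cell_size = np.bincount(cell, minlength=len(B))[cell]
    p = dist / dist.sum() + 1.0 / (len(B) * cell_size)
    p /= p.sum()
    idx = rng.choice(len(X), size=m, p=p)      # the importance sample
    w = 1.0 / (m * p[idx])                     # importance weights
    return X[idx], w
```

Points in a 20-point cell share the 1/(len(B)·cell_size) term among far fewer points than a 10,000-point cell does, so small clusters are sampled almost surely.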

Importance Sample

Coresets via Adaptive Sampling

A General Coreset Framework: contributions for mixture models.

A Geometric Perspective: Gaussian level sets can be expressed purely geometrically, via an affine subspace.
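In symbols (a standard fact, my phrasing): the level sets of a Gaussian density are ellipsoids, i.e. affine images of a sphere, which is what lets geometric coreset machinery apply:

```latex
\{x : \mathcal{N}(x;\mu,\Sigma) = c\}
  \;=\; \{x : (x-\mu)^\top \Sigma^{-1} (x-\mu) = r^2\}
  \;=\; \mu + rA\,\mathbb{S}^{d-1}, \qquad \Sigma = A A^\top .
```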

Geometric Reduction: lifts geometric coreset tools to mixture models via a soft-min over components.
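The soft-min connection, spelled out (my notation): writing $f_j(x) = -\log\big(\pi_j\,\mathcal{N}(x;\mu_j,\Sigma_j)\big)$, the negative log-likelihood of a k-component mixture is a soft minimum of the per-component costs,

```latex
-\log p(x) \;=\; -\log \sum_{j=1}^{k} e^{-f_j(x)},
\qquad
\min_j f_j(x) - \log k \;\le\; -\log \sum_{j} e^{-f_j(x)} \;\le\; \min_j f_j(x),
```

so it is within an additive $\log k$ of the hard min that geometric (k-means-style) coresets already handle.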

Semi-Spherical Gaussian Mixtures

Extensions and Generalizations: Level Sets

Composition of Coresets: Merge [cf. Har-Peled, Mazumdar 04]

Composition of Coresets: Compress and Merge [Har-Peled, Mazumdar 04]
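The two composition operations are simple to state in code. As a sketch (mine): merging concatenates weighted coresets, and compressing re-summarizes a weighted coreset down to a budget; here compression is shown as plain weight-proportional resampling, whereas the paper would rerun the full coreset construction.

```python
import numpy as np

def merge(C1, w1, C2, w2):
    """The union of two weighted coresets is a coreset of the combined data."""
    return np.vstack([C1, C2]), np.concatenate([w1, w2])

def compress(C, w, m, seed=0):
    """Shrink a weighted coreset to m points by weight-proportional
    resampling, preserving the total weight."""
    rng = np.random.default_rng(seed)
    p = w / w.sum()
    idx = rng.choice(len(C), size=m, p=p)
    return C[idx], np.full(m, w.sum() / m)
```

Each compression adds a little error, which is what the streaming slides below are about controlling.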

Coresets on Streams: Compress and Merge [Har-Peled, Mazumdar 04]. Composed naïvely, error grows linearly with the number of compressions.

Coresets on Streams: with a balanced merge tree, error grows only with the height of the tree.
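The merge-tree idea is the classic merge-reduce scheme: keep one coreset bucket per tree level and merge equal-level buckets like binary addition, so each point passes through only O(log n) compressions. A self-contained sketch (mine; compression again shown as weight-proportional resampling):

```python
import numpy as np

def stream_coresets(chunks, m, seed=0):
    """Merge-reduce over a stream of data chunks: one bucket per tree level."""
    rng = np.random.default_rng(seed)

    def compress(C, w):
        p = w / w.sum()
        idx = rng.choice(len(C), size=m, p=p)
        return C[idx], np.full(m, w.sum() / m)   # preserve total weight

    levels = {}                                  # level -> (points, weights)
    for chunk in chunks:
        C, w = chunk, np.ones(len(chunk))
        lvl = 0
        while lvl in levels:                     # carry, like binary addition
            C2, w2 = levels.pop(lvl)
            C, w = compress(np.vstack([C, C2]), np.concatenate([w, w2]))
            lvl += 1
        levels[lvl] = (C, w)
    Cs = np.vstack([c for c, _ in levels.values()])
    ws = np.concatenate([w for _, w in levels.values()])
    return Cs, ws
```

At any moment at most O(log n) buckets exist, and total weight (hence the size of the summarized data set) is preserved exactly.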

Coresets in Parallel
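The same composition property gives an embarrassingly parallel (MapReduce-style) construction: build a coreset of each shard independently, then merge. A toy sketch (mine; the per-shard summary is shown as a uniform sample purely as a stand-in for the adaptive construction above):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def uniform_coreset(shard, m, seed):
    """Stand-in per-shard summary: uniform sample, weight n/m per point."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(shard), size=m, replace=False)
    return shard[idx], np.full(m, len(shard) / m)

def parallel_coreset(shards, m):
    # "map": summarize each shard independently, in parallel
    with ThreadPoolExecutor() as ex:
        parts = list(ex.map(lambda t: uniform_coreset(t[1], m, t[0]),
                            enumerate(shards)))
    # "reduce": the union of coresets is a coreset of the union
    return (np.vstack([C for C, _ in parts]),
            np.concatenate([w for _, w in parts]))
```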

Handwritten Digits: MNIST data, 60,000 training and 10,000 test images. Obtain 100-dimensional features from the 28x28-pixel images via PCA; fit a GMM with k=10 components.
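The PCA feature-extraction step can be done with a plain SVD; a minimal sketch (mine), with random data standing in for MNIST:

```python
import numpy as np

def pca_features(X, d=100):
    """Project flattened images onto their top-d principal components."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T

rng = np.random.default_rng(0)
imgs = rng.random((500, 784))          # stand-in for 28x28 MNIST images
feats = pca_features(imgs, d=100)
```

The resulting 100-dimensional feature vectors would then be fed to the (weighted) GMM fit.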

Neural Tetrode Recordings: waveforms of neural activity at four co-located electrodes in a live rat hippocampus; 4 x 38 samples = 152 dimensions. (Data: T. Siapas et al., Caltech)

Community Seismic Network: detect and monitor earthquakes using smart phones, USB sensors, and cloud computing. (Figure: CSN sensors worldwide)

Learning User Acceleration: GMMs fit to acceleration feature vectors (figure contrasts a bad and a good fit).

Seismic Anomaly Detection: the fitted GMM is used for anomaly detection (figure contrasts a bad and a good fit).
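GMM-based anomaly detection usually means scoring each observation by its log-likelihood under the fitted mixture and flagging scores below a threshold calibrated on normal data. A sketch (mine; the threshold choice and single-component "mixture" are illustrative only):

```python
import numpy as np

def gmm_logpdf(X, pi, mu, var):
    """Log-density of a diagonal-covariance GMM at each row of X."""
    lp = (-0.5 * (((X[:, None, :] - mu[None]) ** 2 / var[None]).sum(-1)
                  + np.log(2 * np.pi * var).sum(-1)[None])
          + np.log(pi)[None])
    mx = lp.max(axis=1, keepdims=True)         # log-sum-exp over components
    return (mx + np.log(np.exp(lp - mx).sum(axis=1, keepdims=True))).ravel()

rng = np.random.default_rng(0)
train = rng.normal(0, 1, (5000, 2))            # "normal" sensor readings
pi, mu, var = np.array([1.0]), np.zeros((1, 2)), np.ones((1, 2))
# flag as anomalous anything below the 1st percentile of training scores
threshold = np.percentile(gmm_logpdf(train, pi, mu, var), 1)
print(gmm_logpdf(np.array([[0.0, 0.0], [8.0, 8.0]]), pi, mu, var) < threshold)
```

With this setup the typical point (0, 0) passes while the distant point (8, 8) falls below the threshold and is flagged.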

Conclusions
- GMMs admit coresets of size independent of n; extensions for other mixture models
- Lift geometric coreset tools to the statistical realm; new complexity result for GMM level sets
- Parallel (MapReduce) and streaming implementations
- Strong empirical performance; enables learning on mobile devices