Deterministic Annealing Dimension Reduction and Biology. Indiana University Environmental Genomics, April 20 2012. Geoffrey Fox. https://portal.futuregrid.org

Presentation transcript:

Deterministic Annealing Dimension Reduction and Biology
Indiana University Environmental Genomics, April 20 2012. Geoffrey Fox, Director, Digital Science Center, Pervasive Technology Institute; Associate Dean for Research and Graduate Studies, School of Informatics and Computing, Indiana University Bloomington.

Some Motivation
Big Data requires high performance – achieve it with parallel computing. Big Data also requires robust algorithms, as there are more opportunities to make mistakes. Deterministic annealing (DA) is one of the better approaches to optimization – it tends to remove local optima, addresses overfitting, and is faster than simulated annealing. This returns to my heritage (physics) with an approach I called Physical Computation (cf. also genetic algorithms) – methods based on analogies to nature. Physical systems find the true lowest energy state if you anneal, i.e. you equilibrate at each temperature as you cool.

Some Ideas
Deterministic annealing works better than many well-used optimization methods. It started as the "Elastic Net" by Durbin for the Travelling Salesman Problem (TSP). The basic idea behind deterministic annealing is the mean field approximation, which is also used in "Variational Bayes" and many neural network approaches. Markov chain Monte Carlo (MCMC) methods are roughly single-temperature simulated annealing. Deterministic annealing is less sensitive to initial conditions, avoids local optima, and is not equivalent to trying random initial starts. A contrast with MCMC is sketched below.
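To make that contrast concrete, here is a minimal sketch (not from the slides) of single-temperature Metropolis MCMC on a toy double-well objective; deterministic annealing replaces this stochastic sampling with analytic mean-field averages evaluated along a decreasing temperature schedule. The objective H, the starting point, and all parameters are illustrative assumptions.

```python
# Single-temperature Metropolis sampler on a toy objective; deterministic
# annealing would instead average over states analytically at each T.
import math
import random

def H(x):
    return (x * x - 1.0) ** 2        # double well with minima at x = +-1 (toy)

def metropolis(T, steps=10_000, step_size=0.5):
    x = 3.0                          # deliberately poor starting point
    for _ in range(steps):
        x_new = x + random.uniform(-step_size, step_size)
        dH = H(x_new) - H(x)
        # Accept downhill moves always, uphill moves with Boltzmann probability
        if dH < 0 or random.random() < math.exp(-dH / T):
            x = x_new
    return x

print(metropolis(T=0.1))             # ends near one of the two minima
```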

Uses of Deterministic Annealing
Clustering:
– Vectors: Rose (Gurewitz and Fox)
– Clusters with fixed sizes and no tails (Proteomics team at Broad)
– No vectors: Hofmann and Buhmann (just use pairwise distances)
Dimension reduction for visualization and analysis:
– Vectors: GTM
– No vectors: MDS (just use pairwise distances)
Can apply to general mixture models (though less studied):
– Gaussian Mixture Models
– Probabilistic Latent Semantic Analysis with Deterministic Annealing (DA-PLSA) as an alternative to Latent Dirichlet Allocation (a typical information retrieval/global inference topic model)

Deterministic Annealing
Gibbs distribution at temperature T:
$P(\xi) = \exp(-H(\xi)/T) \,\big/\, \int d\xi \, \exp(-H(\xi)/T)$, or equivalently $P(\xi) = \exp(-H(\xi)/T + F/T)$.
Minimize the free energy, which combines the objective function and the entropy:
$F = \langle H - T S(P) \rangle = \int d\xi \,\{P(\xi) H(\xi) + T P(\xi) \ln P(\xi)\}$
H is the objective function to be minimized as a function of the parameters $\xi$. Simulated annealing corresponds to doing these integrals by Monte Carlo; deterministic annealing corresponds to doing them analytically (by the mean field approximation) and is much faster than Monte Carlo. In each case the temperature is lowered slowly, say by a factor of 0.95 to 0.99 at each iteration.
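As a quick numeric check of the free-energy identity above, on a toy discrete state space the integrals become sums and $F = -T \ln Z$ must equal $\sum_s P(s) H(s) + T \sum_s P(s) \ln P(s)$. The energy values below are arbitrary illustrative choices.

```python
# Verify F = -T ln Z == <H> + T <ln P> for a 4-state Gibbs distribution.
import math

H = [0.0, 0.5, 2.0, 3.5]                 # toy objective values over 4 states
T = 1.0

Z = sum(math.exp(-h / T) for h in H)     # partition function
P = [math.exp(-h / T) / Z for h in H]    # Gibbs distribution at temperature T

F_direct = -T * math.log(Z)
F_decomposed = sum(p * h for p, h in zip(P, H)) \
             + T * sum(p * math.log(p) for p in P)
print(F_direct, F_decomposed)            # agree to rounding
```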

Implementation of DA Central Clustering
The clustering variables are $M_i(k)$, the probability that point $i$ belongs to cluster $k$, with $\sum_{k=1}^{K} M_i(k) = 1$.
In central or pairwise (PW) clustering, take $H_0 = \sum_{i=1}^{N} \sum_{k=1}^{K} M_i(k)\,\varepsilon_i(k)$ – the linear form allows the DA integrals to be done analytically.
Central clustering has $\varepsilon_i(k) = (X(i) - Y(k))^2$, so $H_{\mathrm{Central}} = \sum_{i=1}^{N} \sum_{k=1}^{K} M_i(k)\,(X(i) - Y(k))^2$, and $M_i(k)$ is determined by the expectation step: $M_i(k) = \exp(-\varepsilon_i(k)/T) \,\big/\, \sum_{k'=1}^{K} \exp(-\varepsilon_i(k')/T)$.
The centers $Y(k)$ are determined in the M step of the EM method.
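A minimal NumPy sketch of this scheme: the E step computes the mean-field probabilities $M_i(k)$, the M step recomputes the centers $Y(k)$ as M-weighted means, and the temperature is lowered by the 0.95 factor mentioned earlier. The synthetic data, K = 3, and the small symmetry-breaking perturbation of the initial centers are assumptions for illustration.

```python
# Deterministic-annealing central clustering: soft EM with a cooling schedule.
import numpy as np

rng = np.random.default_rng(0)
# Three Gaussian blobs centered at (-2,-2), (0,0), (2,2) (illustrative data)
X = np.vstack([rng.normal(m, 0.3, (100, 2)) for m in (-2.0, 0.0, 2.0)])
K = 3
# Start all centers at the data mean, with tiny noise to break symmetry
Y = X.mean(axis=0) + 0.01 * rng.normal(size=(K, 2))

T = 10.0
while T > 0.01:
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)  # eps_i(k)
    logits = -d2 / T
    logits -= logits.max(axis=1, keepdims=True)              # stable softmax
    M = np.exp(logits)
    M /= M.sum(axis=1, keepdims=True)                        # E step: M_i(k)
    Y = (M.T @ X) / M.sum(axis=0)[:, None]                   # M step: centers
    T *= 0.95                                                # anneal

print(np.round(Y, 2))   # centers should separate toward the three blobs
```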

Deterministic Annealing
The minimum evolves as the temperature decreases; movement at a fixed temperature goes to local minima if not initialized "correctly". Solve linear equations at each temperature; nonlinear effects are mitigated by initializing with the solution at the previous, higher temperature.
[Figure: free energy F({y}, T) versus configuration {y}]

The system becomes unstable as the temperature is lowered: there is a phase transition, one splits the cluster into two, and the EM iteration continues. One can start with just one cluster.
Rose, K., Gurewitz, E., and Fox, G. C., "Statistical mechanics and phase transitions in clustering", Physical Review Letters, 65(8):945-948, August 1990 – my #6 most cited article (415 citations, including 8 in 2012).
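The cited paper derives the instability condition analytically: for central clustering, a cluster splits when the temperature drops below twice the largest eigenvalue of its (M-weighted) covariance. A sketch of that check follows, using the unweighted covariance for simplicity and illustrative data.

```python
# Estimate the critical temperature T_c = 2 * lambda_max at which a cluster
# (here secretly a mixture of two blobs) becomes unstable and should split.
import numpy as np

rng = np.random.default_rng(1)
cluster = np.vstack([rng.normal(-1, 0.2, (50, 2)),
                     rng.normal(1, 0.2, (50, 2))])   # one apparent cluster

cov = np.cov(cluster, rowvar=False)        # unweighted covariance (simplified)
lam_max = np.linalg.eigvalsh(cov).max()    # largest eigenvalue
T_c = 2.0 * lam_max
print(f"split this cluster once T drops below {T_c:.2f}")
```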

Some non-DA Ideas
Dimension reduction gives low-dimensional mappings of data, both to visualize it and to apply geometric hashing. No-vector problems (where one can't define a metric space) are $O(N^2)$; genes are no-vector unless multiply aligned. For the no-vector case, one can develop $O(N)$ or $O(N \log N)$ methods as in fast multipole and OctTree methods: map the high-dimensional data to 3D and use classic methods developed originally to speed up $O(N^2)$ 3D particle dynamics problems.
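As a sketch of the "map to 3D, then use tree methods" point: once points live in 3D, nearest-neighbor queries cost $O(\log N)$ each instead of $O(N)$. The example below uses SciPy's cKDTree (a kd-tree standing in for the slides' octtree); the random point clouds are placeholders for real embedded data.

```python
# Tree-based nearest-neighbor lookup in a 3D embedding: build once in
# O(N log N), then answer each out-of-sample query in O(log N).
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(2)
embedded = rng.random((100_000, 3))   # points already reduced to 3D
tree = cKDTree(embedded)              # spatial index over the embedding

queries = rng.random((1_000, 3))      # new points to place among the 100K
dist, idx = tree.query(queries, k=1)  # nearest embedded neighbor of each
print(idx[:5], dist[:5])
```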

Start at T = ∞ with 1 cluster; decrease T, and clusters emerge at instabilities.


Trimmed Clustering
"Clustering with position-specific constraints on variance: Applying redescending M-estimators to label-free LC-MS data analysis" (Rudolf Frühwirth, D. R. Mani and Saumyadipta Pyne), BMC Bioinformatics 2011, 12:358.
$H_{TCC} = \sum_{k=0}^{K} \sum_{i=1}^{N} M_i(k)\, f(i,k)$
– $f(i,k) = (X(i) - Y(k))^2 / 2\sigma(k)^2$ for $k > 0$
– $f(i,0) = c^2/2$ for $k = 0$
The 0th cluster captures (at zero temperature) all points outside clusters (the background). Clusters are trimmed: $(X(i) - Y(k))^2 / 2\sigma(k)^2 < c^2/2$. Applied to proteomics mass spectrometry.
[Figure: cluster cost profiles at T ≈ 0, T = 1, and T = 5 versus distance from the cluster center]
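A minimal sketch of the zero-temperature assignment implied by $H_{TCC}$: each point pays either the trimmed in-cluster cost $f(i,k)$ or the flat background cost $c^2/2$, whichever is smaller. The centers, widths, cutoff, and points are illustrative assumptions.

```python
# Zero-temperature trimmed clustering: points outside every cluster's cutoff
# fall into the background cluster k = 0.
import numpy as np

Y = np.array([[0.0, 0.0], [4.0, 0.0]])      # cluster centers (assumed)
sigma = np.array([1.0, 0.5])                # per-cluster widths (assumed)
c = 2.0                                     # trimming cutoff (assumed)

X = np.array([[0.2, 0.1],                   # near center 1
              [4.1, -0.2],                  # near center 2
              [10.0, 10.0]])                # far from both: background

d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
f = np.hstack([np.full((len(X), 1), c * c / 2.0),   # column 0: background cost
               d2 / (2.0 * sigma ** 2)])            # columns 1..K: f(i,k)
labels = f.argmin(axis=1)                           # hard (T = 0) assignment
print(labels)    # [1 2 0]: two clustered points, one trimmed to background
```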

Proteomics 2D DA Clustering

Proteomics 2D DA Clustering – Scaled

High Performance Dimension Reduction and Visualization
The need is pervasive – large, high-dimensional data are everywhere (biology, physics, the Internet, …), and visualization can help data analysis. Visualizing large datasets with high performance means mapping high-dimensional data into low dimensions (2D or 3D) and requires parallel programming to process large data sets. We are developing high performance dimension reduction algorithms:
– MDS (Multi-Dimensional Scaling)
– GTM (Generative Topographic Mapping)
– DA-MDS (Deterministic Annealing MDS)
– DA-GTM (Deterministic Annealing GTM)
– the interactive visualization tool PlotViz

Multidimensional Scaling (MDS)
Map points in high dimension to lower dimensions. There are many such dimension reduction algorithms (PCA, Principal Component Analysis, is the easiest); the simplest, but perhaps best at times, is MDS. Minimize the stress
$\sigma(X) = \sum_{i<j \le n} \mathrm{weight}(i,j)\,\bigl(\delta(i,j) - d(X_i, X_j)\bigr)^2$
where the $\delta(i,j)$ are input dissimilarities and $d(X_i, X_j)$ is the Euclidean distance squared in the embedding space (usually 3D). SMACOF, Scaling by MAjorizing a COmplicated Function, is a clever steepest-descent (expectation maximization, EM) algorithm. Computational complexity goes like $N^2 \times$ (reduced dimension). We developed a deterministic annealed version of it which is much better. One could also view this as a nonlinear $\chi^2$ problem (Tapia et al., Rice) – slower but more general. All parallelize with high efficiency.
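For concreteness, here is a sketch of the SMACOF iteration (the Guttman transform) for unweighted MDS. Note it majorizes stress defined with plain Euclidean distances, a simplification relative to the squared-distance variant described above; the data and iteration count are illustrative.

```python
# SMACOF for unweighted MDS: repeatedly apply the Guttman transform
# X <- (1/n) B(X) X, which monotonically decreases the stress.
import numpy as np

rng = np.random.default_rng(3)
hi = rng.random((50, 10))                                    # high-dim input
delta = np.linalg.norm(hi[:, None] - hi[None, :], axis=2)    # dissimilarities

n, dim = len(hi), 3
X = rng.random((n, dim))                 # random initial 3D embedding
for _ in range(200):
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    np.fill_diagonal(d, 1.0)             # avoid divide-by-zero on diagonal
    B = -delta / d                       # off-diagonal entries of B(X)
    np.fill_diagonal(B, 0.0)
    np.fill_diagonal(B, -B.sum(axis=1))  # diagonal: minus the row sums
    X = B @ X / n                        # Guttman transform

residual = delta - np.linalg.norm(X[:, None] - X[None, :], axis=2)
print((residual ** 2).sum() / 2)         # final (unweighted) stress
```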

OctTree for a 100K sample of Fungi – we use the OctTree for logarithmic interpolation.

440K Interpolated

A large cluster in Region 0

26 Clusters in Region 4

13 Clusters in Region 6

Phylogenetic Tree with Centers of Clusters at bottom

Phylogenetic Tree I

Phylogenetic Tree II

Phylogenetic Tree III

Metagenomics

~100K COG with 7 clusters from database

CoG NW Sqrt (4D)

References
Ken Rose, "Deterministic Annealing for Clustering, Compression, Classification, Regression, and Related Optimization Problems", Proceedings of the IEEE, 86(11):2210-2239, November 1998. References earlier papers, including his Caltech Elec. Eng. PhD (1990).
T. Hofmann and J. M. Buhmann, "Pairwise data clustering by deterministic annealing", IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(1):1-14, 1997.
Hansjörg Klock and Joachim M. Buhmann, "Data visualization by multidimensional scaling: a deterministic annealing approach", Pattern Recognition, 33(4), April 2000.
R. Frühwirth and W. Waltenberger, "Redescending M-estimators and Deterministic Annealing, with Applications to Robust Regression and Tail Index Estimation", Austrian Journal of Statistics, 37(3&4), 2008.
See also the recent algorithm work by Seung-Hee Bae and Jong Youl Choi (Indiana CS PhDs).

Web Links
1.) Metagenomics – parent page
2.) Proteomics: The COG Project
3.) Fungi – parent page
4.) Region Clustering
5.) Fungi Phylogenetic
6.) The Metagenomics Qiime
7.) Also