Generative Topographic Mapping in Life Science
Jong Youl Choi
School of Informatics and Computing, Pervasive Technology Institute, Indiana University
Ph.D. Thesis Proposal

Visualization in Life Science (1)
▸ 2D or 3D visualization of high-dimensional data can provide an efficient way to find relationships between data elements
▸ Each element is displayed as a point, and inter-point distances represent similarities (or dissimilarities)
▸ Clusters or groups become easy to recognize
[Figures: an example of chemical data (PubChem); a visualization of disease-gene relationships, aiming at finding cause-effect relationships between diseases and genes]

Visualization in Life Science (2)
▸ Visualization can be used to verify the correctness of an analysis
▸ Feature selection in the child obesity data can be verified through visualization
[Figure: a feature-selection workflow: Genetic Algorithm → Canonical Correlation Analysis → Visualization]
In the health-data analysis for the child obesity study, visualization has been used for verification. The data were collected from an electronic medical record system (RMRS, Indianapolis, IN) at the Indiana University Medical Center.

Generative Topographic Mapping
▸ Algorithm for dimension reduction
–Finds an optimal user-defined L-dimensional representation
–Uses a Gaussian distribution as the distortion measure
▸ Finds K centers for N data points
–A K-clustering problem, known to be NP-hard
–Solved with the Expectation-Maximization (EM) method (see the sketch below)
[Figure: K latent points mapped to N data points]
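For concreteness, the following is a minimal sketch of GTM training by EM in Python/NumPy. It assumes a 2-D latent space, square grids for the K latent points and the RBF basis, and illustrative parameter values; it follows the standard GTM formulation (RBF basis matrix Phi, linear mapping W, isotropic noise precision beta) rather than the exact code behind these slides.

```python
import numpy as np

def gtm_em(X, K=25, n_basis=9, n_iter=50, sigma=1.0, seed=0):
    """Minimal GTM trained by EM (sketch; 2-D latent space, square grids assumed)."""
    N, D = X.shape
    side = int(np.sqrt(K))                                    # K latent points on a regular grid
    g = np.linspace(-1.0, 1.0, side)
    Z = np.array([(a, b) for a in g for b in g])              # K x 2 latent coordinates
    m = int(np.sqrt(n_basis))                                 # RBF basis centres on a coarser grid
    gb = np.linspace(-1.0, 1.0, m)
    M = np.array([(a, b) for a in gb for b in gb])            # n_basis x 2
    Phi = np.exp(-((Z[:, None, :] - M[None, :, :]) ** 2).sum(-1) / (2 * sigma ** 2))
    Phi = np.hstack([Phi, np.ones((K, 1))])                   # add a bias column
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(Phi.shape[1], D))         # latent-to-data mapping
    beta = 1.0 / X.var()                                      # inverse noise variance
    for _ in range(n_iter):
        Y = Phi @ W                                           # K centres in data space
        d2 = ((Y[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # K x N squared distances (naive O(KND) memory)
        # E-step: responsibilities p(k | x_n); this is the K-by-N matrix of the later slides
        R = np.exp(-0.5 * beta * (d2 - d2.min(axis=0)))
        R /= R.sum(axis=0, keepdims=True)
        # M-step: solve for W, then update beta
        G = np.diag(R.sum(axis=1))
        W = np.linalg.solve(Phi.T @ G @ Phi + 1e-6 * np.eye(Phi.shape[1]),
                            Phi.T @ (R @ X))
        beta = N * D / (R * d2).sum()
    return Z, Phi @ W, R, beta
```

The 2-D position of each data point on the GTM map is then the responsibility-weighted mean of the latent grid, R.T @ Z.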

Advantages of GTM
▸ Complexity is O(KN), where
–N is the number of data points
–K is the number of clusters; usually K << N
▸ Efficient compared with MDS, which is O(N²)
▸ Produces a more separable map than PCA (figure: PCA on the left, GTM on the right)

Problems
▸ O(KN) is still demanding for most life science data
➥ Parallelization with a distributed-memory model (CCGrid 2010)
➥ Interpolation (aka out-of-sample extension) can be used (HPDC 2010)
▸ GTM finds only a locally optimal solution
➥ Apply the Deterministic Annealing (DA) algorithm to seek a globally optimal solution (ICCS 2010)
▸ The optimal choice of K is still unknown
➥ Developing a hierarchical GTM can help
➥ DA-GTM natively supports a hierarchical structure

Parallel GTM
▸ Finding K clusters for N data points
–The relationship is a bipartite graph (bi-graph)
–Represented by a K-by-N matrix
▸ Decomposition over a P-by-Q compute grid (see the sketch below)
–Reduces the per-process memory requirement to 1/(PQ) of the full matrix
Example: an 8-byte double-precision matrix for N = 1M and K = 8K requires 64 GB
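A minimal sketch of how the K-by-N responsibility matrix can be block-decomposed over a P-by-Q process grid. The even-split helper, the 4-by-8 grid, and the per-rank memory estimate are illustrative assumptions, not the exact layout of the CCGrid 2010 implementation.

```python
def block_range(total, parts, index):
    """Half-open index range owned by one process along one axis (even split)."""
    base, rem = divmod(total, parts)
    start = index * base + min(index, rem)
    return start, start + base + (1 if index < rem else 0)

def local_block(K, N, P, Q, p, q):
    """Sub-block of the K-by-N responsibility matrix held by grid rank (p, q)."""
    k0, k1 = block_range(K, P, p)   # rows: latent points
    n0, n1 = block_range(N, Q, q)   # columns: data points
    return (k0, k1), (n0, n1)

# Slide example: K = 8K, N = 1M, 8-byte doubles -> about 64 GB for the full matrix;
# on a hypothetical 4-by-8 grid each rank holds roughly 1/32 of that.
(k0, k1), (n0, n1) = local_block(8_000, 1_000_000, P=4, Q=8, p=0, q=0)
print((k1 - k0) * (n1 - n0) * 8 / 1e9, "GB held by rank (0, 0)")
```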

GTM Interpolation
▸ Training in GTM finds optimal positions for the K centers, which is the most time-consuming step
▸ Two-step procedure (see the sketch below)
–GTM training on only n samples out of the N data points
–The remaining (N - n) out-of-sample points are approximated without training
[Figure: the n in-sample points are trained; the (N - n) out-of-sample points are interpolated onto the trained GTM map]
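One common way to realize the interpolation step, shown here as a hedged sketch (the HPDC 2010 paper may differ in details): project each out-of-sample point onto the already-trained map through its responsibilities, where Y, Z, and beta come from a training run such as the gtm_em sketch above.

```python
import numpy as np

def gtm_interpolate(X_out, Y, Z, beta):
    """Place out-of-sample points on a trained GTM map (sketch).

    X_out : (N - n) x D out-of-sample points
    Y     : K x D centres in data space from training on the n in-sample points
    Z     : K x L latent grid
    beta  : inverse noise variance from training
    """
    d2 = ((Y[:, None, :] - X_out[None, :, :]) ** 2).sum(-1)   # K x (N - n) squared distances
    R = np.exp(-0.5 * beta * (d2 - d2.min(axis=0)))           # responsibilities, no re-training
    R /= R.sum(axis=0, keepdims=True)
    return R.T @ Z                                            # (N - n) x L map coordinates
```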

Deterministic Annealing (DA)
▸ A heuristic to find a global solution
–Based on the principle of maximum entropy: choose the most unbiased and non-committal answer
–Similar to Simulated Annealing (SA), which is based on a random-walk model
–But DA is deterministic: no randomness is involved
▸ New paradigm
–Analogy with thermodynamics
–Find solutions while lowering the temperature T
–New objective function, the free energy F = D − TH
–Minimize the free energy F as T → 1
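Written out in standard DA notation (a sketch; D is the expected distortion under the association probabilities p(k | x_n), and H is their Shannon entropy; the thesis may use different symbols):

```latex
F = D - T H
  = \sum_{n=1}^{N} \sum_{k=1}^{K} p(k \mid x_n)\, d(x_n, y_k)
    + T \sum_{n=1}^{N} \sum_{k=1}^{K} p(k \mid x_n) \ln p(k \mid x_n)
```

Minimizing F over the probabilities at a fixed temperature T gives Gibbs-type soft assignments whose sharpness increases as T is lowered.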

GTM with Deterministic Annealing
▸ Objective function
–EM-GTM: maximize the log-likelihood L
–DA-GTM: minimize the free energy F (when T = 1, L = -F)
▸ Pros & cons
–EM-GTM: very sensitive to the initial condition; trapped in local optima; faster; large deviation
–DA-GTM: less sensitive to the initial condition; finds a global optimum; requires more computational time; small deviation
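A hedged sketch of how DA can wrap the EM-GTM updates: at temperature T the E-step responsibilities are flattened by a factor 1/T, and the schedule lowers T toward 1, at which point the update coincides with ordinary EM-GTM (matching the note that L = -F at T = 1). The exact free-energy derivation of the ICCS 2010 paper may differ; this reuses the names from the gtm_em sketch above and only illustrates the mechanism.

```python
import numpy as np

def da_gtm_fit(X, Phi, W, beta, T0=16.0, alpha=0.9, inner_iters=10):
    """DA-GTM sketch: run temperature-scaled EM while cooling T from T0 to 1.

    X    : N x D data      Phi  : K x B basis matrix (as built in gtm_em)
    W    : B x D mapping   beta : initial inverse noise variance
    """
    T = T0
    while True:
        for _ in range(inner_iters):
            Y = Phi @ W                                           # K x D centres
            d2 = ((Y[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # K x N squared distances
            # E-step flattened by 1/T: nearly uniform at high T, ordinary EM-GTM at T = 1
            R = np.exp(-0.5 * beta * (d2 - d2.min(axis=0)) / T)
            R /= R.sum(axis=0, keepdims=True)
            # M-step identical to EM-GTM
            G = np.diag(R.sum(axis=1))
            W = np.linalg.solve(Phi.T @ G @ Phi + 1e-6 * np.eye(Phi.shape[1]),
                                Phi.T @ (R @ X))
            beta = X.size / (R * d2).sum()
        if T <= 1.0:
            break
        T = max(1.0, T * alpha)            # simple exponential cooling (see the next slide)
    return W, beta, R
```

The final W and beta (reached at T = 1) can be passed to the gtm_interpolate sketch above.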

Adaptive Cooling Schedule
▸ Typical cooling schedule
–Fixed
–Exponential
–Linear
▸ Adaptive cooling schedule
–Dynamic: adjusted on the fly
–Move to the next critical temperature as fast as possible
[Figure: temperature vs. iteration for the fixed and adaptive schedules]
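A small sketch contrasting the two kinds of schedule; next_critical_temperature is a hypothetical callback standing in for the det(H) = 0 test described on the next slide.

```python
def exponential_schedule(T0, alpha, T_final=1.0):
    """Fixed schedule: multiply the temperature by a constant factor each step."""
    T = T0
    while T > T_final:
        yield T
        T = max(T_final, T * alpha)
    yield T_final

def adaptive_schedule(T0, next_critical_temperature, T_final=1.0, margin=0.99):
    """Adaptive schedule (sketch): after converging at the current temperature,
    jump just below the next critical temperature instead of taking fixed steps."""
    T = T0
    while T > T_final:
        yield T
        T = max(T_final, next_critical_temperature(T) * margin)
    yield T_final
```

Either generator could drive the outer loop of the da_gtm_fit sketch above in place of its fixed exponential cooling.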

Phase transition
▸ DA's discrete behavior
–Over some range of temperatures the solutions stay settled
–At a specific temperature they start to explode; this is known as the critical temperature Tc
▸ Critical temperature Tc
–The free energy F changes drastically at Tc
–Second-derivative test: the Hessian matrix loses its positive definiteness at Tc
–det(H) = 0 at Tc, where H is the Hessian of the free energy F
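In symbols (a sketch that only restates the slide's condition; taking the Hessian with respect to the collected center positions y is an assumption about the exact variable used in the thesis):

```latex
\det H \,\Big|_{T = T_c}
  = \det\!\left( \frac{\partial^{2} F}{\partial \mathbf{y}\, \partial \mathbf{y}^{\mathsf{T}}} \right)\Bigg|_{T = T_c} = 0
```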

Demonstration
[Figure: GTM demonstration with 25 latent points and 1K data points]

DA-GTM Result
[Figure: DA-GTM result]

Contributions
▸ GTM optimization
–GTM with a distributed-memory model
–GTM interpolation as an out-of-sample extension
–Deterministic Annealing for a globally optimal solution
–Research on hierarchical DA-GTM
▸ GTM/DA-GTM applications
–PubChem data visualization
–Health data visualization

Selected Papers
▸ J. Y. Choi, J. Qiu, M. Pierce, and G. Fox. Generative topographic mapping by deterministic annealing. To appear in the International Conference on Computational Science (ICCS), 2010.
▸ J. Y. Choi, S.-H. Bae, X. Qiu, and G. Fox. High performance dimension reduction and visualization for large high-dimensional data analysis. To appear in the Proceedings of the 10th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2010.
▸ S.-H. Bae, J. Y. Choi, J. Qiu, and G. Fox. Dimension reduction and visualization of large high-dimensional data via interpolation. Submitted to HPDC 2010.
▸ J. Y. Choi, J. Rosen, S. Maini, M. E. Pierce, and G. C. Fox. Collective collaborative tagging system. In Proceedings of the GCE08 workshop at SC08, 2008.
▸ M. E. Pierce, G. C. Fox, J. Rosen, S. Maini, and J. Y. Choi. Social networking for scientists using tagging and shared bookmarks: a Web 2.0 application. In the 2008 International Symposium on Collaborative Technologies and Systems (CTS 2008).


Comparison of DA Clustering and DA-GTM
▸ Related algorithm
–DA Clustering: K-means
–DA-GTM: Gaussian mixture
▸ Distortion
–DA Clustering: distance
–DA-GTM: Gaussian distribution