Multidimensional Scaling by Deterministic Annealing with Iterative Majorization Algorithm Seung-Hee Bae, Judy Qiu, and Geoffrey Fox SALSA group in Pervasive Technology Institute at Indiana University Bloomington.

Presentation transcript:

Multidimensional Scaling by Deterministic Annealing with Iterative Majorization Algorithm Seung-Hee Bae, Judy Qiu, and Geoffrey Fox SALSA group in Pervasive Technology Institute at Indiana University Bloomington

Outline • Motivation • Background and Related Work • DA-SMACOF • Experimental Analysis • Conclusions 2

Outline • Motivation • Background and Related Work • DA-SMACOF • Experimental Analysis • Conclusions 3

Motivation • Data explosion • Information visualization makes analysis of such vast, high-dimensional scientific data feasible. • e.g. biological sequences, chemical compound data, etc. • Dimension reduction algorithms help people investigate the unknown distribution of high-dimensional data. • Multidimensional Scaling (MDS) for data visualization • Constructs a mapping in the target dimension w.r.t. the proximity (dissimilarity) information. • Non-linear optimization problem. • Easily trapped in local optima. • How can we avoid local optima? 4

Mapping examples 5

Outline • Motivation • Background and Related Work: Multidimensional Scaling, SMACOF algorithm, Deterministic Annealing method • DA-SMACOF • Experimental Analysis • Conclusions 6

Multidimensional Scaling (MDS) • Given the proximity information among points. • Optimization problem: find a mapping of the data in the target dimension, based on the given pairwise proximity information, that minimizes the objective function. • Objective functions: STRESS (1) or SSTRESS (2). • Only needs the pairwise dissimilarities δ_ij between the original points (not necessarily Euclidean distances). • d_ij(X) is the Euclidean distance between mapped (3D) points. 7
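The STRESS (1) and SSTRESS (2) formulas themselves are not reproduced in the transcript; the standard definitions they refer to, with weights w_ij ≥ 0, dissimilarities δ_ij, and mapped distances d_ij(X), are:

```latex
% STRESS (1): weighted squared error between mapped distances and dissimilarities
\sigma(X) = \sum_{i<j\le N} w_{ij}\,\bigl(d_{ij}(X) - \delta_{ij}\bigr)^{2}

% SSTRESS (2): the same form applied to squared distances and squared dissimilarities
\sigma^{2}(X) = \sum_{i<j\le N} w_{ij}\,\bigl(d_{ij}^{2}(X) - \delta_{ij}^{2}\bigr)^{2}
```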

Scaling by MAjorizing a COmplicated Function (SMACOF) • Iterative majorization algorithm to solve the MDS problem. • EM-like hill-climbing approach. • Decreases the STRESS value monotonically. • Tends to be trapped in local optima. • Computational complexity and memory requirement are O(N^2). 8
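A minimal sketch of the SMACOF iteration for the unweighted case (w_ij = 1), assuming the dissimilarities are supplied as a full symmetric matrix; this illustrates the Guttman-transform update the slide refers to, not the authors' parallel implementation:

```python
import numpy as np

def smacof(delta, dim=3, n_iter=100, eps=1e-6, X0=None, rng=None):
    """Unweighted SMACOF: minimize STRESS by iterative majorization."""
    rng = np.random.default_rng() if rng is None else rng
    n = delta.shape[0]
    X = rng.standard_normal((n, dim)) if X0 is None else X0   # initial configuration
    dist = lambda X: np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    stress = lambda D: np.sum(np.triu(D - delta, 1) ** 2)
    prev = stress(dist(X))
    for _ in range(n_iter):
        D = dist(X)
        # Guttman transform: b_ij = -delta_ij / d_ij for i != j (0 where d_ij == 0)
        with np.errstate(divide="ignore", invalid="ignore"):
            B = -np.where(D > 0, delta / D, 0.0)
        np.fill_diagonal(B, 0.0)
        np.fill_diagonal(B, -B.sum(axis=1))    # b_ii = -sum_{j != i} b_ij
        X = B @ X / n                          # X_{k+1} = (1/N) B(X_k) X_k
        cur = stress(dist(X))
        if prev - cur < eps:                   # STRESS decreases monotonically
            break
        prev = cur
    return X
```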

Deterministic Annealing (DA) • Simulated Annealing (SA) applies the Metropolis algorithm to minimize F by a random walk. • Gibbs distribution at T (computational temperature). • Minimize the free energy F. • As T decreases, more structure of the problem space is revealed. • DA tries to avoid local optima without random walks. • DA finds the expected solution which minimizes F, computed exactly or approximately. • DA has been applied to clustering, GTM, Gaussian mixtures, etc. 9
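The Gibbs distribution and free energy the slide refers to are not written out in the transcript; for an energy (cost) function H(X) at computational temperature T they take the usual form:

```latex
% Gibbs distribution at temperature T
P(X) = \frac{1}{Z(T)}\exp\!\left(-\frac{H(X)}{T}\right),\qquad
Z(T) = \int \exp\!\left(-\frac{H(X)}{T}\right)\,dX

% Free energy, minimized by the Gibbs distribution: F = <H> - T S
F = -T\,\log Z(T)
```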

Outline • Motivation • Background and Related Work • DA-SMACOF • Experimental Analysis • Conclusions 10

DA-SMACOF • If we use the STRESS objective function as the expected energy function H_MDS of the MDS problem, then • Also, we define P_0 and F_0 as follows: • minimize F_MDS(P_0) = ⟨H_MDS − H_0⟩|_0 + F_0(P_0) w.r.t. μ_i • −⟨H_0⟩|_0 + F_0(P_0) is independent of μ_i • The ⟨H_MDS⟩|_0 part is what needs to be minimized w.r.t. μ_i. 11
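A sketch of the definitions and the decomposition this slide relies on, assuming the usual mean-field choice of an isotropic Gaussian trial Hamiltonian H_0 centered at the mean positions μ_i (the exact expressions on the slide are not in the transcript, so the form of H_0 here is an assumption):

```latex
% Expected energy of the MDS problem: the STRESS function itself
H_{\mathrm{MDS}} = \sum_{i<j\le N} w_{ij}\,\bigl(d_{ij}(X) - \delta_{ij}\bigr)^{2}

% Assumed trial Hamiltonian, its Gibbs distribution, and its free energy
H_{0} = \sum_{i=1}^{N} \frac{\lVert x_{i} - \mu_{i}\rVert^{2}}{2},\qquad
P_{0}(X) = \frac{1}{Z_{0}(T)}\exp\!\left(-\frac{H_{0}}{T}\right),\qquad
F_{0}(P_{0}) = -T\,\log Z_{0}(T)

% Free-energy approximation minimized with respect to the mu_i
F_{\mathrm{MDS}}(P_{0}) = \bigl\langle H_{\mathrm{MDS}} - H_{0} \bigr\rangle\big|_{0} + F_{0}(P_{0})
```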

DA-SMACOF (2) • Use the EM-like SMACOF algorithm to calculate the expected mapping which minimizes ⟨H_MDS⟩|_0 at temperature T. • The new STRESS is the following: 12
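The new STRESS formula itself is not reproduced in the transcript. The key step behind it is the expectation of the squared mapped distance under the Gaussian P_0 assumed above (variance T per coordinate, target dimension L), which adds a constant 2LT to every squared distance and, in the DA-SMACOF construction, leads to a SMACOF problem over the mean positions μ_i with temperature-adjusted dissimilarities:

```latex
% Expectation of a squared pairwise distance under the assumed Gaussian P_0
\bigl\langle \lVert x_{i} - x_{j} \rVert^{2} \bigr\rangle\big|_{0}
  = \lVert \mu_{i} - \mu_{j} \rVert^{2} + 2LT
```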

DA-SMACOF (3) • The MDS problem space is smoother at higher T than at lower T. • T represents the proportion of entropy in the free energy F. • Generally the DA approach starts with a very high T, but if T_0 is too high, then all points are mapped at the origin. • We need to find an appropriate T_0 for which at least one of the points is not mapped at the origin. 13

DA-SMACOF (4) 14
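Slide 14 gives the DA-SMACOF algorithm itself, which is not reproduced in the transcript. Below is a minimal sketch of the annealing loop, reusing the smacof function from the sketch above and assuming an exponential cooling schedule T ← αT (the DA-exp95/exp99 labels in the experiments suggest α = 0.95/0.99); adjusted_delta is a hypothetical placeholder for the temperature-dependent dissimilarities, not the exact adjustment used by the authors:

```python
import numpy as np

def adjusted_delta(delta, T, dim):
    """Placeholder: shrink dissimilarities below the thermal scale sqrt(2*dim*T) to zero.
    The exact temperature adjustment from the paper is not shown in the transcript."""
    return np.maximum(delta - np.sqrt(2.0 * dim * T), 0.0)

def da_smacof(delta, dim=3, T0=None, alpha=0.95, T_min=1e-8):
    """DA-SMACOF sketch: solve a sequence of SMACOF problems while cooling T."""
    # Choose T0 high, but low enough that not every adjusted dissimilarity is zero
    # (otherwise all points collapse to the origin), as discussed on slide 13.
    if T0 is None:
        T0 = 0.99 * delta.max() ** 2 / (2.0 * dim)
    T, X = T0, None
    while T > T_min:
        X = smacof(adjusted_delta(delta, T, dim), dim=dim, X0=X)  # warm-start from previous T
        T *= alpha                                                # exponential cooling
    return smacof(delta, dim=dim, X0=X)   # final refinement with the original dissimilarities
```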

Outline • Motivation • Background and Related Work • DA-SMACOF • Experimental Analysis • Conclusions 15

Experimental Analysis • Data • UCI ML Repository: iris (150), breast cancer (683), and yeast (1484) instances • Chemical compounds: 155-feature real-valued vectors, 333 instances • Biological sequences: 30,000 metagenomics sequences, with an N-by-N dissimilarity matrix computed by the SW (Smith-Waterman) algorithm • Algorithms • SMACOF • Distance Smoothing (MDS-DistSmooth) • Proposed DA-SMACOF • Compare the average of 50 random-initialization runs (10 for the sequence data). 16

Mapping Quality (iris) • 4-D real-valued vectors • 3 different classes • 150 instances • DA-exp95 improves on • SMACOF by 57.1/45.8% • DS-s100 by 43.6/13.2% 17

Mapping Quality (compounds) 18 • 155-D real-valued vectors • 333 instances • avg. STRESS of • SMACOF: 2.50/1.88 times larger than DA-exp95 • DS-s100: 2.66/1.57 times larger than DA-exp95

Mapping Quality (breast cancer) 19 • 9-D integer-valued vectors • 683 instances • avg. STRESS of • SMACOF: 18.6/11.3% worse than DA-exp95 • DS-s100: 8.3% worse / comparable to DA-exp95 • DA-exp99 is worse than DA-exp95/90

Mapping Quality (yeast) 20 • 8 real-valued attributes • 1484 instances • SMACOF shows higher divergence

Mapping Quality (metagenomics) 21 • N-by-N dissimilarity matrix • 30,000 sequences • Run with parallel SMACOF • DA-exp95 improves on SMACOF by 12.6/10.4%

Mapping Examples 22

Runtime Comparison 23

Outline • Motivation • Background and Related Work • DA-SMACOF • Experimental Analysis • Conclusions 24

Conclusions • Dimension reduction can be a very useful tool for visualizing high-dimensional scientific data. • SMACOF is easily trapped in local optima. • We apply the DA approach to the SMACOF algorithm to avoid being trapped in local optima. DA-SMACOF • outperforms SMACOF and MDS-DistSmooth in quality (better STRESS values) and reliability (consistent results, less sensitive to the initial configuration). • uses comparable runtimes: 1.12 ~ 4.2 times longer than SMACOF and 1.3 ~ 9.1 times shorter than DistSmooth. 25

Thanks! Questions? 26