Community Grids Lab. Indiana University, Bloomington Seung-Hee Bae.

Slides:

Advertisements

Similar presentations

Scalable High Performance Dimension Reduction

Advertisements

SALSA HPC Group School of Informatics and Computing Indiana University.

Multimedia DBs. Multimedia dbs A multimedia database stores text, strings and images Similarity queries (content based retrieval) Given an image find.

Hybrid MapReduce Workflow Yang Ruan, Zhenhua Guo, Yuduo Zhou, Judy Qiu, Geoffrey Fox Indiana University, US.

High Performance Dimension Reduction and Visualization for Large High-dimensional Data Analysis Jong Youl Choi, Seung-Hee Bae, Judy Qiu, and Geoffrey Fox.

OpenFOAM on a GPU-based Heterogeneous Cluster

By Xinggao Xia and Jong Chul Lee. TechniqueAdditionsMultiplications/Divisions Gauss-Jordann 3 /2 Gaussian Eliminationn 3 /3 Cramer’s Rulen 4 /3 n :

Interpolative Multidimensional Scaling Techniques for the Identification of Clusters in Very Large Sequence Sets April 27, 2011.

Authors: Thilina Gunarathne, Tak-Lon Wu, Judy Qiu, Geoffrey Fox Publish: HPDC'10, June 20–25, 2010, Chicago, Illinois, USA ACM Speaker: Jia Bao Lin.

Localization from Mere Connectivity Yi Shang (University of Missouri - Columbia); Wheeler Ruml (Palo Alto Research Center ); Ying Zhang; Markus Fromherz.

Dimension reduction : PCA and Clustering Slides by Agnieszka Juncker and Chris Workman.

ACL, June Pairwise Document Similarity in Large Collections with MapReduce Tamer Elsayed, Jimmy Lin, and Douglas W. Oard University of Maryland,

Dimension reduction : PCA and Clustering Christopher Workman Center for Biological Sequence Analysis DTU.

Science Advisory Committee Meeting - 20 September 3, 2010 Stanford University 1 04_Parallel Processing Parallel Processing Majid AlMeshari John W. Conklin.

Parallel K-Means Clustering Based on MapReduce The Key Laboratory of Intelligent Information Processing, Chinese Academy of Sciences Weizhong Zhao, Huifang.

Parallel Data Analysis from Multicore to Cloudy Grids Indiana University Geoffrey Fox, Xiaohong Qiu, Scott Beason, Seung-Hee.

Dimension Reduction and Visualization of Large High-Dimensional Data via Interpolation Seung-Hee Bae, Jong Youl Choi, Judy Qiu, and Geoffrey Fox School.

SALSASALSA Judy Qiu Research Computing UITS, Indiana University.

SALSASALSA Programming Abstractions for Multicore Clouds eScience 2008 Conference Workshop on Abstractions for Distributed Applications and Systems December.

Iterative computation is a kernel function to many data mining and data analysis algorithms. Missing in current MapReduce frameworks is collective communication,

Service Aggregated Linked Sequential Activities GOALS: Increasing number of cores accompanied by continued data deluge Develop scalable parallel data mining.

Applying Twister to Scientific Applications CloudCom 2010 Indianapolis, Indiana, USA Nov 30 – Dec 3, 2010.

Performance Evaluation of Hybrid MPI/OpenMP Implementation of a Lattice Boltzmann Application on Multicore Systems Department of Computer Science and Engineering,

Venkatram Ramanathan 1. Motivation Evolution of Multi-Core Machines and the challenges Background: MapReduce and FREERIDE Co-clustering on FREERIDE Experimental.

Venkatram Ramanathan 1. Motivation Evolution of Multi-Core Machines and the challenges Summary of Contributions Background: MapReduce and FREERIDE Wavelet.

1 Interconnects Shared address space and message passing computers can be constructed by connecting processors and memory unit using a variety of interconnection.

Service Aggregated Linked Sequential Activities GOALS: Increasing number of cores accompanied by continued data deluge. Develop scalable parallel data.

1 Performance of a Multi-Paradigm Messaging Runtime on Multicore Systems Poster at Grid 2007 Omni Austin Downtown Hotel Austin Texas September

SALSASALSA International Conference on Computational Science June Kraków, Poland Judy Qiu

Performance Issues in Parallelizing Data-Intensive applications on a Multi-core Cluster Vignesh Ravi and Gagan Agrawal

Independent Component Analysis (ICA) A parallel approach.

Presenter: Yang Ruan Indiana University Bloomington

Generative Topographic Mapping by Deterministic Annealing Jong Youl Choi, Judy Qiu, Marlon Pierce, and Geoffrey Fox School of Informatics and Computing.

Porting Irregular Reductions on Heterogeneous CPU-GPU Configurations Xin Huo, Vignesh T. Ravi, Gagan Agrawal Department of Computer Science and Engineering.

Intelligent Database Systems Lab 1 Advisor ： Dr. Hsu Graduate ： Jian-Lin Kuo Author ： Silvia Nittel Kelvin T.Leung Amy Braverman 國立雲林科技大學 National Yunlin.

SALSASALSASALSASALSA Design Pattern for Scientific Applications in DryadLINQ CTP DataCloud-SC11 Hui Li Yang Ruan, Yuduo Zhou Judy Qiu, Geoffrey Fox.

Evaluating FERMI features for Data Mining Applications Masters Thesis Presentation Sinduja Muralidharan Advised by: Dr. Gagan Agrawal.

Parallel Applications And Tools For Cloud Computing Environments Azure MapReduce Large-scale PageRank with Twister Twister BLAST Thilina Gunarathne, Stephen.

Parallelization of the Classic Gram-Schmidt QR-Factorization

S CALABLE H IGH P ERFORMANCE D IMENSION R EDUCTION Seung-Hee Bae.

SALSASALSASALSASALSA CloudComp 09 Munich, Germany Jaliya Ekanayake, Geoffrey Fox School of Informatics and Computing Pervasive.

SALSA HPC Group School of Informatics and Computing Indiana University.

Fast Support Vector Machine Training and Classification on Graphics Processors Bryan Catanzaro Narayanan Sundaram Kurt Keutzer Parallel Computing Laboratory,

Domain Decomposed Parallel Heat Distribution Problem in Two Dimensions Yana Kortsarts Jeff Rufinus Widener University Computer Science Department.

1 Performance Measurements of CCR and MPI on Multicore Systems Expanded from a Poster at Grid 2007 Austin Texas September Xiaohong Qiu Research.

Accelerating Error Correction in High-Throughput Short-Read DNA Sequencing Data with CUDA Haixiang Shi Bertil Schmidt Weiguo Liu Wolfgang Müller-Wittig.

Multidimensional Scaling by Deterministic Annealing with Iterative Majorization Algorithm Seung-Hee Bae, Judy Qiu, and Geoffrey Fox SALSA group in Pervasive.

Slide 14.1 Nonmetric Scaling MathematicalMarketing Chapter 14 Nonmetric Scaling Measurement, perception and preference are the main themes of this section.

Service Aggregated Linked Sequential Activities: GOALS: Increasing number of cores accompanied by continued data deluge Develop scalable parallel data.

Shanghai Many-Core Workshop, March Judy Qiu Research.

SALSA HPC Group School of Informatics and Computing Indiana University.

1 Multicore for Science Multicore Panel at eScience 2008 December Geoffrey Fox Community Grids Laboratory, School of informatics Indiana University.

Looking at Use Case 19, 20 Genomics 1st JTC 1 SGBD Meeting SDSC San Diego March Judy Qiu Shantenu Jha (Rutgers) Geoffrey Fox

Big data Usman Roshan CS 675. Big data Typically refers to datasets with very large number of instances (rows) as opposed to attributes (columns). Data.

Cloud Computing Paradigms for Pleasingly Parallel Biomedical Applications Thilina Gunarathne, Tak-Lon Wu Judy Qiu, Geoffrey Fox School of Informatics,

SALSA Group Research Activities April 27, Research Overview  MapReduce Runtime  Twister  Azure MapReduce  Dryad and Parallel Applications 

SALSASALSASALSASALSA Digital Science Center February 12, 2010, Bloomington Geoffrey Fox Judy Qiu

PARALLEL COMPUTATION FOR MATRIX MULTIPLICATION Presented By:Dima Ayash Kelwin Payares Tala Najem.

Yang Ruan PhD Candidate Salsahpc Group Community Grid Lab Indiana University.

SPIDAL Java High Performance Data Analytics with Java on Large Multicore HPC Clusters

Computer Science and Engineering Parallelizing Feature Mining Using FREERIDE Leonid Glimcher P. 1 ipdps’04 Scaling and Parallelizing a Scientific Feature.

Introduction to Parallel Computing: MPI, OpenMP and Hybrid Programming

Ioannis E. Venetis Department of Computer Engineering and Informatics

Thilina Gunarathne, Bimalee Salpitkorala, Arun Chauhan, Geoffrey Fox

Geoffrey Fox, Huapeng Yuan, Seung-Hee Bae Xiaohong Qiu

Applying Twister to Scientific Applications

DACIDR for Gene Analysis

Adaptive Interpolation of Multidimensional Scaling

Hybrid Programming with OpenMP and MPI

Presentation transcript:

Community Grids Lab. Indiana University, Bloomington Seung-Hee Bae

Contents  Multidimensional Scaling (MDS)  Scaling by MAjorizing a COmplicated Function (SMACOF)  Parallelization of SMACOF  Performance Analysis  Conclusions & Future Works

Multidimensional Scaling (MDS)  Techniques to configure data points in high- dimensional space Into low-dimensional space based on proximity (dissimilarity) info.  e.g.) N-dimension  3-dimension (viewable)  Dissimilarity Matrix [ Δ = (δ ij ) ]  Symmetric  Non-negative  Zero-diagonal elements  MDS can be used for visualization of high- dimensional scientific data  E.g.) chemical data, biological data

MDS (2)  Can be seen as an optimization problem.  minimization of the objective function.  Objective Function  STRESS [ σ(X)] – weighted squared error btwn dist.  SSTRESS [ σ 2 (X)] – btwn squared dist.  Where, d ij (X) = | x i – x j |, x i & x j are mapping results.

SMACOF  Scaling by MAjorizing a COmplicated Function  Iterative EM-like algorithm  A variant of gradient descent approach  Likely to have local minima  Guarantee monotonic decreasing the objective criterion.

SMACOF (2)

Parallel SMACOF  Dominant time consuming part  Iterative matrix multiplication. O( k * N 3 )  Parallel MM on Multicore machine  Shared memory parallelism.  Only need to distribute computation but not data.  Block decomposition and mapping decomposed submatrix to each thread based on thread ID.  Only need to know starting and end position ( i, j ) instead of actually dividing the matrix into P submatrix, like MPI style.

Parallel SMACOF (2)  Parallel matrix multiplication n n b b n n b b X = n n b b ABC a ij a i1 a im …… b 1j b ij b mj … … c ij

Experiments  Test Environments Intel8aIntel8b CPUIntel Xeon E5320Intel Xeon x5355 CPU clock1.86 GHz2.66 GHz Core4-core x 2 L2 Cache8 MB Memory8 GB4 GB OSXP pro 64 bitVista Ultimate 64 bit LanguageC#

Experiments (2)  Benchmark Data  4D Gaussian Distribution data set (8 centers) (0,2,0,1) (0,0,1,0) (0,0,0,0) (2,0,0,0) (2,2,0,1) (2,2,1,0) (0,2,4,1) (2,0,4,1)

Experiments (3)  Design  Different number of block size  Cache line effect  Different number of threads and data points  Scalability of the parallelism  Jagged 2D Array vs. Rectangular 2D array  C# language specific issue.  Known that Jagged array performs better than multidimensional array.

Experimental Results (1)  Different block size (Cache effect)

Experimental Results (2)  Different Block Size (using 1 thread) Intel8aIntel8b #pointsblkSizeTime(sec)speedup#pointsblkSizeTime(sec)speedup

Experimental Results (3)  Different Data Size Speedup ≈ 7.7 Overhead ≈ 0.03

Experimental Results (4)  Different number of Threads  1024 data points

Experimental Results (5)  Jagged Array vs. 2D array  1024 data points w/ 8 threads

MDS Example: Biological Sequence Data 4500 Points : Pairwise Aligned 4500 Points : ClustalW MSA 17

Obesity Patient ~ 20 dimensional data 2000 records 6 Clusters 4000 records 8 Clusters 18

Conclusion & Future Works  Parallel SMACOF shows  High efficiency (> 0.94) and speedup (> 7.5 / 8-core), for larger data, i.e or 2048 points.  Cache effect: b=64 is most fitted block size for the block matrix multiplication for the tested machines.  Jagged array is at least 1.4 times faster than 2D array for the parallel SMACOF.  Future Works  Distributed memory version of SMACOF

Acknowledgement  Prof. Geoffrey Fox  Dr. Xiaohong Qiu  SALSA project group of CGL at IUB

Questions? Thanks!