Service Aggregated Linked Sequential Activities: High Performance Data Mining on Multi-core Systems
Geoffrey Fox, Huapeng Yuan, Seung-Hee Bae (Community Grids Laboratory, Indiana University Bloomington); Xiaohong Qiu (Research Computing UITS)
Technology collaboration: George Chrysanthakopoulos, Henrik Frystyk Nielsen (Microsoft)
Application collaboration: Cheminformatics: Rajarshi Guha and David Wild; Bioinformatics: Haixu Tang; Demographics (GIS): Neil Devadasan (IUPUI)

GOALS: The increasing number of cores is accompanied by a continued data deluge. Develop scalable parallel data mining algorithms with good multicore and cluster performance, and understand the software runtime and parallelization method. Use managed code (C#) and package the algorithms as services to encourage broad use, assuming experts parallelize the core algorithms.

CURRENT RESULTS: Microsoft CCR supports MPI, dynamic threading, and, via DSS, a service model of computing; detailed performance measurements have been made. Speedups of 7.5 or above on 8-core systems for "large problems" with deterministic annealing (avoiding local minima) algorithms for clustering, Gaussian mixtures, and GTM (dimension reduction); extending to new algorithms and applications.
Parallel Programming Strategy
Speedup = (number of cores) / (1 + f), where f = (sum of overheads) / (computation per core). The computation grain size is n (points per thread) times the number of clusters K, so f falls as the grain size grows. For example, with 8 cores and f = 0.05, speedup = 8/1.05, or about 7.6.

Contributions to the overhead f:
Synchronization: small with CCR
Load balance: good
Memory bandwidth limit: tends to 0 as K grows
Cache use/interference: important
Runtime fluctuations: dominant for large n, K

All our "real" problems have f <= 0.05 and speedups on 8-core systems greater than 7.6.

[Diagram: "main thread" with memory M; subsidiary threads t = 0..7, each with its own local memory m0..m7.]

Use data decomposition as in classic distributed memory, but use shared memory for read variables. Each thread uses a "local" array for written variables to get good cache performance.

MPI exchange latency in microseconds (with 20-30 microseconds of computation between messages):

Machine                                      OS      Runtime       Grains   Parallelism  MPI latency (us)
Intel8c:gf12 (8 core, 2.33 GHz, in 2 chips)  Redhat  MPJE (Java)   Process  8            181
                                                     MPICH2 (C)    Process  8            40.0
                                                     MPICH2:Fast   Process  8            39.3
                                                     Nemesis       Process  8            4.21
Intel8c:gf20                                 Fedora  MPJE          Process  8            157
                                                     mpiJava       Process  8            111
                                                     MPICH2        Process  8            64.2
Intel8b (8 core, 2.66 GHz)                   Vista   MPJE          Process  8            170
                                                     mpiJava       Process  8            142
                                                     MPICH2        Process  8            100
                                                     CCR (C#)      Thread   8            20.2
AMD4 (4 core, 2.19 GHz)                      XP      MPJE          Process  4            185
                                                     mpiJava       Process  4            152
                                                     MPICH2        Process  4            99.4
                                                     CCR           Thread   4            16.3
Intel4 (4 core)                              XP      CCR           Thread   4            25.8

[Figure: DA clustering performance, fractional overhead f versus 10000/(grain size n) for K = 10, 20, 30 clusters. Runtime fluctuations contribute 2% to 5% overhead.]
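As a concrete, purely illustrative picture of this decomposition (not the SALSA code itself), the C# sketch below gives each thread a contiguous block of the shared, read-only data array plus a separately allocated thread-local accumulator array, and merges the locals serially after a join. The class and variable names are invented for this sketch, and the assignment step is a stand-in for the real clustering computation.

```csharp
using System;
using System.Threading;

// Sketch: shared read-only data, thread-local write arrays, serial reduction.
class LocalAccumulationSketch
{
    static void Main()
    {
        int nPoints = 1000000, nClusters = 10, nThreads = 8;
        double[] data = new double[nPoints];            // shared, read-only input
        double[][] localSums = new double[nThreads][];  // one private accumulator per thread

        var threads = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++)
        {
            int tid = t;                                // capture loop variable per thread
            localSums[tid] = new double[nClusters];     // separate allocation per thread
            int lo = tid * nPoints / nThreads;          // block decomposition of the data
            int hi = (tid + 1) * nPoints / nThreads;
            threads[tid] = new Thread(() =>
            {
                for (int i = lo; i < hi; i++)
                {
                    int k = i % nClusters;              // stand-in for the real assignment step
                    localSums[tid][k] += data[i];       // threads write only to their local array
                }
            });
            threads[tid].Start();
        }

        foreach (var th in threads) th.Join();          // synchronization point (barrier)

        double[] globalSums = new double[nClusters];    // cheap serial reduction at the end
        for (int t = 0; t < nThreads; t++)
            for (int k = 0; k < nClusters; k++)
                globalSums[k] += localSums[t][k];

        Console.WriteLine(globalSums[0]);               // use the result
    }
}
```

Because each thread's accumulator is its own heap allocation, threads do not write into shared cache lines, which is the cache use/interference point made above.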
Linear PCA vs. nonlinear GTM on 6 Gaussians in 3D [Figure: PCA and GTM projections side by side]
Deterministic annealing clustering of Indiana census data: decrease the temperature (distance scale) to discover more clusters; resolution scales as T^0.5. Labels: r = renters, a = Asian, h = Hispanic, p = total.

GTM projection of 2 clusters of 335 compounds in 155 dimensions.

Stop press: a GTM projection of PubChem (10,926,940 compounds in a 166-dimensional binary property space) takes 4 days on 8 cores; a 64x64 mesh of GTM clusters interpolates PubChem. We could usefully use 1024 cores! David Wild will use this for a GIS-style 2D browsing interface to chemistry.

Bioinformatics: annealed clustering and Euclidean embedding for repetitive sequences and gene/protein families; use GTM to replace PCA in structure analysis.
General Formula: DAC, GM, GTM, DAGTM, DAGM
Parallel dimensional scaling and metric embedding; generalized cluster analysis.

N data points $E(x)$ in $D$-dimensional space; minimize $F$ by EM:

$F = -T \sum_{x=1}^{N} a(x) \ln\left\{ \sum_{k=1}^{K} g(k) \exp\left[ -\frac{(E(x) - Y(k))^2}{2\,T\,s(k)} \right] \right\}$

Deterministic Annealing Clustering (DAC):
$a(x) = 1/N$, or generally $p(x)$ with $\sum_x p(x) = 1$; $g(k) = 1$; $s(k) = 0.5$. $T$ is the annealing temperature, varied down from $\infty$ to a final value of 1. Vary the cluster centers $Y(k)$, but $P_k$ and $\sigma(k)$ (even a full matrix $\Sigma(k)$) can be calculated using formulae IDENTICAL to those for Gaussian mixtures. $K$ starts at 1 and is incremented by the algorithm.

Generative Topographic Mapping (GTM):
$a(x) = 1$; $g(k) = (1/K)(\beta/2\pi)^{D/2}$; $s(k) = 1/\beta$; $T = 1$. $Y(k) = \sum_{m=1}^{M} W_m \phi_m(X(k))$, with fixed $\phi_m(X) = \exp(-(X - \mu_m)^2 / 2\sigma^2)$. Vary $W_m$ and $\beta$, but fix the values of $M$ and $K$ a priori. $Y(k)$, $E(x)$, and $W_m$ are vectors in the original high-dimensional space; $X(k)$ and $\mu_m$ are vectors in the 2-dimensional mapped space.

Deterministic Annealing Gaussian mixture models (DAGM):
$a(x) = 1$; $g(k) = \{P_k / (2\pi\sigma(k)^2)^{D/2}\}^{1/T}$; $s(k) = \sigma(k)^2$ (taking the case of spherical Gaussians). $T$ is the annealing temperature, varied down from $\infty$ to a final value of 1. Vary $Y(k)$, $P_k$, and $\sigma(k)$. $K$ starts at 1 and is incremented by the algorithm.

Traditional Gaussian mixture models (GM): as DAGM, but set $T = 1$ and fix $K$.

DAGTM: GTM has several natural annealing versions, based on either DAC or DAGM; under investigation.

Also have Principal Component Analysis (PCA).

Near-term future work: parallel algorithms for random-projection metric embedding (Bourgain), MDS dimensional scaling (EM-like SMACOF), and the Marquardt algorithm for Newton's method. Later: HMM and SVM; other embeddings.

We need: a large Windows cluster; a link of CCR and MPI (or cross-cluster CCR); linear algebra for C# (multiplication, SVD, equation solving); high-performance C# math libraries.
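The DAC column above fully specifies an algorithm: with $g(k) = 1$ and $s(k) = 0.5$, the E-step softly assigns each point to clusters with weights proportional to $\exp(-(E(x) - Y(k))^2 / T)$, the M-step moves each center to the weighted mean, and $T$ is lowered toward 1. Below is a minimal serial C# sketch of that loop, not the SALSA parallel implementation: the 1-D data, fixed $K$, starting temperature, and 0.95 cooling factor are all illustrative assumptions (the real algorithm grows $K$ by splitting clusters as $T$ falls).

```csharp
using System;
using System.Linq;

// Illustrative serial sketch of Deterministic Annealing Clustering (DAC)
// for the parameter choices above: a(x)=1/N, g(k)=1, s(k)=0.5.
class DacSketch
{
    static void Main()
    {
        var rng = new Random(0);
        double[] x = new double[1000];                  // 1-D data points E(x)
        for (int i = 0; i < x.Length; i++) x[i] = rng.NextDouble() * 10.0;

        int K = 3;                                      // fixed here; DAC actually grows K
        double[] y = { 2.0, 5.0, 8.0 };                 // cluster centers Y(k)
        double T = 100.0;                               // annealing temperature, lowered toward 1
        const double cooling = 0.95;                    // hypothetical cooling schedule

        while (T > 1.0)
        {
            var num = new double[K];
            var den = new double[K];
            foreach (double xi in x)
            {
                // E-step: with s(k)=0.5 the exponent -0.5*d^2/(T*s(k)) is exactly -d^2/T
                double[] p = y.Select(yk => Math.Exp(-(xi - yk) * (xi - yk) / T)).ToArray();
                double z = p.Sum();
                for (int k = 0; k < K; k++)
                {
                    double w = p[k] / z;                // soft membership Pr(k|x)
                    num[k] += w * xi;
                    den[k] += w;
                }
            }
            for (int k = 0; k < K; k++)                 // M-step: centers = weighted means
                if (den[k] > 0) y[k] = num[k] / den[k];

            T *= cooling;                               // anneal: lower the temperature
        }
        Console.WriteLine(string.Join(", ", y.Select(v => v.ToString("F2"))));
    }
}
```

At high $T$ all points weight all centers almost equally, so the centers coincide; as $T$ falls the assignments sharpen and distinct clusters emerge, which is how annealing avoids local minima.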