GPU-Accelerated Incremental Correlation Clustering of Large Data with Visual Feedback. Eric Papenhausen and Bing Wang (Stony Brook University), Sungsoo Ha (SUNY Korea), Alla Zelenyuk (Pacific Northwest National Lab), Dan Imre (Imre Consulting), Klaus Mueller (Stony Brook University and SUNY Korea)


GPU-Accelerated Incremental Correlation Clustering of Large Data with Visual Feedback Eric Papenhausen and Bing Wang (Stony Brook University), Sungsoo Ha (SUNY Korea), Alla Zelenyuk (Pacific Northwest National Lab), Dan Imre (Imre Consulting), Klaus Mueller (Stony Brook University and SUNY Korea)


The Large Synoptic Survey Telescope
 Will survey the entire visible sky deeply in multiple colors every week with its three-billion-pixel digital camera
 Probe the mysteries of Dark Matter & Dark Energy
 10x more galaxies than the Sloan Digital Sky Survey
 Movie-like window on objects that change or move rapidly

Our Data – Aerosol Science
Acquired by a state-of-the-art single-particle mass spectrometer (SPLAT II), often deployed in an aircraft
Used in atmospheric chemistry to:
 understand the processes that control the atmospheric aerosol life cycle
 find the origins of climate change
 uncover and model the relationship between atmospheric aerosols and climate

Our Data – Aerosol Science
SPLAT II can acquire up to 100 particles per second, at sizes between 50 and 3,000 nm, with a precision of 1 nm
 Creates a 450-D mass spectrum for each particle
SpectraMiner:
 Builds a hierarchy of particles based on their spectral composition
 The hierarchy is used in subsequent automated classification of new particle acquisitions in the field or in the lab

Tightly integrate the scientist into the data analytics
 Interactive clustering – cluster sculpting
 Interaction is needed since the data are extremely noisy
 Fully automated clustering tools typically do not return satisfactory results
Strategy:
 Determine leaf nodes
 Merge using a correlation metric via heap sort
 Correlation is sensitive to particle composition ratios (or mixing state)

SpectraMiner – Scale Up
The CPU-based solution worked well for some time
SPLAT II and new large campaigns present problems:
 At 100 particles/s, the number of particles gathered in a single acquisition run can easily reach 100,000
 This takes just a 15-minute time window
 Large campaigns are much longer & more frequent
 Datasets of 5-10M particles have become the norm
 Recently SPLAT II operated 24/7 for one month
 Had to reduce the acquisition rate to 20 particles/s
 The CPU-based solution took days/weeks to compute

Interlude: Big Data – What Do You Need?
#1: Well, data!
 data = $$ – look at LinkedIn, Facebook, Google, Amazon
#2: High-performance computing
 parallel computing (GPUs), cloud computing
#3: Nifty computer algorithms for:
 noise removal
 redundancy elimination and importance sampling
 missing-data estimation
 outlier detection
 natural language processing and analysis
 image and video analysis
 learning a classification model

Incremental k-Means – Sequential
Basis of our trusted CPU-based solution (10 years old)

Make the first point a cluster center
While number of unclustered points > 0
    Pt = next unclustered point
    Compare Pt to all cluster centers
    Find the cluster with the shortest distance
    If distance < threshold
        Cluster Pt into that cluster center
    Else
        Make Pt a new cluster center
    End If
End While
Second pass to cluster outliers
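The loop above can be sketched in a few lines of Python (names are hypothetical; Euclidean distance stands in for the correlation metric, and the second outlier pass is omitted):

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance; the real system uses a Pearson-based metric."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def incremental_cluster(points, threshold, dist=euclidean):
    """One pass: each point joins the nearest existing center if it is
    closer than `threshold`, otherwise it seeds a new cluster."""
    centers = [list(points[0])]        # the first point becomes a cluster center
    labels = [0]
    for pt in points[1:]:
        d, best = min((dist(pt, c), i) for i, c in enumerate(centers))
        if d < threshold:
            labels.append(best)        # cluster pt into the nearest center
        else:
            centers.append(list(pt))   # pt becomes a new cluster center
            labels.append(len(centers) - 1)
    return centers, labels
```

Note that, as on the slide, centers are never recomputed in this pass; the seed point stays the center, which is what makes the algorithm order-dependent and motivates the second pass.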

Incremental k-Means – Parallel
New parallelizable version of the previous algorithm

Do
    Perform sequential k-means until C clusters emerge
    Num_Iterations = 0
    While Num_Iterations < Max_Iterations
        In Parallel: Compare all points to the C centers
        In Parallel: Update the C cluster centers
        Num_Iterations++
    End While
    Output the C clusters
While number of unclustered points > 0
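The inner refinement loop can be emulated serially in Python, with the two "In Parallel" steps marked (function names are hypothetical; on the GPU each step is mapped across thread blocks):

```python
def refine_fixed_centers(points, centers, max_iterations=5):
    """Serial emulation of the two parallel steps: assign every point to
    its nearest of C fixed centers, then recompute each center as the
    mean of its assigned points."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    C = len(centers)
    assign = []
    for _ in range(max_iterations):
        # Parallel step 1: one thread (group) per point
        assign = [min(range(C), key=lambda c: sqdist(p, centers[c]))
                  for p in points]
        # Parallel step 2: one thread (group) per center
        for c in range(C):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members)
                              for col in zip(*members)]
    return centers, assign
```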

Comments and Observations
The algorithm merges incremental k-means with a parallel k-means implementation (k = C)
Design choices:
 C = 96 gives a good balance between CPU and GPU utilization
 With C > 96 the algorithm becomes CPU-bound
 With C < 96 the GPU would be underutilized
 A multiple of 32 avoids divergent warps on the GPU
 Max_iterations = 5 worked best
Advantages of the new scheme:
 The second pass of the previous scheme is no longer needed

GPU Implementation
Platform:
 1-4 Tesla K20 GPUs
 Installed in a remote 'cloud' server
 Future implementations will emphasize this cloud aspect more
Parallelism:
 Launch N/32 thread blocks of size 32 x 32 each
 Each thread compares a point with 3 cluster centers
 Shared memory is used to avoid non-coalesced memory accesses

GPU Implementation – Algorithm

c1 = Centers[tid.y]       // first 32 of the 96 centers, loaded by the thread block
c2 = Centers[tid.y + 32]  // second 32
c3 = Centers[tid.y + 64]  // final 32
pt = Points[tid.x]
[clust, dist] = PearsonDist(pt, c1, c2, c3)        // d_xy = 1 - r_xy
[clust, dist] = IntraColumnReduction(clust, dist)
If tid.y == 0             // first thread in each column writes the result
    Points.clust[tid.x] = clust
    Points.dist[tid.x] = dist
End If
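PearsonDist is built on the distance d_xy = 1 - r_xy, where r_xy is Pearson's correlation coefficient; a scalar reference version in Python (the kernel evaluates it against three centers per thread):

```python
import math

def pearson_distance(p, c):
    """d_xy = 1 - r_xy: 0 for perfectly correlated spectra,
    2 for perfectly anti-correlated ones."""
    n = len(p)
    mean_p, mean_c = sum(p) / n, sum(c) / n
    cov = sum((x - mean_p) * (y - mean_c) for x, y in zip(p, c))
    var_p = sum((x - mean_p) ** 2 for x in p)
    var_c = sum((y - mean_c) ** 2 for y in c)
    return 1.0 - cov / math.sqrt(var_p * var_c)
```

Because r_xy is scale- and offset-invariant, this metric compares the shape of two 450-D mass spectra rather than their absolute intensities.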

Quality Measures

Acceleration by Sub-Thresholding
The size of the data was a large bottleneck:
 Data points had to be kept around for a long time
 Solution: cull points that were tightly clustered early
 These are the points with a low Pearson's distance to their cluster center
This also improved the DB (Davies-Bouldin) index
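A minimal sketch of the culling step, assuming each point carries the distance to its assigned center (names are hypothetical):

```python
def cull_tight_points(points, labels, dists, sub_threshold):
    """Separate points whose distance to their assigned center is already
    below `sub_threshold` (their label is final; they can be dropped from
    later passes) from the points that remain in play."""
    culled, survivors, surviving_labels = [], [], []
    for pt, lab, d in zip(points, labels, dists):
        if d < sub_threshold:
            culled.append((pt, lab))       # tightly clustered: retire early
        else:
            survivors.append(pt)
            surviving_labels.append(lab)
    return survivors, surviving_labels, culled
```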

Results – Sub-Thresholding About 33x speedup

Results – Multi-GPU 4-GPU has about 100x speedup over sequential

In-Situ Visual Feedback (1)
Visualize cluster centers as summary snapshots:
 The Glimmer MDS algorithm was used
 Intuitive 2D layout for non-visualization experts
Color map:
 Small clusters map to mostly white
 Large clusters map to saturated blue
We find that early visualizations are already quite revealing:
 This is shown by the cluster size histogram
 A cluster size of M > 10 is considered significant
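The color map can be sketched as a linear fade from white to saturated blue (the linear interpolation is an assumption; the actual mapping may differ):

```python
def cluster_color(size, max_size):
    """White for tiny clusters, fully saturated blue for the largest:
    fade the red and green channels with relative cluster size."""
    t = min(size / max_size, 1.0)        # relative size in [0, 1]
    fade = int(round(255 * (1.0 - t)))
    return (fade, fade, 255)             # (R, G, B)
```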

In-Situ Visual Feedback (2) (figure slide: cluster-center snapshots at successive stages, labeled clusters/points, e.g. 79/96)

In-Situ Visual Feedback (3) (figure slide: cluster-center snapshots at later stages)

Relation To Previous Work (1)
Main difference:
 We perform k-means clustering for data reduction
Previous work often uses map-reduce approaches:
 Connection most often with MPI/OpenMP
 Distribute the points onto a set of machines
 Compute (map) one iteration of local k-means in parallel
 Send the local k means to a set of reducers
 Compute their averages in parallel and send them back to the mappers
 Optionally skip the reduction step and instead broadcast to the mappers for local averaging
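The map-reduce scheme described above, emulated serially in Python (function names are hypothetical; a real deployment would run the map calls on separate machines under a framework such as Hadoop):

```python
def kmeans_map(local_points, centers):
    """Map: assign each local point to its nearest center and emit
    per-center partial sums and counts."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    partial = {}
    for p in local_points:
        c = min(range(len(centers)), key=lambda i: sqdist(p, centers[i]))
        s, n = partial.get(c, ([0.0] * len(p), 0))
        partial[c] = ([a + b for a, b in zip(s, p)], n + 1)
    return partial

def kmeans_reduce(partials):
    """Reduce: merge the mappers' partial sums into new cluster centers."""
    merged = {}
    for partial in partials:
        for c, (s, n) in partial.items():
            ms, mn = merged.get(c, ([0.0] * len(s), 0))
            merged[c] = ([a + b for a, b in zip(ms, s)], mn + n)
    return {c: [x / n for x in s] for c, (s, n) in merged.items()}
```

Emitting partial sums rather than raw assignments keeps the shuffle traffic proportional to the number of centers, not the number of points.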

Relation To Previous Work (2)
GPU solutions:
 Often only parallelize the point-cluster assignments
 Compute the new cluster centers on the CPU due to low parallelism

Conclusions and Future Work
The current approach is quite promising:
 Good speedup
 In-situ visualization of the data reduction process with early, valuable feedback
Future work:
 Load-balancing point removal for multi-GPU
 Anchored visualization so the layout is preserved
 Enable visual steering of the point reduction
 Extension to streaming data
 Also accelerate hierarchy building

Final Slide
Thanks to NSF and DOE for funding
Additional support from the Korean Ministry of Knowledge Economy (MKE)
Any questions?