Tree-Based Density Clustering using Graphics Processors

Presentation transcript:

Tree-Based Density Clustering using Graphics Processors: A First Marriage of MRNet and GPUs
Evan Samanas and Ben Welton
Notes: Roadmap, or go through the roadmap. Explain what our contribution was. Move the tweet slide forward so that it comes right after this slide, with some more info to catch interest. One person standing.

The Tweet Stream
Source: Twitter. Map: About.com

Tree-Based Overlay Networks (TBONs)
- Scalable multicast
- Scalable gather
- Scalable data aggregation
[Diagram: tree with the front-end (FE) at the root, communication processes (CP) in the middle, and back-end (BE) applications at the leaves]

MRNet – Multicast / Reduction Network
- General-purpose TBON API
  - Network: user-defined topology
  - Stream: logical data channel to a set of back-ends (multicast, gather, and custom reduction)
  - Packet: collection of data
  - Filter: stream data operator (synchronization, transformation)
- Widely adopted by HPC tools: CEPBA toolkit, Cray ATP & CCDB, Open|SpeedShop & CBTF, STAT, TAU
[Diagram: FE computing F(x1,…,xn) over a tree of CPs and BE apps]
Notes: Forgot to mention the reduction operation with streams. Clean up boxes.
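
To make the filter idea concrete, here is a minimal sketch in plain Python of a reduction flowing up a tree. It does not use the MRNet API (which is C++); `reduce_tree`, `topology`, and the nested-list tree encoding are hypothetical names chosen only for illustration.

```python
from typing import Callable, List, Union

# A node is either a back-end value (leaf) or a list of child nodes.
Tree = Union[int, List["Tree"]]

def reduce_tree(node: Tree, filt: Callable[[List[int]], int]) -> int:
    """Apply `filt` at every internal node, bottom-up, the way a TBON filter would."""
    if isinstance(node, int):
        return node  # back-end: emits its own packet
    child_outputs = [reduce_tree(child, filt) for child in node]
    return filt(child_outputs)  # communication process or front-end

# Hypothetical topology: two communication processes, each with two back-ends.
topology = [[3, 9], [4, 1]]
total = reduce_tree(topology, sum)    # a "sum" reduction
largest = reduce_tree(topology, max)  # a "max" reduction
```

Each internal node only ever sees the outputs of its own children, which is why the output size of the filter matters for scaling.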

TBON Computation
- Data to process: e.g. 40 MB (4x BE app, 10 MB per BE)
- Ideal characteristics:
  - Filter output size constant or decreasing
  - Computation rate similar across levels
  - Adjustable for load balance
[Diagram: timing comparison labeled Total Time: ~60 sec and Total Time: ~30 sec; packet size ≤ 10 MB into each CP and into the FE; stage times labeled ~10 sec and ~40 sec]
Notes: Simple animation for each bullet point? Should be constant/decreasing output size. Would hit scaling limits soon. Nuke original bullet. Keep sub-bullet. Mention that these are general features of MRNet (ideal computation). Replace <= with the actual symbol.
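
As a rough check on the ~30 sec figure, the sketch below sums per-level processing time. It assumes that nodes within a level run in parallel and that each node processes about 1 MB per second (inferred from the slide's 10 MB in ~10 sec). `RATE_MB_PER_SEC` and `total_time` are illustrative names, and the growing-output case is a hypothetical counterexample, not one of the slide's measurements.

```python
RATE_MB_PER_SEC = 1.0  # assumption: roughly 10 MB processed in ~10 sec

def total_time(packet_mb_per_level):
    """Sum per-level processing time; nodes within one level run in parallel."""
    return sum(size / RATE_MB_PER_SEC for size in packet_mb_per_level)

# Levels are listed bottom-up: BE, CP, FE.
constant_output = total_time([10, 10, 10])  # every filter emits <= 10 MB
growing_output = total_time([10, 20, 40])   # filters merely concatenate children
```

With constant-size filter output every level costs the same, so adding levels adds time linearly rather than in proportion to the total data size.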

Why GPUs?
- Natural fit
- Increase compute power
- Trade computation for bandwidth
  - Derived summaries
  - Compute and send ∆ data
  - Compression algorithms (e.g. LZO, zlib)
[Diagram: FE computing F(x1,…,xn) over a tree of CPs and BE apps]
Notes: Mention Jaguar at end?
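
One way to picture "trade computation for bandwidth" is to delta-encode values before compressing them. The sketch below uses Python's standard `zlib` and `struct` modules; the `encode` and `decode` helpers and the sample data are assumptions made for illustration, not part of the presented system.

```python
import struct
import zlib
from itertools import accumulate

def encode(values):
    """Delta-encode a list of 32-bit ints, then compress with zlib."""
    deltas = [values[0]] + [b - a for a, b in zip(values, values[1:])]
    return zlib.compress(struct.pack(f"<{len(deltas)}i", *deltas))

def decode(blob):
    """Invert encode(): decompress, unpack, and take the running sum."""
    raw = zlib.decompress(blob)
    deltas = struct.unpack(f"<{len(raw) // 4}i", raw)
    return list(accumulate(deltas))

# Hypothetical data: evenly spaced values, e.g. regularly sampled timestamps.
values = list(range(1_000_000, 1_000_000 + 5000 * 7, 7))
plain = zlib.compress(struct.pack(f"<{len(values)}i", *values))
packed = encode(values)
```

The extra arithmetic on each node buys a smaller packet on the wire, which is the trade the slide describes.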

Clustering Example (DBSCAN [1])
- Goal: find regions that meet minimum density and spatial distance characteristics
- The two parameters that determine whether a point is in a cluster are Epsilon (Eps) and MinPts
- If the number of points within Eps of a point is at least MinPts, the point is a core point
- For every discovered point, this same calculation is performed until the cluster is fully expanded
[Diagram: Eps neighborhood of a point, with MinPts: 3]
Notes: Attribute to Hans on slide. Try to fix the transition between 5 and 6. Don't say that we had GPUs and were looking for a problem. Change 1 on slide title.
[1] M. Ester et al., A density-based algorithm for discovering clusters in large spatial databases with noise (1996)
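
The expansion described on this slide can be sketched as a small brute-force DBSCAN in Python. This is an illustrative version only: it follows the core-point condition of Ester et al. [1] (at least MinPts points within Eps, counting the point itself), uses a linear scan instead of a spatial index, and the names `region_query` and `dbscan` are chosen here rather than taken from the talk.

```python
from math import dist

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        frontier = list(neighbors)
        while frontier:  # expand until the cluster stops growing
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                frontier.extend(j_neighbors)
        cluster += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Here the two squares of four points become clusters 0 and 1, and the isolated point is labeled as noise.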

Scaling DBSCAN
- PDBSCAN [2]
  - Quality equivalent to single DBSCAN
  - Linear speedup up to 8 nodes
- DBDC [3]
  - Sacrifices quality
  - ~30x speedup on 15 nodes
- CUDA-DClust [4]
  - Quality equivalent to DBSCAN
  - ~15x faster on 1 node
Notes: CUDA-DClust looks like a third example of an algorithm. Change sub-bullet type or do something. Define quality maybe here? Mention speedup for number of nodes for each of these. Say it's a GPU running on a single node.
[2] X. Xu et al., A Fast Parallel Clustering Algorithm for Large Spatial Databases (1999)
[3] E. Januzaj et al., DBDC: Density Based Distributed Clustering (2004)
[4] C. Böhm et al., Density-based clustering using graphics processors (2009)

Tree-Based Clustering: Mr. Scan
Algorithm steps:
- SpatialDecomp: CPU (at FE)
- DBSCAN: CPU or GPU (at BE)
- DrawBoundBox: CPU or GPU
- MergeCluster: CPU (x #levels)
[Diagram: tree of FE, CPs, and BEs, labeled MergeCluster at the upper levels and DBSCAN at the BEs]
Notes: Talk about DBDC; 15 seconds to explain where it came from. Discuss number of levels, CPU, GPU, BE, etc. on this slide. No parentheses after function names. Wide dash is a bad idea.

Spatial Decomposition
1. Start with an input of spatially referenced points
2. Partition the region into equal-sized density regions across one dimension
3. Add a shadow region of one Epsilon (Eps) to all density regions
[Diagram: Partition #1, Partition #2, Partition #3, with Eps marked]
Notes: Split the area so that we show points split in area. Label Eps.
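
The three steps above can be sketched as follows. This reading assumes that "equal-sized density regions" means partitions holding roughly equal numbers of points, split along x; `decompose` and its dictionary layout are hypothetical, chosen for illustration.

```python
def decompose(points, n_parts, eps):
    """Split points into n_parts slices along x, each with an eps-wide shadow."""
    ordered = sorted(points)             # step 1: order the input by x
    size = -(-len(ordered) // n_parts)   # ceiling division: points per slice
    partitions = []
    for start in range(0, len(ordered), size):
        end = start + size
        core = ordered[start:end]        # step 2: equal point count per slice
        lo, hi = core[0][0], core[-1][0]
        # step 3: neighbors from other slices that lie within eps of the slice
        shadow = [p for k, p in enumerate(ordered)
                  if not (start <= k < end) and lo - eps <= p[0] <= hi + eps]
        partitions.append({"core": core, "shadow": shadow})
    return partitions

parts = decompose([(x, 0.0) for x in range(8)], n_parts=2, eps=1.0)
```

Each back-end would then cluster its core points together with its shadow points, so that clusters crossing a partition boundary can be detected later during merging.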

DBSCAN - CPU
- Run on local slice for each BE
- R* tree
- Start at random point
- Cluster with respect to Eps, MinPts
- Complexity: O(n log n)
[Diagram: R* tree example with nodes A, B, C, D, E]
Notes: R*: show something about it. Say it in a way that we extended DBSCAN via the following. Discuss the relationship of the levels of the tree. How does it work?

GPU DBSCAN Filter
- GPU DBSCAN operates similarly to the CPU version, with two exceptions
- Candidate clusters are potential clusters being explored concurrently in the GPU
- Number of cluster candidates is limited by GPU characteristics
- State array stores the current state of points in the search space
- Coarse-grain search space
- Chain collisions cause cluster merges
[Diagram: Candidate #1, Candidate #2, ..., Candidate #N and their states]
CUDA-DClust [Böhm, 2009]
Notes: Candidate clusters == chains. What do chains 1 and 2 do after states merge? What is a thread block? Maybe replace with "limited by GPU characteristics". Reference the CUDA-DClust algorithm. More spacing between text. Consistency with justification.
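
To illustrate how chain collisions can turn into cluster merges, the sketch below resolves a list of recorded collisions with a union-find structure on the host. This is an assumption for illustration and is not claimed to be the data structure used by CUDA-DClust or by the GPU filter in this talk; `resolve_collisions` is a hypothetical name.

```python
def resolve_collisions(n_chains, collisions):
    """Map every chain id to the smallest chain id it is transitively merged with."""
    parent = list(range(n_chains))

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving keeps lookups short
            c = parent[c]
        return c

    for a, b in collisions:  # each pair: two chains that reached the same point
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)
    return [find(c) for c in range(n_chains)]

# Hypothetical run: 5 chains; chains 0-1, 3-4, and 1-3 collided.
final = resolve_collisions(5, [(0, 1), (3, 4), (1, 3)])
```

Chains 0, 1, 3, and 4 end up in one cluster, while chain 2 never collided and stays on its own.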

DrawBoundBox – CPU or GPU

MergeBoundBox - CPU
- Checks for merge if box within shadow
- At least one point MUST be in common
- Iterate through ALL points in right cluster
[Diagram: Match!]
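
A minimal sketch of this merge test, under the assumption that the bounding-box check acts as a cheap filter before the point-by-point comparison. Overlap between the two clusters' boxes stands in here for the slide's "box within shadow" check, and the function names are illustrative.

```python
def bounding_box(points):
    """Axis-aligned box as (min_x, min_y, max_x, max_y)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

def boxes_overlap(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def should_merge(left, right):
    """Merge only if the boxes overlap and at least one point is shared."""
    if not boxes_overlap(bounding_box(left), bounding_box(right)):
        return False  # cheap rejection: no iteration over points needed
    left_set = set(left)
    return any(p in left_set for p in right)  # iterate the right cluster

left = [(0, 0), (1, 1), (2, 1)]
shares_point = should_merge(left, [(2, 1), (3, 2)])
far_away = should_merge(left, [(5, 5), (6, 6)])
overlap_only = should_merge(left, [(1, 0), (2, 2)])
```

The third case shows why the box test alone is not enough: the boxes overlap, but no point is shared, so the clusters stay separate.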

The Tweet Stream
Source: Twitter. Map: About.com

Evaluation
- Dataset: 1-3 “Tweet Days”
- Measuring:
  - Time to completion
  - Quality compared to single-threaded DBSCAN
- Algorithms:
  - Single-threaded DBSCAN
  - DBDC
  - MRNet w/ DBSCAN filter
  - MRNet w/ DBSCAN GPU filter
Notes: Not a results slide; this is an eval slide. Maybe use "Evaluation".

Results

Results

Results

Quality
Notes: Change range to 120%. Write 100k, or use a comma or something. Change colors. Change xxx.00%. Drop decimals.

Future Work
- Scaling issues
  - Spatial decomposition
  - Merging algorithm

2D Spatial Decomposition
- 1D spatial decomposition has some severe limitations
  - Splits can have wildly differing point counts
  - Number of splits limited by Epsilon
- 2D spatial decomposition would allow for more splits with more equal point counts
[Diagram labels: Eps, 7 pts, 11 pts]
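
One possible way to get more splits with balanced point counts is a k-d-tree-style recursive median split that alternates between x and y. The talk does not say which 2D scheme is intended, so this is only a hypothetical sketch; `split_2d` is a name chosen here, and shadow regions are omitted for brevity.

```python
def split_2d(points, depth, axis=0):
    """Recursively halve the point set at the median, alternating x and y."""
    if depth == 0 or len(points) < 2:
        return [points]
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    other = 1 - axis  # switch dimension for the next level
    return (split_2d(ordered[:mid], depth - 1, other)
            + split_2d(ordered[mid:], depth - 1, other))

# Hypothetical input: a 4 x 4 grid of points, split into 2**2 = 4 cells.
grid = [(x, y) for x in range(4) for y in range(4)]
cells = split_2d(grid, depth=2)
sizes = [len(cell) for cell in cells]
```

Because every split is at the median, the cells hold near-equal point counts regardless of how unevenly the points are spread in space.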

Merging Algorithm
- Bounding box
  - No quality degradation
  - Limits iteration, but still iterates
- Possible alternative: concave hull
  - DBSCAN on border points
  - Avoids iteration
  - Constant output size
  - O(n^2) on BEs

Use Cases
- Twitter data
  - “Flu Tweets”
  - Mood/topic clustering
  - Riot prediction
- Any spatial data
  - Currently limited to 2D

Conclusion
- DBSCAN performance scales
- Quality can be maintained
- Goal: scale to O(100,000) nodes