Design and Evaluation of a Parallel Execution Framework for the CLEVER Clustering Algorithm
Chung Sheng CHEN, Nauful SHAIKH, Panitee CHAROENRATTANARUK, Christoph F. EICK, Nouhad RIZK and Edgar GABRIEL
Department of Computer Science, University of Houston
Eick et al., ParCo 2011, Ghent

Talk Organization
1. Randomized Hill Climbing
2. CLEVER: A Prototype-based Clustering Algorithm which Supports Fitness Functions
3. OpenMP and CUDA Versions of CLEVER
4. Experimental Results
5. Summary

1. Randomized Hill Climbing
Randomized Hill Climbing: Sample p points randomly in the neighborhood of the currently best solution; determine the best of the p sampled points. If it is better than the current solution, make it the new current solution and continue the search; otherwise, terminate and return the current solution.
Advantages: easy to apply, does not need many resources, usually fast.
Problems: How do I define the neighborhood? What parameter p should I choose?

Example: Randomized Hill Climbing
Maximize f(x,y,z) = |x-y-0.2| * |x*z-0.8| * |0.3-z*z*y| with x, y, z in [0,1].
Neighborhood design: Create 50 solutions s such that s = (min(1, max(0, x+r1)), min(1, max(0, y+r2)), min(1, max(0, z+r3))), with r1, r2, r3 being random numbers in [-0.05, +0.05].
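The following is a minimal, self-contained C++ sketch of exactly this example; the sampling scheme and the clamping to [0,1] follow the slide, while the random seed and the simple no-improvement stopping rule are illustrative choices:

    // Randomized hill climbing for f(x,y,z) = |x-y-0.2| * |x*z-0.8| * |0.3-z*z*y|
    // over [0,1]^3, sampling p = 50 clamped neighbors per iteration.
    #include <algorithm>
    #include <array>
    #include <cmath>
    #include <cstdio>
    #include <random>

    using Solution = std::array<double, 3>;

    double f(const Solution& s) {
        double x = s[0], y = s[1], z = s[2];
        return std::fabs(x - y - 0.2) * std::fabs(x * z - 0.8) *
               std::fabs(0.3 - z * z * y);
    }

    int main() {
        std::mt19937 gen(42);
        std::uniform_real_distribution<double> unit(0.0, 1.0);
        std::uniform_real_distribution<double> step(-0.05, 0.05);

        Solution best = {unit(gen), unit(gen), unit(gen)};
        double bestF = f(best);
        const int p = 50;  // neighbors sampled per iteration

        for (;;) {
            Solution bestNeighbor = best;
            double bestNeighborF = bestF;
            for (int i = 0; i < p; ++i) {
                Solution s;
                for (int d = 0; d < 3; ++d)  // clamp each perturbed coordinate to [0,1]
                    s[d] = std::min(1.0, std::max(0.0, best[d] + step(gen)));
                double fs = f(s);
                if (fs > bestNeighborF) { bestNeighbor = s; bestNeighborF = fs; }
            }
            if (bestNeighborF <= bestF) break;  // no sampled neighbor improves: terminate
            best = bestNeighbor;
            bestF = bestNeighborF;
        }
        std::printf("f(%.3f, %.3f, %.3f) = %.6f\n", best[0], best[1], best[2], bestF);
    }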

2. CLEVER: Clustering with Plug-in Fitness Functions
- In the last 5 years, the UH-DMML research group at the University of Houston developed families of clustering algorithms that find contiguous spatial clusters by maximizing a plug-in fitness function.
- This work is motivated by a mismatch between the evaluation measures of traditional clustering algorithms (such as cluster compactness) and what domain experts are actually looking for.
- Plug-in fitness functions allow domain experts to instruct clustering algorithms with respect to the desirable properties of "good" clusters the algorithm should seek.

Region Discovery Framework
[Figure: overview of the region discovery framework]

Region Discovery Framework (continued)
The algorithms we currently investigate solve the following problem:
Given: a dataset O with a schema R, a distance function d defined on instances of R, and a fitness function q(X) that evaluates clusterings X = {c1, ..., ck} as follows:
    q(X) = Σ_{c∈X} reward(c) = Σ_{c∈X} i(c) * size(c)^β, with β > 1
Objective: Find c1, ..., ck ⊆ O such that:
1. ci ∩ cj = ∅ if i ≠ j
2. X = {c1, ..., ck} maximizes q(X)
3. All clusters ci ∈ X are contiguous (each pair of objects belonging to ci has to be Delaunay-connected with respect to ci and to d)
4. c1 ∪ ... ∪ ck ⊆ O
5. c1, ..., ck are usually ranked based on the reward each cluster receives, and low-reward clusters are frequently not reported
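In code, this fitness is just a sum of per-cluster rewards; the following C++ sketch (types and names are mine, not from the paper) treats the interestingness measure i(c) as a plug-in callable:

    #include <cmath>
    #include <functional>
    #include <vector>

    struct Cluster { std::vector<int> members; };  // indices into the dataset O

    // q(X) = sum over clusters c of i(c) * size(c)^beta, with beta > 1.
    double fitness(const std::vector<Cluster>& X,
                   const std::function<double(const Cluster&)>& interestingness,
                   double beta) {
        double q = 0.0;
        for (const Cluster& c : X)
            q += interestingness(c) *
                 std::pow(static_cast<double>(c.members.size()), beta);
        return q;
    }

Because size(c) enters with exponent β > 1, the reward structure favors fewer, larger interesting clusters over many tiny ones.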

Example 1: Finding Regional Co-location Patterns in Spatial Data
Objective: Find co-location regions using various clustering algorithms and novel fitness functions.
Applications:
1. Finding regions on planet Mars where shallow and deep ice are co-located, using point and raster datasets. In Figure 1, regions in red have very high co-location and regions in blue have anti-co-location.
2. Finding co-location patterns involving chemical concentrations with values on the wings of their statistical distribution in Texas' ground water supply. Figure 2 indicates the discovered regions and their associated chemical patterns.
Figure 1: Co-location regions involving deep and shallow ice on Mars.
Figure 2: Chemical co-location patterns in the Texas water supply.

Example 2: Regional Regression
Geo-regression approaches use multiple regression functions that vary depending on location.
Regional regression:
I. Discover regions with strong relationships between the dependent and independent variables.
II. Construct a regional regression function for each region.
III. When predicting the dependent variable of an object, use the regression function associated with the location of the object (see the sketch below).
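As a purely hypothetical illustration of step III (the names and the single-feature model are mine, not from the slides), prediction dispatches to the regression function of the region that contains the object:

    #include <cstddef>
    #include <vector>

    struct LinearModel { double intercept, slope; };  // y = intercept + slope * x

    // Step III: apply the regression function of the region the object falls into.
    // Real regional models would be multivariate; one feature keeps the sketch short.
    double predict(double x, std::size_t regionId,
                   const std::vector<LinearModel>& regionalModels) {
        const LinearModel& m = regionalModels.at(regionId);
        return m.intercept + m.slope * x;
    }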

Representative-based Clustering
[Figure: objects in a two-attribute space grouped around representatives]
Objective: Find a set of objects O_R such that the clustering X obtained by using the objects in O_R as representatives optimizes q(X).
Characteristic: clusters are formed by assigning each object to the closest representative.
Popular algorithms: K-means, K-medoids/PAM, CLEVER.

CLEVER
- A prototype-based clustering algorithm which supports plug-in fitness functions.
- Uses a randomized hill climbing procedure to find a "good" set of prototype data objects that represent clusters; "good" means maximizing the plug-in fitness function.
- Searches for the "correct" number of clusters.
- CLEVER is powerful but usually slow.
Components: hill climbing procedure, plug-in fitness function, neighboring solutions generator, cluster member assignment.

Pseudo Code of CLEVER
Inputs: Dataset O, k', neighborhood-size, p, q, β, object-distance-function d or distance matrix D, i-max
Outputs: Clustering X, fitness q(X), rewards for clusters in X
Algorithm:
1. Create a current solution by randomly selecting k' representatives from O.
2. If i-max iterations have been performed, terminate with the current solution.
3. Create p neighbors of the current solution randomly using the given neighborhood definition.
4. If the best neighbor improves the fitness q, it becomes the current solution. Go back to step 2.
5. If the fitness does not improve, the solution neighborhood is re-sampled by generating more neighbors (more precisely, first 2*p solutions and then (q-2)*p solutions are sampled). If re-sampling does not lead to a better solution, terminate and return the current solution (however, clusters that receive a reward of 0 are considered outliers and are therefore not returned); otherwise, go back to step 2, replacing the current solution by the best solution found by re-sampling.
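A compact C++ rendering of this control flow might look as follows; the neighborhood generator and the plug-in fitness function are injected as callables, and the re-sampling schedule follows step 5. This is a sketch of the loop structure, not the authors' implementation (outlier handling and reward bookkeeping are omitted):

    #include <functional>
    #include <initializer_list>
    #include <vector>

    struct Solution { std::vector<int> representatives; };

    Solution clever(Solution current,  // step 1: k' randomly selected representatives
                    const std::function<std::vector<Solution>(const Solution&, int)>& neighbors,
                    const std::function<double(const Solution&)>& fitness,
                    int p, int q, int iMax) {
        double currentF = fitness(current);
        for (int iter = 0; iter < iMax; ++iter) {              // step 2
            bool improved = false;
            // Steps 3-5: sample p neighbors; on failure, re-sample first 2*p
            // and then (q-2)*p more neighbors before giving up.
            for (int batch : {p, 2 * p, (q - 2) * p}) {
                Solution bestN = current;
                double bestF = currentF;
                for (const Solution& n : neighbors(current, batch)) {
                    double fN = fitness(n);
                    if (fN > bestF) { bestN = n; bestF = fN; improved = true; }
                }
                if (improved) { current = bestN; currentF = bestF; break; }  // step 4
            }
            if (!improved) break;  // step 5: re-sampling failed, terminate
        }
        return current;  // clusters with reward 0 would be dropped as outliers
    }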

3. PAR-CLEVER: A Faster Clustering Algorithm
Parallelization platforms considered:
- OpenMP
- CUDA (GPU computing)
- MPI
- Map/Reduce

Benchmark Datasets
- 10Ovals: size 3,359; fitness function: purity
- Earthquake: size 330,561; fitness function: find clusters with high variance with respect to earthquake depth
- Yahoo Ads Clicks: full size 3,009,071,396; subset: 2,910,613; fitness function: minimum intra-cluster distance

Parallelization Targets
1. Assign cluster members: O(n*k); data-parallel and highly independent, so it is the first priority for parallelization (see the OpenMP sketch below)
2. Fitness value calculation: ~O(n)
3. Neighboring solutions generation: ~O(p)
where n := number of objects in the dataset, k := number of clusters in the current solution, and p := sampling rate (how many neighbors of the current solution are sampled).
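A loop-level OpenMP sketch of step 1; the data layout and the distance function are placeholders rather than the authors' code, but the structure shows why this step parallelizes so well (every object's assignment is independent):

    #include <cstddef>
    #include <vector>
    #include <omp.h>

    // Assign each object to its closest representative: O(n*k) work,
    // with fully independent iterations over the n objects.
    void assignClusterMembers(const std::vector<std::vector<double>>& objects,
                              const std::vector<std::vector<double>>& reps,
                              std::vector<int>& assignment,
                              double (*d)(const std::vector<double>&,
                                          const std::vector<double>&)) {
        const long long n = static_cast<long long>(objects.size());
        assignment.resize(objects.size());
        #pragma omp parallel for schedule(static)
        for (long long i = 0; i < n; ++i) {
            int best = 0;
            double bestDist = d(objects[i], reps[0]);
            for (std::size_t j = 1; j < reps.size(); ++j) {
                double dist = d(objects[i], reps[j]);
                if (dist < bestDist) { bestDist = dist; best = static_cast<int>(j); }
            }
            assignment[i] = best;
        }
    }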

Hardware
crill-001 to crill-016 (OpenMP):
- Processor: 4 x AMD Opteron 6174
- CPU cores: 48
- Core speed: 2200 MHz
- Memory: 64 GB
crill-101 and crill-102 (GPU computing, NVIDIA CUDA):
- Processor: 2 x AMD Opteron 6174
- CPU cores: 24
- Core speed: 2200 MHz
- Memory: 32 GB
- GPU devices: 4 x Tesla M2050 (3 GB memory, 448 CUDA cores each)

10Ovals Dataset (size = 3,359)
Parameters: p=100, q=27, k'=10, η=1.1, th=0.6, β=1.6, interestingness function = Purity
[Table: run time (sec), speedup, and efficiency per thread count for loop-level, loop-level + incremental updating, and task-level parallelization; the numeric values and the fitness value did not survive transcription.]
Iterations = 14, evaluated neighbor solutions = 15,200, k = 5

Earthquake Dataset (size = 330,561)
Parameters: p=50, q=12, k'=100, η=2, th=1.2, β=1.4, interestingness function = Variance High
[Table: run time (hours), speedup, and efficiency per thread count for loop-level, loop-level + incremental updating, and task-level parallelization; the numeric values did not survive transcription.]
Iterations = 216, evaluated neighbor solutions = 21,950, k = 115

Yahoo Reduced Dataset (size = 2,910,613)
Parameters: p=48, q=7, k'=80, η=1.2, th=0, β = …, interestingness function = Average Distance to Medoid
[Table: run time (hours), speedup, and efficiency per thread count for loop-level, loop-level + incremental updating, and task-level parallelization; the numeric values did not survive transcription.]
Iterations = 10, evaluated neighbor solutions = 480, k = 94

10Ovals Dataset (size = 3,359): CUDA vs. OpenMP
Parameters: p=100, q=27, k'=10, η=1.1, th=0.6, β=1.6, interestingness function = Purity
CUDA: average run time 1.327 seconds; iterations = 12, evaluated neighbor solutions = 5,100, k = 5
OpenMP/sequential (for comparison): iterations = 14, evaluated neighbor solutions = 15,200, k = 5 [per-thread-count times did not survive transcription]
The CUDA version evaluates 5,100 solutions in 1.327 seconds on average, i.e., 15,200 solutions in 3.95 seconds.
Speedup = Time(CPU) / Time(GPU): 63x speedup compared to the sequential version; 1.62x speedup compared to 48-thread OpenMP.

Earthquake Dataset (size = 330,561): CUDA vs. OpenMP
Parameters: p=50, q=12, k'=100, η=2, th=1.2, β=1.4, interestingness function = Variance High
CUDA: average run time … seconds; iterations = 158, evaluated neighbor solutions = 28,900, k = 92
OpenMP/sequential (for comparison): iterations = 216, evaluated neighbor solutions = 21,950, k = 115 [per-thread-count times did not survive transcription]
The CUDA version evaluates 28,900 solutions in … seconds [values did not survive transcription].
Speedup = Time(CPU) / Time(GPU): 6,119x speedup compared to the sequential version; 202x speedup compared to 48-thread OpenMP.

Caching the Representatives
The representatives are read frequently in the computation that assigns objects to clusters. The results presented earlier cached the representatives in shared memory for faster access. The following comparison evaluates CLEVER with and without caching of the representatives on the Earthquake dataset (size = 330,561; p=50, q=12, k'=100, η=2, th=1.2, β=1.4, interestingness function = Variance High). The data size of the representatives being cached is 2 MB.
Run time (seconds): cache avg …; no-cache avg … [values did not survive transcription]; iterations = 158, evaluated neighbor solutions = 28,900, k = 92
The result shows that caching the representatives improves the runtime very little (0.09%) on the Earthquake dataset.
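For concreteness, here is a hedged CUDA sketch of this kind of caching: each thread block stages representative coordinates in __shared__ memory before the distance loop. The dimensions are illustrative; 2 MB of representatives would not fit into the roughly 48 KB of shared memory per block on an M2050, so a full implementation would have to stage tiles:

    #define DIM 3      // attributes per object (illustrative)
    #define MAX_K 128  // cap so the cached representatives fit in shared memory

    __global__ void assignMembers(const float* __restrict__ objects,  // n * DIM
                                  const float* __restrict__ reps,     // k * DIM
                                  int* assignment, int n, int k) {
        __shared__ float repCache[MAX_K * DIM];
        // Cooperatively copy the representatives into shared memory.
        for (int t = threadIdx.x; t < k * DIM; t += blockDim.x)
            repCache[t] = reps[t];
        __syncthreads();  // all threads reach this barrier before any early exit

        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        int best = 0;
        float bestDist = 3.4e38f;
        for (int j = 0; j < k; ++j) {  // squared Euclidean distance to each representative
            float dist = 0.0f;
            for (int a = 0; a < DIM; ++a) {
                float diff = objects[i * DIM + a] - repCache[j * DIM + a];
                dist += diff * diff;
            }
            if (dist < bestDist) { bestDist = dist; best = j; }
        }
        assignment[i] = best;
    }

On Fermi-class GPUs such as the M2050, global loads of the representatives are already served by the L1/L2 caches, which is consistent with the tiny (0.09%) benefit the experiment observed.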

Why is the CUDA version so much faster?
- The OpenMP version uses an object-oriented (OOP) design inherited from the original implementation, whereas the redesigned CUDA version is closer to a procedural implementation.
- The CUDA hardware has higher memory bandwidth, which contributed a little to the speedup.
- Caching contributes little to the speedup (as the preceding analysis showed).

Summary
CUDA and OpenMP results indicate good scalability of the parallel algorithm on multi-core processors: computations which used to take days can now be performed in minutes or hours.
OpenMP:
- Easy to implement
- Good speedup
- Limited by the number of cores and the amount of RAM
CUDA GPU:
- Extra attention is needed for CUDA programming; it is a lower level of programming (registers, cache memory, ...)
- The GPU memory hierarchy is different from the CPU's
- Only some data structures are supported; synchronization between threads in different blocks is not possible
- Very large speedups, some of which are still a subject of investigation

Future Work
- More work on the CUDA version.
- Conduct more experiments which explain what works well, what does not, and why it does or does not work well.
- Analyze in more depth the impact of the capability to search many more solutions on solution quality.
- Implement a version of CLEVER which conducts multiple randomized hill climbing searches in parallel and which employs dynamic load balancing, so that more resources are allocated to the "more promising" searches (see the sketch below).
- Reuse code for speeding up other data mining algorithms which use randomized hill climbing.
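A speculative OpenMP sketch of the multi-search idea (not from the slides): schedule(dynamic) gives a crude form of load balancing by letting idle threads pick up remaining searches; allocating more resources to the "more promising" searches would additionally require priorities or work stealing:

    #include <functional>
    #include <vector>

    struct Solution { double fitness = 0.0; /* representatives, ... */ };

    // Run several independent randomized hill climbing searches in parallel
    // and keep the best result; runSearch performs one complete search.
    Solution multiSearch(int searches,
                         const std::function<Solution(unsigned)>& runSearch) {
        std::vector<Solution> results(searches);
        #pragma omp parallel for schedule(dynamic)
        for (int s = 0; s < searches; ++s)
            results[s] = runSearch(static_cast<unsigned>(s));  // s doubles as the seed
        Solution best = results.at(0);  // assumes searches >= 1
        for (const Solution& r : results)
            if (r.fitness > best.fitness) best = r;
        return best;
    }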