1
Parallel k-means++ for Multiple Shared-Memory Architectures
Patrick Mackey Pacific Northwest National Laboratory Robert R. Lewis Washington State University ICPP 2016
2
This Paper
Describes approaches for parallelizing k-means++ on three distinct hardware architectures:
OpenMP: shared-memory machines with multiple multi-core processors.
Cray XMT: a massively multithreaded architecture.
GPU: a high-performance graphics processor.
3
k-means++
A method that improves the quality of k-means clustering.
Selects a set of initial seeds that on average yields better clustering than random selection. Uses a probabilistic approach for selecting seeds: each data point is chosen as the next seed with probability proportional to its squared distance from the nearest previously selected seed.
4
Pseudocode of Serial k-means++
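The pseudocode figure from this slide is not reproduced here. The following is a minimal C++ sketch of serial k-means++ seeding under the squared-distance weighting described above; the helper weighted_rand_index (sketched with the next slide) and all names are illustrative, not the authors' code.

#include <algorithm>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

using Vec = std::vector<double>;

// Squared Euclidean distance between two points.
double dist2(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double d = a[i] - b[i];
        s += d * d;
    }
    return s;
}

// Weighted random selection; a sketch appears with the next slide.
std::size_t weighted_rand_index(const std::vector<double>& weights, std::mt19937& rng);

// Serial k-means++ seeding: choose k initial centers from the data set X.
std::vector<Vec> kmeanspp_seeds(const std::vector<Vec>& X, std::size_t k, std::mt19937& rng) {
    std::vector<Vec> seeds;
    std::uniform_int_distribution<std::size_t> uni(0, X.size() - 1);
    seeds.push_back(X[uni(rng)]);  // first seed: uniform random choice

    // weight[i] = squared distance from X[i] to its nearest chosen seed so far.
    std::vector<double> weight(X.size(), std::numeric_limits<double>::max());
    while (seeds.size() < k) {
        for (std::size_t i = 0; i < X.size(); ++i)
            weight[i] = std::min(weight[i], dist2(X[i], seeds.back()));
        // Next seed: index i chosen with probability weight[i] / sum(weight).
        seeds.push_back(X[weighted_rand_index(weight, rng)]);
    }
    return seeds;
}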
5
Pseudocode of Weighted_Rand_Index
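Again the slide's pseudocode image is missing; here is a minimal C++ sketch of the weighted random selection it names: draw r uniformly in [0, total weight) and walk the running sum until it exceeds r. The name and signature match the sketch above and are illustrative.

#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Returns index i with probability weights[i] / sum(weights).
std::size_t weighted_rand_index(const std::vector<double>& weights, std::mt19937& rng) {
    double total = std::accumulate(weights.begin(), weights.end(), 0.0);
    std::uniform_real_distribution<double> uni(0.0, total);
    double r = uni(rng);
    double running = 0.0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        running += weights[i];
        if (r < running) return i;
    }
    return weights.size() - 1;  // guard against floating-point round-off
}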
6
Parallel k-means++
Parallelizing the probabilistic selection is challenging: a dependence exists between iterations of the while loop, so simple loop parallelism will not work. Instead, each thread is given a partition of the data points and makes its own seed selection from its subset of weighted probabilities, using the same basic algorithm. This produces a list of potential seed choices and their probabilities.
7
Parallel k-means++ (Cont.)
Performs another weighted probability selection on the list and decides the final chosen seed.
8
Proof of Correctness
Let x ∈ X be an arbitrary vector.
p_par(x): the probability of selecting x in the parallel algorithm.
p(x): the true probability of selecting x ∈ X by weighted probability.
Theorem: p_par(x) = p(x).
9
Proof
Let X’ be the set of vectors assigned to the thread whose partition contains x. Since p(X’|x) = 1.0, it follows that p_par(x) = p(x); the intermediate steps are spelled out below.
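Filling in the steps of the slide's argument (notation as defined above; the final draw is assumed to weight each thread's candidate by its partition's total weight):

p_par(x) = p(x | X’) · p(X’)    (x wins the draw inside its partition, and that partition's candidate wins the final draw)
         = p(X’ | x) · p(x)     (Bayes' rule)
         = 1.0 · p(x) = p(x)    (x always belongs to X’, so p(X’ | x) = 1.0)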
10
k-means++ for OpenMP
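The OpenMP figure is likewise not reproduced. Below is a minimal sketch of the two-level selection described on the earlier slides, reusing the weighted_rand_index helper: each thread draws a candidate from its own partition, and a final draw weighted by each partition's total weight picks the seed. The partitioning, per-thread RNG seeding, and function name are illustrative choices, not the authors' implementation.

#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>
#include <omp.h>

// Assumed from the earlier sketch: serial weighted selection over a weight vector.
std::size_t weighted_rand_index(const std::vector<double>& weights, std::mt19937& rng);

// One parallel seed-selection step over the current weight vector.
std::size_t parallel_weighted_select(const std::vector<double>& weight, std::mt19937& rng) {
    int nthreads = omp_get_max_threads();
    std::vector<std::size_t> candidate(nthreads, 0);
    std::vector<double> partition_total(nthreads, 0.0);

    // Seed one RNG per thread up front to avoid racing on the shared generator.
    std::vector<std::uint32_t> thread_seed(nthreads);
    for (int t = 0; t < nthreads; ++t) thread_seed[t] = rng();

    #pragma omp parallel num_threads(nthreads)
    {
        int t = omp_get_thread_num();
        std::size_t n = weight.size();
        std::size_t begin = n * t / nthreads;
        std::size_t end   = n * (t + 1) / nthreads;

        // Each thread runs the serial weighted selection on its own slice.
        std::vector<double> local(weight.begin() + begin, weight.begin() + end);
        std::mt19937 local_rng(thread_seed[t]);

        partition_total[t] = std::accumulate(local.begin(), local.end(), 0.0);
        if (!local.empty())
            candidate[t] = begin + weighted_rand_index(local, local_rng);
    }
    // Final weighted draw over the per-thread candidates decides the seed.
    return candidate[weighted_rand_index(partition_total, rng)];
}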
11
k-means++ for a Massively Multithreaded Architecture (Cray XMT)
12
Weighted_Rand() on Massively Multithreaded Architecture
13
k-means++ on GPU
Implemented with NVIDIA's Thrust library for C++.
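The slide only names the library; below is a minimal Thrust sketch of the weighted selection step on the GPU, assuming the per-point weights (squared distances to the nearest seed) already live in a device_vector. It uses an inclusive prefix sum plus a binary search rather than the paper's Prob_Reduce() kernel, so treat it as an illustration of the idea, not the authors' code.

#include <thrust/device_vector.h>
#include <thrust/scan.h>
#include <thrust/binary_search.h>
#include <thrust/distance.h>
#include <cstddef>
#include <random>

// Pick an index with probability proportional to weight[i], on the device:
// build the inclusive prefix sum of the weights, then binary-search a uniform draw.
std::size_t gpu_weighted_rand_index(const thrust::device_vector<double>& weight,
                                    std::mt19937& rng) {
    thrust::device_vector<double> cumulative(weight.size());
    thrust::inclusive_scan(weight.begin(), weight.end(), cumulative.begin());

    double total = cumulative.back();  // copies one value back to the host
    std::uniform_real_distribution<double> uni(0.0, total);
    double r = uni(rng);

    // First position whose cumulative weight exceeds r.
    auto it = thrust::upper_bound(cumulative.begin(), cumulative.end(), r);
    std::size_t idx = thrust::distance(cumulative.begin(), it);
    return idx < weight.size() ? idx : weight.size() - 1;  // round-off guard
}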
14
Prob_Reduce()
15
Scaling Performance Results
16
Platform Performance Comparison
Conducted a series of experiments varying n, m, and k on different platforms.
n: the number of data points.
m: the dimensionality of the data.
k: the number of clusters.
Platforms:
GPU (NVIDIA Tesla C1060)
OpenMP (8 cores)
OpenMP (4 cores)
Cray XMT (128 processors)
Cray XMT (64 processors)
Cray XMT (32 processors)
17
Linear Regression
A linear regression model was fit for each platform to predict execution time from n, m, and k.
Accuracy, measured by root-mean-square error (RMSE): “The average deviation among all our platforms was just 4.4% of the average predicted time, with no platform having an RMSE greater than 11% of the mean.”
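For reference, RMSE here is the usual root-mean-square error of predicted versus measured execution times, which the quote then expresses as a fraction of the mean predicted time:

RMSE = sqrt( (1/N) · Σ_{i=1..N} (t̂_i − t_i)² ),

where t̂_i is the time predicted by the regression for run i and t_i is the measured time.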
18
Comparison Visualization
“Every single platform had a range of values for n, m, and k in which it predicted to be the fastest of all our tested platforms.”
19
Summary
GPU dominated when the dimensionality of the data was small.
Cray XMT excelled when the dimensionality of the data was high or the number of data points became exceedingly large.
Shared-memory multi-core processors (OpenMP) outperformed the others when the data was small or the number of clusters desired was small.
20
Summary (Cont.)
“Using a number of threads equal to the number of processors will not always be the most efficient.” A program could therefore select a more suitable number of threads for the algorithm, with the added benefit of making more resources available for other processes.