Speeding up k-Means by GPUs. You Li. Supervisor: Dr. Chu Xiaowen. Co-supervisor: Prof. Liu Jiming. Thursday, March 11, 2010.

1 Speeding up k-Means by GPUs
You Li. Supervisor: Dr. Chu Xiaowen. Co-supervisor: Prof. Liu Jiming. Thursday, March 11, 2010.

2 Outline
- Introduction
  - Efficiency of data mining -> GPGPU -> k-means on GPU
- Related work
- Method
- Research Plan

3 Efficiency of data mining
- Data mining faces an efficiency challenge because of ever-increasing data volumes
- Parallel data mining
[Fig. 1] [Fig. 2]

4 The efficiency of data mining

5 GPGPU
- General-purpose, high-performance parallel hardware
- Provides another platform for parallelizing data mining algorithms
[Fig. 3: CPU and GPU architecture compared (Control, Cache, ALU, DRAM)]

6 k-means on GPU
- Programming on GPU
  - CUDA: integrated CPU+GPU programming in C
- k-Means
  - Widely used in statistical data analysis, pattern recognition, etc.
  - Easy to implement on a CPU and well suited to a GPU implementation

7 Outline
- Introduction
- Related work
  - UV_k-Means, GPUMiner and HP_k-Means
- Method
- Research Plan

8 Related work
Speed of k-Means on low-dimensional data, in seconds (NVIDIA GTX 280 GPU; Intel(R) Core(TM) i5 CPU):

n          k    d   MineBench on CPU   HP k-Means   UV k-Means   GPUMiner
2 million  100  2              19.36         1.45         2.84      61.39
2 million  400  2              70.93         2.16         5.96      63.46
2 million  100  8              39.81         2.48         6.07     192.05
2 million  400  8             152.25         4.53        16.32     226.79
4 million  100  2              38.74         2.88         5.64     130.36
4 million  400  2             141.84         4.38        11.94     126.38
4 million  100  8              79.60         4.95        12.85     383.41
4 million  400  8             304.46         9.03        34.54     474.83
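The speedups implied by these timings can be checked with a few lines of arithmetic. The following is a small Python sketch written for this transcript, not part of the talk; the names `timings` and `speedup_over_cpu` are our own, and the numbers are copied from the table.

```python
# Timings in seconds copied from the table: (n, k, d) -> (CPU, HP, UV, GPUMiner).
timings = {
    (2_000_000, 100, 2): (19.36, 1.45, 2.84, 61.39),
    (2_000_000, 400, 2): (70.93, 2.16, 5.96, 63.46),
    (2_000_000, 100, 8): (39.81, 2.48, 6.07, 192.05),
    (2_000_000, 400, 8): (152.25, 4.53, 16.32, 226.79),
    (4_000_000, 100, 2): (38.74, 2.88, 5.64, 130.36),
    (4_000_000, 400, 2): (141.84, 4.38, 11.94, 126.38),
    (4_000_000, 100, 8): (79.60, 4.95, 12.85, 383.41),
    (4_000_000, 400, 8): (304.46, 9.03, 34.54, 474.83),
}


def speedup_over_cpu(row):
    """Return the CPU time divided by each GPU implementation's time."""
    cpu, hp, uv, miner = row
    return {"HP": cpu / hp, "UV": cpu / uv, "GPUMiner": cpu / miner}


speedups = {cfg: speedup_over_cpu(row) for cfg, row in timings.items()}

for (n, k, d), s in speedups.items():
    cells = ", ".join(f"{name} {value:.2f}x" for name, value in s.items())
    print(f"n={n:,} k={k} d={d}: {cells}")
```

On these figures HP k-Means is the fastest of the three GPU implementations in every row, between roughly 13 and 34 times faster than the MineBench CPU baseline.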

9 Outline
- Introduction
- Related work
- Method and Results
  - k-Means (three steps) -> step 1 -> step 2 -> step 3
  - Experiments
- Research Plan

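10 k-Means algorithm
Input: n data points; k centroids.
- Step 1, O(nkd): compute distance(n_i, k_j) between every data point and every centroid
- Step 2, O(nk): find the closest centroid for each data point
- Step 3, O(nd): compute the new centroids
- If the centroids changed: yes, repeat from Step 1; no, end
See also: Memory Mechanism (next slide)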
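The three steps can be sketched on the CPU as follows. This is a minimal pure-Python illustration written for this transcript, not the GPU code from the talk; the function name `kmeans`, the `max_iters` cap, and the choice to keep an empty cluster's old centroid are our assumptions.

```python
def kmeans(points, centroids, max_iters=100):
    """Plain CPU k-means following the three steps listed on the slide."""
    k, d = len(centroids), len(centroids[0])
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(max_iters):
        # Step 1, O(nkd): squared distance from every point to every centroid.
        dist = [[sum((p[j] - c[j]) ** 2 for j in range(d)) for c in centroids]
                for p in points]
        # Step 2, O(nk): index of the closest centroid for every point.
        labels = [min(range(k), key=row.__getitem__) for row in dist]
        # Step 3, O(nd): each new centroid is the mean of its assigned points.
        sums = [[0.0] * d for _ in range(k)]
        counts = [0] * k
        for p, lab in zip(points, labels):
            counts[lab] += 1
            for j in range(d):
                sums[lab][j] += p[j]
        # Assumption: an empty cluster keeps its previous centroid.
        new_centroids = [
            [s / counts[i] for s in sums[i]] if counts[i] else centroids[i]
            for i in range(k)
        ]
        # Stop as soon as no centroid changed.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Tiny example: two well-separated groups in 2-D.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
final_centroids, final_labels = kmeans(points, [(0, 0), (10, 10)])
print(final_centroids, final_labels)
```

Step 1 dominates the cost because it touches every (point, centroid, dimension) triple, which is why the talk concentrates on how that step reads memory.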

11 Memory Mechanism of GPU
- Global memory
  - Large size
  - Long latency
- Register
  - Small size
  - Short latency
  - User cannot control
- Shared memory
  - Medium size
  - Short latency
  - User controlled

12 k-Means on GPU
- Key idea
  - Increase the number of compute operations per global memory access
  - Adopt the methods used for matrix multiplication and reduction
- Dimension is a key parameter
  - For low dimension: use registers
  - For high dimension: use shared memory
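To see why reuse matters, one can count global memory reads for the distance step under a deliberately simplified model. The Python sketch below is our illustration, not the presenter's kernel. It assumes one thread per data point, a naive scheme in which every thread re-reads all centroids from global memory, and a tiled scheme in which each block of `block_size` threads (a hypothetical parameter) loads the centroids once and shares them. It also assumes about 3 arithmetic operations per (point, centroid, dimension) term and ignores caches and coalescing.

```python
def global_reads_naive(n, k, d):
    """Every thread reads its own point (d values) and all k centroids (k*d values)."""
    return n * d + n * k * d


def global_reads_tiled(n, k, d, block_size):
    """Every block loads the k centroids once and shares them among its threads."""
    num_blocks = -(-n // block_size)  # ceiling division
    return n * d + num_blocks * k * d


def ops_per_read(n, k, d, reads):
    """Assumed 3 operations (subtract, multiply, add) per point-centroid-dimension term."""
    return 3 * n * k * d / reads


n, k, d, block_size = 2_000_000, 100, 2, 256
naive_reads = global_reads_naive(n, k, d)
tiled_reads = global_reads_tiled(n, k, d, block_size)
print(f"naive: {ops_per_read(n, k, d, naive_reads):.1f} ops per read")
print(f"tiled: {ops_per_read(n, k, d, tiled_reads):.1f} ops per read")
```

Under these assumptions the naive scheme performs roughly 3 operations per global read, while the tiled scheme performs over 200, which is the kind of gain the key idea is after.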

13 k-Means on GPU: low dimension
- Read each data point from global memory only once

14 k-Means on GPU: high dimension
- Read each data point from global memory only once

15 Experiments
The experiments were conducted on a PC with an NVIDIA GTX 280 GPU and an Intel(R) Core(TM) i5 CPU. The GTX 280 has 30 SIMD multiprocessors, each containing eight processors running at 1.29 GHz. The GPU has 1 GB of memory with a peak bandwidth of 141.7 GB/sec. The CPU has four cores running at 2.67 GHz. The main memory is 8 GB with a peak bandwidth of 5.6 GB/sec. All source code was written and compiled with Visual Studio 2008, using CUDA 2.3. Timing starts after file I/O has finished, so that the reported speedup reflects the computation alone.
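The hardware figures above translate into a few derived numbers. The short Python sketch below is ours; the peak-throughput line rests on an assumption that is not on the slide, namely one multiply-add (2 floating-point operations) per processor per cycle.

```python
# Figures quoted on the slide.
multiprocessors = 30
processors_per_multiprocessor = 8
gpu_clock_ghz = 1.29
gpu_bandwidth_gbs = 141.7
cpu_bandwidth_gbs = 5.6

total_processors = multiprocessors * processors_per_multiprocessor

# ASSUMPTION (not from the slide): 2 flops per processor per cycle (one multiply-add).
flops_per_processor_per_cycle = 2
peak_gflops = total_processors * gpu_clock_ghz * flops_per_processor_per_cycle

# How much more memory bandwidth the GPU has than the host.
bandwidth_ratio = gpu_bandwidth_gbs / cpu_bandwidth_gbs

print(total_processors, round(peak_gflops, 1), round(bandwidth_ratio, 1))
```

This gives 240 processors in total and a GPU-to-host memory bandwidth ratio of about 25; the peak GFLOPS value is only as good as the flops-per-cycle assumption.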

16 Experiments: low-dimensional data
- Compared with HP, UV and GPUMiner; the data is randomly generated
- Result: four to ten times faster than HP

17 Experiments: high-dimensional data
- Compared with UV and GPUMiner; the data is from KDD 1999
- Result: four to eight times faster than UV

18 Experiments: comparison with CPU
- The results show that our algorithm compares very favorably with existing algorithms
- Result: forty to two hundred times faster than the CPU version

19 Outline
- Introduction
- Related work
- Method
- Research Plan

20 Research Plan
- Detailed analysis of k-Means on GPU
  - GFLOPS
  - Handle even larger data sets
- Other data mining algorithms on GPU
  - k-NN
  - SDP (widely used in protein identification)
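One way to approach the planned GFLOPS analysis is to divide an estimated operation count by the measured time. The Python sketch below is a hypothetical helper of our own, not a result from the talk: it counts only the O(nkd) distance step, assumes 3 floating-point operations per term, and needs the number of iterations, which the slides do not report. The example inputs are made up purely to show the arithmetic.

```python
def achieved_gflops(n, k, d, iterations, seconds, flops_per_term=3):
    """Estimate sustained GFLOPS of the distance step from problem size and run time.

    flops_per_term is an assumption: subtract, multiply, add per
    (point, centroid, dimension) term.
    """
    total_flops = flops_per_term * n * k * d * iterations
    return total_flops / seconds / 1e9


# Hypothetical inputs, not measurements from the talk.
example = achieved_gflops(n=1_000_000, k=100, d=8, iterations=10, seconds=2.0)
print(f"{example:.1f} GFLOPS")
```

Comparing such an estimate with the hardware peak indicates how far a kernel is from being compute-bound.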

21 Q & A
Thanks very much

