Slide 1: Scalable Data Clustering with GPUs. Andrew D. Pangborn, Thesis Defense, Rochester Institute of Technology, Computer Engineering Department, Friday, May 14th, 2010.
Slide 2: Data Clustering
Slide 3: Data Clustering (cont.)
Slide 4: Example
Slide 5: Flow Cytometry
Slide 6: Flow Cytometry (cont.)
Slide 7: Flow Cytometry Data Sets: size of the data and the motivation for GPUs / parallel processing
Slide 8: Parallel Computing
Slide 9: Trend toward multi-core, many-core architectures
Slide 10: GPU Architecture Trends
Slide 11: Tesla GPU Architecture
Slide 12: GPGPU
Slide 13: CUDA Software Stack
Slide 14: CUDA Programming Model
Slide 15: CUDA Kernels: Grids / Blocks / Threads
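A minimal sketch of how a kernel maps the grid / block / thread hierarchy onto data; the kernel name, block size, and the scaling operation are illustrative only, not taken from the thesis:

```cuda
// Illustrative only: a trivial kernel showing how a global thread index is
// derived from the grid / block / thread hierarchy.
__global__ void scaleKernel(float* data, float factor, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (tid < n)                                      // guard the partial last block
        data[tid] *= factor;
}

// Host-side launch: a 1-D grid of 1-D blocks, 256 threads per block.
// dim3 block(256);
// dim3 grid((n + block.x - 1) / block.x);
// scaleKernel<<<grid, block>>>(d_data, 2.0f, n);
```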
Slide 16: CUDA Memory
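For orientation, a small sketch of the memory spaces this slide covers (global, constant, shared, registers); all names and sizes are illustrative and assume a block size of at most 256 threads:

```cuda
__constant__ float d_clusterCenters[64];   // constant memory: cached, read-only in the kernel

__global__ void memorySpacesExample(const float* g_in, float* g_out, int n)
{
    __shared__ float s_tile[256];          // shared memory: on-chip, visible per block
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    float r = (tid < n) ? g_in[tid] : 0.0f;   // global memory load into a register
    s_tile[threadIdx.x] = r;                  // stage the value in shared memory
    __syncthreads();

    if (tid < n)
        g_out[tid] = s_tile[threadIdx.x] + d_clusterCenters[0];
}
```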
Slide 17: CUDA Program Flow
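A hedged sketch of the typical host-side flow (allocate, copy in, launch, copy out, free); error checking and the actual clustering kernels are omitted, and all names are placeholders:

```cuda
#include <cuda_runtime.h>

void runOnGpu(const float* h_in, float* h_out, int n)
{
    float* d_buf;
    size_t bytes = n * sizeof(float);

    cudaMalloc(&d_buf, bytes);                                // 1. allocate device memory
    cudaMemcpy(d_buf, h_in, bytes, cudaMemcpyHostToDevice);   // 2. copy input to the GPU

    dim3 block(256), grid((n + 255) / 256);
    // processKernel<<<grid, block>>>(d_buf, n);              // 3. launch the kernel(s)
    cudaDeviceSynchronize();                                  //    wait for completion

    cudaMemcpy(h_out, d_buf, bytes, cudaMemcpyDeviceToHost);  // 4. copy results back
    cudaFree(d_buf);                                          // 5. release device memory
}
```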
Slide 18: C-means
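For reference, the standard fuzzy c-means formulation (the slide text itself does not reproduce the equations): objective, membership update, and center update for N events x_i, C centers v_j, and fuzziness parameter m > 1.

```latex
J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \, \lVert x_i - v_j \rVert^2
\qquad
u_{ij} = \left( \sum_{k=1}^{C}
          \left( \frac{\lVert x_i - v_j \rVert}{\lVert x_i - v_k \rVert} \right)^{\frac{2}{m-1}}
        \right)^{-1}
\qquad
v_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} \, x_i}{\sum_{i=1}^{N} u_{ij}^{m}}
```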
Slide 19: C-means Parallel Implementation
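A sketch of one natural GPU mapping (not necessarily the thesis's exact kernel): one thread per event computes that event's distance to every center and its fuzzy membership. The data layout, names, and the m = 2 simplification are assumptions.

```cuda
__global__ void cmeansMembershipKernel(const float* events,    // N x D, row-major
                                       const float* centers,   // C x D, row-major
                                       float* memberships,     // N x C
                                       int N, int D, int C)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // event index
    if (i >= N) return;

    float invDistSum = 0.0f;
    for (int j = 0; j < C; ++j) {
        float dist = 1e-30f;                         // avoid division by zero
        for (int d = 0; d < D; ++d) {
            float diff = events[i * D + d] - centers[j * D + d];
            dist += diff * diff;
        }
        memberships[i * C + j] = 1.0f / dist;        // 1 / ||x_i - v_j||^2 (m = 2 case)
        invDistSum += 1.0f / dist;
    }
    for (int j = 0; j < C; ++j)                      // normalize so the row sums to 1
        memberships[i * C + j] /= invDistSum;
}
```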
Slide 21: EM with a Gaussian Mixture Model
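For reference, the standard EM updates for a Gaussian mixture model with weights \pi_j, means \mu_j, and covariances \Sigma_j:

```latex
% E-step: responsibility of component j for event x_i
\gamma_{ij} = \frac{\pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}
                   {\sum_{k=1}^{C} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}

% M-step: re-estimate the parameters from the responsibilities, with N_j = \sum_i \gamma_{ij}
\pi_j = \frac{N_j}{N}, \qquad
\mu_j = \frac{1}{N_j} \sum_{i=1}^{N} \gamma_{ij} \, x_i, \qquad
\Sigma_j = \frac{1}{N_j} \sum_{i=1}^{N} \gamma_{ij} \,(x_i - \mu_j)(x_i - \mu_j)^{\mathsf{T}}
```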
Slide 22: EM Parallel Implementation
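A sketch of one piece of a parallel E-step, assuming a preceding kernel has already filled likelihoods[i*C + j] with \pi_j N(x_i | \mu_j, \Sigma_j); one thread per event normalizes its row into responsibilities. The kernel split and names are assumptions, not the thesis's exact code.

```cuda
__global__ void estepNormalizeKernel(float* likelihoods,   // N x C, overwritten with gamma_ij
                                     int N, int C)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;         // event index
    if (i >= N) return;

    float sum = 0.0f;
    for (int j = 0; j < C; ++j)
        sum += likelihoods[i * C + j];
    for (int j = 0; j < C; ++j)
        likelihoods[i * C + j] /= sum;                     // gamma_ij = row-normalized likelihood
}
```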
Slide 24: Performance Tuning: Global Memory Coalescing (compute capability 1.0/1.1 vs. 1.2/1.3 devices)
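The device distinction matters because compute capability 1.0/1.1 hardware only coalesces aligned, in-order accesses within a half-warp, while 1.2/1.3 hardware coalesces any accesses that fall within the same memory segment. An illustrative contrast of the two access patterns; the event-matrix layout and names are assumptions:

```cuda
// Coalesced: consecutive threads read consecutive addresses, so a warp's loads
// combine into a few memory transactions (events stored dimension-major, N per dimension).
__global__ void coalescedRead(const float* events, float* out, int N, int D, int d)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N)
        out[i] = events[d * N + i];     // thread i reads base + i
}

// Strided: consecutive threads read addresses D floats apart (events stored event-major);
// on compute capability 1.0/1.1 devices each load in the warp becomes its own transaction.
__global__ void stridedRead(const float* events, float* out, int N, int D, int d)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N)
        out[i] = events[i * D + d];     // stride of D floats between neighboring threads
}
```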
Slide 25: Performance Tuning: Partition Camping
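Partition camping occurs when the concurrently active blocks all hit the same global-memory partition, serializing their accesses. One common mitigation (borrowed from NVIDIA's matrix-transpose example, not necessarily what the thesis used) is a diagonal remapping of block indices so that neighboring blocks touch different partitions:

```cuda
// Remap (blockIdx.x, blockIdx.y) diagonally; callers use *bx / *by in place of blockIdx.
__device__ void diagonalBlockIndex(int* bx, int* by)
{
    *by = blockIdx.x;
    *bx = (blockIdx.x + blockIdx.y) % gridDim.x;
}
```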
Slide 26: Performance Tuning: CUBLAS
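A hypothetical sketch of offloading one of the dense matrix products to CUBLAS. It uses the current handle-based cublasSgemm interface (the 2010 implementation would have used the legacy API), and the matrix names and dimensions are placeholders:

```cuda
#include <cublas_v2.h>

// Compute C = A * B on the device, where A is m x k, B is k x n, C is m x n.
// CUBLAS assumes column-major storage.
void gemmExample(const float* dA, const float* dB, float* dC, int m, int n, int k)
{
    cublasHandle_t handle;
    cublasCreate(&handle);

    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                m, n, k,
                &alpha, dA, m, dB, k, &beta, dC, m);

    cublasDestroy(handle);
}
```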
Slide 27: Multi-GPU Strategy: 3-tier parallel hierarchy (MPI, OpenMP, CUDA)
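A hedged sketch of the three-tier hierarchy named on this slide: MPI processes across nodes, one OpenMP thread per GPU within a node, and CUDA kernels on each GPU. This shows the structure only, not the thesis's actual host code:

```cuda
#include <mpi.h>
#include <omp.h>
#include <cuda_runtime.h>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   // tier 1: one MPI process per node
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    int ngpus = 0;
    cudaGetDeviceCount(&ngpus);

    #pragma omp parallel num_threads(ngpus) // tier 2: one host thread per GPU
    {
        int gpu = omp_get_thread_num();
        cudaSetDevice(gpu);                 // tier 3: CUDA kernels on this GPU
        // ... copy this thread's share of the events and launch the clustering kernels ...
    }

    MPI_Finalize();
    return 0;
}
```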
Slide 28: Multi-GPU Strategy: MapReduce-style data distribution and reduction
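A minimal sketch of the "reduce" half, assuming each node accumulates partial per-cluster sums from its slice of the events and the partials are combined across nodes; the use of MPI_Allreduce (rather than MPI_Reduce plus a broadcast) and the names are assumptions:

```c
#include <mpi.h>

// Combine each node's partial sums into global sums visible on every node,
// so all nodes can update the cluster parameters identically.
void combinePartials(const float* partialSums, float* globalSums, int count)
{
    MPI_Allreduce(partialSums, globalSums, count, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
}
```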
Slide 29: Multi-GPU Implementation: very little impact on the GPU kernel implementations themselves, only on their inputs and grid dimensions; the host-code changes are discussed.
Slide 30: Data Distribution: asynchronous MPI sends from the host instead of each node reading the input file from the data store
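A hedged sketch of that distribution step: rank 0 reads the input once and posts non-blocking sends of each remote node's slice of the events. Tags, chunk sizes, and names are assumptions:

```c
#include <mpi.h>

// Called on rank 0 only; each remote rank posts a matching MPI_Recv / MPI_Irecv.
void distributeEvents(const float* allEvents, int eventsPerNode, int dim, int nranks)
{
    MPI_Request reqs[64];                        // assumes nranks <= 65 for this sketch
    for (int r = 1; r < nranks; ++r) {
        MPI_Isend(allEvents + (size_t)r * eventsPerNode * dim,
                  eventsPerNode * dim, MPI_FLOAT,
                  r, /*tag=*/0, MPI_COMM_WORLD, &reqs[r - 1]);
    }
    MPI_Waitall(nranks - 1, reqs, MPI_STATUSES_IGNORE);
}
```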
Slide 31: Results – Kernels: speedup figures
Slide 32: Results – Kernels: speedup figures
Slide 33: Results – Overhead: time breakdown for I/O, GPU memcpy, etc.
Slide 34: Multi-GPU Results: Amdahl's Law vs. Gustafson's Law, i.e. strong vs. weak scaling, fixed problem size vs. fixed time, true speedup vs. scaled speedup
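For reference, the two laws behind this comparison, with serial fraction s and P processors:

```latex
% Amdahl's Law (fixed problem size, strong scaling): the achievable true speedup
S_{\mathrm{Amdahl}}(P) = \frac{1}{s + \dfrac{1 - s}{P}}

% Gustafson's Law (fixed time, weak scaling): the scaled speedup when the
% parallel portion of the work grows with P
S_{\mathrm{Gustafson}}(P) = s + (1 - s)\,P
```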
Slide 35: Fixed Problem Size Analysis
Slide 36: Time-Constrained Analysis
Slide 37: Conclusions
Slide 39: Future Work
Slide 40: Questions?
Slide 41: References