Clusters of Computational Accelerators

1 Clusters of Computational Accelerators
Jan Prins UNC-Chapel Hill

2 Topics
- Similarity of accelerator architectures
- Proof-of-concept kernels for high performance applications
- New application areas for accelerator architectures

3 Accelerator architectures
- Existing commodity accelerators
  - Sony/Toshiba/IBM Cell BE
  - Nvidia G80 GPU, compute-unified device architecture (CUDA)
  - ATI R600 (almost)
- Related developments
  - Intel demonstrates 80-core TFlop chip
  - multicore projects
- March of progress
  - next generation GPUs
  - Roadrunner to be based on 2nd gen Cell

4 Cell BE and Nvidia G80
[Figures: Cell BE; GeForce 8800 GTX]
- Similarities: 8 cores, local store, vectors/SIMD (4 vs 16), high speed device memory
- Differences: Cell has an integrated PPC and the EIB; G80 has local caching and extensive multithreading

5 Programming for the memory hierarchy
[Figure: simple uniprocessor memory hierarchy (UMH) and simple parallel memory hierarchy (PMH); levels labeled ALU, Regs, Cache, Local Memory, Global address space]
Figure legend: length = latency, thickness = bandwidth, aspect ratio = size of transfers

6 Accelerator memory hierarchy
[Figure: Device Memory, feeding Local Stores, feeding Vector elements]
- Parallelism
  - Vector / SIMD
  - multithreading
  - multiprocessing

7 Programming accelerators
- Package inherent parallelism available in problem
  - to provide the concurrency and parallel slack needed at every level of PMH
  - serialize where needed to reach appropriate level of reuse
- Programming models
  - explicit notion of locality
  - CUDA
  - UPC
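
The "package parallelism, serialize for reuse" idea above can be sketched with a tiled matrix multiply. This is an illustration only and is not taken from the talk: the function name `matmul_blocked` and the `tile` parameter are hypothetical, and plain Python lists stand in for device memory and local store.

```python
def matmul_blocked(a, b, n, tile):
    """Multiply two n x n matrices (lists of lists), one tile at a time."""
    c = [[0.0] * n for _ in range(n)]
    # Each (i0, j0) output tile is independent of every other output tile,
    # so these two loops are the work a parallel level of the hierarchy
    # (cores, multiprocessors) could share out.
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            # The k0 loop is serialized on purpose: only one tile of a and
            # one tile of b are "close" at a time, and every loaded element
            # is reused up to `tile` times before the next tile is fetched.
            for k0 in range(0, n, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for k in range(k0, min(k0 + tile, n)):
                        aik = a[i][k]
                        # Innermost loop over j is the vector / SIMD-like level.
                        for j in range(j0, min(j0 + tile, n)):
                            c[i][j] += aik * b[k][j]
    return c
```

In this sketch a larger `tile` means more reuse per element fetched but fewer independent output tiles, which is the trade between reuse and parallel slack that the slide describes.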

8 Clusters of Accelerators
Scale        PMH                   Peak Perf  Cost
Rack         Global Address Space  20TF       $250K
Node         Local                 400GF      $4K
CPU          L2/L3
core         L1
Accelerator  Device                200GF      $1K
SIMD         Vector
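
The peak performance and cost figures on this slide can be turned into cost per peak GFLOP/s with a few lines of arithmetic. This is a sketch under assumptions: it takes 1 TF = 1000 GF, it uses only the three levels for which the slide gives both numbers, and the names `LEVELS`, `dollars_per_gflop` and `units_for_peak` are made up for the illustration.

```python
# (scale, peak GFLOP/s, cost in dollars), copied from the slide.
LEVELS = [
    ("Rack", 20_000, 250_000),      # 20 TF, assuming 1 TF = 1000 GF
    ("Node", 400, 4_000),
    ("Accelerator", 200, 1_000),
]


def dollars_per_gflop(levels):
    """Cost divided by peak performance for each scale."""
    return {name: cost / gflops for name, gflops, cost in levels}


def units_for_peak(parent_gflops, child_gflops):
    """How many child units add up to the parent's peak, ignoring overheads."""
    return parent_gflops / child_gflops


if __name__ == "__main__":
    for name, value in dollars_per_gflop(LEVELS).items():
        print(f"{name}: ${value:.2f} per peak GFLOP/s")
    print("nodes per rack by peak:", units_for_peak(20_000, 400))
    print("accelerators per node by peak:", units_for_peak(400, 200))
```

On these numbers the cost per peak GFLOP/s is $12.50 at rack scale, $10 per node and $5 per accelerator. The peaks are consistent with 50 nodes per rack and two accelerators per node, although the slide does not state those counts.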

9 Proof of concept kernels
- Demonstrating performance of accelerator clusters
  - challenge is towards the bottom of the parallel memory hierarchy
  - proof-of-concept kernels can establish viability and scaling
- Example
  - n-body kernels demonstrated to achieve strong performance on Cell and G80
- Consequence
  - Folding@home clients developed for PlayStation and for PCs with high-end ATI GPUs
  - Full GROMACS acceleration on Cell, NAMD acceleration on G80 underway
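
For readers unfamiliar with the n-body kernels mentioned above, here is a minimal direct-sum sketch. It is not the Cell or G80 code the slide refers to: it assumes gravitational interaction with G = 1 and a softening term, and the function name `accelerations` and its parameters are hypothetical.

```python
import math


def accelerations(pos, mass, softening=1e-3):
    """All-pairs gravitational accelerations (G = 1) for bodies in 3D."""
    n = len(pos)
    acc = []
    eps2 = softening * softening
    for i in range(n):            # independent per body: the parallel loop
        xi, yi, zi = pos[i]
        ax = ay = az = 0.0
        for j in range(n):        # each pos[j] is read once per body i
            if i == j:
                continue
            dx = pos[j][0] - xi
            dy = pos[j][1] - yi
            dz = pos[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            s = mass[j] * inv_r3
            ax += s * dx
            ay += s * dy
            az += s * dz
        acc.append([ax, ay, az])
    return acc
```

The kernel does O(n^2) arithmetic on O(n) data, so each position fetched into a local store can be reused many times. That high ratio of computation to memory traffic is a common reason n-body is picked as a proof-of-concept kernel for accelerators.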

10 New application domains
- Database and data mining operations
- Stream mining

11 Stream mining applications
- Sampling
- Aggregation
- Summarization
- Clustering
  - dimensionality reduction: PCA, SVD
  - subspace clustering
- Classification
- Anomaly Detection

12 Challenges
- Continuous data flow
- Limited storage space
- Limited communication bandwidth through hierarchy
- Detecting and modeling changes
- Visualization
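
As one concrete illustration of handling a continuous data flow with limited storage, the sketch below uses reservoir sampling (Algorithm R). The talk does not name this technique; it is chosen here as a standard example of stream sampling, and the function name `reservoir_sample` and its parameters are hypothetical.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    if rng is None:
        rng = random.Random()
    reservoir = []
    for count, item in enumerate(stream, start=1):
        if count <= k:
            # The first k items fill the reservoir directly.
            reservoir.append(item)
        else:
            # The new item replaces a resident with probability k / count.
            slot = rng.randrange(count)
            if slot < k:
                reservoir[slot] = item
    return reservoir
```

Storage stays fixed at k items however long the stream runs, and each item is examined once as it arrives, which matches the first two constraints on the slide.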

13 Conclusions
- Techniques to effectively exploit accelerator clusters are relatively independent of the particular choice of accelerator
- Application demonstrations can follow a spiral development model focusing on implementation of key kernels
- Data mining and stream mining are important application areas that may be well served by accelerator architectures

