Published by Sandra Young. Modified over 9 years ago.
1
Unified Parallel C at LBNL/UCB
An Evaluation of Current High-Performance Networks
Christian Bell, Dan Bonachea, Yannick Cote, Jason Duell, Paul Hargrove, Parry Husbands, Costin Iancu, Michael Welcome, Kathy Yelick
Lawrence Berkeley National Lab & U.C. Berkeley
http://upc.lbl.gov
2
Motivation
Benchmark a variety of current high-speed networks
- Measure latency and software overhead, not just bandwidth
- Does one-sided communication provide advantages over two-sided MPI?
Global Address Space (GAS) languages
- UPC, Titanium (Java), Co-Array Fortran
- Small message performance (8 bytes)
- Support sparse/irregular/adaptive programs
- Programming model: incremental optimization
- Overlapping messages can hide the latency
3
Systems Evaluated

System         | Network     | Bus (per sec)      | 1-sided hardware | APIs
Cray T3E       | Custom      | 330 MB             |                  | SHMEM, E-registers
IBM SP         | SP switch 2 | GXX bus (2 GB)     |                  | LAPI
HP AlphaServer | Quadrics    | PCI 64/66 (532 MB) |                  | SHMEM
IBM Netfinity  | Myrinet     | PCI 32/66 (266 MB) |                  | GM
PC cluster     | GigE        | PCI 64/66 (532 MB) |                  | VIPL
4
Modified LogGP Model
LogGP: no overlap
[Diagram: P0 and P1 timelines showing o_send, L, and o_recv in sequence]
Observed: overheads can overlap, so L can be negative
[Diagram: P0 and P1 timelines showing o_send and o_recv overlapping]
- EEL: end-to-end latency (instead of transport latency L)
- g: minimum time between small message sends
- G: additional gap per byte for larger messages
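
The relationship between these parameters can be sketched in a few lines of Python. This is an illustration under assumptions, not code from the presentation: the function names and sample numbers are hypothetical, and the gap for an n-byte message is modeled as g + n*G, which is a simplification of LogGP.

```python
def implied_transport_latency(eel_us, o_send_us, o_recv_us):
    """Latency left after subtracting both software overheads from EEL.

    In plain LogGP, EEL = o_send + L + o_recv, so L cannot be negative.
    When the overheads overlap, this difference can come out negative,
    which is the motivation for reporting EEL instead of L.
    """
    return eel_us - o_send_us - o_recv_us


def message_gap(n_bytes, g_us, G_us_per_byte):
    """Assumed gap model: fixed per-message cost g plus per-byte cost G."""
    return g_us + n_bytes * G_us_per_byte


# Hypothetical timings in microseconds, chosen only to show the sign of L.
print(implied_transport_latency(10.0, 3.0, 4.0))  # no overlap: 3.0
print(implied_transport_latency(6.0, 3.0, 4.0))   # overlapping overheads: -1.0
```

A negative result in the second call does not mean the network delivers data backwards in time; it only means the measured overheads were partly concurrent, so they cannot simply be summed.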
5
Microbenchmarks
1) Ping-pong test: measures EEL (end-to-end latency)
2) Flood test: measures gap (g/G)
3) CPU overlap test: measures software overheads
[Diagrams: Flood Test (P0: o_send, gap); CPU Test 1 (P0: o_send, gap, cpu); CPU Test 2 (P0: o_send, gap, cpu)]
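
As a rough sketch of how raw timings from the three tests might be reduced to model parameters, the helpers below do the arithmetic only. They are assumptions for illustration: the function names are hypothetical, the inputs are made-up timings in microseconds, and the overhead rule (gap minus the largest CPU work that fits without lengthening the gap) is one plausible reading of the CPU overlap diagrams, not a method stated on the slide.

```python
def eel_from_ping_pong(total_us, iterations):
    """Each iteration is one round trip, i.e. two one-way messages."""
    return total_us / (2 * iterations)


def gap_from_flood(total_us, n_messages):
    """Average time between back-to-back sends in a flood of messages."""
    return total_us / n_messages


def overhead_from_cpu_test(gap_us, max_hidden_cpu_us):
    """Assumed rule: the part of the gap that cannot be filled with
    computation is attributed to software overhead."""
    return gap_us - max_hidden_cpu_us


# Hypothetical measurements.
print(eel_from_ping_pong(2000.0, 100))   # 10.0
print(gap_from_flood(5000.0, 1000))      # 5.0
print(overhead_from_cpu_test(5.0, 3.5))  # 1.5
```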
6
Latencies for 8-byte ‘puts’
7
8-byte ‘put’ Latencies with Software Overheads
8
Gap varies with message clustering
Clustering messages can both use idle cycles and reduce the number of idle cycles that need to be filled
9
Potential for CPU overlap during clustered message sends
Hardware support for 1-way communication provides more opportunity for computational overlap
10
Fixed message cost (g) vs. per-byte cost (G)
11
“Large” Messages
Factor of 6 between the minimum sizes needed for a “large” message (large = bandwidth dominates fixed message cost)
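
One way to make the “large” threshold concrete is to take it as the smallest size n at which the per-byte cost n*G reaches the fixed cost g. That formalization, the function name, and the two sets of parameters below are assumptions for illustration; they are not measurements from the presentation.

```python
import math


def large_message_threshold(g_us, G_us_per_byte):
    """Smallest message size (bytes) where n * G >= g under the assumed model."""
    return math.ceil(g_us / G_us_per_byte)


# Hypothetical parameters for two imaginary networks.
net_a = large_message_threshold(8.0, 1 / 128)   # 1024 bytes
net_b = large_message_threshold(12.0, 1 / 512)  # 6144 bytes
print(net_a, net_b, net_b / net_a)
```

Under this reading, a network with a lower per-byte cost needs larger messages before bandwidth dominates, even when its fixed cost is only modestly higher.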
12
Small message performance over time
Software send overhead for 8-byte messages over time.
Not improving much over time (even in absolute terms).
13
Conclusion
Latency and software overhead of messages vary widely among today’s HPC networks
- Affects the ability to effectively mask communication latency, with a large effect on GAS language viability
  - Especially software overhead; latency can be hidden
These parameters have historically been overlooked in benchmarks and vendor evaluations
- Hopefully this will change
- Recent discussions with vendors are promising
- Incorporation into standard benchmarks would be nice…
http://upc.lbl.gov