1
Berkeley UPC
Kathy Yelick
Christian Bell, Dan Bonachea, Wei Chen, Jason Duell, Paul Hargrove, Parry Husbands, Costin Iancu, Rajesh Nishtala, Mike Welcome
LBNL and U.C. Berkeley
http://upc.lbl.gov
2
Kathy Yelick, Berkeley UPC, http://upc.lbl.gov
Berkeley UPC Compiler Status
  Recent Berkeley UPC release (v2.2)
    Supports the 1.2 language spec
    Supports collectives (tuning ongoing); memory model compliance
    Supports UPC I/O (naïve reference implementation)
  Compiler work
    Optimization phase and improved performance in v2.2
    Work on automated communication overlap, upc_forall, …
  Large effort in quality assurance and robustness
    Test suite: 600+ tests run nightly on 20+ platform configs
    >30,000 UPC compilations and >20,000 UPC test runs per night
    Test suite infrastructure extended to support any UPC compiler
      Now running nightly with GCC/UPC + UPCR
      Has also been used on HP-UPC, Cray UPC
3
Berkeley UPC Collaborations
  GCC UPC on Berkeley UPC Runtime
    Use for cluster (GASNet) implementations
    Now works with pthread runtime
  Source-level debugging with Totalview 7.x
    Joint project with Etnus
    General framework for source-to-source translators
  Future work:
    Cray XT3 and other Rainier/Adams port
    Possible BlueGene/L port
    XT3 and BG/L both run on MPI conduit
4
Berkeley Applications & Benchmarks
  Some new applications
    FT: 0.45 TFlops on 512 proc Itanium/Quadrics (Elan4)
    CG: 30 GFlops on 512 HP Alpha/Quadrics (Elan3)
    LU: >2 TFlops on 512 proc Itanium/Quadrics (Elan4)
    Barnes-Hut: fine-grained (based on Splash)
    CFG: uses to Chombo
  More on LU
    Towards a sparse direct solver (SuperLU)
    Currently a full (top500-compliant) HPL implementation
    All UPC except for call to the BLAS
5
End of Berkeley Status
6
Data Movement and Synchronization
7
Motivation for Data Movement and Synchronization
  Some operations are (at best) hard/slow in UPC
  Benchmarks highlight these
    FT: communication-limited, all-to-all: want to overlap
    MG: fill in ghost regions
      Remote writes are often faster than remote reads
      But need to synchronize: let the other proc know data is available
        – See Tarek and John Mellor-Crummey’s PPoPP05 paper
        – Signaling store in Split-C
        – Implementation issue: reordering
    LU: remotely enqueue a task
    GUPS and Histogram: remotely increment/XOR a value
      With or without atomicity
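The MG item above (write the data remotely, then let the other processor know it is available, without the signal being reordered ahead of the data) can be illustrated with ordinary shared-memory threads. The following is only a sketch in C11 with pthreads, not UPC and not the Berkeley runtime's implementation; the names mailbox_t, signaling_store, wait_and_sum and run_signaling_demo are hypothetical, and a release/acquire flag stands in for whatever mechanism a real signaling store would use.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define GHOST_LEN 4

/* Hypothetical mailbox: a ghost-region payload plus a "data is available" flag. */
typedef struct {
    double ghost[GHOST_LEN];
    atomic_int ready;
} mailbox_t;

/* Writer side: store the payload, then raise the flag with release ordering,
   so the payload writes cannot be reordered after the signal. */
static void signaling_store(mailbox_t *box, const double *src, size_t n) {
    assert(n <= GHOST_LEN);
    for (size_t i = 0; i < n; i++) {
        box->ghost[i] = src[i];
    }
    atomic_store_explicit(&box->ready, 1, memory_order_release);
}

/* Reader side: spin until the flag is set. The acquire load pairs with the
   release store above, so the payload is visible once the flag is seen. */
static double wait_and_sum(mailbox_t *box, size_t n) {
    while (atomic_load_explicit(&box->ready, memory_order_acquire) == 0) {
        /* busy-wait; a real runtime would poll the network or yield here */
    }
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += box->ghost[i];
    }
    return sum;
}

/* Thread body playing the role of the neighbouring processor. */
static void *writer_thread(void *arg) {
    static const double boundary[GHOST_LEN] = {1.0, 2.0, 3.0, 4.0};
    signaling_store((mailbox_t *)arg, boundary, GHOST_LEN);
    return NULL;
}

/* Runs one writer against one reader and returns the sum the reader saw. */
static double run_signaling_demo(void) {
    mailbox_t box = { .ghost = {0}, .ready = 0 };
    pthread_t writer;
    if (pthread_create(&writer, NULL, writer_thread, &box) != 0) {
        return -1.0;
    }
    double sum = wait_and_sum(&box, GHOST_LEN);
    pthread_join(writer, NULL);
    return sum;
}
```

Calling run_signaling_demo() from a main function should return 10.0, the sum of the four boundary values. If the release store were replaced by a relaxed one, the reader could legally observe the flag before the payload, which is the reordering issue the slide mentions.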
8
Kathy YelickBerkeley UPChttp://upc.lbl.gov Who Would Like to Talk? Non-Blocking Memget/put (Dan) Semaphores (Dan) Semaphore example (Tarek) Remote Atomics (Phil) Floating functions (Jason)
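As background for the remote atomics topic, the GUPS-style "remotely XOR a value" update from the motivation slide can be sketched on a single shared-memory node. This is an illustrative C11 analogue, not UPC and not any proposed UPC extension; table, xor_update, update_thread and run_xor_demo are hypothetical names, and the table size, thread count and pseudo-random stream are arbitrary choices made for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define TABLE_SIZE 64            /* power of two, so masking selects a slot */
#define NUM_THREADS 4            /* even, so identical XOR streams cancel out */
#define UPDATES_PER_THREAD 100000

static _Atomic uint64_t table[TABLE_SIZE];

/* Atomic read-modify-write: no concurrent update to the same slot is lost. */
static void xor_update(uint64_t index, uint64_t value) {
    atomic_fetch_xor_explicit(&table[index & (TABLE_SIZE - 1)], value,
                              memory_order_relaxed);
}

/* Every thread replays the same pseudo-random update stream. */
static void *update_thread(void *arg) {
    (void)arg;
    uint64_t x = 1;
    for (int i = 0; i < UPDATES_PER_THREAD; i++) {
        /* simple linear congruential step, used only to scatter the updates */
        x = x * 6364136223846793005ULL + 1442695040888963407ULL;
        xor_update(x >> 32, x);
    }
    return NULL;
}

/* Returns the number of non-zero slots after all threads finish. Each update
   is applied NUM_THREADS (an even number of) times, and XOR is its own
   inverse, so the count is 0 when every update is applied atomically. */
static int run_xor_demo(void) {
    pthread_t threads[NUM_THREADS];
    for (int i = 0; i < TABLE_SIZE; i++) {
        atomic_store(&table[i], 0);
    }
    for (int t = 0; t < NUM_THREADS; t++) {
        int rc = pthread_create(&threads[t], NULL, update_thread, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < NUM_THREADS; t++) {
        pthread_join(threads[t], NULL);
    }
    int nonzero = 0;
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (atomic_load(&table[i]) != 0) {
            nonzero++;
        }
    }
    return nonzero;
}
```

The "without atomicity" variant would use a plain uint64_t table and table[i] ^= value. In C that is a data race, and concurrent updates to the same slot can be lost, which is the trade-off the slide alludes to.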