Published by Amos Russell. Modified over 9 years ago.
1. Unified Parallel C (UPC) and the Berkeley UPC Compiler
Wei Chen, the Berkeley UPC Group
Qualifying Exam, 3/11/07
2. Parallel Programming (Wei Chen, UPC talk)
Most parallel programs are written using either:
- Message passing with an SPMD model
  - Usually for scientific applications with C++/Fortran
  - Scales easily: user-controlled data layout
  - Hard to use: send/receive matching, message packing/unpacking
- Shared memory with OpenMP/pthreads/Java
  - Usually for non-scientific applications
  - Easier to program: direct reads and writes to shared data
  - Hard to scale: (mostly) limited to SMPs, no concept of locality
PGAS: an alternative hybrid model
3. Partitioned Global Address Space
- The PGAS model uses a global address space abstraction
  - Shared memory is partitioned by processors
  - User-controlled data layout (global pointers and distributed arrays)
- One-sided communication: uses RDMA support for reads/writes of shared variables
  - Much faster than message passing for small/medium-size messages
- Hybrid model works for both SMPs and clusters
- Languages: Titanium, Co-Array Fortran, UPC
[Figure: global address space with a shared region (X[0], X[1], X[P]) and a private region (ptr:)]
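The idea of a global pointer into a partitioned address space can be sketched in plain C. This is only an illustration, not how any UPC implementation represents pointers: the `global_ptr` type, both function names, and the assumption that every thread owns one contiguous partition of `per_thread` elements are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical global pointer: which thread owns the data, and where the
   data sits inside that thread's partition of the shared space. */
typedef struct {
    int    thread;  /* owning thread */
    size_t offset;  /* element offset inside the owner's partition */
} global_ptr;

/* Split a flat global element index into (owner, offset), assuming each of
   `threads` partitions holds `per_thread` contiguous elements. */
global_ptr global_ptr_from_index(size_t index, size_t per_thread, int threads)
{
    assert(per_thread > 0 && threads > 0);
    assert(index < per_thread * (size_t)threads);
    global_ptr p;
    p.thread = (int)(index / per_thread);
    p.offset = index % per_thread;
    return p;
}

/* A reference is local when the calling thread owns the target. */
int global_ptr_is_local(global_ptr p, int my_thread)
{
    return p.thread == my_thread;
}
```

Keeping the owner explicit in the pointer is what lets a program (or a compiler) tell local accesses from remote ones, which is the locality information plain shared memory lacks.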
4. Unified Parallel C
- An SPMD parallel extension of C
- PGAS: adds a shared qualifier to the type system
- Several kinds of shared array distributions
- Fine-grained and bulk communication
- Commercial compilers from Cray/HP/IBM
- Open-source compiler: Berkeley UPC
Vector Addition in UPC:
    #include <upc.h>
    #define N (100*THREADS)
    shared int v1[N], v2[N], sum[N];  // cyclic layout

    int main(void) {
        for (int i = 0; i < N; i++)
            if (MYTHREAD == i % THREADS)  // SPMD: each thread adds the elements it owns
                sum[i] = v1[i] + v2[i];
        return 0;
    }
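The "kinds of shared array distributions" come down to which thread each element has affinity to. For an array declared with a block size, as in `shared [B] int a[N]`, UPC assigns element i to thread (i / B) mod THREADS, and the default block size of 1 gives the cyclic layout used in the vector addition. A minimal plain-C sketch of that rule follows; the function names `owner_of` and `local_index_of` are made up for illustration and are not part of UPC.

```c
#include <assert.h>
#include <stddef.h>

/* Thread with affinity to element i of an array declared with block size
   `block` on `threads` threads. block == 1 is the default cyclic layout. */
int owner_of(size_t i, size_t block, int threads)
{
    assert(block > 0 && threads > 0);
    return (int)((i / block) % (size_t)threads);
}

/* Position of element i inside its owner's local portion of the array. */
size_t local_index_of(size_t i, size_t block, int threads)
{
    assert(block > 0 && threads > 0);
    return (i / (block * (size_t)threads)) * block + i % block;
}
```

With 4 threads and a block size of 2, thread 0 holds elements 0, 1, 8, 9, so element 9 is the fourth element (local index 3) of thread 0's portion.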
5. Overview of the Berkeley UPC Compiler
Two goals: portability and high performance
Layers, top to bottom:
- UPC Code
- Translator: lowers UPC code into ISO C code
- Translator-Generated C Code
- Berkeley UPC Runtime System: shared memory management and pointer operations
- GASNet Communication System: uniform get/put interface for underlying networks
- Network Hardware
[Figure labels marking groups of layers: Platform-independent, Network-independent, Compiler-independent, Language-independent]
6. UPC to C Translator
- Based on Open64
  - Extend with shared type
  - Reuse analysis framework
  - Add UPC-specific optimizations
- Portable translation
  - High-level IR
  - Config file for platform-dependent information
  - Reinclude library headers
  - Convert shared memory operations into runtime calls
[Figure: translation pipeline. Representations: Preprocessed UPC Source, WHIRL with shared types, Optimized WHIRL, WHIRL with runtime calls, ISO C code. Stages: Parsing, Optimizer, Lowering (appears twice), WHIRL2C, Backend C compiler]
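To make "convert shared memory operations into runtime calls" concrete, here is a rough plain-C sketch of what lowering the statement `sum[i] = v1[i] + v2[i];` could look like. Everything in it is an assumption for illustration: `rt_get_int`, `rt_put_int`, `shared_int_array`, and `lowered_add` are invented names, not the Berkeley UPC runtime API, and the "network" is simulated with an in-memory array of per-thread slices.

```c
#include <assert.h>
#include <stddef.h>

enum { SIM_THREADS = 4, PER_THREAD = 8 };

/* Simulated shared array: one local slice per thread. */
typedef struct { int slice[SIM_THREADS][PER_THREAD]; } shared_int_array;

/* Hypothetical runtime entry points. A real runtime would issue a network
   get or put here whenever `thread` is not the calling thread. */
int rt_get_int(const shared_int_array *a, int thread, size_t offset)
{
    assert(thread >= 0 && thread < SIM_THREADS && offset < PER_THREAD);
    return a->slice[thread][offset];
}

void rt_put_int(shared_int_array *a, int thread, size_t offset, int value)
{
    assert(thread >= 0 && thread < SIM_THREADS && offset < PER_THREAD);
    a->slice[thread][offset] = value;
}

/* What a translator might emit for `sum[i] = v1[i] + v2[i];` on
   cyclically distributed arrays: two gets, an add, and one put. */
void lowered_add(shared_int_array *sum, const shared_int_array *v1,
                 const shared_int_array *v2, size_t i)
{
    int    thread = (int)(i % SIM_THREADS);  /* cyclic layout: owner of i */
    size_t offset = i / SIM_THREADS;         /* position in owner's slice */
    int a = rt_get_int(v1, thread, offset);
    int b = rt_get_int(v2, thread, offset);
    rt_put_int(sum, thread, offset, a + b);
}
```

Because the generated code only calls runtime functions, the same translator output can be compiled by any backend C compiler and run over any network the runtime supports.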
7. Optimization framework
- Combination of language/compiler/runtime support
- Transparent to the user
- Performance portable
  - Short-term goal: effective on different cluster networks
  - Long-term goal: code designed for SMPs gets good performance on clusters
Three optimization targets:
- Regular array accesses, e.g. A[i][j][k]: loop framework for message vectorization, strip mining
- Irregular pointer accesses, e.g. p->x->y: PRE framework with split-phase access and coalescing
- Nonblocking bulk communication, e.g. upc_memget(dst, src, size): runtime framework for communication overlap
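Message vectorization can be illustrated with a small plain-C simulation: a loop that reads a remote array element by element sends one message per iteration, while the vectorized form hoists the communication out of the loop as a single bulk transfer, in the spirit of `upc_memget`. The `remote_array` type and all function names below are hypothetical, and "messages" are only counted, not sent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simulated remote memory plus a counter of network messages. */
typedef struct {
    const int *data;
    size_t     len;
    size_t     messages;
} remote_array;

/* Fine-grained access: one message per element. */
int remote_get(remote_array *r, size_t i)
{
    assert(i < r->len);
    r->messages++;
    return r->data[i];
}

/* Bulk access: one message for the whole range. */
void remote_get_bulk(remote_array *r, int *dst, size_t first, size_t count)
{
    assert(first <= r->len && count <= r->len - first);
    r->messages++;
    memcpy(dst, r->data + first, count * sizeof *dst);
}

/* Original loop: n messages for n elements. */
long sum_fine_grained(remote_array *r, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += remote_get(r, i);
    return s;
}

/* Vectorized loop: one bulk get into `buf`, then purely local work. */
long sum_vectorized(remote_array *r, int *buf, size_t n)
{
    long s = 0;
    remote_get_bulk(r, buf, 0, n);
    for (size_t i = 0; i < n; i++)
        s += buf[i];
    return s;
}
```

Both versions compute the same sum; only the message count differs, which is the quantity this optimization targets on networks with a noticeable per-message cost.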
8. Application Performance – LU Decomposition
- UPC performance comparable to MPI/HPL (Linpack) with < ½ the code size
- Uses light-weight multi-threading atop SPMD (latency tolerant)
- Highly adaptable to different problem and machine sizes
9. Application Performance – 3D FFT
- One-sided UPC approach sends more, smaller messages
  - Same total volume of data, but sent earlier and more often
  - Aggressively overlaps the transpose with the 2nd 1-D FFT
- Same approach is less effective in MPI due to higher per-message cost
- Consistently outperforms MPI-based implementations, by as much as 2X
[Figure: MFLOPS / Proc; up is good]
10. Current Status
- Public release v2.4 in November 2006
- Fully compliant with UPC 1.2 specification
- Communication optimizations
- Extensions for performance and programmability
- Support from laptops to supercomputers
  - OS: UNIX (Linux, BSD, AIX, Solaris, etc.), Mac, Cygwin
  - Arch: x86, Itanium, Opteron, Alpha, PPC, SPARC, Cray X1, NEC SX-6, Blue Gene, etc.
  - Network: SMP, Myrinet, Quadrics, Infiniband, IBM LAPI, MPI, Ethernet, SHMEM, etc.
- Give us a try at http://upc.lbl.gov
11. Summary
- UPC designed to be consistent with C
  - Expose memory layout
  - Flexible communication with pointers and arrays
  - Give users more control to achieve high performance
- Berkeley UPC compiler provides an open-source and portable implementation
- Hand-optimized UPC programs match and often beat MPI's performance
- Research goal: productive user + efficient compiler