1. Overview
* Unified Parallel C (UPC) is an extension to ANSI C.
* UPC is a global address space language for parallel programming.
* UPC extends C by providing shared arrays, data affinity to threads, a parallel loop construct, locks, and split-phase barrier synchronization primitives.
* The first UPC compiler was written for the Cray T3E.
* UPC compilers are now available for AlphaServer and SGI platforms.
2. Example UPC Program

shared int a[THREADS];
shared int b;

void main(void) {
    if (MYTHREAD == 0) {
        a[0] = 4;
        a[1] = 2;
    }
    upc_barrier;
    if (MYTHREAD == 1) {
        b = a[0];
    }
}

Memory layout (THREADS == 2): a[0] and b have affinity to thread 0, and a[1] has affinity to thread 1. Initially a[0] = 0, a[1] = 0 and b = 0. After the barrier, thread 0 has written a[0] = 4 and a[1] = 2, and thread 1's read of a[0] leaves b = 4.
3. The Big Picture

UPC code -> EDG UPC-to-C translator -> UPC intermediate code in C -> C compiler -> object code -> linked with the MuPC RTS and an MPI library -> UPC executable code.
4. The Run Time System Interface
The run time system interface is divided into six parts:
* Initialization and finalization.
* Gets and puts to implement one-sided remote references.
* Synchronization functions to implement the UPC built-ins upc_barrier, upc_notify and upc_wait.
* Locks to implement upc_lock, upc_unlock and upc_lockattempt.
* Dynamic memory allocation functions to implement upc_local_alloc, upc_global_alloc and upc_all_alloc.
* String functions to implement upc_memcpy, upc_memget, upc_memset and upc_memput.
5. MuPC
* MuPC is Michigan Technological University's implementation of Compaq's run time system interface.
* MuPC is open source.
* MuPC is available on AlphaServer, Sun Solaris and Linux clusters.
* MuPC is a user-level implementation based on Pthreads and MPI.
6. MuPC Design
* 1 UPC thread = 2 Pthreads = 1 Unix process.
* The user UPC Pthread runs the user's code.
* The send/recv Pthread uses MPI for interprocess communication.
* (Diagram: "mupcrun -n 3 a.out" starts three processes; in each, pthread_create pairs a user UPC Pthread with a send/recv Pthread, and upc_finalize shuts them down.)
7. Ping-Pong Test Performance
Round-trip times:
* Sun Enterprise 4500: MuPC 75 µs, Sun MPI 7 µs
* AlphaServer: MuPC 55 µs, Elan MPI 40 µs
* 2 GHz Intel processors (Gigabit Ethernet): MuPC 63 µs, LAM MPI 37 µs
8. Matrix Multiplication (naïve)

shared [P] int a[N][P];
shared int b[P][M];
shared [M] int c[N][M];

upc_forall (i = 0; i < N; i++; &a[i][0]) {
    for (j = 0; j < M; j++) {
        sum = 0;
        for (k = 0; k < P; k++)
            sum += a[i][k] * b[k][j];
        c[i][j] = sum;
    }
}

(Speedup plot for 1, 2, 4, 8 and 16 threads; 16 x 2 x 2 GHz Intel processors, Gigabit Ethernet; total problem size 128 x 128 integers.)
9. Matrix Multiplication (with prefetching)

int local_a[P];
upc_forall (j = 0; j < M; j++; &b[0][j]) {
    for (i = 0; i < N; i++) {
        upc_memget(local_a, a[i], P * sizeof(int));
        sum = 0;
        for (k = 0; k < P; k++)
            sum += local_a[k] * b[k][j];
        c[i][j] = sum;
    }
}

(Speedup plot for 1, 2, 4, 8 and 16 threads; 16 x 2 x 2 GHz Intel processors, Gigabit Ethernet; total problem size 128 x 128 integers.)
10. Matrix Multiplication (prefetching + local pointer)

int local_a[P];
int *pb;
int stride = M / THREADS;
upc_forall (j = 0; j < M; j++; &b[0][j]) {
    for (i = 0; i < N; i++) {
        pb = (int *)&b[0][j];
        upc_memget(local_a, a[i], P * sizeof(int));
        sum = 0;
        for (k = 0, s = 0; k < P; k++, s += stride)
            sum += local_a[k] * pb[s];
        c[i][j] = sum;
    }
}

(Speedup plot for 1, 2, 4, 8 and 16 threads; 16 x 2 x 2 GHz Intel processors, Gigabit Ethernet; total problem size 128 x 128 integers.)