
1 Overview
*Unified Parallel C (UPC) is an extension to ANSI C.
*UPC is a global address space language for parallel programming.
*UPC extends C by providing shared arrays, data affinity to processors, a parallel loop construct, locks, and split-barrier synchronization primitives.
*The first UPC compiler was written for the Cray T3E.
*UPC compilers are now available for AlphaServer and SGI platforms.

2 Example UPC Program

    shared int a[THREADS];
    shared int b;

    void main(void) {
        if (MYTHREAD == 0) {
            a[0] = 4;
            a[1] = 2;
        }
        upc_barrier;
        if (MYTHREAD == 1) {
            b = a[0];
        }
    }

Memory layout (Thread 0 and Thread 1, each with a shared and a local region):
*Before: a[0] = 0 (Thread 0), a[1] = 0 (Thread 1), b = 0
*After: a[0] = 4 (Thread 0), a[1] = 2 (Thread 1), b = 4

3 The Big Picture
UPC code -> EDG UPC-to-C translator -> UPC intermediate code in C -> C compiler -> object code -> linked with the MuPC RTS and the MPI library -> UPC executable code

4 The Run Time System Interface
The run time system interface is divided into six parts:
*Initialization and finalization
*Gets and puts to implement one-sided remote references
*Synchronization functions to implement the UPC built-ins barrier, notify, and wait
*Locks to implement upc_lock, upc_unlock, and upc_lockattempt
*Dynamic memory allocation functions to implement upc_local_alloc, upc_global_alloc, and upc_all_alloc
*String functions to implement upc_memcpy, upc_memget, upc_memset, and upc_memput

5 MuPC
*MuPC is Michigan Technological University’s implementation of Compaq’s runtime system interface.
*MuPC is open source.
*MuPC is available on AlphaServer, Sun Solaris, and Linux clusters.
*MuPC is a user-level implementation based on Pthreads and MPI.

6 MuPC Design
*1 UPC thread = 2 Pthreads = 1 Unix process
*The user UPC Pthread runs the user’s code.
*The send/recv Pthread uses MPI for interprocess communication.
[Figure: mupcrun -n 3 a.out launches three processes, each containing a user UPC Pthread and a send/recv Pthread; the diagram is labeled with pthread_create and upc_finalize.]

7 Ping-Pong Test Performance
Time:
*Sun Enterprise 4500: MuPC 75 µs, Sun MPI 7 µs
*AlphaServer: MuPC 55 µs, Elan MPI 40 µs
*2GHz Intel processors (Gigabit ethernet): MuPC 63 µs, LAM MPI 37 µs

8 Matrix Multiplication (naïve)

    shared [P] int a[N][P];
    shared int b[P][M];
    shared [M] int c[N][M];

    upc_forall (i = 0; i < N; i++; &a[i][0]) {
        for (j = 0; j < M; j++) {
            sum = 0;
            for (k = 0; k < P; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
    }

[Figure: performance plot with axis ticks 1, 2, 4, 8, 16. 16x2x2GHz Intel processors, Gigabit ethernet. Total problem size: 128x128 integer.]
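The affinity expression &a[i][0] in the loop above hands iteration i to the thread that owns row i. As a minimal sketch in plain C (not UPC), assuming the usual UPC rule that element x of a shared array with block size B has affinity to thread (x / B) mod THREADS, the helpers below compute that owner. The names owner_thread and row_owner are hypothetical and are not part of UPC or MuPC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of UPC or MuPC: returns the thread with
 * affinity to linear element `index` of a shared array declared with block
 * size `block`, assuming blocks are dealt out round-robin over `threads`. */
size_t owner_thread(size_t index, size_t block, size_t threads)
{
    assert(block > 0 && threads > 0);
    return (index / block) % threads;
}

/* Owner of a[i][0] for `shared [P] int a[N][P]`: the linear index of
 * a[i][0] is i * P, and with block size P each row forms exactly one block. */
size_t row_owner(size_t i, size_t P, size_t threads)
{
    return owner_thread(i * P, P, threads);
}
```

Under these assumptions, with 4 threads and P = 128, row_owner(5, 128, 4) evaluates to 1, so thread 1 would execute iteration i = 5.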

9 Matrix Multiplication (with prefetching)

    int local_a[P];

    upc_forall (j = 0; j < M; j++; &b[0][j]) {
        for (i = 0; i < N; i++) {
            upc_memget(local_a, a[i], P * sizeof(int));
            sum = 0;
            for (k = 0; k < P; k++)
                sum += local_a[k] * b[k][j];
            c[i][j] = sum;
        }
    }

[Figure: performance plot with axis ticks 1, 2, 4, 8, 16. 16x2x2GHz Intel processors, Gigabit ethernet. Total problem size: 128x128 integer.]
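The prefetching idea is to fetch a whole row of a in one bulk transfer instead of reading it element by element. The sketch below is a sequential plain-C stand-in, not the slide's code: memcpy plays the role of upc_memget, the matrices are flat row-major arrays, and the function name matmul_prefetch and its parameters are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Sequential stand-in for the prefetching loop. Row i of `a` (n x p) is
 * copied into the caller-supplied buffer `local_a` (p ints) with one bulk
 * memcpy, which here stands in for upc_memget; the inner loop then reads
 * only the private copy. `b` is p x m and `c` is n x m, all row-major. */
void matmul_prefetch(int n, int p, int m,
                     const int *a, const int *b, int *c, int *local_a)
{
    assert(n >= 0 && p >= 0 && m >= 0);
    for (int j = 0; j < m; j++) {
        for (int i = 0; i < n; i++) {
            memcpy(local_a, &a[i * p], (size_t)p * sizeof(int));
            int sum = 0;
            for (int k = 0; k < p; k++)
                sum += local_a[k] * b[k * m + j];
            c[i * m + j] = sum;
        }
    }
}
```

In the UPC version the benefit comes from replacing P fine-grained remote reads with a single bulk get; this stand-in shows only the data movement pattern, not the communication cost.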

10 Matrix Multiplication (prefetching + local pointer)

    int local_a[P];
    int *pb;
    int stride = M / THREADS;

    upc_forall (j = 0; j < M; j++; &b[0][j]) {
        for (i = 0; i < N; i++) {
            pb = (int *)&b[0][j];
            upc_memget(local_a, a[i], P * sizeof(int));
            sum = 0;
            for (k = 0, s = 0; k < P; k++, s += stride)
                sum += local_a[k] * pb[s];
            c[i][j] = sum;
        }
    }

[Figure: performance plot with axis ticks 1, 2, 4, 8, 16. 16x2x2GHz Intel processors, Gigabit ethernet. Total problem size: 128x128 integer.]
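The stride M/THREADS used with the local pointer can be checked with a small plain-C sketch. It assumes the default block size of 1 for shared int b[P][M], so that element b[k][j] has linear index k*M + j, lives on thread (k*M + j) mod THREADS, and sits at position (k*M + j) / THREADS in that thread's local portion. All function names are hypothetical, and the result holds only when M is a multiple of THREADS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for `shared int b[P][M]` with block size 1. */
size_t cyclic_owner(size_t k, size_t j, size_t M, size_t threads)
{
    assert(threads > 0);
    return (k * M + j) % threads;
}

size_t cyclic_local_offset(size_t k, size_t j, size_t M, size_t threads)
{
    assert(threads > 0);
    return (k * M + j) / threads;
}

/* Returns 1 if every b[k][j], k = 0..P-1, is owned by the same thread as
 * b[0][j] and sits exactly k * (M / threads) elements after it in that
 * thread's local portion; returns 0 otherwise. */
int column_is_strided(size_t j, size_t P, size_t M, size_t threads)
{
    size_t stride = M / threads;
    for (size_t k = 0; k < P; k++) {
        if (cyclic_owner(k, j, M, threads) != cyclic_owner(0, j, M, threads))
            return 0;
        if (cyclic_local_offset(k, j, M, threads)
                != cyclic_local_offset(0, j, M, threads) + k * stride)
            return 0;
    }
    return 1;
}
```

For the 128x128 problem with 4 or 16 threads the check succeeds, which is why walking pb with step M/THREADS visits column j. With M = 6 and 4 threads it fails, so the cast to a local pointer would not be safe in that case.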

