October 17, 2001
MuPC Run Time System for UPC
Steve Seidel, Phil Merkey, Jeevan Savant, Kian Gap Lee
Department of Computer Science, Michigan Technological University
Brian Wibecan, Program PI
Phil Becker, Program Manager
Kevin Harris, Bruce Trull, and Daniel Christians
Compaq UPC Development

UPC
Designed by Carlson et al.
A “light weight” extension of C for parallelism
A shared memory, multithreaded model
Arrays and pointers can be shared
Array distribution is semi-automatic
Remote references are automatically resolved
Parallel constructs include
– forall
– fence and split barrier
Built-ins for
– memory allocation/free
– locks
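As a rough illustration of the forall construct listed above, the following plain C sketch emulates how `upc_forall (i = 0; i < n; i++; i)` divides iterations among threads when the affinity expression is an integer: iteration `i` runs on the thread for which `i % THREADS == MYTHREAD`. The function name `forall_count` and its parameters are hypothetical and are not part of UPC or MuPC.

```c
#include <assert.h>

/* Counts how many iterations of an emulated
 *     upc_forall (i = 0; i < n; i++; i)
 * would run on thread `mythread` out of `threads` threads.
 * With an integer affinity expression, iteration i belongs to
 * the thread where i % threads == mythread.
 * (Hypothetical helper for illustration only.) */
int forall_count(int n, int threads, int mythread)
{
    assert(threads > 0);
    assert(mythread >= 0 && mythread < threads);

    int count = 0;
    for (int i = 0; i < n; i++) {
        if (i % threads == mythread) {
            count++;   /* the loop body would execute here */
        }
    }
    return count;
}
```

For example, with 10 iterations on 3 threads, thread 0 would take iterations 0, 3, 6, and 9, and the other two threads three iterations each.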

Compaq's UPC compiler
UPC to object code
– front end translates UPC source to EDG IL
– lowering phase converts UPC-specifics to standard EDG IL
– middle end converts EDG IL to GEM-compatible IL
– GEM back end converts GEM IL to Alpha object code
Each of the intermediate phases above has some UPC-specific components.
Alternative: “Bail out” after the lowering phase to produce C code that includes calls to a run time system.
Under discussion: EDG front end for UPC

Run Time System Interface
The RTS interface is an evolving set of data objects and methods that captures the semantics of “UPC minus C”.
An RTS “reference implementation” was suggested by Harris.
A publicly available reference implementation will
– promote UPC code base, user base, and platform base
– challenge MPI and OpenMP
– foster RTS evolution
– promote support for UPC tools
MuPC is MTU's run time system for UPC

Run Time System Structure
Run time structures describing shared objects and globals are maintained.
References to nonlocal shared objects are made through get and put.
UPC barriers and fences are passed directly to the RTS.
The same is true of UPC calls to other built-in functions that provide locks and dynamic memory allocation.

Available compiler technology
Proprietary Compaq compiler supports a proprietary RTS.
Reference compiler is not currently available, but...
Compaq will provide a compiler that supports the reference RTS.

MuPC Design Goals
Public availability
Wide platform base
Open source maintained by MTU
User-level implementation
Quick delivery
Efficiency is not a primary goal

Available Platforms
MTU (on site):
– Beowulf cluster (64 nodes)
– Sun Enterprise 4500 (12 processors)
– SGI Origin 2000 (4 processors)
– Sun workstation networks (various)
– Linux workstation networks (various)
– AlphaServer and 2 workstations (provided by Compaq)
Remote:
– AlphaServer SC (Compaq)
– T3E (Cray)

Transport vehicle selection
Candidates
– MPI: no one-sided communication
– MPI-2: incomplete implementations
– Pthreads: no multiprocessor support
– OpenMP: expensive, possibly incompatible
– shmem: limited platform base
– VIA: limited platform base
– ARMCI: limited user base
– TCP/IP: too low-level
Selection criteria
– Portability and availability: MPI, Pthreads, TCP/IP
– Technical shortcomings can be overcome

MPI/Pthreads hybrid transport vehicle
MPI provides process control and interprocessor communication.
Pthreads provides multithreading within each process to handle asynchronous remote accesses.
The following are equivalent in MuPC:
– one MPI process
– one UPC thread (from the user’s point of view)
– one user Pthread + one MPI send/recv Pthread
Thread safety is provided by isolating all MPI calls in the send/recv Pthread.

Startup: upcrun -np 3 upc-demo
[Diagram: upcrun launches three MPI processes. Each process calls MPI_Init and then pthread_create, which gives it a user UPC thread and a send/recv thread. All three processes end in upc_finalize.]

Example: Nonlocal array reference x=a[k];
    // User shared array
    shared int a[10][THREADS];
    // Frontend-generated temporary pointer
    shared int *UPC_RTS_ptr;
    ...
    // UPC source code:
    // x=a[k];
    // Front end computes address,
    // phase and thread of remote reference.
    UPC_RTS_ptr = (vaddr,phase,thread);
    // Call is made to get a[k]
    x = MuPC_get_sync_int(UPC_RTS_ptr);
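The address, phase, and thread computation described on this slide can be sketched in plain C using the standard UPC layout rule: element `i` of an array with block size `B` has affinity to thread `(i / B) % THREADS` and phase `i % B`. The struct `shared_ref` and the function `locate` are hypothetical stand-ins for the slide's `(vaddr,phase,thread)` triple; they do not reflect the actual MuPC or Compaq data structures, and a local element offset is used in place of a real virtual address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the (vaddr, phase, thread) triple. */
typedef struct {
    size_t local_offset; /* element offset inside the owning thread's portion */
    size_t phase;        /* position of the element within its block */
    int    thread;       /* thread that has affinity to the element */
} shared_ref;

/* Decomposes linear element index i of a shared array laid out with
 * block size `block` across `threads` threads. */
shared_ref locate(size_t i, size_t block, int threads)
{
    assert(block > 0);
    assert(threads > 0);

    shared_ref r;
    size_t blk = i / block;                      /* global block number */
    r.phase = i % block;
    r.thread = (int)(blk % (size_t)threads);
    /* Blocks are dealt out round-robin, so this thread already holds
     * blk / threads full blocks before the one containing element i. */
    r.local_offset = (blk / (size_t)threads) * block + r.phase;
    return r;
}
```

With the default block size of 1 and 4 threads, element 7 would land on thread 3 at local offset 1 with phase 0.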

x = MuPC_get_sync_int(UPC_RTS_ptr p);
Pthread lock structs: send_lock, recv_lock
MuPC_get_sync_int (user thread):
    send_lock.type=GET
    send_lock.ptr=p
    wait on recv_lock.done
    x=recv_lock.data
Send/Recv Thr (requesting process):
    while (threads)
      case ...
        GET:   MPI_Send(p,RECV)
        ...
        REPLY: MPI_Recv(y)
               recv_lock.data=y
               recv_lock.done=T
        ...
    end while
Send/Recv Thr (owning process):
    while (threads)
      case ...
        RECV:  MPI_Recv(p)
               MPI_Send(*p,REPLY)
        ...
    end while
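The handshake between the user thread and the send/recv thread can be sketched with Pthreads alone. This is a minimal single-process illustration, not the MuPC implementation: the MPI_Send/MPI_Recv round trip is replaced by a direct memory read, the slide's `send_lock` and `recv_lock` structs are merged into one hypothetical `mailbox` struct, and the names `service_loop`, `get_sync_int`, and `demo_get` are invented for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mailbox merging the roles of send_lock and recv_lock. */
typedef struct {
    pthread_mutex_t mu;
    pthread_cond_t  cv;
    bool request_pending; /* user thread posted a GET (send_lock.type=GET) */
    bool done;            /* reply has arrived (recv_lock.done=T) */
    bool shutdown;        /* asks the service thread to exit */
    const int *ptr;       /* address to fetch (send_lock.ptr=p) */
    int data;             /* fetched value (recv_lock.data=y) */
} mailbox;

/* Plays the send/recv thread: sleeps until a request or shutdown arrives. */
static void *service_loop(void *arg)
{
    mailbox *mb = arg;
    pthread_mutex_lock(&mb->mu);
    for (;;) {
        while (!mb->request_pending && !mb->shutdown)
            pthread_cond_wait(&mb->cv, &mb->mu);
        if (!mb->request_pending)
            break;                       /* shutdown with nothing pending */
        /* The real RTS would do the MPI exchange here; this sketch
         * simply reads the value from local memory. */
        mb->data = *mb->ptr;
        mb->request_pending = false;
        mb->done = true;
        pthread_cond_broadcast(&mb->cv);
    }
    pthread_mutex_unlock(&mb->mu);
    return NULL;
}

/* Plays MuPC_get_sync_int: post the request, then wait for the reply. */
int get_sync_int(mailbox *mb, const int *p)
{
    pthread_mutex_lock(&mb->mu);
    mb->ptr = p;
    mb->done = false;
    mb->request_pending = true;
    pthread_cond_broadcast(&mb->cv);
    while (!mb->done)
        pthread_cond_wait(&mb->cv, &mb->mu);
    int x = mb->data;
    pthread_mutex_unlock(&mb->mu);
    return x;
}

/* Starts a service thread, performs one get, and shuts the thread down. */
int demo_get(int remote_value)
{
    mailbox mb = { .request_pending = false, .done = false,
                   .shutdown = false, .ptr = NULL, .data = 0 };
    pthread_mutex_init(&mb.mu, NULL);
    pthread_cond_init(&mb.cv, NULL);

    pthread_t svc;
    int rc = pthread_create(&svc, NULL, service_loop, &mb);
    assert(rc == 0);
    (void)rc;

    int x = get_sync_int(&mb, &remote_value);

    pthread_mutex_lock(&mb.mu);
    mb.shutdown = true;
    pthread_cond_broadcast(&mb.cv);
    pthread_mutex_unlock(&mb.mu);
    pthread_join(svc, NULL);

    pthread_cond_destroy(&mb.cv);
    pthread_mutex_destroy(&mb.mu);
    return x;
}
```

Because only the service thread touches the transport in this arrangement, the user thread never needs a thread-safe MPI library, which is the point of isolating all MPI calls in the send/recv Pthread.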

Synthetic Testing
Pseudo-code walkthroughs of all MuPC functions
Synthetic test codes are C/MPI programs that call MuPC RTS routines directly.
Shared data is artificially allocated.
    // THREAD 0
    int a[10];
    ...
    // a[12]=42;
    index=12%10;
    thread=12/10;
    MuPC_put_integer(a,index,thread,42);
    ...
    // THREAD 1
    int a[10];
    ...
    // outcome is
    // a[2]=42;
    ...
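The index arithmetic in this synthetic test can be tried out in a single process. The sketch below is an assumption-laden stand-in: each thread's `int a[10]` is modeled as one row of a two-dimensional array, and the hypothetical `sim_put_int` takes the place of `MuPC_put_integer`, whose real signature and behavior are not shown beyond the call on the slide.

```c
#include <assert.h>

#define SIM_THREADS 2   /* number of simulated UPC threads (assumed) */
#define LOCAL_N     10  /* elements of a[] held by each thread */

/* One row per simulated thread; stands in for each thread's int a[10]. */
static int sim_mem[SIM_THREADS][LOCAL_N];

/* Hypothetical stand-in for MuPC_put_integer: stores value into
 * element `index` of the array owned by `thread`. */
void sim_put_int(int index, int thread, int value)
{
    assert(thread >= 0 && thread < SIM_THREADS);
    assert(index >= 0 && index < LOCAL_N);
    sim_mem[thread][index] = value;
}

/* Reads back an element so a test can check the outcome. */
int sim_get_int(int index, int thread)
{
    assert(thread >= 0 && thread < SIM_THREADS);
    assert(index >= 0 && index < LOCAL_N);
    return sim_mem[thread][index];
}

/* Applies the slide's mapping for global index g:
 * index = g % 10, thread = g / 10. */
void sim_put_global(int g, int value)
{
    sim_put_int(g % LOCAL_N, g / LOCAL_N, value);
}
```

Calling `sim_put_global(12, 42)` stores 42 in element 2 of simulated thread 1, matching the outcome stated on the slide.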

Integration Testing
Wrap gets, puts, and notify/wait to conform to the RTS interface.
Integrate MuPC with front end...
– ... data structures and globals
– ... initialization and finalization
Rewrite synthetic tests in UPC and compare to previous results.
Add built-in functions for
– locks
– memory allocation

Full-scale Testing
MTU test kernels
GWU UPC test suite
Contributed UPC codes

Documentation, Delivery, and Distribution
MuPC source
Front-end binaries for targeted platforms
Makefiles, release notes, etc.
Serve these items from MTU MuPC web site
Publish a description of MuPC

Preliminary Work, Summer 2001
RTS header files provided by Compaq
MPI-2 one-sided communication proposed as primary transport vehicle, but current implementations do not meet the full standard
MPI/Pthreads hybrid selected
Studied intermediate output of Compaq's UPC front end
Compaq hardware and software delivered
Single-threaded working environment verified
Accounts on AlphaServer SC also provided

August 20-21, Nashua
Participants:
– Bill Carlson, Brian Wibecan, Kevin Harris, Phil Becker, Daniel Christians, Jim Bovay, Savant, Merkey, Seidel
Discussed RTS definition and UPC features per Wibecan's agenda.
Outcomes:
– MPI/Pthreads hybrid design feasible
– MuPC will include upccc and upcrun MPI wrappers
– Agreed on RTS and UPC feature interpretations
– MuPC efficiency and performance not highest priority
– Written meeting summary submitted to Compaq (Sept. 23, 2001)

Current Work
Recent improvements:
– isolating MPI calls for thread safety
– send/recv threads yield control when there are no pending requests
Skeleton implementations of get/put, barrier, fence, and finalize have been scaled to over 30 nodes on MTU’s Beowulf cluster.

Project Work Plan
Start date: June 28, 2001
This plan is based on the Project Work Items specified in the March 27 RFP from Compaq and on the March 30 MTU Proposal.

Completed Work Items (per MTU proposal)
1(a): Review implementation methodologies
(b): Identify development platforms
(c): Align resources (staff and platforms)
(d): Identify target platforms
(e): Conclusion memo (sent Sept. 23, 2001)
2: Formal Work Plan and Agreement
– (Written version of this document)
4: Initial Design of Run Time System
– Design presented in Nashua on August 20, 2001

Remaining Work Items (with completion dates)
5: Development of remaining primary components (Jan. 1, 2002)
– (d) locks
– (e) complete gets and puts
– (b) memory allocation
– (f) utility functions
3: Test design and documentation (Feb. 1, 2002)
– This testing will be done concurrently with Item 5 above.
– (a) Synthetic testing
– (b) Integration testing
– (c) Full-scale testing

6: Public Interface development (April 1, 2002)
– (a) Makefiles, release notes, installation notes, etc.
– (b) Bundle all necessary software
– (c) Provide MTU-authored test codes and results
– (d) Release advance copies for review and comment
7: System Refinement and Delivery (June 1, 2002)
– (a) Release MuPC to the UPC Developers' Group
– (b) Maintain MuPC website at MTU
– (c) Publish description of MuPC
8: Completion Certification (June 28, 2002)
– (a) Final MuPC release by MTU

MuPC Project Staff
Jeevan Savant, M.S. Graduate Student
– MuPC design and implementation
– (Items 5(b,d,e,f), 6(a,d), and 7(a,c) above)
– Support: 9 months, half-time
Kian Gap (Mark) Lee, M.S. Graduate Student
– MuPC testing and platform integration
– (Items 3(a,b,c), 6(b,c), and 7(b,c) above)
– Support: 9 months, half-time
Phillip Merkey, Research Assistant Professor
Steven Seidel, Associate Professor

Additional MTU UPC projects
Charles Wallace, Assistant Professor
– UPC memory models
Xiaodi (Lisa) Li, M.S. Graduate Student
– Benchmarking MuPC using one or two NAS parallel benchmarks
Yi (Leon) Liang, M.S. Graduate Student
– Pthreads-only MuPC RTS
Yongsheng Huang, M.S. Graduate Student
– UPC memory models, improving MuPC efficiency
Zhang Zhang, Ph.D. Graduate Student
– UPC memory models, improving MuPC efficiency