October 17, 2001. MuPC Run Time System for UPC. Steve Seidel, Phil Merkey, Jeevan Savant, Kian Gap Lee. Department of Computer Science, Michigan Technological University.

October 17, 2001
MuPC Run Time System for UPC

Steve Seidel, Phil Merkey, Jeevan Savant, Kian Gap Lee
Department of Computer Science, Michigan Technological University

Brian Wibecan, Program PI
Phil Becker, Program Manager
Kevin Harris, Bruce Trull, and Daniel Christians
Compaq UPC Development

October 17, 2001
UPC
– Designed by Carlson et al.
– A "lightweight" extension of C for parallelism
– A shared-memory, multithreaded model
– Arrays and pointers can be shared
– Array distribution is semi-automatic
– Remote references are automatically resolved
– Parallel constructs include
  – forall
  – fence and split barrier
– Built-ins for
  – memory allocation/free
  – locks
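A minimal illustrative UPC fragment (not from the slides) showing these features together: a shared array distributed across threads, a upc_forall loop with an affinity expression, and the split barrier.

/* Illustrative UPC sketch: shared data, upc_forall, and the split barrier. */
#include <upc.h>

#define N 100
shared int a[N];   /* default cyclic layout: a[i] has affinity to thread i % THREADS */

int main(void) {
    int i;
    upc_forall (i = 0; i < N; i++; &a[i])   /* body runs on the thread that owns a[i] */
        a[i] = MYTHREAD;
    upc_notify;    /* split barrier: signal arrival...        */
    /* ...independent local work can overlap here... */
    upc_wait;      /* ...then wait for all the other threads  */
    return 0;
}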

October 17, 2001
Compaq's UPC compiler
UPC object code:
– front end translates UPC source to EDG IL
– lowering phase converts UPC-specifics to standard EDG IL
– middle end converts EDG IL to GEM-compatible IL
– GEM back end converts GEM IL to Alpha object code
Each of the intermediate phases above has some UPC-specific components.
Alternative: "bail out" after the lowering phase to produce C code that includes calls to a run time system.
Under discussion: an EDG front end for UPC

October 17, 2001
Run Time System Interface
The RTS interface is an evolving set of data objects and methods that captures the semantics of "UPC minus C".
An RTS "reference implementation" was suggested by Harris.
A publicly available reference implementation will:
– promote the UPC code base, user base, and platform base
– challenge MPI and OpenMP
– foster RTS evolution
– promote support for UPC tools
MuPC is MTU's run time system for UPC.

October 17, 2001
Run Time System Structure
– Run time structures describing shared objects and globals are maintained.
– References to nonlocal shared objects are made through get and put.
– UPC barrier and fence operations are passed directly to the RTS.
– The same is true of UPC calls to the other built-in functions that provide locks and dynamic memory allocation.
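To make "UPC minus C" concrete, here is a hypothetical C sketch of the kind of interface such an RTS exposes. MuPC_get_sync_int appears later in these slides; the struct layout and the other names are illustrative assumptions, not the actual MuPC interface.

/* Hypothetical RTS interface sketch; only MuPC_get_sync_int is named in the slides. */
typedef struct {
    void     *vaddr;    /* address within the owning thread's memory  */
    unsigned  phase;    /* position within the current block          */
    unsigned  thread;   /* UPC thread that owns the object            */
} UPC_RTS_ptr_t;

int  MuPC_get_sync_int(UPC_RTS_ptr_t *p);           /* blocking remote read       */
void MuPC_put_sync_int(UPC_RTS_ptr_t *p, int val);  /* blocking remote write      */
void MuPC_notify(void);                             /* first half, split barrier  */
void MuPC_wait(void);                               /* second half, split barrier */
void MuPC_fence(void);                              /* complete outstanding remote accesses */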

October 17, 2001
Available compiler technology
– The proprietary Compaq compiler supports a proprietary RTS.
– A reference compiler is not currently available, but...
– Compaq will provide a compiler that supports the reference RTS.

October 17, 2001
MuPC Design Goals
– Public availability
– Wide platform base
– Open source maintained by MTU
– User-level implementation
– Quick delivery
– Efficiency is not a primary goal

October 17, 2001
Available Platforms
MTU (on site):
– Beowulf cluster (64 nodes)
– Sun Enterprise 4500 (12 processors)
– SGI Origin 2000 (4 processors)
– Sun workstation networks (various)
– Linux workstation networks (various)
– AlphaServer and 2 workstations (provided by Compaq)
Remote:
– AlphaServer SC (Compaq)
– T3E (Cray)

October 17, 2001
Transport vehicle selection
Candidates:
– MPI: no one-sided communication
– MPI-2: incomplete implementations
– Pthreads: no multiprocessor support
– OpenMP: expensive, possibly incompatible
– shmem: limited platform base
– VIA: limited platform base
– ARMCI: limited user base
– TCP/IP: too low-level
Selection criteria:
– Portability and availability: MPI, Pthreads, TCP/IP
– Technical shortcomings can be overcome

October 17, 2001
MPI/Pthreads hybrid transport vehicle
– MPI provides process control and interprocessor communication.
– Pthreads provides multithreading within each process to handle asynchronous remote accesses.
– The following are equivalent in MuPC:
  – one MPI process
  – one UPC thread (from the user's point of view)
  – one user Pthread + one MPI send/recv Pthread
– Thread safety is provided by isolating all MPI calls in the send/recv Pthread.
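A minimal sketch of this structure, under the assumption that user_upc_main and send_recv_loop stand in for the compiled UPC program and the MuPC request-service loop; only MPI_Init, MPI_Finalize, and the Pthreads calls are real library functions here.

/* Sketch: one MPI process = one user Pthread + one MPI send/recv Pthread.
   All MPI communication after startup happens inside send_recv_loop. */
#include <mpi.h>
#include <pthread.h>

extern void *send_recv_loop(void *arg);  /* services remote gets/puts; owns all MPI calls */
extern void  user_upc_main(void);        /* the compiled UPC program; never calls MPI     */

int main(int argc, char **argv) {
    pthread_t comm;
    MPI_Init(&argc, &argv);                            /* one MPI process per UPC thread */
    pthread_create(&comm, NULL, send_recv_loop, NULL);
    user_upc_main();                                   /* user work runs concurrently    */
    pthread_join(comm, NULL);                          /* send/recv thread exits at finalize */
    MPI_Finalize();
    return 0;
}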

October 17, 2001
upcrun -np 3 upc-demo

Launches three identical MPI processes; in each one:
    MPI_Init
    pthread_create
        user UPC thread
        send/recv thread
    ...
    upc_finalize

October 17, 2001
Example: Nonlocal array reference

x = a[k];

// User shared array
shared int a[10][THREADS];
// Front-end-generated temporary pointer
shared int *UPC_RTS_ptr;
...
// UPC source code:
// x = a[k];
// Front end computes address, phase, and thread of the remote reference.
UPC_RTS_ptr = (vaddr, phase, thread);
// Call is made to get a[k]
x = MuPC_get_sync_int(UPC_RTS_ptr);

October 17, 2001
x = MuPC_get_sync_int(UPC_RTS_ptr p);

Pthread lock structs: send_lock, recv_lock

MuPC_get_sync_int (user thread):
    send_lock.type = GET
    send_lock.ptr = p
    wait on recv_lock.done
    x = recv_lock.data

Send/Recv thread (requesting side):
    while (threads)
        case GET:   MPI_Send(p, RECV)
        case REPLY: MPI_Recv(y)
                    recv_lock.data = y
                    recv_lock.done = T
    end while

Send/Recv thread (owning side):
    while (threads)
        case RECV:  MPI_Recv(p)
                    MPI_Send(*p, REPLY)
    end while
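A hedged C sketch of the user-thread half of this handshake. The field names (type, ptr, data, done) follow the slide; the mutex and condition-variable plumbing is an assumption about how "wait on recv_lock.done" might be realized.

/* Sketch of the user-thread side of MuPC_get_sync_int; synchronization
   details are assumed, field names come from the slide above. */
#include <pthread.h>

enum { GET = 1 };

typedef struct {
    pthread_mutex_t mtx;
    pthread_cond_t  cv;
    int   type;   /* request kind, e.g. GET     */
    void *ptr;    /* remote address descriptor  */
    int   data;   /* value returned by a get    */
    int   done;   /* set when the REPLY arrives */
} mupc_lock_t;

mupc_lock_t send_lock, recv_lock;   /* shared with the send/recv Pthread */

int MuPC_get_sync_int_sketch(void *p) {
    int x;
    pthread_mutex_lock(&send_lock.mtx);     /* post the GET request        */
    send_lock.type = GET;
    send_lock.ptr  = p;
    pthread_cond_signal(&send_lock.cv);     /* wake the send/recv thread   */
    pthread_mutex_unlock(&send_lock.mtx);

    pthread_mutex_lock(&recv_lock.mtx);     /* wait on recv_lock.done      */
    while (!recv_lock.done)
        pthread_cond_wait(&recv_lock.cv, &recv_lock.mtx);
    x = recv_lock.data;                     /* x = recv_lock.data          */
    recv_lock.done = 0;
    pthread_mutex_unlock(&recv_lock.mtx);
    return x;
}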

October 17, 2001
Synthetic Testing
– Pseudo-code walkthroughs of all MuPC functions
– Synthetic test codes are C/MPI programs that call MuPC RTS routines directly.
– Shared data is artificially allocated.

// THREAD 0
int a[10];
...
// a[12] = 42;
index = 12 % 10;   // 2
thread = 12 / 10;  // 1
MuPC_put_integer(a, index, thread, 42);
...

// THREAD 1
int a[10];
...
// outcome is
// a[2] = 42;
...
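A hedged sketch of how such a synthetic test might be wrapped into a runnable C/MPI harness. MuPC_put_integer and its argument order come from the slide; its declared signature, the MPI boilerplate, and the verification step are assumptions.

/* Sketch of a synthetic put test: C/MPI code calling an RTS routine directly. */
#include <mpi.h>
#include <assert.h>

/* Assumed RTS signature, following the call in the slide above. */
extern void MuPC_put_integer(int *base, int index, int thread, int value);

int a[10];   /* artificially allocated "shared" block on every process */

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        MuPC_put_integer(a, 12 % 10, 12 / 10, 42);   /* logical a[12] = 42 */

    /* Stand-in ordering point; in the real RTS the put is completed by the
       remote send/recv thread, so visibility here is an assumption. */
    MPI_Barrier(MPI_COMM_WORLD);

    if (rank == 1)
        assert(a[2] == 42);   /* expected outcome from the slide */

    MPI_Finalize();
    return 0;
}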

October 17, 2001
Integration Testing
– Wrap gets, puts, and notify/wait to conform to the RTS interface.
– Integrate MuPC with the front end:
  – data structures and globals
  – initialization and finalization
– Rewrite synthetic tests in UPC and compare to previous results.
– Add built-in functions for
  – locks
  – memory allocation

October 17, 2001
Full-scale Testing
– MTU test kernels
– GWU UPC test suite
– Contributed UPC codes

October 17, 2001
Documentation, Delivery, and Distribution
– MuPC source
– Front-end binaries for targeted platforms
– Makefiles, release notes, etc.
– Serve these items from the MTU MuPC web site
– Publish a description of MuPC

October 17, 2001
Preliminary Work, Summer 2001
– RTS header files provided by Compaq
– MPI-2 one-sided communication proposed as the primary transport vehicle, but current implementations do not meet the full standard
– MPI/Pthreads hybrid selected
– Studied intermediate output of Compaq's UPC front end
– Compaq hardware and software delivered
– Single-threaded working environment verified
– Accounts on AlphaServer SC also provided

October 17, 2001
August 20-21, Nashua
Participants:
– Bill Carlson, Brian Wibecan, Kevin Harris, Phil Becker, Daniel Christians, Jim Bovay, Savant, Merkey, Seidel
Discussed RTS definition and UPC features per Wibecan's agenda.
Outcomes:
– MPI/Pthreads hybrid design feasible
– MuPC will include upccc and upcrun MPI wrappers
– Agreed on RTS and UPC feature interpretations
– MuPC efficiency and performance not highest priority
– Written meeting summary submitted to Compaq (Sept. 23, 2001)

October 17, 2001
Current Work
Recent improvements:
– isolating MPI calls for thread safety
– send/recv threads yield control when there are no pending requests
Skeleton implementations of get/put, barrier, fence, and finalize have been scaled to over 30 nodes on MTU's Beowulf cluster.

October 17, 2001
Project Work Plan: Start date June 28, 2001
This plan is based on the Project Work Items specified in the March 27 RFP from Compaq and on the March 30 MTU Proposal.

October 17, 2001
Completed Work Items (per MTU proposal)
1(a): Review implementation methodologies
1(b): Identify development platforms
1(c): Align resources (staff and platforms)
1(d): Identify target platforms
1(e): Conclusion memo (sent 9/23/01)
2: Formal Work Plan and Agreement
– (Written version of this document)
4: Initial Design of Run Time System
– Design presented in Nashua on August 20, 2001

October 17, Remaining Work Items (w/completion dates) 5: Development of remaining primary components (Jan. 1, 2002) – (d) locks – (e) complete gets and puts – (b) memory allocation – (f) utility functions 3: Test design and documentation (Feb. 1, 2002) – This testing will be done concurrent with Item 5 above. – (a) Synthetic testing – (b) Integration testing – (c) Full-scale testing

October 17, 2001
6: Public Interface development (April 1, 2002)
– (a) Makefiles, release notes, installation notes, etc.
– (b) Bundle all necessary software
– (c) Provide MTU-authored test codes and results
– (d) Release advance copies for review and comment
7: System Refinement and Delivery (June 1, 2002)
– (a) Release MuPC to the UPC Developers' Group
– (b) Maintain MuPC website at MTU
– (c) Publish description of MuPC
8: Completion Certification (June 28, 2002)
– (a) Final MuPC release by MTU

October 17, 2001
MuPC Project Staff
Jeevan Savant, M.S. Graduate Student
– MuPC design and implementation (Items 5(b,d,e,f), 6(a,d), and 7(a,c) above)
– Support: 9 months, half-time
Kian Gap (Mark) Lee, M.S. Graduate Student
– MuPC testing and platform integration (Items 3(a,b,c), 6(b,c), and 7(b,c) above)
– Support: 9 months, half-time
Phillip Merkey, Research Assistant Professor
Steven Seidel, Associate Professor

October 17, 2001
Additional MTU UPC projects
Charles Wallace, Assistant Professor
– UPC memory models
Xiaodi (Lisa) Li, M.S. Graduate Student
– Benchmarking MuPC using one or two NAS parallel benchmarks
Yi (Leon) Liang, M.S. Graduate Student
– Pthreads-only MuPC RTS
Yongsheng Huang, M.S. Graduate Student
– UPC memory models, improving MuPC efficiency
Zhang Zhang, Ph.D. Graduate Student
– UPC memory models, improving MuPC efficiency