Presentation is loading. Please wait.

Presentation is loading. Please wait.

Department of Computer Science Joshua Goehner and Dorian Arnold University of New Mexico (In collaboration with LLNL and U. of Wisconsin.

Similar presentations


Presentation on theme: "Department of Computer Science Joshua Goehner and Dorian Arnold University of New Mexico (In collaboration with LLNL and U. of Wisconsin."— Presentation transcript:

1 Department of Computer Science Joshua Goehner and Dorian Arnold University of New Mexico (In collaboration with LLNL and U. of Wisconsin

2  Key statistics from Top500 (Nov. 2011) ◦ 149/500 (30%) more than 8,192 core  In 2006, this number was 12/500 (2.4%) ◦ 7 systems: more than 64K cores ◦ 6 systems: between 64K and 128K cores ◦ 3 systems: more than 128K cores  Later this year: LLNL’s Sequoia w/ 1.6M cores  2018?: Exascale systems w/ 10 7 or 10 8 cores A Little Preaching to the Choir Software systems must scale as well!

3  Before bootstrapping: ◦ Program image on storage device ◦ Set of (allocated) computer nodes  After bootstrapping:  Application processes started on computer nodes  Application’s configuration complete  ready for primary operation Software Infrastructure-bootstrapping Given a node allocation, start infrastructure's composite processes and propagate necessary startup information.

4 “I gotta get my application up and running!”

5 (1) “First, I start all the processes on the appropriate nodes”

6 (2) “Next, I must disseminate some initialization information”

7 Bootstrapping is complete when the infrastructure is ready for steady-state usage.

8 Basic Bootstrapping  Sequential instantiation ◦ e.g. rsh or ssh-based  Point-to-point propagation 70 seconds to start 2K processes!

9  Infrastructure-specific, scalable mechanisms ◦ Still limited by sequential operations ◦ Not portable to other infrastructures  Using high-performance resource managers ◦ Myriad interfaces ◦ No communication facilities  Generic bootstrapping infrastructures ◦ LaunchMON: targets tools with wrapper for existing RMs ◦ ScELA: sequentially launch agents that create local processes More Scalable Approaches

10 MRNet Sequential-ish Bootstrapping  Parent creates children  Local  fork ()/ exec ()  Remote  rsh -based mechanism  Integrated instantiation and information propagation  MRNet’s “standard”

11 MRNet XT (hybrid) process launch  Bulk-launch 1 process per node  Process launches collocated processes  Information propagated in similar fashion

12 LIBI Approach  LIBI: Lightweight infrastructure- bootstrapping infrastructure ◦ Name is no longer a work in progress ◦ Generic service for scalable distributed software infrastructure bootstrapping  Process launch  Scalable, low-level collectives Large Scale Distributed Software Job Launchers Communication Services LIBI rsh/ssh SLURM OpenRTE ALPS LaunchMON DebuggersSystem Monitors Applications Performance Analyzers Overlay Networks COBO MPI

13  session: set of processes (to be) deployed ◦ session master manages other members ◦ session front-end interacts with session master ◦ LIBI currently supports only master/member communication  host-distribution: where to create processes ◦  process distribution: how/where to create processes ◦ LIBI API

14 launch(process-distribution-array) ◦ instantiate processes according to input distributions [send|recv]UsrData(session-id, msg) ◦ communicate between front-end and session master broadcast(), scatter(), gather(), barrier() ◦ communicate between session master and members LIBI API (cont’d)

15 front-end( ){ LIBI_fe_init(); LIBI_fe_createSession(sess); proc_dist_req_t pd; pd.sessionHandle = sess; pd.proc_path = get_ExePath(); pd.proc_argv = get_ProgArgs(); pd.hd = get_HostDistribution(); LIBI_fe_launch(pd); //test broadcast and barrier LIBI_fe_sendUsrData(sess1, msg, len ); LIBI_fe_recvUsrData(sess1, msg, len); //test scatter and gather LIBI_fe_sendUsrData(sess1, msg, len ); LIBI_fe_recvUsrData(sess1, msg, len); return 0; } Example LIBI Front-end

16 session_member() { LIBI_init(); //test broadcast and barrier LIBI_recvUsrData(msg, msg_length); LIBI_broadcast(msg, msg_length); LIBI_barrier(); LIBI_sendUsrData(msg, msg_length) //test scatter and gather LIBI_recvUsrData(msg, msg_length); LIBI_scatter(msg, sizeof(rcvmsg), rcvmsg); LIBI_gather(sndmsg, sizeof(sndmsg), msg); LIBI_sendUsrData(msg, msg_length); LIBI_finalize(); } Example LIBI-launched Application

17  LaunchMON-based runtime ◦ Tested SLURM or rsh launching ◦ COBO PMGR service LIBI Implementation Status Large Scale Distributed Software Job Launchers Communication Services LIBI rsh/ssh SLURM OpenRTE ALPS LaunchMON DebuggersSystem Monitors Applications Performance Analyzers Overlay Networks COBO MPI

18 LIBI Early Evaluation

19 LIBI Microbenchmark Results

20

21  MRNet uses LIBI to launch all MRNet processes ◦ Parse topology file and setup/call LIBI_launch()  Session front-end gathers/scatters startup information ◦ Parent listening socket (IP/port) MRNet/LIBI Integration

22  STAT over MRNet over LIBI performance results  Basic, scalable rsh-based mechanism  Hybrid (XT-like) launch approach  Mechanisms to alleviate filesystem contention ◦ Like our scalable binary relocation service  More flexible process and host distributions ◦ Instantiating different images in same session ◦ Integrating allocating and launching  Integrated scalable communication infrastructure Some Future Work


Download ppt "Department of Computer Science Joshua Goehner and Dorian Arnold University of New Mexico (In collaboration with LLNL and U. of Wisconsin."

Similar presentations


Ads by Google