Published by Roxanne Barrett. Modified over 8 years ago.
1
Joshua Goehner and Dorian Arnold
Department of Computer Science, University of New Mexico
(In collaboration with LLNL and U. of Wisconsin)
2
A Little Preaching to the Choir
Key statistics from Top500 (Nov. 2011)
  ◦ 149/500 (30%) have more than 8,192 cores; in 2006, this number was 12/500 (2.4%)
  ◦ 7 systems: more than 64K cores
  ◦ 6 systems: between 64K and 128K cores
  ◦ 3 systems: more than 128K cores
Later this year: LLNL’s Sequoia w/ 1.6M cores
2018?: Exascale systems w/ 10^7 or 10^8 cores
Software systems must scale as well!
3
Software Infrastructure-bootstrapping
Given a node allocation, start the infrastructure's composite processes and propagate the necessary startup information.
Before bootstrapping:
  ◦ Program image on storage device
  ◦ Set of (allocated) computer nodes
After bootstrapping:
  ◦ Application processes started on computer nodes
  ◦ Application’s configuration complete; ready for primary operation
4
“I gotta get my application up and running!”
5
(1) “First, I start all the processes on the appropriate nodes”
6
(2) “Next, I must disseminate some initialization information”
7
Bootstrapping is complete when the infrastructure is ready for steady-state usage.
8
Basic Bootstrapping
Sequential instantiation
  ◦ e.g. rsh or ssh-based
Point-to-point propagation
70 seconds to start 2K processes!
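The 70-second figure can be turned into a rough per-process cost. The following is a minimal sketch, assuming that "2K" means 2048 processes and that sequential launch time grows linearly with the process count; both are assumptions, not statements from the slides, and the function names are hypothetical.

```c
#include <stddef.h>

/* Average cost of one launch under a purely sequential model (assumed). */
double per_process_cost(double total_seconds, size_t nprocs)
{
    return nprocs == 0 ? 0.0 : total_seconds / (double)nprocs;
}

/* Projected sequential launch time, assuming cost grows linearly
 * with the number of processes (an assumption, not a measurement). */
double sequential_launch_time(double cost_per_proc, size_t nprocs)
{
    return cost_per_proc * (double)nprocs;
}
```

With 70 seconds and 2048 processes, `per_process_cost` gives roughly 34 ms per process. Under the linear assumption, doubling the process count doubles the launch time, which is why sequential instantiation does not scale.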
9
More Scalable Approaches
Infrastructure-specific, scalable mechanisms
  ◦ Still limited by sequential operations
  ◦ Not portable to other infrastructures
Using high-performance resource managers
  ◦ Myriad interfaces
  ◦ No communication facilities
Generic bootstrapping infrastructures
  ◦ LaunchMON: targets tools with wrapper for existing RMs
  ◦ ScELA: sequentially launch agents that create local processes
10
MRNet Sequential-ish Bootstrapping (MRNet’s “standard”)
Parent creates children
  ◦ Local: fork()/exec()
  ◦ Remote: rsh-based mechanism
Integrated instantiation and information propagation
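The local half of "parent creates children" can be sketched with plain POSIX calls. This is an illustrative sketch only, not MRNet's code: the function name `launch_local_children` is hypothetical, the child program is looked up on `PATH` and run without arguments, and the remote rsh-based path is not shown.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork `n` children that each exec `prog` (searched on PATH, no arguments).
 * Returns how many children exited with status 0. */
int launch_local_children(const char *prog, int n)
{
    int started = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                      /* fork failed: stop creating children */
        if (pid == 0) {
            execlp(prog, prog, (char *)NULL);
            _exit(127);                 /* reached only if exec failed */
        }
        started++;                      /* parent: one more child created */
    }

    int ok = 0;
    for (int i = 0; i < started; i++) {
        int status;
        /* Reap one child and count it if it exited cleanly. */
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

The parent creates its children one after another in a loop, which is the "sequential-ish" part: each parent is a serial bottleneck for its own children, even though different parents in the tree work in parallel.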
11
MRNet XT (hybrid) process launch
Bulk-launch 1 process per node
Process launches collocated processes
Information propagated in similar fashion
12
LIBI Approach
LIBI: Lightweight infrastructure-bootstrapping infrastructure
  ◦ Name is no longer a work in progress
  ◦ Generic service for scalable distributed software infrastructure bootstrapping: process launch and scalable, low-level collectives
[Figure: LIBI architecture diagram. Large-scale distributed software: debuggers, system monitors, applications, performance analyzers, overlay networks. LIBI. Job launchers: rsh/ssh, SLURM, OpenRTE, ALPS, LaunchMON. Communication services: COBO, MPI.]
13
LIBI API
session: set of processes (to be) deployed
  ◦ session master manages other members
  ◦ session front-end interacts with session master
  ◦ LIBI currently supports only master/member communication
host-distribution: where to create processes
  ◦ process-distribution: how/where to create processes
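One way to picture the host-distribution and process-distribution terms is as plain data structures. The sketch below is purely illustrative: `host_entry_t`, `proc_dist_t`, and `proc_dist_total` are hypothetical names invented here and are not LIBI's actual types.

```c
#include <stddef.h>

/* Hypothetical host-distribution entry: how many processes go on one host. */
typedef struct {
    const char *hostname;
    unsigned    nprocs;
} host_entry_t;

/* Hypothetical process-distribution: what to run (how) and on which
 * hosts (where). */
typedef struct {
    const char         *exe_path;
    const host_entry_t *hosts;
    size_t              nhosts;
} proc_dist_t;

/* Total number of processes a distribution asks for. */
size_t proc_dist_total(const proc_dist_t *pd)
{
    size_t total = 0;
    for (size_t i = 0; i < pd->nhosts; i++)
        total += pd->hosts[i].nprocs;
    return total;
}
```

A session would then be described by one or more such distributions, with one of the created processes acting as session master.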
14
LIBI API (cont’d)
launch(process-distribution-array)
  ◦ instantiate processes according to input distributions
[send|recv]UsrData(session-id, msg)
  ◦ communicate between front-end and session master
broadcast(), scatter(), gather(), barrier()
  ◦ communicate between session master and members
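The data movement behind broadcast, scatter, and gather can be sketched in a single process with in-memory buffers standing in for session members. This only illustrates the semantics of the collectives; it is not LIBI's implementation, and the `sim_*` names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Broadcast: every member receives the same `len` bytes from the master. */
void sim_broadcast(const char *src, size_t len, size_t nmembers, char **member_bufs)
{
    for (size_t i = 0; i < nmembers; i++)
        memcpy(member_bufs[i], src, len);
}

/* Scatter: member i receives bytes [i*chunk, (i+1)*chunk) of the master's buffer. */
void sim_scatter(const char *src, size_t chunk, size_t nmembers, char **member_bufs)
{
    for (size_t i = 0; i < nmembers; i++)
        memcpy(member_bufs[i], src + i * chunk, chunk);
}

/* Gather: the master's buffer receives member i's chunk at offset i*chunk. */
void sim_gather(char *const *member_bufs, size_t chunk, size_t nmembers, char *dst)
{
    for (size_t i = 0; i < nmembers; i++)
        memcpy(dst + i * chunk, member_bufs[i], chunk);
}
```

Scatter followed by gather with the same chunk size returns the master's original buffer, which makes the pair a convenient round-trip check.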
15
Example LIBI Front-end
front_end() {
    LIBI_fe_init();
    LIBI_fe_createSession(sess);

    // describe what to launch and where
    proc_dist_req_t pd;
    pd.sessionHandle = sess;
    pd.proc_path = get_ExePath();
    pd.proc_argv = get_ProgArgs();
    pd.hd = get_HostDistribution();
    LIBI_fe_launch(pd);

    // test broadcast and barrier
    LIBI_fe_sendUsrData(sess, msg, len);
    LIBI_fe_recvUsrData(sess, msg, len);

    // test scatter and gather
    LIBI_fe_sendUsrData(sess, msg, len);
    LIBI_fe_recvUsrData(sess, msg, len);

    return 0;
}
16
Example LIBI-launched Application
session_member() {
    LIBI_init();

    // test broadcast and barrier
    LIBI_recvUsrData(msg, msg_length);
    LIBI_broadcast(msg, msg_length);
    LIBI_barrier();
    LIBI_sendUsrData(msg, msg_length);

    // test scatter and gather
    LIBI_recvUsrData(msg, msg_length);
    LIBI_scatter(msg, sizeof(rcvmsg), rcvmsg);
    LIBI_gather(sndmsg, sizeof(sndmsg), msg);
    LIBI_sendUsrData(msg, msg_length);

    LIBI_finalize();
}
17
LIBI Implementation Status
LaunchMON-based runtime
  ◦ Tested SLURM or rsh launching
  ◦ COBO PMGR service
[Figure: LIBI architecture diagram. Large-scale distributed software: debuggers, system monitors, applications, performance analyzers, overlay networks. LIBI. Job launchers: rsh/ssh, SLURM, OpenRTE, ALPS, LaunchMON. Communication services: COBO, MPI.]
18
LIBI Early Evaluation
19
LIBI Microbenchmark Results
21
MRNet/LIBI Integration
MRNet uses LIBI to launch all MRNet processes
  ◦ Parse topology file and setup/call LIBI_launch()
Session front-end gathers/scatters startup information
  ◦ Parent listening socket (IP/port)
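The gather/scatter of startup information can be sketched as follows: each process reports its listening endpoint, and each process is then sent the endpoint of its parent. This sketch assumes a balanced k-ary tree with ranks assigned breadth-first, which is a simplification chosen here (MRNet reads its topology from a file); `endpoint_t`, `tree_parent`, and `build_parent_endpoints` are hypothetical names.

```c
#include <stddef.h>

/* Parent rank in a balanced k-ary tree rooted at rank 0 (assumed layout).
 * The root is reported as its own parent. `fanout` must be at least 1. */
size_t tree_parent(size_t rank, size_t fanout)
{
    return rank == 0 ? 0 : (rank - 1) / fanout;
}

/* Hypothetical endpoint record: where a process listens for its children. */
typedef struct {
    char           ip[16];   /* dotted-quad IPv4 text */
    unsigned short port;
} endpoint_t;

/* Build the scatter payload from the gathered endpoints:
 * slot i holds the endpoint of rank i's parent. */
void build_parent_endpoints(const endpoint_t *gathered, size_t nprocs,
                            size_t fanout, endpoint_t *to_scatter)
{
    for (size_t i = 0; i < nprocs; i++)
        to_scatter[i] = gathered[tree_parent(i, fanout)];
}
```

In this picture, one gather collects every process's IP/port, and one scatter hands each process the address it should connect to, so no process needs to discover its parent on its own.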
22
Some Future Work
STAT over MRNet over LIBI performance results
Basic, scalable rsh-based mechanism
Hybrid (XT-like) launch approach
Mechanisms to alleviate filesystem contention
  ◦ Like our scalable binary relocation service
More flexible process and host distributions
  ◦ Instantiating different images in same session
  ◦ Integrating allocating and launching
Integrated scalable communication infrastructure