A View from the Top End of Year 1 Al Geist October 10-11 Houston TX.


1 A View from the Top: End of Year 1. Al Geist, October 10-11, Houston TX.

2 Main web site: www.scidac.org/ScalableSystems. Coordinator: Al Geist. Participating organizations: ORNL, ANL, LBNL, PNNL, PSC, SDSC, IBM, SNL, LANL, Ames, NCSA, Cray, Intel, Unlimited Scale.

3 Scalable Systems Software Center, June 13-14, Houston TX: Review of Last Meeting. Details are in the main project notebook.

4 Progress reports at the June meeting. Al Geist: working groups, notebooks, telecoms. Working group leaders: what areas their working group is addressing; progress report on what their group has done; present problems being addressed; next steps for the group; discussion items for the larger group to consider. Demonstrations of prototype components: one big intra-component demo. Slides can be found in the main notebook, page 22.

5 Consensus and Voting. Event Manager proposal: after much discussion, the proposal was revised to say that event management is an important feature of our software suite, independent of whether it lives in a central component or inside components, and that the proposed tuple API is an initial starting point. Passed strawvote 13 for / 0 against / 0 abstain. Proposal to adopt HTTP POST (byte count) as the standard: passed strawvote 10 for / 0 against / 1 abstain. Proposal to adopt the W3C standard for XML Signature Syntax and Processing: long discussion; decided more discussion is needed before a vote. The Bugzilla site is now up and running; the link is on the ScalableSystems home page.
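The strawvote adopted HTTP POST with a byte count as the standard message framing. The Python sketch below shows one way to implement that idea. It assumes the byte count travels in a standard `Content-Length` header and that the body is XML. The path, host, and content type are illustrative assumptions, not the project's agreed wire format.

```python
def frame_post(body: str, host: str = "localhost", path: str = "/") -> bytes:
    """Wrap an XML body in an HTTP POST whose Content-Length gives the byte count.

    ASSUMPTION: host, path and Content-Type are illustrative placeholders.
    """
    payload = body.encode("utf-8")
    header = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: text/xml; charset=utf-8\r\n"
        f"Content-Length: {len(payload)}\r\n"  # byte count, not character count
        "\r\n"
    )
    return header.encode("ascii") + payload


def unframe_post(data: bytes) -> bytes:
    """Return exactly Content-Length bytes of body, ignoring anything after them."""
    head, sep, rest = data.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete HTTP header")
    length = None
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length")
    if len(rest) < length:
        raise ValueError("incomplete body")
    return rest[:length]


if __name__ == "__main__":
    msg = frame_post("<Request/>")
    print(unframe_post(msg + b"next message starts here"))
```

Counting bytes rather than characters is what lets a receiver split a stream into messages, even when the XML contains multi-byte UTF-8 characters.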

6 Scalable Systems Software Center, June-October: Progress Since Last Meeting

7 Five project notebooks filling up: a main notebook for general information and individual notebooks for each working group. Over 200 total pages, 34 added since the last meeting. A lot of new material in the Resource Management notebook (way to go). Get to all notebooks through the main web site www.scidac.org/ScalableSystems: click on the sidebar or on “project notebooks” at the bottom of the page.

8 Four bi-weekly working group telecoms. Less talk, more work. Resource management, scheduling, and accounting: Tuesday 3:00 pm (Eastern), 1-800-664-0771, keyword “SSS mtg”. Validation and testing: Wednesday 1:00 pm (Eastern), 1-877-540-9892, mtg code 999157. Process management, system monitoring, and checkpointing: Thursday 1:00 pm (Eastern), 1-877-252-5250, mtg code 160910. Node build, configuration, and information service: Friday 3:00 pm (Eastern), 1-888-469-1934, mtg code 58145 (changes).

9 Scalable Systems Integrated Component Demonstration (done June 2002). Components: Job Submission Client, Queue Manager, Allocation Manager, Node Monitor, Local Scheduler, Process Manager, Discovery Service. Color key by working group: Resource Management and Accounting; Process Management and Monitoring; Node Configuration and Build; Infrastructure. Message sequence: 0 Service-Lookup, 1 Submit-Job, 2 Query-Job, 3 Query-Node, 4 Create-Reservation, 5 Run-Job, 6 Exec-Process, 7 Query-Job, 8 Delete-Job, 9 Withdraw-Allocation.
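The demo's numbered messages can be replayed as an ordered sequence. The Python model below is hypothetical. The step numbers and message names come from the slide, but the receiving component for each step is our guess from the message name. The slide does not state it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    number: int
    message: str
    receiver: str  # ASSUMPTION: inferred from the message name, not stated on the slide


DEMO_STEPS = [
    Step(0, "Service-Lookup", "Discovery Service"),
    Step(1, "Submit-Job", "Queue Manager"),
    Step(2, "Query-Job", "Queue Manager"),
    Step(3, "Query-Node", "Node Monitor"),
    Step(4, "Create-Reservation", "Allocation Manager"),
    Step(5, "Run-Job", "Queue Manager"),
    Step(6, "Exec-Process", "Process Manager"),
    Step(7, "Query-Job", "Queue Manager"),
    Step(8, "Delete-Job", "Queue Manager"),
    Step(9, "Withdraw-Allocation", "Allocation Manager"),
]


def replay(steps: List[Step]) -> List[str]:
    """Check that steps run 0, 1, 2, ... in order and describe each message."""
    lines = []
    for expected, step in enumerate(steps):
        if step.number != expected:
            raise ValueError(f"step {step.number} out of order; expected {expected}")
        lines.append(f"{step.number}: {step.message} -> {step.receiver}")
    return lines


if __name__ == "__main__":
    for line in replay(DEMO_STEPS):
        print(line)
```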

10 Authentication & Communication: R. Lusk; Meta Scheduler: D. Jackson; Meta Manager: S. Scott; Accounting: S. Jackson; Scheduler: D. Jackson; System/Job Monitors: M. Showerman; Package Services: J. Mugler; Information Services: JP Navarro; Allocation Management: S. Jackson; Queue Manager: B. Bode; Job Manager: B. Bode; Checkpoint / Restart: P. Hargrove; Process Manager: R. Lusk; Service Directory: N. Desai; Node Manager: T. Naughton; C-Plant XML interface: E. Debenedictis. Groups shown: Resource Mgmt Working Group, Build & Configure Working Group, Process Mgmt Working Group. SSSlib: used by all components.

11 Scalable Systems Software Center, October 10-11, 2002: This Meeting

12 SciDAC Booth

13 SciDAC Systems Poster

14 SciDAC Booth

15 SciDAC Systems Poster (2)

16 Agenda – October 10. 8:00 Breakfast. 8:30 Al Geist: project status; getting ready for SC 2002. 9:00 External project review in February (start planning). Working group reports: 9:30 Scott Jackson: Resource Management. 10:30 Break. 11:00 Erik Debenedictis: Validation and Testing. 12:00 Lunch (on your own, but go somewhere as a group). 1:00 Paul Hargrove: Process Management. 2:00 Narayan Desai: Node Build, Configure. 3:00 Break. 3:30 SC demos and hacking on the big multi-component demo. 5:00 Open discussion. 5:30 Adjourn. Working groups may wish to get together in the evening.

17 Agenda – October 11. 8:00 Breakfast. 8:30 Discussion, proposals, strawvotes (THANKS to the Airport Security Meeting for open access to their internet connection!): ssslib; meatball GUI (who?); Chiba City for SC demos (Nov 4?); cross-group issues; test packaging? 10:30 Break. 11:00 Al Geist: summary; SC booth, demos, theater, software, handout (Brett); February review: reviewers, advisor, talks; next meeting date: the day before the review. 12:00 Meeting ends.

18 External SciDAC Review Meeting. Late February 2003; it may spill over into early March. An 18-month checkup by MICS. Each SciDAC project is reviewed separately, and Scalable Systems is the only item on the agenda. A full two days of detailed presentations, so many of us will have to give presentations. External review panel (different for each ISIC): we can suggest names, but they can't be from our organizations or affiliated with them. They will have been given our proposal beforehand.

19 External SciDAC Review Metrics. I asked Fred and McGraw about metrics: 1. How have we helped SciDAC apps? Can we show use in CCS, NERSC, and others? 2. Put an advisory panel into place, made up of applications and computer center personnel. I've asked Drake (Climate), Mezzacappa (Astro), Bland (CCS), and Nichols (Chemistry); we need a NERSC representative, and others? 3. Show short-term successes and use.

20 External Review Panel Suggestions. External review panel (different for each ISIC). We can suggest names; who? Barney Maccabe, Russ Miller, Bart Miller, Jose M (IBM), someone from Cray, someone from Etnus (John Delsignore), someone from Unlimited Scale?, Walt Ligon, Andrew Lumsdaine, Jim Garlick, Steve Chapin.

21 Meeting Notes. Scott Jackson: RM progress. Scope: queue manager, job manager, scheduler, allocation, and meta-scheduler. A demo meta-scheduling across CCS, NERSC, and Chiba would be good. Scheduler: enhance internal scalability to 64K nodes; add support for the HTTP framing protocol; QBank security enhanced; interface to PBS, LSF, and LL for suspend/resume and requeue management. Queue Manager: conforms to the SSSRMAP XML spec; full wire-protocol compatibility; new interface to the Event Manager. Allocation Manager: survey of 15 sites for requirements; implemented HTTP framing and SHA1-HMAC security; working with QBank/Maui; reframed bank objects (accounts, users, allocations) as dynamic objects, with object actions defined in a metadata cache; creation of a dynamic web GUI using PHP and JavaScript. Meta scheduler: interoperates with the Grid (Globus); fault tolerance through global job ID tracking and scheduler reconnection; improved user interface. Current issues: job state management, data staging, job signaling, job steps.
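The Allocation Manager notes mention SHA1-HMAC message security. The sketch below uses Python's standard `hmac` and `hashlib` modules. It assumes both sides share a secret key and exchange the digest Base64-encoded. The key handling and encoding are assumptions, not taken from the SSSRMAP specification.

```python
import base64
import hashlib
import hmac


def sign(message: bytes, key: bytes) -> str:
    """Compute HMAC-SHA1 over the message and Base64-encode the 20-byte digest.

    ASSUMPTION: Base64 transport encoding and a pre-shared key.
    """
    digest = hmac.new(key, message, hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def verify(message: bytes, key: bytes, signature: str) -> bool:
    """Recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign(message, key), signature)


if __name__ == "__main__":
    sig = sign(b"<Request action='Query'/>", b"shared-secret")
    print(sig, verify(b"<Request action='Query'/>", b"shared-secret", sig))
```

Using `hmac.compare_digest` instead of `==` avoids leaking, through timing, how many leading characters of a forged signature were correct.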

22 Meeting Notes. Scott Jackson: RM progress (cont.). Next work: prepare for SC demos and scalability testing; the BIG thing is releasing v1.0 of the RM system. Documentation, security authentication, and extending the suspend/resume schema beyond what PBS and LL do today. Discussion of the need for a scalability testbed. Erik Debenedictis: validation progress. Create machine-independent tests for testing supercomputers. Infrastructure: QMTest. Tests: from all sources. Value: an improved method to execute the “SSS Standard Test body”. Recent activity: QMTest on the SNL SciDAC cluster; test package definition. Will McClendon: test architecture (diagram in slides). QMTest is a scriptable test driver written in Python, with an HTTP-based interface (Zope). Running at SNL and PSC. Requires an exact match on STDOUT/STDERR.
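The exact-match rule on STDOUT/STDERR can be illustrated in plain Python. This is not QMTest code or its API. It is a hypothetical comparator that runs a command with `subprocess.run` and reports PASS only when both streams match exactly. Checking the exit status as well is an added assumption.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Expected:
    stdout: str
    stderr: str = ""
    returncode: int = 0  # ASSUMPTION: exit status is checked too


def run_exact_match(cmd: List[str], expected: Expected) -> Tuple[str, List[str]]:
    """Run cmd and compare its output byte-for-byte (as text) with expected."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    failures = []
    if proc.stdout != expected.stdout:
        failures.append(f"stdout mismatch: {proc.stdout!r} != {expected.stdout!r}")
    if proc.stderr != expected.stderr:
        failures.append(f"stderr mismatch: {proc.stderr!r} != {expected.stderr!r}")
    if proc.returncode != expected.returncode:
        failures.append(f"exit status {proc.returncode} != {expected.returncode}")
    return ("PASS" if not failures else "FAIL", failures)


if __name__ == "__main__":
    print(run_exact_match([sys.executable, "-c", "print('hello')"], Expected("hello\n")))
```

Exact matching is simple but brittle: a single trailing space or a timestamp in the output turns a working test into a failure. This fits the note that raw results need interpretation before deciding pass or fail.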

23 Meeting Notes. Will McClendon: test architecture (cont.). QMTest screenshot and discussion of how tests are done. Raw results need to be interpreted to determine pass or fail. Mike ???: goes over the “package” details: how to add a test package to the suite; package file layout; Make-like. Will present as a proposal tomorrow. Paul Hargrove: PM group. Progress: prototyping and development continue; how to interface to something we can't imagine; validating the schema for the process manager; node monitor schema created. Checkpoint Manager types: serial checkpoints (independent but potentially multithreaded), done; parallel checkpoints (MPI). Scalable systems XML interfaces.

24 Meeting Notes. Rusty Lusk: process manager (see the diagram in his slides). MPD1 (C) overview: added capabilities required by the PM working group. MPD is one prototype for the SSS Process Manager. MPD2 (Python): diagram in slides for the new design; Python is about 5X slower in this untuned version. Mike Showerman: system monitoring component. Craig Steffen is full time on this project, plus a student. Using the new XML schema defined by … Need to write a graphical display that uses this new XML interface. Run a small cluster in the NCSA booth with the SSS software stack. Discussion: how about an animated meatball diagram? Paul returns: data migration meatball removed. Next steps: interfaces (checkpoint, PM, monitors) continue to stabilize. Monitoring data: details need defining.

25 Meeting Notes. Narayan Desai: build and configure update. Components: service directory (solid, and on Chiba now); event manager completely rewritten; stable XML; SSSlib robust (bindings for C++, Java, Python, Perl; wire protocol modules: basic, challenge, http, http-rm). Build and config management (third try at the abstraction); cluster HW build system (OSCAR module for this one in the works); node state manager. Issues: abstraction problems with the second try; multiple implementations are important to validate the abstraction. DEMOS.
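The service directory lets components find each other, as in step 0 (Service-Lookup) of the June demo. The sketch below is a hypothetical in-memory directory. It answers XML register and lookup requests using Python's `xml.etree.ElementTree`. The element and attribute names (`register`, `lookup`, `component`, `host`, `port`, `protocol`) are invented for illustration and are not the SSS schema.

```python
import xml.etree.ElementTree as ET


class ServiceDirectory:
    """Hypothetical in-memory service directory (illustration only, not SSS code)."""

    def __init__(self) -> None:
        # component name -> {"host": ..., "port": ..., "protocol": ...}
        self._entries: dict = {}

    def handle(self, request_xml: str) -> str:
        """Parse one XML request and return an XML response string."""
        req = ET.fromstring(request_xml)
        name = req.get("component")
        if name is None:
            resp = ET.Element("error", reason="missing component")
        elif req.tag == "register":
            host, port = req.get("host"), req.get("port")
            if host is None or port is None:
                resp = ET.Element("error", component=name, reason="missing host or port")
            else:
                self._entries[name] = {
                    "host": host,
                    "port": port,
                    "protocol": req.get("protocol", "http"),  # ASSUMPTION: default
                }
                resp = ET.Element("ok", component=name)
        elif req.tag == "lookup":
            entry = self._entries.get(name)
            if entry is None:
                resp = ET.Element("error", component=name, reason="not found")
            else:
                resp = ET.Element("location", component=name, **entry)
        else:
            resp = ET.Element("error", reason=f"unknown request: {req.tag}")
        return ET.tostring(resp, encoding="unicode")


if __name__ == "__main__":
    sd = ServiceDirectory()
    print(sd.handle('<register component="queue-manager" host="node1" port="5150"/>'))
    print(sd.handle('<lookup component="queue-manager"/>'))
```

Keeping the directory's only interface as XML-in and XML-out mirrors the idea that every component, including the directory, speaks the same wire protocol.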

26 Components: Meta Scheduler, Meta Monitor, Meta Manager, Accounting, Scheduler, Node Configuration & Build Manager, User DB, Allocation Management, Job Queue Manager, Process Manager, Usage Reports, User Utilities, High Performance Communication & I/O, File System, Application Environment, Meta Services, Testing & Validation, System & Job Monitor, Event Manager, Service Directory, Checkpoint / Restart. Legend: blue text uses ssslib; red text talks the ssslib protocol. Refined picture on the next slide.

27 Components: Accounting, File System, Event Manager, Service Directory, Meta Scheduler, Meta Monitor, Meta Manager, Scheduler, User DB, Allocation Management, Process Manager, Usage Reports, User Utilities, High Performance Communication & I/O, Application Environment, Meta Services, System & Job Monitor, Checkpoint / Restart, Grid Interfaces, Job Queue Manager, Node Configuration & Build Manager. Label on diagram: “These interface to all.”
