A View from the Top Preparing for Review Al Geist February 24-25 Chicago, IL.

1 A View from the Top Preparing for Review Al Geist February 24-25 Chicago, IL

2 Main Web Site: www.scidac.org/ScalableSystems
Coordinator: Al Geist
Participating Organizations: ORNL, ANL, LBNL, PNNL, PSC, SDSC, IBM, SNL, LANL, Ames, NCSA, Cray, Intel, Unlimited Scale

3 Scalable Systems Software
Participating Organizations: ORNL, ANL, LBNL, PNNL, NCSA, PSC, SDSC, SNL, LANL, Ames, IBM, Cray, Intel, Unlimited Scale
Problem: Computer centers use an incompatible, ad hoc set of systems tools. Present tools are not designed to scale to multi-Teraflop systems.
Goals: Collectively (with industry) define standard interfaces between systems components for interoperability. Create scalable, standardized management tools for efficiently running our large computing centers.
Impact: Reduced facility mgmt costs. More effective use of machines by scientific applications.
Resource Management, Accounting & user mgmt, System Build & Configure, Job management, System Monitoring
To learn more visit www.scidac.org/ScalableSystems

4 Progress so far on Integrated Suite
Working Components and Interfaces (bold)
[Component diagram: Grid Interfaces, Meta Services, Meta Scheduler, Meta Monitor, Meta Manager, Accounting, Event Manager, Service Directory, Scheduler, Node State Manager, Allocation Management, Process Manager, Usage Reports, System & Job Monitor, Job Queue Manager, Node Configuration & Build Manager, Checkpoint / Restart, Validation & Testing, Hardware Infrastructure Manager; Standard XML interfaces, authentication, communication]
Components written in any mixture of C, C++, Java, Perl, and Python can be integrated into the Scalable Systems Software Suite.
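As a rough illustration of why a standard XML interface lets components written in different languages interoperate, the sketch below serializes a request to XML and parses it back. It is only a sketch: the element and attribute names (`request`, `component`, `action`) and the helper functions are hypothetical and are not taken from the project's actual schemas.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def build_request(component: str, action: str, **fields: str) -> bytes:
    """Serialize a request as XML bytes (hypothetical element names)."""
    root = ET.Element("request", {"component": component, "action": action})
    for name, value in fields.items():
        child = ET.SubElement(root, name)
        child.text = value
    return ET.tostring(root, encoding="utf-8")


def parse_request(payload: bytes) -> Dict[str, object]:
    """Decode the XML produced by build_request into a plain dict."""
    root = ET.fromstring(payload)
    return {
        "component": root.get("component"),
        "action": root.get("action"),
        "fields": {child.tag: child.text for child in root},
    }


# The sender and the receiver only need to agree on the XML, not on a language.
wire = build_request("queue-manager", "submit", user="alice", nodes="4")
decoded = parse_request(wire)
```

Any component that can emit and parse the same XML, in C, Perl, or Java, could take either side of this exchange.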

5 Scalable Systems Software Center October 10-11 Houston TX Review of Last Meeting Details in Main project notebook

6 Progress Reports at Oct. mtg
Al Geist – preparation for Supercomputing 2002, booth space, posters, demos
Working Group Leaders –
  What areas their working group is addressing
  Progress report on what their group has done
  Present problems being addressed
  Next steps for the group
  Discussion items for the larger group to consider
Demonstrations of Prototype Components
Prep for SC demo
Slides can be found in Main Notebook page 29

7 Consensus and Voting:

8 [Component diagram: Accounting, File System, Event Manager, Service Directory, Meta Scheduler, Meta Monitor, Meta Manager, Scheduler, User DB, Allocation Management, Process Manager, Usage Reports, User Utilities, High Performance Communication & I/O, Application Environment, Meta Services, System & Job Monitor, Checkpoint / Restart, Grid Interfaces, Job Queue Manager, Node Configuration & Build Manager. Diagram note: "These Interface To all"]

9 Scalable Systems Software Center November-February Progress Since Last Meeting

10 SciDAC Booth

11 SC2002 Systems Posters

12 Five Project Notebooks filling up
A main notebook for general information, and individual notebooks for each working group
Over 216 total pages – 20 added since last meeting
A lot of XML schema to comment on
New subscription feature
Get to all notebooks through main web site www.scidac.org/ScalableSystems
Click on the side bar or on “project notebooks” at bottom of page

13 Weekly Working Group Telecoms
Resource management, scheduling, and accounting: Tuesday 3:00 pm (Eastern), 1-800-664-0771, keyword “SSS mtg”
Validation and Testing (hasn’t met since last year): Wednesday 1:00 pm (Eastern), 1-877-540-9892, mtg code 999157
Process management, system monitoring, and checkpointing: Thursday 1:00 pm (Eastern), 1-877-252-5250, mtg code 160910
Node build, configuration, and information service: Thursday 3:00 pm (Eastern), 1-888-469-1934, mtg code (changes)

14 Scalable Systems Software Center February 24-25, 2003 This Meeting

15 Agenda – February 24
8:30 Al Geist – Project Status. SciDAC PI mtg and External Project review
9:00 Matt Sottile – Science Appliance Project
Working Group Reports
9:30 Scott Jackson – Resource Management
10:30 Break
11:00 Erik Debenedictis – Validation and Testing
12:00 Lunch (on own - walk to cafeteria)
1:00 Paul Hargrove – Process Management
2:00 Narayan Desai – Node Build, Configure
3:00 Break
3:30 Large Scale Run on Chiba debugging components
5:00 Open Discussion of Review report
5:30 Adjourn
Working groups may wish to hack in evening

16 Agenda – February 25
8:30 Discussion, proposals, straw votes
  Write paper on each component
  Draft report in main notebook
  Comments on “restricted interface” XML shown by Rusty
  External review demo – can we?
10:30 Break
11:00 Al Geist – Summary
  PI mtg talk and poster
  External review agenda
  Next meeting date: June 5&6 at Argonne
  Thank our hosts, ANL
12:00 Meeting ends

17 SciDAC PI mtg – all 50 projects
March 10-11, 2003 – Napa, California
Attending for Scalable Systems – Al Geist, Brett Bode
20 minute talk – presented by Al
Scalable Systems, CCA, PERC, SDM
Poster Presentation

18 External SciDAC Review mtg
March 12-13, 2003 – Napa, California
Attending for Scalable Systems – Al Geist, Brett Bode, Paul Hargrove, Narayan Desai, Mike Showerman. (Rusty)
Four ISIC Projects are reviewed separately – Scalable Systems, CCA, PERC, SDM
External review panel (8 members): Bob Lucas, Jim McGraw, Jose Munoz, Lauren Smith, Richard Mount, Ricky Kendall, Rod Oldehoeft, and Tony Mezzacappa [John Grosh?]
We owe them a Review report
Day 1 – Each project gets 1 ¾ hours to present
Day 2 – Each project gets grilled by panel for 1½ hrs

19 External Review mtg Agenda
Wednesday, March 12
7:45 Welcome, charge to reviewers
8:15 Plenary session for Common Component Architecture ISIC
10:00 Break
10:15 Plenary session for Scalable Systems Software ISIC
12:00 Reviewer caucus
12:15 Lunch
1:15 Plenary session for Scientific Data Management ISIC
3:00 Break
3:15 Plenary session for Performance Engineering ISIC
5:00 Reviewer caucus
5:30 Adjourn

20 External Review mtg Agenda
Thursday, March 13
8:00 Meetings between reviewers and ISIC members
  A. Common Component Architecture
  B. Scalable Systems Software
9:45 Break
10:00 Meetings between reviewers and ISIC members
  C. Scientific Data Management
  D. Performance Engineering
11:45 Reviewer Caucus/End of ISIC Reviews
12:15 Lunch (on your own)
1:15 Programming Models Review Session I
3:00 Break
3:15 Programming Models Review Session II
5:00 Programming Models Reviewer Caucus
5:30 Meeting adjourns

21 Meeting Notes
Matt - Pink: a 1024 node science appliance.
Provide pseudo SSI that scales to 1024. Tolerates failure. Single point for management.
Reduce boot and install time by x100. Reduce number of FTP per number of nodes.
Science Appliance – very little in common with older Linux.
Software is called Clustermatic – linuxBIOS, Bproc, V9fs, supermon, Panasas or Lustre (parallel file system by someone else), Beoboot, asymmetric SSI, private name spaces from Plan 9, BJS (Bproc Job Scheduler)
Other work – ZPL (automatic checkpoint), debuggers (parallel, relative debugging – Guard), port TotalView, latency tolerant applications
Users – SNL/CA, U Penn, Clemson
What are overlap opportunities? Each piece can be separated out. Supermon, Bproc
Remy will be sending more material on collaboration soon

22 Meeting Notes
Scott - RM update.
Diagram of architecture and infrastructure services
SC02 demo: what components working. They used polling. Now moving to event driven components
Release of initial RM suite – from website http://sss.scl.ameslab.gov/software/
  OpenPBS-sss 2.3.15-1
  Maui scheduler 3.2.6
  Qbank 2.10.4 (accounting system)
SSSRMAP protocol using HTTP validated
Scalability testing performed on all components
Scheduler progress
Queue Manager progress
Accounting and Allocation Manager progress (Qbank and Gold prototype)
Meta-scheduler progress – Globus interface, Gold Information service.
Next work
  Release 2 of RM interface
  Implement and test SSSRMAP security authentication (XML digital sigs)
  Discuss need to have SSS wrappers on initial RM suite
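The move from polling to event-driven components noted in the RM update can be pictured with a minimal in-process publish/subscribe sketch. This is an assumption-laden toy, not the suite's Event Manager: the class name, the event names, and the payload fields are all hypothetical, and the real components communicate over the network rather than through direct function calls.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class EventManager:
    """Toy event manager: components register handlers instead of polling."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = {}

    def subscribe(self, event_type: str, handler: Handler) -> None:
        # Remember which handler wants to hear about which event type.
        self._subscribers.setdefault(event_type, []).append(handler)

    def publish(self, event_type: str, payload: dict) -> int:
        # Push the event to every subscriber; return how many were notified.
        handlers = self._subscribers.get(event_type, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


# A hypothetical scheduler reacts to node failures as they are published,
# rather than repeatedly asking a monitor whether anything has changed.
seen: List[dict] = []
events = EventManager()
events.subscribe("node_down", seen.append)
delivered = events.publish("node_down", {"node": "n17"})
ignored = events.publish("job_end", {"job": 42})
```

With polling, every consumer pays a query cost on every interval whether or not anything changed; here work happens only when an event is published.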

23 Meeting Notes
Will - Validation and Testing update
Users expect a high degree of quality in today’s HPC.
Strategies
  QMTest – RM group using it (www.codesourcery.com). They like it: “easy”
  App test packages
  APITEST – growing out of October discussion
    C++ driven, XML schema, scriptable test of network components
    blackbox testing. TCP, ssslib, portals support, fault injection
    whitebox testing. Try to exercise all paths in a known suite
    v0.1a underway, 75% done
  Discussion of how this could be useful to Scalable Systems
Cluster Integration Toolkit (CIT) – James Laros jhlaros@sandia.gov
  management tasks on Cplant – scalable to 1800 nodes
  done in Perl
  Creating a Scalable Systems interface to CIT would be a good test of the implementation and of the flexibility of the standard.
  USI, IBM, and Linux Networx looking at it.

24 Meeting Notes
Paul – Process management report. Moving beyond prototypes of:
Checkpoint manager
  beta-code April release, awaiting legal OK
  will do scalability test today
  working on XML interface for checkpoint/restart (draft in May)
Mike - Monitoring – job, system, node, and meta-version
  what data is needed – an extensible framework
  defined stream and single item
  working on scalability now
Rusty - Process Manager
  schematic of PM component
  MPD-2 in Python and distributed with MPICH-2; supports separate executables, arguments, and environment variables
  New XML for PM (with queries that allow wildcards and ranges)
Combination of published interfaces, XML, and communication lib gives us a power greater than the sum of its parts.
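To make "queries that allow wildcards and ranges" concrete, here is a small sketch of matching host names against such a query. The query syntax is an assumption made up for illustration (`*` as a wildcard, `[lo-hi]` as a numeric range), as are the function names and host names; the notes do not show the Process Manager's actual XML query format.

```python
import fnmatch
import re
from typing import List

# Hypothetical range syntax: "[1-3]" stands for the numbers 1, 2, 3.
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_ranges(pattern: str) -> List[str]:
    """Expand numeric ranges, e.g. 'ccn[1-3]' -> ['ccn1', 'ccn2', 'ccn3']."""
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]
    low, high = int(match.group(1)), int(match.group(2))
    expanded: List[str] = []
    for number in range(low, high + 1):
        rest = pattern[:match.start()] + str(number) + pattern[match.end():]
        expanded.extend(expand_ranges(rest))  # handle any further ranges
    return expanded


def select(hosts: List[str], query: str) -> List[str]:
    """Return the hosts matching a query with '*' wildcards and ranges."""
    patterns = expand_ranges(query)
    return [h for h in hosts
            if any(fnmatch.fnmatchcase(h, p) for p in patterns)]


hosts = ["ccn1", "ccn2", "ccn10", "login1"]
by_range = select(hosts, "ccn[1-2]")
by_wildcard = select(hosts, "ccn*")
```

Expanding ranges first and then applying shell-style matching keeps the two features independent, so a query can combine them.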

25 Meeting Notes
Narayan – Build and configure report
Tests suggest scalability to 2000 host clusters
Communication Infrastructure
  more protocol support, high availability option
Build and configuration
  complete implementation on Chiba City
  second OSCAR implementation underway
  three components
    - hardware manager (needs more modular, extensible design)
    - build system
    - node manager (admin control panel for a cluster)
  system diagnostics
Restriction Based Syntax for XML interfaces
API augmentation
  APIs need more documentation to describe event handling protocol

26 Meeting Notes
John Dawson asks about license. Al says like MPI.
Don (Cray) asks about license !GNU and holding a workshop for industry
Talk with Remy about Science Appliance collaboration
Talk with Rusty about writing a paper on each component.
Groups work on large scalability test on Chiba City and XTORC

