Progress on Release, API Discussions, Vote on APIs, and Quarterly Report Al Geist May 6-7, 2004 Chicago, ILL.

Presentation transcript:

Progress on Release, API Discussions, Vote on APIs, and Quarterly Report Al Geist May 6-7, 2004 Chicago, ILL

Coordinator: Al Geist
Participating Organizations: ORNL, ANL, LBNL, PNNL, PSC, SDSC, IBM, SGI, SNL, LANL, Ames, NCSA, Cray, Intel
How do we position ourselves for the DOE Ultrascale facility winner, to be announced May 12? Regardless of who is chosen, we should try to be in a position to help with the system software needs of the facility.

Scalable Systems Software
Participating Organizations: ORNL, ANL, LBNL, PNNL, NCSA, PSC, SDSC, SNL, LANL, Ames, IBM, Cray, Intel, SGI
Problem
- Computer centers use incompatible, ad hoc sets of systems tools
- Present tools are not designed to scale to multi-Teraflop systems
Goals
- Collectively (with industry) define standard interfaces between systems components for interoperability
- Create scalable, standardized management tools for efficiently running our large computing centers
Resource Management; Accounting & user mgmt; System Build & Configure; Job management; System Monitoring
To learn more visit

Scalable Systems Software Suite (component diagram)
Components shown: Grid Interfaces; Meta Services (Meta Scheduler, Meta Monitor, Meta Manager); Accounting; Event Manager; Service Directory; Scheduler; Node State Manager; Allocation Management; Process Manager; Usage Reports; System & Job Monitor; Job Queue Manager; Node Configuration & Build Manager; Checkpoint / Restart; Validation & Testing; Hardware Infrastructure Manager; Packaging & Install
Standard XML interfaces; authentication; communication
Components written in any mixture of C, C++, Java, Perl, and Python can be integrated into the Scalable Systems Software Suite
Updates to this diagram

Scalable Systems Software Center January Argonne Review of Last Meeting Details in Main project notebook

Highlights from Jan. mtg
Craig – 1280-node dual-Xeon cluster “Titanium” is available this evening to test the scalability of the SSS suite.
- One node will be used as head node to install our suite and run on the entire cluster.
- Could build everything but Bamboo and ssslib, due to Xerces.
- Will begin to be available at 6 pm.
Late-night session on the 1280-node testbed
- PM ran at 1280, worked at 4000, hung at 6000.
- Warehouse had a problem at 1280 and took out the head node.
- RM components ran on the head node OK until Warehouse crashed it.
Scott Jackson – Gold running on 11 TF PNNL cluster.
Thomas Naughton – 2nd release in March. Discussion of how many orgs in our group could shake down the tarball. The group feels it is better to have a few very reliable components than all components.

Highlights from Jan. mtg (cont.)
Rusty Lusk – Process Manager spec for first vote. Presentation and discussion:
- Who is responsible for limit enforcement, PM or QM? I.e., must use certain amount of memory, must not execute OS command (in general, things that happen after fork).
- Rusty says the question is good and he needs to think about how this may affect the interface.
Other items to think about:
- use of wildcard as “to be returned” operator – OK
- inclusion but “don’t show me”
- dynamic jobs and PM
- improve readability
Delay vote until we have a written proposal.

Highlights from Jan. mtg
Discussion of having two XML syntax styles (functional, object). Al says he would like to see one common style across the suite; he doesn’t care which one as long as the whole group can agree.
Narayan – Restriction Syntax overview. An issue of uniqueness was brought up, to be taken into consideration by Narayan.
Rusty Lusk – Restriction Syntax on Chiba City. David would like to see a paper on the requirements of the Chiba effort.
Andrew, Paul, and Craig offer to investigate a prototype translator to see how, or if, it is possible. Investigate standardization of tokens across the two syntaxes.

Scalable Systems Software Center January-May Progress Since Last Meeting

SciDAC PI mtg – March 22-24, 2004
In Charleston, SC, with several attending for Scalable Systems
- 2-page project summary report
- Annual report for Fred
- 20-minute talk, presented by Rusty (Fred asked each ISIC to use a new speaker)
- Poster presentation by Stephen/John

Systems Software Suite 2nd Release
Target date March ’04, so we could announce it at the PI meeting. Real status?
SSS-OSCAR – will hear more in next talk
Need a way to test that the suite is installed correctly

Five Project Notebooks
A main notebook for general information, and individual notebooks for each working group
Over 300 total pages
BC and PM groups need to get specs into their notebooks
Add Telecom meeting notes even if short (kudos to RM group)
Get to all notebooks through the main web site: click on the side bar or on “project notebooks” at bottom of page

Bi-Weekly Working Group Telecoms
RM is only notes I see in notebook
Resource management, scheduling, and accounting: Tuesday 3:00 pm (Eastern), keyword “SSS mtg”
Process management, monitoring, and checkpointing: Thursday 1:00 pm (Eastern), mtg code
Node build, configuration, and information service: Thursday 3:00 pm (Eastern), mtg code (changes)

Scalable Systems Software Center May 6-7, 2004 This Meeting

Major Topics this Meeting
Stability of Systems Software Suite – second release is out. Are we ready for outside users?
Quarterly report due – would like to get one to Fred by end of May. Will need text from WG leaders.
Formal API presentations and voting – we left several things hanging last meeting.
MICS PI mtg – August 9-12 at Argonne. A good time to have a highlight of outside user(s).
SC04 mtg – November in Pittsburgh. Talks? Tutorial? Birds of a feather?

Agenda – May 6
8:30 Al Geist – Project status
9:15 Thomas Naughton – SSS-OSCAR software suite release
Working Group Reports: progress report on what their group has done; API proposals for adoption by the group; progress on software suite improvements
9:30 Narayan Desai – Node Build, Configure
10:30 Break
11:30 Will McLendon – Validation and Testing
12:30 Lunch (on own – cafeteria)
1:30 Ron Oldfield – ASAP testing, and formalism issues
2:00 Paul Hargrove – Process Management, Craig and Rusty
3:00 Scott Jackson – Resource Management
4:00 Paul/Craig – findings about trying to build a syntax translator
4:30 Group discussion on getting outside users of 2nd release
5:00 Al – Discussion on SC04, other conferences, papers, etc.
5:30 Adjourn

Agenda – May 7
8:30 Discussion, proposals, votes
- Craig – discussion
- Paul – straw vote on two syntaxes
- Rusty – Process Manager proposal (deferred)
- Scott – Allocation Manager proposal (deferred)
- Al – Quarterly report, papers, SC04, other meetings
10:30 Break
11:00 Al Geist – Release 2 and outside users (Jazz? Ram? NCSA? SNL?)
MICS PI mtg August at Argonne (news to come)
Next meeting date: August 26-27, 2004; location: Argonne
12:00 Meeting ends

Meeting notes
Al Geist – presents project overview and goals for this meeting
Thomas Naughton – SSS-OSCAR
- In tarball: Bamboo, BLCR, Gold, LAM/MPI, MAUI-SSS, SSSLib, Warehouse, MPD2
- SSSLib contains SD, EM, PM, BCM, NSM, NHw, plus communication
- To do: bug tracker; test sss-oscar-v2a6-v3.0 for pre-release; documentation (use SciDAC review 1-pager); add license-sss to directory
- Need: a test suite and a few test machines to test on
- Discussion on APItest and who creates tests, etc. Each does individual
- Establish release schedule through SC04
- Add easier way for authors to “test just their stuff”
- SC04 – fully tested release v1.0 with all SSS components; code freeze Friday, September 3

Meeting notes
Narayan Desai – Build Configure
- Library improvements: bugfixes, testing of Java support, SSL testing
- Infrastructure improvements: sss Python library improvements, EM bugfixes
- BCM component usage experience
- Hardware infrastructure – still seeking purpose
Restriction Syntax examples given and discussed
- Craig thankful that !d (don’t display this field) now works
- Uniqueness issue: default is to return all duplicates; new flag “unique=true” removes duplicates
- Much discussion. Rusty suggests removing only duplicate lines.
- Paul brings up the problem with “action” commands, i.e. killing jobs twice
- Al says the problem is not solvable in general in restriction syntax
- Scott asked if RMAP syntax can handle this. Much work on the board, and a question of atomicity of queries which require multiple SQL queries to complete.
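The uniqueness discussion can be illustrated with a minimal Python sketch. It is not the ssslib implementation: the function name `apply_unique`, the row format, and the sample data are hypothetical, chosen only to show the behavior described in the notes (all duplicates returned by default, exact duplicate rows dropped when a unique flag is set, as Rusty suggested).

```python
from typing import Dict, List

Row = Dict[str, str]  # hypothetical row format: field name -> value


def apply_unique(rows: List[Row], unique: bool = False) -> List[Row]:
    """Return query rows; with unique=True, drop exact duplicate rows.

    Hypothetical helper mirroring the "unique=true" flag from the notes.
    The default keeps every row, duplicates included.
    """
    if not unique:
        return list(rows)
    seen = set()
    result = []
    for row in rows:
        # A row counts as a duplicate only if every field matches.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


# Hypothetical sample data: the first two rows are identical.
rows = [
    {"node": "n1", "state": "up"},
    {"node": "n1", "state": "up"},
    {"node": "n2", "state": "down"},
]
all_rows = apply_unique(rows)
unique_rows = apply_unique(rows, unique=True)
```

This only covers read-only queries. It does not address the “action” command problem Paul raised, where the same job could be matched and acted on twice.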

Meeting notes
Will McLendon – Component Interface Testing
- APITest v0.1.2. It is now available by FTP by putting it under GPL Cplant license: ftp://ftp.sandia.gov/outgoing/apitest (also in notebook)
- Not integrated back into ssslib
- HTTP interface development: “Twisted Python” framework
- Info and Scott helped find bug in Python popen3 – now uses Twisted SpawnProcess
- Better support for browsing test data within session
- Batch and test data stored in memory in an XML file format; writing out data to file available soon
- Shows an XML example that runs a test. Several questions answered.
- Shows an XML batch file example. Runs live demo – works fine. Discussion follows.
Ron Oldfield – replacing Eric DeBenedictis, who is moving to other SNL jobs
- ORNL help set up a testing environment
- Testing for correct installation and individual tests, then whole-suite test

Meeting notes
Ron Oldfield (cont)
- simulating real workloads
- performance and scalability testing needed in the future
- portability is important for our reference implementation; discussion of code portability vs feature portability
- authorization also needs testing
- What are the issues in lightweight OS?
- Standard naming conventions, both format and semantics; someone really needs to go through the existing schemas; RMAP dictionary makes a good starting point
Paul Hargrove – process management
- Still continuing development on all three components
- Syntax translation effort to be discussed later today
- Checkpoint: pre-emption (suspend and resume) works; checkpointing (ckpt works, restart in progress)
- To do: migration; checkpoint file management so as not to overflow disks (list, delete); query “can I restart here”

Meeting notes
Paul Hargrove – process management (cont)
- Suspend/resume works with Bamboo, SD, EM, OM, PM components
- Still need to design restart-time interactions with RM group
- Open files support under testing
- Bug fix releases as needed
- Checkpoint manager outstanding issues: implement full interface using restriction syntax, event generation, error reporting; must implement file management (think ls and rm, expiration)
Craig Steffen – no slides
- Tried run on 1280 nodes on Tungsten, failed; did run on 128
- Can now run on 1024 nodes. Being stopped by #sockets limit
- Harvesting can now be done of other info, e.g. Myrinet HW
- Next: adding support for “job” management; start interfacing with Build group; help to get it on Chiba
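The checkpoint file management item (“think ls and rm, expiration”) could look roughly like the Python sketch below. This is an assumption-laden illustration, not the checkpoint manager's interface: the function names, the flat-directory layout, and the age-based expiration policy are all hypothetical.

```python
import os
import time
from typing import List, Optional


def list_checkpoints(directory: str) -> List[str]:
    """The "ls" operation: names of checkpoint files in a directory, sorted."""
    return sorted(
        name
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )


def delete_checkpoint(directory: str, name: str) -> None:
    """The "rm" operation: remove a single checkpoint file."""
    os.remove(os.path.join(directory, name))


def expire_checkpoints(
    directory: str, max_age_seconds: float, now: Optional[float] = None
) -> List[str]:
    """Remove checkpoint files older than max_age_seconds; return their names.

    Hypothetical policy: age is measured from the file modification time.
    """
    now = time.time() if now is None else now
    removed = []
    for name in list_checkpoints(directory):
        path = os.path.join(directory, name)
        if now - os.path.getmtime(path) > max_age_seconds:
            os.remove(path)
            removed.append(name)
    return removed
```

Running expiration periodically is one way to keep checkpoint files from overflowing disks, which is the concern recorded in the notes. A real component would also need the restriction-syntax interface, event generation, and error reporting listed above.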

Meeting notes
Rusty Lusk – process manager update
- PM component: added “limits” interface, dynamic jobs (mpi_comm_spawn); can spawn lots of nodes and then use “unused” ones as needed; shows limits spec
- MPD2 improvements found by production use on Chiba
- support for limits
- support for mpi_comm_spawn
- interactive debugging via mpigdb, which allows control of stdin, stderr, stdout
- Future: need to work more closely with QM; QM interface for requesting dynamic jobs

Meeting notes
Scott Jackson – resource manager update
- Diagram on board
- Released SSSRMAP v3 spec. New things: wire protocol, message format, job groups
- Latest software release (in OSCAR) uses SSSRMAP v2
- Second release of Bamboo in March with epilogue and prologue support
- Gold now fully SSSRMAP v2
- second alpha release due June, which will be in Perl (first release in Java ran into memory size limits)
- user guide done
- first release running on PNNL’s SGI Altix
- Testing using APITest begun
- Silver: several various improvements in XML
- Future work: implement SSSRMAP v3 in the components; merger of Maui 3.2 and SSS. Integrate chkpt/restart. Limit enforcement – now SSS affects all Maui users. Ability to handle dynamic jobs
[Diagram: job group, job, tasks (T), task group, multi-step job]

Meeting notes
Paul – translator report (no slides)
- Looking at the two syntaxes to see if we could automate translation between SSSRMAP and restriction syntax (RS)
- Found: SSSRMAP could say 4<proc<16, but RS could not. RS band-aid: special operators to handle ranges
- For multiple-table queries (nested), RS syntax doesn’t have the information (primary data type) to know how to combine multiple SQL results
- There is no way to translate between these cases. Paul discourages the implementation of a translator.
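The range example 4<proc<16 can be made concrete with a small Python sketch. It uses neither SSSRMAP nor restriction syntax; the tuple-based condition format, the `matches` and `select` helpers, and the job records are hypothetical, and serve only to show that a range is a conjunction of two comparisons on the same field, which a syntax limited to one comparison per field cannot express without extra operators.

```python
import operator
from typing import Dict, List, Tuple

# Hypothetical condition format: (field, comparison operator, value).
Condition = Tuple[str, str, int]

OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}


def matches(record: Dict[str, int], conditions: List[Condition]) -> bool:
    """True if the record satisfies every condition (logical AND)."""
    return all(OPS[op](record[field], value) for field, op, value in conditions)


def select(records: List[Dict[str, int]], conditions: List[Condition]) -> List[Dict[str, int]]:
    """Return the records that satisfy all conditions."""
    return [r for r in records if matches(r, conditions)]


# 4 < proc < 16 written as two conditions on the same field.
proc_range = [("proc", ">", 4), ("proc", "<", 16)]

# Hypothetical job records.
jobs = [{"id": 1, "proc": 2}, {"id": 2, "proc": 8}, {"id": 3, "proc": 16}]
selected = select(jobs, proc_range)
```

Only the job with 8 processors falls inside the open range. The nested multi-table case from the notes is harder, since combining several SQL results needs type information that this sketch does not model.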

Meeting notes – Day 2
Craig – General thoughts on official V1.0 (no slides)
- Released at SC04, this will be the first time many people will see it
- Our orthogonal directions in syntax are damaging
- If we don’t make a decision soon - project progress towards V1.0
Brett, who works with both, favors SSSRMAP. He likes its more descriptive nature and its OO nature.
Rusty says that we need two written proposals for a component that we can compare and vote on; otherwise we are just all talk.
Paul says that one is better but two is not too bad.
Scott doesn’t think we can reconcile.
Paul asks for a straw vote for a preference; Scott seconds.
- SSSRMAP – 7, and 5 institutions (but one is Al)
- Restriction Syntax – 3, all ANL
- Abstain – 3, and 2 institutions
Craig says he will do whatever it takes to make either work. He is going to make ssslib SSSRMAP work.
Neil says “users” are the guiding factor and RMAP is better there.
Paul says understandability and acceptability are key and RMAP is better.
Both say that RS is more compact and elegant.

Meeting notes – Day 2 (cont)
Narayan asks: does it just need documentation and tutorials?
Paul says no. There is a closer match for SOAP et al. The OO was not a factor in his choice, but it is more popular today.
Neil says potential users won’t have a Narayan to figure this out. Components are both client and server, so the developer has to know the syntax.
Rusty – if there was something else added to RS that made it easier to use or understand. He is not sure it is a good idea.
Will – documentation is better in RMAP and he has looked at RMAP more. Would all this stuff be more abstracted? User does as little as they can and reads the manual only after they get stuck. Doesn’t care as long as we pick ONE! Need to have the same look and feel across the project.
Rick – I don’t care which. I don’t like XML. What about the SD and EM that are already accepted?
Al says that he feels that RMAP would be more acceptable to vendors, and this would be critical to the long-term success of the project.
Paul says that the Process Manager document is not complete enough to vote on at this time.

Meeting notes – Day 2 (cont) Discussion -