Review report, Vote on APIs Quarterly report, and SW release Al Geist June 5-6, 2003 Chicago, IL.



Participating Organizations
Coordinator: Al Geist
ORNL, ANL, LBNL, PNNL, PSC, SDSC, IBM, SNL, LANL, Ames, NCSA, Cray, Intel, Unlimited Scale
- External reviewers want to see more vendors involved.
- Have begun working with Don Mason and John Lawson to set up a presentation to a vendor forum.
- Will need your participation when logistics are known.

Scalable Systems Software
Participating Organizations: ORNL, ANL, LBNL, PNNL, NCSA, PSC, SDSC, SNL, LANL, Ames, IBM, Cray, Intel, Unlimited Scale
Problem
- Computer centers use incompatible, ad hoc set of systems tools.
- Present tools are not designed to scale to multi-Teraflop systems.
Goals
- Collectively (with industry) define standard interfaces between systems components for interoperability.
- Create scalable, standardized management tools for efficiently running our large computing centers.
Impact
- Reduced facility mgmt costs.
- More effective use of machines by scientific applications.
Working areas: Resource Management; Accounting & user mgmt; System Build & Configure; Job management; System Monitoring
learn more visit

Scalable Systems Software Center, February, Chicago, IL
Review of Last Meeting
Details in main project notebook

Progress Reports at Feb. mtg
- Al Geist: preparation for external review, SciDAC PI meeting, posters, and demos
- Working Group Leaders:
  - What areas their working group is addressing
  - Progress report on what their group has done
  - Present problems being addressed
  - Next steps for the group
  - Discussion items for the larger group to consider
- Discussion of Prototype Components
- Prep for external review demo
Slides can be found in Main Notebook

Consensus and Voting: None at last meeting. Something we need to start doing again.

Scalable Systems Software Center February-June Progress Since Last Meeting

SciDAC PI mtg (all 50 projects)
March 10-11, 2003, Napa, California
- Attending for Scalable Systems: Al Geist, Brett Bode
- 20 minute talk, presented by Al
- Scalable Systems, CCA, PERC, SDM
- Poster Presentation

External SciDAC Review mtg
March 12-13, 2003, Napa, California
- Attending for Scalable Systems: Al Geist, Brett Bode, Paul Hargrove, Narayan Desai, Mike Showerman. (Rusty)
- Four ISIC projects were reviewed separately: Scalable Systems, CCA, PERC, SDM
- External review panel (9 members): Bob Lucas, Jim McGraw, Jose Munoz, Lauren Smith, Richard Mount, Ricky Kendall, Rod Oldehoeft, Tony Mezzacappa, and John Grosh
- Day 1: We had 1¾ hours to present the project
- Day 2: We got grilled by the panel for 1½ hrs

External Review mtg Agenda: Wednesday, March 12
7:45 Welcome, charge to reviewers
8:15 Plenary session for Common Component Architecture ISIC
10:00 Break
10:15 Plenary session for Scalable Systems Software ISIC
  - Al Geist gives 1 hr project overview, vision, goals
  - Last 45 minutes team gives demos, answers questions
12:00 Reviewer caucus
12:15 Lunch
1:15 Plenary session for Scientific Data Management ISIC
3:00 Break
3:15 Plenary session for Performance Engineering ISIC
5:00 Reviewer caucus
5:30 Adjourn

External Review Demo: Working Components and Interfaces (working ones shown in bold in the original figure)
[Figure: component diagram. Meta Services: Meta Scheduler, Meta Monitor, Meta Manager, Grid Interfaces. Components: Accounting, Event Manager, Service Directory, Scheduler, Node State Manager, Allocation Management, Process Manager, Usage Reports, System & Job Monitor, Job Queue Manager, Node Configuration & Build Manager, Checkpoint / Restart, Validation & Testing, Hardware Infrastructure Manager. Components are connected through standard XML interfaces with authentication and communication.]
Components written in any mixture of C, C++, Java, Perl, and Python can be integrated into the Scalable Systems Software Suite.

External Review mtg Agenda Day 2: Thursday, March 13
8:00 Meetings between reviewers and ISIC members
  A. Common Component Architecture
  B. Scalable Systems Software (Jim McGraw)
9:45 Break
10:00 Meetings between reviewers and ISIC members
  C. Scientific Data Management
  D. Performance Engineering
11:45 Reviewer Caucus/End of ISIC Reviews
12:15 Lunch (McGraw gives initial assessment)
Team brainstorms on the response to the initial comments. These were sent to McGraw that same day.

External Review Initial Comments
Response on top two issues:
1. Lack of large-scale testbed for scalable systems
   Mike Showerman of NCSA says they will have a 900 processor system by late summer that Scalable Systems software could be tested on. He also said there are plans to get an additional 1300 node system. CPlant has also been floated as a possible large-scale (~1200 processor) test platform.
2. Get more vendors involved and more "buy-in"
   I will redouble my efforts to get SGI involved in Scalable Systems again. HP has been a tough nut to crack; both PSC and PNNL have tried to get them to engage. I'll see if PSC and PNNL are willing to try again. By late summer we will have a beta release of the suite that I can use to demonstrate to vendors our progress and the advantages of going the scalable systems path.

Official External Review Report Arrived in May 2003
- Organizationally the project has developed effective working units
- The project appears to be on schedule for technical issues
- It has made several noteworthy accomplishments
Recommendations:
The two greatest obstacles to the success of this project are the availability of an adequate testbed for proving scalability of the interface design and the willingness of vendors to adopt the design for future systems
Secondary considerations
- Investigate relationship with CCA
- Investigate file system plan
- Investigate security plan
- Importance of fault tolerance at smaller cluster sizes
- Develop test workloads

Five Project Notebooks filling up
- A main notebook for general information
- And individual notebooks for each working group
- Over 270 total pages; few added since last meeting
- Add telecon meeting notes even if short
- Have had several web server problems this quarter
Get to all notebooks through main web site: click on side bar or on “project notebooks” at bottom of page

Bi-Weekly Working Group Telecons
Have been sparse since March review
- Resource management, scheduling, and accounting: Tuesday 3:00 pm (Eastern), keyword “SSS mtg”
- Validation and Testing (hasn’t met since last year): Wednesday 1:00 pm (Eastern), mtg code
- Process management, system monitoring, and checkpointing: Thursday 1:00 pm (Eastern), mtg code
- Node build, configuration, and information service: Thursday 3:00 pm (Eastern), mtg code (changes)

Scalable Systems Software Center June 5-6, 2003 This Meeting

Major Topics this Meeting
- MICS request for Highlights: Fred sent out a call for 2 page highlights due to MICS by June 12. Has anyone responded? I sent in our 2 pager.
- Response to Reviewers Report: need feedback from the team on our official response to the points in the report.
- Quarterly Report Due: would like to get one to Fred by end of June. Will need text from WG leaders.
- Formal API presentations and voting: it is that time in the project when we should be settling on some APIs.
- SC2003 Tutorial: proposal submitted at Fred’s request. Have a software suite released before SC2003.

Agenda – June 5
8:30 Al Geist – Project Status. Qtr report coming up and External review report
9:00 Matt Sottile – Using Scalable Systems API
Working Group Reports
9:30 Scott Jackson – Resource Management
10:30 Break
11:00 Erik DeBenedictis – Validation and Testing
12:00 Lunch (on own - walk to cafeteria)
1:00 Paul Hargrove – Process Management + Rusty slides
1:30 Craig Steffen – Warehouse Monitoring framework
2:00 Narayan Desai – Node Build, Configure
     Stephen Scott – OSCAR release with SSS inside
3:00 Break
3:30 Presentation of formal APIs for discussion
5:00 Rusty, Scott, Narayan, Paul?
5:30 Adjourn
Working groups may wish to prepare material for voting Friday

Agenda – June 6
8:30 Discussion, proposals, straw votes
  - Discussion of review report
  - API proposals for envelope
10:30 Break
11:00 Al Geist – Summary, Qtr Report, next meeting date
12:00 Meeting ends

Meeting notes: Matt Sottile
- bproc (bstat_sss) software integrated with cluster status component.
- Good (was able to do it in a day), bad (shouldn’t take 8 hours), ugly (Python).
- Example with distribution didn’t help much. XML isn’t well documented. But it is a prototype distribution, so some of these issues are expected.
- Major gripes: had to write socket code and code for XML parsing and creation. These should be APIs.
- He then talks about Linux TCP being a hack.
- XML parsing: the schema and associated parser are intimately related.
- Noted that code had some constructs that could be made more robust.
CCA thoughts on relation to our project
- His expertise is language interoperability and runtime frameworks.
- Law of least surprises. Consistency is good.
- Insulate developers from the support structure.
- Components: the wheel everyone continues to reinvent. But in SSS there aren’t components, just XML and wire protocol.
- CCA provides: SIDL, standard interfaces to runtimes (CCAFFEINE, CCAT, Dune, …).
- Suggests: could try to leverage CCA messaging layer, define interfaces in SIDL, build services that conform to SIDL.
- CCA provides no security.
- Concentrate on interfaces and the problem of mapping concrete services into the interface space of SSS.
Conclusion
- Clean up APIs to minimize possibilities for version skew.
- Too late to adopt CCA model.
- Overall things worked: a good accomplishment.
- Showed demo.
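The gripe that every integrator had to hand-write socket code and XML creation/parsing can be illustrated with a small helper library. The following is only a sketch of what such a shared API might look like, not the SSS wire protocol: the 4-byte length prefix, the function names `send_xml` and `recv_xml`, and the `status` message are all assumptions made for illustration.

```python
import socket
import struct
import xml.etree.ElementTree as ET


def send_xml(sock, element):
    """Serialize an XML element and send it with a 4-byte length prefix.

    The length-prefixed framing is a hypothetical choice for this sketch.
    """
    payload = ET.tostring(element, encoding="utf-8")
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def _recv_exact(sock, count):
    """Read exactly `count` bytes or raise if the peer closes early."""
    chunks = []
    while count > 0:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("peer closed before full message arrived")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def recv_xml(sock):
    """Receive one length-prefixed message and parse it into an element."""
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return ET.fromstring(_recv_exact(sock, length))


def demo():
    """Round-trip a hypothetical status message over a local socket pair."""
    left, right = socket.socketpair()
    try:
        message = ET.Element("status", {"node": "n01"})
        message.text = "up"
        send_xml(left, message)
        reply = recv_xml(right)
        return reply.tag, reply.get("node"), reply.text
    finally:
        left.close()
        right.close()
```

With helpers like these, a component author would call `send_xml` and `recv_xml` and never touch raw sockets or serialization directly.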

Meeting notes: Scott Jackson, RM WG report
- Progress: SSS front end created for Qbank. Soon: release v1.0 of Open PBS, Maui, and Qbank, all with SSS XML front end.
- Created Job Object specification v2.0.
- Created SSSRMAP v2.0 (in notebook).
- Scheduler progress: 40% of clients now using SSSRMAP; supports dynamic reservations to support growing and shrinking MPI jobs.
  - Security: support for a user-specified keyfile
  - Fault tolerance: implemented a fallback server
  - Ease of use: initial web GUI developed
- Queue Manager progress: updated service directory and event manager interfaces.
- Accounting and allocation manager progress (GOLD):
  - All functionality of Qbank plus support for deposits, hierarchical accounts, refunds, guaranteed quotes, and negotiation of options
  - Added role-based access control, authentication, and encryption
  - Got PNL OK to open source as BSD; sent to Fred for DOE OK
- Will talk about SSSRMAP v2 details this afternoon, in particular interfaces to other working group components.

Meeting notes: Will McLendon, Validation and Testing WG update
- Strategies for distributed runtime system testing: users expect high quality.
- ESP benchmark (out of NERSC): used in procurement to predict the effectiveness of a system before it is purchased. Could be used to test the SSS suite. Consider putting ESP on the SSS testbed(s).
- APItest: most of the work this quarter is going on here.
  - Recoded in Python for portability (C++ version had portability problems)
  - Integrated into SSSlib as part of the distribution
  - Tests well under development for SSSlib components
  - Status slide shows working, prototype, and planned features
- Black box testing: does the component support the API?
- White box testing: coverage tests, internal states of component, unreachable states.
- Encoding XML inside XML is a problem.
- Ran real demos of APItest running on Chiba City.
- MySQL database support: used to store raw test results.
- Work still to do: see status slide.

Meeting notes: Paul Hargrove
- Used my laptop for presentation; see slides.
- Checkpoint/restart progress is stalled because the person working on it has been pulled off our project by Bill McCurdy to work on NERSC projects.
Meeting notes: Craig Steffen, Warehouse Monitoring Software Infrastructure
- Describes the old way the cluster monitor worked and the scalability issues with it.
- Presents new design: each node is a peer, and each can be the root of a subtree.
- Nodes can be grouped into “information storehouses” with multiple sources and sinks.
- Showed how it can be used to monitor multiple clusters in a compute center.
- Information storehouse infrastructure is done.
- Sources and sinks: next step will be to write simple ones, then more complex.
- Lots of questions about the design. Good answers from Craig.
- Only update changing information.
- In next 6 months: self-balancing systems by tuning update intervals, and message passing to request information through the tree.
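The two design points noted for the Warehouse monitoring framework, that any node can be the root of a subtree and that only changing information is updated, can be sketched as follows. This is a toy illustration, not Warehouse code: the class name `MonitorNode`, its `report` method, and the `forwarded` counter are hypothetical.

```python
_MISSING = object()  # sentinel so that a stored value of None still counts


class MonitorNode:
    """A peer in a monitoring tree; any node may act as the root of a subtree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.store = {}     # (source name, key) -> last value seen
        self.forwarded = 0  # number of updates pushed to the parent

    def report(self, key, value, source=None):
        """Record a reading and forward it upward only if it changed."""
        source = source or self.name
        if self.store.get((source, key), _MISSING) == value:
            return False  # unchanged, so nothing travels up the tree
        self.store[(source, key)] = value
        if self.parent is not None:
            self.forwarded += 1
            self.parent.report(key, value, source)
        return True
```

Repeated identical readings stop at the first node that has already seen them, which keeps traffic toward the root proportional to the rate of change rather than to the polling rate.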

Meeting notes: Narayan Desai, BCWG report
- All APIs changed to restriction syntax (draft spec).
- Service directory: new schema and new implementation.
- Event manager: same.
- SSSlib: more wire protocol modules (SSL, SSSRMAP). OSX port in progress.
- Build and configuration now has diagnostic services.
- Hardware infrastructure issues discussed: what does the system look like right now?
- Open issues:
  - specification formats: what tests does it need to pass
  - release formats: see OSCAR slides
  - XML interface formats
  - multiple implementations
Meeting notes: Thomas Naughton, SSS deployment using OSCAR
- How do users download and install the SSS suite? Propose to leverage the OSCAR framework.
- OSCAR core: SIS, C3, ODA, Env-Switcher.
- OSCAR package facility: RPMs and other package classes.
- OSCAR package loader.
- Seems to be consensus of the group to do this for SC2003.

Meeting notes: Rusty, proposal for an API for the Process Management Component
- He says the material is not quite in the form needed to vote on, but here is the process we should follow to vote in standard APIs.
- Voting should be on a document that has:
  - descriptions
  - examples, both simple and complex
  - details of XML schema
- See his slides for details of his process manager interface proposal.
- Much discussion.
Meeting notes: Scott Jackson, SSSRMAP v2 proposal
- Have taken an object-oriented approach to jobs and attributes.
- Goes over basic examples in proposal (found in RM notebook).
- Discussion of the differences between RM Schema and BC Schema. Part of the difference is the incorporation of security. Another part is functional vs object oriented.
- Discussion of outer (envelope, signature, body) framing and putting it in SSSlib (vote).
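The outer framing discussed here (envelope, signature, body) can be pictured with a small sketch. Everything specific below is an assumption for illustration and not taken from SSSRMAP: the element names `Envelope`, `Signature`, and `Body`, the use of HMAC-SHA256 over the serialized body, and the shared key.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET


def _sign(body, key):
    """Return a hex HMAC-SHA256 over the serialized body element."""
    return hmac.new(key, ET.tostring(body), hashlib.sha256).hexdigest()


def wrap(payload, key):
    """Place a payload element inside an envelope with a signature."""
    envelope = ET.Element("Envelope")
    signature = ET.SubElement(envelope, "Signature")
    body = ET.SubElement(envelope, "Body")
    body.append(payload)
    signature.text = _sign(body, key)
    return envelope


def unwrap(envelope, key):
    """Verify the signature and return the payload element."""
    body = envelope.find("Body")
    signature = envelope.find("Signature")
    if body is None or signature is None or len(body) == 0:
        raise ValueError("malformed envelope")
    if not hmac.compare_digest(signature.text or "", _sign(body, key)):
        raise ValueError("signature mismatch")
    return body[0]
```

Keeping this framing in one shared library means individual components only ever see the verified payload, which is one reading of why the group wanted it in SSSlib.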

Meeting notes Day 2: Al Geist, action items
1. Need Working group leaders to send me a couple pages for the Qtr rpt: Status and Progress from Feb-June.
2. Any comments on points in the external reviewers report. Paragraph or two is fine.

Meeting notes: Narayan Desai, restriction syntax proposal
- Goes over basic command syntax, where an attribute can be wildcarded with “*”.
- Goes over complex command syntax.
- Matching semantics, especially for wildcards.
- Benefits of this approach: compact, powerful, simple syntax, validatable, data ownership is explicit.
- Uses MySQL on the backend.
- This syntax has Constructive Normal Form. Discussion that negation needs to be added before this is true.
- What about regular expression support? More discussion on how to do various things like “join” and “union”.
Discussion of the Communication Infrastructure Spec Draft (hardcopy handed out)
- We should be able to hardwire components together.
- Existence of static file to define where things are; may just have service directory.
- Unix domain socket protocol for SMP servers.
- Vote: accept the spec pending. Yes 15, No 0, abstaining 0.
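The wildcard matching described for the restriction syntax can be sketched in a few lines. The notes do not give the actual syntax, so this sketch assumes a restriction is a mapping from attribute name to required value, that "*" matches any value, and that a restricted attribute must be present on the record. The names `matches` and `select` are hypothetical.

```python
def matches(record, restriction):
    """Return True when the record satisfies every restricted attribute.

    Assumed semantics: "*" accepts any value, but the attribute must exist.
    """
    for attribute, wanted in restriction.items():
        if attribute not in record:
            return False
        if wanted != "*" and record[attribute] != wanted:
            return False
    return True


def select(records, restriction):
    """Return the records that satisfy the restriction, in original order."""
    return [record for record in records if matches(record, restriction)]
```

An empty restriction selects everything, which mirrors the usual behaviour of a query with no conditions. Negation and regular expressions, raised in the discussion, are deliberately left out.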

Meeting notes
- Paul: Discusses the idea of hiding the socket code in a library. Matt says he would be happy to contribute such a server.
- Discussion of scalability of the event manager: not a problem, because the number of meatballs does not increase with system size.
- Question about the ordering of event notifications.
- Scott: Lively discussion of the two XML variants. What are the strengths and weaknesses of both?
- Agreement on having common error objects with 3-digit codes and messages. The message is a human-readable string. Two special codes: 000 success, 999 unknown. Straw vote: Yes 15, No 1, Abstain 0.
- Add “supported scheme version” to Service directory. Vote: Yes 15, No 0, Abstain 0.
- Next meeting September 9-10 in DC so Fred can attend?
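The agreed error object (a 3-digit code plus a human-readable message, with 000 for success and 999 for unknown) is simple enough to sketch directly. Only the two special codes come from the notes. The class name `Status`, its fields, and the helper `status_from_exception` are hypothetical.

```python
from dataclasses import dataclass

SUCCESS = "000"  # special code from the meeting notes
UNKNOWN = "999"  # special code from the meeting notes


@dataclass(frozen=True)
class Status:
    """A common error object: 3-digit code plus human-readable message."""

    code: str
    message: str

    def __post_init__(self):
        # Codes are kept as strings so leading zeros such as "000" survive.
        if len(self.code) != 3 or not (self.code.isascii() and self.code.isdigit()):
            raise ValueError(f"code must be exactly three digits: {self.code!r}")

    @property
    def ok(self):
        """True only for the success code."""
        return self.code == SUCCESS


def status_from_exception(exc):
    """Map an unexpected exception to the 'unknown' code."""
    return Status(UNKNOWN, str(exc) or exc.__class__.__name__)
```

A component would return `Status(SUCCESS, "Success")` on the normal path and fall back to `status_from_exception` when it cannot classify a failure.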