Peter F. Couvares Associate Researcher, Condor Team Computer Sciences Department University of Wisconsin-Madison Condor Build & Test: NMI, OMII, ETICS

How the Condor Team Got Started in the Build/Test Business: Prehistory › Oracle shamed^H^H^H^H^H^Hinspired us. › The Condor team was in the stone age -- producing modern software to help people reliably automate their computing tasks, with our bare hands. Every Condor release took weeks or months: build by hand on each platform, discover lots of bugs introduced since the last release, track them down, re-build, and so on.

What Did Oracle Do? › Oracle selected Condor as the resource manager underneath their Automated Integration Management Environment (AIME) › AIME relies on Condor to perform automated build and regression testing of multiple components for Oracle's flagship Database Server product. › Oracle chose Condor because they liked the maturity of Condor's core components.

Doh! › Oracle used distributed computing to automate their build/test cycle, with huge success. › If Oracle can do it, why can’t we? › Use Condor to build Condor! › NSF Middleware Initiative (NMI): the right initiative at the right time! An opportunity to collaborate with others to do for production software developers like Condor what Oracle was doing for themselves -- an important service to the scientific computing community

NMI Statement › Purpose – to develop, deploy and sustain a set of reusable and expandable middleware functions that benefit many science and engineering applications in a networked environment › Program encourages open source software development and development of middleware standards

Why should you care? From our experience, the functionality, robustness and maintainability of a production-quality software component depend on the effort invested in building, deploying and testing that component. If this is true for a single component, it is definitely true for a software stack. Doing it right is much harder than it appears from the outside. Most of us had very little experience in this area.

Goals of the NMI Build & Test System › Design, develop and deploy a complete build system (HW and SW) capable of performing daily builds and tests of a suite of disparate software packages on a heterogeneous (HW, OS, libraries, …) collection of platforms › And make it: Dependable Traceable Manageable Portable Extensible Schedulable

The Build Challenge › Automation - “build the component at the push of a button!” always more to it than just “configure” & “make” e.g., ssh to the right host; cvs checkout; untar; setenv, etc. › Reproducibility – “build the version we released 2 years ago!” Well-managed & comprehensive source repository Know your “externals” and keep them around › Portability – “build the component on nodeX.cluster.com!” No dependencies on “local” capabilities Understand your hardware & software requirements › Manageability – “run the build daily on 15 platforms and email me the outcome!”
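The "more to it than configure & make" bookkeeping above can be sketched as a minimal build driver. Everything here is illustrative -- the tag name, log layout, and the elided cvs/configure commands are hypothetical, not the actual NMI tooling:

```shell
# Illustrative sketch only: BUILD_TAG and the log layout are hypothetical,
# and the real checkout/configure/make commands are elided in comments.
set -e                                       # fail fast, like a real build driver
BUILD_TAG=condor-6.8.0                       # hypothetical release tag to reproduce
WORKDIR=$(mktemp -d)                         # fresh workspace, for reproducibility
PLATFORM="$(uname -s)-$(uname -m)"           # record exactly where we built
echo "checkout $BUILD_TAG"  >  "$WORKDIR/build.log"   # e.g. cvs checkout -r $BUILD_TAG
echo "build on $PLATFORM"   >> "$WORKDIR/build.log"   # e.g. configure && make
cat "$WORKDIR/build.log"
```

The point of the sketch is the bookkeeping around the build, not the build itself: a pinned tag, a clean workspace, and a record of the platform, so the same invocation is automatable, reproducible, and portable.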

The Testing Challenge › All the same challenges as builds (automation, reproducibility, portability, manageability), plus: › Flexibility “test our RHEL4 binaries on RHEL5!” “run our new tests on our old binaries” important to decouple build & test functions making tests just a part of a build -- instead of an independent step -- makes it difficult/impossible to: run new tests against old builds test one platform’s binaries on another platform run different tests at different frequencies
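The decoupling argument above can be made concrete with a hedged sketch: a test driver that takes *any* prior build's output directory as input, so new tests can run against old binaries. The function name, directory layout, and elided fetch step are all hypothetical:

```shell
# Illustrative sketch: run_tests accepts any build's output directory,
# so testing is an independent step, not a phase buried inside the build.
run_tests() {
    build_dir=$1
    [ -d "$build_dir" ] || { echo "no such build: $build_dir"; return 1; }
    # (hypothetical) unpack the binaries here, then run the suite against them
    echo "testing binaries in $build_dir"
}
OLD_BUILD=$(mktemp -d)    # stands in for, say, RHEL4 binaries built 2 years ago
run_tests "$OLD_BUILD"
```

Because the build is just an input parameter, the same driver can run new tests against old builds, test one platform's binaries on another platform, or run different test sets at different frequencies.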

“Eating Our Own Dogfood” › What Did We Do? We built the NMI Build & Test Lab on top of Condor, DAGMan, and other distributed computing technologies to automate the build, deploy, and test cycle. To support it, we’ve had to construct and manage a dedicated, heterogeneous distributed computing facility. Opposite extreme from typical “cluster” -- instead of 1000’s of identical CPUs, we have a handful of CPUs each for ~40 platforms. Much harder to manage! You try finding a sysadmin tool that works on 40 platforms! We’re just another big Condor user If Condor sucks, we feel the pain.
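Since the Lab sits on top of DAGMan, the build-then-test dependency can be expressed as an ordinary DAG. The submit-file and script names below are illustrative, not the Lab's actual files:

```
# build_test.dag -- build and test as separate DAGMan nodes
JOB  build  build.sub          # Condor submit file for the build step
JOB  test   test.sub           # Condor submit file for the test step
PARENT build CHILD test        # test runs only after build succeeds
SCRIPT POST build collect.sh   # (illustrative) gather build products
RETRY test 1                   # re-run a flaky test node once
```

A DAG like this is submitted with `condor_submit_dag build_test.dag`; DAGMan then handles ordering, retries, and recovery across the pool.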

NMI Build & Test Facility (architecture diagram): INPUT -- customer source code, customer build/test scripts, and a spec file -- enters the Condor queue; the NMI Build & Test software expands each spec via DAGMan into a DAG of build/test jobs, which run on the distributed build/test pool; OUTPUT -- results flow into a MySQL results DB, browsable through a web portal, and finished binaries are published.

Numbers › 100 CPUs › 39 HW/OS “platforms” › 34 OS versions › 9 HW architectures › 3 sites › ~100 GB of results per day › ~1400 builds/tests per month › ~350 Condor jobs per day

Condor Build & Test › Automated Condor Builds Two (sometimes three) separate Condor versions, each automatically built nightly using NMI across our platforms Stable, developer, and special release branches › Automated Condor Tests Each nightly build’s output becomes the input to a new NMI run of our full Condor test suite › Ad-Hoc Builds & Tests Each Condor developer can use NMI to submit ad-hoc builds & tests of their experimental workspaces or CVS branches to any or all platforms

More Condor Testing Work Advanced Test Suite Using binaries from each build, we deploy an entire self-contained Condor pool on each test machine Runs a battery of Condor jobs and tests to verify critical features Currently >150 distinct tests each executed for each build, on each platform, for each release, every night Flightworthy Initiative Ensuring continued “core” Condor scalability, robustness NSF funded, like NMI Producing new tests all the time

NMI Build & Test Customers › NMI Build & Test Facility was built to serve all NMI projects › Who else is building and testing? Globus NMI Middleware Distribution many “grid” tools, including Condor & Globus Virtual Data Toolkit (VDT) for the Open Science Grid (OSG) 40+ components Soon TeraGrid, NEESgrid, others…

Build & Test Beyond NMI › We want to integrate with other, related software quality projects, and share build/test resources... an international (US/Europe/China) federation of build/test grids… Offer our tools as the foundation for other B&T systems Leverage others’ work to improve our own B&T service

OMII-UK › Integrating software from multiple sources: established open-source projects; commissioned services & infrastructure › Deployment across multiple platforms: verify interoperability between platforms & versions › Automated software testing is vital for the Grid: build testing (cross-platform builds), unit testing (local verification of APIs), deployment testing (deploy & run a package), distributed testing (cross-domain operation), regression testing (compatibility between versions), stress testing (correct operation under real loads) › Distributed testbed: needs breadth & variety of resources, not raw power; needs to be a managed resource, with a process

NMI/OMII-UK Collaboration › Phase I: OMII-UK developed automated builds & tests using the NMI Build & Test Lab at UW- Madison › Phase II: OMII-UK deployed their own instance of the NMI Build & Test Lab at Southampton University Our lab at UW-Madison is well and good, but some collaborators want/need their own local facilities. › Phase III (in progress): Move jobs freely between UW and OMII-UK B&T labs as needed.

Next: ETICS › Partner contributions: build system, software configuration, service infrastructure, dissemination, EGEE, gLite, project coordination › software configuration, service infrastructure, dissemination › web portals and tools, quality process, dissemination, DILIGENT › test methods and metrics, unit testing tools, EBIT › NMI Build & Test Framework, Condor, distributed testing tools, service infrastructure

ETICS Project Goals › ETICS will provide a multi-platform environment for building and testing middleware and applications for major European e-Science projects › “Strong point is automation: of builds, of tests, of reporting, etc. The goal is to simplify life when managing complex software management tasks” One button to generate finished package (e.g., RPMs) for any chosen component › ETICS is developing a higher-level web service and DB to generate B&T jobs -- and use multiple, distributed NMI B&T Labs to execute & manage them This work complements the existing NMI Build & Test system and is something we want to integrate & use to benefit other NMI users!

ETICS Web Interface

OMII-Japan What They’re Doing “…provide service which can use on-demand autobuild and test systems for Grid middlewares on on-demand virtual cluster. Developers can build and test their software immediately by using our autobuild and test systems” Underlying B&T Infrastructure is NMI Build & Test Software

This was a Lot of Work… But It Got Easier Each Time › Deployments of the NMI B&T Software with international collaborators taught us how to export Build & Test as a service. › Tolya Karp: International B&T Hero Improved (i.e., wrote) NMI install scripts Improved configuration process Debugged and solved a myriad of details that didn’t work in new environments

What This Means For You › NMI B&T Lab Deployment Experience + Improved Packaging + Improved Portability… › We now have the unique ability to give you not only source code, but a whole production build & test infrastructure to go along with it › … and we have done it for a number of users already

New Condor+NMI Users › Yahoo First industrial user to deploy NMI B&T Framework to build/test custom Condor contributions › Hartford Financial Deploying it as we speak…

What’s to Come › More US & international collaborations OMII-Europe More Industrial User/Developers… › New Features Becky Gietzel: parallel testing! Major new feature: multiple co-scheduled resources for individual tests Going beyond multi-platform testing to cross-platform parallel testing › UW-Madison B&T Lab: ever more platforms “it’s time to make the doughnuts” Questions?