Published by Cassandra Loraine Gibson. Modified over 9 years ago.
1
Peter Couvares
Computer Sciences Department
University of Wisconsin-Madison
pfc@cs.wisc.edu
Metronome and The NMI Lab: This subtitle included solely to steal the “longest title” award from Ewa, who thought she won it this morning with, “Pegasus and DAGMan: From Concept to Execution Mapping Scientific Workflows onto the National Cyberinfrastructure”
2
CondorProject.org
Decision Time
› Past
  Quick Review: why, what, who
› Present
  Current status, new this year
› Future
  Future plans, new next year
3
Why: The Problem
› Good distributed computing (“grid”) software is…
  badly needed
  hard to find
  hard to build and test
4
The Fix (Part of it, anyway)
› Good build/test cycle
› To be good, build/test process must be…
  frequent
  reliable
  automatic
  repeatable
5
The (Next) Problem
› Building and testing distributed computing software requires…
  Distributed resources
    Not always in-house, not always dedicated to builds
    I.e., shared, scheduled resources
    Unless you have a spare Blue Gene lying around… and an old Alpha running RedHat 7.2… and an HPUX 11 box… and an Itanium running Scientific Linux 3 (CERN-flavored)… and…
  Distributed testbeds, tests
    Not: “the grid works on my machine… ship it!”
6
Grid Build and Test
› Building and testing distributed computing software brings distributed challenges…
  Complex workflows, cross-site/project/user scheduling priorities, data management, fault-tolerance, failure recovery
  A lot like “real” distributed computing
  Tinderbox or the latest Web 2.0 build system doesn’t cut it
› Deep, integrated software stacks
  Distributed providers
7
How We Do It
› Use proven grid software to build and test new grid software
› “Condor works, let’s use Condor”
› Metronome is our second-generation build/test framework built on top of Condor, DAGMan, and other distributed computing technologies
› NSF-funded
8
Metronome Principles
› Tool-independent
› Lightweight
› Encourage explicit, well-controlled build/test environments
› Central results repository
› Fault-tolerance
› Support platform-neutral and platform-specific tasks
› Build/test separation
9
Metronome
[Architecture diagram; only its labels survive.]
  INPUT: Customer Source Code, Customer Build/Test Scripts, Spec File
  Components: NMI Build & Test Software, DAGMan, Condor Queue, Distributed Build/Test Pool
  Flow labels: DAG, build/test jobs, results
  OUTPUT: MySQL Results DB, Web Portal, Finished Binaries
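The labels on this slide suggest a flow in which a spec file is turned into a DAG of build/test jobs whose results are then collected. The following is a minimal Python sketch of that idea only, not Metronome's actual code or spec-file format. The job names, the platform strings, and the `build_dag`/`run_order` helpers are all hypothetical, and the standard-library `graphlib` module stands in for DAGMan's dependency ordering.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_dag(platforms):
    """Map each job to the set of jobs it depends on.

    Hypothetical layout: one platform-neutral fetch step, then a
    separate build job and test job per platform, then one report step.
    """
    dag = {"fetch_source": set()}
    for p in platforms:
        dag[f"build:{p}"] = {"fetch_source"}      # builds need the source
        dag[f"test:{p}"] = {f"build:{p}"}         # tests need that platform's build
    dag["report_results"] = {f"test:{p}" for p in platforms}
    return dag


def run_order(dag):
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


platforms = ["x86_rhel_3", "hpux_11"]  # hypothetical platform names
dag = build_dag(platforms)
order = run_order(dag)
for job in order:
    print(job)
```

Keeping build and test as separate nodes mirrors the "build/test separation" principle: a test job can be rerun or retargeted without repeating the build it depends on.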
10
NMI Lab
› Dedicated, heterogeneous distributed computing facility
  Opposite extreme from typical “cluster” -- instead of 1000s of identical CPUs, we have a handful of CPUs each for 50+ platforms.
  Much harder to manage! You try finding a monitoring tool that works on 50 platforms!
› Carefully controlled resources
  No mystery meat
11
The Team
› Subset of the Condor Team
  Becky Gietzel, master of all things NMI
  Todd Miller, new guy on the block
  Andy Pavlo, part-timer, short-timer
  Ken Hahn, sysadmin to the stars
  Me
12
Dogfood and Hats
› Eating our own dogfood…
  Condor builds failed last weekend (true!)
  Condor developers complained to NMI Lab (“your build system failed… fix it!”)
  NMI Lab discovered Condor bug (“hmm…”)
  NMI Lab complained to Condor developers (“your software failed… fix it!”)
› Feel the love!
13
The Past Year: What We Did on Our Summer Vacation
14
New Name!
› Before: NMI Build & Test System, NMI Build & Test Software, NMI Build & Test Framework, NMI Software, NMI Build & Test Lab, UW-Madison Build & Test Lab, Build & Test Lab at UW-Madison
› After: Metronome + the NMI Lab
› Why?
  Old names were a mouthful
  Clear separation between the software framework (Metronome) and the facility (the NMI Lab)
15
Real Work
› Extremely Productive Collaborations
  TeraGrid: production Metronome deployment using dynamically provisioned resources
  ETICS, OMII: building higher-level services to generate and manage build/test jobs across an international federation of Metronome deployments
› Extremely Productive Users
  Condor, TeraGrid, Open Science Grid / VDT, Globus, NCSA (MyProxy), SDSC (SRB), LIGO, many others in this room…
16
New Metronome Capabilities
› “Productization”, customization for other sites
› Parallel testing
  Enables dynamic, co-scheduled, distributed testbeds!
› Automatic cross-site job migration
  Run your own local Metronome pool with access to ours for exotic platforms
› Many smaller features and extensions for production users -- users drive development
› More bugs fixed than introduced!
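The cross-site migration bullet describes running jobs on a local pool while reaching the NMI Lab for platforms the local site lacks. The slide does not say how that choice is made, so the Python sketch below is only an illustration of the idea. The `choose_pool` function, the pool labels, and the platform names are hypothetical and are not part of Metronome or Condor.

```python
def choose_pool(platform, local_platforms, remote_platforms):
    """Prefer the local pool; fall back to the remote lab otherwise.

    Returns "local", "remote", or None when neither site has the platform.
    """
    if platform in local_platforms:
        return "local"
    if platform in remote_platforms:
        return "remote"
    return None  # no known pool offers this platform


# Hypothetical platform inventories for a local site and a remote lab.
local = {"x86_64_linux"}
remote = {"x86_64_linux", "hpux_11", "ia64_linux"}

for p in ["x86_64_linux", "hpux_11", "vax_vms"]:
    print(p, "->", choose_pool(p, local, remote))
```

Checking the local pool first keeps common platforms on a site's own hardware, so the shared lab is only used for the exotic ones.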
17
New NMI Lab Capabilities
› More platforms
  “always with the platforms…”
  new Itanium platforms, NLOTW (New Linux of the Week), additional vendor Unix machines, etc.
  Now over 50 (!) platforms
› Improved Lab Management
  No, not me… better design and automation of systems & their administration
18
Future
19
The Plan: Metronome
› “Support, maintain, enhance”
  VM--I mean slot--no wait, I mean VM support
  Enhanced parallel testing support
  Custom testbed environments (network, etc.)
  Dynamic deployments (glide-in)
  Advanced scheduling policies
  Scalability testing enhancements
  Better docs/installation/management
20
The Plan: NMI Lab
› “Support, maintain, enhance”
  More platforms, always with the platforms
  More capacity
  VM servers for…
    Root-level testing
    On-demand platforms
  Federation with other Metronome labs
  Better support, smoother management, less downtime
  New sysadmin starting in June: take a bow, Ross!
21
You
› Want to use it?
› Metronome
› The NMI Lab
› http://nmi.cs.wisc.edu/
22
Feedback
› When we started, the state of the art was unimpressive (almost nonexistent)… we had to build our own
› More build tools now exist -- if you know & like one of them, what do you like about it?
› We’d like to better understand what we do well, what we don’t, and how we can integrate with other systems you find useful…