Published by Emil Reeves. Modified over 9 years ago.
1
Keeping Your Software Ticking: Testing with Metronome and the NMI Lab
2
Background: Why (In a Slide!)
Grid Software: Important to Science and Industry
Quality of Grid Software: Not So Much
Testing: Key to Quality
Testing Distributed Software: Hard
Testing Distributed Software Stacks: Harder
Distributed Software Testing Tools: Nonexistent (before)
We Needed Help, We Built Something to Help Ourselves and Our Friends, We Think It Can Help Others
3
Background: What (In a Slide!)
A Framework and Tool: Metronome
– Lightweight, built atop Condor, DAGMan, and other proven distributed computing tools
– Portable, open source
– Language/harness independent
– Assumes >1 user, >1 project, >1 environment needing resources at >1 site
– Encourages explicit, well-controlled build/test environments for reproducibility
– Central results repository
– Fault-tolerant
– Encourages build/test separation
A Facility: The NMI Lab
– 200+ cores, 50+ platforms @ UW (Noah’s Ark; the Anti-Cluster)
– Built to use distributed resources at other sites, grids, etc.
– 200 users, dozens of registered projects (most of them “real”)
– 84k builds & tests managed by 1M Condor jobs, producing 6.5M tracked tasks in the DB
A Team
– Subset of Condor Team: Becky Gietzel, Todd Miller, Ross Oldenburg, myself. (More coming.)
A Community
– Working with TeraGrid, OSG, ETICS, others towards a common intl. build/test infrastructure.
4
Metronome Architecture (In a Slide!)
[Diagram. Labels: INPUT, OUTPUT, Customer Source Code, Customer Build/Test Scripts, Spec File, Metronome, DAGMan, DAG, build/test jobs, Condor Queue, Distributed Build/Test Pool, results, MySQL Results DB, Web Status Pages, Finished Binaries.]
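The architecture slide shows a spec file being turned into a DAG whose nodes run as build/test jobs. As a rough illustration of what a DAG-ordered workflow means, here is a minimal Python sketch (3.9+, standard library only). The task names and the dictionary layout are hypothetical; this is not Metronome's spec-file format or DAGMan's input syntax.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
# The names are illustrative only, not Metronome spec-file keywords.
workflow = {
    "fetch_source": set(),
    "build": {"fetch_source"},
    "unit_tests": {"build"},
    "package": {"build"},
    "report": {"unit_tests", "package"},
}


def run_order(dag):
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


order = run_order(workflow)
print(order)
```

Tasks with no dependency between them (here `unit_tests` and `package`) may appear in either order, which is what lets a scheduler run them in parallel on different machines.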
5
Why Is This Architecture Powerful?
Fault tolerance, resource management.
Real scheduler, not a toy or afterthought.
Flexible workflow tools.
Nothing to deploy in advance on worker nodes except Condor – can harness “unprepared” resources.
Advanced job migration capabilities – critical for goal of a common build/test infrastructure across projects, sites, countries.
6
Example: NMI Lab / ETICS Site Federation with Condor-C
7
10k Foot View
Past:
– humble beginnings, ragtag crew of developers making building & testing easier for the projects around them (Condor, Globus, VDT, TeraGrid...)
Present:
– now we have tax money and users should have higher expectations
– good news: six months into a new 3y funding cycle, our "professionalism" has improved from our humble beginnings -- better hardware, better processes, better staffing
– bad news: we’re still a bit ragtag -- inconsistent support/development request tracking, inconsistent info on resource/lab improvements, issues, and resolution, generally reactive to problems
– we're clearly contributing to the build & test capabilities of the community, but we’d like to deliver much more, especially WRT testing.
8
10k Foot View: Future
Maintain Metronome and the NMI Lab
– continue to professionalize lab infrastructure, improve availability, stability, uptime
– Better monitoring -> more proactive response to issues
– Better scheduling of jobs, better use of VMs to respond to uneven x86 platform demand
Enhance Metronome and the NMI Lab
– New features, new capabilities – but might be less important than clarity, usability, fit & finish of existing features.
9
10k Foot View: Future
Support Metronome and the NMI Lab
– more systematic support operation (ticketing, etc.)
– more utilization of basic testing capabilities by new users
– more utilization of advanced testing capabilities by existing users
– more & better information for users, admins, and pointy-haired bosses: better reporting on users, resources, usage, operations, etc.
Nurture Distributed Software Testing Community
– to identify common B&T needs to improve software quality.
– to challenge and help us to provide software & services to help meet B&T needs.
– Tuesday’s meeting was a good start, I hope…
10
Maslow’s Pyramid of Testing Needs
11
Testing Opportunities
more resources == more possibilities (just like science)
– don’t just test under normal conditions, test the not-so-edge cases too (e.g., with CPU load!)
– test everywhere your users run, not just where you develop
– old/exotic/unique resources you don’t own (NMI Lab, TeraGrid)
“black box”
– run your existing tinderbox, etc. test harness inside Metronome
decoupled builds & tests
– run new tests on old builds
– cross-platform binary compatibility testing
– run quick smoke tests continuously, heavy tests nightly, performance/scalability tests before release
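To make the "decoupled builds & tests" idea concrete, the sketch below pairs stored builds with tests chosen by tier, so a newly written test can run against a build produced earlier. Everything here is a hypothetical illustration in Python: the `Build` and `Check` classes, the trigger names, and the tier mapping are assumptions, not part of Metronome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Build:
    build_id: str   # identifier of an already finished build (hypothetical)
    platform: str


@dataclass(frozen=True)
class Check:
    name: str
    tier: str       # "smoke", "heavy", or "performance"


# Assumed policy, following the slide: smoke tests continuously,
# heavy tests nightly, performance tests before a release.
TIERS_BY_TRIGGER = {
    "commit": {"smoke"},
    "nightly": {"smoke", "heavy"},
    "release": {"smoke", "heavy", "performance"},
}


def plan_runs(builds, checks, trigger):
    """Pair every stored build with every check whose tier the trigger selects."""
    tiers = TIERS_BY_TRIGGER[trigger]
    return [(b.build_id, c.name) for b in builds for c in checks if c.tier in tiers]


builds = [Build("build-001", "linux-x86_64"), Build("build-002", "linux-x86_64")]
checks = [
    Check("starts_up", "smoke"),
    Check("full_regression", "heavy"),
    Check("scaling", "performance"),
]

commit_plan = plan_runs(builds, checks, "commit")
nightly_plan = plan_runs(builds, checks, "nightly")
print(commit_plan)
```

Because the plan is computed from stored builds rather than from a fresh compile, adding a new `Check` later automatically schedules it against `build-001` as well as newer builds.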
12
Testing Opportunities
managed (static) vs. “unmanaged” (auto-updating) platforms
– isolate your changes from the OS vendors
– test your changes against a fixed target
– test your working code against a moving target
root-level testing
automated reports from testing tools
– Valgrind, Purify, Coverity, etc.
cross-platform binary testing (build on A, test on B)
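The "build on A, test on B" item amounts to enumerating ordered pairs of platforms. A small Python sketch, assuming placeholder platform names and assuming every pair is worth testing (in practice one would probably filter to pairs that are expected to be binary compatible):

```python
from itertools import product

# Placeholder platform names, not real NMI Lab platform identifiers.
platforms = ["platform_a", "platform_b", "platform_c"]


def cross_platform_pairs(platforms, include_same=False):
    """List (build_platform, test_platform) pairs.

    By default the pairs where both platforms are identical are skipped,
    since those are ordinary same-platform tests.
    """
    return [
        (build_on, test_on)
        for build_on, test_on in product(platforms, repeat=2)
        if include_same or build_on != test_on
    ]


pairs = cross_platform_pairs(platforms)
for build_on, test_on in pairs:
    print(f"build on {build_on}, test on {test_on}")
```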
13
Testing Opportunities
Parameterized dependencies
– build with multiple library versions, compilers, etc.
– test against every Java VM, Maven, Ant version around
– test against different DBs (MySQL, Postgres, Oracle, etc.), VM platforms (Xen, VMware, etc.), batch systems
– make sure new versions of Condor, Globus, etc. don’t break your code
Parallel scheduled testbeds
– cross-platform testing (A to B)
– deploy software stack across many hosts, test whole stack
– multi-site testing (US to Europe)
– network testing (cross-firewall, low-bandwidth, etc.)
– scalability testing
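Parameterized dependencies can be pictured as a matrix: one build/test run per combination of dependency choices. The Python sketch below expands such a matrix. The axis names and the compiler/JVM labels are placeholders chosen for illustration; only the database names come from the slide, and none of this reflects how Metronome itself declares prerequisites.

```python
from itertools import product

# Hypothetical axes; "compiler-v1", "jvm-a", etc. are placeholder labels.
axes = {
    "compiler": ["compiler-v1", "compiler-v2"],
    "database": ["MySQL", "Postgres", "Oracle"],
    "jvm": ["jvm-a", "jvm-b"],
}


def expand_matrix(axes):
    """Return one dict per combination of values, one key per axis."""
    names = list(axes)
    return [
        dict(zip(names, combo))
        for combo in product(*(axes[name] for name in names))
    ]


matrix = expand_matrix(axes)
print(len(matrix), "configurations")
```

The number of configurations is the product of the axis sizes (2 × 3 × 2 = 12 here), which grows quickly as axes are added and is one reason a large shared pool of machines helps.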
14
Upshot
This is all work we’d like to help this community do.
Start small -- automated builds are an excellent start.
Think big -- what kinds of testing would pay dividends?
Let us know what we can do to help make it happen.