Data Management Overview
David M. Malon, Argonne
NSF/DOE Review of U.S. ATLAS Physics and Computing Project
NSF Headquarters, 20 June 2002
Outline
- Technology transition
- Architecture and design
- Support for data challenges
- LHC-wide common projects
- Database support for detector description
- Other collaborative efforts
- Summary
Technology transition
- Objectivity/DB is the persistence technology for Data Challenge 0, but will be phased out this year
  - Retained as a reference implementation for a short time thereafter
  - No Objectivity support to be provided by the database group after December 2002
- Technology strategy is to adopt the LHC-wide common persistence infrastructure (hybrid relational and ROOT-based streaming layer) as soon as this is feasible
- A U.S.-developed ROOT-based conversion service provides the persistence technology for at least Phase 1 of Data Challenge 1
  - This, too, will be phased out when the common project software is sufficiently capable
Technology transition (2)
- ATLAS architectural separation of transient and persistent representations has meant that the transition has been relatively painless for physicists and physics software developers
- Not so painless for the database group, partly because of the need to support multiple technologies simultaneously with limited personpower
  - But AthenaROOT conversion services provide valuable prototyping for LHC common project work
  - Short-term problem in any case
- Complicated by need to support data access inside and outside Athena
  - Remember that Geant3 simulations are still in FORTRAN
- When ADL (see David Quarrie’s talk) is more mature, such transitions should be substantially easier
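The transient/persistent separation described on this slide can be illustrated with a minimal sketch. Everything named here (`Track`, `Converter`, `JsonConverter`, `StructConverter`, `round_trip`) is hypothetical and chosen for illustration only; it is not the Gaudi/Athena conversion service interface, and JSON and packed binary merely stand in for real persistence technologies such as Objectivity/DB or ROOT.

```python
import json
import struct
from dataclasses import asdict, dataclass
from typing import Protocol


@dataclass
class Track:
    """Transient representation: the only type client code ever sees."""
    pt: float
    eta: float


class Converter(Protocol):
    """Hypothetical converter interface (an assumption, not the Athena API)."""

    def to_persistent(self, obj: Track) -> bytes: ...

    def to_transient(self, blob: bytes) -> Track: ...


class JsonConverter:
    """Stand-in for one persistence technology (text-based)."""

    def to_persistent(self, obj: Track) -> bytes:
        return json.dumps(asdict(obj)).encode("utf-8")

    def to_transient(self, blob: bytes) -> Track:
        return Track(**json.loads(blob.decode("utf-8")))


class StructConverter:
    """Stand-in for a second technology (two little-endian doubles)."""

    _layout = struct.Struct("<dd")

    def to_persistent(self, obj: Track) -> bytes:
        return self._layout.pack(obj.pt, obj.eta)

    def to_transient(self, blob: bytes) -> Track:
        return Track(*self._layout.unpack(blob))


def round_trip(track: Track, converter: Converter) -> Track:
    """Client code: unchanged no matter which converter is plugged in."""
    return converter.to_transient(converter.to_persistent(track))
```

Swapping `JsonConverter()` for `StructConverter()` in a call to `round_trip` changes the stored bytes but not the client code, which is the property that lets a backend be replaced without touching physics software.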
Architecture and design
- U.S.-led effort produced an event store architecture document last fall
- Since last review, a U.S.-led effort produced a hybrid (relational/streaming) event store design document, using the architecture document as a starting point
  - Represents the most detailed thinking by any of the LHC experiments about how to build a hybrid store
  - Circulated to other LHC software architects, and the principal subject of an April database workshop in Orsay
    - CERN IT/DB and ROOT team experts attended as well
  - Not all of the ideas will survive an LHC-wide common project, but many will, and they provide a non-trivial starting point for LHC-wide discussions in any case
Support for data challenges
- Data Challenge 0 is not yet complete, while Phase 0 of Data Challenge 1 is well underway(!)
  - It’s a good thing, though, that ATLAS is not just declaring success and closing DC0 without true continuity tests; these should be the true legacy of DC0 (acceptance tests for later user releases)
- Database group is supporting two different persistence technologies (Objectivity/DB and AthenaROOT) for these data challenges
- Also supporting event generation for both data challenges, to different extents
- Seizing the opportunity to introduce grid project technologies into ATLAS data challenges
  - Magda from PPDG
  - Virtual data ideas from GriPhyN in event generation and simulation recipes, even in advance of the release of the GriPhyN VDL toolkit(!)
Support for data challenges (2)
- U.S. database group has been trying to avoid getting dragged into still more data challenge responsibilities, which is not so easy
- CERN-based database effort (Goossens, Smirnov), though, has been largely lost to the data challenges
  - ATLAS Data Challenge Coordinator (Poulard) is also the CERN group leader
LHC-wide common projects
- First RTAG (Requirements Technical Assessment Group) commissioned by SC2 was asked to find sufficient common ground for an LHC-wide project to deliver a shared persistence infrastructure
  - RTAG membership: Brun (ROOT), Duellmann (IT/DB), Innocente (CMS), Malon (ATLAS; convenor), Mato (LHCb), Rademakers (ALICE)
- Succeeded in producing a document and achieving consensus sufficient to launch a common project
  - Final report delivered 5 April 2002
  - Proposes ROOT-based streaming layer plus a relational database layer
- Persistence project launch workshop held 5-6 June 2002 at CERN
- Quarrie and Malon also represent ATLAS in the LCG Architects Forum (and there are other RTAGs)
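One way to picture a streaming layer combined with a relational layer is sketched below, under loudly stated assumptions: an in-memory SQLite database stands in for the relational layer, a byte stream of JSON payloads stands in for ROOT streaming, and the table name `event_index`, its columns, and both function names are hypothetical. The slides do not specify the actual design, so this shows only the general division of labor: bulk data is streamed, and the relational side holds metadata plus a reference to where each object lives.

```python
import io
import json
import sqlite3


def write_events(events, stream, db):
    """Append each event payload to the stream and index it relationally."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS event_index ("
        "event_id INTEGER PRIMARY KEY, n_tracks INTEGER, "
        "offset INTEGER, length INTEGER)"
    )
    for event in events:
        payload = json.dumps(event).encode("utf-8")
        offset = stream.seek(0, io.SEEK_END)  # always append at the end
        stream.write(payload)
        db.execute(
            "INSERT INTO event_index VALUES (?, ?, ?, ?)",
            (event["event_id"], len(event["tracks"]), offset, len(payload)),
        )
    db.commit()


def select_events(stream, db, min_tracks):
    """Query metadata relationally, then read only the matching payloads."""
    rows = db.execute(
        "SELECT offset, length FROM event_index "
        "WHERE n_tracks >= ? ORDER BY offset",
        (min_tracks,),
    ).fetchall()
    selected = []
    for offset, length in rows:
        stream.seek(offset)
        selected.append(json.loads(stream.read(length).decode("utf-8")))
    return selected
```

With `stream = io.BytesIO()` and `db = sqlite3.connect(":memory:")`, a call such as `select_events(stream, db, min_tracks=2)` touches the stream only for events that pass the relational selection.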
LHC common persistence infrastructure (POOL)
- Workshop produced agreement to attempt to meet a rather aggressive schedule: a September 2002 release with non-trivial functionality, and a Spring 2003 release sufficient to support serious data challenges
- ATLAS database group is fully committed to contributing to this effort and to adopting this technology
  - Plan is to introduce an Athena conversion service based upon POOL immediately after its release
- U.S. will contribute approximately 2 FTEs to this effort, if funding permits; more, if possible, as Objectivity responsibilities wane
- Orsay has expressed an intention to contribute ~1 FTE
U.S. contributions to POOL
- We are attempting to avoid mission creep in the common project as well, by participating in selected, clearly defined work packages
  - Common project event collections and collection management (ANL)
  - Persistence for non-ROOT objects (BNL)
- Craig Tull (LBNL) is also contributing to dictionary effort (strongly related to ADL work)
- Both ANL and BNL (Malon, Adams) will continue to contribute to overall common project architecture and design
- We have already established liaisons with Fermilab-based CMS contributors to the common project (Joshi, Tanenbaum)
- Hope to reuse, in the queryable event collection effort, ideas from the HENP Grand Challenge project (ANL, BNL, LBNL), which delivered order-optimized iteration over event data for STAR
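The slide does not say how the Grand Challenge software ordered its iteration, so the following is only a simplified illustration of the general idea, assuming that "order-optimized" means delivering the events selected by a query in storage order rather than query order. The event references as `(file name, entry)` pairs, the file names, and both function names are hypothetical.

```python
from operator import itemgetter


def count_file_opens(refs):
    """Count how often iteration must switch to a different file."""
    opens = 0
    current = None
    for file_name, _entry in refs:
        if file_name != current:
            opens += 1
            current = file_name
    return opens


def order_optimized(refs):
    """Reorder (file, entry) references so each file is visited once,
    with entries inside a file read in ascending order."""
    return sorted(refs, key=itemgetter(0, 1))


# References in the order a query happened to return them (made-up data).
query_order = [("b.root", 3), ("a.root", 1), ("b.root", 0), ("a.root", 7)]
storage_order = order_optimized(query_order)
```

Here `count_file_opens(query_order)` is 4 while `count_file_opens(storage_order)` is 2. When files must be staged from mass storage, that difference is where the savings would come from.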
Database support for detector description
- Most database effort to date has been directed toward event store, but detector description data can no longer be ignored
- Persistence of detector description read from Zebra output of simulation already possible in Objectivity/DB
- Real support needed for September ATLAS release
- U.S. will deliver access to “primary numbers” (numbers that parameterize the ATLAS geometry description) via a conversion service that respects the Gaudi/Athena architecture
  - Numbers are resident in a MySQL database
  - Approach strongly leverages NOVA work, funded at BNL as an LDRD project
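As a rough sketch of what access to primary numbers through a conversion step might look like, the example below reads named numbers from a relational table and builds a transient object from them. It rests on several assumptions: SQLite stands in for MySQL so that the sketch is self-contained, and the table `primary_numbers`, the parameter names, the values, and the class `BarrelParameters` are all invented for illustration. None of this reflects the NOVA schema or the Athena interfaces.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class BarrelParameters:
    """Hypothetical transient object built from primary numbers."""
    inner_radius_mm: float
    n_layers: int


def demo_database() -> sqlite3.Connection:
    """Build an in-memory stand-in database with made-up numbers."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE primary_numbers (name TEXT PRIMARY KEY, value REAL)")
    db.executemany(
        "INSERT INTO primary_numbers VALUES (?, ?)",
        [("inner_radius_mm", 50.5), ("n_layers", 3)],  # illustrative values only
    )
    db.commit()
    return db


def load_barrel_parameters(db: sqlite3.Connection) -> BarrelParameters:
    """Conversion step: relational rows in, transient object out."""
    numbers = dict(db.execute("SELECT name, value FROM primary_numbers"))
    return BarrelParameters(
        inner_radius_mm=float(numbers["inner_radius_mm"]),
        n_layers=int(numbers["n_layers"]),
    )
```

Geometry code would then depend only on `BarrelParameters`, not on SQL or on the database product behind it.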
Other collaborative efforts: conditions databases
- Strategy has been to use IT-provided conditions database if possible, rather than writing such a service ourselves
  - IT implementations, though, are in Objectivity/DB and Oracle9i
- Lisbon ATLAS group has delivered a MySQL implementation for the TDAQ community; we have enlisted their help, and encouraged them to contribute this to the LCG repository (done!)
- U.S. database group has refrained from work in this area in an effort to avoid overcommitment, but real work is needed soon to connect conditions infrastructure to Athena
  - Have asked the Orsay group to do this
Summary
- In the midst of a major technology transition while supporting data challenges, with no reserve personpower
  - No replacement for Ed Frank (Chicago), and U.S. database group is funded thus far well below agency guidance
- Committed to ensuring the success of the LCG persistence project, and to using the resulting infrastructure as the principal ATLAS persistence technology
- Relying upon joint projects and leveraging other projects wherever possible: LCG (POOL), CERN IT (conditions), PPDG (Magda), GriPhyN (virtual data), LDRD (NOVA for primary numbers), HENP Grand Challenge (POOL event collections and iterators), …