Published by Deirdre Chandler. Modified over 9 years ago.
1
ATLAS Offline Database Architecture for Time-varying Data, with Requirements for the Common Project
David M. Malon
LCG Conditions Database Workshop
CERN, Geneva, Switzerland
8 December 2003
2
David M. Malon, ANL, LCG Conditions Database Workshop
Architectural principles
All data with a time interval (or run interval) of validity are managed via the same temporal database infrastructure.
  Sometimes people distinguish between conditions, configurations, and other kinds of detector description, but (offline) users see no difference in the machinery one uses to get the conditions or the configuration in effect when an event was taken.
We refer to the underlying database infrastructure as an interval of validity database (IOV database) rather than a conditions database for two reasons:
  so as not to prejudge the types of data accessible via this means, and
  because it is principally a temporal database: conditions data may not reside within this database, but may instead be stored externally to the database hosting the interval-of-validity infrastructure.
3
Architectural principles
It must be possible to assign an interval of validity to any data object accessible to standard execution frameworks (Athena, for ATLAS), independent of the technology used to store that object.
Storing an object and assigning a validity interval to it may be (widely) separated in time.
  Example 1: Online, a configuration may be chosen from a portfolio of stored configurations, each with no inherent interval of validity.
  Example 2: The expert who updates the muon geometry does not know the range of simulation runs for which it will be used.
4
Architectural principles
It must not be necessary to copy an object in order to assign an interval of validity to it.
  Example 1: A reference to the configuration selected online, which may reside in a relational database, is registered with a range of test beam runs as the interval of validity.
  Example 2: A reference to the muon geometry, which may be described in an XML file, is registered with a range of simulation runs as the interval of validity.
  Example 3: If I use the same configuration as in Example 1 or the same geometry as in Example 2 for a later range of runs, I should not need to write it a second time.
5
Registration and mediation
The IOV service is, principally, a registration service and a mediator.
An object may be stored in any supported technology (ROOT, POOL ROOT, MySQL{NOVA}, plain ASCII or XML files, strings, …), and later registered in the IOV database.
  This does not mean that all technology choices are equally sensible for all purposes.
Storing the data object in the temporal database itself is one important possibility, but it is an optimization choice; it must not be a design limitation.
  LHC experiments already know how to store complex objects (in ROOT directly, via POOL, …).
  They should not be required to solve this problem again in order to use an IOV service.
Registration means associating an interval of validity, a tag/version, …, with (a reference to) the object.
6
Mediation
On input, the IOV service mediates access to data, helping to choose the correct instance of a data object: the one with the correct timestamp and tag.
In Athena, the transient IOV service
  1. checks the current event timestamp as appropriate
  2. consults the IOV database to get a reference to the correct version of data
  3. invokes standard Athena conversion services to put conditions objects in the transient store
“Correct” for non-specialists usually means “the endorsed one corresponding to this event”.
  Version/tag information is likely supplied in standard job options.
Both mediated and unmediated access are possible: if one has a “direct” reference to the object of interest, it is not necessary to pass through the IOV database (mediator) to retrieve it.
  One can get Version P of the ATLAS muon geometry without dealing with interval of validity databases; on the other hand, the IOV database would be used to discover that Version P was used for simulation runs [m,n].
  Similar calibration example…
7
[Diagram: a Conditions Data Writer, the IOV Database, and a store of conditions or other time-varying data]
1. Store an instance of data that may vary with time or run or …
2. Return reference to data
3. Register reference, assigning interval of validity, tag, …
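The three writer steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `IovRegistry`, `IovEntry`, `store_as_file`, the folder name, and the `file:` reference prefix are hypothetical and are not the ATLAS or common-project API. The point it shows is that registration stores a string reference, so the same stored object can be registered for a second interval without being written again.

```python
import os
import tempfile
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IovEntry:
    folder: str   # data type, e.g. "/muon/geometry" (hypothetical name)
    since: int    # start of validity, inclusive (assumed convention)
    until: int    # end of validity, exclusive (assumed convention)
    tag: str
    ref: str      # string reference to the externally stored object


@dataclass
class IovRegistry:
    """Toy stand-in for an IOV database: it holds references, not payloads."""
    entries: list = field(default_factory=list)

    def register(self, folder, since, until, tag, ref):
        if since >= until:
            raise ValueError("empty interval of validity")
        entry = IovEntry(folder, since, until, tag, ref)
        self.entries.append(entry)
        return entry


def store_as_file(directory, name, text):
    """Steps 1 and 2: store the payload, return a string reference to it."""
    path = os.path.join(directory, name)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return "file:" + path


workdir = tempfile.mkdtemp()
geometry_ref = store_as_file(workdir, "muon_geometry_P.xml", "<geometry version='P'/>")

# Step 3: registration can happen much later, and more than once,
# without rewriting or copying the stored object.
registry = IovRegistry()
first = registry.register("/muon/geometry", 100, 200, "simulation", geometry_ref)
second = registry.register("/muon/geometry", 500, 600, "simulation", geometry_ref)
```

Both entries carry the same reference string, which is the "no copy" requirement from the architectural principles.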
8
[Diagram: a conditions data client, the Athena Transient Conditions Store, the IOV Database, and conditions data]
1. Folder (data type), timestamp, tag
2. Ref to data (string)
3. Dereference via standard conversion services
4. Build transient conditions object
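The four read steps listed above might look like the following Python sketch. Everything named here is an assumption made for illustration: the `kv:` and `inline:` reference prefixes, the converter functions, and the half-open `[since, until)` convention are not taken from Athena or the common project. The sketch only shows the shape of the flow: a (folder, timestamp, tag) query yields a string reference, and a converter chosen by the reference's technology builds the transient object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IovEntry:
    folder: str
    since: int
    until: int
    tag: str
    ref: str


# Hypothetical converters, keyed by the technology prefix of the reference.
def convert_inline(body):
    return {"source": "inline", "value": body}


def convert_keyvalue(body):
    # body looks like "gain=1.5;offset=0.25"
    pairs = (item.split("=", 1) for item in body.split(";") if item)
    return {key: float(value) for key, value in pairs}


CONVERTERS = {"inline": convert_inline, "kv": convert_keyvalue}


def lookup(entries, folder, timestamp, tag):
    """Steps 1 and 2: (folder, timestamp, tag) -> matching entry, or None."""
    for entry in entries:
        if (entry.folder == folder and entry.tag == tag
                and entry.since <= timestamp < entry.until):
            return entry
    return None


def retrieve(entries, folder, timestamp, tag):
    """Steps 3 and 4: dereference the string and build the transient object."""
    entry = lookup(entries, folder, timestamp, tag)
    if entry is None:
        raise LookupError(f"no {folder!r} entry valid at {timestamp} for tag {tag!r}")
    technology, _, body = entry.ref.partition(":")
    return CONVERTERS[technology](body)


entries = [
    IovEntry("/calo/calib", 0, 100, "production", "kv:gain=1.5;offset=0.25"),
    IovEntry("/calo/calib", 100, 200, "production", "kv:gain=1.75;offset=0.5"),
]
calib = retrieve(entries, "/calo/calib", 150, "production")
```

A client holding a direct reference could call the converter itself and skip `lookup`, which corresponds to the unmediated access described on the mediation slide.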
9
ATLAS development strategy to date
Employ common solutions wherever possible.
ATLAS has contributed requirements and feedback to the common (CERN IT/DB, ATLAS, LHCb, HARP, COMPASS, …) project.
The Lisbon TDAQ group has implemented this interface in MySQL: this is what ATLAS offline uses for its IOV database.
The Athena transient interval-of-validity service checks the current event timestamp, compares it to the validity intervals of already-loaded time-varying objects, and triggers loading of references to time-valid objects when needed.
IOVDbSvc does the loading, allowing standard conversion services to build the transient object from the persistent data.
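The timestamp check performed by the transient service can be pictured as a small cache. The Python sketch below is hypothetical: `TransientIovService`, `fake_loader`, and the folder name are invented for illustration and do not reproduce the Athena or IOVDbSvc interfaces. It shows the idea that the database is consulted only when the current event time falls outside the interval of the object already loaded.

```python
class TransientIovService:
    """Toy cache: reload a folder only when the event time leaves its interval."""

    def __init__(self, loader):
        self._loader = loader   # callable: (folder, timestamp) -> (since, until, obj)
        self._loaded = {}       # folder -> (since, until, obj)
        self.load_count = 0

    def get(self, folder, event_time):
        cached = self._loaded.get(folder)
        if cached is not None:
            since, until, obj = cached
            if since <= event_time < until:
                return obj      # still valid: no database access
        # Not loaded yet, or no longer valid for this event: trigger a load.
        since, until, obj = self._loader(folder, event_time)
        self._loaded[folder] = (since, until, obj)
        self.load_count += 1
        return obj


def fake_loader(folder, timestamp):
    """Stand-in for the database lookup: validity in blocks of 100 time units."""
    since = (timestamp // 100) * 100
    return since, since + 100, f"{folder}@{since}"


service = TransientIovService(fake_loader)
results = [service.get("/pixel/align", t) for t in (10, 20, 99, 100, 150, 30)]
```

With these six event times the loader is called three times: once at the start, once when the time crosses into the second block, and once when it jumps back to the first.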
10
Notes on architecture
The architecture lets one register conditions data stored in ASCII files, ROOT files, MySQL databases, …, in an IOV database.
Is this all that it takes to be “in” the conditions database: do whatever you want, but register your objects in the IOV database? (I hope we’ll do better.)
  We still need to manage all those files coherently, and catalog them.
One can imagine
  Configuration data written to their own files or databases
  DCS data written to their own files or databases in a possibly different way
  Subdetector-specific conditions written to their own files or databases
  Different simulation geometries in different ASCII (XML?) files
  …other partitioning by domain…
…all registered in the same IOV service.
Possible as long as one can represent a “reference” to the data object as a string.
11
Technologies
What about storage technologies for conditions objects themselves?
Anything readable by our frameworks is okay in theory, but what are good choices?
For complex objects, an obvious alternative is to use the same technology that is used for event data: POOL infrastructure, with ROOT as the storage layer.
For small amounts of data, one can imagine storing the data, rather than a reference to the data, in the string (blob) managed by the IOV database.
ATLAS offline has used IOV+NOVA (an ATLAS relational-database-hosted product)
IOV+POOL: expect the common project to support this
IOV+{XML strings}
12
Storage technologies
Since we are using a relational IOV database implementation, using the same relational database is another “obvious” and attractive alternative.
  Are schemas equally obvious? Perhaps this project can find consensus.
One can imagine using the LCG SEAL dictionary for conditions object definitions, and POOL/ROOT (or POOL/{relational database}) as a storage layer.
  This has the advantage that users would describe event and conditions data using exactly the same tools.
Conversely, it is easy to imagine a standard transient mapping (via POOL?) of simple relational table structures; with reasonable “reference” conventions, these could easily be used for data managed by the IOV database.
Sometimes the transient object definition has primacy; sometimes the persistent table schema has primacy: we should support both cases.
13
Beyond intervals of validity
Is it obvious that intervals of validity are the right model for all temporal LHC data?
What about alarms, and periodic measurements?
If I measure pressure at times t0, t1, t2, …, it is entirely artificial to say that the pressure at t1 has an interval of validity [t1,t2).
  At time t in [t1,t2), I am more likely to want the previous and next pressure measurements, or all the pressure measurements in (t-d,t+d).
  No reason to say that the pressure at t1 is the valid one.
Need an extended API: We would like the common project to think about this.
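The two queries named above (previous and next measurement, and all measurements in (t-d,t+d)) are easy to express once samples are kept as a time-ordered series instead of as intervals. The Python sketch below is one possible shape for such an extended API; `MeasurementSeries` and its method names are hypothetical, and the pressure values are made up for the example.

```python
from bisect import bisect_left, bisect_right


class MeasurementSeries:
    """Time-ordered point measurements, with no interval of validity attached."""

    def __init__(self, samples):
        ordered = sorted(samples)
        self._times = [t for t, _ in ordered]
        self._values = [v for _, v in ordered]

    def neighbours(self, t):
        """Return (previous, next): last sample at or before t, first sample after t."""
        index = bisect_right(self._times, t)
        previous = (self._times[index - 1], self._values[index - 1]) if index > 0 else None
        following = (self._times[index], self._values[index]) if index < len(self._times) else None
        return previous, following

    def window(self, t, d):
        """Return all samples with time strictly inside (t - d, t + d)."""
        low = bisect_right(self._times, t - d)
        high = bisect_left(self._times, t + d)
        return list(zip(self._times[low:high], self._values[low:high]))


pressure = MeasurementSeries([(0, 1010), (10, 1012), (20, 1009), (30, 1011)])
before, after = pressure.neighbours(14)
nearby = pressure.window(14, 10)
```

Neither query privileges the sample at t1 as "the valid one"; the caller gets the surrounding measurements and decides how to interpolate or average.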
14
Appendix: tagging extensions
Several people (myself included) expressed concern about the limitations of the current tagging model in the common project interval of validity (“conditions”) software at the 4-5 February 2003 ATLAS database workshop.
The following slides describe a modest proposal to change/extend the tagging interface, beginning with a simplified scenario that motivates this proposal.
Agreed (ATLAS, LHCb: Pere Mato), but extensions not yet implemented.
15
Calibration scenario: Phase 1
A calibration expert is experimenting with a variety of algorithms and algorithm parameters.
After a calibration run, she produces calibration constants using three different algorithms, with an interval of validity that lets her apply them to a range of runs and compare the results.
[Diagram: objects labeled Algorithm 1, Algorithm 2, Algorithm 3; axes: time, “version”]
16
Calibration scenario: Phase 2
After looking at the results, she believes that Algorithm 2 is pretty good, but Algorithm 3 is the best.
After the next calibration run, she therefore computes calibration constants using Algorithm 3, and assigns an interval of validity corresponding to a new range of runs.
[Diagram: objects labeled Algorithm 1, Algorithm 2, Algorithm 3; axes: time, “version”]
17
Calibration scenario: Phase 3
Just to be certain before announcing anything for collaboration-wide use, she computes calibration constants from this latest calibration run using Algorithms 1 and 2, and compares the results when these are applied to the recent runs.
[Diagram: objects labeled Algorithm 1, Algorithm 2, Algorithm 3, Algorithm 1, Algorithm 2; axes: time, “version”]
18
Calibration scenario: Phase 4
She is slightly surprised when it appears that Algorithm 2 is a better choice, and, after looking at her results from the first calibration run, she decides that the Algorithm 2 results are what should be tagged for Production …
… but the two Algorithm 2 objects were NEVER the HEAD: there is nothing she could have done (unless she were prescient) with tools that tag only the HEAD.
[Diagram: objects labeled Algorithm 1, Algorithm 2, Algorithm 3, Algorithm 1, Algorithm 2; axes: time, “version”]
19
Things could have been easy
What she would have liked to do was this:
  When she inserted an object produced by Algorithm N, she wanted to label (tag) it “Algorithm N” at insertion time.
  She may not be a C++ expert, but she could certainly have added the string “Algorithm N” to her argument list inside her Algorithm N code.
How would this work with overlapping intervals?
  Easy: an interval added with a tag splits only intervals with the same tag (and the HEAD, if you like, for you folks who like to trust the HEAD).
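The splitting rule proposed above can be illustrated with a toy store in which each tag owns its own timeline. This Python sketch is an assumption-laden illustration, not the proposed interface: `TaggedIovStore`, the reference strings, and the half-open interval convention are invented here, and the optional HEAD handling mentioned on the slide is left out.

```python
from collections import defaultdict


class TaggedIovStore:
    """Toy store: each tag owns an independent, non-overlapping interval timeline."""

    def __init__(self):
        self._timelines = defaultdict(list)   # tag -> list of (since, until, ref)

    def insert(self, tag, since, until, ref):
        if since >= until:
            raise ValueError("empty interval of validity")
        updated = []
        for old_since, old_until, old_ref in self._timelines[tag]:
            # Keep whatever part of the old interval lies outside the new one.
            if old_since < since:
                updated.append((old_since, min(old_until, since), old_ref))
            if old_until > until:
                updated.append((max(old_since, until), old_until, old_ref))
        updated.append((since, until, ref))
        self._timelines[tag] = sorted(updated)

    def lookup(self, tag, time):
        for since, until, ref in self._timelines.get(tag, []):
            if since <= time < until:
                return ref
        return None


store = TaggedIovStore()
# First calibration run: three algorithms over the same range of runs.
for n in (1, 2, 3):
    store.insert(f"Algorithm {n}", 0, 100, f"calib-run1-alg{n}")
# Second calibration run: Algorithm 3 first, then Algorithms 1 and 2.
for n in (3, 1, 2):
    store.insert(f"Algorithm {n}", 100, 200, f"calib-run2-alg{n}")
# Insertion order no longer matters: Algorithm 2 is retrievable by name.
choice = [store.lookup("Algorithm 2", t) for t in (50, 150)]
# An overlapping insert splits only the intervals that carry the same tag.
store.insert("Algorithm 2", 80, 120, "calib-fix-alg2")
```

After the last insert, the "Algorithm 2" timeline has three pieces while the "Algorithm 1" and "Algorithm 3" timelines are untouched, so the expert in the scenario could tag the Algorithm 2 results for Production regardless of the order in which she ran her algorithms.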
20
Comments on robustness
It is needlessly risky to build a database infrastructure that relies on order of insertion into the database.
  What could the calibration expert have done differently: waited until the Nth calibration run to begin her comparison, making sure she always ran her algorithms in the same order?
  If someone makes a mistake, do we need to be so unforgiving?
Internal versions do not help.
  Even if she kept a log of everything she did, including the order in which she ran her algorithms, she might be able to guess the version numbers when intervals do not overlap; when they do, the situation is hopeless.
  …and it’s worse if she has a colleague exploring alternatives (but people assure me that this will never happen …)
Are we willing to bet our database on this?
21
A question with no context
For some conditions, run ranges are the most natural intervals of validity; for others, time ranges are more natural.
With some work, “real” runs can be associated with time intervals, but for simulation, this requires applying some rather arbitrary and artificial conventions (retroactively, in our case).
Query to other experiments: Would it be useful to have the project support more than one kind of validity “key,” e.g., timestamps and {run,event} ranges, or mapping services between {run,event} and time?
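To make the question concrete, a mapping service between runs and time might look like the following Python sketch. It is purely hypothetical: `RunTimeMap`, its methods, and the run numbers and times are invented for illustration, and it handles only run granularity, not {run,event} pairs.

```python
from bisect import bisect_right


class RunTimeMap:
    """Hypothetical mapping service between run numbers and time intervals."""

    def __init__(self, runs):
        # runs: iterable of (run_number, start_time, end_time), non-overlapping
        self._runs = sorted(runs, key=lambda r: r[1])
        self._starts = [start for _, start, _ in self._runs]
        self._by_number = {number: (start, end) for number, start, end in self._runs}

    def time_range(self, first_run, last_run):
        """Translate an inclusive run range into a half-open time interval."""
        start, _ = self._by_number[first_run]
        _, end = self._by_number[last_run]
        return start, end

    def run_at(self, timestamp):
        """Return the run in progress at timestamp, or None between runs."""
        index = bisect_right(self._starts, timestamp) - 1
        if index < 0:
            return None
        number, start, end = self._runs[index]
        return number if start <= timestamp < end else None


run_map = RunTimeMap([(1001, 0, 50), (1002, 60, 120), (1003, 120, 200)])
interval = run_map.time_range(1001, 1002)
current = run_map.run_at(130)
```

For real data such a table could be filled from run start and stop times; for simulation the start and end values would have to be assigned by convention, which is the artificiality the slide points out.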