Conditions DataBase: Overview of existing projects
Andrea Valassi (CERN IT-DB)
LCG Applications Area Meeting, 11 June 2003

2. What is the ConditionsDB?
(Andrea Valassi, IT-DB. LCG AAM, 11 June 2003. Conditions Database)
The ConditionsDB is a package to handle data that
  – Can be classified into many independent data items
  – VARY IN TIME
  – Can have many different versions (for a given time and data item)
Pere Mato (Feb 2000)

3. History
Feb.2000: “CondDB Interface Specification Proposal” by Pere Mato (LHCb)
Feb.2000–Sep.2000: Requirement collection by Stefano Paoli (IT-DB)
  – Emphasis on functional requirements and definition of C++ API
  – Active participation by many experiments (Harp, Compass, LHCb, Atlas…)
  – Earlier experience in BaBar and RD45 taken into account
Oct.2000–Oct.2001: Objy implementation by Stefano Paoli et al. (IT-DB)
Apr.2001–Oct.2002: Harp and Compass data-taking using Objy CondDB
Mar.2002–Aug.2002: Oracle implementation by Emil Pilecki (IT-DB)
  – Objy → Oracle migration tool developed for Compass and Harp
Jun.2002–Jun.2003: MySQL implementation by Jorge Lima et al. (Atlas)
  – More requirements collected from Atlas users, leading to API extensions
May.2003: “Proposal to bring CondDB into LCG AA” by Pere Mato
Jun.2003: Atlas test beam data-taking using MySQL CondDB

4. Original API and functionality
Objectivity and Oracle implementations

5. Data item classification
CondDB “folder”
  – All condition data of a given item (for various times and versions)
  – Different folders contain measurements of different physical observables
CondDB “folder set”
  – A collection of related folders and/or folder sets
Folders and folder sets are identified by Unix-like path names
  – e.g., folder “/ConditionsDB/SlowControl/Ecal/Module1/Temperature”
      Like a Unix file (with all condition data for a data item for various times/versions)
  – e.g., folder set “/ConditionsDB/SlowControl/Ecal/Module1/”
      Like a Unix directory containing files (folders) and/or other directories (folder sets)
The ConditionsDB was initially optimized to handle time-varying data
  – No emphasis on data hierarchies more complex than directory-like
  – No relational-DB-like functionality for folder addressing
      SQL-like “type=Slow,det=Ecal,Module=1” rather than “/SlowControl/Ecal/Module1/”?
      This may be useful to associate attributes to folders (“magnet on” vs “magnet off”)?
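
The Unix-like naming of folders and folder sets can be illustrated with a small sketch. The helper functions below are hypothetical and are not part of the CondDB API; they only show how a folder path decomposes into components and how the enclosing folder set could be derived from it, assuming "/" as the separator as in the examples on the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the CondDB API): split a Unix-like
// folder path into its non-empty components.
std::vector<std::string> splitFolderPath(const std::string& path) {
    std::vector<std::string> parts;
    std::string current;
    for (char c : path) {
        if (c == '/') {
            if (!current.empty()) parts.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty()) parts.push_back(current);
    return parts;
}

// Hypothetical helper: path of the folder set that contains the given
// folder or folder set. The result always ends with '/', mirroring the
// folder-set notation used on the slide; a top-level entry yields "/".
std::string parentFolderSet(const std::string& path) {
    const std::vector<std::string> parts = splitFolderPath(path);
    std::string parent = "/";
    for (std::size_t i = 0; i + 1 < parts.size(); ++i) {
        parent += parts[i] + "/";
    }
    return parent;
}
```

With these definitions, the folder “/ConditionsDB/SlowControl/Ecal/Module1/Temperature” has five components and its enclosing folder set is “/ConditionsDB/SlowControl/Ecal/Module1/”.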

6. Stored data type: byte stream
Every “CondDBObject” (the atomic unit of storage) has associated:
  – Metadata: (1) folder name, (2) time validity interval, (3) version number
  – Data: a BYTE STREAM
  – Suggested granularity: a chunk of related data values that are measured all at the same time is best stored as a single CondDB object (i.e., in the same folder)
Storing a byte stream allows the flexibility of very different choices
  – Human-readable strings of characters
      Embedded XML strings (LHCb)
      External object refs: Objy OOID (Atlas), RDBMS refs, POOL refs…
      External file names: XML files, ROOT files, ASCII files (Harp)…
  – Streamed vectors of numbers (Harp, Compass), ROOT objects, DATE records…
Storage of complex data structures: limitation of present API?
  – Data values with the same time granularity (data contents of one CondDB object) can be grouped into one XML string, streamed object or external file
  – But API generally lets external frameworks decode internal substructures
      Additional (4th) metadata dimension may be useful (variable name and type)?
      Naturally mapped to RDBMS (user defined folder/table), storage/query optimization
      Steps in this direction in the Lisbon extended API

7. Time validity
Time instances are of CondDBKey type
  – Defined as 64-bit signed integer (_int64, long long)
  – Flexible enough to indicate actual times, run numbers, …
      e.g., store actual times as “nanoseconds since Jan 1, 1970”
        – translate (run#, event#) pairs to nanoseconds using bookkeeping information
Each data block is stored with a [since, till) range
  – Two values (time range) are needed for storing
  – A single value (time instance) is needed for retrieval
      Always together with a folder name and a version/tag
      Can also retrieve an iterator over consecutive intervals (within a tag)
Condition data in CondDB and event data are loosely coupled by design
  – Conditions and events are stored separately and related only by the event time
  – Data synchronization is left to individual frameworks
      Only frameworks have notion of “time of event being reconstructed”
      Usually a global tag would be used
      Should a common data cache synchronization layer be developed?
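
The half-open [since, till) convention can be sketched as follows. This is a minimal illustration, not the CondDB implementation: the struct, its field names and both functions are assumptions made for the example. Only the 64-bit signed key type and the “nanoseconds since Jan 1, 1970” convention come from the slide.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// 64-bit signed integer key, as described on the slide.
using CondDBKey = std::int64_t;

// Illustrative stand-in for a stored data block; field names are assumptions.
struct CondObject {
    CondDBKey since;   // start of validity (inclusive)
    CondDBKey till;    // end of validity (exclusive)
    std::string data;  // opaque byte stream
};

// Convert seconds since Jan 1, 1970 into a nanosecond key.
CondDBKey secondsToKey(std::int64_t seconds) {
    return seconds * 1000000000LL;
}

// Two values are needed to store a block, but a single time instance is
// enough to retrieve it: return the first block whose [since, till) range
// contains 'time', or nullptr if no block is valid at that time.
const CondObject* findValidAt(const std::vector<CondObject>& objects,
                              CondDBKey time) {
    for (const CondObject& obj : objects) {
        if (obj.since <= time && time < obj.till) return &obj;
    }
    return nullptr;
}
```

Because the upper bound is exclusive, two consecutive blocks [0, 10) and [10, 20) never overlap: a lookup at time 10 finds the second block only, and a lookup at time 20 finds nothing.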

8. Versioning and tagging
Different versions of a condition object may exist
  – In the same folder and for the same validity time
  – e.g., the alignment constants may be recalculated several times
Folder tags identify a consistent set in a given folder
  – i.e. a set of blocks such that 0 or 1 block is valid at any given time
  – Similar to CVS tag mechanism
  – Tagged sets are needed by read users (e.g. reconstruction/analysis jobs)
  – Iterators allow retrieving the next object (by validity time) within the tag
Only the HEAD version of a folder can be tagged
  – The HEAD is automatically maintained self-consistent when data are inserted
  – If the new version has a different validity range, previous versions are split into more than one object, one of which may belong to the HEAD
Tags are identified by their name
  – The same name may be used in more than one folder (~ “global tag”)
  – It is also possible to associate a new tag to an existing tagged set (~ “re-tag”)
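
The relation between the HEAD and a tag can be illustrated with a toy model. This is a deliberately simplified sketch and not how the Objy, Oracle or MySQL implementations work: instead of splitting earlier versions, it keeps objects in insertion order and lets the most recently inserted object win, which still guarantees that 0 or 1 block is valid at any given time. The class and method names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using CondDBKey = std::int64_t;

struct VersionedObject {
    CondDBKey since;   // inclusive
    CondDBKey till;    // exclusive
    std::string data;
};

// Toy folder: the HEAD at a given time is the latest inserted object that
// is valid at that time. A tag freezes the HEAD as it was when tagged.
class ToyFolder {
public:
    void store(CondDBKey since, CondDBKey till, const std::string& data) {
        objects_.push_back(VersionedObject{since, till, data});
    }

    // Tag the current HEAD by remembering how many objects existed.
    void tagHead(const std::string& tag) { tags_[tag] = objects_.size(); }

    // Returned pointers are invalidated by a later call to store().
    const VersionedObject* findInHead(CondDBKey time) const {
        return findAmongFirst(objects_.size(), time);
    }

    const VersionedObject* findInTag(const std::string& tag,
                                     CondDBKey time) const {
        std::map<std::string, std::size_t>::const_iterator it = tags_.find(tag);
        if (it == tags_.end()) return nullptr;  // unknown tag
        return findAmongFirst(it->second, time);
    }

private:
    // Search the first 'count' objects, newest first.
    const VersionedObject* findAmongFirst(std::size_t count,
                                          CondDBKey time) const {
        for (std::size_t i = count; i > 0; --i) {
            const VersionedObject& obj = objects_[i - 1];
            if (obj.since <= time && time < obj.till) return &obj;
        }
        return nullptr;
    }

    std::vector<VersionedObject> objects_;
    std::map<std::string, std::size_t> tags_;
};
```

In this model, storing alignment constants for [0, 100), tagging the HEAD, and then storing a recalculated version for [50, 100) changes what the HEAD returns at time 60 while the tagged set keeps returning the original version, which is the behaviour read users such as reconstruction jobs rely on.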

9. Data partitioning (Objy/Oracle)
Advantages of partitioning
  – scalability, archiving, backup, data distribution, …
Partitioning by folder
  – Objy-based: data partitioned on different database files in one FDB
      New folders and folder sets can be associated to separate database files
  – Oracle-based: data partitioned on different table partitions in one DB
      Different folders are automatically stored in separate table partitions
Partitioning by time range
  – Pre-defined partitioning of individual folders by time range not possible
      Was discussed in the past but not yet defined or implemented
Internal partitioning
  – Objy-based: new database files are automatically opened if one becomes full
  – Oracle-based: hash subpartitioning
API foresees user-defined criteria for partitioning
  – Some (too much!) freedom of interpretation exists in the API implementations
  – Should be revisited and enhanced

10. C++ abstract interfaces
ICondDBMgr
  – top-level entry (gives access to data access, tag manager, folder manager)
  – technology-dependent initialization
  – create new CondDB, open existing CondDB
  – start, commit, abort transactions
ICondDBDataAccess
  – store objects
  – find objects by (folder, tag, time)
  – retrieve iterators to browse objects in tag
ICondDBFolderMgr
  – create folders and folder sets
  – retrieve existing folder sets and folders in a folder set
ICondDBTagMgr
  – create, delete, retrieve tags
  – tag HEAD version of a folder
  – associate a new tag to an existing tagged set; untag a tagged set
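
The idea of an abstract data-access interface with interchangeable backends can be sketched as below. The class names, method names and signatures are invented for illustration and do not reproduce the real ICondDBDataAccess interface. In particular, this sketch finds objects by (folder, time) only and leaves out tags, and it uses an in-memory backend where a real one would talk to Objy, Oracle or MySQL.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using CondDBKey = std::int64_t;

struct StoredObject {
    CondDBKey since;   // inclusive
    CondDBKey till;    // exclusive
    std::string data;  // opaque byte stream
};

// Hypothetical, simplified interface: user code depends only on this class,
// never on a concrete storage technology.
class IDataAccessSketch {
public:
    virtual ~IDataAccessSketch() {}
    virtual void storeObject(const std::string& folder, CondDBKey since,
                             CondDBKey till, const std::string& data) = 0;
    // Returns true and fills 'dataOut' if an object is valid at 'time'.
    virtual bool findObject(const std::string& folder, CondDBKey time,
                            std::string& dataOut) const = 0;
};

// Toy in-memory backend standing in for a real database implementation.
class InMemoryDataAccess : public IDataAccessSketch {
public:
    void storeObject(const std::string& folder, CondDBKey since,
                     CondDBKey till, const std::string& data) override {
        folders_[folder].push_back(StoredObject{since, till, data});
    }

    bool findObject(const std::string& folder, CondDBKey time,
                    std::string& dataOut) const override {
        auto it = folders_.find(folder);
        if (it == folders_.end()) return false;
        // Newest insertion first, so the latest version wins.
        for (auto obj = it->second.rbegin(); obj != it->second.rend(); ++obj) {
            if (obj->since <= time && time < obj->till) {
                dataOut = obj->data;
                return true;
            }
        }
        return false;
    }

private:
    std::map<std::string, std::vector<StoredObject>> folders_;
};
```

Code written against a reference to IDataAccessSketch would compile unchanged if InMemoryDataAccess were replaced by another backend class; only the construction of the backend object would differ.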

11. Technology migration
The C++ abstract interface provides an insulation layer
  – Users can write 99% of their code in a technology independent manner
  – Only the initialization parameters are implementation-dependent
  – A technology migration involves only minor changes to the user code
A data-migration tool has been developed
  – Two-step process: export to temporary data file + import from file
  – It uses only the abstract C++ interface for both export and import
      Initially developed by Stefano for Objy → Objy
      Readapted and tested by Emil for Objy → Oracle (Compass migration)
      Easily readaptable to other storage technologies
Run-time choice of API implementation is also possible
  – I tested Oracle/MySQL (some rewriting to avoid name conflicts)
  – Different data can be stored using different backends if desired
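
The two-step export/import approach can be sketched with a toy interface. Everything below is an assumption made for illustration: the interface, the record layout and the whitespace-separated text format are hypothetical and unrelated to the actual migration tool. The point is only that both steps go through the abstract interface, so the source and target backends can be different classes.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

using CondDBKey = std::int64_t;

struct Record {
    std::string folder;
    CondDBKey since;
    CondDBKey till;
    std::string data;
};

// Hypothetical minimal storage interface shared by all backends.
class IStoreSketch {
public:
    virtual ~IStoreSketch() {}
    virtual void store(const Record& r) = 0;
    virtual std::vector<Record> all() const = 0;
};

// Toy backend; a second backend class could be substituted on either side.
class VectorStore : public IStoreSketch {
public:
    void store(const Record& r) override { records_.push_back(r); }
    std::vector<Record> all() const override { return records_; }
private:
    std::vector<Record> records_;
};

// Step 1: export every record to a stream (standing in for the temporary
// file). Toy format: one record per line, fields must not contain spaces.
void exportTo(const IStoreSketch& source, std::ostream& out) {
    for (const Record& r : source.all()) {
        out << r.folder << ' ' << r.since << ' ' << r.till << ' '
            << r.data << '\n';
    }
}

// Step 2: read the records back and store them through the interface.
void importFrom(std::istream& in, IStoreSketch& target) {
    Record r;
    while (in >> r.folder >> r.since >> r.till >> r.data) {
        target.store(r);
    }
}
```

A migration then amounts to calling exportTo on the old backend and importFrom on the new one, with a std::stringstream or a file stream in between.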

12. Example: CondDB in LHCb
No production data are stored in the CondDB yet
Emphasis of prototyping on interface to Gaudi data/conversion services
  – XML strings are stored in the CondDB and must be XML-interpreted
  – Gaudi DataSvc (Cache manager) has a CondDB opaque address
      Folder name from file-based XML detector description e.g.
      Tag name from job options file
      Event time from EventDataSvc
  – Stored data type (here, XML) and classID retrieved from folder “description”
      A second, temporary, opaque address is created of XML type to trigger XML parsing
      Stored XML content can of course reference other file-based or CondDB-based XML
Cache (transient store) synchronization mechanism triggered on demand
  – dataService->updateObject(object)

13. Extended API and functionality
MySQL implementation
Many thanks to the Atlas Lisbon group: Antonio Amorim, Jorge Lima, Dinis Klose, Luis Pedro

14. Extended API
API extension driven by needs of Atlas test beam users
  – Detector slow control (temperatures, voltages) monitored via PVSS
  – Support added for “tiny objects” mapped to PVSS “data-points”
      Readily implemented in MySQL (no Oracle implementation so far)
  – A portable PVSSManager module using CondDB via its API was also developed
Special folders can be created to store PVSS data points
  – Two types of folders coexist: default (strings) and PVSS (tiny objects)
      Separate methods to create folders, store/retrieve/browse objects
  – API modification consists in addition of only four methods
  – No versioning/tagging is foreseen in PVSS folders
      Designed for online data-taking (slow control) rather than (re-)alignment
“Tiny object” support is not specific to PVSS
  – float/int/… values rather than opaque byte streams
      Also arrays (of fixed length, each element may be of different type)
  – Methods mention PVSS in their names, they will all be renamed
      May be a good occasion for wider discussion of API and CondDB object data content
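
The notion of a “tiny object” as a fixed-length array whose elements may have different types can be sketched as a tagged value plus a per-folder schema. The types and functions below are hypothetical and are not the Lisbon extended API; they only illustrate typed values as an alternative to an opaque byte stream, using a subset of the value types mentioned in the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Subset of the value types mentioned in the talk.
enum class TinyType { Float, Int, Bool, String };

// Tagged value: 'type' says which of the value fields is meaningful.
struct TinyValue {
    TinyType type;
    double floatValue;
    std::int64_t intValue;
    bool boolValue;
    std::string stringValue;
};

TinyValue makeFloat(double v) {
    return TinyValue{TinyType::Float, v, 0, false, ""};
}
TinyValue makeInt(std::int64_t v) {
    return TinyValue{TinyType::Int, 0.0, v, false, ""};
}
TinyValue makeBool(bool v) {
    return TinyValue{TinyType::Bool, 0.0, 0, v, ""};
}
TinyValue makeString(const std::string& v) {
    return TinyValue{TinyType::String, 0.0, 0, false, v};
}

// A folder schema fixes the length of the array and the type of each
// element; a row is accepted only if it matches position by position.
bool matchesSchema(const std::vector<TinyType>& schema,
                   const std::vector<TinyValue>& row) {
    if (schema.size() != row.size()) return false;
    for (std::size_t i = 0; i < schema.size(); ++i) {
        if (schema[i] != row[i].type) return false;
    }
    return true;
}
```

A fixed schema of this kind is what makes it possible to map each element to its own typed database column instead of storing one opaque blob per object.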

15. Implementation issues (MySQL)
Ad-hoc partitioning
  – Each table corresponds to one file in MySQL, no native partitioning
      Postgres implementation is being prototyped to overcome this problem
  – ConditionsDB partitioning by allocating a separate table for each folder
      Individual tables may be spread across different MySQL databases
Relational schema redesigned to accommodate tiny objects
  – float, int, bool, string, char, time stored natively in MySQL columns
  – Future handling of schema evolution is being thought about
Supported platforms
  – Linux (gcc2.95.2, gcc3.2, gcc3.3), Windows (VS,.NET)
      Oracle implementation: Linux (gcc2.95.2), Windows (VS)
  – Windows is very useful for PVSS users (e.g. Atlas test beams)

16. Lisbon group plans till end 2003
Respond to feedback from Atlas Muon test beam users
  – This will be the highest priority: data-taking started last week!
Improve API for tiny objects
  – Generalize and rename relevant methods (remove reference to PVSS)
Define API for and implement extended tagging mechanism
  – Attach user tags to CondDB objects at creation time (rather than tag HEAD)
Develop two graphical tools to browse the ConditionsDB contents
  – One Web-based using jsp (calling C++ methods via JNI), another using ROOT
  – Both based on the API (portable to other CondDB implementations)
  – Both allowing read-only access for the moment
Extend API with support for XML storage
Extend API with support for ROOT objects with external references
Prototype a Postgres implementation as an alternative to MySQL

17. Expectations from a common LCG project
Summary
Thanks for their comments to Clara Gaspar, Pere Mato, Fons Rademakers…

18. Feedback and comments (1)
Development of tools is seen as a very high priority
  – Tools to slice/export the data into subsamples for distribution purposes
      Need cross-platform portability (Oracle ↔ MySQL)
      Distribution to laptops (individual users), distribution over Grid (production centers)
      Exporting a DB referencing external files should allow distributing those files too
      May also require changes or new conventions in the API for partitioning
  – Graphical data browsers with editing capabilities
      Slicing, tagging, retagging…
Interest in file-based backends (ROOT, XML, …)
  – For distribution on very lightweight systems (no Oracle, no MySQL installed)
POOL-software independence should be preserved
  – To allow the use of the CondDB even if POOL is not used (e.g. in Alice)
  – No contradiction with POOL-project integration

19. Feedback and comments (2)
PVSS interface may become less important in the long run
  – Next version of PVSS will have built-in Oracle data store
      May be used for online debug of controls/DAQ, no need for CondDB there (e.g. LHCb)
  – PVSS raw data volume at LHC too large to be stored in the CondDB
      Only pre-filtered information is likely to end up in the CondDB
Extended tagging mechanisms may be needed in the API
  – And/or provide models to solve common use cases with existing API:
      Distinguish data added by different users or under different configurations…
      Recover versions HEAD-1, HEAD-2, to make up for mistakes…

20. Summary
Common API exists to store time-varying data in a “Conditions DB”
  – Discussed with many experiments, used for data-taking by Compass and Harp
  – Several backends share the same abstract interface
Oracle implementation maintained by IT-DB
  – With the corresponding CondDB service (Oracle server management)
  – CondDB Objy implementation to be dropped soon (Objy support discontinued)
MySQL implementation maintained by Atlas Lisbon group
  – Extending functionalities and API beyond the original project
  – Technology-independent tools under development (WWW/ROOT GUIs)
An LCG common project has been proposed
      See next talk by Torre

