The Database Project: a starting work, by Arnauld Albert and Cristiano Bozza
The Database Project A. Albert, C. Bozza – KM3Net Collaboration Meeting – Marseille, Jan

Server technology: Oracle Enterprise Server (RAC - Cluster)
- Well hosted and supported at CCIN2P3
- Working licenses already available at several sites in Italy
- Discounts possible through the CERN agreement
- Good experience of technical support by Oracle (also 24/7 for critical cases)
- Huge (unequalled?) variety of tools and libraries (by Oracle and independent software producers)
- Accessible through: C/C++, Java, C#, Python, PHP, VB, Perl, ODBC, ODP.NET, …
- OS-independent

System
- Symmetric datacenters
- Continuous data synchronization

Schema for construction – detector description
Tables shown in the diagram: Locations, Users, Products, PBS Descriptions, Container Mapping
- Logical and hierarchical relationships among products are stored
- The structure of the detector (PBS) is also stored and documented in the DB
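
The hierarchical relationship among products can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for the Oracle server, and every table name, column name, PBS code and serial number below is a hypothetical placeholder, not the actual KM3NeT schema. A mapping table records which product contains which, and a recursive query walks the containment tree.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory DB with a hypothetical product/containment schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (
            id       INTEGER PRIMARY KEY,
            pbs_code TEXT NOT NULL,   -- position in the product breakdown structure
            serial   TEXT NOT NULL
        );
        -- Each product sits in at most one container (UNIQUE on content_id).
        CREATE TABLE container_mapping (
            container_id INTEGER NOT NULL REFERENCES products(id),
            content_id   INTEGER NOT NULL UNIQUE REFERENCES products(id)
        );
    """)
    # Invented sample data: one tower holding one floor holding two modules.
    conn.executemany("INSERT INTO products VALUES (?, ?, ?)", [
        (1, "TOWER", "T-001"),
        (2, "FLOOR", "F-001"),
        (3, "OM", "OM-001"),
        (4, "OM", "OM-002"),
    ])
    conn.executemany("INSERT INTO container_mapping VALUES (?, ?)",
                     [(1, 2), (2, 3), (2, 4)])
    conn.commit()
    return conn


def contents_of(conn, product_id):
    """Return the serials of everything contained, at any depth, in a product."""
    rows = conn.execute("""
        WITH RECURSIVE tree(id) AS (
            SELECT content_id FROM container_mapping WHERE container_id = ?
            UNION ALL
            SELECT m.content_id
            FROM container_mapping m JOIN tree t ON m.container_id = t.id
        )
        SELECT p.serial FROM tree JOIN products p ON p.id = tree.id
        ORDER BY p.serial
    """, (product_id,)).fetchall()
    return [serial for (serial,) in rows]
```

Calling `contents_of(build_demo_db(), 2)` lists the two modules mounted on the floor; passing the tower id also returns the floor itself.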

Schema for construction – activity management
Tables shown in the diagram: Locations, Users, Operations, Status, Decisions
- Each operation corresponds to a well-defined task
- An operation may contain one or more sub-operations, like a tree
- Each operation is linked to the place where it is performed and to the user who does it or is responsible for it
- At any given time, it is in a certain "status"; on completion, decisions may be taken concerning it

Schema for construction – activity management
Tables shown in the diagram: Operations, Decisions, Operation Types, Possible Decisions
- The type of a certain operation contains links to programs and their operating parameters
- This allows not only documentation but also management through the DB
- Each operation type has a set of possible decisions
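
One way to let the DB itself enforce "each operation type has a set of possible decisions" is a composite foreign key. The sketch below is an assumption about how this could be done, not the schema shown on the slide: it uses sqlite3 instead of Oracle, and all table names, column names and sample values (including the program name and its parameter string) are invented.

```python
import sqlite3


def build_db():
    """In-memory DB where a decision must be allowed for the operation's type."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only on request
    conn.executescript("""
        CREATE TABLE operation_types (
            id         INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            program    TEXT,          -- link to the program that runs the task
            parameters TEXT           -- its operating parameters
        );
        CREATE TABLE possible_decisions (
            type_id  INTEGER NOT NULL REFERENCES operation_types(id),
            decision TEXT NOT NULL,
            PRIMARY KEY (type_id, decision)
        );
        CREATE TABLE operations (
            id        INTEGER PRIMARY KEY,
            type_id   INTEGER NOT NULL REFERENCES operation_types(id),
            parent_id INTEGER REFERENCES operations(id),  -- sub-operation tree
            status    TEXT NOT NULL DEFAULT 'open',
            decision  TEXT,           -- NULL until a decision is taken
            FOREIGN KEY (type_id, decision)
                REFERENCES possible_decisions(type_id, decision)
        );
        -- Hypothetical sample rows.
        INSERT INTO operation_types VALUES (1, 'om_acceptance_test', 'run_om_test', 'hv=1800');
        INSERT INTO possible_decisions VALUES (1, 'accept'), (1, 'reject');
        INSERT INTO operations (id, type_id) VALUES (1, 1), (2, 1);
    """)
    return conn


def decide(conn, operation_id, decision):
    """Close an operation with a decision; return False if the DB refuses it."""
    try:
        conn.execute(
            "UPDATE operations SET decision = ?, status = 'closed' WHERE id = ?",
            (decision, operation_id),
        )
    except sqlite3.IntegrityError:
        conn.rollback()
        return False
    conn.commit()
    return True
```

With this layout, `decide(conn, 1, "accept")` succeeds, while a decision that is not listed for the operation's type is rejected by the database rather than by application code.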

Schema for construction – detailed bookkeeping information
Tables shown in the diagram: Operations, Product bookkeeping, IntegrationTests, Test types, Parameter values, Sets of parameters
- Depending on the type of operation, additional information may be required
- Detail tables are linked to the main operation table
- For each kind of test, the list of parameters to be tested is defined
- For each component, all the output of the testing is stored; it is easy to add new parameters as needed
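
The claim that new test parameters are easy to add can be made concrete with a sketch in which parameters are rows rather than columns. This is only an illustration under assumptions: sqlite3 stands in for Oracle, and the table names, the test type and the parameter names and values are invented for the example.

```python
import sqlite3


def build_db():
    """In-memory DB where test parameters are defined as rows, not columns."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE test_types (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE parameters (
            id           INTEGER PRIMARY KEY,
            test_type_id INTEGER NOT NULL REFERENCES test_types(id),
            name         TEXT NOT NULL,
            unit         TEXT,
            description  TEXT,        -- documents what the parameter means
            UNIQUE (test_type_id, name)
        );
        CREATE TABLE parameter_values (
            operation_id INTEGER NOT NULL,   -- the test operation that measured it
            parameter_id INTEGER NOT NULL REFERENCES parameters(id),
            value        REAL NOT NULL,
            PRIMARY KEY (operation_id, parameter_id)
        );
        INSERT INTO test_types VALUES (1, 'pmt_test');  -- hypothetical test type
    """)
    return conn


def add_parameter(conn, test_type_id, name, unit, description):
    """Define a new parameter for a test type; no schema change is needed."""
    cur = conn.execute(
        "INSERT INTO parameters (test_type_id, name, unit, description) "
        "VALUES (?, ?, ?, ?)",
        (test_type_id, name, unit, description),
    )
    conn.commit()
    return cur.lastrowid


def store_results(conn, operation_id, test_type_id, results):
    """Store measured values; unknown parameter names raise KeyError."""
    for name, value in results.items():
        row = conn.execute(
            "SELECT id FROM parameters WHERE test_type_id = ? AND name = ?",
            (test_type_id, name),
        ).fetchone()
        if row is None:
            conn.rollback()
            raise KeyError(name)
        conn.execute("INSERT INTO parameter_values VALUES (?, ?, ?)",
                     (operation_id, row[0], value))
    conn.commit()


def results_for(conn, operation_id):
    """Return {parameter name: (value, unit)} for one test operation."""
    rows = conn.execute("""
        SELECT p.name, v.value, p.unit
        FROM parameter_values v JOIN parameters p ON p.id = v.parameter_id
        WHERE v.operation_id = ?
    """, (operation_id,)).fetchall()
    return {name: (value, unit) for name, value, unit in rows}
```

Adding a parameter later is a single `add_parameter` call; results stored earlier are untouched, and later tests can report the new quantity alongside the old ones.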

Schema for construction – Content
- The DB is completely flexible about testing parameters
- The DB can contain explanations of parameters, and the testing procedure can be described and documented
- We are collecting information from experts about testing procedures
- For the moment, we have some sample data (thanks to Tamas, Oleg and Emanuele)

Access
- Three basic user roles have been defined, with obvious meaning: Administrator (km3net), Reader (km3read), Writer (km3write)
- As for information, we distinguish between "information author/consumer" and DB user
- Information author/consumer: any person who produces or uses data from the DB, directly or indirectly; stored in the Users table
- DB user: person or batch process that connects directly to the DB, using part of its memory, CPU and disk speed; authenticated by the Oracle DB server

Access
- A DB user can have multiple connections if needed
- A DB user can initiate transactions, lock rows/tables, run SQL queries
- A person should have a DB user account if he/she is aware of the responsibility for resource usage that comes with it
- User accounts can also be given on an institution or group basis
- Most people will just need data, in a friendly format
- Many people will want to stay focused on data and will not care about technicalities
- A special DB user, named km3web, is used by a dedicated Web site to provide a friendly user interface
- The Web site can provide properly formatted information with detailed explanations
- The Web site can also be used to upload information about test results
- This can also be done by uploading data files

KM3Net Web DB access
- Login provided through the Users table

KM3Net Web DB access
- For the moment, we are waiting for expert feedback to build useful pages
- The Web site can also provide "raw SQL access" (technical tests, maintenance, etc.)
- As recent Internet trends show, Web servers are increasingly becoming application servers for machine-to-machine data exchange

Outlook on Physics data storage
- It is possible to use a relational DB to store not only "data about the detector" but also "data from the detector"
- Moreover, intermediate output of reconstructions can also be stored, and specific datasets can be flagged and indexed, which helps and supports analysis!
- There is recent fruitful experience with this technique (e.g. OPERA DB: designed to range between 50 and 100 TB, currently 34 TB)
- A first technical test has been successful for NEMO-phase-1 post-trigger data (thanks to Tommaso for data, discussion and help)
- Operational settings ("datacards"), trigger configuration, detector status, and sampled waveforms are all easily stored
- A C++ library has also been developed to store data without any knowledge of SQL

Outlook on Physics data storage
- Test for NEMO-phase-1 post-trigger data

Outlook on Physics data storage
- Test for NEMO-phase-1 post-trigger data: code snippet to store a full run
- The only places where you know you're storing to a DB are those in red: fill in username/password to connect, and commit the transaction to tell the DB that all data were written and can be stored
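
The idea of storing a run without writing any SQL can be sketched as follows. This is not the C++ library mentioned in the talk, whose interface is not given in these slides; it is a hypothetical Python wrapper over sqlite3, with invented class, method, table and column names. The caller only supplies credentials to connect and commits at the end; everything else is plain method calls.

```python
import sqlite3


class RunWriter:
    """Hypothetical wrapper that hides all SQL from the data-taking code."""

    def __init__(self, database, username, password):
        # SQLite has no user accounts, so the credentials are accepted only to
        # mirror the connect step; an Oracle connection would actually use them.
        self._username = username
        self._conn = sqlite3.connect(database)
        self._conn.executescript("""
            CREATE TABLE IF NOT EXISTS runs (
                run_number INTEGER PRIMARY KEY,
                datacard   TEXT NOT NULL      -- operational settings of the run
            );
            CREATE TABLE IF NOT EXISTS events (
                run_number   INTEGER NOT NULL REFERENCES runs(run_number),
                event_number INTEGER NOT NULL,
                waveform     BLOB NOT NULL,   -- sampled waveform, one byte per sample
                PRIMARY KEY (run_number, event_number)
            );
        """)

    def start_run(self, run_number, datacard):
        self._conn.execute("INSERT INTO runs VALUES (?, ?)", (run_number, datacard))

    def add_event(self, run_number, event_number, samples):
        # samples: iterable of integers in the range 0..255
        self._conn.execute("INSERT INTO events VALUES (?, ?, ?)",
                           (run_number, event_number, bytes(samples)))

    def commit(self):
        """Tell the DB that all data were written and can be stored."""
        self._conn.commit()

    def count_events(self, run_number):
        row = self._conn.execute(
            "SELECT COUNT(*) FROM events WHERE run_number = ?", (run_number,)
        ).fetchone()
        return row[0]


def store_full_run(writer, run_number, datacard, events):
    """Store one run; the commit is the only DB-specific step visible here."""
    writer.start_run(run_number, datacard)
    for event_number, samples in enumerate(events):
        writer.add_event(run_number, event_number, samples)
    writer.commit()
```

Because nothing is committed until the end, a run that fails halfway leaves no partial data behind once the transaction is rolled back or the connection is closed.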

Outlook on Physics data storage
Detector data storage is also related to the whole computing model

Pros:
- Reliable storage, with corruption check
- OS-independent
- Language- and technology-independent storage (C++, C#, Java, VB, Python, PHP, …)
- Easy to manage using SQL
- Consistency and integrity are automatically enforced
- Views help present data effectively with few lines of code
- In case of data model evolution, old programs still run without changes (no recompilation needed)

Cons:
- Additional checks load CPU and disk
- Data access requires linking a library (ODBC, OCI, ODP.NET, …)

Remarks on the second con:
- Normally, file-system storage also uses an I/O library, so this is not a characteristic "con" of relational DBs
- DB administrators of course also take care of developing and maintaining the I/O library
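
The point about views and data model evolution can be illustrated with a small sketch, again with sqlite3 standing in for Oracle and with invented table, view and column names. An "old program" reads only through a view; when the underlying tables are reorganized, the view is redefined and the program keeps running unchanged.

```python
import sqlite3


def old_program(conn):
    """Client code written against the first data model; it is never modified."""
    return conn.execute(
        "SELECT run, channel, charge FROM hits ORDER BY channel"
    ).fetchall()


def make_v1(conn):
    """First data model: charges are stored directly (hypothetical sample rows)."""
    conn.executescript("""
        CREATE TABLE hits_data (run INTEGER, channel INTEGER, charge REAL);
        CREATE VIEW hits AS SELECT run, channel, charge FROM hits_data;
        INSERT INTO hits_data VALUES (1, 10, 2.5), (1, 11, 4.0);
    """)


def migrate_to_v2(conn):
    """Second data model: raw values plus a per-channel gain table.

    The view keeps its name and columns, so old_program() needs no change.
    """
    conn.executescript("""
        CREATE TABLE channels (channel INTEGER PRIMARY KEY, gain REAL NOT NULL);
        INSERT INTO channels VALUES (10, 1.0), (11, 1.0);
        CREATE TABLE raw_hits (
            run     INTEGER,
            channel INTEGER REFERENCES channels(channel),
            adc     REAL
        );
        INSERT INTO raw_hits SELECT run, channel, charge FROM hits_data;
        DROP VIEW hits;
        DROP TABLE hits_data;
        CREATE VIEW hits AS
            SELECT r.run, r.channel, r.adc * c.gain AS charge
            FROM raw_hits r JOIN channels c ON c.channel = r.channel;
    """)
```

Running `old_program` before and after `migrate_to_v2` returns the same rows, even though the table it originally read from no longer exists.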

Conclusions
- The work to set up the DB to document and support construction has already begun
- Startup schema defined
- User access defined
- Web site to make access user-friendly already set up; it needs to be filled with useful pages (input from experts!)
- WE NEED DATA!

Outlook
- It is possible to store not only construction data, but also raw data and physics output
- There is already know-how on that, and we can start a broad discussion