The Objectivity migration (and some more recent experience with Oracle)
Andrea Valassi (CERN IT-SDC)
DPHEP Full Costs of Curation Workshop, CERN, 13th January 2014

Outline – past and present experiences

- Objectivity to Oracle migration in pre-LHC era (2004)
  - COMPASS and HARP experiments – event and conditions data
  - Preserving (moving) the bits and the software
- Oracle conditions databases at LHC (2005-present)
  - ATLAS, CMS and LHCb experiments – CORAL and COOL software
  - What would it take to do data preservation using another database?
  - Preserving (moving) the bits and the software
- Operational “continuous maintenance” experience
  - Preserving (upgrading) the software and the tools
- Conclusions

COMPASS and HARP Objectivity to Oracle migration (2004)

Objectivity migration – overview

- Main motivation: end of support for Objectivity at CERN
  - The end of the object database days at CERN (July 2003)
  - The use of relational databases (e.g. Oracle) to store physics data has become pervasive in the experiments since the Objectivity migration
- A triple migration!
  - Data format and software conversion from Objectivity to Oracle
  - Physical media migration from StorageTek 9940A to 9940B tapes
- Two experiments – many software packages and data sets
  - COMPASS raw event data (300 TB)
    - Data taking continued after the migration, using the new Oracle software
  - HARP raw event data (30 TB), event collections and conditions data
    - Data taking stopped in 2002, no need to port the event writing infrastructure
- In both cases, the migration was during the “lifetime” of the experiment
  - System integration tests validating read-back from the new storage

Migration history and cost overview

- COMPASS and HARP raw event data migration
  - Mar 2002 to Apr 2003: ~2 FTEs (spread over 5 people) for 14 months
  - Dec 2002 to Feb 2003: COMPASS 300 TB, using 11 nodes for 3 months (and proportional numbers of Castor input/output pools and tape drives)
  - Feb 2003: COMPASS software/system validation before data taking
  - Apr 2003: HARP 30 TB, using 4 nodes for two weeks (more efficiently)
- HARP event collection and conditions data migration
  - May 2003 to Jan 2004: ~0.6 FTE (60% of one person) for 9 months
  - Collections: 6 months (most complex data model in spite of low volume)
  - Conditions: 1 month (fastest phase, thanks to abstraction layers…)
  - Jan 2004: HARP software/system validation for data analysis

[Chart: migration rates integrated over nodes – COMPASS: 3 months, 11 nodes, peak 2k events/s; HARP: 2 weeks, 4 nodes]

Raw events – old and new data model

- COMPASS and HARP used the same model in Objectivity
  - Raw data for one event encapsulated as a binary large object (BLOB)
  - Streamed using the “DATE” format – independent of Objectivity
  - Events are in one file per run (COMPASS: 200k files in 4k CASTOR tapes)
  - Objectivity ‘federation’ (metadata of database files) permanently on disk
- Migrate both experiments to the same ‘hybrid’ model
  - Move raw event BLOB records to flat files in CASTOR
    - BLOBs are black boxes – no need to decode and re-encode the DATE format
    - No obvious advantage in storing BLOBs in Oracle instead
  - Move BLOB metadata to Oracle database (file offset and size)
    - Large partitioned tables (COMPASS: 6×10^9 event records)

[Diagram: BLOB records laid out sequentially in a flat CASTOR file, e.g. /castor/xxx/Run12345.raw]

Raw events – migration infrastructure

Setup to migrate the 30 TB of HARP (4 migration nodes) – a similar setup with more nodes (11) was used to migrate the 300 TB of COMPASS. Two jobs per migration node (one staging, one migrating).

A “large scale” migration by the standards of that time – today’s CASTOR “repack” involves much larger scales (4 GB/s).

HARP event collections

- Longest phase: lowest volume, but most complex data model
  - Reimplementation of event navigation references in the new Oracle schema
  - Reimplementation of event selection in the Oracle-based C++ software
    - Exploit server-side Oracle queries
- Completely different technologies (object vs. relational database)
  - Re-implementing the software took much longer than moving the bits

HARP conditions data

- Stored using a technology-neutral abstract API by CERN IT
  - Software for time-varying conditions data (calibration, alignment…)
  - Two implementations already existed, for Objectivity and Oracle
- This was the fastest phase of the migration
  - The abstract API decouples the experiment software from the storage back-end
  - Almost nothing to change in the HARP software to read Oracle conditions
  - Migration of the bits partly done through generic tools based on the abstract API
- Compare to LHC experiments using CORAL and/or COOL
  - See the next few slides in the second part of this talk

ATLAS, CMS and LHCb conditions databases: the CORAL and COOL software (2005-now)

CORAL component architecture

[Diagram: the technology-independent CORAL C++ API, used via COOL or directly by the DB-independent C++ code of the LHC experiments, sits above back-end plugins: OracleAccess (OCI C API, password/Kerberos authentication), SQLiteAccess (SQLite C API), MySQLAccess (MySQL C API), FrontierAccess (Frontier API, via a Frontier web server with JDBC back-end and Squid web cache), CoralAccess (coral protocol, via the CORAL server and a caching CORAL proxy); XMLLookupSvc/XMLAuthSvc plugins read DB lookup and authentication XML files]

- CORAL plugins interface to 5 back-ends:
  - Oracle, SQLite, MySQL (commercial)
  - Frontier (maintained by FNAL)
  - CoralServer (maintained in CORAL)
  - MySQL: no longer used but minimally maintained
- CORAL is used in ATLAS, CMS and LHCb in most of the client applications that access Oracle physics data
- Oracle data are accessed directly on CERN DBs (or their replicas at T1 sites), or via Frontier/Squid or CoralServer

Data preservation and abstraction layers

- Conditions and other LHC physics data are stored in relational databases using software abstraction layers (CORAL and/or COOL)
  - Abstract API supporting Oracle, MySQL, SQLite, Frontier back-ends
  - May switch back-end without any change in the experiment software
- Same mechanism used in HARP conditions data preservation
  - Objectivity and Oracle implementations of the same software API
- Major technology switches with CORAL have already been possible
  - ATLAS: replace Oracle direct read access by Frontier-mediated access
  - ATLAS: replicate and distribute Oracle data sets using SQLite files
  - LHCb: prototype Oracle conditions DB before choosing SQLite only
- CORAL software decoupling could also simplify data preservation
  - For instance: using SQLite, or adding support for PostgreSQL

Adding support for PostgreSQL?

[Diagram: the same CORAL architecture as on the previous slide, with a hypothetical PostgresAccess plugin (Postgres C API, libpq) and a Postgres DB added alongside the existing plugins]

Main changes:
1. Add a PostgresAccess plugin
2. Deploy the DB, copy O(2 TB) of data
In addition:
3. Support Postgres in Frontier
4. Query optimization (e.g. COOL)

In a pure data preservation scenario, some of the steps above may be simple or unnecessary. Most other components should need almost no change…

Continuous maintenance – software

- COOL and CORAL experience in these ten years
  - O/S evolve: SLC3 to SLC6, drop Windows, add many MacOSX
  - Architectures evolve: 32-bit to 64-bit (and eventually multicore)
  - Compilers evolve: gcc 3.2.3 to 4.8, icc 11 to 13, vc7 to vc9, clang…
  - Languages themselves evolve: C++11!
  - Build systems evolve: scram to CMT (and eventually cmake)
  - External software evolves: Boost 1.30 to 1.55, ROOT 4 to 6, Oracle 9i to 12c…
  - API changes, functional changes, performance changes
    - Need functional unit tests and experiment integration validation
    - Smoother transitions if you do quality assurance all along (e.g. Coverity)
- Continuous software porting has a (continuous) cost
  - O(1) FTE for CORAL/COOL alone, adding up IT, PH-SFT and the experiments?
  - Freezing and/or virtualization is unavoidable eventually?

Continuous maintenance – infrastructure

- CORAL: CVS to SVN migration – software and documentation
  - Preserve all software tags or only the most recent versions?
    - Similar choices needed for conditions data (e.g. alignment versions)
  - Keep old packages? (e.g. POOL was moved just in case…)
  - Any documentation cross-links to CVS are lost for good
- CORAL: Savannah to JIRA migration – documentation
  - Will try to preserve information – but know some cross-links will be lost
- Losing information in each migration is unavoidable?
  - Important to be aware of it and choose what must be kept

Conclusions

Conclusions – lessons learnt?

- The daily operation of an experiment involves “data preservation”
  - To preserve the bits (physical media migration)
  - To preserve the bits in a readable format (data format migration)
  - To preserve the ability to use the bits (software migration and upgrades)
  - To preserve the expertise about the bits (documentation and tool migration)
  - It is good (and largely standard) practice to have validation suites for all this
  - Everyday software hygiene (QA and documentation) makes transitions smoother
- Data and software migrations have a cost
  - For Objectivity: several months of computing resources and manpower
  - A layered approach to data storage software helps reduce these costs
- Continuous maintenance of software has a cost
  - Using frozen versions in virtualized environments is unavoidable?
- Continuous infrastructure upgrades may result in information loss
  - Watch out and keep data preservation in mind…

Selected references

Objectivity migration
- M. Lübeck et al., MSST 2003, San Diego
- M. Nowak et al., CHEP 2003, La Jolla
- A. Valassi et al., CHEP 2004, Interlaken
  http://indico.cern.ch/contributionDisplay.py?contribId=448&sessionId=24&confId=0
- A. Valassi, CERN DB Developers Workshop

CORAL and COOL
- R. Trentadue et al., CHEP 2012, New York
  http://indico.cern.ch/getFile.py/access?contribId=104&sessionId=6&resId=0&materialId=paper&confId=149557
- A. Valassi et al., CHEP 2013, Amsterdam
  http://indico.cern.ch/contributionDisplay.py?contribId=117&sessionId=9&confId=214784