CTA: CERN Tape Archive
Rationale, Architecture and Status
Germán Cancio, Daniele Kruse, Eric Cano, Steven Murray
CASTOR face-to-face meeting, June 2016
Rationale & objectives
- CTA: CERN Tape Archive
- A stand-alone tape system, decoupled from disk storage
  - First target is EOS, potentially other systems later
- Straightforward architecture with a minimum number of elements
- Session preemption for optimized use of drives
- Target is LHC Run 3: EOS+CTA as the DAQ interface for all experiments
Architecture
- Shared-storage concept with a new queueing system
- Only two daemons: the front end for the CLI (XRootD-based) and cta-taped
- New queueing system (see the sketch after this list):
  - Based on Ceph
  - Holds only transient data
  - Each queue (per tape / tape pool) is an independent object
    - Avoids a single huge queue table
    - Allows storage of rich objects
- Separate file catalogue
  - Based on a conventional relational DB
  - Holds persistent data
- cta-taped is an adapted tapeserverd from CASTOR
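A minimal sketch of the per-tape-pool queue idea, assuming hypothetical type and field names (this is not the actual CTA object-store schema): each tape pool gets its own independent queue object, and each queued request can carry rich metadata, so no single huge queue table is needed. In CTA the store is Ceph-backed and transient; the in-memory map below only stands in for it.

```cpp
// Sketch only: hypothetical names, not the real CTA object-store schema.
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One queued archive request, with the rich per-request metadata
// a flat queue table would struggle to hold.
struct QueuedArchiveRequest {
  uint64_t fileId;          // unique numeric ID assigned by the catalogue
  std::string diskInstance; // e.g. the EOS instance holding the source copy
  std::string srcUrl;       // where the tape server reads the data from
  uint64_t sizeBytes;
  uint16_t copyNumber;
  uint64_t priority;        // taken from the mount policy
};

// The per-tape-pool queue: an independent object keyed by tape pool name.
struct ArchiveQueue {
  std::string tapePool;
  std::vector<QueuedArchiveRequest> requests;
};

// Transient store of queues, one object per tape pool (Ceph-backed in CTA,
// modelled here as an in-memory map for illustration only).
std::map<std::string, ArchiveQueue> queueStore;

int main() {
  QueuedArchiveRequest req{42, "eosdev", "root://eosdev//eos/dev/file", 1 << 20, 1, 10};
  queueStore["atlas_raw"].tapePool = "atlas_raw";
  queueStore["atlas_raw"].requests.push_back(req);
  return 0;
}
```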
Architecture (diagram)
- Diagram of the CTA components and their data/metadata flows; callout: all of the CTA business logic and data management code is in the client interface to the metadata
- CTA front-end: entry point for commands/control from the CTA command-line tools (CLI)
- CTA catalogue: evolved from CASTOR; holds files, routing, tape pools and mount policies (see the entity sketch below)
- CTA queues: hold the queues and drive status; two prototypes exist (Ceph and a local file system)
- CTA tape servers: move data between tape and the remote disk storage (EOS with its workflow engine)
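To make the catalogue callouts concrete, here is a hedged sketch of the kinds of entities the relational catalogue ties together. The type and field names are illustrative assumptions, not the actual catalogue schema.

```cpp
// Sketch only: hypothetical entity names, not the real catalogue schema.
#include <cstdint>
#include <string>
#include <vector>

struct TapePool {
  std::string name;
  uint32_t nbPartialTapes; // tapes kept open for writing in this pool
};

struct MountPolicy {
  std::string name;
  uint64_t archivePriority;
  uint64_t retrievePriority;
  uint64_t minRequestAgeSeconds; // age after which a mount is forced
};

struct ArchiveFile {
  uint64_t archiveFileId;            // unique numeric ID; the only handle EOS keeps
  std::string diskInstance;          // which EOS instance owns the namespace entry
  std::string storageClass;          // routes copies to one or more tape pools
  uint64_t sizeBytes;
  std::string checksum;
  std::vector<std::string> tapeVids; // tapes holding the copies
};

int main() {
  ArchiveFile f{1234, "eosctaatlas", "single_copy", 1 << 30, "adler32:0x1a2b3c4d", {"L12345"}};
  return (f.archiveFileId != 0) ? 0 : 1;
}
```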
EOS integration
- EOS triggers archivals and retrievals
  - Handled by its workflow engine
  - Through the CLI interface of CTA
- EOS maintains a stub for each file
- The namespace belongs to EOS
  - CTA files are referenced by a unique numeric ID
  - The path is only interpreted in EOS
  - EOS does not query CTA
- CTA maintains Disaster Recovery (DR) data
  - File metadata is stored in the catalogue as a DR blob (see the sketch after this list)
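A hedged illustration of the Disaster Recovery idea: CTA keeps, per file, enough metadata in the catalogue to recreate the EOS stub if the EOS namespace were lost, while never interpreting the path itself. The struct, field names and blob format below are assumptions for illustration only, not the actual CTA format.

```cpp
// Sketch only: hypothetical DR-blob layout, not the real CTA format.
#include <cstdint>
#include <iostream>
#include <sstream>
#include <string>

struct DiskFileRecoveryInfo {
  std::string eosPath; // only meaningful to EOS, never interpreted by CTA
  std::string owner;
  std::string group;
  std::string checksum;
  uint64_t sizeBytes;
};

// Serialise the recovery info into an opaque blob stored with the catalogue
// entry, keyed by the unique numeric file ID that EOS references.
std::string toDrBlob(uint64_t fileId, const DiskFileRecoveryInfo &info) {
  std::ostringstream os;
  os << "fileId=" << fileId
     << ";path=" << info.eosPath
     << ";owner=" << info.owner << ":" << info.group
     << ";checksum=" << info.checksum
     << ";size=" << info.sizeBytes;
  return os.str();
}

int main() {
  DiskFileRecoveryInfo info{"/eos/dev/experiment/run1/file.dat", "daq", "daq",
                            "adler32:0xdeadbeef", 1 << 20};
  std::cout << toDrBlob(42, info) << std::endl;
  return 0;
}
```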
Status
- "Passive EOS" integration achieved at the end of 2015
  - CTA transfers files in and out of an EOS instance
  - Hand-triggered via the CLI
  - Prototype with a full namespace, now removed
    - Some persistent data on the Ceph side
    - Namespace based on a shared filesystem
- "Active EOS" integration targeted for 2016
  - Integration with EOS's workflow engine; changes needed on both sides
  - New catalogue replaces the namespace
  - Queueing system adapted to the new catalogue and performance validated
  - Refactoring of the cta-taped code
- Longer-term targets
  - Repack
  - Verify
  - Session preemption instead of dedication (see the sketch after this list)
    - Saturate otherwise idle drives with low-priority, high-volume sessions (repack, verify)
    - Yield to higher-priority sessions (user access)
  - Disaster recovery
  - CASTOR import (metadata)
- cta-taped improvements in parallel (low priority)
  - Recommended Access Order (drive-recommended read order)
  - Out-of-order archiving (writing files in the order they are received from disk)
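A minimal sketch of the session-preemption idea, assuming hypothetical types and priority values: fill an idle drive with low-priority repack/verify work and hand the drive back as soon as a higher-priority user mount appears in the queue. This is not the actual cta-taped scheduling code.

```cpp
// Sketch only: hypothetical scheduling types, not the real cta-taped logic.
#include <cstdint>
#include <iostream>
#include <optional>
#include <string>
#include <vector>

enum class SessionType { UserArchive, UserRetrieve, Repack, Verify };

struct MountRequest {
  SessionType type;
  uint64_t priority; // from the mount policy; user access outranks repack/verify
  std::string vid;   // tape volume to mount
};

struct DriveState {
  bool busy = false;
  uint64_t currentPriority = 0;
};

// Decide whether a queued request should take over the drive: start it if the
// drive is idle, or preempt the running session if the request outranks it.
std::optional<MountRequest> nextMount(const DriveState &drive,
                                      const std::vector<MountRequest> &queue) {
  std::optional<MountRequest> best;
  for (const auto &req : queue) {
    if (!best || req.priority > best->priority) best = req;
  }
  if (!best) return std::nullopt;
  if (!drive.busy) return best;                             // fill the idle drive
  if (best->priority > drive.currentPriority) return best;  // preempt low-priority work
  return std::nullopt;                                      // keep the current session
}

int main() {
  DriveState drive{true, 1}; // busy with a low-priority repack session
  std::vector<MountRequest> queue{{SessionType::UserRetrieve, 100, "L54321"}};
  auto mount = nextMount(drive, queue);
  std::cout << (mount ? "preempt and mount " + mount->vid
                      : std::string("keep current session"))
            << "\n";
  return 0;
}
```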