CTA: CERN Tape Archive Overview and architecture

Presentation transcript:

CTA: CERN Tape Archive – Overview and architecture
Vlado Bahyl, Germán Cancio, Eric Cano, Michael Davis, Cristina Moraru, Steven Murray
CASTOR face-to-face meeting, May 2017 – CTA Project

Overview
- What is CTA
- Why CTA
- When is CTA
- Architecture details
- Software architecture
- Summary

What is CTA – 1
CTA is:
- A natural evolution of CASTOR
- A tape backend for EOS
- A preemptive tape drive scheduler
- A clean separation between disk and tape
[Architecture diagram: one EOS instance per experiment (namespace, redirector and disk servers) sends archive and retrieve requests to the CTA front-end of a central CTA instance, which holds the tape file catalogue, archive/retrieve queues, tape drive statuses, archive routes and mount policies. CTA tape servers pull scheduling information and tape file locations, move files between EOS disk servers and the tape drives and libraries, and send completion reports back. Tape files appear in the EOS namespace as replicas; the EOS workflow engine glues EOS to CTA.]

What is CTA – 2
- EOS owns the namespace
  - CTA provides an extra (tape) replica for disk files; a reference to that replica is stored in the EOS namespace
  - CTA files only have numeric IDs
- EOS calls CTA hooks on events, using the EOS workflow engine:
  - Close after write (archive trigger)
  - Open for read with no replica on disk, and prepare (retrieve trigger)
  - Delete
- EOS receives callbacks from CTA:
  - Archive completion report
  - Listing of contents for reconciliation
- EOS manages the lifecycle of disk copies (garbage collection)
- CTA stores a backup of the EOS metadata for tape-backed files
- One CTA system serves several EOS instances
- Any other disk system would need to provide the same hooks and callbacks (see the sketch below)
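The hook/callback contract is small enough to sketch. The following is a minimal, purely illustrative C++ sketch of how a disk system could map its workflow events onto CTA actions; every identifier here (WorkflowEvent, CtaFrontend, queueArchive, ...) is a hypothetical stand-in and is not taken from the CTA or EOS code.

```cpp
// Illustrative sketch only: mapping EOS workflow events to CTA actions.
// In reality the EOS workflow engine calls the CTA front-end over XRootD;
// all names below are assumptions made for this example.
#include <cstdint>
#include <iostream>
#include <string>

enum class WorkflowEvent { CloseAfterWrite, PrepareForRead, Delete };

// Hypothetical stand-in for the CTA front-end interface.
struct CtaFrontend {
  // Queue an archive request; returns the numeric CTA file ID that EOS
  // stores in its namespace as the reference to the tape replica.
  std::uint64_t queueArchive(const std::string &eosPath) {
    std::cout << "archive queued for " << eosPath << "\n";
    return 42;  // placeholder ID
  }
  void queueRetrieve(std::uint64_t archiveFileId) {
    std::cout << "retrieve queued for CTA file " << archiveFileId << "\n";
  }
  void deleteArchive(std::uint64_t archiveFileId) {
    std::cout << "delete requested for CTA file " << archiveFileId << "\n";
  }
};

// The hook a disk system must provide: translate its events into CTA calls.
void onWorkflowEvent(CtaFrontend &cta, WorkflowEvent event,
                     const std::string &eosPath, std::uint64_t archiveFileId) {
  switch (event) {
    case WorkflowEvent::CloseAfterWrite:  // archive trigger
      cta.queueArchive(eosPath);
      break;
    case WorkflowEvent::PrepareForRead:   // retrieve trigger (no disk replica)
      cta.queueRetrieve(archiveFileId);
      break;
    case WorkflowEvent::Delete:
      cta.deleteArchive(archiveFileId);
      break;
  }
}

int main() {
  CtaFrontend cta;
  onWorkflowEvent(cta, WorkflowEvent::CloseAfterWrite, "/eos/experiment/run1.dat", 0);
  onWorkflowEvent(cta, WorkflowEvent::PrepareForRead, "/eos/experiment/run1.dat", 42);
}
```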

What is CTA – 3
- EOS plus CTA is a "drop-in" replacement for CASTOR
[Diagram: current deployments, where an experiment's files flow through EOS and CASTOR to the tape libraries, versus future deployments, where the same files flow through EOS plus CTA to the tape libraries.]

Why CTA – 1
- EOS has become the de facto disk storage for LHC physics data
- Natural evolution from CASTOR:
  - Removes the duplication between CASTOR disk storage and EOS
  - Thin layer on top of the existing CASTOR tape server
  - Stronger and more decoupled separation between disk and tape

Why CTA – 2
- CTA preemptive scheduler:
  - Uses drives at full speed all of the time
  - Single-step scheduling, versus CASTOR's multi-step scheduling with partial information (see the sketch below)
- Same tape format as CASTOR – only the metadata needs to be migrated
- Full, flat catalogue of all tape files can be used for disaster recovery
- Fewer networked components than CASTOR (no CUPV, VDQM or VMGR)
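To make "single-step scheduling" concrete, here is a minimal, hypothetical sketch of picking the next mount for a free drive in one pass over the per-tape/tape-pool queue summaries. The structure (QueueSummary, pickNextMount) and the priority/volume tie-breaking rule are assumptions for illustration, not the actual CTA algorithm.

```cpp
// Illustrative sketch only: single-step mount scheduling. The scheduler sees
// all queue summaries at once and picks the best mount in one decision,
// instead of CASTOR's multi-step scheduling with partial information.
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

struct QueueSummary {
  std::string vid;            // tape volume ID (or tape pool for archives)
  std::uint64_t bytesQueued;  // total data waiting in this queue
  std::uint64_t oldestAgeSec; // age of the oldest request
  unsigned priority;          // from the mount policy
};

// Pick the queue to mount next on a free drive: highest priority first,
// then the queue with the most data (hypothetical tie-breaking rule).
const QueueSummary *pickNextMount(const std::vector<QueueSummary> &queues) {
  auto best = std::max_element(
      queues.begin(), queues.end(),
      [](const QueueSummary &a, const QueueSummary &b) {
        if (a.priority != b.priority) return a.priority < b.priority;
        return a.bytesQueued < b.bytesQueued;
      });
  return best == queues.end() ? nullptr : &*best;
}

int main() {
  std::vector<QueueSummary> queues = {
      {"L70001", 2'000'000'000, 600, 1},
      {"L70002", 9'000'000'000, 60, 1},
      {"L70003", 500'000'000, 30, 2},
  };
  const QueueSummary *next = pickNextMount(queues);
  return next ? 0 : 1;  // here L70003 wins: highest mount-policy priority
}
```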

When is CTA
- Middle of 2017: additional and redundant backups of AFS/NFS and LEP data
- Middle of 2018: ready for small experiments
- End of 2018: ready for LHC experiments
Volunteer experiments to test are more than welcome!
CTA and CASTOR use the same tape format: only metadata will need to be migrated, and no files will need to be copied between tapes.

Architecture details
- Shared storage concept with a new queueing system
- Only 2 daemons: a front end for the CLI and the disk system interface (xrootd based), and cta-taped
- New queueing system:
  - Based on Ceph (Rados)
  - Only for transient data
  - Each queue (per tape / tape pool) is an independent object
  - Avoids a single huge queue table
  - Allows storage of rich objects
- Separate file catalogue:
  - Based on a conventional relational DB
  - For persistent data
  - Can be replaced by another implementation
- cta-taped is an adapted tapeserverd from CASTOR:
  - Multiple data transfer protocols already present: xroot (2 flavours), local file, Rados striper (rfio in CASTOR)
  - URL-based selection, file-by-file granularity (see the sketch below)
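"URL-based selection, file-by-file granularity" can be pictured as a small dispatcher keyed on the URL scheme. This is a sketch only: the schemes mirror the protocols named on the slide, but the FileTransfer interface, makeTransfer factory and class names are hypothetical, not the actual cta-taped code.

```cpp
// Illustrative sketch only: choosing the data transfer protocol per file
// from the URL scheme (xroot, local file, Rados striper).
#include <memory>
#include <stdexcept>
#include <string>

struct FileTransfer {
  virtual ~FileTransfer() = default;
  virtual void copyToTapeBuffer(const std::string &url) = 0;
};

struct XrootTransfer : FileTransfer {
  void copyToTapeBuffer(const std::string &url) override { /* xroot client */ }
};
struct LocalFileTransfer : FileTransfer {
  void copyToTapeBuffer(const std::string &url) override { /* POSIX read */ }
};
struct RadosStriperTransfer : FileTransfer {
  void copyToTapeBuffer(const std::string &url) override { /* rados striper */ }
};

// File-by-file granularity: each source URL picks its own transfer protocol.
std::unique_ptr<FileTransfer> makeTransfer(const std::string &url) {
  if (url.rfind("root://", 0) == 0) return std::make_unique<XrootTransfer>();
  if (url.rfind("file://", 0) == 0) return std::make_unique<LocalFileTransfer>();
  if (url.rfind("radosstriper://", 0) == 0)
    return std::make_unique<RadosStriperTransfer>();
  throw std::runtime_error("unsupported URL scheme: " + url);
}

int main() {
  auto transfer = makeTransfer("root://eosinstance//eos/experiment/run1.dat");
  transfer->copyToTapeBuffer("root://eosinstance//eos/experiment/run1.dat");
}
```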

Software architecture
[Component diagram: the client's system runs the CTA CLI interface (xrootd client lib) talking to the front-end servers, where an Xrootd server with the CTA Xrootd plugin drives the Scheduler, its Scheduler DB and the Catalogue. The tape servers run the tape session process with a disk client, their own Scheduler, Scheduler DB, Catalogue and xrootd client lib. Persistent control data and CTA metadata live in the CatalogueDB (Oracle), transient scheduling data in the SchedulerDB (Rados), and file data plus metadata reports flow to and from EOS (remote storage + MGM). A sketch of how these pieces could interact follows.]
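The following hedged sketch shows one way the front-end components named above could fit together when an archive request arrives: the xrootd plugin calls the Scheduler, which records the file in the Catalogue (persistent) and queues it in the Scheduler DB (transient). SchedulerDb, Catalogue, Scheduler and all their method names are stand-ins, not the real CTA interfaces.

```cpp
// Illustrative sketch only: CTA Xrootd plugin -> Scheduler -> Catalogue +
// Scheduler DB, with all names invented for this example.
#include <cstdint>
#include <iostream>
#include <string>

struct SchedulerDb {  // transient queues (Rados-backed in CTA)
  void queueArchiveRequest(const std::string &tapePool, std::uint64_t fileId) {
    std::cout << "queued file " << fileId << " on tape pool " << tapePool << "\n";
  }
};

struct Catalogue {  // persistent metadata (Oracle-backed in CTA)
  std::uint64_t registerArchiveFile(const std::string &instance,
                                    const std::string &path) {
    std::cout << "catalogued " << instance << ":" << path << "\n";
    return ++m_nextId;
  }
  std::string routeToTapePool(const std::string &storageClass) {
    return storageClass + "_pool";  // archive route lookup, simplified
  }
  std::uint64_t m_nextId = 0;
};

// The scheduler glues the persistent catalogue to the transient queues; the
// xrootd front-end plugin would call queueArchive() on behalf of EOS.
struct Scheduler {
  Catalogue &catalogue;
  SchedulerDb &db;
  std::uint64_t queueArchive(const std::string &instance, const std::string &path,
                             const std::string &storageClass) {
    const auto id = catalogue.registerArchiveFile(instance, path);
    db.queueArchiveRequest(catalogue.routeToTapePool(storageClass), id);
    return id;  // reported back to EOS as the tape replica reference
  }
};

int main() {
  Catalogue catalogue;
  SchedulerDb db;
  Scheduler scheduler{catalogue, db};
  scheduler.queueArchive("eosexperiment", "/eos/experiment/run1.dat", "raw");
}
```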

CTA scheduler DB & Catalogue
[Diagram: inside the Xrootd server, the CTA Xrootd plugin calls the Scheduler, which uses the Scheduler DB and the Catalogue; OStoreDB provides the multi-object logic, objectstore::objects handles serialization, and the object store backend comes in 2 flavours: Rados and VFS.]
- OStoreDB is an implementation of the Scheduler DB interface:
  - Object store based
  - Can leverage the Rados cluster already running in our group (and its scaling)
  - Can be implemented over a simple filesystem (testing, small-scale systems); see the sketch below
  - Relies on lock, full reads and writes, and delete
  - Serialization with Google protocol buffers
  - Multithreading and asynchronous IO to achieve performance
- The Catalogue has a similar layout
- See Steve's presentation
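The backend contract the slide describes (lock, full read, full write, delete) is small, which is what makes the filesystem "VFS flavour" possible for testing. Below is a minimal sketch of such a backend; the class and method names are assumptions rather than CTA's actual objectstore interface, and the lock-directory trick is just one possible locking scheme.

```cpp
// Illustrative sketch only: a filesystem-backed object store offering the
// lock / full read / full write / delete primitives mentioned on the slide.
#include <chrono>
#include <filesystem>
#include <fstream>
#include <sstream>
#include <string>
#include <thread>
#include <utility>

namespace fs = std::filesystem;

class VfsBackend {
public:
  explicit VfsBackend(fs::path root) : m_root(std::move(root)) {
    fs::create_directories(m_root);
  }

  // Exclusive lock: atomically create a per-object lock directory, spinning
  // until we get it (a real implementation would time out and back off).
  void lock(const std::string &name) {
    while (!fs::create_directory(m_root / (name + ".lock")))
      std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }
  void unlock(const std::string &name) { fs::remove(m_root / (name + ".lock")); }

  // Full-object read and write: objects are always replaced wholesale,
  // matching the "full reads and writes" constraint on the slide.
  std::string read(const std::string &name) const {
    std::ifstream in(m_root / name, std::ios::binary);
    std::ostringstream buf;
    buf << in.rdbuf();
    return buf.str();
  }
  void writeFull(const std::string &name, const std::string &blob) {
    std::ofstream out(m_root / name, std::ios::binary | std::ios::trunc);
    out << blob;  // blob would be a protocol-buffer serialized object
  }

  void remove(const std::string &name) { fs::remove(m_root / name); }

private:
  fs::path m_root;
};

int main() {
  VfsBackend backend("/tmp/cta-objectstore-sketch");
  backend.lock("tapequeue-L70001");
  backend.writeFull("tapequeue-L70001", "serialized queue contents");
  const std::string contents = backend.read("tapequeue-L70001");
  backend.remove("tapequeue-L70001");
  backend.unlock("tapequeue-L70001");
  return contents.empty();
}
```

Swapping this for a Rados-backed implementation keeps the same four primitives, which is why the slide can present Rados and VFS as interchangeable flavours of the same backend.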