CASTOR2@CNAF – Giuseppe Lo Re, Workshop Storage INFN, 20/03/2006 – CNAF (Bologna)

Presentation transcript:

CASTOR2@CNAF Giuseppe Lo Re Workshop Storage INFN 20/03/2006 – CNAF (Bologna)

Outline
- CASTOR1 at CNAF
- CASTOR2 architecture
- CASTOR2 deployment at CNAF
- Test results
- Conclusions

CASTOR1 [architecture diagram: client (rfopen) accessing RFIOD on the disk servers; several stager instances (e.g. stager_castor); NameServer holding the /castor/cnaf.infn.it/… namespace; VMGR with the tape pools (tape1, tape2, tape3…); VDQM and the drives (drive1, drive2, …); RTCPD and TapeDaemon on the tape servers]

Disk:
- 7 stagers: 4 LHC, 2 non-LHC, 1 SC
- ~15 disk servers
- Tot = 75 TB

Tape:
- 10 tape servers
- STK 5500 library
- 6 LTO-2 drives with 1200 tapes (240 TB)
- 4 9940B drives (+3 to be installed in the next weeks) with 680 + 650 tapes (130 TB => 260 TB)
- Tot = 224 TB

CASTOR2 Architecture (1) [architecture diagram: client (rfopen) talking to the Request Handler (RH); Stager, Rmmaster and Scheduler built around a central Oracle DB; Mover, GC and StagerJob on the disk servers; MigHunter and RTCPClientD feeding the tape layer; RTCPD and TapeDaemon on the tape servers; NameServer, VMGR, VDQM and DLF as shared services]

CASTOR 2 Architecture (2)
- Database centric architecture
  - "Surrounding" daemons are stateless
  - Important operational decisions can be translated into SQL statements (see the sketch after this list):
    - Preparation of migration or recall streams
    - Weighting of file systems used for migration/recall
    - Draining of disk servers or file systems
    - Garbage collection decision
  - Two databases supported: Oracle and MySQL (not ready)
- Request throttling thanks to the request handler
- Stateless components can be restarted/parallelized easily -> no single point of failure
  - Stager split into many independent services
  - Distinction between queries, user requests and admin requests
  - Fully scalable
- Disk access is scheduled
  - All user requests are scheduled
  - Advanced scheduling features for ‘free’ (e.g. fair-share)
  - Two schedulers provided: LSF and Maui (not ready)
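
To make the idea of "operational decisions as SQL statements" concrete, here is a minimal sketch, assuming a simplified version of the stager catalogue schema, of how an operator could drain a disk server by taking its file systems out of production. The table and column names (FileSystem, DiskServer, status) and the numeric status code are assumptions for illustration, not the exact CASTOR2 schema.

```sql
-- Minimal sketch: drain a disk server so no new requests are scheduled on it.
-- Table/column names and the status code are assumed for illustration only.
UPDATE FileSystem
   SET status = 2                              -- assumed code for a DRAINING/DISABLED state
 WHERE diskServer = (SELECT id
                       FROM DiskServer
                      WHERE name = 'diskserv-san-33');  -- one of the CNAF disk servers
COMMIT;
```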

CASTOR 2 Architecture (3)
- Pluggable mover
  - Client can choose its mover
  - rfio and rootd supported at this time; xrootd to come?
- Dynamic migration/recall streams
  - Multiple concurrent requests for the same volume will be processed together
  - New requests arriving after the stream has started are automatically added to the stream
- Configurable garbage collector
  - Policy depends on the service class
  - Decision implemented in SQL
  - Framework automatically deletes marked files
- Distributed Logging Facility used by all components

CNAF deployment [deployment diagram: Oracle databases stagerdb (Oracle/RHE), dlfdb (Oracle/SL) and the name server DB (Oracle/RHE) hosted on oracle01, diskserv-san-13 and castor-4; service nodes castor, castor-6, diskserv-san-13 and castorlsf01 running the Castor1 services (vdqm, vmgr, ns, cupvd), RH/stager/MigHunter/rtcpclientd, DLF/rmmaster/expertd and the LSF master; disk servers diskserv-san-33, diskserv-san-34, diskserv-san-35, diskserv-san-36 with 2x2 TB each; same tape servers as for Castor1]

Test (1)
- Write: 100 jobs writing 1 GB files; average rate = 2.3 MB/s per job, total rate 230 MB/s
- Read: 100 jobs reading 1 GB files; average rate = 2.7 MB/s per job, total rate 270 MB/s

Test (2) [write and read throughput plots]

Conclusions
- CASTOR2 services are stable and work fine; the admin interfaces are not yet mature.
- Installation is not easy, but quattor helps a lot.
- There is no (known) limit on the number of files on disk, and CASTOR2 provides better logic for tape recalls (fewer rewinds, mounts and dismounts).
- Some more work is needed before production:
  - DB configuration (archive log rotation, tablespace size, backups)
  - Tuning of the number of LSF slots for the disk servers
  - Experience with admin tasks such as draining file systems and servers…
  - Evaluation of the LSF+Oracle overhead with small files
- The next stress test will be the throughput phase of SC4.

Client
- Several interfaces provided
  - rfcp, RFIO API: backward compatible, can talk to new and old stagers
  - stager API and commands: not backward compatible
- No pending connection
  - Opens a port, calls the Request Handler and closes the connection
  - Waits on the port for the Request Replier
- SRM provided
  - Backward compatible, can talk to new and old stagers

Migration
- The decision which file to migrate and when is entirely externalized to one or several ‘hunter’ processes
- Files to be migrated are linked to one or more "Streams"
  - A Stream is a container of files to be migrated
  - A migration candidate is a TapeCopy with status TAPECOPY_WAITINSTREAMS that is associated with one or several Streams
  - When a migration candidate is associated with several Streams, it will be picked up by one of them; this allows for almost 100% drive utilization
  - A running Stream is associated with a Tape; however, the same Stream may ‘survive’ several Tapes
  - The Stream is destroyed when there are no more eligible TapeCopy candidates
- Stream creation and linking of migration candidates to streams are pure DB operations
  - Could be performed directly with a SQL script (see the sketch after this list)
  - Several scripts for different policies can work concurrently
- A default ‘MigHunter’ is provided
  - Optionally supports a legacy mode emulating the current CASTOR stager
- New stager will NOT segment files
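
As an example of such a "pure DB operation", here is a minimal SQL sketch of what a custom hunter script could do: attach all files waiting for migration to an existing stream and mark them as migration candidates. The association table Stream2TapeCopy, the numeric status codes and the stream id are assumptions for illustration, not the exact production schema.

```sql
-- Minimal hunter-style sketch: link waiting TapeCopies to stream 42.
-- Table names, status codes and the stream id are assumed for illustration.
INSERT INTO Stream2TapeCopy (parent, child)
SELECT 42, tc.id
  FROM TapeCopy tc
 WHERE tc.status = 1;                -- assumed code for "to be migrated"

UPDATE TapeCopy
   SET status = 2                    -- assumed code for TAPECOPY_WAITINSTREAMS
 WHERE status = 1;

COMMIT;
```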

Recall
- Recall differs from migration in that it is usually executed on demand
  - An active request is waiting for the file to be recalled
  - However, with the new architecture the decision what to recall and when can be externalized (‘hunter’ processes)
- A recall candidate is a TapeCopy associated with one or more tape Segments (Tape + Segment)
- Requests requiring a recall are scheduled like normal file access requests
  - The scheduler can be configured to prevent massive recall attacks
  - The "job" will simply put the DiskCopy and SubRequest in WAITRECALL status, create the tape Segment information in the catalogue and exit
- Use of the recallPolicy attribute in SvcClass (see the sketch after this list):
  - If no recall policy is defined (the recallPolicy attribute is empty in SvcClass), the job triggers the recall immediately by setting the Tape status to >= TAPE_PENDING
  - If a recall policy is defined, the TapeCopy and tape Segment are created without modifying the Tape status:
    - If the tape is already mounted, it will automatically pick up the candidate
    - Otherwise an offline process can later decide to trigger the recall by updating the Tape status
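
As a sketch of what such an offline trigger could look like in SQL, the following marks every tape that has unprocessed segments but is still idle as pending, so the recaller will mount it. Table names and numeric status codes are assumptions for illustration, not the exact production schema.

```sql
-- Minimal sketch of an offline recall trigger.
-- Table/column names and status codes are assumed for illustration.
UPDATE Tape t
   SET t.status = 1                          -- assumed code for TAPE_PENDING
 WHERE t.status = 0                          -- assumed code for an idle tape
   AND EXISTS (SELECT 1
                 FROM Segment s
                WHERE s.tape = t.id
                  AND s.status = 0);         -- assumed code for "segment not yet recalled"
COMMIT;
```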

Garbage collection
- Like for migration/recall, disk file garbage collection is triggered via a DB update
- Disk files (DiskCopy class in the catalogue) to be garbage collected are marked with a special status: DISKCOPY_GCCANDIDATE
- gcdaemon retrieves a list of local files to be removed and updates the catalogue when the files have been removed (can be lazy, since the DiskCopy is already marked for GC)
- The GC policy deciding which files to remove is configurable per SvcClass and written in PL/SQL directly in the DB (see the sketch after this list)
- A gcWeight attribute of the DiskCopy is provided for externally setting its weight to be used when compared with other candidates
  - This could be based on experiment policies: e.g. all files beginning with "ABC" should be given a low weight for removal
  - By default all weights are zero
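
As an illustration, here is a minimal PL/SQL sketch of what a per-SvcClass policy could look like: it marks the staged copies with the lowest gcWeight on a given file system as GC candidates until the requested amount of space has been freed. The procedure name, table and column names and status codes are assumptions for illustration, not the production policy.

```sql
-- Minimal sketch of a GC policy in PL/SQL. All names and status codes
-- are assumed for illustration, not the production schema.
CREATE OR REPLACE PROCEDURE selectGCCandidates(fsId IN NUMBER,
                                               spaceNeeded IN NUMBER) AS
  freed NUMBER := 0;
BEGIN
  FOR dc IN (SELECT id, diskCopySize
               FROM DiskCopy
              WHERE fileSystem = fsId
                AND status = 0              -- assumed DISKCOPY_STAGED
              ORDER BY gcWeight ASC)        -- lowest weight removed first
  LOOP
    EXIT WHEN freed >= spaceNeeded;
    UPDATE DiskCopy
       SET status = 8                       -- assumed DISKCOPY_GCCANDIDATE
     WHERE id = dc.id;
    freed := freed + dc.diskCopySize;
  END LOOP;
  COMMIT;
END;
/
```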

Internal file replication
- Disk file replication of "hot" files is supported
- SvcClass attributes regulate the replication:
  - maxReplicaNb: limits the maximum number of replicas allowed by the SvcClass
  - replicationPolicy: names a policy to be called if maxReplicaNb is not defined (≤0)
- Replication is performed on demand, when a job is started and the file is not on the scheduled file system (see the sketch after this list)
  - If maxReplicaNb or replicationPolicy allows for it; otherwise the file system will be forced via a job submission attribute
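
A minimal sketch, assuming a simplified schema, of the check the framework could perform before replicating: count the existing valid copies of the file and compare against maxReplicaNb for the service class. The file id, the service class name and all table/column names and status codes are assumptions for illustration.

```sql
-- Minimal sketch: may this file get another replica in this SvcClass?
-- All names, ids and status codes are assumed for illustration.
SELECT CASE WHEN copies.cnt < sc.maxReplicaNb THEN 'replicate'
            ELSE 'force the scheduled file system' END AS decision
  FROM SvcClass sc,
       (SELECT COUNT(*) AS cnt
          FROM DiskCopy
         WHERE castorFile = 12345           -- hypothetical file id
           AND status = 0) copies           -- assumed code for a valid staged copy
 WHERE sc.name = 'default';                 -- example service class
```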

File system selection
- The file system selection is called from several places:
  - When scheduling access for a given client request
  - When selecting the best migration candidate
  - When selecting a file system for recalling a tape file
- The FileSystem table has several attributes updated by external policies based on load and status monitoring (see the sketch after this list):
  - "free" is the free space on the file system
  - "weight" reflects the current load, calculated using an associated policy
  - "fsDeviation" is the deviation to be subtracted from the weight every time a new stream is added to the file system; this assures that the same file system is not selected twice
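
A minimal sketch of this selection logic in SQL: pick the production file system with the highest weight and enough free space, then subtract fsDeviation from its weight so the next request is steered to a different file system. Table/column names, status codes and the bind variable are assumptions for illustration.

```sql
-- Minimal sketch of file system selection (Oracle-style ROWNUM limit).
-- Table/column names and status codes are assumed for illustration.
SELECT id
  FROM (SELECT id
          FROM FileSystem
         WHERE status = 0                   -- assumed code for a production file system
           AND free > 1073741824            -- example: at least 1 GB free
         ORDER BY weight DESC)
 WHERE ROWNUM = 1;

-- Penalize the selected file system so it is not chosen twice in a row
UPDATE FileSystem
   SET weight = weight - fsDeviation
 WHERE id = :selectedFsId;                  -- bind variable holding the id selected above
COMMIT;
```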

‘Hunter’ processes
- The ‘hunter’ processes are not strictly part of the stager itself
  - They can be daemons or cron jobs that run offline and independently of other CASTOR servers
  - They implement specific policies either via calls to the expert system (expertd) or via SQL queries to the catalogue database
- The action taken by a hunter would normally result in the triggering of a CASTOR task, e.g.:
  - Migration
  - Recall
  - Garbage collection
  - Retry of tape exceptions
  - Internal replication

Logging facility
- All new CASTOR components use the Distributed Logging Facility (DLF)
  - Log to files and/or a database (Oracle or MySQL)
  - Web based GUI for problem tracing using the DLF database
- The following services currently log to DLF:
  - rhserver
  - stager
  - rtcpclientd, migrator, recaller
  - stagerJob
  - MigHunter

DLF GUI [screenshot of the web based GUI]

Instant performance views
- Cmonitd has been part of CASTOR since 2002
  - Central daemon collecting UDP messages from:
    - Tape movers (rtcpd)
    - Tape daemon (mount/unmount)
- Original GUI written in Python
- GUI rewritten in Java (Swing), September ’04
  - Web Start
  - Drive performance time-series plots

Monitoring GUI [screenshot]