Castor dev Overview Castor external operation meeting – November 2006 Sebastien Ponce CERN / IT.

Presentation transcript:

Sebastien Ponce (IT/FIO/FD)

Outline
- The “new” CASTOR dev team
- New features since last meeting
  - client, DLF, protocols, repack, tapes, GC, ...
- Bug fixes
  - disk server tuning, deadlocks, bad states, ...
- Still to be done
  - build and release schema, security
- Known bugs
  - very few now
  - deployments: repack 2, VDQM 2, SRM 2

Who is CASTOR dev?
- Sebastien Ponce: project leader; code generation, LSF, DB
- Giuseppe Lo Presti: DB, core framework, SRM, external institutes support
- Giulia Taurelli: test suites, RFIO, VDQM 2, DB interface
- Dennis Waldron: DLF, expert system
- Rosa Maria Garcia Rioja: web site, 64-bit port, GridFTP v2, xroot
- Felix Ehm: repack for CASTOR 2
- Hugo Caçote: tape-related software (hardware team)
- Olof Bärring: SRM 1, tape part (operation team)
- Miro Siket: tape optimizations
- Arne Wiebalck: tape-related software (hardware team)
- External contributors from RAL, IHEP Moscow and JINR Dubna

New features (Client part)
- RFIO API was made thread safe
  - protocol change that triggered a 2.1 release of CASTOR
  - castor-lib-compat package created, containing a backward compatible libshift.so.2.0
- A port range is now used for the stager callback
  - default [ ]
  - port number is chosen randomly within the set
  - still problems of concurrency at the bind/listen level
- Name server command lines for ACLs and links
  - nsln, nsgetacl and nssetacl
- Improved stager_qry
  - -s option, regexps
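The random choice of a callback port within a range can be illustrated with a minimal Python sketch. This is not the CASTOR client code: the range bounds below are hypothetical (the default values are not legible on the slide), and `bind_callback_socket` is a name invented for this illustration. The sketch retries with another port when a bind fails, which is one simple way to tolerate collisions between concurrent requests.

```python
import random
import socket

# Hypothetical range: the actual default is not legible in the slide.
PORT_RANGE = range(40000, 40100)


def bind_callback_socket(port_range=PORT_RANGE, max_attempts=20, host="127.0.0.1"):
    """Bind a listening TCP socket to a randomly chosen port inside port_range.

    Returns (socket, port). Raises RuntimeError if every attempt fails.
    """
    ports = list(port_range)
    # Draw distinct candidate ports in random order.
    candidates = random.sample(ports, min(max_attempts, len(ports)))
    for port in candidates:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            sock.listen(1)
            return sock, port
        except OSError:
            # Port already in use (possibly by a concurrent request): try another.
            sock.close()
    raise RuntimeError("no free port found in the configured range")
```

Because the bind itself decides whether a port is free, two callers can never both succeed on the same port in this sketch; the concurrency problem mentioned on the slide concerns the real client library, which this example does not reproduce.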

New features (Core)
- New DLF
  - much quicker and no impact on main process
  - see Dennis' talk
- Finished requests are kept for some time
  - allows correct answer to stager_qry
- Cleaning service introduced
  - removes old requests
  - removes very old errors
- Database reconnections
  - when a DB connection is lost, it is automatically renewed
  - makes the service robust against network glitches
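The automatic renewal of a lost database connection can be sketched generically as follows. This is only an illustration of the idea in Python, not the CASTOR implementation: `ReconnectingConnection`, `FlakyConnection` and `make_factory` are hypothetical names, and the "connection" is a stand-in object rather than a real Oracle session.

```python
class ReconnectingConnection:
    """Wraps a connection factory and reconnects once when a call fails."""

    def __init__(self, connect, retriable=(ConnectionError,)):
        self._connect = connect      # hypothetical factory returning a connection
        self._retriable = retriable  # exception types treated as "connection lost"
        self._conn = connect()
        self.reconnects = 0

    def execute(self, statement):
        try:
            return self._conn.execute(statement)
        except self._retriable:
            # Connection lost: open a fresh one and retry the statement once.
            self._conn = self._connect()
            self.reconnects += 1
            return self._conn.execute(statement)


class FlakyConnection:
    """Stand-in connection that can fail on its first execute(), like a glitch."""

    def __init__(self, fail_first):
        self._fail = fail_first

    def execute(self, statement):
        if self._fail:
            self._fail = False
            raise ConnectionError("link dropped")
        return f"ok: {statement}"


def make_factory():
    """Factory whose first connection is flaky and whose later ones are healthy."""
    state = {"first": True}

    def connect():
        flaky = state["first"]
        state["first"] = False
        return FlakyConnection(fail_first=flaky)

    return connect
```

A single retry keeps the sketch short; a service would more likely add a delay and a bounded number of attempts.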

New features (Core 2)
- Garbage collection was improved
  - ORACLE jobs independent of other transactions
    - handles GC requests queuing in a table
  - job is restarted by a trigger if needed
- Support for 64 bits
  - done for a pure 64 bits install
  - compatibility tests to be done
    - rfcp problems have been reported
- LSF plugin / monitoring rewrite
  - mostly done
  - needs intensive debugging
  - see next talk

New features (Protocols)
- RFIO
  - merged with DPM code base (see Giulia's talk)
- GridFTP v2
  - integrated, under tests, see Rosa's talk
- Xroot
  - almost integrated, see Rosa's talk

New features (Disk part)
- XFS preallocation
  - avoids fragmentation on XFS file systems
- RFIO support for O_DIRECT
  - avoids kernel paging for big files
- Credits to Peter Kelemen and Andras Horvath from the CERN linux team
- Net result should be much better performance ( see )
- Never tested in production yet. Note that the O_DIRECT tuning requires SLC4 diskservers
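Preallocation means reserving a file's full size before writing, so the file system can lay the data out contiguously. The sketch below shows the general idea with the portable `os.posix_fallocate` call available in Python on Linux; it is a stand-in and is not claimed to be the mechanism used in CASTOR, which may rely on an XFS-specific interface. `create_preallocated` is a hypothetical helper name.

```python
import os


def create_preallocated(path, size_bytes):
    """Create `path` and reserve `size_bytes` of space for it up front.

    Returns the resulting file size as reported by the file system.
    """
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        # Reserve the byte range [0, size_bytes) before any data is written.
        os.posix_fallocate(fd, 0, size_bytes)
    finally:
        os.close(fd)
    return os.stat(path).st_size
```

This is useful when the final size is known in advance, as is typically the case when a file is copied or recalled from tape.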

New features (Tape part)
- Check of tape MIR validity
  - for LTO-3, 3592 and T10K
- Tape Alerts reported
  - at release time, to identify certain media/read-write hardware problems
- Support of the IBM 700GB tape
- Changes to the compression reporting
  - in order to use IBM's Non Volatile Caching
  - useful for writing small files
- Tapeserver running on SLC4
  - a patch was removed from the 2.6 kernel, so we have to use a driver macro to get the same information

New components
- SRM v2.2
  - already deployed on preproduction setup
  - see Shaun's presentations
- Repack 2
  - will be used to repack 5 PB from 9940 to T10K and 3592s between now and June next year
  - see Felix's presentation
- Test suite
  - > 100 tests implemented
  - already discovered many problems

Bug fixes (short list)
- Removed most of the database deadlocks
  - a single one remains on the alice instance, none on atlas
- Fixed most of the bad statuses in the database
  - e.g. no more diskcopy staying in waitDisk2DiskCopy
- Removed accumulations in the database
  - better cleaning, mostly in case of errors
- Stager permissions fixed
  - now respects ACLs and Cupv privileges
    - needed for repack and for admin purposes
  - properly takes the open flags into account
- Fixed stager_qry to be immune to a large number of DiskCopies for a given file
  - usually failed ones, when a file is overwritten > 500 times

Known bugs
- Remaining deadlock (note singular)
  - in fileDeletedProc with itself
  - leaves DiskCopies in BEINGDELETED (although they were actually deleted)
- Creation of readonly files fails
  - workaround: create them normally, change permissions afterward
- Problems with putDone on multiple files
  - only first one taken into account ?
- Problems with bind/listen
  - not “thread safe”
  - leads to strange situations where 2 requests use the same port and mix answers
  - not visible unless you try very hard and remove randomness
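The workaround for read-only files (create normally, then restrict permissions) can be illustrated on a local POSIX file system. This is only an analogy in Python: it uses the standard `os.chmod` call on a local file and does not involve any CASTOR or RFIO API; `write_then_make_readonly` is a hypothetical helper name.

```python
import os
import stat


def write_then_make_readonly(path, data):
    """Create the file with default permissions, write it, then drop write access."""
    with open(path, "wb") as f:
        f.write(data)
    # Only after the content is in place, restrict the mode to read-only (0444).
    os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
```

Against a CASTOR name space the equivalent second step would be a permission change on the already created file, done with whatever tool the site normally uses for that.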

Future plans (major items)
- Major improvements of the build system
  - ongoing, trying CMT
  - targeting ETICS
  - see Giuseppe's talk
- Security
  - Authentication
  - Authorization
    - VOMS support ?
    - CUPV rewrite ?
- Port to other platforms
  - essentially Windows and MacOS
  - only for the client part
- Please tell us what you need

stager_qry -s

POOL default CAPACITY 3.64T FREE 3.62T(99%) RESERVED 0( 0%)
  DiskServer lxfsrk4703 DISKSERVER_PRODUCTION CAPACITY 3.64T FREE 3.62T(99%) RESERVED 0( 0%)
    FileSystems      STATUS                 CAPACITY  FREE        RESERVED    GCBOUNDS
    /srv/castor/01/  FILESYSTEM_PRODUCTION  G         G(99%)      0( 0%)      0.20, 0.30
    /srv/castor/02/  FILESYSTEM_PRODUCTION  1.36T     1.36T(99%)  0( 0%)      0.20, 0.30
    /srv/castor/03/  FILESYSTEM_PRODUCTION  1.36T     1.34T(98%)  0( 0%)      0.20, 0.30
POOL dteam CAPACITY 46.56T FREE 41.60T(89%) RESERVED 30.27G( 0%)
  DiskServer lxfsra3001 DISKSERVER_PRODUCTION CAPACITY 4.66T FREE 4.20T(90%) RESERVED 3.91G( 0%)
    FileSystems      STATUS                 CAPACITY  FREE        RESERVED    GCBOUNDS
    /srv/castor/01/  FILESYSTEM_PRODUCTION  1.16T     1.02T(87%)  M( 0%)      0.20, 0.30
    /srv/castor/02/  FILESYSTEM_PRODUCTION  1.75T     1.61T(92%)  M( 0%)      0.20, 0.30
    /srv/castor/03/  FILESYSTEM_PRODUCTION  1.75T     1.57T(90%)  1.95G( 0%)  0.20, 0.30
  DiskServer lxfsra3002 DISKSERVER_PRODUCTION CAPACITY 4.66T FREE 4.20T(90%) RESERVED 4.88G( 0%)
    FileSystems      STATUS                 CAPACITY  FREE        RESERVED    GCBOUNDS
    /srv/castor/01/  FILESYSTEM_PRODUCTION  1.16T     1.06T(90%)  1.95G( 0%)  0.20, 0.30
    /srv/castor/02/  FILESYSTEM_PRODUCTION  1.75T     1.62T(92%)  1.95G( 0%)  0.20, 0.30
    /srv/castor/03/  FILESYSTEM_PRODUCTION  1.75T     1.53T(87%)  M( 0%)      0.20,
POOL repack CAPACITY 6.11T FREE 6.11T(99%) RESERVED 0( 0%)
  DiskServer lxfs6132 DISKSERVER_PRODUCTION CAPACITY 1.53T FREE 1.53T(99%) RESERVED 0( 0%)
    FileSystems      STATUS                 CAPACITY  FREE        RESERVED    GCBOUNDS
    /srv/castor/01/  FILESYSTEM_PRODUCTION  G         G(99%)      0( 0%)      0.20, 0.30
    /srv/castor/02/  FILESYSTEM_PRODUCTION  G         G(99%)      0( 0%)      0.20, 0.30
    /srv/castor/03/  FILESYSTEM_PRODUCTION  G         G(99%)      0( 0%)      0.20, 0.30
  DiskServer lxfs6134 DISKSERVER_PRODUCTION CAPACITY 1.53T FREE 1.53T(99%) RESERVED 0( 0%)
    FileSystems      STATUS                 CAPACITY  FREE        RESERVED    GCBOUNDS
    /srv/castor/01/  FILESYSTEM_PRODUCTION  G         G(99%)      0( 0%)      0.20, 0.30
    /srv/castor/02/  FILESYSTEM_PRODUCTION  G         G(99%)      0( 0%)      0.20, 0.30
    /srv/castor/03/  FILESYSTEM_PRODUCTION  G         G(99%)      0( 0%)      0.20, 0.30
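For monitoring scripts, the per-pool summary lines of this output are easy to pick out with a regular expression. The following Python sketch assumes the line layout shown on this slide (`POOL <name> CAPACITY ... FREE ...(..%) RESERVED ...(..%)`); the layout may differ between CASTOR releases, and `parse_pools` is a hypothetical helper, not part of any CASTOR tool. Sizes are kept as the strings printed by the command, since the unit convention is not stated.

```python
import re

# Matches only the per-pool summary lines, not DiskServer or file system rows.
POOL_LINE = re.compile(
    r"\bPOOL\s+(?P<name>\S+)\s+CAPACITY\s+(?P<capacity>\S+)\s+"
    r"FREE\s+(?P<free>[^\s(]+)\(\s*(?P<free_pct>\d+)%\)\s+"
    r"RESERVED\s+(?P<reserved>[^\s(]+)\(\s*(?P<reserved_pct>\d+)%\)"
)


def parse_pools(text):
    """Return one dict per POOL summary line found in `text`."""
    pools = []
    for match in POOL_LINE.finditer(text):
        entry = match.groupdict()
        # Percentages become integers; sizes stay as printed (e.g. "3.64T").
        entry["free_pct"] = int(entry["free_pct"])
        entry["reserved_pct"] = int(entry["reserved_pct"])
        pools.append(entry)
    return pools
```

A script could then, for instance, raise an alarm when `free_pct` for a pool drops below a site-defined threshold.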