AMS02 Data Volume, Staging and Archiving Issues AMS Computing Meeting CERN April 8, 2002 Alexei Klimentov.

A.Klimentov, AMS Computing Meeting, CERN, Apr 2002

Outline
- AMS02 data volume
- AMS/CASPUR Technical Meeting, Mar 2002
- Projected characteristics for disks, processors and tapes
- AMS data storage issues

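AMS Data Volume (Tbytes)
A.Klimentov, AMS/CASPUR Technical Meeting, Bologna, Mar 2002

[Table: data volume per year and in total, for STS91 and for AMS02 on ISS. Rows: Raw, ESD, Tags, Data & ESD, MC, Grand Total. The only surviving value is ~400, which follows the Grand Total label.]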

AMS/CASPUR technical meeting, Bologna, Mar 2002

Participants: V.Bindi, M.Boschini, D.Casadei, A.Contin, V.Choutko, A.Klimentov, A.Maslennikov, F.Palmonari, PG.Rancoita, PP.Ricci, C.Sbara, P.Zuccon

Topic: archiving and staging strategy, AMS02 data volume

Goal: to propose a coherent scheme for AMS data storage in the SOC and remote center(s).

Possible solutions:
- disk servers
- staging (tapes + disks)
- outsourcing (CASTOR)

Staging

- A staging system is a generic name for a tape-to-disk migration tool. The user migrates files from tape to disk before accessing them. Migration of disk files to tape may be automatic or manual.
- Older staging implementations (the old CERN staging) required users to keep track of their own tape files.
- The CASPUR flavour (in production since 1997) does the tape/file bookkeeping on behalf of the user. It uses NFS and is fairly easy to install and manage.
- CASTOR (CERN, project started in 1999) lets a user migrate files either manually or via specially modified I/O calls from within a program. It uses a fast data transfer protocol (RFIO). It has been installed and maintained by CERN IT since 2000 and is currently used by COMPASS to store raw data and ESD; ALICE and CMS have also made I/O tests. It is currently the primary option for the LHC experiments.
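The bookkeeping idea behind a staging system can be illustrated with a toy model. This is only a sketch: the class `StagingCatalogue`, its methods, the file name and the tape label are all hypothetical and do not correspond to the CASPUR or CASTOR interfaces. It shows a catalogue that remembers which tape holds each file, so that the user asks for a file by name and never tracks tapes directly.

```python
class StagingCatalogue:
    """Toy tape/disk bookkeeping (hypothetical, for illustration only)."""

    def __init__(self):
        self._tape_of = {}     # file name -> label of the tape holding it
        self._on_disk = set()  # files currently present in the disk pool

    def migrate_to_tape(self, name, tape_label):
        """Record that a file was written to tape and drop it from disk."""
        self._tape_of[name] = tape_label
        self._on_disk.discard(name)

    def stage_in(self, name):
        """Make a file available on disk.

        Returns the label of the tape it was recalled from,
        or None if the file was already on disk.
        """
        if name in self._on_disk:
            return None
        tape_label = self._tape_of[name]  # raises KeyError for unknown files
        self._on_disk.add(name)
        return tape_label

    def is_on_disk(self, name):
        return name in self._on_disk


catalogue = StagingCatalogue()
catalogue.migrate_to_tape("run0001.raw", "TAPE_A")
first = catalogue.stage_in("run0001.raw")   # recalled from tape
second = catalogue.stage_in("run0001.raw")  # already on disk, nothing to do
print(first, second, catalogue.is_on_disk("run0001.raw"))
```

A real system would also handle disk pool eviction, tape mounts and concurrent requests; none of that is modelled here.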

Projected characteristics for disks, processors and tapes

Each component is listed for three successive periods (column headings lost).

Intel/AMD PC
- Dual-CPU Intel PII rated at 450 MHz, 512 MB RAM: 7.5 kUS$
- Dual-CPU Intel rated at 2.2 GHz, 1 GB RAM, SCSI and IDE RAID controllers: 7 kUS$
- Dual-CPU rated at 8 GHz, 2 GB RAM, IDE RAID controller: 5 kUS$

Magnetic disk
- 18 GByte SCSI: 80 US$/GByte
- SG 180 GByte SCSI: 10 US$/GByte; WD 120 GByte IDE: 2 US$/GByte; IDE-FC: 5.5 US$/GByte
- SCSI 700 GByte: 2 US$/GByte; IDE 800 GByte: 0.6 US$/GByte; IDE-FC: 1.3 US$/GByte

Magnetic tape
- DLT, 40 GB compressed: 3 US$/GByte
- SDLT and LTO, 200 GB compressed: 0.8 US$/GByte
- ?, 600 GB compressed: 0.3 US$/GByte

AMS staging and archiving system: requirements and considerations

- Storage strategy might be different for raw, ESD and MC data.
- All data must be archived.
- At least two copies of raw and ESD data are required.
- I believe that data must be under the control of the AMS collaboration.
- The archiving system should be scalable and independent of the HW technology.

Data volume: [...] TB, [...] TB
Throughput: 2 TB/day, 23 MB/sec
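The two throughput figures are the same requirement in different units. A minimal check, assuming decimal units (1 TB = 10^6 MB) and a transfer spread evenly over 24 hours; the function name is chosen here for illustration:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_to_rate_mb_s(tb_per_day: float) -> float:
    """Convert a daily volume in TB/day to a sustained rate in MB/s.

    Assumes decimal units: 1 TB = 1_000_000 MB.
    """
    return tb_per_day * 1_000_000 / SECONDS_PER_DAY


rate = daily_volume_to_rate_mb_s(2.0)
print(f"{rate:.1f} MB/s")  # prints "23.1 MB/s"
```

This is an average rate; any system sized from it would need headroom for bursts and for recovering from downtime.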

Cost estimation (I): disk servers

[...] TB/year
- RAID5, 2.1 TB/server
- 3-4 servers/year
- 23.3 kUS$/server/[...]
- [...]% disk price drop/year, migration to IDE disk system
- 197 kUS$ total
- 6.2 US$/GByte

2006 and beyond
- 100 TB/year
- RAID5, 5.6 TB/server
- [...] servers/year
- 8.8 kUS$/server/2006
- 50% disk price drop/year
- 411 kUS$ total
- 1.4 US$/GB

Cost estimation (II): staging

[...] TB/year
- LTO library: 58 kUS$
- 2 servers: 10 kUS$
- [...] cartridges/year
- FC switch: 15 kUS$
- 0.8 TB disks/year
- 30% IDE-FC disk price drop/year
- 111 kUS$ total
- 3.5 US$/GB

2006 and beyond
- 100 TB/year
- LTO library, biennial
- 2 servers/year
- [...] cartridges/year
- FC switch: 15 kUS$
- 10 TB disks/year
- 30% IDE-FC disk price drop/year
- 300 kUS$ total
- 1 US$/GB

Cost estimation (III): CASTOR

[...] TB/year
- 1.8 US$/GByte
- 57.6 kUS$ total

2006 and beyond
- 100 TB/year
- 0.8 US$/GByte
- 240 kUS$ total
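As a consistency check on the three cost slides, dividing each quoted total by its quoted cost per GByte gives the data volume the estimate implies. The sketch below uses only the totals and unit costs stated on the slides; the period labels "2002" and "2006" follow the summary slide's "2002/2006" column heading, and decimal units (1 TB = 1000 GB) are an assumption.

```python
# (total cost in kUS$, unit cost in US$/GByte) as quoted on the cost slides
QUOTED = {
    "disk servers": {"2002": (197.0, 6.2), "2006": (411.0, 1.4)},
    "staging":      {"2002": (111.0, 3.5), "2006": (300.0, 1.0)},
    "CASTOR":       {"2002": (57.6, 1.8),  "2006": (240.0, 0.8)},
}


def implied_volume_tb(total_kusd: float, usd_per_gb: float) -> float:
    """Volume in TB implied by a total cost and a per-GByte cost.

    Assumes decimal units: 1 TB = 1000 GB.
    """
    volume_gb = total_kusd * 1000.0 / usd_per_gb
    return volume_gb / 1000.0


for solution, periods in QUOTED.items():
    for period, (total, unit) in periods.items():
        print(f"{solution:12s} {period}: {implied_volume_tb(total, unit):6.1f} TB")
```

All three solutions come out at roughly 32 TB for the first period and roughly 300 TB for the second, which suggests the estimates were made for the same data volumes and are directly comparable.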

Storage solution (summary)

System complexity
- Disk servers: high (50 servers, 0.5 PB online)
- Staging: medium (8 servers, 0.05 PB online)
- CASTOR: low

Cost, kUS$ (2002/2006)
- Disk servers: 197/411
- Staging: 111/300
- CASTOR: 57.6/240

Data access
- Disk servers: real-time
- Staging: 5-10 mins delay
- CASTOR: [...] mins delay

For the remaining rows only two values survive for the three solutions:
- Manpower: 0.5 FTE; 0.1 FTE
- System availability (short/long term): fall 2002 / 2005 (R&D req); May 2002 / ?
- Special issue: AMS controlled; CERN & AMS controlled

Conclusion

- CASTOR might be the best solution for the short term and for MC data storage; central maintenance by CERN is one of its advantages. I would not suggest using CASTOR for AMS critical applications. Note also that, due to the CERN budget cut, the cost/GB may change for non-CERN experiments, and priority will always be given to LHC groups.
- The disk servers solution is still too expensive for storing ALL data, and it increases the complexity of the system (even if one assumes that the same servers will be used for data processing). For the raw data and selected ESD it might be the way we proceed.
- The staging system is the most cost-efficient solution for the case where AMS maintains full control of the data. Over the experiment lifetime the overall cost of the staging system will be only 25% higher than that of CASTOR. R&D is required to prove the scalability of the "CASPUR system" to data volumes of hundreds of Tbytes and to multi-server/data-mover processes.