GDB meeting - Lyon - 16/03/05
An example of data management in a Tier A/1
Jean-Yves Nief

Slide 2: Overview of CC-IN2P3
CC-IN2P3: "mirror" site of SLAC for BaBar since November 2001:
 - real data.
 - simulation data.
 (total = 290 TB: Objectivity event store (obsolete) + ROOT event store (new data model))
Provides the services needed by all BaBar physicists to analyze these data (data access).
Provides the data in Lyon within 24/48 hours after production (data management).
Provides resources for the simulation production.

Slide 3: Data access

Slide 4: Overview of CC-IN2P3 (general considerations)
Large volume of data (100s of TB).
Mostly non-modified data (write once, read many times).
Hundreds to thousands of clients accessing the data in parallel:
 - high-performance access required: reduced latency.
 - data volume and demand increasing over time: scalability.
Use of distributed architectures:
 - fault tolerance.
Hybrid storage (tapes + disk):
 - transparent access to the data on tape.
 - transparent disk cache management.

Slide 5: BaBar at CC-IN2P3 – 2004
~20-30% of the available CPU.
Up to 600 user jobs running in parallel.
"Distant access" to the Objectivity and ROOT files from the batch workers (BW):
 - random access to the files: only the objects needed by the client are transferred to the BW (~kB per request).
 - hundreds of connections per server.
 - thousands of requests per second.

Slide 6: Data access model
[Diagram: a client requests a file (e.g. T1.root) from the master Xrootd/Objectivity server and is redirected to one of 15 Xrootd/Objectivity data servers sharing 40 TB of disk cache; files absent from the cache are staged from HPSS (2 servers, 260 TB).
 (1) + (2): load balancing.
 (4) + (5): dynamic staging.
 (6): random access.]
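To make the client side of this model concrete, here is a minimal sketch of how a job on a batch worker could read such a file through ROOT: the client only contacts the master (redirector), and the redirection to a data server plus the staging from HPSS happen behind TFile::Open(). The server name, file path and tree name are illustrative assumptions, not taken from the slides.

```cpp
// Minimal ROOT/C++ sketch of client access through the Xrootd redirector.
// The server name, path and tree name are hypothetical; redirection to a
// data server and dynamic staging from HPSS are transparent to the client.
#include "TFile.h"
#include "TTree.h"

void read_babar_file()
{
   // The client only knows the master (redirector); it is then redirected
   // to whichever data server holds (or stages) the file.
   TFile *f = TFile::Open("root://xrootd.in2p3.fr//babar/data/T1.root");
   if (!f || f->IsZombie()) return;

   // Random access: only the objects actually read are shipped to the
   // batch worker, ~kB per request.
   TTree *t = nullptr;
   f->GetObject("events", t);   // "events" is a hypothetical tree name
   if (t) t->GetEntry(0);

   f->Close();
}
```

Only the pieces of data actually requested travel over the network, which is what makes the ~kB-per-request random access pattern described above workable for hundreds of concurrent jobs.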

Slide 7: Xrootd for the data access (I)
Scalable.
Very high performance (forget NFS!).
Fault tolerant (server failures do not prevent the service from continuing).
Lots of freedom in the site configuration:
 - choice of the hardware and OS.
 - choice of the MSS (or no MSS) and of the protocol used for the Xrootd/MSS dialogue (e.g. RFIO in Lyon).
 - choice of the architecture (e.g. proxy services).

Slide 8: Xrootd for the data access (II)
Already used in Lyon by other experiments (within the ROOT framework):
 - D0 (HEP).
 - INDRA (nuclear physics).
 - AMS (astroparticle physics).
Can also be used outside the ROOT framework (POSIX client interface):
 - e.g. it could be used in astrophysics for FITS files (see the sketch below).
=> Xrootd: a very good candidate for data access at the petabyte scale. BUT…
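As a sketch of the POSIX-style access mentioned above (outside ROOT), the program below uses ordinary open()/read() calls on a root:// URL; it assumes the Xrootd POSIX preload library (e.g. libXrdPosixPreload.so) is loaded via LD_PRELOAD so that standard I/O calls are intercepted. The library name, server and file path are assumptions, not taken from the slides.

```cpp
// Plain C++ sketch of POSIX-style access to a file served by Xrootd,
// e.g. a FITS file. Assumes the Xrootd POSIX preload library is set in
// LD_PRELOAD so that open()/read()/close() on root:// URLs are intercepted
// (library name and URL are illustrative assumptions).
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main()
{
    // No ROOT dependency: any application doing standard POSIX I/O works.
    int fd = open("root://xrootd.in2p3.fr//astro/images/field42.fits", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char header[2880];                        // one FITS header block
    ssize_t n = read(fd, header, sizeof(header));
    if (n > 0) printf("read %zd bytes of FITS header\n", n);

    close(fd);
    return 0;
}
```

It could then be run as, for instance, LD_PRELOAD=/path/to/libXrdPosixPreload.so ./readfits (the exact library path depends on the installation).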

Slide 9: Data structure: the fear factor (I)
A high-performance data access model also depends on this.
Deep copies (full copies of subsets of the entire data sample) vs "pointer" files (only containing pointers to other files)?

Deep copies:
 - duplicated data.
 - OK in a "full disk" scenario.
 - OK if used with an MSS (if there are not too many deep copies!).

"Pointer" files:
 - no data duplication.
 - OK in a "full disk" scenario.
 - potentially very stressful on the MSS (VERY BAD).
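A small illustration (not BaBar code, just an assumed layout) of the two options compared above: a "pointer" skim stores only references into the full event store, so reading its N events can touch N different files (each possibly staged from tape), whereas a deep copy duplicates the selected events into one self-contained file.

```cpp
// Illustrative sketch of the two skim layouts discussed above.
#include <string>
#include <vector>

// Reference to an event stored elsewhere: reading it requires opening
// (and possibly staging from the MSS) the file it points to.
struct EventRef {
    std::string file;    // e.g. "run1234_events.root" (hypothetical name)
    long        entry;   // entry number inside that file
};

// "Pointer" skim: tiny on disk, but its N events may hit N different files.
struct PointerSkim {
    std::vector<EventRef> refs;
};

// Deep copy: the selected events themselves, duplicated into one file
// that can be read with a single stage from the MSS.
struct Event { /* ... physics payload ... */ };
struct DeepCopySkim {
    std::vector<Event> events;
};

int main() { return 0; }   // layout illustration only
```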

Slide 10: Data structure: the fear factor (II)
In the BaBar Objectivity event store:
 - usage of pointer skims: very inefficient => people build their own deep copies.
 - for a single physics event: data spread over several databases => at least 5 files opened (staged) for one event!
Deep copies are fine, unless there are too many of them!!! => data management becomes more difficult, and the cost (MSS, disk) increases.
The best data access model can be ruined by a bad data organization.

Slide 11: Dynamic staging
What is the right ratio (disk cache / tape)? Very hard to estimate, there is no general rule!
It depends on:
 - the data organization ("pointer" files? …).
 - the data access pattern (number of files "touched" per job, total number of files potentially "touched" by all the jobs per day).
=> Estimate it by measuring (provided files are read more than once):
 - the lifetime of the files on the disk cache.
 - the time between two stagings of the same file.
Right now for BaBar the ratio is 44% (for the ROOT format) => OK, and it could be less (~30% at the time of Objectivity).
Extreme case: EROS (astrophysics), where a ratio of 2.5% is fine (one area of the sky is studied at a time).
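The measurement suggested above can be scripted: the sketch below reads a staging log and reports the median time between two stagings of the same file, which can then be compared with the measured lifetime of files in the disk cache. The log format (one "unix_timestamp file_name" line per staging event) is an assumption for illustration, not the actual HPSS/Xrootd log format.

```cpp
// Illustrative sketch: estimate whether the disk cache is large enough by
// measuring the time between two stagings of the same file.
// Input (stdin): one "unix_timestamp  file_name" line per staging event.
// If files are typically re-staged sooner than they survive in the cache,
// the disk/tape ratio is too small.
#include <algorithm>
#include <iostream>
#include <map>
#include <string>
#include <vector>

int main()
{
    std::map<std::string, long> lastStage;   // last staging time per file
    std::vector<long> intervals;             // time between two stagings

    long t; std::string file;
    while (std::cin >> t >> file) {
        auto it = lastStage.find(file);
        if (it != lastStage.end()) intervals.push_back(t - it->second);
        lastStage[file] = t;
    }

    if (intervals.empty()) { std::cout << "no file was re-staged\n"; return 0; }

    std::sort(intervals.begin(), intervals.end());
    long median = intervals[intervals.size() / 2];
    std::cout << intervals.size() << " re-stagings, median time between two "
              << "stagings of the same file: " << median / 3600.0 << " hours\n";
    return 0;
}
```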

Slide 12: Data management

Slide 13: Data import to Lyon
Data available within 24/48 hours:
 - hardware: high-performance network, server configuration that scales.
 - software: high-performance and robust data management tools.
Since January 2004, using SRB (Storage Resource Broker):
 - grid middleware developed by SDSC (San Diego, CA).
 - virtualized interface to heterogeneous storage devices (disk, tape systems, databases).
 - portable to many platforms (Linux, Solaris, AIX, Mac OS X, Windows).
 - handles users, access rights, replicas, metadata and much more.
 - APIs available in various languages (C, Java, Perl, Python), plus Web interfaces.
 - used in production in many areas: HEP, biology, Earth science, astrophysics.
 - saves a lot of time when developing efficient and automatic applications for data shipment.
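As a rough illustration of how such automatic shipment applications can be built on top of SRB, the sketch below loops over a list of local files and pushes each one into an SRB collection using the SRB command-line utilities (Scommands). The collection path, file list and error handling are assumptions; the real BaBar import tools are of course more elaborate (checksums, retries, bookkeeping).

```cpp
// Illustrative sketch (not the production BaBar scripts): automate the
// shipment of a list of files with the SRB command-line tools (Scommands).
// It simply shells out to Sinit/Sput/Sexit; the exact options and the
// target SRB collection name are assumptions.
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>

int main()
{
    if (std::system("Sinit") != 0) {               // open the SRB session
        std::cerr << "Sinit failed\n";
        return 1;
    }

    std::ifstream list("files_to_ship.txt");       // one local path per line
    std::string path;
    int failures = 0;
    while (std::getline(list, path)) {
        // Put the file into a (hypothetical) SRB collection; a real tool
        // would also verify sizes/checksums and retry on failure.
        std::string cmd = "Sput " + path + " /home/babar.in2p3/incoming/";
        if (std::system(cmd.c_str()) != 0) {
            std::cerr << "transfer failed: " << path << "\n";
            ++failures;
        }
    }

    std::system("Sexit");                           // close the SRB session
    return failures == 0 ? 0 : 2;
}
```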

Slide 14: SRB and BaBar
Massive transfers (170,000 files, 95 TB).
Peak rate: 3 TB/day tape to tape (with 2 servers on each side).
[Diagram: data flows through SRB between HPSS at SLAC (Stanford, CA), HPSS at CC-IN2P3 (Lyon), and RAL (Oxford, UK).]

Slide 15: Example of network utilization on ESnet (US): 1 server
[Chart of the top ESnet site-to-site flows; the SLAC (US) → IN2P3 (FR) flow is about 1 Terabyte/day. Flows shown: Fermilab (US) → CERN; SLAC (US) → IN2P3 (FR); SLAC (US) → INFN Padova (IT); Fermilab (US) → U. Chicago (US); CEBAF (US) → IN2P3 (FR); INFN Padova (IT) → SLAC (US); U. Toronto (CA) → Fermilab (US); Helmholtz-Karlsruhe (DE) → SLAC (US); DOE Lab → DOE Lab; SLAC (US) → JANET (UK); Fermilab (US) → JANET (UK); Argonne (US) → Level3 (US); Argonne → SURFnet (NL); IN2P3 (FR) → SLAC (US); Fermilab (US) → INFN Padova (IT).]

Slide 16: Data import: conclusion
SRB:
 - very powerful tool for data management.
 - robust and high performance.
 - large community of users in many fields.
Pitfalls:
 - huge number of files to handle.
 - if some of them are missing, it should be easy to track down the missing files: Logical File Name → Physical File Name (this was not the case within the Objectivity framework).
 - a good data structure is important.

Slide 17: Simulation production

Slide 18: BaBar simulation production
For the latest production round: SP6.
More than 20 production sites.
Data produced at each site is shipped to SLAC and redistributed to the Tier 1s.
CC-IN2P3: 11% of the production, 2nd largest producer.
But ~80% of the production comes from non-Tier 1 sites: the activity is completely distributed.
Important role of the non-Tier 1 sites.

Slide 19: Conclusion
1. Data access / structure model: the most important part of the story.
2. Xrootd: a very good answer for high-performance, scalable and robust data access.
3. An SRM / Xrootd interface: valuable for LCG.
4. The ratio (disk space / tape) is very hard to estimate; it requires at least some experience with a test-bed (after point 1 has been answered).
5. Data management: SRB is great!
6. Lessons learned from past errors: a "lighter" computing model.