Federated Data Storage System Prototype for LHC experiments

Presentation transcript:

Federated Data Storage System Prototype for LHC experiments and data intensive science
Andrey Kiryanov, Alexei Klimentov, Artem Petrosyan, Andrey Zarochentsev
on behalf of the BigData lab @ NRC “KI” and the Russian Federated Data Storage Project

Russian federated data storage project
In the fall of 2015 the “Big Data Technologies for Mega-Science Class Projects” laboratory at NRC “KI” received a Russian Foundation for Basic Research (RFBR) grant to evaluate federated data storage technologies. The work started with the creation of a storage federation for geographically distributed data centers located in Moscow, Dubna, St. Petersburg, and Gatchina (all members of the Russian Data Intensive Grid and WLCG). The project aims to provide a usable and homogeneous service with low manpower and resource requirements at the sites, for both LHC and non-LHC experiments.

Basic Considerations
Single entry point.
Scalability and integrity: it should be easy to add new resources.
Data transfer and logistics optimisation: transfers should be routed directly to the closest disk servers, avoiding intermediate gateways and other bottlenecks.
Stability and fault tolerance: redundancy of core components.
Built-in virtual namespace, no dependency on external catalogues.
EOS and dCache seemed to satisfy these requirements. A client-side sketch of the single-entry-point idea is shown below.
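To make the single-entry-point idea concrete, here is a minimal client-side sketch, assuming a hypothetical manager hostname and file path; the client always addresses one federation URL and the xrootd redirection mechanism routes the actual transfer to a disk server.

```python
# Minimal sketch of the "single entry point" idea (hostname and paths are hypothetical).
import subprocess

FEDERATION_ENTRY_POINT = "root://federation-mgm.example.org:1094"  # hypothetical MGM

def fetch(remote_path: str, local_path: str) -> None:
    """Copy a file out of the federation via the single entry point using xrdcp."""
    url = f"{FEDERATION_ENTRY_POINT}//{remote_path.lstrip('/')}"
    # xrdcp contacts the manager, which redirects the transfer to a disk server;
    # the client never needs to know where the data physically lives.
    subprocess.run(["xrdcp", "-f", url, local_path], check=True)

if __name__ == "__main__":
    fetch("/eos/federation/user/test/sample.root", "/tmp/sample.root")
```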

Federation topology
Russian sites (SPbSU, PNPI, JINR, NRC «KI», SINP MSU, MEPhI, ITEP) together with CERN and DESY, federated with EOS and dCache.

Initial testbed: proof-of-concept tests and evaluation of optimal settings
Managers: entry point, virtual name space; metadata is synchronized between managers.
File servers: data storage servers.
Clients: user interface for synthetic and real-life tests (xrootd client, VOMS client, test and experiment tools).
Sites involved: PNPI, SPbSU, SINP, JINR, NRC “KI”, ITEP, MEPhI, CERN.

Extended testbed for full-scale testing
Managers: entry point, virtual name space; metadata is synchronized between managers.
File servers: data storage servers of two types – highly reliable and normal; NRC “KI” hosts the default FST.
Clients: user interface for synthetic and real-life tests (xrootd client, VOMS client, test and experiment tools).
Sites involved: NRC “KI”, MEPhI, SINP, JINR, PNPI, SPbSU, ITEP, CERN.
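As an illustration only, the extended-testbed layout can be captured in a small data structure such as the sketch below; the hostnames, reliability flags and site roles are assumptions made for the example, not the project's actual configuration.

```python
# Hypothetical description of the extended-testbed topology (illustrative only).
from dataclasses import dataclass

@dataclass
class FileServer:
    site: str
    reliable: bool            # "highly reliable" vs "normal" storage
    is_default: bool = False  # default FST used when no close server exists

MANAGERS = ["mgm.cern.example", "mgm.nrcki.example"]  # hypothetical entry points

FILE_SERVERS = [
    FileServer("NRC KI", reliable=True, is_default=True),
    FileServer("JINR",   reliable=True),
    FileServer("CERN",   reliable=True),
    FileServer("SPbSU",  reliable=False),
    FileServer("PNPI",   reliable=False),
    FileServer("SINP",   reliable=False),
    FileServer("MEPhI",  reliable=False),
    FileServer("ITEP",   reliable=False),
]

# Example: find the default FST that absorbs writes from clients with no close server.
default_fst = next(fs for fs in FILE_SERVERS if fs.is_default)
print(default_fst.site)  # NRC KI
```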

Test goals, methodology and tools
Goals: set up a distributed storage, verify basic properties, evaluate performance and robustness (data access, reliability, replication).
Synthetic tests:
  Bonnie++: file and metadata I/O test for mounted file systems (FUSE)
  xrdstress: EOS-bundled file I/O stress test for the xrootd protocol
Experiment-specific tests:
  ATLAS TRT Athena-based reconstruction program for high multiplicity (I/O and CPU intensive)
  ALICE ROOT-based event selection program (I/O intensive)
Network monitoring:
  perfSONAR: a widely deployed and recognized tool for network performance measurements
Software components:
  Base OS: CentOS 6, 64-bit
  Storage systems: EOS Aquamarine, dCache 2.16
  Authentication scheme: GSI / X.509
  Access protocol: xrootd
A simplified test-driver sketch is shown below.
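As a rough illustration of how the synthetic tests can be driven, the sketch below runs bonnie++ against a FUSE mount point and uses a plain xrdcp write/read loop as a stand-in for the EOS-bundled xrdstress tool; the mount point, manager URL and sizes are assumptions made for the example.

```python
# Simplified synthetic-test driver (illustrative assumptions: paths, host, sizes).
import os
import subprocess
import time

FUSE_MOUNT = "/eos/federation/test"                     # hypothetical FUSE mount point
ENTRY_POINT = "root://federation-mgm.example.org:1094"  # hypothetical MGM URL

def run_bonnie(workdir: str = FUSE_MOUNT) -> None:
    """File and metadata I/O benchmark on the FUSE-mounted federation."""
    # -s: file size in MB, -n: number of files (x1024) for the metadata tests.
    subprocess.run(["bonnie++", "-d", workdir, "-s", "1024", "-n", "16"], check=True)

def xrootd_write_read(n_files: int, local_file: str) -> float:
    """Copy a local file into the federation and back, return aggregate MB/s."""
    size_mb = os.path.getsize(local_file) / 1e6
    start = time.time()
    for i in range(n_files):
        remote = f"{ENTRY_POINT}//eos/federation/test/stress_{i}.dat"
        subprocess.run(["xrdcp", "-f", local_file, remote], check=True)            # write
        subprocess.run(["xrdcp", "-f", remote, f"/tmp/back_{i}.dat"], check=True)  # read
    return 2 * n_files * size_mb / (time.time() - start)
```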

Network performance measurements
Site-to-site throughput matrix (throughput from site X to site Y and from site Y to site X), with links classified as > 100 Mb/s, 10–100 Mb/s, < 10 Mb/s, or no data.

Bonnie++ test with EOS on the initial testbed: MGM at CERN, FSTs at SPbSU and PNPI
Plots show data read/write and metadata create/delete performance for the following configurations:
pnpi-local – local test on the PNPI FST
spbu-local – local test on the SPbSU FST
cern-all – UI at CERN, MGM at CERN, federated FST
sinp-all – UI at SINP, MGM at CERN, federated FST
pnpi-all – UI at PNPI, MGM at CERN, federated FST
spbu-all – UI at SPbSU, MGM at CERN, federated FST
cern-pnpi – UI at CERN, MGM at CERN, FST at PNPI
cern-spbu – UI at CERN, MGM at CERN, FST at SPbSU
Observations: metadata I/O performance depends solely on the link between client and manager, while data I/O performance does not depend on that link.

Experiment-specific tests with EOS for two protocols: pure xrootd and a locally-mounted file system (FUSE)
The experiments' applications are optimized for different protocols (remote vs. local access), so their preferences differ.

Our first experience with EOS and intermediate conclusions
Basic functionality works as expected.
Some issues were discovered and communicated to the developers.
Metadata I/O performance depends solely on the link between client and manager, while data I/O performance does not depend on it.
Experiment-specific tests with different data access patterns have contradictory preferences with respect to the data access protocol (pure xrootd vs. FUSE-mounted filesystem).

Data placement policies
The number of data replicas depends on the data family (the replication policy has to be defined by the experiments / user community).
A federated storage may include reliable sites (“T1s”) and less reliable sites (“Tns”).
Taking this into account, we have three data placement scenarios, which can be configured individually per dataset:
Scenario 0: the dataset is randomly distributed among several sites.
Scenario 1: the dataset is located as close as possible to the client; if there is no close storage, the default reliable one is used (NRC “KI” in our tests).
Scenario 2: the dataset is located as in Scenario 1, with secondary copies placed as in Scenario 0.
These policies can be implemented with EOS geotags and the dCache pool and replica managers. A sketch of the placement logic is shown below.
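The scenario logic can be sketched as follows, assuming a simplified geotag model in which clients and file servers are tagged with their site names; this illustrates the policy only, not the actual EOS geotag or dCache replica-manager configuration.

```python
# Simplified placement-policy sketch (site names used as geotags; illustrative only).
import random

FILE_SERVERS = ["NRC-KI", "JINR", "SPbSU", "PNPI", "SINP", "MEPhI", "ITEP", "CERN"]
DEFAULT_RELIABLE = "NRC-KI"  # default file server used when no close storage exists

def closest_server(client_geotag: str) -> str:
    """Same-site file server if one exists, otherwise the default reliable one."""
    return client_geotag if client_geotag in FILE_SERVERS else DEFAULT_RELIABLE

def place(client_geotag: str, scenario: int, n_replicas: int = 2) -> list[str]:
    """Return the file servers that should hold the dataset's replicas."""
    if scenario == 0:
        # Scenario 0: replicas scattered randomly over the federation.
        return random.sample(FILE_SERVERS, n_replicas)
    primary = closest_server(client_geotag)
    if scenario == 1:
        # Scenario 1: a single copy as close to the client as possible.
        return [primary]
    # Scenario 2: primary replica as in Scenario 1, secondaries as in Scenario 0.
    others = [s for s in FILE_SERVERS if s != primary]
    return [primary] + random.sample(others, n_replicas - 1)

print(place("SPbSU", scenario=2))  # e.g. ['SPbSU', 'CERN']
```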

Synthetic data placement stress test for EOS
Stress test procedure:
Scenario 0: files are written to and read from random file servers.
Scenario 1: files are written to and read from the closest file server if there is one, otherwise the default file server at NRC “KI”.
Scenario 2: primary replicas are written as in Scenario 1, secondary replicas as in Scenario 0; reads are redirected to the closest file server if there is one, otherwise to the default file server at NRC “KI”.
Plots: xrootd write and read stress tests (the legend distinguishes clients that can find data on a closest file server from clients for which the closest and the default file server are the same).
Observations: the presence of a closest server does not bring much improvement, but the presence of the high-performance default one does. Replication slows down data placement because EOS creates replicas during the transfer.

Synthetic data placement stress test for dCache
Stress test procedure:
Scenario 0: files are randomly scattered among several file servers.
Scenario 1: files are written to and read from the closest file server if there is one, otherwise the default file server at NRC “KI”.
Scenario 2: primary replicas are written as in Scenario 1, secondary replicas as in Scenario 0; reads are redirected to the closest file server if there is one, otherwise to the default file server at NRC “KI”.
Plots: xrootd write and read stress tests.
Observation: unlike EOS, dCache creates replicas in the background after the transfer.

First experience with dCache
Well-known and reliable software used at many T1s.
Different software platform (Java) and different protocol implementations.
The dCache xrootd implementation does not support FUSE mounts; this is supposed to be fixed in dCache 3 within a collaborative project between NRC “KI”, JINR and DESY (thanks to Ivan Kadochnikov).
No built-in security for the control channel between the manager and the file servers; firewall-based access control works well for a single site. This is solved by stunnel in dCache 3, but we have not had a chance to test it yet.
No built-in manager redundancy; this is also part of the dCache 3 feature set.

ATLAS test on the initial testbed for different protocols
Times of five repetitions of the same test are shown along with the mean value, for four test conditions on the same federation: local data, data on EOS accessed via xrootd, data on EOS accessed via a FUSE mount, and data on dCache accessed via xrootd. The first FUSE test with a cold cache takes a bit more time than the rest, but it is still much faster than accessing data directly via xrootd. The first run with dCache shows the same pattern (internal cache warm-up?). Tests were run from the PNPI UI.

Summary
We have set up a working prototype of federated storage: seven Russian WLCG sites organized as one homogeneous storage with a single entry point, and all basic properties of a federated storage are respected.
We have conducted an extensive validation of the infrastructure using synthetic and experiment-specific tests.
We used EOS as our first technological choice, and we have enough confidence to say that it behaves well and has all the features we need.
We have finished testing dCache 2 and our first results look very promising; we are on our way to dCache 3 testing.
One of the major concerns expressed so far is that xrootd is not a standard protocol. While standard HTTP clearly misses some of the necessary features, the HTTP/2 feature set looks much closer to what we need. We need a well-defined standard protocol supported by major software stacks, as both EOS and dCache are going to stick around for at least another decade.

Acknowledgements
This talk drew on presentations, discussions, comments and input from many people; thanks to all, including those we may have missed. Thanks to P. Hristov and D. Krasnopevtsev for help with the experiment-specific tests, to A. Peters for help with understanding EOS internals, and to P. Fuhrmann and the dCache core team for help with dCache. This work was funded in part by the Russian Ministry of Science and Education under Contract No. 14.Z50.31.0024 and by the Russian Foundation for Basic Research under contract 15-29-0794_2 офи_м. The authors express their appreciation to the SPbSU Computing Center, PNPI Computing Center, NRC “KI” Computing Center, JINR Cloud Services, ITEP, SINP and MEPhI for the provided resources.

Thank you!