Russian academic institutes' participation in the WLCG Data Lake project


Russian academic institutes' participation in the WLCG Data Lake project
Andrey Kiryanov, Xavier Espinal, Alexei Klimentov, Andrey Zarochentsev

LHC Run 3 and HL-LHC Run 4 Computing Challenges
[Plot: projected raw data volume per year, with a "we are here" marker]
HL-LHC storage needs are a factor of ten above the expected technology evolution under flat funding. We need to optimize storage hardware usage and operational costs.

Motivation
The HL-LHC will be a multi-exabyte challenge where the anticipated storage and compute needs are a factor of ten above the projected technology evolution under flat funding. The WLCG community needs to evolve its current models to store and manage data more efficiently. Technologies that address the HL-LHC computing challenges may also be applicable to other communities that manage large-scale data volumes (SKA, DUNE, CTA, LSST, Belle II, JUNO, etc.); cooperation with them is already in progress.

Storage software that emerged from the HENP scientific community
- DPM: the WLCG storage solution for small sites (T2s). Initially supported GridFTP+SRM, now being reincarnated as DOME with HTTP/WebDAV/xrootd support as well. No tape support.
- dCache: a versatile storage system from DESY for both disks and tapes. Used by many T1s.
- xrootd: both a protocol and a storage system optimized for physics data access. Can be vastly extended by plug-ins. Used as the basis for the ATLAS FAX and CMS AAA federations.
- EOS: based on xrootd, adds a smart namespace and many extra features such as automatic redundancy and geo-awareness.
- DynaFed: designed as a dynamic federation layer on top of HTTP/WebDAV-based storage.
- CASTOR: CERN's tape storage solution, to be replaced by CTA.
On top of these, various data management solutions exist: FTS, Rucio, etc.

What is a Data Lake?
It is not another piece of software or another storage solution. It is a way of organizing a group of data and computing centers so that they can perform data processing effectively. A scientific community defines the "shape" of its Data Lake, which may differ from one community to another. We see the Data Lake model as an evolution of the current infrastructure that reduces storage costs.

Data Lake

We're not alone
Several storage-related R&D projects are being conducted in parallel:
- Data Carousel
- Data Lake
- Data Ocean (Google)
- Data Streaming
All of them are in progress as part of the DOMA and/or IRIS-HEP global R&D for the HL-LHC. It is important to develop a coherent solution to address the HL-LHC data challenges and to coordinate the above and future projects.

Requirements for a future WLCG data storage infrastructure
- Common namespace and interoperability
- Coexistence of different QoS
- Geo-awareness
- File transitioning based on namespace rules
- File layout flexibility
- Distributed redundancy
- Fast access to data, with compensation for latencies above 20 ms
- File R/W cache
- Namespace cache

WLCG Data Lake Prototype: EUlake
- Currently based on EOS, which implies xrootd as the primary transfer protocol; other storage technologies and their possible interoperability are also being considered
- The primary namespace server (MGM) is at CERN; deployment of a secondary namespace server at NRC "KI" is planned
- Due to the EOS transition from an in-memory namespace to QuarkDB, multi-MGM deployment was unsupported for a while
- Storage endpoints run a simple EOS filesystem (FST) daemon
- Deployed at CERN, SARA, NIKHEF, RAL, JINR, PNPI, PIC and UoM
- perfSONAR endpoints are deployed at the participating sites
- Performance tests (HammerCloud) are running continuously
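As a minimal illustration of what a client-side functional probe of such an xrootd-based endpoint could look like, the sketch below pings the MGM and inspects a test path. The hostname eulake.cern.ch and the path /eos/eulake/tests are placeholders, not the actual EUlake configuration.

```python
#!/usr/bin/env python3
"""Quick client-side probe of an EUlake-style xrootd namespace endpoint.

The MGM hostname and test path below are placeholders, not the actual
EUlake configuration.
"""
import subprocess

MGM = "eulake.cern.ch"            # hypothetical primary MGM at CERN
TEST_PATH = "/eos/eulake/tests"   # hypothetical test directory

def xrdfs(*args: str) -> None:
    """Run an xrdfs subcommand against the MGM and show its output."""
    subprocess.run(["xrdfs", MGM, *args], check=True)

if __name__ == "__main__":
    xrdfs("ping")                 # is the redirector answering?
    xrdfs("stat", TEST_PATH)      # is the namespace path visible?
    xrdfs("ls", TEST_PATH)        # list what the test area contains
```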

Russian Federated Storage Project
Started in 2015, based on EOS and dCache.
Participating sites: SPb region (SPbSU, PNPI), Moscow region (JINR, NRC "KI", MEPhI, SINP, ITEP), external sites (CERN).

Russia in EUlake (1)
Why?
- Extensive expertise in the deployment and testing of distributed storage
- Russian institutes, including the ones that comprise NRC "KI", are geographically distributed
- Network interconnection between Russian sites is constantly improving
- A similar prototype was successfully deployed on Russian sites (Russian Federated Storage Project)
- An appealing universal infrastructure may be useful not only for the HL-LHC and HEP, but also for other experiments and fields of science relevant to us (NICA, PIK, XFEL)
NRC "KI" equipment for EUlake is located at PNPI in Gatchina:
- 10 Gbps connection, IPv6
- ~100 TB of block storage
- Storage and compute endpoints on VMs
JINR equipment for EUlake is located in Dubna:
- 10 Gbps connection
- ~16 TB of block storage
- Storage endpoints on VMs

Russia in EUlake (2)
Manpower (NRC "KI" + JINR + SPbSU):
- Infrastructure deployment: FSTs, hierarchical GeoTags, placement-related attributes
- Synthetic tests: file I/O performance, metadata performance
- Real-life tests: HammerCloud
- Monitoring

NRC "KI" + JINR international network infrastructure
[Map: network links between PNPI and JINR and their international peers]

100 Gbps routes

NRC "KI" in EUlake
[Architecture diagram: clients send metadata requests to the primary head node at CERN and are redirected to disk nodes for data transfer; other participants: JINR, PIC, NIKHEF, RAL, SARA, UoM. PNPI hosts disk nodes, a secondary head node (replication & fall-back) and VM hosts on a reliable back-end with redundancy.]
PNPI back-end: 20 Ceph nodes of 128 TB each, 2x10 Gbps trunk on each node, 10 Gbps switches in a 160 Gbps stack, replication via a dedicated fabric.

Highlights
Why Ceph?
- Deploying EOS on physical storage is perfectly suitable for CERN, but the PNPI data centre is not a dedicated facility for HEP computing
- Ceph adds the necessary flexibility in block storage management (we also use it for other purposes, such as VM images)
Storage configuration:
- We started with Luminous but quickly moved to Mimic; CephFS performance improved significantly in the new release
- We have four different "types" of Ceph storage exposed to EOS: CephFS with a replicated data pool, CephFS with an erasure-coded data pool, a block device from a replicated pool, and a block device from an erasure-coded pool (a hedged provisioning sketch follows below)
- Functional and performance tests are ongoing
Auxiliary infrastructure:
- Repository with stable EOS releases (the CERN repo changes too fast, sometimes breaking functionality)
- Web server with a visualization framework and test-results storage
- Compute nodes for HC tests
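A minimal sketch of how the four Ceph storage flavours mentioned above could be provisioned with standard Ceph/RBD commands. Pool, profile and image names, pg counts and sizes are illustrative assumptions, not the production values used at PNPI.

```python
#!/usr/bin/env python3
"""Sketch: provisioning the four Ceph storage flavours exposed to EOS FSTs.

Pool, profile and image names, pg counts and sizes are placeholders.
"""
import subprocess

def run(*args: str) -> None:
    """Run a Ceph/RBD CLI command and fail loudly on error."""
    print("+", " ".join(args))
    subprocess.run(args, check=True)

# 1. Replicated and erasure-coded (4+2) data pools
run("ceph", "osd", "erasure-code-profile", "set", "eos-ec-profile", "k=4", "m=2")
run("ceph", "osd", "pool", "create", "eos-rep", "128", "128", "replicated")
run("ceph", "osd", "pool", "create", "eos-ec", "128", "128", "erasure", "eos-ec-profile")
run("ceph", "osd", "pool", "set", "eos-ec", "allow_ec_overwrites", "true")

# 2. CephFS with a replicated data pool (metadata pool is always replicated);
#    the EC pool is attached as an additional data pool for EC-backed directories
run("ceph", "osd", "pool", "create", "eosfs-meta", "32", "32", "replicated")
run("ceph", "fs", "new", "eosfs", "eosfs-meta", "eos-rep")
run("ceph", "fs", "add_data_pool", "eosfs", "eos-ec")

# 3. Block devices for FSTs: one backed by the replicated pool,
#    one keeping its data in the EC pool (image metadata stays in eos-rep)
run("rbd", "create", "eos-rep/fst-rep-01", "--size", "102400")                       # size in MB
run("rbd", "create", "eos-rep/fst-ec-01", "--size", "102400", "--data-pool", "eos-ec")
```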

Ceph performance measurements
Metadata performance of CephFS is much lower than that of a dedicated RBD (this is expected). Block I/O performance is on par, but CPU usage is lower with CephFS.
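A rough sketch of how such a comparison could be run: fio for sequential block throughput and a simple file-creation loop as a metadata-rate probe, once on a CephFS mount and once on an RBD-backed filesystem. The mount points and fio parameters are assumptions, not the exact settings behind the measurements above.

```python
#!/usr/bin/env python3
"""Rough comparison of block I/O and metadata rates on CephFS vs. an RBD mount.

/mnt/cephfs and /mnt/rbd are placeholder mount points; fio parameters are
illustrative only.
"""
import os
import subprocess
import time

MOUNTS = {"cephfs": "/mnt/cephfs", "rbd": "/mnt/rbd"}

def block_io(path: str) -> None:
    """Sequential 4 MiB direct writes with fio as a simple throughput probe."""
    subprocess.run(
        ["fio", "--name=seqwrite", "--rw=write", "--bs=4M", "--size=1G",
         "--direct=1", f"--directory={path}"],
        check=True,
    )

def metadata_rate(path: str, n: int = 10000) -> float:
    """Create n empty files and return creations per second."""
    start = time.perf_counter()
    for i in range(n):
        open(os.path.join(path, f"meta_{i}"), "w").close()
    return n / (time.perf_counter() - start)

if __name__ == "__main__":
    for label, mnt in MOUNTS.items():
        block_io(mnt)
        print(f"{label}: {metadata_rate(mnt):.0f} creates/s")
```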

Ultimate goals
- Evaluate the fusion of local (Ceph) and global (EOS, dCache) storage technologies
  - Figure out the strong and weak points
  - Come up with a high-performance, flexible yet easily manageable storage solution for major scientific centers participating in multiple collaborations
  - Further plans for testing converged solutions (compute + storage)
- Evaluate the Data Lake as a storage platform for Russian scientific centers and major experiments (NICA, XFEL, PIK)
  - Possibility to have dedicated storage resources with configurable redundancy in a global system
  - Geolocation awareness and dynamic storage optimization
  - Data relocation and replication with proper use of fast networks
  - A federated system with interoperable storage endpoints based on different solutions with a common transfer protocol

Synthetic file locality tests on EUlake
The following combinations of layouts and placement policies were put in place and tested from a single client with geotag RU::PNPI (a sketch of the directory setup follows below):
- Layouts: plain, replica (2 stripes), RAIN (4+2 stripes)
- Placement policies: gathered, hybrid, simple (based on the client geotag)
Expected results:
- Availability of geo-local replicas should improve file read (stage-in) speed
- The ability to tie directories to local storages (FSTs) should improve write (stage-out) speed
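A sketch of how the tested layout/policy combinations could be attached to EOS test directories via the extended attributes quoted on the next slides (sys.forced.layout, sys.forced.placementpolicy, plus sys.forced.nstripes for the stripe count). The directory paths are hypothetical, and the exact 'eos attr' invocation should be checked against the deployed EOS version.

```python
#!/usr/bin/env python3
"""Sketch: preparing EOS test directories for the layout/policy matrix.

Directory paths are placeholders; attribute names follow the values shown
on the following slides.
"""
import subprocess

BASE = "/eos/eulake/tests/locality"   # hypothetical test namespace

# (subdirectory, layout, stripes, placement policy or None for client geotag)
CASES = [
    ("plain_gathered",   "plain",   1, "gathered:RU"),
    ("plain_default",    "plain",   1, None),
    ("replica_gathered", "replica", 2, "gathered:RU"),
    ("replica_hybrid",   "replica", 2, "hybrid:RU"),
    ("raid6_gathered",   "raid6",   6, "gathered:RU"),
    ("raid6_default",    "raid6",   6, None),
]

def eos(*args: str) -> None:
    print("+ eos", " ".join(args))
    subprocess.run(["eos", *args], check=True)

for name, layout, nstripes, policy in CASES:
    path = f"{BASE}/{name}"
    eos("mkdir", "-p", path)
    eos("attr", "set", f"sys.forced.layout={layout}", path)
    eos("attr", "set", f"sys.forced.nstripes={nstripes}", path)
    if policy:
        eos("attr", "set", f"sys.forced.placementpolicy={policy}", path)
```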

100 MB file I/O performance tests with different layouts and placement policies (1)
[Bar charts of write/read performance for each configuration are not reproduced here; the stripe/replica placement counts per configuration were:]
sys.forced.layout=plain, sys.forced.placementpolicy="gathered:RU":
  CERN::0513 1, CERN::HU 3, ES::PIC 2, RU::Dubna 46, RU::PNPI 48
sys.forced.layout=plain, no sys.forced.placementpolicy (based on client geotag):
  CERN::HU 6, ES::PIC 1, RU::PNPI 93
sys.forced.layout=raid6, sys.forced.placementpolicy="gathered:RU":
  RU::Dubna 400, RU::PNPI 200
sys.forced.layout=raid6, no sys.forced.placementpolicy (based on client geotag):
  CERN::0513 71, CERN::HU 129, ES::PIC 100, RU::Dubna 100, RU::PNPI 200

100 MB file I/O performance tests with different layouts and placement policies (2)
[Bar charts of write/read performance are not reproduced here; the replica placement counts per configuration were:]
sys.forced.layout=replica, no sys.forced.placementpolicy (based on client geotag):
  RU::Dubna 100, RU::PNPI 100
sys.forced.layout=replica, sys.forced.placementpolicy="gathered:RU":
  CERN::0513 13, CERN::HU 21, ES::PIC 35, RU::Dubna 19, RU::PNPI 112
sys.forced.layout=replica, sys.forced.placementpolicy="hybrid:RU":
  CERN::HU 14, CERN::9918 20, ES::PIC 31, RU::Dubna 65, RU::PNPI 7
Observations: replica scattering (rebalancing) was observed within a couple of days; reads are always redirected to the closest server; RAIN impacts I/O performance the most.
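For completeness, a sketch of the kind of measurement loop that would produce such write/read timings: upload a 100 MB file into each test directory with xrdcp, read it back, and inspect where the replicas or stripes landed with 'eos file info'. The MGM URL and directory paths are placeholders carried over from the earlier sketches.

```python
#!/usr/bin/env python3
"""Sketch: timed 100 MB write/read per test directory, plus replica inspection.

Host and paths are placeholders; 'eos file info' is only used to print where
the replicas/stripes of the uploaded file ended up.
"""
import subprocess
import time

MGM = "root://eulake.cern.ch"                    # hypothetical MGM endpoint
DIRS = [                                         # hypothetical test directories
    "/eos/eulake/tests/locality/plain_gathered",
    "/eos/eulake/tests/locality/replica_gathered",
    "/eos/eulake/tests/locality/raid6_gathered",
]

def timed(cmd):
    """Run a command and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start

if __name__ == "__main__":
    # create a 100 MB local test file
    with open("payload.bin", "wb") as f:
        f.write(b"\0" * 100 * 1024 * 1024)

    for d in DIRS:
        remote = f"{MGM}/{d}/payload.bin"
        w = timed(["xrdcp", "-f", "payload.bin", remote])
        r = timed(["xrdcp", "-f", remote, "readback.bin"])
        print(f"{d}: write {w:.2f} s, read {r:.2f} s")
        # show which FSTs (and thus geotags) hold the replicas/stripes
        subprocess.run(["eos", MGM, "file", "info", f"{d}/payload.bin"], check=True)
```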

ES and RU sites in EOS
[Diagram: ES and RU sites (ES:SiteA, ES:SiteB, RU:SiteA, RU:SiteB, RU:SiteC) mapped onto EOS storage groups with different QoS: group.X replica 2 + CTA, group.Y plain + CTA, group.Z CTA, group.W replica 2, group.U raid6]

Summary
- EUlake is currently operational as a proof-of-concept
- Functional and real-life tests are ongoing
- Effort is being put into publishing various metrics into a centralized monitoring system
- Expansion of network capacity between major scientific centers in Russia enables efficient data management for future experiments

Future plans
- Extensive testing of different QoS types (possibly simulated) with different storage groups
- Exploit different caching schemes
- Test automatic data migration
- Evolve the infrastructure from a simple proof-of-concept to one capable of measuring the performance of possible future distributed storage models

Acknowledgements
This work was supported by the NRC "Kurchatov Institute" (№ 1608). The authors express their appreciation to the computing centers of NRC "Kurchatov Institute", JINR and other institutes for the provided resources.
Thank you!