Distributed storage, work status

Distributed storage, work status
Giacinto Donvito, INFN-Bari, on behalf of the Distributed Storage Working Group
SuperB Collaboration Meeting, 02 June 2012

Outline
- Testing storage solutions:
  - Hadoop testing: fault-tolerance solutions, patch for a distributed computing infrastructure
  - Napoli activities
- Data access library: why we need it, how it should work, work status, future plans
- People involved
- Conclusions
Only updates are reported in this talk: there are other activities, planned or ongoing, that have no updates to report since the last meeting.

Testing Storage Solutions
The goal is to test the available storage solutions that:
- fulfill the SuperB requirements:
  - good performance also on pseudo-random access
- meet the needs of the computing centers:
  - scalability in the order of PBs and hundreds of client nodes
  - a small TCO
  - good support available over the long term
- are preferably open source rather than proprietary:
  - a wide community would be a guarantee of long-term sustainability
- provide posix-like access APIs by means of ROOT-supported protocols (the most interesting seem to be XRootD and HTTP); the copy functionality should also be supported (a minimal ROOT access sketch follows this slide)
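As a rough illustration of the posix-like, ROOT-based access we are after, a minimal ROOT macro opening a remote file over XRootD could look like the sketch below; the server name and file path are placeholders, and an http(s):// URL would work the same way provided the corresponding ROOT I/O plugin is installed.

```cpp
// root_open_sketch.C -- minimal ROOT macro; host and path are placeholders.
// The URL prefix selects the I/O plugin: root:// -> XRootD, http(s):// -> web access.
#include "TFile.h"
#include "TError.h"

void root_open_sketch()
{
   TFile *f = TFile::Open("root://xrootd.example.infn.it//superb/data/sample.root");
   if (!f || f->IsZombie()) {
      Error("root_open_sketch", "could not open the remote file");
      return;
   }
   f->ls();       // list the objects contained in the file
   f->Close();
   delete f;
}
```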

Testing Storage Solutions
- Several software solutions could fulfill all those requirements:
  - a few of them are already known and used within the HEP community
  - we also want to test solutions that are not yet well known, focusing on those that can provide fault tolerance against hardware or software malfunctions
- At the moment we are focusing on two solutions:
  - HADOOP (at INFN-Bari)
  - GlusterFS (at INFN-Napoli)

Hadoop testing
Use cases:
- scientific data analysis on a cluster with CPU and storage on the same boxes
- high availability of data access without an expensive hardware infrastructure
- high availability of the data access service also in the event of a complete site failure
Tests executed:
- metadata failures
- data failures
- "rack" awareness

HDFS Architecture (architecture diagram)

HDFS tests
- Metadata failure:
  - the NameServer is a single point of failure
  - it is possible to keep a back-up snapshot
- DataNode failure:
  - during read/write operations
- FUSE access to the HDFS file system
- HTTP access to the HDFS file system
- Rack failure resilience
- Farm failure resilience

HDFS test: NameServer failures
Metadata failure:
- it is possible to make a back-up of the metadata; the time between two dumps is configurable
- it is easy and fast to recover from a major PrimaryNamenode failure
(diagram: the back-up copy is fetched via an HTTP request)

HDFS test: NameServer failures
Metadata failure:
- while the NameNode process is active, no problem is observed on the client side
- if the NameNode process is unavailable, all clients retry a configurable number of times before failing
- the DataNodes reconnect to the NameNode as soon as it becomes available again
- it is easy to import the metadata from the BackupNameNode; only the metadata changed after the last checkpoint is lost
- it looks fairly easy to put in production an automatic fail-over cluster providing an HA solution

HDFS test: DataNode failures
DataNode failure while reading:
- in case of a major disk/node failure, or of data corruption on a few data chunks, the client does not see any failure
- the system recovers and fixes the problem without human intervention
DataNode failure while writing:
- the client retries a configurable number of times before failing, and each time the destination node is different
- it is enough to have a few DataNodes active in the cluster to avoid client failures
(a minimal client-side read sketch follows this slide)
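From the application point of view this transparency comes through the HDFS client library: the sketch below, based on the libhdfs C API shipped with Hadoop, shows the kind of sequential read loop used in these tests (NameNode host, port and file path are placeholders). Which DataNode serves each block, and any retry after a DataNode failure, is handled inside the library.

```cpp
// hdfs_read_sketch.cpp -- sequential read of an HDFS file through libhdfs (hdfs.h,
// shipped with the Hadoop distribution). NameNode host, port and path are placeholders.
#include <hdfs.h>
#include <fcntl.h>    // O_RDONLY
#include <cstdio>
#include <vector>

int main()
{
    hdfsFS fs = hdfsConnect("namenode.example.infn.it", 9000);
    if (!fs) { std::fprintf(stderr, "cannot contact the NameNode\n"); return 1; }

    hdfsFile in = hdfsOpenFile(fs, "/superb/data/sample.dat", O_RDONLY, 0, 0, 0);
    if (!in) { std::fprintf(stderr, "cannot open file\n"); hdfsDisconnect(fs); return 1; }

    std::vector<char> buf(4 * 1024 * 1024);   // 4 MB application-side buffer
    long long total = 0;
    tSize n;
    // Replica selection and retries on DataNode failures happen inside the library:
    // the application only sees a stream of bytes.
    while ((n = hdfsRead(fs, in, buf.data(), static_cast<tSize>(buf.size()))) > 0)
        total += n;

    std::printf("read %lld bytes\n", total);
    hdfsCloseFile(fs, in);
    hdfsDisconnect(fs);
    return 0;
}
```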

HDFS Rack Awareness (diagram: hosts host1-host8 distributed across racks)

HDFS test: rack awareness
While writing:
- data placement can (or can not) take into account the load of each machine, depending on the configuration; the tests prove that both modes work fine
While reading:
- data are always read from within the rack, if the load is not too high
- no configuration is possible here

Blocks read by each client from each source node (out of 600 blocks):

| client | hdfs1 (rack3) | hdfs3 (rack3) |
|---|---|---|
| pccms45 (rack1) | 308/600 | 293/600 |
| alicegrid11 (rack2) | 292/600 | 307/600 |
| pccms41 (rack2) | 0/600 | 600/600 |

HDFS test: miscellaneous
FUSE testing:
- it works fine for reading
- we use an OSG version in order to avoid some bugs when writing files through FUSE
HTTP testing:
- HDFS provides a native WebDAV interface, which works quite well for reading
- both reading and writing work well with the standard Apache module on top of an HDFS file system mounted via FUSE
(a minimal HTTP-read sketch follows this slide)
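For the HTTP tests a plain libcurl client is enough to read a whole file, or a byte range, from the Apache front-end; a minimal sketch, with a placeholder URL, could be:

```cpp
// http_read_sketch.cpp -- read a byte range over HTTP with libcurl (placeholder URL).
// Build with: g++ http_read_sketch.cpp -lcurl
#include <curl/curl.h>
#include <cstdio>
#include <string>

// Append the received bytes to a std::string passed via userdata.
static size_t collect(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    static_cast<std::string *>(userdata)->append(ptr, size * nmemb);
    return size * nmemb;
}

int main()
{
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    if (!curl) return 1;

    std::string data;
    curl_easy_setopt(curl, CURLOPT_URL, "http://hdfs-gw.example.infn.it/superb/sample.dat");
    curl_easy_setopt(curl, CURLOPT_RANGE, "0-1048575");        // first 1 MB only
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &data);
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);        // follow possible redirects

    CURLcode rc = curl_easy_perform(curl);
    if (rc != CURLE_OK)
        std::fprintf(stderr, "download failed: %s\n", curl_easy_strerror(rc));
    else
        std::printf("received %zu bytes\n", data.size());

    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}
```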

HDFS test: custom data placement policy
- In HDFS it is possible to change/configure the data placement policy
- We have already implemented a new policy where the three copies of the data are placed in three different racks:
  - this provides the capability to survive even two different racks going down
  - this policy increases the "cost" of writing files (in the HEP environment data are "write-once-read-many")
  - but it reduces the cost of reading files: there are more CPUs that are "close to the data"

HDFS test: custom data placement policy
- HDFS gives the possibility to represent a hierarchical topology of racks
- But this is not taken into account while writing/reading files: each rack is considered just "a different rack", so both when reading and writing the racks are treated as belonging to a flat topology
- If a node belonging to a given rack is moved to a different rack, the system detects a violation of the policy:
  - it is not able to deal with this problem automatically
  - the administrator needs to format the node and re-insert it

HDFS test: custom data placement policy
Goal: achieve a data placement policy that ensures data availability even when a complete farm goes offline.
- Replication 3 -> 2 replicas on the same farm (in different racks) + 1 replica on another farm
- We have developed a new policy that is able to understand a rack topology with an arbitrary number of levels
- This is under test at the moment... and it is working!
- We will ask Napoli to join the WAN test soon
(the target-selection logic is sketched below)
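The policy itself runs inside the NameNode and is presumably written against Hadoop's Java BlockPlacementPolicy extension point; the self-contained C++ sketch below only illustrates the target-selection rule described above. The topology strings ("/bari", "/bari/rack2") and the Node/pick helpers are our own, purely for illustration.

```cpp
// Illustration only: the production policy is Java code inside the HDFS NameNode.
// This sketch mirrors the rule "2 replicas on the writer's farm in different racks,
// 1 replica on another farm"; names and topology strings are assumptions.
#include <string>
#include <vector>

struct Node {
    std::string name;   // e.g. "wn042.ba.infn.it"
    std::string farm;   // e.g. "/bari"
    std::string rack;   // e.g. "/bari/rack2"
};

// Return the first node whose farm/rack match the filters.
// An empty filter means "any value"; a leading '!' means "any value except this one".
// (A real policy would also randomize and take node load into account.)
static const Node *pick(const std::vector<Node> &nodes,
                        const std::string &farm, const std::string &rack)
{
    for (const Node &n : nodes) {
        bool farmOk = farm.empty() ||
                      (farm[0] == '!' ? n.farm != farm.substr(1) : n.farm == farm);
        bool rackOk = rack.empty() ||
                      (rack[0] == '!' ? n.rack != rack.substr(1) : n.rack == rack);
        if (farmOk && rackOk) return &n;
    }
    return nullptr;
}

// Replication 3: two replicas on the writer's farm (different racks), one on another farm,
// so that a single rack failure and even a whole-farm outage are survived.
std::vector<Node> chooseTargets(const Node &writer, const std::vector<Node> &cluster)
{
    std::vector<Node> targets;
    if (const Node *a = pick(cluster, writer.farm, writer.rack))       targets.push_back(*a);
    if (const Node *b = pick(cluster, writer.farm, "!" + writer.rack)) targets.push_back(*b);
    if (const Node *c = pick(cluster, "!" + writer.farm, ""))          targets.push_back(*c);
    return targets;
}
```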

HDFS test: next steps
- Increase the amount of resources involved in the test:
  - we will soon build a big cluster at INFN-Bari, using production WNs plus PON dedicated machines
  - the final test will comprise ~300 local nodes, 5000 CPU cores and >600 TB of storage
  - this cluster will be used to test performance, horizontal scalability and the overall TCO over a long period of usage (6 months)
- We will soon start WAN data replication/access tests using at least the INFN-Napoli cluster
- Test the new version of HDFS:
  - HDFS HA for the NameNode (manual failover)
  - HDFS Federation
  - performance

Napoli Activities
We completed the preliminary tests on the 10 Gbit/s network and on local I/O on a clustered file system spanning 12 nodes, using GlusterFS.
On-going work:
- implementation of a Grid farm with a GlusterFS shared storage area and an SRM interface through DPM or StoRM
- design of a scenario to test both performance and reliability of GlusterFS over the WAN
(diagram: GlusterFS storage shared by a CE and WN1-WN12, exposed to the Grid through a StoRM/DPM front-end with SRM and HTTP access, with GlusterFS data replication over the WAN/Internet between sites A and B)

Data Access Library: why do we need it?
- It can be painful to access data from the analysis application if no posix access is provided by the computing center; there are a number of libraries providing posix-like access in the HEP community
- We need an abstraction layer so that we can transparently use the logical SuperB file name, without any knowledge of its mapping to the physical file name at a given site
- We need a single place/layer where we can implement reading optimizations that end users exploit transparently
- We need an easy way to read files using protocols that are not already supported by ROOT

Data Access Library: how can we use it?
This work aims to provide a simple set of APIs: Superb_open, Superb_seek, Superb_read, Superb_close, ...
- it will be enough to add a dedicated header file and link the related library
- building the code with this library it will be possible to open local or remote files seamlessly, exploiting several protocols: http(s), xrootd, etc.
- different levels of caching will be provided automatically by the library in order to improve the performance of the analysis code
- each site will be able to configure the library as needed to fulfill the computing center's requirements
(a sketch of such a header follows this slide)
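A minimal sketch of the header such a library might expose is shown below; only the four function names come from this slide, while the exact signatures are assumptions modelled on their POSIX counterparts.

```cpp
// superb_io.h -- sketch of the data access API named above.
// Only the function names come from the slides; the signatures are assumptions
// modelled on the corresponding POSIX calls.
#ifndef SUPERB_IO_H
#define SUPERB_IO_H

#include <cstddef>     // size_t
#include <sys/types.h> // off_t, ssize_t

// Open a file given its *logical* SuperB name; the library resolves it to a
// physical replica (local posix path, http(s) or xrootd URL) using the site
// configuration, and returns a descriptor-like handle (negative on error).
int Superb_open(const char *logical_file_name, int flags);

// Move the read pointer, as lseek(2) does.
off_t Superb_seek(int fd, off_t offset, int whence);

// Read up to count bytes, possibly served from the library's internal cache.
ssize_t Superb_read(int fd, void *buffer, size_t count);

// Close the handle and release any cache associated with it.
int Superb_close(int fd);

#endif // SUPERB_IO_H
```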

Data Access Library: work status
- At the moment we have a test implementation with http-based access, exploiting the curl libraries
- It is possible to write an application linking this library and to open and read a list of files, with a list of read operations
- It is possible to configure the size of the read buffer, to optimize the performance when reading files over a remote network
- The results of the first tests will be available in a few weeks from now
- We already presented WAN data access tests using curl at CHEP: https://indico.cern.ch/getFile.py/access?contribId=294&sessionId=4&resId=0&materialId=slides&confId=149557
(a hypothetical usage example follows this slide)
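A hypothetical analysis-side client of the API sketched above could look like the following; the logical file names and the read pattern are invented for illustration, and the read-buffer size mentioned above would be set through the site configuration rather than in the user code.

```cpp
// read_with_superb.cpp -- hypothetical client of the API sketched above.
// The logical file names are placeholders; error handling is minimal.
#include <cstdio>
#include <vector>
#include "superb_io.h"   // the header sketched above

int main()
{
    // A "list of files" and a "list of read operations", as in the current tests.
    const char *lfns[] = { "/superb/prod/runA/events-001.root",
                           "/superb/prod/runA/events-002.root" };

    std::vector<char> chunk(1024 * 1024);   // 1 MB per read operation

    for (const char *lfn : lfns) {
        int fd = Superb_open(lfn, 0 /* read-only */);
        if (fd < 0) { std::fprintf(stderr, "cannot open %s\n", lfn); continue; }

        // A couple of scattered reads, to exercise seek plus buffered reading.
        Superb_seek(fd, 0, SEEK_SET);
        ssize_t n1 = Superb_read(fd, chunk.data(), chunk.size());
        Superb_seek(fd, 64LL * 1024 * 1024, SEEK_SET);   // jump 64 MB into the file
        ssize_t n2 = Superb_read(fd, chunk.data(), chunk.size());

        std::printf("%s: read %zd + %zd bytes\n", lfn, n1, n2);
        Superb_close(fd);
    }
    return 0;
}
```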

Data Access Library: future work
- We are already in contact with Philippe Canal about implementing new features of ROOT I/O
- We will follow the development of the Distributed Computing Working Group to provide a seamless mapping of the logical file name to the local physical file name
- We will soon implement a few example configurations for local storage systems
- We need to know which ROOT version is most interesting for SuperB analysis users
- We will start implementing a memory-based pre-fetch and caching mechanism within the SuperB library

People involved
- Giacinto Donvito – INFN-Bari: NFSv4.1 testing, Data Model, HADOOP testing
- Elisa Manoni – INFN-Perugia: http & xrootd remote access, developing application code for testing remote data access, distributed Tier1 testing
- Paolo Franchini – INFN-CNAF
- Silvio Pardi, Domenico del Prete, Guido Russo – INFN-Napoli: cluster set-up
- Claudio Grandi – INFN-Bologna: GlusterFS testing
- Stefano Bagnasco – INFN-Torino: SRM testing
- Gianni Marzulli – INFN-Bari
- Domenico Diacono – INFN-Bari: developing the data access library
- Armando Fella – INFN-Pisa: http remote access

Conclusions
- Collaborations and synergies with other (mainly LHC) experiments are increasing and improving
- The first results from testing storage solutions are quite promising:
  - we will soon need to increase the effort and the amount of (hardware and human) resources involved, both locally and geographically distributed
  - this will be of great importance for realizing a distributed Tier0 center among the Italian computing centers
- We need to increase the development effort in order to provide a code base for efficient data access:
  - this will be one of the key elements in providing good performance to analysis applications