1
Distributed storage, work status
Giacinto Donvito (INFN-Bari), on behalf of the Distributed Storage Working Group
SuperB Collaboration Meeting, 02 June 2012
2
Outline
  Testing storage solutions
    Hadoop testing: fault tolerance solutions, a patch for a distributed computing infrastructure
    Napoli activities
  Data access library
    Why we need it? How it should work? Work status, future plans
  People involved
  Conclusions
Only updates are reported in this talk: there are other activities, planned or ongoing, with nothing new to report since the last meeting.
3
Testing Storage Solutions
The goal is to test available storage solutions that:
  fulfill SuperB requirements: good performance also on pseudo-random access
  meet the needs of the computing centers: scalability in the order of PBs and hundreds of client nodes, a small TCO, and good long-term support
  open source could be preferred over proprietary solutions: a wide community would be a guarantee of long-term sustainability
  provide POSIX-like access APIs by means of ROOT-supported protocols (the most interesting seem to be XRootD and HTTP); the copy functionality should also be supported
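As an illustration of what "ROOT-supported protocols" means in practice, the short C++ snippet below opens a file through XRootD and through HTTP using ROOT's plugin mechanism; the server names and paths are placeholders, not real SuperB endpoints.

```cpp
// ROOT macro sketch: open the same (placeholder) file over XRootD and over HTTP.
// TFile::Open dispatches to the appropriate protocol plugin based on the URL.
#include "TFile.h"
#include <iostream>

void open_remote() {
  const char* urls[] = {
      "root://xrootd.example.org//superb/data/sample.root",  // XRootD access
      "http://storage.example.org/superb/data/sample.root"   // HTTP access
  };
  for (const char* url : urls) {
    TFile* f = TFile::Open(url);
    if (f && !f->IsZombie()) {
      std::cout << url << " opened, size " << f->GetSize() << " bytes" << std::endl;
      f->Close();
      delete f;
    } else {
      std::cout << "could not open " << url << std::endl;
    }
  }
}
```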
4
Testing Storage Solutions
Several software solutions could fulfill all those requirements; a few of them are already known and used within the HEP community, but we also want to test solutions that are not yet well known, focusing our attention on those that can provide fault tolerance against hardware or software malfunctions.
At the moment we are focusing on two solutions: Hadoop (at INFN-Bari) and GlusterFS (at INFN-Napoli).
5
Hadoop testing
Use cases:
  scientific data analysis on clusters with CPU and storage in the same boxes
  high availability of data access without an expensive hardware infrastructure
  high availability of the data access service even in the event of a complete site failure
Tests executed:
  metadata failures
  data failures
  "rack" awareness
6
HDFS Architecture (architecture diagram)
7
HDFS test
Metadata failures:
  the NameServer is a single point of failure
  it is possible to keep a back-up snapshot
DataNode failures:
  during read/write operations
Also tested:
  FUSE access to the HDFS file system
  HTTP access to the HDFS file system
  rack failure resilience
  farm failure resilience
8
HDFS test NameServer Failures
Metadata failures:
  it is possible to make a back-up of the metadata; the time between two dumps is configurable
  it is easy and fast to recover from a major failure of the primary NameNode
  (diagram: the checkpoint is transferred with an HTTP request)
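The HTTP request shown in the diagram refers to the checkpoint image being pulled from the NameNode's embedded web server. As a rough illustration only, the sketch below fetches the current fsimage with libcurl, assuming a Hadoop 1.x-style getimage servlet on port 50070; the endpoint and host name are assumptions that depend on the Hadoop version and deployment.

```cpp
// Hedged sketch: download the NameNode fsimage checkpoint over HTTP with libcurl.
// The URL and servlet name assume a Hadoop 1.x-style NameNode web interface.
#include <curl/curl.h>
#include <cstdio>

int main() {
  curl_global_init(CURL_GLOBAL_DEFAULT);
  CURL* curl = curl_easy_init();
  if (!curl) return 1;

  FILE* out = std::fopen("fsimage.backup", "wb");
  if (!out) { curl_easy_cleanup(curl); return 1; }

  curl_easy_setopt(curl, CURLOPT_URL,
                   "http://namenode.example.org:50070/getimage?getimage=1");  // assumed endpoint
  curl_easy_setopt(curl, CURLOPT_WRITEDATA, out);   // default write callback writes to the FILE*
  curl_easy_setopt(curl, CURLOPT_FAILONERROR, 1L);  // treat HTTP errors as failures

  CURLcode rc = curl_easy_perform(curl);
  std::fprintf(stderr, "transfer %s\n", rc == CURLE_OK ? "ok" : curl_easy_strerror(rc));

  std::fclose(out);
  curl_easy_cleanup(curl);
  curl_global_cleanup();
  return rc == CURLE_OK ? 0 : 1;
}
```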
9
HDFS test NameServer Failures
Metadata failures:
  while the NameNode process is active, no problem is observed on the client side
  if the NameNode process is unavailable, all clients retry a configurable number of times before failing
  the DataNodes reconnect to the NameNode as soon as it becomes available again
  it is easy to import the metadata from the BackupNameNode, and only the metadata changed after the last checkpoint is lost
  it looks fairly easy to put an automatic fail-over cluster in production to provide an HA solution
10
HDFS test DataNode Failures
DataNode failure while reading:
  in case of a major disk/node failure, or of data corruption on a few data chunks, clients do not see any failure, and the system recovers and fixes the problem without human intervention
DataNode failure while writing:
  the client retries a configurable number of times before failing; since the destination node is different each time, it should be enough to have a few DataNodes active in the cluster to avoid client failures
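From the application side this fault tolerance is handled entirely inside the HDFS client. As a minimal sketch (host, port and path are placeholders), a read through libhdfs, the C client API shipped with Hadoop and usable from C++, looks the same whether or not a DataNode fails underneath it:

```cpp
// Minimal read through libhdfs (Hadoop's C client API, usable from C++).
// Replica/DataNode failures are retried transparently inside the client library,
// so this code sees no error. Host, port and path are placeholders.
#include "hdfs.h"
#include <fcntl.h>
#include <cstdio>
#include <vector>

int main() {
  hdfsFS fs = hdfsConnect("namenode.example.org", 9000);   // assumed NameNode endpoint
  if (!fs) { std::fprintf(stderr, "connect failed\n"); return 1; }

  hdfsFile in = hdfsOpenFile(fs, "/superb/data/sample.dat", O_RDONLY, 0, 0, 0);
  if (!in) { std::fprintf(stderr, "open failed\n"); hdfsDisconnect(fs); return 1; }

  std::vector<char> buf(1 << 20);                          // 1 MiB read buffer
  tSize n;
  long long total = 0;
  while ((n = hdfsRead(fs, in, buf.data(), buf.size())) > 0)
    total += n;

  std::printf("read %lld bytes\n", total);
  hdfsCloseFile(fs, in);
  hdfsDisconnect(fs);
  return 0;
}
```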
11
HDFS Rack Awareness (diagram: hosts host1-host8 distributed across racks)
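HDFS learns the host-to-rack mapping from a site-provided executable configured through the topology script property (topology.script.file.name in the Hadoop 1.x series); Hadoop invokes it with host names or IP addresses as arguments and expects one rack path per argument on its standard output. A minimal sketch of such a mapper in C++, with a purely illustrative host-to-rack table:

```cpp
// Sketch of a rack-topology mapper: Hadoop calls the configured topology script
// with host names/IPs as arguments and reads one rack path per argument from stdout.
// The host-to-rack table below is illustrative only.
#include <iostream>
#include <map>
#include <string>

int main(int argc, char* argv[]) {
  const std::map<std::string, std::string> rack = {
      {"host1", "/rack1"}, {"host2", "/rack1"},
      {"host3", "/rack2"}, {"host4", "/rack2"},
      {"host5", "/rack3"}, {"host6", "/rack3"},
      {"host7", "/rack4"}, {"host8", "/rack4"},
  };

  for (int i = 1; i < argc; ++i) {
    auto it = rack.find(argv[i]);
    // Unknown hosts fall back to a default rack, as Hadoop's own examples do.
    std::cout << (it != rack.end() ? it->second : "/default-rack") << "\n";
  }
  return 0;
}
```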
12
HDFS test Rack Awareness
While writing:
  data placement can (or cannot) take the load of each machine into account, depending on the configuration; the tests prove that both modes work fine
While reading:
  data are always read from within the rack, as long as the load is not too high; no configuration is possible here
Measured block sources (blocks read, out of 600, from each source DataNode):
  client pccms45 (rack1): 308/600 from hdfs1 (rack3), 292/600 from alicegrid11 (rack2)
  client pccms41 (rack2): 0/600 from hdfs1 (rack3), 600/600 from alicegrid11 (rack2)
  client hdfs3 (rack3): 293/600 from hdfs1 (rack3), 307/600 from alicegrid11 (rack2)
13
HDFS test Miscellaneous
FUSE testing:
  it works fine for reading
  we use the OSG build in order to avoid some bugs when writing files through FUSE
HTTP testing:
  HDFS provides a native WebDAV interface, which works quite well for reading
  both reading and writing work well with the standard Apache module on top of an HDFS file system mounted via FUSE
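Once HDFS is mounted through FUSE, analysis code needs nothing HDFS-specific: plain POSIX I/O is enough. A small sketch, where the mount point and file path are placeholders:

```cpp
// Reading a file from a FUSE-mounted HDFS tree with plain POSIX calls.
// The mount point (/mnt/hdfs) and the file path are placeholders.
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main() {
  const char* path = "/mnt/hdfs/superb/data/sample.dat";
  int fd = open(path, O_RDONLY);
  if (fd < 0) { std::perror("open"); return 1; }

  char buf[1 << 16];            // 64 KiB reads
  ssize_t n;
  long long total = 0;
  while ((n = read(fd, buf, sizeof buf)) > 0)
    total += n;

  std::printf("read %lld bytes from %s\n", total, path);
  close(fd);
  return 0;
}
```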
14
HDFS test Custom data placement policy
In HDFS it is possible to change/configure the data placement policy.
We have already implemented a new policy in which the three copies of the data are placed in three different racks:
  this provides the capability to survive even two different racks going down
  this policy increases the cost of writing files, but in the HEP environment data are write-once-read-many
  and it reduces the cost of reading files: more CPUs are close to the data
15
HDFS test Custom data placement policy
HDFS gives the possibility to represent a hierarchical topology of racks, but this is not taken into account while writing/reading files: each rack is considered just "a different rack", so both when reading and writing files the racks are treated as a flat topology.
If a node belonging to a given rack is moved to a different rack, the system detects a violation of the policy, but it is not able to deal with the problem automatically: the administrator needs to format the node and re-insert it.
16
HDFS test Custom data placement policy
Goal: achieve a data placement policy that ensures data availability even when a complete farm goes offline. With replication 3:
  2 replicas on the same farm (in different racks)
  1 replica on another farm
We have developed a new policy that understands rack topologies with an arbitrary number of levels. It is under test at the moment, and it is working; we will soon ask Napoli to join the WAN test.
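In HDFS the block placement policy is a pluggable (Java) component, so the production policy is not C++ code; the sketch below only illustrates, under an assumed /farm/rack/host topology path and an invented node list, the selection logic described above.

```cpp
// Illustrative sketch of the farm-aware replica placement described above:
// 2 replicas in the writer's farm but different racks, 1 replica in another farm.
// Topology paths and the node list are hypothetical; the real policy is a
// pluggable HDFS placement class, not this code.
#include <iostream>
#include <string>
#include <vector>

struct Node { std::string farm, rack, host; };

std::vector<Node> chooseTargets(const Node& writer, const std::vector<Node>& nodes) {
  std::vector<Node> targets;

  // 1st replica: the writer's own node (or any node in its rack).
  targets.push_back(writer);

  // 2nd replica: same farm, different rack.
  for (const auto& n : nodes)
    if (n.farm == writer.farm && n.rack != writer.rack) { targets.push_back(n); break; }

  // 3rd replica: a different farm, so a whole-farm outage cannot lose the block.
  for (const auto& n : nodes)
    if (n.farm != writer.farm) { targets.push_back(n); break; }

  return targets;
}

int main() {
  std::vector<Node> nodes = {
      {"bari", "rack1", "host1"}, {"bari", "rack2", "host3"},
      {"bari", "rack3", "host5"}, {"napoli", "rack1", "host7"},
  };
  for (const auto& t : chooseTargets(nodes[0], nodes))
    std::cout << "/" << t.farm << "/" << t.rack << "/" << t.host << "\n";
}
```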
17
HDFS test: next steps
Increase the amount of resources involved in the test:
  we will soon build a big cluster at INFN-Bari, using production WNs plus PON-dedicated machines
  the final test will comprise ~300 local nodes, 5000 CPU cores and >600 TB of storage
  this cluster will be used to test performance, horizontal scalability and the overall TCO over a long period of usage (6 months)
We will soon start WAN data replication/access tests, using at least the INFN-Napoli cluster.
Testing the new version of HDFS:
  HDFS HA for the NameNode (manual failover)
  HDFS Federation
  performance
18
Napoli Activities
We completed the preliminary tests on the 10 Gbit/s network and on local I/O on a clustered file system across 12 nodes using GlusterFS.
On-going work:
  implementation of a Grid farm with a GlusterFS shared storage area and an SRM interface through DPM or StoRM
  design of scenarios to test both the performance and the reliability of GlusterFS over WAN
(diagram: two sites, A and B, each running GlusterFS on WN1-WN12 with a CE and a StoRM/DPM front-end exposing the file system to the Grid through the SRM interface and HTTP access, with GlusterFS data replication over WAN via the Internet)
19
Data Access Library: why do we need it?
It can be painful to access data from an analysis application if no POSIX access is provided by the computing centers; there are a number of libraries providing POSIX-like access in the HEP community.
We need an abstraction layer to be able to use Logical SuperB File Names transparently, without any knowledge of their mapping to the Physical File Names at a given site.
We need a single place/layer where we can implement read optimizations that end users benefit from transparently.
We need an easy way to read files using protocols that are not already supported by ROOT.
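As an illustration of the logical-to-physical mapping the library is meant to hide, the sketch below resolves a logical name against a per-site prefix table; the naming scheme, site names and prefixes are invented for the example and are not the SuperB convention.

```cpp
// Hypothetical sketch of the LFN -> PFN mapping the library would hide from users.
// Site names, prefixes and the logical name scheme are illustrative only.
#include <iostream>
#include <map>
#include <string>

std::string resolve(const std::string& site, const std::string& lfn) {
  // Per-site prefix table, e.g. shipped as part of the site configuration.
  static const std::map<std::string, std::string> prefix = {
      {"INFN-Bari",   "root://xrootd.ba.infn.example//superb"},
      {"INFN-Napoli", "http://storage.na.infn.example/superb"},
  };
  auto it = prefix.find(site);
  return it != prefix.end() ? it->second + lfn : lfn;  // fall back to the LFN itself
}

int main() {
  std::cout << resolve("INFN-Bari", "/prod/run42/events.root") << "\n";
  std::cout << resolve("INFN-Napoli", "/prod/run42/events.root") << "\n";
}
```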
20
Data Access Library: how can we use it?
This work aims to provide a simple set of APIs: Superb_open, Superb_seek, Superb_read, Superb_close, ...
  it will only be necessary to add a dedicated header file and link the related library
  code built with this library will be able to open local or remote files seamlessly, exploiting several protocols: http(s), xrootd, etc.
  different levels of caching will be provided automatically by the library in order to improve the performance of the analysis code
  each site will be able to configure the library as needed to fulfill the computing center's requirements
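A minimal sketch of how analysis code might call this API. Only the function names come from the slides; the header, the exact signatures and the return conventions assumed below are for illustration only.

```cpp
// Hedged usage sketch of the SuperB data access API. Only the function names
// (Superb_open/seek/read/close) come from the slides; the prototypes below are
// assumptions standing in for the library's dedicated header.
#include <cstdio>
#include <vector>

// Assumed prototypes, normally provided by the library's header file.
extern "C" {
int  Superb_open(const char* logical_file_name);
int  Superb_seek(int fd, long long offset);
long Superb_read(int fd, void* buffer, long length);
int  Superb_close(int fd);
}

int main() {
  // A logical file name: the library resolves it to a local or remote replica.
  int fd = Superb_open("/superb/prod/run42/events.root");
  if (fd < 0) { std::fprintf(stderr, "open failed\n"); return 1; }

  // Sparse, analysis-like access: seek to an offset, then read one record.
  std::vector<char> record(64 * 1024);
  Superb_seek(fd, 1024 * 1024);                              // jump 1 MiB into the file
  long n = Superb_read(fd, record.data(), (long)record.size());
  std::printf("read %ld bytes\n", n);

  Superb_close(fd);
  return 0;
}
```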
21
Data Access Library Work status
At the moment we have a test implementation with HTTP-based access, exploiting the curl libraries:
  it is possible to write an application linking this library that opens and reads a list of files, with a list of read operations
  it is possible to configure the size of the read buffer, to optimize the performance when reading files over a remote network
The results of the first tests will be available in a few weeks from now.
We have already presented WAN data access tests using curl at CHEP.
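A rough illustration of the kind of HTTP read the curl-based test implementation performs: a byte-range request of configurable size at a given offset. The URL is a placeholder and this is not the library's actual code.

```cpp
// Sketch of an HTTP ranged read with libcurl, similar in spirit to the curl-based
// test implementation: fetch `len` bytes starting at `offset`. URL is a placeholder.
#include <curl/curl.h>
#include <cstdio>
#include <string>
#include <vector>

static size_t collect(char* ptr, size_t size, size_t nmemb, void* userdata) {
  auto* buf = static_cast<std::vector<char>*>(userdata);
  buf->insert(buf->end(), ptr, ptr + size * nmemb);
  return size * nmemb;
}

int main() {
  const char* url = "http://storage.example.org/superb/prod/run42/events.root";
  const long long offset = 1 << 20;     // start 1 MiB into the file
  const long long len    = 256 * 1024;  // configurable read-buffer size (256 KiB here)

  curl_global_init(CURL_GLOBAL_DEFAULT);
  CURL* curl = curl_easy_init();
  if (!curl) return 1;

  std::vector<char> buf;
  std::string range = std::to_string(offset) + "-" + std::to_string(offset + len - 1);
  curl_easy_setopt(curl, CURLOPT_URL, url);
  curl_easy_setopt(curl, CURLOPT_RANGE, range.c_str());    // HTTP Range request
  curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
  curl_easy_setopt(curl, CURLOPT_WRITEDATA, &buf);

  CURLcode rc = curl_easy_perform(curl);
  std::printf("rc=%d, got %zu bytes\n", rc, buf.size());

  curl_easy_cleanup(curl);
  curl_global_cleanup();
  return rc == CURLE_OK ? 0 : 1;
}
```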
22
Data Access Library: future work
We are already in contact with Philippe Canal about implementing new features of ROOT I/O.
We will follow the developments of the Distributed Computing Working Group in order to provide a seamless mapping of the Logical File Name to the local Physical File Name.
We will soon implement a few example configurations for local storage systems.
We need to know which ROOT version is most interesting for SuperB analysis users.
We will start implementing a memory-based pre-fetch and caching mechanism within the SuperB library.
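One simple shape the planned pre-fetch/caching layer could take is an offset-aligned block cache in memory in front of the protocol-level read. The sketch below is only an assumption about the design, not the planned implementation; remote_read stands in for whatever transport call the library actually uses (here it serves a dummy in-memory file so the example is self-contained).

```cpp
// Hedged sketch of a memory-based block cache: reads are served from fixed-size,
// offset-aligned blocks kept in RAM, and only cache misses go to the transport layer.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <map>
#include <vector>

constexpr std::size_t kBlock = 1 << 20;                 // 1 MiB cache blocks

// Stand-in for the library's real transport call (http, xrootd, ...): here it
// serves bytes from a dummy in-memory "remote file" so the sketch is runnable.
static std::vector<char> g_remote(8 * kBlock, 'x');
std::size_t remote_read(std::uint64_t offset, void* dst, std::size_t len) {
  if (offset >= g_remote.size()) return 0;
  const std::size_t n = std::min<std::size_t>(len, g_remote.size() - offset);
  std::memcpy(dst, g_remote.data() + offset, n);
  return n;
}

class BlockCache {
 public:
  std::size_t read(std::uint64_t offset, void* dst, std::size_t len) {
    std::size_t done = 0;
    while (done < len) {
      const std::uint64_t blk = (offset + done) / kBlock;
      const std::size_t   in  = (offset + done) % kBlock;
      const auto& data = block(blk);
      if (data.empty() || in >= data.size()) break;      // short read: end of file
      const std::size_t n = std::min(len - done, data.size() - in);
      std::memcpy(static_cast<char*>(dst) + done, data.data() + in, n);
      done += n;
    }
    return done;
  }

 private:
  const std::vector<char>& block(std::uint64_t blk) {
    auto it = cache_.find(blk);
    if (it == cache_.end()) {                            // miss: fetch the whole block
      std::vector<char> buf(kBlock);
      buf.resize(remote_read(blk * kBlock, buf.data(), kBlock));
      it = cache_.emplace(blk, std::move(buf)).first;
    }
    return it->second;
  }
  std::map<std::uint64_t, std::vector<char>> cache_;
};

int main() {
  BlockCache cache;
  std::vector<char> out(4096);
  std::size_t n = cache.read(3 * kBlock - 100, out.data(), out.size());  // spans two blocks
  std::printf("read %zu bytes\n", n);
}
```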
23
People involved
  Giacinto Donvito – INFN-Bari: NFSv4.1 testing, Data Model, HADOOP testing
  Elisa Manoni – INFN-Perugia: http & xrootd remote access, developing application code for testing remote data access, distributed Tier1 testing
  Paolo Franchini – INFN-CNAF
  Silvio Pardi, Domenico del Prete, Guido Russo – INFN-Napoli: cluster set-up
  Claudio Grandi – INFN-Bologna: GlusterFS testing
  Stefano Bagnasco – INFN-Torino: SRM testing
  Gianni Marzulli – INFN-Bari
  Domenico Diacono – INFN-Bari: developing data access library
  Armando Fella – INFN-Pisa: http remote access
24
Conclusions
Collaborations and synergies with other (mainly LHC) experiments are increasing and improving.
The first results from testing storage solutions are quite promising.
We will soon need to increase the effort and the amount of resources (hardware and human) involved, both locally and geographically distributed; this will be of great importance for realizing a distributed Tier0 center among the Italian computing centers.
We need to increase the development effort in order to provide a code base for efficient data access; this will be one of the key elements in providing good performance to analysis applications.