Experience of a low-maintenance distributed data management system
W. Takase (1), Y. Matsumoto (1), A. Hasan (2), F. Di Lodovico (3), Y. Watase (1), T. Sasaki (1)
1. High Energy Accelerator Research Organization (KEK), Japan
2. University of Liverpool, UK
3. Queen Mary, University of London, UK
Contents
KEK iRODS system
– Running in production for over 2 years
– Rules enable files to be stored efficiently
– Federation with QMUL
iRODS applications
– SCALA: Visualization tool for iRODS
– iRODS XOR-based backup
Summary
iRODS overview
Distributed data management system
Client-server architecture
Allows data management policies to be enforced on the server side
Provides an interface to many different types of storage
Clients can access iRODS via
– i-commands: command-line utilities
– iRODS Browser: web interface
KEK iRODS Systems
iRODS servers
– RHEL 5.6
– iRODS 2.5 ⇒ 3.2
– PostgreSQL
– In operation for over 2 years
4 iRODS zones
– KEK-T2K
– KEK-MLF
– KEKZone
– demoKEKZone
Storage resources
– HPSS (High Performance Storage System)
– Disk system
Data Management for T2K
Tokai to Kamioka (T2K): neutrino experimental group
The experimental data is stored on KEK storage
The group needed an easy way to quickly access the collected data from outside KEK in order to evaluate its quality
iRODS provided the solution
[Figure: T2K map, content/uploads/t2kmap.gif]
Data Management for T2K
The KEK-T2K zone for the experimental group began operation in October 2010
Detected data are processed and then transferred to KEK iRODS
Group members became able to access the stored data easily and quickly via
– i-commands
– iRODS Browser
iRODS Rules for KEK-T2K Zone
Bundle and replicate the data
– Each experimental data file is small (〜 several MB)
– HPSS prefers large files
[Diagram: Client, T2K data server, iRODS servers with disk and DB, rodsweb, disk system and HPSS; files are bundled into a tar file]
iRODS Rules for KEK-T2K Zone
Response to request
[Diagram: Client, T2K data server, iRODS servers with disk and DB, rodsweb, disk system and HPSS; labels: request, tar file, file]
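The idea behind the two rule slides (bundle many small files into one large archive, then hand back a single file on request) can be sketched as follows. This is only an illustration in Python using the standard tarfile module, not the iRODS rule running at KEK; the function names bundle and fetch, the file names, and the in-memory archive are assumptions made for the example.

```python
import io
import tarfile


def bundle(files: dict[str, bytes]) -> bytes:
    """Pack many small files into one tar archive (hypothetical helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def fetch(archive: bytes, name: str) -> bytes:
    """Return one file from the archive, as a response to a client request."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        member = tar.extractfile(name)  # raises KeyError if the name is absent
        if member is None:  # not a regular file
            raise KeyError(name)
        return member.read()


if __name__ == "__main__":
    # Illustrative file names only; real T2K file names are not given in the slides.
    small_files = {"run0001.dat": b"first", "run0002.dat": b"second"}
    archive = bundle(small_files)
    print(fetch(archive, "run0002.dat"))
```

In this sketch the storage back end only ever sees one large object, while a client still asks for an individual file by name.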
Federation with QMUL
Data replication between the 2 sites
Each site shares its data with the other
– KEK-T2K: experimental data
– QMULZone: analytical data
Amount of data in KEK-T2K
The T2K group started data taking on 22nd Dec, 2011
SCALA: Visualization tool for iRODS
Statistical Charts And Log Analyzer
iRODS lacked an interface for usage statistics and for debugging problems
We developed a web interface that visualizes an overview of iRODS status
– Statistical Charts page
– Log Analyzer page
SCALA has been installed on the KEK iRODS system
SCALA Overview
Input: iRODS outputs (resource usage, log files)
Processing: Parse ⇒ Summarize ⇒ Display, with a parsed table and a summarized table kept in a database
Output: daily system status visualized as charts
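The Parse, Summarize, Display stages can be sketched in a few lines. This is not SCALA's code: the log line layout used here ("DATE TIME LEVEL message") is a hypothetical format chosen for the example and is not claimed to match real iRODS log files, and the function names and in-memory tables are likewise assumptions (the slide indicates that SCALA keeps its tables in a database).

```python
from collections import Counter


def parse(lines):
    """Parse stage: turn raw log lines into rows (the 'parsed table')."""
    rows = []
    for line in lines:
        # Hypothetical layout: "YYYY-MM-DD HH:MM:SS LEVEL free text message"
        parts = line.strip().split(" ", 3)
        if len(parts) < 4:
            continue  # skip lines that do not fit the assumed layout
        date, time, level, message = parts
        rows.append({"date": date, "time": time, "level": level, "message": message})
    return rows


def summarize(rows):
    """Summarize stage: count entries per day and level (the 'summarized table')."""
    return Counter((row["date"], row["level"]) for row in rows)


def display(summary):
    """Display stage: render the daily counts as simple text bars."""
    return [
        f"{date} {level:<6} {'#' * count} {count}"
        for (date, level), count in sorted(summary.items())
    ]


if __name__ == "__main__":
    sample = [
        "2012-03-01 10:00:00 ERROR connection refused",
        "2012-03-01 10:05:00 ERROR timeout",
        "2012-03-02 09:00:00 NOTICE put done",
    ]
    for bar in display(summarize(parse(sample))):
        print(bar)
```

Keeping the parsed rows separate from the summary mirrors the two tables on the slide: the summary feeds the charts, while the parsed rows remain available for looking up the log entries behind a given error.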
Statistical Charts
Visualizes iRODS daily operational data
Log Analyzer
Provides an error debugging tool
1. User clicks a bar
2. Error detail displayed
3. User clicks an error message
4. Related log displayed
Download SCALA
iRODS XOR-based backup
Full-file replication
– The current method for reliable data storage is to replicate the data
– If a disk or server fails, a copy is still available
– Requires much storage space
– If a portion of the file becomes corrupt, the full file has to be replaced
XOR-based backup
– Reduces the storage space with the same robustness
– Splits a file into blocks and creates parity blocks
– If a block becomes corrupt, only the corrupted block has to be recreated
XOR-based backup: 100% recovery when any 2 servers fail
Full-file replication uses 3 servers and needs 300GB
XOR-based backup uses 4 servers but only needs 200GB
An iRODS rule enables automatic processing
Block layout (+ denotes XOR):
          Server 1    Server 2    Server 3    Server 4
Data      A           B           C           D
Parity    E = B + C   F = C + D   G = A + D   H = A + B
XOR-based backup: Decoding flow
[Diagram: decoding flow over the layout Server 1: A, E = B + C; Server 2: B, F = C + D; Server 3: C, G = A + D; Server 4: D, H = A + B]
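The block layout on these slides can be checked with a small sketch. The parity equations and the server placement are taken from the slide; everything else (Python instead of an iRODS rule, the function names, zero-padding the file to four equal blocks, and the iterative decoding loop) is an assumption made for illustration and is not claimed to be the decoding flow of the original diagram.

```python
from itertools import combinations

# Parity block -> the two data blocks it combines (from the slide; + is XOR).
PARITY = {"E": ("B", "C"), "F": ("C", "D"), "G": ("A", "D"), "H": ("A", "B")}
# Server -> the blocks it stores (from the slide).
LAYOUT = {1: ("A", "E"), 2: ("B", "F"), 3: ("C", "G"), 4: ("D", "H")}


def xor_bytes(x: bytes, y: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(x, y))


def encode(data: bytes) -> dict:
    """Split data into blocks A-D (zero-padded) and add parity blocks E-H."""
    size = -(-len(data) // 4)  # ceiling division
    padded = data.ljust(4 * size, b"\0")
    blocks = {n: padded[i * size:(i + 1) * size] for i, n in enumerate("ABCD")}
    for parity, (x, y) in PARITY.items():
        blocks[parity] = xor_bytes(blocks[x], blocks[y])
    return blocks


def decode(available: dict, length: int) -> bytes:
    """Rebuild the file from surviving blocks by repeatedly solving parity equations."""
    known = dict(available)
    progress = True
    while progress and not all(n in known for n in "ABCD"):
        progress = False
        for parity, (x, y) in PARITY.items():
            if parity not in known:
                continue
            if x in known and y not in known:
                known[y] = xor_bytes(known[parity], known[x])
                progress = True
            elif y in known and x not in known:
                known[x] = xor_bytes(known[parity], known[y])
                progress = True
    if not all(n in known for n in "ABCD"):
        raise ValueError("not enough surviving blocks to recover the file")
    return b"".join(known[n] for n in "ABCD")[:length]


def surviving(blocks: dict, failed: tuple) -> dict:
    """Blocks still readable when the given servers have failed."""
    return {n: blocks[n] for s, names in LAYOUT.items() if s not in failed for n in names}


if __name__ == "__main__":
    data = b"T2K experimental data!"
    blocks = encode(data)
    for failed in combinations(LAYOUT, 2):
        assert decode(surviving(blocks, failed), len(data)) == data
    print("recovered after every 2-server failure")
```

The eight stored blocks are each a quarter of the padded file, so the total is twice the file size, against three times for three full copies. That matches the 200GB versus 300GB on the slide if the file is 100GB, which is an inference from those figures rather than something the slides state.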
Summary
The KEK iRODS system has been running in production for over 2 years
iRODS gives a way to quickly and easily access data from outside KEK
The rule that bundles and replicates the data allows files to be stored efficiently
Federation with QMUL enables the two sites to share and back up each other's data
SCALA is a visualization tool and has been installed on the KEK iRODS system
– It leads to better management of the overall iRODS service
XOR-based backup provides data reliability at a lower storage cost than replication
– An iRODS rule enables automatic processing
Thank you for your attention!
Wataru