SRB system at Belle/KEK Yoshimi Iida CHEP 04, Interlaken 29 September 2004.


1 SRB system at Belle/KEK Yoshimi Iida CHEP 04, Interlaken 29 September 2004

2 Outline (CHEP 04, Yoshimi Iida, KEK)
- The Belle experiment at KEK
- What is SRB?
- SRB activities at KEK
- Transfer rate measurements with SRB
- SRB test beds at Belle
- SRB for Belle data processing
- Summary

3 The Belle experiment
- As presented at the plenary session, Belle is an experiment at the KEK B-factory. Its goal is to study the origin of CP violation.
- Belle now accumulates more than 1TB of raw data from the detector every day
- The raw and processed data accumulated so far have reached the petabyte scale
- This corresponds to 40GB/day of compressed hadronic data for final physics analyses
- Monte Carlo simulation data are generated at a rate of ~200GB/day
- The number of files exceeds 10 million so far and grows by about 10,000 per day

4 The Belle Collaboration: 13 countries,  institutes, ~400 members
IHEP, Moscow IHEP, Vienna ITEP Kanagawa U. KEK Korea U. Krakow Inst. of Nucl. Phys. Kyoto U. Kyungpook Nat'l U. U. of Lausanne Jozef Stefan Inst. Aomori U. BINP Chiba U. Chonnam Nat'l U. Chuo U. U. of Cincinnati Ewha Womans U. Frankfurt U. Gyeongsang Nat'l U. U. of Hawaii Hiroshima Tech. IHEP, Beijing U. of Melbourne Nagoya U. Nara Women's U. National Central U. Nat'l Kaohsiung Normal U. Nat'l Lien-Ho Inst. of Tech. Nat'l Taiwan U. Nihon Dental College Niigata U. Osaka U. Osaka City U. Panjab U. Peking U. Princeton U. Riken-BNL Saga U. USTC Seoul National U. Shinshu U. Sungkyunkwan U. U. of Sydney Tata Institute Toho U. Tohoku U. Tohoku Gakuin U. U. of Tokyo Tokyo Inst. of Tech. Tokyo Metropolitan U. Tokyo U. of A and T. Toyama Nat'l College U. of Tsukuba Utkal U. VPI Yonsei U.

5 PC farm of several generations
- Heterogeneous system from various vendors
- Cost effectiveness
- 3 types of CPU (PenIII/Xeon/Athlon)
- Dell: 36 PCs (Pentium-III ~0.5GHz)
- Compaq: 60 PCs (Pentium-III 0.7GHz)
- Fujitsu: 127 PCs (Pentium-III 1.26GHz)
- Appro: 113 PCs (Athlon 1.67GHz×2)
- NEC: 84 PCs (Xeon 2.8GHz×2)
- Fujitsu: 120 PCs (Xeon 3.2GHz×2)
- Aggregate clock figures shown in the diagram: 320GHz, 168GHz, 470GHz, 768GHz, 450GHz

6 Belle data must be distributed
- Belle has more than 200TB of real and Monte Carlo simulation data for final physics analyses
- As shown, the Belle collaboration consists of more than 57 institutes in 13 countries
- Collaborators want to share the data and analyze them at their own institutes
- About half of the Monte Carlo data are generated at outside institutions (not at KEK)
- Belle wants to simplify the management of data and files among collaborators
- The remote institutions want to exchange the data at their own pace and control their own resources

7 What is SRB?
- "The SDSC Storage Resource Broker (SRB) is client-server middleware that provides a uniform interface for connecting to heterogeneous data resources over a network and accessing unique or replicated data objects"
- "SRB, in conjunction with the Metadata Catalog (MCAT), provides a way to access data sets and resources based on their logical names or attributes rather than their names and physical locations"
- http://www.npaci.edu/DICE/SRB/index.html
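
The idea of addressing data by logical name or attribute, rather than by physical location, can be illustrated with a toy catalog. This is only a sketch: `ToyCatalog`, its methods, the resource names and all paths are hypothetical and do not reflect the real MCAT schema or any SRB API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCatalog:
    """Hypothetical stand-in for a metadata catalog; not the real MCAT schema."""
    # logical name -> list of (resource, physical path) replicas
    replicas: dict = field(default_factory=dict)
    # logical name -> attribute dictionary
    attributes: dict = field(default_factory=dict)

    def register(self, logical_name, resource, physical_path, **attrs):
        """Record one physical replica and merge any attributes."""
        self.replicas.setdefault(logical_name, []).append((resource, physical_path))
        self.attributes.setdefault(logical_name, {}).update(attrs)

    def resolve(self, logical_name):
        """Return every physical replica recorded for a logical name."""
        return list(self.replicas.get(logical_name, []))

    def find(self, **wanted):
        """Return logical names whose attributes match all requested values."""
        return sorted(
            name for name, attrs in self.attributes.items()
            if all(attrs.get(k) == v for k, v in wanted.items())
        )


# Made-up example entries: one object with two replicas, one with a single copy.
catalog = ToyCatalog()
catalog.register("/home/belle/mc/run001.dat", "unix-disk", "/data1/f001", kind="mc")
catalog.register("/home/belle/mc/run001.dat", "hpss", "/hpss/belle/f001", kind="mc")
catalog.register("/home/belle/raw/run002.dat", "hpss", "/hpss/belle/f002", kind="raw")

print(catalog.resolve("/home/belle/mc/run001.dat"))
print(catalog.find(kind="raw"))
```

The client only ever names `/home/belle/mc/run001.dat`; which replica is actually read is a decision the catalog layer can make.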

8 SRB for distributed collaborators
- SRB provides access to data storage across local and wide-area networks
- With federated MCAT (or zoneSRB), each institution can share physical resources and logical collections, yet maintain more local control over those resources, data objects, and collections
- SRB supports parallel I/O for larger files, and "Containers" and/or "Bulk load" for smaller files
- SRB supports the Globus Grid Security Infrastructure (GSI) as an optional method of authentication

9 SRB activities at KEK
- The Computing Research Center (CRC) of KEK started experimenting with SRB in collaboration with SLAC Computing Services
- SLAC had already been using SRB to replicate files between SLAC and IN2P3 Lyon
- The CRC has built several test beds and measured data transfer performance
- The Belle group, in particular Australian Belle collaborators and KEK, started working with the CRC and built SRB test beds
- Belle collaborators in Taiwan and Korea are now trying to join the effort

10 The CRC SRB test system
Diagram labels: zone A; zone B; HPSS enabled SRB server (single CPU); MCAT enabled SRB server (dual CPUs), shown twice; SRB server (dual CPUs); DB2; PostgreSQL; HPSS 120TB; FC RAID 800GB; Giga Switch; KEK network; KEK FW; Internet

11 Performance measurement
- Measure two cases
  - Mixed files: 68 files, 928MB in total (max file size 101MB, min file size 4.7kB)
  - Larger file: 1GB
- Compare SRB commands and ftp/pftp
  - Various transfer commands in SRB: "Bulk load" and "Container" for the mixed files, parallel I/O for the larger file
  - ftp for the Unix file system, pftp for HPSS

12 SRB transfer commands
- Sput: imports local files or directories into SRB space
- Sput -m: sets the I/O mode to parallel I/O
- Sput -c container: imports a local file into a container. A "Container" is a way to put together a lot of files into one larger file to improve performance.
- Sbload: "Bulk load". It uses a single call to register up to several hundred files with MCAT, and it uses separate threads for registration and data transfer.
- Sbload -c container: "Bulk load" into a container
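
The "Container" idea, many small files stored as one larger object, can be sketched as a concatenated blob plus an offset index. This is a toy model only: `pack_container` and `read_member` are hypothetical names, and the layout is an assumption rather than the actual SRB container format.

```python
def pack_container(files):
    """Concatenate small files into one blob and build an offset index."""
    blob = bytearray()
    index = {}
    for name, data in files.items():
        index[name] = (len(blob), len(data))  # (offset, length) within the blob
        blob.extend(data)
    return bytes(blob), index


def read_member(blob, index, name):
    """Extract one original file from the blob using its index entry."""
    offset, length = index[name]
    return blob[offset:offset + length]


# Made-up example data.
small_files = {"a.txt": b"hello", "b.txt": b"world!"}
blob, index = pack_container(small_files)
print(len(blob), index)
print(read_member(blob, index, "b.txt"))
```

The storage system then handles one large object instead of many small ones, which is the property the slide credits for the performance gain.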

13 Machine configuration
- Direct comparison between the zone A (HPSS) and zone B (UNIX file system) cases cannot be made in the following measurements
- The machine configurations are different
  - Single CPU vs. dual CPUs
  - Pentium 4 vs. Xeon
  - DB2 vs. PostgreSQL in MCAT

14 Transfer mixed files (preliminary)
Resource                   Command    Average rate (MB/sec)
Unix file system (zone B)  Sput -c    14
Unix file system (zone B)  Sbload     12
Unix file system (zone B)  Sbload -c  30
Unix file system (zone B)  ftp        36
HPSS* (zone A)             Sput -c    14
HPSS* (zone A)             Sbload     3
HPSS* (zone A)             Sbload -c  11
HPSS* (zone A)             (pftp)     (14)
- Among SRB commands, "Sbload -c" is the fastest on the Unix file system
- "Sput -c" gives the best result in the HPSS case, because HPSS is designed for storing larger files
* The HPSS enabled server has a single CPU while the others have dual CPUs
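
To put these rates in perspective, the following sketch converts each one into the time needed to move the 928MB mixed-file set. It assumes the quoted average rate is total size divided by total elapsed time, which the slide does not state explicitly.

```python
TOTAL_MB = 928  # total size of the 68 mixed files, from the measurement slide

# Average rates in MB/sec as listed on the slide (preliminary values).
rates = {
    ("Unix file system", "Sput -c"): 14,
    ("Unix file system", "Sbload"): 12,
    ("Unix file system", "Sbload -c"): 30,
    ("Unix file system", "ftp"): 36,
    ("HPSS", "Sput -c"): 14,
    ("HPSS", "Sbload"): 3,
    ("HPSS", "Sbload -c"): 11,
    ("HPSS", "pftp"): 14,
}


def transfer_seconds(total_mb, rate_mb_per_sec):
    """Time to move total_mb at a constant average rate."""
    return total_mb / rate_mb_per_sec


for (resource, command), rate in rates.items():
    seconds = transfer_seconds(TOTAL_MB, rate)
    print(f"{resource:17s} {command:10s} {seconds:6.1f} s")
```

Under that assumption the spread is large: roughly half a minute for "Sbload -c" on the Unix file system versus about five minutes for plain "Sbload" into HPSS.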

15 Transfer larger file (preliminary)
Resource                   Command   Average rate (MB/sec)
Unix file system (zone B)  Sput      23
Unix file system (zone B)  Sput -m   29
Unix file system (zone B)  ftp       34
HPSS* (zone A)             Sput      7
HPSS* (zone A)             Sput -m   19
HPSS* (zone A)             (pftp)    (17)
- "Sput -m" (parallel thread mode) is better than single-thread Sput
* The HPSS enabled server has a single CPU while the others have dual CPUs

16 SRB transfer performance
- Performance could be better
- MCAT lookup time: SRB takes extra time for the database query. MCAT needs tuning (building indices).
- HPSS interface: it still does not look mature on Linux (originally on AIX). Further improvements are desired.
- Measurements for the HPSS case were done on the congested KEK LAN

17 The Belle SRB system
Diagram labels: MCAT; SRB server; SCSI-RAID; KEK network; B-Inet; Tape Library; NFS; HSM-DISC; HSM Server; KEK-B-System; B-Tnet; GbE; SRB client; Router; KEK FW; MCAT federation; SRB server at Melbourne U.; MCAT enabled SRB server; MCAT enabled SRB server at ANU; Internet; Belle FW

18 Belle software with SRB
- Belle uses a home-grown analysis framework called BASF
- Belle has extended BASF to dynamically load I/O subsystems as C++ objects
- It was quite simple to add SRB support as a new I/O class using the following SRB client APIs: srbConnect, srbObjOpen, srbObjCreate, srbObjStat, srbObjRead, srbObjWrite and srbObjClose
- We then tested and compared I/O performance using SRB, Belle's own TCP/IP protocol, and NFS, within KEK only

19 BASF test results (preliminary)
Protocol      Resource          Elapsed time  Utilization
SRB           Local SCSI-RAID   10:22         53.3%
SRB           Remote HSM (NFS)  13:13         41.8%
UNIX read     Local SCSI-RAID   5:44          90.0%
Belle TCP/IP  Remote HSM (NFS)  6:24          86.1%
- The test data are 6 files totaling 2.8GB
- Although the elapsed time is longer with the SRB protocol, the CPU time consumed (elapsed time × utilization) is almost the same
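
One way to read this table is to multiply elapsed time by utilization, which estimates the CPU time each run consumed. The sketch below does that arithmetic; it assumes the elapsed times are in minutes:seconds, which the slide does not state (the ratios between rows are the same under either reading).

```python
def to_seconds(clock):
    """Parse 'a:b' as a*60 + b (assumed minutes:seconds; the slide gives no unit)."""
    major, minor = clock.split(":")
    return int(major) * 60 + int(minor)


# (protocol, resource, elapsed time, utilization in percent) from the slide.
rows = [
    ("SRB", "Local SCSI-RAID", "10:22", 53.3),
    ("SRB", "Remote HSM (NFS)", "13:13", 41.8),
    ("UNIX read", "Local SCSI-RAID", "5:44", 90.0),
    ("Belle TCP/IP", "Remote HSM (NFS)", "6:24", 86.1),
]

# Estimated CPU time = elapsed time x utilization.
cpu_seconds = {
    (protocol, resource): to_seconds(elapsed) * utilization / 100.0
    for protocol, resource, elapsed, utilization in rows
}

for key, value in cpu_seconds.items():
    print(key, round(value, 1))
```

All four products land within roughly 7% of one another, so the extra elapsed time with SRB appears to be spent waiting rather than computing.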

20 SRB for Belle processing
- SRB works well, as claimed
- We have tested the mechanisms on a small-scale test bed
- GSI, zones and federations look promising
- Each institution can manage its own resources
- Reading and writing remote data within the Belle software was successful
- SRB is about 40% slower than Belle's own TCP/IP transfer interface
- More detailed tests are necessary

21 More tests and plans
- Federation among Australia, Taiwan, Korea and KEK will be established soon
- Quick (bulk) registration of many files
- Scalability tests
  - for a single MCAT
  - for zone synchronization
  - multiple access to resources and MCAT: several hundred jobs running at a time and accessing files
- File replica consistency, and checks for broken files in case of disk/network failure

22 Summary
- SRB is now working in the Belle experiment
- Zone federation between Australia and KEK has been established
- SRB has been implemented in BASF, the Belle analysis software framework
- Preliminary performance measurements have been done
- G. Moloney gives a talk in a different session about the Australian experience (ID:486)

23 Acknowledgement
- SDSC (San Diego Supercomputer Center): S. Chen, G. Kremenek, A. Rajasekar and R. Moore
- SLAC (Stanford Linear Accelerator Center): A. Hasan and W. Kroeger
- University of Melbourne: G. Moloney
- ANU (Australian National University): S. McMahon and J. Smillie
- IHEP (Institute of High Energy Physics): Ma Mei
- Fujitsu: S. Honma, H. Kuraishi and T. Nakajima
- IBM: K. Ishikawa and S. Yamamoto
- KEK (High Energy Accelerator Research Organization): I. Adachi, N. Katayama, S. Kawabata, Ma Mei, A. Manabe, T. Sasaki, S. Y. Suzuki, S. Yashiro and Y. Watase
- SuperSINET, supported by the National Institute of Informatics

24 For backups

25 The CRC SRB machine specification
Values are listed left to right in the column order of the first row; rows with fewer than four values appear to come from cells merged across columns.
Servers: HPSS enabled; MCAT enabled; SRB server; MCAT enabled
CPU: Pentium4 2.8GHz; Xeon 2.8GHz ×2; Xeon 2.8GHz ×2
Memory: 512MB
Disc: 40GB; 36GB
OS: RH Linux 7.3; RH Linux 7.2; RH Linux AS v3
SRB: v3.1.0
Globus Toolkit: v2.2.4; v2.4.3
DB: ×; DB2 v8.1; ×; PostgreSQL 7.4.2
SRB Resource: HPSS (client library v4.5): 2TB; ×; ×; FIBERNET RAID: 800GB

26 The Belle SRB machine specification
Values are listed left to right in the column order of the first row; rows with fewer than four values appear to come from cells merged across columns.
Servers: MCAT enabled; SRB server; SRB client; SRB server
CPU: 500MHz (SPARC64 GP) x4; Pentium III 1266MHz x2; 500MHz (SPARC64 GP) x4
Memory: 2GB; 512MB; 2GB
OS: Solaris 7; RH Linux 7.2; Solaris 7; RH Linux 8
SRB: v3.1; v3.1 (client); v3.1
Globus Toolkit: v2.4.3; v2.4.3
DB: PostgreSQL; ---
SRB Resource: HSM-DISK; SCSI RAID; ×

27 Bulk load (unload) and Container
- "Bulk load" (unload) is designed to greatly improve the efficiency of ingesting a large number of small files by
  1. registering up to several hundred files with MCAT in a single call, instead of the normal mode of registering one file at a time
  2. using separate threads for registration and data transfer
- A "Container" is a way to put together a lot of files into one large file to improve performance
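
The two bulk-load ideas, batched registration and separate threads for registration and transfer, can be sketched with a queue between two workers. This is a toy model under stated assumptions: `bulk_load`, `register_batch`, `store_file` and the batch size of 300 are all hypothetical, and the slide does not say in which order the real Sbload transfers and registers.

```python
import queue
import threading

BATCH_SIZE = 300  # assumption: stands in for "several hundred files per call"


def bulk_load(files, register_batch, store_file, batch_size=BATCH_SIZE):
    """Transfer files in one thread while another registers them in batches."""
    done = queue.Queue()

    def transfer():
        for name, data in files:
            store_file(name, data)
            done.put(name)
        done.put(None)  # sentinel: no more files

    def register():
        batch = []
        while True:
            name = done.get()
            if name is None:
                break
            batch.append(name)
            if len(batch) == batch_size:
                register_batch(batch)  # one catalog call for the whole batch
                batch = []
        if batch:
            register_batch(batch)  # flush the final partial batch

    workers = [threading.Thread(target=transfer), threading.Thread(target=register)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()


# Made-up workload: 650 one-byte files, with in-memory stand-ins for storage and catalog.
stored = {}
calls = []
files = [(f"file{i:04d}", b"x") for i in range(650)]
bulk_load(files, register_batch=lambda b: calls.append(list(b)),
          store_file=stored.__setitem__)
print(len(stored), [len(c) for c in calls])
```

In this model 650 files cost three catalog calls instead of 650, and registration overlaps with data movement rather than alternating with it.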

28 Federated (multiple) MCAT system
- SRB zone: an SRB Zone (or zone for short) consists of one or more SRB servers along with one MCAT-enabled server
- Federated MCAT: the Federated MCAT implementation allows users to access resources and data across zones

29 Logical file system
Single SRB system
/
- container/
- home/
- styles/
- trash/
- zoneA/
  - container/
  - home/
    - srbUserA.domain/
    - srbUserB.domain/ data.txt
  - styles/
  - trash/
Federated SRB system
/
- container/
- home/
- styles/
- trash/
- zoneA/
  - container/
  - home/
  - styles/
  - trash/
- zoneB
  :

30 Belle plan
- Continue to experiment among several remote institutions
  - Larger scale tests involving thousands of files totaling tens of terabytes
- Integration with the hierarchical storage management system
  - Belle uses SONY Peta-site and Peta-serve
- Disaster recovery scenarios
  - Belle uses cheap IDE-based RAID systems
- Ask users to analyze SRB data files

