A data Grid test-bed environment in Gigabit WAN with HPSS in Japan A.Manabe, K.Ishikawa+, Y.Itoh+, S.Kawabata, T.Mashimo*, H.Matsumoto*, Y.Morita, H.Sakamoto*, T.Sasaki, H.Sato, J.Tanaka*, I.Ueda*, Y.Watase S.Yamamoto+, S.Yashiro KEK *ICEPP +IBM Japan
Outline: ICEPP computer facility and its role in ATLAS Japan. ICEPP-KEK NorduGrid testbed. GridFTP and GSI-enabled pftp (GSI-pftp). Data transfer performance.
ICEPP (International Center for Elementary Particle Physics): ICEPP, located at the Univ. of Tokyo, is the central computer facility of ATLAS Japan. ICEPP started installing its PC farm in 2002 and joined ATLAS DC1 last summer. ICEPP PC farm: 39 computing nodes with PentiumIII 1.4 GHz x 2 CPUs; 108 nodes with Xeon 2.8 GHz x 2 CPUs will be installed this month; a 2.5 TB NFS server (2.2 GHz Xeon); an LTO library.
ICEPP-KEK Grid Testbed Objective: to configure an R&D test bed for the ATLAS Japan regional center hierarchy. ICEPP (Tokyo) is the ATLAS Tier 1 regional center; KEK (Tsukuba) is supposed to be an ATLAS Tier 2 local center. Short-term requirement: archive ~3 TB of data produced at ATLAS DC1 to mass storage and open it to ATLAS members at KEK.
ICEPP-KEK Grid Test Bed Hardware: ICEPP computing elements: 4 nodes with 8 CPUs. KEK: 50 nodes with 100 CPUs; HPSS servers: 2 disk movers; HPSS system (shared with general users at KEK).
Test bed (KEK side): Fujitsu TS225, 50 nodes (PentiumIII 1 GHz x 2 CPUs). 120 TB HPSS storage, used for the basic storage service for general users at KEK: 5 tape movers with 20 3590 drives (shared use) and 2 disk movers with SSA RAID disks (dedicated to this test bed).
ICEPP-KEK Grid Testbed Network: 1 GbE connection over Super-SINET between the ICEPP PC farm, the KEK PC farm and the HPSS servers, all in a single subnet. RTT ~4 ms; link quality is quite good.
Japan HEP network backbone on "Super-SINET" (Project of Advanced Science and Technologies). Super-SINET is a 10 Gbps research backbone with GbE / 10 GbE bridges for peer connection, connecting the major HEP universities and institutes in Japan, and operating optical cross connects (OXC) for fiber/wavelength switching. Operational from 4 January 2002 to the end of March 2005; it will be merged into the Photonic SINET after April 2005.
[Map: KEK and ICEPP, ~60 km apart, connected over Super-SINET.]
Grid testbed environment with HPSS through GbE WAN: [Diagram: ICEPP and KEK, ~60 km apart, connected at 1 Gbps. Each site runs NorduGrid (grid-manager, gridftp-server), Globus MDS and a PBS server; a Globus replica catalog is also shown. ICEPP: CE with 6 CPUs (PBS clients), SE with 0.2 TB, user PCs on 100 Mbps. KEK: CE with 100 CPUs (PBS clients), SE: HPSS servers, 120 TB.]
ICEPP-KEK Grid Testbed software: Globus 2.2.2; NorduGrid 0.3.12 + PBS 2.3.16; HPSS 4.3 + GSI-enabled pftp (GSI-pftp).
NorduGrid (The Nordic Test bed for Wide Area Computing and Data Handling), http://www.nordugrid.org. See "The NorduGrid architecture and tools", presented by A. Waananen et al. at CHEP03.
Why NorduGrid: a natural application of the Globus toolkit for PBS. PBS clients do NOT need a Globus/NorduGrid installation: we installed NorduGrid/Globus on just 3 nodes (ICEPP CE, KEK CE, KEK HPSS SE) but can use more than 60 nodes. Simple, but sufficient functionality; actually used for ATLAS DC in the Nordic countries. A good starting point for basic regional-center functionality tests.
HPSS as NorduGrid Storage Element: HPSS does not speak 'Globus', so we need a GridFTP-like interface for HPSS. A GridFTP interface is in the design phase at Argonne Lab, and others are reportedly being developed (SDSC?). GSI-enabled pftp (GSI-pftp), developed at LBL, is not a GridFTP. But...
GSI-pftp as NorduGrid SE: both GridFTP and GSI-pftp are kinds of FTP; only their extended command sets differ. GridFTP-specific: SPAS, SPOR, ERET, ESTO, SBUF, DCAU. GSI-pftp-specific: PBSZ, PCLO, POPN, PPOR, PROT, PRTR, PSTO. Common: AUTH, ADAT and the other RFC 959 commands.
GSI-pftp as NorduGrid SE: the protocols for parallel transfer and buffer management are different. DCAU (Data Channel Authentication) is unique to GridFTP, but its use is optional for the client. A GSI-pftpd server and a GridFTP client can communicate with each other successfully, except for parallel transfer.
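Whether a given client/server pair interoperates comes down to which of these extension commands each side implements; an FTP server's advertised extensions can be listed with the FEAT command (RFC 2389). The sketch below is only an illustration against a hypothetical plain-FTP host, since Python's ftplib cannot perform the GSI authentication that a real GSI-pftpd or GridFTP control channel requires.

from ftplib import FTP

# Hypothetical plain-FTP host; a GSI-pftpd or GridFTP server would refuse an
# unauthenticated control connection like this one (it expects AUTH/ADAT first).
with FTP("ftp.example.org") as ftp:
    ftp.login()                        # anonymous login
    try:
        print(ftp.sendcmd("FEAT"))     # multi-line reply listing extensions
    except Exception as exc:           # servers without FEAT answer 500/502
        print("FEAT not supported:", exc)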
Sample xRSL
&(executable=gsim1)
 (arguments="-d")
 (inputfiles=("data.in" "gsiftp://dt05s.cc.kek.jp:2811/hpss/ce/chep/manabe/data2"))
 (stdout=datafiles.out)
 (join=true)
 (maxcputime="36000")
 (middleware="nordugrid")
 (jobname="HPSS access test")
 (stdlog="grid_debug")
 (ftpThreads=1)
Players in HPSS: [Diagram of the components involved in a transfer.] HPSS disk movers (disk cache): 2-CPU Power3 375 MHz, AIX 4.3, HPSS 4.3. GSI-pftp server: 2-CPU Power3 375 MHz, AIX 4.3, HPSS 4.3, Globus 2.0. Tape movers, shared by many users; tape drives: 3590 (14 MB/s, 40 GB cartridges). CE (computing element / GridFTP client) at ICEPP/KEK: 2-CPU PentiumIII 1 GHz, RedHat 7.2, Globus 2.2.
Possible HPSS Configuration 1: [Diagram: HPSS server and disk mover at KEK, connected by the SP switch (150 MB/s); computing element at ICEPP, 60 km away over Super-SINET 1 GbE.] Put the 'disk mover (cache)' near the HPSS server. The cache should be near the consumer, and here the disk mover is far from the CE, but the mover gets the high performance of the SP switch.
Possible HPSS Configuration 2: [Diagram: HPSS server at KEK; a remote disk mover placed at ICEPP next to the computing element, reached over Super-SINET 1 GbE; a KEK computing element on the local 1 GbE LAN.] Put a 'remote disk mover (cache)' near the CE: fast access between the CE and cached files. But if the same file is accessed from a CE on the KEK side, a long detour occurs.
Possible HPSS Configuration 3: [Diagram: computing elements at both KEK and ICEPP; the HPSS server with its storage divided into hierarchies 1, 2 and 3.] To avoid a long access delay for the CEs at KEK, the disk layer can be divided into two hierarchies, but this makes the configuration complicated.
Possible HPSS Configuration 1 (current setup): [Diagram: HPSS server and disk mover at KEK; a computing element at KEK on the 1 GbE LAN and a computing element at ICEPP over the WAN, 60 km away via Super-SINET 1 GbE.]
Performance: basic network performance; HPSS client API performance; pftp client - pftp server performance; GridFTP client - pftp server performance.
Basic Network Performance: RTT ~4 ms, packet-loss free, MTU = 1500. The CPU/NIC is the bottleneck. The maximum TCP buffer size (256 KB) on the HPSS servers cannot be changed (it is optimized for the IBM SP switch). [Plot: raw TCP throughput, LAN vs. WAN.]
Basic network performance on Super-SINET: network transfer vs. number of TCP sessions. More than 4 TCP sessions reach the maximum transfer speed; with a large enough TCP buffer, ~1 session gets almost the maximum speed. [Plot: aggregate Tx speed (Mbit/s) vs. number of TCP sessions (1-10), ICEPP client to KEK HPSS mover over the WAN, for client buffer sizes of 1 MB and 100 KB; HPSS mover buffer size = 256 KB.]
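The single-session ceiling follows from the TCP window/RTT bound. Below is a minimal sketch (Python, using the numbers quoted on these slides: a 1 Gbit/s path, ~4 ms RTT, and 100 KB / 256 KB / 1 MB buffers) of that bound and of how many parallel sessions it implies are needed to fill the link; treat it as a back-of-the-envelope model, not a measurement.

import math

LINK_MBPS = 1000.0   # Super-SINET GbE path between ICEPP and KEK
RTT_S = 0.004        # ~4 ms round-trip time measured on the test bed

def session_ceiling_mbps(window_bytes):
    # A single TCP session cannot exceed window/RTT (nor the link rate).
    return min(window_bytes * 8 / RTT_S / 1e6, LINK_MBPS)

for window in (100 * 1024, 256 * 1024, 1024 * 1024):
    per_session = session_ceiling_mbps(window)
    needed = math.ceil(LINK_MBPS / per_session)
    print("window %4d KB -> %4.0f Mbit/s per session, ~%d session(s) to fill 1 Gbit/s"
          % (window // 1024, per_session, needed))

This bound alone says ~2 sessions should suffice with the fixed 256 KB mover buffer; the measured need for more than 4 sessions is consistent with the CPU/NIC bottleneck noted above adding further per-session overhead.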
Disk mover disk performance: HPSS SSA raw disk performance: read/write ~35/100 MB/s. PC farm disk performance: read/write ~30-40 MB/s.
HPSS Client API: [Plot: transfer speed between HPSS mover disk and CE memory using the HPSS client API, LAN vs. WAN.]
HPSS Client API: network latency impacts file transfer speed. The maximum raw TCP speed was almost the same, but the data transfer speed dropped to 1/2 in the RTT ~4 ms WAN. The reason is not clear yet; perhaps there is frequent communication between the HPSS core server and the HPSS client (once per chunk of 4 MB?). The write overhead for single-buffer transfer was larger than for read. A 64 MB buffer size was enough for an RTT ~4 ms network.
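The hypothesis above (control traffic per 4 MB chunk) can be put into a toy model: each chunk costs its data time plus some number of WAN round trips. The chunk size and RTT are from the slides; the LAN-side rate and the number of round trips per chunk are assumptions chosen only to illustrate the shape of the effect.

CHUNK_MB = 4.0        # HPSS client API chunk size quoted on the slide
RTT_S = 0.004         # ICEPP-KEK WAN round-trip time
LAN_RATE_MBS = 60.0   # assumed LAN-side transfer rate, for illustration only

def wan_rate_mbs(round_trips_per_chunk):
    # Each chunk pays its data time at the LAN rate plus some number of
    # client <-> core-server round trips over the WAN.
    chunk_time = CHUNK_MB / LAN_RATE_MBS + round_trips_per_chunk * RTT_S
    return CHUNK_MB / chunk_time

for k in (1, 4, 16):
    print("%2d round trip(s) per 4 MB chunk -> %4.1f MB/s (LAN rate %.0f MB/s)"
          % (k, wan_rate_mbs(k), LAN_RATE_MBS))

With these numbers the throughput only halves if each chunk costs on the order of 15 round trips, so either the per-chunk control traffic is heavier than a single exchange or another factor (e.g. the fixed 256 KB mover TCP buffer) also contributes.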
[Plot: pftp-pftp 'get', HPSS mover disk to client /dev/null. Transfer speed (MB/s) vs. number of files transferred in parallel (1-10), for the KEK client (LAN), the ICEPP client (WAN), and the ICEPP client using pwidth.]
pftp-pftp 'get' performance: we compared GSI pftp-pftp transfer with normal Kerberos pftp-pftp transfer; both gave equivalent speed. As with client-API transfer, even with a large enough buffer, the transfer speed in the WAN was 1/2 of that in the LAN. Simultaneous transfer of multiple files (>4) increased the aggregate bandwidth; we had 2 disk movers with 2 disk paths each (2x2=4). Transferring a single file over multiple TCP sessions (the pftp pwidth command) was not effective on the RTT ~4 ms network when the FTP buffer was large enough.
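The aggregate gain comes from running several independent file transfers at once, each with its own control and data connection (and hence its own mover/disk path). Below is a minimal sketch of the client side in Python's ftplib, using plain unauthenticated FTP and hypothetical host/file names; the actual measurements used GSI-pftp and GridFTP clients, which ftplib does not implement.

import time
from concurrent.futures import ThreadPoolExecutor
from ftplib import FTP

HOST = "ftp.example.org"                           # hypothetical plain-FTP server
FILES = ["/data/file%02d" % i for i in range(8)]   # hypothetical file names

def fetch(path):
    # One control + data connection per file, mirroring what happens when
    # several pftp/gridftp file transfers are started simultaneously.
    ftp = FTP(HOST)
    ftp.login()                                    # anonymous login for the sketch
    nbytes = 0
    def sink(block):
        nonlocal nbytes
        nbytes += len(block)                       # discard data, count bytes
    ftp.retrbinary("RETR " + path, sink, blocksize=1 << 20)
    ftp.quit()
    return nbytes

start = time.time()
with ThreadPoolExecutor(max_workers=4) as pool:    # 4 files in flight at once
    total = sum(pool.map(fetch, FILES))
print("aggregate %.1f MB/s" % (total / (time.time() - start) / 1e6))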
[Plot: pftp-pftp 'get', HPSS mover disk to client, aggregate transfer speed (MB/s) vs. number of files transferred in parallel (1-10), FTP buffer = 64 MB; KEK client (LAN) and ICEPP client (WAN), writing both to /dev/null and to the client disk. Client disk speed: ~48 MB/s at KEK, ~33 MB/s at ICEPP (35-45 MB/s range).]
pftp-pftp 'get' performance (2): even if each component (disk, network) has good performance, the total staging performance becomes poor if the accesses are done serially. Example: a 100 MB/s mover disk, a 640 Mbit/s = 80 MB/s network and a 40 MB/s client disk give a total speed of 1/(1/100 + 1/80 + 1/40) = 21 MB/s.
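The 21 MB/s figure is the harmonic combination of the three stage speeds, which applies when a file passes through the stages one after another; if the stages were overlapped (pipelined), the slowest stage would set the rate instead. A short check of both cases:

STAGE_MBS = {"mover disk": 100.0, "network (640 Mbit/s)": 80.0, "client disk": 40.0}

# Serial staging: every megabyte pays the time of each stage in turn.
serial = 1.0 / sum(1.0 / speed for speed in STAGE_MBS.values())

# Pipelined staging: throughput is bounded by the slowest stage only.
pipelined = min(STAGE_MBS.values())

print("serial    staging: %.1f MB/s" % serial)     # ~21 MB/s, as on the slide
print("pipelined staging: %.1f MB/s" % pipelined)  # 40 MB/s, client-disk bound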
HPSS 'get' with tape library (pftp-pftp get performance): thanks to HPSS's parallel file movement between the tape and disk hierarchies and a sufficient number of tape drives, multiple-file transfers sped up even when the data was on tape. [Plot: elapsed time (sec) vs. number of files transferred in parallel (1-10), for data already on the HPSS mover disk, data on a mounted tape, and data on an unmounted tape.]
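A rough model of why parallel requests still help when the data is on tape: HPSS can stage several files tape-to-disk at once as long as drives are free. The 3590 drive rate is from an earlier slide; the file size, mount penalty and number of available drives below are assumptions for illustration only.

import math

DRIVE_MBS = 14.0    # 3590 drive rate quoted earlier in the talk
MOUNT_S = 60.0      # assumed mount + positioning penalty for an unmounted tape
FILE_MB = 1000.0    # assumed file size
N_DRIVES = 4        # assumed number of drives available to this user

def tape_staging_s(n_files, mounted):
    # Files are staged tape -> disk cache in parallel waves across the drives;
    # each wave pays one file's transfer time (plus a mount if needed).
    per_file = FILE_MB / DRIVE_MBS + (0.0 if mounted else MOUNT_S)
    return math.ceil(n_files / N_DRIVES) * per_file

for n in (1, 4, 8):
    print("%d file(s): mounted tape ~%3.0f s, unmounted tape ~%3.0f s"
          % (n, tape_staging_s(n, True), tape_staging_s(n, False)))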
GSI-pftp 'put' performance: [Plot: transfer speed for a single file, for N files transferred in parallel (per-file and aggregate), and for a single file over multiple TCP sessions (pwidth).]
GridFTP client and GSI-pftp server: [Plot: transfer speed for three cases: pftp client to pftpd with the disk mover on a different node from pftpd; gridftp client to pftpd with the disk mover on the pftpd node; gridftp client to pftpd with the disk mover on a different node from pftpd.]
GSI-pftpd with a GridFTP client: it works! But it is less secure than GridFTP client to GridFTP server (data-path authentication is omitted). In our environment, GridFTP parallel TCP transfer is not needed. With multiple disk movers, however, all data transfers go through the single pftpd server node when a GridFTP client is used.
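For completeness, such a transfer can also be driven from a script by calling the standard Globus client, globus-url-copy, against the same GSI-pftpd endpoint used in the xRSL sample. A minimal sketch, assuming globus-url-copy is installed and a Grid proxy has been created; the local destination path is hypothetical.

import subprocess

# Source endpoint taken from the xRSL sample earlier in the talk;
# the local destination path is hypothetical. A valid Grid proxy
# (grid-proxy-init) is assumed to exist already.
SRC = "gsiftp://dt05s.cc.kek.jp:2811/hpss/ce/chep/manabe/data2"
DST = "file:///tmp/data2"

# -tcp-bs sets the TCP buffer size; GridFTP parallel streams are not
# requested, since GSI-pftpd does not support parallel transfer.
subprocess.run(["globus-url-copy", "-tcp-bs", str(1 << 20), SRC, DST], check=True)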
Path difference: [Diagram comparing data paths. pftp client - pftpd: the pftp client (CE) receives data directly from each of the disk movers. GridFTP client - GSI-pftpd: data from the disk movers is funneled through the single pftp server node before reaching the GridFTP client (CE).]
Summary: ICEPP and KEK configured a NorduGrid test bed with an HPSS storage server over a high-speed GbE WAN. Network latency affected the HPSS data transfer speed. GSI-pftpd, developed at LBL, was successfully adopted as the interface between NorduGrid and HPSS, but it leaves room for performance improvement when multiple disk movers are used.
Summary: the HPSS parallelism mechanisms (multiple disk/tape servers) were effective in utilizing the bandwidth of a high-speed, middle-range-distance network.