Computer System Replacement at KEK
K. Murakami, KEK/CRC
FJPPL (KEK/CRC - CC/IN2P3) meeting, 2012/Mar/14
Outline
- Overview
- Introduction of the New Central Computing System (KEKCC)
  - CPU
  - Storage
- Operation Aspects
Overview
Computing Facility at KEK
- Two systems:
  - Supercomputer System
  - Central Computer System: Linux cluster, plus support for IT infrastructure (mail / web)
- Both systems are now under replacement.
- Rental systems, replaced every 3-5 years through international bidding; the RFI / RFP cycle and system introduction take about 2 years.
KEK Supercomputer System (KEKSC)
- Now in service; full installation coming soon. Used for large-scale numerical simulations.
- System A running Sep 2011 - Jan 2012; System A+B from March 2012.
- System A: Hitachi SR16000 model M1
  - POWER7, 54.9 TFlops, 14 TB memory
  - 56 nodes: 960 GFlops, 256 GB per node
  - Automated parallelization within a single node (32 cores)
- System B: IBM Blue Gene/Q
  - 6 racks (3 from Mar 2012, 3 from Oct 2012): 1.258 PFlops, 96 TB memory in total
  - Per rack: 1024 nodes, 5D torus network, 209.7 TFlops, 16 TB memory
- Scientific subjects selected through the Large-Scale Simulation Program
New Central Computer System
- The Central Computer System (KEKCC) and the B-Factory Computer System are merged into the new KEKCC.
- The rental period of the current systems ends next February; the new system goes into service in Apr/2012.
New KEKCC
Features of the New KEKCC
- Main contractor:
- 3.5-year rental system (until Aug/2015)
- CPU: 4000-core Linux cluster (SL5)
  - Interactive / batch servers
  - Grid (gLite) deployed
- Storage system for big data
  - 7 PB disk storage (DDN)
  - Tape library with a max. capacity of 16 PB
  - High-speed I/O, high scalability
CPU (IBM System x iDataPlex)
- Work servers & batch servers
  - Xeon X5670 (2.93 GHz, 3.33 GHz with Turbo Boost, 6 cores)
  - 282 nodes with 4 GB/core, 58 nodes with 8 GB/core
  - 2 CPUs per node, 4080 cores in total
- Interconnect: InfiniBand 4x QDR (4 GB/s), RDMA; connection to the storage system
- Job scheduler: LSF (ver. 8), scalable up to 1M jobs (a job-submission sketch follows this slide)
- Grid deployment: gLite; work servers act as Grid UI, batch servers as Grid WN
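As a rough illustration of how users hand work to such a batch system, the sketch below wraps LSF's standard bsub command from Python. The queue name, log file pattern, and script path are placeholders, not actual KEKCC settings; only the bsub options themselves (-q, -n, -o) are standard LSF.

    # Minimal sketch of submitting an analysis job to an LSF batch system.
    # Queue name "short" and the script path are placeholders.
    import subprocess

    def submit_job(script, queue="short", cores=1, logfile="job.%J.out"):
        """Submit `script` to LSF via bsub and return the raw bsub output."""
        cmd = [
            "bsub",
            "-q", queue,        # target queue
            "-n", str(cores),   # number of job slots
            "-o", logfile,      # stdout/stderr log (%J expands to the job ID)
            script,
        ]
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return result.stdout.strip()   # e.g. "Job <12345> is submitted to queue <short>."

    if __name__ == "__main__":
        print(submit_job("./run_analysis.sh"))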
Disk System (DDN SFA10K)
- DDN SFA10K x 6
  - Capacity: 1152 TB x 6 = 6.9 PB (effective)
  - Throughput: 12 GB/s x 6
  - Used for GPFS and GHI
- GPFS file system
  - Parallel file system, total throughput > 50 GB/s
  - Optimized for massive access: many file servers, no bottleneck, RDMA-enabled interconnect, separation of the metadata area, large block size
- Performance: > 500 MB/s for single-file I/O in benchmark tests (a throughput-measurement sketch follows this slide)
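To make the single-file figure concrete, the following sketch measures sequential write throughput the way a simple benchmark might. It is an illustration only, not the benchmark behind the quoted > 500 MB/s number, and the target path is a placeholder.

    # Minimal sketch of a sequential-write throughput test.
    # The target path is a placeholder for a file on the GPFS file system.
    import os, time

    def write_throughput(path, total_bytes=4 * 1024**3, block_size=8 * 1024**2):
        """Write `total_bytes` in `block_size` chunks and return MB/s."""
        block = b"\0" * block_size
        start = time.time()
        with open(path, "wb") as f:
            written = 0
            while written < total_bytes:
                f.write(block)
                written += block_size
            f.flush()
            os.fsync(f.fileno())   # make sure data actually reaches the file system
        elapsed = time.time() - start
        return (total_bytes / 1024**2) / elapsed

    if __name__ == "__main__":
        print("%.1f MB/s" % write_throughput("/gpfs/scratch/testfile"))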
Tape System (IBM TS3500 library, IBM TS1140 drives)
- Tape library: max. capacity 16 PB (see the capacity arithmetic after this slide)
- Tape drives: TS1140 x 60, the latest enterprise drives
  - We do not use LTO because of its lower reliability.
  - Only two vendors for enterprise drives: IBM and StorageTek.
- Tape media
  - JC: 4 TB, 250 MB/s
  - JB: 1.6 TB (repack), 200 MB/s
  - The magnetic tape itself is produced by Fuji Film for both IBM and StorageTek media.
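The quoted numbers can be tied together with simple arithmetic: the sketch below estimates how many JC cartridges the 16 PB library corresponds to, how long one drive needs to fill a cartridge, and the aggregate tape bandwidth. This is back-of-the-envelope reasoning from the slide's figures, not vendor documentation.

    # Back-of-the-envelope arithmetic from the slide's figures.
    LIBRARY_CAPACITY_PB = 16
    JC_CAPACITY_TB = 4          # JC cartridge capacity
    JC_SPEED_MBS = 250          # JC native drive speed
    NUM_DRIVES = 60

    cartridges = LIBRARY_CAPACITY_PB * 1000 / JC_CAPACITY_TB
    # ~4000 JC cartridges to reach 16 PB

    fill_time_h = (JC_CAPACITY_TB * 1e6) / JC_SPEED_MBS / 3600
    # ~4.4 hours of streaming to fill one cartridge at 250 MB/s

    aggregate_gbs = NUM_DRIVES * JC_SPEED_MBS / 1000.0
    # ~15 GB/s aggregate tape bandwidth with all 60 drives streaming

    print(cartridges, round(fill_time_h, 1), aggregate_gbs)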
HSM (Hierarchical Storage Management)
- HPSS: disk (first layer) + tape (second layer); operational experience from the former KEKCC
- Improvements over the former system
  - More tape drives, faster tape drive I/O
  - Stronger interconnect (10GbE, IB)
  - Improved staging area (capacity, access speed)
  - Integration with the GPFS file system (GHI)
- GHI (GPFS-HPSS Interface): new!
  - GPFS is used as the staging area
  - Full coherence with GPFS access (POSIX I/O): no HPSS client API, replacing the current VFS interface
  - High-performance I/O of GPFS (see the access sketch after this slide)
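The practical point of GHI is that applications see ordinary files: data that has been migrated to tape is recalled transparently when a process opens it. The sketch below is a hypothetical illustration of that access pattern; the mount point and file name are placeholders.

    # Hypothetical illustration: with GHI, HSM-managed data is read with
    # ordinary POSIX I/O -- no HPSS client API calls are needed.
    # The GPFS mount point and file name below are placeholders.

    path = "/ghi/gpfs/experiment/run001.dat"

    # If the file has been migrated to tape, the open()/read() below simply
    # blocks while GHI/HPSS stages it back to the GPFS disk layer.
    with open(path, "rb") as f:
        header = f.read(1024)      # standard buffered read, nothing HSM-specific

    print(len(header), "bytes read through plain POSIX I/O")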
GHI Data Flow
[Diagram: the Linux cluster reads and writes the GPFS disk through the GPFS NSD servers (#1-#3) over the lab LAN and a SAN switch; GHI moves data between the GPFS disk and the HPSS layer (core server, movers #1-#4, HPSS disk, tape library) over the HPSS LAN and SAN.]
Internal Cloud Service
- Motivation
  - Requests for specific systems from experiments, groups, and communities; tests of new operating systems
  - Efficient resource management (servers on demand)
  - PaaS-type service
- Cloud middleware: Platform ISF + ISF Adaptive Cluster (coherent with LSF); in the future, an open solution (e.g. OpenStack)
- Provisioning tools: KVM (VM solution), xCAT (node-by-node system reinstallation); a provisioning sketch follows this slide
- Virtualization technology is not yet mature enough: CPU virtualization is fine, but I/O virtualization is not. The choice between 10GbE and IB takes virtualization technology (nPAR, SR-IOV) into account.
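As a rough sketch of the KVM side of such on-demand provisioning, the snippet below defines and starts a guest through the libvirt Python bindings. The domain XML is a heavily trimmed placeholder (name, memory size, and disk image path are hypothetical), and this is not the actual Platform ISF / xCAT workflow, just an illustration of the underlying VM mechanism.

    # Rough sketch of starting a KVM guest via libvirt (not the actual
    # Platform ISF / xCAT workflow). Domain name, memory size, and disk
    # image path are placeholders.
    import libvirt

    DOMAIN_XML = """
    <domain type='kvm'>
      <name>worker-vm01</name>
      <memory unit='GiB'>4</memory>
      <vcpu>2</vcpu>
      <os><type arch='x86_64'>hvm</type></os>
      <devices>
        <disk type='file' device='disk'>
          <source file='/var/lib/libvirt/images/worker-vm01.qcow2'/>
          <target dev='vda' bus='virtio'/>
        </disk>
        <interface type='bridge'><source bridge='br0'/></interface>
      </devices>
    </domain>
    """

    conn = libvirt.open("qemu:///system")    # connect to the local hypervisor
    dom = conn.defineXML(DOMAIN_XML)         # register the guest definition
    dom.create()                             # boot the guest
    print("started:", dom.name())
    conn.close()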
Operation Aspects
Effects of the 3.11 Earthquake
- Earthquake intensity at KEK (Tsukuba): lower 6 on the Japanese scale (max 7); VIII on the MMI (Modified Mercalli) scale
- Hardware damage was minimal
  - Some racks swayed; some HDDs were broken, with minimal data loss
  - The UPS was not helpful. We will introduce an automatic shutdown mechanism, triggered while the UPS is still alive, especially for the disk system (a monitoring sketch follows this slide).
- Crisis of electricity supply
  - Accident at the Fukushima nuclear power plant; almost all nuclear power plants are off-line pending stress-test investigations
  - Potential risk of blackouts during summer daytime
  - Political electricity saving: a 30% power cut was imposed; electricity rates will rise by about 15%.
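A minimal sketch of what such an automatic-shutdown helper could look like is shown below: poll the UPS while it is still alive and stop the storage services before the batteries drain. It assumes a NUT-style (Network UPS Tools) upsc command and a site-specific shutdown script; both are assumptions, since the actual KEK mechanism is not described in the slides.

    # Hedged sketch of an automatic-shutdown helper for the disk system.
    # Assumes a NUT-style `upsc` command and a hypothetical shutdown script.
    import subprocess, time

    UPS_NAME = "ups@localhost"                                # placeholder NUT UPS identifier
    SHUTDOWN_CMD = ["/usr/local/sbin/stop_disk_system.sh"]    # hypothetical site script

    def on_battery():
        out = subprocess.run(["upsc", UPS_NAME, "ups.status"],
                             capture_output=True, text=True).stdout
        return "OB" in out               # "OB" = on battery in NUT status flags

    while True:
        if on_battery():
            subprocess.run(SHUTDOWN_CMD, check=True)   # clean shutdown while UPS is alive
            break
        time.sleep(30)                   # poll every 30 seconds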
Electricity Saving in the New System
- Energy-saving products
  - IBM iDataPlex (high density, about 40% better cooling efficiency)
  - Power-unit efficiency (80 PLUS Silver or better)
  - Tape is a green device; the disk system is not eco-friendly. No MAID, because of the risk on failure and the data transfer rate (Grid access).
- Electrical power visualization
  - The electrical consumption of all components is monitored: IBM Systems Director, intelligent PDUs, power clamp meters
- Power capping
  - IBM Active Energy Manager caps server power by controlling the CPU frequency
  - The maximum power consumption can be set to 220-350 W per server (see the envelope arithmetic after this slide)
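To put the per-server cap into cluster-scale numbers, the sketch below combines it with the node counts from the CPU slide (282 + 58 = 340 nodes); this is simple arithmetic on the quoted figures, not measured data.

    # Simple arithmetic on the quoted figures (282 + 58 nodes, 220-350 W caps).
    NODES = 282 + 58                     # work + batch servers from the CPU slide

    low_cap_kw  = NODES * 220 / 1000.0   # ~74.8 kW if every server is capped at 220 W
    high_cap_kw = NODES * 350 / 1000.0   # ~119 kW at the 350 W cap

    print("CPU-cluster power envelope: %.0f-%.0f kW" % (low_cap_kw, high_cap_kw))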
Challenges for the Future Facility
- Electricity
  - New system: 350 kW for IT plus 400 kW for air cooling; current PUE > 2.x (a PUE calculation follows this slide)
  - Megawatt scale expected for the next system
- Cooling: water cooling
- Space: a new building, or data-center containers
- Data management
  - BIG (exascale) data management; exabytes in the near future
  - Data copy at every system replacement: 5 PB now, 20 PB next, ...
  - Strategy for tape / library vendors (IBM / StorageTek): tape generations evolve too rapidly.
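The PUE claim can be checked directly from the two power figures on the slide; the sketch below does that division, purely as arithmetic on the quoted numbers.

    # PUE check from the quoted figures: PUE = total facility power / IT power.
    it_power_kw      = 350.0             # new system (IT equipment)
    cooling_power_kw = 400.0             # air cooling

    pue = (it_power_kw + cooling_power_kw) / it_power_kw
    print("PUE = %.2f" % pue)            # ~2.14, consistent with "PUE > 2.x"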
Summary
- Computer facility at KEK: supercomputer system and central computer system (KEKCC, Linux cluster)
- New KEKCC
  - Migrated system (former KEKCC + B-Factory computer system), in service from Apr/2012
  - 4000-core CPU Linux cluster (Scientific Linux 5.6), Grid environment (gLite)
- Storage system
  - 7 PB DDN storage / GPFS file system
  - Tape library with 16 PB capacity
  - HPSS with GHI (GPFS-HPSS Interface) as HSM
  - High-speed access and high scalability for big data
- Challenges for the future: how to design the next system