KISTI-GSDC SITE REPORT
  Sang-Un Ahn, Jin Kim
  On behalf of KISTI GSDC
  24 March 2015
  HEPiX Spring 2015 Workshop
  Oxford University, Oxford, UK
CONTENTS
  KISTI GSDC Overview
  Tier-1 operations
  Summary
KISTI GSDC OVERVIEW
KISTI Location
  South Korea: KISTI, in Daedeok R&D Innopolis
  Daedeok R&D Innopolis:
    30 Government Research Institutes
    11 Public Research Institutes
    29 Non-profit Organizations
    7 Universities
  Rare Isotope Accelerator (to be constructed)
KISTI GSDC
  KISTI (Korea Institute of Science and Technology Information)
    Government-funded research institute for IT, founded in ...
    ... people working for National Information Service (distribution & analysis), Supercomputing and Networking
    Operating Supercomputing and NREN Infrastructure
      Supercomputer: ... TFlops at peak (ranked 14th in the Top500 in 2009; 201st now)
      NREN Infrastructure: KREONet2
        Domestic: Seoul ←(100G)→ Daejeon
        International: HK ←(10G)→ Chicago/Seattle (Member of GLORIAD)
  GSDC (Global Science experiment Data hub Center)
    Government-funded project to promote research experiments by providing computing power and storage
      HEP: ALICE, CMS, Belle, RENO
      Others: LIGO, Bioinformatics
    Running a Data-Intensive Computing Facility
      18 staff: sysadmin, experiment support, external relations, administration
      Total 5,500 cores, 4,000 TB disk and 1,500 TB tape storage
  History of GSDC
    7 years of experience running a grid computing centre in collaboration with the ALICE experiment and WLCG
    Timeline labels: GSDC Facility; ALICE T2 operation start; Formation of GSDC; ALICE T2 Test-bed; ALICE T1 Test-bed; KISTI Analysis Facility; ALICE T1 candidate; Full T1 for ALICE; CMS T
GSDC System Overview
  Batch systems
    Torque/MAUI: 3,000 slots (ALICE T1, Belle, RENO)
    HTCondor: 2,000 slots (CMS T3, LIGO, KIAF)
  Storage
    ... PB
    Public, Private
    1.5 PB, 2.5 PB
    IBM TSM/GPFS, HITACHI USP/VSP, EMC, HITACHI HNAS, EMC Isilon
  Network
    4 spine switches
    74 leaf switches
  Racks
    500+ servers in 22 racks
    14 storage racks
    4 tape frames
  40 RACKS!!!!
System Management
  Services are defined in Puppet (manifests, profiles)
  Stash is used for Puppet code management
  Nodes are created and provisioned via Foreman with Puppet classes
  All VMs are managed by the Red Hat solution
  Centralized authN/authZ is provided via IPA (SSO to be implemented)
  JIRA helps to track issues and manage projects
  Confluence is a useful tool for documentation and sharing
  Diagram labels: Project; Issue tracking; Puppet code management (via Git); Documentation & Space; Node definition; Provisioning; Manifests; Profiles; v3.7.4
TIER-1 OPERATIONS
Jobs
  KISTI, 4.88%
  Job plot, Aug 2014 to Feb 2015: ~2500; ~100 (Queued Agents)
  Plot annotations:
    Proxy failure due to KISTI-CERN network down; automatic backup routing established afterwards
    Linux kernel security patch
  2,688 concurrent jobs = 28 kHS06 (2015 pledges)
    84 nodes, 32 (logical) cores per node, 10.5 HS06/core
  Stable and smooth running
    KISTI-CERN 2G link down in September
    Linux kernel security patch before Christmas 2014
  Completed 2.3M jobs in the last 6 months
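The 28 kHS06 figure on this slide can be checked from the node count, the cores per node and the per-core benchmark score. The following is a minimal Python sketch, assuming one job slot per logical core; the constant and function names are illustrative and do not come from the talk.

```python
# Worked check of the Tier-1 CPU capacity quoted on the slide.
# Assumption: each logical core provides exactly one job slot.
NODES = 84
LOGICAL_CORES_PER_NODE = 32
HS06_PER_CORE = 10.5


def job_slots(nodes: int, cores_per_node: int) -> int:
    """Number of concurrent job slots, one per logical core."""
    return nodes * cores_per_node


def capacity_hs06(slots: int, hs06_per_core: float) -> float:
    """Total CPU capacity in HS06 for the given number of slots."""
    return slots * hs06_per_core


slots = job_slots(NODES, LOGICAL_CORES_PER_NODE)
total_hs06 = capacity_hs06(slots, HS06_PER_CORE)

print(f"{slots} slots, {total_hs06 / 1000:.1f} kHS06")
```

With the slide's numbers this gives 2,688 slots and about 28.2 kHS06, consistent with the quoted "2,688 concurrent jobs = 28 kHS06".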
Storage
  Disk: 1000 TB
    Usage > 50%
    Managed by XRootD
  Tape: 1500 TB
    310 TB RAW data (p-Pb from ALICE)
    Available tape buffer = 400 TB
    Raw data on tape buffer for fast access
    Managed by XRootD
  99% availability for R/W (last 6 months)
  3-year history plot (KISTI_GSDC::SE2), Apr 2012 to Feb 2015
Site Availability/Reliability
  Table: monthly Reliability and Availability for Sep, Oct, Nov, Dec, Jan, Feb (surviving entry: Reliability 100)
  ...% reliable for the last 6 months (Sep 2014 to Feb 2015)
    Monthly target for reliability of the ALICE test: 97%
    On track for a stable and reliable site
  Participating in the WLCG operations meetings twice a week (Mon/Thu), reporting operation-related issues
Plan
  Additional resources will be procured
    ~900 CPU cores (Ivy Bridge)
    ~700 TB disks (NAS/SAN)
  2016 pledges (31k HS06) for ALICE will be made by the end of this year
  Elasticsearch-Logstash-Kibana will be deployed to monitor the whole system
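To put the 2016 pledge in context, the sketch below compares it with the current Tier-1 capacity given earlier in the talk (84 nodes, 32 logical cores per node, 10.5 HS06/core). It is only a rough estimate: it assumes, hypothetically, that new cores would score the same 10.5 HS06/core as the existing ones, which may not hold for the planned Ivy Bridge nodes, and the variable names are illustrative.

```python
import math

# Figures from the talk.
PLEDGE_2016_HS06 = 31_000          # 2016 ALICE pledge
CURRENT_HS06 = 84 * 32 * 10.5      # current Tier-1 capacity

# Assumption (not from the talk): new cores score the same as existing ones.
ASSUMED_HS06_PER_NEW_CORE = 10.5

# Capacity still missing to meet the 2016 pledge.
gap_hs06 = PLEDGE_2016_HS06 - CURRENT_HS06

# Smallest whole number of cores that covers the gap at the assumed score.
cores_needed = math.ceil(gap_hs06 / ASSUMED_HS06_PER_NEW_CORE)

print(f"gap: {gap_hs06:.0f} HS06, cores needed: {cores_needed}")
```

Under these assumptions the gap is 2,776 HS06, or about 265 cores, so the planned ~900 cores would cover it only if a sufficient share is allocated to ALICE.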
KISTI-CERN Network 10Gbps Upgrade
  As-Is: 2G KREONET2 + 2G SURFnet
  To-Be: Dedicated Circuit 10G + 10G SURFnet
  The contracted provider will allocate the dedicated 10G circuit.
  31 April
Thank you