GRID in JINR and participation in the WLCG project
Vladimir Korenkov, LIT JINR
Tier Structure of GRID Distributed Computing: Tier-0 / Tier-1 / Tier-2 / Tier-3
- Tier-0 (CERN): accepts data from the CMS Online Data Acquisition and Trigger System, archives RAW data, performs the first pass of reconstruction and Prompt Calibration, and distributes the data to the Tier-1 centres.
- Tier-1 (11 centres): receives data from Tier-0; data processing (re-reconstruction, skimming, calibration, etc.); distributes data and MC to the other Tier-1 and Tier-2 centres; secure storage and redistribution of data and MC.
- Tier-2 (>200 centres): simulation and user physics analysis.
Some history
- 1999: MONARC project, early discussions on how to organise distributed computing for LHC.
- EU DataGrid project: middleware and testbed for an operational grid.
- LHC Computing Grid (LCG): deploying the results of DataGrid to provide a production facility for the LHC experiments.
- EU EGEE project, phase 1: starts from the LCG grid; shared production infrastructure; expanding to other communities and sciences.
- EU EGEE-II: building on phase 1, expanding applications and communities.
- EU EGEE-III.
- EGI-InSPIRE.
Russian Data Intensive Grid infrastructure (RDIG)
The Russian consortium RDIG (Russian Data Intensive Grid) was set up in September 2003 as a national federation in the EGEE project. Now the RDIG infrastructure comprises 17 Resource Centres with more than … kSI2K of CPU and more than 4500 TB of disk storage.
RDIG Resource Centres:
- ITEP
- JINR-LCG2 (Dubna)
- RRC-KI
- RU-Moscow-KIAM
- RU-Phys-SPbSU
- RU-Protvino-IHEP
- RU-SPbSU
- Ru-Troitsk-INR
- ru-IMPB-LCG2
- ru-Moscow-FIAN
- ru-Moscow-MEPHI
- ru-PNPI-LCG2 (Gatchina)
- ru-Moscow-SINP
- Kharkov-KIPT (UA)
- BY-NCPHEP (Minsk)
- UA-KNU
- UA-BITP
JINR Tier2 Centre
- CPU: 2560 cores; total performance: … HEPSpec06.
- Disk storage capacity: 1800 TB.
- Availability and Reliability: 99%.
- More than 7.5 million tasks and more than 130 million units of normalised CPU time (HEPSpec06) were executed during July 2011 - July 2012.
[Charts: growth of the JINR Tier2 resources - performance (HEPSpec06) and disk storage capacity.]
Scheme of the CICC network connections
Support of VOs and experiments
As a grid site of the global grid infrastructure, CICC JINR now supports computations of 10 Virtual Organizations (alice, atlas, biomed, cms, dteam, fusion, hone, lhcb, rgstest and ops), and also provides the possibility of using its grid resources for the CBM and PANDA experiments.
JINR monitoring main page
JINR LAN infrastructure monitoring is one of the most important services. The Network Monitoring Information System (NMIS, http://litmon.jinr.ru) with a web-based graphical interface provides the operational monitoring processes and monitors the most important network elements: JINR LAN routers and switches, remote routers and switches, cooling units, DNS servers, dCache servers, Oracle servers, RAIDs, WLCG servers, worker nodes, etc. More than 350 network nodes are under round-the-clock monitoring.
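As a rough illustration only (this is not the NMIS code; the host names and ports below are placeholders), the round-the-clock availability part of such a system boils down to periodically probing each monitored node:

```python
# Rough illustration of a node-availability check over a list of monitored
# services (hypothetical host names; NMIS itself uses its own collectors).

import socket
from datetime import datetime, timezone

NODES = {
    "dns-server.example.jinr.ru": 53,      # hypothetical DNS server
    "dcache-head.example.jinr.ru": 2288,   # hypothetical dCache web port
    "wlcg-ce.example.jinr.ru": 443,        # hypothetical WLCG service node
}

def check_node(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def poll_once() -> None:
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    for host, port in NODES.items():
        status = "UP" if check_node(host, port) else "DOWN"
        print(f"{stamp} {host}:{port} {status}")

if __name__ == "__main__":
    poll_once()
```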
Normalized CPU time per country, LHC VOs (July 2011 - July 2012)
[Charts: all countries - 3,808,660,150, Russia - 95,289,002 (2.5%); second chart: all countries - …,108,813,533, Russia - 45,385,987.]
Normalized CPU time per site in Russia, LHC VOs (July 2011 - July 2012)
CMS Computing Support at JINR
CMS T2 centre at JINR: CPU - 1240 kSI2k; slots - 495 jobs; disk storage - 376 TB.
Statistics for the last 12 months (08/2011 - 08/2012), all CMS sites vs CMS sites in Russia:
- contribution of the RDMS CMS sites to all CMS sites - 4% (16,668,792 kSI2K);
- contribution of JINR to the CMS sites in Russia - 52%.
JINR Monitoring and Analysis Remote Centre
- Monitoring of detector systems
- Data monitoring / express analysis
- Shift operations (except for run control)
- Communication of the JINR shifter with personnel at the CMS Control Room (SX5) and the CMS Meyrin centre
- Communication between JINR experts and CMS shifters
- Coordination of data processing and data management
- Training and information
Ilya Gorbunov and Sergei Shmatov, "RDMS CMS Computing", AIS2012, Dubna, May 18, 2012
ATLAS Computing Support at JINR
ATLAS at JINR (statistics for the last 12 months): CPU time (HEPSpec06) - 51,948,968 (52% of the ATLAS sites in Russia); jobs - 4,885,878 (47%).
System of remote access in real time (SRART) for monitoring and quality assessment of ATLAS data at JINR
One of the most significant results of the ATLAS TDAQ team at LIT during the last few years was the participation in the development of the ATLAS TDAQ project at CERN. The system of remote access in real time (SRART) for monitoring and quality assessment of ATLAS data at JINR was put into operation and is currently being debugged on real data of the ATLAS experiment.
ALICE Computing Support at JINR
Over the last 6 months, seven RDIG sites (IHEP, ITEP, JINR, PNPI, RRC-KI, SPbSU and Troitsk) contributed 4.19% of the whole ALICE CPU and 9.18% of the total number of DONE jobs.
[Table, July 2012: job statistics for the RDIG group - delivered CPU and Wall time, CPU/Wall efficiency, assigned and completed jobs, efficiency.]
SE usage
More than 1 PB of disk space is allocated today by the seven RDIG sites for ALICE; about 744 TB of it is currently used. The ALICE management asks for an increase of disk capacity, since the rate of new data taking and production grows much faster than the rate of disk cleaning.
[Chart: example of the CPU usage share between the seven RDIG sites.]
Aggregate SE traffic
Over the last 6 months the summary incoming SE traffic was 943 TB and the outgoing traffic was 9.6 PB.
Worldwide LHC Computing Grid Project (WLCG)
The protocol between CERN, Russia and JINR on participation in the LCG Project was approved; the MoU on the Worldwide LHC Computing Grid (WLCG) was signed by Russia and JINR in October 2007.
The tasks of Russia & JINR in the WLCG (2012):
- Task 1. MW (gLite) testing (supervisor O. Keeble)
- Task 2. LCG vs Experiments (supervisor I. Bird)
- Task 3. LCG monitoring (supervisor J. Andreeva)
- Task 4. Tier3 monitoring (supervisors J. Andreeva, A. Klimentov)
- Task 5/6. Genser / MCDB (supervisor W. Pokorski)
- Task 7. Tier1 (supervisor I. Bird)
The tasks of the JINR in the WLCG:
- WLCG infrastructure support and development at JINR;
- participation in WLCG middleware testing/evaluation;
- grid monitoring and accounting tools development;
- development of software for HEP applications;
- MCDB development;
- support of users, Virtual Organizations (VO) and applications;
- user & administrator training and education;
- support of the JINR Member States in the WLCG activities.
RDIG monitoring & accounting
Monitoring allows one to keep an eye on the parameters of Grid sites' operation in real time; accounting shows the resource utilization on Grid sites by virtual organizations and single users.
Monitored values:
- CPUs: total / working / down / free / busy
- Jobs: running / waiting
- Storage space: used / available
- Network: available bandwidth
Accounting values:
- Number of submitted jobs
- Used CPU time: total sum in seconds; normalized (by WN productivity); average time per job
- Waiting time: total sum in seconds; average ratio of waiting to used CPU time per job
- Physical memory: average per job
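A minimal sketch of the accounting arithmetic behind the values listed above (field names and the normalisation reference are assumptions; this is not the RDIG accounting implementation): raw CPU time is scaled by the worker-node benchmark so that time from slow and fast WNs becomes comparable.

```python
# Hypothetical sketch of CPU-time normalization in grid accounting: raw CPU
# seconds are scaled by the per-core HEPSpec06 rating of the worker node.

from dataclasses import dataclass

@dataclass
class JobRecord:
    vo: str                 # virtual organization that submitted the job
    cpu_seconds: float      # raw CPU time consumed
    wait_seconds: float     # time spent waiting in the queue
    wn_hepspec06: float     # HEPSpec06 rating of one core of the worker node

def normalized_cpu_hours(job: JobRecord, reference_hepspec06: float = 1.0) -> float:
    """Scale raw CPU time by the WN productivity relative to a reference unit."""
    return job.cpu_seconds / 3600.0 * job.wn_hepspec06 / reference_hepspec06

def account(jobs: list[JobRecord]) -> dict[str, dict[str, float]]:
    """Aggregate per-VO accounting values similar to those listed above."""
    summary: dict[str, dict[str, float]] = {}
    for job in jobs:
        vo = summary.setdefault(job.vo, {"jobs": 0, "cpu_s": 0.0, "norm_cpu_h": 0.0, "wait_s": 0.0})
        vo["jobs"] += 1
        vo["cpu_s"] += job.cpu_seconds
        vo["norm_cpu_h"] += normalized_cpu_hours(job)
        vo["wait_s"] += job.wait_seconds
    for vo in summary.values():
        vo["avg_cpu_s_per_job"] = vo["cpu_s"] / vo["jobs"]
        vo["avg_wait_to_cpu_ratio"] = vo["wait_s"] / vo["cpu_s"] if vo["cpu_s"] else 0.0
    return summary

if __name__ == "__main__":
    jobs = [JobRecord("cms", 7200, 600, 10.0), JobRecord("cms", 3600, 120, 8.0)]
    print(account(jobs)["cms"])
```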
Architecture
FTS (File Transfer Service) monitoring for the Worldwide LHC Computing Grid (WLCG) project
A monitoring system has been developed which provides a convenient and reliable tool for obtaining detailed information about the current state of FTS and for the analysis of errors on data transfer channels, for maintaining FTS functionality and for optimizing the technical support process. The system can significantly improve FTS reliability and performance.
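For illustration only (the actual FTS monitoring system has its own schema and services; the record fields below are hypothetical), the core of the error analysis can be thought of as grouping failed transfers by channel and error reason:

```python
# Illustrative sketch (not the actual FTS monitoring code): group failed file
# transfers by channel and error message to spot problematic channels quickly.

from collections import Counter, defaultdict

def summarize_errors(transfers: list[dict]) -> dict[str, Counter]:
    """transfers: records with hypothetical keys 'source', 'dest', 'state', 'reason'."""
    per_channel: dict[str, Counter] = defaultdict(Counter)
    for t in transfers:
        if t.get("state") != "Failed":
            continue
        channel = f"{t['source']} -> {t['dest']}"
        per_channel[channel][t.get("reason", "unknown error")] += 1
    return per_channel

if __name__ == "__main__":
    sample = [
        {"source": "JINR-LCG2", "dest": "CERN-PROD", "state": "Failed",
         "reason": "SRM_ABORTED"},
        {"source": "JINR-LCG2", "dest": "CERN-PROD", "state": "Finished"},
    ]
    for channel, errors in summarize_errors(sample).items():
        print(channel, dict(errors))
```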
The Worldwide LHC Computing Grid (WLCG)
ATLAS DQ2 Deletion Service
During 2010, work on the development of the deletion service for the ATLAS Distributed Data Management (DDM) system was performed. The work started in the middle of April 2010; at the end of August a new version of the Deletion Service was tested for a set of sites, and from November 2010 for all sites managed by DQ2.
The development comprised the building of new interfaces between the parts of the deletion service (based on web service technology), creating a new database schema, rebuilding the deletion service core part, development of extended interfaces with mass storage systems, and extension of the deletion monitoring system. The deletion service is maintained by JINR specialists.
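A hypothetical, heavily simplified sketch of such a deletion worker (not the DQ2 implementation): queued deletion requests are processed against a storage backend, and per-file errors are collected for the deletion monitoring.

```python
# Hypothetical, simplified deletion-service worker: illustrates only the
# queue -> storage -> monitoring flow described above, not the ATLAS DQ2 code.

from dataclasses import dataclass, field
from typing import Callable

@dataclass
class DeletionRequest:
    dataset: str
    replicas: list[str]                 # storage URLs of files to delete
    status: str = "queued"              # queued / done / failed
    errors: list[str] = field(default_factory=list)

def process(request: DeletionRequest, delete_replica: Callable[[str], None]) -> DeletionRequest:
    """Try to delete every replica; collect per-file errors for the monitoring view."""
    for url in request.replicas:
        try:
            delete_replica(url)         # backend-specific call (SRM, xrootd, ...)
        except Exception as exc:        # keep the worker alive on per-file failures
            request.errors.append(f"{url}: {exc}")
    request.status = "failed" if request.errors else "done"
    return request

if __name__ == "__main__":
    def fake_backend(url: str) -> None:
        if url.endswith("bad"):
            raise RuntimeError("permission denied")

    req = DeletionRequest("mc10.sample.DAOD", ["srm://se.example/file1", "srm://se.example/bad"])
    print(process(req, fake_backend).status, req.errors)
```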
Tier 3 sites monitoring project
Traditional LHC distributed computing: Tier-0 (CERN) → Tier-1 → Tier-2, with Tier-3 sites in addition; this needs a global view of the LHC computing activities.
[Diagram: Tier-0 (CERN Analysis Facility, RAW) → Tier-1 (12 sites worldwide, RAW/AOD/ESD) → Tier-2 (120+ sites worldwide, AOD) → Tier-3 (DPD; interactive analysis: plots, fits, toy MC, studies, ...).]
LIT participates in the development of a software suite for Tier-3 sites monitoring. A virtual testbed has been created at JINR which allows simulation of various Tier3 clusters and data storage solutions.
[Testbed layout: development host; PBS (Torque): headnode + WN, 2 WNs; XRootD: manager (redirector), 2 servers; PROOF: headnode, 2 WNs; Condor: headnode + WN, 2 WNs; Oracle Grid Engine: headnode, 2 WNs; XRootD: manager (redirector), 2 servers; Lustre: MDS, OSS, client; Ganglia web frontend.]
Tier 3 sites monitoring project (2011)
Tier-3 sites consist of resources mostly dedicated to data analysis by geographically close or local scientific groups; a set of Tier-3 sites can be joined into a federation. Many institutes and national communities have built (or plan to build) Tier-3 facilities. Tier-3 sites comprise a range of architectures, and many do not possess Grid middleware, which would render the application of Grid monitoring systems useless.
This is a joint effort of ATLAS, BNL (USA), JINR and CERN IT (ES group).
Objectives for Tier3 monitoring:
- Monitoring of a Tier 3 site:
  - detailed monitoring of the local fabric (overall cluster or clusters monitoring, monitoring of each individual node in the cluster, network utilization);
  - monitoring of the batch system;
  - monitoring of the mass storage system (total and available space, number of connections, I/O performance);
  - monitoring of the VO computing activities at the site.
- Monitoring of a Tier 3 sites federation:
  - monitoring of the VO usage of the Tier3 resources in terms of data transfer and job processing, and of the quality of the provided service, based on the job processing and data transfer monitoring metrics.
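As a sketch of what the site-level fabric monitoring can build on (assuming a standard Ganglia gmond daemon, which dumps its cluster report as XML to any client connecting to TCP port 8649; the host name is a placeholder):

```python
# Minimal sketch: read the cluster XML report from a Ganglia gmond daemon and
# print a couple of per-host metrics. The head-node name below is a placeholder.

import socket
import xml.etree.ElementTree as ET

def read_gmond_xml(host: str, port: int = 8649, timeout: float = 5.0) -> str:
    """gmond dumps its full XML report and closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode()

def report(xml_text: str) -> None:
    root = ET.fromstring(xml_text)
    for host in root.iter("HOST"):
        metrics = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
        print(host.get("NAME"), "load_one =", metrics.get("load_one"),
              "disk_free =", metrics.get("disk_free"))

if __name__ == "__main__":
    report(read_gmond_xml("t3-headnode.example.org"))
```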
xRootD monitoring architecture
The implementation of the XROOTD monitoring foresees 3 levels of hierarchy: site, federation, global. Monitoring on the site level is implemented in the framework of the Tier3 monitoring project and is currently under validation by the ATLAS pilot sites.
The site collector reads the xrootd summary and detailed streams, processes them in order to reformat them into event-like information, and publishes the reformatted information to MSG. At the federation level, the data consumed from MSG are processed using Hadoop/MapReduce to correlate events related to the same transfer and to generate federation-level statistics, which become available via the UI and APIs. The UI is similar to the UI of the global WLCG transfer Dashboard, therefore most of the UI code can be reused. Finally, messages similar to the FTS transfer messages, describing every particular transfer, are published from the federation to MSG to be further consumed by the global WLCG transfer Dashboard.
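A toy sketch of the federation-level correlation step (field names are hypothetical; the production system consumes messages from MSG and runs the aggregation with Hadoop/MapReduce): events that share a transfer key are merged into one transfer record with aggregate statistics.

```python
# Toy map/reduce-style correlation of transfer events: events sharing a
# transfer key are merged into a single record. Field names are hypothetical.

from collections import defaultdict

def map_phase(events: list[dict]):
    """Emit (key, event) pairs; the key identifies one logical transfer."""
    for e in events:
        yield (e["file"], e["client"], e["server"]), e

def reduce_phase(pairs) -> list[dict]:
    grouped = defaultdict(list)
    for key, event in pairs:
        grouped[key].append(event)
    transfers = []
    for (file_, client, server), evs in grouped.items():
        transfers.append({
            "file": file_, "client": client, "server": server,
            "bytes": sum(e.get("bytes", 0) for e in evs),
            "start": min(e["time"] for e in evs),
            "end": max(e["time"] for e in evs),
        })
    return transfers

if __name__ == "__main__":
    sample = [
        {"file": "/atlas/data.root", "client": "wn01", "server": "xrd1", "time": 10, "bytes": 0},
        {"file": "/atlas/data.root", "client": "wn01", "server": "xrd1", "time": 95, "bytes": 2_000_000},
    ]
    print(reduce_phase(map_phase(sample)))
```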
xRootD monitoring: the schema of information flow
Data transfer global monitoring system
Grid training and education (T-infrastructure)
Training courses on grid technologies:
- for users (basic ideas and skills);
- for system administrators (grid site deployment);
- for developers of grid applications and services.
Evaluation of different implementations of grid middleware and testing of particular grid services. Grid services development and porting of existing applications into the grid environment.
T-infrastructure implementation
All services are deployed on VMs (OpenVZ). Main parts:
- three grid sites on gLite middleware;
- a GT5 testbed;
- a desktop grid testbed based on BOINC;
- a testbed for WLCG activities.
Running since 2006.
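As an illustration only (the container ID, OS template, host name and IP below are placeholders, not the actual testbed configuration), adding one more virtualized service node to such an OpenVZ-based setup can be scripted around standard vzctl calls:

```python
# Illustrative helper: create and start an OpenVZ container for one more
# testbed service node. All values below are placeholders.

import subprocess

def run(*args: str) -> None:
    """Run a command and fail loudly if it returns a non-zero exit code."""
    subprocess.run(list(args), check=True)

def create_container(ctid: str, ostemplate: str, hostname: str, ip: str) -> None:
    run("vzctl", "create", ctid, "--ostemplate", ostemplate)
    run("vzctl", "set", ctid, "--hostname", hostname, "--save")
    run("vzctl", "set", ctid, "--ipadd", ip, "--save")
    run("vzctl", "start", ctid)

if __name__ == "__main__":
    # Placeholder values for a hypothetical extra worker node of a gLite site.
    create_container("201", "scientific-linux-5-x86_64", "wn201.example.org", "10.0.0.201")
```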
JINR grid infrastructure for training and education: the first step towards construction of the JINR Member States grid infrastructure
It consists of three grid sites at JINR and one at each of the following institutes:
- Institute of High Energy Physics, IHEP (Protvino);
- Institute of Mathematics and Information Technologies of the Academy of Sciences of the Republic of Uzbekistan, IMIT (Tashkent, Uzbekistan);
- Sofia University "St. Kliment Ohridski", SU (Sofia, Bulgaria);
- Bogolyubov Institute for Theoretical Physics, BITP (Kiev, Ukraine);
- National Technical University of Ukraine "Kyiv Polytechnic Institute", KPI (Kiev, Ukraine).
Letters of Intent with Moldova ("MD-GRID"), Mongolia ("Mongol-Grid") and Kazakhstan; a project with Cairo University.
Participation in the GridNNN project
Grid support for the Russian national nanotechnology network: to provide science and industry with effective access to distributed computational, informational and networking facilities. A breakthrough in nanotechnologies is expected; the project is supported by a special federal program.
Main points:
- based on a network of supercomputers (about 15-30);
- has two grid operations centres (main and backup);
- is a set of grid services with a unified interface;
- partially based on Globus Toolkit 4.
GridNNN infrastructure
10 resource centres at the moment in different regions of Russia: RRC KI, «Chebyshev» (MSU), IPCP RAS, CC FEB RAS, ICMM RAS, JINR, SINP MSU, PNPI, KNC RAS, SPbSU.
Russian Grid Network
Goal: to provide a computational base for hi-tech industry and science, using the network of supercomputers and the original software created within the recently finished GridNNN project.
Some statistics: 19 resource centres, 10 virtual organizations, 70 users, more than 500,000 tasks processed.
Project: Model of a shared distributed system for acquisition, transfer and processing of very large-scale data volumes, based on Grid technologies, for the NICA accelerator complex
Terms: …; cost: federal budget - 10 million rubles, extrabudgetary sources - 25% of the total cost.
Leading executor: LIT JINR; co-executor: VBLHEP JINR.
[Figure: MPD data processing model (from "The MultiPurpose Detector - MPD Conceptual Design Report v. 1.4").]
WLCG Tier1 in Russia
- Proposal to create an LCG Tier-1 centre in Russia: an official letter by A. Fursenko, Minister of Science and Education of Russia, was sent to CERN DG R. Heuer in March 2011.
- The corresponding points were included in the agenda of the Russia - CERN 5x5 meeting in 2011.
- The centre will serve all four experiments: ALICE, ATLAS, CMS and LHCb.
- About 10% of the total existing Tier1 resources (without CERN), increasing by 30% each year.
- Draft planning (proposal under discussion): a prototype at the end of 2012 and full resources in 2014, to be ready for the start of the next LHC session.
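A small worked example of the planning rule quoted above (only the "increase by 30% each year" rule comes from the slide; the baseline figure is a placeholder):

```python
# Worked example of the planning rule: start from a baseline capacity
# (placeholder number) and increase it by 30% every year (compound growth).

def capacity(baseline: float, years: int, growth: float = 0.30) -> float:
    """Capacity after `years` annual increases of `growth`."""
    return baseline * (1.0 + growth) ** years

if __name__ == "__main__":
    baseline_hepspec06 = 28_800  # placeholder prototype figure, not from the slide
    for year in range(4):
        print(2012 + year, round(capacity(baseline_hepspec06, year)))
```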
JINR Resources

JINR-LCG2 Tier2              April 2012    Dec. 2012    Dec. 2013
CPU (HEP-SPEC06)                 …             …            …
Disk (TB): CMS                   …             …            …
Disk (TB): ATLAS                 …             …            …
Disk (TB): ALICE                 …             …            …
Tape (TB)                        …             …            …

JINR-CMS Tier1               Dec. 2012 (prototype)    Sep. 2014 (start)
CPU (HEP-SPEC06)                 …                        …
Disk (TB): CMS                   …                        …
Tape (TB)                        …                        …
Frames for Grid cooperation of JINR:
- Worldwide LHC Computing Grid (WLCG);
- EGI-InSPIRE;
- RDIG development;
- CERN-RFBR project "Global data transfer monitoring system for WLCG infrastructure";
- NASU-RFBR project "Development and support of LIT JINR and NSC KIPT grid-infrastructures for distributed CMS data processing of the LHC operation";
- BMBF grant "Development of the Grid-infrastructure and tools to provide joint investigations performed with participation of JINR and German research centers";
- project "Development of Grid segment for the LHC experiments", supported in the frame of the JINR - South Africa cooperation agreement;
- development of a Grid segment at Cairo University and its integration into the JINR GridEdu infrastructure;
- JINR - FZU AS Czech Republic project "The GRID for the physics experiments";
- JINR-Romania cooperation within the Hulubei-Meshcheryakov programme;
- JINR-Moldova cooperation (MD-GRID, RENAM);
- JINR-Mongolia cooperation (Mongol-Grid);
- JINR-Slovakia cooperation;
- JINR-Kazakhstan cooperation (ENU, Gumilyov);
- project "Russian Grid Network".
Web portal "GRID AT JINR" ("ГРИД В ОИЯИ")
A new informational resource has been created at JINR: the web portal "GRID AT JINR". The content includes detailed information on the JINR grid site and on JINR's participation in grid projects:
- GRID CONCEPT: grid technologies, grid projects, the RDIG consortium;
- JINR GRID SITE: infrastructure and services, scheme, statistics, support of VOs and experiments (ATLAS, CMS, CBM and PANDA, HONE), how to become a user;
- JINR IN GRID PROJECTS: WLCG, GridNNN, EGEE, RFBR projects, INTAS projects, SKIF-GRID;
- GRID MIDDLEWARE TESTING;
- JINR MEMBER STATES;
- MONITORING AND ACCOUNTING: RDIG monitoring, dCache monitoring, Dashboard, FTS monitoring, H1 MC monitoring;
- GRID CONFERENCES: GRID, NEC;
- TRAINING: training grid infrastructure, courses and lectures, training materials;
- DOCUMENTATION: articles, training materials;
- NEWS;
- CONTACTS.