Status of the BINP SB RAS Grid Cluster.


Status of the BINP SB RAS Grid Cluster. Use of External Computing Resources. A. Sukharev, workshop «Physics & Computing in ATLAS», MEPhI, 27 January 2011.

BINP LCG Farm
- CPU: 40 cores (100 kSI2k)
- RAM: 200 GB
- HDD: 25 TB raw (22 TB visible) on disk arrays, plus 3 TB on PVFS2 (distributed)
- Platform: Scientific Linux 5 on x86_64
- Virtualization: Xen, KVM

2010 Plans
- Pass all the SAM tests
- Confirm stable operation for 1-2 weeks
- Scale the number of WNs up to the production level (up to 32 CPU cores = 80 kSI2k max)
- Ask the ATLAS VO admins to install the experiment software on the site
- Test the site's ability to run ATLAS production jobs
- Check whether the 110 Mbps SB RAS channel can carry the load of an 80 kSI2k site (see the rough estimate below)
- Enter production with the ATLAS VO
Suspended until 2011Q1: the network infrastructure problems have to be solved first.
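The channel-capacity item can be sanity-checked with a back-of-envelope estimate. The sketch below is only illustrative: the per-job input/output sizes and the job duration are assumptions for the sake of the arithmetic, not measured ATLAS figures.

```python
# Back-of-envelope check: can a 110 Mbps channel feed an 80 kSI2k (32-core) site?
# All per-job figures below are assumptions for illustration, not measurements.

CHANNEL_MBPS = 110        # SB RAS channel capacity
CORES = 32                # assume one single-core job running per core
INPUT_GB_PER_JOB = 2.0    # assumed input data per job
OUTPUT_GB_PER_JOB = 0.5   # assumed output data per job
JOB_HOURS = 12.0          # assumed average job duration

transferred_gb = CORES * (INPUT_GB_PER_JOB + OUTPUT_GB_PER_JOB)
avg_mbps = transferred_gb * 8 * 1000 / (JOB_HOURS * 3600)

print(f"Average WAN load: {avg_mbps:.1f} Mbps of {CHANNEL_MBPS} Mbps available")
```

Under these assumptions the average load is only about 15 Mbps, well below the channel capacity; the open question is bursty stage-in/stage-out traffic, which is why an actual test is planned.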

Future Prospects (2011-...)
- Major upgrade of the BINP/GCF hardware, focusing on storage system capacity and performance
  - Up to 550 TB of online (HDD) storage (and 680 CPU cores)
  - Switched SAN fabric
- Solving the NSK-MSK connectivity problem for the LCG site
  - A dedicated VPN to MSK-IX seems to be the best solution
- Start acquiring the next-generation hardware this year
  - 8x increase in CPU core density (up to 16 cores/1U)
  - DDR InfiniBand (20 Gbps) network added to the farm
  - 8 Gbps FC-based SAN
  - 2x increase in storage density (up to 10 TB/1U)

Available Computing Resources
- Siberian Supercomputer Center (SSCC) at the Institute of Computational Mathematics & Mathematical Geophysics (ICM&MG): 17 TFlops combined performance achieved after the recent upgrade (2010Q3)
- Novosibirsk State University (NSU) Supercomputer Center (NUSC): 1280 physical CPU cores, 13.4 TFlops (since 2010Q2); 10 GbE primary link plus two auxiliary 1 Gbps links deployed
- SKIF-Cyberia at Tomsk State University: 9 TFlops, to be upgraded to more than 30 TFlops

External Connectivity Schema for the Research Groups of BINP

Novosibirsk State University Supercomputer http://nusc.ru http://nsu.ru

BINP-NSU Networking Schema

The Task
Goal: run common HEP software on a supercomputer
- Utilization of the InfiniBand interconnect
- Batch system integration
Method: virtualization
- Which platform? (VMware, Xen, KVM)
Proof of concept: processing and simulation of real data from the KEDR experiment
- Fixed platform & environment (SLC3 on i686)
- Already working on BINP/GCF using Xen

Virtualization Platform
- VMware: poor usability (in my opinion); requires local storage
- Xen: already tested on BINP/GCF (uptime > 2 years); requires a custom kernel; InfiniBand usage causes problems
- KVM: no support available until September 2010
Converting a VM's disk image from one virtualization platform to another is not a problem (see the sketch below).
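A minimal sketch of such a conversion, using the standard qemu-img tool; the file names and formats here are hypothetical placeholders, not the actual KEDR images.

```python
# Convert a VM disk image between virtualization platforms with qemu-img.
# File names and formats are placeholders for illustration.
import subprocess

def convert_image(src, dst, src_fmt="vmdk", dst_fmt="qcow2"):
    """Convert a disk image, e.g. a VMware vmdk into a qcow2 usable by KVM/Xen."""
    subprocess.run(
        ["qemu-img", "convert", "-f", src_fmt, "-O", dst_fmt, src, dst],
        check=True,
    )

if __name__ == "__main__":
    convert_image("kedr-wn.vmdk", "kedr-wn.qcow2")
```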

Xen Test Run (September 2010)
- Several 8-core physical nodes were used exclusively; a Xen-specific Linux kernel was installed
- No batch system integration
- No InfiniBand utilization
- VM disk images located on the BINP/GCF disk array (an illustrative guest definition is sketched below)
- One month of stable operation achieved with up to 120 KEDR VMs behaving exactly like ordinary physical KEDR computers
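For orientation, a minimal sketch of what one such Xen guest definition might look like (a classic xm-style domU config, which itself uses Python syntax); the kernel path, image location, bridge name and sizes are illustrative assumptions, not the actual BINP settings.

```python
# Illustrative Xen domU config for one KEDR worker VM (xm toolstack era).
# All paths, names and sizes are placeholders.
name    = "kedr-vm001"
memory  = 2048
vcpus   = 1
kernel  = "/boot/vmlinuz-2.6.18-xen"              # assumed dom0-provided kernel
ramdisk = "/boot/initrd-2.6.18-xen.img"
disk    = ["file:/gcf/images/kedr-slc3-001.img,xvda,w"]  # image on the GCF disk array
vif     = ["bridge=xenbr0"]
root    = "/dev/xvda1 ro"
```

A guest defined this way would be started with `xm create kedr-vm001.cfg`; repeating that for each prepared image gives a farm of identical SLC3 worker VMs.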

KVM Usage
- No exclusive physical nodes, common kernel, no InfiniBand problems
- Batch system integration, i.e. a VM starts as an ordinary batch system job (see the sketch below)
- All VM network traffic (including access to the disk images) is routed via InfiniBand
- Technically, BINP can make use of all 1280 physical CPU cores of NUSC
Recent successful runs:
- 509 single-core KEDR VMs
- 512 double-core KEDR VMs (with Hyper-Threading enabled on the physical nodes)
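The "VM as an ordinary batch job" pattern can be sketched as follows. The slides do not name the batch system or the exact launch options used at NUSC, so the image path, memory size, binary name and networking mode below are assumptions for illustration only.

```python
# Sketch: a batch job whose payload is a KVM virtual machine.
# The batch system executes this script on a worker node; when the guest
# powers off, the script returns and the job slot is freed.
# Paths, sizes, binary name and network options are illustrative assumptions.
import subprocess

IMAGE = "/shared/images/kedr-slc3-001.qcow2"  # per-VM image on shared storage
MEMORY_MB = 2048
VCPUS = 1

def run_vm_as_job():
    cmd = [
        "qemu-kvm",              # binary name varies by distribution (kvm, qemu-system-x86_64)
        "-m", str(MEMORY_MB),
        "-smp", str(VCPUS),
        "-drive", f"file={IMAGE},if=virtio",
        "-net", "nic,model=virtio",
        "-net", "user",          # placeholder; the real setup routes traffic via InfiniBand
        "-nographic",
    ]
    # Blocks until the guest shuts down, i.e. until the VM "job" is done.
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run_vm_as_job()
```

In the setup the slides describe, disk-image access and all other VM traffic go over InfiniBand rather than user-mode networking, and each VM would use its own image (or a copy-on-write overlay of a shared base); the sketch only illustrates the batch-job pattern itself.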

Results
- The VM-based solution works well for a real-life particle detector use case:
  - Long-term VM stability achieved: >1 month at NUSC, >1 year at the BINP Grid Farm (lower limits)
  - Most of the underlying implementation details are hidden from the users
  - No changes were needed to the detector offline reconstruction/simulation software or its execution environment
- The solution is not yet completely automated (the KEDR & NUSC batch system integration is not done yet)
- Technically, BINP can make use of all 1280 physical CPU cores (2048 HT cores = ~20 kHEP-SPEC06) at NUSC for short periods (1-2 weeks), and of 256 physical CPU cores (~2.5 kHEP-SPEC06) on a regular basis (expected typical gLite WN uptime: 1 month)

Next Steps
The solution obtained for KEDR applies to all particle physics experiments currently running at BINP, as well as to LCG.
- Solve the network infrastructure problems
- Prepare VMs for gLite WNs to be deployed at NUSC
- Enter production with the ATLAS VO using NUSC computing resources
- Expand to other Siberian supercomputers (SSCC)

Thank you