Pete Gronbech, Kashif Mohammad and Vipul Davda


Oxford Site Report
HEPSYSMAN 2019

Grid Hardware
Compute Nodes:
- 138 servers, ~3460 logical CPUs (66 Supermicro, 36 Dell, 36 Lenovo/IBM)
- CentOS 7
ARC-CE/HTCondor Manager Node
DPM Storage:
- 33 SE pool nodes (SL6)
- 1 head node (CentOS 7)

Building Servers
Fully automated (close, but not quite):
- Cobbler installs the base OS
- Puppet (version 4) handles configuration management, ~60 modules
- In-house modules are kept on a local GitLab server for source management
Building a server:
- Assign an IP address to a hostname
- Add the MAC address, IP address and hostname to the DHCP server
- Add the host to Cobbler: name, Puppet profile, MAC address and IP address
- Boot
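The registration steps above can be sketched as small command and config builders. This is a hypothetical illustration, not the site's actual tooling: the hostname, profile name and addresses are made up, and in practice the resulting command would be run against the Cobbler server and the stanza appended to the dhcpd configuration.

```python
def cobbler_system_add(name, profile, mac, ip):
    """Build the 'cobbler system add' command that registers a new host
    with its name, Puppet-matched profile, MAC address and IP address."""
    return [
        "cobbler", "system", "add",
        "--name", name,
        "--profile", profile,
        "--mac", mac,
        "--ip-address", ip,
    ]

def dhcp_host_entry(name, mac, ip):
    """ISC dhcpd host stanza pairing the MAC address with a fixed IP."""
    return (f"host {name} {{\n"
            f"  hardware ethernet {mac};\n"
            f"  fixed-address {ip};\n"
            f"}}\n")

if __name__ == "__main__":
    # Example registration for a hypothetical compute node.
    print(" ".join(cobbler_system_add("t2wn001", "centos7-grid",
                                      "aa:bb:cc:dd:ee:ff", "10.1.2.3")))
    print(dhcp_host_entry("t2wn001", "aa:bb:cc:dd:ee:ff", "10.1.2.3"))
```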

Grid Batch System
- HTCondor with ARC CE
- Migrated all compute nodes from SL6 to CentOS 7
- Retired the SL6 ARC CE and 11 old compute nodes on 3rd May 2019
- t2arc00: production CE on CentOS 7, attached to ~3460 logical CPUs

Storage
DPM storage, ~800 TBytes:
- Head node running DPM version 1.12.0 on CentOS 7
- SE pool nodes running DPM 1.12.0 on SL6
- There is still an issue with hanging semaphores; a weekly cron job removes them
- DOME upgrade is in progress
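The weekly semaphore clean-up could be sketched like this: parse the output of `ipcs -s`, pick out the semaphore IDs belonging to a given owner, and pass each to `ipcrm`. The `dpmmgr` owner name is an assumption about which account holds the stale semaphores; the actual site cron job may differ.

```python
import subprocess

def stale_semaphores(ipcs_output, owner):
    """Return the semaphore IDs from 'ipcs -s' output owned by `owner`."""
    semids = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Data rows look like: key semid owner perms nsems
        if len(fields) >= 5 and fields[0].startswith("0x"):
            if fields[2] == owner:
                semids.append(int(fields[1]))
    return semids

def remove_semaphores(owner="dpmmgr"):
    """Run 'ipcs -s' and remove each matching semaphore with 'ipcrm'."""
    out = subprocess.run(["ipcs", "-s"],
                         capture_output=True, text=True).stdout
    for semid in stale_semaphores(out, owner):
        subprocess.run(["ipcrm", "-s", str(semid)])
```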

CernVM File System Issue
- A CVMFS mount can go stale: fewer jobs arrive, and the file system needs to be reloaded manually
- Auto-check script: a Python script running daily under cron checks the CVMFS file system and reloads it if errors are detected
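A minimal sketch of what such a daily check might look like, assuming `cvmfs_config probe` prints one "Probing /cvmfs/&lt;repo&gt;... OK" line per repository (the exact output format and repository names here are assumptions, and the real site script may work differently):

```python
import subprocess

def failed_repos(probe_output):
    """Return repository names whose probe line does not end in OK.
    Assumes lines of the form 'Probing /cvmfs/<repo>... OK'."""
    bad = []
    for line in probe_output.splitlines():
        if line.startswith("Probing") and not line.rstrip().endswith("OK"):
            # Extract the repository name from the mount path.
            path = line.split()[1].rstrip(".")
            bad.append(path.split("/")[-1])
    return bad

def check_and_reload():
    """Daily cron entry point: probe all repos, reload any that fail."""
    out = subprocess.run(["cvmfs_config", "probe"],
                         capture_output=True, text=True).stdout
    for repo in failed_repos(out):
        subprocess.run(["cvmfs_config", "reload", repo])
```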

Monitoring
- In-house Python script using the HTCondor Python bindings
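A sketch of the kind of summary such a script might produce: count jobs per state from the ClassAds the bindings return. With the real bindings the input would come from something like `htcondor.Schedd().query(projection=["JobStatus"])`; the sample data here is made up.

```python
from collections import Counter

# HTCondor JobStatus codes, as defined in the job ClassAd attributes.
JOB_STATUS = {1: "Idle", 2: "Running", 3: "Removed",
              4: "Completed", 5: "Held", 6: "Transferring", 7: "Suspended"}

def summarise(ads):
    """Count jobs per state from a list of job ClassAds (dicts here)."""
    counts = Counter(JOB_STATUS.get(ad.get("JobStatus"), "Unknown")
                     for ad in ads)
    return dict(counts)
```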

PP Local Hardware
Compute Nodes:
- SL6: 37 physical servers (29 Supermicro, 8 Dell)
- CentOS 7: 7 physical servers, ~192 logical CPUs (2 Supermicro, 5 Dell)
HTCondor Manager Node:
- CentOS 7 server
Storage:
- Gluster: 13 physical servers, CentOS 7, ~930 TBytes
- NFS: 13 physical servers, SL6/CentOS 7, ~450 TBytes
Interactive Servers:
- Two SL6 interactive servers
- One CentOS 7 interactive server

PP Local Batch Systems
- Torque and Maui on SL6; now there are few active users, and the plan is to retire it and migrate its compute nodes to CentOS 7
- HTCondor on CentOS 7, fully configured by Puppet

Gluster FS
- Red Hat supported product; Red Hat developers are active on the mailing list
- ATLAS has ~630 TBytes of storage and more than 100 million files
- LHCb has ~300 TBytes of storage
Issues:
- Large numbers of small files slow down operations
- Error logging is not helpful for diagnosis

Useful tool: agedu
- Since there are millions of files on the Gluster FS, an agedu scan takes weeks to complete

Atomic and Laser
- Storage/interactive server: NFS, ~55 TBytes of storage
- Two worker nodes, 32 cores
- Slurm batch system

Desktops/Laptops
Supported OSes:
- Windows desktops/laptops and Mac laptops: supported by others in the group
- Ubuntu desktops/laptops

Linux Desktop Delivery System
Linux desktop flavours:
- SL6 (2)
- CentOS 7 (<10)
- Ubuntu 18.04 (~330)
Desktop environment: GNOME
Building a desktop is fully automated:
- Step 1: Register the MAC address and hostname in the asset management system
- Step 2: Add the hostname to Cobbler
- Step 3: Boot

Linux Desktop Delivery System
Step 1: Asset Management System
- Register the MAC address and hostname
- Network subnet
- Operating system, e.g. Linux: Ubuntu (CPLXConfig3, UEFI)
- Location, owner, serial number, warranty, etc.
- Wait for email confirmation

Linux Desktop Delivery System
Step 2: System Provisioning with Cobbler
- Register the hostname in Cobbler, following the naming convention pplxdtnnn
- Cobbler manages: ISO images, PXE boot, tftp, repos
- Cobbler only installs a vanilla OS, plus:
  - Deploys ssh keys to allow node administration
  - Sets the node's facts (role, sub-department and research group) via --ksmeta options from Cobbler
  - Installs the Puppet agent and repo
  - Runs Cobbler triggers to: copy host keys, copy Puppet certs, clean up, disable netboot
- Any other configuration aspect is handled by Puppet

Linux Desktop Delivery System
Remote reinstall: Cobbler's koan ("kickstart over a network")
When koan is run, it:
- modifies GRUB so the machine netboots
- requests install information from the Cobbler server
- then kicks off the installation
koan works ~90% of the time
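A reinstall of the running machine can be triggered with koan's `--replace-self` mode; a minimal sketch of the invocation, with a hypothetical Cobbler server name and system name:

```python
def koan_reinstall_cmd(server, system):
    """Build the koan command that re-provisions the running machine:
    koan rewrites the GRUB config to netboot, fetches the install
    information for `system` from the Cobbler server, and the next
    reboot drops into the installer."""
    return ["koan", "--replace-self",
            f"--server={server}", f"--system={system}"]

if __name__ == "__main__":
    # Hypothetical server and desktop names for illustration.
    print(" ".join(koan_reinstall_cmd("cobbler.example.org", "pplxdt123")))
```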

Recent Purchases
- Two Fortinet 601E 10 Gbit/s firewalls, to be installed and configured
- Ten Dell PowerEdge C6420, with a few issues during installation

Questions?