Fabric Infrastructure LCG Review November 18th 2003 CERN.ch

CERN.ch 2 Agenda  Space, Power & Cooling  CPU & Disk Server Purchase  Fabric Automation

CERN.ch 3 Agenda  Space, Power & Cooling  CPU & Disk Server Purchase  Fabric Automation

CERN.ch 4 Space & Power Requirements  CPU & Disk Servers only. Maximum power available in 2003: 500kW.

CERN.ch 5 Upgrade Timeline  The power/space problem was recognised in 1999 and an upgrade plan was developed after studies in 2000/1.  Cost: 9.3MCHF, of which 4.3MCHF is for the new substation. –The vault upgrade was on budget. Substation civil engineering is over budget (by 200kCHF), but there are potential savings in the electrical distribution. –There is still some uncertainty on the overall costs for the air-conditioning upgrade.

CERN.ch 6 Future Machine Room Layout (diagram): box PCs 105kW; 1U PCs 288kW; 324 disk servers 120kW(?). 18m double rows of racks (12 shelf units or 36 19” racks); 9m double rows of racks for critical servers; aligned normabarres.

CERN.ch 7 Upgrade Milestones (baseline date; a second date, where shown, is the revised/actual date):
- Vault converted to machine room: 01/11/02
- Right half of machine room emptied to the vault: 01/08/03 (15/08/03)
- Substation civil engineering starts: 01/09/03 (15/08/03)
- Substation civil engineering finishes: 01/03/04
- Substation electrical installation starts: 01/04/04
- Substation commissioned: 07/01/05
- Elec. distrib. on RH of machine room upgraded: 02/02/04
- Left half of machine room emptied: 02/08/04
- Elec. distrib. on LH of machine room upgraded: 01/06/05
- Machine room HVAC upgraded: 01/03/05
- New 800kW UPS for physics installed: 03/04/06
- Current UPS area remodelled: 01/01/07
- 2nd 800kW UPS added: 02/04/07
- 3rd 800kW UPS added: 01/04/08
Status: on schedule, progress acceptable. Capacity will be installed to meet power needs.

CERN.ch 8 … and what are those needs?  Box power has increased from ~100W in 1999 to ~200W today. –And, despite promises from vendors, electrical power demand seems to be directly related to Spec power.
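To make the scale of this trend concrete, here is a minimal back-of-the-envelope sketch (Python). The farm size used below is an assumption chosen for illustration only; the per-box wattages are the ~100W and ~200W figures quoted above.

```python
# Illustrative only: the 5,000-box farm size is an assumption, not a
# figure from the slides; the per-box wattages come from the text above.
def total_power_kw(n_boxes: int, watts_per_box: float) -> float:
    """Total electrical demand in kW for a farm of identical boxes."""
    return n_boxes * watts_per_box / 1000.0

# Per-box power roughly doubled between 1999 (~100 W) and 2003 (~200 W),
# so the same number of boxes now needs roughly twice the electrical power.
for year, watts in [(1999, 100), (2003, 200)]:
    print(year, total_power_kw(5000, watts), "kW for a hypothetical 5,000-box farm")
```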

CERN.ch 9 Space and Power Summary  Building infrastructure will be ready to support installation of production offline computing equipment from January.  The planned 2.5MW capacity will be OK for the 1st year at full luminosity, but there is concern that this will not be adequate in the longer term.  Our worst case scenario is a load of 4MW. –Studies show this can be met in B513, but the more likely solution is to use space elsewhere on the CERN site. –Provision of extra power would be a 3-year project. We have time, therefore, but still need to keep a close eye on the evolution of power demand.

CERN.ch 10 Agenda  Space, Power & Cooling  CPU & Disk Server Purchase  Fabric Automation

CERN.ch 11 The challenge  Even with the delayed start up, large numbers of CPU & disk servers will be needed: –At least 2,600 CPU servers »1,200 in the peak year; cf. purchases of 400/batch today –At least 1,400 disk servers »550 in the peak year; cf. purchases of 70/batch today –Total budget: 24MCHF  Build on our experiences to select hardware with minimal total cost of ownership. –Balance purchase cost against long-term staff support costs (see the sketch below), especially for »System management (see next section of this talk), and »Hardware maintenance.
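As a rough illustration of the TCO trade-off described above, here is a minimal sketch (Python). All prices, support-effort figures and the three-year lifetime are invented for the example and are not figures from the slides.

```python
# Hedged illustration: compare two hypothetical CPU-server options on
# total cost of ownership rather than purchase price alone.
# All numbers below are made up for the example.
def tco(purchase_chf: float, annual_support_hours: float,
        chf_per_hour: float, years: int = 3) -> float:
    """Purchase cost plus staff support cost over the service lifetime."""
    return purchase_chf + annual_support_hours * chf_per_hour * years

white_box = tco(purchase_chf=2000, annual_support_hours=4, chf_per_hour=100)
one_u     = tco(purchase_chf=2600, annual_support_hours=2, chf_per_hour=100)
print(f"white box: {white_box} CHF, 1U server: {one_u} CHF over 3 years")
```

The cheaper box can end up more expensive once the longer-term staff effort for system management and hardware maintenance is counted, which is the point of the TCO comparison.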

CERN.ch 12 Acquisition Milestones — I  Agree CPU and Disk service architecture by 1st June 2004 (WBS & ).  i.e. make the decisions on which hardware minimises the TCO: –CPU: white box vs 1U vs blades; self-installed or ready packaged –Disk: IDE vs SAN; level of vendor integration  and also the operation model –Can we run IDE disk servers at full nominal capacity? »Impact on disk space available or increased cost/power/…  Total Cost of Ownership workshop organised by the openlab, 11/12th November.

CERN.ch 13 Acquisition Milestones — II  Agreement with SPL on acquisition strategy by December (WBS ). –Essential to have early involvement of SPL division given questions about purchase policy—likely need to select multiple vendors to ensure continuity of supply. –A little late, mostly due to changes in CERN structure.  Issue Market Survey by 1st July 2004 ( ) –Based on our view of the hardware required, identify potential suppliers. –Input from SPL is important in preparation of the Market Survey to ensure adequate qualification criteria for the suppliers. »The overall process will include visits to potential suppliers.  Finance Committee Adjudication in September 2005 ( )

CERN.ch 14 Acquisition Summary  Preparation for the Phase II CPU & disk server acquisition is a complex process with a tight schedule.  Significant effort will be required to ensure that equipment meeting our specifications is installed on the required timescale.  The major risk is that equipment does not meet our specifications. –H/W quality is a problem today, especially for disk servers. –Careful definition of the qualification criteria for suppliers is essential.

CERN.ch 15 Agenda  Space, Power & Cooling  CPU & Disk Server Purchase  Fabric Automation

CERN.ch 16 ELFms  The ELFms Large Fabric management system has been developed over the past few years to enable tight and precise control over all aspects of the local computing fabric.  ELFms comprises –The EDG/WP4 quattor installation & configuration tools –The EDG/WP4 monitoring system, Lemon, and –LEAF, the LHC Era Automated Fabric system

CERN.ch 17 ELFms architecture (diagram): Node Configuration System, Installation System, Monitoring System and Fault Mgmt System.

CERN.ch 18 configuration architecture (diagram: Pan templates and a GUI/CLI feed the CDB; server modules (SQL/LDAP/HTTP) and HTTP + notifications connect the CDB to the installation system and to a CCM cache on each node, accessed via the NVA-API)
- Configuration Data Base (CDB): the configuration information store; the information is updated in transactions, and is validated and versioned.
- Pan: templates with configuration information are input into the CDB via the GUI & CLI; Pan templates are compiled into XML profiles.
- Server modules: provide different access patterns (SQL/LDAP/HTTP) to the configuration information.
- HTTP + notifications: nodes are notified about changes to their configuration and fetch the XML profiles via HTTP (sketched below).
- CCM: configuration information is stored in the local cache on each node and accessed via the NVA-API.
All in production in spring for the RH7 migration, thus no milestones.
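To illustrate the profile-fetch step only, here is a minimal sketch (Python) of a node downloading its compiled XML profile over HTTP and keeping a local cached copy so that the last known-good configuration survives a network outage. The URL layout, cache path and profile naming are assumptions made for the example, not the real quattor/CCM conventions.

```python
# Hedged sketch: fetch this node's XML profile via HTTP, falling back to
# the locally cached copy if the server is unreachable.
import shutil
import socket
import urllib.request
from pathlib import Path

PROFILE_SERVER = "http://cdb.example.org/profiles"   # hypothetical URL layout
CACHE_DIR = Path("/var/cache/node-profile")          # hypothetical cache path

def fetch_profile(node: str = socket.gethostname()) -> Path:
    """Download the node's XML profile, or return the cached copy."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    cached = CACHE_DIR / f"{node}.xml"
    tmp = cached.with_name(cached.name + ".new")
    try:
        with urllib.request.urlopen(f"{PROFILE_SERVER}/{node}.xml", timeout=30) as resp:
            with open(tmp, "wb") as out:
                shutil.copyfileobj(resp, out)
        tmp.rename(cached)            # swap in the freshly fetched profile
    except OSError:
        if not cached.exists():
            raise                     # no cached copy to fall back on
        # otherwise keep using the last successfully fetched profile
    return cached
```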

CERN.ch 19 installation architecture (diagram): client nodes run the NCM, whose configuration components are dispatched by Cdispd on registration/notification, and the SPMA, driven by SPMA.cfg derived from the CDB via the CCM; software packages (RPM, PKG) are held on SWRep servers behind a management API with ACLs and are fetched via nfs/http/ftp into a local package cache (see the sketch below); the installation server provides DHCP handling, PXE handling and a KS/JS generator, behind a management API with ACLs, to drive node (re)installation.
WBS ; due 01/08/03, in full production from 18/07/03. WBS ; due 03/11/03, in full production from 12/11/03.
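The core job of an SPMA-like agent, bringing the installed package set in line with the desired list from the node profile, can be sketched as a simple set comparison. The helper name and the package data below are illustrative, not the real SPMA interfaces.

```python
# Hedged sketch of the reconciliation idea behind an SPMA-like agent:
# compare the desired package list with what is installed and derive
# install/remove/change sets. Names and versions are illustrative only.
def plan(desired: dict[str, str], installed: dict[str, str]):
    """Both arguments map package name -> version."""
    to_install = {p: v for p, v in desired.items() if p not in installed}
    to_remove  = {p: v for p, v in installed.items() if p not in desired}
    to_change  = {p: (installed[p], v) for p, v in desired.items()
                  if p in installed and installed[p] != v}
    return to_install, to_remove, to_change

desired   = {"openssh": "3.7.1p2", "lsf": "5.1"}
installed = {"openssh": "3.6.1p1", "rogue-tool": "0.1"}
print(plan(desired, installed))
```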

CERN.ch 20 status  quattor is in complete control of our farms.  We are already seeing the benefits in terms of –ease of installation—10 minutes for LSF upgrade, –speed of reaction—ssh security patch installed across all lxplus & lxbatch nodes within 1 hour of availability, and –homogeneous software state across the farms.  quattor development is not complete, but future developments are desirable features, not critical issues. –Growing interest from elsewhere—good push to improve documentation and packaging! –Ported to Solaris by IT/PS  EDG/WP4 has delivered as required.

CERN.ch 21 Lemon Overview (architecture diagram: monitored nodes run plug-in sensors and a Monitoring Sensor Agent (MSA) with a local cache; local and global consumers read the data; a Measurement Repository (MR) holds it centrally behind a repository API and database)
- Monitoring Sensor Agent (MSA): calls plug-in sensors to sample configured metrics, stores all collected data in a local disk buffer, and sends the collected data to the global repository.
- Plug-in sensors: programs/scripts that implement a simple sensor-agent ASCII text protocol; a C++ interface class is provided on top of the text protocol to facilitate implementation of new sensors (a sketch of such a sensor follows below).
- Local cache: ensures data is still collected when the node cannot connect to the network and allows node autonomy for local repairs.
- Transport: pluggable; two protocols, over UDP and TCP, are currently supported, of which only the latter can guarantee delivery.
- Measurement Repository: the data is stored in a database; a memory cache guarantees fast access to the most recent data, which is normally what is used for fault-tolerance correlations.
- Repository API: SOAP RPC; query of history data; subscription to new data.
- Database: either a proprietary flat-file database or Oracle; an open source interface is to be developed.
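As an illustration of what a plug-in sensor might look like, here is a minimal sketch (Python). The "metric-id timestamp value" line format and the metric identifier are assumptions for the example; the real MSA sensor-agent protocol is not spelled out in these slides.

```python
# Hedged sketch of a Lemon-style plug-in sensor: sample one metric and
# emit it as a single ASCII line for the agent to pick up. The line
# format "<metric-id> <timestamp> <value>" is an assumption.
import os
import sys
import time

LOAD_METRIC_ID = 20001   # hypothetical metric identifier

def sample_load() -> float:
    """1-minute load average on Unix-like systems."""
    return os.getloadavg()[0]

def emit(metric_id: int, value: float) -> None:
    sys.stdout.write(f"{metric_id} {int(time.time())} {value:.2f}\n")
    sys.stdout.flush()

if __name__ == "__main__":
    emit(LOAD_METRIC_ID, sample_load())
```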

CERN.ch 22 Lemon Overview (same architecture diagram as slide 21). MSA in production for over 15 months, together with sensors for performance and exception metrics for the basic OS and specific batch-server items. The focus now is on integrating existing monitoring for other systems, especially disk and tape servers, into the Lemon framework.

CERN.ch 23 Lemon Overview (same architecture diagram as slide 21). UDP transport in production. Up to 500 metrics/s delivered with 0% loss; 185 metrics/s recorded at present.
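For context on the UDP option, here is a minimal sketch of a fire-and-forget metric sender (Python). The endpoint and payload format are invented for illustration and are not the Lemon transport itself; the point is that UDP delivery is cheap but unacknowledged, which is why only the TCP transport can guarantee delivery.

```python
# Hedged sketch: send one metric sample as a UDP datagram.
import socket
import time

REPOSITORY = ("127.0.0.1", 12409)   # placeholder host/port, not the real MR

def send_metric(metric_id: int, value: float) -> None:
    payload = f"{metric_id} {int(time.time())} {value}".encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(payload, REPOSITORY)    # no acknowledgement: losses go unnoticed

send_metric(20001, 0.42)
```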

CERN.ch 24 Lemon Overview (same architecture diagram as slide 21). An RDBMS repository is required to enable detailed analysis of performance issues over long periods. Extensive comparison of EDG/WP4 OraMon and PVSS with an Oracle back end in 1H03; OraMon was selected for production use in June and has been running in production since 1st September. It records 185 metrics/s (16M metrics per day).
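A quick arithmetic cross-check of the two figures quoted above: 185 metrics/s × 86,400 s/day = 15,984,000, i.e. roughly 16 million metrics per day, consistent with the stated 16M metrics/day.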

CERN.ch 25 Lemon Overview (same architecture diagram as slide 21). The Lemon weakness is in the display area: we have no “system administrator” displays and operators still use the SURE system. The milestone is late (system-monitor displays were due 1st October), but this has no adverse impact on services. The required repository API is now available (C, Perl, Tcl & PHP), but we have chosen to concentrate effort on derived metrics, which deliver service improvements (and relate to milestone , due 01/03/04).

CERN.ch 26 Lemon Summary  The sensors, MSA and OraMon repository are running in production, and running well. –The status of nodes (e.g. lxbatch) has much improved since the initial sensor/MSA deployment.  We are now working on derived metrics, created by a sensor which acts upon values read from the repository and, perhaps, elsewhere (a sketch follows below). –e.g. Maximum [system load|/tmp usage|/pool usage] on batch nodes running jobs for [alice|atlas|cms|lhcb]. –The aim is to reduce independent monitoring of (and thus load on) our nodes & LSF system. This needs input from users.  The lack of displays is a concern, but only a mild one. To be addressed in 2004 after the EDG wrap-up.  Again, much appreciation of the solid work within EDG/WP4.
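A minimal sketch of the derived-metric idea (Python): compute, for one experiment, the maximum system load over the batch nodes currently running its jobs. The repository query function, the metric name and the experiment-to-node lookup are hypothetical stand-ins, since the real repository API is not described in these slides.

```python
# Hedged sketch: a derived metric built from values already held in the
# repository. fetch_latest() and nodes_running_jobs_for() are
# hypothetical placeholder helpers returning dummy data.
from typing import Iterable

def fetch_latest(node: str, metric: str) -> float:
    """Placeholder for a repository query; returns a dummy value here."""
    return 0.0

def nodes_running_jobs_for(experiment: str) -> Iterable[str]:
    """Placeholder for an LSF job-mix lookup; returns a dummy list here."""
    return ["lxb0001", "lxb0002"]

def max_load(experiment: str) -> float:
    return max(fetch_latest(n, "system_load")
               for n in nodes_running_jobs_for(experiment))

print("max load for cms batch nodes:", max_load("cms"))
```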

CERN.ch 27 LEAF Components  HMS: Hardware Management System  SMS: State Management System  Fault Tolerance

CERN.ch 28 HMS  HMS tracks systems through the steps necessary for, e.g., installations & moves. –A Remedy workflow interfacing to ITCM, PRMS and the CS group as necessary. –Used to manage the migration of systems to the vault. –Now driving the installation of 250+ systems. So far, no engineer-level time has been required. The aim is to keep it this way right through to production!  As for quattor, there are no WBS milestones, given the status in the spring. –Important developments now are to include more detail on hardware components, especially for disk servers.

CERN.ch 29 SMS — WBS &  “Give me 200 nodes, any 200. Make them like this. By then.” (a sketch of such a request follows below) For example –creation of an initial RH10 cluster –(re)allocation of CPU nodes between lxbatch & lxshare, or of disk servers.  Tightly coupled to Lemon, to understand the current state, and to the CDB—which SMS must update.  Analysis & Requirements documents have been discussed. –These led to an architecture prototype and the definition of interfaces to the CDB.  Work has been delayed a little by HMS priorities, but we still hope to see an initial SMS-driven state change by the end of the year (target was October).
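To make the “give me 200 nodes” request concrete, here is a minimal sketch of what an SMS state-change request might carry (Python). The fields and their names are assumptions based on the description above, not the actual SMS interface.

```python
# Hedged sketch: a state-change request of the form
# "give me N nodes, make them like this, by then".
from dataclasses import dataclass
from datetime import datetime

@dataclass
class StateChangeRequest:
    count: int               # how many nodes ("any 200")
    target_cluster: str      # e.g. move nodes into an initial RH10 cluster
    target_profile: str      # CDB template the nodes should end up with
    deadline: datetime       # "by then"

req = StateChangeRequest(
    count=200,
    target_cluster="lxbatch",
    target_profile="profile_rh10_batch",   # hypothetical template name
    deadline=datetime(2003, 12, 31),
)
print(req)
```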

CERN.ch 30 Fault Tolerance  We have started testing the Local Recovery Framework developed by Heidelberg within EDG/WP4.  Simple recovery action code (e.g. to clean up filesystems safely) is available.  We confidently expect initial deployment at this level by the end of the year (target was March 04).
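As an idea of what such a simple recovery action could look like, here is a minimal sketch (Python). The usage threshold, target directory and age limit are illustrative choices, not the Heidelberg framework's actual code or interface.

```python
# Hedged sketch of a "clean up the filesystem safely" recovery action:
# only delete regular files under /tmp older than a given age, and only
# when usage is above a threshold. Parameters are illustrative.
import shutil
import time
from pathlib import Path

def cleanup_tmp(path="/tmp", usage_threshold=0.9, max_age_days=7, dry_run=True):
    usage = shutil.disk_usage(path)
    if usage.used / usage.total < usage_threshold:
        return                                  # nothing to do
    cutoff = time.time() - max_age_days * 86400
    for f in Path(path).rglob("*"):
        try:
            if f.is_file() and not f.is_symlink() and f.stat().st_mtime < cutoff:
                print("would remove" if dry_run else "removing", f)
                if not dry_run:
                    f.unlink()
        except OSError:
            pass                                # file vanished or not ours

cleanup_tmp(dry_run=True)
```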

CERN.ch 31 Fault Tolerance Long Term  Real Fault Tolerance requires actions such as reallocating a disk server to CMS “because one of theirs has failed and they are the priority users at present”.  Such a system could also –reallocate a disk server to CMS because this looks a good move given the LSF job mix. –reconfigure LSF queue parameters at night or weekend based on the queue.  Essentially, this is just a combination of derived metrics and SMS. –Concentrate on building the basic elements for now and start with simple use cases when we have the tools.

CERN.ch 32 HSM Operators Interface  This ought to be a GUI on top of an intelligent combination of quattor, Lemon & LEAF.  We want to be able to (re)allocate disk space (servers) to users/applications as necessary.  These are CDB and SMS issues. –But there are strong requirements on Castor and the stager if we want to do this. Fortunately, many of the necessary stager changes are planned for next year.  Initial development (milestone ) has started with the extension of the CDB to cover the definition of, e.g., staging pools.

CERN.ch 33 Fabric Infrastructure Summary  The Building Fabric will be ready for the start of production farm installation in January. –But there are concerns about a potential open-ended increase in power demand.  The CPU & disk server purchase is complex. –The major risk is poor quality hardware and/or a lack of adequate support from vendors.  Computing Fabric automation is well advanced. –Installation and configuration tools are in place. –The essentials of the monitoring system, the sensors and the central repository, are also in place. Displays will come; more important is to encourage users to query our repository and not each individual node. –LEAF is starting to show real benefits in terms of reduced human intervention for hardware moves.