Installing, running, and maintaining large Linux Clusters at CERN
Thorsten Kleinwort, CERN-IT/FIO
CHEP 2003, 24.03.2003

Overview
- The Linux clusters at the CERN CC
- Recent achievements to improve manageability:
  - Installation
  - Configuration
  - Monitoring
  - Collaboration with EDG (WP4)
- Maintenance of the clusters
- The batch system LSF
- Steps towards LHC Computing
- References

Introduction
The computing facilities in the CERN Computer Center:
- Decommissioned non-Linux platforms, apart from some Suns
- Merged private clusters into two big, shared clusters:
  - LXPLUS for interactive use (~80 nodes)
  - LXBATCH as batch farm (~700 nodes)
- All commodity hardware (towers, dual CPU), but diverse (CPU speed, disk sizes and number, memory, ...)
- The current OS is RedHat Linux; the transition from 6.1 to 7.3 is about 70% done

The CERN Computer Center

Recent achievements
- Moving from RedHat 6.1 to 7.3
- Revised and rewrote the existing installation and maintenance tools, because the requirements have changed:
  - Focusing on Linux
  - Using well-established tools/protocols/languages (RPM, HTTP, XML, ...)
  - Standards adherence (LSB, init scripts, ...)
- Separated installation and configuration:
  - Identified all parts of the installation
  - Identified all sources of configuration information

Installation
- The system is installed with kickstart; the installation is completely automatic
- Software installation: RPM
- RPM is the tool of choice:
  - Allows easy install/update/uninstall
  - Version control
- Additional software comes in RPMs, ours as well as software from others (e.g. CASTOR, EDG, LCG, ...)
- (Post-)installation is split up into components:
  - One RPM per component for installation
  - Configuration is done per component as well (see the sketch after this list)
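
A minimal sketch of the per-component RPM handling described above, assuming each component is packaged as its own RPM and served over HTTP; the server name, package name and versions are hypothetical.

    # Fetch a (hypothetical) component RPM from the software repository server.
    wget http://swrep.example.cern.ch/rpms/cc-component-afs-1.0-1.noarch.rpm

    # Install, query, update and remove with plain RPM, as the slide describes.
    rpm -ivh cc-component-afs-1.0-1.noarch.rpm   # install the component
    rpm -q   cc-component-afs                    # version control: check what is installed
    rpm -Uvh cc-component-afs-1.1-1.noarch.rpm   # update to a newer release
    rpm -e   cc-component-afs                    # clean uninstall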

Configuration
Configuration of the system:
- We enhanced SUE with a configuration interface
- Identified all sources of configuration information:
  - First step: make this information available through one interface (CCConfig)
  - Next step: work on the unification and merging of the different data sources behind it (ongoing)

Configuration II
Using the EDG WP4 configuration tools Pan & CDB (Configuration Data Base) for describing hosts:
- Pan is a very flexible language for describing host configuration information:
  - Expressed in templates (ASCII)
  - Allows includes (inheritance)
- Pan is compiled into XML inside CDB
- The XML is downloaded and the information is provided through CCConfig, the high-level API (see the sketch below)
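
A hedged sketch of the flow on this slide: the node fetches its compiled XML profile from CDB over HTTP, and CCConfig answers configuration queries from it. The server name, profile name and local cache path are hypothetical, and the exact CCConfig call syntax is not shown on the slides.

    # Fetch the node's compiled Pan profile (XML) from the CDB server over HTTP.
    # Hostname, path and local cache location are hypothetical.
    wget -O /var/cache/ccconfig/profile.xml \
        http://cdb.example.cern.ch/profiles/profile_lxb0001.xml

    # CCConfig, the high-level API mentioned on the slide, then serves
    # configuration queries from this locally cached XML copy.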

Monitoring
Adoption of the EDG WP4 monitoring:
- Has replaced the old self-made, home-grown alarm scripts
- Still relying on the old alarm system (SURE):
  - Will be replaced, either by the WP4 tool or by a commercial tool (PVSS)
- The monitoring information is stored in a database:
  - With a user API for queries
  - Eliminates the need for client access

Maintenance
- Machines must be 'updatable': updating a machine must lead to the same result as a new install
- Rpmupdate:
  - Based on RPMT, a transactional RPM that allows updates, installs, and uninstalls at the same time
  - Will be superseded by the EDG WP4 tool SPMA
- Notification mechanism:
  - No automatic/periodic upgrade
  - The change mechanism is triggered to run on the nodes (see the sketch below)
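
The slides do not show RPMT's actual syntax, so the sketch below only illustrates the kind of mixed change set it handles, spelled out as separate plain rpm calls; package names and versions are hypothetical, and RPMT would apply the whole set in one transaction when the change mechanism is triggered on a node.

    # Illustrative only: the equivalent of one RPMT transaction,
    # written as individual rpm calls (hypothetical packages).
    rpm -Uvh openssh-3.4p1-2.i386.rpm           # update an existing package
    rpm -ivh castor-client-1.4.1-1.i386.rpm     # install a new one
    rpm -e   obsolete-tool                      # uninstall one no longer wanted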

The batch system (LSF)
- Current version: LSF 4.2; LSF 5.1 is being evaluated at the moment
- No multi-cluster setup any more
- We introduced fairshare for better utilization of otherwise unused capacity (see the sketch after this list):
  - Experiments have guaranteed shares of the batch capacity
  - If unused, these shares can be used by others
  - No more resources that are available but unusable
- We oversubscribe our hosts (3 jobs per dual-CPU node)
- Close collaboration with the provider, Platform: they benefit from our big farm, we benefit from their help and their willingness to implement our requirements
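
A brief sketch of how the fairshare and job-slot settings described above appear to users, using standard LSF commands; the queue name, job name, job script and host name are hypothetical.

    # Submit a job to a (hypothetical) shared batch queue.
    bsub -q lxbatch -J myanalysis ./run_analysis.sh

    # Inspect the queue: with fairshare enabled, bqueues -l lists the
    # scheduling policy and the shares of the user groups.
    bqueues -l lxbatch

    # Inspect a (hypothetical) dual-CPU batch node: the MAX column shows the
    # job slots, i.e. 3 per host for the oversubscription described above.
    bhosts lxb0001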

Other improvements
- Secure installations:
  - Each node has its own GPG key pair to exchange secure information, e.g. SSH keys or the (encrypted) root password (see the sketch below)
- Intervention rundown:
  - Allows a scheduled reboot of batch nodes once they have finished their batch jobs, e.g. for a new kernel or other software installs
- Server cluster:
  - Serves the RPMs, the configuration information, etc.
  - Several machines, selected by 'dynamic DNS aliases'
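
A hedged sketch of the per-node GPG exchange described above, using standard GnuPG commands; the recipient key ID, node name and file names are hypothetical.

    # On the server: encrypt the node-specific secret (e.g. the root password)
    # with that node's public key, so only the target node can decrypt it.
    gpg --encrypt --recipient root@lxb0001.cern.ch \
        --output rootpw.gpg rootpw.txt

    # On the node: decrypt with the node's own private key after download.
    gpg --decrypt --output rootpw.txt rootpw.gpg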

Going to Grid Computing
- Merging EDG/VDT middleware into a large-scale production farm
- Enlarging our batch capacity by 400 nodes in April
- Early contribution to LCG 1 by this summer
- LXBATCH fully integrated by Q4/2003
- Close collaboration with EDG (WP4) and LCG will continue

Conclusions
- Redid the Linux installation for RH 7.3:
  - Clearer concepts, new tools
  - Streamlined it with the EDG WP4 tools
- Continuous collaboration with EDG WP4 and LCG
- Facing and implementing the needs of Grid computing

References
- CERN-IT/FIO: http://it-div-fio.web.cern.ch/it-div-fio/
- EDG: http://eu-datagrid.web.cern.ch/eu-datagrid/
- WP4: http://hep-proj-grid-fabric.web.cern.ch/hep-proj-grid-fabric/
- LCG: http://lcg.web.cern.ch/LCG/
- SUE: http://proj-sue.web.cern.ch/proj-sue/
- LCFG: http://www.lcfg.org/
- LSF (Platform Computing): http://www.platform.com/
- PVSS: