Cluster Configuration Update Including LSF Status
Thorsten Kleinwort for CERN IT/PDP-IS
HEPiX I/2001, LAL Orsay

Cluster Configuration Update and LSF Status
Outline: Function, Software, Hardware, Management

Function
- CERN IT/PDP-IS is responsible for:
  - Central Unix-based batch & interactive platforms: LXPLUS, LXBATCH, RSPLUS, DXPLUS, HPPLUS
  - Installation, maintenance & support
  - Dedicated clusters for several experiments (batch & interactive): different setups, different hardware, different user management, … each with its individual configuration

Function

Function
- LEP experiments: 'old' experiments on all kinds of legacy platforms; leave them until 2003, freezing earlier is not practical
- Non-LEP experiments: transition to Linux/Solaris as soon as possible
- Merge experiment clusters into LXBATCH/LXPLUS: reduce diversity, make more efficient use of shared resources

Software
- In the past: all Unix flavours; now: mainly Linux (Red Hat)
- Solaris as 2nd platform: lets us check software for platform dependencies; enhanced debugging/development tools on Solaris
- AFS for software, home directories and scratch; recently started investigating OpenAFS
- RFIO for data access: we want to avoid NFS
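To illustrate the RFIO-based data access mentioned above (copying data explicitly rather than relying on NFS mounts), here is a minimal sketch in Python, assuming the rfcp copy client is installed and accepts cp-like source/destination arguments; the disk-server name and the paths are hypothetical.

import subprocess

def rfio_copy(source, destination):
    """Copy a file with the RFIO copy client rfcp (sketch; assumes rfcp is on PATH)."""
    # Remote files are typically given as host:/path; otherwise rfcp behaves like cp.
    result = subprocess.run(["rfcp", source, destination])
    if result.returncode != 0:
        raise RuntimeError("rfcp failed: %s -> %s" % (source, destination))

# Hypothetical usage: stage an input file from a disk server to local scratch
# rfio_copy("diskserver01:/shift/data/run1234.dat", "/scratch/run1234.dat")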

Software: Installation
- Kickstart & Jumpstart (Linux & Solaris): for basic system installation
- SUE: for post-installation & configuration
- ASIS: for software installation in /usr/local; now the whole ASIS tree (~3 GB) is local
- LSF

Software: Batch
- LSF with the MultiCluster option:
  - Interactive nodes act as submission hosts (submit cluster)
  - Batch nodes act as execution hosts (execution cluster)
- Some interactive nodes have night/weekend queues
- On the public cluster (LXBATCH): dedicated resources for experiments
- Some clusters are "cross-linked", e.g. submission from a dedicated cluster to LXBATCH (a simple submission sketch follows below)
- Scalability is an open question
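To make the submission/execution split concrete, a minimal sketch of submitting a job from an interactive (submission) host with LSF's standard bsub client; the queue name 1nd is taken from the next slide, and the script path is hypothetical.

import subprocess

def submit_job(queue, command):
    """Submit a command to an LSF queue from a submission host (sketch)."""
    # bsub prints a confirmation such as: Job <12345> is submitted to queue <1nd>.
    result = subprocess.run(["bsub", "-q", queue, command],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

# Hypothetical usage from an LXPLUS interactive node:
# print(submit_job("1nd", "/afs/cern.ch/user/x/xyz/run_analysis.sh"))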

Software: LSF MultiCluster (schematic)
- Submit clusters: LXPLUS, CMS_CLUSTER
- Execution clusters: LXBATCH, CMS_BATCH
- Queues shown: 1nd, cms_1nd, cms_queue
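A toy model of the forwarding idea behind the schematic above. The exact queue-to-cluster pairings are not legible in the transcript, so the mapping below is an assumption chosen only to illustrate how a submit-side queue resolves to an execution cluster.

# Assumed (illustrative) mapping: submit-side queue -> (submit cluster, execution cluster)
FORWARDING = {
    "1nd": ("LXPLUS", "LXBATCH"),
    "cms_1nd": ("LXPLUS", "CMS_BATCH"),
    "cms_queue": ("CMS_CLUSTER", "CMS_BATCH"),
}

def execution_cluster(queue):
    """Return the execution cluster a submit-side queue forwards to (toy model)."""
    try:
        return FORWARDING[queue][1]
    except KeyError:
        raise ValueError("unknown queue: %s" % queue)

# Example: execution_cluster("cms_1nd") -> "CMS_BATCH"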

Software: Batch
- Requirements for a shared batch facility:
  - If a dedicated resource is unused, it should be available to others
  - On the other hand, dedicated nodes must be allocated as soon as they are needed
  - Queues/resources should be controlled via UNIX groups rather than individual users, to cope with the huge number of frequently changing users (a sketch of the idea follows below)
  - A "wish list" for LSF is in preparation, to be sent to Platform Computing
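A minimal sketch of what group-based rather than per-user access control could look like; the queue-to-group mapping and the group names are hypothetical, and real LSF queues would express this through their own configuration rather than a script like this.

import grp
import pwd

# Hypothetical mapping of batch queues to the UNIX groups allowed to use them
QUEUE_GROUPS = {"cms_1nd": "cms_users", "public_1nd": "public_batch"}

def may_use_queue(user, queue):
    """Return True if the user's UNIX group membership grants access to the queue (sketch)."""
    group = QUEUE_GROUPS.get(queue)
    if group is None:
        return False
    entry = grp.getgrnam(group)
    primary_gid = pwd.getpwnam(user).pw_gid
    return user in entry.gr_mem or primary_gid == entry.gr_gid

# Example: may_use_queue("someuser", "cms_1nd")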

Hardware
- All kinds of legacy hardware in the clusters: IBM, SGI, DEC, HP, …
- Now concentrating on Intel PCs running Linux (on both the client and the server side)
- Sun (Solaris) as 2nd hardware platform: building the development cluster SUNDEV
- RISC decommissioning is in progress

Hardware: RISC Decommissioning

Hardware: Intel PC
- Still using ordinary (non-rack-mounted) boxes: financial rules and the difficult TCO definition for rack-mounted solutions
- But there are plans to move to rack-mounted solutions in the future
- The Intel PCs differ from offer to offer (1 or 2 disks; 2, 4, 8, 12, 20 or 30 GB)
- Experiments buying their own equipment broadens the diversity further

Hardware

Hardware
- On the server/service side: moving from RISC/SCSI to Intel/EIDE
  - Mirrored 1.5 TB disk servers (20 x 75 GB EIDE disks)
  - Testing RAID 5
- All tape services are now on PCs
- AFS servers are now on Suns; experimenting with AFS scratch space on Linux

Management
- Currently merging clusters into LXPLUS/LXBATCH, aligning individual setups into global ones
- Continuing RISC decommissioning: restricting usage to the LEP experiments and transferring users to the public facilities
- Facing a rapidly growing number of clients: automate & optimise

Management
- Starting a testbed of dual-CPU Intel/Linux PCs:
  - In 2000: ~100 machines
  - In 2001: ~200 machines
- In addition it serves as an LHC test facility and as a testbed for the DataGrid project
- It will grow over the next two years to reach a significant fraction of the LHC scale by 2003

Testbed Schedule

Management
- Collaboration with DataGrid, WP4 (Computing Fabric):
  - Installation task
  - Configuration task
  - Monitoring task
- We contribute to WP4 and want to benefit from it
- See the talk by Philippe Defert on DataGrid

Management
- New internal projects started:
  - User account management: "How to manage /etc/passwd, /etc/group, …"; investigate a central service (LDAP)
  - Accounting: how to control access to and usage of the shared facilities by the different groups
  - Security: increase host-based security by checking the integrity of the system (a sketch of the idea follows below)
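As an illustration of the host integrity-checking idea (in the spirit of Tripwire-like tools), a minimal sketch; the list of watched files and the baseline location are hypothetical.

import hashlib
import json

# Hypothetical list of files to watch and a baseline of known-good checksums
WATCHED_FILES = ["/etc/passwd", "/etc/group", "/etc/inetd.conf"]
BASELINE_PATH = "/var/lib/integrity/baseline.json"

def checksum(path):
    """Return the SHA-256 hex digest of a file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def changed_files():
    """Return the watched files whose checksums differ from the stored baseline."""
    with open(BASELINE_PATH) as f:
        baseline = json.load(f)
    return [p for p in WATCHED_FILES if checksum(p) != baseline.get(p)]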

Outlook
- Reducing the diversity of hardware & software
- Continuing the merging of clusters
- Facing a growing number of PCs
- Starting internal projects
- Benefiting from DataGrid WP4
- Going for the LHC: prepare now to be ready when it starts