Slide 1: Cluster Configuration Update, Including LSF Status
Thorsten Kleinwort, CERN IT/PDP-IS
HEPiX I/2001, LAL Orsay
Slide 2: Outline
Cluster Configuration Update and LSF Status:
- Function
- Software
- Hardware
- Management
Slide 4: Function
CERN IT/PDP-IS is responsible for:
- Central Unix-based batch & interactive platforms: LXPLUS, LXBATCH, RSPLUS, DXPLUS, HPPLUS
- Installation, maintenance & support
- Dedicated clusters for several experiments (batch & interactive):
  - Different setups, different hardware, user management, ...
  - Individual configurations
Slide 5: Function (figure)
Slide 6: Function
- LEP experiments: 'old' experiments on all kinds of legacy platforms; they stay until 2003, since freezing them earlier is not practical
- Non-LEP experiments: transition to Linux/Solaris as soon as possible
- Merge experiment clusters into LXBATCH/LXPLUS:
  - Reduces diversity
  - More efficient use of shared resources
Slide 8: Software
- In the past: all Unix flavours
- Now: mainly Linux (Red Hat)
- Solaris as 2nd platform:
  - Check software for platform dependencies
  - Enhanced debugging/development tools on Solaris
- AFS for software/home directories/scratch; recently started to investigate OpenAFS
- RFIO for data access: we want to avoid NFS
Slide 9: Software: Installation
- Kickstart & Jumpstart (Linux & Solaris) for basic system installation
- SUE for post-installation & configuration
- ASIS for software installation in /usr/local; the whole ASIS tree (~3 GB) is now local
- LSF
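For the Linux side, Kickstart drives the unattended base installation from a single configuration file. A minimal illustrative sketch is shown below; it is not CERN's actual profile, and the partitioning, password hash, and package selection are assumptions:

```
# Minimal Red Hat Kickstart sketch (illustrative, not the real CERN profile)
install
lang en_US
keyboard us
rootpw --iscrypted $1$examplehash        # placeholder hash, not a real one
clearpart --all                          # wipe existing partitions
part / --size 2000 --grow
part swap --size 256

%packages
@ Base
openafs-client                           # assumed: AFS client added post-base
```

Site-specific configuration (the SUE and ASIS steps above) then runs after this base install completes.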
Slide 10: Software: Batch
- LSF with the MultiCluster option:
  - Interactive nodes form the submission cluster
  - Batch nodes form the execution cluster
- Some interactive nodes have night/weekend queues
- On the public cluster (LXBATCH): dedicated resources for experiments
- Some clusters are "cross-linked", e.g. submission from a dedicated cluster to LXBATCH
- Open question: scalability
Slide 11: Software: LSF MultiCluster (diagram)
- Submit clusters: LXPLUS, CMS_CLUSTER
- Execution clusters: LXBATCH, CMS_BATCH
- Queues: 1nd, cms_1nd, cms_queue
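A submit/execution queue pair like the one in the diagram could be expressed in LSF MultiCluster's job-forwarding model roughly as follows. This is only a sketch: SNDJOBS_TO and RCVJOBS_FROM are LSF MultiCluster lsb.queues keywords, but the exact pairing of queue names and the priorities here are assumptions:

```
# lsb.queues on the submission cluster (LXPLUS) -- sketch
Begin Queue
QUEUE_NAME = 1nd
SNDJOBS_TO = cms_1nd@LXBATCH    # forward jobs to the execution cluster
PRIORITY   = 30
End Queue

# lsb.queues on the execution cluster (LXBATCH) -- sketch
Begin Queue
QUEUE_NAME   = cms_1nd
RCVJOBS_FROM = LXPLUS           # accept forwarded jobs from the submit cluster
PRIORITY     = 30
End Queue
```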
Slide 12: Software: Batch
Shared batch facility requirements:
- If a dedicated resource is unused, it should be available to others
- On the other hand, dedicated nodes must be allocated as soon as they are needed
- Queues/resources should be controlled by Unix groups rather than individual users, to handle the huge number of frequently changing users
- A "wish list" for LSF is in preparation, to be sent to Platform Computing
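The group-based access control described above can be expressed in lsb.queues, where USERS may name groups instead of individual accounts, so membership changes need no queue reconfiguration. A sketch only; cms_grp and cms_hosts are hypothetical names:

```
Begin Queue
QUEUE_NAME = cms_queue
USERS      = cms_grp       # a group, not individual user names
HOSTS      = cms_hosts     # host group holding the dedicated nodes
End Queue
```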
Slide 14: Hardware
- All kinds of legacy hardware in the clusters: IBM, SGI, DEC, HP, ...
- Now concentrating on Intel PCs running Linux (on both the client & server side)
- Sun (Solaris) as 2nd hardware platform: building the development cluster SUNDEV
- RISC decommissioning in progress
Slide 15: Hardware: RISC Decommissioning (figure)
Slide 16: Hardware: Intel PC
- Still using individual boxes: financial rules & a difficult TCO definition for rack-mounted solutions
- But there are plans to move to rack-mounted solutions in the future
- Intel PCs differ with each offer (1 or 2 disks; 2, 4, 8, 12, 20, 30 GB)
- Experiments buying their own equipment broadens diversity
Slide 17: Hardware (figure)
Slide 18: Hardware
- On the server/service side, moving from RISC/SCSI to Intel/EIDE:
  - Mirrored 1.5 TB servers with 20 x 75 GB EIDE disks
  - Testing RAID 5
- All tape services are now on PCs
- AFS servers are now on Suns; experimenting with AFS scratch on Linux
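The capacity trade-off behind the mirroring-versus-RAID-5 choice works out as follows for the disk count on the slide (interpreting "1.5 TB" as the raw, pre-mirroring capacity of 20 x 75 GB):

```python
# Usable capacity of 20 x 75 GB disks under two redundancy schemes.
n_disks = 20
disk_gb = 75

raw_gb = n_disks * disk_gb            # 1500 GB = 1.5 TB raw, as on the slide
mirrored_gb = raw_gb // 2             # RAID 1: half the raw capacity is usable
raid5_gb = (n_disks - 1) * disk_gb    # RAID 5: one disk's worth goes to parity

print(raw_gb, mirrored_gb, raid5_gb)  # -> 1500 750 1425
```

RAID 5 nearly doubles the usable space here, which is one reason it was worth testing despite the parity-write overhead.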
Slide 20: Management
- Currently merging clusters into LXPLUS/LXBATCH: aligning individual setups into global ones
- Continuing RISC decommissioning:
  - Restricting usage to the LEP experiments
  - Transferring users to the public facilities
- Facing a rapidly growing number of clients: automate & optimise
Slide 21: Management
- Starting a testbed (dual-CPU Intel/Linux PCs):
  - In 2000: ~100 machines
  - In 2001: ~200 machines
- In addition, an LHC test facility:
  - Testbed for the DataGrid project
  - It will grow over the next two years to reach a significant fraction of the LHC scale by 2003
Slide 22: Testbed Schedule (figure)
Slide 23: Management
Collaboration with DataGrid, WP4 (Computing Fabric):
- Installation task
- Configuration task
- Monitoring task
We contribute to WP4 and want to benefit from it.
(See the talk by Philippe Defert on DataGrid.)
Slide 24: Management
New internal projects started:
- User account management: how to manage /etc/passwd, /etc/group, ...; investigating a central service (LDAP)
- Accounting: how to control access to & usage of shared facilities by different groups
- Security: increase host-based security by checking the integrity of the system
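As background for the account-management project: each /etc/passwd entry is a colon-separated record with seven fields, and it is exactly this per-host data that a central LDAP service would replace as the authoritative source. A minimal sketch of the record layout (the sample entry below is invented for illustration):

```python
# Parse one /etc/passwd-style record into its seven standard fields.
FIELDS = ["login", "password", "uid", "gid", "gecos", "home", "shell"]

def parse_passwd_line(line):
    """Split a passwd entry and return it as a field-name -> value dict."""
    values = line.strip().split(":")
    return dict(zip(FIELDS, values))

# Hypothetical sample entry, for illustration only.
entry = parse_passwd_line("jdoe:x:1042:1042:Jane Doe:/afs/cern.ch/user/j/jdoe:/bin/tcsh")
print(entry["uid"], entry["home"])
```

Keeping thousands of such records consistent across every cluster node is what motivates moving the lookup to one central directory.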
Slide 25: Outlook
- Reducing the diversity of hardware & software
- Continuing the merging of clusters
- Facing a growing number of PCs
- Starting internal projects
- Benefiting from DataGrid WP4
- Going for LHC: prepare now to be ready when it starts