Overview of Solaris issues at CERN
By Ignacio Reguero. Presented by Manuel Guijarro, CERN IT-PS-UI
HEPiX Spring Meeting, Edinburgh, 24-28 May 2004



Agenda
- Solaris 9 Certification
  - Compilers
  - Open Software
- Quattor (EDG WP4) deployment
- Sun Blade Server tests
  - N1 Management
- LEMON Monitoring on Solaris
- Long term plans for Solaris Support

Solaris 9 Certification (1)
- Certification: formal test of all software used at CERN on the new system
  - In cooperation with the software owners
  - Using the Refsol9 reference machine
- Not much visible OS change
  - We have been running the OS for about a year
- However, the ASIS and SUE legacy environment is being replaced with Quattor
  - Completely replacing the system management framework
  - A large undertaking
- The certification was foreseen for the beginning of 2004
  - Delayed because we underestimated the amount of work required to implement Solaris-specific packages and components
- It will be launched on 1st June

Solaris 9 Certification (2): Compilers
Proposed to the forum-solaris-certification list:
- GCC
  - default
  - as gcc-alt
  - There is some question as to whether we would like to have GCC, rather than the Sun compilers, as the default
- Sun compilers
  - Default: Sun ONE Studio 8 (Sun C++ 5.5)
  - Other alternative Sun compilers available in AFS:
    - Sun ONE Studio 7 (Sun C++ 5.4)
    - Sun WorkShop 6 update 2 (Sun C++ 5.3)
    - Sun WorkShop 6 update 1 (Sun C++ 5.2) (default in Solaris 8)

Solaris 9 Certification (3): Open Software
As more Open Source software (Perl, Bash,…) is now distributed and supported by Sun, we propose a change of the Open Software policy:
- If possible, use the software from Sun
  - Setting compatibility links in /usr/local
- Only build software ourselves if special requirements exist, for instance:
  - DBI and other Perl modules
  - Version requirements, such as GCC
  - Mozilla 1.6 browser (instead of Netscape, recommended by Sun)
- We will take compatibility with Linux into account when possible
  - Though this is a moving target
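The "compatibility links in /usr/local" item can be illustrated with a small script. This is a minimal sketch, not CERN's actual tooling: the link names and the Sun-supplied target paths (/usr/bin/perl, /usr/bin/bash) are assumptions. A name that already exists in /usr/local/bin is reported as a conflict rather than overwritten, so software we build ourselves for special requirements is never clobbered.

```python
import os
from pathlib import Path
from typing import Dict, List, Tuple

# Hypothetical mapping: /usr/local/bin name -> Sun-supplied binary.
# The target paths are assumptions, not taken from the slides.
COMPAT_LINKS = {
    "perl": "/usr/bin/perl",
    "bash": "/usr/bin/bash",
}


def make_compat_links(links: Dict[str, str],
                      local_bin: str = "/usr/local/bin") -> Tuple[List[str], List[str]]:
    """Create local_bin/<name> -> target symlinks.

    Returns (created, conflicts). Links that already point at the right
    target are left alone; any other file or link at that path is not
    touched and is reported as a conflict instead.
    """
    created: List[str] = []
    conflicts: List[str] = []
    bindir = Path(local_bin)
    bindir.mkdir(parents=True, exist_ok=True)
    for name, target in sorted(links.items()):
        link = bindir / name
        if link.is_symlink() and os.readlink(link) == target:
            continue  # already correct: nothing to do
        if link.exists() or link.is_symlink():
            conflicts.append(name)  # locally built copy or foreign link
            continue
        link.symlink_to(target)
        created.append(name)
    return created, conflicts
```

Running it twice is harmless: the second run finds the links already in place and returns two empty lists.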

Quattor deployment on Solaris (1)
- Quattor is the fabric management toolkit
  - Already manages over 2000 nodes of a Linux farm in the CERN Computing Centre
- Sun funded a visitor at CERN to implement Quattor on Solaris
  - Using Solaris packages rather than RPMs
  - This work has been presented to Sun, the HPC Consortium and SC2003
- We plan to use Quattor to manage all Solaris systems from Solaris 9 onwards
  - Including desktop systems
    - Behaviour with unmanaged software

Quattor deployment on Solaris (2): Summary for system administrators
- Central Configuration Database (CDB)
  - Stores all configuration information, including the software packages to be installed
    - Both applications and system
  - A cache manager is provided for clients accessing the DB
    - Allows disconnected operation
    - Avoids dependency on the DB server or on the network
  - The configuration database is linked to the network installation server
    - The Jumpstart profile is to be generated from the database
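The disconnected-operation behaviour of the cache manager can be sketched as follows. This is a hypothetical illustration, not the Quattor cache manager: `ProfileCache` and the `fetch` callable are invented names, and the profile is assumed to be XML text. A freshly fetched profile only replaces the cached copy if it parses, so a truncated download cannot destroy the last good configuration.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Callable, Tuple


class ProfileCache:
    """Client-side profile cache allowing disconnected operation (sketch).

    `fetch` is any callable returning the node's profile as XML text from
    the configuration server; it stands in for the real transport.
    """

    def __init__(self, cache_file: str, fetch: Callable[[], str]):
        self.cache_file = Path(cache_file)
        self.fetch = fetch

    def get_profile(self) -> Tuple[str, str]:
        """Return (profile_xml, source), source being "server" or "cache"."""
        try:
            text = self.fetch()
            ET.fromstring(text)  # validate before replacing the good cached copy
        except (OSError, ET.ParseError):
            # Server or network unavailable, or a corrupt download:
            # fall back to the last profile known to be good.
            if not self.cache_file.exists():
                raise
            return self.cache_file.read_text(), "cache"
        self.cache_file.write_text(text)
        return text, "server"
```

A production version would also write the cache atomically (write a temporary file, then rename it) so that a crash in the middle of a write cannot corrupt the cached profile.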

Quattor deployment on Solaris (3): Summary for system administrators
- Node Configuration Manager (NCM)
  - Runs the configuration components
  - Each component has a single purpose, with two actions: configure and unconfigure
  - Components access the configuration DB through the cache manager
- SPMA software distributor (package level)
  - Replaces ASIS software distribution (file level)
  - On Linux it uses RPMs; on Solaris it is implemented with Solaris PKG
  - Can install packages from various software repositories
  - Several protocols supported: HTTP, file system (AFS), FTP, etc.
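Moving from file-level (ASIS) to package-level (SPMA) distribution boils down to comparing a desired package list with what is installed. A minimal sketch under stated assumptions: `plan_transaction` is a hypothetical helper, a version change is planned as remove-then-install (a simplification for Solaris PKG), and whether packages absent from the desired list get removed is left to a `remove_unmanaged` flag, since the slides single out behaviour with unmanaged software as an issue.

```python
from typing import Dict, List, Tuple


def plan_transaction(desired: Dict[str, str],
                     installed: Dict[str, str],
                     remove_unmanaged: bool = False) -> Tuple[List[str], List[str]]:
    """Compare desired and installed packages (name -> version).

    Returns (to_remove, to_install), each a sorted list of "name-version".
    A version change becomes "remove old version, add new version".
    Packages absent from the desired list are treated as unmanaged and
    are only removed when remove_unmanaged is True.
    """
    to_remove: List[str] = []
    to_install: List[str] = []
    for name, version in installed.items():
        wanted = desired.get(name)
        if wanted is None:
            if remove_unmanaged:
                to_remove.append(f"{name}-{version}")
        elif wanted != version:
            to_remove.append(f"{name}-{version}")  # wrong version installed
    for name, version in desired.items():
        if installed.get(name) != version:
            to_install.append(f"{name}-{version}")  # missing or wrong version
    return sorted(to_remove), sorted(to_install)
```

A real run would then hand the two lists to `pkgrm` and `pkgadd`; dependency ordering and fetching from the repositories (HTTP, AFS, FTP) are deliberately left out of the sketch.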

Quattor (EDG WP4) deployment on Solaris (4): Summary for system administrators
[Diagram: the CDB host (pan, XML) and the REPOSITORY host (PKG), with NCM, target.cf and SPMA on the managed node]

Quattor deployment on Solaris (5): Ongoing work
- Implementation of Solaris NCM components from existing SUE features
  - First priority: server configurations in the computing centre
    - Mostly done, except LSF
  - Validating the whole lot
- Graphical user interface
  - For delegation
  - For machines outside the computing centre
  - We had a proof-of-concept prototype
  - Now working on a more general interface
    - In close cooperation with the Quattor project, as it touches CDB access control
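To give an idea of what turning SUE features into NCM components involves, here is a hedged Python illustration of the component model: one class per service, exposing only configure and unconfigure, driven by the node's profile. Real NCM components are Perl modules with their own API; the class names, the profile layout (`components/<name>/active`) and the syslog example are assumptions made for this sketch, and `configure` only returns the lines it would write instead of touching /etc.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Component(ABC):
    """Hypothetical NCM-style component: one per service, two actions."""

    name = "base"

    @abstractmethod
    def configure(self, config: dict) -> List[str]:
        """Return the configuration lines this component would write."""

    @abstractmethod
    def unconfigure(self, config: dict) -> List[str]:
        """Return what remains once this component's settings are removed."""


class SyslogComponent(Component):
    name = "syslog"

    def configure(self, config: dict) -> List[str]:
        # syslog.conf expects a TAB between selector and action.
        return [f"{r['selector']}\t{r['action']}" for r in config.get("rules", [])]

    def unconfigure(self, config: dict) -> List[str]:
        return []


def run_components(components: List[Component], profile: dict,
                   action: str = "configure") -> Dict[str, List[str]]:
    """Run `action` on every component marked active in the node profile."""
    if action not in ("configure", "unconfigure"):
        raise ValueError(f"unknown action: {action}")
    results: Dict[str, List[str]] = {}
    for comp in components:
        subtree = profile.get("components", {}).get(comp.name)
        if not subtree or not subtree.get("active", False):
            continue  # component not managed on this node
        results[comp.name] = getattr(comp, action)(subtree)
    return results
```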

Sun Blade Server tests (1)
- Sun Blade Server 1600
  - Packaged farm
  - Fits in 3 units of a 19-inch rack
- SSC controller with Gigabit switch that manages up to 16 CPUs
  - Several Gigabit Ethernet external connections
  - VLAN with 16 Gigabit Ethernet interfaces
  - Protection against attacks by packet filter configuration
  - Console through a serial port for each blade
- 12 x 650 MHz UltraSPARC-IIe
- 4 Intel-compatible CPUs
  - AMD Athlon XP-M 1.2 GHz
- Other specialized blades supported at hardware level
  - SSL encryptor
  - Load balancer

Sun Blade Server 1600

Sun Blade Server 1600 system chassis
[Diagram: SSC0 (active) and SSC1 (standby) switch fabric, external switch, network interfaces ce0 and ce1 (addresses shown as x.x), slots 0-15 holding blades 0-15]

Sun Blade Server tests (2)
- Implemented fully automated network installation (DHCP) using Jumpstart from SUNINST0
- The nodes have been used for the development of Quattor on Solaris 9
- But the main interest is to test N1 management
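Since the Jumpstart profile is meant to be generated from the configuration database, a generator can be as simple as rendering a node record into profile keywords. A minimal sketch, assuming a hypothetical dictionary layout for the node record: the keywords used (`install_type`, `system_type`, `partitioning`, `filesys`, `cluster`, `package`) are standard Jumpstart profile keywords, but the disk slices, sizes, software group and package names below are made-up examples. The Jumpstart rules file and the DHCP boot setup are not covered.

```python
def jumpstart_profile(node: dict) -> str:
    """Render a Jumpstart profile from a node record.

    The record layout is a hypothetical stand-in for what the
    configuration database would hold for one node.
    """
    lines = [
        "install_type initial_install",
        "system_type standalone",
        "partitioning explicit",
    ]
    for fs in node["filesystems"]:
        # filesys <slice> <size in MB> <mount point, or swap>
        lines.append(f"filesys {fs['slice']} {fs['size_mb']} {fs['mount']}")
    lines.append(f"cluster {node['cluster']}")  # Solaris software group
    for pkg in node.get("extra_packages", []):
        lines.append(f"package {pkg} add")
    return "\n".join(lines) + "\n"


# Example node record (all values illustrative).
example_node = {
    "filesystems": [
        {"slice": "c0t0d0s0", "size_mb": 6000, "mount": "/"},
        {"slice": "c0t0d0s1", "size_mb": 1024, "mount": "swap"},
    ],
    "cluster": "SUNWCreq",
    "extra_packages": ["SUNWbash"],
}
```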

N1 Management (1)
- Sun N1 Provisioning Server 3.0 Blade Edition
  - Automates configuration and deployment of different kinds of blades using system images
    - Assignment may vary according to a schedule or other input: dynamic management of clusters
  - Images can also be used to deliver applications and data
- Our interest: to compare N1 with EDG WP4 Quattor functionality
- Question: could N1 manage heterogeneous farms outside the Blade Server scope?

N1 Management (2)
- Also a case study proposed by the SAS (DB) group
- Proposed test case: use BS1600 machines for the Oracle Application Server
  - Using the new version of OAS, which will facilitate dynamic allocation of nodes

N1 Management (3)
Long list of technical problems found:
- Had to install Upgrade 1 of the Sun N1 Provisioning Server 3.0 Blade Edition to avoid bugs in the initial release
- Requires at least one dedicated server that we had not foreseen
  - Control Plane Server + Image Server, external to the Blade Server
  - We had to go back to Solaris 8
- Local Oracle 8 installation required
  - Even though a public Oracle DB service running Oracle 9 exists
- A precise model of Gigabit Ethernet card is required for VLAN support
  - The Gigabit interface of the V210 and V240 is not supported
  - Had to acquire a SysKonnect Gigabit card
- Precise network switch models required
  - Fortunately not required for a single SB1600 configuration

N1 Management (4)
More technical problems:
- The N1 installation hung after updating the database parameters…
  - Resource layers: subnets and VLANs, Control Center application server, Blade system chassis
  - Normally, this installation should take around an hour and a half
- Sun later told us that the N1 installation has to be done by Sun Professional Services
  - You are supposed to pay
  - A large part of the documentation seems to be internal to Sun only

N1 Management (5)
Conclusion for Sun N1 Provisioning Server 3.0 Blade Edition:
- The product is complex and not well finished
- You are only supposed to use it through the Sun Professional Services organisation
- System images for system management are only interesting if you have a large number of identical nodes (we do not)
  - The Jumpstart model fits our needs better, as hardware and software differences are handled by the Sun installer
- Confirmed that no support for nodes outside the Blade Server is foreseen
- Agreed with our DB colleagues that we are not interested in this product

N1 Management (6)
On the other hand, we attended a demo of Sun N1 Service Provisioning System 4.1, a totally different product:
- Looks good technically
- Works at a higher level, with a scope similar to Quattor
- Nice GUI
- Supports several types of package objects, including user-defined ones
- Supports Red Hat Linux and Windows as well as Solaris
- The DB group and we agreed it is worth a look, at least to compare it with Quattor
But there are sociological problems:
- Sun tries to sell it with per-node fees under a Professional Services model
- Sun has not been able or willing to give us access to the product for over two months

N1 Management (7)
- Currently studying the implementation of the Oracle Application Server with Quattor packages and components
  - If we get the Sun product, we will compare it with Quattor
  - Otherwise, we will go ahead with Quattor

LEMON Monitoring on Solaris (1)
- Migrating from UIMON to the Lemon MSA
  - Work done by Piotr Kolet (Fellow in IT/PS/UI)
- To align with IT/FIO developments
  - Use the Computing Centre infrastructure
  - Achieve Linux and Solaris data integration
- Have to implement missing parts
  - Recovery actions
  - Solaris-specific metrics
- Targeting production by this summer

LEMON Monitoring on Solaris (2)
- Porting the MSA to Solaris: already done
- Porting the internal sensor: several bugs fixed
- Porting Linux metrics to Solaris: routines with strong OS dependencies
  - Several metrics still have to be rewritten or fixed
- Already sending data to the central Oracle repository (metric numbers and names have to be the same on all platforms)
- Results can be viewed on the Lemon Status Page (
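The requirement that metric numbers and names be identical on all platforms can be enforced at registration time: each platform plugs in its own OS-specific collection routine, but only against a shared catalogue. This is a hypothetical sketch, not the Lemon MSA API; `SensorRegistry`, the metric id 1001 and the sample record fields are invented for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class SensorRegistry:
    """Shared metric catalogue (id -> name) with per-platform collectors."""

    def __init__(self, catalogue: Dict[int, str]):
        self.catalogue = dict(catalogue)
        self.collectors: Dict[Tuple[str, int], Callable[[], float]] = {}

    def register(self, platform: str, metric_id: int, name: str,
                 collector: Callable[[], float]) -> None:
        expected = self.catalogue.get(metric_id)
        if expected != name:
            # Refuse per-platform drift in metric numbering or naming.
            raise ValueError(f"metric {metric_id} must be named {expected!r}, got {name!r}")
        self.collectors[(platform, metric_id)] = collector

    def sample(self, platform: str, node: str, metric_id: int) -> dict:
        value = self.collectors[(platform, metric_id)]()  # OS-specific routine
        return {
            "node": node,
            "metric_id": metric_id,
            "name": self.catalogue[metric_id],
            "timestamp": int(time.time()),
            "value": value,
        }
```

For a load-average metric, both the Linux and the Solaris collector could simply wrap Python's `os.getloadavg()[0]`, which is available on both systems; metrics with strong OS dependencies would need genuinely different routines behind the same id and name.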

LEMON Monitoring on Solaris (3)
- Recovery actions framework (to be done)
  - Part of the CmDaemon framework
- A subset of UIMON features needs to be implemented
  - Notification granularity
  - Customizable active and monitoring times
  - Smart recovery action launch (a specific number of times, execution timeout, avoiding concurrent runs)
- Recovery decisions made by the CmDaemon correlation unit
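The "smart recovery action launch" requirements (a limited number of runs, an execution timeout, no concurrent runs) map naturally onto a small wrapper. A sketch under assumptions: `RecoveryAction` is a hypothetical name, the run counter is reset by the caller when the alarm clears, and the decision to launch is assumed to come from outside (the correlation unit), which is not modelled here.

```python
import subprocess
import threading
from typing import List


class RecoveryAction:
    """Launch a recovery command with a run limit, a timeout and no overlap."""

    def __init__(self, argv: List[str], max_runs: int = 3, timeout_s: float = 60.0):
        self.argv = argv
        self.max_runs = max_runs
        self.timeout_s = timeout_s
        self.runs = 0
        self._lock = threading.Lock()

    def launch(self) -> str:
        # Non-blocking acquire: refuse to start while a previous run is active.
        if not self._lock.acquire(blocking=False):
            return "already-running"
        try:
            if self.runs >= self.max_runs:
                return "limit-reached"
            self.runs += 1
            try:
                # subprocess.run kills the child if the timeout expires.
                proc = subprocess.run(self.argv, timeout=self.timeout_s)
            except subprocess.TimeoutExpired:
                return "timeout"
            return "ok" if proc.returncode == 0 else "failed"
        finally:
            self._lock.release()

    def reset(self) -> None:
        """Call when the alarm clears, to allow future launches."""
        self.runs = 0
```

Checking the run limit only after taking the lock keeps two near-simultaneous alarms from both slipping under the limit.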

Long term plans for Solaris Support (1)
- Up to now, Solaris has been the second platform for LHC physics
  - For validation purposes only
  - SUNDEV facility for physics development
- Total population of 663 active nodes
  - Data from the network database

Long term plans for Solaris Support (2)
Current main users:
- Accelerator Sector (including LHC construction) (60+ nodes)
- AS (AIS) + Oracle DB servers in general (70+ nodes)
- CMS (150+ nodes)
- AFS (60+ nodes)
- CAE + PH/MIC (electronics development) (130+ nodes)
- Network monitoring (Spectrum SW (Nick Trikoupis)) (4 nodes)
- Remedy (2 nodes)
- EST Survey Group (8 nodes)
- (Old) mail servers + Listbox (4 nodes)
- SUNDEV (physics) (10 nodes)
- SUNPARC (engineering) (8 nodes)
- LICMAN (license servers) (8 nodes)

Long term plans for Solaris Support (3)
- Critical services, including most DB servers and electronics design, are run on Solaris
- However, most physics is done on Linux PCs
  - And the interest of the physics community in Solaris seems to be diminishing
    - Problems with C++ support
    - No interesting Sun desktops
    - Uncertain future of the company
  - The fashionable platform is the Apple Mac
    - Nice laptops
- SUNDEV reduction is an action item in the IT POW
  - Ongoing discussion with Les Robertson (LCG)

Long term plans for Solaris Support (4)
A likely scenario coming out of the discussion would be:
- A downsizing of SUNDEV
  - Using fewer or smaller nodes
  - Recycling the current nodes for DB serving
- A downsizing of Solaris support
  - By defining a Service Level Agreement with a more precise scope
  - For instance, only supporting the installation server and Quattor automated management
    - Regular calls to be handled exclusively by the desktop contract and/or directly by Sun
  - In order to free one FTE

SUNDEV

Questions?
Unix Infrastructure section: