Hostnames used in CERN IT data centres
AI forum, 9th of January 2014
Procurement team, IT CF/FPP



Outline
- Motivation
- Lessons from past
  - Quattor -> Puppet sanitization opportunity
- Possible workarounds with host aliases

What we want to achieve
Three distinct and independent goals:
- The name is generated automatically by the node itself
- The name is unique forever
  - The name was not used before and will not be used again
- The name is useful for those who need to deal with hardware
  - On-site repair services at CERN & Wigner
  - Procurement team

Automated host naming
Goal: the name is generated automatically by the node itself when it is powered up for the first time.
It can’t:
- depend on its location (room, rack, U)
- relate to any service (e.g. batch) it may host in future
- relate to a functional element (lxhwproc01..)
It can be:
- completely random, and/or
- based on some local feature that is somehow unique and can always be retrieved locally on the host, e.g. BMC Field Replaceable Unit (FRU) information, provided that:
  - it is not overwritten
  - the repair technician has an established procedure to transfer the original information if the BMC is changed
Constructing the name from a MAC address or from the invariable s/n of a component (e.g. mainboard) is not a good idea because they may change.

All new deliveries (>2012)
[Figure: sticker at rear of chassis and output from ‘ipmitool fru’]
- Vendors are required to print custom labels with the CERN order reference and a unique s/n
- Vendors are required to set the BMC FRU ‘Product serial’ and ‘Asset Tag’
- Unfortunately ‘cd n006225ts-2’ is too long for NETBIOS
- Random character (skip ‘l’, ‘i’, ‘o’ and ‘z’)
- Suggested by a CMS user: ‘p e26469’
- Compromise: ‘p ’, where ‘p’ stands for physical and the last part is random
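A minimal sketch of how a node might build such a name, assuming the compromise format is the letter ‘p’, followed by the order reference, followed by a random suffix. The helper name `generate_hostname`, the suffix length, the inclusion of digits in the random alphabet, and the sample order reference are assumptions for illustration. Only the skipped characters come from the slide, and the 15-character limit is the usual NetBIOS name length. The `taken` set stands in for whatever record of ever-used names is available.

```ruby
require 'securerandom'
require 'set'

# Random alphabet: lowercase letters and digits (digits are an assumption),
# minus 'l', 'i', 'o' and 'z' as stated on the slide.
ALPHABET = (('a'..'z').to_a + ('0'..'9').to_a) - %w[l i o z]

# Usable characters in a NetBIOS name.
NETBIOS_MAX = 15

# Hypothetical helper: builds 'p' + order reference + random suffix and
# refuses any name already present in `taken` (names ever used before).
def generate_hostname(order_ref, taken, suffix_len: 6)
  ref = order_ref.to_s.downcase.gsub(/[^a-z0-9]/, '')
  if 1 + ref.length + suffix_len > NETBIOS_MAX
    raise ArgumentError, 'hostname would exceed the NetBIOS limit'
  end

  loop do
    suffix = Array.new(suffix_len) { ALPHABET[SecureRandom.random_number(ALPHABET.size)] }.join
    name = "p#{ref}#{suffix}"
    # Set#add? returns nil when the name was already recorded, so we retry.
    return name if taken.add?(name)
  end
end

# Usage with a made-up order reference:
used_names = Set.new
puts generate_hostname('1234567', used_names)
```

Recording every generated name in `taken` and never deleting from it is what would make the name "unique forever" in this sketch; the random suffix alone only makes collisions unlikely.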

Unique names
Goal: every hostname is unique forever.
- LANDB assures a registered name is unique at any point in time
- However, nothing prevents reuse of names of decommissioned hosts
  - For instance: the headnode of the ‘pony’ service is always called ‘lxpony01’, and the name is inherited when the hardware is replaced
- Most history records are keyed by (DNS) name
  - This can lead to serious confusion; as a result, historic data is unreliable and therefore useless for h/w problem analysis
  - Past events or incidents recorded in SNOW, Lemon, syslog, etc. may refer to different hardware

Useful names
Goal: the name is useful for those who need to deal with hardware.
- Purchase order as part of the name allows convenient grouping
  - Technician can immediately tell from the hostname which stock (out of 50+) to use for repairs
  - Failure analysis: systematic hardware issues are often related to a delivery, e.g. a firmware bug or a defective component batch
- Example: p crashed during power re-cabling at Wigner
  - Search for p * in SNOW and Lemon gives all other nodes affected in the same batch
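The grouping idea can be illustrated with a small sketch. It assumes that the order reference sits directly after the leading ‘p’ and has a fixed length; `ORDER_REF_LEN`, the helper names and the sample hostnames are hypothetical and not taken from the slides.

```ruby
# Assumed layout: 'p' + order reference (fixed length) + random suffix.
ORDER_REF_LEN = 7 # hypothetical length of the order reference

# Returns the order reference embedded in a hostname, or nil when the
# name does not follow the assumed layout (e.g. a service-style name).
def delivery_of(hostname)
  name = hostname.downcase
  return nil unless name.start_with?('p') && name.length > 1 + ORDER_REF_LEN
  name[1, ORDER_REF_LEN]
end

# Returns every host from the same delivery as `hostname`.
def same_delivery(hostname, all_hosts)
  ref = delivery_of(hostname)
  return [] if ref.nil?
  all_hosts.select { |h| delivery_of(h) == ref }
end

# Usage with made-up names:
hosts = %w[p1234567abcdef p1234567ghjkmn p7654321abcdef lxpony01]
puts same_delivery('p1234567abcdef', hosts).inspect
```

This is the same operation as the prefix search in SNOW or Lemon: a name such as ‘lxpony01’ carries no delivery information and therefore cannot be grouped this way.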

Failure analysis example
[Figure: correlation of IPMI SEL entries with uptime <10000; Metric 6104: IPMI SEL Log, Metric 9001: uptime]
Quick diagnosis: of the almost identical batches p * and p *, only nodes in the first batch crashed during the Wigner re-cabling intervention (a SNOW query gives the same info).

Lessons from past
- CDB combined information from
  - Delivery spreadsheets (MAC, s/n)
  - Spreadsheets from the rack mounting (rack, U-pos)
  - EDH (contract id == purchase order)
  - CDB-SQL warranty table
  - LANDB (ip, gateway, netmask)
  - Hardware discovery (CPU, RAM, HDD, RAID,…)
- LEAF tools and procedures for maintaining the consistency upon changes (e.g. rename)
  - Complex rollback when something failed in the middle
  - Risk of information degradation from software bugs and human errors

Quattor -> Puppet campaign
Moving hosts from Quattor to Puppet is an opportunity to patch information and restore consistency.
Found so far:
- 0.5% wrong s/n
- 4% with missing interfaces (especially IPMI) in LANDB; expect 10% for older deliveries
- 0.5% hardware issues
- 8 (out of 1400) hosts with wrong location
Our conclusions:
- A lot of information is manually gathered, entered and thoroughly checked once, when equipment is received and installed
- There is an inevitable risk of information degradation over time due to subsequent changes
  - Maximize automation and information discovery
  - Minimize the need for subsequent changes

Workaround with host aliases
- Can’t use DNS to list aliases
  - DNS mapping is one-way: alias (CNAME) -> address (A)
  - There is nothing stored in DNS that goes the other direction for aliases
- However, LANDB can
  - getMyDeviceInfo() SOAP call
  - Runs on local host and requires no authentication

LANDB & DNS
- Device name is unknown to DNS but will usually correspond to one of the interface names below
- Interface names are recorded in DNS address (A) records
- Aliases are recorded in DNS Canonical Name (CNAME) records

Adding an alias in LANDB
Don’t remove the existing CD…-… alias when you add your own alias

Calling getMyDeviceInfo()
- Perhaps not so pretty but seems to work
- Could it be wrapped somehow into a Puppet fact adding aliases to /etc/hosts?
- Puppet uses Ruby so it’s installed by default…

    #!/usr/bin/ruby -w
    require 'soap/rpc/driver'

    NAMESPACE = 'urn:NetworkService'
    URL = ''  # the service URL is missing from the transcript

    begin
      # Discard anything written to stderr
      $stderr.reopen("/dev/null", "w")
      driver = SOAP::RPC::Driver.new(URL, NAMESPACE)
      # Add remote service methods
      driver.add_method('getMyDeviceInfo')
      # Call remote service method for getting the device information
      myInfo = driver.getMyDeviceInfo()
      # Initialize the hostname to the device name, to have a fallback
      # in case we can't identify a proper alias
      hostname = myInfo['DeviceName']
      # Get aliases for the main interface
      for interface in myInfo['Interfaces']
        # We identify the main interface, the one matching the device name
        if interface['Name'].downcase == hostname.downcase
          if interface['IPAliases'] # Do we have any aliases?
            for ipAlias in interface['IPAliases']
              # If the alias is not matching the serial number
              if myInfo['SerialNumber'] == nil || ipAlias.downcase != myInfo['SerialNumber'].downcase
                hostname = ipAlias # We take it as the hostname
                break              # And we break out of the for loop
              end
            end
          end
          break # Once we got to the main interface we break
        end
      end
      puts hostname.downcase # We output the new hostname
    rescue => err
      puts err.message
    end
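One possible answer to the Puppet fact question, as a rough sketch only. It separates the alias selection into a plain function that can be tried without LANDB, and registers a custom fact only when Facter is loaded. The names `preferred_alias`, `fetch_device_info` and the fact name `landb_alias` are hypothetical; `Facter.add` with `setcode` is the usual way to declare a custom fact. Writing the alias into /etc/hosts is not covered here and would be left to a manifest that consumes the fact.

```ruby
# Picks the first alias of the main interface (the one named like the
# device) that differs from the serial number; falls back to the device name.
# `info` is expected to behave like the hash returned by getMyDeviceInfo().
def preferred_alias(info)
  device = info['DeviceName'].to_s
  serial = info['SerialNumber']
  interfaces = info['Interfaces'] || []
  main = interfaces.find { |iface| iface['Name'].to_s.downcase == device.downcase }
  aliases = main ? (main['IPAliases'] || []) : []
  chosen = aliases.find { |a| serial.nil? || a.downcase != serial.downcase }
  (chosen || device).downcase
end

# Hypothetical placeholder for the SOAP call shown on the slide.
def fetch_device_info
  raise NotImplementedError, 'wrap the getMyDeviceInfo() SOAP call here'
end

# Register the fact only when this file is loaded by Facter.
if defined?(Facter)
  Facter.add(:landb_alias) do
    setcode do
      preferred_alias(fetch_device_info)
    end
  end
end

# Usage with made-up data:
sample = {
  'DeviceName'   => 'P1234567ABCDEF',
  'SerialNumber' => 'CD-0001',
  'Interfaces'   => [
    { 'Name' => 'p1234567abcdef', 'IPAliases' => ['cd-0001', 'lxpony01'] }
  ]
}
puts preferred_alias(sample)
```

Keeping the selection logic free of network calls means it can be checked against sample data before it is trusted on a real host.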

Use host alias example

Questions/comments