L. Granado Cardoso, F. Varela, N. Neufeld, C. Gaspar, C. Haen, CERN, Geneva, Switzerland D. Galli, INFN, Bologna, Italy ICALEPCS, October 2011.

Slides:



Advertisements
Similar presentations
NAGIOS AND CACTI NETWORK MANAGEMENT AND MONITORING SYSTEMS.
Advertisements

This course is designed for system managers/administrators to better understand the SAAZ Desktop and Server Management components Students will learn.
Frederic Bernard Claire Tamper. Last Meeting conclusion (30/10/09)
CHEP 2012 – New York City 1.  LHC Delivers bunch crossing at 40MHz  LHCb reduces the rate with a two level trigger system: ◦ First Level (L0) – Hardware.
Supervision of Production Computers in ALICE Peter Chochula for the ALICE DCS team.
UNIX Chapter 01 Overview of Operating Systems Mr. Mohammad A. Smirat.
Introduction to Network Administration. Objectives.
Operating systems This work is licensed under a Creative Commons Attribution-Noncommercial- Share Alike 3.0 License. Skills: none IT concepts: popular.
MCITP Guide to Microsoft Windows Server 2008 Server Administration (Exam #70-646) Chapter 14 Server and Network Monitoring.
Clara Gaspar, May 2010 The LHCb Run Control System An Integrated and Homogeneous Control System.
Building an Application Server for Home Network based on Android Platform Yi-hsien Liao Supervised by : Dr. Chao-huang Wei Department of Electrical Engineering.
Control and monitoring of on-line trigger algorithms using a SCADA system Eric van Herwijnen Wednesday 15 th February 2006.
Hands-On Microsoft Windows Server 2008 Chapter 11 Server and Network Monitoring.
CH 13 Server and Network Monitoring. Hands-On Microsoft Windows Server Objectives Understand the importance of server monitoring Monitor server.
Windows Server 2008 Chapter 11 Last Update
Automatic Software Testing Tool for Computer Networks ARD Presentation Adi Shachar Yaniv Cohen Dudi Patimer
Applying Distributed Systems concepts to SCADA By Padmanabha Kamath.
Clara Gaspar, November 2012 Experiment Control System LS1 Plans…
Calo Piquet Training Session - Xvc1 ECS Overview Piquet Training Session Cuvée 2012 Xavier Vilasis.
Using PVSS for the control of the LHCb TELL1 detector emulator (OPG) P. Petrova, M. Laverne, M. Muecke, G. Haefeli, J. Christiansen CERN European Organization.
Chapter 6 Operating System Support. This chapter describes how middleware is supported by the operating system facilities at the nodes of a distributed.
An Introduction to IBM Systems Director
09/11/20061 Detector Control Systems A software implementation: Cern Framework + PVSS Niccolo’ Moggi and Stefano Zucchelli University and INFN Bologna.
Module 7: Fundamentals of Administering Windows Server 2008.
Clara Gaspar, October 2011 The LHCb Experiment Control System: On the path to full automation.
Update on Database Issues Peter Chochula DCS Workshop, June 21, 2004 Colmar.
XXVI Workshop on Recent Developments in High Energy Physics and Cosmology Theodoros Argyropoulos NTUA DCS group Ancient Olympia 2008 ATLAS Cathode Strip.
ALICE DCS Meeting.- 05/02/2007 De Cataldo, Franco - INFN Bari - 1 ALICE dcsUI Version 3.0 -dcsUI v3.0 is ready and will be soon posted on the ACC site.
ALICE DCS Workshop - 14/03/2006 De Cataldo, CERN CH and INFN Bari - 1 Standardization of the DCS control panels The ACC is elaborating a set of panels.
André Augustinus 10 October 2005 ALICE Detector Control Status Report A. Augustinus, P. Chochula, G. De Cataldo, L. Jirdén, S. Popescu the DCS team, ALICE.
ALICE, ATLAS, CMS & LHCb joint workshop on
Agilent Technologies Copyright 1999 H7211A+221 v Capture Filters, Logging, and Subnets: Module Objectives Create capture filters that control whether.
Management of the LHCb DAQ Network Guoming Liu * †, Niko Neufeld * * CERN, Switzerland † University of Ferrara, Italy.
Clara Gaspar, March 2005 LHCb Online & the Conditions DB.
OPERATING SYSTEM - program that is loaded into the computer and coordinates all the activities among computer hardware devices. -controls the hardware.
Overview of DAQ at CERN experiments E.Radicioni, INFN MICE Daq and Controls Workshop.
ALICE Use of CMF (CC) for the installation of OS and basic S/W OPC servers and other special S/W installed and configured by hand PVSS project provided.
CH 13 Server and Network Monitoring. Hands-On Microsoft Windows Server Objectives Understand the importance of server monitoring Monitor server.
Status of Farm Monitor and Control CERN, February 24, 2005 Gianluca Peco, INFN Bologna.
5-Oct-051 Tango collaboration status ICALEPCS 2005 Geneva (October 2005)
Manchester University Tiny Network Element Monitor (MUTiny NEM) A Network/Systems Management Tool Dave McClenaghan, Manchester Computing George Neisser,
Chapter 06: System Software. Definition  Master program  Controls all hardwares connected to computer  Collection of programs Users Application software.
Monitoring and Managing Server Performance. Server Monitoring To become familiar with the server’s performance – typical behavior Prevent problems before.
Status of the new NA60 “cluster” Objectives, implementation and utilization NA60 weekly meetings Pedro Martins 03/03/2005.
Clara Gaspar, July 2005 RTTC Control System Status and Plans.
Source Controller software Ianos Schmidt The University of Iowa.
Management of the LHCb Online Network Based on SCADA System Guoming Liu * †, Niko Neufeld † * University of Ferrara, Italy † CERN, Geneva, Switzerland.
Connecting LabVIEW to EPICS network
Configuration database status report Eric van Herwijnen September 29 th 2004 work done by: Lana Abadie Felix Schmidt-Eisenlohr.
CERN IT Department CH-1211 Genève 23 Switzerland t CERN IT Monitoring and Data Analytics Pedro Andrade (IT-GT) Openlab Workshop on Data Analytics.
Clara Gaspar, April 2006 LHCb Experiment Control System Scope, Status & Worries.
Management of the LHCb DAQ Network Guoming Liu *†, Niko Neufeld * * CERN, Switzerland † University of Ferrara, Italy.
Update on Farm Monitor and Control Domenico Galli, Bologna RTTC meeting Genève, 14 april 2004.
M. Caprini IFIN-HH Bucharest DAQ Control and Monitoring - A Software Component Model.
© 2002, Cisco Systems, Inc. All rights reserved..
Clara Gaspar, February 2010 DIM A Portable, Light Weight Package for Information Publishing, Data Transfer and Inter-process Communication.
20OCT2009Calo Piquet Training Session - Xvc1 ECS Overview Piquet Training Session Cuvée 2009 Xavier Vilasis.
- My application works like a dream…does it. -No prob, MOON is here. F
PVSS an industrial tool for slow control
Useful Tools for Testing
Hands-On Microsoft Windows Server 2008
Controlling a large CPU farm using industrial tools
How SCADA Systems Work?.
Oracle Solaris Zones Study Purpose Only
The LHCb Run Control System
HC Hyper-V Module GUI Portal VPS Templates Web Console
Philippe Vannerem CERN / EP ICALEPCS - Oct03
Tools for the Automation of large distributed control systems
CCNA 4 v3.1 Module 6 Introduction to Network Administration
Presentation transcript:

L. Granado Cardoso, F. Varela, N. Neufeld, C. Gaspar, C. Haen, CERN, Geneva, Switzerland D. Galli, INFN, Bologna, Italy ICALEPCS, October 2011

 Large nature of the experiment: ◦ Large number of PCs:  1747 PCs (55 Windows, 1692 Linux)  34 Virtual Machines ◦ Different system architectures  Need to monitor and maintain infrastructure operation parameters reliably  Central tool to globally monitor the infrastructure ICALEPCS, October 2011

 Heterogeneous nature of the computer infrastructure required a layered approach to monitor different sets of parameters: ◦ Hardware Layer – The hardware infrastructure ◦ Operating System Layer – The software infrastructure/environment ◦ PVSS Infrastructure – The supervisory control software running the experiment ◦ Application Layer – Specific software needed for system control and operation ICALEPCS, October 2011

 To monitor the parameters for the different monitoring layers a set of tools was developed: ◦ Monitoring and Control Servers (FMC Tools):  Gather and publish computer parameters  Publish data based on DIM (Distributed Information Management)  Operating system dependent (Linux/Windows) ◦ Graphical User Interface and “aggregation” Tools  Provide a centralized interface ICALEPCS, October 2011

 OS dependent: ◦ Slightly different approach according to the OS (Linux/Windows) ◦ Linux:  Each of the monitoring servers must be running on the node to be monitored which publish the node data ◦ Windows:  A server on a central PC connects to the windows nodes, gathers the data and publishes the data for all of the nodes ICALEPCS, October 2011

 Hardware Layer: ◦ OS independent monitoring/control:  IPMI Server – gathers and controls PCs power status via the IPMI (Inteligent Platform Management Interface) interface  Virtual Machine Server – gathers and controls Virtual Machines power status ◦ OS dependent:  Memory Server – monitors memory usage  CPU Info Server – gathers CPU information  CPU Stat Server – gathers CPU usage  File System Server – gathers FS usage  Network Interface Server – monitors network traffic statistics ICALEPCS, October 2011

 Operating System Layer: ◦ OS server – gathers operating system and kernel information  PVSS Infrastructure Layer: ◦ PVSS pmon process - A process monitor agent linked to each PVSS Project that runs independently from it. This agent monitors and publishes the state of PVSS processes. It can also act on these processes (start/stop/reset)  Application Layer: ◦ Process Monitor Server - gathers info on the running processes on each PC ◦ Task Manager Server - gathers the data for the running processes on each node and is also able to start processes on these nodes. Note: For the Windows systems central starting/stopping of processes is not “yet” implemented ICALEPCS, October 2011

 Based on PVSS SCADA system and DIM  Developed within the JCOP (Joint Controls Project) framework group at CERN  FwFMC – Subscribes to the DIM services and commands published by the FMC Servers and provides a GUI for easy interaction and monitoring  FwSystemOverview – Reutilizes the data subscribed from FwFMC and presents it in synoptic panels ◦ Adds PVSS Project monitoring and control capabilities ◦ Adds grouping capabilities  PVSS provides easy data archiving and alarm handling capabilities ICALEPCS, October 2011

 1747 monitored PCs: ◦ 55 Windows / 1692 Linux ◦ 34 are Virtual Machines ◦ 1470 are for the HLT Farm ◦ 167 controls PCs running PVSS Projects ◦ 110 for infrastructure support, webservers, reconstruction, …  All PCs power control available (IPMI/VM Servers)  Memory and CPU monitoring only the most sensitive ones  All PVSS projects and individual managers are monitored  Processes and Task Manager Servers runnig on all the control PCs ICALEPCS, October 2011

 LHCb developed the interface using 2 hierarchies: ◦ Hosts monitoring and control ◦ PVSS Projects monitoring and control ICALEPCS, October 2011  Hierarchies divided by sub-detector and function: ◦ Easy to check global state of the system ◦ Easy to check state of a particular system ◦ Allows individual sub- detector access to their infrastructure

 Host Management: ◦ Easily switch ON/OFF Hosts ◦ Check CPU and memory usage ◦ Check File System usage ◦ Monitor single processes and services memory usage ◦ Possibility to act globally on a group (power wise)  Great for recovering the system after a power-cut! ICALEPCS, October 2011

 PVSS Project Management: ◦ Based on PVSS Pmon calls over TCP/IP ◦ Have a global overview of the state of the individual PVSS projects and managers:  Statistics about projects running/stopped  Statistics about number of managers blocked/abnormaly stopped ◦ Detect mismatches between expected configuration and configuration running ◦ Act on PVSS projects and managers like logging on the local machines ◦ Act Globally on a group (Start/Stop/Restart Projects ICALEPCS, October 2011

 Globally Manage PVSS Managers ◦ Filter managers by type, system, state, options… ◦ Start/stop filtered managers ◦ Change filtered managers startup properties  Great for updates to control software running from repositories! ICALEPCS, October 2011

 Very elegant and user friendly central management tool  Easy and complete monitoring  Global overview and fine control: ◦ PCs status ◦ PVSS Projects and managers ◦ Applications processes and services  Monitor different systems with the same interface  Expandable to other TCP enabled devices ICALEPCS, October 2011