Management of the LHCb Online Network Based on SCADA System Guoming Liu * †, Niko Neufeld † * University of Ferrara, Italy † CERN, Geneva, Switzerland.


1 Management of the LHCb Online Network Based on SCADA System Guoming Liu * †, Niko Neufeld † * University of Ferrara, Italy † CERN, Geneva, Switzerland

2 Outline
- Introduction to the LHCb Online system
- LHCb online network
- Network management based on SCADA system
- Summary
Guoming Liu, ICALEPCS2009

3 LHCb online system
- LHCb is one of the large particle physics experiments at the LHC at CERN.
- The Online system is part of the LHCb infrastructure and provides IT services for the entire experiment.
- Three major components:
  - Data Acquisition (DAQ): transfers the event data from the detector front-end electronics to permanent storage.
  - Timing and Fast Control (TFC): provides the fast clock and drives all stages of the data readout of the LHCb detector between the front-end electronics and the online processing farm.
  - Experiment Control System (ECS): controls and monitors all parts of the experiment.

4 LHCb online system
[Diagram: overview of the LHCb online system. Labels: Detector (VELO, ST, OT, RICH, ECal, HCal, Muon); L0 trigger; FEE / front-end; readout boards; readout network (switch); event building; HLT farm (CPUs); MON farm (CPUs); switches; CASTOR; TFC system; LHC clock; MEP request; Experiment Control System (ECS). Legend: event data; timing and fast control signals; control and monitoring data.]

5 LHCb Online Network
- Two dedicated networks:
  - Control network: general-purpose network for the experiment control system; connects all the Ethernet devices in LHCb.
  - Data network: dedicated to data acquisition; performance critical.

6 LHCb Online Network
- Two geographic parts, surface and underground, connected by two 10G links.

7 LHCb Online Network
[Diagram labels: core CTRL routers; core DAQ router; CTRL access switches (~100); DAQ access switches (~50); "On the surface".]

8 Network Monitoring System based on SCADA
- Motivation:
  - This large network needs sophisticated monitoring.
  - The monitoring should integrate coherently into the LHCb ECS.
  - It should provide homogeneous interfaces for the non-expert shift crew.
- Commercial network management software?
  - Expensive
  - Integration with the ECS?

9 Network Monitoring System: Architecture
- Supervisory layer:
  - PVSS II: commercial SCADA system
  - JCOP: Joint Controls Project for the LHC experiments
- Front-end processes: SNMP, sFlow, syslog
- Data communication: DIM (Distributed Information Management)
[Diagram labels: SNMP / sFlow / Syslog; DIM]

10 Network Monitoring System: FSM
- All behaviors are modeled as Finite State Machines (FSM).
- Hierarchical structure: status and commands are propagated through the tree.
- Device Units: device description and device access; based on PVSS II datapoints (alarm handling, archiving, trending, etc.).
- Control Units: abstract behavior modeling; each one represents its associated sub-tree.
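The hierarchy described on this slide can be sketched in a few lines of Python. This is only an illustration, not the PVSS II / JCOP implementation: the state names, the "worst child state wins" rule, the upward direction for status and downward direction for commands, and all node names are assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical state names, ordered by severity; the slides do not list the real ones.
SEVERITY = {"OK": 0, "WARNING": 1, "ERROR": 2}


@dataclass
class DeviceUnit:
    """Leaf node: stands for one monitored device, e.g. a switch."""
    name: str
    state: str = "OK"
    received: list = field(default_factory=list)

    def get_state(self) -> str:
        return self.state

    def send_command(self, command: str) -> None:
        # A real device unit would act on hardware; this sketch only records the command.
        self.received.append(command)


@dataclass
class ControlUnit:
    """Inner node: summarizes the sub-tree below it."""
    name: str
    children: list = field(default_factory=list)

    def get_state(self) -> str:
        # Status propagates upward: report the most severe state among the children.
        states = [child.get_state() for child in self.children]
        return max(states, key=SEVERITY.__getitem__, default="OK")

    def send_command(self, command: str) -> None:
        # Commands propagate downward to every child.
        for child in self.children:
            child.send_command(command)


# Hypothetical tree: one top node with a DAQ and a control sub-tree.
sw1 = DeviceUnit("daq-switch-01")
sw2 = DeviceUnit("daq-switch-02", state="WARNING")
ctrl = DeviceUnit("ctrl-switch-01")
daq = ControlUnit("DAQ_NETWORK", [sw1, sw2])
top = ControlUnit("NETWORK", [daq, ControlUnit("CTRL_NETWORK", [ctrl])])
```

Calling `top.get_state()` summarizes the whole tree in one value, which is the kind of view a non-expert shift crew needs, while `top.send_command(...)` reaches every device unit below it.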

11 Network Monitoring System
The major monitored items:
- Physical topology:
  - Discovery of the network topology based on the Link Layer Discovery Protocol (LLDP)
  - Discovery of the network nodes based on information in the switches (ARP and MAC forwarding tables)
- Traffic:
  - Octet / packet counters
  - Discard / error counters, ...
- Switch status: CPU / memory, temperature, power supply, ...
- Data paths for DAQ
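Octet counters only become useful once two successive readings are turned into a rate. A minimal sketch of that step, assuming the counters wrap at a fixed width (32 bits for IF-MIB objects such as `ifInOctets`, 64 bits for `ifHCInOctets`) and that at most one wrap happens between two polls. The function names are hypothetical, and the slides do not say which counters or polling intervals are actually used.

```python
def counter_delta(previous: int, current: int, bits: int = 32) -> int:
    """Difference between two readings of a counter that wraps at 2**bits."""
    modulus = 1 << bits
    # Modular subtraction gives the right answer across a single wrap-around.
    return (current - previous) % modulus


def rate_bits_per_second(previous: int, current: int,
                         interval_s: float, bits: int = 32) -> float:
    """Average throughput between two octet-counter readings."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return counter_delta(previous, current, bits) * 8 / interval_s
```

On a 10G link a 32-bit octet counter can wrap in a few seconds, so with slow polling the single-wrap assumption only holds for 64-bit counters.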

12 Network Monitoring Snapshot (1): Topology

13 Network Monitoring Snapshot (2): Traffic

14 Summary
- The network management system has been implemented on the commercial SCADA system PVSS II and the JCOP framework.
- It provides sophisticated monitoring of the network, which is essential for our operation, e.g. switch status and traffic.
- It also provides a homogeneous operation interface and an intuitive display.
- Currently only monitoring is provided; some switch control commands are still to be integrated.

15 Thanks for your attention!

16 Backup

17 NMS Architecture: front-end processes
- SNMP (Simple Network Management Protocol): used for general network monitoring and configuration.
- sFlow:
  - A sampling mechanism to capture traffic data
  - Implemented in hardware
  - Two kinds of sFlow samples: flow samples and counter samples
  - Used on the core switch to collect traffic counters, because SNMP is too slow there and consumes too much CPU and memory
- Syslog: event notification messages
  - Three distinct parts: priority, header and message
  - The priority part encodes both the facility and the severity of the message
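The way the syslog priority part carries two values can be shown with a short sketch. In the standard syslog encoding the priority value is `facility * 8 + severity`, so both fields are recovered with one integer division. The function name and the returned tuple layout are choices made for this example only.

```python
# Standard syslog severity names, indexed by severity code 0..7.
SEVERITIES = [
    "emergency", "alert", "critical", "error",
    "warning", "notice", "informational", "debug",
]


def decode_priority(pri: int) -> tuple:
    """Split a syslog priority value into (facility, severity, severity name)."""
    if not 0 <= pri <= 191:
        raise ValueError("syslog priority must be in the range 0..191")
    # PRI = facility * 8 + severity, so divmod recovers both parts.
    facility, severity = divmod(pri, 8)
    return facility, severity, SEVERITIES[severity]
```

For example, a message starting with `<34>` decodes to facility 4 and severity 2 (critical).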

18 Network Monitoring: hardware/system
- Syslog can collect some information not covered by SNMP.
- A syslog server is set up to receive the syslog messages from the network devices and parse them.
- Alarm information:
  - Hardware: temperature, fan status, power supply status
  - System: CPU, memory, login authentication, etc.
- All messages with a priority higher than warning are sent to PVSS for further processing.
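The forwarding rule on this slide can be sketched as a small filter. It assumes that "higher than warning" means more severe than warning, i.e. a numeric severity below 4 in the standard syslog scale (error, critical, alert, emergency). The function name is hypothetical, and the actual parser running on the syslog server is not shown in the slides.

```python
import re

WARNING = 4  # standard syslog severity code for "warning"
PRI_PATTERN = re.compile(r"^<(\d{1,3})>")


def should_forward(raw_message: str) -> bool:
    """Return True if a raw syslog line is more severe than warning."""
    match = PRI_PATTERN.match(raw_message)
    if match is None:
        # No leading <PRI> field: treat the line as malformed and drop it.
        return False
    severity = int(match.group(1)) % 8  # the low three bits hold the severity
    # Lower numbers are more severe, so "higher than warning" means < 4.
    return severity < WARNING
```

With this rule a line beginning with `<187>` (severity 3, error) is forwarded, while one beginning with `<188>` (severity 4, warning) is not.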

19 Network Monitoring: IP routing
[Diagram: the LHCb online system data flow, with the three DAQ stages marked 1, 2 and 3.]
- The status of the routing is monitored using "ping" / "arping".
- Three stages for the DAQ:
  1. From the readout boards to the HLT farm
  2. From the HLT farm to the LHCb online storage
  3. From the online storage to CERN CASTOR
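A reachability check along the three stages could look like the sketch below. It is an illustration only: the host names are invented, the `ping` flags are those of the Linux iputils version (`-c` for count, `-W` for timeout in seconds), and the slides do not describe how the real checks are scheduled or reported.

```python
import subprocess
from typing import Callable, Dict, List


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo request with the system ping (Linux iputils flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


# Hypothetical destination hosts to probe for each DAQ stage.
STAGES: Dict[str, List[str]] = {
    "1: readout board -> HLT farm": ["hlt-node-a", "hlt-node-b"],
    "2: HLT farm -> online storage": ["storage-01"],
    "3: online storage -> CASTOR": ["castor-gw"],
}


def check_stages(stages: Dict[str, List[str]],
                 probe: Callable[[str], bool] = ping_once) -> Dict[str, List[str]]:
    """Return, for each stage, the hosts that did not answer the probe."""
    return {
        stage: [host for host in hosts if not probe(host)]
        for stage, hosts in stages.items()
    }
```

Passing the probe as a parameter keeps the stage logic independent of the transport, so an `arping`-based probe or a test stub can be substituted without touching `check_stages`.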

