1 Management of the LHCb DAQ Network
Guoming Liu*†, Niko Neufeld*
* CERN, Switzerland
† University of Ferrara, Italy

2 Guoming Liu 2 RT2009 Outline
- Introduction to LHCb DAQ system
- Network Monitoring based on SCADA system
- Network configuring
- Network Debugging
- Status of LHCb network installation and deployment

3 LHCb online system
The LHCb Online system consists of three major components:
- Data Acquisition (DAQ): transfers the event data from the detector front-end electronics to permanent storage
- Timing and Fast Control (TFC): drives all stages of the data readout of the LHCb detector between the front-end electronics and the online processing farm
- Experiment Control System (ECS): controls and monitors all parts of the experiment: the DAQ system, the TFC system, the High Level Trigger farm, the Detector Control System, the experiment's infrastructure, etc.

4 LHCb online system

5 LHCb DAQ network
Components:
- Readout boards: TELL1/UKL1, ~330 in total
- Aggregation switches
- Core DAQ switch: Force10 E1200i
  - Supports up to 1260 GbE ports
  - Switch capacity: 3.5 Tb/s
- 50 edge switches
[Figure: DAQ network layout: ~330 readout boards, aggregation switches, core switch, 50 edge switches, HLT CPUs, storage aggregation, CASTOR]

6 LHCb DAQ network
- The DAQ works in push mode
- Protocols
  - Readout: MEP, a light-weight datagram protocol over IP
  - Storage: standard TCP/IP
- Network throughput
  - Readout: ~35 GByte/s (~280 Gb/s)
    - First-level trigger accept rate: 1 MHz
    - Average event size: ~35 kByte
  - Storage: ~70 MByte/s (~560 Mb/s)
    - HLT accept rate: ~2 kHz
[Figure: DAQ network layout: ~330 readout boards, aggregation switches, core switch, 50 edge switches, HLT CPUs, storage aggregation, CASTOR]
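
The throughput figures on this slide follow from rate times event size. A minimal Python check, assuming decimal units (1 kByte = 1000 bytes) and assuming that events written to storage have the same ~35 kByte average size as those read out:

```python
# Back-of-the-envelope check of the throughput figures quoted on the slide.
# Assumptions: decimal units, and the same average event size for readout
# and storage.
L0_ACCEPT_RATE_HZ = 1_000_000    # first-level trigger accept rate: 1 MHz
HLT_ACCEPT_RATE_HZ = 2_000       # HLT accept rate: ~2 kHz
AVG_EVENT_SIZE_BYTES = 35_000    # average event size: ~35 kByte


def throughput_bytes_per_s(rate_hz: int, event_size_bytes: int) -> int:
    """Average data rate = event rate x average event size."""
    return rate_hz * event_size_bytes


def to_bits_per_s(bytes_per_s: int) -> int:
    """Convert a byte rate to a bit rate."""
    return bytes_per_s * 8


readout = throughput_bytes_per_s(L0_ACCEPT_RATE_HZ, AVG_EVENT_SIZE_BYTES)
storage = throughput_bytes_per_s(HLT_ACCEPT_RATE_HZ, AVG_EVENT_SIZE_BYTES)

# Prints: readout: 35 GByte/s = 280 Gb/s
print(f"readout: {readout / 1e9:.0f} GByte/s = {to_bits_per_s(readout) / 1e9:.0f} Gb/s")
# Prints: storage: 70 MByte/s = 560 Mb/s
print(f"storage: {storage / 1e6:.0f} MByte/s = {to_bits_per_s(storage) / 1e6:.0f} Mb/s")
```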

8 Network monitoring
- DAQ network
  - Large-scale LAN: ~54 switches, ~3500 GbE ports
  - Performance critical
- The status of the whole DAQ network must be monitored at different levels for different requirements
- DAQ network monitoring is part of the LHCb ECS
  - Uses the same tool (PVSS) and framework (JCOP)
    - PVSS: commercial SCADA system
    - JCOP: Joint Controls Project for the LHC experiments
  - Provides the same operation interface

9 Network Monitoring
- Monitored items
  - Topology
  - Traffic
  - IP routing
  - Hardware/system
- Tools
  - Data collectors: various front-end processors based on SNMP and syslog
  - Data communication: DIM client/server mechanism
    - Server: publishes information as services
    - Client: subscribes to the services
[Figure: Architecture of the network monitoring (DIM; SNMP / Syslog)]
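
The publish/subscribe pattern described above can be sketched in a few lines of Python. This is a toy, in-process stand-in and not the DIM API: the `ServiceRegistry` class, its methods, and the service name are all hypothetical and only illustrate a server publishing a named service and a client subscribing to it.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Callback = Callable[[str, Any], None]


class ServiceRegistry:
    """Toy in-process publish/subscribe registry (hypothetical, not DIM)."""

    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, service: str, callback: Callback) -> None:
        """Register a client callback; deliver the last value if one exists."""
        self._subscribers[service].append(callback)
        if service in self._values:
            callback(service, self._values[service])

    def publish(self, service: str, value: Any) -> None:
        """Store the new value and push it to every subscribed client."""
        self._values[service] = value
        for callback in self._subscribers[service]:
            callback(service, value)


registry = ServiceRegistry()
received = []

# Client side: subscribe to a (hypothetical) per-port counter service.
registry.subscribe("switch01/port3/in_octets",
                   lambda name, value: received.append((name, value)))

# Server side: a data collector publishes a fresh reading.
registry.publish("switch01/port3/in_octets", 123456)
print(received)
```

The point of the pattern is that collectors and consumers only share service names, so a new collector can be added without changing the consumer side.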

10 Network Monitoring (1): Topology
- NeDi: an open source tool to discover the network
  - Discovery of the network topology based on the Link Layer Discovery Protocol (LLDP): seed → neighbors → neighbors of those neighbors → end (all devices are discovered)
  - Discovery of the network nodes
  - Certain modifications have been made for the LHCb network environment
- All information is sent to PVSS
- PVSS monitors any change of the topology
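
The seed-to-neighbors discovery described above is a breadth-first traversal. The sketch below is not NeDi's code; it assumes a hypothetical `get_neighbors` callable that returns the LLDP neighbor names of a device, and uses invented device names. It also shows one simple way to detect a topology change by comparing two link sets.

```python
from collections import deque
from typing import Callable, Iterable, Set, Tuple

Link = Tuple[str, ...]


def discover(seed: str,
             get_neighbors: Callable[[str], Iterable[str]]) -> Tuple[Set[str], Set[Link]]:
    """Breadth-first discovery: seed, its neighbors, their neighbors, and so on."""
    seen = {seed}
    links: Set[Link] = set()
    queue = deque([seed])
    while queue:
        device = queue.popleft()
        for neighbor in get_neighbors(device):
            # Store each link once, regardless of which end reported it.
            links.add(tuple(sorted((device, neighbor))))
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen, links


def topology_changes(old: Set[Link], new: Set[Link]) -> Tuple[Set[Link], Set[Link]]:
    """Return (added links, removed links) between two discovery runs."""
    return new - old, old - new


# Hypothetical LLDP neighbor tables, keyed by device name.
lldp = {
    "core": ["agg1", "agg2", "edge1"],
    "agg1": ["core"],
    "agg2": ["core"],
    "edge1": ["core", "edge2"],
    "edge2": ["edge1"],
}
devices, links = discover("core", lambda d: lldp.get(d, []))
print(sorted(devices))
print(sorted(links))
```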

11 Network Monitoring (2): traffic
- Traffic monitoring is based on SNMP (Simple Network Management Protocol)
- The SNMP driver provided by PVSS has low performance
- Custom SNMP collectors:
  - Collect all the interface counters from the network devices
    - Input and output traffic
    - Input and output errors, discards
  - Publish the data for PVSS as a DIM server
- PVSS:
  - Receives the data via the PVSS-DIM bridge
  - Analyzes the traffic and archives it
  - Displays the current status and the trend of the bandwidth utilization
  - Issues an alarm in case of errors
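
Bandwidth utilization is typically derived from two successive readings of an interface octet counter. A minimal sketch, assuming the collector reads counters such as the IF-MIB `ifInOctets` (32-bit) or `ifHCInOctets` (64-bit); the slides do not say which counters the LHCb collectors use, and the function names here are hypothetical.

```python
def counter_delta(previous: int, current: int, bits: int = 64) -> int:
    """Difference between two counter readings, tolerating one wrap-around."""
    return (current - previous) % (1 << bits)


def utilization(prev_octets: int, curr_octets: int, interval_s: float,
                link_speed_bps: float, bits: int = 64) -> float:
    """Fraction of the link capacity used between two counter samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    bits_transferred = counter_delta(prev_octets, curr_octets, bits) * 8
    return bits_transferred / (interval_s * link_speed_bps)


# 625,000,000 octets in 10 s on a 1 Gb/s port is 50% utilization.
print(utilization(1_000_000, 626_000_000, 10.0, 1_000_000_000))  # 0.5

# A 32-bit counter that wrapped between the two samples still gives 500.
print(counter_delta(2**32 - 100, 400, bits=32))  # 500
```

With 32-bit counters on a GbE port the polling interval must stay well below the wrap time (about 34 s at line rate), which is one reason 64-bit counters are usually preferred.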

12 Network Monitoring (2): traffic

13 Network Monitoring (3): IP routing
- The status of the routing is monitored using "ping"
- Three stages for the DAQ:
  1. From the readout boards to the HLT farm. ICMP is not fully supported by the readout boards, so a normal computer is inserted to simulate a readout board.
  2. From the HLT farm to the LHCb online storage
  3. From the online storage to CERN CASTOR
- The front-end script gets the results and sends a summary to PVSS using DIM
[Figure: DAQ network layout (~330 readout boards, aggregation switches, core switch, HLT CPUs, storage aggregation, CASTOR) with the three stages marked 1, 2, 3]
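
The per-stage summary could look roughly like the sketch below. It is not the LHCb front-end script: the stage names, host names, and the `probe` callable are hypothetical, and the probe is injected so that in practice it could wrap the system ping command while here it is replaced by a fake.

```python
from typing import Callable, Dict, List, NamedTuple


class Stage(NamedTuple):
    name: str           # label of the DAQ stage
    source: str         # host the ping is sent from
    targets: List[str]  # hosts that must answer


Probe = Callable[[str, str], bool]  # (source, target) -> reachable?


def check_stages(stages: List[Stage], probe: Probe) -> Dict[str, Dict[str, object]]:
    """Probe every target of every stage and build a per-stage summary."""
    summary: Dict[str, Dict[str, object]] = {}
    for stage in stages:
        unreachable = [t for t in stage.targets if not probe(stage.source, t)]
        summary[stage.name] = {"ok": not unreachable, "unreachable": unreachable}
    return summary


# Hypothetical hosts; "readout-sim" stands for the computer that simulates
# a readout board.
stages = [
    Stage("readout->HLT", "readout-sim", ["hlt-a01", "hlt-a02"]),
    Stage("HLT->storage", "hlt-a01", ["store-01"]),
    Stage("storage->CASTOR", "store-01", ["castor-gw"]),
]

# Fake probe for the example: pretend one HLT node does not answer.
down = {("readout-sim", "hlt-a02")}
summary = check_stages(stages, lambda src, dst: (src, dst) not in down)
print(summary)
```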

14 Network Monitoring (4): hardware/system
- Syslog can collect some information not covered by SNMP
- A syslog server is set up to receive the syslog messages from the network devices and parse them. Alarm information:
  - Hardware: temperature, fan status, power supply status
  - System: CPU, memory, login authentication, etc.
- All messages with a priority higher than warning are sent to PVSS for further processing
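
The priority filter can be sketched as follows, assuming the devices send the standard syslog `<PRI>` header, where PRI = facility × 8 + severity and a numerically lower severity is more urgent (warning is 4). Reading "higher than warning" as severities 0 to 3 is an interpretation of the slide, and the sample messages are invented.

```python
import re
from typing import Optional, Tuple

WARNING = 4  # syslog severity code for "warning"
_PRI = re.compile(r"^<(\d{1,3})>")


def parse_pri(message: str) -> Optional[Tuple[int, int]]:
    """Return (facility, severity) from the <PRI> header, or None if absent."""
    match = _PRI.match(message)
    if match is None:
        return None
    pri = int(match.group(1))
    if pri > 191:  # largest valid value: facility 23, severity 7
        return None
    return divmod(pri, 8)


def should_forward(message: str) -> bool:
    """True for messages more urgent than warning (severity 0 to 3)."""
    parsed = parse_pri(message)
    return parsed is not None and parsed[1] < WARNING


# Invented sample messages: 187 = local7.error, 190 = local7.info.
print(should_forward("<187>Jun 10 12:00:00 sw-01 fan tray 1 failed"))  # True
print(should_forward("<190>Jun 10 12:00:05 sw-01 user logged in"))     # False
```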

16 Network configuring
- The LHCb online network system is quite large:
  - Different devices with different OS and command sets
  - But quite static: only a few features are essential for configuring the network devices
- Currently a set of Python scripts is used for configuring the network devices
  - Initial setup for newly installed switches
  - Firmware upgrade
  - Configuration file backup and restore
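
As an illustration of the backup and restore step only, the sketch below stores a device configuration on disk when it differs from the latest saved copy and reads the latest copy back. It is not the LHCb scripts: how the configuration text is fetched from or pushed to a switch is left out, and the function names, directory layout, and device name are hypothetical.

```python
import hashlib
import tempfile
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional


def backup_config(device: str, config_text: str, backup_dir: str,
                  now: Optional[datetime] = None) -> Optional[Path]:
    """Save config_text under backup_dir/device; skip if unchanged."""
    device_dir = Path(backup_dir) / device
    device_dir.mkdir(parents=True, exist_ok=True)
    data = config_text.encode("utf-8")
    existing = sorted(device_dir.glob("*.cfg"))  # timestamped names sort by age
    if existing and (hashlib.sha256(existing[-1].read_bytes()).digest()
                     == hashlib.sha256(data).digest()):
        return None  # identical to the latest backup
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%dT%H%M%S")
    path = device_dir / f"{stamp}.cfg"
    path.write_bytes(data)
    return path


def latest_backup(device: str, backup_dir: str) -> Optional[str]:
    """Return the most recent saved configuration, or None if there is none."""
    existing = sorted((Path(backup_dir) / device).glob("*.cfg"))
    return existing[-1].read_bytes().decode("utf-8") if existing else None


# Usage with placeholder configuration text and a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    t0 = datetime(2009, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
    first = backup_config("edge-sw-01", "config-version-1\n", tmp, now=t0)
    unchanged = backup_config("edge-sw-01", "config-version-1\n", tmp,
                              now=t0 + timedelta(hours=1))
    changed = backup_config("edge-sw-01", "config-version-2\n", tmp,
                            now=t0 + timedelta(hours=2))
    restored = latest_backup("edge-sw-01", tmp)
    first_name, changed_name = first.name, changed.name
    print(first_name, unchanged, changed_name)
```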

18 Network Debugging Tools
- Motivation: debugging DAQ network problems, mainly packet drops
1. High-speed traffic monitoring
  - Queries the counters of selected interfaces using SNMP or the CLI with a better time resolution
  - Shows the bandwidth utilization of the selected interfaces
2. sFlow sampler
  - sFlow is a mechanism to capture packet headers and collect statistics from the device, especially in high-speed networks
  - It is very useful for debugging packet drops caused by a wrong IP or MAC address
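
Once packet headers have been sampled, spotting a wrong destination address is a table lookup. The sketch below does not decode sFlow datagrams; it assumes the samples are already available as (destination IP, destination MAC) pairs and that an expected IP-to-MAC table exists. The type, function name, and all addresses are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Tuple


class SampledHeader(NamedTuple):
    dst_ip: str
    dst_mac: str


def find_mismatches(samples: Iterable[SampledHeader],
                    expected_mac_by_ip: Dict[str, str]) -> Counter:
    """Count sampled packets whose destination IP or MAC is not as expected."""
    problems: Counter = Counter()
    for sample in samples:
        expected = expected_mac_by_ip.get(sample.dst_ip)
        if expected is None:
            key: Tuple[str, str, str] = (
                sample.dst_ip, sample.dst_mac, "unknown destination IP")
            problems[key] += 1
        elif expected.lower() != sample.dst_mac.lower():
            problems[(sample.dst_ip, sample.dst_mac,
                      "unexpected destination MAC")] += 1
    return problems


# Hypothetical address table and samples (documentation address ranges).
expected = {"192.0.2.10": "02:00:00:00:00:0a", "192.0.2.11": "02:00:00:00:00:0b"}
samples = [
    SampledHeader("192.0.2.10", "02:00:00:00:00:0A"),  # fine, case differs only
    SampledHeader("192.0.2.11", "02:00:00:00:00:ff"),  # wrong MAC
    SampledHeader("192.0.2.11", "02:00:00:00:00:ff"),  # wrong MAC again
    SampledHeader("192.0.2.99", "02:00:00:00:00:0c"),  # IP not in the table
]
problems = find_mismatches(samples, expected)
for key, count in problems.items():
    print(key, count)
```

Because sFlow samples only a fraction of the packets, the counts indicate which destinations are affected rather than how many packets were dropped.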

20 Status of Network Installation and Deployment
- Current setup:
  - With 2 aggregation switches
  - Only 2 linecards inserted in the core DAQ switch
  - For an L0 trigger rate of ~200 kHz
- Upgrade for 1 MHz full-speed readout:
  - Core DAQ switch: Force10 E1200i
    - 14 linecards, 1260 GbE ports, will be ready at the end of June
    - Upgrade from Terascale to Exascale: doubles the switch capacity, and all ports run at line rate
  - All readout boards will be connected to the core DAQ switch directly
[Figure: DAQ network layout: ~330 readout boards, aggregation switches, core switch, 50 edge switches, HLT CPUs, storage aggregation, CASTOR]
