1
Management of the LHCb DAQ Network
Guoming Liu*†, Niko Neufeld*
* CERN, Switzerland
† University of Ferrara, Italy
2
Guoming Liu 2 RT2009
Outline
- Introduction to the LHCb DAQ system
- Network monitoring based on a SCADA system
- Network configuring
- Network debugging
- Status of the LHCb network installation and deployment
3
LHCb online system
The LHCb online system consists of three major components:
- Data Acquisition (DAQ): transfers the event data from the detector front-end electronics to permanent storage.
- Timing and Fast Control (TFC): drives all stages of the data readout of the LHCb detector between the front-end electronics and the online processing farm.
- Experiment Control System (ECS): controls and monitors all parts of the experiment: the DAQ system, the TFC system, the High Level Trigger (HLT) farm, the Detector Control System, the experiment's infrastructure, etc.
4
LHCb online system
[Figure: LHCb online system]
5
LHCb DAQ network
Components:
- Readout boards: TELL1/UKL1, ~330 in total
- Aggregation switches
- Core DAQ switch: Force10 E1200i, supporting up to 1260 GbE ports, with a switch capacity of 3.5 Tb/s
- 50 edge switches
[Figure: DAQ network layout showing ~330 readout boards, aggregation switches, the core switch, 50 edge switches, HLT CPUs, storage aggregation and CASTOR]
6
LHCb DAQ network
[Figure: the same DAQ network layout as on the previous slide]
- The DAQ works in push mode.
- Protocols:
  - Readout: MEP, a lightweight datagram protocol over IP
  - Storage: standard TCP/IP
- Network throughputs:
  - Readout: ~35 GByte/s (~280 Gb/s), from a first-level trigger accept rate of 1 MHz and an average event size of ~35 kByte
  - Storage: ~70 MByte/s (~560 Mb/s), from an HLT accept rate of ~2 kHz with the same ~35 kByte event size
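Both throughput figures follow from event rate × average event size. A quick Python check using only the numbers quoted on the slide, assuming decimal units (1 kByte = 1000 bytes) and the same ~35 kByte event size at the storage stage:

```python
# Back-of-the-envelope DAQ bandwidth check using the figures quoted on the slide.
# Assumption: decimal units (1 kByte = 1000 bytes, 1 Gb = 1e9 bits).

def throughput_bytes_per_s(rate_hz: float, event_size_bytes: float) -> float:
    """Aggregate throughput = event rate x average event size."""
    return rate_hz * event_size_bytes


EVENT_SIZE = 35e3                # ~35 kByte average event size
FIRST_LEVEL_ACCEPT_RATE = 1e6    # 1 MHz: rate of events read out into the DAQ network
HLT_ACCEPT_RATE = 2e3            # ~2 kHz: rate of events sent to storage

readout = throughput_bytes_per_s(FIRST_LEVEL_ACCEPT_RATE, EVENT_SIZE)
storage = throughput_bytes_per_s(HLT_ACCEPT_RATE, EVENT_SIZE)

print(f"Readout: {readout / 1e9:.0f} GByte/s = {readout * 8 / 1e9:.0f} Gb/s")
print(f"Storage: {storage / 1e6:.0f} MByte/s = {storage * 8 / 1e6:.0f} Mb/s")
```

This prints 35 GByte/s = 280 Gb/s for the readout and 70 MByte/s = 560 Mb/s for storage, matching the slide.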
8
Network monitoring
The DAQ network:
- is a large-scale LAN: ~54 switches, ~3500 GbE ports
- is performance critical
The status of the whole DAQ network must be monitored at different levels to meet different requirements.
DAQ network monitoring is part of the LHCb ECS:
- It uses the same tool (PVSS) and framework (JCOP).
  - PVSS: a commercial SCADA system
  - JCOP: Joint Controls Project for the LHC experiments
- It provides the same operator interface.
9
Network Monitoring
Monitored items:
- Topology
- Traffic
- IP routing
- Hardware/system
Tools:
- Data collectors: various front-end processors based on SNMP and Syslog
- Data communication: DIM client/server mechanism
  - Server: publishes information as services
  - Client: subscribes to the services
[Figure: architecture of the network monitoring (labels: DIM, SNMP / Syslog)]
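The publish/subscribe idea behind DIM can be illustrated with a toy in-process model. This is a hypothetical sketch of the pattern only, not the DIM API: the `ServiceBus` class and the service names are invented for illustration, and it collapses service-name lookup and data delivery into one object for brevity.

```python
from collections import defaultdict
from typing import Any, Callable


class ServiceBus:
    """Toy model of named services: servers publish values, clients subscribe.

    Hypothetical illustration of the publish/subscribe pattern; not DIM's API.
    """

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)
        self._last_value: dict[str, Any] = {}

    def subscribe(self, service: str, callback: Callable[[Any], None]) -> None:
        """Client side: register interest in a named service."""
        self._subscribers[service].append(callback)
        # A late subscriber immediately receives the current value, if any.
        if service in self._last_value:
            callback(self._last_value[service])

    def publish(self, service: str, value: Any) -> None:
        """Server side: update a service and push the new value to all subscribers."""
        self._last_value[service] = value
        for callback in self._subscribers[service]:
            callback(value)


if __name__ == "__main__":
    bus = ServiceBus()
    # Hypothetical service name: input octet counter of one switch port.
    bus.subscribe("edge01/port1/in_octets", lambda v: print("PVSS got", v))
    bus.publish("edge01/port1/in_octets", 123456)   # prints: PVSS got 123456
```

The point of the pattern is decoupling: collectors only publish named services, and PVSS (or any other client) decides which services to follow.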
10
Network Monitoring (1): Topology
NeDi: an open-source tool to discover the network
- Discovery of the network topology based on the Link Layer Discovery Protocol (LLDP): seed → neighbors → neighbors of those neighbors → ... → end (all devices are discovered)
- Discovery of the network nodes
- Certain modifications have been made for the LHCb network environment.
All information is sent to PVSS, which monitors any change in the topology.
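The seed → neighbors → neighbors-of-neighbors procedure is a breadth-first traversal. A minimal Python sketch, assuming a hypothetical `get_neighbors` callback that returns the LLDP neighbor names of a device (in practice that data would be read from each switch's LLDP table); this is not NeDi's code, and the device names are invented:

```python
from collections import deque
from typing import Callable, Iterable


def discover(seeds: Iterable[str],
             get_neighbors: Callable[[str], Iterable[str]]) -> dict[str, set[str]]:
    """Breadth-first discovery: seeds -> neighbors -> neighbors of neighbors -> ...

    Stops when no undiscovered device remains; returns the adjacency map.
    """
    queue = deque(seeds)
    seen = set(queue)
    topology: dict[str, set[str]] = {}
    while queue:
        device = queue.popleft()
        neighbors = set(get_neighbors(device))   # e.g. from the device's LLDP table
        topology[device] = neighbors
        for n in neighbors:
            if n not in seen:                     # visit every device only once
                seen.add(n)
                queue.append(n)
    return topology


if __name__ == "__main__":
    # Hypothetical LLDP neighbor table standing in for real switch queries.
    lldp = {
        "core": ["agg1", "agg2", "edge01"],
        "agg1": ["core"],
        "agg2": ["core"],
        "edge01": ["core", "edge02"],
        "edge02": ["edge01"],
    }
    topo = discover(["core"], lambda d: lldp.get(d, []))
    print(sorted(topo))
```

Comparing two successive adjacency maps is then enough to detect the topology changes that PVSS reports.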
11
Network Monitoring (2): traffic
Traffic monitoring is based on SNMP (Simple Network Management Protocol). The SNMP driver provided by PVSS has low performance, so custom SNMP collectors are used. They:
- collect all the interface counters from the network devices: input and output traffic, input and output errors and discards
- publish the data for PVSS as a DIM server
PVSS:
- receives the data via the PVSS-DIM bridge
- analyzes and archives the traffic data
- displays the current status and the trend of the bandwidth utilization
- issues alarms in case of errors
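Bandwidth utilization is derived from two successive polls of an interface's octet counter. A minimal sketch, assuming the standard IF-MIB octet counters (32-bit `ifInOctets`/`ifOutOctets` or 64-bit `ifHCInOctets`/`ifHCOutOctets`) and at most one counter wrap between polls; the function names are hypothetical, not those of the LHCb collectors:

```python
def counter_delta(prev: int, curr: int, bits: int = 64) -> int:
    """Increase of a wrapping SNMP counter between two polls.

    Assumes at most one wrap between polls (poll often enough, or use 64-bit counters).
    """
    return (curr - prev) % (1 << bits)


def utilization(prev_octets: int, curr_octets: int, interval_s: float,
                if_speed_bps: float, bits: int = 64) -> float:
    """Fraction of link capacity used between two octet-counter samples."""
    if interval_s <= 0 or if_speed_bps <= 0:
        raise ValueError("interval and interface speed must be positive")
    bits_sent = counter_delta(prev_octets, curr_octets, bits) * 8
    return bits_sent / (interval_s * if_speed_bps)


if __name__ == "__main__":
    # 1 GbE port polled every 10 s; the counter advanced by 625,000,000 octets.
    print(f"{utilization(0, 625_000_000, 10, 1_000_000_000):.0%}")   # 50%
    # A 32-bit counter that wrapped between polls.
    print(counter_delta(2**32 - 10, 5, bits=32))                     # 15
```

The same delta computation applies to the error and discard counters, where any non-zero increase can raise an alarm.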
12
Network Monitoring (2): traffic
[Figure: traffic monitoring display]
13
Network Monitoring (3): IP routing
The status of the routing is monitored using "ping" in three stages of the DAQ:
1. From the readout boards to the HLT farm. ICMP is not fully supported by the readout boards, so a normal computer is inserted to simulate a readout board.
2. From the HLT farm to the LHCb online storage.
3. From the online storage to CERN CASTOR.
A front-end script collects the results and sends a summary to PVSS using DIM.
[Figure: DAQ network layout with the three monitored paths marked 1, 2 and 3]
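A front-end check of one routing stage might look like the sketch below. It assumes the Linux iputils `ping` (`-c` count, `-W` timeout in seconds); the stage and host names are hypothetical, and publishing the summary to PVSS over DIM is not shown. The probe is injectable so the summary logic can be tested without a network.

```python
import subprocess
from typing import Callable, Iterable


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo request with Linux iputils `ping`; True if a reply arrived."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def summarize_stage(stage: str, hosts: Iterable[str],
                    probe: Callable[[str], bool] = ping_once) -> dict:
    """Ping every host of one routing stage and build a compact summary."""
    hosts = list(hosts)
    failed = [h for h in hosts if not probe(h)]
    return {"stage": stage, "total": len(hosts), "failed": failed, "ok": not failed}


if __name__ == "__main__":
    # Hypothetical targets; a real script would publish this summary via DIM.
    print(summarize_stage("storage->CASTOR", ["castor-gw.example"]))
```

Sending only a per-stage summary (rather than every ping result) keeps the load on PVSS small while still pointing at the failing hosts.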
14
Network Monitoring (4): hardware/system
Syslog can collect some information not covered by SNMP. A syslog server is set up to receive the syslog messages from the network devices and parse them. Alarm information:
- Hardware: temperature, fan status, power supply status
- System: CPU, memory, login authentication, etc.
All messages with a priority higher than warning are sent to PVSS for further processing.
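Filtering by priority means decoding the `<PRI>` field at the start of each syslog message, where PRI = facility × 8 + severity and a lower severity number is more severe (RFC 5424: 0 = emergency ... 4 = warning ... 7 = debug). A minimal sketch, assuming "higher than warning" means strictly more severe (severity 0 to 3); use `<=` instead if warnings themselves should be forwarded. The example message is invented.

```python
import re
from typing import Optional

WARNING = 4                          # syslog severity code for "warning"
PRI_RE = re.compile(r"^<(\d{1,3})>")


def parse_pri(message: str) -> Optional[tuple[int, int]]:
    """Return (facility, severity) from the leading <PRI> field, or None if absent."""
    match = PRI_RE.match(message)
    if match is None:
        return None
    pri = int(match.group(1))
    if pri > 191:                    # valid PRI range: facility 0-23, severity 0-7
        return None
    return pri // 8, pri % 8


def should_forward(message: str) -> bool:
    """Forward only messages more severe than 'warning' (severity 0-3)."""
    parsed = parse_pri(message)
    return parsed is not None and parsed[1] < WARNING


if __name__ == "__main__":
    print(should_forward("<187>edge01: fan tray failure"))   # True (facility 23, severity 3)
    print(should_forward("<190>edge01: user logged in"))     # False (severity 6, info)
```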
16
Network configuring
The LHCb online network system is quite large, with different devices running different operating systems and command sets. However, it is quite static, and only a few features are essential for configuring the network devices.
Currently, a set of Python scripts is used for configuring the network devices:
- initial setup of newly installed switches
- firmware upgrades
- configuration file backup and restore
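As an illustration of the backup task, the sketch below stores a switch configuration only when it has changed since the last backup. It is a hypothetical example, not the LHCb scripts: `fetch_config` stands for a vendor-specific function (for example one that runs the CLI command printing the running configuration) and is stubbed here, and the directory layout is invented.

```python
import hashlib
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Callable, Optional


def backup_config(switch: str,
                  fetch_config: Callable[[str], str],
                  backup_dir: Path) -> Optional[Path]:
    """Save the switch configuration if it changed since the last backup.

    `fetch_config` is a hypothetical, vendor-specific callback (not shown).
    Returns the path of the new backup file, or None if nothing changed.
    """
    config = fetch_config(switch)
    digest = hashlib.sha256(config.encode()).hexdigest()
    device_dir = backup_dir / switch
    device_dir.mkdir(parents=True, exist_ok=True)

    latest = device_dir / "latest.sha256"
    if latest.exists() and latest.read_text() == digest:
        return None                              # unchanged: keep the existing backup

    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    path = device_dir / f"{switch}-{stamp}-{digest[:8]}.cfg"
    path.write_text(config)
    latest.write_text(digest)                    # remember what was saved last
    return path


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    stub = lambda name: f"hostname {name}\n"     # stand-in for a real CLI session
    print(backup_config("edge01", stub, root))   # new file
    print(backup_config("edge01", stub, root))   # None: configuration unchanged
```

Restoring is the reverse operation (pushing a saved file back to the device) and is equally vendor-specific, which is why it is not sketched here.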
18
Network Debugging Tools
Motivation: debugging DAQ network problems, mainly packet dropping.
1. High-speed traffic monitoring: queries the counters of selected interfaces using SNMP or the CLI with a better time resolution, and shows the bandwidth utilization of the selected interfaces.
2. sFlow sampler: sFlow is a mechanism to capture packet headers and collect statistics from the devices, especially in high-speed networks. It is very useful for debugging packet dropping caused by wrong IP or MAC addresses.
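With 1-in-N packet sampling, each sampled header stands for roughly N real packets, so scaled sample counts give an estimate of how much traffic goes to wrong addresses. A minimal sketch, assuming the sampled headers have already been decoded into (destination MAC, destination IP) pairs and that lists of expected addresses are available; the names, addresses and sampling rate are hypothetical:

```python
from collections import Counter
from typing import Iterable, NamedTuple


class FlowSample(NamedTuple):
    """Destination fields of one sampled packet header (a small subset of a sample)."""
    dst_mac: str
    dst_ip: str


def find_unexpected(samples: Iterable[FlowSample], known_macs: set[str],
                    known_ips: set[str], sampling_rate: int) -> dict[str, int]:
    """Estimate packet counts sent to unknown MAC or IP destinations.

    Each sample represents about `sampling_rate` packets, so the estimate is
    (number of matching samples) x sampling_rate.
    """
    counts = Counter()
    for s in samples:
        if s.dst_mac not in known_macs:
            counts[f"mac:{s.dst_mac}"] += 1
        if s.dst_ip not in known_ips:
            counts[f"ip:{s.dst_ip}"] += 1
    return {key: n * sampling_rate for key, n in counts.items()}


if __name__ == "__main__":
    samples = [
        FlowSample("00:aa", "10.0.0.1"),
        FlowSample("00:bb", "10.0.0.9"),   # wrong MAC and wrong IP
        FlowSample("00:aa", "10.0.0.1"),
    ]
    print(find_unexpected(samples, {"00:aa"}, {"10.0.0.1"}, sampling_rate=512))
```

Because only headers are sampled, this scales to multi-Gb/s links where capturing every packet would be impractical.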
20
Status of Network Installation and Deployment
Current setup:
- 2 aggregation switches
- Only 2 linecards inserted in the core DAQ switch
- For an L0 trigger rate of ~200 kHz
Upgrade for 1 MHz full-speed readout:
- Core DAQ switch: Force10 E1200i with 14 linecards and 1260 GbE ports, to be ready at the end of June
- Upgrade from TeraScale to ExaScale: doubles the switch capacity, and all ports run at line rate
- All readout boards will be connected to the core DAQ switch directly
[Figure: DAQ network layout showing ~330 readout boards, aggregation switches, the core switch, 50 edge switches, HLT CPUs, storage aggregation and CASTOR]