Management of the LHCb DAQ Network
Guoming Liu (CERN, Switzerland, and University of Ferrara, Italy), Niko Neufeld (CERN, Switzerland)

2 Outline
- Introduction to the LHCb DAQ system
- Network monitoring based on a SCADA system
- Network configuration
- Network debugging
- Status of the LHCb network installation and deployment

3 LHCb online system
The LHCb Online system consists of three major components:
- Data Acquisition (DAQ): transfers the event data from the detector front-end electronics to permanent storage
- Timing and Fast Control (TFC): drives all stages of the data readout of the LHCb detector between the front-end electronics and the online processing farm
- Experiment Control System (ECS): controls and monitors all parts of the experiment: the DAQ system, the TFC system, the High Level Trigger (HLT) farm, the Detector Control System, the experiment's infrastructure, etc.

4 LHCb online system

5 LHCb online network
Two large-scale Ethernet networks:
- DAQ network: dedicated to data acquisition
- Control network: for the instruments and computers of the LHCb experiment
In total: ~170 switches and ~9000 ports

6 LHCb DAQ network
- The DAQ works in a push mode
- Components:
  - Readout boards (TELL1/UKL1): ~330 in total
  - Aggregation switches
  - Core DAQ switch: Force10 E1200i, supporting up to 1260 GbE ports with a switching capacity of 3.5 Tb/s
  - Edge switches: 50
[Diagram: ~330 readout boards -> aggregation switches -> core switch -> 50 edge switches -> HLT CPUs; storage aggregation -> CASTOR]

7 LHCb DAQ network
[Diagram: same data path as on the previous slide, with ~280 Gb/s of readout traffic into the HLT farm and ~560 Mb/s towards storage]
- Protocols:
  - Readout: MEP, a light-weight datagram protocol over IP
  - Storage: standard TCP/IP
- Network throughput:
  - Readout: ~35 GByte/s (~280 Gb/s), from an L0 trigger accept rate of 1 MHz and an average event size of ~35 kByte
  - Storage: ~70 MByte/s (~560 Mb/s), from an HLT accept rate of ~2 kHz (see the short check below)
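These throughput figures follow directly from the trigger rates and the average event size; a quick numerical check in Python, using only the values quoted on the slide:

# Back-of-the-envelope check of the DAQ bandwidth figures quoted above.
l0_rate = 1.0e6          # L0 trigger accept rate [events/s]
hlt_rate = 2.0e3         # HLT accept rate [events/s]
event_size = 35e3        # average event size [bytes]

readout_bw = l0_rate * event_size    # bytes/s into the HLT farm
storage_bw = hlt_rate * event_size   # bytes/s towards storage

print(f"Readout: {readout_bw / 1e9:.0f} GB/s (~{readout_bw * 8 / 1e9:.0f} Gb/s)")
print(f"Storage: {storage_bw / 1e6:.0f} MB/s (~{storage_bw * 8 / 1e6:.0f} Mb/s)")
# -> Readout: 35 GB/s (~280 Gb/s), Storage: 70 MB/s (~560 Mb/s)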

8 Network Monitoring
- Part of the LHCb ECS:
  - uses the same tools and framework
  - provides the same operation interface
- Implementation:
  - Monitoring and integration: PVSS and the JCOP framework
  - Data collection: various front-end processors
  - Data exchange: Distributed Information Management (DIM)

9 Network Monitoring
[Figure: architecture of the network monitoring]
- Monitoring the status of the LHCb DAQ network at different levels:
  - Topology
  - IP routing
  - Traffic
  - Hardware/system

10 Network Monitoring
- Monitoring the status of the LHCb DAQ network at different levels:
  - Topology
  - IP routing
  - Traffic
  - Hardware/system
[Figure: structure of the Finite State Machine for network monitoring]

11 Network Monitoring: Topology
- The topology is quite "static"
- NeDi: an open-source tool used to discover the network
  - Discovery of the network topology based on the Link Layer Discovery Protocol (LLDP): it queries the neighbors of the seed device, then the neighbors of those neighbors, and so on until all devices in the network have been discovered (see the sketch below)
  - Discovery of the network nodes
  - All information is stored in the database and can be queried by PVSS
- PVSS monitors only the topology (the uplinks between the switches); the nodes are monitored by Nagios
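The neighbor-by-neighbor discovery described above is essentially a breadth-first search over LLDP adjacencies. A minimal illustrative sketch follows; the adjacency table and device names are made up, and NeDi itself obtains the real adjacencies from the devices (e.g. via SNMP) rather than from a static table:

from collections import deque

# Toy adjacency table standing in for real LLDP neighbor queries.
LLDP_NEIGHBORS = {
    "core":   ["aggr1", "aggr2"],
    "aggr1":  ["core", "edge01"],
    "aggr2":  ["core", "edge02"],
    "edge01": ["aggr1"],
    "edge02": ["aggr2"],
}

def discover_topology(seed):
    """Breadth-first discovery: visit the seed, then its neighbors,
    then their neighbors, until every reachable device is known."""
    visited, uplinks = set(), set()
    queue = deque([seed])
    while queue:
        device = queue.popleft()
        if device in visited:
            continue
        visited.add(device)
        for neighbor in LLDP_NEIGHBORS.get(device, []):
            uplinks.add(tuple(sorted((device, neighbor))))
            if neighbor not in visited:
                queue.append(neighbor)
    return visited, uplinks

print(discover_topology("core"))   # discovered devices and uplinks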

12 Network Monitoring: IP routing
- The status of the routing is monitored with the Internet Control Message Protocol (ICMP), specifically "ping"
- Three stages of the DAQ are checked:
  - The entire readout path from the readout boards to the HLT farm. Since ICMP is not fully implemented in the readout boards, a general-purpose computer is inserted to stand in for a readout board: the status of a readout board is tested with "arping", and the availability of the HLT nodes with "ping"
  - Selected events from the HLT to the LHCb online storage
  - From the online storage to CERN CASTOR
- The front-end script collects the results and exchanges the messages with PVSS using DIM
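Purely as an illustration of these probes, they could be scripted roughly as follows; the host names are made up, the standard Linux ping/arping binaries are assumed, and the DIM exchange with PVSS is only indicated in a comment (the actual front-end script may differ):

import subprocess

def ping_ok(host):
    """True if 'host' answers a single ICMP echo request."""
    return subprocess.call(["ping", "-c", "1", "-W", "1", host],
                           stdout=subprocess.DEVNULL) == 0

def arping_ok(host, iface="eth0"):
    """True if 'host' answers an ARP request (used for the readout boards)."""
    return subprocess.call(["arping", "-c", "1", "-I", iface, host],
                           stdout=subprocess.DEVNULL) == 0

readout_boards = ["tell1-01", "tell1-02"]   # hypothetical names
hlt_nodes = ["hltnode01", "hltnode02"]      # hypothetical names

status = {h: arping_ok(h) for h in readout_boards}
status.update({h: ping_ok(h) for h in hlt_nodes})
# 'status' would then be published to PVSS through a DIM service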

13 Network Monitoring: traffic
- Front-end processors:
  - collect all the interface counters from the network devices using SNMP: input and output traffic, input and output errors and discards
  - publish the data as a DIM server
- PVSS:
  - receives the data via the PVSS-DIM bridge
  - analyses the traffic and archives it
  - displays the current status and the trend of the bandwidth utilization
  - issues an alarm in case of errors
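As an example of the counter collection, the sketch below polls the standard IF-MIB octet counters of one port with the net-snmp command-line tools and converts the difference of two readings into a rate. The device name and community string are placeholders, and counter wrap-around is ignored; it is not the actual front-end processor code:

import subprocess, time

IF_IN_OCTETS = "1.3.6.1.2.1.2.2.1.10"    # IF-MIB::ifInOctets
IF_OUT_OCTETS = "1.3.6.1.2.1.2.2.1.16"   # IF-MIB::ifOutOctets

def snmp_get(host, oid, community="public"):
    """Read one counter value with the net-snmp 'snmpget' tool."""
    out = subprocess.check_output(
        ["snmpget", "-v2c", "-c", community, "-Oqv", host, oid])
    return int(out.decode().strip())

def port_rate_mbps(host, if_index, interval=10):
    """Approximate [in, out] rate of one port in Mbit/s over 'interval' seconds."""
    oids = (f"{IF_IN_OCTETS}.{if_index}", f"{IF_OUT_OCTETS}.{if_index}")
    before = [snmp_get(host, o) for o in oids]
    time.sleep(interval)
    after = [snmp_get(host, o) for o in oids]
    return [(a - b) * 8 / interval / 1e6 for a, b in zip(after, before)]

# e.g. port_rate_mbps("daq-core-switch", 42)   # hypothetical device name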

14 Network Monitoring: traffic

15 Network Monitoring: hardware/system
- A syslog server is set up to receive and parse the syslog messages from the network devices: when a network device runs into problems, error messages are generated and sent to the syslog server, as configured on the device
  - Hardware: temperature, fan status, power supply status
  - System: CPU, memory, login authentication, etc.
- Syslog can collect some information not covered by SNMP
- All collected messages are forwarded to PVSS
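The real setup relies on a standard syslog server; purely to illustrate the parsing step, a minimal UDP receiver that extracts the severity from the <PRI> field (PRI = facility * 8 + severity, RFC 3164) could look like this, with the forwarding to PVSS only indicated in a comment:

import re, socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 514))             # port 514 needs root privileges

while True:
    data, (src, _) = sock.recvfrom(4096)
    msg = data.decode(errors="replace")
    m = re.match(r"<(\d+)>(.*)", msg)
    if not m:
        continue
    severity = int(m.group(1)) % 8      # 0 = emergency ... 7 = debug
    if severity <= 3:                   # error or worse
        print(f"ALARM from {src}: {m.group(2)}")
        # in the real system this would be forwarded to PVSS via DIM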

16 Network Configuration
- The LHCb online network system is quite large, with different devices running different operating systems and command sets
- Luckily it is also quite static: only a few features are essential for configuring the network devices
- Currently a set of Python scripts is used to configure the network devices, using the pexpect module for interactive CLI access (a sketch is shown below):
  - initial setup of newly installed switches
  - firmware upgrades
  - configuration file backup and restore
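A minimal sketch of such pexpect-driven CLI access follows; the host name, credentials, prompt and commands are placeholders, not the actual LHCb scripts, and first-connection SSH host-key confirmation is not handled:

import pexpect

def run_commands(host, user, password, commands, prompt="#"):
    """Log into a switch over SSH and run a batch of CLI commands."""
    session = pexpect.spawn(f"ssh {user}@{host}", timeout=30)
    session.expect("assword:")          # matches 'Password:' / 'password:'
    session.sendline(password)
    session.expect(prompt)
    output = {}
    for cmd in commands:
        session.sendline(cmd)
        session.expect(prompt)
        output[cmd] = session.before.decode(errors="replace")
    session.sendline("exit")
    return output

# e.g. back up the running configuration of one switch (names are made up):
# run_commands("sw-d1-01", "admin", "secret", ["show running-config"])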

17 Network Configuration
NeDi CLI access:
- web-based interface
- can select a set of switches by type, IP, name, etc.
- can apply a batch of commands to the selected set of switches

18 Network Diagnostics Tools
- sFlow sampler:
  - sFlow is a mechanism to capture packet headers and collect statistics from the device, especially in high-speed networks
  - samples the packets on a switch port and displays the header information; very useful for debugging packet-loss problems, e.g. those caused by a wrong IP or MAC address
- Relatively high-speed traffic monitoring:
  - queries the counters of selected interfaces using SNMP or the CLI with a finer time resolution
  - shows the utilization of the selected interfaces

19 Status of Network Installation and Deployment
- Current setup:
  - 2 aggregation switches
  - only 2 linecards inserted in the core DAQ switch
  - sufficient for an L0 trigger rate of ~200 kHz
- Upgrade for the full 1 MHz readout:
  - Core DAQ switch: Force10 E1200i with 14 linecards and 1260 GbE ports, ready at the end of June; the upgrade from Terascale to Exascale doubles the switch capacity and allows all ports to run at line rate
  - All readout boards will be connected directly to the core DAQ switch
