Published by Norah Jordan. Modified over 9 years ago.
1
HEPiX – 9 May 2008
Network monitoring in ATLAS – Matei.Ciobotaru@cern.ch
Advanced Monitoring Techniques for the ATLAS TDAQ Network
Matei Ciobotaru
CERN; University of California, Irvine; “Politehnica” University of Bucharest
On behalf of the ATLAS Networking Group: B. Martin, A. Al-Shabibi, S. Batraneanu, S. Stancu, L. Leahu, L. Darlea, M. Ivanovici
2
The ATLAS TDAQ Network – Role
• The ATLAS Trigger and Data Acquisition (TDAQ) network handles the data transfers from the ATLAS detector to the analysis and storage nodes
• Built with Gigabit Ethernet switches and routers
• Sustained rates of 150 Gbit/s
• The experiment relies on the network to function 24/7 with a minimal number of failures
[Figure labels: ATLAS detector; TDAQ system]
3
The ATLAS TDAQ Network – Photos
• Almost 3000 devices and 5000 network connections…
• How to make sure everything is working correctly?
[Photo captions: 2500 computers installed in 90 racks; 2 concentrator switches per rack; 5 “big” chassis-based devices at the core]
4
Inside this talk
• Requirements in terms of network management
• Commercial software we are using
• Tools we developed in-house
• Services for users, integration with ATLAS
• Plans for the future
• The big picture
5
ATLAS Requirements
• Installation
  – Ease the equipment registration, inventory and verification
  – Configure the devices
• Operation
  – Check the state of health of devices and links
  – Monitor traffic conditions, raise alarms when needed
  – Assist the user in navigating the realm of information
  – Integration with the ATLAS TDAQ software
• Diagnostics
  – Provide aids to the admin in case something goes wrong
  – Be able to suggest solutions to problems
Complexity
• Manage a large local area network which has to be very reliable and which has very high throughput requirements
6
Equipment registration
• ATLAS equipment needs to be registered in four databases
• Only some databases support batch registrations; the others require manual intervention, which may lead to inconsistencies
• We developed a web application to cope with this situation
  – Central place for querying all the information about a device
  – Ability to cross-check the data across all databases, to detect incomplete or incorrect registrations
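A cross-check of this kind could be sketched as follows. This is only an illustration: the database names, the record fields and the function name are hypothetical, since the slides do not name the four databases or describe how the web application works internally.

```python
# Hypothetical sketch: cross-check one device's registration across databases.
# Database names and record fields are illustrative assumptions.

def cross_check(device, databases):
    """Return a list of human-readable problems found for `device`.

    `databases` maps a database name to a dict {device_name: record},
    where each record is a dict of field -> value.
    """
    problems = []
    records = {}
    for db_name, table in databases.items():
        if device not in table:
            problems.append(f"{device}: missing from {db_name}")
        else:
            records[db_name] = table[device]

    # Compare every field that appears in at least one record.
    fields = set()
    for record in records.values():
        fields.update(record)
    for field in sorted(fields):
        values = {db: rec[field] for db, rec in records.items() if field in rec}
        if len(set(values.values())) > 1:
            problems.append(f"{device}: field '{field}' differs: {values}")
    return problems


databases = {
    "inventory_db": {"sw-rack-01": {"mac": "00:90:27:8F:94:E3", "rack": "Y.01"}},
    "network_db": {"sw-rack-01": {"mac": "00:90:27:8F:94:E3", "rack": "Y.02"}},
    "config_db": {"sw-rack-01": {"mac": "00:90:27:8F:94:E3"}},
    "asset_db": {},
}
report = cross_check("sw-rack-01", databases)
for line in report:
    print(line)
```

The sample data produces one "missing" entry (incomplete registration) and one "differs" entry (inconsistent registration), the two kinds of problem the slide mentions.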
7
Equipment inventory
• Network diagrams for ATLAS are made in Microsoft Visio using the NetDesign package
• We created tools which discover what really exists in the network (what is connected where)
• We developed an application which compares the two data sources (Visio and auto-discovery); mismatches are detected and corrected in the field if necessary
• For the network documentation, we also automatically generate a printable “report” with all the connectivity
[Figure labels: Visio; Network Discovery]
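The comparison between documented and discovered connectivity can be illustrated with plain set operations. This is a minimal sketch, assuming both sources can be exported as lists of links between (device, port) endpoints; the export format, device names and function names are hypothetical and not taken from the talk.

```python
# Hypothetical sketch: compare documented links with auto-discovered links.
# A link is a pair of (device, port) endpoints; direction does not matter.

def normalize(link):
    """Represent a link as an unordered pair so A-B equals B-A."""
    return frozenset(link)


def compare_links(documented, discovered):
    """Return (missing, unexpected) sets of normalized links."""
    doc = {normalize(link) for link in documented}
    disc = {normalize(link) for link in discovered}
    missing = doc - disc      # drawn in the diagram, not seen in the network
    unexpected = disc - doc   # seen in the network, not in the diagram
    return missing, unexpected


documented = [
    (("core-1", "1/1"), ("sw-rack-01", "25")),
    (("core-1", "1/2"), ("sw-rack-02", "25")),
]
discovered = [
    (("sw-rack-01", "25"), ("core-1", "1/1")),  # same link, reversed order
    (("core-1", "1/3"), ("sw-rack-02", "25")),  # cabled to a different port
]
missing, unexpected = compare_links(documented, discovered)
print("missing:", [sorted(link) for link in missing])
print("unexpected:", [sorted(link) for link in unexpected])
```

Each mismatch then has to be resolved by a person: either the diagram or the cabling is wrong, and the code cannot tell which.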
8
Network configuration (1)
• In ATLAS we have more than 200 switches
  – Different vendors
  – Different mechanisms for configuration and monitoring (telnet, SNMP, web)
• Q: How to access all devices in a transparent manner?
  – A: Bring them all under a common denominator (common interface)
• Q: How to automate network management tasks?
  – A: Write scripts (little programs)
• sw_script = set of Python modules which can be used as building blocks for network management solutions
• Common programming interface to all devices (object-oriented)
• “Intelligent” tools for configuration and monitoring can be developed
switches + scripting = sw_script
http://cern.ch/ciobota/projects/sw_script/
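The idea of a common object-oriented interface over devices with different access mechanisms can be sketched as below. This is not the sw_script implementation: the class names `Switch`, `SnmpSwitch` and `TelnetSwitch`, the helper `label_all_ports` and the command string are assumptions made for illustration, and the two backends are in-memory stand-ins that do not talk to any device. The sketch is written for Python 3.

```python
# Hypothetical sketch of a common interface over vendor-specific backends.
from abc import ABC, abstractmethod


class Switch(ABC):
    """Operations every device must offer, whatever its access mechanism."""

    def __init__(self, ip_address):
        self.ip_address = ip_address

    @abstractmethod
    def get_port_names(self):
        """Return the list of port names on the device."""

    @abstractmethod
    def set_interface_alias(self, port, alias):
        """Set the description of one port."""


class SnmpSwitch(Switch):
    """Stand-in for an SNMP-managed device (no SNMP traffic is sent)."""

    def __init__(self, ip_address, ports):
        super().__init__(ip_address)
        self.aliases = {port: "" for port in ports}

    def get_port_names(self):
        return sorted(self.aliases)

    def set_interface_alias(self, port, alias):
        self.aliases[port] = alias


class TelnetSwitch(Switch):
    """Stand-in for a CLI-managed device; it only records commands."""

    def __init__(self, ip_address, ports):
        super().__init__(ip_address)
        self.ports = list(ports)
        self.sent_commands = []

    def get_port_names(self):
        return list(self.ports)

    def set_interface_alias(self, port, alias):
        # Placeholder text, not real CLI syntax for any vendor.
        self.sent_commands.append(f"set-alias {port} {alias}")


def label_all_ports(switch, prefix):
    """A management task written once, usable with any Switch subclass."""
    for port in switch.get_port_names():
        switch.set_interface_alias(port, f"{prefix} {port}")


snmp_sw = SnmpSwitch("192.168.100.59", ["1/1", "1/2"])
cli_sw = TelnetSwitch("192.168.100.60", ["1", "2"])
for sw in (snmp_sw, cli_sw):
    label_all_ports(sw, "rack-01")
print(snmp_sw.aliases)
print(cli_sw.sent_commands)
```

The point of the design is that `label_all_ports` never needs to know which vendor or protocol is behind the object it receives.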
9
Interactive session with sw_script

    # Start the Python interpreter
    $ python2.5
    # Load the sw_script module
    >>> import sw_script
    # Create an object associated with the switch (a Cisco device in this case)
    >>> sw = sw_script.Cisco_Catalyst_6500_Switch(ip_address="192.168.100.59")
    # List the ports available on this device
    >>> sw.get_port_names()
    ['1/1', '1/2', '1/3', '1/4',....
    # Get all the information available for an interface
    >>> sw.get("1/4")
    [('rx_packets', 519.0), ('rx_bytes', 127937.0), ('rx_discards', 0.0), ('rx_errors', 0.0),
     ('tx_packets', 11199.0), ('tx_bytes', 1111661.0), ('tx_discards', 0.0), ('tx_errors', 0.0),
     ('description', 'GigabitEthernet1/4'), ('link_state', 'up'), ('mac_addr', ['00:90:27:8F:94:E3'])]
    # Set the description (ifAlias) of an interface
    >>> sw.set_interface_alias("1/4", "Uplink to Core Router")
    # Show the serial number of this device
    >>> print sw.get_serial_number()
    FOC0913U075

sw_script is responsible for more than half of our network management toolbox
• Features
  – Supports devices from different vendors
  – Network topology auto-discovery
  – Can do traffic monitoring in real-time
  – Works as a module, can be easily embedded into other apps
10
Network configuration (2)
• In ATLAS, we have programs which use sw_script to perform configuration changes on devices:
  – defining VLANs
  – enabling protocols: spanning tree, time synchronization, etc.
  – setting interface aliases (descriptions)
• We use Python scripts to perform unattended firmware upgrades
• For keeping track of configuration files we plan to use ZipTie (open-source software)
11
Basic monitoring
• Spectrum, from Computer Associates: software for device health and traffic monitoring (used by the CERN IT department)
• Monitors devices, raises alarms in case of failures
• Auto-discovery for almost all network connections
• Historical info
  – Gathers statistics from all devices
  – Throughput and error rates saved every 30 seconds
• Limitations
  – The Spectrum GUI is hard to use
  – It is not easy to integrate with third-party apps
  – Limited support for network performance monitoring
  – Basic support for querying historical traffic data
  – No support for device configuration
  – Virtually no features for diagnostics
• We developed software to fill in the gaps
[Figure: Spectrum GUI]
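Throughput figures of this kind are typically derived from the difference between two readings of an interface byte counter. The sketch below shows that arithmetic for a 30-second interval. It is a generic illustration, not a description of how Spectrum computes its values; the function name and the single-wrap handling are assumptions.

```python
# Hypothetical sketch: throughput from two byte-counter samples.

def throughput_bps(prev_bytes, curr_bytes, interval_s=30.0, counter_bits=64):
    """Return the average rate in bits per second between two samples.

    Assumes the counter wrapped at most once between the samples.
    A counter reset (for example after a reboot) is indistinguishable
    from a wrap here and would give a wrong value.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = curr_bytes - prev_bytes
    if delta < 0:
        # The counter rolled over past its maximum value.
        delta += 2 ** counter_bits
    return delta * 8 / interval_s


# 3,750,000 bytes in 30 s is 1 Mbit/s.
normal = throughput_bps(1_000_000, 4_750_000)
# A 32-bit counter that wrapped: 1000 bytes before the wrap, 500 after.
wrapped = throughput_bps(2 ** 32 - 1000, 500, counter_bits=32)
print(normal, wrapped)
```

At Gigabit speeds a 32-bit byte counter can wrap in well under a minute, which is why 64-bit counters are the default in this sketch.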
12
Navigating in the realm of monitoring data
• Spectrum produces 3 plots for each network interface. We will have 5000 ports, and therefore 15000 plots to look at…
• We developed tools to browse, query and analyze the traffic plots.
13
Network browser
14
Searching and aggregating plots
15
Scanning for traffic events
16
Integration with ATLAS software
• Network Panel
  – Shows network monitoring information relevant to an ATLAS data acquisition run
• Alarm Watcher
  – Forwards alarms from Spectrum into the ATLAS “official” messaging channels
• IS Feeder
  – Publishes network statistics to the Information Services, a monitoring sub-system in ATLAS
[Figure: the Network Panel]
17
Network visualization – 2D approach
• Application which shows a topological map of the network
• Colors the connections in real-time according to their state and usage
• Overloaded links are easily detected
• Good navigation features (zoom, pan)
• Based on GUESS, a Java application for visualizing graphs
  – http://graphexploration.cond.org/
• We developed a network monitoring plug-in for GUESS
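Coloring a connection by state and usage amounts to a small mapping function. The sketch below is an assumption about how such a mapping could look; the thresholds, the color names and the default 1 Gbit/s capacity are illustrative choices, and nothing here uses the GUESS API.

```python
# Hypothetical sketch: choose a display color for one link.

def link_color(link_state, rate_bps, capacity_bps=1e9):
    """Map link state and utilization to a color name.

    Thresholds (50% and 90%) and color names are illustrative assumptions.
    """
    if link_state != "up":
        return "gray"
    utilization = rate_bps / capacity_bps
    if utilization >= 0.9:
        return "red"      # close to saturation
    if utilization >= 0.5:
        return "orange"   # heavily used
    return "green"        # lightly used


for state, rate in [("down", 0), ("up", 10e6), ("up", 600e6), ("up", 950e6)]:
    print(state, rate, link_color(state, rate))
```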
18
Network visualization – 3D approach (1)
• Each object contains a panel with traffic information (updated in real-time)
• Containers (racks, rooms) show aggregate values
• Technologies used: X3D, Java and the Octaga Player
• 3D model of the network
• Racks, switches and computers: furniture in the 3D space
• Navigation similar to Google Earth
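The aggregate values shown on containers can be computed by summing the rates of everything located inside them. This is a minimal sketch under the assumption that each device has a known (room, rack) location; the data layout and all names are hypothetical.

```python
# Hypothetical sketch: roll per-device traffic up to racks and rooms.
from collections import defaultdict


def aggregate(rates_bps, location):
    """Sum per-device rates into per-rack and per-room totals.

    `location` maps a device name to a (room, rack) tuple.
    """
    per_rack = defaultdict(float)
    per_room = defaultdict(float)
    for device, rate in rates_bps.items():
        room, rack = location[device]
        per_rack[(room, rack)] += rate
        per_room[room] += rate
    return dict(per_rack), dict(per_room)


rates_bps = {"pc-001": 200e6, "pc-002": 300e6, "pc-101": 50e6}
location = {
    "pc-001": ("room-A", "rack-01"),
    "pc-002": ("room-A", "rack-01"),
    "pc-101": ("room-A", "rack-02"),
}
per_rack, per_room = aggregate(rates_bps, location)
print(per_rack)
print(per_room)
```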
19
Network visualization – 3D approach (2)
20
Real-time traffic monitoring
[Screenshot captions:]
• Connections for one switch (with traffic values)
• The ATLAS applications running now in the network
• Real-time global top (most active connections)
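A "global top" of the most active connections is a selection of the N largest entries from the current rate table. The sketch below uses the standard-library `heapq.nlargest`; the table layout, keyed by (device, port), and the function name are assumptions made for illustration.

```python
# Hypothetical sketch: pick the most active connections from a rate table.
import heapq


def top_connections(rates_bps, n=10):
    """Return the n busiest (connection, rate) pairs, busiest first."""
    return heapq.nlargest(n, rates_bps.items(), key=lambda item: item[1])


rates_bps = {
    ("sw-rack-01", "1/1"): 120e6,
    ("sw-rack-01", "1/2"): 870e6,
    ("core-1", "3/4"): 430e6,
    ("sw-rack-02", "1/1"): 5e6,
}
top = top_connections(rates_bps, n=2)
for (device, port), rate in top:
    print(f"{device} {port}: {rate / 1e6:.0f} Mbit/s")
```

For a table of about 5000 connections and a small N, this avoids sorting the whole table on every refresh.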
21
Diagnostics
• For immediate response, we look in Spectrum and in the sw_script web pages
• Human inspection of traffic plots (aggregates): we search for abnormal patterns and correlations between plots
• We have a collection of scripts to test different things
  – Checking that machines are configured properly and connections are OK
• For bandwidth-related issues we use iperf
• All the network operations are documented in a knowledge base (wiki)
22
Plans for the future
• Better visualization techniques for traffic plots
• Analysis tools for monitoring data: pattern detection and recognition (periodic events, monotonic variations, etc.)
• Add support for sFlow, the standard for statistical sampling, which is very useful to diagnose network congestion
• Design and implement an expert system which will help us troubleshoot network issues
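As one example of the planned pattern detection, a monotonic variation can be found by scanning a traffic series for its longest strictly increasing run. This is only a sketch of one possible approach, not a tool described in the talk; the function name and the sample data are made up.

```python
# Hypothetical sketch: find the longest strictly increasing run in a series.

def longest_increasing_run(samples):
    """Return (start_index, length) of the longest strictly increasing run."""
    if not samples:
        return (0, 0)
    best_start, best_len = 0, 1
    start = 0
    for i in range(1, len(samples)):
        if samples[i] <= samples[i - 1]:
            start = i  # the run is broken; a new one starts here
        length = i - start + 1
        if length > best_len:
            best_start, best_len = start, length
    return (best_start, best_len)


# Made-up throughput samples (Mbit/s), one per 30-second interval.
samples = [5, 3, 4, 6, 9, 12, 8, 9]
start, length = longest_increasing_run(samples)
print(f"longest rise starts at sample {start} and lasts {length} samples")
```

A monitoring tool could flag a port when the run length exceeds a chosen threshold, for instance a rate that keeps climbing for several minutes.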
23
The big picture
[Diagram labels:]
• Historical traffic data
• Real-time traffic info
• Dynamic web-pages
• Browse, search and aggregate
• 2D and 3D network visualization
• ATLAS software – network status and alarms
• Equipment configuration
• Device health monitoring
• Equipment auto-discovery, inventory and registration
Legend: commercial package (Spectrum); in-house development (sw_script & co.)