Published by Victoria Watts. Modified over 9 years ago.
1
PLANNING FOR PREDICTABLE NETWORK PERFORMANCE IN THE ATLAS TDAQ
C. Meirosu, B. Martin, A. Topurov, A. Al-Shabibi
CHEP’06, Mumbai, India
2
February 2006, Catalin Meirosu
Outline
Network management, the basics
TDAQ needs
Proposed solution: architecture and implementation
3
What is network management?
Management workstation
Hardware inventory
Software version
Device settings
Troubleshooting assistance
4
Network management
Manager-agent model
De-facto standard implementation through IETF specifications
  – Management Information Bases (MIBs)
  – Simple Network Management Protocol (SNMP)
Best practices: IT Infrastructure Library (ITIL)
  – Configuration and change management (among others)
  – Emphasis on service, rather than hardware/software
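The manager-agent model can be illustrated with a toy sketch. This is an in-memory stand-in, not real SNMP: the `Agent` and `Manager` classes, the device names and the stored values are all hypothetical. Only the two object identifiers are real (MIB-II `sysDescr` and `sysUpTime`).

```python
from typing import Dict, List, Union

Value = Union[int, str]

# Standard MIB-II object identifiers; the values stored under them below are made up.
SYS_DESCR = "1.3.6.1.2.1.1.1.0"
SYS_UPTIME = "1.3.6.1.2.1.1.3.0"


class Agent:
    """Hypothetical stand-in for the SNMP agent running on a network device."""

    def __init__(self, name: str, mib: Dict[str, Value]) -> None:
        self.name = name
        self._mib = dict(mib)

    def get(self, oid: str) -> Value:
        # A real agent would answer an SNMP GET request; here it is a dict lookup.
        return self._mib[oid]


class Manager:
    """Hypothetical management workstation that polls every registered agent."""

    def __init__(self, agents: List[Agent]) -> None:
        self._agents = agents

    def poll(self, oid: str) -> Dict[str, Value]:
        # Ask each agent for the same object and collect the answers by device name.
        return {agent.name: agent.get(oid) for agent in self._agents}


switch_a = Agent("switch-a", {SYS_DESCR: "example switch, software 1.0", SYS_UPTIME: 123456})
switch_b = Agent("switch-b", {SYS_DESCR: "example switch, software 1.1", SYS_UPTIME: 654})
manager = Manager([switch_a, switch_b])

# Software inventory: one query, answered by every device.
inventory = manager.poll(SYS_DESCR)
```

The point of the split is that the manager needs no device-specific knowledge beyond the object identifiers defined in the MIB; each agent decides how to produce the value.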
5
The ATLAS TDAQ system
[Figure: ATLAS TDAQ system diagram; visible label: Control network. Courtesy Stefan Stancu]
6
What does TDAQ need?
The TDAQ networking scenario
  – Pre-defined number of devices to connect to the network, staged deployment
  – Network resilient by design, optimised for a known traffic pattern
Need to maximize network uptime
  – Rapid and precise fault localisation
Maintain the agreed QoS for the applications known at design time while providing good service for late arrivals
Use the network as a debug tool for application data transfer problems
Provide real-time information on the network status to the physics operator on shift
7
Network management for TDAQ
[Diagram: TDAQ network management solution. Labels: Physics console middleware; Fault and Performance management; Configuration management; YaTG; Network administrator]
8
Fault management
Detection
  – Where?
  – When?
Why? Root Cause Analysis (and related methods)
  – Precise location of the actual fault
Standard component in commercial packages
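The idea behind root cause analysis can be sketched in a few lines: when a device fails, everything behind it also looks unreachable, and only the topmost unreachable device is worth an alarm. The sketch below assumes a simple tree topology with a single upstream device per node, which is a simplification (a network that is resilient by design has redundant paths). The function name, the device names and the topology are hypothetical, and this is not how any particular commercial package works.

```python
from typing import Dict, List, Optional, Set


def root_causes(parent: Dict[str, Optional[str]], unreachable: Set[str]) -> List[str]:
    """Return the unreachable devices whose upstream device is still reachable.

    `parent` maps each device to its upstream device (None for the top of the tree).
    Devices that are unreachable only because their upstream is down are filtered out.
    """
    return sorted(d for d in unreachable if parent.get(d) not in unreachable)


# Hypothetical topology: one core switch, two aggregation switches, three edge switches.
topology = {
    "core": None,
    "agg-1": "core",
    "agg-2": "core",
    "edge-1": "agg-1",
    "edge-2": "agg-1",
    "edge-3": "agg-2",
}

# agg-1 fails, so edge-1 and edge-2 stop answering as well.
alarms = root_causes(topology, {"agg-1", "edge-1", "edge-2"})
```

Here three devices stop answering, but `alarms` contains only `agg-1`: the two edge switches are symptoms, not causes.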
9
Performance management
“Why am I not receiving the advertised service?”
Traffic monitoring and reporting
  – Standard best practices: report 1-5 min averages
Low frequency monitoring
  – Included in the commercial tool
High frequency monitoring
  – YaTG (in-house development): 1 s average
  – Rate potentially higher, but not supported by SNMP implementations in many modern switches
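An averaged rate of this kind is typically derived from two successive readings of an interface counter. The sketch below assumes the monitor polls a 32-bit octet counter such as MIB-II `ifInOctets`; that assumption, the function names and the sample numbers are illustrative and do not describe YaTG's actual implementation.

```python
COUNTER32_MODULUS = 2 ** 32  # a 32-bit counter wraps back to zero after this many octets


def rate_bits_per_second(prev_octets: int, curr_octets: int, interval_s: float) -> float:
    """Average rate between two readings of a 32-bit octet counter.

    The modulo handles a single wrap of the counter between the two readings;
    more than one wrap per interval cannot be detected from the readings alone.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta_octets = (curr_octets - prev_octets) % COUNTER32_MODULUS
    return delta_octets * 8 / interval_s


def counter32_wrap_seconds(link_bits_per_second: float) -> float:
    """Time for a 32-bit octet counter to wrap on a fully loaded link."""
    return COUNTER32_MODULUS * 8 / link_bits_per_second


# Two readings taken 1 s apart, with the counter wrapping in between.
rate = rate_bits_per_second(prev_octets=4_294_967_000, curr_octets=704, interval_s=1.0)

# On a saturated 1 Gbit/s link the counter wraps roughly every 34 s.
wrap_time = counter32_wrap_seconds(1e9)
```

The wrap time is one reason the polling interval matters: with a 32-bit counter on a busy gigabit link, readings taken minutes apart can span several wraps and produce a wrong average, while 1 s polling stays well inside a single wrap.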
10
Integration with the physics console
Physics operator controls the run of the experiment via the Online Software
  – needs to see the state of the network
The fault management knows the state of the network
We pass the information (via CORBA) to the Online Software
  – Hence the “middleware” term (meaning “in between”)
11
Configuration management
Keep track of configuration changes in the network devices
Push pre-defined configurations onto the network
RANCID, open source tool
  – Covers the above basics
Advanced features under discussion
  – Application-driven network reconfiguration?
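Keeping track of configuration changes comes down to comparing the configuration retrieved from a device with the last stored copy. The sketch below shows that comparison with Python's standard `difflib`; it is not RANCID's code, and the function name, device name and configuration text are hypothetical.

```python
import difflib
from typing import List


def config_diff(previous: str, current: str, device: str) -> List[str]:
    """Return a unified diff between two configuration snapshots of one device.

    An empty list means the configuration has not changed.
    """
    return list(
        difflib.unified_diff(
            previous.splitlines(),
            current.splitlines(),
            fromfile=f"{device} (previous)",
            tofile=f"{device} (current)",
            lineterm="",
        )
    )


# Made-up snapshots: someone lowered the port speed between two collections.
stored = "hostname switch-a\ninterface 1\n speed 1000\n"
fetched = "hostname switch-a\ninterface 1\n speed 100\n"

changes = config_diff(stored, fetched, "switch-a")
for line in changes:
    print(line)
```

Storing each snapshot and reporting only the diff gives the administrator a change history per device, and the same stored copy is what would be pushed back when a pre-defined configuration has to be restored.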
12
Conclusion
A Network Management Solution is a must in the TDAQ context
Areas to be covered
  – Performance management
  – Fault management
  – Configuration management
  – Integration with the Physics Console
Sophisticated network management is expensive, but the network cannot troubleshoot itself (yet)