

1 Control and Data Networks Architecture Proposal. S. Stancu, C. Meirosu, B. Martin. 31 May 2006.

2 Outline
Overview of the TDAQ system and networks
Technology and equipment
TDAQ networks:
 Control network: no special bandwidth requirement
 Dedicated data networks:
  DataFlow network: high bandwidth (~100 Gbit/s cross-sectional bw.) and minimal loss
  BackEnd network: high bandwidth (~50 Gbit/s cross-sectional bw.)
 Switch management network
Conclusions

3 ATLAS TDAQ System/Networks
High availability: 24/7 operation when the accelerator is running.

4 Technology and equipment
Ethernet is the dominant technology for LANs, and is TDAQ's choice for its networks (see [1]): multi-vendor, long-term support, commodity hardware (on-board GE adapters), etc.
Gigabit and Ten-Gigabit Ethernet:
 GE for end-nodes
 10GE wherever bandwidth requirements exceed 1 Gbit/s
Multi-vendor Ethernet switches/routers available on the market:
 Chassis-based devices (~320 Gbit/s switching)
  GE line cards: typically ~40 ports (1000BaseT)
  10GE line cards: typically 4 ports (10GBaseSR)
 Pizza-box devices (~60 Gbit/s switching)
  24/48 GE ports (1000BaseT)
  optional 10GE module with 2 up-links (10GBaseSR)
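The chassis figures above can be sanity-checked with a little arithmetic: a minimal sketch, assuming the slide's numbers (~320 Gbit/s fabric, ~40-port GE cards, 4-port 10GE cards). The helper names are illustrative, not part of any vendor tool.

```python
# Check a chassis line-card mix against its switching capacity,
# using the figures quoted on the slide.

def aggregate_port_bw_gbps(ge_cards: int, tenge_cards: int,
                           ge_ports_per_card: int = 40,
                           tenge_ports_per_card: int = 4) -> float:
    """Total front-panel bandwidth in Gbit/s."""
    return ge_cards * ge_ports_per_card * 1.0 + tenge_cards * tenge_ports_per_card * 10.0

def oversubscription(ge_cards: int, tenge_cards: int,
                     fabric_gbps: float = 320.0) -> float:
    """Ratio of front-panel bandwidth to fabric capacity (1.0 = line rate)."""
    return aggregate_port_bw_gbps(ge_cards, tenge_cards) / fabric_gbps

# Six GE cards plus two 10GE cards: 240 + 80 = 320 Gbit/s, exactly line rate.
assert oversubscription(6, 2) == 1.0
```

Adding a seventh GE card would push the ratio above 1.0, i.e. the fabric becomes the bottleneck under full load.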

5 Architecture and management principles
The TDAQ networks will typically employ a multi-layer architecture:
 Aggregation layer. The computers installed in the TDAQ system at Point 1 are grouped in racks, so it is natural to use a rack-level concentrator switch wherever the bandwidth constraints allow it.
 Core layer. The network core is composed of chassis-based devices, receiving GE or 10GE up-links from the concentrator switches and direct GE connections from the applications with high bandwidth requirements.
The performance of the networks themselves will be continuously monitored for availability, status, traffic loss and other errors:
 In-band management. The in-band approach allows intelligent network monitoring programs not only to pinpoint the exact place of a failure (root-cause analysis) but also to determine the repercussions of that failure on the rest of the network.
 Out-of-band management. A dedicated switch management network (shown in blue in Figure 1) carries the communication between the management servers (performing management and monitoring functions) and the out-of-band interface of all network devices in the control and data networks.
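The root-cause analysis mentioned for in-band monitoring can be sketched in a few lines: given which devices answer polls and the uplink topology, separate the actual failure from its repercussions. The topology, device names and logic here are illustrative assumptions, not the actual TDAQ monitoring code.

```python
# Toy root-cause analysis: a down device is a root cause only if its
# uplink parent is still reachable; otherwise it is a repercussion.

def root_causes(parent: dict, down: set) -> set:
    """parent maps device -> upstream device (None for the core)."""
    return {d for d in down if parent.get(d) not in down}

# Hypothetical topology: a core switch feeds a rack switch, which feeds PCs.
parent = {"core-A": None, "rack-sw-07": "core-A",
          "pc-070": "rack-sw-07", "pc-071": "rack-sw-07"}
down = {"rack-sw-07", "pc-070", "pc-071"}  # poll results

# The rack switch is the root cause; its PCs merely appear down.
assert root_causes(parent, down) == {"rack-sw-07"}
```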

6 Resilient Ethernet networks
What happens if a switch or link fails? A phone call, but nothing critical should happen after a single failure.
Networks are made resilient by introducing redundancy:
 Component-level redundancy: deployment of devices with built-in redundancy (PSUs, supervision modules, switching fabric).
 Network-level redundancy: deployment of additional devices/links to provide alternate paths between communicating nodes.
Protocols are needed to deal correctly (and efficiently) with multiple paths in the network [2]:
 Layer 2 protocols: link aggregation (trunking), spanning trees (STP, RSTP, MSTP).
 Layer 3 protocols: virtual router redundancy (VRRP) for static environments, dynamic routing protocols (e.g. RIP, OSPF).
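The VRRP mechanism named above can be illustrated with a minimal election sketch: the highest-priority live router owns the virtual gateway, and a backup takes over when it fails. Router names and priorities are made up; real VRRP also involves advertisement timers and preemption settings.

```python
# VRRP-style master election among redundant routers on a subnet.

def elect_master(routers: dict, alive: set):
    """Highest-priority live router wins; ties broken by name, as a
    stand-in for VRRP's highest-address tie-break. Returns None if no
    router is alive."""
    live = [(prio, name) for name, prio in routers.items() if name in alive]
    return max(live)[1] if live else None

routers = {"core-A": 200, "core-B": 100}
assert elect_master(routers, {"core-A", "core-B"}) == "core-A"
# core-A fails: core-B inherits the virtual router address.
assert elect_master(routers, {"core-B"}) == "core-B"
```

From the end-nodes' point of view the default gateway address never changes, which is why VRRP suits the static environments mentioned on the slide.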

7 Control network
~3000 end-nodes. Design assumptions:
 No end-node will generate control traffic in excess of the capacity of a GE line.
 The aggregated control traffic external to a rack of PCs (with the exception of the Online Services rack) will not exceed the capacity of a GE line.
The network core is implemented with two chassis devices redundantly interconnected by two 10GE lines. A rack-level concentrator switch can be deployed for all units except critical services.
Protocols/fault tolerance:
 Layer 3 routed network; static routing at the core should suffice.
 One subnet per concentrator switch: small broadcast domains, so potential layer 2 problems remain local.
 Resiliency: VRRP in case of router/up-link failures; interface bonding for the infrastructure servers.
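The "one subnet per concentrator switch" scheme can be sketched with the standard-library ipaddress module. The address range and rack count are assumptions for illustration, not the actual TDAQ addressing plan.

```python
# Carve a base network into one /24 per rack-level concentrator switch,
# so each broadcast domain stays confined to a single rack.
import ipaddress

def rack_subnets(base: str, racks: int):
    """Allocate one /24 per rack switch from the given base network."""
    net = ipaddress.ip_network(base)
    return list(net.subnets(new_prefix=24))[:racks]

subnets = rack_subnets("10.150.0.0/16", racks=100)
# Each /24 offers 254 host addresses -- ample for a rack of PCs.
assert str(subnets[0]) == "10.150.0.0/24"
```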

8 DataFlow network

9 DataFlow network (~100 Gbit/s)

10 DataFlow network
Two central switches, one VLAN per switch (A and B). Fault tolerant: tolerates one switch failure.

11 DataFlow network: 10G ROS concentration
 Motivated by the need to use fibre transmission (distance > 100 m).
 Full bandwidth preserved by aggregating 10x GE into 1x 10GE.
 Requires the use of VLANs (and MST) to maintain a loop-free topology.
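The aggregation figure is easy to verify: ten GE links feeding one 10GE up-link leave the concentrator at exactly line rate. A minimal sketch (helper name is illustrative):

```python
# Oversubscription of a concentrator: downstream GE bandwidth versus
# 10GE up-link bandwidth. 1.0 means full bandwidth is preserved.

def uplink_oversubscription(ge_ports: int, tenge_uplinks: int) -> float:
    return (ge_ports * 1.0) / (tenge_uplinks * 10.0)

# The ROS concentration on the slide: 10x GE into 1x 10GE, no loss of bandwidth.
assert uplink_oversubscription(ge_ports=10, tenge_uplinks=1) == 1.0
# By contrast, a 24-port pizza-box with two 10GE up-links is 1.2:1 oversubscribed.
assert uplink_oversubscription(ge_ports=24, tenge_uplinks=2) == 1.2
```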

12 DataFlow network: 10G L2PU concentration. One switch per rack.

13 DataFlow network


15 BackEnd network (~50 Gbit/s; ~2.5 Gbit/s)
~2000 end-nodes.
 One core device with built-in redundancy (eventually two devices).
 Rack-level concentration, with link aggregation for redundant up-links to the core.
 Layer 3 routed network to restrict broadcast domain size.
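The link-aggregation sizing can be sketched with simple arithmetic, assuming (as the slide's ~2.5 Gbit/s figure suggests, though the slide does not say so explicitly) that this is the per-rack up-link requirement carried over a group of GE links:

```python
# Size a GE link-aggregation group for a rack up-link, optionally with
# one spare member so a single link failure still leaves full bandwidth.
import math

def lag_size(rack_gbps: float, member_gbps: float = 1.0,
             redundant: bool = True) -> int:
    n = math.ceil(rack_gbps / member_gbps)
    return n + 1 if redundant else n

assert lag_size(2.5, redundant=False) == 3  # 3x GE covers 2.5 Gbit/s
assert lag_size(2.5) == 4                   # one extra member for redundancy
```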


17 Interchangeable processing power
Standard processor rack with up-links to both the DataFlow and BackEnd networks. Processing power is migrated between L2 and EF by software enabling/disabling of the appropriate up-links.
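The software switch-over could look something like the sketch below, which generates the interface commands for moving a rack between L2 (DataFlow) and EF (BackEnd) duty. The interface names and the use of iproute2-style commands are assumptions, not the actual TDAQ tooling.

```python
# Generate the up-link enable/disable commands for migrating a processor
# rack between L2 and EF roles. Interface names are hypothetical.

def migration_commands(role: str,
                       dataflow_if: str = "eth-dataflow",
                       backend_if: str = "eth-backend") -> list:
    if role not in ("L2", "EF"):
        raise ValueError("role must be 'L2' or 'EF'")
    up, down = (dataflow_if, backend_if) if role == "L2" else (backend_if, dataflow_if)
    return [f"ip link set {up} up", f"ip link set {down} down"]

# Move a rack to Event Filter duty: enable the BackEnd up-link,
# disable the DataFlow one.
assert migration_commands("EF") == ["ip link set eth-backend up",
                                    "ip link set eth-dataflow down"]
```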

18 Switch Management Network
It is good practice not to mix management traffic with normal traffic: if the normal traffic shows abnormal patterns, it may overload some links and potentially starve the in-band management traffic.
The switch management network gives the management servers access to the out-of-band interface of all devices in the control and data networks.
Flat layer 2 Ethernet network with no redundancy.

19 Conclusions
 The ATLAS TDAQ system (approx. 3000 end-nodes) relies on networks for both control and data acquisition purposes.
 Ethernet technology (+ IP).
 The network architecture maps onto multi-vendor devices.
 Modular network design.
 Resilient network design (high availability).
 Separate management path.

20 Back-up slides

21 Option A: central switches in SDX, 1GE links

22 Option B: central switches in SDX, 10GE links

23 Option C: central switches in USA, 10GE links

24 20 ROSs, trunking

25 20 ROSs, VLANs

26 10 ROSs, single switch

27 10 ROSs, double switch

28 Sample resiliency test (see [4]): MST


33 Control network: Online services. (a) 10 Gbps concatenated control traffic; (b) 1 Gbps direct connection to the Control core.

34 Control network: rack connectivity. (a) Shared control and management; (b) separate control and management.

35 Control network: rack connectivity. DC control, L2PU, SFIs.

36 Control network: rack connectivity. SFO; EF shared ctrl/data; EF separated ctrl/data.

