Published by Julia Sherman. Modified over 9 years ago.
Geneva – Kraków network measurements for the ATLAS Real-Time Remote Computing Farm Studies

R. Hughes-Jones (Univ. of Manchester), K. Korcyl (IFJ-PAN), C. Meirosu (CERN)

MOTIVATION

Several experiments, including ATLAS at the Large Hadron Collider (LHC) and D0 at Fermilab, have expressed interest in using remote computing farms for processing and analysing, in real time, the information from particle collision events. Different architectures have been suggested, from pseudo-real-time file transfer with subsequent remote processing to the real-time requesting of individual events described here.

To test the feasibility of using remote farms for real-time processing, a collaboration was set up between members of the ATLAS Trigger/DAQ community, with support from several national research and education network operators (DARENET, Canarie, Netera, PSNC, UKERNA and Dante), to demonstrate a Proof of Concept and measure end-to-end network performance.

The testbed was centred at CERN and used three different types of wide-area high-speed network infrastructure to link the remote sites:
- an end-to-end lightpath (SONET circuit) to the University of Alberta in Canada
- standard Internet connectivity to the University of Manchester in the UK and the Niels Bohr Institute in Denmark
- a Virtual Private Network (VPN) composed of an MPLS tunnel over the GÉANT network and an Ethernet VPN over the PIONIER network to IFJ PAN Kraków in Poland

EQUIPMENT

We developed custom equipment to measure the quality of service at Layer 2/3. The equipment is based on an Alteon Gigabit Ethernet (GE) network interface card (NIC) reprogrammed to act as an IP traffic generator, a custom clock card, and commercial Global Positioning System equipment (used as a global clock time reference). We can measure one-way latency, inter-arrival time, frame loss and re-ordering on a packet-by-packet basis, as a function of load, up to Gigabit Ethernet speed.
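The four per-packet quantities listed above can be illustrated with a small sketch. It is not the firmware of the measurement cards; it only assumes that each received frame carries a sequence number plus send and receive timestamps taken against a common (GPS-disciplined) clock. The `Frame` record, the `summarise` function and the sample values are hypothetical, and a frame is counted as re-ordered when it arrives after a frame with a higher sequence number, which is one common convention.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Frame:
    seq: int      # sequence number stamped by the sender (hypothetical field)
    sent_ns: int  # send timestamp in ns, read from the shared clock
    recv_ns: int  # receive timestamp in ns, read from the same clock


def summarise(received: List[Frame], frames_sent: int) -> Dict[str, object]:
    """Per-frame metrics for one test run; `received` is in arrival order."""
    # One-way latency needs both ends synchronised to the same time reference.
    one_way = [f.recv_ns - f.sent_ns for f in received]
    # Inter-arrival time is the gap between consecutive arrivals.
    inter_arrival = [b.recv_ns - a.recv_ns for a, b in zip(received, received[1:])]
    # Frames that never arrived are counted as lost.
    lost = frames_sent - len({f.seq for f in received})

    # A frame is re-ordered if a higher sequence number has already arrived.
    reordered = 0
    highest_seq = -1
    for f in received:
        if f.seq < highest_seq:
            reordered += 1
        else:
            highest_seq = f.seq

    return {
        "one_way_ns": one_way,
        "inter_arrival_ns": inter_arrival,
        "lost": lost,
        "reordered": reordered,
    }


# Made-up run: five frames sent, frame 4 lost, frame 2 overtaken by frame 3.
run = [
    Frame(seq=0, sent_ns=0, recv_ns=500),
    Frame(seq=1, sent_ns=1000, recv_ns=1500),
    Frame(seq=3, sent_ns=3000, recv_ns=3400),
    Frame(seq=2, sent_ns=2000, recv_ns=3600),
]
stats = summarise(run, frames_sent=5)
print(stats)
```

Repeating such a run at several offered loads would give each metric as a function of load, which is how the text describes the measurements.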
For the Layer 4 (TCP) measurements we developed and used the “tcpmon” program. tcpmon is an instrumented request-response program that emulates the communication between the EFD and SFI components of the ATLAS Event Filter. It builds on the experience of UDPmon, a generic tool that can be used to automate hardware and network performance measurements using UDP packets. UDPmon calculates the CPU load and the number of interrupts generated by the network interface card during a given test, along with standard network-related parameters such as latency, inter-arrival time and bandwidth. tcpmon and UDPmon run on standard PCs under the Linux operating system.

CONCLUSIONS

The quality of service over long-distance connectivity may vary considerably over short periods, even though it might still meet the Service Level Agreement (which is based on long-term averages).

Long-distance Ethernet circuits, tunneled over routed networks, may produce out-of-order packets, which would not be the case in a LAN environment.

Applications with real-time requirements should monitor the performance of the underlying network and adapt accordingly.

Out-of-order packets are important for our application. Studies are under way to determine the real impact.

RESULTS – summary:
- out-of-order frames are present, even if the connection is composed of a Layer 2 tunnel over MPLS and a pure Ethernet VLAN
- the number of out-of-order frames may vary, depending on the offered load; we did not observe any out-of-order frames during our tests for loads lower than 500 Mbit/s

Is this relevant to the 1.5 MB transfers we would have in our application? Yes!
- minimal: with a modern TCP stack, if frames are “not too much out of order”, the stack will not request re-transmits, but the CPU load required for the bookkeeping is higher
- worst case: the stack will require a re-transmit, halving the TCP window in the process and hence reducing the maximum transfer rate for a given time interval

ATLAS Application Protocol

Event Request
– EFD requests an event from SFI
– SFI replies with the event (~2 Mbytes)
Processing of event
Return of computation
– EF asks SFO for buffer space
– SFO sends OK
– EF transfers results of the computation

[Figure: message sequence over time between the Event Filter (EFD) and the SFI and SFO: request event, send event data, process event, request buffer, send OK, send processed event. Inset: request-response time (histogram).]

Remote Computing Concepts

[Figure: ATLAS detectors and Level 1 Trigger in the experimental area; ROBs, Level 2 Trigger (L2PU), Event Builders (SFI), local event processing farms (PF), SFOs and mass storage at CERN B513, linked by the Data Collection Network and the Back End Network; remote event processing farms (PF) in Copenhagen, Edmonton, Krakow and Manchester, reached through a switch over GÉANT and lightpaths.]

[Figure: network map (label: PNSC) showing Gdańsk, Poznań, Zielona Góra, Katowice, Kraków, Lublin, Warszawa, Bydgoszcz, Toruń, Częstochowa, Białystok, Olsztyn, Rzeszów, Bielsko-Biała, Koszalin, Szczecin, Wrocław, Łódź, Kielce, Puławy, Opole and Radom. Legend: GÉANT 10 Gb/s; Metropolitan Area Networks; 622 Mb/s; 155 Mb/s; 10 Gb/s own fibers; GÉANT leased channels.]

- steady-state request-response latency: ~140 ms
- event rate: ~7.2 events/s
- the first event took 600 ms (due to the start-up time of the TCP connection)

[Figure: Web100 parameters on the server located at CERN (data source). Green: small requests. Blue: big responses. TCP ACK packets are also counted (in each direction). One response = 1 MB ~ 380 packets.]
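The steady-state figures and the worst-case remark about window halving can be related with some back-of-the-envelope arithmetic. The sketch below assumes a single outstanding request at a time (so the event rate is the inverse of the request-response latency), takes 1 MB as 10^6 bytes, and uses the textbook bound that a TCP flow cannot exceed one window per round trip. The function names, the window size and the round-trip time are illustrative assumptions, not values taken from the measurements.

```python
def event_rate(latency_s: float) -> float:
    """Events per second when one request is outstanding at a time."""
    return 1.0 / latency_s


def payload_rate_mbit(response_bytes: int, latency_s: float) -> float:
    """Average payload rate in Mbit/s for one response per request-response cycle."""
    return response_bytes * 8 / latency_s / 1e6


def window_limited_rate_mbit(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on TCP throughput in Mbit/s: one window per round trip."""
    return window_bytes * 8 / rtt_s / 1e6


LATENCY_S = 0.140           # steady-state request-response latency quoted above
RESPONSE_BYTES = 1_000_000  # one response = 1 MB, assumed to mean 10**6 bytes

print(round(event_rate(LATENCY_S), 2))                        # 7.14
print(round(payload_rate_mbit(RESPONSE_BYTES, LATENCY_S), 1)) # 57.1

# Hypothetical window and round-trip time, chosen only to show the effect
# of a re-transmit that halves the window.
WINDOW_BYTES = 1_000_000
RTT_S = 0.020
before = window_limited_rate_mbit(WINDOW_BYTES, RTT_S)
after = window_limited_rate_mbit(WINDOW_BYTES // 2, RTT_S)
print(before, after)
```

Under these assumptions a 140 ms cycle gives about 7.1 events/s, close to the ~7.2 events/s quoted, and halving the window halves the rate bound until the window grows back.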