Network Performance for ATLAS Real-Time Remote Computing Farm Study
Alberta, CERN, Cracow, Manchester, NBI

MOTIVATION
Several experiments, including ATLAS at the Large Hadron Collider (LHC) and D0 at Fermilab, have expressed interest in using remote computing farms to process and analyse, in real time, the information from particle-collision events. Different architectures have been suggested, from pseudo-real-time file transfer with subsequent remote processing to the real-time requesting of individual events described here. To test the feasibility of using remote farms for real-time processing, a collaboration was set up between members of the ATLAS Trigger/DAQ community, with support from several national research and education network operators (DARENET, CANARIE, Netera, PSNC, UKERNA and DANTE), to demonstrate a proof of concept and to measure end-to-end network performance. The testbed was centred at CERN and used three different types of wide-area high-speed network infrastructure to link the remote sites:
- an end-to-end lightpath (SONET circuit) to the University of Alberta in Canada;
- standard Internet connectivity to the University of Manchester in the UK and the Niels Bohr Institute in Denmark;
- a Virtual Private Network (VPN), composed of an MPLS tunnel over GÉANT and an Ethernet VPN over the PIONIER network, to IFJ PAN Krakow in Poland.

Remote Computing Concepts
[Diagram: data flows from the ATLAS detectors and Level 1 Trigger through the ROBs, the Level 2 Trigger (L2PU), the Event Builders (SFI) and the Data Collection Network in the experimental area to local event processing farms (PF) and the SFOs with mass storage at CERN B513 on the Back End Network, and over lightpaths and a GÉANT switch to remote event processing farms (PF) in Copenhagen, Edmonton, Krakow and Manchester.]

The ATLAS Application Protocol
The protocol is request-response, and data are transferred only when the application requires them. Event Request: the Event Filter Daemon (EFD) requests an event from the SFI, and the SFI replies with the event data; processing of the event then occurs. Return of Computation: the Event Filter asks the SFO for buffer space, the SFO sends OK, and the EF transfers the results of the computation.
[Diagram: protocol time line - Request event, Send event data, Process event, Request buffer, Send OK, Send processed event - with the exchanges marked as 3 round trips and 2 round trips, and a histogram of the request-response time.]
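The request-response pattern is easy to model end to end. The sketch below exercises the Event Request leg over a loopback TCP connection with the poster's message sizes (64-byte request, 1-Mbyte response, 50 ms processing); the port, framing and function names are invented for illustration, and the Return of Computation leg is omitted. It is a toy model, not ATLAS TDAQ code.

```python
import socket
import threading
import time

EVENT_SIZE = 1_000_000    # 1-Mbyte event data, as on the poster
REQUEST_SIZE = 64         # 64-byte request
PROCESSING_TIME = 0.050   # 50 ms of event processing

def recv_exactly(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed connection")
        buf += chunk
    return buf

def sfi_server(listener: socket.socket, n_events: int) -> None:
    """Toy SFI: answer each fixed-size request with one event."""
    conn, _ = listener.accept()
    with conn:
        for _ in range(n_events):
            recv_exactly(conn, REQUEST_SIZE)   # "Request event"
            conn.sendall(b"\0" * EVENT_SIZE)   # "Send event data"

def efd_client(addr: tuple, n_events: int) -> None:
    """Toy EFD: request events, 'process' them, report the rate."""
    with socket.create_connection(addr) as sock:
        start = time.perf_counter()
        for _ in range(n_events):
            sock.sendall(b"R" * REQUEST_SIZE)  # request an event
            recv_exactly(sock, EVENT_SIZE)     # receive the event data
            time.sleep(PROCESSING_TIME)        # process the event
        rate = n_events / (time.perf_counter() - start)
        print(f"{rate:.1f} events/s")

listener = socket.create_server(("127.0.0.1", 0))
threading.Thread(target=sfi_server, args=(listener, 20), daemon=True).start()
efd_client(listener.getsockname(), 20)
```

On loopback, where the RTT is negligible, the printed rate approaches 1 / 0.050 s = 20 events/s; over a wide-area path the round trips dominate, which is exactly what the per-site measurements below show.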
CERN-Manchester TCP Activity
TCP/IP behaviour of the ATLAS request-response application protocol, observed with Web100: the 64-byte request is shown in green, the 1-Mbyte response in blue and the TCP congestion window in red. With standard TCP, slow start takes 19 round trips, or ~380 ms. The congestion window is then reset by TCP on each request, because the application sends no data over the network while an event is being processed: TCP obeys RFC 2581 & RFC 2861. With congestion window reduction disabled (again observed with Web100), the congestion window grows nicely; a request-response takes 2 RTT after 1.5 s; the rate is ~10 events/s with 50 ms processing time; and the achievable transfer throughput grows to 800 Mbit/s.

CERN-Alberta TCP Activity
The 64-byte request is shown in green and the 1-Mbyte response in blue. TCP in slow start takes 12 round trips, or ~1.67 s. With no congestion window reduction (observed with Web100), the congestion window grows gradually after slow start; a request-response takes 2 RTT after ~2.5 s; the rate is ~2.2 events/s with 50 ms processing time; and the achievable transfer throughput grows from 250 to 800 Mbit/s.

Principal partners
[Figure: the collaborating institutes and supporting network operators.]

Web100 parameters on the server located at CERN (the data source): green - small requests; blue - big responses. TCP ACK packets are also counted (in each direction). One response = 1 MB ~ 380 packets (64-byte request, 1-Mbyte response).

CERN-Krakow TCP Activity
The steady-state request-response latency is ~140 ms, giving a rate of ~7.2 events/s. The first event takes 600 ms due to TCP slow start.
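The quoted steady-state rates follow directly from latency arithmetic: each event costs the request-response round trips plus any processing that is not overlapped. A back-of-envelope check, with the Manchester RTT derived from the slow-start figures above (~380 ms over 19 round trips, i.e. ~20 ms; that derivation is an assumption, the poster does not quote the RTT directly):

```python
# rate ~ 1 / (round-trip cost + processing time)
rtt_manchester = 0.380 / 19              # ~20 ms, derived from the slow-start figures
print(1 / (2 * rtt_manchester + 0.050))  # ~11 events/s vs the quoted ~10
print(1 / 0.140)                         # Krakow: ~7.1 events/s vs the quoted ~7.2
```

The CERN-Alberta rate of ~2.2 events/s is lower than the same arithmetic predicts from its slow-start figures, which may reflect the poster's note that the congestion window grows only gradually after slow start on that path.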
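A closing practical note: the "no congestion window reduction" measurements above required a sender TCP stack modified not to decay the window during the application's idle periods. On a modern Linux sender the nearest standard control is the net.ipv4.tcp_slow_start_after_idle sysctl, which toggles the RFC 2861 restart behaviour; treating it as equivalent to the poster's modification is an assumption here, not something the poster states.

```python
# Inspect (and, as root, relax) the RFC 2861 idle-restart behaviour on Linux.
from pathlib import Path

knob = Path("/proc/sys/net/ipv4/tcp_slow_start_after_idle")
print("tcp_slow_start_after_idle =", knob.read_text().strip())  # "1" = decay when idle
# knob.write_text("0")  # requires root: keep the congestion window open
#                       # between request-response cycles
```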