Slide 1
Advanced Network Services for Experiments (ANSE)
Introduction: Progress, Outlook and Issues
Harvey B. Newman, California Institute of Technology
ANSE Annual Meeting, Vanderbilt, May 14, 2014
Slide 2
ANSE: Advanced Network Services for (LHC) Experiments
- NSF CC-NIE funded; 4 US institutes: Caltech, Vanderbilt, Michigan, UT Arlington. A US ATLAS / US CMS collaboration.
- Goal: provide more efficient, deterministic workflows.
- Method: interface advanced network services, including dynamic circuits, with the LHC data management systems:
  - PanDA in (US) ATLAS
  - PhEDEx in (US) CMS
- Includes leading personnel for the data production systems: Kaushik De (PanDA lead), Tony Wildish (PhEDEx lead).
Slide 3
PhEDEx testbed in ANSE: performance measurements with PhEDEx and FDT for CMS
- FDT sustained rates: ~1500 MB/s; average over 24 hours: ~1360 MB/s.
- The difference is due to delays in starting jobs; the plot is bumpy because of binning and job size.
- [Figure: 24-hour throughput reported by PhEDEx (1h moving average) and throughput as reported by MonALISA, both peaking near 1500 MB/s on a scale up to 2000 MB/s.]
- Setup: T2_ANSE_Geneva and T2_ANSE_Amsterdam; high-capacity link with dynamic circuit creation between storage nodes; PhEDEx and storage nodes separate; 4x4 SSD RAID 0 arrays and 16 physical CPU cores per machine.
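For illustration, a quick back-of-envelope check of the rates quoted on this slide (a minimal Python sketch; the numbers are simply those shown above):

```python
# Back-of-envelope check of the PhEDEx/FDT testbed numbers quoted on the slide.
sustained_mb_s = 1500            # FDT sustained rate (MB/s)
average_mb_s = 1360              # 24-hour average reported by PhEDEx (MB/s)
seconds_per_day = 24 * 3600

daily_volume_tb = average_mb_s * seconds_per_day / 1e6   # MB -> TB (decimal)
efficiency = average_mb_s / sustained_mb_s

print(f"~{daily_volume_tb:.0f} TB moved per day at the 24h average rate")
print(f"~{efficiency:.0%} of the sustained FDT rate; the gap comes from job start-up delays")
```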
Slide 4
CMS: PhEDEx and Dynamic Circuits (Vlad Lapadatescu, Tony Wildish)
- Testing circuit integration into the Download agent; PhEDEx transfer rates measured in MB/s.
- [Figure: PhEDEx throughput on a dedicated path vs. on a shared path carrying 5 Gbps of UDP cross traffic (1h moving averages); switchover to the circuit is seamless, with no interruption of service.]
- Latest efforts: integrating circuit awareness into the FileDownload agent.
  - The prototype is backend-agnostic and requires no modifications to the PhEDEx database.
  - All control logic is in the FileDownload agent, so it is transparent to all other PhEDEx instances.
- Using dynamic circuits in PhEDEx allows for more deterministic workflows, useful for co-scheduling CPU with data movement.
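To make the control flow concrete, here is a minimal Python sketch of circuit-aware transfer logic of the kind described above. It is an illustration only: the real implementation is the PhEDEx FileDownload agent (written in Perl), and the CircuitBackend interface, request_circuit() call, size threshold and run_transfer() stub are all hypothetical.

```python
# Hypothetical sketch: ride a dynamic circuit for large batches, and fall back to
# the routed path transparently if circuit setup fails. Not the actual PhEDEx code.

class CircuitBackend:
    """Abstract interface so the agent stays backend-agnostic (NSI, OSCARS, ...)."""
    def request_circuit(self, src, dst, bandwidth_gbps, timeout_s):
        raise NotImplementedError
    def teardown(self, circuit):
        raise NotImplementedError

def run_transfer(task, via):
    """Placeholder for the actual transfer call (e.g. invoking FDT)."""
    print(f"transferring {task['name']} ({task['size_gb']} GB) via {via}")

def transfer_batch(tasks, backend, src, dst, threshold_gb=500):
    """Request a circuit only when the batch is large enough to be worth it."""
    total_gb = sum(t["size_gb"] for t in tasks)
    circuit = None
    if total_gb >= threshold_gb:
        try:
            circuit = backend.request_circuit(src, dst, bandwidth_gbps=10, timeout_s=120)
        except Exception:
            circuit = None                   # circuit setup failed: stay on the routed path
    path = circuit if circuit else "default-routed-path"
    for task in tasks:
        run_transfer(task, via=path)
    if circuit:
        backend.teardown(circuit)
```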
Slide 5
PanDA: A New Plateau (Kaushik De)
- 25M jobs now completed each month at >100 sites: 6x growth in 3 years (2010-13), across production and distributed analysis.
- STEP 1: Import network information into PanDA.
- STEP 2: Use network information directly to optimize workflows for data transfer/access, at a higher level than individual transfers alone.
- Start with simple use cases leading to measurable improvements in workflow and user experience.
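As an illustration of STEP 1, a minimal sketch of pulling site-pair throughput measurements into a local table that brokering code could consult. The fetch_sonar_matrix() feed, the site names and the schema are placeholders, not the actual PanDA, AGIS or perfSONAR interfaces.

```python
# Sketch only: periodically import (source, destination, throughput) triples
# from a network-monitoring feed into a small store keyed by site pair.
import sqlite3, time

def fetch_sonar_matrix():
    """Stand-in for a perfSONAR / network-monitoring feed: (src, dst, MB/s)."""
    return [("SITE_A", "SITE_B", 940.0), ("SITE_A", "SITE_C", 610.0)]

def import_metrics(db_path="network_metrics.db"):
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS site_link
                   (src TEXT, dst TEXT, throughput_mb_s REAL, ts INTEGER,
                    PRIMARY KEY (src, dst))""")
    now = int(time.time())
    for src, dst, rate in fetch_sonar_matrix():
        con.execute("INSERT OR REPLACE INTO site_link VALUES (?, ?, ?, ?)",
                    (src, dst, rate, now))
    con.commit()
    con.close()

if __name__ == "__main__":
    import_metrics()
```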
Slide 6
Use Cases (Kaushik De)
1. Faster user analysis
- Analysis jobs normally go to sites with local data, which sometimes leads to long wait times due to queuing.
- Network information could be used to assign work to 'nearby' sites with idle CPUs and good connectivity.
2. Optimal cloud selection
- Tier-2s are attached to Tier-1 "clouds" manually by the ops team (and may be attached to multiple Tier-1s).
- This is to be automated using network information; an algorithm is under test.
3. PD2P (PanDA Dynamic Data Placement): asynchronous, usage-based
- Repeated use of data, or a backlog in processing, triggers additional copies and rebrokerage of queues to the new data locations.
- PD2P is a natural fit for network integration: use the network for strategic replication and site selection (to be tested soon), and try SDN provisioning, since this usually involves large datasets.
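A minimal sketch of the brokering idea in use case 1: stay at the data-hosting site unless its queue is long, then prefer a well-connected site with idle CPUs. The thresholds, score weights and site names are illustrative assumptions, not the actual PanDA brokerage algorithm.

```python
# Sketch only: network-aware site selection for an analysis job.
def pick_site(data_site, candidates, link_mb_s, queue_wait_s,
              max_wait_s=3600, min_link_mb_s=500.0):
    """candidates: alternative site names; link_mb_s[(src, dst)]: measured throughput;
    queue_wait_s[site]: estimated wait time for a new job at that site."""
    if queue_wait_s.get(data_site, 0) <= max_wait_s:
        return data_site                         # local data and acceptable wait: stay put
    best, best_score = data_site, float("-inf")
    for site in candidates:
        rate = link_mb_s.get((data_site, site), 0.0)
        if rate < min_link_mb_s:
            continue                             # too poorly connected to move/stream the data
        score = rate - 0.1 * queue_wait_s.get(site, 0)
        if score > best_score:
            best, best_score = site, score
    return best

# Example: the data-hosting site is backlogged, so the well-connected idle site wins.
print(pick_site("SITE_A", ["SITE_B", "SITE_C"],
                {("SITE_A", "SITE_B"): 900.0, ("SITE_A", "SITE_C"): 300.0},
                {"SITE_A": 7200, "SITE_B": 300, "SITE_C": 60}))   # -> SITE_B
```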
Slide 7
DYNES (NSF MRI-R2): Dynamic Circuits Nationwide, led by Internet2 with Caltech
- DYNES is extending circuit capabilities to ~50 US campuses; this has turned out to be nontrivial.
- It will be an integral part of the point-to-point service in LHCONE.
- Partners: Internet2, Caltech, Michigan, Vanderbilt; working with Internet2 and ESnet on dynamic circuit issues and software.
- Extended the OSCARS scope; transition from DRAGON to PSS and OESS.
- http://internet2.edu/dynes
Slide 8
Challenges Encountered
- perfSONAR deployment status: for meaningful results, most LHC computing sites need to be equipped with perfSONAR nodes. This is work in progress.
- Easy-to-use perfSONAR API: this was missing, but a REST API has been made available recently.
- Inter-domain dynamic circuits:
  - Intra-domain systems have been in production for some time; e.g. ESnet has used OSCARS as a production tool for several years, and OESS (OpenFlow-based) is also in production within a single domain.
  - Inter-domain circuit provisioning continues to be hard: implementations are fragile, and error recovery tends to require manual intervention.
  - A holistic approach is needed: pervasive monitoring plus tracking of configuration state changes, with intelligent clean-up and timeout handling.
  - The NSI framework needs faster standardization, adoption and implementation among the major networks, or a future SDN-based solution, for example OpenFlow and OpenDaylight.
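For reference, a hedged sketch of querying a perfSONAR measurement archive over the new REST API for recent throughput results. The esmond-style endpoint path and parameter names below are stated from memory and may differ between deployments; treat them as assumptions and check your archive's documentation.

```python
# Sketch: ask a perfSONAR measurement archive for recent throughput metadata.
import requests

def recent_throughput(archive_host, source, destination, time_range_s=86400):
    url = f"https://{archive_host}/esmond/perfsonar/archive/"
    params = {
        "source": source,
        "destination": destination,
        "event-type": "throughput",
        "time-range": time_range_s,
    }
    resp = requests.get(url, params=params, timeout=30)
    resp.raise_for_status()
    # Returns metadata records; each record contains URIs to the actual data points.
    return resp.json()

# Example (hypothetical hosts):
# records = recent_throughput("ps-archive.example.edu",
#                             "ps-host.site-a.example", "ps-host.site-b.example")
```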
Slide 9
Some of the DYNES Challenges Encountered; Approaches to a Solution
- Some of the issues encountered in both the control and data planes came from the immaturity of the implementations at the time:
  - Failed requests left configuration on switches, causing subsequent failures.
  - Failure notifications took too long to arrive, blocking serialized requests.
  - Error messages were often erratic, making it hard to find the root cause of a problem.
  - End-to-end VLAN translation did not always result in a functional data plane.
  - Static data-plane configuration needed changes upon upgrades.
- Grid certificate validity (1 year) across 40+ sites led to frequent expiration issues (not DYNES-specific).
  - Solution: we use nagios to monitor certificate state at all DYNES sites, generating early warnings to the local administrators.
  - An alternative would be to create a DYNES CA and administer certificates in a coordinated way; this requires a responsible party.
- DYNES path forward:
  - Work with a selected subset of sites on getting automated tests failure-free.
  - Take input from these sites, propagate changes to other sites, and/or deploy NSI.
  - If funding allows (future proposal): an SDN-based multi-domain solution.
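A minimal sketch of the kind of certificate check that nagios can run at each DYNES site, warning administrators well before a host certificate expires. The certificate path and warning window are illustrative; the check relies on the standard `openssl x509 -checkend` behaviour (exit code 0 if the certificate is still valid that far in the future).

```python
# Sketch: nagios-style early warning for an expiring grid host certificate.
import subprocess, sys

CERT = "/etc/grid-security/hostcert.pem"   # illustrative path
WARN_DAYS = 30

def check_cert(path=CERT, warn_days=WARN_DAYS):
    seconds = warn_days * 86400
    # `openssl x509 -checkend N` exits 0 if the certificate is valid N seconds from now.
    result = subprocess.run(
        ["openssl", "x509", "-in", path, "-noout", "-checkend", str(seconds)],
        capture_output=True, text=True)
    if result.returncode == 0:
        print(f"OK: {path} valid for at least {warn_days} more days")
        return 0                            # nagios OK
    print(f"WARNING: {path} expires within {warn_days} days (or could not be read)")
    return 1                                # nagios WARNING

if __name__ == "__main__":
    sys.exit(check_cert())
```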
Slide 10
Efficient Long-Distance End-to-End Throughput from the Campus over 100G Networks
Harvey Newman, Artur Barczyk, Azher Mughal, California Institute of Technology
NSF CC-NIE Meeting, Washington DC, April 30, 2014
Slide 11
SC06 BWC: Fast Data Transfer (FDT), http://monalisa.cern.ch/FDT (I. Legrand)
- An easy-to-use, open-source Java application that runs on all major platforms.
- Uses an asynchronous, multithreaded system to achieve smooth, linear data flow:
  - Streams a dataset (a list of files) continuously through an open TCP socket, with no protocol start/stop between files.
  - Sends buffers at a rate matched to the monitored capability of the end-to-end path.
  - Uses independent threads to read and write on each physical device.
- SC06 BWC results: stable disk-to-disk flows Tampa-Caltech, with 10-to-10 and 8-to-8 1U server pairs delivering 9 + 7 = 16 Gbps, then holding solid overnight on one 10G link; 17.77 Gbps BWC peak, plus 8.6 Gbps to and from Korea.
- By SC07: ~70-100 Gbps per rack of low-cost 1U servers.
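A toy Python sketch of the streaming idea behind FDT: send many files back-to-back through a single open TCP socket, with a reader thread keeping a bounded buffer queue full so the sender never stalls on disk. The real FDT is a Java application and also handles rate matching, parallel streams and per-device threading; everything here (framing, ports, paths) is simplified and illustrative.

```python
# Sketch: continuous file streaming over one TCP connection, no per-file start/stop.
import socket
import threading
import queue

CHUNK = 4 * 1024 * 1024            # 4 MB buffers

def reader(files, buffers):
    for path in files:
        with open(path, "rb") as f:
            while chunk := f.read(CHUNK):
                buffers.put(chunk)          # file boundaries need no extra protocol round-trips
    buffers.put(None)                       # end-of-dataset marker

def send_dataset(files, host, port):
    buffers = queue.Queue(maxsize=64)       # bounded queue keeps memory use flat
    threading.Thread(target=reader, args=(files, buffers), daemon=True).start()
    with socket.create_connection((host, port)) as sock:
        while (chunk := buffers.get()) is not None:
            sock.sendall(chunk)

# Example (hypothetical receiver and file list):
# send_dataset(["/data/file1.root", "/data/file2.root"], "receiver.example.org", 7000)
```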
Slide 12
Forward to 2014: Long-Distance Wide Area 100G Data Transfers
- Caltech SC'13 demo: solid 99-100G throughput on one 100G wave; up to 325G of WAN traffic.
- It is increasingly easy to saturate 100G infrastructure with well-prepared demonstration equipment, using the aggregated traffic of several hosts.
- 70-74 Gbps Caltech - Internet2 - ANA100 - CERN from a single server, with multiple TCP streams, using the FDT tool.
- BUT: using 100G infrastructure efficiently for production revealed several challenges, mostly end-system related:
  - IRQ affinity tuning is needed.
  - Multi-core support (multi-threaded applications).
  - Storage controller limitations, mainly in the software driver, plus CPU-controller-NIC flow control(?).
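As an example of the IRQ affinity tuning mentioned above, a hedged sketch that pins a NIC's interrupt queues to a chosen set of CPU cores by writing masks to /proc/irq/<n>/smp_affinity. The interface name and core list are illustrative; in practice the NIC vendor's set_irq_affinity script is usually used, and the cores should sit on the NUMA node local to the NIC. Requires root.

```python
# Sketch: spread a NIC's IRQs round-robin across selected cores (Linux, run as root).
import re

def nic_irqs(interface="eth2"):
    """Return IRQ numbers whose /proc/interrupts line mentions the interface."""
    irqs = []
    with open("/proc/interrupts") as f:
        for line in f:
            if interface in line:
                m = re.match(r"\s*(\d+):", line)
                if m:
                    irqs.append(int(m.group(1)))
    return irqs

def pin_irqs(irqs, cores):
    """Write one-core hex masks so each IRQ lands on a single chosen core."""
    for i, irq in enumerate(irqs):
        mask = 1 << cores[i % len(cores)]
        with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
            f.write(f"{mask:x}\n")

# Example: pin eth2's queues to cores 0-7 (assumed local to the NIC's NUMA node).
# pin_irqs(nic_irqs("eth2"), cores=list(range(8)))
```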
Slide 13
Network Path Layout (Azher Mughal, Ramiro Voicu)
- Path: Caltech (CHOPIN: CC-NIE) - CENIC - Internet2 - ANA100 - Amsterdam (SURFnet) - CERN (US LHCNet).
- CHOPIN: 100G advanced networking plus science-driver targets for 2014-15: LIGO Scientific Collaboration; astronomical sky surveys and VOs; geodetic and seismic networks; genomics (on-chip gene sequencing).
Slide 14
100G TCP Tests CERN-Caltech This Week (cont'd)
- Peaks of ~83 Gbps on some AL2S segments.
- You need strong and willing partners, plus engagement: Caltech Campus, CENIC, Internet2, ANA-100, SURFnet, CERN.
Slide 15
100G TCP Tests CERN-Caltech: An Issue
- Server 1: ~58 Gbps. Server 2: only ~12 Gbps.
- Server 2 is a newer generation (E5-2690 v2 Ivy Bridge) in the same chassis as Server 1; there is an issue between the newer CPUs and the Mellanox 40GE NICs. We are engaged with the vendors (Mellanox, Intel, LSI) and expect further improvements once this issue is resolved.
- Lessons learned: you need a strong team with the right talents, a systems approach, and especially strong partnerships: regional, national, global, and with the manufacturers.
Slide 16
CHOPIN Network Layout (CC-NIE Grant)
- 100GE backbone capacity, operational.
- External connectivity to major carriers including CENIC, ESnet, Internet2 and PacWave.
- LIGO and IPAC are in the process of joining, using 10G and 40G links.
Slide 17
CHOPIN WAN Connections
- External connectivity to CENIC, ESnet, Internet2 and PacWave.
- Able to create Layer 2 paths over the AL2S US footprint using the Internet2 OESS portal.
- Dynamic circuits through Internet2 ION over the 100GE path.
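Purely for illustration, a sketch of what requesting a Layer 2 point-to-point path from a provisioning controller can look like in the OESS/ION style of workflow described above. The URL, payload fields, endpoint strings and response field are hypothetical placeholders, not the actual OESS or ION API; the real interfaces should be taken from the controllers' documentation.

```python
# Hypothetical sketch: ask a provisioning controller for a Layer 2 circuit.
import requests

def request_layer2_path(controller_url, token, endpoint_a, endpoint_b,
                        vlan, bandwidth_mbps, duration_hours):
    payload = {
        "endpoint_a": endpoint_a,           # e.g. "campus-switch:port7/1" (placeholder)
        "endpoint_b": endpoint_b,           # e.g. "exchange-switch:port3/2" (placeholder)
        "vlan": vlan,
        "bandwidth_mbps": bandwidth_mbps,
        "duration_hours": duration_hours,
    }
    resp = requests.post(f"{controller_url}/circuits", json=payload,
                         headers={"Authorization": f"Bearer {token}"}, timeout=60)
    resp.raise_for_status()
    return resp.json()["circuit_id"]        # hypothetical response field

# Example (all values made up):
# cid = request_layer2_path("https://controller.example.net/api", "TOKEN",
#                           "campus-switch:port7/1", "exchange-switch:port3/2",
#                           vlan=3001, bandwidth_mbps=10000, duration_hours=6)
```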
Slide 18
CHOPIN - CMS Tier2 Integration
- Caltech CMS is fully integrated with the 100GE backbone, with IP peering with Internet2 and UFL at 100GE: ready for the next LHC run.
- Current peaks are around 8 Gbps; the 100GE capacity is operational.
Slide 19
Key Issue and Approach to a Solution: a Next-Generation System for the LHC and Other Fields
- Present solutions will not scale beyond LHC Run 2.
- We need an agile architecture exploiting globally distributed grid, cloud, specialized (e.g. GPU) and opportunistic computing resources.
- We need a services system that moves the data flexibly and dynamically, and behaves coherently.
- Examples do exist, with smaller but still very large scope: a pervasive, agile autonomous agent architecture that deals with complexity, developed by talented system developers with a deep appreciation of networks.
- [Figures: MonALISA grid job lifelines and grid topology; automated transfers on dynamic networks; the MonALISA-monitored ALICE Grid.]
Slide 20
Key Issues for ANSE
- ANSE is in its second year. We should develop the timeline and milestones for the remainder of the project.
- We should identify and clearly delineate a strong set of deliverables to improve data management and processing during LHC Run 2.
- We should communicate and discuss these with the experiments, to get their feedback and engagement.
- We need to deal with a few key issues:
  - Dynamic circuits and DYNES: we need a clear path forward. Can we solve some of the problems with SDN?
  - perfSONAR, and filling in our monitoring needs (e.g. with ML?).
  - How to integrate high-throughput methods in PanDA, and how to get high throughput (via PhEDEx) into production in CMS.
- Some bigger issues are arising, and we need to discuss these among the PIs and project leaders during this meeting.