Slide 1
Advanced Network Services for Experiments (ANSE)
Introduction: Progress, Outlook and Issues
Harvey B. Newman, California Institute of Technology
ANSE Annual Meeting, Vanderbilt, May 14, 2014
Slide 2
ANSE: Advanced Network Services for (LHC) Experiments
- NSF CC-NIE funded; 4 US institutes: Caltech, Vanderbilt, Michigan, UT Arlington. A US ATLAS / US CMS collaboration.
- Goal: provide more efficient, deterministic workflows.
- Method: interface advanced network services, including dynamic circuits, with the LHC data management systems:
  - PanDA in (US) ATLAS
  - PhEDEx in (US) CMS
- Includes leading personnel for the data production systems: Kaushik De (PanDA lead), Tony Wildish (PhEDEx lead).
Slide 3
PhEDEx testbed in ANSE: performance measurements with PhEDEx and FDT for CMS
- FDT sustained rates: ~1500 MB/s; average over 24 hours: ~1360 MB/s.
- The difference is due to delays in starting jobs; the plot is bumpy because of binning and job size.
- [Figure: 24-hour throughput reported by PhEDEx (1h moving average) and throughput as reported by MonALISA, both peaking near 1500 MB/s on a scale up to 2000 MB/s.]
- Setup: T2_ANSE_Geneva and T2_ANSE_Amsterdam; high-capacity link with dynamic circuit creation between storage nodes; PhEDEx and storage nodes separate; 4x4 SSD RAID 0 arrays and 16 physical CPU cores per machine.
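For illustration, a quick back-of-envelope check of the rates quoted on this slide (a minimal Python sketch; the numbers are simply those shown above):

```python
# Back-of-envelope check of the PhEDEx/FDT testbed numbers quoted on the slide.
sustained_mb_s = 1500            # FDT sustained rate (MB/s)
average_mb_s = 1360              # 24-hour average reported by PhEDEx (MB/s)
seconds_per_day = 24 * 3600

daily_volume_tb = average_mb_s * seconds_per_day / 1e6   # MB -> TB (decimal)
efficiency = average_mb_s / sustained_mb_s

print(f"~{daily_volume_tb:.0f} TB moved per day at the 24h average rate")
print(f"~{efficiency:.0%} of the sustained FDT rate; the gap comes from job start-up delays")
```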
Slide 4
CMS: PhEDEx and Dynamic Circuits (Vlad Lapadatescu, Tony Wildish)
- Testing circuit integration into the Download agent; PhEDEx transfer rates measured in MB/s.
- [Figure: PhEDEx throughput on a dedicated path vs. on a shared path carrying 5 Gbps of UDP cross traffic (1h moving averages); switchover to the circuit is seamless, with no interruption of service.]
- Latest efforts: integrating circuit awareness into the FileDownload agent.
  - The prototype is backend-agnostic and requires no modifications to the PhEDEx database.
  - All control logic is in the FileDownload agent, so it is transparent to all other PhEDEx instances.
- Using dynamic circuits in PhEDEx allows for more deterministic workflows, useful for co-scheduling CPU with data movement.
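To make the control flow concrete, here is a minimal Python sketch of circuit-aware transfer logic of the kind described above. It is an illustration only: the real implementation is the PhEDEx FileDownload agent (written in Perl), and the CircuitBackend interface, request_circuit() call, size threshold and run_transfer() stub are all hypothetical.

```python
# Hypothetical sketch: ride a dynamic circuit for large batches, and fall back to
# the routed path transparently if circuit setup fails. Not the actual PhEDEx code.

class CircuitBackend:
    """Abstract interface so the agent stays backend-agnostic (NSI, OSCARS, ...)."""
    def request_circuit(self, src, dst, bandwidth_gbps, timeout_s):
        raise NotImplementedError
    def teardown(self, circuit):
        raise NotImplementedError

def run_transfer(task, via):
    """Placeholder for the actual transfer call (e.g. invoking FDT)."""
    print(f"transferring {task['name']} ({task['size_gb']} GB) via {via}")

def transfer_batch(tasks, backend, src, dst, threshold_gb=500):
    """Request a circuit only when the batch is large enough to be worth it."""
    total_gb = sum(t["size_gb"] for t in tasks)
    circuit = None
    if total_gb >= threshold_gb:
        try:
            circuit = backend.request_circuit(src, dst, bandwidth_gbps=10, timeout_s=120)
        except Exception:
            circuit = None                   # circuit setup failed: stay on the routed path
    path = circuit if circuit else "default-routed-path"
    for task in tasks:
        run_transfer(task, via=path)
    if circuit:
        backend.teardown(circuit)
```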
Slide 5
PanDA: A New Plateau (Kaushik De)
- 25M jobs now completed each month at >100 sites: 6x growth in 3 years (2010-13), across production and distributed analysis.
- STEP 1: Import network information into PanDA.
- STEP 2: Use network information directly to optimize workflows for data transfer/access, at a higher level than individual transfers alone.
- Start with simple use cases leading to measurable improvements in workflow and user experience.
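As an illustration of STEP 1, a minimal sketch of pulling site-pair throughput measurements into a local table that brokering code could consult. The fetch_sonar_matrix() feed, the site names and the schema are placeholders, not the actual PanDA, AGIS or perfSONAR interfaces.

```python
# Sketch only: periodically import (source, destination, throughput) triples
# from a network-monitoring feed into a small store keyed by site pair.
import sqlite3, time

def fetch_sonar_matrix():
    """Stand-in for a perfSONAR / network-monitoring feed: (src, dst, MB/s)."""
    return [("SITE_A", "SITE_B", 940.0), ("SITE_A", "SITE_C", 610.0)]

def import_metrics(db_path="network_metrics.db"):
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS site_link
                   (src TEXT, dst TEXT, throughput_mb_s REAL, ts INTEGER,
                    PRIMARY KEY (src, dst))""")
    now = int(time.time())
    for src, dst, rate in fetch_sonar_matrix():
        con.execute("INSERT OR REPLACE INTO site_link VALUES (?, ?, ?, ?)",
                    (src, dst, rate, now))
    con.commit()
    con.close()

if __name__ == "__main__":
    import_metrics()
```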
Slide 6
Use Cases (Kaushik De)
1. Faster user analysis
- Analysis jobs normally go to sites with local data, which sometimes leads to long wait times due to queuing.
- Network information could be used to assign work to 'nearby' sites with idle CPUs and good connectivity.
2. Optimal cloud selection
- Tier-2s are attached to Tier-1 "clouds" manually by the ops team (and may be attached to multiple Tier-1s).
- This is to be automated using network information; an algorithm is under test.
3. PD2P (PanDA Dynamic Data Placement): asynchronous, usage-based
- Repeated use of data, or a backlog in processing, triggers additional copies and rebrokerage of queues to the new data locations.
- PD2P is a natural fit for network integration: use the network for strategic replication and site selection (to be tested soon), and try SDN provisioning, since this usually involves large datasets.
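A minimal sketch of the brokering idea in use case 1: stay at the data-hosting site unless its queue is long, then prefer a well-connected site with idle CPUs. The thresholds, score weights and site names are illustrative assumptions, not the actual PanDA brokerage algorithm.

```python
# Sketch only: network-aware site selection for an analysis job.
def pick_site(data_site, candidates, link_mb_s, queue_wait_s,
              max_wait_s=3600, min_link_mb_s=500.0):
    """candidates: alternative site names; link_mb_s[(src, dst)]: measured throughput;
    queue_wait_s[site]: estimated wait time for a new job at that site."""
    if queue_wait_s.get(data_site, 0) <= max_wait_s:
        return data_site                         # local data and acceptable wait: stay put
    best, best_score = data_site, float("-inf")
    for site in candidates:
        rate = link_mb_s.get((data_site, site), 0.0)
        if rate < min_link_mb_s:
            continue                             # too poorly connected to move/stream the data
        score = rate - 0.1 * queue_wait_s.get(site, 0)
        if score > best_score:
            best, best_score = site, score
    return best

# Example: the data-hosting site is backlogged, so the well-connected idle site wins.
print(pick_site("SITE_A", ["SITE_B", "SITE_C"],
                {("SITE_A", "SITE_B"): 900.0, ("SITE_A", "SITE_C"): 300.0},
                {"SITE_A": 7200, "SITE_B": 300, "SITE_C": 60}))   # -> SITE_B
```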
Slide 7
DYNES (NSF MRI-R2): Dynamic Circuits Nationwide, led by Internet2 with Caltech
- DYNES is extending circuit capabilities to ~50 US campuses; this has turned out to be nontrivial.
- It will be an integral part of the point-to-point service in LHCONE.
- Partners: Internet2, Caltech, Michigan, Vanderbilt; working with Internet2 and ESnet on dynamic circuit issues and software.
- Extended the OSCARS scope; transition from DRAGON to PSS and OESS.
- http://internet2.edu/dynes
Slide 8
Challenges Encountered
- perfSONAR deployment status: for meaningful results, most LHC computing sites need to be equipped with perfSONAR nodes. This is work in progress.
- Easy-to-use perfSONAR API: this was missing, but a REST API has been made available recently.
- Inter-domain dynamic circuits:
  - Intra-domain systems have been in production for some time; e.g. ESnet has used OSCARS as a production tool for several years, and OESS (OpenFlow-based) is also in production within a single domain.
  - Inter-domain circuit provisioning continues to be hard: implementations are fragile, and error recovery tends to require manual intervention.
  - A holistic approach is needed: pervasive monitoring plus tracking of configuration state changes, with intelligent clean-up and timeout handling.
  - The NSI framework needs faster standardization, adoption and implementation among the major networks, or a future SDN-based solution, for example OpenFlow and OpenDaylight.
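For reference, a hedged sketch of querying a perfSONAR measurement archive over the new REST API for recent throughput results. The esmond-style endpoint path and parameter names below are stated from memory and may differ between deployments; treat them as assumptions and check your archive's documentation.

```python
# Sketch: ask a perfSONAR measurement archive for recent throughput metadata.
import requests

def recent_throughput(archive_host, source, destination, time_range_s=86400):
    url = f"https://{archive_host}/esmond/perfsonar/archive/"
    params = {
        "source": source,
        "destination": destination,
        "event-type": "throughput",
        "time-range": time_range_s,
    }
    resp = requests.get(url, params=params, timeout=30)
    resp.raise_for_status()
    # Returns metadata records; each record contains URIs to the actual data points.
    return resp.json()

# Example (hypothetical hosts):
# records = recent_throughput("ps-archive.example.edu",
#                             "ps-host.site-a.example", "ps-host.site-b.example")
```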
Slide 9
Some of the DYNES Challenges Encountered; Approaches to a Solution
- Some of the issues encountered in both the control and data planes came from the immaturity of the implementations at the time:
  - Failed requests left configuration on switches, causing subsequent failures.
  - Failure notifications took too long to arrive, blocking serialized requests.
  - Error messages were often erratic, making it hard to find the root cause of a problem.
  - End-to-end VLAN translation did not always result in a functional data plane.
  - Static data-plane configuration needed changes upon upgrades.
- Grid certificate validity (1 year) across 40+ sites led to frequent expiration issues (not DYNES-specific).
  - Solution: we use nagios to monitor certificate state at all DYNES sites, generating early warnings to the local administrators.
  - An alternative would be to create a DYNES CA and administer certificates in a coordinated way; this requires a responsible party.
- DYNES path forward:
  - Work with a selected subset of sites on getting automated tests failure-free.
  - Take input from these sites, propagate changes to other sites, and/or deploy NSI.
  - If funding allows (future proposal): an SDN-based multi-domain solution.
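A minimal sketch of the kind of certificate check that nagios can run at each DYNES site, warning administrators well before a host certificate expires. The certificate path and warning window are illustrative; the check relies on the standard `openssl x509 -checkend` behaviour (exit code 0 if the certificate is still valid that far in the future).

```python
# Sketch: nagios-style early warning for an expiring grid host certificate.
import subprocess, sys

CERT = "/etc/grid-security/hostcert.pem"   # illustrative path
WARN_DAYS = 30

def check_cert(path=CERT, warn_days=WARN_DAYS):
    seconds = warn_days * 86400
    # `openssl x509 -checkend N` exits 0 if the certificate is valid N seconds from now.
    result = subprocess.run(
        ["openssl", "x509", "-in", path, "-noout", "-checkend", str(seconds)],
        capture_output=True, text=True)
    if result.returncode == 0:
        print(f"OK: {path} valid for at least {warn_days} more days")
        return 0                            # nagios OK
    print(f"WARNING: {path} expires within {warn_days} days (or could not be read)")
    return 1                                # nagios WARNING

if __name__ == "__main__":
    sys.exit(check_cert())
```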
Slide 10
Efficient Long-Distance End-to-End Throughput from the Campus over 100G Networks
Harvey Newman, Artur Barczyk, Azher Mughal, California Institute of Technology
NSF CC-NIE Meeting, Washington DC, April 30, 2014
Slide 11
SC06 BWC: Fast Data Transfer (FDT), http://monalisa.cern.ch/FDT (I. Legrand)
- An easy-to-use, open-source Java application that runs on all major platforms.
- Uses an asynchronous, multithreaded system to achieve smooth, linear data flow:
  - Streams a dataset (a list of files) continuously through an open TCP socket, with no protocol start/stop between files.
  - Sends buffers at a rate matched to the monitored capability of the end-to-end path.
  - Uses independent threads to read and write on each physical device.
- SC06 BWC results: stable disk-to-disk flows Tampa-Caltech, with 10-to-10 and 8-to-8 1U server pairs delivering 9 + 7 = 16 Gbps, then holding solid overnight on one 10G link; 17.77 Gbps BWC peak, plus 8.6 Gbps to and from Korea.
- By SC07: ~70-100 Gbps per rack of low-cost 1U servers.
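A toy Python sketch of the streaming idea behind FDT: send many files back-to-back through a single open TCP socket, with a reader thread keeping a bounded buffer queue full so the sender never stalls on disk. The real FDT is a Java application and also handles rate matching, parallel streams and per-device threading; everything here (framing, ports, paths) is simplified and illustrative.

```python
# Sketch: continuous file streaming over one TCP connection, no per-file start/stop.
import socket
import threading
import queue

CHUNK = 4 * 1024 * 1024            # 4 MB buffers

def reader(files, buffers):
    for path in files:
        with open(path, "rb") as f:
            while chunk := f.read(CHUNK):
                buffers.put(chunk)          # file boundaries need no extra protocol round-trips
    buffers.put(None)                       # end-of-dataset marker

def send_dataset(files, host, port):
    buffers = queue.Queue(maxsize=64)       # bounded queue keeps memory use flat
    threading.Thread(target=reader, args=(files, buffers), daemon=True).start()
    with socket.create_connection((host, port)) as sock:
        while (chunk := buffers.get()) is not None:
            sock.sendall(chunk)

# Example (hypothetical receiver and file list):
# send_dataset(["/data/file1.root", "/data/file2.root"], "receiver.example.org", 7000)
```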
Slide 12
Forward to 2014: Long-Distance Wide Area 100G Data Transfers
- Caltech SC'13 demo: solid 99-100G throughput on one 100G wave; up to 325G of WAN traffic.
- It is increasingly easy to saturate 100G infrastructure with well-prepared demonstration equipment, using the aggregated traffic of several hosts.
- 70-74 Gbps Caltech - Internet2 - ANA100 - CERN from a single server, with multiple TCP streams, using the FDT tool.
- BUT: using 100G infrastructure efficiently for production revealed several challenges, mostly end-system related:
  - IRQ affinity tuning is needed.
  - Multi-core support (multi-threaded applications).
  - Storage controller limitations, mainly in the software driver, plus CPU-controller-NIC flow control(?).
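As an example of the IRQ affinity tuning mentioned above, a hedged sketch that pins a NIC's interrupt queues to a chosen set of CPU cores by writing masks to /proc/irq/<n>/smp_affinity. The interface name and core list are illustrative; in practice the NIC vendor's set_irq_affinity script is usually used, and the cores should sit on the NUMA node local to the NIC. Requires root.

```python
# Sketch: spread a NIC's IRQs round-robin across selected cores (Linux, run as root).
import re

def nic_irqs(interface="eth2"):
    """Return IRQ numbers whose /proc/interrupts line mentions the interface."""
    irqs = []
    with open("/proc/interrupts") as f:
        for line in f:
            if interface in line:
                m = re.match(r"\s*(\d+):", line)
                if m:
                    irqs.append(int(m.group(1)))
    return irqs

def pin_irqs(irqs, cores):
    """Write one-core hex masks so each IRQ lands on a single chosen core."""
    for i, irq in enumerate(irqs):
        mask = 1 << cores[i % len(cores)]
        with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
            f.write(f"{mask:x}\n")

# Example: pin eth2's queues to cores 0-7 (assumed local to the NIC's NUMA node).
# pin_irqs(nic_irqs("eth2"), cores=list(range(8)))
```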
Slide 13
Network Path Layout (Azher Mughal, Ramiro Voicu)
- Path: Caltech (CHOPIN: CC-NIE) - CENIC - Internet2 - ANA100 - Amsterdam (SURFnet) - CERN (US LHCNet).
- CHOPIN: 100G advanced networking plus science-driver targets for 2014-15: LIGO Scientific Collaboration; astronomical sky surveys and VOs; geodetic and seismic networks; genomics (on-chip gene sequencing).
Slide 14
100G TCP Tests CERN-Caltech This Week (cont'd)
- Peaks of ~83 Gbps on some AL2S segments.
- You need strong and willing partners, plus engagement: Caltech Campus, CENIC, Internet2, ANA-100, SURFnet, CERN.
Slide 15
100G TCP Tests CERN-Caltech: An Issue
- Server 1: ~58 Gbps. Server 2: only ~12 Gbps.
- Server 2 is a newer generation (E5-2690 v2 Ivy Bridge) in the same chassis as Server 1; there is an issue between the newer CPUs and the Mellanox 40GE NICs. We are engaged with the vendors (Mellanox, Intel, LSI) and expect further improvements once this issue is resolved.
- Lessons learned: you need a strong team with the right talents, a systems approach, and especially strong partnerships: regional, national, global, and with the manufacturers.
Slide 16
CHOPIN Network Layout (CC-NIE Grant)
- 100GE backbone capacity, operational.
- External connectivity to major carriers including CENIC, ESnet, Internet2 and PacWave.
- LIGO and IPAC are in the process of joining, using 10G and 40G links.
Slide 17
CHOPIN WAN Connections
- External connectivity to CENIC, ESnet, Internet2 and PacWave.
- Able to create Layer 2 paths over the AL2S US footprint using the Internet2 OESS portal.
- Dynamic circuits through Internet2 ION over the 100GE path.
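Purely for illustration, a sketch of what requesting a Layer 2 point-to-point path from a provisioning controller can look like in the OESS/ION style of workflow described above. The URL, payload fields, endpoint strings and response field are hypothetical placeholders, not the actual OESS or ION API; the real interfaces should be taken from the controllers' documentation.

```python
# Hypothetical sketch: ask a provisioning controller for a Layer 2 circuit.
import requests

def request_layer2_path(controller_url, token, endpoint_a, endpoint_b,
                        vlan, bandwidth_mbps, duration_hours):
    payload = {
        "endpoint_a": endpoint_a,           # e.g. "campus-switch:port7/1" (placeholder)
        "endpoint_b": endpoint_b,           # e.g. "exchange-switch:port3/2" (placeholder)
        "vlan": vlan,
        "bandwidth_mbps": bandwidth_mbps,
        "duration_hours": duration_hours,
    }
    resp = requests.post(f"{controller_url}/circuits", json=payload,
                         headers={"Authorization": f"Bearer {token}"}, timeout=60)
    resp.raise_for_status()
    return resp.json()["circuit_id"]        # hypothetical response field

# Example (all values made up):
# cid = request_layer2_path("https://controller.example.net/api", "TOKEN",
#                           "campus-switch:port7/1", "exchange-switch:port3/2",
#                           vlan=3001, bandwidth_mbps=10000, duration_hours=6)
```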
Slide 18
CHOPIN - CMS Tier2 Integration
- Caltech CMS is fully integrated with the 100GE backbone, with IP peering with Internet2 and UFL at 100GE: ready for the next LHC run.
- Current peaks are around 8 Gbps; the 100GE capacity is operational.
Slide 19
Key Issue and Approach to a Solution: a Next-Generation System for the LHC and Other Fields
- Present solutions will not scale beyond LHC Run 2.
- We need an agile architecture exploiting globally distributed grid, cloud, specialized (e.g. GPU) and opportunistic computing resources.
- We need a services system that moves the data flexibly and dynamically, and behaves coherently.
- Examples do exist, with smaller but still very large scope: a pervasive, agile autonomous agent architecture that deals with complexity, developed by talented system developers with a deep appreciation of networks.
- [Figures: MonALISA grid job lifelines and grid topology; automated transfers on dynamic networks; the MonALISA-monitored ALICE Grid.]
Slide 20
Key Issues for ANSE
- ANSE is in its second year. We should develop the timeline and milestones for the remainder of the project.
- We should identify and clearly delineate a strong set of deliverables to improve data management and processing during LHC Run 2.
- We should communicate and discuss these with the experiments, to get their feedback and engagement.
- We need to deal with a few key issues:
  - Dynamic circuits and DYNES: we need a clear path forward. Can we solve some of the problems with SDN?
  - perfSONAR, and filling in our monitoring needs (e.g. with ML?).
  - How to integrate high-throughput methods in PanDA, and how to get high throughput (via PhEDEx) into production in CMS.
- Some bigger issues are arising, and we need to discuss these among the PIs and project leaders during this meeting.