Integrating Network and Transfer Metrics to Optimize Transfer Efficiency and Experiment Workflows
Shawn McKee, Marian Babik for the WLCG Network and Transfer Metrics Working Group
CHEP 2015, Okinawa, Japan, 13th April 2015

Network Integration Motivations
A crucial contributor to the success of the massively scaled global computing system that delivers the analysis needs of the LHC experiments is the networking infrastructure upon which the system is built. The LHC experiments have been able to exploit excellent high-bandwidth networking in adapting their computing models for the most efficient utilization of resources. However, there are still challenges and opportunities to make this even more effective for our needs.

Evolving Network Capabilities
- We have already deployed end-to-end monitoring of our networks using perfSONAR.
- perfSONAR, combined with data-flow performance metrics, further allows our applications to adapt based on real-time conditions.
- New, advanced networking technologies are slowly becoming available as production equipment is upgraded and replaced.
- Software Defined Networking (SDN) holds the potential to further leverage the network to optimize workflows and dataflows.
- We foresee eventual proactive control of the network fabric on the part of high-level applications such as experiment workload management and data management systems. We must prepare for this…

Network Monitoring in WLCG/OSG
Goals:
- Find and isolate "network" problems; alert in time
- Characterize network use (base-lining)
- Provide a source of network metrics for higher-level services
Choice of a standard open-source tool: perfSONAR, benefiting from the R&E community consensus.
Tasks achieved by the perfSONAR TF:
- Get monitoring in place to create a baseline of the current situation between sites
- Run continuous measurements to track the network, alerting on problems as they develop
- Develop test coverage and make it possible to run "on-demand" tests to quickly isolate problems and identify problematic links

Network and Transfer Metrics WG
Started in May 2014, bringing together network and transfer experts; follows up on the perfSONAR TF goals.
Mandate:
- Ensure all relevant network and transfer metrics are identified, collected and published
- Ensure sites and experiments can better understand and fix networking issues
- Enable use of network-aware tools to improve transfer efficiency and optimize experiment workflows
Membership: WLCG perfSONAR support unit (regional experts), WLCG experiments, FTS, PanDA, PhEDEx, FAX, network experts (ESnet, LHCOPN, LHCONE)

Network and Transfer Metrics WG Objectives
- Coordinate commissioning and maintenance of WLCG network monitoring
  - Finalize, harden and maintain the perfSONAR deployment
  - Ensure all links continue to be monitored and sites stay correctly configured
  - Verify coverage and optimize test parameters
- Identify and continuously make available relevant transfer and network metrics
  - Document metrics and their use
  - Facilitate their integration in the middleware and/or experiment tool chain
Since its inception, the main focus has been to finalize deployment and commissioning and to extend the infrastructure, but also to jump-start common projects with network and transfer metrics.

perfSONAR Deployment
See http://grid-monitoring.cern.ch/perfsonar_report.txt for stats:
- 259 perfSONAR instances registered in GOCDB/OIM
- 233 active perfSONAR instances
- 172 running the latest version (3.4.2)
- LHCONE instances at WIX, MANLAN and GEANT Amsterdam; Jorge volunteered to help out here…
Initial deployment was coordinated by the WLCG perfSONAR TF; commissioning of the network followed by the WLCG Network and Transfer Metrics WG.

Infrastructure Monitoring
Based on OMD/check_mk and MaDDash (developed as part of perfSONAR; version 1.2 released recently).
OMD/check_mk extended to cover WLCG perfSONAR needs:
- Developed bootstrapping and auto-configuration scripts, synchronized with GOCDB/OIM and the OSG configuration interface; packaged and deployed in OSG
- Developed new core-functionality plugins: toolkit version, regular testing, NTP, mesh configuration, esmond (MA), homepage, contacts; updated to the perfSONAR 3.4 information API/JSON
- High-level functionality plugins: the esmond freshness check verifies that a perfSONAR node's local MA contains the measurements it was configured to perform; extremely useful during commissioning
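To illustrate the idea behind the esmond freshness check, the following is a minimal sketch in the Nagios/check_mk plugin convention. It assumes the node exposes the standard esmond archive at /esmond/perfsonar/archive/; the hostname and thresholds are illustrative, and the production OMD/check_mk plugin differs.

```python
#!/usr/bin/env python
"""Minimal sketch of an esmond "freshness" check (exit 0=OK, 2=CRIT).
Illustrative only; not the actual OMD/check_mk plugin."""
import sys
import requests  # third-party HTTP client

HOST = "ps-latency.example.org"  # hypothetical perfSONAR node
TIME_RANGE = 86400               # look for results from the last 24h

def fresh_measurement_count(host, time_range):
    """Count configured measurements with data in the time window."""
    url = "http://%s/esmond/perfsonar/archive/" % host
    resp = requests.get(url, params={"format": "json",
                                     "time-range": time_range},
                        timeout=30)
    resp.raise_for_status()
    # Each returned metadata entry describes one measurement that has
    # produced data within the requested time range.
    return len(resp.json())

try:
    count = fresh_measurement_count(HOST, TIME_RANGE)
except requests.RequestException as exc:
    print("CRIT - esmond MA unreachable: %s" % exc)
    sys.exit(2)

if count == 0:
    print("CRIT - no fresh measurements in local MA")
    sys.exit(2)
print("OK - %d measurements with fresh data" % count)
sys.exit(0)
```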

Infrastructure Monitoring
- Auto-summaries are available per mesh
- Service summaries per metric type
GOAL: to ensure we continue to reliably obtain ALL network metrics.

OSG perfSONAR Datastore
- All perfSONAR metrics should be collected into the OSG network datastore, an esmond datastore from perfSONAR (PostgreSQL + Cassandra backends)
- Loaded via RSV probes; currently one probe per perfSONAR instance every 15 minutes
- Validation and testing ongoing in OSG; the plan is to have it production-ready by Q3 2015
- Datastore on psds.grid.iu.edu
  - JSON at http://psds.grid.iu.edu/esmond/perfsonar/archive/?format=json
  - Python API at http://software.es.net/esmond/perfsonar_client.html
  - Perl API at https://code.google.com/p/perfsonar-ps/wiki/MeasurementArchivePerlAPI
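As a hedged sketch of how a client might read metrics from the datastore via the JSON interface above: the filter parameter names follow the esmond perfSONAR archive conventions, but the source/destination hostnames are made up for illustration; consult the esmond perfSONAR client documentation for the authoritative API.

```python
#!/usr/bin/env python
"""Sketch: read throughput results from the OSG network datastore
via its JSON interface. Hostnames below are illustrative examples."""
import requests

BASE = "http://psds.grid.iu.edu/esmond/perfsonar/archive/"
WEEK = 7 * 86400

# Look up measurement metadata for a (hypothetical) host pair.
meta = requests.get(BASE, params={
    "format": "json",
    "event-type": "throughput",
    "source": "ps-bandwidth.siteA.example.org",
    "destination": "ps-bandwidth.siteB.example.org",
    "time-range": WEEK,
}, timeout=30).json()

for entry in meta:
    # Each metadata entry lists its event types, each with a base-uri
    # that serves the actual time series.
    for et in entry["event-types"]:
        if et["event-type"] != "throughput":
            continue
        data = requests.get("http://psds.grid.iu.edu" + et["base-uri"],
                            params={"format": "json", "time-range": WEEK},
                            timeout=30).json()
        for point in data:
            # Points look like {"ts": <unix time>, "val": <bits/s>}.
            print(point["ts"], point["val"] / 1e9, "Gbps")
```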

Integration Projects
Goal:
- Provide a platform to integrate network and transfer metrics
- Enable network-aware tools (see ANSE, http://cern.ch/go/M9Sj): network resource allocation alongside CPU and storage, bandwidth reservation, creating custom topologies
Plan:
- Provide latency and trace routes and test how they can be integrated with throughput from transfer systems
- Provide mapping between sites/storages and sonars
- Provide uniform access to the network monitoring
Pilot projects:
- FTS performance: adding latency and routing to the optimizer (sketched below)
- Experiments' interface to the datastore
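The following toy function is purely illustrative of the kind of logic the FTS performance pilot explores: letting a network metric (here packet loss from perfSONAR) temper an optimizer that otherwise reacts only to achieved throughput. None of the names or thresholds come from FTS itself.

```python
"""Illustrative sketch, not FTS code: a network-aware tweak to a
transfer-concurrency decision."""

def suggest_active_transfers(current_active, throughput_trend, packet_loss):
    """Suggest the next number of concurrent transfers on a link.

    throughput_trend > 0 means recent transfers got faster;
    packet_loss is a packet-loss rate (fraction, 0..1).
    """
    if packet_loss > 0.01:
        # Loss on the path: backing off is likely better than adding
        # more transfers that all see a degraded network.
        return max(1, current_active - 1)
    if throughput_trend > 0:
        return current_active + 1   # link has headroom, ramp up
    return current_active           # hold steady

print(suggest_active_transfers(10, throughput_trend=0.2, packet_loss=0.0))   # 11
print(suggest_active_transfers(10, throughput_trend=0.2, packet_loss=0.05))  # 9
```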

Experiments' Interface to Datastore
Aim:
- Develop a publish/subscribe interface to perfSONAR
- Enable the possibility to subscribe to different events (filter) and support different clients: integration via messaging, streaming data via topic/queue
- Provide mapping/translation between the sonar infrastructure and the experiments' topology
Components:
- esmond2mq: a prototype already exists; it retrieves all data (meta + raw) from esmond according to the existing mesh configs and publishes to a topic (see the sketch below)
- Proximity/topology service: handles mapping/translation of services (service to service, storage to sonar) and service to site (sonar to site); test different algorithms (site mapping, traceroutes, geoip); evaluate whether existing tools can be reused for this purpose
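A minimal sketch of the publish side of an esmond2mq-style component follows, assuming a STOMP message broker and using the stomp.py client. The broker host, topic name and credentials are hypothetical; the actual prototype may be structured quite differently.

```python
"""Sketch: fetch recent metadata from esmond and publish it to a
messaging topic so experiment clients can subscribe instead of polling."""
import json
import requests
import stomp  # pip install stomp.py

ESMOND = "http://psds.grid.iu.edu/esmond/perfsonar/archive/"
TOPIC = "/topic/perfsonar.metadata"   # hypothetical topic name

# Pull measurement metadata produced in the last hour.
metadata = requests.get(ESMOND, params={"format": "json",
                                        "time-range": 3600},
                        timeout=30).json()

# Publish one message per metadata entry to the broker.
conn = stomp.Connection([("broker.example.org", 61613)])  # hypothetical broker
conn.connect("user", "password", wait=True)
for entry in metadata:
    conn.send(destination=TOPIC, body=json.dumps(entry),
              headers={"content-type": "application/json"})
conn.disconnect()
```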

Closing Remarks
- perfSONAR is widely deployed and already showing benefits in troubleshooting network issues; additional deployments within R&E/overlay networks are still needed
- Significant progress in configuration and infrastructure monitoring is helping to reach the full potential of the perfSONAR deployment
- The OSG datastore, a community network data store for all perfSONAR metrics, is planned to enter production in Q3
- Integration projects aim to aggregate network and transfer metrics: FTS performance, experiments' interface to perfSONAR
- The ANSE project is providing "hooks" in PanDA and PhEDEx to utilize future SDN capabilities as they become available at our sites and in our networks

Questions?