PerfSONAR operations meeting 3 rd October 2014. Agenda Propose changes to the current operations of perfSONAR Discuss current and future deployment model.

Slides:



Advertisements
Similar presentations
Integrating Network and Transfer Metrics to Optimize Transfer Efficiency and Experiment Workflows Shawn McKee, Marian Babik for the WLCG Network and Transfer.
Advertisements

PerfSONAR in ATLAS/WLCG Shawn McKee, Marian Babik ATLAS Jamboree / Network Section 3 rd December 2014.
EGI-Engage Recent Experiences in Operational Security: Incident prevention and incident handling in the EGI and WLCG infrastructure.
Experiment Support CERN IT Department CH-1211 Geneva 23 Switzerland t DBES WLCG operations: communication channels Andrea Sciabà WLCG operations.
DBS to DBSi 5.0 Environment Strategy Quinn March 22, 2011.
What if you suspect a security incident or software vulnerability? What if you suspect a security incident at your site? DON’T PANIC Immediately inform:
Use Cases. Summary Define and understand slow transfers – Identify weak links, narrow down the source – Understand what perfSONAR measurements mean wrt.
Network Monitoring for OSG Shawn McKee/University of Michigan OSG Staff Planning Retreat July 10 th, 2012 July 10 th, 2012.
Network and Transfer WG Metrics Area Meeting Shawn McKee, Marian Babik Network and Transfer Metrics Kick-off Meeting 26 h November 2014.
Connect communicate collaborate perfSONAR MDM updates: New interface, new weathermap, towards a complete interoperability Domenico Vicinanza perfSONAR.
Network and Transfer Metrics WG Meeting Shawn McKee, Marian Babik Network and Transfer Metrics WG Meeting 8 th April 2015.
Network and Transfer Metrics WG Meeting Shawn McKee, Marian Babik perfSONAR Operations Sub-group 22 nd October 2014.
Update on OSG/WLCG Network Services Shawn McKee, Marian Babik 2015 WLCG Collaboration Workshop 12 th April 2015.
LCG/EGEE Security Operations HEPiX, Fall 2004 BNL, 22 October 2004 David Kelsey CCLRC/RAL, UK
MW Readiness WG Update Andrea Manzi Maria Dimou Lionel Cons 10/12/2014.
Update on WLCG/OSG perfSONAR Infrastructure Shawn McKee, Marian Babik HEPiX Fall 2015 Meeting at BNL 13 October 2015.
PerfSONAR-PS Functionality February 11 th 2010, APAN 29 – perfSONAR Workshop Jeff Boote, Assistant Director R&D.
Network and Transfer Metrics WG Meeting Shawn McKee, Marian Babik Network and Transfer Metrics WG Meeting 18 h March 2015.
WLCG perfSONAR-PS Update Shawn McKee/University of Michigan WLCG Network and Transfers Metrics Co-Chair Spring 2014 HEPiX LAPP, Annecy, France May 21 st,
HEPiX IPv6 Working Group David Kelsey GDB, CERN 11 Jan 2012.
Report from the WLCG Operations and Tools TEG Maria Girone / CERN & Jeff Templon / NIKHEF WLCG Workshop, 19 th May 2012.
Network and Transfer WG perfSONAR operations Shawn McKee, Marian Babik Network and Transfer Metrics WG Meeting 28 h January 2015.
Update on Network and Transfer Metrics WG Shawn McKee, Marian Babik GDB 8 th October 2014.
PerfSONAR Update Shawn McKee/University of Michigan LHCONE/LHCOPN Meeting Cambridge, UK February 9 th, 2015.
OSG Networking: Summarizing a New Area in OSG Shawn McKee/University of Michigan Network Planning Meeting Esnet/Internet2/OSG August 23 rd, 2012.
PerfSONAR for LHCOPN/LHCONE Update Shawn McKee/University of Michigan LHCONE/LHCOPN Meeting Amsterdam, NL October 28 th, 2015.
WLCG Latency Mesh Comments + – It can be done, works consistently and already provides useful data – Latency mesh stable, once configured sonars are stable.
LCG Accounting Update John Gordon, CCLRC-RAL WLCG Workshop, CERN 24/1/2007 LCG.
Area Coordinator Report for Operations Rob Quick 4/10/2008.
Operations model Maite Barroso, CERN On behalf of EGEE operations WLCG Service Workshop 11/02/2006.
MW Readiness WG Update Andrea Manzi Maria Dimou Lionel Cons Maarten Litmaath On behalf of the WG participants GDB 09/09/2015.
Using Check_MK to Monitor perfSONAR Shawn McKee/University of Michigan North American Throughput Meeting March 9 th, 2016.
HEPiX IPv6 Working Group David Kelsey david DOT kelsey AT stfc DOT ac DOT uk (STFC-RAL) HEPiX, Vancouver 26 Oct 2011.
The HEPiX IPv6 Working Group David Kelsey (STFC-RAL) EGI OMB 19 Dec 2013.
WLCG Information System Status Maria Alandes Pradillo, CERN CERN IT Department, Support for Distributed Computing Group GDB 9 th September 2015.
WLCG Operations Coordination report Maria Dimou Andrea Sciabà IT/SDC On behalf of the WLCG Operations Coordination team GDB 12 th November 2014.
Campana (CERN-IT/SDC), McKee (Michigan) 16 October 2013 Deployment of a WLCG network monitoring infrastructure based on the perfSONAR-PS technology.
Site notifications with SAM and Dashboards Marian Babik SDC/MI Team IT/SDC/MI 12 th June 2013 GDB.
Grid Colombia Workshop with OSG Week 2 Startup Rob Gardner University of Chicago October 26, 2009.
HEPiX IPv6 Working Group David Kelsey (STFC-RAL) GridPP33 Ambleside 22 Aug 2014.
WLCG Operations Coordination Andrea Sciabà IT/SDC GDB 11 th September 2013.
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI Regional Helpdesk GRNET Example Gkamas Vasileios NGI_GRNET User Support Team.
Operations Coordination Team Maria Girone, CERN IT-ES GDB, 11 July 2012.
WLCG IPv6 deployment strategy
Shawn McKee, Marian Babik for the
Report from WLCG Workshop 2017: WLCG Network Requirements GDB - CERN 12th of July 2017
David Kelsey CCLRC/RAL, UK
perfSONAR-PS Deployment: Status/Plans
POW MND section.
LHCOPN/LHCONE perfSONAR Update
Networking for the Future of Science
Report on SLA progress Ioannis Liabotis <ilaboti at grnet.gr>
Monitoring the US ATLAS Network Infrastructure with perfSONAR-PS
Advancements in Availability and Reliability computation Introduction and current status of the Comp Reports mini project C. Kanellopoulos GRNET.
Update from the HEPiX IPv6 WG
Colorado Plugfest Planning
Shawn McKee/University of Michigan ATLAS Technical Interchange Meeting
TYPES OF SERVER. TYPES OF SERVER What is a server.
Alerting/Notifications (MadAlert)
LHCONE perfSONAR: Status and Plans
Network Monitoring Update: June 14, 2017 Shawn McKee
WLCG and support for IPv6-only CPU
Solutions for federated services management EGI
Leigh Grundhoefer Indiana University
ILMT/BigFix Inventory Demo
“Detective”: Integrating NDT and E2E piPEs
Technical Outreach Expert
e-Infrastructure commons Updates
Building a minimum viable Security Operations Centre
Operations Management Board March 26
Presentation transcript:

perfSONAR operations meeting 3 rd October 2014

Agenda Propose changes to the current operations of perfSONAR Discuss current and future deployment model Discuss and agree on the steps to take in the different areas of the perfSONAR operations Discuss coming release of perfSONAR (3.4) and its impact Discuss Shell Shock vulnerability Identify early-adopters for the testing of the new mesh configuration interface

perfSONAR deployment (present)

perfSONAR deployment (future) For WLCG data store plan see [2]

perfSONARs Metrics to gather – Bandwidth For each cloud mesh, a 30 sec test using Iperf every 6 hours between all members in mesh Additionally have a 30 sec test once per week between ALL WLCG perfSONAR bandwidth hosts – Latency/Packet-loss (10 Hz of packets for EACH test) OWAMP (one-way delay) tests between all mesh members; summarized every minute (600 packets) OWAMP tests between mesh members and all Tier-1 instances – Traceroute Traceroute run every hour between ALL WLCG perfSONAR latency hosts

Operations Team Mesh Leaders ( – Participate in the activities and meetings of the WLCG Network and Transfer Metrics WG – Serve as the primary contact for their mesh – Monitor tests within their mesh regularly (1/week) identifying any problems – Work with sites within their mesh to fix identified problems including failing tests, out-of-date software and mis-configuration

Helpdesk/Documentation Operations team mailing list wlcg-perfsonar- Proposing to establish GGUS SU and link it to the mailing list – 3 rd level experts SU (WLCG perfSONAR support) Complete rewrite of the current documentation is needed – – Integration (new sites) – requirements, procedure – Installation/upgrade guide – Troubleshooting/FAQ – Guide for experiments (storage APIs, dashboard, etc.)

Infrastructure Monitoring Functional/Availability testing – User: WLCGps, pw during meeting – Common tests: perfSONAR-BUOY Measurment Archive perfSONAR-PS Administration Details perfSONAR-PS Latitude/Longitude Configured perfSONAR-PS Toolkit Version PS-Homepage PS-Homepage-No-Cert – Bandwidth node specific tests: Bandwidth Test Controller NDT (drop ?) NPAD (drop ?) – Latency node specific tests: One-way Ping Service OWAMP Traceroute Measurement Archive Based on web scraping, but in 3.4 we do have an API Accessibility (current solution sub-optimal) Missing metrics ?

Infrastructure Monitoring (2) Dashboard with integrated information (in SSB) Temporary solution proposed is perfsonar report – offeres aggregated information and can be a good guide for mesh leaders – (updated daily) WLCG perfSONAR service status report on :03: ======= perfSONAR instances monitored: 214 perfSONAR-PS versions deployed: : : 95 Unknown: 113 GOCDB registered total: 173 OIM registered total: 55 Unreachable instances (not monitored): 91 Incorrectly configured (failing >4 metrics): 112 perfSONAR instances registered but not monitored (not included in any mesh) perfSONAR instances monitored, but not registered in GOCDB/OIM perfSONAR instances with incorrect versions perfSONAR instances with unknown versions: Unreachable instances Incorrectly configured (failing >4 metrics)

WLCG Mesh Configuration New web-based mesh configuration is now available (developed and deployed by OSG) perfSONAR instances and details retrieved from GOCDB/OIM – Instances can be grouped into Host Groups – Metrics to be measured by sonars configured via Parameter Sets – Meshes (tests) defined by binding Host Groups with Parameter Sets Auto-configuration URL to be available – based on source hostname all meshes where host participates will be provided Volunteers are needed to test and validate this (if you are interested Shawn smckee ‘at’ umich.edu)

Infrastructures OSG/EGI Very important to engage them in our operations – Having sonars registered in GOCDB/OIM essential – Proposing to offer some of our infrastructure metrics (toolkit-versions) to be also monitored by their tools – Clarify communication channels with the infrastructure security teams

Shell Shock Vulnerability As reported yesterday at WLCG ops coordination: Details on the shell shock vulnerabilites and its impact on perfSONAR available at AR AR We recommend ALL sites that didn't patch bash before Friday Sep 26 to terminate their instances and wait until perfSONAR 3.4 is released.

Remediation plan perfSONAR 3.4 to be released on Mon Oct 6 Focus on having just one campaign to reinstall, upgrade to 3.4 and move to new mesh configurations Proposed timeline – Review current configuration wrt. firewall rules [1] – Offer new auto-configuration URL (will redirect to CERN for now, but can be changed to WLCG mesh configuration) – Write new installation guide for 3.4 – Test and validate this works as expected Send WLCG and EGI broadcast with the (re)installation instructions to ALL sites – Updated WLCG Deployment page will outline details[4] Set and announce deadline for ALL sites to update to 3.4

AOB GDB WG update on Wed 8 th OCT CHEP2015 abstract by 15 th OCT (first draft ready) Coming meetings: – perfSONAR operations 20 Oct - 24 Oct – Metrics area 13-17th October Please check our Twiki and send me your comments ( rkTransferMetrics)

Refs [1] Firewall configuration for perfSONAR 3.4 (note this is work in progress, link will change) [2] OSG data store plan ePlan ePlan [3] WLCG mesh configuration [4] WLCG perfSONAR deployment page