EGI Network Support task force: Proposal for the identified use cases


1 EGI Network Support task force: Proposal for the identified use cases
Mario Reale, IGI / GARR. January 24, 2011. EGI OMB f2f meeting, Amsterdam. EGI.eu

2 Overview
Description of what we propose for each of the identified use cases: GGUS; EGI PERT; Network-related Scheduled Maintenances; Troubleshooting on demand; End-to-end multi-domain monitoring; DownCollector; Policy and Cooperation.

3 1. GGUS based Network Support workflow

4 GGUS
Reference tools: GGUS / Network Support Unit. Additional network monitoring and troubleshooting tools will be brought in by the parties involved (NRENs, NOCs, PERTs, ...). Proposal: implement a GGUS-based workflow to handle network tickets. The workflow does not foresee establishing a permanent EGI Network Support Team for this GGUS unit, since the great majority of NGIs are against it.

5 Proposed GGUS Workflow 1/2
A user belonging to a given Virtual Organization (VO) experiences poor performance or repeated failures while transferring data from site A to site B (A -> B). Simple first-level debugging is assumed to be carried out at the user level, possibly involving some VO support. The aim is to exclude “trivial” issues (software, SE down, ...). Basic troubleshooting can be provided by the troubleshooting on-demand tool. The user should also check whether monitoring data are available. A network ticket is then opened in GGUS describing the problem.

6 Proposed GGUS workflow 2/2
The network ticket is assigned automatically to the site administrators of Site A and Site B. They are both responsible for handling the ticket. However, only one person should be accountable for the ticket: the Site A administrator (A -> B: Site A is the originator of the data transfer). For a transfer from a User Interface node to Site X, the ticket is assigned to the Site X administrator, after first basic debugging by the user. Site administrators handle contacting the NREN contacts (APM, NOC): they inform their NGI Operation Centers, and they should first contact the local campus network admins and the local NREN APM. NRENs will handle the issue using their PERT team, APM, NOCs and experts, possibly involving the DANTE/GEANT NOC and the Federated EduPERT. NOCs further involve TELCO operators if and when required, according to their workflows. At each step the ticket originator, the persons responsible for the VO and the site admins are kept informed until the issue is solved.

7 Network Support workflow
[Flowchart] Ticket posted (after initial debugging by the user has failed). NGI Operations Center informed / COD informed.
Level 1, responsible for the ticket processing: Site A & Site B Grid site administrators try to fix it (for an A to B transfer); VO/VRC application expert & Campus Net Admin involved. Solved? YES: problem fixed, ticket closed. NO: go to level 2.
Level 2: NREN A & NREN B NOCs try to fix it; NREN A & NREN B PERTs and local APMs involved. Solved? YES: inform Site A & B; problem fixed, ticket closed. NO: go to level 3.
Level 3: GEANT NOC tries to fix it; Federated PERT & GEANT APMs involved. Solved? YES: inform Site A & B; problem fixed, ticket closed. NO: other actors (TELCO NOCs & Operations, ...).
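The escalation chain in the workflow above can be sketched as a small state machine. This is a minimal illustration only, not part of GGUS: the class `NetworkTicket`, the list `ESCALATION_LEVELS` and the method names are hypothetical, and the only things taken from the slides are the order of the levels and the rule that Site A stays accountable.

```python
from dataclasses import dataclass, field

# Escalation levels in the order shown on the workflow slide.
# Each entry is (who tries to fix it, who else is involved).
# The data layout and all names here are illustrative assumptions.
ESCALATION_LEVELS = [
    ("Grid site admins (Site A & Site B)", "VO/VRC application expert, Campus Net Admin"),
    ("NREN A & NREN B NOCs", "NREN PERTs, local APMs"),
    ("GEANT NOC", "Federated PERT, GEANT APMs"),
    ("Other actors (TELCO NOCs & Operations)", ""),
]


@dataclass
class NetworkTicket:
    source_site: str          # Site A, originator of the data transfer
    dest_site: str            # Site B
    level: int = 0            # index into ESCALATION_LEVELS
    closed: bool = False
    history: list = field(default_factory=list)

    @property
    def accountable(self) -> str:
        # The slides state that the originating site's admin stays accountable.
        return f"{self.source_site} site administrator"

    @property
    def current_handler(self) -> str:
        return ESCALATION_LEVELS[self.level][0]

    def handle(self, solved: bool) -> None:
        """Record one fix attempt; close the ticket or escalate one level."""
        if self.closed:
            raise ValueError("ticket is already closed")
        self.history.append((self.current_handler, solved))
        if solved:
            self.closed = True
        elif self.level < len(ESCALATION_LEVELS) - 1:
            self.level += 1
```

For example, a ticket that the site admins and the NREN NOCs cannot solve ends up with the GEANT NOC, while `accountable` keeps pointing at the Site A administrator throughout.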

8 Observations
Coordination between Site A and Site B is assumed, at least until a clear domain of competence for the problem/bottleneck is identified; Site A digging into the issue much more than Site B should be avoided. Crossing from the Grid domain to the network domain is the responsibility of the Grid site administrators: they contact the local campus admins, the local NREN APMs and/or the NOC. Many parties are involved (each subsequently holding the token), but from the point of view of the workflow to be implemented, GGUS just assigns the network ticket to the Site-A and Site-B site administrators (A->B) or to the Site-X admin (UI->X), who will have to follow up the ticket. Ultimately, the Site-A Grid admin is responsible (start point of the data transfer). Whether NRENs prefer Grid site admins to contact their NOC or the local APM first is something to be clarified (NREN questionnaire).

9 Involved Actors
[Diagram: the path between user A and user B crosses LAN A, Campus A, NREN A, the backbone (GEANT/DANTE, TEIN3, ORIENT, SEEREN2, ALICE2/RedCLARA, EUMEDCONNECT2, TELCOs), NREN B, Campus B and LAN B. Actors shown on each side: user, VO experts, Grid site admin, campus network support, NREN APM and NOC. Labels in the diagram: "RESPONSIBLE FOR TICKET" and "ACCOUNTABLE FOR TICKET".]

10 Problem Solving: Actors Stack
[Diagram: stack of actors grouped by domain and ordered by functional/geographical distance from the user. Applications: User; VO/VRC applications support/experts. GRID: COD; NGI Operation Center; Grid Site Administrator. NETWORK: Campus Network Administrator; NREN Local APM; NREN NOC; NREN-TELCOs operators; NREN PERT; GEANT APM; GEANT NOC (DANTE); Federated PERT; TELCOs operators. Arrow labels: "Informs" and "Hands over / Requests support".]

11 2. EGI PERT

12 EGI PERT
At this stage we feel there was not enough consensus among NGIs to establish a permanent EGI PERT team providing both Grid middleware/applications expertise and PERT networking expertise. PERTs will contribute to the general Network Support workflow, and we propose to leave the involvement of PERT teams to the NRENs and the GEANT NOC. Our proposal is to provide a web contact point (web page) where EGI users and site administrators can find information about general PERT issues and basic procedures, and about how to reach the PERT teams of NRENs and the Federated PERT if required. The page would gather the relevant PERT contact information in one location, provide a basic web guide for common PERT-related issues (examples: how do I set the TCP window size on my machine? how do I check that my machine is not closing a port that is fundamental for Grid middleware?) and point to the EduPERT knowledge database. The general procedure for direct involvement of PERT teams should, however, fall within the scope of the general GGUS-based workflow.
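The two example questions for the web guide (TCP window size, blocked ports) could be answered with snippets along these lines. This is a sketch under stated assumptions, not content from the proposed guide: the helper names `request_receive_buffer` and `port_is_open` are made up for illustration, and only standard `socket` calls are used. The socket receive buffer bounds the TCP window an application can advertise; system-wide limits (for instance the `net.ipv4.tcp_rmem` sysctl on Linux) may cap or adjust the requested value, which is why the function returns what the OS actually granted.

```python
import socket


def request_receive_buffer(sock: socket.socket, size_bytes: int) -> int:
    """Ask the OS for a receive buffer of size_bytes on this socket.

    Returns the size the OS reports afterwards, which can differ from
    the request because of kernel limits and internal accounting.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size_bytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

A site administrator could call `port_is_open` from a remote machine against each port the local Grid middleware needs; which ports those are depends on the deployed services and is not specified here.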

13 3. Network-related Scheduled Maintenances

14 Network Related Scheduled Maintenances
Reference tool: there is currently no specific tool for network-related maintenances; the GOC DB covers Grid-related maintenances. No tool warns users about network-related scheduled maintenances. Locally, Grid site administrators, warned by the NREN APM of a possible unavailability, can post the unavailability of their sites/services. This relies on local coordination between the APM and the Grid site admin, with no automation.

15 Network Related Scheduled Maintenances
What can be envisaged for network-related scheduled maintenances: NRENs coordinate with the corresponding NGIs in order to have a mapping between network devices/PoPs and the directly impacted Grid resource centers. NGIs set up a tool implementing a mapping between Grid resource centers/services and the users to be informed. NGIs and NRENs coordinate so that, when a network device/PoP is the object of a scheduled maintenance impacting a Grid resource center/service, the NREN informs the NGI, and the NGI informs EGI.eu Operations and the users.
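The two mappings described above (PoP to resource centers, resource center to users) can be sketched as plain lookup tables. Everything below is hypothetical: the PoP names, site names, addresses and the function `notifications_for_maintenance` are invented for illustration, and no existing NREN or NGI tool is implied.

```python
# Mapping maintained jointly by NREN and NGI:
# network device / PoP -> directly impacted Grid resource centers.
# All identifiers are made-up examples.
POP_TO_SITES = {
    "pop-north-1": ["SITE-A"],
    "pop-south-2": ["SITE-B", "SITE-C"],
}

# Mapping maintained by the NGI:
# Grid resource center -> contacts to be informed.
SITE_TO_CONTACTS = {
    "SITE-A": ["vo-alpha-users@example.org"],
    "SITE-B": ["vo-alpha-users@example.org", "vo-beta-users@example.org"],
    "SITE-C": ["vo-beta-users@example.org"],
}


def notifications_for_maintenance(pop, pop_to_sites, site_to_contacts):
    """Return {site: [contacts]} for every site impacted by maintenance on pop.

    An unknown PoP yields an empty dict; a site without registered
    contacts yields an empty contact list.
    """
    return {
        site: list(site_to_contacts.get(site, []))
        for site in pop_to_sites.get(pop, [])
    }
```

With such tables in place, an NREN maintenance announcement for one PoP translates directly into the list of sites and users the NGI has to inform.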

16 Network Related scheduled maintenances
Today things are left to the good will of local APMs and Grid site managers, and to their coordination. A higher-level workflow should be put in place, systematically addressing this issue. This proposal still needs to be elaborated in considerably more detail, in close coordination with the OTAG, JRA1 and EGI operations.

17 Network Troubleshooting on Demand

18 Troubleshooting on demand
Based on the experience gained in EGEE SA2, the French NGI has started and developed a new tool called HINTS. HINTS has been developed on a voluntary basis by UREC CNRS. It is flexible and based on PerfSONAR web services and protocols. The Task Force proposes this tool for network troubleshooting. A presentation follows later today.

19 End-to-end multi domain monitoring

20 E2E multi domain monitoring
The reference tool we propose for e2e multi-domain monitoring is PerfSONAR. Many NRENs are familiar with it, and it has seen long-term development by many key organizations and projects in both Europe and America (GEANT project). The Spanish NREN RedIRIS developed a customized version of PerfSONAR on a live CD for e2e measurements. The PerfSONAR team is presenting its tools and use cases today. The NetJobs tool has been developed by CNRS and GARR to perform basic network monitoring measurements using Grid jobs, with no need for local deployment. Presentations follow today.

21 Down Collector

22 DownCollector
The DownCollector is currently in use (EGI-Inspire TSA1.4). Our proposal is to improve the packaging, installation and configuration of the tool, to ease the creation of new instances for the NGIs willing to deploy one; this could be achieved with reasonable effort. In the longer term an integrated system could be built, gathering information from the various distributed instances and building a mesh (service at site X reached by site Y). This has to be discussed further with TSA1.4, JRA1 and OTAG. However, it cannot be endorsed immediately, given the available manpower, the responses to the questionnaire and the pending discussions with the relevant tasks/teams.
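The mesh idea (service at site X reached by site Y) can be illustrated with a small aggregation over reports from distributed instances. This is a sketch only and does not reflect the actual DownCollector data model: the report format `(observer_site, target_site, reachable)` and both function names are assumptions made for the example.

```python
def build_mesh(reports):
    """Aggregate reachability reports into {target: {observer: reachable}}.

    reports is an iterable of (observer_site, target_site, reachable)
    tuples, one per check performed by an instance at observer_site.
    A later report for the same pair overwrites an earlier one.
    """
    mesh = {}
    for observer, target, reachable in reports:
        mesh.setdefault(target, {})[observer] = bool(reachable)
    return mesh


def unreachable_from_all(mesh):
    """Targets that no observer could reach (likely down, not a path issue)."""
    return sorted(
        target
        for target, views in mesh.items()
        if views and not any(views.values())
    )


def partially_unreachable(mesh):
    """Targets reached by some observers but not others (possible path issue)."""
    return sorted(
        target
        for target, views in mesh.items()
        if any(views.values()) and not all(views.values())
    )
```

The distinction between the two helper functions is the point of a mesh: a service unreachable from every site suggests a problem at the service itself, whereas a service unreachable from only some sites hints at a network problem on specific paths.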

23 Policy and Cooperation

24 Policy and Cooperation
The majority of NGIs are against establishing a permanent EGI Network Support body for policy and cooperation, though some are very much in favor. We have not yet elaborated a sufficiently structured proposal; we have only identified fields for cooperation. At this stage, our proposal is to invite volunteering NGIs to join the Network Support coordination activity (within EGI-Inspire TSA1.7) to discuss this issue further and elaborate a plan for ensuring that issues are tackled within TSA1.7.

25 General Issues
The proposed GGUS workflow for network support assumes that some basic network checks are done at the user level -> we should ensure users are familiar with the basic network debugging operations and tools we are providing. Users should try to refer to their site administrators first. Site administrators have to deal with network-related issues; they are likely to have at least basic network know-how -> should we foresee training on the tools we want to provide? A guide for users and site administrators on network support and the related debugging procedures/tools?

26 General Issues
For some use cases we should find out more about the NREN-NGI interaction. A questionnaire for NRENs about Grids and NGIs has to be organized, aimed at clarifying: Which are the current NREN-NGI communication channels (especially for NGI operations)? How feasible is it to set up a global system to automatically advertise network-related scheduled maintenances and incidents to users and site admins? Which multi-domain tools are NRENs currently using and familiar with?

27 General Issues
Network-related scheduled maintenances require further analysis to refine our proposal and design/identify the corresponding tools. We need to acquire more information from NRENs, and we also need more internal discussion. Within the GGUS-based workflow, we should find out and decide whether NRENs prefer Grid site admins to contact their local APMs first or their NOC; this is also to be asked of the NRENs.

28 References EGI Network Support coordination: PERT: Federated PERT Knowledge DataBase

