EGI-InSPIRE RI
Network Troubleshooting and PerfSONAR-Lite TSS
Mario Reale, GARR
A little bit of history
During EGEE III, a task of the SA2 activity was dedicated to providing a network monitoring solution for EGEE
The emphasis drifted
–from “monitoring” to “troubleshooting”,
–from “scheduled measurements” to “on-demand”,
–and towards light deployment
DFN (RRZN Erlangen) designed a tool based on the widely known/used
Same concept
Launch tests on demand from one site under the control of the central server: ping, traceroute, DNS lookup, nmap and bandwidth measurements
[Diagram: a central web monitoring server, with a local light probe at Grid site A and at Grid site B. Users: TOC, NOC / ROC members, site administrator / troubleshooter]
Authentication Authorization Process
Login/passwd or certificate
Mutual authentication
Security based on IP address
BWCTL port open on demand
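The on-demand launch described on this slide could be sketched as follows. This is a minimal illustration in Python, not the tool's actual code: the function name `build_command`, the test keys and the exact command lines are assumptions made for the example. Bandwidth measurements are left out, since the slides associate them with BWCTL rather than with a single local command.

```python
import re

# Hypothetical mapping from a test name to the command a probe might run.
# These command lines are illustrative only; the real probe delegates
# execution to its own daemon.
COMMANDS = {
    "ping": lambda target, port: ["ping", "-c", "4", target],
    "traceroute": lambda target, port: ["traceroute", target],
    "dns": lambda target, port: ["nslookup", target],
    "port": lambda target, port: ["nmap", "-p", str(port), target],
}

# Accept only plain host names or IPv4 addresses. The value must start
# with a letter or digit, so input from a web form can never be read as
# a command-line option.
_TARGET_RE = re.compile(r"[A-Za-z0-9]([A-Za-z0-9.-]{0,252}[A-Za-z0-9])?")


def build_command(test, target, port=None):
    """Return the argv list for one on-demand test, or raise ValueError."""
    if test not in COMMANDS:
        raise ValueError(f"unsupported test: {test!r}")
    if not _TARGET_RE.fullmatch(target):
        raise ValueError(f"invalid target: {target!r}")
    if test == "port":
        if port is None or not 1 <= int(port) <= 65535:
            raise ValueError("port test needs a port between 1 and 65535")
    return COMMANDS[test](target, port)
```

Returning an argv list rather than a shell string means the result can be handed to `subprocess.run` without going through a shell, which keeps user-supplied targets from being interpreted as shell syntax.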
Network monitoring tools
Networking Support – Xavier Jeannin - EGEE-III First Review June
Network monitoring tools for efficient remote troubleshooting
–PerfSONAR-Lite TroubleShooting Service
–Launch tests on demand from a Grid site under central server control:
–Bandwidth measurements, DNS lookup, Traceroute, Port testing, Ping
[Diagram: a central ENOC monitoring server, with a local light PerfSONAR probe at Grid site A and at Grid site B. Users: ENOC supervisor, ROC members, site administrator. Authentication Authorization Process]
PerfSONAR-Lite TSS
–is easy to use for Grid administrators
–can be used quickly by a site admin, with no need to contact the remote site involved in the problem
–fills the gap left by the lack of network diagnostic tools
Network monitoring tools
First version was released and installed on 6 sites
Installation guide and procedure
– perfsonar-lite-tss/
–FAQ, tutorial, new features (users, sites, ROC management)
–Software authorization schema was adapted to fit the hierarchical EGI/NGI model
Difficult to deploy the software during the transition phase toward EGI
New hierarchy
TOC = Top Level Operation Center (in the EGI context it should be the EGI NOC at GARR)
OC = Operation Center (NOC; in the EGI context it should be the NGIs)
S = Site
[Diagram: hierarchy of TOC, OC and S]
This organization is more flexible:
–a site can be included in several operation centers
–an operation center can be created easily
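The flexibility claimed for this hierarchy comes from treating site membership as a many-to-many relation rather than a strict tree. A minimal sketch in Python, assuming nothing about the real database schema (the class `Hierarchy`, its method names and the example names are all hypothetical):

```python
from dataclasses import dataclass, field


@dataclass
class Hierarchy:
    """One TOC, any number of operation centers (OC), sites (S) below them."""

    toc: str
    # Maps an operation center name to the set of site names it contains.
    centers: dict = field(default_factory=dict)

    def add_center(self, name):
        # Creating an operation center is a single, cheap operation.
        self.centers.setdefault(name, set())

    def attach(self, site, center):
        # A site may be attached to several centers; nothing here forbids it.
        if center not in self.centers:
            raise KeyError(f"unknown operation center: {center}")
        self.centers[center].add(site)

    def centers_of(self, site):
        # All operation centers that include the given site, sorted by name.
        return sorted(oc for oc, sites in self.centers.items() if site in sites)


hierarchy = Hierarchy(toc="EGI NOC")
hierarchy.add_center("NGI-A")
hierarchy.add_center("NGI-B")
hierarchy.attach("site-1", "NGI-A")
hierarchy.attach("site-1", "NGI-B")
```

Here `site-1` belongs to both `NGI-A` and `NGI-B`, which a strict parent-child tree could not express.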
Improvement
Become general-purpose software rather than dedicated software
–Usable by any kind of operating center
Improve security
–Probe should be accessible for bandwidth control tests only by the source site
Accept both login/password and certificate authentication
Improve the general design of the web forms, which were not user friendly
Continuously maintain a list of active probes for the end user
Further automate the probe installation
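The security requirement above, that a probe accepts bandwidth control tests only from the source site, amounts to an address check on the requester. A minimal sketch in Python, assuming the probe knows the address ranges of the source site for the current test (the function name `may_run_bwctl` and the example networks are hypothetical, and the real tool may enforce this at firewall level instead):

```python
import ipaddress


def may_run_bwctl(requester_ip, source_site_networks):
    """Return True only if the requester belongs to the source site.

    requester_ip: address of the host asking for the bandwidth test.
    source_site_networks: CIDR ranges of the source site (assumed to be
    supplied by the central server for this one test).
    """
    address = ipaddress.ip_address(requester_ip)
    return any(
        address in ipaddress.ip_network(network)
        for network in source_site_networks
    )


# Example with documentation-only address ranges.
source_site = ["192.0.2.0/24"]
allowed = may_run_bwctl("192.0.2.17", source_site)
refused = may_run_bwctl("198.51.100.5", source_site)
```

With this check in place, a request from any host outside the source site's ranges is refused before the bandwidth test starts.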
Development choice
As the database schema has to change deeply, we decided to completely rewrite the web server part
–But the probe software part should not be modified (or only very slightly): OPPD, BWCTLD
Use ZK, a Java framework technology
–More efficient development
Work has been started by 2 trainees (Alexandre AL ABAYAJI and Youssef Diouane)
Software architecture
[Diagram: users (ENOC, NOC / ROC members, site administrator / troubleshooter) reach the central server through AA. The central server (web server, local DB, HINTS) exchanges SOAP messages with two site probes, each with OPPD, ping, traceroute, DNS lookup, port scan and BWCTLD]
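To make the "Soap message" link in this architecture concrete, here is a minimal sketch in Python of wrapping a test request in a SOAP 1.1 envelope. Only the envelope namespace is standard. The payload elements (`TestRequest`, `Test`, `Source`, `Destination`) and the function name `build_envelope` are hypothetical, and the messages actually exchanged with OPPD follow their own schema, which is not shown here.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def build_envelope(test, source, destination):
    """Serialize one test request as a SOAP 1.1 envelope (illustrative payload)."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # Hypothetical payload: not the schema used by the real probe.
    request = ET.SubElement(body, "TestRequest")
    ET.SubElement(request, "Test").text = test
    ET.SubElement(request, "Source").text = source
    ET.SubElement(request, "Destination").text = destination
    return ET.tostring(envelope, encoding="unicode")


message = build_envelope("traceroute", "probe.site-a.example", "probe.site-b.example")
```

The central server would send such a message to the probe at the source site, which runs the test and returns the result in a reply.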
The state of the development
Learning ZK and the Eclipse environment
New design of the database and implementation
All the technical problems of development with ZK have been solved:
–Troubleshooting is working
–Multi-authentication and SSL authentication on the web server (login/passwd or certificate)
Pending work and time schedule
Improvement of the web interface ergonomics
Many features are still not available
–Manage sites
–Manage operating centers
–Update the list of active probes…
GOC-DB data import and synchronization
SSL between the web server and the probe
Automate probe installation
We plan to have a prototype version by the end of November