EGI-InSPIRE RI
Increased Availability
Availability: despite the efforts to keep it up and running, the portal is not always available and queries sometimes fail.
We had different types of problems:
1) Database problems last year, affecting the whole IN2P3-CC site. The situation should be more stable now.
2) Various glitches (especially with the Operations Dashboard), which should be solved with the next version of the dashboard.
3) Please contact us or open a ticket when the portal is not available: it is difficult to find out what happened a month later.
Broadcast Messages and Accounting Statistics
Messages received more than once: the Ops Portal cannot check which mailing lists people are registered on, and some people are registered on several lists. In the VO management tools, there will be a connection between the account portal and the VO.
Usage statistics reachable through a short path from each VO action list: needs discussion; short paths are currently not implemented in the accounting portal.
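Duplicate deliveries could be avoided by merging the target lists into a single recipient set before sending. This assumes the portal could obtain the membership of each mailing list, which, as noted above, it currently cannot. A minimal sketch follows; the list names, addresses, and the `unique_recipients` helper are hypothetical.

```python
from typing import Iterable, Mapping


def unique_recipients(lists: Mapping[str, Iterable[str]]) -> list[str]:
    """Merge several mailing-list memberships into one recipient list.

    Addresses are compared case-insensitively after trimming whitespace, so a
    person subscribed to several lists receives the broadcast only once.
    The first spelling seen for each address is kept, in list order.
    """
    seen: set[str] = set()
    recipients: list[str] = []
    for members in lists.values():
        for address in members:
            key = address.strip().lower()
            if key and key not in seen:
                seen.add(key)
                recipients.append(address.strip())
    return recipients


# Hypothetical memberships: the same person is on two lists.
memberships = {
    "ngi-managers": ["alice@example.org", "bob@example.org"],
    "site-admins": ["Bob@example.org", "carol@example.org"],
}
print(unique_recipients(memberships))
```

Comparing whole addresses case-insensitively is a simplification. It matches how most mail systems behave in practice, although the local part of an address is formally case-sensitive.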
Security Dashboard
Every time a security Nagios probe fails, an alarm is shown in the dashboard. In my opinion, the Security Dashboard should show only real security threats, while security Nagios probe failures should be caught by other operations monitoring tools (ROD Dashboard?), since they report an ordinary job submission failure rather than a security problem.
The CSIRT group recently proposed a solution based on a mapping of the important issues. The mapping has been in place since February 16th.
Glitches of the Security Dashboard: we are not aware of these; please file a GGUS ticket when they happen.
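The separation requested here can be pictured as a routing step. Each non-OK probe result goes either to the Security Dashboard, when it is a genuine security finding, or to the regular operations tools, when the probe simply could not run (for example, because job submission failed). The sketch below is illustrative only. The result fields, status strings, and the `security_finding` flag are hypothetical, and this is not the CSIRT mapping itself.

```python
from dataclasses import dataclass
from enum import Enum


class Destination(Enum):
    SECURITY_DASHBOARD = "security"
    OPERATIONS_DASHBOARD = "operations"  # e.g. the ROD Dashboard
    NONE = "none"                        # nothing to report


@dataclass(frozen=True)
class ProbeResult:
    site: str
    test: str
    status: str             # hypothetical: "OK", "WARNING", "CRITICAL", "UNKNOWN"
    security_finding: bool  # hypothetical: True if the probe ran and found a vulnerability


def route(result: ProbeResult) -> Destination:
    """Send real security findings to the Security Dashboard and
    non-security failures (the probe could not run) to operations tools."""
    if result.status == "OK":
        return Destination.NONE
    if result.security_finding:
        return Destination.SECURITY_DASHBOARD
    # The probe failed for a non-security reason, e.g. a job submission failure.
    return Destination.OPERATIONS_DASHBOARD


print(route(ProbeResult("SITE-A", "example.security.probe", "CRITICAL", False)))
```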
Inconsistency and Documentation
Occasional inconsistencies between the regional and central installations still exist: the situation is much better now, and the integration of the virtual queues should solve the remaining issues.
Better advertisement of newly added features: the broadcast tool is used to announce each new release, and the release notes are available on the portal.
Better documentation of the Security Dashboard should be provided: the documentation will be provided soon with the help of the CSIRT team.
Improvements
1) Automatic alert masquerading (hierarchy): needs a detailed RT ticket.
2) Flapping detection: internal discussion is ongoing on whether this should be done on the Nagios side or on the Ops Portal side. On the SAM side it can be implemented for simple tests, but not for complex ones such as the CE tests: the CE probe has its own state machine, so switching on the Nagios flapping mechanism would just cause problems (tested).
3) GUI unintuitive, slow, too complicated: the slowness of the Operations Dashboard will be reduced in the next version, which will also take into account some of the remarks about the GUI. Other requirements need to be refined (e.g. "too complicated").
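To make the flapping-detection item concrete, here is a minimal sketch of percent-state-change detection for a single simple test. It keeps a window of recent results, counts how often consecutive results differ, and uses two thresholds (hysteresis) so that the flapping flag itself does not flap. Nagios's built-in mechanism rests on a similar idea but also weights recent changes more heavily. This unweighted version, its class name, and its window and threshold values are assumptions for illustration. As noted above, it would not suit tests such as the CE probe, which has its own state machine.

```python
from collections import deque


class FlapDetector:
    """Simplified, unweighted percent-state-change flap detection for one test.

    A state change is counted whenever two consecutive results differ. The
    test is marked flapping when the change percentage rises above `high`,
    and cleared when it falls below `low`. No decision is taken until the
    window is full, to avoid false alarms right after start-up.
    """

    def __init__(self, window: int = 21, low: float = 20.0, high: float = 30.0):
        if window < 2 or not 0 <= low < high <= 100:
            raise ValueError("need window >= 2 and 0 <= low < high <= 100")
        self.states: deque = deque(maxlen=window)
        self.low = low
        self.high = high
        self.flapping = False

    def percent_change(self) -> float:
        states = list(self.states)
        if len(states) < 2:
            return 0.0
        changes = sum(1 for a, b in zip(states, states[1:]) if a != b)
        return 100.0 * changes / (len(states) - 1)

    def update(self, state: str) -> bool:
        """Record a new check result and return the current flapping flag."""
        self.states.append(state)
        if len(self.states) < self.states.maxlen:
            return self.flapping
        pct = self.percent_change()
        if not self.flapping and pct > self.high:
            self.flapping = True
        elif self.flapping and pct < self.low:
            self.flapping = False
        return self.flapping


detector = FlapDetector(window=5)
print([detector.update(s) for s in ["OK", "CRITICAL", "OK", "CRITICAL", "OK"]])
```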
Optimized for mobile access
Optimized for mobile access: foreseen in next year's plan.
Ops Portal: Current Developments and Roadmap
Security Dashboard
New version online since February 16th.
Authentication
- Authentication model: authorization is based on GOC DB and EGI SSO.
- Automatic loading of the list of sites / NGIs depending on the scope.
Overview
- Visualize security problems, summarized by NGI, by site, or by test.
- Historical details provided in a chart.
- Sort problems by any column.
- Permalinks give direct access to the desired information.
- Three types of view: monitoring (normal view), history (including recent "OK" statuses), and debug (for the CSIRT group).
Notepads / Tickets
- Notepad with a mail to the Site Security Officer, using a template adapted to the site's current problems.
- Possibility to view the status of the related problems.
- Possibility to create a ticket against a site.
Metrics
- Dynamically generate metrics, choosing the format (table or chart), the scope (NGI or site), and the test name.
- Possibility to save charts (CSV, PDF, JPG).
Events
- Possibility to declare / delete events: rotation declaration, monitoring downtimes, …
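The "summarized by NGI, by site, or by test" overview and the CSV export in the Metrics section amount to a group-and-count followed by a serialization step. A minimal sketch, assuming each open alarm carries its NGI, site, and test name; the `Alarm` record, the helper names, and the sample data are hypothetical.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    ngi: str
    site: str
    test: str


def summarize(alarms: list, by: str) -> Counter:
    """Count open alarms grouped by 'ngi', 'site' or 'test'."""
    if by not in ("ngi", "site", "test"):
        raise ValueError(f"unsupported grouping: {by}")
    return Counter(getattr(alarm, by) for alarm in alarms)


def to_csv(summary: Counter, label: str) -> str:
    """Render a summary as CSV text, most affected entries first."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([label, "alarms"])
    for key, count in summary.most_common():
        writer.writerow([key, count])
    return buf.getvalue()


# Hypothetical open alarms.
alarms = [
    Alarm("NGI_A", "SITE-1", "example.test.a"),
    Alarm("NGI_A", "SITE-2", "example.test.a"),
    Alarm("NGI_B", "SITE-3", "example.test.b"),
]
print(to_csv(summarize(alarms, "ngi"), "ngi"))
```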
VO Oriented Dashboard
Pilot version online since February 16th.
Goal: provide a tool for quickly and easily identifying resources that fail the automatic VO SAM tests (run by a dedicated VO Nagios box). VO experts can then use this tool to validate the results and alert infrastructure providers about how to mitigate the issues.
Authentication
- Authentication model: authorization is based on GOC DB and CIC DB.
- Automatic loading of the list of sites / NGIs depending on the scope.
Overview
- Visualize Nagios issues, summarized by NGI, by site, or by test.
- Historical details provided in a chart.
- Sort problems by any column.
- Permalinks give direct access to the desired information.
- Possibility to create / update notepads or tickets, using a template adapted to the site's current problems.
- Possibility to view the status of the related problems.
Administration
- Add VO staff and VO shifters (restricted to VO managers).
Events
- Possibility to declare / delete events: rotation shift, monitoring downtimes, …
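The administration rule (only VO managers may add VO staff and VO shifters) is a simple role check. In the sketch below, an in-memory dictionary stands in for the role information the portal obtains from GOC DB and CIC DB. The `VOTeam` class, the user identifiers, and the way the first manager is seeded are hypothetical. Only the three role names come from the slide.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    VO_MANAGER = "VO Manager"
    VO_STAFF = "VO Staff"
    VO_SHIFTER = "VO Shifter"


@dataclass
class VOTeam:
    vo: str
    # Hypothetical stand-in for roles obtained from GOC DB / CIC DB.
    members: dict = field(default_factory=dict)

    def add_member(self, requester: str, user: str, role: Role) -> None:
        """Add VO staff or a VO shifter; only a VO manager may do this."""
        if self.members.get(requester) is not Role.VO_MANAGER:
            raise PermissionError(f"{requester} is not a VO manager of {self.vo}")
        if role not in (Role.VO_STAFF, Role.VO_SHIFTER):
            raise ValueError("only VO staff and VO shifters can be added here")
        self.members[user] = role


team = VOTeam("example.vo", {"manager-1": Role.VO_MANAGER})
team.add_member("manager-1", "shifter-1", Role.VO_SHIFTER)
print(team.members["shifter-1"])
```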
Roadmap
Task                                          Planned completion time
Security Dashboard: production version        February 2012
VO Operations Dashboard: pilot version        February 2012
VO Operations Dashboard: production version   April 2012
Major upgrade of the regional package         May 2012
Refactoring of the Operations Dashboard       July-August 2012
Availability / reliability module             October-November 2012
Mobile version                                March-April 2013
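The roadmap does not say which formulas the availability / reliability module will use. For illustration only, the sketch below assumes the definitions commonly used in WLCG/EGI reports. Availability is up time divided by the time with a known status. Reliability additionally excludes scheduled downtime from the denominator. The `StatusHours` record and the sample figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StatusHours:
    """Hours a site spent in each status over a reporting period (assumed inputs)."""
    up: float
    down: float
    scheduled_downtime: float
    unknown: float

    @property
    def total(self) -> float:
        return self.up + self.down + self.scheduled_downtime + self.unknown


def availability(h: StatusHours) -> Optional[float]:
    """Up time over the time with a known status (unknown time excluded)."""
    known = h.total - h.unknown
    return h.up / known if known > 0 else None


def reliability(h: StatusHours) -> Optional[float]:
    """Like availability, but scheduled downtime is excluded as well."""
    relevant = h.total - h.unknown - h.scheduled_downtime
    return h.up / relevant if relevant > 0 else None


# Hypothetical 30-day month (720 h).
month = StatusHours(up=648, down=24, scheduled_downtime=36, unknown=12)
print(f"A = {availability(month):.3f}, R = {reliability(month):.3f}")
```

Under these assumed definitions, scheduled downtime lowers availability but not reliability. Periods with no usable data return `None` rather than a misleading 0 or 100%.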