Published by Stuart Short. Modified over 8 years ago.
1
EGEE-III INFSO-RI-222667
Enabling Grids for E-sciencE
www.eu-egee.org
EGEE and gLite are registered trademarks
ROC model assessment
AP ROC
ShuTing Liao
COD-19
2
Contents
– 1st line support
– ROD
– C-COD
– Knowledge Sharing
– Summary
COD-19, Bologna, 30 March - 1 April 2009
3
1st line support
Model
– 0.4 FTE
– Email notification when a site problem is detected
– GGUS ticket follow-up: review open GGUS tickets and update or close them
– Work with the site admin until the problem has been resolved
    – Remote login for troubleshooting (upon request)
– Site deployment support
    – Site design recommendations
    – Middleware (M/W) installation and configuration support
Tools
– The Dashboard is not used for daily operations work
– Monitoring systems: regional-level Nagios, Gstat, SAM and Smokeping
– Ticketing system: GGUS
4
1st line support
Communication
– Email (APROC support mailing list)
– Regional ticketing system for user support
– Voice and message services
Concerns
– HR: more FTE is needed for 1st line support (its workload is the heaviest)
    – Grid technical knowledge and operations experience are required here
5
ROD
Model
– 0.3 FTE (weekly rotation)
– Work on the Regional Dashboard
    – Check alarms every 2 hours
    – Create and assign tickets
    – Diagnose problems and record the details in the tickets
    – Review GGUS tickets (track each ticket until it is resolved)
– Escalate cases to C-COD
– Site deployment support
    – Site registration
    – Site certification/SAM testing
Tools (Dashboard)
– Add "options" functions to:
    – hide sites without alarms or problems
    – hide masked alarms, since we deal with the "main" alarms on the dashboard
    – rank/sort alarms to help prioritize work
– Automatically switch off alarms that have 'OK' status and are non-critical within a certain period of time, or provide a dashboard function to close all 'OK' alarms at once
– ROD metrics should provide more information
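The dashboard requests on this slide (hide masked alarms, hide sites with nothing to act on, rank alarms, switch off stale 'OK' alarms) can be illustrated with a minimal Python sketch. It is not the dashboard's actual code or data model: the `Alarm` fields, the severity order and the 24-hour grace period are all assumptions made for illustration, and "switch off after a certain period" is only one reading of the slide.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alarm:
    # Hypothetical alarm record; the real dashboard schema is not given in the slides.
    site: str
    test: str
    status: str          # "OK", "WARNING" or "CRITICAL"
    critical: bool       # True if the underlying test is a critical one
    masked: bool         # True if hidden behind a "main" alarm
    raised_at: datetime


# Assumed ranking: most severe first.
SEVERITY = {"CRITICAL": 0, "WARNING": 1, "OK": 2}


def auto_closable(alarm, now, grace=timedelta(hours=24)):
    """An 'OK', non-critical alarm older than the (assumed) grace period."""
    return (alarm.status == "OK"
            and not alarm.critical
            and now - alarm.raised_at >= grace)


def dashboard_view(alarms, now, hide_masked=True, grace=timedelta(hours=24)):
    """Return the alarms a ROD still has to act on, most severe and oldest first."""
    visible = [
        a for a in alarms
        if not (hide_masked and a.masked)
        and not auto_closable(a, now, grace)
    ]
    return sorted(visible, key=lambda a: (SEVERITY[a.status], a.raised_at))


def sites_to_show(alarms, now, **options):
    """Sites with no remaining alarms drop out of the view entirely."""
    return sorted({a.site for a in dashboard_view(alarms, now, **options)})
```

With these options switched on, a site whose only alarms are masked or stale 'OK' alarms no longer appears, which is the "hide sites without alarms and problems" behaviour the slide asks for.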
6
ROD
Communication
– Mailing list
– Dashboard
Concerns
– Communication with site admins
    – Passivity of site admins: no responses to GGUS tickets (opened by ROD)
– The operations procedure will be customized when new regional-level operations tools are deployed (once the release is published by OAT)
7
C-COD
Model
– Workload depends on the performance of the RODs
Tools (Dashboard)
– More automation to help simplify operations tasks, e.g. a request for action from the ROD could be an automatic notification
– The alarm interface should show the ROD's name (in addition to the site name)
– Alarm age should not increase during weekends
– Communication tool with RODs
– Handover tool for C-CODs
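The request that alarm age should not grow over weekends can be made concrete with a short Python sketch. It assumes that weekends are Saturday and Sunday and that timestamps are naive datetimes in a single time zone; the function name is hypothetical and nothing here is taken from the dashboard itself.

```python
from datetime import datetime, timedelta


def alarm_age_excluding_weekends(raised_at, now):
    """Elapsed time between raised_at and now, skipping Saturdays and Sundays."""
    age = timedelta(0)
    cursor = raised_at
    while cursor < now:
        # Walk forward one calendar day at a time, stopping at midnight or at `now`.
        next_midnight = datetime.combine(
            cursor.date() + timedelta(days=1), datetime.min.time()
        )
        segment_end = min(next_midnight, now)
        if cursor.weekday() < 5:  # Monday=0 ... Friday=4
            age += segment_end - cursor
        cursor = segment_end
    return age
```

For an alarm raised on a Friday evening and inspected on Monday morning, only the Friday and Monday hours count, so a site is not penalized for time when no ROD shift is on duty.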
8
Knowledge Sharing
Three tools are currently used for knowledge sharing:
– Wiki page: http://lists.grid.sinica.edu.tw/apwiki/FrontPage
    – Mainly contributed to by ROD and 1st line support
– FAQ system: http://faq.twgrid.org/faq/index.php?action=show
    – Solutions verified by ROD
– APROC mailing list
    – Publicly accessible archive with search
9
Summary
Notable success:
– The new model has been fruitful: we have provided better support to regions, and the availability and reliability of Asia Federation sites have improved
We look forward to more sites joining the regional operations collaboration.