Published by Violet Lucas. Modified over 8 years ago.
1
Operations Parallel Session Summary
Markus Schulz, CERN IT/GD
Joint OSG and EGEE Operations Workshop - 3
Abingdon, 27 - 29 September 2005
Enabling Grids for E-sciencE, INFSO-RI-508833, www.eu-egee.org
2
Main Topics
  Metrics
  Integration of new services
3
Metrics
  OSG and EGEE/LCG are producing “weather reports”
  Assessment of the grid quality based on tests used in operations (core functionality)
  Progress since last workshop:
    – EGEE: first shot at an implementation
        Based on SFT
        Doesn’t cover all central services, so a list was created
        Working group (ROC managers) has produced a wish list
        Needs to be synchronized with practical work
    – OSG: Metrics and Goals - Miron Livny
    – Problem: how to decide what is critical -> VOs
4
EGEE Practical Work
  Reporting intervals: every hour, every day; weekly, monthly, quarterly
  Levels: CE, region, grid
  Prototype metrics report: https://lcg-sft.cern.ch:9443/sft/metrics.html
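The roll-up from hourly tests to daily figures, and from CE to region to grid, can be sketched roughly as follows. This is a minimal illustration, not the SFT implementation: the sample data, the host names, the function names, and the choice of an unweighted mean at each level are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical hourly test results: (region, ce, day, hour, passed).
# None of these names or values come from SFT; they are illustrative only.
SAMPLES = [
    ("UK", "ce1.example.org", "2005-09-27", 0, True),
    ("UK", "ce1.example.org", "2005-09-27", 1, False),
    ("UK", "ce2.example.org", "2005-09-27", 0, True),
    ("UK", "ce2.example.org", "2005-09-27", 1, True),
    ("CERN", "ce3.example.org", "2005-09-27", 0, True),
    ("CERN", "ce3.example.org", "2005-09-27", 1, True),
]


def daily_ce_pass_rate(samples):
    """Fraction of hourly tests passed, keyed by (region, ce, day)."""
    buckets = defaultdict(list)
    for region, ce, day, _hour, passed in samples:
        buckets[(region, ce, day)].append(1.0 if passed else 0.0)
    return {key: mean(vals) for key, vals in buckets.items()}


def roll_up(ce_rates):
    """Average CE rates per (region, day), then region rates per day for the grid."""
    by_region = defaultdict(list)
    for (region, _ce, day), rate in ce_rates.items():
        by_region[(region, day)].append(rate)
    region_rates = {key: mean(vals) for key, vals in by_region.items()}

    by_day = defaultdict(list)
    for (_region, day), rate in region_rates.items():
        by_day[day].append(rate)
    grid_rates = {day: mean(vals) for day, vals in by_day.items()}
    return region_rates, grid_rates


ce_rates = daily_ce_pass_rate(SAMPLES)
region_rates, grid_rates = roll_up(ce_rates)
```

Weekly, monthly and quarterly figures could be produced the same way by grouping the daily values on a coarser time key. Whether regions should be weighted by size rather than averaged equally is a policy choice the slides do not settle.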
5
Graphs
6
Next Practical Steps
  Defined the critical services
  Guide for target definition: LCG MOU
  Central Services
    – Resource Broker: David Kant can adapt his RB mon
    – CE
    – MyProxy
    – BDII: Gstat has components to provide this (Min?)
    – R-GMA: analysis from logfiles (gridView team)
    – LFC: indirect by SFT, now each local and VO specific (SC team)
    – FTS: no probes available and complex
    – SRM: data management tests at higher frequency (David Kant)
7
Integration Of New Services
  Triggered by LCG SC 3 experience
  EGEE goal: All services are under COD operations!
  OSG has a defined process
    – Wiki page to follow progress
  OSG process diagram labels: Deployment Activity; Integration Test Bed; Provisioning; Blueprint (ARCH); Release Description; Technical Groups; VO’s; Service Development (Sponsored Activities); ITB 0.3; Operations; OSG 0.4 Release Candidate
8
Ticklist for new service
  User support procedures (GGUS)
    – Troubleshooting guides + FAQs
    – User guides
  Operations Team Training
    – Site admins
    – CIC personnel
    – GGUS personnel
  Monitoring
    – Service status reporting
    – Performance data
  Accounting
    – Usage data
  Service Parameters
    – Scope - Global/Local/Regional
    – SLAs
    – Impact of service outage
    – Security implications
  Contact Info
    – Developers
    – Support Contact
    – Escalation procedure to developers
  Interoperation
    – ???
  First level support procedures
    – How to start/stop/restart service
    – How to check it’s up
    – Which logs are useful to send to CIC/Developers and where they are
  SFT Tests
    – Client validation
    – Server validation
    – Procedure to analyse these error messages and likely causes
  Tools for CIC to spot problems
    – GIIS monitor validation rules (e.g. only one “global” component)
    – Definition of normal behaviour
  Metrics
  CIC Dashboard
    – Alarms
  Deployment Info
    – RPM list
    – Configuration details (for yaim)
    – Security audit
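A tick list like this lends itself to a simple machine-checkable form. The sketch below is only an illustration of that idea: the category names are taken from the slide, but the function names, the example service state, and the rule that every category must be ticked before a service is ready are assumptions, not part of the process described at the workshop.

```python
# Top-level ticklist categories as listed on the slide. The completion
# data further down is invented purely for illustration.
TICKLIST = [
    "User support procedures (GGUS)",
    "Operations Team Training",
    "Monitoring",
    "Accounting",
    "Service Parameters",
    "Contact Info",
    "Interoperation",
    "First level support procedures",
    "SFT Tests",
    "Tools for CIC to spot problems",
    "Metrics",
    "CIC Dashboard",
    "Deployment Info",
]


def missing_items(done):
    """Return the ticklist categories not yet marked done, in slide order."""
    return [item for item in TICKLIST if item not in done]


def ready_for_operations(done):
    """Assumed rule: a service is ready only when every category is ticked."""
    return not missing_items(done)


# Hypothetical state of a new service part-way through integration.
example_done = {"Contact Info", "Deployment Info", "Accounting"}
outstanding = missing_items(example_done)
```

Here `outstanding` lists the ten categories still open for the example service. A less strict readiness rule, for instance one that allows a pilot service before all criteria are met, would only need a different `ready_for_operations`.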
9
Common Problems
  Leigh: Why can’t we move services through more quickly?
  Why can’t the software/software work the first time?
  We have to find a way to start work before a service has met all criteria
    – Pilot service??
  Release process:
    – Minimum 1 month in EGEE/LCG
    – OSG “organic” but not faster
10
Summary
  Metrics have moved from discussion to prototypes
  Partners volunteered to help fill the gaps
  COD well established
  First shot at a “tick list” based process to introduce new services
11
Summary II
  Did we meet the goals? From the agenda:
    – Interoperation: all aspects; what makes sense? what can be achieved? what can we learn from each other?
        Plenary
    – Metrics: to demonstrate a reliable, performant, robust, supported service that improves in quality
        Progress, work distributed
    – Integrating LCG Service Challenges and pre-production service into the regular operations
        TickList
    – Monitoring tools: where are we? what is missing? How do we fill in the gaps?
        Plenary
    – (EGEE) Release/deployment process in the SC/LHC era
        ROC managers meeting