Tier 1 status: a summary based upon an internal review. Volker Gülzow (DESY). LCG, LHCC comprehensive review, Sep. 26th 2006.

1 Tier 1 status, a summary based upon an internal review. Volker Gülzow, DESY

2 Information sources
Input: Review of Tier 1 readiness, June 8th 2006, at CERN
Reviewers: John Gordon (RAL), Volker Gülzow (DESY, Chair), Alessandro de Salvo (INFN Rome), Jeff Templon (NIKHEF), Frank Würthwein (UCSD)
- From a questionnaire to the Tier 1s and from questions to the experiments (Tier 1s, Middleware, Interoperability)
- From documents from MB and CRRB
- CTDRs + supplement
- Tier 1 milestone plans
- LCG wikis
- From other sources

3 Review Process I
Mandate (discussed in MB): “… review pays specific attention to the following topics:
- state of readiness of CERN and the Tier-1 centres, including operational procedures and expertise, 24 X 7 support, resource planning to provide the required capacity and performance, site test and validation programme;
- the essential components and services missing in SC4 and the plans to make these available in time for the initial LHC service;
- the EGEE-middleware deployment and maintenance process, including the relationship between the development and deployment teams, and the steps being taken to reduce the time taken to deploy a new release;
- the plans for testing the functionality, reliability and performance of the overall service;
- interoperability between the LCG sites in EGEE, OSG and NDGF;”
http://www.cern.ch/lcg/documents/mb/service_review_mandate_jun06.doc

5 Tier1/2 Summary Table
40 Tier2 centres have their data included in the summary table. 9 more centres plan to join as soon as possible.
Source: Chris Eck, CRRB April 2006

6 Overall Comments to Tier 1s
- The Tier 1 requirements are currently changing due to the accelerator time schedule; new resource planning from the experiments is expected in October.
- There is a lot of diversity among the Tier 1s in terms of background, technology, funding, staffing, number of experiments, and size.

7 Overall Comments to Tier 1s (June 06)
- Not all the Tier 1s have reached the level of readiness required for LHC start-up. Key factors are organisational gaps in implementing off-hour service, funding problems, and communication with the experiments (a two-sided problem).
- There are severe risks with the scalability of the resources.
- The manpower situation at the Tier 1s was not always transparent during the review.

8 Source: Les Robertson

9 Overall Comments to Tier 1s
- The overall monitoring of the Tier 0/1/2 complex is of very great importance.
- The Tier 2 associations are not completely clear. This needs immediate clarification.
- The support concept for Tier 2/Tier 3 centres by the Tier 1s is not well defined. This is partly because of unclear requirements from the experiments.
- At this stage, one should no longer distinguish between production and SC4 infrastructure (the experiments complain).

10 Milestone plans
https://twiki.cern.ch/twiki/pub/LCG/MilestonesPlans

11 "Communication"
- Clear (and redundant) contact persons (e.g. liaison officers) have to be nominated on both sides.
- The experiments should provide clear, precise, and well-structured information.
- Web-based monitoring pages for operational issues should be made available by the experiments.

12 "Communication"
- The operations meetings (OPS/SCM/RSM) are important -> mandate etc. reviewed by MB.
- GGUS is a well-accepted tool and should be used as the main tracking tool. Further improvements are needed (e.g. GUI, amount of mails, support for the full set of problem categories, "when can a case be declared closed?").

13 "24x7"
Full 24x7 coverage is required, in the sense of live monitoring and alarming and, for a certain class of problems, "immediate" reaction. An on-call service still has to be set up at many sites. This requires:
1. Having the right tools, which are often not sufficient. An initiative (e.g. via HEPiX) should be started to sharpen the tool set, which would help the Tier 2s and Tier 3s as well.
2. Having adequate staff available -> management. In the focus of MB.

14 "Management issues"
- The funding situation is not clear at every centre. A revised ramp-up plan may help. This has to be followed carefully.
- Clear, up-to-date, and realistic requirements from the experiments would help the Tier 1s to acquire resources on time.
- At some centres critical work is carried out by temporary staff; depending on the country, this can cause severe problems.

15 "Middleware"
- The introduction of gLite 3 was a bit "bumpy"; people were somewhat confused.
- Many emotions were expressed prior to real experience, which was not helpful.
- There were lots of complaints but very little error reporting.
- The "post mortem" analysis of the process was very much appreciated.

16 "Middleware"
Sites were not able to meet the tight time constraints. Reasons were (and are):
- lack of manpower,
- lack of understanding,
- site localization,
- coordination with the needs of non-LHC experiments.

17 "Middleware"
- Stable production environments have to be the no. 1 goal today. There is concern about effort being diverted to side projects.
- The software was not mature enough; we need to find ways to guarantee the readiness of software when it is released.
- The representation of operational issues in the TCG is not adequate. The Tier 1s should be better represented, and their input has to be taken into account.
- The TCG should include operational issues in the priority list and allow sites to influence the ranking.
- Full VOMS needed!
- The error reporting from the users has to improve.
- The middleware urgently needs proper operational interfaces:
  - Logging
  - Diagnostics
  - Service operation interfaces

18 "Interoperability"
- The experiments should make the importance of the problem clear.
- If required, interoperability of the grids needs more attention and manpower than it receives today.
- Can we expect uniform testing (SFTs), monitoring, accounting, and metrics for ALL WLCG sites?

19 Conclusion
- Excellent work was done at the Tier 1s on many tasks.
- The cultural gap has to be bridged.
- The 24x7 issue is still largely open.
- Monitoring of sites is strongly recommended.
- The funding and staffing situation needs careful attention.
- Middleware robustness and operational hooks are needed.
- More binding action is required in certain areas (on all Tier levels).
- The new ramp-up leaves no room to lean back.

