LCG and HEPiX
Ian Bird, LCG Project - CERN
HEPiX - FNAL, 25-Oct-2002

What is LCG? Why is it relevant to HEPiX?
Ian.Bird@cern.ch

LCG Project Goals
- Goal: prepare and deploy the LHC computing environment
  - applications: tools, frameworks, environment, persistency
  - computing system -> global grid service
    - cluster -> automated fabric
    - collaborating computer centres -> grid
    - CERN-centric analysis -> global analysis environment
  - central role of data challenges
- This is not another grid technology project; it is a grid deployment project

LCG Level 1 Milestones (proposed to LHCC)
- M1.1, June 03: First Global Grid Service (LCG-1) available
- M1.2, June 03: Hybrid Event Store (Persistency Framework) available for general users
- M1.3a, November 03: LCG-1 reliability and performance targets achieved
- M1.3b, November 03: Distributed batch production using grid services
- M1.4, May 04: Distributed end-user interactive analysis from “Tier 3” centre
- M1.5, December 04: “50% prototype” (LCG-3) available
- M1.6, March 05: Full Persistency Framework
- M1.7, June 05: LHC Global Grid TDR

LCG and its interactions
[Diagram. Labels: Experiments; Grid Projects; Regional Centres; CERN; Common Applications; Deployment; Fabric; HEPCAL; GTA; GDB; PPDG; iVDGL (VDT); GriPhyN; Globus; GLUE; EDG; NorduGrid; AliEn]

Multi-dimensional problem
- Regional Centres:
  - Host one or more experiments
  - Different RCs deploy different grid middleware in existing testbeds
  - Have different operational and security policies
- Experiments:
  - Use middleware from various grid projects
  - Run at many regional centres
  - Provide applications that rely on specific middleware
- Grid projects:
  - Provide middleware that often does not (yet) interoperate
  - Starting to collaborate on common solutions and interoperability
- The Deployment area of LCG ties these all together

Grid Deployment – goals of LCG-1
- Production service for Data Challenges in 2H03 & 2004
  - Focused on batch production work
- Experience in close collaboration between the Regional Centres
  - Should have wide enough participation to understand the issues, but not too many initially
- Learn how to maintain and operate a global grid
- Focus on a production-quality service and all that implies
  - Robustness, fault-tolerance, predictability, and supportability take precedence over functionality
  - But: minimum functionality to be of value
  - This requires:
    - A middleware support group with integration, certification, testing, packaging etc. responsibilities
    - A support structure
- LCG should be integrated into the sites’ physics computing services; it should not be something apart
  - This requires coordination between participating sites in:
    - Policies and collaborative agreements
    - Resource planning and scheduling
    - Operations
    - Support

What might LCG-1 look like?
- User’s perspective: requires
  - Functionality adequate to provide advantage over not using distributed model
  - Straightforward to use – well defined services
  - Advice on how to use the system
  - Help with problems
  - Failures should be understandable
  - Ability to determine status of jobs and data
- Sites’ perspective:
  - Integrated into computer centre/IT (inc. security) infrastructures
  - Able to support service
  - Able to allocate and manage resources, with local autonomy where needed
- Overall service perspective:
  - Performance and problem monitoring
  - Accounting
  - Etc.

Requires agreements, collaboration, and coordination
- LCG has to build the “virtual computer centre” (= LHC computing environment)
  - With all that is expected from a production service:
    - User support
    - Operations group
    - “Account” management
    - Security
    - Fabric management
    - Etc.
  - Except this is now distributed across many countries and continents
- This requires agreements, collaboration, and coordination at all levels: management, system managers, user support, etc.

Grid Operations Centre
[Diagram. Components: User; Local site; Local user support; Local operation; Call Centre; Grid Operations Centre; Grid information service; Grid operations; Grid logging & bookkeeping; Virtual Organisation; Network Operations Centre. Flows: queries; monitoring & alarms; corrective actions]

Deployment Summary
- Deploy middleware to support essential functionality, but the goal is to evolve and incrementally add functionality
- Added value is to make the middleware robust, support it, and turn it into a 24x7 production service
- How?
  - Certification & test procedure, with tight feedback to developers
    - Must develop support agreements with grid projects to ensure this
  - Define missing functionality and require it from providers
  - Provide documentation and training
  - Provide missing operational services
  - Provide a 24x7 Operations and Call Centre
    - Guarantee to respond
    - Single point of contact for a user
  - Make software easy to install, to facilitate new centres joining

LCG Strategy
- Develop as little as possible
  - Use existing middleware, tools and software
  - Pressure developers to provide missing functionality
  - Negotiate support agreements
- Leverage existing experience
  - Various data grid projects and testbeds
  - Teragrid, interoperability demonstrations, GGF – production grids area
- Actively encourage collaboration and coordination

Grid Deployment Teams – the plan
[Diagram. Labels: suppliers’ integration teams provide tested releases; common applications s/w; Trillium - US grid middleware; DataGrid middleware; HEPiX interests; certification, build & distribution; LCG infrastructure coordination & operation; user support; grid operation; LCG call centre; …; fabric operation regional centre A; fabric operation regional centre B; fabric operation regional centre X; fabric operation regional centre Y]

Coordination & Collaboration
- There are many opportunities for common solutions, which should be actively pursued
- HICB – JTB, existing & proposed new collaborative activities
  - GLUE: schema definitions & interoperability work
  - Validation and Test Suites
  - Distribution and Meta-Packaging
    - Interoperable distribution and configuration utilities identified as a definite need by all the recent trans-Atlantic demonstration and validation work
    - Support for this group comes from: LCG, EDG, EDT, Trillium, DataTAG
  - Security czars
    - Already talking to address grid issues
- GGF
  - Production grids
  - AAA
  - Etc.
- LCG – grid deployment board, etc.

Summary of Issues that might be addressed by HEPiX/LCCWS
I know many of these are discussed by a plethora of grid projects and offshoots, but remember: more than ever before, we all have to work together coherently to make a grid work.
- Grid operations centre: Teragrid, iVDGL
- User support – distributed helpdesk/call centre: iVDGL, Teragrid, Nordic grid collaborations, GGF production grids area
  - Helpdesk tools
- Certification process for operating environments
  - Upgrade procedures
  - Configuration management
  - Joint OS version certification
- Packaging, installation (inc. applications)
- User management
- Security etc.
- Fabric management (see LCCWS)
- Etc.

Proposal
- HEPiX is already (a lot of) the right people
  - Already, or soon to be, deploying LCG and other grids in their computer centres
- Keep LCCWS associated with HEPiX
- Add a Grid Coordination/LCG interest group, like HEPNT or Storage
  - To address themes and issues of common interest
  - Encourage new people to attend
  - Line up specific talks by selected people to address issues and to propose activities to follow on
- We need to solve the problems, not just talk about them
  - Needs a coordinator & agenda to make sure this happens. Volunteers?