LCG – The Worldwide LHC Computing Grid: Building a Service for LHC Data Analysis
Ian Bird, LCG Deployment Manager, EGEE Operations Manager
22 September 2006
The LHC Accelerator
The accelerator generates 40 million particle collisions (events) every second at the centre of each of the four experiments' detectors.
LHC Data
This rate is reduced by online computers that filter out a few hundred "good" events per second, which are recorded on disk and magnetic tape at 100-1,000 MegaBytes/sec – roughly 15 PetaBytes per year for all four experiments.
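As a rough consistency check (a sketch, not from the slides): the annual volume follows from the recording rate and the accelerator's live time. The figures below assume ~10^7 live seconds of data taking per year and an average recording rate of ~375 MB/s per experiment, i.e. within the quoted 100-1,000 MB/s range.

# Back-of-the-envelope check of the ~15 PB/year figure.
# Assumed (not stated on the slide): ~1e7 live seconds per year and an
# average recording rate of ~375 MB/s per experiment.
LIVE_SECONDS_PER_YEAR = 1e7   # typical accelerator "live" time per year
AVG_RATE_MB_S = 375           # assumed average rate per experiment
N_EXPERIMENTS = 4

total_bytes = AVG_RATE_MB_S * 1e6 * LIVE_SECONDS_PER_YEAR * N_EXPERIMENTS
print(f"~{total_bytes / 1e15:.0f} PB/year")   # -> ~15 PB/year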
The Worldwide LHC Computing Grid
Purpose: develop, build and maintain a distributed computing environment for the storage and analysis of data from the four LHC experiments; ensure the computing service … and common application libraries and tools.
Phase I – development & planning
Phase II – deployment & commissioning of the initial services
WLCG Collaboration
The Collaboration – still growing:
– ~130 computing centres
– 12 large centres (Tier-0, Tier-1)
– federations of smaller "Tier-2" centres
– 29 countries
Memorandum of Understanding – agreed in October 2005, now being signed. Purpose:
– focuses on the needs of the four LHC experiments
– commits resources – each October for the coming year, with a 5-year forward look
– agrees on standards and procedures
LCG Service Hierarchy
Tier-0 – the accelerator centre:
– data acquisition & initial processing
– long-term data curation
– distribution of data
Tier-1 centres – "online" to the data acquisition process, high availability:
– managed mass storage – grid-enabled data service
– data-heavy analysis
– national, regional support
Tier-1 sites: Canada – TRIUMF (Vancouver); France – IN2P3 (Lyon); Germany – Forschungszentrum Karlsruhe; Italy – CNAF (Bologna); Netherlands – NIKHEF/SARA (Amsterdam); Nordic countries – distributed Tier-1; Spain – PIC (Barcelona); Taiwan – Academia Sinica (Taipei); UK – CLRC (Oxford); US – FermiLab (Illinois), Brookhaven (NY)
Tier-2 – ~120 centres in ~29 countries:
– simulation
– end-user analysis – batch and interactive
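Purely as an illustration of the tiered architecture described above (the roles are taken from this slide; the structure itself is just a sketch, not WLCG software), the hierarchy can be written down as a simple mapping of each tier to its responsibilities:

# Illustrative sketch only: the LCG tier model as a data structure.
tier_roles = {
    "Tier-0": ["data acquisition & initial processing",
               "long-term data curation",
               "distribution of data"],
    "Tier-1": ["managed mass storage (grid-enabled data service)",
               "data-heavy analysis",
               "national/regional support"],
    "Tier-2": ["simulation",
               "end-user analysis (batch and interactive)"],
}

for tier, roles in tier_roles.items():
    print(tier, "->", "; ".join(roles))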
High Energy Physics – a new computing infrastructure for science: from the LHC to the EGEE Grid
1999 – MONARC project: early discussions on how to organise distributed computing for LHC
2000 – growing interest in grid technology; the HEP community was the driver in launching the DataGrid project
EU DataGrid project – middleware & testbed for an operational grid
LHC Computing Grid (LCG) – deploying the results of DataGrid to provide a production facility for the LHC experiments
EU EGEE project phase 1 – starts from the LCG grid; shared production infrastructure; expanding to other communities and sciences
LCG depends on two major science grid infrastructures:
– EGEE – Enabling Grids for E-sciencE
– OSG – US Open Science Grid
Production Grids for LHC
EGEE Grid: ~50K jobs/day; ~14K simultaneous jobs during prolonged periods.
[Chart: EGEE Grid jobs/day]
OSG Production for LHC
OSG: ~15K jobs/day; the three big users are ATLAS, CDF and CMS. ~3K simultaneous jobs, although usage is at the moment quite spiky.
[Charts: OSG-CMS data distribution, past 3 months; OSG-ATLAS running jobs, past 3 months; OSG Grid jobs/day]
Data Distribution
Pre-SC4 April tests, CERN → Tier-1s: the SC4 target of 1.6 GBytes/sec was reached – but only for one day. However, experiment-driven transfers (ATLAS and CMS) sustained 50% of the target (0.8 GBytes/sec) under much more realistic conditions:
– CMS transferred a steady 1 PByte/month between Tier-1s & Tier-2s during a 90-day period
– ATLAS distributed 1.25 PBytes from CERN during a 6-week period
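To put the sustained-transfer volumes in context against the 1.6 GB/s target, they can be converted to average rates (a back-of-the-envelope sketch, assuming decimal petabytes, 30-day months and 7-day weeks, none of which the slide specifies):

# Average rates implied by the sustained transfers quoted above.
def avg_rate_gb_s(bytes_moved, seconds):
    """Average transfer rate in GBytes/sec."""
    return bytes_moved / seconds / 1e9

cms = avg_rate_gb_s(1e15, 30 * 86400)           # 1 PB/month, Tier-1s <-> Tier-2s
atlas = avg_rate_gb_s(1.25e15, 6 * 7 * 86400)   # 1.25 PB over 6 weeks, from CERN
print(f"CMS   ~{cms:.2f} GB/s sustained")        # ~0.39 GB/s
print(f"ATLAS ~{atlas:.2f} GB/s sustained")      # ~0.34 GB/s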
Interoperation between Grid Infrastructures
Good progress on EGEE-OSG interoperability:
– cross job submission – in use by CMS
– integrating basic operation – series of workshops
Early technical studies on integration with the Nordic countries and NAREGI in Japan.
Collaborating Infrastructures
Potential for linking ~80 countries by 2008 (KnowARC, DEISA, TeraGrid).
Applications on EGEE
More than 25 applications from an increasing number of domains:
– Astrophysics
– Computational Chemistry
– Earth Sciences
– Financial Simulation
– Fusion
– Geophysics
– High Energy Physics
– Life Sciences
– Multimedia
– Material Sciences
– …
Book of abstracts:
Example: EGEE Attacks Avian Flu
EGEE was used to analyse 300,000 potential drug compounds against the bird flu virus H5N1. Computers at 60 computer centres in Europe, Russia, Asia and the Middle East ran during four weeks in April – the equivalent of 100 years on a single computer. Potential drug compounds are now being identified and ranked.
[Image: Neuraminidase, one of the two major surface proteins of influenza viruses, facilitating the release of virions from infected cells. Image courtesy Ying-Ta Wu, Academia Sinica.]
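For a rough sense of scale (the actual machine count is not given on the slide, and this sketch assumes continuous running and 52 weeks per year), delivering 100 CPU-years in four weeks corresponds to:

# 100 CPU-years in 4 weeks ~= how many machines busy non-stop?
cpu_years = 100
weeks = 4
equivalent_machines = cpu_years * 52 / weeks
print(f"~{equivalent_machines:.0f} CPUs running flat out for {weeks} weeks")  # ~1300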
ITU
International Telecommunication Union – ITU/BR, Radiocommunication Sector: management of the radio-frequency spectrum and satellite orbits for fixed, mobile, broadcasting and other communication services.
RRC-06 (15 May – 16 June 2006):
– 120 countries negotiate the new frequency plan
– introduction of digital broadcasting in the UHF and VHF bands
– a demanding computing problem with short deadlines
– using the EGEE grid, a complete cycle could be run in less than 1 hour
Grid Management: Structure
Operations Coordination Centre (OCC) – management and oversight of all operational and support activities.
Regional Operations Centres (ROC) – provide the core of the support infrastructure, each supporting a number of resource centres within its region; Grid Operator on Duty (COD).
Resource centres – provide resources (computing, storage, network, etc.).
Grid User Support (GGUS) – at FZK: coordination and management of user support, single point of contact for users.
Security & Policy
Collaborative policy development – many policy aspects are collaborative works, e.g. the Joint Security Policy Group.
Certification Authorities – EUGridPMA, IGTF, etc.
Grid Acceptable Use Policy (AUP) – a common, general and simple AUP for all VO members using many Grid infrastructures (EGEE, OSG, SEE-GRID, DEISA, national Grids, …).
Incident Handling and Response – defines basic communications paths; defines requirements (MUSTs) for incident response; not intended to replace or interfere with local response plans.
[Policy document set: Security & Availability Policy; Usage Rules; Certification Authorities; Audit Requirements; Incident Response; User Registration & VO Management; Application Development & Network Admin Guide; VO Security]
Sustainability: Beyond EGEE-II
Need to prepare for a permanent Grid infrastructure:
– maintain Europe's leading position in global science Grids
– ensure reliable and adaptive support for all sciences
– independent of short project funding cycles
– modelled on the success of GÉANT – infrastructure managed in collaboration with national grid initiatives
Conclusions
LCG will depend on:
– ~130 computer centres
– two major science grid infrastructures – EGEE and OSG
– excellent global research networking
Grids are now operational:
– >200 sites between EGEE and OSG
– grid operations centres running for well over a year
– >40K jobs per day, 20K simultaneous jobs with the right load and job mix
– demonstrated target data distribution rates from CERN to Tier-1s
EGEE is a large multi-disciplinary grid: although HEP is a driving force, it must remain broader to ensure the long term.
Planning for a long-term sustainable infrastructure is under way now.