LHC Computing Grid Project: Status and Prospects

LHC Computing Grid Project: Status and Prospects
Ian Bird, IT Department, CERN
China-CERN Workshop, Beijing, 14th May 2005

The Large Hadron Collider Project – 4 detectors: ALICE, ATLAS, CMS, LHCb

The Large Hadron Collider Project – 4 detectors: ALICE, ATLAS, CMS, LHCb
Requirements for world-wide data analysis:
- Storage – raw recording rate 0.1 – 1 GByte/sec, accumulating at ~15 PetaBytes/year, with 10 PetaBytes of disk
- Processing – 100,000 of today's fastest PCs

LCG – Goals
The goal of the LCG project is to prototype and deploy the computing environment for the LHC experiments.
Two phases:
- Phase 1 (2002 – 2005): build a service prototype based on existing grid middleware; gain experience in running a production grid service; produce the TDR for the final system
- Phase 2 (2006 – 2008): build and commission the initial LHC computing environment
LCG is not a development project – it relies on other grid projects for grid middleware development and support.

LHC data (simplified)
Per experiment:
- 40 million collisions per second
- After filtering, ~100 collisions of interest per second
- A Megabyte of digitised information for each collision = a recording rate of ~0.1 GByte/sec
- 1 billion collisions recorded = 1 Petabyte/year
With four experiments (ALICE, ATLAS, CMS, LHCb), we will accumulate ~15 PetaBytes of new data each year.
For scale:
- 1 Megabyte (1 MB) – a digital photo
- 1 Gigabyte (1 GB) = 1000 MB – a DVD movie
- 1 Terabyte (1 TB) = 1000 GB – world annual book production
- 1 Petabyte (1 PB) = 1000 TB – 10% of the annual production by the LHC experiments
- 1 Exabyte (1 EB) = 1000 PB – world annual information production
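The arithmetic behind these figures is easy to check. The short sketch below reproduces it, assuming the conventional ~10^7 seconds of data taking per year (an assumption; that figure is not stated on the slide).

```python
# Back-of-the-envelope check of the per-experiment numbers quoted above.
event_size_mb = 1.0        # ~1 MB of digitised information per collision
events_per_sec = 100       # collisions of interest kept after filtering
seconds_per_year = 1e7     # assumed length of a typical data-taking year

recording_rate_mb_s = event_size_mb * events_per_sec               # ~100 MB/s
annual_volume_pb = recording_rate_mb_s * seconds_per_year / 1e9    # MB -> PB

print(f"Recording rate: {recording_rate_mb_s:.0f} MB/s (~0.1 GByte/sec)")
print(f"Annual volume : {annual_volume_pb:.1f} PB per experiment")
# Four experiments, plus derived data, give the ~15 PB/year quoted above.
```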

LHC Computing Grid Project – a Collaboration
Building and operating the LHC Grid is a global collaboration between researchers, computer scientists & software engineers, and service providers:
- The physicists and computing specialists from the LHC experiments
- The projects in Europe and the US that have been developing grid middleware (e.g. the Virtual Data Toolkit)
- The regional and national computing centres that provide resources for LHC
- The research networks

LHC Computing Hierarchy
[Diagram of the tiered computing model: Experiment → Online System → Tier 0+1 (CERN Centre, PBs of disk, tape robot) → Tier 1 centres (FNAL, IN2P3, INFN, RAL, ...) → Tier 2 centres → Tier 3 (institute workstations) → Tier 4 (physics data cache)]
- Experiment to Online System: ~PByte/sec
- Online System to Tier 0+1: ~100 – 1500 MBytes/sec
- Tier 0+1 to Tier 1 centres: 2.5 – 10 Gbps
- Tier 1 to Tier 2 centres: ~2.5 – 10 Gbps
- Tier 2 to Tier 3/4: 0.1 to 10 Gbps
Tens of Petabytes by 2007-8; an Exabyte ~5-7 years later.
CERN/outside resource ratio ~1:2; Tier0 : (sum of Tier1s) : (sum of Tier2s) ~ 1:1:1.

LCG Service Hierarchy
Tier-0 – the accelerator centre:
- Data acquisition & initial processing
- Long-term data curation
- Distribution of data to the Tier-1 centres
Tier-1 – "online" to the data acquisition process, hence high availability:
- Managed mass storage – grid-enabled data service
- Data-heavy analysis
- National, regional support
Tier-1 centres: Canada – TRIUMF (Vancouver); France – IN2P3 (Lyon); Germany – Forschungszentrum Karlsruhe; Italy – CNAF (Bologna); Netherlands – NIKHEF (Amsterdam); Nordic countries – distributed Tier-1; Spain – PIC (Barcelona); Taipei – Academia Sinica; UK – CLRC (Oxford); US – FermiLab (Illinois) and Brookhaven (NY)
Tier-2 – ~100 centres in ~40 countries:
- Simulation
- End-user analysis – batch and interactive

Project Areas & Management
Project Leader: Les Robertson; Resource Manager: Chris Eck; Planning Officer: Jürgen Knobloch; Administration: Fabienne Baud-Lavigne
- Distributed Analysis – ARDA (Massimo Lamanna): prototyping of distributed end-user analysis using grid technology; joint with EGEE
- Applications Area (Pere Mato): development environment; joint projects; data management; distributed analysis
- Middleware Area (Frédéric Hemmer): provision of a base set of grid middleware (acquisition, development, integration); testing, maintenance, support
- CERN Fabric Area (Bernd Panzer): large cluster management; data recording; cluster technology; networking; computing service at CERN
- Grid Deployment Area (Ian Bird): establishing and managing the grid service – middleware certification, security, operations, registration, authorisation, accounting

Relation of LCG and EGEE
Goal: create a European-wide production-quality multi-science grid infrastructure on top of national & regional grid programmes.
Scale: 70 partners in 27 countries; initial funding (€32M) for 2 years.
Activities:
- Grid operations and support (joint LCG/EGEE operations team)
- Middleware re-engineering (close attention to LHC data analysis requirements)
- Training and support for applications groups (including a contribution to the ARDA team)
EGEE builds on the LCG grid deployment and the experience gained in HEP; the LHC experiments are the pilot applications.

Applications Area
All Applications Area projects have software deployed in production by the experiments: POOL, SEAL, ROOT, Geant4, GENSER, PI/AIDA, Savannah.
- 400 TB of POOL data produced in 2004
- Pre-release of the Conditions Database (COOL); the 3D project (Distributed Deployment of Databases) will help POOL and COOL in terms of scalability
- Geant4 successfully used in the ATLAS, CMS and LHCb Data Challenges with excellent reliability
- GENSER MC generator library in production
- Progress on integrating ROOT with other Applications Area components: improved I/O package used by POOL; common dictionary and maths library with SEAL
The plan for the next phase of the Applications Area has just undergone internal review.

CERN Fabric
Fabric automation has seen very good progress:
- The new systems for managing large farms have been in production at CERN since January, as part of ELFms (Extremely Large Fabric management system), which includes technology developed by the European DataGrid: quattor – configuration, installation and management of nodes; LEMON (LHC Era Monitoring) – system & service monitoring; LEAF (LHC Era Automated Fabric) – hardware / state management
New CASTOR mass storage system:
- Deployed first on the high-throughput cluster for the recent ALICE data recording computing challenge
Agreement on collaboration with Fermilab on a Linux distribution:
- Scientific Linux, based on Red Hat Enterprise 3, improves uniformity between the HEP sites serving the LHC and Run 2 experiments
CERN computer centre preparations:
- Power upgrade to 2.5 MW
- Computer centre refurbishment well under way
- Acquisition process started

Preparing for 7,000 boxes in 2008

High Throughput Prototype (openlab/LCG)
Experience with likely ingredients in LCG:
- 64-bit programming
- Next-generation I/O (10 Gb Ethernet, Infiniband, etc.)
High-performance cluster used for evaluations and for data challenges with the experiments.
Flexible configuration – components are moved in and out of the production environment.
Co-funded by industry and CERN.

ALICE Data Recording Challenge
- Target – one week sustained at 450 MB/sec
- Used the new version of the CASTOR mass storage system
- Note the smooth degradation and recovery after equipment failure
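To put the target in perspective, a week sustained at 450 MB/sec corresponds to roughly 270 TB written to mass storage; the short calculation below, using only the figures quoted above, makes that explicit.

```python
# Volume implied by the ALICE challenge target: one week sustained at 450 MB/s.
rate_mb_s = 450
seconds_per_week = 7 * 24 * 3600

total_tb = rate_mb_s * seconds_per_week / 1e6   # MB -> TB
print(f"~{total_tb:.0f} TB written in one sustained week")   # ~272 TB
```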

Deployment and Operations

LHC Computing Model (simplified!!)
Tier-0 – the accelerator centre:
- Filter raw data → reconstruction → event summary data (ESD)
- Record the master copy of raw data and ESD
Tier-1:
- Managed mass storage – permanent storage of raw data, ESD, calibration data, meta-data, analysis data and databases → grid-enabled data service
- Data-heavy (ESD-based) analysis
- Re-processing of raw data
- National, regional support
- "Online" to the data acquisition process: high availability, long-term commitment
Tier-2:
- Well-managed, grid-enabled disk storage
- End-user analysis – batch and interactive
- Simulation

Computing Resources: May 2005
[Map: countries providing resources and countries anticipating joining]
- In LCG-2: 139 sites in 32 countries, ~14,000 CPUs, ~5 PB of storage
- Includes non-EGEE sites: 9 countries, 18 sites
The number of sites is already at the scale expected for LHC – it demonstrates the full complexity of operations.

Operations Structure
- Operations Management Centre (OMC): at CERN – coordination etc.
- Core Infrastructure Centres (CIC): manage daily grid operations – oversight, troubleshooting; run essential infrastructure services; provide 2nd-level support to the ROCs. Currently UK/I, France, Italy and CERN, plus Russia (M12); we hope to add non-European centres
- Regional Operations Centres (ROC): act as front-line support for user and operations issues; provide local knowledge and adaptations; one in each region – many are distributed
- User Support Centre (GGUS): at FZK – support portal – provides a single point of contact (service desk)

Grid Operations
The grid is flat, but there is a hierarchy of responsibility – essential to scale the operation:
- The CICs act as a single Operations Centre; operational oversight (grid operator) responsibility rotates weekly between the CICs; problems are reported to the relevant ROC/RC (RC = Resource Centre)
- The ROC is responsible for ensuring that a problem is resolved, and oversees the regional RCs
- The ROCs are responsible for organising operations in a region, e.g. coordinating deployment of middleware
- CERN coordinates sites not associated with a ROC
It is in setting up this operational infrastructure that we have really benefited from EGEE funding.

Grid monitoring
Operation of the production service: real-time display of grid operations; accounting information.
A selection of monitoring tools:
- GIIS Monitor + Monitor Graphs
- Site Functional Tests
- GOC Data Base
- Scheduled Downtimes
- Live Job Monitor
- GridIce – VO + fabric view
- Certificate Lifetime Monitor
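To give a flavour of what a functional test does, the sketch below simply checks whether a couple of service ports answer and reports pass/fail. It is only an illustration: the hostnames are placeholders, and the real Site Functional Tests go much further, submitting grid jobs and exercising storage, the information system and replica management.

```python
# Illustrative only: a toy "site functional test" that checks whether a few
# service ports answer. The hostnames below are placeholders, not real sites.
import socket

SERVICES = {
    "ce.example-site.org": 2119,   # hypothetical Computing Element endpoint
    "se.example-site.org": 8443,   # hypothetical Storage Element endpoint
}

def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for host, port in SERVICES.items():
    status = "OK" if port_open(host, port) else "FAIL"
    print(f"{host}:{port}  {status}")
```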

Operations focus
[Timeline: LCG-2 (= EGEE-0), 2004 – 2005, prototyping → product, evolving towards LCG-3 (= EGEE-x?)]
Main focus of activities now:
- Improving operational reliability and application efficiency: automating monitoring → alarms; ensuring a 24x7 service; removing sites that fail the functional tests; operations interoperability with OSG and others
- Improving user support: demonstrate to users a reliable and trusted support infrastructure
- Deployment of gLite components: testing, certification → pre-production service; migration planning and deployment – while maintaining/growing interoperability
Further developments now have to be driven by experience in real use.

Recent ATLAS work
[Chart: ATLAS jobs per day in EGEE/LCG-2 during 2005]
- ~10,000 concurrent jobs in the system
- In the latest period, up to 8,000 jobs/day
- This is several times the current capacity for ATLAS at CERN alone – it shows the reality of the grid solution

Deployment of other applications
Pilot applications:
- High energy physics
- Biomedical applications (http://egee-na4.ct.infn.it/biomed/applications.html)
New, generic applications – deployment under way:
- Computational chemistry
- Earth science research
- EGEODE: first industrial application
- Astrophysics
With interest from: hydrology, seismology, grid search engines, stock market simulators, digital video, etc., and from industry (as provider, user, supplier).
Many users, with a broad range of needs – different communities with different backgrounds and internal organization.

LCG Service Challenges – Overview
- LHC will enter production (physics) in April 2007; it will generate an enormous volume of data and require a huge amount of processing power
- The LCG 'solution' is a world-wide grid: many components are understood, deployed and tested, but the scale is unprecedented
- There is a tremendous challenge in getting large numbers of institutes and individuals, all with existing and sometimes conflicting commitments, to work together
- LCG must be ready at full production capacity, functionality and reliability in less than 2 years from now; the issues include hardware acquisition, personnel hiring and training, vendor rollout schedules, etc.
- It should not limit the ability of physicists to exploit the performance of the detectors nor the LHC's physics potential, whilst being stable, reliable and easy to use

Service Challenges – ramp up to LHC start-up service
Milestones:
- Jun 2005 – Technical Design Report
- Sep 2005 – SC3 service phase
- May 2006 – SC4 service phase
- Sep 2006 – initial LHC service in stable operation
- Apr 2007 – LHC service commissioned
[Timeline 2005 – 2008: SC2 → SC3 → SC4 → LHC Service Operation; cosmics, first beams, first physics, full physics run]
- SC2 – reliable data transfer (disk – network – disk) – 5 Tier-1s, aggregate 500 MB/sec sustained at CERN
- SC3 – reliable base service – most Tier-1s, some Tier-2s – basic experiment software chain – grid data throughput of 500 MB/sec, including mass storage (~25% of the nominal final throughput for the proton period)
- SC4 – all Tier-1s, major Tier-2s – capable of supporting the full experiment software chain, including analysis – sustain the nominal final grid data throughput
- LHC Service in Operation – September 2006 – ramp up to full operational capacity by April 2007 – capable of handling twice the nominal data throughput
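Two quick consequences of the figures quoted above, derived only from the slide's own numbers (a sketch, not official planning figures): 500 MB/sec being ~25% of the nominal proton-period throughput implies a nominal aggregate of about 2 GB/sec out of CERN, and sustaining 500 MB/sec moves roughly 43 TB per day.

```python
# Derived from the slide's own numbers: 500 MB/s is quoted as ~25% of the
# nominal final throughput for the proton period.
sc_rate_mb_s = 500
nominal_mb_s = sc_rate_mb_s / 0.25            # implied nominal aggregate rate

per_day_tb = sc_rate_mb_s * 86400 / 1e6       # TB moved per day at 500 MB/s
print(f"Implied nominal throughput : ~{nominal_mb_s / 1000:.0f} GB/s")
print(f"Volume per day at 500 MB/s : ~{per_day_tb:.0f} TB")
```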

Why Service Challenges?
To test the Tier-0 → Tier-1 → Tier-2 services:
- Network service: sufficient bandwidth (~10 Gbit/sec), a backup path, quality of service (security, help desk, error reporting, bug fixing, ...)
- Robust file transfer service: file servers, file transfer software (GridFTP), data management software (SRM, dCache), archiving service (tape servers, tape robots, tapes, tape drives, ...)
- Sustainability: weeks in a row of uninterrupted 24/7 operation; manpower implications (~7 FTE/site); quality of service (helpdesk, error reporting, bug fixing, ...)
Towards a stable production environment for the experiments.
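"Robust" here mostly means surviving transient failures. The sketch below shows the kind of retry-with-backoff wrapper a transfer service places around an individual copy; copy_file is a placeholder standing in for whatever transfer tool is actually used (a GridFTP client, an SRM copy, ...), not a real LCG API.

```python
# Minimal sketch of retry-with-backoff around a single file transfer.
# copy_file() is a placeholder for the real transfer tool -- not an LCG API.
import time

class TransferError(Exception):
    """Raised when an individual transfer attempt fails."""

def copy_file(source_url: str, dest_url: str) -> None:
    """Placeholder: invoke the actual transfer tool here; raise on failure."""
    raise TransferError("transfer backend not wired up in this sketch")

def robust_copy(source_url: str, dest_url: str,
                max_attempts: int = 5, backoff_s: float = 30.0) -> bool:
    """Retry a transfer, waiting progressively longer between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            copy_file(source_url, dest_url)
            return True
        except TransferError as err:
            print(f"attempt {attempt}/{max_attempts} failed: {err}")
            if attempt < max_attempts:
                time.sleep(backoff_s * attempt)
    return False
```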

SC2 met its throughput targets
- A daily average of >600 MB/sec was sustained for 10 days – midday 23rd March to midday 2nd April
- Not without outages, but the system showed it could recover its rate after them
- Load was reasonably evenly divided over the sites (given the network bandwidth constraints of the Tier-1 sites)

Tier-1 Network Topology

Opportunities for collaboration
A Tier-1 centre in China:
- Tier-1 for the LHC experiments – long-term data curation, data-intensive analysis, etc.
- Regional support: supporting Tier-2 centres in deployment, operations and user support
- Support and training for new grid applications
- Regional Operations Centre: participation in monitoring the entire LCG grid operation – helping to provide round-the-clock operations
- Participation in the service challenges
Computer fabric management projects, in building up a Tier-1 or a large LHC computing resource:
- Mass storage management systems – CASTOR
- Large fabric management systems – Quattor, etc.
- These are both areas where CERN and other sites collaborate on developments and support
Grid management:
- An area where a lot of work is still to be done – many opportunities for projects to provide missing tools

Summary
- The LCG project – with the EGEE partnership – has built an international computing grid
- It is used for real production work for the LHC and other HEP experiments
- It is also now being used by many other application domains – bio-medicine, physics, chemistry, earth science
- The scale of this grid is already at the level needed for LHC in terms of the number of sites; we already see the full complexity of operations, and there is a managed operation in place
- But there is still a lot to do to make this into a reliable service for LHC
- The service challenge programme – ramping up to 2007 – is an extremely aggressive plan
- There are many opportunities for collaboration in all aspects of the project