Les Robertson, LCG Project Leader
LCG - The Worldwide LHC Computing Grid
LHC Data Analysis: Challenges for 100 Computing Centres in 20 Countries
HEPiX Meeting, Rome, 5 April 2006

The Worldwide LHC Computing Grid

Purpose
- Develop, build and maintain a distributed computing environment for the storage and analysis of data from the four LHC experiments
- Ensure the computing service … and common application libraries and tools

Phases
- Phase I (2002-2005) - development & planning
- Phase II (2006-2008) - deployment & commissioning of the initial services

WLCG Collaboration

The Collaboration
- ~100 computing centres
- 12 large centres (Tier-0, Tier-1)
- 38 federations of smaller "Tier-2" centres
- 20 countries

Memorandum of Understanding
- Agreed in October 2005, now being signed

Resources
- Commitments made each October for the coming year
- 5-year forward look

Boards of the Collaboration

Collaboration Board - chair Neil Geddes (RAL)
- Sets the main technical directions
- One person from the Tier-0 and each Tier-1 and Tier-2 (or Tier-2 federation)
- Experiment spokespersons

Overview Board - chair Jos Engelen (CERN CSO)
- Committee of the Collaboration Board to oversee the project and resolve conflicts
- One person from the Tier-0 and the Tier-1s
- Experiment spokespersons

Management Board - chair Project Leader
- Experiment Computing Coordinators
- One person from the Tier-0 and each Tier-1 site
- GDB chair
- Project Leader, Area Managers
- EGEE Technical Director

Grid Deployment Board - chair Kors Bos (NIKHEF)
- With a vote: one person from a major site in each country; one person from each experiment
- Without a vote: Experiment Computing Coordinators; site service management representatives; Project Leader, Area Managers

Architects Forum - chair Pere Mato (CERN)
- Experiment software architects
- Applications Area Manager
- Applications Area project managers

Boards and Committees

All boards except the OB have open access to agendas, minutes and documents.

Planning data:
- MoU documents and resource data
- Technical Design Reports
- Phase 2 plans
- Status and progress reports
- Phase 2 resources and costs at CERN

More information on the collaboration is linked from the project web pages.

LCG Service Hierarchy

Tier-0 - the accelerator centre
- Data acquisition & initial processing
- Long-term data curation
- Distribution of data to the Tier-1 centres

Tier-1 centres:
- Canada - TRIUMF (Vancouver)
- France - IN2P3 (Lyon)
- Germany - Forschungszentrum Karlsruhe
- Italy - CNAF (Bologna)
- Netherlands - Tier-1 (Amsterdam)
- Nordic countries - distributed Tier-1
- Spain - PIC (Barcelona)
- Taiwan - Academia Sinica (Taipei)
- UK - CLRC (Oxford)
- US - FermiLab (Illinois), Brookhaven (NY)

Tier-1 - "online" to the data acquisition process
- High availability
- Managed mass storage - grid-enabled data service
- Data-heavy analysis
- National, regional support

Tier-2 - ~100 centres in ~40 countries
- Simulation
- End-user analysis - batch and interactive

(Charts: CPU, Disk and Tape capacity.)

LCG depends on two major science grid infrastructures:
- EGEE - Enabling Grids for E-sciencE
- OSG - US Open Science Grid

… and an excellent Wide Area Network.

Sustained Data Distribution Rates: CERN → Tier-1s

Centre                      Experiments served         Rate into T1 (MB/s, pp run)
ASGC, Taipei                ATLAS, CMS                 100
CNAF, Italy                 ALICE, ATLAS, CMS, LHCb    200
PIC, Spain                  ATLAS, CMS, LHCb           100
IN2P3, Lyon                 ALICE, ATLAS, CMS, LHCb    200
GridKA, Germany             ALICE, ATLAS, CMS, LHCb    200
RAL, UK                     ATLAS, CMS, LHCb           150
BNL, USA                    ATLAS                      200
FNAL, USA                   CMS                        200
TRIUMF, Canada              ATLAS                      50
NIKHEF/SARA, NL             ALICE, ATLAS, LHCb         150
Nordic Data Grid Facility   ALICE, ATLAS               50
Total                                                  1,600

The design target is twice these rates, to enable catch-up after problems.
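As a back-of-the-envelope illustration of why the design target is twice the nominal rate, the sketch below (my own example, not from the slides; the function name and numbers are illustrative) computes how long a Tier-1 needs to clear the backlog accumulated during an outage if it can ingest at twice its nominal rate while still keeping up with new data.

```python
# Illustrative catch-up calculation (not from the original slides):
# if a Tier-1 nominally ingests `nominal_mb_s` from CERN and is down for
# `outage_hours`, how long does recovery take at `headroom` x the nominal rate?

def catchup_hours(nominal_mb_s: float, outage_hours: float, headroom: float = 2.0) -> float:
    """Hours needed to clear the backlog while also keeping up with new data."""
    backlog_mb = nominal_mb_s * outage_hours * 3600      # data missed during the outage
    spare_mb_s = nominal_mb_s * (headroom - 1.0)          # capacity left after current data
    return backlog_mb / spare_mb_s / 3600

if __name__ == "__main__":
    # Example with the RAL nominal rate from the table above (150 MB/s):
    # a 12-hour outage takes another 12 hours to recover at 2x the nominal rate.
    print(f"{catchup_hours(150, 12):.1f} h")   # -> 12.0 h
```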

(Diagram: Tier-1 / Tier-2 topology.) Experiment computing models define specific data flows between Tier-1s and Tier-2s.

ATLAS "average" Tier-1 Data Flow (2008) - real data storage, reprocessing and distribution

(Diagram: RAW, ESD and AOD streams flowing from the Tier-0 into the Tier-1 disk buffer, tape and disk storage, and on to the other Tier-1s and the associated Tier-2s.)

Representative figures from the diagram:
- RAW from Tier-0: 1.6 GB/file, 0.02 Hz, ~1.7K files/day, 32 MB/s, 2.7 TB/day
- ESD2: 0.5 GB/file, 0.02 Hz, ~1.7K files/day, 10 MB/s, 0.8 TB/day
- AOD2: 10 MB/file, 0.2 Hz, ~17K files/day, 2 MB/s, 0.16 TB/day
- Total RAW + ESD2 + AODm2 from Tier-0: ~3.74K files/day, 44 MB/s, 3.66 TB/day

Plus simulation & analysis data flow.
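As a quick cross-check of these figures (my own arithmetic, not from the slides; it assumes 1 GB = 1000 MB), file size times file rate reproduces the quoted bandwidth and daily volume:

```python
# Cross-check of the data-flow figures quoted above (illustrative arithmetic only).

def stream_rates(file_size_gb: float, files_per_sec: float) -> tuple[float, float]:
    """Return (MB/s, TB/day) for a stream of files of the given size and frequency."""
    mb_per_s = file_size_gb * 1000 * files_per_sec   # 1 GB taken as 1000 MB
    tb_per_day = mb_per_s * 86400 / 1e6               # seconds per day, MB per TB
    return mb_per_s, tb_per_day

if __name__ == "__main__":
    for name, size_gb, hz in [("RAW", 1.6, 0.02), ("ESD2", 0.5, 0.02), ("AOD2", 0.01, 0.2)]:
        mb_s, tb_day = stream_rates(size_gb, hz)
        print(f"{name}: {mb_s:.0f} MB/s, {tb_day:.2f} TB/day")
    # RAW: 32 MB/s, 2.76 TB/day; ESD2: 10 MB/s, 0.86 TB/day; AOD2: 2 MB/s, 0.17 TB/day
```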

More information on the experiments' computing models

- LCG planning page
- GDB workshops
  - Mumbai workshop - see the GDB meetings page for experiment presentations and documents
  - Tier-2 workshop and tutorials, CERN, June 2006
- Technical Design Reports
  - LCG TDR - review by the LHCC
  - ALICE TDR supplement: Tier-1 dataflow diagrams
  - ATLAS TDR supplement: Tier-1 dataflow
  - CMS TDR supplement: Tier-1 computing model
  - LHCb TDR supplement: additional site dataflow diagrams

Problem Response Time and Availability Targets - Tier-1 Centres

Maximum delay in responding to operational problems, in hours (service interruption / degradation of the service by >50% / by >20%), and required availability:

- Acceptance of data from the Tier-0 Centre during accelerator operation: 12 / 12 / 24; availability 99%
- Other essential services, prime service hours: 2 / 2 / 4; availability 98%
- Other essential services, outside prime service hours: 24 / 48 / 48; availability 97%

Problem Response Time and Availability Targets - Tier-2 Centres

Maximum delay in responding to operational problems (prime time / other periods) and required availability:

- End-user analysis facility: 2 hours / 72 hours; availability 95%
- Other services: 12 hours / 72 hours; availability 95%
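To put these availability targets in perspective, the small sketch below (my own illustration, not part of the MoU; it assumes a 30-day month and a 365-day year) converts an availability percentage into the downtime budget it allows:

```python
# Illustrative conversion of an availability target into a downtime budget
# (assumes a 30-day month and a 365-day year).

def downtime_budget(availability_pct: float) -> tuple[float, float]:
    """Return (allowed downtime hours per month, per year) for a given availability."""
    unavailable = 1.0 - availability_pct / 100.0
    return unavailable * 30 * 24, unavailable * 365 * 24

if __name__ == "__main__":
    for target in (99, 98, 97, 95):
        per_month, per_year = downtime_budget(target)
        print(f"{target}%: ~{per_month:.0f} h/month, ~{per_year:.0f} h/year")
    # 99%: ~7 h/month, ~88 h/year
    # 98%: ~14 h/month, ~175 h/year
    # 97%: ~22 h/month, ~263 h/year
    # 95%: ~36 h/month, ~438 h/year
```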

Measuring Response Times and Availability

Site Functional Test framework:
- Monitors services by running regular tests
- Basic services: SRM, LFC, FTS, CE, RB, top-level BDII, site BDII, MyProxy, VOMS, R-GMA, …
- VO environment: tests supplied by the experiments
- Results stored in a database
- Displays & alarms for sites, grid operations and experiments
- High-level metrics for management
- Integrated with the EGEE operations portal - the main tool for daily (EGEE) grid operations
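The framework itself is not reproduced here; as a rough illustration of the kind of bookkeeping involved, the sketch below (hypothetical data layout and names, not the actual Site Functional Test code) counts a site as available for a test interval only if all critical tests passed, and derives an availability fraction from the stored results.

```python
# Rough illustration of deriving site availability from periodic test results.
# Hypothetical data layout; this is not the actual Site Functional Test code.

from collections import defaultdict

# (site, interval, test_name, passed) tuples as they might be stored in a database
RESULTS = [
    ("EXAMPLE-T1", 0, "SRM", True),  ("EXAMPLE-T1", 0, "CE", True),
    ("EXAMPLE-T1", 1, "SRM", False), ("EXAMPLE-T1", 1, "CE", True),
    ("EXAMPLE-T1", 2, "SRM", True),  ("EXAMPLE-T1", 2, "CE", True),
]

CRITICAL_TESTS = {"SRM", "CE"}

def availability(results, site: str) -> float:
    """Fraction of test intervals in which every critical test passed for the site."""
    by_interval = defaultdict(dict)
    for s, interval, test, passed in results:
        if s == site and test in CRITICAL_TESTS:
            by_interval[interval][test] = passed
    ok = sum(1 for tests in by_interval.values()
             if CRITICAL_TESTS <= tests.keys() and all(tests.values()))
    return ok / len(by_interval) if by_interval else 0.0

if __name__ == "__main__":
    print(f"{availability(RESULTS, 'EXAMPLE-T1'):.2f}")   # 0.67: 2 of 3 intervals fully OK
```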

Site Functional Tests

(Chart: SFT results for the Tier-1 sites; the average value of the sites is shown.)
- Tier-1 sites, without BNL
- Basic tests only
- Only partially corrected for scheduled downtime
- Not corrected for sites with less than 24-hour coverage

Availability Targets
- End September 2006 (end of Service Challenge 4): 8 Tier-1s and 20 Tier-2s > 90% of MoU targets
- April 2007 (service fully commissioned): all Tier-1s and 30 Tier-2s > 100% of MoU targets

Service Challenges

Purpose
- Understand what it takes to operate a real grid service - run for weeks or months at a time (not just limited to experiment data challenges)
- Trigger and verify Tier-1 & large Tier-2 planning and deployment - tested with realistic usage patterns
- Get the essential grid services ramped up to target levels of reliability, availability, scalability and end-to-end performance

Four progressive steps from October 2004 through September 2006
- End 2004 - SC1: data transfer to a subset of Tier-1s
- Spring 2005 - SC2: include mass storage, all Tier-1s, some Tier-2s
- 2nd half 2005 - SC3: Tier-1s, >20 Tier-2s - first set of baseline services
- Jun-Sep 2006 - SC4: pilot service
- Autumn 2006 - LHC service in continuous operation, ready for data taking in 2007

SC4 - the Pilot LHC Service from June 2006

A stable service on which the experiments can make a full demonstration of the experiment offline chain
- DAQ → Tier-0 → Tier-1: data recording, calibration, reconstruction
- Offline analysis - Tier-1 ↔ Tier-2 data exchange: simulation, batch and end-user analysis

And on which sites can test their operational readiness
- Service metrics → MoU service levels
- Grid services
- Mass storage services, including magnetic tape

Extension to most Tier-2 sites.

An evolution of SC3 rather than lots of new functionality.

In parallel:
- Development and deployment of distributed database services (3D project)
- Testing and deployment of new mass storage services (SRM 2.1)

Medium Term Schedule

(Timeline diagram.)
- 3D distributed database services: development → test → deployment
- SC4: stable service for experiment tests
- SRM 2: test and deployment plan being elaborated; October target
- Additional functionality: to be agreed, developed, evaluated, then tested and deployed - deployment schedule still open (??)

LCG Service Deadlines

(Timeline diagram: cosmics, first physics, full physics run.)
- Pilot services - stable service from 1 June 2006
- LHC service in operation - 1 October 2006; over the following six months, ramp up to full operational capacity & performance
- LHC service commissioned - 1 April 2007

Conclusions

LCG will depend on
- ~100 computer centres - run by you
- two major science grid infrastructures - EGEE and OSG
- excellent global research networking

We have
- an understanding of the experiment computing models
- agreement on the baseline services
- good experience from SC3 on what the problems and difficulties are

Grids are now operational
- ~200 sites between EGEE and OSG
- grid operations centres running for well over a year
- > 20K jobs per day accounted
- ~15K simultaneous jobs with the right load and job mix

BUT - a long way to go on reliability

The Service Challenge programme this year must show that we can run reliable services.

Grid reliability is the product of many components - middleware, grid operations, computer centres, ….

Targets for September (too modest? too ambitious?)
- 90% site availability
- 90% user job success

This requires a major effort by everyone to monitor, measure and debug.

First data will arrive next year - it is NOT an option to get things going later.
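Because reliability is the product of many components, even individually good components multiply into a noticeably lower end-to-end figure. A small illustrative calculation (my own example numbers, not measured WLCG values):

```python
# End-to-end reliability as the product of component reliabilities.
# Example numbers chosen for illustration; they are not measured WLCG values.

from math import prod

components = {
    "middleware":        0.98,
    "grid operations":   0.99,
    "site storage":      0.97,
    "site batch system": 0.98,
    "network":           0.995,
}

end_to_end = prod(components.values())
print(f"End-to-end success probability: {end_to_end:.3f}")   # ~0.918

# To reach a 90% overall target with N equally reliable components,
# each one must achieve roughly 0.90 ** (1/N):
n = len(components)
print(f"Per-component requirement for 90% overall: {0.90 ** (1 / n):.3f}")   # ~0.979
```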