Grid and Data Handling
Gonzalo Merino, Port d’Informació Científica / CIEMAT
Primeras Jornadas del CPAN, El Escorial, 25/11/2009
Disclaimer
Though the title of this talk is very generic, I will focus on describing the LHC Grid and its data handling as an example. This is the community with the largest and most imminent computing needs, as well as my area of work. I will also try to address the Grid-related activities in other CPAN areas. The information presented does not aim to be a complete catalogue of Grid activities, but to describe the general picture and provide a handful of URL pointers to further information.
LHC computing needs
The LHC is one of the world's largest scientific machines: a proton-proton collider, 27 km in perimeter, 100 m underground, with superconducting magnets at 1.9 K. Four detectors will record the outcome of the collisions: interactions at ~1 GHz, a trigger reducing this rate by many orders of magnitude, and close to 1 GB/s written to storage, amounting to several PB of RAW data per year. Adding up processed data, simulation and replicas, and a machine lifetime of well over a decade, the LHC reaches the Exabyte scale. Managing this huge amount of data and enabling its analysis by thousands of scientists worldwide is a technological challenge. There is no way to concentrate such computing power and storage capacity at CERN, so the Grid paradigm was adopted for LHC computing.
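A back-of-envelope sketch of the arithmetic behind these numbers; the ~1e7 seconds of effective data taking per year is an assumed, commonly used figure for such estimates, not a number from the slide:

```python
# Back-of-envelope estimate of the LHC yearly RAW data volume.
# Assumption (not from the slide): ~1e7 seconds of effective data taking per year.

raw_rate_gb_per_s = 1.0        # close to 1 GB/s written to storage (slide figure)
live_seconds_per_year = 1e7    # assumed effective running time per year

raw_pb_per_year = raw_rate_gb_per_s * live_seconds_per_year / 1e6  # GB -> PB
print(f"RAW data per year: ~{raw_pb_per_year:.0f} PB")  # ~10 PB

# Processed data, simulation and replicas multiply this by a sizeable factor,
# which over the machine lifetime pushes the total towards the Exabyte scale.
```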
LHC Grid: layered structure
The tiered model comes from the early days (1999, MONARC), when it was mainly motivated by the limited network connectivity among sites. Today the network is no longer the issue, but the tiered model is still used to organise work and data flows.
Tier-0 at CERN: DAQ and prompt reconstruction, long-term data curation.
Tier-1 (11 centres): online to the DAQ (24x7), long-term storage of a copy of the RAW data, massive data reconstruction. Connected to CERN with dedicated 10 Gbps links.
Tier-2 (>150 centres): end-user analysis and simulation. Connected to the Tier-1s via general-purpose research networks.
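As an illustration of this layered structure, a minimal sketch of the topology in code; the roles are paraphrased from this slide, and the Tier-1/Tier-2 site entries are placeholders rather than the real site list:

```python
# Minimal sketch of the WLCG tiered topology described on this slide.
# Tier-1/Tier-2 "sites" entries are placeholders, not the real site list.

tiers = {
    "Tier-0": {
        "sites": ["CERN"],
        "roles": ["DAQ", "prompt reconstruction", "long-term data curation"],
        "uplink": None,
    },
    "Tier-1": {
        "sites": ["<11 national centres>"],
        "roles": ["custodial copy of RAW data", "massive reconstruction"],
        "uplink": "dedicated 10 Gbps links to CERN",
    },
    "Tier-2": {
        "sites": ["<150+ institutes>"],
        "roles": ["end-user analysis", "simulation"],
        "uplink": "general-purpose research networks to the Tier-1s",
    },
}

for name, info in tiers.items():
    print(f"{name}: {', '.join(info['roles'])} ({info['uplink'] or 'at CERN'})")
```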
Worldwide LHC Computing Grid
More than 170 centres in 34 countries: ~86k CPUs, 68 PB disk, 65 PB tape.
Spain contributes 1 Tier-1 and 7 Tier-2 sites (target ~5% of the total T1/T2 capacity):
Tier-1 (ATLAS, CMS, LHCb): PIC
ATLAS Tier-2: IFIC, IFAE, UAM
CMS Tier-2: CIEMAT, IFCA
LHCb Tier-2: UB, USC
Distribution of resources
Experiment computing requirements for the run, split across the WLCG Tiers [chart on slide].
More than 80% of the resources are outside CERN. The Grid MUST work from day 1!
LHC computing requirements
The computing and storage capacity needs of WLCG are enormous. Capacity planning is managed through the WLCG MoU: a yearly process in which requirements and pledges are updated and agreed.
[Chart on slide: evolution of the pledged CPU capacity; ~… cores today]
LHC Experiments' Computing Models
Experiments' Computing Models
Every LHC experiment develops and maintains a Computing Model that describes the organisation of the data and the computing infrastructure needed to process and analyse them.
Example: input parameters to the ATLAS Computing Model [table on slide].
ATLAS Computing Model
[Diagram: Tier-0 and CAF (prompt reconstruction, calibration & alignment, express-stream analysis) → Tier-1s (RAW re-processing, HITS reconstruction) → Tier-2s (simulation and analysis); ~650 MB/s out of the Tier-0, the other rate labels are not recoverable]
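As a rough orientation, the one rate label that survives on this slide (~650 MB/s out of the Tier-0) can be turned into a daily volume with a back-of-envelope calculation; this is only a sketch, not a figure taken from the ATLAS Computing Model documents:

```python
# Daily volume implied by a sustained ~650 MB/s export rate (figure from the slide).
rate_mb_per_s = 650
seconds_per_day = 24 * 3600
tb_per_day = rate_mb_per_s * seconds_per_day / 1e6   # MB -> TB
print(f"~{tb_per_day:.0f} TB/day")                    # ~56 TB per day if sustained 24x7
```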
CMS Computing Model
[Diagram: Tier-0 and CAF (prompt reconstruction, calibration, express-stream analysis) → Tier-1s (re-reconstruction, skimming & selection) → Tier-2s (simulation and analysis); the rate labels are not recoverable]
LHCb Computing Model
[Diagram: Tier-0 and CAF (reconstruction, stripping, analysis, calibration, express-stream analysis); Tier-1s (reconstruction, stripping, analysis); Tier-2s (simulation); flows of ~10 MB/s and a few MB/s between tiers, the other rate labels are not recoverable]
Data Analysis on the Grid
The original vision: a thin application layer interacting with a powerful middleware layer.
[Diagram: the user sends a dataset query plus algorithms to a Workload Management System and other services, and gets the output back]
In other words, a "super-WMS" to which the user throws input dataset queries plus algorithms, and it spits the result out.
Data Analysis
Using the Grid at such a large scale is not an easy business! The reality today: the LHC experiments have built increasingly sophisticated software stacks to interact with the Grid, on top of the basic Grid middleware services (CE, SE, FTS, LFC) and the computing and storage resources:
User analysis: a single interface for the whole analysis cycle, hiding the complexity of the Grid (Ganga, CRAB, DIRAC, AliEn …)
Workload management: pilot jobs, late scheduling, VO-steered prioritisation (DIRAC, AliEn, PanDA …); see the sketch after this slide
Data management: topology-aware higher-level tools, capable of managing complex data flows (PhEDEx, DDM …)
[Diagram: VO-specific user interface → VO-specific WMS and DMS → Grid middleware basic services (FTS, LFC …) → computing and storage resources]
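A minimal sketch of the pilot-job / late-scheduling idea mentioned above; all names and the in-memory task queue are hypothetical stand-ins, and real systems such as DIRAC or PanDA are far more elaborate:

```python
# Minimal illustration of the pilot-job ("late scheduling") pattern.
# A pilot lands on a worker node first, checks the local environment,
# and only then pulls a real task from the experiment's central task queue.
# The task queue below is an in-memory stand-in, not a real service.

import subprocess
from collections import deque

task_queue = deque([
    {"id": 1, "cmd": ["echo", "run analysis on dataset A"]},
    {"id": 2, "cmd": ["echo", "run simulation job 42"]},
])

def environment_ok() -> bool:
    """Pilot sanity checks: software area, scratch space, etc. (stubbed)."""
    return True

def run_pilot():
    if not environment_ok():
        return  # a real pilot would report the failure and exit
    while task_queue:                      # late scheduling: work is matched to the
        task = task_queue.popleft()        # node only once the node is known good
        result = subprocess.run(task["cmd"], capture_output=True, text=True)
        print(f"task {task['id']} -> {result.stdout.strip()}")

run_pilot()
```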
Testing the LHC Grid
WLCG Service Challenges
Large-scale test campaigns in which the readiness of the overall LHC computing service to meet the requirements of the experiments has been exercised.
2005, SC3: the first one in which all Tier-1 centres participated; transfer of "dummy" data to try and reach high transfer throughput between sites.
2006, SC4: the target transfer rate of 1.6 GB/s out of CERN was reached during 1 day, and 80% of this rate was sustained over long periods, with more realistic data.
2008, CCRC08: focus on having all 4 experiments testing all workflows simultaneously and keeping the service stable for a long period.
2009, STEP09: last chance to stress-test the system before the LHC start; focus on multi-experiment workloads never tested before at large scale (e.g. massive data re-reconstruction recalling from tape).
Testing data export from CERN
[Plots: throughput in MB/s during the CCRC08 test (June 2008) and the STEP09 test (June 2009)]
Example of data export CERN → Tier-1s as tested by ATLAS: June 2008, 2 days at 1 GB/s; June 2009, 2 weeks at 4 GB/s.
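A quick back-of-envelope check of the volume implied by the 2009 figure, assuming the rate was held for exactly two weeks (a sketch only):

```python
# Volume moved by a 4 GB/s export rate sustained for two weeks (STEP09-style test).
rate_gb_per_s = 4
seconds = 14 * 24 * 3600
total_pb = rate_gb_per_s * seconds / 1e6   # GB -> PB
print(f"~{total_pb:.1f} PB in two weeks")  # roughly 4.8 PB
```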
Performance: data volumes
CMS has been transferring 100-200 TB per day (~1 PB/week) on the Grid for more than 2 years [plot: TB/day].
Last June, ATLAS added 4 PB in 11 days to their total of 12 PB on the Grid.
WLCG CPU Workload
The CPU accounting of all Grid sites is centrally stored and publicly available (URL on slide).
[Plot: monthly CPU walltime in millions of kSI2K·hours, with the equivalent number of simultaneously busy cores; data up to 22-Nov-2009]
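To relate the accounting unit to something tangible, the sketch below converts a monthly walltime in kSI2K·hours into an equivalent number of continuously busy cores. The ~2 kSI2K per core and the example input are assumptions typical of 2009-era hardware, not numbers from the slide:

```python
# Convert monthly CPU walltime in kSI2K-hours into "simultaneously busy cores".
# Assumptions (not from the slide): ~2 kSI2K per core, a 30-day month.

KSI2K_PER_CORE = 2.0           # assumed average power of one 2009-era core
HOURS_PER_MONTH = 30 * 24

def busy_cores(walltime_ksi2k_hours: float) -> float:
    core_hours = walltime_ksi2k_hours / KSI2K_PER_CORE
    return core_hours / HOURS_PER_MONTH

# Example: 100 million kSI2K-hours in one month (illustrative value only)
print(f"~{busy_cores(100e6):,.0f} cores busy around the clock")
```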
Availability
Setting up and deploying robust operational tools is crucial for building reliable services on the Grid. One of the key tools for WLCG: the Service Availability Monitor (SAM).
Improving Reliability
An increasing number of ever more realistic sensors, plus a powerful monitoring framework that ensures peer pressure among sites, guarantees that the reliability of the WLCG service will keep improving.
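A minimal sketch of how an availability figure can be derived from periodic test results of the kind the Service Availability Monitor collects; the hourly granularity and the aggregation rule are simplified assumptions, not the actual WLCG algorithm:

```python
# Simplified availability calculation from periodic service-test results.
# Real SAM/WLCG availability algorithms (critical tests, scheduled downtimes,
# per-VO views) are more involved; this only shows the basic idea.

from typing import List

def availability(test_results: List[bool]) -> float:
    """Fraction of test intervals in which the monitored services passed."""
    return sum(test_results) / len(test_results) if test_results else 0.0

# One boolean per hourly test slot over a day (illustrative data)
day = [True] * 22 + [False] * 2     # two failed hourly tests
print(f"availability: {availability(day):.1%}")  # 91.7%
```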
CMS ↔ PIC transfers since Jan-…
[Plot: ….8 PB transferred into PIC, 4 PB out of PIC]
Level of testing of the system: moving almost 10 TB per day on average for 3 years.
Data Transfers to Tier-2s
Reconstructed data is sent to the T2s for analysis. This flow is bursty by nature, and the experiment requirements for it are very fuzzy ("as fast as possible").
– Links to all SP/PT Tier-2s certified with … MB/s sustained (see the sketch after this slide)
– CMS Computing Model: sustained transfers to >40 T2s worldwide
[Plots: ATLAS transfers PIC → T2s, daily average ~200 MB/s; CMS transfers PIC → T2s, daily average ~100 MB/s]
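A toy version of the kind of check behind certifying a link at a sustained rate; the 20 MB/s threshold and the sample daily volumes are illustrative assumptions, and real link commissioning used dedicated procedures and tools:

```python
# Toy check of whether a link sustains a target daily-average rate.
# The threshold and the daily volumes below are illustrative only.

TARGET_MB_PER_S = 20.0                      # assumed certification threshold
SECONDS_PER_DAY = 24 * 3600

daily_volumes_tb = [2.1, 1.8, 2.5, 2.0, 1.9, 2.2, 2.4]   # one week of transfers

def daily_avg_rate(volume_tb: float) -> float:
    return volume_tb * 1e6 / SECONDS_PER_DAY              # TB/day -> MB/s

passed = all(daily_avg_rate(v) >= TARGET_MB_PER_S for v in daily_volumes_tb)
for v in daily_volumes_tb:
    print(f"{daily_avg_rate(v):5.1f} MB/s")
print("link certified" if passed else "link below target")
```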
Multi-discipline Grids for scientific research
Enabling Grids for E-sciencE (EGEE)
EU-funded project to build a production-quality Grid infrastructure for scientific research in Europe, in three phases starting in 2004.
– Outcome: the largest, most widely used multi-disciplinary Grid infrastructure in the world.
– WLCG is built on top of EGEE (and OSG in the USA).
Many VOs and applications are registered as EGEE users. Look for yours in the applications database (URL on slide).
Enabling Grids for E-sciencE (EGEE)
The EGEE project contained all of the Grid stakeholders: infrastructure, middleware and applications.
Vision beyond EGEE-III: migrate the existing production European Grid from a project-based model to a sustainable infrastructure.
– Infrastructure: the European Grid Initiative (EGI), a federated infrastructure based on National Grid Initiatives for multi-disciplinary use. The Spanish Ministry of Science and Innovation signed the EGI MoU and designated CSIC as coordinator of the Spanish NGI.
– Applications: user communities organised in Specialized Support Centers (SSCs).
– Middleware: development in a separate project, with Infrastructure and Applications as its "customers".
EGI-related projects submitted to the EU
Presented by C. Loomis at the EGEE09 workshop, Sep-09 (link).
Astrophysics: MAGIC, et al. HEP: LHC, FAIR, et al.
Spanish Network for e-Science
A network initiative funded by the Spanish Ministry of Science and Education, officially approved in Dec-…. UPV is the coordinating institution. More than 900 researchers, 89 research groups.
Organised in four areas: Grid infrastructure, Supercomputing infrastructure, Applications, Middleware.
The Applications area coordinates the activities of the different user communities (see active groups and applications in the area wiki).
Astroparticles
MAGIC (IFAE, PIC, UCM, INSA):
– Data centre at PIC: data storage, reduction and access for the collaboration; resources and tools for users' analysis (in prep.); publishing data to the Virtual Observatory (in prep.)
– Monte Carlo production "on demand"
AUGER (UAH, CETA-CIEMAT):
– Run simulations on the Grid: CORSIKA, ESAF, AIRES …
Two presentations from astroparticles at the last meeting of the "Red Española de e-ciencia" (Valencia, Oct-09, see slides).
Facility for Antiproton and Ion Research
One of the largest projects of the ESFRI Road Map, FAIR will provide high-energy, high-intensity ion and antiproton beams for basic research. Its computing and storage requirements are expected to be of the order of those of the LHC or above; a detailed evaluation is under way. Two of the experiments (PANDA and CBM) have already started using the Grid for detector simulations.
FAIR Baseline Technical Report: 2500 scientists, 250 institutions, 44 countries. Spain is one of the 14 countries that signed the agreement for the construction of FAIR, contributing 2% of the cost. Civil construction is expected to start in …; first beam is expected in 2015/16.
Summary
In recent years we have been witnessing an explosion of scientific data: more precise and complex experiments, and large international collaborations whose geographically dispersed users need to access the data.
The LHC has largely been driving the activity in the last years, under the pressure of the Petabytes of data that are (this time for real) just around the corner. WLCG, the largest Grid infrastructure in the world, has been deployed and is ready for storing, processing and analysing the LHC data.
Since the early 2000s, a series of EU-funded projects (EGEE) have been at the core of the deployment of a Grid for scientific research in Europe. The next round of EU projects focuses on consolidating this into a sustainable infrastructure with a federated model (NGIs); the projects call closed yesterday. Stay tuned for the activity in the "Grid Users/Applications" arena (SSCs).
Thank you
Gonzalo Merino
Port d’Informació Científica
Backup Slides
PIC Tier-1 Reliability
[Plot] Tier-1 reliability targets have been met in most months.
T0/T1 ↔ PIC data transfers
[Plots: CMS data imported from and exported to the T1s, against combined ATLAS+CMS+LHCb targets of ~210 MB/s and ~100 MB/s; ATLAS daily rate CERN → PIC in June 2009, target 76 MB/s; CMS daily rate CERN → PIC in June 2009, target 60 MB/s]
Data import from CERN and transfers with the other Tier-1s were successfully tested above targets.
Networking Tier-1 ↔ Tier-2 in Spain
EGI-User interaction
The user community is organised into a series of Specialized Support Centers (SSCs). Goals of an SSC:
– Increase the number of active users in the community
– Promote the use of grid technologies within the community
– Encourage cooperation within the community
– Safeguard the grid knowledge and expertise of the community
– Build scientific collaboration within and between communities
An SSC will be a central, long-lived hub for grid activities within a given scientific community. (Presented by Cal Loomis at the EGEE09 conference)