Data Processing and the LHC Computing Grid (LCG)
Jamie Shiers
Database Group, IT Division
CERN, Geneva, Switzerland
Jamie Shiers, LCG Data Processing
Overview
- Brief overview / recap of the LHC (emphasis on data)
- The LHC Computing Grid
- The importance of the Grid
- The role of the Database Group (IT-DB)
- Summary
LHC Overview
CERN – European Organization for Nuclear Research
The LHC Machine
CMS
Data rates:
- 1 PB/s from the detector
- 100 MB/s to 1.5 GB/s to ‘disk’
- 5-10 PB growth per year
- ~3 GB/s per PB of data
Data processing: 100,000 of today’s fastest PCs
Trigger chain (figure): 40 MHz (1000 TB/s) → Level 1 → 75 kHz (75 GB/s) → Level 2 → 5 kHz (5 GB/s) → Level 3 → 100 Hz (100 MB/s) → data recording & offline analysis
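The trigger-chain figures above can be checked with a little arithmetic: dividing each data rate by its event rate gives the implied average event size, and dividing successive event rates gives each level's rejection factor. The sketch below assumes the rates as read from the flattened figure; the per-event sizes it prints are derived, not quoted on the slide.

```python
# Trigger-chain rates as read from the slide: (stage, event rate in Hz, data rate in bytes/s).
STAGES = [
    ("Detector output", 40e6, 1000e12),  # 40 MHz, 1000 TB/s
    ("After Level 1", 75e3, 75e9),       # 75 kHz, 75 GB/s
    ("After Level 2", 5e3, 5e9),         # 5 kHz, 5 GB/s
    ("After Level 3", 100, 100e6),       # 100 Hz, 100 MB/s
]


def implied_event_size(rate_hz, bytes_per_s):
    """Average event size (bytes) implied by a data rate and an event rate."""
    return bytes_per_s / rate_hz


def rejection_factors(stages):
    """Factor by which each trigger level reduces the event rate."""
    return [prev[1] / cur[1] for prev, cur in zip(stages, stages[1:])]


if __name__ == "__main__":
    for name, rate, bps in STAGES:
        size_mb = implied_event_size(rate, bps) / 1e6
        print(f"{name:16s} {rate:>12,.0f} Hz  ~{size_mb:.0f} MB/event")
    print("per-level rejection:", [round(f) for f in rejection_factors(STAGES)])
    overall = STAGES[0][1] / STAGES[-1][1]
    print(f"overall rate reduction: {overall:,.0f}x")
```

With these numbers the implied event size is about 1 MB at every stage after Level 1, and the chain as a whole reduces the event rate by a factor of 400,000, landing at the lower end of the 100 MB/s to 1.5 GB/s quoted for writing to disk.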
LHC: Higgs decay into 4 muons (figure)
Selectivity: 1 person in a thousand world populations; a needle in 20 million haystacks
RAW: 1 PB/yr (1 PB/s prior to reduction!)
ESD: 100 TB/yr
AOD: 10 TB/yr
TAG: 1 TB/yr
Figure labels: random / sequential access; Tier0, Tier1; data, users
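Each step down the RAW, ESD, AOD, TAG hierarchy is an order of magnitude smaller. A minimal sketch, taking the yearly volumes from the slide and ignoring any replication of the tiers across sites (an assumption), confirms the factor of 10 between tiers and shows that the derived formats add only about 11% on top of RAW.

```python
# Yearly volume per data tier as listed on the slide, in bytes per year.
TB = 10**12
PB = 10**15
TIERS = [("RAW", 1 * PB), ("ESD", 100 * TB), ("AOD", 10 * TB), ("TAG", 1 * TB)]


def reduction_steps(tiers):
    """(source, target, ratio) for each tier and the next, more compact one."""
    return [(a[0], b[0], a[1] // b[1]) for a, b in zip(tiers, tiers[1:])]


def total_volume(tiers):
    """Sum of all tiers, counting each tier once (no replicas)."""
    return sum(volume for _, volume in tiers)


if __name__ == "__main__":
    for src, dst, ratio in reduction_steps(TIERS):
        print(f"{src} -> {dst}: {ratio}x smaller")
    print(f"all tiers together: {total_volume(TIERS) / PB:.3f} PB/yr")
```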
LHC Data Grid Hierarchy
- Experiment → Online System: ~PByte/sec
- Online System → Tier 0 +1: ~ MBytes/sec
- Tier 0 +1: CERN, 700k SI95, ~1 PB disk, tape robot
- Tier 1: FNAL (200k SI95, 600 TB), IN2P3 Center, INFN Center, RAL Center
- Tier 2: Tier2 Centers
- Tier 3: Institutes (~0.25 TIPS), physics data cache
- Tier 4: Workstations
- Links: ~2.5 Gbps (~2.5 Gbits/sec) between the upper tiers; Mbits/sec to the workstations
CERN/outside resource ratio ~1:2
Tier0 / (all Tier1) / (all Tier2) ~1:1:1
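The two ratios at the bottom of the hierarchy are consistent with each other if CERN is taken to hold only the Tier 0 share. The sketch below assumes that the 1:1:1 ratio refers to aggregate capacity per tier (all Tier 1 centres combined, all Tier 2 centres combined) and uses exact fractions to make the arithmetic explicit.

```python
from fractions import Fraction


def tier_shares(ratio=(1, 1, 1)):
    """Split total capacity across Tier0, all Tier1 combined, all Tier2 combined."""
    total = sum(ratio)
    return [Fraction(r, total) for r in ratio]


def cern_outside_ratio(cern_share):
    """Express CERN's fraction of total capacity as a CERN:outside ratio."""
    return cern_share / (1 - cern_share)


if __name__ == "__main__":
    t0, t1, t2 = tier_shares()
    print("Tier0, sum(Tier1), sum(Tier2):", t0, t1, t2)
    # Assumption: CERN holds the Tier0 share and nothing else.
    print("CERN:outside ->", cern_outside_ratio(t0))
```

If CERN also hosted part of the Tier 1 capacity, as the "Tier 0 +1" label suggests, its share would rise above one third; calling `cern_outside_ratio` with a larger share shows how far the ratio then moves from 1:2.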
HEP Data Analysis
- Physicists work on analysis “channels”
- Find collisions with similar features
- Physics extracted by collective iterative discovery: small groups of professors and students
- Each institute has ~10 physicists working on one or more channels
- Order 1000 physicists in 100 institutes in 10 countries
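Working on a channel starts by selecting the collisions that share a given signature. The sketch below picks out four-muon candidates, in the spirit of the Higgs example earlier; the `EventSummary` record and its fields are hypothetical stand-ins for a TAG-like summary, not an actual experiment data format.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class EventSummary:
    """Hypothetical per-collision summary record (TAG-like)."""
    run: int
    event: int
    n_muons: int
    n_electrons: int


def select_channel(events: Iterable[EventSummary], min_muons: int) -> List[EventSummary]:
    """Keep only collisions with at least `min_muons` muons."""
    return [e for e in events if e.n_muons >= min_muons]


if __name__ == "__main__":
    sample = [
        EventSummary(run=1, event=1, n_muons=4, n_electrons=0),
        EventSummary(run=1, event=2, n_muons=1, n_electrons=2),
        EventSummary(run=1, event=3, n_muons=5, n_electrons=1),
    ]
    for e in select_channel(sample, min_muons=4):
        print(e.run, e.event, e.n_muons)
```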
LHC Computing Characteristics
- Perfect parallelism: independent events (collisions)
- Bulk of the data is read-only, in conventional files; new versions rather than updates
- Meta-data (a few %) in databases
- Very large aggregate requirements: computation, data, I/O
- Chaotic workload: unpredictable demand and data access patterns
- No limit to the requirements
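Because events are independent, reconstruction or analysis can be farmed out with no coordination between workers. The minimal sketch below uses a thread pool and a placeholder `reconstruct` function, both illustrative assumptions chosen to keep the example portable; a real farm would spread the work over processes or batch jobs on many nodes, but the structure is the same.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def reconstruct(raw_event: Sequence[int]) -> int:
    """Placeholder per-event 'reconstruction': uses only this event's own data."""
    return sum(x * x for x in raw_event)


def process_all(events: Sequence[Sequence[int]], workers: int = 4) -> List[int]:
    """Farm independent events out to a worker pool.

    No event reads another event's result, so no locking or message passing
    is needed; Executor.map returns results in input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(reconstruct, events))


if __name__ == "__main__":
    events = [[i, i + 1, i + 2] for i in range(1000)]
    results = process_all(events)
    print(len(results), results[:3])
```

Since there is no shared state, changing the number of workers does not change the answer, and the order-preserving map lets results be concatenated directly.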
The LHC Computing Grid (LCG)
From Mainframes to the Grid
The GRID Vision
Figure: the GRID brings together computing resources, data, knowledge, instruments and people to turn a complex problem into a solution.
And Reality…
Figure: a physicist; CERN Tier 0 and CERN Tier 1; national centres in Germany, USA, UK, France, Italy, Japan, …; labs (Lab a, Lab b, Lab c, Lab m) and universities (Uni a, Uni b, Uni n, Uni x, Uni y).
The LHC Computing Centre / The promise of Grid technology / The opportunity of Grid technology
Figure:
- Experiments: CMS, ATLAS, LHCb
- Tier 0: centre at CERN
- Tier 1: Germany, USA, UK, France, Italy, Spain, Japan, CERN Tier 1
- Tier 2: Lab a, Lab b, Lab c, Lab m, Uni a, Uni b, Uni n, Uni x, Uni y
- Tier 3: physics department
- Desktop
- Overlays: grid for a physics study group; grid for a regional group
Virtual Computing Centre
The user:
- sees the image of a single cluster
- does not need to know where the data is, where the processing capacity is, how things are interconnected, or the details of the different hardware
- is not concerned by the conflicting policies of the equipment owners and managers
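One ingredient behind this single-cluster image is a catalogue that maps a logical file name, which is all the user handles, to physical copies held at different sites. The toy catalogue below is a sketch under that assumption; the class names, site labels and URLs are hypothetical and do not reproduce any particular middleware API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Replica:
    site: str            # hypothetical site label, e.g. "T1-A"
    url: str             # hypothetical physical location of this copy
    online: bool = True  # whether the copy is currently reachable


@dataclass
class ReplicaCatalogue:
    """Toy catalogue: logical file name -> list of physical replicas."""
    entries: Dict[str, List[Replica]] = field(default_factory=dict)

    def register(self, lfn: str, replica: Replica) -> None:
        self.entries.setdefault(lfn, []).append(replica)

    def resolve(self, lfn: str, preferred_sites: List[str]) -> Optional[Replica]:
        """Return an online replica, trying preferred sites first.

        Sites not in the preference list rank after all preferred ones;
        sorted() is stable, so ties keep registration order.
        """
        rank = {site: i for i, site in enumerate(preferred_sites)}
        candidates = [r for r in self.entries.get(lfn, []) if r.online]
        candidates = sorted(candidates, key=lambda r: rank.get(r.site, len(rank)))
        return candidates[0] if candidates else None


if __name__ == "__main__":
    cat = ReplicaCatalogue()
    cat.register("lfn:run1/aod", Replica("T0", "https://t0.example.org/run1/aod"))
    cat.register("lfn:run1/aod", Replica("T1-A", "https://t1a.example.org/run1/aod"))
    # The user asks for the logical name only; the catalogue picks the copy.
    chosen = cat.resolve("lfn:run1/aod", preferred_sites=["T1-A"])
    print(chosen.url if chosen else "no replica available")
```

Skipping offline copies and falling back to any remaining site is what lets the user ignore where the data physically sits.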
The LHC Computing Grid Project
Goal: prepare and deploy the LHC computing environment
- Applications support: develop and support the common tools, frameworks, and environment needed by the physics applications
- Computing system: build and operate a global data analysis environment, integrating large local computing fabrics and high-bandwidth networks, to provide a service for ~6K researchers in over 40 countries
This is not yet another grid technology project; it is a grid deployment project.
The LHC Computing Grid Project
Two phases:
- Phase 1: development and prototyping. Operate a 50% prototype of the facility needed by one of the larger experiments
- Phase 2: installation and operation of the full world-wide initial production Grid for all four experiments
Leveraging Other Grid Projects
- US projects
- European projects
- Many national and regional Grid projects: GridPP (UK), INFN-grid (I), NorduGrid, …
- CrossGrid
Significant R&D funding for Grid middleware:
- scope for divergence
- global grids need standards
- the trick will be to recognise and be willing to migrate to the winning solutions
LHC Computing Grid Project
The first milestone, within one year: deploy a Global Grid Service
- sustained 24 x 7 service
- including sites from three continents
- identical or compatible Grid middleware and infrastructure
- several times the capacity of the CERN facility
- and as easy to use
Having stabilised this base service, progressive evolution:
- number of nodes, performance, capacity and quality of service
- integrate new middleware functionality
- migrate to de facto standards as soon as they emerge
LCG Production
CMS is preparing for a distributed data challenge, starting Q3 2003, ending Q…:
- Tier0 (CERN), 2-3 Tier1, 5-10 Tier2
- Total data volume ~100 TB
Need to be production-ready with the Grid computing system and applications by 1st July 2003
The Importance of the Grid
Birth of the Web
- Original proposal: explosion inside HEP
- 1993: explosion across the world, largely due to the NCSA Mosaic browser
- Now totally ubiquitous: every firm must have a website!
US Grid Projects
- NASA Information Power Grid
- DOE Science Grid
- NSF National Virtual Observatory
- NSF GriPhyN
- DOE Particle Physics Data Grid
- NSF TeraGrid
- DOE ASCI Grid
- DOE Earth Systems Grid
- DARPA CoABS Grid
- NEESGrid
- DOH BIRN
- NSF iVDGL
European Grid Projects
- UK: e-Science Grid
- Netherlands: VLAM, PolderGrid
- Germany: UNICORE, Grid proposal
- France: Grid funding approved
- Italy: INFN Grid
- Eire: Grid proposals
- Switzerland: Network/Grid proposal
- Hungary: DemoGrid, Grid proposal
- Nordic Grid
- …
- Spain:
EU Grid Projects
- DataGrid (CERN, …)
- EuroGrid (Unicore)
- DataTag (TTT…)
- Astrophysical Virtual Observatory
- GRIP (Globus/Unicore)
- GRIA (Industrial applications)
- GridLab (Cactus Toolkit)
- CrossGrid (Infrastructure Components)
- EGSO (Solar Physics)
IBM and the Grid
Interview with Irving Wladawsky-Berger:
- ‘Grid computing is a set of research management services that sit on top of the OS to link different systems together’
- ‘We will work with the Globus community to build this layer of software to help share resources’
- ‘All of our systems will be enabled to work with the grid, and all of our middleware will integrate with the software’
Industrial Engagement?
- ‘Grid Computing is one of the three next big things for Sun and our customers’ (Ed Zander, COO, Sun)
- ‘The alignment of OGSA with XML Web services is important because it will make Internet-scale, distributed Grid Computing possible’ (Robert Wahbe, General Manager of Web Services, Microsoft)
- Oracle: starting Grid activities…
HP and Grids
The Grid fabric will be:
- Soft: share everything, failure tolerant
- Dynamic: resources will constantly come and go, no steady state, ever
- Federated: global structure not owned by any single authority
- Heterogeneous: from supercomputer clusters to P2P PCs
(John Manley, HP Labs)
The Role of the Database Group
CERN-IT-DB Provides…
- Database infrastructure for the CERN laboratory: currently based on Oracle, all sectors
- Applications support in certain key areas: Oracle Application Server, Engineering Data Management Service
- Physics data management support: services for the LHC experiments, applications, …
- Grid data management: European DataGrid WP2 (Data Management) and the corresponding LCG services
- LCG Persistency Project: POOL
Los Endos