Alain Romeyer - Dec
Grid computing for CMS
  - What is the Grid?
  - Let's start with an analogy
  - How does it work? (Some basic ideas)
  - Grid for LHC and the CMS computing model
  - Conclusion
  Alain Romeyer (Mons - Belgium)
What is the Grid?
  What is not a Grid?
    - A cluster, a network-attached storage device, a scientific instrument, a network, etc.
    - Each may be an important component of a Grid, but by itself does not constitute a Grid.
  For us: a new way of doing science!
  An integrated, advanced cyber-infrastructure that delivers:
    - Computing capacity
    - Data capacity
    - Communication capacity
  - Coordinated resource sharing and problem solving in dynamic ...
  - No centralized control
  - Uses standard and open protocols and interfaces
  - Delivers nontrivial qualities of service
An analogy: electric power (on-demand access)
  [Figure labels: Time; Quality, economies of scale]
By analogy
  - Decouple production and consumption
  - Enable on-demand access
  - Achieve economies of scale
  - Enhance consumer flexibility
  - Enable new devices
  On a variety of scales: department, campus, enterprise, Internet
Not a perfect analogy…
  - I import electricity but must export data
  - “Computing” is not interchangeable but highly heterogeneous: computers, data, sensors, services, …
  - So the story is more complicated
  - But more significantly, the sum can be greater than the parts:
    - Dynamic allocation of resources
    - Access to distributed services
    - Virtualization and distributed service management
How does it work? Grid responsibilities
  Security infrastructure:
    - Authentication (identity)
    - Authorization (rights)
  Management:
    - Information management: soft-state, registration, discovery, selection, monitoring
    - Resource management: remote service invocation, reservation, allocation; resource specification
    - Data management: high-performance, remote data access; cataloguing, replication, staging
How does it work? Security - Authentication
  Grid Security Infrastructure (GSI)
    - Public key infrastructure (asymmetric)
    - A user must be associated with a Virtual Organisation (VO)
    - A user needs a certificate issued by a Certification Authority (CA)
  A certificate (X.509 international standard) is a digitally signed document attesting to the binding of a public key to an individual entity.
  It contains:
    - A subject name (identifies the user/person)
    - The user's public key
    - The identity of the CA
    - The digital signature of the CA
How does it work? Security - Authentication
  [Diagram: certificate signing. Labels: VO, registration, CA, certificate request, hash, message digest, encrypt, digital signature, public certificate]
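The signing steps named in this diagram (hash the certificate request into a message digest, encrypt the digest to obtain the CA's digital signature) can be illustrated with a minimal sketch. It uses textbook RSA with deliberately tiny primes and is not how GSI or X.509 is actually implemented; the key values, the field layout of the certificate body and the names `issue_certificate`, `sign` and `verify` are all assumptions made for illustration.

```python
import hashlib

# Toy RSA key pair for a hypothetical CA. The primes are far too small to be
# secure; they only keep the arithmetic readable.
P, Q = 61, 53
N = P * Q                            # public modulus
E = 17                               # public exponent: CA public key is (E, N)
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def message_digest(data: bytes) -> int:
    """Hash the data and reduce it modulo N so the toy key can handle it."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """CA side: encrypt the message digest with the CA private key."""
    return pow(message_digest(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Anyone: decrypt the signature with the CA public key, compare digests."""
    return pow(signature, E, N) == message_digest(data)


def issue_certificate(subject: str, user_public_key: str, ca_name: str) -> dict:
    """Bundle subject, user public key and CA identity, then attach the signature."""
    body = f"subject={subject};public_key={user_public_key};issuer={ca_name}".encode()
    return {"body": body, "signature": sign(body)}


# Hypothetical names, for illustration only.
cert = issue_certificate("CN=Some User", "user-key-placeholder", "CN=Example CA")
print(verify(cert["body"], cert["signature"]))
```

Anyone holding the CA public key can check the certificate without contacting the CA, which is why a single trusted CA can serve many sites.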
How does it work? Management
  [Diagram components: Network Server; Workload Manager (global manager); Job control (CONDOR-G); Information Service; Resource Location Service; Computing Element (LRMS); Storage Element (LRMS)]
  - Computing and Storage Elements publish their characteristics, status, available services…
  - Request (JDL)
  - Information Service and Resource Location Service: where? status?
  - Match-making: best actions to satisfy the request; decision on where to submit, given the Grid status
  - Job submission and data transfer
  - End of job: outputs are stored in your "sandbox"; ask to download them
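The match-making step can be sketched as a filter followed by a ranking: keep the Computing Elements whose published information satisfies the request, then pick the best one. This is a minimal sketch under stated assumptions, not the actual Workload Manager logic or JDL syntax. The fields (`free_cpus`, `installed_software`, `close_datasets`), the ranking rule (most free CPUs) and the site names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ComputingElement:
    """Information a Computing Element publishes (hypothetical fields)."""
    name: str
    free_cpus: int
    installed_software: frozenset
    close_datasets: frozenset    # data held on a nearby Storage Element


@dataclass
class JobRequest:
    """Requirements a user might express in a request (hypothetical fields)."""
    min_cpus: int
    software: str
    input_dataset: str


def satisfies(ce: ComputingElement, job: JobRequest) -> bool:
    """True if the published state of the CE meets every requirement."""
    return (ce.free_cpus >= job.min_cpus
            and job.software in ce.installed_software
            and job.input_dataset in ce.close_datasets)


def match_make(job: JobRequest,
               published: List[ComputingElement]) -> Optional[ComputingElement]:
    """Choose where to submit: the matching CE with the most free CPUs."""
    candidates = [ce for ce in published if satisfies(ce, job)]
    if not candidates:
        return None          # nothing matches the request right now
    return max(candidates, key=lambda ce: ce.free_cpus)


# Hypothetical Grid status, for illustration only.
grid = [
    ComputingElement("site-a", 4, frozenset({"sim-v1"}), frozenset({"dataset-1"})),
    ComputingElement("site-b", 40, frozenset({"sim-v1"}), frozenset({"dataset-1"})),
    ComputingElement("site-c", 90, frozenset({"sim-v1"}), frozenset({"dataset-2"})),
]
chosen = match_make(JobRequest(2, "sim-v1", "dataset-1"), grid)
print(chosen.name if chosen else "no match")
```

Ranking by free CPUs is only one possible policy; a real broker could also weigh queue length or data locality.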
Some Grid e-science projects
  - Sloan Digital Sky Survey
  - ALMA
  - LHC: LHCb, ATLAS, ALICE, CMS
EGEE: Enabling Grids for E-science in Europe (2-year project)
  Funded by the EU, with 3 core areas:
    1) Build a consistent, robust and secure Grid network that will attract additional computing resources.
    2) Continuously improve and maintain the middleware in order to deliver a reliable service to users.
    3) Attract new users from industry as well as science and ensure they receive the high standard of training and support they need.
  Two pilot applications selected:
    - Biomedical Grids (bioinformatics and healthcare data)
    - Large Hadron Collider Computing Grid (LCG)
LHC Computing Grid (LCG)
  LCG goal: prepare the computing infrastructure for the simulation, processing and analysis of LHC data for the 4 experiments.
  - Physicists working together
  - Petabytes of data will be generated each year (20 million CDs == 20 km stack)
  - Analysing this will require the equivalent of 70,000 of today's fastest PC processors (~192 years)
  2-phase project:
    - Phase I ( ): development phase + series of computing data challenges
    - Phase II (2006 – 2008): real production and deployment phase
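The CD comparison can be sanity-checked with a short calculation. The sketch below assumes a standard CD holds about 700 MB and is 1.2 mm thick; neither figure is given on the slide. Under those assumptions, 20 million CDs hold about 14 PB and stack to about 24 km, the same order of magnitude as the 20 km quoted.

```python
# Assumptions (not from the slide): capacity and thickness of a standard CD.
CD_CAPACITY_BYTES = 700e6    # about 700 MB per disc
CD_THICKNESS_M = 1.2e-3      # about 1.2 mm per disc

N_CDS = 20_000_000           # "20 million CDs" from the slide


def cd_stack(n_cds: int) -> tuple:
    """Return (data volume in PB, stack height in km) for n_cds discs."""
    volume_pb = n_cds * CD_CAPACITY_BYTES / 1e15
    height_km = n_cds * CD_THICKNESS_M / 1e3
    return volume_pb, height_km


volume_pb, height_km = cd_stack(N_CDS)
print(f"{volume_pb:.1f} PB, {height_km:.1f} km")
```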
LCG status (22/09/2004)
  - Total sites: 82
  - Total CPUs: 7269
  - Total storage: 6558 TB
CMS data production at LHC
  - p-p collisions: 1 bunch crossing every 25 ns, i.e. 40 MHz (1000 TB/sec)
  - Level 1 Trigger: output at 75 kHz (50 GB/sec)
  - High Level Trigger: output at 100 Hz (100 MB/sec)
  - Cluster for the trigger: ~ 1000 – 2000 PCs
  - Data recording and offline analysis
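The rates and bandwidths on this slide imply a rejection factor at each trigger level and an average data size per event. The sketch below derives them from the slide's numbers only. It assumes decimal units (1 TB = 10^12 bytes), and the function names are chosen for illustration. With these inputs, Level 1 keeps roughly 1 crossing in 533, the High Level Trigger keeps 1 event in 750, and the recorded events average about 1 MB each.

```python
# (stage name, event rate in Hz, bandwidth in bytes/s), as given on the slide.
STAGES = [
    ("Collisions", 40e6, 1000e12),          # 40 MHz, 1000 TB/sec
    ("Level 1 Trigger", 75e3, 50e9),        # 75 kHz, 50 GB/sec
    ("High Level Trigger", 100.0, 100e6),   # 100 Hz, 100 MB/sec
]


def crossing_rate_hz(spacing_s: float) -> float:
    """Bunch-crossing rate for a given spacing between crossings."""
    return 1.0 / spacing_s


def bytes_per_event(rate_hz: float, bandwidth_bytes_s: float) -> float:
    """Average data volume per event at one stage."""
    return bandwidth_bytes_s / rate_hz


def rejection_factors(stages: list) -> list:
    """Ratio of input rate to output rate for each consecutive pair of stages."""
    return [prev[1] / cur[1] for prev, cur in zip(stages, stages[1:])]


print(f"crossing rate: {crossing_rate_hz(25e-9) / 1e6:.0f} MHz")
for name, rate, bandwidth in STAGES:
    print(f"{name}: {bytes_per_event(rate, bandwidth) / 1e6:.2f} MB per event")
print("rejection factors:", [round(f) for f in rejection_factors(STAGES)])
```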
CMS computing model
  [Diagram: tiered hierarchy]
    - Experiment, Online System
    - Tier 0 + 1: CERN Center (PBs of disk; tape robot)
    - Tier 1: FNAL Center, INFN Center, IN2P3 Center, RAL Center
    - Tier 2: Tier2 Centers
    - Tier 3: Institutes (physics data cache)
    - Tier 4: Workstations
    - Link labels: ~PByte/sec, ~ MBytes/sec, ~ Gbps, 0.1 to 10 Gbps
  Physicists work on analysis “channels”. Data for these channels should be cached by the institute server.
DC04 Data Challenge (March-April 2004)
  T0 at CERN in DC04:
    - 25 Hz input event rate
    - Reconstruct quasi-realtime
    - Events filtered into streams
    - Distribute data to T1s
  T1 centres in DC04 (PIC Barcelona, FZK Karlsruhe, CNAF Bologna, RAL Oxford, IN2P3 Lyon, FNAL Chicago):
    - Pull data from T0 to T1 and store
    - Make data available to PRS
    - Demonstrate quasi-realtime “fake” analysis
DC04 Processing Rate
  [Plots: T0 events processed vs. days; T0 event processing rate (Hz)]
  - Processed about 30M events
  - Got above 25 Hz on many short occasions
  - Only one full day >25 Hz with full system
  DC04 demonstrated that the system can work… at least for well controlled data flow / analysis, and for a few expert users
  Next challenge: make it usable by average physicists… and demonstrate that the performance scales acceptably
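To put these numbers in perspective, a short calculation shows what a sustained 25 Hz would mean. It is a back-of-the-envelope sketch, not a reconstruction of the actual DC04 timeline: at a constant 25 Hz the T0 would process 2.16 million events per day, so about 30M events correspond to roughly 14 days of uninterrupted running at the target rate.

```python
SECONDS_PER_DAY = 86_400


def events_per_day(rate_hz: float) -> float:
    """Events processed in one day at a constant rate."""
    return rate_hz * SECONDS_PER_DAY


def days_to_process(n_events: float, rate_hz: float) -> float:
    """Days needed to process n_events at a constant rate."""
    return n_events / events_per_day(rate_hz)


print(f"{events_per_day(25):,.0f} events/day at 25 Hz")
print(f"{days_to_process(30e6, 25):.1f} days for 30M events at 25 Hz")
```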
Conclusion
  - The Grid is becoming a reality
  - Management is the crucial issue: it is not yet fully implemented and will be addressed by the EGEE project
  - For HEP, LCG II is already available and working
  - CMS DC04 has shown that the system is starting to work
  - The next data challenge will be crucial:
    - Usable by a standard physicist
    - Reasonable performance for LHC
Conclusion
  Belgrid project: « a Belgian Grid initiative »
    - Brings together academic, public and private partners
    - Goal: share the local computing resources using Grid technologies
    - Status: GridFTP between sites is working
    - Plan: distributed computing
  BEgrid (Belnet): grid computing for Belgian research
    - Belnet: official CA -> certificate also valid for use in EGEE
    - 5 universities connected (KULeuven, UA, UG, ULB and VUB)
    - LCG II and follows the EGEE middleware