1 LHCb Distributed Computing and the Grid Nick Brook University of Bristol
D. Galli, U. Marconi, V. Vagnoni (INFN Bologna)
N. Brook (Bristol)
K. Harrison (Cambridge)
E. Van Herwijnen, J. Closier, P. Mato (CERN)
A. Khan (Edinburgh)
A. Tsaregorodtsev (Marseille)
H. Bulten, S. Klous (Nikhef)
F. Harris, I. McArthur, A. Soroko (Oxford)
G. N. Patrick, G. Kuznetsov (RAL)
27th June 2002, Nick Brook, ACAT'02

2 Overview of presentation
- Current organisation of LHCb distributed computing
- UK facilities and support through GridPP
- Current use of Globus and EDG middleware
- Planning for data challenges and the use of Grid
- Current LHCb Grid/applications R&D
- Conclusions

3 History of distributed MC production
- The distributed system has been running for 3+ years and has processed many millions of events for the LHCb design.
- Main production sites: CERN, Bologna, Liverpool, Lyon, NIKHEF and RAL.
- Globus is already used for job submission to RAL and Lyon.
- The system has been interfaced to the Grid and demonstrated at the EU-DG Review and the NeSC/UK Opening.
- For the 2002 Data Challenges, new institutes are being added: Bristol, Cambridge, Oxford, ScotGrid.
- In 2003, Barcelona, Moscow, Germany, Switzerland and Poland will be added.

4 Current Architecture
[Diagram: production architecture. Labels recovered from the slide:]
- Actors: Production Manager, Physics Coordinator, Physicist
- Job steps: Create no. of jobs (500 events each); Determine configuration; Run executable; Check data; Copy data/logs
- Job Creation/Submission via Web: Identify outstanding requests; Select workflow; Create scripts via Java servlets; Submit jobs to distributed sites
- Monitoring via PVSS: See what jobs are running; Check configuration; Kill jobs, etc.
- Bookkeeping Database
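The "500 events each" label suggests that a production request is cut into fixed-size jobs. A minimal sketch of that split follows, assuming a request is just an event count; the function name and the handling of a final short job are hypothetical, since the slide does not say how the real system treats a remainder.

```python
def split_into_jobs(total_events, events_per_job=500):
    """Return the list of per-job event counts that cover total_events.

    events_per_job defaults to the 500 quoted on the slide. Putting any
    remainder into one final, shorter job is an assumption of this sketch.
    """
    if total_events < 0 or events_per_job <= 0:
        raise ValueError("event counts must be non-negative, job size positive")
    full_jobs, remainder = divmod(total_events, events_per_job)
    sizes = [events_per_job] * full_jobs
    if remainder:
        sizes.append(remainder)
    return sizes


# A request for 1200 events becomes two full jobs and one short one.
print(split_into_jobs(1200))
```

At this job size, a request of 3×10^7 events corresponds to 60,000 jobs, which gives a feel for why job creation and monitoring are automated.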

5 LOGICAL FLOW
- Submit jobs remotely via Web
- Execute on farm
- Data quality check
- Update bookkeeping database
- Transfer data to mass store
- Analysis

6 Monitoring and Control of MC jobs
- LHCb has adopted PVSS II as the prototype control and monitoring system for MC production.
- PVSS is a commercial SCADA (Supervisory Control And Data Acquisition) product developed by ETM.
- It has been adopted as the control framework for the LHC Joint Controls Project (JCOP).
- It is available for Linux and Windows platforms.

8 UK Tier 1 - RAL
New Computing Farm:
- 4 racks holding 156 dual 1.4 GHz Pentium III CPUs.
- Each box has 1 GB of memory, a 40 GB internal disk and 100 Mb ethernet.
- PCs are clustered on network switches with up to 8x1000 Mb ethernet out of each rack.
Tape Robot:
- Upgraded last year; uses 60 GB STK 9940 tapes.
- 45 TB current capacity; could hold 330 TB.
Disk:
- 50 TByte disk-based Mass Storage Unit after RAID 5 overhead.
2004 Scale: 1000 CPUs, 0.5 PBytes
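The figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes decimal units (1 TB = 1000 GB) and reads "156 dual" as 156 two-processor boxes; neither assumption is stated on the slide.

```python
GB = 10**9
TB = 10**12

tape_size = 60 * GB            # one STK 9940 tape, as quoted
current_capacity = 45 * TB     # tape capacity at the time of the talk
maximum_capacity = 330 * TB    # what the robot "could hold"

tapes_now = current_capacity // tape_size     # cartridges implied today
tapes_full = maximum_capacity // tape_size    # cartridges implied when full

boxes = 156                    # assumed: dual-processor boxes
cpus = boxes * 2
target_cpus_2004 = 1000        # "2004 Scale" quoted on the slide

print(f"tapes now: {tapes_now}, tapes when full: {tapes_full}")
print(f"CPUs now: {cpus} ({cpus / target_cpus_2004:.0%} of the 2004 target)")
```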

9 UK Regional Centres
- Local Perspective: Consolidate Research Computing
- Optimisation of Number of Nodes? 4
- Relative size dependent on funding dynamics

10 UK Prototype Tier2 - ScotGrid
ScotGrid processing nodes at Glasgow:
- 59 IBM X Series 330 dual 1 GHz Pentium III with 2GB memory
- 2 IBM X Series 340 dual 1 GHz Pentium III with 2GB memory and dual ethernet
- 3 IBM X Series 340 dual 1 GHz Pentium III with 2GB memory and Mbit/s ethernet
- 1TB disk
- LTO/Ultrium Tape Library
- Cisco ethernet switches
ScotGrid storage at Edinburgh:
- IBM X Series 370 PIII Xeon with 512 MB memory 32 x 512 MB RAM
- 70 x 73.4 GB IBM FC Hot-Swap HDD
2004 Scale: 300 CPUs, 0.1 PBytes

11 GridPP support
2 LHCb posts:
- to work on Gaudi (software framework) persistency services
- to work on MC monitoring and control software
2 ATLAS/LHCb Gaudi/GANGA posts:
- interface between software framework and Grid services

12 Current Use of Grid Middleware in development system
- Authentication: grid-proxy-init
- Job submission to DataGrid: dg-job-submit
- Monitoring and control: dg-job-status, dg-job-cancel, dg-job-get-output
- Data publication and replication: globus-url-copy, GDMP
- Resource scheduling, use of CERN MSS: JDL, sandboxes, storage elements
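The commands listed above form a natural sequence for one job: authenticate, submit, poll, retrieve. The sketch below only assembles the command lines and does not run them. It assumes that dg-job-status and dg-job-get-output take the job identifier as a single positional argument, which the slides do not show; the function name, the JDL file name and the job identifier are placeholders.

```python
import shlex


def middleware_session(jdl_path, job_id):
    """Return, in order, the command lines for one job's life on the testbed.

    job_id is a placeholder: in practice the identifier is reported by
    dg-job-submit. Passing it positionally is an assumption of this sketch.
    """
    return [
        ["grid-proxy-init"],               # authentication
        ["dg-job-submit", jdl_path],       # job submission
        ["dg-job-status", job_id],         # monitoring
        ["dg-job-get-output", job_id],     # retrieve the output sandbox
    ]


for argv in middleware_session("example.jdl", "JOB_ID_PLACEHOLDER"):
    print(shlex.join(argv))
```

Keeping each command as an argument list rather than a single string avoids quoting problems if the commands are later handed to a process launcher.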

13 Example 1: Job Submission
dg-job-submit /home/evh/sicb/sicb/bbincl jdl -o /home/evh/logsub/

bbincl jdl:
#
Executable = "script_prod";
Arguments = " ,v235r4dst,v233r2";
StdOutput = "file output";
StdError = "file err";
InputSandbox = {"/home/evhtbed/scripts/x509up_u149",
    "/home/evhtbed/sicb/mcsend",
    "/home/evhtbed/sicb/fsize",
    "/home/evhtbed/sicb/cdispose.class",
    "/home/evhtbed/v235r4dst.tar.gz",
    "/home/evhtbed/sicb/sicb/bbincl sh",
    "/home/evhtbed/script_prod",
    "/home/evhtbed/sicb/sicb dat",
    "/home/evhtbed/sicb/sicb dat",
    "/home/evhtbed/sicb/sicb dat",
    "/home/evhtbed/v233r2.tar.gz"};
OutputSandbox = {"job txt","D ","file output","file err","job txt","job txt"};
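Since the production scripts are generated automatically, a JDL file like the one above would typically be produced from a job description rather than written by hand. The sketch below renders the attribute syntax visible on the slide (quoted strings, brace-delimited lists, a semicolon after each attribute). The function name and all file names are hypothetical, and no escaping of embedded quotes is attempted.

```python
def render_jdl(attributes):
    """Render a dict as JDL text in the style shown on the slide.

    Strings become "quoted" values and lists become {"a","b"} sets.
    Values containing double quotes are not escaped in this sketch.
    """
    lines = []
    for name, value in attributes.items():
        if isinstance(value, (list, tuple)):
            rendered = "{" + ",".join(f'"{item}"' for item in value) + "}"
        else:
            rendered = f'"{value}"'
        lines.append(f"{name} = {rendered};")
    return "\n".join(lines)


# Hypothetical job description; only "script_prod" is taken from the slide.
job = {
    "Executable": "script_prod",
    "StdOutput": "job.out",
    "StdError": "job.err",
    "InputSandbox": ["script_prod", "inputs.tar.gz"],
    "OutputSandbox": ["job.out", "job.err"],
}
print(render_jdl(job))
```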

14 Example 2: Data Publishing & Replication
[Diagram: data publishing and replication. Labels recovered from the slide:]
- CERN TESTBED: Compute Element (Job, Data, Local disk); Storage Element (MSS)
- Operations: globus-url-copy, register-local-file, publish, replica-get
- Replica Catalogue: NIKHEF - Amsterdam
- REST-OF-GRID: Storage Element (Job, Data)
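The idea behind the diagram is that a central catalogue records which storage elements hold a copy of each file, so that jobs elsewhere on the Grid can locate and fetch a replica. A toy in-memory sketch of that bookkeeping follows; the class, its methods and the site names are hypothetical and do not reproduce the GDMP or EDG replica catalogue interfaces.

```python
class ReplicaCatalogue:
    """Toy catalogue mapping a logical file name to the sites holding a copy."""

    def __init__(self):
        self._replicas = {}  # logical file name -> list of storage element names

    def publish(self, logical_name, storage_element):
        """Record that storage_element holds a copy of logical_name."""
        sites = self._replicas.setdefault(logical_name, [])
        if storage_element not in sites:
            sites.append(storage_element)

    def locate(self, logical_name):
        """Return the known holders of logical_name (empty list if none)."""
        return list(self._replicas.get(logical_name, []))

    def replicate(self, logical_name, destination):
        """Register a new replica; a real system would copy the file first."""
        if not self.locate(logical_name):
            raise KeyError(f"no replica of {logical_name} is published")
        self.publish(logical_name, destination)


catalogue = ReplicaCatalogue()
catalogue.publish("run42.dst", "se-cern")        # hypothetical names
catalogue.replicate("run42.dst", "se-elsewhere")
print(catalogue.locate("run42.dst"))
```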

15 LHCb Data Challenge 1 (July-September 2002)
Physics Data Challenge (PDC) for detector, physics and trigger evaluations:
- based on the existing MC production system, with a small amount of Grid technology to start with
- generate ~3×10^7 events (signal + specific background + generic b and c + min bias)
Computing Data Challenge (CDC) for checking developing software:
- will make more extensive use of Grid middleware
Components will be incorporated into the PDC once proven in the CDC.

16 LHCb software framework - Gaudi
[Diagram: Gaudi architecture. Labels recovered from the slide:]
- Application Manager
- Algorithm
- Converter
- Event Data Service, Transient Event Store
- Detec. Data, Transient Detector Store
- Histogram, Transient Histogram Store
- Persistency, Data Files
- Message, JobOptions, Particle Prop., Other Services

17 GANGA: Gaudi ANd Grid Alliance
- Joint ATLAS (C. Tull) and LHCb (P. Mato) project, formally supported by GridPP/UK with 2 joint ATLAS/LHCb research posts at Cambridge and Oxford.
- An application that helps end-user physicists and production managers use Grid services to run Gaudi/Athena jobs.
- A GUI-based application that should help throughout the complete job lifetime:
  - job preparation and configuration
  - resource booking
  - job submission
  - job monitoring and control
[Diagram: GANGA GUI, Collective & Resource Grid Services, GAUDI Program; labels: JobOptions, Algorithms, Histograms, Monitoring, Results]

18 Required functionality
Before the Gaudi/Athena program starts:
- Security (obtaining certificates and credentials)
- Job configuration (algorithm configuration, input data selection, ...)
- Resource booking and policy checking (CPU, storage, network)
- Installation of required software components
- Job preparation and submission
While the Gaudi/Athena program is running:
- Job monitoring (generic and specific)
- Job control (suspend, abort, ...)
After the program has finished:
- Data management (registration)
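The three phases above amount to a small job life cycle. One way to picture it is as a state machine in which only certain transitions are legal. The sketch below is illustrative only: the state names and the transition table are assumptions built from the "suspend, abort" and "registration" items on the slide, not a description of the GANGA design.

```python
from enum import Enum


class JobState(Enum):
    PREPARED = "prepared"        # configured and submitted, not yet running
    RUNNING = "running"
    SUSPENDED = "suspended"
    ABORTED = "aborted"
    FINISHED = "finished"
    REGISTERED = "registered"    # output data registered after the run


# Hypothetical transition table: state -> states reachable from it.
ALLOWED = {
    JobState.PREPARED: {JobState.RUNNING},
    JobState.RUNNING: {JobState.SUSPENDED, JobState.ABORTED, JobState.FINISHED},
    JobState.SUSPENDED: {JobState.RUNNING, JobState.ABORTED},
    JobState.FINISHED: {JobState.REGISTERED},
    JobState.ABORTED: set(),
    JobState.REGISTERED: set(),
}


def advance(current, target):
    """Return target if the move is legal, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target


state = JobState.PREPARED
for step in (JobState.RUNNING, JobState.SUSPENDED, JobState.RUNNING,
             JobState.FINISHED, JobState.REGISTERED):
    state = advance(state, step)
print(state.value)
```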

19 Python Bus Design (A possible model for implementation)
[Diagram: PYTHON SW BUS connecting modules. Labels recovered from the slide:]
- Users: Local user, Remote user (via Internet, HTML page)
- Modules: GUI, GaudiPython, PythonROOT, Java Module, OS Module, EDG API, GAUDI client
- Data: Job Configuration DB, Bookkeeping, Production, Workspaces
- External systems: GRID, Athena\GAUDI
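A software bus in this sense is a common Python layer into which independent modules are plugged and through which they are reached by name. The following minimal sketch shows the pattern with a registry and two stand-in modules. Everything here (class names, method names, the stand-ins) is hypothetical and is not taken from the GANGA code.

```python
class SoftwareBus:
    """Minimal registry: modules are plugged in by name and called through the bus."""

    def __init__(self):
        self._modules = {}

    def plug(self, name, module):
        """Attach a module under a unique name."""
        if name in self._modules:
            raise ValueError(f"module {name!r} is already plugged in")
        self._modules[name] = module

    def call(self, name, method, *args, **kwargs):
        """Invoke a method on a plugged-in module and return its result."""
        return getattr(self._modules[name], method)(*args, **kwargs)


class BookkeepingStub:
    """Stand-in for a bookkeeping module: remembers finished job names."""

    def __init__(self):
        self.finished = []

    def record(self, job_name):
        self.finished.append(job_name)
        return len(self.finished)


class GridStub:
    """Stand-in for a Grid interface module: pretends to submit a job."""

    def submit(self, job_name):
        return f"submitted:{job_name}"


bus = SoftwareBus()
bus.plug("grid", GridStub())
bus.plug("bookkeeping", BookkeepingStub())
print(bus.call("grid", "submit", "job-001"))
print(bus.call("bookkeeping", "record", "job-001"))
```

The appeal of the pattern is that a GUI or script only needs to know the bus, so modules can be added or replaced without changing their callers.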

20 Conclusions
- LHCb already has distributed MC production using Grid facilities for job submission.
- We are embarking on large-scale data challenges commencing July 2002, and we are developing our analysis model.
- Grid middleware will be progressively integrated into our production environment as it matures (starting with EDG, and looking forward to GLUE).
- R&D projects are in place:
  - for interfacing users (production + analysis) and the Gaudi/Athena software framework to Grid services
  - for putting the production system into an integrated Grid environment with monitoring and control
- All work is being conducted in close participation with the EDG and LCG projects:
  - ongoing evaluations of EDG middleware with physics jobs
  - participation in LCG working groups, e.g. the report on 'Common use cases for a HEP Common Application layer'

