Extension of DIRAC to enable distributed computing using Windows resources
3rd EGEE User Forum, 11-14 February 2008, Clermont-Ferrand
J. Coles, Y. Y. Li, K. Harrison, A. Tsaregorodtsev, M. A. Parker, V. Lyutsarev

Overview
- Why port to Windows and who is involved?
- DIRAC overview
- Porting process
  - Client (job creation/submission)
  - Agents (job processing)
  - Resources
- Successes/usage
- Deployment
- Summary

Motivation
- Aim:
  - Enable Windows computing resources in the LHCb workload and data management system DIRAC
  - Allow what can be done under Linux to be possible under Windows
- Motivation:
  - Increase the number of CPU resources available to LHCb for production and analysis
  - Offer a service to Windows users
  - Allow transparent job submission and execution on Linux and Windows
- Who's involved:
  - Cambridge, Cavendish Laboratory - Ying Ying Li, Karl Harrison, Andy Parker
  - Marseille, CPPM - Andrei Tsaregorodtsev (DIRAC architect)
  - Microsoft Research - Vassily Lyutsarev

DIRAC Overview
- Distributed Infrastructure with Remote Agent Control
- LHCb's distributed production and analysis workload and data management system
- Written in Python
- Four sections:
  - Client - user interface
  - Services - the DIRAC Workload Management System, based on the main Linux server
  - Agents
  - Resources - CPU resources and data storage

DISET security module
- DIRAC Security Transport module - the underlying security module of DIRAC
- Provides grid authentication and encryption (using X509 certificates and grid proxies) between the DIRAC components
- Uses OpenSSL with pyOpenSSL (DIRAC's modified version) wrapped around it
  - Standard: implements Secure Sockets Layer and Transport Layer Security, and contains the cryptographic algorithms
  - Additional: grid proxy support
- Pre-built OpenSSL and pyOpenSSL libraries are shipped with DIRAC
  - Windows libraries are provided alongside Linux libraries, allowing the appropriate libraries to be loaded at run time (a sketch of this selection follows below)
- Proxy generation under Windows
  - Multi-platform command: dirac-proxy-init
  - Validity of the generated proxy is checked under both Windows and Linux
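As an illustration of the run-time library selection mentioned above, the sketch below dispatches on the host platform before importing the shipped pyOpenSSL build. This is not the actual DIRAC code; the bundle directory names are hypothetical.

    import platform
    import sys

    def load_security_libraries():
        # Select the pre-built OpenSSL/pyOpenSSL bundle matching the host
        # platform (bundle directory names are placeholders, not DIRAC's).
        system = platform.system()
        if system == "Windows":
            bundle = "pyOpenSSL-win32"
        elif system == "Linux":
            bundle = "pyOpenSSL-linux"
        else:
            raise RuntimeError("Unsupported platform for DISET: %s" % system)
        # Prepend the platform-specific bundle to the module search path so
        # that 'import OpenSSL' resolves to the shipped build.
        sys.path.insert(0, "DIRAC/Security/%s" % bundle)
        import OpenSSL
        return OpenSSL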

Client - job submissions
- Submissions are made with a valid grid proxy
- Three ways:
  - JDL (Job Description Language)
  - DIRAC API
  - Ganga
    - Built on DIRAC API commands
    - Currently being ported to Windows
- Successful job submission returns a job ID, provided by the Job Monitoring Service

JDL example (submitted under Windows with: > dirac-job-submit.py myjob.jdl):

    SoftwarePackages = { "DaVinci.v12r15" };
    InputSandbox = { "DaVinci.opts" };
    InputData = { "LFN:/lhcb/production/DC04/v2/ /DST/Presel_ _ dst" };
    JobName = "DaVinci_1";
    Owner = "yingying";
    StdOutput = "std.out";
    StdError = "std.err";
    OutputSandbox = { "std.out", "std.err", "DaVinci_v12r15.log", "DVhbook.root" };
    JobType = "user";

DIRAC API example (saved as myjob.py and run under Windows with: > myjob.py, or entered directly in Python):

    import DIRAC
    from DIRAC.Client.Dirac import *
    dirac = Dirac()
    job = Job()
    job.setApplication('DaVinci', 'v12r15')
    job.setInputSandbox(['DaVinci.opts'])
    job.setInputData(['LFN:/lhcb/production/DC04/v2/ /DST/Presel_ _ dst'])
    job.setOutputSandbox(['DaVinci_v12r15.log', 'DVhbook.root'])
    dirac.submit(job)

DIRAC Agent under Windows
- Python installation script
  - Downloads and installs the DIRAC software, and sets up the DIRAC Agent
- Agents are initiated on free resources
- Agent job retrieval (a sketch of the cycle follows below):
  - Run the DIRAC Agent to see if there are any suitable jobs on the server
  - The Agent retrieves any matched jobs
  - The Agent reports the job status to the Job Monitoring Service
  - The Agent downloads and installs the applications required to run the job
  - The Agent retrieves any required data (see next slide)
  - The Agent creates a Job Wrapper to run the job (the wrapper is platform aware)
  - Output is uploaded to storage if requested
[Slide diagram: map of Windows sites and Linux sites]
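The cycle above can be pictured as a simple polling loop. The sketch below is illustrative only: the object and method names are hypothetical and do not reflect DIRAC's actual Agent internals.

    import time

    POLL_INTERVAL = 300  # seconds between matching attempts (illustrative value)

    def agent_cycle(matcher, monitoring, storage):
        # One pass of the hypothetical agent loop sketched on the slide.
        job = matcher.request_job()          # ask the server for a suitable job
        if job is None:
            return                           # nothing matched this resource
        monitoring.set_status(job.id, "Matched")
        job.install_applications()           # fetch the required LHCb applications
        job.download_input_data()            # via the proxy service / GridFTP
        wrapper = job.create_wrapper()       # platform-aware wrapper (.bat on Windows)
        monitoring.set_status(job.id, "Running")
        wrapper.run()
        storage.upload_outputs(job)          # only if the job requested output upload
        monitoring.set_status(job.id, "Done")

    def run_agent(matcher, monitoring, storage):
        while True:
            agent_cycle(matcher, monitoring, storage)
            time.sleep(POLL_INTERVAL)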

Data access
- Data access to LHCb's distributed data storage system requires:
  - Access to the LFC (LCG File Catalogue), which maps LFNs (Logical File Names) to PFNs (Physical File Names)
  - Access to the Storage Element
- On Windows, a catalogue client is provided via the DIRAC portal service
  - Uses DIRAC's security module DISET and a valid user grid proxy
  - Authenticates to the proxy server, and the proxy server contacts the file catalogue on the user's behalf with its own credentials
- Uses the .NET GridFTP client provided by the University of Virginia
  - Based on GridFTP v1; from tests it appears compatible with the GridFTP server used by LHCb (EDG GridFTP client and Globus GT2)
  - The client contains the functions needed for file transfers: get, put, mkdir
  - It also provides a batch tool that mimics the command flags of globus-url-copy (an illustrative call is sketched below)
  - Requirements: .NET v2.0
  - The .NET GridFTP binaries are shipped with DIRAC
- Allows full data registration and transfer to any Storage Element supporting GridFTP
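A hedged illustration of driving such a batch tool from Python follows. The executable name netgridftp-copy.exe, the -vb flag and the URLs are placeholders chosen to mirror globus-url-copy usage; the real tool shipped with DIRAC may differ.

    import subprocess

    def gridftp_put(local_path, gridftp_url, tool="netgridftp-copy.exe"):
        # Copy a local file to a GridFTP Storage Element using a batch tool
        # that mimics globus-url-copy flags (tool name and flags illustrative).
        cmd = [
            tool,
            "-vb",                                     # verbose, as in globus-url-copy
            "file:///" + local_path.replace("\\", "/"),
            gridftp_url,                               # e.g. "gsiftp://<se-host>/<path>"
        ]
        rc = subprocess.call(cmd)
        if rc != 0:
            raise RuntimeError("GridFTP transfer failed with exit code %d" % rc)

    # Usage (paths are placeholders):
    # gridftp_put(r"C:\jobs\DVhbook.root", "gsiftp://<se-host>/lhcb/user/DVhbook.root")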

DIRAC CE backends
- DIRAC provides a variety of Compute Element backends under Linux:
  - Inprocess (standalone machine), LCG, Condor, etc.
- Windows:
  - Inprocess
    - The Agent loops at preset intervals, assessing the status of the resource
  - Microsoft Windows Compute Cluster
    - An additional Windows-specific CE backend (a sketch of the two backends follows below)
    - Requires one shared installation of DIRAC and the applications on the head node of the cluster
    - Agents are initiated from the head node and communicate with the Compute Cluster Services
    - Job outputs are uploaded to the Sandboxes directly from the worker nodes
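The two Windows backends can be thought of as implementations of one small submit interface. The sketch below is purely illustrative: the class names are hypothetical, and the 'job submit /scheduler:...' command line is only indicative of the Compute Cluster Server CLI, not a verified invocation.

    import subprocess

    class InprocessCE:
        # Run the job wrapper directly on the local (standalone) machine.
        def submit(self, wrapper_script):
            return subprocess.call(["python", wrapper_script])

    class WindowsComputeClusterCE:
        # Hand the job wrapper to the Compute Cluster scheduler on the head
        # node, which dispatches it to a free worker node.
        def __init__(self, head_node):
            self.head_node = head_node
        def submit(self, wrapper_script):
            return subprocess.call([
                "job", "submit",
                "/scheduler:%s" % self.head_node,
                "python %s" % wrapper_script,
            ])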

LHCb applications
- Five main LHCb applications (C++: Gauss, Boole, Brunel, DaVinci; Python: Bender)
- Data flow (from the slide diagram):
  - Gauss (event generation, detector simulation) -> Sim
  - Boole (digitisation) -> RAWmc
  - Brunel (reconstruction) -> DST; real data flows into reconstruction from the detector as RAW
  - DaVinci / Bender (analysis) -> statistics
  - MC production jobs run the simulation chain; analysis jobs run DaVinci or Bender
- Data formats:
  - Sim - simulation data format
  - RAWmc - RAW Monte Carlo, equivalent to the RAW data format from the detector
  - DST - Data Storage Tape

Gauss
- Most LHCb applications are compiled for both Linux and Windows
  - For historical reasons, Microsoft Visual Studio .NET 2003 is used
- Gauss was the only application previously not compiled under Windows
- Gauss relies on three major pieces of software not developed by LHCb:
  - Pythia6: simulation of particle production (legacy Fortran code)
  - EvtGen: simulation of particle decays (C++)
  - Geant4: simulation of the detector (C++)
- Gauss needs each of the above to run under Windows
  - Work strongly supported by the LHCb and LCG software teams
- All third-party software has now been successfully built under Windows
  - Most build errors resulted from the Windows compiler being less tolerant of "risky coding" than gcc
    - Insists on arguments passed to functions being of the correct type
    - Stricter about memory management
    - Good for forcing code improvements!
- Gauss now builds fully under Windows, with both the Generator and Simulation parts
  - Full Gauss jobs of BBbar events have been produced, with distributions comparable to those produced under Linux
  - Gauss v30r4 has been installed and tested on the Cambridge cluster
- Latest release, Gauss v30r5:
  - First fully Windows-compatible release
  - Contains both pre-built Geant4 and Generator Windows binaries

Cross-platform job submissions
- The job creation and submission process is the same under Linux and Windows (the same DIRAC API commands and the same steps)
- Two main types of LHCb grid job at present:
  - MC production jobs - CPU intensive, no input required; potentially ideal for 'CPU scavenging'
    - Recent efforts (Y. Y. Li, K. Harrison) allowed Gauss to compile under Windows (see previous slide)
    - A full MC production chain is still to be demonstrated on Windows
  - Analysis jobs - require input (data, private algorithms, etc.)
    - DaVinci, Brunel, Boole
      - Note: customised user algorithms require a C++ compiler
      - Jobs submitted with libraries are bound to the same platform for processing
      - Platform requirements can be added during job submission (see the sketch after this list)
    - Bender (Python)
      - Note: no compiler, linker or private library required
      - Allows cross-platform analysis jobs to be performed
- Results are retrieved to the local computer via:
  - > dirac_job_get_output.py 1234 - results in the output sandbox
  - > dirac-rm-get(LFN) - uses GridFTP to retrieve output data from a Grid SE
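A hedged sketch of a cross-platform Bender analysis submission using the DIRAC API shown earlier. The setPlatform call and the Bender version string are illustrative assumptions; the slide only states that platform requirements can be added at submission time.

    from DIRAC.Client.Dirac import *

    dirac = Dirac()
    job = Job()
    job.setApplication('Bender', 'v6r4')      # version string is a placeholder
    job.setInputSandbox(['myAnalysis.py'])    # pure-Python algorithm: no compiler needed
    job.setPlatform('Win32')                  # hypothetical call pinning the job to Windows
    jobid = dirac.submit(job)                 # submission returns the job ID

    # Later, from either platform:
    #   > dirac_job_get_output.py <jobid>     (output sandbox)
    #   > dirac-rm-get(LFN)                   (output data from a Grid SE via GridFTP)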

DIRAC Windows usage
- DIRAC is supported on two Windows platforms:
  - Windows XP
  - Windows Server 2003
- DIRAC has been used to run LHCb physics analysis under Windows
  - Comparison between DC04 and DC06 data on the B± -> D0(Ks π+ π-) K± channel
  - 917,000 DC04 events processed under Windows per selection run
    - ~48 hours total CPU time on 4 nodes
  - A further ~200 jobs (totalling ~4.7 million events) were submitted from Windows to DIRAC, processed on LCG, and retrieved on Windows
  - Further selection background studies are currently being carried out with the system
- Processing speed comparisons between Linux and Windows are difficult, as the Windows binaries are currently built in debug mode by default

DIRAC deployment
(Slide table columns: Platform; Hardware; Number of CPUs; Available Disk Size; Compute Element backend)

Bristol
- Windows XP Professional; Intel Pentium 4 CPU 2.00GHz, 504MB RAM; 437.2GB on C: drive; Inprocess

Cambridge
- Windows XP Professional; Dell Optiplex GX745, Intel Core 2 CPU 2.13GHz, 2.99GB RAM; 2 CPUs; mapped drives can be linked to Cambridge HEP group storage disks; Inprocess
- Windows Server 2003 x64 + Compute Cluster Pack 2006; AMD Athlon 64 X2 Dual Core processor, 2.00GB RAM; 4 nodes available, with a total of 8 CPUs; Compute Cluster
- Laptop, Windows XP Tablet; Intel Pentium M processor 2.00GHz, 512MB RAM; 2 CPUs; Inprocess

Oxford
- Windows Server 2003 x64 + Compute Cluster Pack 2006; Intel Xeon CPU 2.66GHz, 31.9GB RAM; 22 nodes available, with a total of 100 CPUs; 208GB on mapped disk; Compute Cluster
- Windows Server 2003; Intel Xeon CPU 2.80GHz, 2.00GB RAM; 2 CPUs; 136GB on local C: drive; Inprocess

Birmingham
- Windows Server 2003 + Compute Cluster Pack machines, 4 cores each; Compute Cluster

Windows wrapping
- The bulk of the DIRAC Python code was already platform independent
  - However, not all Python modules are platform independent
- Three types of code modification/addition:
  - Platform-specific libraries and binaries (e.g. OpenSSL, pyOpenSSL, .NET GridFTP)
  - Additional Windows-specific code (e.g. the Windows Compute Cluster CE backend, .bat files to match the Linux shell scripts)
  - Minor Python code modifications (e.g. changing process forks to threads; a sketch of this change follows below)
- Windows port modifications, by file size of the DIRAC code used:
  - Unmodified: 60%
  - Windows specific: 6%
  - Modified for cross-platform compatibility: 34%
- DIRAC installation: ~60MB
- Per LHCb application: ~7GB
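As an illustration of the fork-to-thread change mentioned above (a minimal sketch, not the actual DIRAC diff): os.fork() is unavailable on Windows, so a forked child is replaced by a worker thread.

    import os
    import sys
    import threading

    def run_task(task):
        task()  # placeholder for the real work

    def dispatch(task):
        # Run a task asynchronously: fork on Linux, thread on Windows.
        # os.fork() does not exist on Windows, which is the kind of minor
        # modification described on the slide.
        if sys.platform == "win32":
            worker = threading.Thread(target=run_task, args=(task,))
            worker.start()
            return worker
        pid = os.fork()
        if pid == 0:              # child process executes the task and exits
            run_task(task)
            os._exit(0)
        return pid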

Summary
- Working DIRAC v2r11, able to integrate both Windows standalone and cluster CPUs into the existing Linux system
- Porting: replacement of Linux-specific Python code, and provision of Windows equivalents where platform independence was not possible (e.g. pre-compiled libraries, secure file transfers)
- Windows platforms tested:
  - Windows XP
  - Windows Server 2003
- Cross-platform job submission and retrieval
  - Little change to syntax for the user
- Full analysis job cycle on Windows, from algorithm development to analysis of results (Bender -> running on Linux -> getting results)
- Continued use for further physics studies
- All applications for MC production jobs tested
- Deployment extended to three sites so far, totalling 100+ Windows CPUs
  - Two Windows Compute Cluster sites
- Requirements:
  - Python 2.4
  - PyWin32 (Windows-specific Python module)
  - Grid certificate
- Future plans:
  - Test the full production chain
  - Deploy on further systems/sites, e.g. Birmingham
  - Larger scale tests
  - Continued usage for physics studies
  - Provide a useful tool when LHC data arrives

Backup slides

Cross-platform compatibility

Component            Language   Binaries available
Ganga                Python     -
DIRAC                Python     Linux/Windows compatible
LHCb applications:
  Gauss              C++        SLC3, SLC4, Win32
  Boole              C++        SLC3, SLC4, Win32
  Brunel             C++        SLC3, SLC4, Win32
  DaVinci            C++        SLC3, SLC4, Win32
  Bender             Python     Linux/Windows compatible

[Architecture diagram: a DIRAC Head Node running the WMS (Job Management Service, Sandbox Service, Job Matcher, Job Monitoring Service), a DIRAC Proxy Server, an LFC Service, a Software Repository and a Local SE; job submission by the user goes over DISET, and a site Agent with its Watchdog runs the Job Wrapper around a DaVinci job]