External Focus CMS@Home Dr Ivan D Reid Brunel University London 02/09/2016 Ivan D Reid.


1 External Focus CMS@Home
Dr Ivan D Reid, Brunel University London, 02/09/2016

2 What it was
- Developmental distributed computing using Volunteer contributions, similar to …
- Based on the BOINC (Berkeley Open Infrastructure for Network Computing) framework
- Also uses the VirtualBox virtualiser; CMSSW jobs are run in a Red Hat Linux VM on Windows, Linux and Macintosh PCs
- The VM includes cvmfs to allow access to CMS software and data

3 Why?
- Snapshot of the ATLAS Dashboard on 12/01/2016: 10,000 jobs running at once…
- Why me?

4 …to the Volunteer
- Installs BOINC and VirtualBox if necessary
- Signs up to the project at … using an invitation code (available on request)
- Enables task download for the project and watches as tasks arrive and run
- Can watch progress on a remote console (or the Dashboard) and view logs locally in a browser

5 …to the Manager
- Creates a CMSSW project area and designs a suitable workflow (Monte-Carlo only, no data)
- Submits batch jobs using CRAB3; special commands direct the jobs to a dedicated Condor server at RAL
- Logs onto the Condor server to adjust job parameters
- Monitors the batch on the Dashboard and via logs on the server
- Monitors the proxy certificate (default lifetime 7 days) and manually installs a new one if necessary; also periodically archives logs to save disk space
- Results were intended to come to a local SE (Storage Element); this is not yet implemented, so the data remain on the CERN Data-Bridge
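Checking the proxy lifetime is one of the manual chores listed above. A minimal sketch of the renewal decision, assuming only the 7-day default lifetime from the slide; the one-day safety margin, the example install date and the name `needs_renewal` are hypothetical, and a real check would read the expiry from the proxy itself rather than infer it from the install time.

```python
from datetime import datetime, timedelta, timezone

# Default proxy lifetime quoted on the slide.
PROXY_LIFETIME = timedelta(days=7)
# Hypothetical safety margin: renew once less than this remains.
RENEW_MARGIN = timedelta(days=1)


def needs_renewal(installed_at: datetime, now: datetime) -> bool:
    """Return True if a proxy installed at `installed_at` has expired,
    or will expire within RENEW_MARGIN, at time `now`."""
    expires_at = installed_at + PROXY_LIFETIME
    return expires_at - now <= RENEW_MARGIN


if __name__ == "__main__":
    # Arbitrary example install time.
    installed = datetime(2016, 8, 26, 9, 0, tzinfo=timezone.utc)
    for day in (2, 5, 6, 8):
        now = installed + timedelta(days=day)
        print(f"day {day}: renew = {needs_renewal(installed, now)}")
```

With these settings the check first asks for a new proxy on day 6, leaving a day to install it before the old one expires.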

6 Finer Details
- The BOINC “tasks” just run the VM for 24 hours
- The VM sends requests to the Condor server and receives a CMSSW job from the batch
- When a job completes, its output files are copied to the Data-Bridge and a new request is made
- The Volunteer receives BOINC credit for the time the VM is running, not for jobs completed
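The job-pulling loop and the credit rule can be illustrated with a toy simulation. It assumes the job wall-times are known in advance and ignores stage-out failures and idle time; `run_task` is a hypothetical name, not part of BOINC or the CMS glidein code.

```python
TASK_HOURS = 24.0  # each BOINC task simply runs the VM for 24 hours


def run_task(job_hours, task_hours=TASK_HOURS):
    """Toy model of one BOINC task (a sketch, not the real glidein logic).

    job_hours: wall-times in hours of the CMSSW jobs the Condor server would
    hand out, one per request (hypothetical input).
    Returns (credited_hours, jobs_completed).
    """
    clock = 0.0
    completed = 0
    for hours in job_hours:
        if clock + hours > task_hours:
            break          # simplification: a job that would overrun the task is not counted
        clock += hours     # the VM runs the job it received
        completed += 1     # output copied to the Data-Bridge; the VM then requests another job
    # Credit follows the time the VM runs, not the number of jobs completed.
    return task_hours, completed


if __name__ == "__main__":
    print(run_task([2.0] * 20))   # short jobs: (24.0, 12)
    print(run_task([10.0] * 5))   # long jobs:  (24.0, 2)
```

Both tasks are credited with 24 hours, even though the first completes six times as many jobs.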

7 It’s Actually Not That Simple… (The Way it Was)
[Diagram: CMS Volunteer Infrastructure. Components: Volunteer, CA, VCCS*, Volunteer VM (VBoxwrapper, Agent, Wrapper, glidein), GlideinWMS, Condor pool, CRAB3, DataBridge (DynaFed, S3), Grid (FTS, async stage-out) and the Happy User. Labelled steps: GET Proxy, GET Glidein, Jobs Join Condor Pool, PUT Output]
*Volunteer Computing Credential Service

8 Implementation (The Way it is Now)
[Diagram regions: Application Server, Common Infrastructure, Volunteer’s machine. Components: VCCS, Job Manager, Condor, Volunteer VM, VBoxwrapper, DataBridge, DynaFed, S3, Grid, FTS. Labelled steps: GET Proxy, condor_submit, Instant Glidein, Join Pool, gfal-copy PUT, Data IO]

9 Sandboxing and Authentication
[Diagram domains: Trusted Domain, Untrusted Domain. Components: IGTF CA, Online CA, VCCS, Sandbox, Grid Service, VM, Grid. Labelled steps: Request Certificate, GET Proxy, Submit Job, Get Job, Data Transfer]

10 Volunteer Considerations
Because of the special nature of the tasks, some special considerations apply:
- Hosts should preferably run 24/7; stopping during a job risks losing that job (this has improved considerably recently)
- High download bandwidth is required to fetch the VM, etc.
- Where upload bandwidth is limited (e.g. home ADSL), care must be taken when running multiple hosts to avoid saturating it

11 That was Then, This is Now
- The organisation at CERN has changed somewhat this year, to enable a more common approach
- … has become vLHCathome-dev, which runs development apps for several projects:
  - CMS Simulation
  - LHCb Simulation
  - Theory Simulation
  - ATLAS Simulation
  - ALICE Simulation
  - Sixtrack Simulation
  - Benchmark Application

12 For “Production”
- The old Test4Theory has become vLHCathome
- It has stable apps for:
  - Theory Simulations (Test4Theory)
  - CMS Simulations, now opened up to any user
  - LHCb Simulations
- News last week: “3 TRILLION EVENTS REACHED BY TEST4THEORY TODAY !!!”

13 What Does CMS Run?
- We can run practically any Monte-Carlo simulation on any version of CMSSW available through cvmfs; data analysis is not possible because of Volunteer bandwidth constraints
- Initially ran CMSSW_6_2_0_SLHC26_patch3 jobs for MinimumBias and TTbar
- Aimed for jobs of ~2 hours with results of O(100 MB)

14 Current Workflow
- To better match Volunteers’ bandwidth, we looked for a workflow that would have reduced output per hour and also be of some use to CMS
- Found a B-physics GEN-SIM request with low efficiency (…) which had been put on hold, and started submitting jobs for it

15 Latest Performance
- Currently run jobs of 200,000 events, in batches of 10,000 jobs
- Median wall-time per job is 2h25m (range ~20m to 20h)
- Average result file is ~16 MB
- Since the end of June we have processed requests for >22 billion events
- The Dashboard currently reports ~98% success, but we have lost jobs in stage-out to the DataBridge
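The figures on this slide can be cross-checked with simple arithmetic. This sketch treats the 16 MB average file size and the 2h25m median wall-time as describing a typical job, which is only a rough approximation, and takes the initial O(100 MB) per ~2 hour target as exactly 100 MB in 2 hours for comparison.

```python
EVENTS_PER_JOB = 200_000
JOBS_PER_BATCH = 10_000
MEDIAN_JOB_HOURS = 2 + 25 / 60    # median wall-time of 2h25m
MEAN_RESULT_MB = 16               # average result file size

events_per_batch = EVENTS_PER_JOB * JOBS_PER_BATCH
batches_for_22e9 = 22e9 / events_per_batch
# Rough: mixes a mean file size with a median wall-time.
current_mb_per_hour = MEAN_RESULT_MB / MEDIAN_JOB_HOURS
# Initial target of O(100 MB) per ~2 h job, taken here as exactly 100 MB / 2 h.
initial_mb_per_hour = 100 / 2

print(f"events per batch: {events_per_batch:,}")
print(f"22 billion events = {batches_for_22e9:.0f} full batches")
print(f"output rate: ~{current_mb_per_hour:.1f} MB/h now vs ~{initial_mb_per_hour:.0f} MB/h initially")
```

Each batch is 2 billion events, so >22 billion events corresponds to more than 11 full batches, and the B-physics jobs produce roughly 6.6 MB per hour against ~50 MB per hour for the initial jobs.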

16 Process
The decay of Λb⁰ to p, µ and ν
- The process is a background to Bs → µµ when the proton is misidentified as a muon
- 8,822 result files on the DataBridge from one batch (the Dashboard claims 9,784 successes) were analysed and found to contain 58,196 events
- This gives a fraction of 58,196 / (8,822 × 2×10^5) ≈ 3.3×10^-5
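The fraction above, and the gap between the DataBridge and Dashboard counts, can be reproduced directly. The sketch assumes that each result file corresponds to one job of 200,000 generated events, as on the previous slide.

```python
EVENTS_PER_JOB = 200_000     # events per job, from the previous slide
files_on_databridge = 8_822
dashboard_successes = 9_784
events_found = 58_196

fraction = events_found / (files_on_databridge * EVENTS_PER_JOB)
missing_files = dashboard_successes - files_on_databridge
missing_share = missing_files / dashboard_successes

print(f"fraction of generated events kept: {fraction:.2e}")
print(f"reported successes with no file on the DataBridge: {missing_files} ({missing_share:.1%})")
```

This prints a fraction of 3.30e-05 and shows 962 reported successes (9.8%) without a corresponding file on the DataBridge, consistent with the stage-out losses noted on the previous slide.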

17 pµ Invariant Mass, p pT and µ pT

18 Recent Snapshots
[Screenshots: vLHCathome-dev, vLHCathome]

19 Running Jobs

20 Dashboard
[Screenshot: Dashboard, 9724 files]

21 Current Status
- The project is running relatively smoothly, but it is still time-consuming; more automated monitoring would help
- Submission is to move from CRAB3 to WMAgent, which will help in several areas; this has been hindered by the loss of key personnel and, recently, by the August holiday season
- The final stage, transfer from the Data-Bridge to Grid storage, needs to be implemented soon, but depends on WMAgent being functional

22 Developments
- Currently trialling multi-core VMs: these present themselves to BOINC as computers with n CPUs and thus run n BOINC jobs (n is user-selectable)
- Works well for the Theory app, but there are problems to be ironed out for the CMS and Benchmark apps, on Linux at least
- Have to be careful about memory and bandwidth…
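Choosing n means fitting the jobs inside the host's CPUs, free memory and upload bandwidth at the same time. A minimal sketch, assuming hypothetical per-job memory and upload figures (the slide gives none); `choose_n` is an illustrative name, not a BOINC setting.

```python
import math


def choose_n(cpus: int, free_ram_gb: float, ram_per_job_gb: float,
             uplink_kbps: float, upload_per_job_kbps: float) -> int:
    """Largest number of jobs n that fits the host's CPUs, free memory and uplink.

    The per-job memory and upload figures are hypothetical inputs,
    not measured values for any of the apps.
    """
    by_ram = math.floor(free_ram_gb / ram_per_job_gb)
    by_uplink = math.floor(uplink_kbps / upload_per_job_kbps)
    return max(0, min(cpus, by_ram, by_uplink))


if __name__ == "__main__":
    # 8 CPUs, 12 GB free, 2 GB per job, 1 Mbit/s uplink, ~111 kbit/s per job
    print(choose_n(8, 12, 2, 1000, 111))   # memory-limited: 6
    print(choose_n(8, 32, 2, 300, 111))    # uplink-limited: 2
```

Whichever resource runs out first sets n, so a host with plenty of cores can still be limited to a few jobs by memory or by a slow uplink.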

23 Conclusions
- Distributed Volunteer Computing (DVC) has the capability to significantly extend the reach of scientific calculations into areas that may be seen as too unproductive for normal operations
- The use of DVC may be seen as “free”, but it has non-trivial costs in organisation and management

