David Cameron Claire Adam Bourdarios Andrej Filipcic Eric Lancon Wenjing Wu ATLAS Computing Jamboree, 3 December 2014 Volunteer Computing.


1 David Cameron Claire Adam Bourdarios Andrej Filipcic Eric Lancon Wenjing Wu ATLAS Computing Jamboree, 3 December 2014 Volunteer Computing

2 What is volunteer computing? Ordinary people voluntarily running scientific tasks on their PCs

3 Berkeley Open Infrastructure for Network Computing (BOINC)

4 Volunteer Computing @ CERN
2004: LHC@Home Sixtrack
2011: LHC@Home Test4Theory
2014: ATLAS@Home, CMS@Home, Beauty@Home (LHCb)

5 ATLAS@Home
Why use volunteer computing for ATLAS?
–It’s free! (almost)
–Public outreach
Considerations
–Low priority jobs with high CPU-I/O ratio
  Non-urgent Monte Carlo simulation
–Need virtualisation for ATLAS sw environment
  CERNVM image and CVMFS
–No grid credentials or access on volunteer hosts
  ARC middleware for data staging
–The resources should look like a regular Panda queue
  ARC Control Tower

6 Initial ATLAS@Home Architecture
[Diagram; labelled components: ARC Control Tower, Panda Server, ARC CE, Session Directory, BOINC LRMS Plugin, BOINC server, Volunteer PC, BOINC Client, VM, Shared Directory, Grid Catalogs and Storage, DB, proxy cert, BOINC PQ]

7 Current ATLAS@Home Setup
[Diagram; labelled components: CERN, ARC Control Tower, Panda Server, ARC CE, BOINC server (vLHC@Home), Volunteer PC, BOINC Client, VM, Shared Directory, Grid Catalogs and Storage, DB on demand, BOINC PQ]

8 ATLAS@Home History
Test server with ARC CE and BOINC server with ATLAS@Home app ran in Beijing from January
–http://gilda117.ihep.ac.cn
–Volunteers found it somehow…
In July volunteers were moved to CERN server with ARC CE + BOINC
–http://arc-boinc-01.cern.ch (alias atlasathome.cern.ch)
–CERN IT provided 1TB NFS space for job input/output
At the same time ATLAS@Home became an official BOINC project
In early October the BOINC server was changed to a vLHC@Home server run by CERN IT
–Volunteers + credit moved too
A parallel test setup with separate ARC CE and BOINC server exists for testing

9 Boinc jobs
Real simulation tasks
–mc12_8TeV.117079.PowhegPythia_P2011C_ttbar_nonallhad_mtt_2000p.simul.e2940_s1773
–Full athena jobs
–50 events/job
Runs in CERNVM with pre-cached software
But some data still needs to be downloaded at runtime
–Conditions data from squid/frontier
Image is 1.1GB (500MB compressed) and downloaded only once
Input files (data file + small scripts) are 1-100MB
Output is ~100MB
VM memory is now 2GB (was 1GB initially, but jobs are now more complex)
Jobs take from a few hours up to a few days on a fast (single) core
Validation
–Per work unit, that correct output is produced (only that the file exists; the content is not checked)
–Physics validation comparing results to regular Grid task
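
The per-work-unit validation described on this slide checks only that the output file exists. A minimal sketch of such a check, assuming each work unit has its own result directory, might look like the following. The function name, directory layout and file name are hypothetical and are not taken from the ATLAS@Home or BOINC code.

```python
from pathlib import Path


def output_file_exists(workunit_dir, output_name):
    """Return True if the expected output file is present.

    Mirrors the slide's description: only the existence of the file is
    checked, not its content. Both arguments are hypothetical names
    chosen for illustration.
    """
    return (Path(workunit_dir) / output_name).is_file()
```

A check like this is cheap to run on every work unit, which is presumably why the content check is left to the separate physics validation against a regular Grid task.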

10 How does it work for volunteers?
Install BOINC client and VirtualBox
–Linux, Mac and Windows supported
–Currently 80% of hosts have Windows
In BOINC client choose ATLAS@Home and create an account
That’s it!

11 Issues with jobs
The majority of volunteers (~80%) never complete a single job
–Not powerful enough resources, entry barrier is too high
  Requires 64-bit, at least 4GB, decent bandwidth, installing VirtualBox
  ATLAS@home is the hardest BOINC project to run (quote from volunteer)
–Unreliable system/failing jobs also push people away
  The worst thing for volunteers is to use CPU and not give credit
–BUT the normal retention rate of a project is 10%
More problems
–Virtualisation/VMwrapper causes a lot of problems (memory, jobs not finishing, unstable)
–Firewall issues accessing conditions data through squids
  We are working on ways to cache this data in the image to avoid network access from the job

12 Volunteer growth
Currently >12000 volunteers, 1000 active
300 new volunteers/week
Einstein@Home: 300k volunteers, 47k active
Seti@Home: 5 million volunteers, 150k active
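
To compare the projects on this slide, the active fraction (active volunteers divided by total volunteers) can be worked out from the quoted figures. The sketch below uses only the numbers on the slide; treating ">12000" as exactly 12000 is an assumption, so the ATLAS@Home value is an upper bound.

```python
# (total volunteers, active volunteers) as quoted on the slide.
# Assumption: ">12000" is taken as 12000 for ATLAS@Home.
projects = {
    "ATLAS@Home": (12_000, 1_000),
    "Einstein@Home": (300_000, 47_000),
    "Seti@Home": (5_000_000, 150_000),
}


def active_fraction(total, active):
    """Fraction of registered volunteers that are currently active."""
    return active / total


for name, (total, active) in projects.items():
    print(f"{name}: {active_fraction(total, active):.1%} active")
```

This gives roughly 8% for ATLAS@Home, 16% for Einstein@Home and 3% for Seti@Home.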

13 Job statistics
Continuous 2000-3000 running jobs
almost 300k completed jobs
500k CPU hours
14M events
50% CPU efficiency
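
A few per-job averages follow from these totals. The sketch below is back-of-the-envelope arithmetic on the slide's rounded figures, treating "almost 300k" as 300k. The wall-clock estimate assumes CPU efficiency means CPU time divided by wall-clock time, which the slide does not state.

```python
# Rounded totals as quoted on the slide.
completed_jobs = 300_000      # "almost 300k", taken as 300k
cpu_hours = 500_000
events = 14_000_000
cpu_efficiency = 0.50

events_per_job = events / completed_jobs
cpu_hours_per_job = cpu_hours / completed_jobs
cpu_seconds_per_event = cpu_hours * 3600 / events

# Assumption: efficiency = CPU time / wall-clock time.
wall_hours = cpu_hours / cpu_efficiency

print(f"events per completed job: {events_per_job:.1f}")
print(f"CPU hours per completed job: {cpu_hours_per_job:.2f}")
print(f"CPU seconds per event: {cpu_seconds_per_event:.0f}")
print(f"estimated wall-clock hours: {wall_hours:,.0f}")
```

The average comes out at about 47 events per completed job, close to the 50 events/job quoted for the simulation tasks.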

14 ATLAS@Home in PANDA

15 Scale of ATLAS@Home
28th largest ATLAS simulation site

16 Very roughly 3 credits/event

17 Very active message boards

18 Standard Boinc webpage
http://atlasathome.cern.ch
Technical info on how to join
Message boards
Jobs/results
Job statistics

19 ATLAS@Home public outreach page
https://atlasphysathome.cern.ch
Designed by Claire using Drupal
Entry point for the public to find out what they are contributing to
Many links to existing outreach pages

20 Screensaver
Many BOINC projects run as “screensavers”
Working with Riccardo-Maria Bianchi from ATLAS event display VP1 to make ATLAS@Home screensaver
–Show pre-configured event displays as events are produced to show people what they are running
This can help motivate people to look more into the physics details

21 Screensaver

22 Lessons Learned and Future
It takes a lot of effort to run ATLAS@Home
–In the interaction with volunteers
  Some volunteers are extremely competent and knowledgeable and help others
–Maintaining and improving the system workflow
The number of running jobs has reached a plateau
–We are exploring scaling options with CERN IT (Ceph, multiple Apache servers etc)
–Not enough people joining
  But we deliberately haven’t advertised too much to ramp up slowly
The major problems are caused by vboxwrapper
BOINC developers are very enthusiastic to help us
–They give us fixes/new features in days
We have a few more things to fix before ATLAS@Home can move out of beta
–New manpower starting now will help greatly
We want to push ATLAS@home internally inside ATLAS
–e.g. now available as part of NICE, to put on CERN administrative PCs

23 Stop press! http://cds.cern.ch/journal/CERNBulletin/2014/49/News%20Articles/1971985?ln=en

24 ATLAS@Home potential
It is not possible to run just any ATLAS job on ATLAS@Home
–See earlier considerations about I/O, unreliability etc
But ~50% of jobs could feasibly run on this platform
The high entry barrier may limit general public participation
Can it replace small Grid sites?
–For example a CPU-only T3 site or small university cluster
–Instead of setting up all the Grid infrastructure just install BOINC on the worker nodes
–Standard Grid accounting in APEL is provided by ARC CE

25 Thanks
Thanks to our CERN IT colleagues in LHC@Home for providing the Boinc infrastructure and storage space
... and please join us! http://atlasathome.cern.ch

