Volunteer Computing
David Cameron, Claire Adam Bourdarios, Andrej Filipčič, Eric Lançon, Wenjing Wu
ATLAS Computing Jamboree, 3 December 2014

What is volunteer computing? Ordinary people voluntarily running scientific tasks on their PCs

Berkeley Open Infrastructure for Network Computing (BOINC)

Volunteer computing at CERN
– 2004: SixTrack (LHC@home)
– 2011: Test4Theory
– 2014: ATLAS@Home, and a project from LHCb

Why use volunteer computing for ATLAS?
– It's free! (almost)
– Public outreach
Considerations:
– Low priority jobs with a high CPU-to-I/O ratio: non-urgent Monte Carlo simulation
– Need virtualisation for the ATLAS software environment: CernVM image and CVMFS
– No grid credentials or access on volunteer hosts: ARC middleware for data staging
– The resources should look like a regular PanDA queue: ARC Control Tower

Initial architecture [diagram]: the ARC Control Tower pulls jobs from the PanDA server and submits them to an ARC CE (session directory, proxy certificate, and a BOINC LRMS plugin); the ARC CE hands work to the BOINC server (with its DB), whose client on the volunteer PC runs the job in a VM through a shared directory; the ARC CE stages data against the grid catalogs and storage, and the whole resource appears as a dedicated BOINC PanDA queue.
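Not part of the original slide: a minimal sketch of what the "BOINC LRMS plugin" submit step could look like, assuming the standard create_work tool from the BOINC server package. The project path, app name and file-naming scheme are illustrative guesses, not the actual plugin code.

```python
"""Hypothetical sketch of the BOINC LRMS plugin submit step: stage the
job's input files into the BOINC project and register one work unit via
BOINC's standard create_work tool. Paths and names are assumptions."""
import shutil
import subprocess
from pathlib import Path

PROJECT_DIR = Path("/home/boinc/projects/atlasathome")  # assumed location

def submit_to_boinc(job_id, input_files):
    """Stage the ARC job's inputs and create one BOINC work unit."""
    # BOINC serves work-unit inputs out of the project download directory.
    download_dir = PROJECT_DIR / "download"
    staged = []
    for f in input_files:
        dest = download_dir / f"{job_id}_{Path(f).name}"
        shutil.copy(f, dest)          # stage the ARC session file for BOINC
        staged.append(dest.name)
    # create_work inserts the work unit into the BOINC database; the BOINC
    # scheduler then dispatches it to a volunteer's client.
    subprocess.check_call(
        ["bin/create_work", "--appname", "ATLAS", "--wu_name", job_id]
        + staged,
        cwd=PROJECT_DIR,
    )
```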

Current setup at CERN [diagram]: the same chain (PanDA server, ARC Control Tower, ARC CE, BOINC server, volunteer PC with BOINC client, VM and shared directory, grid catalogs and storage, BOINC PanDA queue), with the BOINC server's database now provided via CERN's Database On Demand service.

History
– A test server with ARC CE and BOINC server, running the ATLAS@Home app, ran in Beijing from January to July
  – Volunteers found it somehow…
– In July volunteers were moved to a CERN server with ARC CE + BOINC (alias atlasathome.cern.ch)
  – CERN IT provided 1 TB of NFS space for job input/output
  – At the same time it became an official BOINC project
– In early October the BOINC server was changed to a server run by CERN IT
  – Volunteers + credit moved too
– A parallel setup with a separate ARC CE and BOINC server exists for testing

BOINC jobs
– Real simulation tasks: mc12_8TeV PowhegPythia_P2011C_ttbar_nonallhad_mtt_2000p.simul.e2940_s1773
  – Full Athena jobs, 50 events/job
– Run in CernVM with pre-cached software, but some data still needs to be downloaded at runtime: conditions data from Squid/Frontier
– The image is 1.1 GB (500 MB compressed) and is downloaded only once
– Input files (data file + small scripts) are 1–100 MB; output is ~100 MB
– VM memory is now 2 GB (was 1 GB initially, but jobs are now more complex)
– Jobs take from a few hours up to a few days on a fast (single) core
– Validation
  – Per work unit, that correct output is produced (just that the file exists; the content is not checked) (a minimal sketch of this check follows below)
  – Physics validation, comparing results to a regular Grid task
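The per-work-unit check described above is simple enough to sketch. A minimal, hypothetical Python version follows; the expected output file name is an assumption, not taken from the slides.

```python
"""Minimal sketch of the per-work-unit validation described above: a
result passes if the expected output file exists and is non-empty; the
content itself is not inspected. The file name is an assumption."""
import os
import sys

def validate_result(result_dir):
    """Accept a result if the expected output exists and is non-empty."""
    out = os.path.join(result_dir, "HITS.pool.root")  # assumed output name
    return os.path.isfile(out) and os.path.getsize(out) > 0

if __name__ == "__main__":
    # Exit 0 (valid) or 1 (invalid) for the directory given on the command line.
    sys.exit(0 if validate_result(sys.argv[1]) else 1)
```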

How does it work for volunteers?
– Install the BOINC client and VirtualBox
  – Linux, Mac and Windows supported; currently 80% of hosts run Windows
– In the BOINC client, choose ATLAS@Home and create an account
– That's it! (A command-line equivalent is sketched below.)
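For completeness, a sketch of attaching a machine without the GUI, driven from Python via the boinccmd tool that ships with the BOINC client. The account key is a placeholder that a volunteer gets from the project web page after registering.

```python
"""Attach a machine to ATLAS@Home without the GUI, via the boinccmd tool
shipped with the BOINC client. YOUR_ACCOUNT_KEY is a placeholder."""
import subprocess

# Project URL per the atlasathome.cern.ch alias on the History slide.
PROJECT_URL = "http://atlasathome.cern.ch/"

# --project_attach tells the running BOINC client to join the project.
subprocess.check_call(
    ["boinccmd", "--project_attach", PROJECT_URL, "YOUR_ACCOUNT_KEY"])
```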

Issues with jobs
– The majority of volunteers (~80%) never complete a single job
  – Their resources are not powerful enough; the entry barrier is too high: 64-bit, at least 4 GB of RAM and decent bandwidth are required, plus installing VirtualBox ("the hardest BOINC project to run", quote from a volunteer)
  – An unreliable system and failing jobs also push people away: the worst thing for volunteers is to use CPU and not get credit
  – BUT the normal retention rate of a BOINC project is 10%
– More problems
  – Virtualisation/vboxwrapper causes a lot of problems (memory, jobs not finishing, instability)
  – Firewall issues when accessing conditions data through Squids; we are working on ways to cache this data in the image to avoid network access from the job

Volunteer growth
– Currently >12,000 volunteers, 1,000 active
– 300 new volunteers/week
– For comparison, the largest BOINC projects count 300k volunteers (47k active) and 5 million volunteers (150k active)

Job statistics
– Jobs running continuously
– Almost 300k completed jobs, 500k CPU hours, 14M events
– 50% CPU efficiency

ATLAS@Home in PanDA

Scale: equivalent to the 28th largest ATLAS simulation site

Very roughly 3 credits/event
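Combining this with the 50 events/job from the BOINC jobs slide gives a rough per-job figure (this arithmetic is ours, not from the slide):

50 events/job × 3 credits/event ≈ 150 credits/job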

Very active message boards

Standard BOINC webpage
– Technical info on how to join
– Message boards
– Jobs/results
– Job statistics

Public outreach page: https://atlasphysathome.cern.ch
– Designed by Claire using Drupal
– Entry point for the public to find out what they are contributing to
– Many links to existing outreach pages

Screensaver
– Many BOINC projects run as "screensavers"
– Working with Riccardo-Maria Bianchi from the ATLAS event display VP1 to make a screensaver
  – Show pre-configured event displays as events are produced, so people can see what they are running
– This can help motivate people to look more into the physics details

Screensaver

Lessons learned and future
– It takes a lot of effort to run
  – In the interaction with volunteers: some volunteers are extremely competent and knowledgeable and help others
  – Maintaining and improving the system workflow
– The number of running jobs has reached a plateau
  – We are exploring scaling options with CERN IT (Ceph, multiple Apache servers etc.)
  – Not enough people joining, but we deliberately haven't advertised too much, in order to ramp up slowly
– The major problems are caused by vboxwrapper
  – The BOINC developers are very enthusiastic to help us; they give us fixes/new features in days
– We have a few more things to fix before we can move out of beta
  – New manpower starting now will help greatly
– We want to push it internally inside ATLAS
  – e.g. it is now available as part of NICE, to put on CERN administrative PCs

Stop press!

ATLAS@Home potential
– Not every ATLAS job can run on ATLAS@Home
  – See the earlier considerations about I/O, unreliability etc.
  – But ~50% of jobs could feasibly run on this platform
– The high entry barrier may limit general public participation
– Can it replace small Grid sites?
  – For example a CPU-only T3 site or small university cluster
  – Instead of setting up all the Grid infrastructure, just install BOINC on the worker nodes
  – Standard Grid accounting in APEL is provided by the ARC CE

Thanks
Thanks to our CERN IT colleagues for providing the BOINC infrastructure and storage space… and please join us!