A Brief History of BOINC. David P. Anderson, Space Sciences Lab, University of California, Berkeley. MICS 2015, April 10, 2015.
Scientific computing Consumer electronics BOINC
Scientific computing. High-performance computing (HPC). High-throughput computing (HTC): many independent jobs; rate of job completion matters, not per-job turnaround time.
HTC applications: simulation of physical systems. Particle collisions; atomic/molecular (bio, nano); Earth climate system. Need for many jobs: uncertainty (perturbed initial conditions); parameter sweeps; fitting model parameters to observed data.
HTC applications: compute-intensive data analysis. Particle colliders (LHC); astrophysics (pulsar search, gravitational wave search); genomics.
HTC applications: biology-inspired optimization algorithms. Genetic algorithms; flocking; ant colony.
Units of computing speed. Floating-point operation (FLOP). GigaFLOPS (10^9/sec): 1 Central Processing Unit (CPU). TeraFLOPS (10^12/sec): 1 Graphics Processing Unit (GPU). PetaFLOPS (10^15/sec): 1 supercomputer. ExaFLOPS (10^18/sec): current Holy Grail.
Approaches to HTC. Supercomputing: lots of closely-coupled processors. Cluster computing: lots of Ethernet-connected PC-type nodes in a room. Grid computing: share clusters between organizations. Cloud computing: rent cluster nodes, e.g. Amazon EC2. Volunteer computing: use computers owned by consumers.
Consumer electronics. Computing devices: desktop and laptop computers; mobile (tablets, smartphones); game consoles; set-top boxes, DVRs; wearables (watches, glasses); appliances. Commodity Internet: cable, DSL, fiber to the home, cell networks. Bitcoin mining, Steam.
Performance potential. 1 billion desktop/laptop PCs: CPUs 10 ExaFLOPS; GPUs 1,000 ExaFLOPS. 5 billion smartphones: CPUs 20 ExaFLOPS; GPUs 500 ExaFLOPS.
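The totals above can be turned around to see what each device would have to deliver on average. The sketch below only divides the slide's fleet totals by the slide's device counts; the names `FLEETS` and `implied_per_device` are illustrative, not part of BOINC.

```python
# Slide totals, kept as integers so the arithmetic is exact.
GIGA, EXA = 10**9, 10**18

# (number of devices, aggregate FLOPS) as stated on the slide.
FLEETS = {
    "PC CPUs": (1 * 10**9, 10 * EXA),
    "PC GPUs": (1 * 10**9, 1000 * EXA),
    "smartphone CPUs": (5 * 10**9, 20 * EXA),
    "smartphone GPUs": (5 * 10**9, 500 * EXA),
}

def implied_per_device(devices, total_flops):
    """Average FLOPS each device must deliver to reach the fleet total."""
    return total_flops // devices

per_device = {name: implied_per_device(*fleet) for name, fleet in FLEETS.items()}

for name, flops in per_device.items():
    print(f"{name}: {flops / GIGA:g} GFLOPS per device")
```

The PC figures work out to 10 GigaFLOPS per CPU and 1 TeraFLOPS per GPU, in line with the unit scale given earlier in the talk.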
Volunteer computing. Consumers donate computing capacity to: support science; be in a community; compete. History: 1997: GIMPS, distributed.net. 1999: SETI@home, Folding@home. 2003: BOINC.
Limiting factors. Volunteership: study of college students [Toth 2006]: 5% would “definitely participate”; 10% would “possibly participate”. PC availability: study in [Kondo 2008]: 65% average availability; 35% of PCs are available 24/7.
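To get a feel for how the two limiting factors combine, here is a rough sketch. It assumes that participation and availability act as independent multipliers, which the slide does not claim, and it mixes figures from two different study populations. The function name and the 1,000-person population are hypothetical.

```python
def effective_hosts(population, participation, availability):
    """Expected number of hosts computing at any moment.

    Assumes participation and availability are independent (an
    assumption made for this sketch, not a result from either study).
    """
    return population * participation * availability

# 1,000 people, 5% "definitely participate", 65% average availability.
always_on_equivalent = effective_hosts(1000, 0.05, 0.65)
print(always_on_equivalent)
```

Under these assumptions, 1,000 people yield the equivalent of a few dozen always-on PCs, which is why both factors matter.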
Cost of 1 TeraFLOPS/year
BOINC: middleware for volunteer computing. Supported by NSF since 2002. Open source (LGPL). Based at UC Berkeley. http://boinc.berkeley.edu
BOINC software. Client: GUI (C++ or Java); BOINC client (C++); application; BOINC API (C++). Server: job handling (C++); web interfaces (PHP); tools (Python).
Volunteer computing with BOINC. Volunteers; attachments; projects (LHC@home, CPDN, WCG). A volunteer computing “ecosystem”.
How to volunteer
Choose projects
Configure the client
Community
Account manager architecture. Examples: BAM!, GridRepublic. Diagram: BOINC client; account manager; projects.
Creating a BOINC project. Install BOINC server software: on a Linux box; on a VM. Build apps for Windows/Mac/Linux. Attract volunteers: develop web site; generate publicity; communicate with volunteers.
Volunteer computing today: 500,000 active computers; 50 projects; 10 PetaFLOPS.
Some BOINC-based projects: IBM World Community Grid; Climateprediction.net (Oxford); LHC@home (CERN); SETI@home (UC Berkeley); Rosetta@home (U. Wash); Einstein@home (Max Planck Inst.).
How BOINC works. Project: BOINC server. Home PC: BOINC client. Steps: get jobs; download data, executables; run jobs; upload output files; report/get jobs … all over HTTP.
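The request cycle above can be sketched with an in-memory stand-in for the server. This is a toy model, not the BOINC protocol or API: `ToyServer`, `request_jobs`, and `run_client` are hypothetical names, there is no HTTP, and file download/upload is folded into passing values directly.

```python
from collections import deque

class ToyServer:
    """In-memory stand-in for a project server (hypothetical, not BOINC code)."""

    def __init__(self, jobs):
        self.pending = deque(jobs)   # (job_id, input_data) not yet dispatched
        self.results = {}            # job_id -> reported output

    def request_jobs(self, reported, max_jobs):
        # A single request both reports finished jobs and asks for new ones.
        self.results.update(reported)
        batch = []
        while self.pending and len(batch) < max_jobs:
            batch.append(self.pending.popleft())
        return batch

def run_client(server, app, max_jobs=2):
    """Loop: report/get jobs, run them, repeat until no work is returned."""
    reported = {}
    rounds = 0
    while True:
        jobs = server.request_jobs(reported, max_jobs)
        rounds += 1
        if not jobs:
            return rounds
        # "Run jobs": apply the application to each job's input.
        reported = {job_id: app(data) for job_id, data in jobs}

server = ToyServer([(1, 2), (2, 3), (3, 4)])
rounds = run_client(server, app=lambda x: x * x)
print(rounds, server.results)
```

The point of the design is that the client always initiates contact, so home PCs behind firewalls need no inbound connections.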
Issues handled by BOINC: heterogeneous computers. “Plan class” functions: can program run on host? how fast? resource usage? Intelligent choice of app version. Job size matching. Diagram: job; app; app versions; platform.
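A minimal sketch of the plan class idea: each app version carries a function that answers the three questions on the slide for a given host, and the server picks the fastest version that can run. All types, field names, and thresholds here (`Host`, `Usage`, `AppVersion`, `plan_cpu`, `plan_gpu`, `best_version`) are hypothetical and chosen for illustration; BOINC's real plan class mechanism differs in detail.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Host:
    cpu_gflops: float
    ram_gb: float
    gpu_gflops: float = 0.0   # 0 means no usable GPU

@dataclass
class Usage:
    gflops: float   # how fast?
    ram_gb: float   # resource usage

@dataclass
class AppVersion:
    name: str
    # Returns None if the program cannot run on the host.
    plan_class: Callable[[Host], Optional[Usage]]

def plan_cpu(host):
    if host.ram_gb < 0.5:
        return None
    return Usage(gflops=host.cpu_gflops, ram_gb=0.5)

def plan_gpu(host):
    if host.gpu_gflops <= 0 or host.ram_gb < 1.0:
        return None
    return Usage(gflops=host.gpu_gflops, ram_gb=1.0)

def best_version(host, versions):
    """Pick the runnable app version with the highest estimated speed."""
    best = None
    for version in versions:
        usage = version.plan_class(host)
        if usage is not None and (best is None or usage.gflops > best[1].gflops):
            best = (version, usage)
    return best

VERSIONS = [AppVersion("cpu", plan_cpu), AppVersion("gpu", plan_gpu)]
choice = best_version(Host(cpu_gflops=10, ram_gb=8, gpu_gflops=1000), VERSIONS)
print(choice[0].name)
```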
Issues handled by BOINC: untrusted, anonymous computers. Result validation: replication, adaptive replication. Credit: cheat-proof accounting. Consumer-friendly client. Job runtime estimation.
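Validation by replication can be sketched as a quorum check: the same job goes to several hosts, and a result is accepted only when enough of them agree. The sketch below is a simplification under stated assumptions. It compares results for exact equality (real validators are application-specific), and `initial_replicas` is a hypothetical, deterministic stand-in for adaptive replication that omits any random spot-checking of trusted hosts.

```python
from collections import Counter

def canonical_result(results, min_quorum=2):
    """Return the result reported by at least min_quorum hosts, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= min_quorum else None

def initial_replicas(consecutive_valid, trust_threshold=10):
    """Adaptive replication, simplified: hosts with a long record of
    valid results get unreplicated jobs; others are double-checked."""
    return 1 if consecutive_valid >= trust_threshold else 2

print(canonical_result(["3.14", "3.14", "2.71"]))
print(initial_replicas(consecutive_valid=0))
```

With two disagreeing results, `canonical_result` returns `None`, which in this model would mean sending the job to another host.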
Issues handled by BOINC: server performance and scalability. Job cache in shared memory, used by scheduler and feeder. Other components: MySQL DB; transitioner; validator and assimilator (per app); db_purge; file_deleter.
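The role of the job cache can be illustrated with a toy model: a feeder keeps a fixed number of slots filled from the database, so that scheduler requests are served from memory rather than by querying the database each time. The class and method names below are hypothetical, and a plain list stands in for both shared memory and the MySQL job table.

```python
class JobCache:
    """Fixed-size slot array standing in for the shared-memory job cache."""

    def __init__(self, size):
        self.slots = [None] * size

    def refill(self, db_queue):
        """Feeder side: fill empty slots from the database queue."""
        filled = 0
        for i, slot in enumerate(self.slots):
            if slot is None and db_queue:
                self.slots[i] = db_queue.pop(0)
                filled += 1
        return filled

    def take(self, n):
        """Scheduler side: hand out up to n cached jobs without touching the DB."""
        out = []
        for i, slot in enumerate(self.slots):
            if slot is not None and len(out) < n:
                out.append(slot)
                self.slots[i] = None
        return out

db = [1, 2, 3, 4, 5]
cache = JobCache(size=3)
first_fill = cache.refill(db)     # feeder pass
sent = cache.take(2)              # one scheduler request
second_fill = cache.refill(db)    # feeder tops the cache back up
print(first_fill, sent, second_fill, cache.slots)
```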
Using GPUs. BOINC detects and schedules GPUs: NVIDIA, AMD, Intel; multiple/mixed GPUs; various language systems (CUDA, OpenCL, CAL). Issues: non-preemptive GPU scheduling; no paging of GPU memory; identifying GPUs.
Multicore apps. Next-generation CPUs may have 100+ cores. BOINC supports multi-core apps: OpenMP, MPI; OpenCL CPU apps.
Using VM technology. Problem: building and maintaining versions for different platforms is hard. Even making a portable Linux executable is hard.
Virtual machines. Stack: application; guest operating system; host operating system.
Virtual machines. Example stack: application; Debian Linux 2.6; Windows 7.
BOINC VM support. Create a VM image for your favorite environment. Create executables for that environment. Diagram: BOINC client; Vbox wrapper; VirtualBox executive; VM instance; shared directory (executable, input and output files).
VM advantages. Develop in your favorite environment: no need to learn Visual Studio, Xcode. A VM is a strong “sandbox”: can run untrusted applications. Free checkpointing: VirtualBox snapshot mechanism (base VM image; image Δ; image Δ).
Use of BOINC at CERN. CERN uses mostly VM-based computing. Diagram: CERN servers (Co-Pilot job queueing system; CVMFS software archive); BOINC client; CernVM. Experiments: ATLAS, CMS, LHCb, Theory.
BOINC on Android. New GUI. Battery-related issues. Released July 2013: Google, Amazon App Stores. ~50K active devices. Branded versions: HTC: Power to Give; Samsung: Power Sleep.
The future of volunteer computing. How to benefit more scientists? How to attract more volunteers?
Integrating with mainstream HTC. Diagram: HTCondor, nanoHUB, TACC, each with an adapter to a BOINC server; consumer PCs; projects.
Science@home. Single “brand” for volunteer computing. Volunteers register for science areas rather than projects: biomed, environmental, physics, astro, … Computers are attached to projects based on prefs.
Implementing Science@home. Use BOINC account manager architecture. Diagram: BOINC client; Science@home; projects.
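Attaching computers by science area rather than by project could look like the sketch below. The mapping of projects to areas is an assumption made for illustration (the talk does not classify them), and `PROJECT_AREAS` and `projects_for` are hypothetical names, not part of BOINC or any account manager.

```python
# Assumed classification, for illustration only.
PROJECT_AREAS = {
    "Rosetta@home": {"biomed"},
    "Climateprediction.net": {"environmental"},
    "LHC@home": {"physics"},
    "Einstein@home": {"astro", "physics"},
}

def projects_for(prefs):
    """Projects whose science areas overlap the volunteer's preferences."""
    wanted = set(prefs)
    return sorted(name for name, areas in PROJECT_AREAS.items() if areas & wanted)

print(projects_for(["physics"]))
```

In this model the volunteer never names a project; an account manager would hand the resulting list to the BOINC client.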
Summary. Volunteer computing is: useful for many HTC applications; a path to ExaFLOPS computing; a way to popularize/democratize science. BOINC provides software infrastructure. Many challenges remain. Many research opportunities.
Contact info http://boinc.berkeley.edu davea@ssl.berkeley.edu