A Guided Tour of BOINC
David P. Anderson
Space Sciences Lab, University of California, Berkeley
TACC, November 8, 2013
BOINC in a nutshell
● BOINC is a batch system for resources that are
  – Extremely heterogeneous
  – Numerous
  – Sporadically available and connected
  – High-churn
  – Error-prone
  – Untrusted and anonymous
BOINC components
● Server
  – Job handling (C++)
  – Web interfaces (PHP)
● Client
  – BOINC client (C++)
  – Application, linked with the BOINC API (C++)
BOINC abstractions
● Application
● Platform
  – windows_x86_64
  – x86_64-pc-linux-gnu
  – x86_64-apple-darwin
  – arm-android-linux-gnu
BOINC abstractions
● App version
  – App, platform, version number
  – A set of files, including the executable
  – “Plan class”; determines processor usage
    ● # CPUs
    ● # GPUs
BOINC abstractions
● Job
  – List of input files
  – Latency bound
  – Resource usage
    ● Disk, memory, FLOPS
● Job instance
BOINC abstractions
[Diagram: relationships among App, App version, Platform, Job, and Job instance]
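The abstractions on the preceding slides can be sketched as a small data model. This is a minimal illustration, not BOINC's actual schema: the class and field names (`version_num`, `latency_bound_s`, `host_id`) and the `dispatch` helper are hypothetical. It assumes a job belongs to an app, and that each job instance pairs the job with one host and one app version for that host's platform.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class App:
    name: str


@dataclass(frozen=True)
class AppVersion:
    app: App
    platform: str          # e.g. "windows_x86_64"
    version_num: int
    plan_class: str = ""   # empty string: no plan class
    files: tuple = ()      # file names, including the executable


@dataclass
class Job:
    app: App
    input_files: list
    latency_bound_s: float
    instances: list = field(default_factory=list)


@dataclass(eq=False)
class JobInstance:
    job: Job
    host_id: int
    app_version: AppVersion


def dispatch(job, host_id, host_platform, versions):
    """Create an instance of `job` using the first app version that
    matches the job's app and the host's platform (hypothetical helper)."""
    for v in versions:
        if v.app == job.app and v.platform == host_platform:
            inst = JobInstance(job, host_id, v)
            job.instances.append(inst)
            return inst
    return None  # no usable app version for this platform
```

One job can accumulate several instances, which is what replication-based validation later relies on.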
BOINC client: directory structure
Boinc/
  projects/
    proj1_url/
      data1.txt
  slots/
    0/
      in.txt (link to data1.txt)
The BOINC data model
● Files are immutable
● Client garbage-collects files
● Data files can be marked as “sticky”
● App version files are automatically sticky
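The garbage-collection rule can be illustrated with a toy model. This sketch assumes a file is eligible for deletion when it is neither sticky nor referenced by anything still present on the client; the function name and the dictionary layout are hypothetical, not the client's real data structures.

```python
def collectable(files, in_use):
    """Return the names of files the client could delete.

    files:  dict mapping file name -> sticky flag (True means keep)
    in_use: set of file names still referenced by jobs on the client
    """
    return sorted(
        name
        for name, sticky in files.items()
        if not sticky and name not in in_use
    )


if __name__ == "__main__":
    files = {
        "data1.txt": False,      # input of a job that is still queued
        "old_input.txt": False,  # input of a job that already finished
        "ref_db.dat": True,      # data file marked sticky by the project
        "app_1.0.exe": True,     # app version file: sticky automatically
    }
    print(collectable(files, in_use={"data1.txt"}))
```

Because files are immutable, a file that survives collection can be reused by later jobs without being downloaded again.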
BOINC client: runtime system
[Diagram: the BOINC client and the application communicate by message-passing through shared memory; the application has a main thread and an API thread]
● Process control
  – Suspend/resume/quit
● Fraction done reporting
The BOINC API
● boinc_init()
● boinc_resolve_filename()
● boinc_time_to_checkpoint()
● boinc_checkpoint_done()
● boinc_finish()
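The order in which an application typically uses these calls can be sketched as a checkpoint/restart loop. The real API is C/C++; the `StubBoinc` class below is a hypothetical stand-in written only to make the control flow runnable, and its methods merely mimic the listed calls. The checkpoint file name and format are also assumptions.

```python
import os
import tempfile


class StubBoinc:
    """Hypothetical stand-in for the BOINC API; it only records calls."""

    def __init__(self, checkpoint_every):
        self.calls = []
        self._polls = 0
        self._every = checkpoint_every

    def init(self):                       # stands in for boinc_init()
        self.calls.append("boinc_init")

    def resolve_filename(self, logical):  # stands in for boinc_resolve_filename()
        self.calls.append("boinc_resolve_filename")
        return logical  # the real call maps a logical name to a physical path

    def time_to_checkpoint(self):         # stands in for boinc_time_to_checkpoint()
        self._polls += 1
        return self._polls % self._every == 0

    def checkpoint_done(self):            # stands in for boinc_checkpoint_done()
        self.calls.append("boinc_checkpoint_done")

    def finish(self, status):             # stands in for boinc_finish()
        self.calls.append("boinc_finish")


def run_job(boinc, workdir, n_steps):
    """Sum 0..n_steps-1, resuming from a checkpoint file if one exists."""
    boinc.init()
    state_path = os.path.join(workdir, boinc.resolve_filename("checkpoint.txt"))

    start, total = 0, 0
    if os.path.exists(state_path):        # restart after suspend/quit
        with open(state_path) as f:
            start, total = (int(x) for x in f.read().split())

    for i in range(start, n_steps):
        total += i                        # one unit of "science"
        if boinc.time_to_checkpoint():
            tmp = state_path + ".tmp"
            with open(tmp, "w") as f:
                f.write(f"{i + 1} {total}")
            os.replace(tmp, state_path)   # atomic swap: no torn checkpoints
            boinc.checkpoint_done()

    boinc.finish(0)
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(run_job(StubBoinc(checkpoint_every=3), d, 10))
```

The application asks whether it is time to checkpoint rather than deciding on its own schedule, which lets the client control how often state is written to disk.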
Building apps for BOINC
● Native (C/C++, FORTRAN, Java, Python)
  – Must call boinc_init()
  – Win: Visual Studio or MinGW
  – Mac: Xcode
  – Unix: gcc
● BOINC wrapper
● Vbox wrapper
Anonymous platform mechanism
● Volunteer (not project) supplies app versions
● Purposes:
  – Unusual platforms or coprocessors
  – Optimized apps
  – Security paranoia
BOINC server structure
● MySQL database
● Directory structure
● Process structure
  – Daemons, CGI programs
● User/group structure
  – apache, boincadm
Creating a BOINC project
● Deployment options
  – Server VM image (VirtualBox, Debian)
  – Amazon EC2 image
  – configure/make
● make_project script
Deploying application versions
● Directory structure
  apps/
    appname1/
      1.0/
        windows_intelx86/
          (files)
        windows_intelx86__cuda/
          (files)
        i686-apple-darwin/
          (files)
● Code signing
● update_versions script
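The directory layout encodes the app name, version, platform, and optional plan class in the path itself. The sketch below shows how such a path could be decomposed, assuming the double underscore seen in `windows_intelx86__cuda` separates platform from plan class. The function `parse_version_dir` is hypothetical and is not part of the update_versions script.

```python
def parse_version_dir(path):
    """Split 'apps/<app>/<version>/<platform>[__<plan_class>]' into parts."""
    app, version, leaf = path.strip("/").split("/")[-3:]
    # partition() leaves plan_class empty when there is no "__" in the name
    platform, _, plan_class = leaf.partition("__")
    return {
        "app": app,
        "version": version,
        "platform": platform,
        "plan_class": plan_class,
    }


if __name__ == "__main__":
    for p in (
        "apps/appname1/1.0/windows_intelx86",
        "apps/appname1/1.0/windows_intelx86__cuda",
        "apps/appname1/1.0/i686-apple-darwin",
    ):
        print(parse_version_dir(p))
```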
Plan class mechanism
● App versions can be tagged with a “plan class”
● app_plan(plan_class, host) function:
  – Can host run app version of that plan class?
  – If so, compute resource usage (CPUs, GPUs)
  – If so, estimate FLOPS
● Examples
  – vbox32
  – cuda23
  – opencl_nvidia_101
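The three questions the plan function answers can be sketched as follows. This is an illustrative model, not the server's C++ implementation: the `Host` and `Usage` types, their fields, and the specific requirements attached to `vbox32` and `cuda23` (VirtualBox installed; an NVIDIA GPU with CUDA 2.3 or later; 0.1 CPU alongside the GPU) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    cpu_flops: float               # estimated FLOPS of one CPU core
    ngpus: int = 0
    gpu_flops: float = 0.0         # estimated FLOPS of one GPU
    cuda_version: tuple = (0, 0)   # (major, minor); assumed encoding
    has_virtualbox: bool = False


@dataclass
class Usage:
    ncpus: float
    ngpus: float
    flops: float


def app_plan(plan_class: str, host: Host) -> Optional[Usage]:
    """Return resource usage if `host` can run this plan class, else None."""
    if plan_class == "":                        # plain single-CPU app
        return Usage(1, 0, host.cpu_flops)
    if plan_class == "vbox32":                  # assumed: needs VirtualBox
        if not host.has_virtualbox:
            return None
        return Usage(1, 0, host.cpu_flops)
    if plan_class == "cuda23":                  # assumed: needs CUDA >= 2.3
        if host.ngpus < 1 or host.cuda_version < (2, 3):
            return None
        return Usage(0.1, 1, host.gpu_flops)    # small CPU share feeds the GPU
    return None                                 # unknown plan class


if __name__ == "__main__":
    gpu_host = Host(cpu_flops=3e9, ngpus=1, gpu_flops=1e12, cuda_version=(3, 0))
    print(app_plan("cuda23", gpu_host))
    print(app_plan("vbox32", gpu_host))
```

Returning `None` for an unsuitable host and a usage record otherwise mirrors the slide's "can it run; if so, how much does it use and how fast is it" structure.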
Job submission
● Input file staging
  – upload/download directory hierarchies
● Input, output templates
  – describe job’s input/output files
● Local job submission
  – C++, command-line interfaces
● Remote job submission
  – Web RPCs; C++, PHP interfaces
Job processing
● Validation
  – Replication
  – Homogeneous redundancy
  – Adaptive replication
● Assimilation
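Replication-based validation can be sketched as a quorum check over the outputs returned by a job's instances. This is a simplified model that assumes outputs are compared for exact equality; the function name and the `min_quorum` parameter are illustrative, and a real validator may use an application-specific comparison instead.

```python
from collections import Counter


def find_canonical(outputs, min_quorum):
    """Return the output that at least `min_quorum` instances agree on,
    or None if no such agreement exists yet."""
    if len(outputs) < min_quorum:
        return None                      # not enough results returned so far
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= min_quorum else None


if __name__ == "__main__":
    print(find_canonical(["3.14", "3.14", "2.71"], min_quorum=2))
    print(find_canonical(["3.14", "2.71"], min_quorum=2))
```

When no canonical result emerges, the natural follow-up is to create another instance of the job and try again once it reports. Exact comparison is only safe when hosts produce bit-identical output, which is the problem homogeneous redundancy addresses by sending a job's instances to numerically equivalent hosts.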
Other scheduling features
● Locality scheduling
  – sticky data files
  – preferentially send jobs to clients that already have the needed files
● Multi-size applications
  – send large jobs to fast devices
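The locality preference can be illustrated with a toy selection rule: among candidate jobs, pick the one that requires the fewest files the host does not already hold. This is a sketch of the idea only. The function name and data layout are hypothetical, and the actual scheduler policy is more involved.

```python
def pick_job(jobs, host_files):
    """Choose the job needing the fewest new downloads.

    jobs:       dict mapping job name -> set of input file names
    host_files: set of (sticky) file names already on the client
    """
    if not jobs:
        return None
    # Ties go to whichever job appears first in the dict.
    return min(jobs, key=lambda name: len(jobs[name] - host_files))


if __name__ == "__main__":
    jobs = {
        "job_a": {"chunk_07.dat", "params_a.txt"},
        "job_b": {"chunk_12.dat", "params_b.txt"},
    }
    print(pick_job(jobs, host_files={"chunk_12.dat"}))
```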
Multi-user projects
● Job submitters have accounts
● access control
● quotas
● batch scheduling
  – run small batches first
  – don’t starve large batches
  – enforce quotas
Server processes
[Diagram: scheduler, job cache (shared memory), feeder, MySQL DB, transitioner, validator and assimilator (per app), db_purge, file_deleter]
Scaling server performance
● Distribute daemons across machines
● Parallelize daemons
  – on same or different machines
● MySQL server performance
● example: server
  – ~50 machines
  – 1 million jobs/day
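To put the quoted throughput in per-second terms, the daily figure can be converted to an average rate. This is plain arithmetic on the number from the slide and assumes jobs are spread evenly over the day; it says nothing about peak load.

```latex
\[
\frac{10^{6}\ \text{jobs/day}}{86\,400\ \text{s/day}} \approx 11.6\ \text{jobs/s}
\]
```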
Web interface (public)
● preferences
● forums
● teams
● profiles
● leader boards
● host, job info
● social network features
Contacts
● lists:
  – boinc_projects
  – boinc_dev