Frontiers of Volunteer Computing David Anderson Space Sciences Lab UC Berkeley 28 Nov. 2011
The consumer digital infrastructure ● 1.5 billion PCs, 5 billion mobile devices – 100 ExaFLOPS – capable of most scientific computing ● Cost of sustained HPC: volunteer << dedicated << rented
Volunteer computing ● BOINC – 450,000 active computers – ~30 science projects ● Some areas of research and development – VM-based applications – Volunteer storage – Multi-user projects – Emulating scheduling policies
VM-based applications Fundamental problems of volunteer computing: ● Heterogeneity – need to compile apps for Win, Mac – portability is hard even on Linux ● Security – currently: account-based sandboxing – not enough for untrusted apps Virtual machine technology can solve both
VirtualBox ● Open source (owned by Oracle) ● Rich feature set – directory sharing – control of network activity in VM ● Low runtime overhead ● Easy to install
Process structure [Figure: process diagram with boxes BOINC client, vboxwrapper, VirtualBox daemon, VM instance; links labeled shared-mem msg passing, cmdline tool, file-based communication]
Directory structure ● Host OS BOINC/slots/0/ VM image file shared/ input/output files application executable ● Guest OS /shared/
Components of a VM app version ● VM image – developer’s preferred environment – whatever libraries they want – can be shared among any/all apps ● vboxwrapper – main program ● application executable ● VM config file – specifies memory size, etc.
Example: 2.0 ● CernVM – minimal image (~100 MB compressed) – science libraries (~10 GB) are paged in using HTTP – contains the client of existing CERN job queueing systems ● Goal: provide more computing power to physicists without requiring them to change anything.
Future work ● Bundle VirtualBox with BOINC installer ● Using GPUs within VMs ● Multiple VM jobs on multicore hosts: how to do it efficiently ● Streamlined mechanisms for deploying VM-based apps
Volunteer storage ● A modern PC has ~1 TB disk ● 1M PCs * 100GB = 100 Petabytes ● Amazon: $120 million/year
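The arithmetic on this slide can be checked in a few lines. The sketch below assumes decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes) and reads the Amazon figure as the yearly cost of renting the same 100 PB. The per-gigabyte rate it prints is derived from the slide's two numbers, not quoted from any price list, and the function names are illustrative.

```python
# Back-of-envelope check of the volunteer storage numbers.
# Assumption: decimal units (1 GB = 1e9 bytes, 1 PB = 1e15 bytes).

GB = 10**9
PB = 10**15


def total_capacity_bytes(num_hosts: int, bytes_per_host: int) -> int:
    """Aggregate capacity if every host contributes bytes_per_host."""
    return num_hosts * bytes_per_host


def implied_price_per_gb_month(annual_cost_usd: float, capacity_bytes: int) -> float:
    """Monthly price per GB implied by an annual cost for a given capacity."""
    return annual_cost_usd / (capacity_bytes / GB) / 12


# 1M PCs, each contributing 100 GB of its ~1 TB disk.
capacity = total_capacity_bytes(1_000_000, 100 * GB)
print("capacity (PB):", capacity / PB)
print("implied $/GB/month:", implied_price_per_gb_month(120e6, capacity))
```

Under these assumptions, the $120 million/year figure corresponds to a rate of roughly ten cents per gigabyte per month.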
BOINC storage architecture [Figure: architecture diagram with labels BOINC file management infrastructure, storage applications, dataset storage, data archival, data stream buffering, locality scheduling]
Data archival ● Goals – store large files for long periods – arbitrarily high reliability ● Issues – high churn rate of hosts – high latency of file transfers ● Models – overlapping failure and recovery – server storage and bandwidth may be bottleneck
Replication Divide file into N chunks, store each chunk on M hosts Advantages: ● Fast recovery (1 upload, 1 download) ● Increase N to reduce server storage needs But: ● High space overhead ● Reliability decreases exponentially with N
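The claim that reliability decreases exponentially with N can be illustrated with a simple model. The sketch below assumes that each host fails independently with probability p_fail during some window and that no repair happens within that window. These modeling assumptions and the function name are not from the talk. Under them, a chunk is lost only if all M of its replicas fail, and the file survives only if all N chunks survive.

```python
# Survival probability of a replicated file under a simplified model.
# Assumptions (not from the talk): independent host failures with
# probability p_fail over the window considered, and no repair.


def replication_survival(n_chunks: int, m_replicas: int, p_fail: float) -> float:
    """P(file survives) = (1 - p_fail**M) ** N."""
    chunk_ok = 1.0 - p_fail ** m_replicas   # at least one replica survives
    return chunk_ok ** n_chunks             # every chunk must survive


# Space overhead is M times the file size, whatever N is.
for n in (10, 100, 1000):
    print(n, replication_survival(n, m_replicas=2, p_fail=0.1))
```

The per-chunk survival probability is raised to the power N, which is the exponential dependence noted on the slide. Raising M counters it, at the cost of more space.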
Coding Divide file into N blocks, generate K additional “checksum” blocks. Recover file from any N blocks. Advantages: ● High reliability with low space overhead But: ● Recovering a block requires reassembling the entire file (network, space overhead)
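The reliability advantage of coding can be estimated with the same kind of simplified model: each of the N + K blocks sits on a different host, hosts fail independently with probability p_fail, and no repair takes place in the window considered. These assumptions and the function names are illustrative, not taken from the talk. The file is recoverable when at least N of the N + K blocks survive, which is a binomial tail probability.

```python
# Survival probability of an (N, K) coded file under a simplified model.
# Assumptions (not from the talk): one block per host, independent host
# failures with probability p_fail, and no repair during the window.
from math import comb


def coding_survival(n_blocks: int, k_extra: int, p_fail: float) -> float:
    """P(at least N of the N+K blocks survive)."""
    total = n_blocks + k_extra
    p_ok = 1.0 - p_fail
    return sum(
        comb(total, j) * p_ok ** j * p_fail ** (total - j)
        for j in range(n_blocks, total + 1)
    )


def space_overhead(n_blocks: int, k_extra: int) -> float:
    """Stored bytes divided by original file size."""
    return (n_blocks + k_extra) / n_blocks


print("overhead:", space_overhead(10, 5))
print("survival:", coding_survival(10, 5, p_fail=0.1))
```

With N = 10 and K = 5 the stored data is 1.5 times the file size, compared with 2 times for two-fold replication.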
Multi-level coding ● Divide file, encode each piece separately ● Use encoding for top-level chunks as well ● Can extend to > 2 levels [Figure: two-level coding tree, with N and K labeled at each level]
Hybrid coding/replication ● Use multi-level coding, but replicate each bottom-level block 2x or 3x. ● Most failures will be recovered with replication ● The idea: get both the fast recovery of replication and the high reliability of coding.
Distributed storage simulator ● Inputs: – host arrival rate, lifetime distribution, upload/download speeds, free disk space – parameters of files to be stored ● Policies that can be simulated – M-level coding, N and K coding values, R-fold replication ● Outputs – statistics of server disk space usage, network BW, “vulnerability” level
Multi-user projects ● Needed: – remote job submission mechanism – quota system – scheduling support for batches [Figure: diagram with labels science portal, BOINC server, scientists (users), sysadmins, batches of jobs]
Quota system ● Each user has “quota” ● Batch prioritization goals: – enforce quotas over the long term – give priority to short batches – don’t starve long batches
Batch prioritization ● Each user has a quota ● For a user U: – Q(U) = fractional quota of U – LST(U) = “logical start time” ● For a batch B: – E(B) = estimated elapsed time of B given all resources – EQ(B) = E(B)/Q(U)
Batch prioritization ● When a user submits a batch B – logical end time LET(B) = LST(U) + EQ(B) – LST(U) += EQ(B) ● Prioritize batches by increasing LET(B) ● Example [Figure: timeline showing batches B1, B2, B3, B4 and LET(B1)]
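The logical-end-time rule can be sketched in a few lines of Python. The class and function names below are hypothetical and this is not BOINC's implementation. The slides do not say how LST(U) is initialized or whether it is ever advanced by wall-clock time, so the sketch assumes it starts at 0 and changes only on submission.

```python
# Sketch of batch prioritization by logical end time (LET).
# Names are illustrative. Assumption: LST(U) starts at 0 and is only
# advanced when the user submits a batch.
from dataclasses import dataclass


@dataclass
class User:
    name: str
    quota: float        # Q(U): fractional quota, 0 < quota <= 1
    lst: float = 0.0    # LST(U): logical start time


@dataclass
class Batch:
    name: str
    user: User
    est_elapsed: float  # E(B): estimated elapsed time given all resources
    let: float = 0.0    # LET(B): logical end time, set on submission


def submit(batch: Batch) -> Batch:
    """Apply LET(B) = LST(U) + EQ(B), then LST(U) += EQ(B)."""
    eq = batch.est_elapsed / batch.user.quota   # EQ(B) = E(B) / Q(U)
    batch.let = batch.user.lst + eq
    batch.user.lst += eq
    return batch


def prioritize(batches: list) -> list:
    """Order batches by increasing LET(B)."""
    return sorted(batches, key=lambda b: b.let)


alice = User("alice", quota=0.75)
bob = User("bob", quota=0.25)
b1 = submit(Batch("B1", alice, est_elapsed=30))  # EQ = 40, LET = 40
b2 = submit(Batch("B2", bob, est_elapsed=5))     # EQ = 20, LET = 20
b3 = submit(Batch("B3", alice, est_elapsed=15))  # EQ = 20, LET = 60
order = [b.name for b in prioritize([b1, b2, b3])]
print(order)
```

In this example Bob's short batch runs ahead of Alice's long one despite his smaller quota, while each of Alice's submissions pushes her later batches further back, which is how the rule favors short batches and still enforces quotas over time.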
Emulating scheduling policies ● Job scheduling policy – what jobs to run – whether to leave suspended jobs in memory ● Work fetch policy – when to get more jobs – what project to get them from – how much to request These policies have a big impact on system performance. They must work in a large space of scenarios.
Scenarios ● Preferences ● Hardware ● Availability (computing, network) ● # of projects ● For each project/application – distribution of job size – accuracy of runtime estimate – latency bound – resource usage – project availability
Issues ● How can we design good scheduling policies? ● How can we debug the BOINC client? ● How can we plan for the future? – many cores – faster GPUs – tight latency bounds – large-RAM applications
The BOINC client emulator ● Emulated (same source code): – main logic – scheduling policies ● Simulated: – availability – job execution – scheduler RPC
Inputs ● Client state file – describes hardware, availability, projects and their characteristics ● Preferences, configuration files ● Duration, time step of simulation
Outputs ● Figures of merit – idle fraction – wasted fraction – resource share violation – monotony – RPCs per job ● Timeline ● Message log ● Graphs of scheduling data
Interfaces ● Web-based – volunteers upload scenarios – can see all scenarios, run simulations against them, comment on them ● Scripted – sweep an input parameter – compare 2 policies across a set of scenarios
Future work ● Characterize the scenario population – Monte-Carlo sampling ● Study new policies – e.g. alternatives to EDF ● More features in emulator – memory usage – file transfer time – application checkpointing behavior ● Better model of scheduler behavior
Questions?