1
Frontiers of Volunteer Computing
David Anderson
Space Sciences Lab, UC Berkeley
28 Nov. 2011
2
The consumer digital infrastructure
● 1.5 billion PCs, 5 billion mobile devices
  – 100 ExaFLOPS
  – capable of most scientific computing
● Cost of sustained HPC: volunteer ≪ dedicated ≪ rented
3
Volunteer computing
● BOINC
  – 450,000 active computers
  – ~30 science projects
● Some areas of research and development:
  – VM-based applications
  – volunteer storage
  – multi-user projects
  – emulating scheduling policies
4
VM-based applications
Fundamental problems of volunteer computing:
● Heterogeneity
  – need to compile apps for Win and Mac
  – portability is hard even on Linux
● Security
  – currently: account-based sandboxing
  – not enough for untrusted apps
Virtual machine technology can solve both.
5
VirtualBox
● Open source (owned by Oracle)
● Rich feature set
  – directory sharing
  – control of network activity in the VM
● Low runtime overhead
● Easy to install
6
Process structure
(Diagram) BOINC client ↔ vboxwrapper ↔ VirtualBox daemon ↔ VM instance
● BOINC client ↔ vboxwrapper: shared-memory message passing
● vboxwrapper ↔ VirtualBox daemon: command-line tool
● vboxwrapper ↔ VM instance: file-based communication
7
Directory structure
● Host OS: BOINC/slots/0/ contains
  – the VM image file
  – shared/ (input/output files, application executable)
● Guest OS: /shared/ (the shared directory as seen inside the VM)
8
Components of a VM app version
● VM image
  – developer's preferred environment
  – whatever libraries they want
  – can be shared among any/all apps
● vboxwrapper (the main program)
● application executable
● VM config file (specifies memory size, etc.)
9
Example: LHC@home 2.0
● CernVM
  – minimal image (~100 MB compressed)
  – science libraries (~10 GB) are paged in using HTTP
  – contains the client of existing CERN job queueing systems
● Goal: provide more computing power to physicists without requiring them to change anything.
10
Future work
● Bundle VirtualBox with the BOINC installer
● Using GPUs within VMs
● Multiple VM jobs on multicore hosts: how to do this efficiently
● Streamlined mechanisms for deploying VM-based apps
11
Volunteer storage
● A modern PC has a ~1 TB disk
● 1M PCs × 100 GB each = 100 petabytes
● Storing that on Amazon: ~$120 million/year (see the cost sketch below)
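A quick back-of-the-envelope check of the numbers above, as a Python sketch. The ~$0.10 per GB-month price is an assumption (roughly cloud storage pricing of that era), not a figure from the slide.

```python
# Rough cost check for the figures on this slide (assumed price, not from the talk).
HOSTS = 1_000_000          # volunteer PCs
GB_PER_HOST = 100          # donated disk per PC
PRICE_PER_GB_MONTH = 0.10  # assumed cloud storage price, $/GB-month

total_gb = HOSTS * GB_PER_HOST                        # 100,000,000 GB = 100 PB
yearly_cost = total_gb * PRICE_PER_GB_MONTH * 12
print(f"capacity: {total_gb / 1e6:.0f} PB")           # -> 100 PB
print(f"cloud cost: ${yearly_cost / 1e6:.0f}M/year")  # -> $120M/year
```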
12
BOINC storage architecture
(Diagram) Storage applications built on the BOINC file management infrastructure:
● dataset storage
● data archival
● data stream buffering
● locality scheduling
13
Data archival
● Goals
  – store large files for long periods
  – arbitrarily high reliability
● Issues
  – high churn rate of hosts
  – high latency of file transfers
● Models
  – overlapping failure and recovery
  – server storage and bandwidth may be the bottleneck
14
Replication
Divide the file into N chunks; store each chunk on M hosts.
Advantages:
● Fast recovery (1 upload, 1 download)
● Increase N to reduce server storage needs
But:
● High space overhead
● Reliability decreases exponentially with N (see the sketch below)
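A minimal sketch of why reliability falls off exponentially with N, assuming each replica is lost independently with probability f over some period and ignoring recovery; the function name and the numbers are illustrative, not from the talk.

```python
# Simplified reliability model (no recovery): each replica of a chunk is lost
# independently with probability f over some period.
def replication_survival(n_chunks: int, m_replicas: int, f: float) -> float:
    """Probability that all N chunks survive when each chunk has M replicas."""
    chunk_survives = 1 - f ** m_replicas   # a chunk is lost only if all M replicas are lost
    return chunk_survives ** n_chunks      # the file needs every chunk

for n in (10, 100, 1000):
    print(n, replication_survival(n, m_replicas=3, f=0.1))
# Survival probability decays exponentially as N grows (0.99 -> 0.90 -> 0.37 here).
```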
15
Coding
Divide the file into N blocks; generate K additional "checksum" blocks. The file can be recovered from any N of the N+K blocks.
Advantages:
● High reliability with low space overhead
But:
● Recovering a block requires reassembling the entire file (network and space overhead)
(See the reliability sketch below.)
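The same simplified failure model applied to (N, K) coding, again ignoring recovery; the parameter values are illustrative assumptions.

```python
from math import comb

# Simplified model, parallel to the replication sketch: each of the N+K blocks is
# lost independently with probability f; the file survives if at least N remain.
def coding_survival(n: int, k: int, f: float) -> float:
    """Probability that at least n of the n+k encoded blocks survive."""
    total = n + k
    p = 1 - f                                 # per-block survival probability
    return sum(comb(total, i) * (p ** i) * (f ** (total - i))
               for i in range(n, total + 1))

# With only 40% extra space, reliability is already high:
print(coding_survival(n=10, k=4, f=0.1))      # ~0.99 at 1.4x space overhead
```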
16
Multi-level coding
● Divide the file and encode each piece separately
● Use encoding for the top-level chunks as well
● Can extend to more than 2 levels
(Diagram: two levels of (N, K) encoding.)
17
Hybrid coding/replication
● Use multi-level coding, but replicate each bottom-level block 2 or 3×.
● Most failures will be recovered with replication.
● The idea: get both the fast recovery of replication and the high reliability of coding.
18
Distributed storage simulator
● Inputs
  – host arrival rate, lifetime distribution, upload/download speeds, free disk space
  – parameters of the files to be stored
● Policies that can be simulated
  – M-level coding, N and K coding values, R-fold replication
● Outputs
  – statistics of server disk space usage, network bandwidth, "vulnerability" level
(A skeleton of such a simulation is sketched below.)
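A minimal event-driven sketch of the kind of churn such a simulator has to model. It is not the simulator described here: it assumes exponential host lifetimes and a server that can always regenerate a replacement replica, and it tracks only a single replicated chunk.

```python
import heapq
import random

# One chunk is stored on R volunteer hosts; each host departs after an exponentially
# distributed lifetime, after which the server uploads a replacement replica (taking
# recovery_time). We report the fraction of time the chunk is under-replicated
# (fewer than 2 live copies) and whether all copies were ever gone at once.
# A real simulator would also model arrival rates, transfer speeds, disk space,
# coding, and server bandwidth limits, as listed on the slide.
def simulate_chunk(r_replicas, mean_lifetime, recovery_time, duration, seed=1):
    rng = random.Random(seed)
    events = [(rng.expovariate(1.0 / mean_lifetime), "fail") for _ in range(r_replicas)]
    heapq.heapify(events)                         # (time, event) min-heap
    t, alive = 0.0, r_replicas
    vulnerable_time, ever_lost = 0.0, False
    while events:
        t_next, kind = heapq.heappop(events)
        t_next = min(t_next, duration)
        if alive < 2:
            vulnerable_time += t_next - t         # time spent under-replicated
        t = t_next
        if t >= duration:
            break
        if kind == "fail":
            alive -= 1
            ever_lost = ever_lost or alive == 0
            heapq.heappush(events, (t + recovery_time, "recover"))
        else:                                     # replacement replica finished uploading
            alive += 1
            heapq.heappush(events, (t + rng.expovariate(1.0 / mean_lifetime), "fail"))
    return vulnerable_time / duration, ever_lost

# Example, with time in days: 3 replicas, ~30-day host lifetimes, 1-day re-upload.
print(simulate_chunk(r_replicas=3, mean_lifetime=30.0, recovery_time=1.0, duration=365.0))
```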
19
Multi-user projects
● Needed:
  – remote job submission mechanism
  – quota system
  – scheduling support for batches
(Diagram: scientists (users) submit batches of jobs through a science portal to a BOINC server run by sysadmins.)
20
Quota system
● Each user has a "quota"
● Batch prioritization goals:
  – enforce quotas over the long term
  – give priority to short batches
  – don't starve long batches
21
Batch prioritization
● For a user U:
  – Q(U) = fractional quota of U
  – LST(U) = "logical start time" of U
● For a batch B submitted by U:
  – E(B) = estimated elapsed time of B given all resources
  – EQ(B) = E(B) / Q(U)
22
Batch prioritization
● When user U submits a batch B:
  – logical end time LET(B) = LST(U) + EQ(B)
  – LST(U) += EQ(B)
● Prioritize batches by increasing LET(B)
(Diagram: example timeline of batches B1–B4 ordered by their LET values. A sketch of the policy follows.)
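A sketch of the prioritization rule on the last two slides. The class and function names (User, Batch, submit, next_batch) are illustrative, not BOINC's actual API, and the estimated elapsed time E(B) is assumed to be supplied by the submitter.

```python
from dataclasses import dataclass

@dataclass
class User:
    name: str
    quota: float          # Q(U): user's fractional share of the project
    lst: float = 0.0      # LST(U): "logical start time"

@dataclass
class Batch:
    user: User
    est_elapsed: float    # E(B): estimated elapsed time given all resources
    let: float = 0.0      # LET(B): logical end time, assigned at submission

def submit(batch: Batch, pending: list) -> None:
    u = batch.user
    eq = batch.est_elapsed / u.quota   # EQ(B) = E(B) / Q(U)
    batch.let = u.lst + eq             # LET(B) = LST(U) + EQ(B)
    u.lst += eq                        # LST(U) += EQ(B)
    pending.append(batch)

def next_batch(pending: list) -> Batch:
    # Serve batches in order of increasing LET(B): short batches get a small LET and
    # run early, while each user's long-term usage is bounded by their quota.
    return min(pending, key=lambda b: b.let)

# Example: a short batch from user B overtakes a long batch from user A.
a, b = User("A", quota=0.5), User("B", quota=0.5)
queue: list = []
submit(Batch(a, est_elapsed=100.0), queue)   # LET = 200
submit(Batch(b, est_elapsed=10.0), queue)    # LET = 20
print(next_batch(queue).user.name)           # -> "B"
```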
23
Emulating scheduling policies
● Job scheduling policy
  – what jobs to run
  – whether to leave suspended jobs in memory
● Work fetch policy
  – when to get more jobs
  – which project to get them from
  – how much to request
These policies have a big impact on system performance, and they must work across a large space of scenarios.
24
Scenarios
● Preferences
● Hardware
● Availability (computing, network)
● Number of projects
● For each project/application:
  – distribution of job sizes
  – accuracy of runtime estimates
  – latency bound
  – resource usage
  – project availability
25
Issues
● How can we design good scheduling policies?
● How can we debug the BOINC client?
● How can we plan for the future?
  – many cores
  – faster GPUs
  – tight latency bounds
  – large-RAM applications
26
The BOINC client emulator
(Diagram) Main logic and scheduling policies: emulated (same source code as the real client). Availability, job execution, and scheduler RPCs: simulated.
27
Inputs
● Client state file
  – describes hardware, availability, projects and their characteristics
● Preferences and configuration files
● Duration and time step of the simulation
28
Outputs
● Figures of merit (see the sketch below)
  – idle fraction
  – wasted fraction
  – resource share violation
  – monotony
  – RPCs per job
● Timeline
● Message log
● Graphs of scheduling data
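A hedged sketch of two of these figures of merit. The definitions used here (idle fraction as available CPU time with nothing running, wasted fraction as CPU time spent on jobs that missed their deadline) are inferred from the terms on the slide, not taken from the emulator's code.

```python
from dataclasses import dataclass

@dataclass
class JobRun:
    cpu_time: float           # CPU time the emulated client spent on this job
    missed_deadline: bool     # True if the job's result arrived after its deadline

def idle_fraction(available_cpu_time: float, runs: list) -> float:
    """Fraction of available CPU time during which no job was running (assumed definition)."""
    used = sum(r.cpu_time for r in runs)
    return max(0.0, available_cpu_time - used) / available_cpu_time

def wasted_fraction(runs: list) -> float:
    """Fraction of used CPU time spent on jobs that missed their deadline (assumed definition)."""
    used = sum(r.cpu_time for r in runs)
    wasted = sum(r.cpu_time for r in runs if r.missed_deadline)
    return wasted / used if used else 0.0

runs = [JobRun(40.0, False), JobRun(20.0, True), JobRun(10.0, False)]
print(idle_fraction(100.0, runs))   # -> 0.3
print(wasted_fraction(runs))        # -> ~0.286
```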
29
Interfaces
● Web-based
  – volunteers upload scenarios
  – can see all scenarios, run simulations against them, comment on them
● Scripted
  – sweep an input parameter
  – compare 2 policies across a set of scenarios
30
Future work
● Characterize the scenario population
  – Monte Carlo sampling
● Study new policies
  – e.g. alternatives to EDF
● More features in the emulator
  – memory usage
  – file transfer time
  – application checkpointing behavior
● Better model of scheduler behavior
31
Questions?