Frontiers of Volunteer Computing David Anderson Space Sciences Lab UC Berkeley 30 Dec. 2011.

2 High-Throughput Computing (HTC) ● Thousands or millions of separate jobs ● What matters is the rate of job completion – not the turnaround time of individual jobs ● Can use commodity computers – don’t need supercomputers

3 Scientific use of HTC ● Physical simulation – particle physics – atomic/molecular (bio, nano) – Earth climate system ● Compute-intensive data analysis – LHC (particle physics) – LIGO (gravitational waves) – genomics ● Bio-inspired optimization – genetic algorithms, flocking, ant colony etc.

4 Measures of computing throughput ● Floating-point operations (FLOP) – benchmarks: Linpack, Whetstone ● GigaFLOPS (10^9/sec): 1 PC ● TeraFLOPS (10^12/sec): 1 GPU ● PetaFLOPS (10^15/sec): supercomputer ● ExaFLOPS (10^18/sec): the future

5 Approaches to HPC ● Cluster computing – commodity or rack-mount PCs in 1 room ● Grid computing – sharing of clusters among/between organizations ● Cloud computing – rent cluster nodes, e.g. Amazon EC2 ● Volunteer computing – PC owners donate use of resources

6 Computing capacity ● Cluster: 1,000 nodes = ~10 TeraFLOPS ● Grid: largest one is ~100,000 nodes ● Cloud: Amazon ~100,000 nodes; ~1 PetaFLOPS ● Volunteer (actual): – 700,000 PCs, 100,000 with GPUs; 12 PetaFLOPS ● Volunteer (current potential): – 1.5 billion PCs: 100 ExaFLOPS – 5 billion mobile devices
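The volunteer figures above imply an average throughput per PC, which is easy to check. The short Python sketch below only divides the numbers quoted on the slide; the variable names are illustrative, and the "actual" average includes the contribution of the GPU-equipped hosts.

```python
# Average per-PC throughput implied by the slide's volunteer figures.
GIGA, PETA, EXA = 1e9, 1e15, 1e18

actual_flops = 12 * PETA        # volunteer computing today (slide figure)
actual_pcs = 700_000
potential_flops = 100 * EXA     # current potential (slide figure)
potential_pcs = 1.5e9

actual_per_pc = actual_flops / actual_pcs / GIGA
potential_per_pc = potential_flops / potential_pcs / GIGA

print(f"actual:    {actual_per_pc:.1f} GFLOPS per PC")
print(f"potential: {potential_per_pc:.1f} GFLOPS per PC")
```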

7 Cost (for 10 TeraFLOPS/year) ● Cluster: $1.5M ● Amazon EC2 (5,000 instances): $4M ● Volunteer: ~ $0.1M

8 Energy All computing uses energy, but ● In cold climates, volunteer computing replaces conventional heating ● GPUs are 10X more efficient than CPUs ● Mobile device CPUs are 10x more efficient

9 Volunteer computing with BOINC [Diagram: volunteers linked to projects (CPDN, LHC@home, WCG) by attachments]

10 How to volunteer

11 Choose projects

12 Configure

13 Graphical interfaces

14 Community

15 Creating a BOINC project ● Install BOINC server software on a Linux box ● Compile apps for Windows/Mac/Linux ● Attract volunteers – develop web site – generate publicity – communicate with volunteers

16 Some projects ● CAS@home ● IBM World Community Grid ● Einstein@home ● LHC@home ● Rosetta@home

17 Fundamental problems of volunteer computing ● Heterogeneity – need to compile apps for Win, Mac – portability is hard even on Linux ● Security – currently: account-based sandboxing – not enough for untrusted apps ● Virtual machine technology can solve both

18 Virtual machines [Diagram: application on top of an operating system]

19 Virtual machines [Diagram: host operating system, guest operating system, application]

20 Virtual machines [Diagram: Windows 7 (host), Debian Linux 2.6 (guest), application]

21 VirtualBox: a VM system ● Open source (owned by Oracle) ● Rich feature set ● Low runtime overhead ● Easy to install

22 Process structure [Diagram: BOINC client, vboxwrapper, VirtualBox daemon, VM instance; links labeled shared-mem msg passing, cmdline tool, file-based communication]

23 VM advantages ● Applications run in developer’s favorite environment (OS, libraries) – No need for multiple versions ● A VM is a strong “sandbox” – Application running in VM can’t access host OS – Can run untrusted applications

24 Volunteer storage ● A modern PC has ~1 TB disk ● 1M PCs * 100GB = 100 Petabytes ● Amazon: $120 million/year
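The storage arithmetic above can be worked through in a few lines of Python. This is a sketch under one assumption not stated on the slide: that the $120 million/year Amazon figure is the price of storing the same 100 Petabytes. Variable names are illustrative.

```python
# Aggregate volunteer storage and the implied commercial price per GB.
GB = 10**9
PB = 10**15

pcs = 1_000_000
donated_per_pc = 100 * GB            # 100 GB out of a ~1 TB disk

total_bytes = pcs * donated_per_pc
total_pb = total_bytes / PB          # total capacity in Petabytes

amazon_cost_per_year = 120e6         # slide figure, assumed to cover 100 PB
cost_per_gb_month = amazon_cost_per_year / (total_bytes / GB) / 12

print(f"total: {total_pb:.0f} PB")
print(f"implied price: ${cost_per_gb_month:.2f} per GB per month")
```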

25 BOINC storage architecture [Diagram: BOINC file management infrastructure; storage applications: dataset storage, data archival, data stream buffering, locality scheduling]

26 Data archival ● Goals – store large files for long periods – arbitrarily high reliability

27-33 Recovery in volunteer storage [Diagram sequence: a server, clients, and the data they hold; one frame marks a failure (X)]

34 Volunteer storage issues ● high churn rate of hosts – ~90 day mean lifetime ● high latency of file transfers – hours or days ● Modeling volunteer storage systems – overlapping failure and recovery – server storage and bandwidth may be bottleneck
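To see why failure and recovery overlap, it helps to estimate how likely a host is to disappear while a transfer is still in progress. The sketch below assumes exponentially distributed host lifetimes with the ~90 day mean quoted above; the exponential model and the 2-day transfer window are assumptions made for illustration, not figures from the talk.

```python
import math

def p_host_lost_during(window_days, mean_lifetime_days=90.0):
    """Probability that a host leaves within the window,
    assuming exponentially distributed lifetimes (an assumption)."""
    return 1.0 - math.exp(-window_days / mean_lifetime_days)

# Hypothetical case: a file transfer that takes 2 days to complete.
p = p_host_lost_during(2.0)
print(f"chance a host is lost during a 2-day transfer: {p:.3f}")
```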

35 Replication ● Advantages: – Fast recovery (1 upload, 1 download) – Increase N to reduce server storage needs ● But: – High space overhead – Reliability decreases exponentially with N [Diagram labels: N, M, file, hosts]
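The "decreases exponentially with N" point can be illustrated with a small model. The slide does not define its symbols, so the sketch below assumes one plausible reading: the file is split into a number of chunks, each chunk is stored on several hosts, and each host is lost independently with a fixed probability before repair. The function and parameter names are hypothetical.

```python
def replication_survival(n_chunks, replicas, p_host_loss):
    """Probability that every chunk keeps at least one replica.
    Assumes independent host losses (an illustrative simplification)."""
    p_chunk_lost = p_host_loss ** replicas       # all replicas of a chunk gone
    return (1.0 - p_chunk_lost) ** n_chunks      # all chunks must survive

for n in (10, 100, 1000):
    print(n, round(replication_survival(n, replicas=2, p_host_loss=0.1), 4))
```

Under these assumptions the space overhead is simply the replica count, while survival shrinks geometrically as the number of chunks grows.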

36 Coding ● Divide file into N blocks, generate K additional “checksum” blocks ● Recover file from any N blocks ● Advantages: – High reliability with low space overhead ● But: – Recovering a block requires reassembling the entire file (network, space overhead) [Diagram labels: N, K]
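The simplest instance of this scheme is K = 1 with an XOR parity block: any one of the N + 1 blocks can be rebuilt from the other N. The Python sketch below shows only that special case; it is not the code used by BOINC, and tolerating more than one loss (K > 1) needs a stronger erasure code such as Reed-Solomon. The function names are illustrative.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def encode(data, n):
    """Split data into n equal blocks (zero-padded) plus one XOR parity block."""
    size = -(-len(data) // n)                    # ceiling division
    padded = data.ljust(size * n, b"\0")
    blocks = [padded[i * size:(i + 1) * size] for i in range(n)]
    return blocks + [xor_blocks(blocks)]

def recover(blocks, missing):
    """Rebuild the block at index `missing` from the other n blocks."""
    others = [b for i, b in enumerate(blocks) if i != missing]
    return xor_blocks(others)

stored = encode(b"volunteer computing", 4)       # 4 data blocks + 1 parity
rebuilt = recover(stored, 2)                     # pretend block 2 was lost
print(rebuilt == stored[2])
```

Note that rebuilding even a single block reads all N surviving blocks, which is the network and space cost the slide points out.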

37 Multi-level coding ● Divide file, encode each piece separately ● Use encoding for top-level chunks as well ● Can extend to > 2 levels [Diagram labels: N and K at each level]

38 Hybrid coding/replication ● Use multi-level coding, but replicate each bottom-level block 2 or 3X. ● Most failures will be recovered with replication ● The idea: get both the fast recovery of replication and the high reliability of coding.
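The space cost of the hybrid scheme can be estimated with a short calculation. The sketch below assumes the same N and K are used at every coding level and that only bottom-level blocks are replicated; both are simplifying assumptions, and the function name is hypothetical.

```python
def storage_overhead(n, k, levels, replication):
    """Stored bytes per byte of original data.
    Each coding level multiplies the size by (n + k) / n;
    replicating the bottom-level blocks multiplies it again."""
    return ((n + k) / n) ** levels * replication

print(storage_overhead(10, 5, levels=1, replication=1))  # plain coding
print(storage_overhead(10, 5, levels=2, replication=2))  # hybrid, 2 levels, 2X
print(storage_overhead(1, 0, levels=1, replication=3))   # plain 3X replication
```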

39 Distributed storage simulator ● Inputs: – host arrival rate, lifetime distribution, upload/download speeds, free disk space – parameters of files to be stored ● Policies that can be simulated – M-level coding, N and K coding values, R-fold replication ● Outputs – statistics of server disk space usage, network BW, “vulnerability” level

40 Multi-user projects ● Needed: – remote job submission mechanism – quota system – scheduling support for batches [Diagram labels: science portal, BOINC server, scientists (users), sysadmins, batches of jobs]

41 Quota system ● Each user has “quota” ● Batch prioritization goals: – enforce quotas over the long term – give priority to short batches – don’t starve long batches

42 Batch prioritization ● Each user has “logical start time” LST(U) ● Prioritize batches by increasing end time ● Example: [Timeline diagram: batches B1, B2, B3, B4 on a time axis, with LST(U1) and LST(U2) marked]
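One way to realize the rule above is a virtual-time scheme. The Python sketch below is a guess at the mechanics, not BOINC's actual implementation: it assumes a batch's logical end time is the user's LST plus the batch's estimated work divided by the user's quota share, and that submitting a batch advances LST(U) to that end time. The class names, the `share` and `work` fields, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class User:
    name: str
    share: float        # fraction of project throughput (hypothetical quota unit)
    lst: float = 0.0    # logical start time LST(U)

@dataclass
class Batch:
    name: str
    user: User
    work: float         # estimated compute, arbitrary units
    end_time: float = 0.0

def submit(batch, now=0.0):
    """Assign a logical end time and advance the user's LST (assumed rule)."""
    user = batch.user
    start = max(user.lst, now)
    batch.end_time = start + batch.work / user.share
    user.lst = batch.end_time
    return batch

def dispatch_order(batches):
    """Prioritize batches by increasing logical end time."""
    return sorted(batches, key=lambda b: b.end_time)

u1, u2 = User("U1", 0.5), User("U2", 0.5)
batches = [submit(Batch("B1", u1, 10)), submit(Batch("B2", u2, 2)),
           submit(Batch("B3", u2, 20)), submit(Batch("B4", u1, 5))]
print([b.name for b in dispatch_order(batches)])
```

Under this rule a short batch from a lightly used account runs ahead of long ones, while a user who keeps submitting pushes their own LST further into the future, so quotas hold over the long term and long batches still get a finite end time.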

43 Thank you!

