1
Frontiers of Volunteer Computing
David Anderson
Space Sciences Lab, UC Berkeley
30 Dec. 2011
2
High-Throughput Computing (HTC)
● Thousands or millions of separate jobs
● What matters is the rate of job completion
  – not the turnaround time of individual jobs
● Can use commodity computers
  – don't need supercomputers
3
Scientific use of HTC
● Physical simulation
  – particle physics
  – atomic/molecular (bio, nano)
  – Earth climate system
● Compute-intensive data analysis
  – LHC (particle physics)
  – LIGO (gravitational waves)
  – genomics
● Bio-inspired optimization
  – genetic algorithms, flocking, ant colony optimization, etc.
4
Measures of computing throughput
● Floating-point operations (FLOP)
  – benchmarks: Linpack, Whetstone
● GigaFLOPS (10^9/sec): 1 PC
● TeraFLOPS (10^12/sec): 1 GPU
● PetaFLOPS (10^15/sec): supercomputer
● ExaFLOPS (10^18/sec): the future
5
Approaches to HPC
● Cluster computing
  – commodity or rack-mount PCs in one room
● Grid computing
  – sharing of clusters among organizations
● Cloud computing
  – rent cluster nodes, e.g. Amazon EC2
● Volunteer computing
  – PC owners donate use of their resources
6
Computing capacity
● Cluster: 1,000 nodes = ~10 TeraFLOPS
● Grid: largest is ~100,000 nodes
● Cloud: Amazon ~100,000 nodes; ~1 PetaFLOPS
● Volunteer (actual):
  – 700,000 PCs, 100,000 with GPUs; 12 PetaFLOPS
● Volunteer (current potential):
  – 1.5 billion PCs: 100 ExaFLOPS
  – 5 billion mobile devices
7
Cost (for 10 TeraFLOPS/year)
● Cluster: $1.5M
● Amazon EC2 (5,000 instances): $4M
● Volunteer: ~$0.1M
8
Energy
All computing uses energy, but:
● In cold climates, volunteer computing replaces conventional heating
● GPUs are 10x more efficient than CPUs
● Mobile device CPUs are 10x more efficient
9
Volunteer computing with BOINC
[Diagram: volunteers on one side, projects (CPDN, LHC@home, WCG) on the other, connected by "attachments"]
10
How to volunteer
11
Choose projects
12
Configure
13
Graphical interfaces
14
Community
15
Creating a BOINC project
● Install BOINC server software on a Linux box
● Compile apps for Windows/Mac/Linux
● Attract volunteers
  – develop a web site
  – generate publicity
  – communicate with volunteers
16
Some projects
● CAS@home
● IBM World Community Grid
● Einstein@home
● Climateprediction.net
● LHC@home
● Rosetta@home
17
Fundamental problems of volunteer computing
● Heterogeneity
  – need to compile apps for Windows and Mac
  – portability is hard even on Linux
● Security
  – currently: account-based sandboxing
  – not enough for untrusted apps
Virtual machine technology can solve both.
18
Virtual machines
[Diagram: an application running directly on the operating system]
19
Virtual machines
[Diagram: the application runs on a guest operating system, which runs on the host operating system]
20
Virtual machines
[Diagram: example of an application on a Debian Linux 2.6 guest, hosted on Windows 7]
21
VirtualBox: a VM system
● Open source (owned by Oracle)
● Rich feature set
● Low runtime overhead
● Easy to install
22
Process structure
[Diagram: the BOINC client talks to vboxwrapper via shared-memory message passing; vboxwrapper controls the VirtualBox daemon via its command-line tool; vboxwrapper communicates with the VM instance via file-based communication, as sketched below]
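As an illustration of the file-based channel in this diagram, here is a minimal sketch of how a wrapper might poll a status file that the guest writes into a shared directory. This shows the general idea only; the directory name, file name, and JSON format are assumptions, not vboxwrapper's actual protocol.

```python
import json
import pathlib
import time

SHARED = pathlib.Path("shared")  # directory exported into the VM (name is illustrative)

def poll_guest_status(timeout_s: float = 60.0) -> dict:
    """Poll for a status file the guest writes into the shared directory.

    A sketch of "file-based communication" between wrapper and VM;
    the file name and format are assumptions for illustration.
    """
    status_file = SHARED / "status.json"
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        if status_file.exists():
            return json.loads(status_file.read_text())
        time.sleep(1.0)
    raise TimeoutError("guest did not report status")
```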
23
VM advantages
● Applications run in the developer's favorite environment (OS, libraries)
  – no need for multiple versions
● A VM is a strong "sandbox"
  – an application running in the VM can't access the host OS
  – can run untrusted applications
24
Volunteer storage
● A modern PC has a ~1 TB disk
● 1M PCs × 100 GB = 100 Petabytes
● Amazon: $120 million/year
25
BOINC storage architecture
[Diagram: storage applications (dataset storage, data archival, data stream buffering, locality scheduling) layered on top of the BOINC file management infrastructure]
26
Data archival
● Goals:
  – store large files for long periods
  – arbitrarily high reliability
27-33
Recovery in volunteer storage
[Animation across slides 27-33: the file's data starts on the server and is downloaded to client replicas; a client fails (marked X) and its replica is lost; the server collects the data from surviving clients and downloads it to a new client, restoring the replication level]
34
Volunteer storage issues
● High churn rate of hosts
  – ~90-day mean lifetime
● High latency of file transfers
  – hours or days
● Modeling volunteer storage systems
  – overlapping failure and recovery
  – server storage and bandwidth may be bottlenecks
35
Replication
● Advantages:
  – fast recovery (1 upload, 1 download)
  – increase N to reduce server storage needs
● But:
  – high space overhead
  – reliability decreases exponentially with N
[Diagram: a file replicated onto N of M available hosts]
36
Coding
● Divide the file into N blocks and generate K additional "checksum" blocks; the file can be recovered from any N of the N+K blocks (sketched below)
● Advantages:
  – high reliability with low space overhead
● But:
  – recovering a block requires reassembling the entire file (network and space overhead)
[Diagram: N data blocks plus K checksum blocks]
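To make the scheme concrete, here is a minimal runnable sketch of the simplest case, K = 1: the single checksum block is the bitwise XOR of the N data blocks, so any one lost block can be rebuilt from the remaining N. Real systems use Reed-Solomon-style codes to survive K > 1 losses; all names below are illustrative.

```python
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, n: int) -> list:
    """Split data into n equal (padded) blocks, then append one XOR
    parity block. The XOR of all n+1 blocks is zero by construction."""
    size = -(-len(data) // n)  # ceiling division
    blocks = [data[i * size:(i + 1) * size].ljust(size, b"\0") for i in range(n)]
    return blocks + [reduce(xor, blocks)]

def recover(blocks: list) -> list:
    """Rebuild the single missing block (marked None): it equals the
    XOR of the n surviving blocks."""
    missing = blocks.index(None)
    blocks[missing] = reduce(xor, [b for b in blocks if b is not None])
    return blocks

# Lose any one of the n+1 stored blocks and rebuild it.
stored = encode(b"volunteer storage example payload", n=4)
stored[2] = None  # a volunteer host disappears
restored = recover(stored)
assert b"".join(restored[:4]).rstrip(b"\0") == b"volunteer storage example payload"
```

With K = 1 the space overhead is a single block, but, as the slide notes, rebuilding even one block touches every other block; that cost is what the multi-level scheme on the next slide reduces.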
37
Multi-level coding
● Divide the file and encode each piece separately
● Use encoding for the top-level chunks as well
● Can extend to more than 2 levels
[Diagram: two-level encoding, with N data and K checksum blocks at each level]
38
Hybrid coding/replication
● Use multi-level coding, but replicate each bottom-level block 2 or 3x
● Most failures are then recovered via replication
● The idea: get both the fast recovery of replication and the high reliability of coding (see the sketch below)
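A minimal sketch of the recovery decision this hybrid implies (structure and names are illustrative, not BOINC code): take the cheap replication path whenever any replica of the block survives, and fall back to erasure decoding only when all replicas are gone.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Replica:
    alive: bool
    data: bytes = b""

def recover_block(replicas: list, decode_from_group: Callable[[], bytes]) -> bytes:
    """Fast path: copy the block from any surviving replica (one
    download). Slow path: reassemble its coding group and decode."""
    for r in replicas:
        if r.alive:
            return r.data
    return decode_from_group()

# With 2-3 replicas per block, the expensive decode is rarely needed.
block = recover_block([Replica(False), Replica(True, b"chunk")],
                      decode_from_group=lambda: b"rebuilt from coding group")
assert block == b"chunk"
```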
39
Distributed storage simulator
● Inputs:
  – host arrival rate, lifetime distribution, upload/download speeds, free disk space
  – parameters of the files to be stored
● Policies that can be simulated:
  – M-level coding, N and K coding values, R-fold replication
● Outputs:
  – statistics of server disk space usage, network bandwidth, and "vulnerability" level
A toy version of the core loop follows.
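Here is such a toy simulator for the simplest policy, R-fold replication of a single file. The exponential lifetime distribution (matching the ~90-day mean quoted earlier) and the fixed repair latency are modeling assumptions of mine; the slide only lists the simulator's inputs and outputs.

```python
import heapq
import random

def simulate(R=3, mean_lifetime=90.0, repair_days=1.0, horizon=3650.0, seed=1):
    """Toy discrete-event model of one R-fold replicated file.

    Assumes exponential host lifetimes and that a lost replica is
    re-created on a fresh host `repair_days` after the failure
    (upload + download latency). Returns (lost, vulnerable_frac):
    whether all replicas ever died at once, and the fraction of time
    spent below full replication.
    """
    rng = random.Random(seed)
    events = [(rng.expovariate(1 / mean_lifetime), "death") for _ in range(R)]
    heapq.heapify(events)
    live, now, below = R, 0.0, 0.0
    while events:
        t, kind = heapq.heappop(events)
        t = min(t, horizon)
        if live < R:
            below += t - now  # the file was "vulnerable" during [now, t]
        now = t
        if now >= horizon:
            break
        if kind == "death":
            live -= 1
            if live == 0:
                return True, below / now  # all replicas gone: file lost
            heapq.heappush(events, (now + repair_days, "repair"))
        else:  # repair completes: a new replica with its own lifetime
            live += 1
            heapq.heappush(events, (now + rng.expovariate(1 / mean_lifetime), "death"))
    return False, below / horizon

# 100 independent ten-year runs with 3-fold replication:
losses = sum(simulate(R=3, seed=s)[0] for s in range(100))
print(f"files lost in 100 runs: {losses}")
```

Sweeping R, repair_days, and mean_lifetime in such a loop gives exactly the kind of vulnerability statistics the slide describes, without deploying anything.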
40
Multi-user projects
● Needed:
  – remote job submission mechanism
  – quota system
  – scheduling support for batches
[Diagram: scientists (users) submit batches of jobs through a science portal to the BOINC server, which sysadmins operate]
41
Quota system
● Each user has a "quota"
● Batch prioritization goals:
  – enforce quotas over the long term
  – give priority to short batches
  – don't starve long batches
42
Batch prioritization
● Each user U has a "logical start time" LST(U)
● Prioritize batches by increasing logical end time (see the sketch below)
[Diagram: a timeline showing batches B1-B4 for two users, with LST(U1) and LST(U2) marked]
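One plausible implementation of this rule (a reconstruction, not BOINC's actual code; in particular, advancing LST by the batch's expected runtime divided by the user's quota share is my assumption):

```python
from dataclasses import dataclass

@dataclass
class Batch:
    user: str
    expected_cpu_days: float
    logical_end: float = 0.0

class BatchScheduler:
    """Logical-start-time batch prioritization. The advance rule
    (expected runtime divided by the user's quota share) is an
    assumption for illustration."""

    def __init__(self, quotas: dict):
        self.quotas = quotas                 # user -> share of the system (0..1)
        self.lst = {u: 0.0 for u in quotas}  # logical start time per user

    def submit(self, batch: Batch, now: float) -> None:
        start = max(self.lst[batch.user], now)
        # Small-share users are pushed further into the logical
        # future, which enforces quotas over the long term.
        batch.logical_end = start + batch.expected_cpu_days / self.quotas[batch.user]
        self.lst[batch.user] = batch.logical_end

    def order(self, batches: list) -> list:
        # Earliest logical end first: short batches jump ahead, but a
        # long batch's end time was fixed at submission, so a stream
        # of later short batches cannot starve it.
        return sorted(batches, key=lambda b: b.logical_end)

# Example: equal quotas; bob's short batch runs before alice's long one.
sched = BatchScheduler({"alice": 0.5, "bob": 0.5})
long_batch = Batch("alice", expected_cpu_days=10.0)
short_batch = Batch("bob", expected_cpu_days=1.0)
sched.submit(long_batch, now=0.0)
sched.submit(short_batch, now=0.0)
print([b.user for b in sched.order([long_batch, short_batch])])  # ['bob', 'alice']
```

This single mechanism meets all three goals on the previous slide: quotas are enforced through the LST advance, short batches get early logical end times, and long batches keep the end time assigned at submission.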
43
Thank you!