The Computational and Storage Potential of Volunteer Computing
David P. Anderson (UC Berkeley), Gilles Fedak (INRIA)
Volunteer computing
● Projects and volunteers (trust / distrust)
● ≠ (desktop) Grid computing, P2P, etc.
Volunteer computing history

Project          Start  Where        Area            Peak #hosts
GIMPS            1994                math            10,000
distributed.net  1995                cryptography    100,000
… I              1999   UCB          SETI            600, ,000
United Devices   2002   commercial   biomedicine     200,000
CPDN             2003   Oxford       climate change  150,000
…                …      …            …               60, ,000
WCG              2004   commercial   biomedicine     200, ,000
… II             2005   UCB          SETI            850,000
…                …      Wash         biology         100,000
SIMAP            2005   T.U. Munich  bioinformatics  10,000

What is it good for?
● Throughput-oriented computing
● Computing with (soft) deadlines
● Distributed storage
  – capacity/reliability/throughput/latency
● Computing with large RAM or disk needs
● Data-intensive computing
Limiting factors
● Hardware
  – CPU, memory, disk, network
● Availability
  – Powered on? Connected? Enabled?
  – Higher-priority usage
  – Host churn
● User preferences
  – Compute only when idle, time-of-day restrictions, disk limitations, etc.
Hardware measurements
● BOINC core client measures host hardware and availability
● Results are stored in server database
● We studied hosts participating in … during the week of Feb. 4-10, …
● We didn't study change over time
● Data is available online
CPU performance
● Gross capacity: 535 TFLOPS
Processor type
RAM
● Didn't measure memory bandwidth (important)
Free disk space (total: 12 PB)
Network throughput (download)
Processing versus RAM
Host participation lifetime
Host availability
● BOINC is running: 81% of the time
● Connected to network: 83% of the time BOINC is running
● Active (able to compute): 84% of the time BOINC is running
● CPU efficiency (CPU time/wall time): 90%
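
The connected and active figures above are conditional on BOINC running, so they must be multiplied by the 81% figure to get fractions of total wall-clock time. A minimal Python sketch of that arithmetic, using only the numbers on this slide (the variable names are illustrative, not BOINC identifiers):

```python
# Figures taken from the slide; variable names are illustrative only.
boinc_running = 0.81            # fraction of wall time BOINC is running
connected_given_running = 0.83  # connected, as a fraction of running time
active_given_running = 0.84     # able to compute, as a fraction of running time

# Convert the conditional fractions into fractions of total wall time.
connected_overall = boinc_running * connected_given_running
active_overall = boinc_running * active_given_running

print(f"connected: {connected_overall:.1%} of total time")
print(f"active:    {active_overall:.1%} of total time")
```

Under this reading, an average host is connected roughly 67% and able to compute roughly 68% of all wall-clock time.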
User preferences
● Run if active: 72% yes
● Confirm before connecting (modem): 8.4% yes
● Max disk usage: 63GB or 42% of total space
● 17% participate in multiple projects
Net capacity
● Hardware × availability × preferences
● CPU
  – Gross: 535 TFLOPS
  – Net: 150 TFLOPS
● Disk: 42% of 12 PB (about 5 PB)
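
The "hardware × availability × preferences" product can be sketched as a chain of fractional factors applied to a gross figure. This is a minimal illustration, not the authors' calculation: `net_capacity` is a hypothetical helper, and the CPU factors used are only the availability figures from the host availability slide (81% running, 84% active, 90% CPU efficiency).

```python
def net_capacity(gross, *factors):
    """Scale a gross capacity by a sequence of fractions in [0, 1]."""
    net = gross
    for factor in factors:
        if not 0.0 <= factor <= 1.0:
            raise ValueError(f"factor out of range: {factor}")
        net *= factor
    return net

# Disk: 12 PB free in total, of which user preferences allow 42%.
disk_net_pb = net_capacity(12.0, 0.42)

# CPU: gross 535 TFLOPS scaled by the availability figures only.
# This is an upper bound under the stated assumptions, not the slide's net figure.
cpu_upper_tflops = net_capacity(535.0, 0.81, 0.84, 0.90)

print(f"disk: {disk_net_pb:.2f} PB")
print(f"CPU upper bound: {cpu_upper_tflops:.0f} TFLOPS")
```

The disk result (about 5 PB) matches the slide. The CPU product of these three factors alone is about 328 TFLOPS, well above the 150 TFLOPS net figure, so the slide's number presumably reflects further factors that are not itemized here.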
Data-intensive apps
● Data rate: MB per CPU-hour
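
To make the MB-per-CPU-hour metric concrete, the sketch below estimates the largest input data rate a single host could sustain, given its download throughput and how much CPU time it delivers. The function name, its parameters, and the example host are all hypothetical; none of the inputs are measurements from this study.

```python
def max_mb_per_cpu_hour(download_kb_per_s, connected_fraction, cpu_hours_per_wall_hour):
    """Upper bound on input data (MB) a host can download per CPU-hour of work.

    download_kb_per_s:       sustained download throughput in KB/s
    connected_fraction:      fraction of wall time the host is connected
    cpu_hours_per_wall_hour: CPU-hours of computation delivered per wall-clock hour
    """
    if cpu_hours_per_wall_hour <= 0:
        raise ValueError("host must deliver some CPU time")
    # KB/s -> MB per wall-clock hour (1 MB = 1000 KB), scaled by connectivity.
    mb_per_wall_hour = download_kb_per_s * 3600.0 / 1000.0 * connected_fraction
    return mb_per_wall_hour / cpu_hours_per_wall_hour

# Hypothetical host: 100 KB/s download, always connected,
# delivering one CPU-hour per wall-clock hour.
example = max_mb_per_cpu_hour(100.0, 1.0, 1.0)
print(f"{example:.0f} MB per CPU-hour")
```

An application whose jobs need more input per CPU-hour than this bound would leave such a host waiting on the network rather than computing.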
Conclusion
● Volunteer computing
  – Works well for throughput-oriented computing
  – May work well for a range of computing types
● More analysis needed for specific cases
● Need to extend BOINC to realize potential in some cases
● Number of hosts
  – This study: 330,000
  – Potential: millions? Tens/hundreds of millions?