Volunteer Computing
  David P. Anderson
  Space Sciences Lab, U.C. Berkeley
  May 7, 2008
Where’s the computing power?
  Individuals (~1 billion PCs): the target of volunteer computing
  Companies (~100M PCs)
  Government (~50M PCs)
A brief history of volunteer computing
  Projects
    distributed.net, GIMPS
    Climateprediction.net
    Einstein, IBM World Community Grid
  Platforms
    Popular Power
    Entropia
    United Devices, Parabon
    BOINC
The BOINC project
  Based at UC Berkeley Space Sciences Lab
  Funded by NSF since 2002
  Personnel
    director: David Anderson
    other employees: 1.5 programmers
    lots of volunteers
  What we do
    develop open-source software
    enable online communities
  What we don’t do
    branding, hosting, authorizing, endorsing, controlling
The BOINC community
  Projects
  Volunteer programmers
  Alpha testers
  Online Skype-based help
  Translators (web, client)
  Documentation (Wiki)
  Teams
The BOINC model
  Your PC (attaches to BOINC-based projects)
    Simple (but configurable)
    Secure
    Invisible
  BOINC-based projects
    Climateprediction.net: Oxford; climate study
    U. of Washington; biology
    MalariaControl.net: STI; malaria epidemiology
    World Community Grid: IBM; several applications
    ...
    Independent
    No central authority
    Unique ID: URL
The volunteer computing ecosystem
  Projects to public: teach, motivate
  Public to projects: volunteer
  Outcomes
    Do more science
    Involve public in science
Participation and computing power
  BOINC
    330K active participants
    580K computers
    ~40 projects
    1.2 PetaFLOPS average throughput (about 3X an IBM Blue Gene/L)
  Non-BOINC
    200K active participants
    1.4 PetaFLOPS (mostly PS3)
Cost per TeraFLOPS-year
  Cluster: $124K
  Amazon EC2: $1.75M
  BOINC: $2K
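To put the three figures side by side, here is a small sketch that expresses each cost as a multiple of the BOINC cost. The dollar amounts are the ones on the slide; the dictionary and function names are made up for illustration.

```python
# Cost per TeraFLOPS-year in US dollars, as quoted on the slide.
COST_PER_TFLOPS_YEAR = {
    "Cluster": 124_000,
    "Amazon EC2": 1_750_000,
    "BOINC": 2_000,
}


def cost_ratio_vs_boinc(costs):
    """Return how many times more each option costs than BOINC."""
    boinc = costs["BOINC"]
    return {name: cost / boinc for name, cost in costs.items()}


if __name__ == "__main__":
    for name, ratio in cost_ratio_vs_boinc(COST_PER_TFLOPS_YEAR).items():
        print(f"{name}: {ratio:g}x the BOINC cost")
```

With these inputs the cluster comes out at 62x and EC2 at 875x the BOINC figure.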
The road to ExaFLOPS
  CPUs in PCs (desktop, laptop)
    1 ExaFLOPS = 50M PCs x 80 GFLOPS x 0.25 avail.
  GPUs
    1 ExaFLOPS = 4M x 1 TFLOPS x 0.25 avail.
  Video-game consoles (PS3, Xbox)
    0.25 ExaFLOPS = 10M x 100 GFLOPS x 0.25 avail.
  Mobile devices (cell phone, PDA, iPod, Kindle)
    0.05 ExaFLOPS = 1B x 100 MFLOPS x 0.5 avail.
  Home media (cable box, Blu-ray player)
    0.1 ExaFLOPS = 100M x 1 GFLOPS x 1.0 avail.
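Each estimate above is the same product: number of devices x FLOPS per device x fraction of time available. The sketch below recomputes the five figures from the slide's inputs; the table layout and function name are illustrative choices, not part of BOINC.

```python
EXA = 1e18  # FLOPS in one ExaFLOPS

# (device class, number of devices, FLOPS per device, availability fraction)
# All numbers are taken from the slide.
ESTIMATES = [
    ("CPUs in PCs", 50e6, 80e9, 0.25),
    ("GPUs", 4e6, 1e12, 0.25),
    ("Video-game consoles", 10e6, 100e9, 0.25),
    ("Mobile devices", 1e9, 100e6, 0.5),
    ("Home media", 100e6, 1e9, 1.0),
]


def exaflops(count, flops_per_device, availability):
    """Aggregate throughput in ExaFLOPS for one device class."""
    return count * flops_per_device * availability / EXA


if __name__ == "__main__":
    for name, count, flops, avail in ESTIMATES:
        print(f"{name}: {exaflops(count, flops, avail):.2f} ExaFLOPS")
```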
But it’s not about numbers
  The real goals:
    enable new computational science
    change the way resources are allocated
    avoid return to the Dark Ages
  And that means we must:
    make volunteer computing feasible for all scientists
    involve the entire public, not just the geeks
    solve the “project discovery” problem
  Progress towards these goals: nonzero but small
BOINC server software
  Goals
    high performance (10M jobs/day)
    scalability
  [Diagram: server components]
    MySQL DB (~1M jobs)
    feeder
    shared memory (~1K jobs)
    scheduler (CGI)
    Clients
    Various daemons
Database tables
  Application
  Platform
    Win32, Win64, Linux x86, Java, etc.
  App version
  Job
    resource usage estimates, bounds
    latency bound
    input file descriptions
  Job instance
    output file descriptions
  Account, team, etc.
Data model
  Files
    have both logical and physical names
    immutable (per physical name)
    may originate on client or server
    may be “sticky”
    may be compressed in various ways
    transferred via HTTP or BitTorrent
    app files must be signed
  Upload/download directory hierarchies
Submitting jobs
  Create XML description
    input, output files
    resource usage estimates, bounds
    latency bound
  Put input files into dir hierarchy
  Call create_work()
    creates DB record
  Mass production
    bags of tasks
    flow-controlled stream of tasks
    self-propagating computations
    trickle messages
Server scheduling policy
  Request message:
    platform(s)
    description of hardware
      CPU, memory, disk, coprocessors
    description of availability
    current jobs queued and in progress
    work request (CPU seconds)
  Send a set of jobs that
    are feasible (will fit in memory/disk)
    will probably get done by deadline
    satisfy the work request
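The three selection criteria can be sketched as a simple greedy filter. This is an illustration only, not the BOINC scheduler: the `Job` and `Request` classes, their field names, and the completion-time estimate (backlog divided by availability) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    ram_bytes: float
    disk_bytes: float
    cpu_seconds: float    # estimated CPU time
    latency_bound: float  # seconds allowed before the result is due


@dataclass
class Request:
    ram_bytes: float
    disk_bytes: float
    availability: float        # fraction of time the host computes (0..1)
    queued_cpu_seconds: float  # jobs already queued or in progress
    work_request: float        # CPU seconds the client asks for


def choose_jobs(request, jobs):
    """Greedily pick feasible jobs until the work request is satisfied."""
    if request.availability <= 0:
        return []
    chosen = []
    disk_left = request.disk_bytes
    backlog = request.queued_cpu_seconds
    granted = 0.0
    for job in jobs:
        if granted >= request.work_request:
            break  # work request satisfied
        # Feasible: fits in memory and remaining disk.
        if job.ram_bytes > request.ram_bytes or job.disk_bytes > disk_left:
            continue
        # Will probably get done by deadline (assumed estimate).
        finish = (backlog + job.cpu_seconds) / request.availability
        if finish > job.latency_bound:
            continue
        chosen.append(job)
        disk_left -= job.disk_bytes
        backlog += job.cpu_seconds
        granted += job.cpu_seconds
    return chosen
```

A job that is skipped for one host stays available for another host with more memory or a shorter queue.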
Application platform
  Multithread and coprocessor support
  [Diagram: client and scheduler]
    client to scheduler: list of platforms, coprocessors, #CPUs
    scheduler to client: jobs
    app planning function: avg/max #CPUs, coprocessor usage, command line
    other labels: app versions; platform; app version; job
Result validation
  Problem: can’t trust volunteers
    computational result
    claimed credit
  Approaches:
    Application-specific checking
    Job replication
      do N copies, require that M of them agree
    Adaptive replication
    Spot-checking
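The "M of N agree" rule can be sketched in a few lines. This is a minimal illustration, assuming results can be compared for exact equality and are hashable; the function name and the `min_quorum` parameter are hypothetical, not BOINC identifiers.

```python
from collections import Counter


def canonical_result(results, min_quorum):
    """Return the value reported by at least min_quorum instances, else None.

    `results` holds the outputs of the N job instances received so far.
    """
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= min_quorum else None
```

For example, `canonical_result(["a", "a", "b"], 2)` returns `"a"`, while three mutually different results with a quorum of 2 return `None`, which would mean more instances are needed.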
How to compare results?
  Problem: numerical discrepancies
  Stable problems: fuzzy comparison
  Unstable problems
    Eliminate discrepancies
      compiler/flags/libraries
    Homogeneous replication
      send instances only to numerically equivalent hosts (equivalence may depend on app)
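For stable problems, a fuzzy comparison might look like the sketch below: two results agree if every pair of values is within a tolerance. The tolerance values are arbitrary placeholders; a real project would choose them per application.

```python
import math


def results_agree(a, b, rel_tol=1e-5, abs_tol=1e-9):
    """Return True if two result vectors match within the given tolerances."""
    if len(a) != len(b):
        return False
    return all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(a, b)
    )
```

This does not help with unstable problems, where small discrepancies grow into completely different outputs; that is the case the homogeneous replication bullet addresses.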
Server scheduling policy revisited
  Goals (possibly conflicting):
    Send retries to fast/reliable hosts
    Send long jobs to fast hosts
    Send demanding jobs (RAM, disk, etc.) to qualified hosts
    Send jobs already committed to a homogeneous redundancy class
  Project-defined “score” function
    scan N jobs, send those with highest scores
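The score-based policy can be sketched as "scan the first N candidates, keep the top few". The code below is an assumption-laden illustration: the dictionary fields, the weights in `example_score`, and the `scan_limit` / `send_limit` parameters are invented for the example and are not BOINC's actual interface.

```python
import heapq


def example_score(job, host):
    """Hypothetical project-defined score: higher means a better match."""
    score = 0.0
    if job["is_retry"] and host["reliable"]:
        score += 2.0  # send retries to reliable hosts
    if job["long"] and host["fast"]:
        score += 1.0  # send long jobs to fast hosts
    return score


def pick_jobs(jobs, host, score, scan_limit, send_limit):
    """Scan the first scan_limit jobs and return the send_limit best by score."""
    candidates = jobs[:scan_limit]
    return heapq.nlargest(send_limit, candidates, key=lambda job: score(job, host))
```

Scanning only N jobs bounds the cost per request; a larger N gives the score function more choice at the price of more work per scheduler call.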
Server daemons
  Per application:
    work generator
    validator
    assimilator
  Transitioner
    manages replication, creates job instances
    triggers other daemons
  File deleter
  DB purger
Ways to create a BOINC server
  Install BOINC on a Linux box
    lots of software dependencies
  Run BOINC server VM (VMware)
    need to worry about hardware
  Run BOINC server VM on Amazon EC2
BOINC API
  Typical application structure:
    boinc_init()
    loop...
      boinc_fraction_done(x)
      if boinc_time_to_checkpoint()
        write checkpoint file
        boinc_checkpoint_completed()
    boinc_finish(0)
  Graphics
  Multi-program apps
  Wrapper for legacy apps
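The checkpoint/restart pattern behind the typical application structure can be tried out without BOINC. The sketch below is plain Python and does not call the BOINC API: the fixed `checkpoint_every` interval stands in for `boinc_time_to_checkpoint()`, the atomic file replace stands in for the "write checkpoint file" and `boinc_checkpoint_completed()` steps, and `stop_after` is a hypothetical knob that simulates the client stopping the app mid-run.

```python
import json
import os


def run(total_steps, checkpoint_path, checkpoint_every=100, stop_after=None):
    """Sum 0..total_steps-1, resuming from checkpoint_path if it exists.

    Returns the sum when finished, or None if interrupted by stop_after.
    """
    step, total = 0, 0
    if os.path.exists(checkpoint_path):
        # Restart: pick up where the last checkpoint left off.
        with open(checkpoint_path) as f:
            state = json.load(f)
        step, total = state["step"], state["total"]

    done_this_run = 0
    while step < total_steps:
        if stop_after is not None and done_this_run >= stop_after:
            return None  # simulated interruption; work since last checkpoint is lost
        total += step
        step += 1
        done_this_run += 1
        if step % checkpoint_every == 0:
            # Write to a temp file, then rename, so a crash never leaves
            # a half-written checkpoint behind.
            tmp_path = checkpoint_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump({"step": step, "total": total}, f)
            os.replace(tmp_path, checkpoint_path)
    return total
```

Running it once with `stop_after` set and then again without it produces the same final sum as an uninterrupted run, which is the property a volunteer-computing app needs because the host can stop it at any time.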
Volunteer’s view
  1-click install
  All platforms
  Invisible, autonomic
  Highly configurable (optional)
BOINC client structure
  [Diagram: client components and connections]
    core client
    application, BOINC library
    Runtime system
    GUI, screensaver
    local TCP
    schedulers, data servers
    user preferences, control
Some BOINC projects
  Climateprediction.net
    Oxford University
    Global climate modeling
  LIGO scientific collaboration
    gravitational wave detection
  U.C. Berkeley
    Radio search for E.T.I. and black hole evaporation
  Leiden Classical
    Leiden University
    Surface chemistry using classical dynamics
More projects
  CERN
    simulator of LHC, collisions
  Univ. of Muenster
    Quantum chemistry
  Bielefeld Univ.
    Study nanoscale magnetism
  Leiden Univ.
    Number theory
Biomed-related BOINC projects
  University of Washington
    Rosetta: Protein folding, docking, and design
  Tanpaku
    Tokyo Univ. of Science
    Protein structure prediction using Brownian dynamics
  MalariaControl
    The Swiss Tropical Institute
    Epidemiological simulation
More projects
  Univ. of Michigan
    CHARMM, protein structure prediction
  SIMAP
    Tech. Univ. of Munich
    Protein similarity matrix
  Technion
    Genetic linkage analysis using Bayesian networks
  Quake Catcher Network
    Stanford
    Distributed seismograph
More projects (IBM WCG)
  Dengue fever drug discovery
    U. of Texas, U. of Chicago
    Autodock
  Human Proteome Folding
    New York University
    Rosetta
  Scripps Institute
    Autodock
Organizational models
  Single-scientist projects: a dead-end?
  Campus-level meta-project
    UC Berkeley:
      1,000 instructional PCs
      5,000 faculty/staff
      30,000 students
      400,000 alumni
  Lattice
    U. Maryland Center for Bioinformatics
  MindModeling.org
    ACT-R community (~20 universities)
  IBM World Community Grid
    ~8 applications from various institutions
  Extremadura (Spain)
    consortium of 5-10 universities
  SZTAKI (Hungary)
Conclusion
  Individuals (~1 billion PCs): the target of volunteer computing
  Companies (~100M PCs)
  Government (~50M PCs)
  Contact me about:
    Using BOINC
    Research based on BOINC