Volunteer Computing Involving the World in Science David P. Anderson Space Sciences Lab U.C. Berkeley 13 December 2007
Outline Science needs more computing power What is volunteer computing? How BOINC works Projects using BOINC Future directions
Simulation of physical systems Biology Climate study Cosmology
Data analysis Physics Astronomy
Genetic algorithms and other new computational paradigms
Parallel computing
Suppose you need 100 years of computing
  1 CPU: 100 years
  1,000 CPUs: 36 days
  1,000,000 CPUs: 1 hour
Types of parallelism
  CPUs on one chip (multi-core)
  CPUs in one box (supercomputers)
  CPUs in one room (cluster computing)
  CPUs owned by allied organizations (Grid computing)
  Any CPU, anywhere (volunteer computing)
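The figures on this slide follow from dividing the serial time by the number of CPUs. A minimal sketch, assuming the work splits perfectly with no communication or scheduling overhead (an idealization; the function name is illustrative):

```python
HOURS_PER_YEAR = 365 * 24  # leap years ignored


def wall_clock_hours(cpu_years: float, n_cpus: int) -> float:
    """Ideal wall-clock time in hours when the work divides evenly over n_cpus."""
    return cpu_years * HOURS_PER_YEAR / n_cpus


for n in (1, 1_000, 1_000_000):
    hours = wall_clock_hours(100, n)
    print(f"{n:>9,} CPUs: {hours:,.2f} hours ({hours / 24:,.1f} days)")
```

With 1,000 CPUs this gives 36.5 days, and with 1,000,000 CPUs just under one hour, matching the rounded values on the slide.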
Where’s the computing power?
  owned by individuals (~1 billion)
  owned by companies (~100M)
  owned by government (~50M)
Goals of volunteer computing
  give science access to maximal computer power
  allocate resources based on merit, not money
A brief history of volunteer computing Projects Platforms distributed.net, GIMPS Popular Power Entropia United Devices, Parabon BOINC Climateprediction.net Einstein, IBM World Community Grid
The BOINC volunteer/project model Accounts PC Attachments Resource shares 40% 60% Volunteers Projects IBM WCG Climateprediction.net
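A resource share can be read as the fraction of a PC's computing time that each attached project receives. The sketch below is a simplified illustration, not BOINC's actual client scheduler; the function name is hypothetical, and which project gets the 40% share and which gets 60% is an arbitrary choice for the example.

```python
def split_by_resource_share(shares: dict[str, float], cpu_hours: float) -> dict[str, float]:
    """Divide cpu_hours among projects in proportion to their resource shares."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one project needs a positive resource share")
    return {project: cpu_hours * share / total for project, share in shares.items()}


# Assumed pairing of shares to projects, for illustration only.
shares = {"Climateprediction.net": 40, "IBM WCG": 60}
print(split_by_resource_share(shares, cpu_hours=10.0))
```

Normalizing by the total means shares do not have to add up to 100; attaching a third project simply changes every project's fraction.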
The volunteer computing game Internet Projects Volunteers Do more science Involve public in science
Participation and computing power 500K active participants, 700K computers ~40 projects Computing power: about 2 PetaFLOPS That’s about 10X an IBM Blue Gene/L ($300M)
Cost per TeraFLOPS-year
Cluster (6.8 TeraFLOPS)
  power and A/C: $750K
  network hardware: $175K
  computing hardware (780 nodes): $1000K
  storage (300 TB RAID-6): $250K
  power: $140K/year
  sysadmin: $150K/year
  total: $124K/year
Amazon EC2: $1.75M/year
Average BOINC project: $2K/year
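The slide does not say how the cluster's one-time costs are spread over time. As a rough check, the sketch below amortizes them linearly over a fixed number of years and adds the yearly costs; the four-year period and the function name are assumptions, not something the slide states.

```python
def cost_per_tflops_year(capital_k: dict[str, float],
                         yearly_k: dict[str, float],
                         tflops: float,
                         amortization_years: float) -> float:
    """Yearly cost in $K per TeraFLOPS, with capital spread evenly over the period."""
    yearly_total = sum(capital_k.values()) / amortization_years + sum(yearly_k.values())
    return yearly_total / tflops


# One-time and recurring costs from the slide, in thousands of dollars.
capital_k = {
    "power and A/C": 750,
    "network hardware": 175,
    "computing hardware": 1000,
    "storage": 250,
}
yearly_k = {"power": 140, "sysadmin": 150}

# Assumption: 4-year amortization (not given in the source).
print(round(cost_per_tflops_year(capital_k, yearly_k, tflops=6.8, amortization_years=4), 1))
```

Under this assumption the result is roughly $123K per TeraFLOPS-year, close to the slide's $124K total; a different amortization period or accounting method would shift the number.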
Volunteer computing ≠ Grid computing
Resource owners
  Volunteer: anonymous, unaccountable; need to check results
  Grid: identified, accountable
Managed systems?
  Volunteer: no (need plug & play software)
  Grid: yes (software stack requirements OK)
Clients behind firewall?
  Volunteer: yes (pull model)
  Grid: no (push model)
ISP bill?
  Volunteer: yes
  Grid: no
... nor is it “peer-to-peer computing”
The BOINC project Location: UC Berkeley Space Sciences Lab Personnel director: David Anderson other employees: 1.5 programmers lots of volunteers Funding supported by NSF since 2002 current grant runs through Aug 2010
What the BOINC project does We develop software for volunteer computing We enable on-line communities What we don’t do: branding, hosting, authorizing, endorsing, controlling
BOINC software
Distributed under LGPL license
Server side uses Linux, Apache, MySQL, PHP
  Job distribution: C++, 20K lines
  Web features: PHP, 30K lines
Client side uses wxWidgets, OpenGL
  Client: C++, 30K lines
  GUI: C++, 45K lines
BOINC server software
High performance, scalability (10M jobs/day)
Recovery from client errors and malfeasance
Components (diagram): MySQL DB (accounts, jobs, etc.); scheduler; web site features; file upload/download (executables, input files, output files); work generator; transitioner; validator; assimilator; file deleter; DB purge; clients and volunteers
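One common way to recover from client errors and malfeasance is to send the same job to several hosts and accept a result only when enough of them agree. The sketch below illustrates that idea with exact-equality comparison; it is not BOINC's validator code, and the function name and `quorum` parameter are hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def canonical_result(results: Sequence[Hashable], quorum: int) -> Optional[Hashable]:
    """Return the value reported by at least `quorum` hosts, or None if no value has a quorum."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None


# Three hosts returned results for the same job; one is wrong.
print(canonical_result(["0x3fa1", "0x3fa1", "0xdead"], quorum=2))
# Two hosts disagree, so nothing is accepted yet and another copy would be needed.
print(canonical_result(["0x3fa1", "0xdead"], quorum=2))
```

Exact comparison only works when results are bit-identical across hosts; floating-point applications would usually need a tolerance-based comparison instead.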
Creating a BOINC project Set up server On a Linux box (some work) Use the BOINC VMware virtual server Use the BOINC VM for Amazon EC2 (easy but $$) Apply to IBM World Community Grid easy but restrictive Port application Develop web site Lots of testing and debugging Public relations and customer support
Volunteer’s view 1-click install All platforms Invisible, autonomic Highly configurable (optional)
BOINC client structure (diagram labels): core client; application; BOINC library; runtime system; GUI; screensaver; local TCP; schedulers, data servers; user preferences, control
Communication: “Pull” model
client to scheduler: I can run Win32 and Win MB RAM 20GB free disk 2.5 GFLOPS CPU (description of current work)
scheduler to client: Here are three jobs. Job 1 has application files A,B,C, input files C,D,E and output file F...
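In a pull model the client always initiates contact: it describes its host, and the scheduler replies with jobs that fit. The sketch below shows that exchange in-process, with no networking. All class and field names are hypothetical, the matching rule (platform and free disk only) is a simplification, and none of this is BOINC's actual protocol or code.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    platform: str
    disk_gb: float
    app_files: list[str]
    input_files: list[str]
    output_files: list[str]


@dataclass
class Request:
    platforms: list[str]
    free_disk_gb: float
    gflops: float        # reported by the client; unused by this simplified scheduler
    max_jobs: int = 3


class Scheduler:
    def __init__(self, queue: list[Job]) -> None:
        self.queue = list(queue)

    def handle(self, request: Request) -> list[Job]:
        """Pick queued jobs that match the client's platform and fit on its disk."""
        granted: list[Job] = []
        remaining_disk = request.free_disk_gb
        for job in list(self.queue):  # iterate over a copy while removing from the queue
            if len(granted) == request.max_jobs:
                break
            if job.platform in request.platforms and job.disk_gb <= remaining_disk:
                granted.append(job)
                remaining_disk -= job.disk_gb
                self.queue.remove(job)
        return granted


scheduler = Scheduler([
    Job("job1", "win32", 1.0, ["A", "B", "C"], ["C", "D", "E"], ["F"]),
    Job("job2", "linux", 1.0, ["G"], ["H"], ["I"]),
    Job("job3", "win32", 30.0, ["A"], ["J"], ["K"]),
    Job("job4", "win32", 2.0, ["A"], ["L"], ["M"]),
])

# The client opens the conversation, so this works from behind a firewall.
reply = scheduler.handle(Request(platforms=["win32"], free_disk_gb=20.0, gflops=2.5))
for job in reply:
    print(job.name, job.app_files, job.input_files, job.output_files)
```

Because the scheduler never opens a connection to the client, nothing on the volunteer's side has to accept inbound traffic, which is why the pull model suits clients behind firewalls.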
The BOINC community Projects Volunteer programmers Alpha testers Online Skype-based help Translators (web, client) Documentation (Wiki) Teams
Some BOINC projects
Climateprediction.net (Oxford University): global climate modeling
LIGO Scientific Collaboration: gravitational wave detection
U.C. Berkeley: radio search for E.T.I. and black hole evaporation
Leiden Classical (Leiden University): surface chemistry using classical dynamics
More projects
CERN: simulator of LHC, collisions
Univ. of Muenster: quantum chemistry
Bielefeld Univ.: study of nanoscale magnetism
Leiden Univ.: number theory
Biomed-related BOINC projects
University of Washington. Rosetta: protein folding, docking, and design
Tanpaku (Tokyo Univ. of Science): protein structure prediction using Brownian dynamics
MalariaControl (The Swiss Tropical Institute): epidemiological simulation
More projects
Scripps Institute: CHARMM, protein structure prediction
SIMAP (Tech. Univ. of Munich): protein similarity matrix
Technion: genetic linkage analysis using Bayesian networks
More projects (IBM WCG)
Dengue fever drug discovery: U. of Texas, U. of Chicago; Autodock
Human Proteome Folding: New York University; Rosetta
Scripps Institute; Autodock
Future work
How to get more volunteers?
  media
  bundling
  social networks
How to get more projects?
How to use future hardware?
  multicore CPUs
  GPUs
  video game consoles (e.g., PS3/Cell)
  set-top boxes
  mobile devices
Campus-level “meta-project” Applications 6 pilot apps: climate, fluid dynamics, nanotechnology, genetics, Volunteers 1,000 instructional PCs 5,000 faculty/staff 30,000 students 400,000 alumni general public NSF proposal submitted
Citizen Cyber-Science Distributed thinking Clickworkers, GalaxyZoo protein-folding game New software initiatives: Bolt and Bossa
Conclusion Volunteer computing: a new paradigm Distinct research problems, software requirements Computing power More Cheaper Democratic allocation Social impact Contact me about: Using BOINC Research based on BOINC