David P. Anderson Space Sciences Laboratory University of California – Berkeley Public Distributed Computing with BOINC
Public-resource computing ● Examples: GIMPS, distributed.net, climateprediction.net ● Names: – public-resource computing – peer-to-peer computing (no!) – public distributed computing [Diagram labels: computing, your computers, academic, business, home PCs]
The potential of public computing ● 500,000 CPUs, 65 TeraFLOPs ● 1 billion Internet-connected PCs in 2010, 50% privately owned ● If 100M participate: – ~100 PetaFLOPs – ~1 Exabyte (10^18 bytes) storage [Chart: public computing, Grid computing, cluster computing, and supercomputing compared by CPU power, storage capacity, and cost]
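The aggregate estimates above imply a per-host contribution, which can be backed out with simple division. This is only a sketch: the slide gives the totals and the participant count, and the per-host figures are derived from them here rather than stated in the original.

```python
# Back out the per-host figures implied by the slide's aggregate estimates.
PARTICIPANTS = 100_000_000        # "If 100M participate"
TOTAL_FLOPS = 100e15              # ~100 PetaFLOPs
TOTAL_STORAGE_BYTES = 1e18        # ~1 Exabyte


def per_host(total: float, hosts: int) -> float:
    """Average share of an aggregate resource contributed by one host."""
    return total / hosts


flops_per_host = per_host(TOTAL_FLOPS, PARTICIPANTS)
bytes_per_host = per_host(TOTAL_STORAGE_BYTES, PARTICIPANTS)

print(f"{flops_per_host / 1e9:.0f} GFLOPS per host")
print(f"{bytes_per_host / 1e9:.0f} GB per host")
```

In other words, the totals correspond to roughly 1 GFLOPS and 10 GB contributed by each participating PC.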
Public/Grid differences
Economics (0th order) ● Cluster/Grid computing: resources ($$), network (free) ● Public-resource computing: resources (free), Internet ($$) ● $1 buys 1 computer-day or 20 GB of data transfer on the commercial Internet ● Suppose processing 1 GB of data takes X computer-days ● Cost of processing 1 GB: – cluster/Grid: $X – PRC: $1/20 ● So PRC is cheaper if X > 1/20 (e.g. X = 1,000)
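The 0th-order comparison can be written as a few lines of arithmetic. This is a sketch of the slide's model only; the constant and function names are illustrative, and the prices are the slide's round numbers.

```python
# 0th-order cost model from the slide: $1 buys one computer-day
# or 20 GB of transfer on the commercial Internet.
DOLLARS_PER_COMPUTER_DAY = 1.0
GB_TRANSFER_PER_DOLLAR = 20.0


def cluster_cost_per_gb(x_computer_days: float) -> float:
    """Cluster/Grid: you pay for the computers, the network is free."""
    return x_computer_days * DOLLARS_PER_COMPUTER_DAY


def prc_cost_per_gb() -> float:
    """PRC: the computers are free, you pay to ship the data."""
    return 1.0 / GB_TRANSFER_PER_DOLLAR


def prc_is_cheaper(x_computer_days: float) -> bool:
    """True when X exceeds the break-even point of 1/20 computer-day per GB."""
    return prc_cost_per_gb() < cluster_cost_per_gb(x_computer_days)


print(prc_is_cheaper(1000))   # compute-heavy workload
print(prc_is_cheaper(0.01))   # data-heavy workload
```

The break-even point depends only on the ratio of compute time to data volume, which is why compute-intensive, data-light applications are the natural fit.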
Economics revisited ● Underutilized free Internet (e.g. Internet2) ● Bursty, underutilized flat-rate ISP connection ● Traffic shapers can send at zero priority ==> bandwidth may be free also [Diagram labels: you, commodity Internet, other institutions]
Why isn't PRC more widely used? ● Lack of platform – jxta, Jabber: not a solution – Java: apps are in C, FORTRAN – commercial platforms: business issues – cosm, XtremWeb: not complete ● Need to make PRC technology easy to use for scientists
BOINC: Berkeley Open Infrastructure for Network Computing ● Goals for computing projects – easy/cheap to create and operate projects – wide range of applications possible – no central authority ● Goals for participants – easy to participate in multiple projects – invisible use of disk, CPU, network ● NSF-funded; open source; in beta test –
requirements ● 0.3 MB = 8 hrs CPU ● 50 Mbps [Diagram: ideal vs. current data paths; labels: tapes, Berkeley, Internet2, commercial Internet, Stanford, USC, participants]
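The ratio "0.3 MB = 8 hrs CPU" can be converted into computer-days per GB, the X used in the economics model. A minimal sketch, assuming 1 GB = 1,000 MB and that one CPU-hour equals one computer-hour:

```python
# Convert "0.3 MB of data = 8 hours of CPU" into computer-days per GB.
MB_PER_WORKUNIT = 0.3
CPU_HOURS_PER_WORKUNIT = 8.0
MB_PER_GB = 1000.0          # assumption: decimal gigabytes
HOURS_PER_DAY = 24.0


def computer_days_per_gb(mb: float, cpu_hours: float) -> float:
    """Computer-days needed to process 1 GB at the given data/CPU ratio."""
    days_per_workunit = cpu_hours / HOURS_PER_DAY
    gb_per_workunit = mb / MB_PER_GB
    return days_per_workunit / gb_per_workunit


x = computer_days_per_gb(MB_PER_WORKUNIT, CPU_HOURS_PER_WORKUNIT)
print(f"X is about {x:.0f} computer-days per GB")
```

This lands on the order of 1,000 computer-days per GB, far above the 1/20 break-even point.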
Climateprediction.net ● Global climate study (Oxford Univ.) ● Input: ~10MB executable, 1MB data ● CPU time: 2-3 months (can't migrate) ● Output per workunit: – 10 MB summary (always upload) – 1 GB detail file (archive on client, may upload) ● Chaotic (incomparable results)
(planned) ● Gravity wave detection; LIGO; UW/CalTech ● 30, MB data sets ● Each data set is analyzed w/ 40,000 different parameter sets; each takes ~6 hrs CPU ● Data distribution: replicated 2TB servers ● Scheduling problem is more complex than “bag of tasks”
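The per-data-set workload follows directly from the figures above. A short sketch of the arithmetic, with illustrative variable names:

```python
# CPU time to analyze one data set against every parameter set.
PARAMETER_SETS = 40_000
CPU_HOURS_PER_PARAMETER_SET = 6      # "~6 hrs CPU" on the slide

cpu_hours_per_data_set = PARAMETER_SETS * CPU_HOURS_PER_PARAMETER_SET
cpu_days_per_data_set = cpu_hours_per_data_set / 24

print(f"{cpu_hours_per_data_set:,} CPU-hours per data set")
print(f"{cpu_days_per_data_set:,.0f} CPU-days per data set")
```

Because every one of those jobs needs the same input data, it matters which hosts already hold a data set, which is one reason the scheduling is harder than a plain bag of tasks.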
Intel/UCB Network Study (planned) ● Goal: map/measure the Internet ● Each workunit lasts for 1 day but is active only briefly (pings, UDP) ● Need to control time-of-day when active ● Need to turn off other apps ● Need to measure system load indices (network/CPU/VM)
General structure of BOINC ● Project: – Scheduling server (C++) – BOINC DB (MySQL) – Data servers (HTTP) – Web interfaces (PHP) – Project back end: work generation, retry generation, result validation, result processing, garbage collection ● Participant: – Core client (C++) – App
Project web site features ● Download core client ● Create account ● Edit preferences – General: disk usage, work limits, buffering – Project-specific: allocation, graphics – venues (home/school/work) ● Profiles ● Teams ● Message boards, adaptive FAQs
General preferences
Project-specific preferences
Data architecture ● Files – immutable, replicated – may originate on client or project – may remain resident on client ● Executables are digitally signed ● Upload certificates: prevent DoS [Figure: example file names, partly lost: arecibo_ _jun_23_, uwi7eyufiw8e972h8f9w]
Computation abstractions ● Applications ● Platforms ● Application versions – may involve many files ● Work units: inputs to a computation – soft deadline; CPU/disk/mem estimates ● Results: outputs of a computation
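The abstractions above can be pictured as a handful of record types. This is a hypothetical data model for illustration only: the class and field names are assumptions and are not BOINC's actual database schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Platform:
    name: str                      # e.g. an OS/CPU combination


@dataclass(frozen=True)
class Application:
    name: str


@dataclass
class AppVersion:
    app: Application
    platform: Platform
    version: int
    files: List[str] = field(default_factory=list)   # may involve many files


@dataclass
class WorkUnit:
    name: str
    app: Application
    input_files: List[str]
    soft_deadline_s: float         # soft deadline
    est_cpu_s: float               # CPU estimate
    est_disk_bytes: float          # disk estimate
    est_mem_bytes: float           # memory estimate


@dataclass
class Result:
    workunit: WorkUnit
    output_files: List[str] = field(default_factory=list)


def latest_version(versions: List[AppVersion], app: Application,
                   platform: Platform) -> Optional[AppVersion]:
    """Newest version of `app` built for `platform`, or None if there is none."""
    matches = [v for v in versions if v.app == app and v.platform == platform]
    return max(matches, key=lambda v: v.version, default=None)
```

A work unit names an application but not a platform; the platform-specific application version is chosen only when work is handed to a particular host.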
Scheduling: pull model ● Core client to scheduling server: request X seconds of work, host description ● Scheduling server to core client: result 1 ... result n ● Core client and data server: download ... compute ... upload
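A pull-model exchange can be sketched as a client that asks for a number of seconds of work and a scheduler that hands out queued tasks until the request is covered. This is a toy in-memory illustration: the class, method, and field names are hypothetical, and the real protocol runs over the network.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Task:
    name: str
    est_cpu_s: float


class Scheduler:
    """Toy scheduling server: never contacts clients, only answers requests."""

    def __init__(self, tasks: Iterable[Task]):
        self.queue = deque(tasks)
        self.reported: List[str] = []

    def request_work(self, seconds_wanted: float, host_description: dict,
                     completed: Iterable[str] = ()) -> List[Task]:
        # Record whatever the client reports as finished.
        self.reported.extend(completed)
        # Hand out tasks until the requested amount of work is covered.
        granted, total = [], 0.0
        while self.queue and total < seconds_wanted:
            task = self.queue.popleft()
            granted.append(task)
            total += task.est_cpu_s
        return granted


def client_cycle(scheduler: Scheduler, seconds_wanted: float,
                 host_description: dict) -> List[str]:
    """One client cycle: request work, process it, report back."""
    tasks = scheduler.request_work(seconds_wanted, host_description)
    finished = []
    for task in tasks:
        # Download inputs, compute, upload outputs (all stubbed out here).
        finished.append(task.name)
    # Report completed work without asking for more.
    scheduler.request_work(0, host_description, completed=finished)
    return finished
```

Because every exchange is initiated by the client, this pattern works for hosts behind firewalls or NAT, which is a common motivation for pull-based designs.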
Redundant computing [Diagram: work generator, replicator, scheduler, clients, validator, assimilator; steps: select canonical result, assign credit]
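One way to picture the "select canonical result, assign credit" step is a quorum vote over the outputs returned by different hosts. This is a hedged sketch, not BOINC's validator: the function name and the `quorum` parameter are hypothetical, and exact equality is an assumption that does not fit chaotic applications, where results need an application-specific comparison.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional, Tuple


def select_canonical(results: Dict[str, Hashable],
                     quorum: int) -> Tuple[Optional[Hashable], List[str]]:
    """Pick the output returned by at least `quorum` hosts.

    `results` maps a host id to the output it returned. Returns the
    canonical output and the hosts to credit, or (None, []) if no
    output has reached the quorum yet.
    """
    if not results:
        return None, []
    output, votes = Counter(results.values()).most_common(1)[0]
    if votes < quorum:
        return None, []
    credited = sorted(h for h, o in results.items() if o == output)
    return output, credited


canonical, credited = select_canonical(
    {"host1": "42", "host2": "42", "host3": "41"}, quorum=2)
print(canonical, credited)
```

Hosts whose output disagrees with the canonical result receive no credit in this sketch, which removes the incentive to return fabricated results.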
BOINC core client ● File transfers: restartable, concurrent, user-limited ● Program execution: semi-sandboxed ● App API (shared mem): graphics control, checkpoint control, % done, CPU time
User interface [Diagram: screensaver, control panel, core client, app; links: control/state RPCs, activate screensaver, graphics]
Anonymous platform mechanism ● User compiles applications from source, registers them with core client ● Report platform as “anonymous” to scheduler ● Purposes: – obscure platforms – security-conscious participants – performance tuning of applications
Project management tools ● Python scripts for project creation/start/stop ● Remote debugging – collect/store crash info (stack trace) – web-based browsing interface ● Strip charts – record, graph system performance metrics ● Watchdogs – detect system failures; dial pager
Conclusion ● Public-resource computing is a distinct paradigm from Grid computing ● PRC has tremendous potential for many applications (computing and storage) ● BOINC: enabling technology for PRC –