Public and Grid Computing. David P. Anderson, Space Sciences Laboratory, University of California – Berkeley
History ● Projects: GIMPS, distributed.net, climateprediction.net ● Names: – public-resource computing – peer-to-peer computing – public distributed computing – … computing
The potential of public computing ● 1 billion Internet-connected PCs in 2010 ● 50% privately owned ● If 10% participate: – at least 100 PetaFLOPS and 1 exabyte (10^18 bytes) of storage [Figure: CPU power and storage capacity versus cost for supercomputing, cluster computing, Grid computing, and public computing]
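The totals above can be checked with a few lines of arithmetic. The slide gives only the totals, so the per-host figures in this sketch (about 1 GFLOPS of sustained compute and 10 GB of spare disk per participating PC) are assumptions, chosen as the values those totals imply for 100 million hosts.

```python
# Back-of-envelope check of the "potential of public computing" figures.
# ASSUMPTION: per-host compute and disk are not on the slide; they are the
# values implied by the slide's totals (1 GFLOPS and 10 GB per host).

INTERNET_PCS_2010 = 1_000_000_000   # 1 billion Internet-connected PCs (slide)
PARTICIPATION = 0.10                # "if 10% participate" (slide)
FLOPS_PER_HOST = 1e9                # assumed: 1 GFLOPS sustained per host
DISK_PER_HOST_BYTES = 10e9          # assumed: 10 GB donated disk per host

def aggregate(pcs, fraction, flops_per_host, disk_per_host):
    """Return (total FLOPS, total bytes) contributed by participating hosts."""
    hosts = pcs * fraction
    return hosts * flops_per_host, hosts * disk_per_host

flops, storage = aggregate(INTERNET_PCS_2010, PARTICIPATION,
                           FLOPS_PER_HOST, DISK_PER_HOST_BYTES)
print(f"{flops / 1e15:.0f} PetaFLOPS, {storage / 1e18:.0f} exabyte(s)")
```

Raising either per-host assumption scales the totals linearly, which is why the slide can state 100 PetaFLOPS as a lower bound.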
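Economics (simplified) ● Public: – you buy bandwidth; computers are free ● Grid: – you buy computers; bandwidth is free ● $1 buys 24 GB of transfer or 1 CPU day (24 CPU hours) ● Let X = CPU hours per GB of data – processing 1 GB costs $X/24 on a Grid but only $1/24 in transfer for public computing, so if X > 1, public computing may be cheaper – X = 3,000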
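The break-even argument can be written out as two cost functions. This is a minimal sketch using only the slide's prices ($1 buys 24 GB of transfer or 24 CPU hours) and its simplifying assumption that each model pays for exactly one resource. The function names are illustrative.

```python
# Break-even sketch for the simplified public-vs-Grid economics.
# Prices come from the slide: $1 buys 24 GB of transfer or 1 CPU day.
# X is the computation needed per unit of data, in CPU hours per GB.

GB_PER_DOLLAR = 24.0          # network transfer
CPU_HOURS_PER_DOLLAR = 24.0   # 1 CPU day

def public_cost(gb: float) -> float:
    """Public computing: computers are free, the project pays to ship data."""
    return gb / GB_PER_DOLLAR

def grid_cost(gb: float, x: float) -> float:
    """Grid computing: bandwidth is free, the project pays for CPU hours."""
    return gb * x / CPU_HOURS_PER_DOLLAR

for x in (0.5, 1.0, 3000.0):
    ratio = grid_cost(1.0, x) / public_cost(1.0)
    print(f"X = {x:g}: Grid/public cost ratio = {ratio:g}")
```

Because both prices are 24 units per dollar, the cost ratio reduces to X itself: at the slide's X = 3,000, the Grid would spend roughly 3,000 times as much per GB processed.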
Why hasn't public-resource computing (PRC) taken off? ● Lack of a platform – JXTA, Jabber: not a solution – Java: apps are in C, FORTRAN – commercial platforms: money issues – Cosm: not complete ● Need to connect scientists to technology
Public/Grid differences
● Running since May 1999 ● ~500,000 active participants ● ~60 TeraFLOPs (grows w/ Moore's Law) ● Problems with current software – hard to change/add algorithms – inflexible data architecture – can't share participants w/ other projects
Data architecture [Figure: current vs. ideal. Current: tapes, Berkeley, commercial Internet, participants. Ideal: Berkeley, Stanford, and USC connected by Internet2 and the commercial Internet to participants; one link labeled 50 Mbps]
BOINC: Berkeley Open Infrastructure for Network Computing ● Goals for computing projects – easy and cheap to create and operate distributed computing projects – wide range of applications possible – no central authority ● Goals for participants – easy to participate in multiple projects – invisible use of disk, CPU, and network ● NSF-funded; open source; in beta test
General structure of BOINC ● Project: – Scheduling server (C++) – BOINC DB (MySQL) – Data servers (HTTP) – Web interfaces (PHP) – Project back end: work generation, retry generation, result validation, result processing, garbage collection ● Participant: – Core agent (C++) – App agent
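The project back-end stages listed above can be pictured as passes over a shared table of work. This is a minimal in-memory sketch, not BOINC's actual schema or daemons. It assumes each work unit is run as several redundant instances and that each stage scans the table for records in a given state. The state names, the `WorkUnit` fields, the redundancy of 2, and the retry rule are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class State(Enum):                 # hypothetical per-instance states
    UNSENT = auto()       # waiting to be handed out by the scheduling server
    IN_PROGRESS = auto()  # running on a participant's host
    RETURNED = auto()     # output uploaded to a data server
    TIMED_OUT = auto()    # deadline passed without output

@dataclass
class WorkUnit:
    wu_id: int
    redundancy: int                              # instances to run (hypothetical policy)
    results: list = field(default_factory=list)  # one State per instance
    validated: bool = False
    processed: bool = False

def work_generation(db, n, redundancy=2):
    """Create n work units, each with `redundancy` unsent instances."""
    for i in range(n):
        db.append(WorkUnit(i, redundancy, [State.UNSENT] * redundancy))

def retry_generation(db):
    """Drop timed-out instances and issue one new unsent instance for each."""
    for wu in db:
        lost = wu.results.count(State.TIMED_OUT)
        kept = [s for s in wu.results if s is not State.TIMED_OUT]
        wu.results = kept + [State.UNSENT] * lost

def result_validation(db):
    """Placeholder: mark a work unit valid once enough instances have returned.
    A real validator would also compare the returned outputs."""
    for wu in db:
        if wu.results.count(State.RETURNED) >= wu.redundancy:
            wu.validated = True

def result_processing(db):
    """Hand validated work units to the science back end (stubbed as a flag)."""
    for wu in db:
        if wu.validated and not wu.processed:
            wu.processed = True

def garbage_collection(db):
    """Delete work units whose results have been processed."""
    db[:] = [wu for wu in db if not wu.processed]

# Simulated run: every instance returns except one instance of work unit 0.
db = []
work_generation(db, 3)
for wu in db:
    wu.results = [State.RETURNED] * wu.redundancy
db[0].results[1] = State.TIMED_OUT
for stage in (retry_generation, result_validation, result_processing, garbage_collection):
    stage(db)
print([(wu.wu_id, [s.name for s in wu.results]) for wu in db])
```

After one round, work units 1 and 2 are validated, processed, and collected. Only work unit 0 remains, holding its returned instance plus a fresh unsent one for the scheduling server to hand out. Keeping each stage as an independent pass over the table lets stages run separately and be restarted without coordinating with one another.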
BOINC features ● Computation model – redundant computing, soft deadlines ● Data model – flexible; long-term storage on clients ● Programming environment ● Administrative tools ● Minimal API (support legacy apps) ● Credit system ● Participant web features
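Redundant computing means the same work unit runs on several participants' hosts and a result is accepted only when enough of them agree. This makes errors or falsified results from a single host detectable. This is a minimal sketch of that idea combined with a simple credit rule. The quorum size, the exact-match comparison, the "lowest claim among agreeing hosts" credit policy, and the host names and checksum-style outputs are all hypothetical illustrations, not BOINC's actual validator or credit formula.

```python
from collections import Counter

def validate(outputs: dict[str, str], quorum: int):
    """outputs maps host -> output. Return (canonical output, agreeing hosts),
    or (None, []) if fewer than `quorum` hosts agree on one output."""
    if len(outputs) < quorum:
        return None, []
    value, votes = Counter(outputs.values()).most_common(1)[0]
    if votes < quorum:
        return None, []
    return value, sorted(h for h, out in outputs.items() if out == value)

def grant_credit(claimed: dict[str, float], agreeing: list[str]) -> dict[str, float]:
    """Give each agreeing host the same credit: the lowest claim among them
    (a hypothetical rule that limits the payoff from inflated claims)."""
    if not agreeing:
        return {}
    amount = min(claimed[h] for h in agreeing)
    return {h: amount for h in agreeing}

# Made-up example: two hosts agree, one returns a different output.
outputs = {"hostA": "9f1c", "hostB": "9f1c", "hostC": "77ab"}
claimed = {"hostA": 12.0, "hostB": 15.5, "hostC": 11.0}
canonical, agreeing = validate(outputs, quorum=2)
print(canonical, agreeing, grant_credit(claimed, agreeing))
```

Here hostC's disagreeing output is rejected and earns no credit. Exact string equality is the simplest comparison; applications whose floating-point results differ slightly across platforms would need a tolerance-based comparison instead.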
Projects ● Current (at UCB Space Sciences Lab) – … – Astropulse ● In progress – … (Stanford) – climateprediction.net (Oxford) ● Planned – LIGO (physics) – CERN
Conclusion ● Public-resource computing is a distinct paradigm from Grid computing ● PRC has tremendous potential for some applications (computing and storage) ● BOINC: enabling technology for PRC