Volunteer Computing in Biology David P. Anderson Space Sciences Lab U.C. Berkeley 10 Sept 2007.


1 Volunteer Computing in Biology David P. Anderson Space Sciences Lab U.C. Berkeley 10 Sept 2007

2 Outline ● Goals of volunteer computing ● How BOINC works ● Some biology projects using BOINC ● Some new directions

3 Goal: Use all the computers in the world to do worthwhile things ● What do we mean by “computers”? ● Who owns the computers? – Individuals (60% and rising) – Organizations ● What does “worthwhile” mean?

4 BOINC (Berkeley Open Infrastructure for Network Computing) ● Middleware for volunteer computing ● Open-source (LGPL) ● Application-driven [Diagram: a PC attached to project accounts, each attachment with a resource share (60% and 40%)]

5 The volunteer computing game ● Do more science ● Involve public in science [Diagram: projects and volunteers connected through the Internet]

6 Computing power ● Folding@home: – 650 TeraFLOPS ● 200 from PCs; 50 from GPUs; 400 from PS3 ● BOINC-based projects:

7 Cost per TeraFLOPS-year ● Cluster (6.8 TeraFLOPS) – power and A/C: $750K – network hardware: $175K – computing hardware (780 nodes): $1000K – storage (300 TB RAID-6): $250K – power: $140K/year – sysadmin: $150K/year – total: $124K ● Amazon EC2: $1.75M ● Average BOINC project: $1.25K

8 Volunteer computing ≠ Grid computing
                           Volunteer computing                                Grid computing
Resource owners            anonymous, unaccountable; need to check results    identified, accountable
Managed systems?           no – need plug & play software                     yes – software stack requirements OK
Clients behind firewall?   yes – pull model                                   no – push model
ISP bill?                  yes                                                no
... nor is it “peer-to-peer computing”

9 How BOINC works: server DB [Diagram of database entities: Platforms, Applications, App versions, Jobs, Job instances, Accounts, Hosts]

10 Job replication ● Problem: can’t trust volunteers – computational result – claimed credit ● No replication, application-specific checks ● Replicated computing – do N copies, require that M of them agree – not bulletproof (collusion)
[Figure: timeline (time 0 to 14) of one job and its four instances. The job is created, then validated and assimilated. Instances 1, 3 and 4 are each created, sent, and return success; instance 2 is created, sent, and returns an error.]
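The "do N copies, require that M of them agree" rule can be sketched in a few lines of Python. This is only an illustration, not BOINC's validator: the function name `canonical_result`, the use of `None` for a failed instance, and the placeholder result strings are all assumptions made for the example, and results are compared for exact equality.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def canonical_result(results: Sequence[Optional[Hashable]],
                     quorum: int) -> Optional[Hashable]:
    """Return the result reported by at least `quorum` instances, or None.

    `results` holds one entry per job instance; None marks an instance
    that returned an error or has not reported yet (an assumption of
    this sketch).
    """
    counts = Counter(r for r in results if r is not None)
    if not counts:
        return None
    value, votes = counts.most_common(1)[0]
    return value if votes >= quorum else None


# Four instances as in the timeline: instance 2 fails; the other three
# are assumed here to return the same (made-up) result.
instances = ["a3f9", None, "a3f9", "a3f9"]
print(canonical_result(instances, quorum=2))  # a3f9
```

As the slide notes, agreement of this kind is not bulletproof: colluding volunteers who return the same wrong value would still reach the quorum.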

11 How to compare results? ● Problem: numerical discrepancies ● Stable problems: fuzzy comparison ● Unstable problems – Eliminate discrepancies ● compiler/flags/libraries – Homogeneous replication ● send instances only to numerically equivalent hosts (equivalence may depend on app)
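Both approaches on this slide can be illustrated with a short Python sketch. Everything named here is hypothetical: `results_agree` stands in for a fuzzy comparison with an assumed relative tolerance, and `group_equivalent_hosts` buckets hosts by an assumed (OS, CPU vendor) key, whereas the slide says the real equivalence may depend on the application.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def results_agree(a: Sequence[float], b: Sequence[float],
                  rel_tol: float = 1e-6) -> bool:
    """Fuzzy comparison: same length and element-wise close."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )


def group_equivalent_hosts(
    hosts: Sequence[Tuple[str, str, str]]
) -> Dict[Tuple[str, str], List[str]]:
    """Bucket host names by a hypothetical (os, cpu_vendor) key.

    Homogeneous replication would then send all instances of a job
    to hosts drawn from a single bucket.
    """
    classes: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for name, os_name, cpu_vendor in hosts:
        classes[(os_name, cpu_vendor)].append(name)
    return dict(classes)
```

Fuzzy comparison only suits stable problems, where small numerical differences stay small; for unstable problems the bucketing route avoids the discrepancies instead of tolerating them.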

12 Work flow ● work generator (creates stream or batches of jobs) ● BOINC ● validator (compares replicas, selects “correct” result) ● assimilator (handles correct result)
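A toy version of this flow, with the three project-supplied pieces as plain Python functions, might look like the following. It is a sketch under simplifying assumptions: the function names, the dictionary used as a result store, and the rule that all replicas must match exactly are invented for the example, and the loop in `run` stands in for BOINC distributing instances to volunteer hosts.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional


def work_generator(n_jobs: int) -> Iterator[Dict[str, int]]:
    """Create a stream of jobs (here just numbered inputs)."""
    for i in range(n_jobs):
        yield {"id": i, "input": i}


def validator(replicas: List[Any]) -> Optional[Any]:
    """Select the 'correct' result: accepted only if all replicas match."""
    return replicas[0] if len(set(replicas)) == 1 else None


def assimilator(store: Dict[int, Any], job_id: int, result: Any) -> None:
    """Handle a correct result (here: record it in a dictionary)."""
    store[job_id] = result


def run(n_jobs: int, replicas_per_job: int,
        compute: Callable[[int], Any]) -> Dict[int, Any]:
    """Stand-in for BOINC: run each job's replicas, validate, assimilate."""
    store: Dict[int, Any] = {}
    for job in work_generator(n_jobs):
        replicas = [compute(job["input"]) for _ in range(replicas_per_job)]
        result = validator(replicas)
        if result is not None:
            assimilator(store, job["id"], result)
    return store


print(run(3, 2, lambda x: x * x))  # {0: 0, 1: 1, 2: 4}
```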

13 Ways to create a BOINC project ● Set up a server manually ● Use the BOINC virtual server ● Use the BOINC VM for Amazon EC2 – (in development) ● Apply to IBM World Community Grid

14 Volunteer’s view ● 1-click install, zero configuration ● All platforms ● Invisible, autonomic

15 BOINC client structure [Diagram labels: core client; application; BOINC library; runtime system; GUI; screensaver; local TCP; schedulers, data servers; user preferences, control]

16 Communication: “Pull” model ● Client to scheduler: “I can run Win32 and Win64; 512 MB RAM; 20 GB free disk; 2.5 GFLOPS CPU; (description of current work)” ● Scheduler to client: “Here are three jobs. Job 1 has application files A,B,C, input files C,D,E and output file F...”
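The exchange on this slide can be sketched as a request and reply in Python. The class and field names below (`WorkRequest`, `Job`, `handle_request`) are hypothetical and do not reflect BOINC's actual protocol; the point is only that the client initiates the exchange and describes itself, and the scheduler answers with jobs that fit.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WorkRequest:
    """What the client reports when it asks for work."""
    platforms: List[str]
    ram_mb: int
    free_disk_gb: float
    gflops: float  # reported, but not used for matching in this sketch


@dataclass
class Job:
    name: str
    platform: str
    ram_mb: int
    disk_gb: float
    app_files: List[str] = field(default_factory=list)
    input_files: List[str] = field(default_factory=list)
    output_files: List[str] = field(default_factory=list)


def handle_request(request: WorkRequest, queue: List[Job],
                   max_jobs: int = 3) -> List[Job]:
    """Scheduler side: reply with up to max_jobs jobs the host can run."""
    reply: List[Job] = []
    for job in queue:
        if len(reply) == max_jobs:
            break
        if (job.platform in request.platforms
                and job.ram_mb <= request.ram_mb
                and job.disk_gb <= request.free_disk_gb):
            reply.append(job)
    return reply


request = WorkRequest(["win32", "win64"], ram_mb=512,
                      free_disk_gb=20.0, gflops=2.5)
queue = [
    Job("job1", "win32", 256, 1.0, ["A", "B", "C"], ["C", "D", "E"], ["F"]),
    Job("job2", "linux", 256, 1.0),    # wrong platform for this host
    Job("job3", "win64", 1024, 1.0),   # needs more RAM than the host has
]
print([job.name for job in handle_request(request, queue)])  # ['job1']
```

Because the client opens the connection, this arrangement works for hosts behind firewalls, which is the reason the earlier comparison with grid computing gives for the pull model.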

17 Biomed-related BOINC projects ● Rosetta@home – University of Washington – Rosetta: Protein folding, docking, and design – 90,000 hosts, 37 TeraFLOPS ● Tanpaku – Tokyo Univ. of Science – Protein structure prediction using Brownian dynamics ● MalariaControl – The Swiss Tropical Institute – Epidemiological simulation

18 More projects ● Predictor@home – Scripps Institute – CHARMM, protein structure prediction ● SIMAP – Tech. Univ. of Munich – Protein similarity matrix ● Superlink@Technion – Technion – Genetic linkage analysis using Bayesian networks

19 More projects (IBM WCG) ● Dengue fever drug discovery – U. of Texas, U. of Chicago – Autodock ● Human Proteome Folding – New York University – Rosetta ● FightAIDS@home – Scripps Institute – Autodock

20 Berkeley@home ● Campus-level “meta-project” ● Applications – 6 pilot apps: climate, fluid dynamics, nanotechnology, genetics, ● Volunteers – 1,000 instructional PCs – 5,000 faculty/staff – 30,000 students – 400,000 alumni – general public ● NSF proposal submitted

21 Rosetta@home plan ● Protein structure prediction – low-res: combinatorial, spatial, intuitive; humans do better than computers – high-res: computers do better ● Interactive “protein manipulation” program ● Teams as management structures – tasks are given to (possibly multiple) teams – managers organize and schedule sub-groups with particular skills or resources – communication paths between sub-groups

22 Multi-threading support ● What’s in a $1000 PC? – 2007: dual-core CPU, 4 GFLOPS, 1 GB RAM – 2010: 80-core CPU, 100 GFLOPS, 8 GB RAM – Volunteer computing provides a use for all those cores, but you may run out of RAM ● BOINC support for multi-thread apps – core client to app: “Try to use N cores” – app to core client: “OK, I’m using M cores” ● Languages/libraries for parallel programming – OpenMP – Titanium, Cilk, RapidMind, PeakStream...

23 Skill aggregation (human computing) ● Web-based vision tasks – Stardust@home, Clickworkers, galaxy classification ● Amazon “Mechanical Turk” ● Validation ● Formulation as multi-person game – Luis von Ahn: image tagging ● Motivational axes: competition, community

24 Berkeley Open Learning Technology (BOLT) ● DB-driven CMS and analytics engine for web-based teaching content (lessons, exercises) course structure (XML) teaching engine (PHP) Sequencing, navigation student info, interaction DB Students analytical tools Educators

25 DB and web integration ● Accounts, teams and groups ● Communication ● Credit and competition ● BOINC: hosts, applications, jobs ● BOLT: lessons, courses ● BOSSA: tasks

26 Conclusion ● Volunteer computing: a new paradigm – distinct research problems, software requirements – big accomplishments, potential ● Social impacts ● Contact me about: – Using BOINC – Research based on BOINC davea@ssl.berkeley.edu

