Published by Matilda Powers. Modified over 9 years ago.
1
Celebrating Diversity in Volunteer Computing. David P. Anderson, Space Sciences Lab, U.C. Berkeley. Sept. 1, 2008.
2
Background. Volunteer computing: distributed scientific computing using volunteered resources (desktops, laptops, game consoles, cell phones, etc.). BOINC: middleware for volunteer (and desktop grid) computing.
3
Diversity of resources. CPU: type, number, speed. RAM, disk. Coprocessors. OS type and version. Network: performance, availability, proxies. System availability. Reliability: crashes, invalid results, cheating.
4
Diversity of applications. Resource requirements: CPU, coprocessors, RAM, storage, network. Completion time constraints. Numerical properties: same result on all CPUs; a little different; unboundedly different.
5
IBM World Community Grid. “Umbrella” project sponsored by IBM. Rice genome study: Univ. of Washington. Protein X-ray crystallography: Ontario Cancer Inst. African climate study: Univ. of Cape Town. Dengue fever drug discovery: Univ. of Texas. Human protein folding: NYU, Univ. of Washington. HIV drug discovery: Scripps Institute. Started Nov. 2004. 390,000 volunteers total. 167,000 years of CPU time. Currently ~170 TeraFLOPS.
6
CPU type
7
# cores
8
OS type
9
RAM
10
Free disk space
11
Availability
12
Job error rate
13
Average turnaround time
14
Current WCG applications
15
Job dispatching. [Diagram: 1M jobs, scheduler, client.] Goals: maximize system throughput; minimize time to batch completion; minimize time to grant credit; scale to >100 requests/sec.
16
BOINC scheduler architecture. [Diagram: job queue (DB), feeder, job cache (shared memory), scheduler, client.] Issues: what if the cache fills up with unsendable jobs? What if a client needs a job that is not in the cache?
17
Homogeneous replication. Different platforms do FP math differently, which makes result validation difficult. Divide platforms into equivalence classes and send all instances of a job to a single class. A “census” program computes the distribution. Scheduler: send committed jobs if possible. [Diagram: classes Win/Intel, Win/AMD, etc.; uncommitted jobs.]
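The commit-on-first-send idea can be sketched in C++ as follows. This is a minimal illustration, not BOINC's implementation: the `Host` and `Job` structs, the class naming scheme ("OS/vendor"), and the function names are all hypothetical.

```cpp
#include <optional>
#include <string>

// Hypothetical host description; field names are illustrative only.
struct Host {
    std::string os;          // e.g. "Win", "Linux"
    std::string cpu_vendor;  // e.g. "Intel", "AMD"
};

// Hypothetical job record: hr_class stays empty until the first instance is sent.
struct Job {
    std::optional<std::string> hr_class;
};

// Map a host to its equivalence class, e.g. "Win/Intel" (assumed naming).
std::string hr_class_of(const Host& h) {
    return h.os + "/" + h.cpu_vendor;
}

// A job may go to a host if it is uncommitted or committed to the host's class.
bool hr_compatible(const Job& j, const Host& h) {
    return !j.hr_class || *j.hr_class == hr_class_of(h);
}

// Sending the first instance commits the job to the host's class.
bool send_instance(Job& j, const Host& h) {
    if (!hr_compatible(j, h)) return false;
    if (!j.hr_class) j.hr_class = hr_class_of(h);
    return true;
}
```

Once a job is committed, only hosts in the same class can receive further instances, so results can be compared directly.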
18
Retry acceleration. Retries are needed when: a job times out; an error (crash) is returned; results fail to validate. Send retries to hosts that are fast (low turnaround) and reliable. Shorten the latency bound of retries.
19
Volunteer app selection. Volunteers can: select apps; opt to accept jobs from non-selected apps.
20
Fast feasibility checks (no DB). Client sends: hardware spec; availability info; list of jobs queued and in progress. Resource checks. Completion time check: EDF simulation; are deadlines missed?
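A completion time check based on an earliest-deadline-first (EDF) simulation might look like the sketch below. It assumes a single CPU and a scalar availability fraction. The `QueuedJob` struct and both function names are hypothetical, and the rule "a new job is acceptable if it causes no additional deadline misses" is an assumption, not something the slide states.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical summary of one job on the client.
struct QueuedJob {
    double remaining_cpu_sec;  // estimated CPU time still needed
    double deadline_sec;       // deadline, in seconds from now
};

// Simulate EDF execution on one CPU that is available a fraction
// `availability` of the time (assumed 0 < availability <= 1).
// Returns how many jobs would finish after their deadline.
int edf_deadlines_missed(std::vector<QueuedJob> jobs, double availability) {
    std::sort(jobs.begin(), jobs.end(),
              [](const QueuedJob& a, const QueuedJob& b) {
                  return a.deadline_sec < b.deadline_sec;
              });
    double now = 0.0;
    int missed = 0;
    for (const QueuedJob& j : jobs) {
        now += j.remaining_cpu_sec / availability;  // wall-clock time used
        if (now > j.deadline_sec) ++missed;
    }
    return missed;
}

// Assumed acceptance rule: the candidate must not add deadline misses.
bool completion_time_ok(const std::vector<QueuedJob>& queued,
                        const QueuedJob& candidate, double availability) {
    std::vector<QueuedJob> with_candidate = queued;
    with_candidate.push_back(candidate);
    return edf_deadlines_missed(with_candidate, availability) <=
           edf_deadlines_missed(queued, availability);
}
```

Because the client reports its queue in the request, this check needs no database access.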
21
Slow feasibility checks (DB). Is the job still needed? Has another replica been sent to this volunteer?
22
Platform mechanism. Jobs are associated with apps, not versions. [Diagram: job, application, app versions: Win/x86, Win/x64, Linux/x86.] Request message: platform 0: Win64; platform 1: Win32.
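One way to read the request message is as a preference-ordered platform list, with the scheduler picking the first platform for which the app has a version. The sketch below assumes that reading; `choose_platform` and the platform strings are hypothetical.

```cpp
#include <set>
#include <string>
#include <vector>

// Platforms for which a given app has a version (illustrative names).
using AppVersions = std::set<std::string>;

// Walk the host's platforms in the order given in the request and return
// the first one the app supports; an empty string means no usable version.
std::string choose_platform(const std::vector<std::string>& host_platforms,
                            const AppVersions& versions) {
    for (const std::string& p : host_platforms) {
        if (versions.count(p) > 0) return p;
    }
    return "";
}
```

Under this reading, a 64-bit Windows host that lists Win64 first and Win32 second still gets work from an app that only has a 32-bit version.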
23
Host punishment. The problem: hosts that error out all jobs. Maintain M(h): max jobs per day for host h. On each error, decrement M(h). On a valid job, double M(h).
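The update rule for M(h) is small enough to sketch directly. The floor of 1 and the project-wide cap of 100 are assumptions added so the value stays bounded; the slide gives only the decrement and doubling steps. All names are hypothetical.

```cpp
#include <algorithm>

// Hypothetical per-host quota record; max_jobs_per_day plays the role of M(h).
struct HostQuota {
    int max_jobs_per_day;
};

constexpr int kProjectMaxJobsPerDay = 100;  // assumed upper bound
constexpr int kMinJobsPerDay = 1;           // assumed lower bound

// On each error, decrement M(h), but never below the floor.
void on_job_error(HostQuota& h) {
    h.max_jobs_per_day = std::max(kMinJobsPerDay, h.max_jobs_per_day - 1);
}

// On a valid job, double M(h), but never above the project cap.
void on_job_valid(HostQuota& h) {
    h.max_jobs_per_day = std::min(kProjectMaxJobsPerDay, h.max_jobs_per_day * 2);
}

// The scheduler stops sending once today's count reaches M(h).
bool may_send(const HostQuota& h, int jobs_sent_today) {
    return jobs_sent_today < h.max_jobs_per_day;
}
```

With the assumed floor, a host that fails every job is throttled to one job per day, and the doubling lets a repaired host recover in a few valid results.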
24
Anonymous platform mechanism. Rather than downloading apps from the server, the client has preexisting local apps. Scheduler: if a client has its own apps, only send it jobs for those apps. Usage scenarios: computers with unsupported platforms; people who optimize apps; security-conscious people who want to inspect the source code.
25
Old scheduling policy. Job cache scan: start from a random point; do fast feasibility checks; lock the job, then do slow feasibility checks. Multiple scans: send jobs committed to an HR class; if the host is fast, send retries; send work for selected apps; if allowed, send work for non-selected apps. Problems: rigid policy; assumes app == 1 CPU.
26
Coprocessor and multi-thread apps. How to select the best version for a given host? How to estimate performance on the host? [Diagram: Win/x86 app versions: single-threaded, multi-threaded, CUDA.]
27
Multithread/coprocessor (cont.). How to decide which app version to use? App versions have a “plan class” string. The scheduler has a project-supplied function: bool app_plan(SCHEDULER_REQUEST &sreq, char* plan_class, HOST_USAGE&); It returns: whether the host can run the app; coprocessor usage; CPU usage (possibly fractional); expected FLOPS; cmdline to pass to the app. It embodies knowledge about sublinear speedup, etc. Scheduler: call app_plan() for each version and use the one with the highest expected FLOPS.
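The selection loop can be illustrated with simplified stand-in types. The sketch below does not use BOINC's real SCHEDULER_REQUEST or HOST_USAGE structures. `HostInfo`, `Usage`, `AppVersion`, the plan class strings "mt" and "cuda", the 4-thread limit, the 75% parallel efficiency, and the 0.25 CPU used by the GPU version are all assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-ins for the request and usage records (hypothetical).
struct HostInfo {
    int ncpus;
    double flops_per_cpu;
    int ncuda_devices;
    double flops_per_gpu;
};
struct Usage {
    double avg_ncpus;       // CPU usage, possibly fractional
    int ncoprocs;           // coprocessors used
    double expected_flops;  // estimated speed on this host
};
struct AppVersion {
    std::string plan_class;  // "" means plain single-threaded
};

// Project-supplied planning function (sketch). Returns false if the host
// cannot run a version of this plan class; otherwise fills in `out`.
bool app_plan_sketch(const HostInfo& host, const std::string& plan_class, Usage& out) {
    if (plan_class.empty()) {
        out = {1.0, 0, host.flops_per_cpu};
        return true;
    }
    if (plan_class == "mt") {
        if (host.ncpus < 2) return false;
        int n = std::min(host.ncpus, 4);  // assumed thread limit
        out = {static_cast<double>(n), 0, 0.75 * n * host.flops_per_cpu};  // sublinear speedup
        return true;
    }
    if (plan_class == "cuda") {
        if (host.ncuda_devices < 1) return false;
        out = {0.25, 1, host.flops_per_gpu};  // GPU app needs a fraction of a CPU
        return true;
    }
    return false;  // unknown plan class
}

// Scheduler side: try every version, keep the one with highest expected FLOPS.
// Returns the index of the chosen version, or -1 if none can run.
int best_version(const HostInfo& host, const std::vector<AppVersion>& versions, Usage& best) {
    int best_i = -1;
    for (std::size_t i = 0; i < versions.size(); ++i) {
        Usage u{};
        if (!app_plan_sketch(host, versions[i].plan_class, u)) continue;
        if (best_i < 0 || u.expected_flops > best.expected_flops) {
            best = u;
            best_i = static_cast<int>(i);
        }
    }
    return best_i;
}
```

Keeping the speedup model inside the project-supplied function means the scheduler itself needs no knowledge of how a particular app scales.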
28
Multithread/coprocessor (cont.). Client coprocessor handling (currently just CUDA): hardware check/report; scheduling (coprocessors are not timesliced). CPU scheduling: run enough apps to use at least N cores.
29
Score-based scheduling. Pick N jobs at random; keep the feasible jobs; rank them by score; send the M highest-scoring jobs.
30
Terms in the score function. Bonus if: host is fast and job is a retry; job is committed to an HR class; app was selected by the volunteer.
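Score-based selection with these bonus terms can be sketched as follows. The equal weights of 1.0 per term are an assumption, since the slides do not give weights, and the `Candidate` struct and function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical view of one job sampled from the cache.
struct Candidate {
    int id;
    bool feasible;               // passed the feasibility checks
    bool is_retry;
    bool committed_to_hr_class;
    bool app_selected;           // volunteer selected this job's app
};

// Sum of bonus terms; each weight of 1.0 is an assumed placeholder.
double score(const Candidate& c, bool host_is_fast) {
    double s = 0.0;
    if (host_is_fast && c.is_retry) s += 1.0;
    if (c.committed_to_hr_class) s += 1.0;
    if (c.app_selected) s += 1.0;
    return s;
}

// Keep feasible jobs, rank by score (ties keep sample order), return top m ids.
std::vector<int> pick_jobs(const std::vector<Candidate>& sample,
                           bool host_is_fast, std::size_t m) {
    std::vector<Candidate> ok;
    for (const Candidate& c : sample) {
        if (c.feasible) ok.push_back(c);
    }
    std::stable_sort(ok.begin(), ok.end(),
                     [host_is_fast](const Candidate& a, const Candidate& b) {
                         return score(a, host_is_fast) > score(b, host_is_fast);
                     });
    std::vector<int> ids;
    for (std::size_t i = 0; i < ok.size() && i < m; ++i) ids.push_back(ok[i].id);
    return ids;
}
```

Compared with a fixed sequence of cache scans, a single score makes it easy to add or reweight terms without changing the control flow.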
31
Job size matching. Goal: send large jobs to fast hosts and small jobs to slow hosts, to reduce credit-granting delay and server occupancy time. The census program maintains host statistics. The feeder maintains job size statistics. Score penalty: |job − host|².
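One plausible reading of the penalty is that job size and host speed are each normalized against their population statistics before being compared. The sketch below assumes that reading. The use of z-scores, the `Stats` struct, and the function names are assumptions, not taken from the slides.

```cpp
// Population statistics, as the census program (hosts) and the feeder
// (jobs) might maintain them. Hypothetical layout.
struct Stats {
    double mean;
    double stddev;
};

// Number of standard deviations x lies from the mean (0 if stddev is 0).
double z_score(double x, const Stats& s) {
    return s.stddev > 0.0 ? (x - s.mean) / s.stddev : 0.0;
}

// Squared mismatch between normalized job size and normalized host speed.
double size_penalty(double job_size, const Stats& job_stats,
                    double host_speed, const Stats& host_stats) {
    double d = z_score(job_size, job_stats) - z_score(host_speed, host_stats);
    return d * d;
}
```

A job one standard deviation above the mean size sent to a host one standard deviation above the mean speed gets no penalty, while the same job on a slow host is penalized quadratically.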
32
Adaptive replication. Goal: achieve a target level of reliability while reducing replication to 1+ε. Idea: replicate less (but always some) as a host becomes more trusted. Policy: maintain an “invalid rate” E(h) per host; if E(h) > X, replicate (e.g., 2-fold); else replicate with probability E(h)/X. Is there a counterstrategy?
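The replication decision can be written as a small function. In this sketch the random draw is passed in explicitly so the rule is easy to test; the function names and the use of `std::mt19937` are illustrative choices, not part of the slides.

```cpp
#include <random>

// Decide whether to replicate a job sent to a host with invalid rate E(h),
// given threshold X (assumed > 0) and a uniform draw u in [0, 1).
bool should_replicate(double invalid_rate, double threshold, double u) {
    if (invalid_rate > threshold) return true;  // untrusted host: always replicate
    return u < invalid_rate / threshold;        // trusted host: replicate sometimes
}

// Convenience overload that draws u from a caller-supplied generator.
bool should_replicate(double invalid_rate, double threshold, std::mt19937& rng) {
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    return should_replicate(invalid_rate, threshold, dist(rng));
}
```

Read literally, a host with E(h) = 0 is never replicated under this rule, so keeping "always some" replication would require a nonzero floor on E(h) or on the probability. That floor is not shown here because the slides do not specify one.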
33
Server simulation. How do we know these policies are any good? How can we study alternatives? In situ study is difficult. SIMBA emulator (U. of Delaware): SIMBA emulates N clients; the BOINC server is not emulated.
34
Upcoming scheduler changes. Problems: only 1 app version is used; completion-time simulation is antiquated (doesn’t reflect multithread, coprocessor, RAM limitations). New concept: resource signature (#CPUs, #coprocessors, RAM). Do simulation based on “greedy EDF scheduling” using the resource signature. Select an app version that can use the available resources.
35
Conclusion. Volunteer computing has diverse resources and workloads. BOINC has mechanisms that deal effectively and efficiently with this diversity. Lots of fun research problems here! davea@ssl.berkeley.edu