Volunteer Computing with BOINC Dr. David P. Anderson University of California, Berkeley SC10 Nov. 14, 2010.

Presentation transcript:

Volunteer Computing with BOINC Dr. David P. Anderson University of California, Berkeley SC10 Nov. 14, 2010

Goals
- Explain volunteer computing
- Teach how to create a volunteer computing project using BOINC
Target audience: high-throughput computing users
Technical skills: basic Linux/Apache sysadmin; familiarity with PHP, SQL, and XML; C/C++ (optional)

Outline
- Why use volunteer computing?
- Basic concepts of BOINC
- Developing BOINC applications
- (15 minute break)
- Deploying a BOINC server
- Deploying applications
- Submitting jobs
- Organizational issues

Part 1: Why use volunteer computing?

The Consumer Digital Infrastructure
- 1 billion PCs
  - current GPUs: 1 TeraFLOPS (1,000 ExaFLOPS total)
  - Storage: ~1,000 Exabytes
- Commodity Internet: 10-1,000 Mbps to home
- Consumers pay for
  - hardware
  - sysadmin
  - network costs
  - electricity
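The totals on this slide follow from simple multiplication. A minimal sketch of that arithmetic; the 1 TB of disk per PC is an assumption chosen because it reproduces the ~1,000 Exabyte figure, and it is not stated on the slide:

```python
# Back-of-the-envelope totals for the consumer digital infrastructure.
PCS = 10**9                  # 1 billion PCs (from the slide)
GPU_FLOPS_PER_PC = 10**12    # 1 TeraFLOPS per current GPU (from the slide)
DISK_BYTES_PER_PC = 10**12   # ASSUMPTION: ~1 TB of disk per PC
EXA = 10**18

total_exaflops = PCS * GPU_FLOPS_PER_PC // EXA
total_exabytes = PCS * DISK_BYTES_PER_PC // EXA

print(f"aggregate GPU throughput: {total_exaflops:,} ExaFLOPS")
print(f"aggregate storage: {total_exabytes:,} Exabytes")
```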

Volunteer computing
- PC owners donate computing resources to projects (e.g., computational science)
- Applications run at zero priority while the PC is in use, and/or while the PC is not in use

Examples
Project          Start  Where        Area            Peak #hosts
GIMPS            1994   -            math            10,000
distributed.net  1995   -            cryptography    100,000
[...] I          1999   UCB          SETI            600,000
United Devices   2002   commercial   biomedicine     200,000
CPDN             2003   Oxford       climate change  150,000
WCG              2004   commercial   biomedicine     200,000
[...] II         2005   UCB          SETI            850,000
[...]            [...]  [...] Wash   biology         100,000
SIMAP            2005   T.U. Munich  bioinformatics  10,[...]

Current status
- ~50 projects
- 500,000 volunteers
- 800,000 computers

High-throughput computing vs. high-performance computing
[Diagram: platforms compared by # processors, for a single job vs. multiple jobs. Labels: cluster (MPI), supercomputer, cluster (batch), Grid, commercial cloud, volunteer computing; 10K-1M]

Volunteer computing is different
- You don’t buy resources; you ask for them
- Resources are:
  - heterogeneous
  - sporadically available and connected
  - untrusted and not private
  - behind firewalls/NATs/proxies

Part 2: Basic concepts of BOINC

About BOINC
- Funded by NSF since 2002
- Open-source (LGPL)
- Based at UC Berkeley
- Few staff, but lots of volunteers
  - software testing
  - translation
  - documentation
  - support (lists, message boards, Skype)

Volunteers and projects
[Diagram: volunteers connected to projects (CPDN, WCG, ...) by attachments]

BOINC software overview
[Diagram: the volunteer host (client, apps, screensaver, GUI) communicates over HTTP with the project server (scheduler, data server, MySQL, daemons)]

BOINC scheduler
[Diagram: applications have app versions (Win32, Win32 N-core, Win32 + NVIDIA, Win64, Mac OS X) and jobs; jobs have instances]
Client request:
- HW, SW description
- existing workload
- per resource type: # of instances requested, # of seconds requested
Server reply:
- app version descriptions
- job descriptions

Job replication
- Job instances may fail or return wrong results
- Job replication: do 2, see if they agree
  - “agree” may be fuzzy
- Homogeneous replication
  - numerical equivalence of hosts
- Adaptive replication
  - reduce replication for hosts that seem trustworthy
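The replication idea can be sketched in a few lines. This is an illustration only, not BOINC's validator: the function names, the relative tolerance, and the "streak of valid results" rule for trusting a host are all assumptions made for the example.

```python
import math


def agree(a, b, rel_tol=1e-6):
    """Fuzzy agreement: two numeric results match within a relative tolerance."""
    return math.isclose(a, b, rel_tol=rel_tol)


def find_canonical(results, quorum=2, rel_tol=1e-6):
    """Return a result that at least `quorum` instances agree on, else None."""
    for candidate in results:
        matches = sum(1 for r in results if agree(candidate, r, rel_tol))
        if matches >= quorum:
            return candidate
    return None


def replication_for(valid_streak, threshold=10):
    """Adaptive replication (simplified): hosts with a long run of valid
    results get a single instance; everyone else gets two."""
    return 1 if valid_streak >= threshold else 2


print(find_canonical([3.1415926, 3.1415927]))  # two instances that agree
print(find_canonical([1.0, 2.0]))              # disagreement: no canonical result
```

When no quorum is reached, a project would typically issue another instance and try again; that retry loop is left out here.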

The job pipeline
[Diagram: work generator, BOINC, validator, assimilator]

The BOINC data model
- App versions, job inputs, and job outputs can consist of arbitrarily many files
- Each file has a physical name (unique, immutable); each reference to a file has a “logical name”
- Files have various attributes (e.g., sticky)
- Each file can have one or more URLs; files are transferred via HTTP
- App version files are digitally signed
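The split between physical and logical names is easiest to see with two jobs that open the same logical name but receive different files. A minimal sketch; the class names, the second file name, and the example.org URLs are hypothetical and do not mirror BOINC's actual structures:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    physical_name: str    # unique and immutable
    urls: tuple           # one or more download URLs
    sticky: bool = False  # example attribute from the slide


@dataclass(frozen=True)
class FileRef:
    logical_name: str     # the name the application opens
    file: FileInfo


def resolve(file_refs, logical_name):
    """Map the logical name an app uses to the physical file name."""
    for ref in file_refs:
        if ref.logical_name == logical_name:
            return ref.file.physical_name
    raise KeyError(logical_name)


# Two jobs: the app always opens "in", but each job gets its own file.
job_a = [FileRef("in", FileInfo("12ja04aa", ("http://example.org/download/12ja04aa",)))]
job_b = [FileRef("in", FileInfo("13fe04ab", ("http://example.org/download/13fe04ab",)))]

print(resolve(job_a, "in"), resolve(job_b, "in"))
```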

What kinds of jobs can BOINC handle?
- Pretty much anything you’d run on a Grid
- Bag of tasks (but IPC support soon)
- Short/long jobs
- Data intensive, up to a point
- Geared towards
  - Few apps, many jobs (high startup cost per app)
  - Jobs with high slack time

Part 3: Application development for BOINC

The BOINC runtime environment
[Diagram: processes and files]

Native BOINC applications
- boinc_init()
  - create runtime system thread
- boinc_finish()
  - write finish file
- boinc_resolve_filename(logical, physical)
- boinc_fraction_done(x)

Checkpointing
- bool boinc_time_to_checkpoint()
  - call when in checkpointable state
- boinc_checkpoint_done()
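The checkpointing calls fit into an application's main loop roughly as follows. The real API is C/C++; this Python sketch only imitates the control flow, and the callbacks `time_to_checkpoint`, `checkpoint_done`, and `fraction_done` are hypothetical stand-ins for boinc_time_to_checkpoint(), boinc_checkpoint_done(), and boinc_fraction_done(x). The JSON state file is likewise an assumption for the example.

```python
import json
import os
import tempfile


def run(n_steps, state_path, time_to_checkpoint, checkpoint_done, fraction_done):
    """Sum 0..n_steps-1, resuming from state_path if a checkpoint exists."""
    state = {"step": 0, "total": 0}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)

    while state["step"] < n_steps:
        state["total"] += state["step"]
        state["step"] += 1
        # End of an iteration: `state` fully describes progress,
        # so this is a checkpointable point.
        if time_to_checkpoint():
            tmp_path = state_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump(state, f)
            os.replace(tmp_path, state_path)  # atomic rename, never a half-written file
            checkpoint_done()
        fraction_done(state["step"] / n_steps)
    return state["total"]


workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "state.json")
fractions = []
total = run(10, path, lambda: True, lambda: None, fractions.append)
print(total, fractions[-1])
```

Writing to a temporary file and renaming it means a job killed mid-checkpoint restarts from the previous consistent state rather than from a corrupt one.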

The BOINC wrapper
- Can use for legacy apps
- XML input file lists sub-jobs
  - executable, input files
- What it does:
  - interfaces to BOINC client
  - copies files to/from slot directory
  - runs executables
  - does checkpointing at sub-job level

Building app versions
- Linux
  - gcc
- Windows
  - Visual Studio
  - MinGW (gcc)
- Mac OS X
  - Xcode

Multithread apps
- boinc_init_parallel()
- Allows suspend/resume of all threads
  - Unix: fork/exec
  - Windows: direct thread control

GPU app versions
- Develop for NVIDIA or ATI, with CUDA, CAL, OpenCL, etc. (BOINC supplies samples)
- Each version has a “plan class”
- For each plan class, supply a function that determines
  - can app run on this host? hardware, driver version, etc.
  - what resources will it use? #CPUs, #GPUs, GPU RAM, etc.
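A plan-class function answers the two questions on this slide: can the app version run on this host, and what will it use? In BOINC this function is part of the server code; the Python sketch below only shows the shape of the decision. The `Host` and `Usage` types, the field names, and every threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    gpu_vendor: Optional[str]  # e.g. "nvidia", "ati", or None
    gpu_ram_mb: int
    driver_version: int


@dataclass
class Usage:
    ncpus: float
    ngpus: float
    gpu_ram_mb: int


def cuda_plan_class(host, min_driver=100, min_ram_mb=256):
    """Return the resources the app version would use, or None if the
    host cannot run it. Thresholds are made-up illustration values."""
    if host.gpu_vendor != "nvidia":
        return None
    if host.driver_version < min_driver:
        return None
    if host.gpu_ram_mb < min_ram_mb:
        return None
    # Mostly GPU work, with a fraction of one CPU to feed the GPU.
    return Usage(ncpus=0.1, ngpus=1.0, gpu_ram_mb=min_ram_mb)


print(cuda_plan_class(Host("nvidia", 1024, 150)))
print(cuda_plan_class(Host("ati", 1024, 150)))
```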

VM apps
- Develop apps on your favorite OS
- Create a VirtualBox VM image
- App version consists of
  - VM wrapper (supplied by BOINC)
  - VM image
  - app executable

Part 4: Deploying a BOINC server

Hardware options
- Native Linux host
  - download/compile BOINC software
- BOINC server VM (VMware/Debian)
- BOINC Amazon EC2 image

Components of a project
- Master URL, name
- MySQL database
- Directory hierarchy
- A set of daemon processes and cron jobs

Processes
[Diagram: work generator, validator, assimilator, feeder, scheduler, transitioner, file deleter, and DB purger around the MySQL DB; clients connect to the scheduler]

Project directory hierarchy
apps/            application files
bin/             daemon programs
cgi-bin/         BOINC scheduler and upload CGI
config.xml       configuration file
download/        downloadable files
html/            web site; master URL points here
keys/            keys for code signing, upload auth
log_(hostname)   daemon log files
project.xml      list of platforms and apps
upload/          uploaded files

BOINC database
- platform
- app
- app_version
- user
- host
- workunit
- result
- ...

Creating a project
- make_project name
- creates
  - directory hierarchy
  - DB
  - mods for httpd.conf
  - crontab entry

Project configuration and control
- config.xml
  - scheduling and other options
  - list of daemons
  - list of periodic tasks
- project control
  - bin/start: start daemons, enable scheduler
  - bin/stop: stop daemons, disable scheduler
  - bin/status

Scaling a BOINC server
- Components can run on different machines sharing a file system
- Each component can be distributed
- MySQL server is typically the bottleneck
- 1 server machine can issue ~100K jobs/day; 4 machines can issue > 1 million
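To put the throughput figures in more familiar units, here is the same data converted to jobs per second. This is plain arithmetic on the slide's numbers, with 1 million/day taken as the lower bound for four machines:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def jobs_per_second(jobs_per_day):
    return jobs_per_day / SECONDS_PER_DAY


one_machine = jobs_per_second(100_000)       # ~100K jobs/day on 1 machine
four_machines = jobs_per_second(1_000_000)   # > 1 million jobs/day on 4 machines

print(f"1 machine:  {one_machine:.2f} jobs/s")
print(f"4 machines: {four_machines:.2f} jobs/s "
      f"({four_machines / 4:.2f} jobs/s per machine)")
```

Per machine, the four-machine figure is about 2.5 times the single-machine rate, which is consistent with the slide's point that spreading components across machines relieves the bottleneck.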

Part 5: Deploying applications

Adding an application
- edit project.xml
  [XML example lost in extraction; surviving values: multi_thread, Test multi-thread apps]
- run bin/xadd

Adding an application version
- Create application version directory
- Sign files on offline computer
- run bin/update_versions

apps/
  uppercase/
    uppercase_6.14_windows_intelx86__cuda.exe/
      uppercase_6.14_windows_intelx86__cuda.exe
      graphics_app=uppercase_graphics_6.14_windows_intelx86.exe
      logo.jpg
      Helvetica.txf
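The file names in this example appear to pack the app name, version, platform, and plan class into one string. A minimal sketch of taking such a name apart; the pattern is inferred from the names on this slide alone and is an assumption, not the parsing rule used by bin/update_versions:

```python
import re

# ASSUMED layout, inferred from the slide's example:
#   <app>_<major.minor>_<platform>[__<plan_class>][.<ext>]
VERSION_NAME = re.compile(
    r"(?P<app>.+?)_(?P<version>\d+\.\d+)_(?P<platform>[^.]+?)"
    r"(?:__(?P<plan_class>[^.]+))?(?:\.(?P<ext>\w+))?"
)


def parse_version_name(name):
    """Split an app version file name into its parts, or return None."""
    match = VERSION_NAME.fullmatch(name)
    return match.groupdict() if match else None


print(parse_version_name("uppercase_6.14_windows_intelx86__cuda.exe"))
```

The double underscore matters: the platform name itself contains a single underscore, so the plan class needs a distinct separator.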

Part 6: Submitting jobs

Describing job inputs
- Input template file
  [XML example lost in extraction; surviving values: 0, 0, in, 1, -cpu_time]

Describing job outputs
- Output template file
  [XML example lost in extraction; surviving value: out]

Submitting a job
- Stage input files
  cp test_files/12ja04aa `bin/dir_hier_path 12ja04aa`
- Submit job
  create_work --appname A --wu_name B --wu_template C --result_template D
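When many jobs are submitted from a script, it helps to build the create_work command line in one place. A minimal sketch that only assembles the argument list and does not run anything. The option names come from the slide; the `bin/` prefix, the double-hyphen spelling, the template paths, the work unit name, and passing input file names as trailing arguments are assumptions to check against your BOINC version.

```python
import shlex


def create_work_command(appname, wu_name, wu_template, result_template,
                        input_files=()):
    """Build the argv for one create_work call (assumed CLI layout)."""
    return [
        "bin/create_work",
        "--appname", appname,
        "--wu_name", wu_name,
        "--wu_template", wu_template,
        "--result_template", result_template,
        *input_files,  # ASSUMPTION: staged input files follow the options
    ]


cmd = create_work_command(
    "uppercase",              # app name from the earlier example
    "uppercase_12ja04aa",     # hypothetical work unit name
    "templates/in.xml",       # hypothetical template paths
    "templates/out.xml",
    ["12ja04aa"],
)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` from the project directory once the input file has been staged.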

Part 7: Organizational issues

Single-scientist projects
- Need to:
  - Port apps
  - Get publicity
  - Interface with the public
  - Maintain servers
- Not many research groups have the resources
- And it creates a lot of competing “brands”

Umbrella projects
- Example: IBM World Community Grid
[Diagram labels: project, publicity, web development, sysadmin, app porting]

The model
- A university has
  - scientists
  - a powerful “brand”
  - PR resources
  - IT infrastructure
  - lots of alumni (UCB: 500,000)

Hubs
- nanoHUB: “science portal” for nanoscience
  - social network + “app store”
  - sharing of ideas, data, software
  - computational portal
- HUBzero: generalization to other areas
  - currently ~20 hubs
- Integration of BOINC with HUBzero
  - each hub has a volunteer computing project