Volunteer Computing in the Next Decade David Anderson Space Sciences Lab University of California, Berkeley 4 May 2012.

Slides:



Advertisements
Similar presentations
BOINC Berkeley Open Infrastructure for Network Computing An open-source middleware system for volunteer and grid computing (much of the images and text.
Advertisements

BOINC The Year in Review David P. Anderson Space Sciences Laboratory U.C. Berkeley 22 Oct 2009.
Scientific Computing on Smartphones David P. Anderson Space Sciences Lab University of California, Berkeley April 17, 2014.
Volunteer Computing and Hubs David P. Anderson Space Sciences Lab University of California, Berkeley HUBbub September 26, 2013.
1 port BOSS on Wenjing Wu (IHEP-CC)
Achievements and Opportunities in Volunteer Computing David P. Anderson Space Sciences Lab U.C. Berkeley 18 April 2008.
A Guided Tour of BOINC David P. Anderson Space Sciences Lab University of California, Berkeley TACC November 8, 2013.
Volunteer Computing with BOINC David P. Anderson Space Sciences Laboratory University of California, Berkeley.
Scientific Computing in the Consumer Digital Infrastructure David P. Anderson Space Sciences Lab University of California, Berkeley The Austin Forum November.
David P. Anderson Space Sciences Laboratory University of California – Berkeley Designing Middleware for Volunteer Computing.
Exa-Scale Volunteer Computing David P. Anderson Space Sciences Laboratory U.C. Berkeley.
Introduction to the BOINC software David P. Anderson Space Sciences Laboratory University of California, Berkeley.
Wenjing Wu Andrej Filipčič David Cameron Eric Lancon Claire Adam Bourdarios & others.
Volunteer Computing with BOINC Dr. David P. Anderson University of California, Berkeley SC10 Nov. 14, 2010.
Volunteer Computing with GPUs David P. Anderson Space Sciences Laboratory U.C. Berkeley.
and Citizen Cyber-Science David P. Anderson Space Sciences Laboratory U.C. Berkeley.
BOINC: Progress and Plans David P. Anderson Space Sciences Lab University of California, Berkeley BOINC:FAST August 2013.
David P. Anderson Space Sciences Laboratory University of California – Berkeley Designing Middleware for Volunteer Computing.
David P. Anderson Space Sciences Laboratory University of California – Berkeley Public and Grid Computing.
TEMPLATE DESIGN © BOINC: Middleware for Volunteer Computing David P. Anderson Space Sciences Laboratory University of.
Internet2 AdvCollab Apps 1 Access Grid Vision To create virtual spaces where distributed people can work together. Challenges:
David P. Anderson Space Sciences Laboratory University of California – Berkeley Public Distributed Computing with BOINC.
BOINC: An Open Platform for Public-Resource Computing David P. Anderson Space Sciences Laboratory U.C. Berkeley.
David P. Anderson Space Sciences Laboratory University of California – Berkeley Public Distributed Computing with BOINC.
Course 03 Basic Concepts assist. eng. Jánó Rajmond, PhD
Lecture 1: Network Operating Systems (NOS)
CernVM and Volunteer Computing Ivan D Reid Brunel University London Laurence Field CERN.
A Tour of Citizen Cyber-Science David P. Anderson Space Sciences Laboratory U.C. Berkeley.
Exa-Scale Volunteer Computing David P. Anderson Space Sciences Laboratory U.C. Berkeley.
Volunteer Computing and BOINC Dr. David P. Anderson University of California, Berkeley Dec 3, 2010.
Frontiers of Volunteer Computing David Anderson Space Sciences Lab UC Berkeley 30 Dec
The Future of Volunteer Computing David P. Anderson U.C. Berkeley Space Sciences Lab UH CS Dept. March 22, 2007.
Emulating Volunteer Computing Scheduling Policies Dr. David P. Anderson University of California, Berkeley May 20, 2011.
David P. Anderson Space Sciences Laboratory University of California – Berkeley A Million Years of Computing.
Volunteer Computing: Involving the World in Science David P. Anderson U.C. Berkeley Space Sciences Lab February 16, 2007.
Volunteer Computing: the Ultimate Cloud Dr. David P. Anderson University of California, Berkeley Oct 19, 2010.
A Brief History of (CPU) Time -or- Ten Years of Multitude David P. Anderson Spaces Sciences Lab University of California, Berkeley 2 Sept 2010.
David P. Anderson Space Sciences Laboratory University of California – Berkeley Supercomputing with Personal Computers.
The Limits of Volunteer Computing Dr. David P. Anderson University of California, Berkeley March 20, 2011.
Volunteer Computing and Large-Scale Simulation David P. Anderson U.C. Berkeley Space Sciences Lab February 3, 2007.
Using volunteered resources for data-intensive computing and storage David Anderson Space Sciences Lab UC Berkeley 10 April 2012.
Technology for Citizen Cyberscience Dr. David P. Anderson University of California, Berkeley May 2011.
Volunteer Computing David P. Anderson U.C. Berkeley Space Sciences Lab Nov. 15, 2006.
Volunteer Computing with BOINC: a Tutorial David P. Anderson Space Sciences Laboratory University of California – Berkeley May 16, 2006.
Frontiers of Volunteer Computing David Anderson Space Sciences Lab UC Berkeley 28 Nov
Volunteer Computing David P. Anderson U.C. Berkeley Space Sciences Lab January 30, 2007.
An Overview of Volunteer Computing
A Brief History of BOINC
Volunteer Computing and BOINC
The Future of Volunteer Computing
The 9th Annual BOINC Workshop
University of California, Berkeley
Building a Global Brain David P. Anderson U. C
Volunteer computing PC owners donate idle cycles to science projects
Chapter 1: Introduction
Volunteer Computing: SETI and Beyond David P
Volunteer Computing for Science Gateways
Designing a Runtime System for Volunteer Computing David P
A Roadmap for Volunteer Computing in the U.S.
Types of Operating System
Network Operating Systems (NOS)
Exa-Scale Volunteer Computing
David P. Anderson Space Sciences Lab UC Berkeley LASER
Lecture 1: Network Operating Systems (NOS)
The Global Status of Citizen Cyberscience
Prepared by: Ms. Amira al-Ghanem Prepared for: Ms. Omarine
TYPES OFF OPERATING SYSTEM
Objectives Overview Explain why computer literacy is vital to success in today’s world Define the term, computer, and describe the relationship between.
University of California, Berkeley
LO2 – Understand Computer Software
Presentation transcript:

Volunteer Computing in the Next Decade David Anderson Space Sciences Lab University of California, Berkeley 4 May 2012

The consumer digital infrastructure (CDI) Consumer ● 1.5 billion PCs – GPU-equipped – paid for by owner ● 5 billion mobile devices ● Commodity Internet Organizational ● ~ 5 million cluster/cloud nodes ● supercomputers ● Research networks

Why not use the CDI for scientific computing? ● How to get access to PCs? – Poll: would you pay $5/month to help ● do cancer research ● predict climate change ● find the Higgs boson – about 5% of people answer “yes” ● 5% of 1.5B = 75M

Many applications have two levels of parallelism ● Don’t need to have all your FLOPS in one cabinet or room jobs (millions, few dependencies)... HTC HPC threads (1-1000)

Examples ● Analysis of large data – radio/optical/gravitational instruments – particle colliders – genetic mapping and analysis ● Physical simulations – molecular, global, cosmic ● Biology-inspired computing – genetic optimization algorithms

Volunteer computing ● PC owners volunteer to let science projects use their computing resources ● Your PC can – simulate collisions for CERN – discover gravitational waves – discover binary pulsars – research human diseases – predict global climate change – etc.

BOINC: middleware for volunteer computing ● Supported by NSF since 2002 ● Open source (LGPL) ● Based at University of California, Berkeley ●

Issues handled by BOINC ● Heterogeneity – multiple platforms ● Handle untrusted, anonymous resources – Result validation ● replication, adaptive replication – Credit: amount of work done by a host or user ● Consumer-friendly client

Supported resource types ● Desktops, laptops, cluster nodes, servers – Windows, Mac OS X, Linux, other Unix – Can use GPUs as well as CPUs ● Tablets and smart phones – Android ● Video game consoles – Sony Playstation 3

How it works home PC BOINC client project get jobs download data, executables comput e upload outputs ● Can attach to multiple projects ● Works “in the background”; configurable ● Interruptible BOINC server

Volunteer computing today ● 700,000 active computers – 70% availability – 40% with usable GPUs ● 12 PetaFLOPS actual throughput ● Projects – – IBM World Community Grid – –... ~ 50 others

Cost The cost of 10 TeraFLOPS for 1 year: ● CPU cluster: $1.5M ● Amazon EC2: $4M – 5,000 instances ● Volunteer: ~ $0.1M

The future of the CDI ● Positives – Tech R&D is driven by the consumer market – Ever-increasing numbers of devices ● Concerns – thin clients – mobile (battery-powered) devices – energy-saving mechanisms – proprietary software environments ● Apple iOS, Windows RT, Xbox, Playstation

The capacity of the CDI in 5 years ● Participation: 50M hosts? ● Processing – CPU: 50M * 1 TFLOPS = 50 ExaFLOPS – GPU: 50M * 20 TFLOPS = 1000 ExaFLOPS ● Storage: – 50M * 10 TB = 500 ExaBytes ● Network bandwidth – 50M * 100Mbps = 5 Petabit/sec

Using GPUs ● BOINC detects and schedules GPUs – NVIDIA, AMD, soon Intel – multiple/mixed GPUs – various hardware capabilities – various software capabilities (CUDA, OpenCL, CAL) ● Problems – non-preemptive GPU scheduling – no paging of GPU memory

Using VM technology ● CDI platforms: – 85% Windows – 7% Linux – 7% Mac OS X ● Developing/maintaining versions for different platforms is hard ● Even making a portable Linux executable is hard

BOINC VM support ● Create a VM image for your favorite environment ● Create executables for that environment BOINC client VirtualBox executive Vbox wrapper VM instance shared directory: executable input, output files

Data-intensive computing ● Examples – LHC data analysis – Square Kilometer Array – Wide-field survey telescopes – Genetic mapping and analysis – Storage of simulation results ● Performance issues – network – storage space on clients

Networking landscape Commodity Internet PCs ~10 Mbps Institution s 1-10 Gbps Research networks (free) $$ ● Most PCs are behind NATs/firewalls – only use outgoing HTTP

Properties of clients ● Availability – hosts may be turned off – hosts may be unavailable by user preference ● time of day ● PC is busy or not busy ● Churn – The active life of hosts is exponentially distributed with mean ~100 days ● Heterogeneity – wide range of hardware/software properties

BOINC storage architecture Data archival Storage application s Locality scheduling Dataset storage BOINC storage infrastructure Disk space management; allocation among projects framework for application- specific policies file transfer

Storage applications ● Storage datasets – static – streaming ● Locality scheduling ● Archival of simulation results ● Data archival

Static dataset storage ● Goal: – Minimize turnaround time of queries against a dataset cached on clients ● Scheduling policies – whether to use a given host – how much data to store on a given host ● should be proportional to its processing speed – update these decisions as hosts come and go

Streaming datasets ● Goal: maximize the window of data availability, subject to a bound on data loss ● Example: Square Kilometer Array (SKA) – ~1000 radio telescopes, 2 Gbps each – beam-forming via correlation – store data on clients in “stripes” across all telescopes – with 10M clients, 5 TB each, can store several years of data and perform a variety of analysis functions

Locality scheduling ● Each file in a large dataset is input for a large number of jobs ● Goal: process the dataset using the least network traffic ● Example: analysis of LIGO data

Locality scheduling ● Processing jobs sequentially is pessimal – every file gets sent to every client jobs... PCs files

Locality scheduling: ideal ● Each file is downloaded to 1 host ● Problems – Typically need job replication – Widely variable host throughput jobs PCs jobs

Locality scheduling: compromise ● New hosts are assigned to slowest team ● Teams are merged when they collide ● Each file is downloaded to ~20 hosts jobs teams jobs

Organizational issues ● What’s needed to operate an effective volunteer computing project? – learn about BOINC – sysadmin, DB admin, application porting – publicity/marketing: web, media – provide a continuous supply of jobs ● Most research projects can’t do all of these

Umbrella projects ● One project serves many scientists ● Examples – (Chinese Academy of Scientists) – World Community Grid (IBM) – U. of Westminster (desktop grid) – Ibercivis (Spanish consortium)

How to use volunteer computing ● Think about future computing needs – what can be mapped to volunteer computing? – how could you use unlimited processing power? ● Think about your organizational structure – mobilize at the appropriate level ● Plan and gather resources – include in your next grant proposal

Conclusion ● Volunteer computing is the shortest path to ExaFLOPS computing ● It’s usable for many applications ● It enables breakthrough research ● It requires a leap of faith