Using volunteered resources for data-intensive computing and storage
David Anderson, Space Sciences Lab, UC Berkeley
10 April 2012

The consumer digital infrastructure
Consumer:
● 1.5 billion PCs
  – GPU-equipped
  – paid for by owner
● 5 billion mobile devices
● Commodity Internet
Organizational:
● 2 million cluster/cloud nodes (no GPUs)
● Supercomputers
● Research networks

Volunteer computing
[Diagram: a home PC running the BOINC client contacts a project's scheduler to get jobs, downloads data and executables from the project's data servers, computes, and uploads outputs.]
● Clients can be attached to multiple projects
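The fetch/compute/upload cycle in the diagram can be sketched compactly. The Python below is only an illustration with made-up names and data structures (request_jobs, run_once, the example URL); the real BOINC client is written in C++ and also handles preferences, checkpointing, and retries.

    # Sketch of the per-project cycle: ask the scheduler for jobs,
    # download inputs and executables, compute, upload outputs.
    # All names are illustrative, not the actual BOINC protocol.

    def request_jobs(scheduler_url):
        # Placeholder for the scheduler RPC; returns job descriptions.
        return [{"name": "job_1", "app": "app_1.exe", "inputs": ["data_1.bin"]}]

    def download(project, files):
        print(project["name"], "download:", files)

    def compute(job):
        print("running", job["name"])
        return job["name"] + "_out.bin"

    def upload(project, output_file):
        print(project["name"], "upload:", output_file)

    def run_once(projects):
        # A client can be attached to several projects; real BOINC divides
        # work among them according to the volunteer's resource shares.
        for p in projects:
            for job in request_jobs(p["scheduler_url"]):
                download(p, [job["app"]] + job["inputs"])
                upload(p, compute(job))

    run_once([{"name": "project_A", "scheduler_url": "https://example.org/sched"}])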

Volunteer computing status
● 700,000 active PCs
  – 50% with usable GPUs
● 12 PetaFLOPS actual throughput
● Projects
  –
  – IBM World Community Grid
  –
  – others

Data-intensive computing
● Examples
  – LHC experiments
  – Square Kilometer Array
  – Genetic analysis
  – Storage of simulation results
● Performance issues
  – network
  – storage space on clients

Networking landscape
[Diagram: PCs reach the commodity Internet at ~10 Mbps and pay for it ($$$); institutions connect to research networks at 1-10 Gbps for free.]
● Most PCs are behind NATs/firewalls
  – only use outgoing HTTP

Disk space on clients
● Current
  – average 50 GB available per client
  – 35 PetaBytes total
● Trends
  – disk sizes increasing exponentially, faster than processors
  – 1 TB × 1M clients = 1 Exabyte

Properties of clients
● Availability
  – hosts may be turned off
  – hosts may be unavailable by user preference
    ● time of day
    ● PC is busy or not busy
● Churn
  – the active life of hosts follows an exponential distribution with mean ~100 days
● Heterogeneity
  – wide range of hardware/software properties

BOINC storage architecture
[Diagram: storage applications (data archival, dataset storage, locality scheduling) are built on top of the BOINC storage infrastructure.]

BOINC storage infrastructure: managing client space
● Volunteer preference: keep at least X% free
● This determines BOINC’s total allocation
● Allocation among projects is based on the volunteer-specified “resource share”
[Diagram: the disk divides into non-BOINC usage, free space, and BOINC's allocation.]
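A minimal sketch of this allocation rule, with assumed names and an illustrative formula rather than the client's actual accounting:

    # The volunteer's "keep at least X% free" preference caps BOINC's total
    # allocation, which is then split among attached projects in proportion
    # to their resource shares. Illustrative only.

    def boinc_allocation(disk_total, non_boinc_usage, min_free_fraction):
        reserved = disk_total * min_free_fraction          # space to keep free
        return max(0.0, disk_total - non_boinc_usage - reserved)

    def project_allocations(total_allocation, resource_shares):
        total_share = sum(resource_shares.values())
        return {name: total_allocation * share / total_share
                for name, share in resource_shares.items()}

    # Example: 500 GB disk, 300 GB used by other software, keep 10% free.
    alloc = boinc_allocation(500e9, 300e9, 0.10)
    print(project_allocations(alloc, {"project_A": 100, "project_B": 50}))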

BOINC storage infrastructure: file management
[Diagram: the BOINC client on a home PC reports project disk usage, the project's disk share, and its list of sticky files to the project scheduler; application-specific logic in the scheduler replies with files to delete, files to upload, and files to download.]
● “Sticky file” mechanism
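The exchange can be sketched as follows; the field names (files_to_download, sticky_files, and so on) are illustrative stand-ins for the actual scheduler protocol:

    # Sketch of how a client might build a request and act on the file
    # lists in a scheduler reply. Sticky files stay on disk until the
    # project explicitly asks for their deletion.

    def build_request(project_disk_usage, project_disk_share, sticky_files):
        # Report usage, the project's share, and the sticky-file list so
        # application-specific logic on the server can decide what to do.
        return {"disk_usage": project_disk_usage,
                "disk_share": project_disk_share,
                "sticky_files": sorted(sticky_files)}

    def apply_reply(local_files, sticky_files, reply):
        for f in reply.get("files_to_download", []):
            local_files.add(f)                 # fetch from the data server
        for f in reply.get("files_to_upload", []):
            if f in local_files:
                print("uploading", f)          # e.g. a retained result file
        for f in reply.get("files_to_delete", []):
            local_files.discard(f)
            sticky_files.discard(f)            # project released the file
        return local_files, sticky_files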

Storage applications
● Temporary storage of simulation results
● Dataset storage
● Locality scheduling
● Data archival

Temporary storage of simulation results
● Many simulations (HEP, molecular, climate, cosmic) produce a small “final state” file and a large “trajectory” file
● Depending on the contents of the final state file, the scientist may want to examine the trajectory file
● Implementation
  – make trajectory files sticky and non-uploaded
  – interface for uploading trajectory files
  – interface for retiring trajectory files

Dataset storage
● Goal
  – submit queries against a dataset cached on clients (main copy is on the server)
  – minimize turnaround time for queries
● Scheduling policies
  – whether to use a given host
  – how much data to store on a given host
    ● should be proportional to its processing speed
  – update these decisions as hosts come and go
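A sketch of the "proportional to processing speed" placement rule; the names and structures are illustrative, and a real policy would also track availability and rebalance as hosts come and go:

    # Faster hosts cache more of the dataset, so queries finish at
    # roughly the same time everywhere.

    def dataset_placement(hosts, dataset_bytes):
        total_flops = sum(h["flops"] for h in hosts)
        return {h["id"]: dataset_bytes * h["flops"] / total_flops for h in hosts}

    hosts = [{"id": "slow", "flops": 10e9},
             {"id": "medium", "flops": 40e9},
             {"id": "fast", "flops": 50e9}]
    print(dataset_placement(hosts, 1e12))   # bytes of the dataset per host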

Locality scheduling
● Have a large dataset
● Each file in the dataset is input for a large number of jobs
● Goal: process the dataset using the least network traffic
● Example: analysis of LIGO gravity-wave detector data
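One way to express the locality idea in code. This is an illustration of the principle, not the scheduler's actual algorithm:

    # Give a host a job whose input file it already has; otherwise pick
    # the file needed by the most remaining jobs, so each download is
    # amortized over many jobs.
    from collections import Counter

    def pick_job(host_files, pending_jobs):
        for job in pending_jobs:
            if job["input_file"] in host_files:
                return job, False              # no new transfer needed
        demand = Counter(j["input_file"] for j in pending_jobs)
        best_file = demand.most_common(1)[0][0]
        job = next(j for j in pending_jobs if j["input_file"] == best_file)
        return job, True                       # one new download

    jobs = [{"id": i, "input_file": f"chunk_{i % 3}.dat"} for i in range(9)]
    print(pick_job({"chunk_2.dat"}, jobs))     # reuses the cached file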

Locality scheduling
● Processing jobs sequentially is pessimal
  – every file gets sent to every client

Locality scheduling: ideal
● Each file is downloaded to 1 host
● Problems
  – typically need job replication
  – widely variable host throughput

Locality scheduling: actual
● New hosts are assigned to the slowest team
● Teams are merged when they collide
● Each file is downloaded to ~10 hosts

Data archival
● Files originate on the server
● Chunks of files are stored on clients
● Files can be reconstructed on the server (with high latency)
● Goals
  – arbitrarily high reliability (99.999%)
  – support large files

How to achieve reliability?
● Replication
  – divide the file into N chunks
  – store each chunk on M clients
  – if a client fails
    ● upload another replica to the server
    ● download to a new client
[Diagram: N chunks, each stored on M = 2 clients.]
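A sketch of the repair step under this replication scheme, using illustrative data structures only:

    # When hosts fail, each under-replicated chunk is uploaded from a
    # surviving replica and downloaded to a new host until it is back
    # on M clients (M = 2 as in the diagram).

    def plan_repairs(chunk_holders, alive_hosts, M=2):
        transfers = []
        for chunk, holders in chunk_holders.items():
            holders = holders & alive_hosts            # drop failed hosts
            if not holders:
                raise RuntimeError(f"{chunk} lost: no replica survives")
            while len(holders) < M:
                new_host = next(h for h in alive_hosts if h not in holders)
                transfers.append(("upload", chunk, next(iter(holders))))
                transfers.append(("download", chunk, new_host))
                holders.add(new_host)
        return transfers

    chunks = {"c1": {"h1", "h2"}, "c2": {"h2", "h3"}}
    print(plan_repairs(chunks, alive_hosts={"h1", "h2", "h4", "h5"}))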

Problems with replication
● Hard to achieve high reliability
  – C = probability of losing a particular chunk
  – F = probability of losing some chunk
  – F = 1 − (1 − C)^N
  – e.g. 0.36 = 1 − (  )^1000
● High space overhead
  – use M × x space to store an x-byte file
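Evaluating the formula for a few per-chunk loss probabilities shows why high reliability is hard to reach with plain replication. The value C = 0.00045 is an assumption chosen only because it reproduces roughly the 0.36 figure; the slide's own example value is not preserved in this transcript.

    # F = 1 - (1 - C)^N: probability of losing at least one of N chunks
    # when each chunk is lost independently with probability C.

    def prob_file_loss(C, N):
        return 1 - (1 - C) ** N

    for C in (0.00045, 0.0001, 0.00001):
        print(f"C = {C:g}, N = 1000  ->  F = {prob_file_loss(C, 1000):.3f}")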

Reed-Solomon Coding
● A way of dividing a file into N+K chunks
● The original file can be reconstructed from any N of these chunks
● Example: N = 40, K = 20
  – can tolerate simultaneous failure of 20 clients
  – space overhead is only 50%
[Diagram: N = 4, K = 2.]
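A small calculator for the scheme's properties, assuming independent chunk failures with probability p (a simplifying model, not part of the slide):

    # For an (N + K) Reed-Solomon encoding, the file is lost only if
    # more than K of the N + K chunks fail. With N = 40, K = 20 the
    # overhead is 50% and 20 simultaneous failures are tolerated.
    from math import comb

    def rs_properties(N, K, p):
        total = N + K
        p_file_loss = sum(comb(total, i) * p**i * (1 - p)**(total - i)
                          for i in range(K + 1, total + 1))
        return {"tolerated_failures": K,
                "space_overhead": K / N,
                "p_file_loss": p_file_loss}

    print(rs_properties(40, 20, 0.05))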

The problem with coding
● When any chunk fails, need to upload all other chunks to the server
● High network load at the server
● High transient disk usage at the server

Reducing coding overhead
● Only need to upload 1/M of the file on failure

Two-level coding
● Can tolerate K² client failures
● Space overhead: 125%

Two-level coding + replication
● Most recoveries involve only 1 chunk
● Space overhead: 250% (with M = 2)
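The space overheads of the schemes in the last few slides can be compared directly. The two-level formula below assumes the same (N, K) code is applied at both levels, which reproduces the 125% figure for N = 4, K = 2; the 250% figure for the combined scheme is taken from the slide rather than derived.

    def overhead_replication(M):
        return M - 1.0                      # M copies of every chunk

    def overhead_rs(N, K):
        return K / N                        # single-level coding

    def overhead_two_level(N, K):
        return (1 + K / N) ** 2 - 1         # code the chunks of the code

    print("replication, M = 2:      %3.0f%%" % (100 * overhead_replication(2)))
    print("RS, N = 40, K = 20:      %3.0f%%" % (100 * overhead_rs(40, 20)))
    print("two-level, N = 4, K = 2: %3.0f%%" % (100 * overhead_two_level(4, 2)))
    print("two-level + replication: 250%  (figure from the slide)")

Running it prints 100%, 50%, and 125% for the first three schemes, matching the numbers quoted above.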

Volunteer storage simulator
● Predicts the performance of coding/replication policies
● Inputs
  – description of host population
  – policy, file size
● Outputs
  – disk usage at server
  – upload/download traffic at server
  – fault tolerance level
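A toy version of such a simulator, reduced to a Monte Carlo estimate of file loss under exponential host churn. The repair model and all parameters are simplifying assumptions, not the real simulator.

    # Sample host lifetimes from an exponential distribution (mean ~100
    # days, per the "properties of clients" slide) and estimate how often
    # more than K chunks of an N + K encoded file depart within one
    # repair window before they can be replaced.
    import random

    def estimate_loss_prob(trials=20000, n_chunks=60, k_tolerated=20,
                           mean_life_days=100.0, repair_days=2.0):
        losses = 0
        for _ in range(trials):
            departures = sorted(random.expovariate(1.0 / mean_life_days)
                                for _ in range(n_chunks))
            # lost if k_tolerated + 1 departures fall inside one repair window
            for i in range(n_chunks - k_tolerated):
                if departures[i + k_tolerated] - departures[i] < repair_days:
                    losses += 1
                    break
        return losses / trials

    print("estimated file-loss probability:", estimate_loss_prob())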

Implementation status
● Storage infrastructure: done (in the 7.0 client)
● Storage applications
  – Data archival: 1-2 months away
  – Locality scheduling: used by but need to reimplement
  – Others: in design stage

Conclusion
● Volunteer computing can be data-intensive
  – with 200K clients, it could handle the work of all LHC Tier 1 and Tier 2 sites
  – in 2020, it could potentially provide the Square Kilometer Array (1 Exabyte/day) with 100X more storage than on-site resources
● Using coding and replication, we can efficiently transform a large number of unreliable storage nodes into a highly reliable storage service

Future work
● How to handle multiple competing storage applications within a project?
● How to grant credit for storage?
  – how to make it cheat-proof?
● How to integrate peer-to-peer file distribution mechanisms?
  – BitTorrent, Attic