On-demand Grid Storage Using Scavenging Sudharshan Vazhkudai Network and Cluster Computing, CSMD Oak Ridge National Laboratory

Presentation transcript:

On-demand Grid Storage Using Scavenging
Sudharshan Vazhkudai
Network and Cluster Computing, CSMD, Oak Ridge National Laboratory
PDPTA, June 21st, 2004
Acknowledgments: ORNL Collaborators: Dr. Xiaosong Ma and Dr. Vincent Freeh (NCSU)

Outline
– Grid Storage Fabric Background
– The Evolving Computing Landscape: An Analogy
– Storage Scavenging of User Desktop Workstations
– Use Cases
– Related Work and Design Choices
– Architecture
  – Storage Layer
  – Management Layer
– Current Status

Grid Storage Fabric Background
Scientific discoveries are driven by analyses of massively distributed, bulk data.
– Proliferation of high-end mass storage systems, SANs and datacenters
– Providers such as IBM, HP, Panasas, etc.
Merits:
– Excellent price/performance ratio
– Good storage speeds and access control
– Support for intelligent parallel file systems
– Optimized for wide-area, bulk transfers
– Reliability!
– Successful demonstration in production Grids: DOE Science Grid, Earth System Grid, TeraGrid, etc.
Drawbacks:
– Increasing deployment, maintenance and administrative costs
– Specialized software and central points of failure
– Costs and specialized features hinder wider adoption and limit deployment to a select few research labs and organizations
– The aforementioned production Grids span hardly half a dozen sites!
Meta-Message: If grids are to become prevalent and grow beyond the confines of a few organizations, exploiting commodity fabric features is absolutely essential!

The Evolving HPC Landscape
[Figure: two timelines drawn along a tightly coupled / loosely coupled axis]
Computing fabric for the Grid (time flies…):
– Supercomputers
– Beowulf-style clusters
– Aggregating idle CPU cycles from commodity PCs
Storage fabric…?? (time flies again…):
– Datacenters
– RAID-like aggregation
– Aggregating idle storage space from commodity PCs
Concerns noted for the storage fabric: Volatility, Trust, Performance
Meta-Message: Proprietary systems are being replaced with commodity clusters, delivering new levels of performance and availability at a dramatically affordable price point.

Storage Scavenging of User Desktop Workstations
Harnessing the collective storage potential of individual workstations ~ harnessing idle CPU cycles
Why can storage scavenging be viable?
– Buying gigabytes of storage is increasingly affordable
– The ratio of used space to available storage is low
– Increasing numbers of workstations are online most of the time
– Even a modest contribution per workstation (Contribution << Available) can add up to a staggering aggregate storage capacity!
Concerns:
– Vagaries of volatility…
– Question of trust: datasets on arbitrary user workstations
– Performance of such aggregate storage
Meta-Message: Despite the high maintenance and administrative costs, a factor that attracts the Grid community to high-end storage and data centers is their ability to deliver sustained high throughput for data operations.
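The "Contribution << Available" argument can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name, its parameters and every number in the usage example are assumptions, not figures from the talk. It estimates usable aggregate capacity when each workstation donates a small fraction of its free space and each morsel is replicated several times to cope with volatility.

```python
def aggregate_capacity_gb(workstations, free_gb_per_node,
                          contribution_fraction, replication_factor=1):
    """Estimate usable scavenged capacity in GB (hypothetical helper).

    workstations          -- number of donating desktops
    free_gb_per_node      -- average free disk space per desktop, in GB
    contribution_fraction -- share of the free space each desktop donates (0..1)
    replication_factor    -- copies kept of every morsel (>= 1)
    """
    if not 0 <= contribution_fraction <= 1:
        raise ValueError("contribution_fraction must be between 0 and 1")
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    raw_gb = workstations * free_gb_per_node * contribution_fraction
    # Replication trades raw capacity for availability.
    return raw_gb / replication_factor


if __name__ == "__main__":
    # Made-up example: 1000 desktops, 50 GB free each, 20% donated, 2 copies.
    usable = aggregate_capacity_gb(1000, 50, 0.2, replication_factor=2)
    print(f"usable capacity: {usable:.0f} GB")
```

Even with these modest, made-up inputs the pool reaches several terabytes, which is the point the slide makes qualitatively.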

Use Cases
Storage cloud as a:
– Cache
– Intermediate hop
– Local, client-side scratch
– Grid replica
– RAS for Terascale Supercomputers

Related Work and Design Choices
Related Work:
– Network/Distributed File Systems (NFS, LOCUS)
– Parallel File Systems (PVFS, XFS)
– Serverless File Systems (FARSITE, xFS, GFS)
– Peer-to-Peer Storage (OceanStore, PAST, CFS)
– Grid Storage Services (LegionFS, SRB, IBP, SRM, GASS)
Design Choices & Assumptions:
– Scalability: O(100) or O(1000)
– Commodity Components: Quality & Quantity
– User Autonomy
– Well connected & Secure
– Heterogeneity
– Large, “write once read many” datasets
– Transparent
– Grid Aware

Architecture
[Figure: layered architecture]
– Grid Data Access Tools
– Management Layer: Data Placement, Replication, Grid Awareness, Metadata Management
– Storage Layer: Pool A … Pool m … Pool n, each registering with the Management Layer; Morsel Access, Data Integrity, Non-Invasiveness
Meta-Message: Imagine “Condor” for Storage.

Storage Layer
Benefactors:
– Morsels as a unit of contribution
– Basic morsel operations as RPC services [new(), free(), get(), put()…]
– Space Reclaim: User withdrawal
  – Which morsels to relocate/evict?
  – Which benefactor workstations to relocate to?
– Data Integrity through checksums
– Performance Traces
Pools:
– Benefactor registrations (soft state)
– Dataset distributions
– Metadata
– Selection heuristics
[Figure: morsels of File 1 (1, 2, 3) and File n (1a, 2a, 3a, 4a) distributed across benefactors]
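The morsel operations and checksum-based integrity named above can be sketched as follows. This is a minimal in-memory illustration, not the actual RPC interface: only the operation names new(), free(), get() and put() come from the slide. The class layout, the method signatures, the use of SHA-256 and the capacity limit are all assumptions.

```python
import hashlib


class Benefactor:
    """Hypothetical in-memory benefactor holding a bounded number of morsels."""

    def __init__(self, capacity_morsels):
        self.capacity = capacity_morsels
        self.morsels = {}      # morsel_id -> None (allocated) or (data, checksum)
        self._next_id = 0

    def new(self):
        """Allocate an empty morsel and return its id."""
        if len(self.morsels) >= self.capacity:
            raise RuntimeError("benefactor has no free morsels")
        morsel_id = self._next_id
        self._next_id += 1
        self.morsels[morsel_id] = None
        return morsel_id

    def free(self, morsel_id):
        """Release a morsel, for example when the user reclaims space."""
        self.morsels.pop(morsel_id, None)

    def put(self, morsel_id, data):
        """Store bytes in an allocated morsel and record their checksum."""
        if morsel_id not in self.morsels:
            raise KeyError(f"morsel {morsel_id} was never allocated")
        checksum = hashlib.sha256(data).hexdigest()
        self.morsels[morsel_id] = (data, checksum)
        return checksum

    def get(self, morsel_id):
        """Return the stored bytes after verifying them against the checksum."""
        entry = self.morsels.get(morsel_id)
        if entry is None:
            raise KeyError(f"morsel {morsel_id} holds no data")
        data, checksum = entry
        if hashlib.sha256(data).hexdigest() != checksum:
            raise IOError(f"checksum mismatch on morsel {morsel_id}")
        return data
```

In a real deployment these methods would sit behind RPC endpoints and write to the donor's disk; the dictionary stands in for that storage so the integrity check is easy to follow.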

Management Layer
Manager:
– Pool registrations
– Metadata: datasets-to-pools; pools-to-benefactors, etc.
– Availability: Redundant Array of Replicated Morsels
  – Minimum replication factor for morsels
  – Where to replicate?
  – Which morsel replica to choose from in response to user file fetches?
– Grid Awareness:
  – Information Providers
  – Space reservations
  – Transfer protocol agnostic
– Transparent Access: Namespace
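The slide raises the availability questions without answering them, so the sketch below only illustrates one possible set of answers. The heuristics are assumptions chosen for simplicity: re-replicate any morsel that falls below the minimum factor, place new copies on the benefactors with the most free space, and serve fetches from the least-loaded holder. All function names and data shapes are hypothetical.

```python
def replicas_needed(placement, min_replicas):
    """Map each under-replicated morsel to the number of copies it lacks.

    placement -- dict: morsel id -> set of benefactor ids holding a copy
    """
    return {
        morsel: min_replicas - len(holders)
        for morsel, holders in placement.items()
        if len(holders) < min_replicas
    }


def choose_targets(holders, free_space, count):
    """Pick up to `count` benefactors that do not yet hold the morsel.

    Assumed heuristic: prefer the most free space, break ties by id.
    free_space -- dict: benefactor id -> free morsel slots
    """
    candidates = sorted(
        (b for b in free_space if b not in holders),
        key=lambda b: (-free_space[b], b),
    )
    return candidates[:count]


def choose_replica(holders, load):
    """Pick the holder to serve a fetch.

    Assumed heuristic: lowest current load, break ties by id.
    load -- dict: benefactor id -> outstanding requests
    """
    return min(holders, key=lambda b: (load.get(b, 0), b))


if __name__ == "__main__":
    placement = {"m1": {"a"}, "m2": {"a", "b", "c"}}
    free_space = {"a": 10, "b": 5, "c": 8}
    for morsel, missing in replicas_needed(placement, 2).items():
        print(morsel, "->", choose_targets(placement[morsel], free_space, missing))
```

Other policies, such as choosing targets by observed uptime or by network proximity to the requester, would slot into the same two selection functions.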

Current Status
[Figure: Application, Proxy, Manager and two Benefactor (OS) nodes, connected by ftp/GridFTP and by rpc links (A) to (D)]
Operations shown:
– reserve(); cancel()
– store(): open(); benefactorID.put()
– retrieve(): open(); benefactorID.get()
– delete()
– Morsel operations: new(), free(), get(), put()
rpc (A) services:
– Create/delete files
– Reserve…
rpc (B) services:
– File fetches
– Hints…
rpc (C) services:
– Control
– Dataset distributions
– Benefactor alerts, warnings, alarms to manager
– …
rpc (D) services:
– Morsel relocations
– Status info
– Load balancing
– …
rpc (E) services:
– Morsel relocations to different pools
– Under direction of manager
– …
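The store() and retrieve() paths listed above (an open() followed by benefactorID.put() or benefactorID.get()) might look roughly like this. It is a sketch under stated assumptions: the slide gives only the call names, so the Manager and Benefactor classes, the round-robin placement, the morsel naming scheme and all signatures are hypothetical, and plain in-process method calls stand in for the rpc links.

```python
class Benefactor:
    """Hypothetical donor node; a dict stands in for on-disk morsels."""

    def __init__(self):
        self.morsels = {}

    def put(self, key, data):
        self.morsels[key] = data

    def get(self, key):
        return self.morsels[key]


class Manager:
    """Hypothetical manager that tracks which benefactor holds which morsel."""

    def __init__(self, benefactors, morsel_size):
        self.benefactors = benefactors    # benefactor id -> Benefactor
        self.morsel_size = morsel_size
        self.layout = {}                  # file name -> [(benefactor id, morsel key)]

    def open(self, name, size=None):
        """Return the morsel layout; with `size`, create it round-robin first."""
        if size is not None:
            ids = sorted(self.benefactors)
            count = -(-size // self.morsel_size)   # ceiling division
            self.layout[name] = [
                (ids[i % len(ids)], f"{name}#{i}") for i in range(count)
            ]
        return self.layout[name]


def store(manager, name, data):
    """store(): open() at the manager, then put() each morsel on its benefactor."""
    step = manager.morsel_size
    for i, (bid, key) in enumerate(manager.open(name, size=len(data))):
        manager.benefactors[bid].put(key, data[i * step:(i + 1) * step])


def retrieve(manager, name):
    """retrieve(): open() at the manager, then get() the morsels in order."""
    return b"".join(
        manager.benefactors[bid].get(key) for bid, key in manager.open(name)
    )
```

The design point the sketch tries to capture is that the manager handles only metadata, while the bulk data moves directly between the client side and the benefactors.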

Philosophical Musings...
It’s all about commoditizing…
– Quality
– Trust
– Performance
What the scavenged storage “is not”:
– Not a replacement for high-end storage
What it “is”:
– A low-cost, fault-tolerant alternative to be used in conjunction with high-end storage

Further Information My Website: –