Storage Research Meets The Grid
Remzi Arpaci-Dusseau

ADSL: Where gray-box techniques meet storage systems

The Who, How, and What of ADSL
Who: Andrea and Remzi Arpaci-Dusseau (and, of course, a bunch of students)
How: Gray-box techniques
- Assume the system is a "gray box"
- Leverage knowledge of its implementation to gain more information and to control its behavior
What: Storage systems
- Smarter disks and RAIDs
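
As a concrete, purely illustrative example of a gray-box technique, here is a minimal Python sketch (not ADSL's actual code; the timing threshold and probed file are assumptions): it uses knowledge of how OS buffer caching behaves to guess, from read latency alone, whether a file block is already resident in memory.

```python
# Hypothetical gray-box probe: infer buffer-cache contents from read latency.
# The 100-microsecond threshold is an assumption and would need per-machine
# calibration; note that the probe itself pulls the block into the cache.
import os
import time

CACHE_HIT_THRESHOLD_S = 100e-6  # assumed calibration constant

def probably_cached(path, offset=0, length=4096):
    """Time one small read; a fast read suggests the block was already cached."""
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        os.pread(fd, length, offset)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return elapsed < CACHE_HIT_THRESHOLD_S

print(probably_cached("/etc/hostname"))
```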

Semantically-Smart Disks
Problem: Most disks don't know much
- The block-based SCSI interface limits their knowledge
- And what a waste of potential! Modern RAIDs have substantial processing power and memory
A semantically-smart disk system:
- Figures out how the file system is using it
- Exploits that knowledge to build new functionality into the storage system
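
To make the idea concrete, here is a minimal sketch of the kind of inference such a disk might perform. The layout constants are made up (a simplified ext2-like layout), so this is an illustration of the concept rather than the real prototype: given knowledge of where the file system places its structures, the "disk" classifies each incoming block address and can then treat metadata traffic differently from plain data traffic.

```python
# Hypothetical semantic-disk classifier with assumed layout constants.
BLOCKS_PER_GROUP = 8192        # assumed blocks per block group
INODE_TABLE_START = 2          # assumed: blocks 2..33 of each group hold inodes
INODE_TABLE_BLOCKS = 32

def classify_block(block_num):
    """Map a raw block number to the file-system structure it likely belongs to."""
    if block_num == 0:
        return "superblock"
    offset_in_group = block_num % BLOCKS_PER_GROUP
    if INODE_TABLE_START <= offset_in_group < INODE_TABLE_START + INODE_TABLE_BLOCKS:
        return "inode"
    return "data"

# A smarter disk could, for example, replicate "inode" blocks more aggressively
# or prefetch the data blocks that a just-written inode points to.
for b in (0, 5, 100, 8194):
    print(b, classify_block(b))
```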

Trend That Drives This Session: Data Demands on the Rise
Focus of the original batch queueing systems: CPU
- "Cycle stealing"
- Compute clusters
- Distributed supercomputers
But the data demands of jobs are on the rise…
- Input, output, temp files, and checkpoints
- Modern science is increasingly data-centric

Focus of This Talk: Traditional Storage vs. Grid Storage
Most aspects of modern storage systems are designed with a certain domain in mind
- Local-area environment, presence of an administrator, etc.
The Grid changes almost every assumption
- Wide area, no administrator, etc.
Conclusion: We must reexamine how to build storage systems from the ground up

Outline
- Introduction
- Traditional vs. Grid storage
  - Data reliability
  - Management
  - Caching and overlap
  - Evaluation
- Conclusions

Data Reliability: Traditional
All data is treated equally, and it is sacred
- Most users tolerate a small window of data loss (e.g., the classic 30-second delay before dirty data is flushed to disk)
- But losing even one byte after the flush is catastrophic
Strong implications for design: backup + disaster recovery
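
The flush-to-disk point above is visible at the system-call level. A minimal sketch (hypothetical file name) of the usual pattern: written data may linger in the OS page cache until an explicit fsync, after which it is expected to survive a crash.

```python
import os

def durable_write(path, data: bytes):
    with open(path, "wb") as f:
        f.write(data)         # may sit in the OS page cache for a while
        f.flush()             # push Python's user-level buffer to the OS
        os.fsync(f.fileno())  # ask the OS to push it to stable storage

durable_write("result.dat", b"a byte lost after this point would be catastrophic\n")
```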

Data Reliability: Grid
Different types of I/O should be treated accordingly
Einstein's matter-energy equivalence: E = mc^2
Grid analogy: data-computation equivalence, E(M) = C
- Knowledge is key: if you can refetch the input M, you can recompute the result C
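
A minimal sketch of what "recompute rather than re-store" could look like in practice (the helper and the example command are hypothetical): if a derived file is missing, rerun the computation that produced it from its preserved input instead of depending on backups of the output.

```python
import os
import subprocess

def ensure_output(output_path, command):
    """Treat derived data as disposable: recompute it if it is missing."""
    if not os.path.exists(output_path):
        # e.g. command = ["./analyze", "input.dat", output_path]
        subprocess.run(command, check=True)
    return output_path

# Only the input data and the recipe that regenerates the output need strong
# protection; the output itself can be regenerated on demand.
```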

Management: Traditional
Storage administrators control the system
- Performance tuning
- Problem fixing
- User handling
Human intelligence can be applied to make things run smoothly

Management: Grid
No administrator to help out
- Though the system may have to live within administrative limitations
The system must handle problems automatically
- Tune itself to the environment
- Deal with failures
- Give reasonable feedback to users upon errors and other problem scenarios
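
A small sketch of the "no administrator" point, with hypothetical replica names and a caller-supplied fetch function: the system retries, fails over to another replica, and only then surfaces an error a user can act on.

```python
import time

REPLICAS = ["storage-a.example.org", "storage-b.example.org"]  # assumed names

def fetch_with_failover(fetch, name, retries_per_replica=3, backoff_s=2.0):
    """Try each replica in turn, backing off between attempts."""
    last_error = None
    for server in REPLICAS:
        for attempt in range(retries_per_replica):
            try:
                return fetch(server, name)
            except OSError as e:            # network or storage failure
                last_error = e
                time.sleep(backoff_s * (attempt + 1))
    raise RuntimeError(f"could not fetch {name} from any replica: {last_error}")
```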

Buffering and Overlap: Traditional
Used throughout systems for performance
Important cache: client-side
- NFS: memory
- AFS: disk (and memory)
Caches are managed transparently
Overlap (disk to memory, across the network) is also transparent
Result: operations can run as if they were local

Buffering and Overlap: Grid
Used throughout for performance and reliability
Many more levels of cache, not just clients and servers
Caches are managed both transparently and non-transparently
Overlap is more complex too (multiple users, multiple resources)
Have to deal with more issues: failures, and cost differentials between local storage and the home site across the WAN
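
A minimal sketch of such a multi-level lookup, assuming a caller-supplied WAN fetch function rather than any particular Grid storage system: check local memory, then local disk, then the home site, filling the cheaper levels on the way back.

```python
import os

class GridCache:
    def __init__(self, disk_dir, fetch_remote):
        os.makedirs(disk_dir, exist_ok=True)
        self.memory = {}                  # level 1: local memory
        self.disk_dir = disk_dir          # level 2: local disk
        self.fetch_remote = fetch_remote  # level 3: home site across the WAN

    def get(self, name):
        if name in self.memory:
            return self.memory[name]
        path = os.path.join(self.disk_dir, name)
        if os.path.exists(path):
            with open(path, "rb") as f:
                data = f.read()
        else:
            data = self.fetch_remote(name)   # slow, may fail, may cost money
            with open(path, "wb") as f:      # fill the disk cache
                f.write(data)
        self.memory[name] = data             # fill the memory cache
        return data
```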

Evaluation: Traditional
Traditional storage metrics have a myopic focus
- May miss the "big picture"
One example: availability
- Defined as the "uptime" of the system
- What counts as good: "five nines" of availability (up 99.999% of the time)
Implication: systems are engineered for enterprise use (and are thus over-engineered for many uses)
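
The arithmetic behind "five nines" is worth spelling out; this short calculation converts each availability level into allowed downtime per year (at 99.999%, roughly five minutes).

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

for nines in range(1, 6):
    availability = 1 - 10 ** -nines
    downtime_min = SECONDS_PER_YEAR * (1 - availability) / 60
    print(f"{availability:.5%} available -> ~{downtime_min:.1f} minutes down per year")
```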

Evaluation: Grid
Grid metrics can focus on what is important for Grid jobs: job throughput
- Instead of availability, measure the impact of failure on the aspect of the system that matters most
Result: an end-to-end perspective for evaluating the merit of new approaches in the Grid space
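
As an illustration (a simple model assumed here, not one from the talk), one way to put failure and throughput into the same formula: if each attempt of a t-hour job fails with probability p, a success costs 1/(1-p) attempts on average, so n nodes deliver about n(1-p)/t successful jobs per hour.

```python
def jobs_per_hour(nodes, hours_per_job, failure_prob_per_attempt):
    """Expected successful-job throughput under independent per-attempt failures."""
    return nodes * (1 - failure_prob_per_attempt) / hours_per_job

print(jobs_per_hour(nodes=100, hours_per_job=2.0, failure_prob_per_attempt=0.1))  # 45.0
```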

Summary
The Grid changes storage systems
- Makes some things harder (caching, overlap, failures)
- Makes other things easier (better understanding of workloads and metrics)
How to make it all work?
- Exploit knowledge of workloads and systems to reduce difficult problems to tractable ones

The Data-Centric Lineup
Lots of exciting work going on at Wisconsin in this space!
First session:
- John Bent - "Batch-pipelined Workloads"
- Doug Thain - "Migratory File Services"
Second session:
- Joseph Stanley - "NeST"
- Tevfik Kosar - "Stork"
- George Kola - "DiskRouter"
Guest speaker:
- Arie Shoshani - "Coscheduling Storage and CPUs"