Predictable Computer Systems Remzi Arpaci-Dusseau University of Wisconsin, Madison.



Trends

Complexity, Cheap Components, Everything Interconnected

Problems

Nothing Works As Expected: Performance, Fault-Tolerance, Security

What Would Be Ideal

Ideal: Assemble a large-scale system from cheap, complex components. The system works in a predictable manner.

Key: How Components Interact

State of the Art: APIs, Protocols

Beyond APIs and Protocols: Understanding “Behavior”

A Small Example: Understanding the Failure Behavior of Local File Systems

Understanding FS Failure
Type-aware fault injection: make the fault-injection layer aware of FS structures (e.g., make an inode block fail).
Why useful: can infer how the file system reacts to failures at different points in its code.
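To make the idea of type-aware fault injection concrete, here is a minimal sketch in Python. It is not the tool used in the talk. The class names (`RamDisk`, `TypeAwareInjector`), the three block types, and the fixed block layout are all assumptions made for illustration. The point it shows is that the injection layer consults a map from block number to block type, so a test can say "fail inode writes" rather than "fail block 1".

```python
from enum import Enum


class BlockType(Enum):
    """Hypothetical block categories; a real file system has many more."""
    SUPERBLOCK = "superblock"
    INODE = "inode"
    DATA = "data"


class InjectedFault(IOError):
    """Raised when the injection layer fails a request on purpose."""


class RamDisk:
    """Toy block device backed by a dictionary."""

    def __init__(self):
        self.blocks = {}

    def write(self, block_no, payload):
        self.blocks[block_no] = payload

    def read(self, block_no):
        return self.blocks[block_no]


class TypeAwareInjector:
    """Sits between the file system and the disk; fails requests by block type."""

    def __init__(self, disk, type_of_block):
        self.disk = disk
        self.type_of_block = type_of_block  # block number -> BlockType (assumed known)
        self.armed = set()                  # (operation, BlockType) pairs to fail

    def arm(self, op, block_type):
        self.armed.add((op, block_type))

    def _check(self, op, block_no):
        # Blocks missing from the map are treated as data blocks.
        block_type = self.type_of_block.get(block_no, BlockType.DATA)
        if (op, block_type) in self.armed:
            raise InjectedFault(f"{op} of {block_type.value} block {block_no} failed")

    def write(self, block_no, payload):
        self._check("write", block_no)
        self.disk.write(block_no, payload)

    def read(self, block_no):
        self._check("read", block_no)
        return self.disk.read(block_no)


# Assumed layout for the demo: block 1 holds inodes, block 2 holds file data.
layout = {0: BlockType.SUPERBLOCK, 1: BlockType.INODE, 2: BlockType.DATA}
device = TypeAwareInjector(RamDisk(), layout)
device.arm("write", BlockType.INODE)

device.write(2, b"file contents")       # data write passes through untouched
try:
    device.write(1, b"inode contents")  # inode write is failed on purpose
    inode_write_failed = False
except InjectedFault:
    inode_write_failed = True
```

Running a file system on top of such a layer and observing what it does after each injected failure is what lets one infer its reaction at different points in its code.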

Write Errors: Recovery Techniques
Ext3, JFS don’t react to write failures.
ReiserFS (almost) always calls panic().
[Table: recovery techniques (Zero, Stop, Propagate, Retry, Redundancy) for Ext3, ReiserFS, and JFS; cell contents were not preserved in the transcript]
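The recovery labels on this slide can be read as policies for handling a failed write. The following Python sketch illustrates four of them under stated assumptions. The function `write_with_policy`, the `FlakyDisk` class, and the exception names are hypothetical, and the meaning given to each label (Zero ignores the error, Stop halts, Propagate returns the error to the caller, Retry tries again) is one plausible reading of the slide, not code from any of the file systems named. Redundancy is left out to keep the sketch short.

```python
class WriteError(IOError):
    """Raised by the toy disk when a write fails."""


class FileSystemPanic(RuntimeError):
    """Stands in for a file system halting itself, as with panic()."""


class FlakyDisk:
    """Toy disk whose first `failures` writes fail."""

    def __init__(self, failures):
        self.failures = failures
        self.blocks = {}

    def write(self, block_no, payload):
        if self.failures > 0:
            self.failures -= 1
            raise WriteError(f"write to block {block_no} failed")
        self.blocks[block_no] = payload


POLICIES = {"zero", "stop", "propagate", "retry"}


def write_with_policy(disk, block_no, payload, policy, max_retries=3):
    """Return True if the caller is told that the write succeeded."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    # Only the retry policy gets more than one attempt.
    attempts = 1 + (max_retries if policy == "retry" else 0)
    for attempt in range(attempts):
        try:
            disk.write(block_no, payload)
            return True
        except WriteError:
            if attempt < attempts - 1:
                continue  # retry policy: try again
            if policy == "zero":
                return True  # error ignored; caller believes the write succeeded
            if policy == "stop":
                raise FileSystemPanic("halting after write failure")
            raise  # propagate, or retry with all attempts exhausted


silent = FlakyDisk(failures=1)
zero_result = write_with_policy(silent, 7, b"x", "zero")

retried = FlakyDisk(failures=2)
retry_result = write_with_policy(retried, 7, b"x", "retry")
```

With the Zero policy the caller is told the write succeeded even though block 7 never reached the disk, which is why silently ignoring write failures is the dangerous case.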

What We Need

Vocabulary + Techniques + Tools = Methods to Understand Behavior -> Predictable Computer Systems

CSI: Computer Systems Investigation

ADvanced Systems Lab (ADSL) Gray-box Operating Systems and Storage Systems Andrea Arpaci-Dusseau Remzi Arpaci-Dusseau

ADvanced Systems Lab (ADSL)
Who does the real work: Nitin Agrawal, Lakshmi Bairavasundaram, John Bent, Nathan Burnett, Tim Denehy, Camille Fournier, Haryadi Gunawi, Todd Jones, James Nugent, Ina Popovici, Vijayan Prabhakaran, Muthian Sivathanu

Goal: Building Distributed Systems

Large-Scale Distributed Systems
[Figure: architecture diagram labeled Internet, Clients, Front Ends, DBMS, Net, Online Storage, Archival Storage]

Ideal: Legos. What You See Is What You Get
[Figure: Lego brick, top and side views]