The TickerTAIP Parallel RAID Architecture
P. Cao, S. B. Lim, S. Venkatraman, J. Wilkes
HP Labs

RAID Architectures
Traditional RAID architectures have:
– A central RAID controller interfacing to the host and processing all I/O requests
– Disk drives organized in strings
– One disk controller per disk string (mostly SCSI)

Limitations
The capabilities of the central RAID controller are crucial to the performance of the array:
– It can become memory-bound
– It presents a single point of failure
– It can become a bottleneck
Having a spare controller is an expensive proposition.

Our Solution
Have a cooperating set of array controller nodes. Major benefits:
– Fault tolerance
– Scalability
– Smooth incremental growth
– Flexibility: can mix and match components

TickerTAIP
[Architecture figure: hosts attached through host interconnects to an array of controller nodes]

TickerTAIP (I)
A TickerTAIP array consists of:
– Worker nodes, each connected to one or more local disks through a bus
– Originator nodes interfacing with host computer clients
– A high-performance small-area network:
  – Mesh-based switching network (Datamesh)
  – PCI backplanes for small networks

TickerTAIP (II)
Worker and originator roles can be combined in a single node or kept separate.
Parity calculations are done in a decentralized fashion:
– The bottleneck is memory bandwidth, not CPU speed
– Cheaper than providing faster paths to a dedicated parity engine

Design Issues (I)
Normal-mode reads are trivial to implement.
Normal-mode writes: three ways to calculate the new parity:
– Full stripe: calculate the parity from the new data alone
– Small stripe: read the old data and old parity, write the new data and new parity (at least four I/Os)
– Large stripe: if more than half a stripe is rewritten, compute the parity by reading the unmodified data blocks
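To make the three policies concrete, here is a minimal sketch (hypothetical, not code from the paper; the function names are ours). Blocks are byte strings and parity is the byte-wise XOR of the data blocks in a stripe:

```python
# Sketch of the three RAID 5 parity-update modes described above.

def xor_blocks(*blocks):
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

def new_parity(stripe_size, new_blocks, old_blocks, old_parity, unmodified_blocks):
    """Pick the cheapest way to compute the new parity for a write.

    new_blocks:        {index: data} being written
    old_blocks:        old contents of those same blocks (small-stripe case)
    old_parity:        current parity block (small-stripe case)
    unmodified_blocks: blocks of the stripe that are not being rewritten
    """
    if len(new_blocks) == stripe_size:
        # Full stripe: parity comes from the new data alone.
        return xor_blocks(*new_blocks.values())
    if len(new_blocks) > stripe_size // 2:
        # Large stripe: read the unmodified blocks and XOR them with the new data.
        return xor_blocks(*new_blocks.values(), *unmodified_blocks)
    # Small stripe: XOR old data, new data and old parity
    # (read old data + old parity, write new data + new parity: >= 4 I/Os).
    return xor_blocks(*new_blocks.values(), *old_blocks.values(), old_parity)
```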

Design Issues (II)
Parity can be calculated:
– At the originator node
– Solely-parity: at the parity node for the stripe; all involved blocks must be shipped to the parity node
– At-parity: same as solely-parity, except that the partial results for small-stripe writes are computed at the worker nodes and shipped to the parity node; this generates less traffic than solely-parity
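A minimal sketch of the at-parity idea (function names are ours, not the paper's): for a small-stripe write, each worker XORs its old and new data locally and ships only that partial result, and the parity node folds the partial results into the old parity:

```python
def worker_partial_result(old_block, new_block):
    """Computed locally at the worker that owns the block."""
    return bytes(o ^ n for o, n in zip(old_block, new_block))

def parity_node_update(old_parity, partial_results):
    """Computed at the parity node for the stripe."""
    new_parity = bytearray(old_parity)
    for partial in partial_results:
        for i, b in enumerate(partial):
            new_parity[i] ^= b
    return bytes(new_parity)
```

Each worker ships one block-sized partial result instead of both its old and new blocks, which is why at-parity occasions less network traffic than solely-parity.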

Handling single failures (I)
TickerTAIP must provide request atomicity.
Disk failures are treated as in a standard RAID array.
Worker failures:
– Treated like disk failures
– Detected by time-outs (assuming fail-silent nodes)
– A distributed consensus algorithm is run among the remaining nodes

Handling single failures (II)
Originator failures:
– The worst case is the failure of a combined originator/worker node during a write
– TickerTAIP uses a two-phase commit protocol, with two options:
  – Late commit
  – Early commit

Late commit / Early commit
Late commit only commits after the parity has been computed:
– Only the writes remain to be performed
Early commit commits as soon as the new data and the old data have been replicated:
– Somewhat faster
– Harder to implement
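The sketch below shows the shape of the decision, assuming a classic two-phase commit skeleton (the class and function names are ours, and the paper's protocol carries more state than this): the two variants differ only in what a participant must have done before it may vote "ready".

```python
from enum import Enum

class CommitPolicy(Enum):
    LATE = "late"    # ready once the new parity has been computed
    EARLY = "early"  # ready once the old and new data are replicated elsewhere

class Participant:
    def __init__(self, data_replicated=False, parity_computed=False):
        self.data_replicated = data_replicated
        self.parity_computed = parity_computed

    def ready(self, policy):
        if policy is CommitPolicy.LATE:
            return self.parity_computed
        return self.data_replicated

def two_phase_commit(participants, policy):
    # Phase 1: the originator polls every participating worker.
    if not all(p.ready(policy) for p in participants):
        return "aborted"      # some node failed or is not yet ready
    # Phase 2: broadcast the decision; each participant then applies its writes.
    return "committed"
```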

Handling multiple failures
Power failures during writes can corrupt the stripe being written:
– A UPS is used to eliminate them
Some specific requests must always be executed in a given order:
– e.g., data blocks cannot be written before the i-nodes containing their block addresses are updated
– Request sequencing is used to achieve partial write ordering

Request sequencing (I)
Each request:
– Is given a unique identifier
– Can specify one or more requests on whose prior completion it depends (explicit dependencies)
TickerTAIP adds enough implicit dependencies to prevent concurrent execution of overlapping requests.

Request sequencing (II)
Sequencing is performed by a centralized sequencer:
– Several distributed solutions were considered but rejected because of the complexity of the recovery protocols they would require
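A minimal sketch of such a sequencer, under our own simplifying assumptions (one implicit dependency per overlapping block, no recovery handling; the `Sequencer` class and its methods are hypothetical, not the paper's interface):

```python
import itertools

class Sequencer:
    def __init__(self):
        self._ids = itertools.count()
        self.pending = {}       # request id -> ids it is still waiting on
        self.last_request = {}  # block number -> most recent request touching it

    def submit(self, blocks, explicit_deps=()):
        req_id = next(self._ids)
        deps = set(explicit_deps)
        for b in blocks:
            if b in self.last_request:          # overlapping request:
                deps.add(self.last_request[b])  # add an implicit dependency
            self.last_request[b] = req_id
        # Only dependencies that have not yet completed need to be waited on.
        self.pending[req_id] = {d for d in deps if d in self.pending}
        return req_id

    def runnable(self):
        """Requests whose dependencies have all completed."""
        return [r for r, deps in self.pending.items() if not deps]

    def complete(self, req_id):
        self.pending.pop(req_id, None)
        for deps in self.pending.values():
            deps.discard(req_id)
```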

Disk Scheduling
Policies considered:
– First come first served (FCFS): implemented in the working prototype
– Shortest seek time first (SSTF)
– Shortest access time first (SATF): considers both seek time and rotation time
– Batched nearest neighbor (BNN): runs SATF on all requests in the queue
(Not discussed in class in Fall 2005.)
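A rough sketch of BNN, with a deliberately simplified placeholder cost model (`access_time` is ours; a real SATF estimate depends on the drive's seek profile and current rotational position):

```python
def access_time(head_pos, request):
    """Placeholder positioning cost: seek distance plus a rotational term."""
    seek = abs(request["cylinder"] - head_pos)
    return seek + request["rotational_delay"]

def bnn_schedule(head_pos, queue):
    """Repeatedly serve the queued request with the smallest estimated
    positioning time from the current head position (SATF over the batch)."""
    order = []
    pending = list(queue)
    while pending:
        nxt = min(pending, key=lambda r: access_time(head_pos, r))
        pending.remove(nxt)
        order.append(nxt)
        head_pos = nxt["cylinder"]
    return order
```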

Evaluation (I)
Based upon:
– A working prototype using seven relatively slow Parsytec cards, each with its own disk drive
– An event-driven simulator used to test other configurations; its results were always within 6% of the prototype measurements

Evaluation (II)
Read performance:
– 1 MB/s links are sufficient unless request sizes exceed 1 MB

Evaluation (III)
Write performance:
– The large-stripe policy always results in a slight improvement
– At-parity is significantly better than at-originator, especially for link speeds below 10 MB/s
– The late commit protocol reduces throughput by at most 2% but can increase response time by up to 20%
– The early commit protocol is not much better

Evaluation (IV)
– TickerTAIP always outperforms a comparable centralized RAID architecture
– The best disk scheduling policy is batched nearest neighbor (BNN)

Conclusion
– Physical redundancy can be used to eliminate single points of failure
– Eleven 5-MIPS processors can be used instead of a single 50-MIPS processor
– Off-the-shelf processors can be used for parity computations
– Disk drives remain the bottleneck for small request sizes