THE HP AUTORAID HIERARCHICAL STORAGE SYSTEM
J. Wilkes, R. Golding, C. Staelin, T. Sullivan
HP Laboratories, Palo Alto, CA

INTRODUCTION
- Must protect data against disk failures: too frequent and too hard to repair.
- Possible solutions:
  - for small numbers of disks: mirroring
  - for larger numbers of disks: RAID

RAID
Typical RAID organizations:
- Level 3: bit- or byte-interleaved data with a dedicated parity disk
- Level 5: block-interleaved data with parity blocks distributed across all disks
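
As a concrete illustration of the Level 5 layout, here is a minimal sketch (my own illustration, assuming the common left-symmetric parity rotation; the slides do not prescribe a particular rotation) that maps a stripe number and data unit to disks:

```python
def raid5_location(stripe, unit, n_disks):
    """Map (stripe number, data unit within the stripe) to (data disk,
    parity disk). Parity rotates across stripes; data units fill the
    remaining disks in order after the parity disk."""
    parity_disk = (n_disks - 1 - stripe) % n_disks
    data_disk = (parity_disk + 1 + unit) % n_disks
    return data_disk, parity_disk

# Example: 5 disks -> each stripe has 4 data units plus 1 parity block.
for stripe in range(3):
    print(stripe, [raid5_location(stripe, u, 5) for u in range(4)])
```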

LIMITATIONS OF RAID (I)
- Each RAID level performs well only for a narrow range of workloads.
- Too many parameters to configure: data and parity layout, stripe depth, stripe width, cache sizes, write-back policies, ...

LIMITATIONS OF RAID (II)
- Changing from one layout to another, or adding capacity, requires unloading and reloading the data.
- Spare disks remain unused until a failure occurs.

A BETTER SOLUTION
- A managed storage hierarchy:
  - mirror the active data
  - store the less active data in RAID 5
- This requires locality of reference: the active subset must be fairly stable, which several studies have found to be true.

IMPLEMENTATION LEVEL
The storage hierarchy could be implemented:
- manually: can use the most knowledge but cannot adapt quickly
- in the file system: offers the best balance of knowledge and implementation freedom, but is specific to a particular file system
- in a smart array controller: easiest to deploy (the HP AutoRAID approach)

MAJOR FEATURES (I)
- Mapping of host block addresses to physical disk locations
- Mirroring of write-active data
- Adaptation to changes in the amount of data stored: starts using RAID 5 when the array begins to fill
- Adaptation to workload changes: active data are kept in mirrored storage as the workload shifts
- Hot-pluggable disks, fans, power supplies, and controllers

MAJOR FEATURES (II)
- On-line storage capacity expansion: the system then switches more data back to mirroring
- Can mix and match disk capacities
- Controlled fail-over: can have dual controllers (primary/standby)
- Active hot spares: spare space is used for additional mirroring
- Simple administration and setup: appears to the host as one or more logical units
- Log-structured RAID 5 writes

RELATED WORK (I)
Storage Technology Corporation's Iceberg:
- also uses redirection, but is based on RAID 6
- handles variable-size records
- emphasizes very high reliability

RELATED WORK (II)
- Floating parity scheme from IBM Almaden: relocates parity blocks and uses distributed sparing
- Work at U.C. Berkeley on log-structured file systems and their cleaning policies

RELATED WORK (III)
- The whole literature on hierarchical storage systems
- Schemes for compressing inactive data
- Use of non-volatile memory (NVRAM) to optimize writes: allows reliable delayed writes

OVERVIEW
[Hardware block diagram, recovered from slide text: the host computer attaches through a 20 MB/s SCSI controller; a processor with RAM and control logic, plus parity logic, drives two 10 MB/s internal buses to the disks; caching uses a DRAM read cache, an NVRAM write cache, and additional RAM.]

PHYSICAL DATA LAYOUT
- Data space on the disks is broken up into large Physical EXtents (PEXes); typical size is 1 MB.
- PEXes are combined into Physical Extent Groups (PEGs) containing at least three PEXes on three different disks.
- PEGs can be assigned to the mirrored storage class or to the RAID 5 storage class.
- Segments are the units of contiguous space on a disk (128 KB in the prototype).
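
A minimal sketch of the grouping rule (assumed helper and field names; only the "at least three PEXes on three different disks" constraint comes from the slide):

```python
from collections import namedtuple

PEX = namedtuple("PEX", "disk start")    # physical extent: disk id, byte offset

def build_peg(free_pexes, min_disks=3):
    """Assemble a PEG from free PEXes so that it spans at least
    min_disks different disks, as the layout requires."""
    peg, used_disks = [], set()
    for pex in free_pexes:
        if pex.disk not in used_disks:
            peg.append(pex)
            used_disks.add(pex.disk)
            if len(used_disks) == min_disks:
                return peg
    raise ValueError("not enough free PEXes on distinct disks")
```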

LOGICAL DATA LAYOUT
- The logical allocation and migration unit is the Relocation Block (RB).
- Size in the prototype was 64 KB: smaller RBs require more mapping information, while larger RBs increase migration costs after small updates.
- Each PEG holds a fixed number of RBs.
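
A back-of-envelope illustration of the tradeoff (my numbers, not from the slides): with 64 KB RBs, one 2.0 GB disk holds 2 * 2^30 / (64 * 2^10) = 32,768 RBs, so per-disk mapping tables stay modest. Halving the RB size to 32 KB doubles the number of map entries; doubling it to 128 KB doubles the amount of data that must move when a small update forces an RB to migrate.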

MAPPING STRUCTURES
- Map addresses from virtual volumes to PEGs, PEXes, and physical disk addresses.
- Optimized for quickly finding the physical address of an RB given its logical address:
  - each logical unit has a virtual device table listing all RBs in the logical unit and pointing to their PEGs
  - each PEG has a PEG table listing all RBs in the PEG and the PEXes used to store them
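
A toy sketch of the two-level lookup (assumed structures and field names, not HP's code): the virtual device table finds the RB's PEG, and the PEG table finds the PEX and offset:

```python
from collections import namedtuple

RB_SIZE = 64 * 1024                      # relocation block size (prototype)
PEX = namedtuple("PEX", "disk start")    # physical extent: disk id, byte offset

class PEGTable:
    """Per-PEG map: RB index within the PEG -> (PEX, offset inside the PEX)."""
    def __init__(self, entries):
        self.entries = entries

class VirtualDeviceTable:
    """Per-logical-unit map: RB number -> (PEG table, RB index in that PEG)."""
    def __init__(self, entries):
        self.entries = entries

def physical_address(vdt, logical_byte_addr):
    """Translate a logical byte address to (disk, physical byte address)."""
    rb_number, rb_offset = divmod(logical_byte_addr, RB_SIZE)
    peg_table, rb_index = vdt.entries[rb_number]   # first level: find the PEG
    pex, pex_offset = peg_table.entries[rb_index]  # second level: find the PEX
    return pex.disk, pex.start + pex_offset + rb_offset
```

The point of the two levels is that relocating an RB only requires updating a couple of table entries, which is what makes migration between storage classes cheap.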

NORMAL OPERATIONS (I)
- Requests are sent to the controller in SCSI Command Descriptor Blocks (CDBs): up to 32 CDBs can be active simultaneously, with up to 2,048 more queued.
- Long requests are broken into 64 KB segments.
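
A sketch of the request-splitting step (hypothetical helper; alignment of the pieces to segment boundaries is my assumption):

```python
SEGMENT_SIZE = 64 * 1024

def split_request(start, length, segment_size=SEGMENT_SIZE):
    """Yield (offset, size) pieces of at most segment_size bytes,
    cut at segment boundaries."""
    end = start + length
    while start < end:
        boundary = (start // segment_size + 1) * segment_size
        piece_end = min(boundary, end)
        yield start, piece_end - start
        start = piece_end
```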

NORMAL OPERATIONS (II)
- Read requests: first test whether the data are already in the read cache or in the non-volatile write cache; otherwise allocate cache space and issue one or more requests to the back-end storage classes.
- Write requests return as soon as the data are in the non-volatile write cache: the cache uses a delayed-write policy.
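
A sketch of the read path (the cache and back-end interfaces here are assumed, not HP's):

```python
def serve_read(addr, size, read_cache, nvram_write_cache, backend):
    """Serve a read: check the read cache, then the NVRAM write cache,
    then go to the back-end storage class and cache the result."""
    for cache in (read_cache, nvram_write_cache):
        data = cache.lookup(addr, size)      # assumed interface
        if data is not None:
            return data
    data = backend.read(addr, size)          # mirrored or RAID 5 class
    read_cache.insert(addr, data)            # keep a copy for future reads
    return data
```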

NORMAL OPERATIONS (III)
- Flushing data from the cache can involve:
  - a back-end write to the mirrored storage class
  - promotion from RAID 5 to mirrored storage before the write
- Mirrored reads and writes are straightforward.

NORMAL OPERATIONS (IV)
- RAID 5 reads are straightforward.
- RAID 5 writes can be done:
  - on a per-RB basis: requires two reads and two writes
  - as batched writes: more complex but cheaper
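
The per-RB cost comes from the parity read-modify-write; a sketch (the disk interface is assumed):

```python
def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def raid5_small_write(disks, data_disk, parity_disk, offset, new_data):
    """Per-RB RAID 5 update: two reads and two writes.
    new_parity = old_parity XOR old_data XOR new_data."""
    old_data = disks[data_disk].read(offset, len(new_data))      # read 1
    old_parity = disks[parity_disk].read(offset, len(new_data))  # read 2
    new_parity = xor(xor(old_parity, old_data), new_data)
    disks[data_disk].write(offset, new_data)                     # write 1
    disks[parity_disk].write(offset, new_parity)                 # write 2
```

Batched (log-structured) writes avoid the two reads by filling a whole stripe with new data and computing parity from scratch, which is why they are cheaper despite the extra bookkeeping.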

BACKGROUND OPERATIONS
- Triggered when the array has been idle for some time.
- Include:
  - compaction of empty RB slots
  - migration between storage classes (using an approximate LRU algorithm)
  - load balancing between disks
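
A toy stand-in (not HP's algorithm) for how an approximate-LRU policy might pick demotion candidates, i.e. RBs to move from mirrored storage down to RAID 5:

```python
from collections import OrderedDict

class ApproximateLRU:
    """Track RB recency; the least recently touched RBs are demoted first."""
    def __init__(self):
        self.order = OrderedDict()       # rb_number -> None, oldest first

    def touch(self, rb):
        self.order.pop(rb, None)
        self.order[rb] = None            # move to most-recently-used end

    def demotion_candidates(self, n):
        return list(self.order)[:n]      # the n least recently used RBs
```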

MONITORING
The system also includes:
- an I/O logging tool
- a management tool for analyzing array performance

PERFORMANCE RESULTS (I)
HP AutoRAID configuration:
- 16 MB of controller data cache
- twelve 2.0 GB Seagate Barracuda disks (7,200 rpm)
Compared against:
- a Data General RAID array with a 64 MB front-end cache
- eleven individual disk drives striped, but without any redundancy

PERFORMANCE RESULTS (II)
- OLTP database workload: AutoRAID beat the RAID array and was comparable to the set of non-redundant drives; but the whole database was stored in mirrored storage!
- Micro-benchmarks: AutoRAID always beats the RAID array but achieves lower I/O rates than the set of plain drives.

SIMULATION RESULTS (I)
- Increasing disk speed improves throughput, especially if density remains constant; transfer rates matter more than rotational latency.
- 64 KB seems to be a good size for Relocation Blocks: around the size of a disk track.

SIMULATION RESULTS (II)
- The best heuristic for selecting which mirrored copy to read is shortest queue.
- Allowing write-cache overwrites has a HUGE impact on performance.
- RBs demoted to RAID 5 should use existing holes when the system is not too loaded.
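
The shortest-queue heuristic in miniature (queue_length is an assumed attribute of a per-disk state object):

```python
def pick_mirror_copy(copies):
    """Read from the replica whose disk currently has the fewest
    outstanding requests."""
    return min(copies, key=lambda copy: copy.disk.queue_length)
```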

SUMMARY (I)
- The system is very easy to set up.
- Dynamic adaptation is a big win, but it will not work for all workloads.
- The software is what makes AutoRAID, not the hardware.
- Being auto-adaptive makes AutoRAID hard to benchmark.

SUMMARY (II)
Future work includes:
- system tuning, especially idle-period detection
- front-end cache management algorithms
- developing better techniques for synthesizing traces