Using GPUs to Enable Highly Reliable Embedded Storage
University of Alabama at Birmingham, 115A Campbell Hall, 1300 University Blvd., Birmingham, AL 35294-1170
Using GPUs to Enable Highly Reliable Embedded Storage
Matthew Curry, Lee Ward, Anthony Skjellum, Ron Brightwell
University of Alabama at Birmingham, 115A Campbell Hall, 1300 University Blvd., Birmingham, AL
Computer Science Research Institute, Sandia National Laboratory, PO Box 5800, Albuquerque, NM
High Performance Embedded Computing (HPEC) Workshop, September 2008
Approved for public release; distribution is unlimited.

The Storage Reliability Problem
– Embedded environments are subject to harsh conditions where normal failure estimates may not apply
– Since many embedded systems are purposed for data collection, data integrity is a high priority
– Embedded systems often must contain as little hardware as possible (e.g., space applications)

Current Methods of Increasing Reliability
RAID
– RAID 1: Mirroring (two-disk configuration)
– RAID 5: Single parity
– RAID 6: Dual parity
Nested RAID
– RAID 1+0: Stripe over multiple RAID 1 sets
– RAID 5+0: Stripe over multiple RAID 5 sets
– RAID 6+0: Stripe over multiple RAID 6 sets

Current Methods of Increasing Reliability
RAID MTTDL (Mean Time To Data Loss), with MTTF the per-disk mean time to failure and D the number of disks:
– RAID 1: MTTF^2 / 2
– RAID 5: MTTF^2 / (D·(D−1))
– RAID 6: MTTF^3 / (D·(D−1)·(D−2))
Nested RAID MTTDL, with N the number of striped sets:
– RAID 1+0: MTTDL(RAID 1) / N
– RAID 5+0: MTTDL(RAID 5) / N
– RAID 6+0: MTTDL(RAID 6) / N
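The simplified MTTDL formulas above can be evaluated directly. A minimal sketch in C; the 10^6-hour MTTF and the 8-disk/4-set configuration in the comment are illustrative assumptions, not figures from the slides:

```c
/* Simplified MTTDL arithmetic matching the slide's formulas.
   mttf: per-disk Mean Time To Failure (hours)
   d:    disks per RAID set
   n:    number of sets striped together (nested RAID) */
static double mttdl_raid1(double mttf)        { return mttf * mttf / 2.0; }
static double mttdl_raid5(double mttf, int d) { return mttf * mttf / (d * (d - 1.0)); }
static double mttdl_raid6(double mttf, int d) {
    return mttf * mttf * mttf / (d * (d - 1.0) * (d - 2.0));
}
static double mttdl_nested(double mttdl_set, int n) { return mttdl_set / n; }

/* Example (assumed numbers): 1e6-hour disks, 8 disks per set:
   RAID 5   -> 1e12 / 56   (about 1.8e10 hours)
   RAID 5+0 -> divide that by the number of sets */
```

Note how the extra exponent on MTTF for RAID 6 reflects that three disks must fail before data is lost, which is why higher parity counts pay off so quickly.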

Why N+3 (Or Higher) Isn't Done
Hardware RAID solutions largely don't support it
– Known exception: RAID-TP from Accusys uses three parity disks
Software RAID doesn't support it
– Reed-Solomon coding is CPU-intensive and inefficient given CPU memory organization

An Overview of Reed-Solomon Coding
– A general method of generating arbitrary amounts of parity data for n+m systems
– A vector of n data elements is multiplied by an n x m dispersal matrix, yielding m parity elements
– All arithmetic is finite field (Galois field) arithmetic
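The encode step described above is just a matrix-vector product over GF(2^8), where addition is XOR. A minimal sketch; the 4+2 geometry, the dispersal-matrix values, and the field polynomial 0x1d are illustrative assumptions, not the slides' actual parameters:

```c
#include <stdint.h>

enum { N = 4, M = 2 };   /* assumed n+m geometry: 4 data, 2 parity */

/* GF(2^8) multiply by shift-and-add (the distribution trick shown on
   the multiplication-example slides). Polynomial 0x11d assumed; only
   the low byte 0x1d appears because the x^8 term is the shifted-out bit. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t r = 0;
    while (b) {
        if (b & 1) r ^= a;           /* add (XOR) this power of {02} */
        uint8_t hi = a & 0x80;
        a = (uint8_t)(a << 1);       /* multiply a by {02} */
        if (hi) a ^= 0x1d;           /* reduce mod the field polynomial */
        b >>= 1;
    }
    return r;
}

/* Each parity byte is the GF(2^8) dot product of one dispersal-matrix
   row with the data vector. */
static void encode(uint8_t F[M][N], uint8_t data[N], uint8_t parity[M]) {
    for (int i = 0; i < M; i++) {
        parity[i] = 0;
        for (int j = 0; j < N; j++)
            parity[i] ^= gf_mul(F[i][j], data[j]);  /* XOR is GF addition */
    }
}
```

With a row of all ones in F, the corresponding parity byte degenerates to plain XOR of the data, i.e., ordinary RAID 5 parity; the additional rows are what buy the extra failure tolerance.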

Multiplication Example
{37} = 100101b = x^5 + x^2 + x^0
Use a Linear Feedback Shift Register (LFSR) to multiply an element by {02}
[LFSR diagram: one register stage per coefficient, x^0 through x^7]
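The LFSR step above has a one-line software equivalent: shift left, and if a bit fell off the top, fold in the field polynomial. A minimal sketch; the slides do not name their polynomial, so 0x11d (reduced form 0x1d), a common choice in storage erasure codes, is assumed here:

```c
#include <stdint.h>

/* Multiply a GF(2^8) element by {02}: one LFSR clock in software.
   Polynomial 0x11d assumed (the x^8 term is the shifted-out bit,
   so only 0x1d is XORed in). */
static uint8_t gf_mul2(uint8_t a) {
    uint8_t hi = a & 0x80;           /* bit that will shift out of x^7 */
    uint8_t r = (uint8_t)(a << 1);   /* multiply by x */
    return hi ? (uint8_t)(r ^ 0x1d)  /* reduce mod the field polynomial */
              : r;
}
```

For example, gf_mul2 applied to {37} (0x25) just shifts, since x^5 is the highest term; only elements with the x^7 bit set trigger the reduction.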

Multiplication Example
Direct arbitrary multiplication requires distributing so that only addition (XOR) and multiplication by {02} occur:
– {57} x {37}
– {57} x ({02}^5 + {02}^2 + {02}^0)
– {57} x {02}^5 + {57} x {02}^2 + {57} x {02}^0
Potentially dozens of elementary operations!
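The distribution above is exactly the classic shift-and-add loop: for each set bit of one operand, XOR in a running doubled copy of the other. A minimal sketch, again assuming polynomial 0x11d since the slides do not specify theirs:

```c
#include <stdint.h>

/* Arbitrary GF(2^8) multiply by distributing b over powers of {02},
   as in the {57} x {37} example above. Polynomial 0x11d assumed. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t r = 0;
    while (b) {
        if (b & 1) r ^= a;           /* add (XOR) this {02}-power term */
        uint8_t hi = a & 0x80;
        a = (uint8_t)(a << 1);       /* a *= {02}, one LFSR clock */
        if (hi) a ^= 0x1d;           /* reduce mod the field polynomial */
        b >>= 1;
    }
    return r;
}
```

Worst case this is 8 shift/XOR rounds per byte multiplied, which is the "dozens of elementary operations" cost the slide is pointing at and the reason the lookup-table optimization on the next slide matters.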

Optimization: Lookup Tables
Similar to the relationship that holds for real numbers: e^(log x + log y) = x * y
This relationship translates (almost) directly to finite field arithmetic, with lookup tables for the logarithm and exponentiation operators
Unfortunately, parallel table lookup capabilities aren't common in commodity processors
– Waiting patiently for SSE5
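The log/antilog tables can be built in one pass by repeatedly multiplying by {02}. A minimal sketch under the same assumed polynomial 0x11d, for which {02} generates the full multiplicative group; the doubled exp table is a common trick (not from the slides) that avoids the mod-255 in the hot path:

```c
#include <stdint.h>

static uint8_t gf_exp[512];   /* gf_exp[i] = {02}^i, duplicated once */
static uint8_t gf_log[256];   /* inverse of gf_exp over 0..254 */

static void gf_tables_init(void) {
    uint8_t x = 1;
    for (int i = 0; i < 255; i++) {
        gf_exp[i] = x;
        gf_log[x] = (uint8_t)i;
        uint8_t hi = x & 0x80;
        x = (uint8_t)(x << 1);       /* multiply by {02} each step */
        if (hi) x ^= 0x1d;           /* reduce mod polynomial 0x11d */
    }
    for (int i = 255; i < 512; i++)  /* duplicate: log a + log b < 510 */
        gf_exp[i] = gf_exp[i - 255];
}

/* x * y == exp(log x + log y), the finite-field analogue of the
   real-number identity on this slide. Zero has no logarithm. */
static uint8_t gf_mul_tab(uint8_t a, uint8_t b) {
    if (a == 0 || b == 0) return 0;
    return gf_exp[gf_log[a] + gf_log[b]];
}
```

This reduces each multiply to two loads, an add, and a third load, which is why the lack of *parallel* (SIMD) table lookup on commodity CPUs, rather than the arithmetic itself, becomes the bottleneck.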

NVIDIA GPU Architecture
– GDDR3 global memory
– Multiprocessing units
– One shared 8 KB memory region per multiprocessing unit (16 banks)
– Eight cores per multiprocessor

Integrating the GPU

3+3 Performance

29+3 Performance

Neglecting PCI Traffic: 3+3

Conclusion
GPUs are an inexpensive way to increase the speed and reliability of software RAID
By pipelining requests through the GPU, N+3 (and greater) configurations are within reach
– Requires minimal hardware investment
– Provides greater reliability than current hardware solutions offer
– Sustains high throughput compared to modern hard disks