Storage: Cheap or Fast, Pick One

Storage
Great: you can do a lot of computation. But this often generates a lot of data. Where are you going to put it? In the demo clusters we run, the front end just has a little more disk than the compute nodes; that extra space holds the NFS-mounted home directories. But single drives do not have very good performance, and they are a single point of failure.

RAID
The next step up is RAID (Redundant Array of Inexpensive, or Independent, Disks), which uses a pool of disks to store data. Rather than spending billions building special high-capacity disks, greater capacity is achieved by simply putting commodity PC disks into arrays. Typically you need a RAID controller on the host. There are several types, or levels, of RAID.

RAID 0
RAID 0 writes blocks to multiple disks without redundancy, as the sketch below shows. Because the data is spread over multiple disks, the controller can work in parallel on both reads and writes, improving performance. But if any disk fails, data is lost: don't use RAID 0 for mission-critical data, only where performance is all that matters. Ideally you have one drive per controller.
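
To make the striping concrete, here is a minimal Python sketch (a toy model, not any controller's actual firmware) of how a RAID-0 layout maps a logical block number to a disk and an offset on that disk:

```python
def raid0_map(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk).

    RAID-0 stripes blocks round-robin across the array, so consecutive
    logical blocks land on different disks and can be accessed in parallel.
    There is no redundancy: losing any one disk loses the whole array.
    """
    return logical_block % num_disks, logical_block // num_disks

# Blocks 0..5 on a 3-disk stripe land on disks 0, 1, 2, 0, 1, 2.
for block in range(6):
    print(block, "->", raid0_map(block, num_disks=3))
```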

RAID 1
This is mirroring: the same data is written to two disks. If either disk fails, a complete copy of the data is available on the other. It uses 2x the storage space, but reads can perform better because the OS can pick the disk with the least seek or rotational latency, as sketched below.
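
A hedged sketch of that read-balancing idea; the queue_depths input is a hypothetical stand-in for whatever load measure a real driver would use:

```python
def pick_mirror(queue_depths: list[int]) -> int:
    """RAID-1 read balancing: both mirrors hold identical data, so serve
    the read from whichever disk is currently least busy (modeled here
    by a per-disk queue depth)."""
    return min(range(len(queue_depths)), key=lambda d: queue_depths[d])

# Disk 1 has the shorter queue, so the read goes there.
print(pick_mirror([3, 1]))  # -> 1
```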

RAID-5
RAID-5 uses "parity," i.e. redundant information. If a block fails, enough parity information is available to recover the data. The parity information is spread across all the disks. It gives a high read rate and a medium write rate. A disk failure requires a rebuild, during which the parity information is used to re-create the lost data.
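
The parity itself is just XOR. Here is a toy Python illustration of rebuilding a lost block (real RAID-5 also rotates which disk holds the parity for each stripe, which this sketch ignores):

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR byte strings together; this is the basis of RAID-5 parity."""
    return bytes(reduce(lambda a, b: a ^ b, chunk) for chunk in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]   # one stripe's data blocks
parity = xor_blocks(data)

# Simulate losing disk 1: rebuild its block from the survivors + parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```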

RAID-10
RAID-10 is striping plus mirroring, so you get good performance plus fully mirrored data, at the expense of 2x the disk.

Storage
RAID-5 is a reasonable choice most of the time, and there are many commodity vendors of RAID arrays. SCSI RAID arrays are expensive, the disks are expensive, and the disks have low capacity, but the arrays have good performance. ATA RAID arrays have an excellent price (1/3 to 1/2 that of SCSI) and capacity, with somewhat lower performance. Apple ATA RAID: 7 TB for $11.5K. Promise Vtrak 15110: $4K plus 15 x 400 GB SATA disks at $300 each = 6 TB for $8,500.
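
As a quick sanity check on those numbers, here is the cost-per-TB arithmetic in Python; the prices and capacities are the quotes above:

```python
# Quoted prices and capacities from the slide (circa-2004 numbers).
systems = {
    "Apple ATA RAID":      (11_500, 7.0),  # dollars, TB
    "Promise Vtrak 15110": ( 8_500, 6.0),  # $4K chassis + 15 x $300 disks
}
for name, (dollars, tb) in systems.items():
    print(f"{name}: ${dollars / tb:,.0f} per TB")
```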

Storage in Clusters
OK, so now you've got a RAID array. Now what? A pretty typical approach is to attach the RAID array to a host and then have that host serve NFS. This is sometimes called "Network Attached Storage," or NAS. Rather than being attached to the front end, the array may hang off a node of its own in the cluster. (Diagram: compute nodes C1 and C2 and front end FE on the cluster network, with a separate server S1 attached to the RAID.)

NAS
You can easily set up a Windows or open-source NAS box and put it on the network. In the Linux world this is just a Linux box running Samba for Windows directory shares. You can also buy pre-configured "appliances" for disk space. These are often boxes that run a stripped-down Linux variant and have a bunch of disks stuffed into them, along with a web-based interface.

NAS Appliances
- Sun StorEdge 5210
- Dell PowerVault 745N
- Snap Server 4200
- Linksys and Iomega NAS (SOHO, combined with wireless and print server)
- NetApp (enterprise class)
- Emerging class: a NAS front-end/gateway to a SAN

Storage Area Networks
SANs are high performance, and you pay for it. A SAN has storage devices on a special network that carries only I/O traffic. The storage devices are connected to a set of servers, and all the servers share all the storage devices on the SAN. In effect, each server is attached to two networks: one for communication with other hosts, the other reserved for communicating with storage devices.

Storage Area Network

SANs
SANs let you add storage to a pool that can be shared by all servers connected to the SAN. Sharing is a hard problem to solve because of parallelism; the original SANs sometimes had a single server attached to a single disk device to prevent simultaneous access. More recent SANs use a distributed filesystem to avoid concurrency problems.

Single-Machine Filesystems
(Diagram: a server with its directly attached disk.) This is a fairly simple problem to solve because there is only one entity making requests of the disk ("give me 200 blocks"). The server can keep track of block allocations and file names, and make that information permanent by writing it to disk.

DFS
(Diagram: Server 1 and Server 2 reaching a SAN disk through an FC switch.) S1 asks for 500 disk blocks to hold a file named "Foo". At the same time, S2 asks for 200 blocks to hold a file named "Foo". Which wins? How can we ensure that none of the blocks given to S1 are also given to S2? Effectively we need to serialize the requests to the SAN disk, and we need something that keeps track of file names and block allocations and that can be queried by all the servers.

Metadata
A solution is to use metadata, i.e. "data about data". In effect it keeps track of things like filenames and block allocations, just as a single-machine filesystem does. The metadata has to be available to all the servers that want to participate in the SAN, so it is typically written on the SAN itself; the sketch below shows the idea. (Diagram: Server 1 and Server 2 reaching a metadata disk and a SAN disk through an FC switch.)
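
Here is a minimal Python sketch of the serialization idea, assuming a toy in-memory metadata service (a real SAN filesystem persists this state on the metadata LUN and handles far more, but the point is the same):

```python
import threading

class MetadataServer:
    """Toy stand-in for SAN metadata: serializes block allocation so two
    servers can never be handed the same blocks or the same filename."""

    def __init__(self, total_blocks: int):
        self.lock = threading.Lock()
        self.next_free = 0
        self.total = total_blocks
        self.files = {}  # filename -> list of block numbers

    def allocate(self, name: str, count: int) -> list[int]:
        with self.lock:  # serialize competing requests
            if name in self.files:
                raise FileExistsError(name)
            if self.next_free + count > self.total:
                raise OSError("out of blocks")
            blocks = list(range(self.next_free, self.next_free + count))
            self.next_free += count
            self.files[name] = blocks
            return blocks

meta = MetadataServer(total_blocks=1000)
meta.allocate("Foo", 500)        # server 1's request wins
try:
    meta.allocate("Foo", 200)    # server 2's request is refused
except FileExistsError:
    print("second 'Foo' refused; block ranges never overlap")
```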

Metadata Comms
As an added twist, most implementations send the metadata over a separate physical network from the Fibre Channel, usually gigabit Ethernet. So every device in the SAN is usually connected to two networks. (Diagram: servers and SAN disks joined by both an FC switch and a GigE switch, with the metadata traffic on the Ethernet side.)

SAN Disks
As you can see, the SAN "disks" are getting pretty complex. In reality the "disks" are usually RAID enclosures. Each enclosure has an FC port, an Ethernet port, and a RAID controller for its disks. The SAN is really built on top of RAID building blocks.

SAN
This arrangement has many advantages:
- Servers and storage devices can be spread farther apart.
- All the storage devices go into a single, shared pool.
- Performance is very high, and serverless backups are possible.
- Storage device availability is not tied to the uptime of any one host.
- Applications can be tied to specific storage types (ATA RAID for high-volume data, SCSI RAID for high-traffic transactional databases).
On the other hand, SANs probably won't go commodity any time soon, since they have limited applicability to the desktop; that means they will remain expensive and complex.

iSCSI
The most common SAN network is Fibre Channel (FC), and the protocol carried is usually SCSI. You can send SCSI over other network protocols, though; an emerging option is iSCSI, which puts SCSI in IP over gigabit Ethernet. This is slower but exploits the existing IP infrastructure and enables WAN SANs. Why build another, separate network for data traffic using a different hardware standard that requires new training? Why not just use the well-understood IP networks your people are already trained on? The drawback is that iSCSI has somewhat higher latency, which is an issue for disk access.

iSCSI
iSCSI simply sends SCSI commands encapsulated inside TCP/IP, just as FC SANs send SCSI commands encapsulated inside FC frames; the toy sketch below illustrates the layering. Because IP can be routed, this can enable wide-area SANs (if you have the bandwidth and are willing to live with the latency); a SAN distributed across an entire state or country is possible. Putting the SCSI commands inside TCP/IP adds a couple of layers to the software stack, which unavoidably increases latency, and gigabit Ethernet also has less bandwidth than FC. Implementations seem a bit immature for the enterprise; iSCSI might only make a splash once 10-gigabit Ethernet arrives. Another option is ATA over Ethernet (AOE).
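
A toy Python sketch of the layering. The READ(10) CDB layout is standard SCSI; the surrounding mock_iscsi_pdu header is deliberately fake, since real iSCSI PDUs carry many more fields (tags, sequence numbers, digests, and so on):

```python
import struct

def scsi_read10_cdb(lba: int, num_blocks: int) -> bytes:
    """Build a 10-byte SCSI READ(10) command descriptor block:
    opcode 0x28, flags, 4-byte LBA, group, 2-byte length, control."""
    return struct.pack(">BBIBHB", 0x28, 0, lba, 0, num_blocks, 0)

def mock_iscsi_pdu(cdb: bytes) -> bytes:
    """Wrap the CDB in a made-up header (toy opcode + length) purely to
    show the nesting: SCSI command -> PDU -> TCP/IP."""
    return struct.pack(">BI", 0x01, len(cdb)) + cdb

pdu = mock_iscsi_pdu(scsi_read10_cdb(lba=2048, num_blocks=16))
# A real initiator would now write this payload to a TCP socket connected
# to the target, instead of framing it onto a Fibre Channel fabric.
print(pdu.hex())
```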

Fibre Channel Costs
FC uses copper or optical fibre at 2+ Gbit/sec. Dell/EMC AX100 with 3 TB of SATA disks, one 8-port Fibre Channel switch, and one FC card: $16K. FC cards run approx. $500 each; an 8-port FC switch is approx. $2,500. Filers with hundreds of TB are available if you've got the checkbook. SANs often use SCSI disk arrays to maximize performance. Optical FC has a range up to a few km, so a SAN can be spread across a campus.

SAN and Clusters
(Diagram: front end FE and compute nodes C1-C4 connected to both an interconnect network and a Fibre Channel switching fabric, with FC storage arrays and an FC jukebox on the fabric.) Very high performance: the compute nodes participate in the SAN directly and share in the benefits of the high-speed storage network.

SAN and Clusters
If you can afford this you can get very high performance; you might have four network interfaces on each node (gigabit Ethernet, InfiniBand, gigabit Ethernet for metadata, and FC). This may work well in situations with heavy database access or very heavy image processing: biological research, etc.

SAN Example
Apple sells XSAN, their OEM'd storage area network software; we are setting this up here. (Diagram: hosts Beatnik and Bongo connected to Raid 1 and Raid 2 through an FC switch, plus an Ethernet switch for metadata.)

XSAN
Each RAID enclosure of 14 disks may be divided into multiple Logical Unit Numbers (LUNs). One LUN corresponds to one RAID set, and a single RAID box may hold more than one RAID set, possibly of different types. One of our RAID boxes has three LUNs: one set of seven disks in a RAID-5 configuration, one set of five disks in a RAID-5 configuration, and one set of two disks in a RAID-1 configuration.

XSAN
XSAN builds "volumes" (virtual disks) out of "storage pools". A storage pool consists of one or more LUNs.

XSAN
A storage pool, then, is simply a collection of LUNs, and a volume consists of one or more storage pools.

XSAN
The LUNs are themselves used as RAID elements: XSAN treats each LUN as a disk and does a RAID-0 stripe across them. This means that LUNs in the same storage pool should be the same size, as the sketch below illustrates.
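
A small Python sketch of why, assuming the usual RAID-0 rule that each stripe member contributes only as much space as the smallest member:

```python
def pool_capacity(lun_sizes_gb: list[int]) -> int:
    """Usable capacity when striping (RAID-0) across LUNs: every LUN
    contributes only as much as the smallest one, so mismatched LUN
    sizes strand the excess space."""
    return len(lun_sizes_gb) * min(lun_sizes_gb)

print(pool_capacity([1000, 1000, 1000]))  # 3000 GB, nothing wasted
print(pool_capacity([1000, 1000, 250]))   # 750 GB; 1500 GB stranded
```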

FC Switch Config
The FC switch requires some configuration, done via a web-based interface; it's not quite as easy as plug-and-go. "Initiators" are computers, and "targets" are RAID enclosures.

Non-Apple
Non-Apple computers can participate in the SAN; there are Linux implementations of the XSAN software available from third parties. Each computer participating in the SAN must have the XSAN software; the edu price is about $500 per copy (more for Linux).

Summary
Buy some disk.