Mass Storage & Information Retrieval


Presentation on theme: "Mass Storage & Information Retrieval"— Presentation transcript:

1 Mass Storage & Information Retrieval
Paul J Mazzotte Union University April 02, 2004

2 Agenda
Background
- RAID and JBOD
- SCSI and FC
Storage Paradigms
- DAS (Direct Attached Storage)
- NAS (Network Attached Storage)
- SAN (Storage Area Networks)
- Performance and Cost: NAS vs SAN
Storage and Backup
- Backup Software
- Tape Technologies
- DAS and Backup
- SAN and Backup
What’s Next

3 Background

4 RAID and JBOD

5 RAID and JBOD
JBOD: “Just a Bunch Of Disks”
- Drives independently attached to the I/O channel
- Scalable, but requires the server to manage multiple volumes
- Does not provide protection in case of failure
RAID: “Redundant Array of Inexpensive Disks”
- Fault-tolerant grouping of disks that the server sees as a single volume
- Combination of parity-checking, mirroring, and striping
- Self-contained manageable unit of storage
Inexpensive? 72 GB FC 10K RPM drive:
- $1,350 from Compaq (2/03)
- $1,200 from SGI (12/02)

6 RAID
Multiple RAID levels to choose from: 0, 1, 2, 3, 4, 5, 6, 10
Each level has certain inherent advantages and disadvantages.
- RAID 0: Disk striping (performance)
- RAID 1: Disk mirroring (security)
- RAID 2: Disk striping with ECC
- RAID 3: Disk striping with ECC stored as parity on one drive (better performance for large data block transfers)
- RAID 4: Disk striping of large blocks; parity stored on one drive (better performance for large data block transfers)
- RAID 5: Disk striping with parity distributed across multiple disks (better performance for small data block transfers)
- RAID 6: Similar to RAID 5 but with additional parity information to recover from a two-drive failure
- RAID 10 (RAID 0 + 1): Combination of RAID 0 (striping) and RAID 1 (mirroring)
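To make the trade-off between the levels concrete, here is a minimal sketch of how much usable space each level leaves. The formulas are the usual textbook ones and are an assumption, not something stated on the slides; the function name and the minimum drive counts for RAID 6 and RAID 10 are likewise illustrative.

```python
def usable_capacity_gb(level, drives, drive_gb):
    """Usable capacity of an array of identical drives.

    Assumption: textbook overheads (not taken from the slides).
    """
    if level == 0:
        minimum, data_drives = 2, drives          # striping only, no redundancy
    elif level == 1:
        minimum, data_drives = 2, 1               # every drive holds the same copy
    elif level in (3, 4, 5):
        minimum, data_drives = 3, drives - 1      # one drive's worth of parity
    elif level == 6:
        minimum, data_drives = 4, drives - 2      # two drives' worth of parity
    elif level == 10:
        if drives % 2:
            raise ValueError("RAID 10 needs an even number of drives")
        minimum, data_drives = 4, drives // 2     # striped mirrored pairs
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < minimum:
        raise ValueError(f"RAID {level} needs at least {minimum} drives")
    return data_drives * drive_gb


if __name__ == "__main__":
    # Four of the 72 GB drives mentioned on the previous slide.
    for level in (0, 5, 6, 10):
        print(f"RAID {level}: {usable_capacity_gb(level, 4, 72)} GB usable")
```

With four 72 GB drives, striping alone keeps all 288 GB, single parity keeps 216 GB, and both double parity and mirrored stripes keep 144 GB.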

7 RAID Levels
RAID 0: Data is subdivided and each division is written to a different disk drive.
- Advantages: performance when multiple controllers are used
- Disadvantages: not a true RAID (no redundancy)
- Minimum 2 drives
RAID 1: Data is written to two different drives.
- Advantages: 100% redundant; 1 write, 2 reads possible
- Disadvantages: highest disk overhead
- Minimum 2 drives

8 RAID Levels
RAID 3: The data block is subdivided ("striped") and written on the data disks. The stripe parity is generated on writes, recorded on the parity disk, and checked on reads.
- Advantage: medium read, high write performance
- Disadvantages: rebuild time (compared to RAID 1)
- Minimum 3 drives
RAID 5: Each entire data block is written on a data disk; parity for blocks in the same rank is generated on writes, recorded in a distributed location, and checked on reads.
- Advantage: high read, medium write performance
- Disadvantages: rebuild time (compared to RAID 1)
- Minimum 3 drives
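The parity generation and rebuild described above can be sketched as follows, assuming the parity is the byte-wise XOR of the data blocks in a stripe (the usual choice for single-parity RAID; the slides do not name the function). The helper names are hypothetical.

```python
from functools import reduce


def xor_parity(blocks):
    """Byte-wise XOR of equally sized blocks (assumed parity function)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the one lost block: XOR of the survivors and the parity."""
    return xor_parity(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    stripe = [b"AAAA", b"BBBB", b"CCCC"]     # three data disks
    parity = xor_parity(stripe)              # written to the parity location
    # Simulate losing the second disk and rebuilding it.
    recovered = rebuild_missing([stripe[0], stripe[2]], parity)
    print(recovered == stripe[1])
```

The rebuild has to read every surviving disk in the stripe, which is one reason the slide lists rebuild time as a disadvantage compared to a simple mirror copy.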

9 SCSI and FC

10 SCSI
Small Computer System Interface

Version             Databus          Speed                   Cable
1 (1986)            8 bit            5 MB/s (slow)           6 meters
2 (1994)            8 bit (narrow)   10 MB/s (fast)          25 meters
                    16 bit (wide)    20 MB/s                 25 meters
3 [Ultra] (1995)    8 bit            20 MB/s (fast-20)       25 meters
                    16 bit           40 MB/s                 25 meters
  [Ultra-2] (1998)  8 bit            40 MB/s (fast-40)       25 meters
                    16 bit           80 MB/s                 25 meters
  [Ultra-3] (1999)  8 bit            80 MB/s (fast-80)       25 meters
                    16 bit           160 MB/s (ultra-160)    25 meters
  [Ultra-4] (2003)  8 bit            160 MB/s (fast-160)     25 meters
                    16 bit           320 MB/s (ultra-320)    25 meters

Version 1
- Single Ended (one wire driven against ground)
- 50 pin Centronics type connector (Alternative 2, A-connector)
- Passive termination
Version 2
- Differential (voltage difference between two wires): HVD (5 Volts)
- 50 pin high density connector (Alternative 1, A-connector)
- Active termination for SE
Version 3
- No longer one document but a collection of documents
- SPI: pin high density connector (Alternative 3, P-connector); no longer a need for two cables for wide SCSI
- SPI2: LVD (3 Volts); most LVD devices are LVD/SE; Single Ended cannot go faster than Ultra speeds; Very High Density Cable Interconnect (VHDCI) (Alternative 4, P-connector)
- SPI3: removed HVD
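The speed column follows a simple pattern: bus width in bytes times the transfer rate. A short sketch, assuming the "fast-N" labels denote N million transfers per second (the function name is illustrative):

```python
def scsi_throughput_mb_s(bus_bits, mega_transfers_per_s):
    """Peak parallel-SCSI throughput: bytes per transfer x transfers per second."""
    if bus_bits % 8:
        raise ValueError("bus width must be a whole number of bytes")
    return (bus_bits // 8) * mega_transfers_per_s


if __name__ == "__main__":
    # Transfer rates read off the "fast-N" labels in the table (assumed MT/s).
    rates = {"slow": 5, "fast": 10, "fast-20": 20,
             "fast-40": 40, "fast-80": 80, "fast-160": 160}
    for name, rate in rates.items():
        narrow = scsi_throughput_mb_s(8, rate)
        wide = scsi_throughput_mb_s(16, rate)
        print(f"{name:>8}: narrow {narrow} MB/s, wide {wide} MB/s")
```

Doubling the bus from 8 to 16 bits doubles the throughput at every generation, which is why Ultra-320 is simply wide fast-160.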

11 Fibre Channel
[Diagram: three topologies, each labeled 200 MB: Point-to-Point, Arbitrated Loop, and Switched Fabric (with a Switch)]
- Topology is a word borrowed from mathematics and used here to describe the way the nodes on a network are connected.
- SAN technology supports three basic topologies: point-to-point, arbitrated loop, and switched fabric.
1.) Point-to-point is a simple topology that allows bi-directional communication between two nodes, in this case a storage system and a server. Point-to-point, like all SAN topologies, benefits from the long reach possible with Fiber Optic connections.
2.) The arbitrated loop is a ring topology where each node passes data to adjacent nodes. Like a Token Ring LAN, the SAN hub arbitrates requests for data to make optimum use of the available bandwidth.
3.) Switched fabric is a SAN term used to describe extensive storage networks where large numbers of servers and storage systems are connected using Fiber Optic switches. Switches can be cascaded and combined with loops to create highly interwoven networks known as fabrics.

12 SCSI and FC

                     Fibre Channel    Fibre Channel AL   Parallel SCSI
Connections          16 Million
Distance             10 km            10 km              25 m
Bandwidth            200 MB/s         200 MB/s           320 MB/s
                     Per connection   Shared             Shared
Hot Plug             Yes              Yes                No
Multiple Protocols   Yes              Yes                No

13 NOT SCSI vs FC
[Diagram: Fibre Channel protocol layers, top to bottom]
- ULP (Upper Level Protocol): SCSI-3, IP, ATM
- FC-4: SCSI-3 Command Set Mapping; IPI-3 Command Set Mapping (IPI-3 STD); FC Link Encapsulation (FC-LE)
- FC-3: Common Services
- FC-2: Framing Protocol
- FC-1: Encode / Decode (8B/10B Encoding)
- FC-0: Physical Variant (Copper, Optical)
FC-0 through FC-2 make up the Fibre Channel Physical & Signaling Interface (FC-PH, FC-PH2, FC-PH3), alongside FC-AL and FC-AL-2.
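The 8B/10B encoding at FC-1 sends 10 line bits for every 8 data bits, so a fifth of the signalling rate is coding overhead. The sketch below works that out; the 1.0625 and 2.125 Gbaud line rates are assumptions (the commonly quoted 1 Gb and 2 Gb Fibre Channel rates) and do not appear on the slides.

```python
def payload_mb_per_s(line_rate_gbaud):
    """Data bandwidth left after 8B/10B coding (8 data bits per 10 line bits)."""
    line_bits_per_s = line_rate_gbaud * 1e9
    data_bits_per_s = line_bits_per_s * 8 / 10
    return data_bits_per_s / 8 / 1e6          # bits -> bytes -> MB


if __name__ == "__main__":
    for gbaud in (1.0625, 2.125):             # assumed line rates
        print(f"{gbaud} Gbaud -> {payload_mb_per_s(gbaud)} MB/s before framing overhead")
```

Under these assumptions the 2.125 Gbaud link carries a little over 200 MB/s of decoded bytes, which is presumably where the nominal 200 MB figure in the topology and comparison slides comes from once framing overhead is subtracted.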

14 Storage

15 DAS, NAS, and SAN

16 DAS
[Diagram labels: Client Workstations, LAN, File I/O (NFS/CIFS), Application Server(s), File Server(s), Block I/O (SCSI/FC-AL)]
Definition: DAS is composed of multiple storage disks or disk array units that are directly attached to a general purpose server.
- Today, DAS is still the most widely used form of storage architecture.
- While DAS is traditionally easy to implement and even easier to understand, there are a couple of big disadvantages:
1.) DAS scatters data across many servers on the network. This leads to two issues: management and utilization.
- "Average utilization of open systems storage is less than 50%." (Gartner)
- "Analyst said that for every $1 spent on tape or disk storage, it costs $4 to $7 more to manage it." (Lucas Mearian, Computerworld, 14 May 2001)
2.) The DAS structure leads to problems with file sharing (especially between platforms). With DAS, file sharing applications (such as NFS, CIFS, or Samba) need to be used on the server where the data resides.

17 DAS Issues
- Proliferation of “server and storage islands”, which causes a large management burden
- File sharing issues

18 NAS
[Diagram labels: Client Workstations, LAN, File I/O (NFS/CIFS), NAS Servers (filers)]
Definition: NAS is a special-purpose storage system that directly attaches to the LAN and responds to file I/O requests coming across the LAN from a device.
Contains:
- Disk
- Server (filer) with an optimized network operating system, usually a UNIX/Linux kernel that is fine-tuned for this one function
Problem 1.) Management / Inefficient Storage Use
- The large management burden and inefficient storage utilization inherent in the DAS architecture are solved to some extent in a NAS configuration because all storage is centralized in large NAS units.
Problem 2.) File Sharing
- NAS solves the file sharing issues, since most “good” NAS boxes can share files with both UNIX and PC clients. However, since the data is shared across the LAN, the file sharing protocols (NFS/CIFS) are used.
- Performance, in the form of network traffic, is the biggest concern, because the file-level access protocols (NFS/CIFS) used with NAS subsystems are inherently slow.
- DAFS (Direct Access File System)

19 Same as DAS – Not Exactly
- Tuned Network Operating System (NOS)
- Supports multiple protocols (NFS, CIFS, NCP)

20 Does NAS Solve DAS Issues?
- Simplify management: yes (for the most part). Allows storage to be consolidated, but only up to the size of the NAS box (~5 to 15 TB).
- File sharing: yes. “True NAS” servers will have support for multiple protocols.

21 NAS Issue
Performance
- Network bandwidth / network traffic
- Protocol inefficiencies

22 SAN
[Diagram labels: Client Management Station, LAN, Application Servers, Management Server(s), Block I/O (FC), FC Network, Disk]
Definition: SAN is a high-speed network dedicated to interfacing storage subsystems to servers.
- SAN is the newest paradigm for attaching and managing storage.
Made up of:
- Disk
- Fibre Channel switches (or hubs)
Problem 1.) Management / Inefficient Storage Use
- The large management burden and inefficient storage utilization are solved somewhat better in a SAN setup than in a NAS setup. SAN storage is seen as one large island instead of a couple of large islands.
Problem 2.) File Sharing
- SAN does not address the file sharing issue as fully as NAS does. Basically, SAN is like DAS: to share files with a client it must use NFS/CIFS (but unlike NAS there is no tuned kernel). However, with some SAN implementations it is possible to “mount” volumes on multiple machines attached to the SAN.

23 Zoning (1 of 2)
Zoning arranges FC connected devices into logical groups.
[Diagram: nodes attached to an FC switch network, grouped into Zone X and Zone Y]

24 Zoning (2 of 2)
Operation
- Zone members “see” only other members of the zone
- Zones are configured dynamically
- Devices can be members of more than one zone
- Switched fabric zoning can take place at the port or device level
Benefits
- Secured device access
- Allows operating system co-existence
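The visibility rule can be illustrated with a small model: two devices can see each other only if at least one zone contains both. This is a toy sketch, not a switch configuration; the zone and device names are hypothetical.

```python
def can_see(zones, a, b):
    """True if some zone contains both devices."""
    return any(a in members and b in members for members in zones.values())


def visible_to(zones, device):
    """All other devices that share at least one zone with `device`."""
    seen = set()
    for members in zones.values():
        if device in members:
            seen |= members
    seen.discard(device)
    return seen


if __name__ == "__main__":
    # Hypothetical fabric: array_1 belongs to both zones.
    zones = {
        "zone_x": {"server_a", "array_1"},
        "zone_y": {"server_b", "array_1", "tape_1"},
    }
    print(can_see(zones, "server_a", "array_1"))   # shared zone_x
    print(can_see(zones, "server_a", "server_b"))  # no common zone
    print(sorted(visible_to(zones, "array_1")))
```

Because the two servers never share a zone, hosts running different operating systems can use the same fabric without touching each other's volumes, which is the co-existence benefit listed above.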

25 Does SAN Solve DAS Issues?
- Simplify management: yes. Allows storage to be consolidated (seen as one big island instead of a couple of large islands like NAS).
- File sharing: not yet. Still waiting for the development of a CFS.

26 SAN and NAS Recap

SAN                                 NAS
Local storage access                Remote file access
Private net for storage             Shares user net
Storage protocols                   Network protocols
Centralized management              “Centralized” management
Good for hosting large databases    Good for file sharing (“home directories”)

27 SAN/NAS Performance
[Chart: benchmark results. Source: System File Server (SFS) committee, Standard Performance Evaluation Corporation (SPEC)]

28 SAN/NAS Cost
[Chart: Cost per MB, 3 Year TCO (cents per MB) for 2 TB]
1. Customer estimates of the number of TB of data that can be managed by a full-time administrator run from 1.5 to 5.0 for DAS, and from 6.0 to 13.3 for NAS/SAN.
2. Additionally, while customers report up to 50 percent disk utilization on DAS, utilization rises to as much as 90 percent for SAN and NAS.
Source: “The Storage Report - Customer Perspectives & Industry Evolution - 19 June 2001” by Merrill Lynch & Co. and McKinsey & Company, Page 48, Chart 51
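A rough sketch of what those two figures imply for a given amount of data. The 10 TB workload and the function names are hypothetical; the per-administrator and utilization figures are the ones quoted above.

```python
import math


def raw_capacity_tb(data_tb, utilization):
    """Raw disk that must be bought to hold data_tb at a given utilization."""
    return data_tb / utilization


def admins_needed(data_tb, tb_per_admin):
    """Whole administrators needed if each manages tb_per_admin of data."""
    return math.ceil(data_tb / tb_per_admin)


if __name__ == "__main__":
    data_tb = 10  # hypothetical workload
    print("DAS raw TB:", raw_capacity_tb(data_tb, 0.50))
    print("SAN/NAS raw TB:", round(raw_capacity_tb(data_tb, 0.90), 1))
    print("DAS admins:", admins_needed(data_tb, 5.0), "to", admins_needed(data_tb, 1.5))
    print("SAN/NAS admins:", admins_needed(data_tb, 13.3), "to", admins_needed(data_tb, 6.0))
```

Both effects push in the same direction: networked storage needs less raw disk for the same data and fewer people to manage it.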

29 SAN/NAS Cost

Type   Platform         Cents per MB (2.5 TB)   Cents per MB (5 TB)   Cents per MB
NAS    Netapp FAS960    7.2 ($176,722)          4.1 ($206,836)        N/A
SAN    Compaq EVA       9.1 ($228,261)          5.5 ($275,266)        3.4 ($406,880)

Note: SAN costs include two 16-port switches but no cabling.
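The cents-per-MB figures can be approximately reproduced from the quoted prices. This sketch assumes decimal units (1 TB = 1,000,000 MB), which the slide does not state; the function name is illustrative.

```python
MB_PER_TB = 1_000_000  # assumption: decimal units


def cents_per_mb(price_dollars, capacity_tb):
    """System price expressed in cents per megabyte of capacity."""
    return price_dollars * 100 / (capacity_tb * MB_PER_TB)


if __name__ == "__main__":
    quotes = [
        ("NAS Netapp FAS960", 176_722, 2.5),
        ("NAS Netapp FAS960", 206_836, 5.0),
        ("SAN Compaq EVA", 228_261, 2.5),
        ("SAN Compaq EVA", 275_266, 5.0),
    ]
    for platform, price, tb in quotes:
        print(f"{platform} at {tb} TB: {cents_per_mb(price, tb):.2f} cents/MB")
```

Three of the four results round to the table's values; the 2.5 TB NAS figure comes out slightly below the quoted 7.2, which suggests the original used a somewhat different usable capacity or rounding. In both rows the cost per MB falls as capacity grows, because the fixed cost of controllers and switches is spread over more disk.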

30 SAN/NAS Business Trend
- Data storage will account for 75% of all IT spending for the next five (5) years. (IDC, 2001)
- Most external storage will be networked by 2005. (Nick Allen, Gartner Group)
- Most enterprises will gain more savings through consolidating storage than through servers until 2002 (0.9 probability).
Source: “SNIA Presentation - 19 May 1999” by Nick Allen of Gartner Group

31 SAN/NAS Business Trend
[Chart: annual vendor revenue ($B) for DAS and SAN, with SAN share (%), 2000 to 2006]
Source: “Worldwide external raid controller-based storage forecast, ”, Gartner, August 2002

32 Backup

33 Backup Software (Mid-Range)
- Legato (Networker)
- Veritas (Netbackup)
- IBM (Tivoli)

34 Mid-Range Tape Technologies

                             AIT-3      SuperDLT    LTO-1       Mammoth-2
Manufacturer                 Sony       Quantum     IBM/S/HP    Exabyte
Release                      Q          Q           Q           Q1 2000
Technology                   Helical    Linear      Linear      Helical
Head Life (Hours)            50,000     30,000      30,000      50,000
Media Life (Avg Passes)      30,000     1,000,000   1,000,000   20,000
Media Price per Cartridge    $135       $134        $110        $89
Price per GB (Native)        $1.35      $1.22       $1.10       $1.48
Drive Price                  $?,?00     $4,400      $4,300      $4,000
SCSI                         LVD        LVD/HVD     LVD/HVD     LVD/HVD
Fibre Channel                NO         NO          YES         YES

Rows whose values were not preserved in this transcript: Native Capacity (GB), Compressed Capacity (GB), Native Transfer Rate (MB/s), Compressed Transfer Rate (MB/s), 12 Hr Window Trans Rate (GB), MTBF (Hours) [400, , , ,000].

The announced road maps are as follows:
[Note: Year (Native Capacity, Compressed Capacity, Native Transfer Rate, Compressed Transfer Rate)]
- Mammoth (M3, M4, M5): 2003 (120, 300, 20, 50); 2004 (200, 500, 30, 75); (400, 1000, 60, 150)
- LTO (LTO-2, LTO-3, LTO-4): 2003 (200, 400, 30, 60); 2004 (400, 800, 60, 120); (800, 1600, 120, 240)
- AIT (AIT-4, AIT-5, AIT-6): 2003 (200, 520, 24, 62); 2005 (400, 1040, 48, 124); 2007 (800, 2080, 96, 248)
- DLT (SDLT-2, SDLT-3): 2003 (220, 440, 22, 44); 2005 (500, 1000, 44, 88); 200? (???, ????, ??, ???)

Theory versus actual:
- Exabyte Mammoth1: 20/40 GB (Theory), 35 GB (Actual); 10.8 GB/hr (Theory), 5 - 8 GB/hr (Actual)
- Exabyte Mammoth: /150 GB (Theory), 100 GB (Actual); 43.2 GB/hr (Theory), GB/hr (Actual)
- IBM LTO: /200 GB (Theory), ?? GB (Actual); 54 GB/hr (Theory), ?? GB/hr (Actual)
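The theoretical GB/hr figures and the "12 Hr Window" row are plain unit conversions of a drive's transfer rate. A minimal sketch follows; the 3, 12, and 15 MB/s inputs are back-calculated from the 10.8, 43.2, and 54 GB/hr figures above rather than read from the table, and the function names are illustrative.

```python
def gb_per_hour(mb_per_s):
    """Sustained transfer rate converted from MB/s to GB/hr (decimal units)."""
    return mb_per_s * 3600 / 1000


def window_gb(mb_per_s, hours=12):
    """Data one drive can move in a backup window at its rated speed."""
    return gb_per_hour(mb_per_s) * hours


def hours_to_back_up(data_gb, mb_per_s):
    """Time for one drive to stream data_gb at its rated speed."""
    return data_gb / gb_per_hour(mb_per_s)


if __name__ == "__main__":
    for rate in (3, 12, 15):  # MB/s, inferred from the GB/hr figures
        print(f"{rate} MB/s -> {gb_per_hour(rate):.1f} GB/hr, "
              f"{window_gb(rate):.0f} GB in a 12-hour window")
```

As the Mammoth1 "actual" numbers show, real throughput can fall well short of the rated figure, so a window sized from the theoretical rate is an upper bound.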

35 DAS and Backup
[Diagram labels: LAN, Backup Servers, Jukebox, Backup Client Nodes (Small Servers / Desktops, More Servers)]

36 SAN and Backup
[Diagram labels: LAN, Gigabit LAN, FC Network, Backup Server, NAS Nodes (Netapp Filers), Server Nodes (Oracle, Mail, etc), SAN Disk Array(s), Tape Library; data flows: Files to Backup, Backup File Index, Disk Blocks]

37 What’s Next

38 In The Near Future
- Storage: iSCSI
- Backup: Disk to Disk Backup

39 Review
- RAID and JBOD
- SCSI and FC
- NAS and SAN
- Backup

40 The End

