
1 Storage: Alternate Futures. Jim Gray, Microsoft Research. IBM Almaden, 1 December 1999. (Slide sidebar: the scale ladder Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta.)

2 Acknowledgments: Thank You!!
–Dave Patterson: convinced me that processors are moving to the devices.
–Kim Keeton and Erik Riedel: showed that many useful subtasks can be done by disk processors, and quantified the speedups.
–Remzi Arpaci-Dusseau: re-validated Amdahl's laws.

3 Outline
The Surprise-Free Future (5 years)
–500 mips cpus for 10$
–1 Gb RAM chips
–MAD at 50 Gbpsi
–10 GBps SANs are ubiquitous
–1 GBps WANs are ubiquitous
Some consequences
–Absurd (?) consequences
–Auto-manage storage
–Raid10 replaces Raid5
–Disc-packs
–Disk is the archive media of choice
A surprising future?
–Disks (and other useful things) become supercomputers
–Apps run "in the disk"

4 The Surprise-free Storage Future
–1 Gb RAM chips
–MAD at 50 Gbpsi
–Drives shrink one quantum
–Standard IO
–10 GBps SANs are ubiquitous
–1 Gbps WANs are ubiquitous
–5 bips cpus for 1K$ and 500 mips cpus for 10$

5 1 Gb RAM Chips
Moving to 256 Mb chips now; 1 Gb will be "standard" in 5 years, and 4 Gb will be the premium product.
Note:
–256 Mb = 32 MB: the smallest memory today
–1 Gb = 128 MB: the smallest memory then

6 System On A Chip
Integrate processing with memory on one chip:
–chip is 75% memory now
–1 MB cache >> 1960 supercomputers
–256 Mb memory chip is 32 MB!
–IRAM, CRAM, PIM,… projects abound
Integrate networking with processing on one chip:
–system bus is a kind of network
–ATM, FiberChannel, Ethernet,… logic on chip
–direct IO (no intermediate bus)
Functionally specialized cards shrink to a chip.

7 500 mips System On A Chip for 10$
Today: a 486 is 7$, a 233 MHz ARM system-on-a-chip is 10$, an AMD/Celeron 266 is ~30$.
In 5 years, today's leading edge will be:
–a system on a chip (cpu, cache, memory controller, multiple IO)
–low cost
–low power
–integrated IO
The high end is 5 BIPS cpus.

8 Standard IO in 5 Years (Probably)
–Replace PCI with something better; we will still need a mezzanine bus standard.
–Multiple serial links directly from the processor.
–Fast (10 GBps/link) for a few meters.
–System Area Networks (SANs) ubiquitous (VIA morphs to SIO?).

9 1 GBps Ubiquitous, 10 GBps SANs in 5 Years
–1 Gbps Ethernet is a reality now; also FiberChannel, MyriNet, GigaNet, ServerNet, ATM,…
–10 Gbps x4 WDM deployed now (OC192); 3 Tbps WDM working in the lab.
–In 5 years, expect 10x; progress is astonishing.
–Gilder's law: bandwidth grows 3x/year.
(Link-speed progression shown on the slide: 5 MBps, 20 MBps, 40 MBps, 80 MBps, 120 MBps (1 Gbps).)

10 Thin Clients Mean HUGE Servers
–AOL hosting customer pictures.
–Hotmail allows 5 MB/user, 50 M users.
–Web sites offer electronic vaulting for SOHO.
–IntelliMirror: replicate client state on the server.
–Terminal server: timesharing returns.
–… and many more.

11 Remember Your Roots?

12 MAD at 50 Gbpsi
MAD: Magnetic Areal Density. 3-10 Gbpsi in products, 28 Gbpsi in the lab; 50 Gbpsi is the paramagnetic limit, but… people have ideas.
–Capacity: rises 10x in 5 years (conservative).
–Bandwidth: rises 4x in 5 years (density + rpm).
–Disk: 50 GB to 500 GB, 60-80 MBps, 1k$/TB, 15 minute to 3 hour scan time.
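The scan-time claim is just capacity divided by sequential bandwidth; a minimal Python check using figures from this and the nearby slides (today's 15 MBps rate is from slide 14):

def scan_minutes(gb, mbps):
    # Time to read the whole drive sequentially.
    return gb * 1000 / mbps / 60

print(round(scan_minutes(50, 15)))    # ~56 min: today's 50 GB drive at 15 MBps
print(round(scan_minutes(500, 60)))   # ~139 min: a 500 GB drive at 60 MBps
print(round(scan_minutes(500, 80)))   # ~104 min: the same drive at 80 MBps
# Capacity rises 10x but bandwidth only ~4x, so full-disk scans stretch from
# tens of minutes toward hours (the slide quotes a 15 minute to 3 hour range).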

13 The "Absurd" Disk
1 TB, 100 MB/s, 200 Kaps.
–2.5 hr scan time (poor sequential access).
–1 aps / 5 GB (VERY cold data).
–It's a tape!
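Both headline numbers follow from the 1 TB / 100 MB/s / 200 Kaps spec, reading Kaps as KB-sized accesses per second (slide 16's definition); a quick Python check:

capacity_gb = 1000      # 1 TB
bandwidth_mb_s = 100    # sequential transfer rate
kaps = 200              # ~200 KB-sized random accesses per second

print(capacity_gb * 1000 / bandwidth_mb_s / 3600)   # ~2.8 h to read the whole disk (vs. the slide's 2.5 hr)
print(capacity_gb / kaps)                           # 5.0 GB per access/sec, i.e. 1 aps per 5 GB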

14 Disk vs Tape
Disk:
–47 GB
–15 MBps
–5 ms seek time
–3 ms rotate latency
–9$/GB for the drive, 3$/GB for controllers/cabinet
–4 TB/rack
Tape:
–40 GB
–5 MBps
–30 sec pick time
–many-minute seek time
–5$/GB for media, 10$/GB for drive+library
–10 TB/rack
The price advantage of tape is narrowing, and the performance advantage of disk is growing.
(Guesstimates. CERN: 200 TB of 3480 tapes; 2 columns = 50 GB; rack = 1 TB = 20 drives.)
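Putting the two columns side by side in cost-per-GB and time-to-read terms (a minimal Python comparison using only the slide's figures; the tape $/GB here includes the drive+library share, media alone is 5$/GB):

disk = {"gb": 47, "mbps": 15, "dollars_per_gb": 9 + 3}    # drive + controllers/cabinet
tape = {"gb": 40, "mbps": 5,  "dollars_per_gb": 5 + 10}   # media + drive/library

for name, d in (("disk", disk), ("tape", tape)):
    read_hours = d["gb"] * 1000 / d["mbps"] / 3600
    print(name, d["dollars_per_gb"], "$/GB,", round(read_hours, 1), "h to read one unit")
# disk: 12 $/GB and ~0.9 h; tape: 15 $/GB and ~2.2 h (plus pick and seek time),
# which is the sense in which tape's price advantage has narrowed.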

15 Standard Storage Metrics
Capacity:
–RAM: MB and $/MB: today at 512 MB and 3$/MB
–Disk: GB and $/GB: today at 50 GB and 10$/GB
–Tape: TB and $/TB: today at 50 GB and 12k$/TB (nearline)
Access time (latency):
–RAM: 100 ns
–Disk: 10 ms
–Tape: 30 second pick, 30 second position
Transfer rate:
–RAM: 1 GB/s
–Disk: 15 MB/s (arrays can go to 1 GB/s)
–Tape: 5 MB/s (striping is problematic, but "works")

16 New Storage Metrics: Kaps, Maps, SCAN?
–Kaps: how many kilobyte objects served per second. The file server / transaction processing metric; this is the OLD metric.
–Maps: how many megabyte objects served per second. The multi-media metric.
–SCAN: how long to scan all the data. The data mining and utility metric.
–And: Kaps/$, Maps/$, TBscan/$.
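One way to compute these metrics for a drive (a minimal Python sketch using the slide-14 disk figures of 47 GB, 15 MBps, 5 ms seek, 3 ms rotate; the formulas are a plausible reading of the definitions, not Gray's exact ones):

def kaps(seek_s, rotate_s, mbps):
    # KB-sized objects: service time is dominated by seek + rotate.
    return 1.0 / (seek_s + rotate_s + 0.001 / mbps)

def maps(seek_s, rotate_s, mbps):
    # MB-sized objects: one positioning plus a 1 MB transfer.
    return 1.0 / (seek_s + rotate_s + 1.0 / mbps)

def scan_hours(capacity_gb, mbps):
    # SCAN: time to read everything sequentially.
    return capacity_gb * 1000.0 / mbps / 3600.0

print(round(kaps(0.005, 0.003, 15)))     # ~124 Kaps
print(round(maps(0.005, 0.003, 15)))     # ~13 Maps
print(round(scan_hours(47, 15), 2))      # ~0.87 hours to scan the whole disk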

17 For the Record (good 1999 devices packaged in a system, x100). Tape is 1 TB with 4 DLT readers at 5 MBps each.

18 For the Record (good 1999 devices packaged in a system). Tape is 1 TB with 4 DLT readers at 5 MBps each.

19 The Access Time Myth
The myth: seek or pick time dominates.
The reality: (1) queuing dominates, (2) transfer dominates for BLOBs, (3) disk seeks are often short.
Implication: many cheap servers are better than one fast, expensive server:
–shorter queues
–parallel transfer
–lower cost/access and cost/byte
This is obvious for disk arrays; it is even more obvious for tape arrays.
(Diagram labels: Wait, Seek, Rotate, Transfer.)
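A toy illustration of "queuing dominates" (a minimal sketch using the textbook M/M/1 formula, not anything from the talk): at high utilization the wait in queue dwarfs the ~8 ms of seek plus rotate.

def mm1_response_ms(service_ms, utilization):
    # M/M/1 queue: mean response time = service time / (1 - utilization).
    return service_ms / (1.0 - utilization)

service_ms = 8.0   # ~5 ms seek + 3 ms rotate for a small request
for u in (0.5, 0.8, 0.9, 0.95):
    print(u, round(mm1_response_ms(service_ms, u), 1), "ms")
# 0.5 -> 16 ms, 0.9 -> 80 ms, 0.95 -> 160 ms: the queue, not the seek, dominates,
# which is why many lightly loaded cheap servers beat one heavily loaded fast one.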

20 Storage Ratios Changed
–10x better access time
–10x more bandwidth
–4,000x lower media price
–DRAM/disk media price ratio changed: …:1, …:1, …:1; today disk is ~0.1 $/MB and DRAM ~3 $/MB, about 30:1.

21 Data on Disk Can Move to RAM in 8 years
(The slide's chart plots the DRAM/disk price ratio; the surviving labels are 30:1 and 6 years.)

22 Outline
The Surprise-Free Future (5 years)
–500 mips cpus for 10$
–1 Gb RAM chips
–MAD at 50 Gbpsi
–10 GBps SANs are ubiquitous
–1 GBps WANs are ubiquitous
Some consequences
–Absurd (?) consequences
–Auto-manage storage
–Raid10 replaces Raid5
–Disc-packs
–Disk is the archive media of choice
A surprising future?
–Disks (and other useful things) become supercomputers
–Apps run "in the disk"

23 The (absurd?) Consequences
–256-way NUMA?
–Huge main memories: now 500 MB - 64 GB, then 10 GB - 1 TB.
–Huge disks: now 5-50 GB 3.5" disks, then … GB disks.
–Petabyte storage farms (that you can't back up or restore).
–Disks >> tapes.
–"Small" disks: one platter, one inch, 10 GB.
–SAN convergence: 1 GBps point to point is easy.
(Slide sidebar, the assumptions: 1 Gb RAM chips, MAD at 50 Gbpsi, drives shrink one quantum, 10 GBps SANs are ubiquitous, 500 mips cpus for 10$, 5 bips cpus at the high end.)

24 The Absurd? Consequences
Further segregating processing from storage gives poor locality and much useless data movement.
Amdahl's laws: bus = 10 B/ips, IO = 1 bit/ips.
(Diagram: processors ~1 Tips, RAM ~1 TB, disks ~100 TB, a 10 TBps memory bus, and 100 GBps of IO.)
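The bandwidth figures in the diagram follow from applying Amdahl's balance rules to a ~1 Tips machine; a minimal Python check (the rules of thumb are the slide's):

tips = 1e12                # ~1 Tips of aggregate processing
bus_bytes_per_ips = 10     # Amdahl: ~10 bytes of memory-bus traffic per instruction/sec
io_bits_per_ips = 1        # Amdahl: ~1 bit of IO per instruction/sec

print(tips * bus_bytes_per_ips / 1e12)   # 10.0 TBps memory bus
print(tips * io_bits_per_ips / 8 / 1e9)  # 125.0 GBps of IO, roughly the 100 GBps shown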

25 Storage Latency: How Far Away is the Data?
(The analogy chart: registers = my head; on-chip cache = this room (1 min); on-board cache = this hotel (10 min); memory = Olympia (1.5 hr); disk = Pluto (2 years); tape/optical robot = Andromeda (2,000 years).)

26 Consequences
–Auto-manage storage
–Six-packs (for arm-limited apps)
–Raid5 -> Raid10
–Disk-to-disk backup
–Smart disks

27 Auto Manage Storage
1980 rule of thumb:
–a DataAdmin per 10 GB, a SysAdmin per mips
2000 rule of thumb:
–a DataAdmin per 5 TB
–a SysAdmin per 100 clones (varies with the app)
Problem:
–5 TB is 60k$ today, 10k$ in a few years
–admin cost >> storage cost???
Challenge:
–automate ALL storage admin tasks
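The "admin cost >> storage cost" worry in numbers (a rough Python sketch; the 60k$ and 10k$ per 5 TB are the slide's figures, while the 100k$/year fully loaded administrator cost is an assumption added here purely for illustration):

storage_cost_today = 60_000      # 5 TB of disk today (slide figure)
storage_cost_soon = 10_000       # 5 TB in a few years (slide figure)
admin_cost_per_year = 100_000    # assumed fully loaded DataAdmin cost (illustration only)

print(admin_cost_per_year / storage_cost_today)  # ~1.7x: the admin already costs more per year
print(admin_cost_per_year / storage_cost_soon)   # 10x: and the gap keeps widening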

28 The "Absurd" Disk
1 TB, 100 MB/s, 200 Kaps.
–2.5 hr scan time (poor sequential access).
–1 aps / 5 GB (VERY cold data).
–It's a tape!

29 Extreme Case: the 1 TB Disk: Alternatives
Use all the heads in parallel (1 TB, 500 MB/s, 200 Kaps):
–scan in 30 minutes
–still one Kaps per 5 GB
Use one platter per arm (200 GB each, 500 MB/s, 1,000 Kaps):
–share power/sheet metal
–scan in 30 minutes
–one Kaps per GB
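Checking the two alternatives (minimal Python; the numbers are the slide's, with Kaps read as KB-sized accesses per second):

capacity_gb = 1000
aggregate_mb_s = 500
print(round(capacity_gb * 1000 / aggregate_mb_s / 60))   # ~33 min scan, either way

# All heads on one arm: still ~200 accesses/sec for the whole TB.
print(200 / capacity_gb)        # 0.2 aps/GB, i.e. one Kaps per 5 GB
# One arm per 200 GB platter: 5 arms x 200 accesses/sec = 1,000 Kaps.
print(5 * 200 / capacity_gb)    # 1.0 aps/GB, i.e. one Kaps per GB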

30 Drives Shrink (1.8", 1")
–150 Kaps for 500 GB is VERY cold data.
–3 GB/platter today, 30 GB/platter in 5 years.
–Most disks are ½ full; TPC benchmarks use 9 GB drives (they need arms or bandwidth).
One solution: a smaller form factor:
–more arms per GB
–more arms per rack
–more arms per Watt

31 Prediction: 6-packs
One way or another, when disks get huge:
–they will be packaged as multiple arms
–parallel heads give bandwidth
–independent arms give bandwidth & aps
The package shares power, packaging, interfaces,…

32 Stripes, Mirrors, Parity (RAID 0, 1, 5)
RAID 0: stripes
–bandwidth
RAID 1: mirrors, shadows,…
–fault tolerance
–reads faster, writes 2x slower
RAID 5: parity
–fault tolerance
–reads faster
–writes 4x or 6x slower
(Layout diagrams: RAID 0 disks hold blocks 0,3,6,… / 1,4,7,… / 2,5,8,…; RAID 1 each disk holds 0,1,2,…; RAID 5 rotates parity: 0,2,P2,… / 1,P1,4,… / P0,3,5,…)
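A minimal sketch of the block placement behind those layout diagrams (3-disk array, rotating parity for RAID 5 as drawn on the slide; an illustration, not any particular product's layout):

def raid0_disk(block, ndisks=3):
    # RAID 0: block b simply goes to disk b mod n (the 0,3,6 / 1,4,7 / 2,5,8 columns).
    return block % ndisks

def raid5_location(block, ndisks=3):
    # RAID 5: each stripe holds (n-1) data blocks plus one parity block,
    # and the parity disk rotates from stripe to stripe.
    stripe = block // (ndisks - 1)
    parity_disk = (ndisks - 1 - stripe) % ndisks
    data_disks = [d for d in range(ndisks) if d != parity_disk]
    return data_disks[block % (ndisks - 1)], parity_disk

for b in range(6):
    print(b, raid0_disk(b), raid5_location(b))
# Reproduces the slide's RAID 5 picture: P0 on disk 2, P1 on disk 1, P2 on disk 0,
# with data blocks 0..5 on disks 0, 1, 0, 2, 1, 2.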

33 RAID 10 (stripes of mirrors) Wins: "wastes space, saves arms"
RAID 5 performance:
–225 reads/sec
–70 writes/sec
–a write is 4 logical IOs, 2 seek/rotate
–SAVES SPACE
–performance degrades on failure
RAID 1 performance:
–250 reads/sec
–100 writes/sec
–a write is 2 logical IOs, 2 seeks, 0.7 rotate
–SAVES ARMS
–performance improves on failure
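The write-rate gap comes from the per-write IO penalty: a small RAID 5 write does 4 physical IOs (read old data, read old parity, write both back), while a mirrored write does 2. A rough Python sketch; the 125 random IOs/sec per arm is an assumed figure for illustration, not from the slide:

per_arm_iops = 125    # assumed random IOs/sec for one disk arm (illustrative)

print(per_arm_iops / 4)   # ~31 RAID 5 logical writes/sec per arm (4 physical IOs each)
print(per_arm_iops / 2)   # ~62 RAID 1 logical writes/sec per arm (2 physical IOs each)
# Reads cost one IO either way, so read rates stay close (225 vs 250 on the slide),
# while the 4-vs-2 write penalty is why RAID 5 writes (70/sec) trail RAID 1 (100/sec);
# the slide's exact figures also fold in how seeks and rotations overlap.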

34 The Storage Rack Today
140 arms, 4 TB; 24 racks; 24 storage processors, 6+1 in a rack.
–Disks: 2.5 GBps of IO
–Controllers: 1.2 GBps of IO
–Ports: 500 MBps of IO

35 Storage Rack in 5 Years?
140 arms, 50 TB; 24 racks; 24 storage processors, 6+1 in a rack.
–Disks: 14 GBps of IO
–Controllers: 5 GBps of IO
–Ports: 1 GBps of IO
My suggestion: move the processors into the storage racks.

36 It's Hard to Archive a PetaByte
It takes a LONG time to restore it.
–Store it in two (or more) places online (on disk?).
–Scrub it continuously (look for errors).
–On failure, refresh the lost copy from the safe copy.
–The two copies can be organized differently (e.g., one by time, one by space).
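A toy sketch of the scrub-and-refresh idea (a minimal Python illustration, not from the talk; it assumes each object's checksum is kept separately so the surviving good replica can be identified):

import hashlib
from pathlib import Path

def checksum(path: Path, chunk=1 << 20):
    # Stream the replica through SHA-256 a chunk at a time.
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def scrub_pair(copy_a: Path, copy_b: Path, expected: str):
    # Compare both online replicas against the stored checksum and
    # refresh a damaged replica from the one that still matches.
    ok_a = checksum(copy_a) == expected
    ok_b = checksum(copy_b) == expected
    if ok_a and not ok_b:
        copy_b.write_bytes(copy_a.read_bytes())   # toy refresh; a real system streams it
    elif ok_b and not ok_a:
        copy_a.write_bytes(copy_b.read_bytes())
    elif not (ok_a or ok_b):
        raise RuntimeError("both replicas damaged; no safe copy to refresh from")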

37 Crazy Disk Ideas
–Disk farm on a card: surface-mount disks.
–Disk (magnetic store) on a chip: micro-machines in silicon.
–Full apps (e.g. SAP, Exchange/Notes,…) in the disk controller (a processor with 128 MB DRAM plus ASIC).
(Reference: Clayton M. Christensen, The Innovator's Dilemma: When New Technologies Cause Great Firms to Fail. ISBN: )

38 The Disk Farm On a Card
The 500 GB disc card: an array of discs on a 14" card. It can be used as:
–100 discs
–1 striped disc
–50 fault-tolerant discs
–…etc
LOTS of accesses/second and bandwidth.

39 Functionally Specialized Cards (Storage, Network, Display)
Each card: M MB of DRAM, a P mips processor, and an ASIC.
–Today: P = 50 mips, M = 2 MB.
–In a few years: P = 200 mips, M = 64 MB.

40 Data Gravity: Processing Moves to Transducers
Move processing to the data sources; move to where the power (and sheet metal) is. Put a processor in:
–the modem
–the display
–microphones (speech recognition) & cameras (vision)
–storage: data storage and analysis

41 It's Already True of Printers (Peripheral = CyberBrick)
You buy a printer and you get:
–several network interfaces
–a PostScript engine: cpu, memory, software, a spooler (soon)
–and… a print engine.

42 Disks Become Supercomputers
–100x in 10 years: a 2 TB 3.5" drive; shrunk to 1" it is 200 GB.
–Disk replaces tape?
–Disk is a supercomputer!
(Slide sidebar: the scale ladder Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta.)

43 Tera Byte Backplane: All Device Controllers Will Be Cray 1's
TODAY:
–the disk controller is a 10 mips risc engine with 2 MB of DRAM
–the NIC has similar power
SOON:
–they become 100 mips systems with 100 MB of DRAM
They are nodes in a federation (you can run Oracle on NT in the disk controller). Advantages:
–uniform programming model
–great tools
–security
–economics (cyberbricks)
–move computation to data (minimize traffic)
(Diagram label: Central Processor & Memory.)

44 With Tera Byte Interconnect and Super Computer Adapters
Processing is incidental to:
–networking
–storage
–UI
The disk controller/NIC is:
–faster than the device
–close to the device
–able to borrow the device's package & power
So use the idle capacity for computation: run the app in the device. Both Kim Keeton's (UCB) and Erik Riedel's (CMU) theses investigate this and show the benefits of the approach.
(Diagram label: Tera Byte Backplane.)

45 Implications
Conventional:
–offload device handling to the NIC/HBA
–higher-level protocols: I2O, NASD, VIA, IP, TCP,…
–SMP and cluster parallelism is important
Radical:
–move the app to the NIC/device controller
–higher-higher level protocols: CORBA / COM+
–cluster parallelism is VERY important
(Diagram labels: Tera Byte Backplane; Central Processor & Memory.)

46 How Do They Talk to Each Other?
–Each node has an OS.
–Each node has local resources: a federation.
–Each node does not completely trust the others.
–Nodes use RPC to talk to each other: CORBA? COM+? RMI? One or all of the above.
–Huge leverage in high-level interfaces.
–Same old distributed-system story.
(Diagram: two application stacks of RPC? / streams / datagrams / SIO, connected by a SAN.)

47 Basic Argument for x-Disks
The future disk controller is a super-computer:
–1 bips processor
–128 MB DRAM
–100 GB of disk plus one arm
It connects to the SAN via high-level protocols:
–RPC, HTTP, DCOM, Kerberos, Directory Services,…
–commands are RPCs: management, security,…
–it services file/web/db/… requests
–it is managed by a general-purpose OS with a good dev environment
Move apps to the disk to save data movement:
–this needs a programming environment in the controller

48 The Slippery Slope
If you add function to the server, then you add more function to the server: function gravitates to data.
(The spectrum: Nothing = sector server; Something = fixed app server; Everything = app server.)

49 Why Not a Sector Server? (Let's Get Physical!)
Good idea; that's what we have today. But:
–a cache was added for performance
–sector remap was added for fault tolerance
–error reporting and diagnostics were added
–SCSI commands (reserve,…) are growing
–sharing is problematic (space management, security,…)
Slipping down the slope to a 2-D block server.

50 Why Not a 1-D Block Server? (Put A LITTLE on the Disk Server)
Tried and true design:
–HSC (VAX cluster)
–EMC
–IBM Sysplex (3980?)
But look inside:
–it has a cache
–it has space management
–it has error reporting & management
–it has RAID 0, 1, 2, 3, 4, 5, 10, 50,…
–it has locking
–it has remote replication
–it has an OS
–security is problematic
–the low-level interface moves too many bytes

51 Why Not a 2-D Block Server? (Put A LITTLE on the Disk Server)
Tried and true design:
–Cedar -> NFS
–file server, cache, space,…
–an open file is many fewer messages
Grows to have:
–directories + naming
–authentication + access control
–RAID 0, 1, 2, 3, 4, 5, 10, 50,…
–locking
–backup/restore/admin
–cooperative caching with the client
File servers are a BIG hit: NetWare™. SNAP! is my favorite today.

52 Why Not a File Server? (Put a Little on the Disk Server)
Tried and true design:
–Auspex, NetApp,…
–NetWare
Yes, but look at NetWare:
–the file interface gives you an app-invocation interface
–it became an app server: mail, DB, Web,…
–NetWare had a primitive OS: hard to program, so it optimized the wrong thing

53 Why Not Everything? (Allow Everything on the Disk Server: thin clients)
Tried and true design:
–mainframes, minis,…
–Web servers,…
–encapsulates data
–minimizes data moves
–scalable
It is where everyone ends up. All the arguments against are short-term.

54 The Slippery Slope
If you add function to the server, then you add more function to the server: function gravitates to data.
(The spectrum: Nothing = sector server; Something = fixed app server; Everything = app server.)

55 Outline
The Surprise-Free Future (5 years)
–Astonishing hardware progress.
Some consequences
–Absurd (?) consequences
–Auto-manage storage
–Raid10 replaces Raid5
–Disc-packs
–Disk is the archive media of choice
A surprising future?
–Disks (and other useful things) become supercomputers
–Apps run "in the disk"