1 Storage Bricks Jim Gray Microsoft Research FAST 2002 Monterey, CA, 29 Jan 2002 Acknowledgements : Dave Patterson.

1 Storage Bricks
Jim Gray, Microsoft Research. FAST 2002, Monterey, CA, 29 Jan 2002.
Acknowledgements: Dave Patterson explained this to me long ago. Leonard Chung, Kim Keeton, Erik Riedel, and Catharine Van Ingen helped me sharpen these arguments.

2 First Disk (1956)
IBM 305 RAMAC: 4 MB on 50 24-inch disks, 1200 rpm, 100 ms access, $35k/year rent. Included the computer and accounting software (tubes, not transistors).

3 10 Years Later
[Figure: 1.6 meters]

4 Disk Evolution
Capacity: 100x in 10 years
– 1 TB 3.5-inch drive in …
– … GB 1-inch micro-drive
– System on a chip
– High-speed SAN
– Disk replacing tape
– Disk is a supercomputer!
[Figure: capacity scale: Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta]
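
The slide's "100x in 10 years" is a compound rate. A minimal Python sketch, assuming smooth exponential growth (real capacity curves move in product-generation steps), converts it to an annual rate and a doubling time; the function names are illustrative.

```python
import math


def annual_growth_rate(factor: float, years: float) -> float:
    """Compound annual rate that multiplies capacity by `factor` over `years`."""
    return factor ** (1.0 / years) - 1.0


def doubling_time_years(rate: float) -> float:
    """Years for capacity to double at a constant compound annual `rate`."""
    return math.log(2.0) / math.log(1.0 + rate)


rate = annual_growth_rate(100, 10)  # the slide's "100x in 10 years"
print(f"capacity growth: {rate:.1%} per year")
print(f"capacity doubles every {doubling_time_years(rate):.1f} years")
```

At about 58% a year, capacity doubles roughly every 18 months.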

5 Disks Are Becoming Computers
Smart drives:
– camera with micro-drive
– Replay / TiVo / UltimateTV
– phone with micro-drive
– MP3 players
– Tablet
– Xbox
– many more…
Disk controller + 1 GHz CPU + 1 GB RAM
Comm: InfiniBand, Ethernet, radio…
[Figure: software stack: Applications over Web, DBMS, and Files, over the OS]

6 Data Gravity: Processing Moves to Transducers
Smart displays, microphones, printers, NICs, disks.
[Figure: Storage, Network, and Display ASICs]
Today: P = 50 MIPS, M = 2 MB
In a few years: P = 500 MIPS, M = 256 MB
Processing is decentralized:
– moving to data sources
– moving to power sources
– moving to the sheet metal
The end of computers?

7 It's Already True of Printers: Peripheral = CyberBrick
You buy a printer. You get:
– several network interfaces
– a PostScript engine: CPU, memory, software, a spooler (soon)
– and… a print engine.

8 The Absurd Design?
Segregate processing from storage:
– poor locality
– much useless data movement
Amdahl's laws: bus: 10 B/ips; I/O: 1 b/ips.
[Figure: processors (~1 Tips) and RAM (~1 TB) on a 10 TBps bus; disks (~100 TB) at 100 GBps]
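
Amdahl's ratios turn a processor budget into bus, I/O, and memory budgets. This sketch applies the slide's 10 B/ips for the bus and 1 b/ips for I/O, plus a 1 byte/ips memory ratio that is an assumption read off the figure's pairing of ~1 Tips with ~1 TB of RAM.

```python
from dataclasses import dataclass

# Rules of thumb from the slide: bus 10 bytes per instruction/s,
# I/O 1 bit per instruction/s. The RAM ratio (1 byte per instruction/s)
# is an assumption read off the figure (~1 Tips paired with ~1 TB of RAM).
BUS_BYTES_PER_IPS = 10
IO_BITS_PER_IPS = 1
RAM_BYTES_PER_IPS = 1


@dataclass
class BalancedSystem:
    ips: float              # instructions per second
    bus_bytes_per_s: float  # memory-bus bandwidth
    io_bytes_per_s: float   # I/O bandwidth
    ram_bytes: float        # main memory


def balance(ips: float) -> BalancedSystem:
    """Size bus, I/O, and RAM for a processor complex of `ips` instructions/s."""
    return BalancedSystem(
        ips=ips,
        bus_bytes_per_s=ips * BUS_BYTES_PER_IPS,
        io_bytes_per_s=ips * IO_BITS_PER_IPS / 8,  # bits -> bytes
        ram_bytes=ips * RAM_BYTES_PER_IPS,
    )


s = balance(1e12)  # ~1 Tips, as in the figure
print(f"bus {s.bus_bytes_per_s / 1e12:.0f} TB/s, "
      f"I/O {s.io_bytes_per_s / 1e9:.0f} GB/s, RAM {s.ram_bytes / 1e12:.0f} TB")
```

It prints 10 TB/s of bus, 125 GB/s of I/O (the figure's ~100 GBps), and 1 TB of RAM: the ~100 TB of disk sits behind a pipe 80x narrower than the memory bus, which is the locality problem the slide calls absurd.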

9 The Absurd Disk
1 TB, 100 MB/s, 200 Kaps, $200
– 2.5 hr scan time (poor sequential access)
– 1 access/s per 5 GB (VERY cold data)
It's a tape!
Optimizations:
– reduce management costs
– caching
– sequential is 100x faster than random
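
The slide's numbers for a 1 TB, 100 MB/s, 200-accesses-per-second drive can be checked directly. The block sizes used for the sequential-vs-random comparison (4 KB and 8 KB) are assumptions; the slide does not give one.

```python
TB, GB, MB = 10**12, 10**9, 10**6

# Parameters of the slide's 1 TB, $200 drive.
CAPACITY_BYTES = 1 * TB
SEQ_BYTES_PER_S = 100 * MB
ACCESSES_PER_S = 200


def scan_hours(capacity: int = CAPACITY_BYTES, rate: int = SEQ_BYTES_PER_S) -> float:
    """Time to read the whole drive sequentially."""
    return capacity / rate / 3600


def gb_per_access(capacity: int = CAPACITY_BYTES, aps: int = ACCESSES_PER_S) -> float:
    """How many GB share each random access per second."""
    return capacity / aps / GB


def seq_vs_random(block_bytes: int) -> float:
    """Sequential bandwidth divided by random-access bandwidth at one block size."""
    return SEQ_BYTES_PER_S / (ACCESSES_PER_S * block_bytes)


print(f"full scan: {scan_hours():.1f} h")            # ~2.8 h
print(f"one access/s per {gb_per_access():.0f} GB")  # 5 GB
for block in (4096, 8192):  # block sizes are assumptions
    print(f"{block // 1024} KB blocks: sequential is {seq_vs_random(block):.0f}x faster")
```

The unrounded scan time is about 2.8 hours (the slide rounds to 2.5), each access per second serves 5 GB, and with 4-8 KB blocks sequential access is roughly 60-120x faster than random, bracketing the slide's 100x.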

10 Disk = Node
– magnetic storage (1 TB)
– processor + RAM + LAN
– management interface (HTTP + SOAP)
– application execution environment
Applications:
– File
– DB2/Oracle/SQL
– Notes/Exchange/TeamServer
– SAP/Siebel/…
– QuickBooks/TiVo/PC…
[Figure: software stack: OS kernel with LAN driver and disk driver; file system, RPC, …; services; DBMS; applications]

11 Implications
Conventional:
– offload device handling to NIC/HBA
– higher-level protocols: I2O, NASD, VIA, IP, TCP…
– SMP and cluster parallelism is important
Radical:
– move the app to the NIC/device controller
– higher-higher-level protocols: SOAP/DCOM/RMI…
– cluster parallelism is VERY important
[Figure: conventional design: central processor & memory on a terabyte/s backplane]

12 Intermediate Step: Shared Logic
Brick with 8-12 disk drives:
– 200 MIPS/arm (or more)
– 2x Gbps Ethernet
– general-purpose OS
– $10k/TB to $50k/TB
Shared:
– sheet metal
– power
– support/config
– security
– network ports
These bricks could run applications (e.g., SQL or mail or…).
Examples: Snap ~1 TB (12x80 GB NAS); NetApp ~0.5 TB (8x70 GB NAS); Maxtor ~2 TB (12x160 GB NAS).

13 Example
Homogeneous machines lead to quick response through reallocation.
– HP desktop machines, 320 MB RAM, 3U high, four 100 GB IDE drives
– $4k/TB (street), 2.5 processors/TB, 1 GB RAM/TB
– JIT storage & processing: 3 weeks from order to deploy
Slide courtesy of Brewster, Archive.org.
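
The per-TB figures follow from the box configuration. A sketch using the slide's numbers; one CPU per box is an assumption (the slide gives processors/TB, not processors/box), and capacities use decimal units. The same helper also checks the raw capacities of the Snap, NetApp, and Maxtor bricks from the previous slide.

```python
from dataclasses import dataclass


@dataclass
class Brick:
    name: str
    drives: int
    gb_per_drive: int
    ram_mb: int = 0
    cpus: int = 1  # assumption: one processor per box unless stated

    @property
    def capacity_tb(self) -> float:
        return self.drives * self.gb_per_drive / 1000

    def per_tb(self) -> tuple[float, float]:
        """(processors per TB, GB of RAM per TB) of raw disk."""
        boxes_per_tb = 1 / self.capacity_tb
        return self.cpus * boxes_per_tb, self.ram_mb / 1000 * boxes_per_tb


node = Brick("Archive.org HP desktop", drives=4, gb_per_drive=100, ram_mb=320)
cpus, ram_gb = node.per_tb()
print(f"{node.name}: {node.capacity_tb:.1f} TB, {cpus:.1f} CPUs/TB, {ram_gb:.1f} GB RAM/TB")

for nas in (Brick("Snap", 12, 80), Brick("NetApp", 8, 70), Brick("Maxtor", 12, 160)):
    print(f"{nas.name}: {nas.capacity_tb:.2f} TB raw")
```

Four 100 GB drives give 0.4 TB per box, so 2.5 boxes (and, with one CPU each, 2.5 processors) per TB; 320 MB x 2.5 = 0.8 GB of RAM per TB, which the slide rounds to 1 GB.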

14 What if Disk Replaces Tape? How Does It Work?
Backup/restore:
– RAID (among the federation)
– snapshot copies (in most OSs)
– remote replicas (standard in DBMS and FS)
Archive:
– use the cold 95% of disk space
Interchange:
– send computers, not disks

15 It's Hard to Archive a Petabyte
It takes a LONG time to restore it: at 1 GBps it takes 12 days!
Store it in two (or more) places online: a geo-plex.
Scrub it continuously (look for errors).
On failure:
– use the other copy until the failure is repaired,
– refresh the lost copy from the safe copy.
The two copies can be organized differently (e.g., one by time, one by space).
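
Two pieces of this slide lend themselves to a sketch: the restore-time arithmetic (10^15 bytes at 10^9 bytes/s is 10^6 seconds, about 11.6 days) and the scrub-and-refresh loop for a two-copy geo-plex. The replica model below is hypothetical: each copy keeps a SHA-256 checksum recorded at write time, the scrubber verifies both copies block by block, and a bad block is rewritten from the copy that still verifies.

```python
import hashlib


def restore_days(total_bytes: float, bytes_per_s: float) -> float:
    """Days to stream `total_bytes` back at `bytes_per_s`."""
    return total_bytes / bytes_per_s / 86400


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class Replica:
    """One online copy: block id -> data, plus the checksum recorded at write time."""

    def __init__(self, name: str):
        self.name = name
        self.blocks: dict[int, bytes] = {}
        self.sums: dict[int, str] = {}

    def write(self, block_id: int, data: bytes) -> None:
        self.blocks[block_id] = data
        self.sums[block_id] = checksum(data)

    def is_good(self, block_id: int) -> bool:
        data = self.blocks.get(block_id)
        return data is not None and checksum(data) == self.sums.get(block_id)


def scrub(a: Replica, b: Replica) -> tuple[list[tuple[str, int]], list[int]]:
    """Verify every block in both copies; refresh a bad copy from the good one.

    Returns (repaired (replica name, block id) pairs, block ids bad in both copies).
    """
    repaired, lost = [], []
    for block_id in sorted(a.sums.keys() | b.sums.keys()):
        a_ok, b_ok = a.is_good(block_id), b.is_good(block_id)
        if a_ok and not b_ok:
            b.write(block_id, a.blocks[block_id])
            repaired.append((b.name, block_id))
        elif b_ok and not a_ok:
            a.write(block_id, b.blocks[block_id])
            repaired.append((a.name, block_id))
        elif not a_ok and not b_ok:
            lost.append(block_id)
    return repaired, lost


print(f"1 PB at 1 GB/s: {restore_days(1e15, 1e9):.1f} days")

west, east = Replica("west"), Replica("east")
for i in range(4):
    west.write(i, f"block {i}".encode())
    east.write(i, f"block {i}".encode())
east.blocks[2] = b"bit rot"  # simulate silent corruption in one copy
print(scrub(west, east))
```

A block that fails verification in both copies is reported as lost; no two-copy scheme can repair it.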

16 Archive to Disk: 100 TB for $0.5M, Free Petabytes
If you have 100 TB active, you need 10,000 mirrored disk arms (see TPC-C).
So you have 1.6 PB of (mirrored) storage (160 GB drives).
Use the empty 95% for archive storage:
– no extra space or extra power cost
– very fast access (milliseconds vs. hours)
– the snapshot is read-only (software-enforced)
– makes admin easy (saves people costs)
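
The free-petabyte argument is arithmetic on the slide's own figures: 10,000 arms of 160 GB drives and 100 TB of active data. A short sketch; it also shows how the free fraction changes if the 100 TB is counted twice for mirroring, a reading the slide leaves ambiguous.

```python
TB, GB = 10**12, 10**9

active_bytes = 100 * TB  # data that is actually being accessed
arms = 10_000            # mirrored arms needed for that access rate (per the slide)
drive_bytes = 160 * GB   # 160 GB drives

raw_bytes = arms * drive_bytes
free_fraction = 1 - active_bytes / raw_bytes
free_if_active_mirrored = 1 - 2 * active_bytes / raw_bytes

print(f"raw capacity: {raw_bytes / 10**15:.1f} PB")
print(f"arms per active TB: {arms / (active_bytes / TB):.0f}")
print(f"free for archive: {free_fraction:.1%} "
      f"({free_if_active_mirrored:.1%} if the active data is stored twice)")
```

Raw capacity is 1.6 PB, or 100 arms per active TB; about 94% of the space (88% if the active data is stored twice) is idle and available for archive, which is the slide's "empty 95%".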

17 Disk as Tape Archive
Tape is unreliable, specialized, slow, low density, not improving fast, and expensive.
Using removable hard drives to replace tape's function has been successful.
When a tape is needed, the drive is put in a machine and it is online. No need to copy from tape before it is used.
Portable, durable, fast, media cost = raw tapes, dense. Unknown longevity: suspected good.
Slide courtesy of Brewster, Archive.org.

18 Disk as Tape Interchange
Tape interchange is frustrating (often unreadable).
Beyond 1-10 GB, send media, not data:
– FTP takes too long (an hour per GB)
– bandwidth is still very expensive ($1/GB)
Writing a DVD is not much faster than the Internet.
New technology could change this:
– 100 GB at 10 MBps would be competitive
Write a 1 TB disk in 2.5 hrs (at 100 MBps).
But how does interchange work?
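
Send media or send data? A back-of-the-envelope comparison using the slide's rates (an hour and $1 per GB over the network, 100 MBps to write a disk). The courier times are hypothetical parameters, since the slide does not give one, and the break-even ignores media and courier prices.

```python
GB = 10**9


def network_hours(size_bytes: float, hours_per_gb: float = 1.0) -> float:
    """Transfer time over the network at the slide's 'an hour per GB'."""
    return size_bytes / GB * hours_per_gb


def network_dollars(size_bytes: float, dollars_per_gb: float = 1.0) -> float:
    """Bandwidth cost at the slide's $1/GB."""
    return size_bytes / GB * dollars_per_gb


def ship_hours(size_bytes: float, ship_time_h: float,
               write_bytes_per_s: float = 100e6) -> float:
    """Write the disk at 100 MBps, then ship it; `ship_time_h` is an assumption."""
    return size_bytes / write_bytes_per_s / 3600 + ship_time_h


def crossover_gb(ship_time_h: float, hours_per_gb: float = 1.0,
                 write_bytes_per_s: float = 100e6) -> float:
    """Dataset size (GB) above which shipping a disk beats the network."""
    write_h_per_gb = GB / write_bytes_per_s / 3600
    return ship_time_h / (hours_per_gb - write_h_per_gb)


one_tb = 1000 * GB
print(f"1 TB by network: {network_hours(one_tb):.0f} h, ${network_dollars(one_tb):.0f}")
for courier_h in (4, 24):  # hypothetical courier times
    print(f"1 TB by {courier_h} h courier: {ship_hours(one_tb, courier_h):.1f} h; "
          f"break-even at {crossover_gb(courier_h):.0f} GB")
```

For 1 TB the network needs about 1,000 hours and $1,000, against under 7 hours with a 4-hour courier. With that courier the break-even is about 4 GB, inside the slide's 1-10 GB; an overnight (24-hour) courier pushes it to about 24 GB.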

19 Disk as Tape Interchange: What Format?
Today I send 160 GB NTFS/SQL disks, but that is not a good format for Linux/DB2 users.
Solution: ship NFS/CIFS/ODBC servers (not disks).
Plug the disk into the LAN:
– DHCP, then a file or DB server via a standard interface
– pull data from the server
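
The "ship a server, not a disk" recipe is: power on, get an address, then let the receiver pull through a standard protocol. Python's standard library has no NFS, CIFS, or ODBC server, so this sketch substitutes HTTP (`http.server`) as the standard interface and assumes DHCP has already assigned the address (here, loopback). `start_brick_server` and `pull` are hypothetical names.

```python
import functools
import tempfile
import threading
import urllib.request
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


class QuietHandler(SimpleHTTPRequestHandler):
    """Serves files (GET/HEAD only) from one directory; request logging suppressed."""

    def log_message(self, format, *args):
        pass


def start_brick_server(export_dir: str) -> ThreadingHTTPServer:
    """Stand-in for the brick's file server: serve export_dir on a free local port."""
    handler = functools.partial(QuietHandler, directory=export_dir)
    server = ThreadingHTTPServer(("127.0.0.1", 0), handler)  # port 0: OS picks one
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def pull(server: ThreadingHTTPServer, name: str) -> bytes:
    """Receiver side: fetch one file through the standard interface."""
    host, port = server.server_address[:2]
    with urllib.request.urlopen(f"http://{host}:{port}/{name}") as resp:
        return resp.read()


with tempfile.TemporaryDirectory() as export:
    Path(export, "readme.txt").write_text("hello from the brick\n")
    brick = start_brick_server(export)
    try:
        print(pull(brick, "readme.txt").decode(), end="")
    finally:
        brick.shutdown()
        brick.server_close()
```

Because the receiver speaks only the standard protocol, it does not care which file system or database the brick uses internally, which is the point of the slide.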

20 Some Questions
– What is the product?
– How do I manage 10,000 nodes (disks)?
– How do I program 10,000 nodes (disks)?
– How does RAID work?
– How do I back up a PB?
– How do I restore a PB?

21 What Is the Product?
Concept: plug it in and it works!
– music/video/photo appliance (home)
– game appliance
– PC
– file server appliance
– data archive/interchange appliance
– web server appliance
– DB server appliance
– application appliance
[Figure: an appliance with only two connections: power and network]

22 How Does Scale-Out Work?
Files: well-known designs:
– rooted tree partitioned across nodes
– automatic cooling (migration)
– mirrors or chained declustering
– snapshots for backup/archive
Databases: well-known designs:
– partitioning, remote replication similar to files
– distributed query processing
Applications (hypothetical):
– must be designed as mobile objects
– middleware provides an object migration system
– objects externalize methods to migrate (== backup/restore/archive)
– web services seem to have the key ideas (XML representation)
– example: the object is a mailbox
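
Chained declustering, named on the slide as an alternative to mirrors, places the primary copy of fragment i on node i and its backup on the next node in the chain, (i + 1) mod N; when a node fails, its fragments are read from the successor. A minimal sketch, assuming files are assigned to fragments by hashing their top-level directory so each subtree of the rooted tree stays together (one simple scheme among many). Only failover is shown: full chained declustering also shifts part of each node's read load along the chain so the extra work spreads evenly, which this sketch omits.

```python
import hashlib


def fragment_for(path: str, n_fragments: int) -> int:
    """Map a file to a fragment by its top-level directory, so each subtree
    of the rooted tree stays on one fragment (an assumed, simple scheme)."""
    top = path.strip("/").split("/")[0]
    digest = hashlib.sha256(top.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_fragments


def chained_declustering(n_nodes: int) -> dict[int, tuple[int, int]]:
    """Fragment i -> (primary node, backup node): primary on node i,
    backup on the next node in the chain, (i + 1) mod n."""
    return {i: (i, (i + 1) % n_nodes) for i in range(n_nodes)}


def serving_node(fragment: int, placement: dict[int, tuple[int, int]],
                 failed: set[int]) -> int:
    """Read from the primary; if its node failed, fall back to the backup."""
    primary, backup = placement[fragment]
    if primary not in failed:
        return primary
    if backup not in failed:
        return backup
    raise RuntimeError(f"fragment {fragment}: both copies are on failed nodes")


placement = chained_declustering(4)
print(placement)
frag = fragment_for("/alice/mail/inbox", 4)
print(f"/alice/... lives on fragment {frag}; with node {frag} down it is served by "
      f"node {serving_node(frag, placement, failed={frag})}")
```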

23 Auto-Manage Storage
1980 rule of thumb:
– a DataAdmin per 10 GB, a SysAdmin per MIPS
2000 rule of thumb:
– a DataAdmin per 5 TB
– a SysAdmin per 100 clones (varies with app)
Problem:
– 5 TB is $50k today, $5k in a few years
– admin cost >> storage cost!!!!
Challenge:
– automate ALL storage admin tasks

24 Admin: TB per Admin and Guessed $/TB
(does not include cost of application, overhead, not substance)
Site       Admin : TB    $/TB/year
Google     1 : 100 TB    $5k
Yahoo!     1 : 50 TB     $20k
DB         1 : 5 TB      $60k
Wall St.   1 : 1 TB      $400k (reported)
Hardware is the dominant cost at Google. How can we waste hardware to save people cost?

25 How Do I Manage 10,000 Nodes?
You can't manage 10,000 x (for any x).
They manage themselves.
– You manage exceptional exceptions.
Auto-manage:
– plug & play hardware
– auto load balance & placement of storage & processing
– simple parallel programming model
– fault masking
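
Auto-placement and fault masking can be sketched as a placer that puts each new object on the least-loaded live node, lets new nodes join empty (plug & play), and re-places a dead node's objects on the survivors. This is a hypothetical illustration, not a product design: `Placer` is an invented name, load is just the sum of object sizes, and it assumes every object can be rebuilt from a surviving copy, which it does not model.

```python
class Placer:
    """Greedy least-loaded placement with failure re-placement (hypothetical sketch)."""

    def __init__(self, nodes: list[str]):
        self.load: dict[str, int] = {n: 0 for n in nodes}  # total object size per node
        self.where: dict[str, tuple[str, int]] = {}        # object -> (node, size)

    def place(self, obj: str, size: int) -> str:
        """Put `obj` on the least-loaded live node (ties broken by name)."""
        node = min(self.load, key=lambda n: (self.load[n], n))
        self.load[node] += size
        self.where[obj] = (node, size)
        return node

    def add_node(self, node: str) -> None:
        """Plug & play: a new node joins empty and attracts new placements."""
        self.load[node] = 0

    def fail_node(self, node: str) -> list[str]:
        """Mask a failure by re-placing the dead node's objects on survivors.
        Assumes each object can be rebuilt from a surviving copy (not modeled)."""
        del self.load[node]
        moved = [obj for obj, (n, _) in self.where.items() if n == node]
        for obj in moved:
            _, size = self.where.pop(obj)
            self.place(obj, size)
        return moved


p = Placer(["n1", "n2", "n3"])
for i, size in enumerate([5, 3, 4, 2, 6]):
    p.place(f"obj{i}", size)
print("before failure:", p.load)
print("moved:", p.fail_node("n3"))
print("after failure:", p.load)
```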

26 How Do I Program 10,000 Nodes?
You can't program 10,000 x (for any x).
They program themselves.
– You write embarrassingly parallel programs.
– Examples: SQL, Web, Google, Inktomi, HotMail, …
– PVM and MPI prove it must be automatic (unless you have a PhD)!
Auto-parallelism is ESSENTIAL.
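
An embarrassingly parallel program runs the same code on each node's own partition with no communication until a final merge, the shape of a partitioned SQL aggregate. A minimal sketch; threads stand in for nodes to keep it self-contained, and the word-count task and sample data are illustrative.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def scan_partition(rows: list[str]) -> Counter:
    """Work one node does alone on its own partition: count words."""
    counts: Counter = Counter()
    for row in rows:
        counts.update(row.split())
    return counts


def parallel_word_count(partitions: list[list[str]]) -> Counter:
    """Run the same scan on every partition independently, then merge."""
    total: Counter = Counter()
    with ThreadPoolExecutor() as pool:
        for partial in pool.map(scan_partition, partitions):
            total.update(partial)  # the only coordination step: a merge
    return total


partitions = [
    ["disk is a computer", "disk replaces tape"],  # node 0's data
    ["storage bricks"],                            # node 1's data
    ["disk bricks"],                               # node 2's data
]
print(parallel_word_count(partitions).most_common(2))
```

Nothing in `scan_partition` knows how many partitions exist, so adding nodes means adding partitions, not rewriting the program; that is what lets the system, rather than the programmer, supply the parallelism.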

27 Summary
Disks will become supercomputers, so:
– lots of computing to optimize the arm
– can put the app close to the data (better modularity, locality)
– storage appliances (self-organizing)
The arm/capacity tradeoff: waste space to save access.
– compression (saves bandwidth)
– mirrors
– online backup/restore
– online archive (vault to other drives or a geoplex if possible)
Not "disks replace tapes": storage appliances replace tapes.
Self-organizing storage servers (file systems) (prototypes of this software exist).