1 Advanced Database Technology February 12, 2004 DATA STORAGE (Lecture based on [GUW 11.2-11.5], [Sanders03, 1.3-1.5], and [MaheshwariZeh03, 3.1-3.2])

Slides:



Advertisements
Similar presentations
CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.
Advertisements

IT253: Computer Organization
- Dr. Kalpakis CMSC Dr. Kalpakis 1 Outline In implementing DBMS we need to answer How should the system store and manage very large amounts of data?
CS 277 – Spring 2002Notes 21 CS 277: Database System Implementation Notes 02: Hardware Arthur Keller.
CS 245Notes 21 CS 245: Database System Principles Notes 02: Hardware Hector Garcia-Molina.
Data Storage John Ortiz. Lecture 17Data Storage2 Overview  Database stores data on secondary storage  Disk has distinct storage and access characteristics.
The Memory Hierarchy fastest, but small under a microsecond, random access, perhaps 512Mb Typically magnetic disks, magneto­ optical (erasable), CD­ ROM.
Using Secondary Storage Effectively In most studies of algorithms, one assumes the "RAM model“: –The data is in main memory, –Access to any item of data.
The Memory Hierarchy fastest, perhaps 1Mb
CS4432: Database Systems II Data Storage - Lecture 2 (Sections 13.1 – 13.3) Elke A. Rundensteiner.
Disk Access Model. Using Secondary Storage Effectively In most studies of algorithms, one assumes the “RAM model”: –Data is in main memory, –Access to.
SECTION 13.3 Eilbroun Benjamin CS 257 – Dr. TY Lin SECONDARY STORAGE MANAGEMENT.
1 Chapter 6 Storage and Multimedia: The Facts and More.
1 Lecture 26: Storage Systems Topics: Storage Systems (Chapter 6), other innovations Final exam stats:  Highest: 95  Mean: 70, Median: 73  Toughest.
CS 4432lecture #31 CS4432: Database Systems II Lecture #3 Professor Elke A. Rundensteiner.
1 Storage Hierarchy Cache Main Memory Virtual Memory File System Tertiary Storage Programs DBMS Capacity & Cost Secondary Storage.
SECTIONS 13.1 – 13.3 Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin SECONDARY STORAGE MANAGEMENT.
CS4432: Database Systems II Lecture 2 Timothy Sutherland.
CPSC-608 Database Systems Fall 2009 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #5.
Using Secondary Storage Effectively In most studies of algorithms, one assumes the "RAM model“: –The data is in main memory, –Access to any item of data.
CPSC-608 Database Systems Fall 2010 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #5.
12/3/2004EE 42 fall 2004 lecture 391 Lecture #39: Magnetic memory storage Last lecture: –Dynamic Ram –E 2 memory This lecture: –Future memory technologies.
CS 4432lecture #31 CS4432: Database Systems II Lecture #3 Using the Disk, and Disk Optimizations Professor Elke A. Rundensteiner.
1 CS222: Principles of Database Management Fall 2010 Professor Chen Li Department of Computer Science University of California, Irvine Notes 01.
SECTIONS 13.1 – 13.3 Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin SECONDARY STORAGE MANAGEMENT.
CPSC 231 Secondary storage (D.H.)1 Learning Objectives Understanding disk organization. Sectors, clusters and extents. Fragmentation. Disk access time.
1 Introduction to Computers Day 4. 2 Storage device A functional unit into which data can be –placed –retained(stored) –retrieved(accessed)
Introduction to Database Systems 1 The Storage Hierarchy and Magnetic Disks Storage Technology: Topic 1.
ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 5 – Storage Organization.
CS4432: Database Systems II Data Storage (Better Block Organization) 1.
Chapter 2 Data Storage How does a computer system store and manage very large volumes of data ?
Lecture 11: DMBS Internals
Physical Storage and File Organization COMSATS INSTITUTE OF INFORMATION TECHNOLOGY, VEHARI.
1 Secondary Storage Management Submitted by: Sathya Anandan(ID:123)
Chapter 111 Chapter 11: Hardware (Slides by Hector Garcia-Molina,
External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary.
File Processing : Storage Media 2015, Spring Pusan National University Ki-Joune Li.
1 Data Storage (Chap. 11) Based on Hector Garcia-Molina’s slides.
CPSC 404, Laks V.S. Lakshmanan1 External Sorting Chapter 13: Ramakrishnan & Gherke and Chapter 2.3: Garcia-Molina et al.
11.1Database System Concepts. 11.2Database System Concepts Now Something Different 1st part of the course: Application Oriented 2nd part of the course:
Lecture 3 Page 1 CS 111 Online Disk Drives An especially important and complex form of I/O device Still the primary method of providing stable storage.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
Lecture 40: Review Session #2 Reminders –Final exam, Thursday 3:10pm Sloan 150 –Course evaluation (Blue Course Evaluation) Access through.
CS4432: Database Systems II Data Storage 1. Storage in DBMSs DBMSs manage large amounts of data How does a DBMS store and manage large amounts of data?
CPSC-608 Database Systems Fall 2015 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #5.
Disk Basics CS Introduction to Operating Systems.
CS 101 – Sept. 28 Main vs. secondary memory Examples of secondary storage –Disk (direct access) Various types Disk geometry –Flash memory (random access)
DBMS 2001Notes 2: Hardware1 Principles of Database Management Systems Pekka Kilpeläinen (after Stanford CS245 slide originals by Hector Garcia-Molina,
1 Lecture 27: Disks Today’s topics:  Disk basics  RAID  Research topics.
DMBS Internals I February 24 th, What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
DMBS Architecture May 15 th, Generic Architecture Query compiler/optimizer Execution engine Index/record mgr. Buffer manager Storage manager storage.
CPSC 231 Secondary storage (D.H.)1 Learning Objectives Understanding disk organization. Sectors, clusters and extents. Fragmentation. Disk access time.
1 CSE232A: Database System Principles Hardware. Data + Indexes Database System Architecture Query ProcessingTransaction Management SQL query Parser Query.
CPS216: Advanced Database Systems Notes 03: Data Access from Disks Shivnath Babu.
Lecture Topics: 11/22 HW 7 File systems –block allocation Unix and NT –disk scheduling –file caches –RAID.
1 Lecture 16: Data Storage Wednesday, November 6, 2006.
CS 245Notes 21 Database System Principles Notes 02: Hardware.
1 Components of the Virtual Memory System  Arrows indicate what happens on a lw virtual address data physical address TLB page table memory cache disk.
CS422 Principles of Database Systems Disk Access Chengyu Sun California State University, Los Angeles.
Data Storage and Querying in Various Storage Devices.
File organization Secondary Storage Devices Lec#7 Presenter: Dr Emad Nabil.
Storage and Disks.
Lecture 16: Data Storage Wednesday, November 6, 2006.
CS 554: Advanced Database System Notes 02: Hardware
CPSC-608 Database Systems
Lecture 11: DMBS Internals
Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin
CPS216: Advanced Database Systems Notes 04: Data Access from Disks
CS 245: Database System Principles Notes 02: Hardware
Presentation transcript:

1 Advanced Database Technology February 12, 2004 DATA STORAGE (Lecture based on [GUW ], [Sanders03, ], and [MaheshwariZeh03, ]) Slides based on Notes 02: Hardware for Stanford CS 245, fall 2002 by Hector Garcia-Molina

2 Today  Hardware, external memory  Disk access times  The I/O model  Case study: Sorting  Lower bound for sorting  Optimizing disk usage

3 P M A Typical Computer Secondary Storage (Hard Disks)... C C P = Processor C = Cache M = Main Memory

4 Storage Capacity access time (sec) cache electronic main electronic secondary magnetic optical disks online tape nearline tape & optical disks offline tape typical capacity (bytes) from Gray & Reuter (2002)

5 Storage Cost access time (sec) cache electronic main electronic secondary magnetic optical disks online tape nearline tape & optical disks offline tape dollars/MB from Gray & Reuter (2002)

6 Caching Cache: Memory holding frequently used parts of a slower, larger memory. Examples:  A small (L1) cache holds a few kilobytes of the memory "most recently used" by the processor.  Most operating systems keep the most recently used "pages" of memory in main memory and put the rest on disk.

7 Virtual memory  In most operating systems, programs don't know if they access main memory or a page that resides on secondary memory.  This is called virtual memory (the book is a little fuzzy on this).  Database systems usually take explicit control over secondary memory accesses.

8 Secondary storage Many flavours: - Disk: Floppy, Winchester, Optical, CD-ROM (arrays), DVD-R,… - TapeReel, cartridge robots

9 Typical Disk Terms: - Platter, Head, Cylinder, Track Sector (all physical). - Block (logical).

10 Typical Numbers Diameter: 1 inch  15 inches Cylinders:40 (floppy)  Surfaces:1 (CDs)  2 (floppies)  30 Sector Size:512B  50K Capacity:1.4 MB (floppy)  60 GB (my laptop)

11 Disk Access Time block x in memory ? I want block X

12 Time = Seek Time + Rotational Delay + Transfer Time + Other

13 Seek Time 3 or 5x x 1N Cylinders travelled Time

14 Average Random Seek Time   SEEKTIME (i  j) S = N(N-1) N N i=1 j=1 j  i Typical S: 10 ms  40 ms 10s of millions clock cycles!

15 Rotational Delay Head Here Block I Want

16 Average Rotational Delay R = 1/2 revolution T ypical R = 4.16 ms (7200 RPM)

17 Transfer Rate (t)  t ypical t: 10  50 MB/second  transfer time: block size t  E.g., block size 32 kB, t=32 MB/second gives transfer time 1 ms.

18 Other Delays  CPU time to issue I/O  Contention for controller  Contention for bus, memory Typical value: Nearly 0

19  So far: Random Block Access  What about: Reading the next block?

20 If we don't need to change cylinder Time to get = Block Size + Negligible block t - switch track - once in a while, next cylinder

21 Rule of thumb Random I/O: Expensive Sequential I/O: Less expensive  Example: 1 KB Block »Random I/O:  20 ms. »Sequential I/O:  1 ms. • However, the relative difference is smaller if we use larger blocks.

22 Cost for Writing similar to Reading If w e want to verify: Need to add (full) rotation + Block size t

23 To Modify a Block: (a) Read Block (b) Modify in Memory (c) Write Block [(d) Verify?]

24 Problem session We usually express the running time of algorithms as the number of operations.  Argue that this can be misleading when data is stored on disk. Consider:  Sorting a list of integers that fit in internal memory.  Adding a list of integers.  What should we count instead?  How do we do the counting?

25 The I/O model of computation  Count the number of disk blocks read or written by an algorithm (I/Os).  Ignore the number of operations! (?)  Explicit control of which blocks are in main memory.  Notation: M = size of main memory B = size of disk blocks N = size of data set

26 Problem session Consider the following sorting algorithms:  Mergesort  Quicksort  Bubble sort  What are the (worst case) running times in internal memory?  What is the number of I/Os if we use the algorithm on external memory - if no caching is done? - when storing the M/B most recently accessed blocks in main memory?

27 External memory sorting Let's see if we can do better..! (Perhaps as well as in [MaheshwariZeh03, 3.2])

28 Sorting in GUW  More practical viewpoint: Two passes over the data enough unless data is huge.  TPMMS = Two-pass multi-way mergesort.  More general treatment in [MaheshwariZeh03, 3.2]

29 External memory sorting Let's see if we did as well as we possibly could: A lower bound on the number of I/Os for sorting (based on [Sanders03, 1.5])

30 Optimizations Some practically oriented ways of speeding up external memory algorithms:  Disk Scheduling  e.g., using the "elevator algorithm"  reduces average seek time when there are multiple simultaneous "unpredictable" requests  Track buffer / cylinder buffer  good when data are arranged and accessed "by cylinder"

31  Pre-fetching  Speeds up access when the needed blocks are known, but the order of requests is data dependent  Disk arrays  Increases the rate at which data can be transferred  Striping: Blocks from each disk are grouped to form a large logical block  Mirrored disks  Same speed for writing  Several blocks can be read in parallel

32 Block Size Selection  Big Block  Amortize I/O Cost  Big Block  Read in more useless stuff! Takes longer to read  Trend: Blocks get bigger Unfortunately...

33 Problem session Which of the mentioned optimizations apply to (parts of) the sorting algorithm we developed earlier?  Disk scheduling  Cylinder buffer  Pre-fetching  Disk arrays  Mirrored disks

34 Summary  External memories are complicated.  Essential features captured by the I/O model (to be used henceforth).  We saw matching I/O upper and lower bounds for sorting.  A bit (and the last bit) about optimizing constant factors in external memory access.

35 Next week  How relations are stored on external memory.  Simple index structures.