CPSC-608 Database Systems Fall 2009 Instructor: Jianer Chen Office: HRBB 309B Phone: 845-4259 Notes #5.

Slides:



Advertisements
Similar presentations
CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.
Advertisements

CS 277 – Spring 2002Notes 21 CS 277: Database System Implementation Notes 02: Hardware Arthur Keller.
The Memory Hierarchy fastest, but small under a microsecond, random access, perhaps 512Mb Typically magnetic disks, magneto­ optical (erasable), CD­ ROM.
Using Secondary Storage Effectively In most studies of algorithms, one assumes the "RAM model“: –The data is in main memory, –Access to any item of data.
The Memory Hierarchy fastest, perhaps 1Mb
Storage. The Memory Hierarchy fastest, but small under a microsecond, random access, perhaps 2Gb Typically magnetic disks, magneto­ optical (erasable),
CS4432: Database Systems II Data Storage - Lecture 2 (Sections 13.1 – 13.3) Elke A. Rundensteiner.
1 Advanced Database Technology February 12, 2004 DATA STORAGE (Lecture based on [GUW ], [Sanders03, ], and [MaheshwariZeh03, ])
Disk Access Model. Using Secondary Storage Effectively In most studies of algorithms, one assumes the “RAM model”: –Data is in main memory, –Access to.
CPSC-608 Database Systems Fall 2008 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #6.
SECTIONS 13.1 – 13.3 Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin SECONDARY STORAGE MANAGEMENT.
CS4432: Database Systems II Lecture 2 Timothy Sutherland.
Using Secondary Storage Effectively In most studies of algorithms, one assumes the "RAM model“: –The data is in main memory, –Access to any item of data.
CPSC-608 Database Systems Fall 2010 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #5.
12/3/2004EE 42 fall 2004 lecture 391 Lecture #39: Magnetic memory storage Last lecture: –Dynamic Ram –E 2 memory This lecture: –Future memory technologies.
1 CS222: Principles of Database Management Fall 2010 Professor Chen Li Department of Computer Science University of California, Irvine Notes 01.
SECTIONS 13.1 – 13.3 Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin SECONDARY STORAGE MANAGEMENT.
Storage. The Memory Hierarchy fastest, but small under a microsecond, random access, perhaps 8Gb Typically magnetic disks, magneto­ optical (erasable),
CPSC 231 Secondary storage (D.H.)1 Learning Objectives Understanding disk organization. Sectors, clusters and extents. Fragmentation. Disk access time.
Storage. The Memory Hierarchy fastest, but small under a microsecond, random access, perhaps 512Mb Access times in milliseconds, great variability. Unit.
Introduction to Database Systems 1 The Storage Hierarchy and Magnetic Disks Storage Technology: Topic 1.
Chapter 8 File Processing and External Sorting. Primary vs. Secondary Storage Primary storage: Main memory (RAM) Secondary Storage: Peripheral devices.
External Sorting Problem: Sorting data sets too large to fit into main memory. –Assume data are stored on disk drive. To sort, portions of the data must.
CS4432: Database Systems II Data Storage (Better Block Organization) 1.
Chapter 2 Data Storage How does a computer system store and manage very large volumes of data ?
Lecture 11: DMBS Internals
1 6 Further System Fundamentals (HL) 6.2 Magnetic Disk Storage.
1 Secondary Storage Management Submitted by: Sathya Anandan(ID:123)
Chapter 111 Chapter 11: Hardware (Slides by Hector Garcia-Molina,
1 Chapter 17 Disk Storage, Basic File Structures, and Hashing Chapter 18 Index Structures for Files.
Storage.
Chapter 2. Data Storage Chapter 2.
Database Management Systems,Shri Prasad Sawant. 1 Storing Data: Disks and Files Unit 1 Mr.Prasad Sawant.
Indexing.
External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary.
1 Data Storage (Chap. 11) Based on Hector Garcia-Molina’s slides.
Chapter Ten. Storage Categories Storage medium is required to store information/data Primary memory can be accessed by the CPU directly Fast, expensive.
CPSC 404, Laks V.S. Lakshmanan1 External Sorting Chapter 13 (Sec ): Ramakrishnan & Gehrke and Chapter 11 (Sec ): G-M et al. (R2) OR.
CPSC 404, Laks V.S. Lakshmanan1 External Sorting Chapter 13: Ramakrishnan & Gherke and Chapter 2.3: Garcia-Molina et al.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
Lecture 40: Review Session #2 Reminders –Final exam, Thursday 3:10pm Sloan 150 –Course evaluation (Blue Course Evaluation) Access through.
CPSC-608 Database Systems Fall 2015 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #5.
Section 13.2 – Secondary storage management (Former Student’s Note)
Section 13.1 – Secondary storage management (Former Student’s Note)
DBMS 2001Notes 2: Hardware1 Principles of Database Management Systems Pekka Kilpeläinen (after Stanford CS245 slide originals by Hector Garcia-Molina,
DMBS Internals I February 24 th, What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
DMBS Architecture May 15 th, Generic Architecture Query compiler/optimizer Execution engine Index/record mgr. Buffer manager Storage manager storage.
Magnetic Disk Rotational latency Example Find the average rotational latency if the disk rotates at 20,000 rpm.
CPSC 231 Secondary storage (D.H.)1 Learning Objectives Understanding disk organization. Sectors, clusters and extents. Fragmentation. Disk access time.
1 CSE232A: Database System Principles Hardware. Data + Indexes Database System Architecture Query ProcessingTransaction Management SQL query Parser Query.
CPS216: Advanced Database Systems Notes 03: Data Access from Disks Shivnath Babu.
Computer Systems Objectives: To gain an understanding of the types of computer systems. Be able to identify the main components. Understand the difference.
1 Lecture 16: Data Storage Wednesday, November 6, 2006.
CS 245Notes 21 Database System Principles Notes 02: Hardware.
Storage and Disks.
Lecture 16: Data Storage Wednesday, November 6, 2006.
Database Management Systems (CS 564)
Computer Science 210 Computer Organization
CPSC-608 Database Systems
Disks and Files DBMS stores information on (“hard”) disks.
CPSC-608 Database Systems
Lecture 11: DMBS Internals
Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin
CPSC-310 Database Systems
Persistence: hard disk drive
CPSC-608 Database Systems
CPSC-608 Database Systems
CPS216: Advanced Database Systems Notes 04: Data Access from Disks
CS 245: Database System Principles Notes 02: Hardware
Presentation transcript:

CPSC-608 Database Systems Fall 2009 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #5

Computer Memory Hierarchy

CPU main memory Disk controller Secondary Storage... A Typical Computer bus disks

Main Memory fast small capacity (gigabytes) volatile Disks slow large capacity (100’s gigabytes) non-volatile

Typical Disk Terms: Platter, Head, Actuator, Cylinder, Track, Sector, Gap …

Top View Track Sector Gap

A “typical” disk 5 platters (thus 10 surfaces) A surface has 20,000 tracks A track has 500 sectors (million bytes) A sector has several thousand bytes Disk makes 5000 revolutions per minute (so about 10 millisecond per rotation)

Blocks A (logic) block = one or several sectors Block address Physical device Cylinder # Surface # Sector

Disk Access Time block X in memory ? I want block X Time = Seek Time + Rotational Delay + Transfer Time + Other

Seek Time 3 or 5x x 1N Cylinders Traveled Time

Average Random Seek Time   SEEKTIME (i  j) S = N(N-1) N N i=1 j=1 j  i typical seek time: 10 ms  40 ms

Rotational Delay Head here Block I want Average Rotational Delay R = 1/2 revolution typical rotational delay = 8 ms

Transfer Rate: typical: t = 80 MB/second = 80 KB/millisecond transfer time: block size / t

Other Delays CPU time to issue I/O Contention for controller Contention for bus, memory Typical value: ≈ 0

Thus, reading a block of 16K bytes: Time = Seek Time + Rotational Delay + Transfer Time + Other ~ 30 ms + 8 ms + 16/80 ms + 0 ~ 40 ms

Main Memory fast (read/write: nanosecond) small capacity (gigabytes) volatile Disks slow (read/write: 1~40 millisecond) large capacity (100’s gigabytes) non-volatile Disks are about 10 5 ~10 6 times slower than main memory

I/O Model of Computation Dominance of I/O cost: if a block needs to be moved between disk and main memory, then the time taken to perform the read/write is much larger than the time likely to be used to manipulate that data in main memory. The number of disk block reads/writes is a good approximation to the entire computation.

Example. Sorting on disk Each tuple (with a key) takes 160 bytes Each block holds 100 tuples (16KB) A relation R has 10M tuples (1.6 GB, 100K blocks) Main memory has 100MB (6400 blocks) A disk read/write: 40 ms

Main memory sorting algorithms heap sort: 10M * log 2 (10M) = 230M disk block read/write = 9200M ms = seconds > 100 day quick sort and merge sort: 2 * 100K (blocks) * log 2 (10M) = 4.6M disk block read/write = 184M ms = seconds > 2 day

Two-phase Multiway MergeSort Phase 1. making sorted sublist repeat fill the main memory with remaining tuples in R and sort them; write the sorted sublist (of 6400 blocks) back to disk Phase 2. Merging repeat bring in a block from each of the sorted sublist; merge them and put in an “output” block; write the “output” block back to disk when it is full

Two-phase Multiway MergeSort # sublists = 100K/6400 = 16 thus, in phase 2, we can easily hold a block for each sublist in the main memory Disk block read/write: 100K (blocks) * 4 = 400K disk block read/write = 16M ms = seconds < 4.5 hours