CS 405G: Introduction to Database Systems 25 Exercise Chen Qian University of Kentucky.


Project presentation (6/5/2016, Chen Qian, University of Kentucky). Demonstrate all functionalities of your application. Complete some requests. Introduce any additional features. Present the bonus part, if you have completed it.

Exercise: Functional dependency. Suppose you are given a relation R with four attributes ABCD. For each of the following sets of FDs, assuming those are the only dependencies that hold for R, do the following: (a) Identify the candidate key(s) for R. (b) Identify the best normal form that R satisfies (1NF, 2NF, 3NF, or BCNF). Candidate keys: B. R is in 2NF but not 3NF.

Review: WorkOn(EID, Ename, …, PID, hours). We say X -> Y is a partial dependency if there exists an X’ ⊂ X such that X’ -> Y, e.g. EID, PID -> Ename. Otherwise, X -> Y is a full dependency, e.g. EID, PID -> hours.
Example table with columns EID, PID, Ename, Pname, Hours (most cell values were lost in extraction).

2nd Normal Form. Note about 2nd Normal Form: by definition, every nonprime attribute is fully functionally dependent on every key of R. In other words, R is in second normal form if no nonprime attribute is partially dependent on a key of R.

Third normal form. 3NF requires that there are no non-trivial functional dependencies of non-key attributes on something other than a superset of a candidate key. Recall: an FD is non-trivial when its RHS is not contained in its LHS (and completely non-trivial when the LHS has no intersection with the RHS). In summary, all non-key attributes are mutually independent.

Exercise: Functional dependency. Suppose you are given a relation R with four attributes ABCD. For each of the following sets of FDs, assuming those are the only dependencies that hold for R, do the following: (a) Identify the candidate key(s) for R. (b) Identify the best normal form that R satisfies (1NF, 2NF, 3NF, or BCNF). Candidate keys: BD. R is in 1NF but not 2NF.

Exercise: Functional dependency. Suppose you are given a relation R with four attributes ABCD. For each of the following sets of FDs, assuming those are the only dependencies that hold for R, do the following: (a) Identify the candidate key(s) for R. (b) Identify the best normal form that R satisfies (1NF, 2NF, 3NF, or BCNF). 3. ABC → D, D → A. Candidate keys: ABC, BCD. R is in 3NF but not BCNF (because of the FD: D → A, where D is not a superkey but A is part of a key).
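Candidate keys like the ones in this exercise can be checked mechanically with attribute closures. The following is a minimal sketch, not part of the original slides; the helper names `closure`, `candidate_keys`, and `parse` are illustrative assumptions, and the brute-force search is only practical for small schemas such as ABCD.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the attribute closure of `attrs` under `fds` (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its left side is covered and it adds something new.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Enumerate minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            # Supersets of an already-found key are superkeys, not candidate keys.
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == set(schema):
                keys.append(candidate)
    return keys

def parse(fd_text):
    """Turn 'ABC->D, D->A' into [({'A','B','C'}, {'D'}), ({'D'}, {'A'})]."""
    fds = []
    for part in fd_text.split(","):
        lhs, rhs = part.split("->")
        fds.append((set(lhs.strip()), set(rhs.strip())))
    return fds

fds3 = parse("ABC->D, D->A")
keys3 = ["".join(sorted(k)) for k in candidate_keys("ABCD", fds3)]
print(keys3)  # ['ABC', 'BCD']
```

Because subsets are tried in order of increasing size, every set that passes the superset filter and closes to the full schema is minimal, which is what makes it a candidate key.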

Boyce-Codd normal form (BCNF). BCNF requires that there are no non-trivial functional dependencies of attributes on something other than a superset of a candidate key (called a superkey). All attributes are dependent on a key, a whole key, and nothing but a key (excluding trivial dependencies, like A -> A).

Exercise: Functional dependency. Suppose you are given a relation R with four attributes ABCD. For each of the following sets of FDs, assuming those are the only dependencies that hold for R, do the following: (a) Identify the candidate key(s) for R. (b) Identify the best normal form that R satisfies (1NF, 2NF, 3NF, or BCNF). 4. A → B, BC → D, A → C. Candidate keys: A. R is in 2NF but not 3NF (because of the FD: BC → D).

Exercise: Functional dependency. Suppose you are given a relation R with four attributes ABCD. For each of the following sets of FDs, assuming those are the only dependencies that hold for R, do the following: (a) Identify the candidate key(s) for R. (b) Identify the best normal form that R satisfies (1NF, 2NF, 3NF, or BCNF). 5. AB → C, AB → D, C → A, D → B. Candidate keys: AB, BC, CD, AD. R is in 3NF but not BCNF (because of the FD: C → A).
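The normal-form answers for FD sets 3 to 5 can be reproduced with a small classifier. This is a sketch under stated assumptions rather than the course's method: the function names (`fd`, `closure`, `candidate_keys`, `best_normal_form`) are hypothetical, the relation is assumed to be in 1NF already, and BCNF and 3NF are tested on the given FDs only, which is sufficient for those two checks.

```python
from itertools import combinations

def fd(lhs, rhs):
    """Build one FD from two attribute strings, e.g. fd('BC', 'D')."""
    return (set(lhs), set(rhs))

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            if not any(k <= cand for k in keys) and closure(cand, fds) == set(schema):
                keys.append(cand)
    return keys

def best_normal_form(schema, fds):
    """Return 'BCNF', '3NF', '2NF', or '1NF' for a relation assumed to be in 1NF."""
    schema = set(schema)
    keys = candidate_keys(schema, fds)
    prime = set().union(*keys)  # attributes that appear in some candidate key

    def is_superkey(attrs):
        return closure(attrs, fds) == schema

    # Drop trivial parts: keep only right-side attributes not already on the left.
    nontrivial = [(lhs, rhs - lhs) for lhs, rhs in fds if rhs - lhs]
    if all(is_superkey(lhs) for lhs, _ in nontrivial):
        return "BCNF"
    if all(is_superkey(lhs) or rhs <= prime for lhs, rhs in nontrivial):
        return "3NF"
    # 2NF fails if a proper subset of some key determines a nonprime attribute.
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if (closure(part, fds) - set(part)) - prime:
                    return "1NF"
    return "2NF"

nf3 = best_normal_form("ABCD", [fd("ABC", "D"), fd("D", "A")])
nf4 = best_normal_form("ABCD", [fd("A", "B"), fd("BC", "D"), fd("A", "C")])
nf5 = best_normal_form("ABCD", [fd("AB", "C"), fd("AB", "D"), fd("C", "A"), fd("D", "B")])
print(nf3, nf4, nf5)  # 3NF 2NF 3NF
```

For FD set 4 the only key is the single attribute A, so no partial dependency is possible and the result stops at 2NF because BC → D has a non-superkey left side and a nonprime right side.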

Exercise: Concurrency control. Consider the following actions taken by transaction T1 on database objects X and Y: R(X), W(X), R(Y), W(Y). 1. Give an example of another transaction T2 that, if run concurrently with transaction T1 without some form of concurrency control, could interfere with T1. If the transaction T2 performed W(Y) before T1 performed R(Y), and then T2 aborted, the value read by T1 would be invalid and the abort would be cascaded to T1 (i.e. T1 would also have to abort).

Exercise: Concurrency control. Consider the following actions taken by transaction T1 on database objects X and Y: R(X), W(X), R(Y), W(Y). 2. Explain how the use of Strict 2PL would prevent interference between the two transactions. Strict 2PL would require T2 to obtain an exclusive lock on Y before writing to it. This lock would have to be held until T2 committed or aborted; this would block T1 from reading Y until T2 was finished, thus there would be no interference.
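The blocking behaviour described in this answer can be illustrated with a toy lock table. This is only a sketch: the class `StrictLockManager` and its methods are hypothetical names, a refused request simply returns False instead of queueing the transaction, and deadlock handling is left out.

```python
class StrictLockManager:
    """Toy lock table: shared/exclusive locks released only at commit or abort."""

    def __init__(self):
        self.holders = {}  # object name -> (mode, set of transaction ids)

    def acquire(self, txn, obj, mode):
        """Try to lock `obj` in mode 'S' or 'X'; return True if granted."""
        held = self.holders.get(obj)
        if held is None:
            self.holders[obj] = (mode, {txn})
            return True
        held_mode, owners = held
        if owners == {txn}:
            # The sole owner may keep or upgrade its own lock.
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self.holders[obj] = (new_mode, owners)
            return True
        if mode == "S" and held_mode == "S":
            owners.add(txn)  # shared locks are compatible with each other
            return True
        return False  # conflict: the caller has to wait

    def release_all(self, txn):
        """Called at commit or abort; Strict 2PL releases nothing earlier."""
        for obj in list(self.holders):
            _, owners = self.holders[obj]
            owners.discard(txn)
            if not owners:
                del self.holders[obj]

locks = StrictLockManager()
t2_write_y = locks.acquire("T2", "Y", "X")       # T2 is about to perform W(Y)
t1_read_y_early = locks.acquire("T1", "Y", "S")  # T1 wants R(Y) while T2 is active
locks.release_all("T2")                          # T2 commits or aborts
t1_read_y_late = locks.acquire("T1", "Y", "S")   # T1 retries R(Y)
print(t2_write_y, t1_read_y_early, t1_read_y_late)  # True False True
```

T1's read is refused while T2 holds the exclusive lock, so T1 can never see a value that T2 might later roll back.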

Exercise: disks. Explain the terms seek time, rotational delay, and transfer time.
1. Seek time is the time taken to move the disk heads to the track on which a desired block is located.
2. Rotational delay is the waiting time for the desired block to rotate under the disk head; it is the time required for half a rotation on average, and is usually less than the seek time.
3. Transfer time is the time to actually read or write the data in the block once the head is positioned, i.e., the time for the disk to rotate over the block.

Exercise: disks. If you have a large file that is frequently scanned sequentially, explain how you would store the pages in the file on a disk. A: The pages in the file should be stored ‘sequentially’ on a disk. We should put two ‘logically’ adjacent pages as close as possible. In decreasing order of closeness, they could be on the same track, the same cylinder, or an adjacent cylinder.

Exercise: disks. Consider a disk with a sector size of 512 bytes, 2000 tracks per surface, 50 sectors per track, five double-sided platters, and average seek time of 10 msec. 1. What is the capacity of a track in bytes? What is the capacity of each surface? What is the capacity of the disk?
bytes/track = bytes/sector × sectors/track = 512 × 50 = 25K
bytes/surface = bytes/track × tracks/surface = 25K × 2000 = 50,000K
bytes/disk = bytes/surface × surfaces/disk = 50,000K × 5 × 2 = 500,000K
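The capacity arithmetic can be double-checked in a few lines. This is a minimal sketch using the parameters stated in the exercise; the variable names are illustrative, and K is taken to mean 1024 bytes, which is the assumption that makes 512 × 50 come out as exactly 25K.

```python
# Disk parameters as stated in the exercise.
SECTOR_BYTES = 512
SECTORS_PER_TRACK = 50
TRACKS_PER_SURFACE = 2000
PLATTERS = 5
SURFACES = PLATTERS * 2  # double-sided platters

K = 1024  # assumption: "K" on the slide means 1024 bytes

track_bytes = SECTOR_BYTES * SECTORS_PER_TRACK
surface_bytes = track_bytes * TRACKS_PER_SURFACE
disk_bytes = surface_bytes * SURFACES

# Capacities in K for track, surface, and whole disk.
print(track_bytes // K, surface_bytes // K, disk_bytes // K)  # 25 50000 500000
```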

Exercise: disks. Consider a disk with a sector size of 512 bytes, 2000 tracks per surface, 50 sectors per track, five double-sided platters, and average seek time of 10 msec. 3. If the disk platters rotate at 5400 rpm (revolutions per minute), what is the maximum rotational delay? If the disk platters rotate at 5400 rpm, the time required for one complete rotation, which is the maximum rotational delay, is (1/5400) × 60 ≈ 0.011 seconds. The average rotational delay is half of the rotation time, about 0.0056 seconds (roughly 0.006 seconds).

Exercise: disks. Consider a disk with a sector size of 512 bytes, 2000 tracks per surface, 50 sectors per track, five double-sided platters, and average seek time of 10 msec. 4. If one track of data can be transferred per revolution, what is the transfer rate? The capacity of a track is 25K bytes. Since one track of data can be transferred per revolution and the disk makes 5400/60 = 90 revolutions per second, the data transfer rate is 25K × 90 = 2,250K bytes/second.
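The rotational delay and transfer rate follow directly from the rotation speed. The sketch below recomputes them from the stated 5400 rpm and the 25K-byte track; the variable names are illustrative and K is assumed to be 1024 bytes.

```python
RPM = 5400
TRACK_BYTES = 512 * 50  # one track: 50 sectors of 512 bytes (25K)
K = 1024                # assumption: "K" means 1024 bytes

revs_per_second = RPM / 60               # 90 revolutions per second
rotation_s = 60 / RPM                    # one full rotation = maximum rotational delay
avg_rotational_delay_s = rotation_s / 2  # on average, half a rotation

# One track is transferred per revolution.
transfer_rate = TRACK_BYTES * revs_per_second  # bytes per second

print(f"max rotational delay: {rotation_s * 1000:.1f} ms")
print(f"avg rotational delay: {avg_rotational_delay_s * 1000:.1f} ms")
print(f"transfer rate: {transfer_rate / K:.0f}K bytes/second")
```

The maximum delay comes out near 11.1 msec and the average near 5.6 msec, matching the rounded figures of 0.011 and roughly 0.006 seconds.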

Exercise: disks. Consider … average seek time of 10 msec. Suppose that a block size of 1024 bytes is chosen. Suppose that a file containing 100,000 records of 100 bytes each is to be stored on such a disk and that no record is allowed to span two blocks. 5. What time is required to read a file containing 100,000 records of 100 bytes each sequentially? Each 1024-byte block holds 10 records, so the file occupies 10,000 blocks; a track holds 25 blocks, so the file needs 400 tracks, or 40 cylinders of 10 tracks each. The transfer time of one track of data is 0.011 seconds. Then it takes 400 × 0.011 = 4.4 seconds to transfer 400 tracks. This access seeks the track 40 times. The seek time is 40 × 0.01 = 0.4 seconds. Therefore, total access time is 4.4 + 0.4 = 4.8 seconds.
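The sequential-read estimate can be recomputed step by step. This is a sketch under the exercise's assumptions (one seek per cylinder, one track transferred per revolution at 5400 rpm, no rotational delay charged between tracks); the variable names are illustrative. With an unrounded rotation time the total comes to about 4.84 seconds, slightly above the figure obtained with 0.011 seconds per track.

```python
import math

RECORDS = 100_000
RECORD_BYTES = 100
BLOCK_BYTES = 1024
TRACK_BYTES = 512 * 50     # 25K bytes per track
TRACKS_PER_CYLINDER = 10   # five double-sided platters
AVG_SEEK_S = 0.010
ROTATION_S = 60 / 5400     # time to transfer one full track

# Records may not span blocks, so only whole records fit in a block.
records_per_block = BLOCK_BYTES // RECORD_BYTES        # 10
blocks = math.ceil(RECORDS / records_per_block)        # 10,000
blocks_per_track = TRACK_BYTES // BLOCK_BYTES          # 25
tracks = math.ceil(blocks / blocks_per_track)          # 400
cylinders = math.ceil(tracks / TRACKS_PER_CYLINDER)    # 40

transfer_s = tracks * ROTATION_S   # one revolution per track
seek_s = cylinders * AVG_SEEK_S    # one seek per cylinder
total_s = transfer_s + seek_s

print(tracks, cylinders)  # 400 40
print(f"total sequential read time: {total_s:.2f} s")
```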

Exercise: disks. 6. What is the time required to read a file containing 100,000 records of 100 bytes each in a random order? Assume that each block request incurs the average seek time and rotational delay. For any block of data, average access time = seek time + rotational delay + transfer time. The average access time for a block of data would be about 10 + 6 + 0.4 = 16.4 msec (using the rounded average rotational delay of 6 msec and 1K / 2,250K per second ≈ 0.4 msec of transfer time). For a file containing 100,000 records of 100 bytes, stored in 10,000 blocks, the total access time would be 10,000 × 16.4 msec = 164 seconds.
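The random-access estimate applies the same formula to every block. The sketch below uses unrounded intermediate values and illustrative variable names; it assumes 10,000 blocks of 1024 bytes (10 records each) and 25 blocks per track. With unrounded values the per-block time comes to about 16.0 msec and the total to about 160 seconds; rounding the average rotational delay up to 6 msec gives slightly larger figures.

```python
AVG_SEEK_S = 0.010
ROTATION_S = 60 / 5400   # one full revolution at 5400 rpm
BLOCKS_PER_TRACK = 25    # 25K-byte track / 1024-byte block
BLOCKS = 10_000          # 100,000 records at 10 records per block

avg_rotational_delay_s = ROTATION_S / 2
# A block covers 1/25 of a track, so it passes under the head in 1/25 of a rotation.
block_transfer_s = ROTATION_S / BLOCKS_PER_TRACK

per_block_s = AVG_SEEK_S + avg_rotational_delay_s + block_transfer_s
total_s = BLOCKS * per_block_s

print(f"per block: {per_block_s * 1000:.1f} ms, total: {total_s:.0f} s")
```

Compared with the sequential scan of the same file, random access is slower by well over an order of magnitude, because every block pays a full seek and rotational delay.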


Tree structure. Each intermediate node can hold up to five pointers and four key values. Each leaf can hold up to four records. Name all the tree nodes to be fetched to answer the following query: “Get all records with search key greater than 38.”

Tree structure. Name all the tree nodes to be fetched to answer the following query: “Get all records with search key greater than 38.” Answer: I1, I2, and everything in the range [L2..L8].

Tree structure: inserting a record with search key 109.

Tree structure: deleting the record with search key 81 from the original tree.

Tree structure. Name a search key value such that inserting it into the (original) tree would cause an increase in the height of the tree.

Tree structure. We can infer several things about subtrees A, B, and C. First of all, they each must have height one, since their “sibling” trees (those rooted at I2 and I3) have height one. Also, we know the ranges of these trees (assuming duplicates fit on the same leaf): subtree A holds search keys less than 10, B contains keys ≥ 10 and < 20, and C has keys ≥ 20 and < 30. In addition, each intermediate node has at least 2 key values and 3 pointers.

Tree structure. Suppose that this is an ISAM index. What is the minimum number of insertions needed to create a chain of three overflow pages?

Tree structure. If this is an ISAM tree, we would have to insert at least nine search keys in order to develop an overflow chain of length three. These keys could be any that would map to L4, L5, L7, or L8, all of which are full and thus would need overflow pages on the next insertion. The first insert to one of these pages would create the first overflow page, the fifth insert would create the second overflow page, and the ninth insert would create the third overflow page (for a total of one leaf and three overflow pages).
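The count of nine insertions can be confirmed with a tiny simulation of one ISAM leaf and its overflow chain. This is a sketch, with a hypothetical function name, that assumes every page holds up to four records, the target leaf starts full, and all inserted keys map to that same leaf.

```python
def overflow_pages_after(inserts, capacity=4, leaf_entries=4):
    """Count overflow pages chained to one ISAM leaf after `inserts` insertions.

    pages[0] is the primary leaf; every later entry is an overflow page.
    ISAM never splits a leaf, so a full chain just grows by one page.
    """
    pages = [leaf_entries]
    for _ in range(inserts):
        if pages[-1] == capacity:
            pages.append(0)  # the last page is full: chain a new overflow page
        pages[-1] += 1
    return len(pages) - 1

# Smallest number of insertions that yields a chain of three overflow pages.
min_inserts = next(n for n in range(1, 100) if overflow_pages_after(n) == 3)
print(min_inserts)  # 9
```

The first, fifth, and ninth insertions are the ones that allocate a new overflow page, which is why eight insertions are not enough.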

Want to present on 4/21? HW4 is due on 4/21.