Database: Review Sept. 2009 Yangjun Chen ACS-39021 Database Introduction system architecture, Basic concepts, ER-model, Data modeling, B+-tree Hashing Relational.

Presentation transcript:

Database: Review
- Database introduction
- System architecture, basic concepts
- ER-model, data modeling
- B+-tree
- Hashing
- Relational algebra, relational data model
- SQL: DDL, DML not included

Introduction to the database systems
- What is a database?
- The main characteristics of a database
- The basic database design method
- The entity-relationship data model for application modeling

The main characteristics of the database approach:
- single repository of data
- sharable by multiple users: concurrency control and transaction concept
- security and integrity constraints
- self-describing: the system catalogue contains metadata
- program-data independence: some changes to the database are transparent to programs/users
- multiple views of data, to support individual needs of programs/users

Data modeling using ER-model
Entity-relationship model
- Entity types
  - strong entities
  - weak entities
- Relationships among entities
- Attributes: attribute classification
- Constraints
  - cardinality constraints
  - participation constraints
ER-to-Relation mapping

ER-model (figure): ER diagram with the entity types employee, department, project and dependent, and the relationships works for, manages, works on, dependents of, controls and supervision (supervisor/supervisee), annotated with attributes and 1:N and M:N cardinalities.

Concepts and Architecture
- Database schema, schema evolution, database state
- Working process with a database system
- Database system architecture
- Data independence concept

Database schema, relation schema, schema evolution, database state (example)
- Student(Name, StNo, Class, Major), with rows (Smith, 17, 1, CS) and (Brown, 8, 2, CS)
- Course(CName, CNo, CrHrs, Dept), with rows for the courses Database and C in Dept CS
- Section(SId, CNo, Semester, Yr, Instructor), with rows for Spring 2000 (Smith), Winter 2000 (Smith) and Spring 2000 (Jones)
- Grades(StNo, Sid, Grade), with the grades A and B

Working process with a database system:
- Definition: record structure, data elements (names, data types, constraints, etc.)
- Construction: create database files, populate the database with records
- Manipulation: querying, updating

Database Management System (DBMS)
- a collection of software facilitating the definition, construction and manipulation of databases
- (figure) users/actors send requests to the DBMS, which consists of a request manager, query evaluation and a storage manager, and which accesses the metadata and the stored database

Three-schema architecture
- External views: a specific user's or group's view of the database
- Conceptual schema: describes the whole database for all users
- Internal schema: physical storage structures and details

Hashing technique
- external hashing
- static hashing & dynamic hashing
- hash function: a mathematical function that maps a key to a bucket address
- collisions
- collision resolution scheme: open addressing, chaining, multiple hashing
- linear hashing

External hashing: the data are on the disk.
Static hashing:
- a hash function maps keys to bucket addresses
- the primary area cannot be changed
- collision resolution scheme: open addressing, chaining, multiple hashing
Dynamic hashing:
- the primary area can be changed
- linear hashing

Linear hashing:
1. What is a phase?
2. When to split a bucket?
3. How to split a bucket?
4. What bucket will be chosen to split next?
5. How do we find a record inserted into a linear hashing file?

Linear hashing:
- initially the hash file contains M buckets
- $h_i = \mathrm{key} \bmod (2^i \cdot M)$ (i = 0, 1, 2, ...)
- the insertion process can be divided into several phases
phase 1:
- insertion using $h_0 = \mathrm{key} \bmod M$
- splitting using $h_1 = \mathrm{key} \bmod (2 \cdot M)$
- splitting rule: overflow of a bucket, or load factor > constant (e.g., 0.70)
- an overflowing record is put in the overflow area or redistributed through splitting a bucket
- buckets are split from n = 0 to n = M - 1 (after each split, n is increased by 1)
- phase 1 finishes when n = M (in this case, the primary area becomes 2M buckets long)

phase 2:
- insertion using $h_1 = \mathrm{key} \bmod (2 \cdot M)$
- splitting using $h_2 = \mathrm{key} \bmod (4 \cdot M)$
- splitting rule: overflow of a bucket, or load factor > constant (e.g., 0.70)
- an overflowing record is put in the overflow area or redistributed through splitting a bucket
- buckets are split from n = 0 to n = 2M - 1 (after each split, n is increased by 1)
- phase 2 finishes when n = 2M (in this case, the primary area will contain 4M buckets)
phase 3: ... with $h_2$, $h_3$, ...

Linear hashing including two phases:
- collision resolution strategy: chaining
- split rule: load factor > 0.7
- initially M = 4 (M: size of the primary area)
- hash functions: $h_i(\mathrm{key}) = \mathrm{key} \bmod (2^i \cdot M)$ (i = 0, 1, 2, ...)
- bucket capacity = 2
Trace the insertion process of the following keys into a linear hashing file: 3, 2, 4, 1, 8, 14, 5, 10, 7, 24, 17, 13, 15.
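The rules above can be sketched in Python so that the trace can be checked mechanically. This is a minimal sketch, not the course's reference implementation: the class name `LinearHashFile`, its attribute names, and the choice to model an overflow chain as a bucket list longer than `capacity` are all assumptions made for illustration. The load factor is taken as records / (primary buckets × bucket capacity), and at most one bucket is split per insertion.

```python
class LinearHashFile:
    """Hypothetical sketch of linear hashing with chaining."""

    def __init__(self, m=4, capacity=2, max_load=(7, 10)):
        self.m = m                # M: initial size of the primary area
        self.capacity = capacity  # records per primary bucket
        self.max_load = max_load  # split threshold 7/10, kept as a fraction
        self.level = 0            # i: insert with h_i, split with h_(i+1)
        self.n = 0                # pointer to the next bucket to split
        self.count = 0            # number of stored records
        # A list longer than `capacity` stands for a bucket plus its overflow chain.
        self.buckets = [[] for _ in range(m)]

    def address(self, key):
        """Bucket address used for both insertion and lookup."""
        a = key % (2 ** self.level * self.m)            # h_i
        if a < self.n:                                  # bucket already split in this phase
            a = key % (2 ** (self.level + 1) * self.m)  # h_(i+1)
        return a

    def insert(self, key):
        self.buckets[self.address(key)].append(key)
        self.count += 1
        num, den = self.max_load
        # Integer comparison avoids rounding issues when the load is exactly 0.7.
        if self.count * den > num * self.capacity * len(self.buckets):
            self.split()

    def split(self):
        modulus = 2 ** (self.level + 1) * self.m        # h_(i+1)
        old, self.buckets[self.n] = self.buckets[self.n], []
        self.buckets.append([])                         # new bucket n + 2^i * M
        for key in old:
            self.buckets[key % modulus].append(key)
        self.n += 1
        if self.n == 2 ** self.level * self.m:          # phase finished
            self.level += 1
            self.n = 0


f = LinearHashFile()
for k in [3, 2, 4, 1, 8, 14, 5, 10, 7, 24, 17, 13, 15]:
    f.insert(k)
print(f.level, f.n, f.buckets)
```

A lookup uses the same `address` method as an insertion, which answers question 5 of the list above: hash with $h_i$ first and rehash with $h_{i+1}$ only if the resulting bucket lies before the split pointer n.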

The first phase
- When the sixth record (key 14) is inserted, the load factor becomes 6/8 = 0.75 > 0.70, so bucket 0 must be split (using $h_1 = \mathrm{key} \bmod 2M$). n = 0 before the split (n is the pointer to the bucket to be split) and n = 1 after the split. Load factor: 6/10 = 0.6, no split.
- insert(5): n = 1; load factor 7/10 = 0.7; no split.
- insert(10): the key goes into an overflow bucket; n = 1; load factor 8/10 = 0.8; split using $h_1$. After the split: n = 2; load factor 8/12 = 0.66; no split.
- insert(7): n = 2; load factor 9/12 = 0.75; split using $h_1$. After the split: n = 3; load factor 9/14 = 0.642; no split.
- insert(24): n = 3; load factor 10/14 = 0.71; split using $h_1$.

The second phase
- n = 0; $h_1 = \mathrm{key} \bmod 2M$ is used to insert and $h_2 = \mathrm{key} \bmod 4M$ to split.
- insert(17): n = 0; load factor 11/16 = 0.687; no split.
- insert(13): n = 0; load factor 12/16 = 0.75; split bucket 0, using $h_2$.
- insert(15): n = 1; load factor 13/18 = 0.722; split bucket 1, using $h_2$.

Multi-level index
- tree
  - root, internal, leaf, subtree
  - parent, child, sibling
  - balanced, unbalanced
- B+-tree
  - splits on overflow; merges on underflow
  - in practice it is usually 3 or 4 levels deep
- search, insert, delete algorithms

B+-tree structure: non-leaf node (internal node or root)
- $q \le p_{internal}$
- $K_1 < K_2 < \dots < K_{q-1}$ (i.e. it's an ordered set)
- For any key value X in the subtree pointed to by $P_i$:
  - $K_{i-1} < X \le K_i$ for $1 < i < q$
  - $X \le K_1$ for $i = 1$
  - $K_{q-1} < X$ for $i = q$
- Each internal node has at most $p_{internal}$ pointers.
- Each node except the root must have at least $\lceil p_{internal}/2 \rceil$ pointers.
- The root, if it has some children, must have at least 2 pointers.

B+-tree structure: leaf node (terminal node)
- $K_1 < K_2 < \dots < K_{q-1}$
- $Pr_i$ points to a record with key value $K_i$, or $Pr_i$ points to a page containing a record with key value $K_i$.
- Maximum of $p_{leaf}$ key/pointer pairs.
- Each leaf has at least $\lceil p_{leaf}/2 \rceil$ keys.
- All leaves are at the same level (balanced).
- $P_{next}$ points to the next leaf node for key sequencing.
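The ordering condition on the subtrees translates directly into a search procedure. The following is a minimal sketch, assuming two hypothetical classes `Internal` and `Leaf` and a small hand-built tree with $p_{internal} = 3$ and $p_{leaf} = 2$; the record values (`"r1"`, `"r3"`, ...) are placeholders, not data from the slides.

```python
from bisect import bisect_left


class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # K_1 < K_2 < ... < K_(q-1)
        self.children = children  # P_1 ... P_q


class Leaf:
    def __init__(self, keys, records):
        self.keys = keys          # sorted key values
        self.records = records    # records[i] belongs to keys[i]
        self.next_leaf = None     # P_next: next leaf in key order


def search(node, x):
    """Return the record stored under key x, or None if x is absent."""
    while isinstance(node, Internal):
        # bisect_left gives the first position whose key is >= x, which is
        # exactly the child P_i with K_(i-1) < x <= K_i.
        node = node.children[bisect_left(node.keys, x)]
    i = bisect_left(node.keys, x)
    if i < len(node.keys) and node.keys[i] == x:
        return node.records[i]
    return None


# Hypothetical example tree (p_internal = 3, p_leaf = 2).
leaves = [Leaf([1, 3], ["r1", "r3"]), Leaf([5], ["r5"]),
          Leaf([7, 8], ["r7", "r8"]), Leaf([9, 12], ["r9", "r12"])]
for left, right in zip(leaves, leaves[1:]):
    left.next_leaf = right
root = Internal([5], [Internal([3], leaves[:2]), Internal([8], leaves[2:])])

print(search(root, 8), search(root, 6))
```

Because every leaf is linked to its successor through `next_leaf`, a range query can locate the first qualifying leaf with the same descent and then follow the leaf chain.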

A B+-tree (figure): a B+-tree over the records in a file, with $p_{internal} = 3$ and $p_{leaf} = 2$.

B+-tree insertion: leaf node splitting, internal node splitting

Leaf splitting (figure example: insert 31)
When a leaf splits, a new leaf is allocated:
- the original leaf is the left sibling, the new one is the right sibling
- key and pointer pairs are redistributed: the left sibling will have smaller keys than the right sibling
- a 'copy' of the key value which is the largest of the keys in the left sibling is promoted to the parent
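A leaf split can be sketched as a small function on a sorted list of keys. This is an illustrative sketch only: the function name `insert_into_leaf` is hypothetical, record pointers are left out, the example leaf `[28, 30]` is made up, and the choice that the left sibling keeps the larger half of the keys is an assumption the slides do not state.

```python
import math


def insert_into_leaf(keys, key, p_leaf):
    """Insert key into a leaf; split the leaf if it overflows.

    Returns (left_keys, right_keys, promoted). When the leaf does not
    overflow, right_keys and promoted are both None.
    """
    keys = sorted(keys + [key])
    if len(keys) <= p_leaf:
        return keys, None, None
    j = math.ceil(len(keys) / 2)       # assumption: left keeps the larger half
    left, right = keys[:j], keys[j:]
    # A copy of the largest key of the left sibling goes to the parent;
    # the key itself stays in the leaf.
    return left, right, left[-1]


print(insert_into_leaf([28, 30], 31, p_leaf=2))
```

With $p_{leaf} = 2$, the call leaves 28 and 30 in the left sibling, moves 31 to the new right sibling, and promotes a copy of 30.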

Internal node splitting (figure example: insert 26; key 33)
If an internal node splits and it is not the root:
- insert the key and pointer and then determine the middle key
- a new 'right' sibling is allocated
- everything to the left of the middle key stays in the left sibling
- everything to the right of the middle key goes into the right sibling
- the middle key value, along with the pointer to the new right sibling, is promoted to the parent (the middle key value 'moves' to the parent to become the discriminator between this left and right sibling)
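The difference from a leaf split is that the middle key moves up instead of being copied. A minimal sketch, assuming a hypothetical helper `split_internal` that receives the node after the new key and pointer have already been inserted; the key values and child labels in the example are made up.

```python
def split_internal(keys, children):
    """Split an overfull internal node.

    keys and children describe the node after the new key and pointer
    were inserted, so len(children) == len(keys) + 1.
    Returns (left_node, middle_key, right_node), each node being a
    (keys, children) pair.
    """
    mid = len(keys) // 2
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    # keys[mid] moves to the parent and appears in neither sibling.
    return left, keys[mid], right


# p_internal = 3: a node with 4 pointers is overfull and must split.
print(split_internal([20, 26, 33], ["c0", "c1", "c2", "c3"]))
```

Both resulting siblings hold two pointers here, which satisfies the minimum of $\lceil p_{internal}/2 \rceil$ pointers for $p_{internal} = 3$.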

Internal node splitting (figure)
When a new root is formed, a key value and two pointers must be placed into it.

Deleting nodes from a B+-tree:
1. When deleting a key from a node A, check whether the number of the remaining keys (or pointers) is $\ge \lceil p/2 \rceil$.
2. If this is not the case, redistribute the keys with the left sibling B or with the right sibling C if possible. Otherwise, merge A and B, or merge A and C.
3. When redistributing or merging, change the key values in the parent node so that the following condition is satisfied:
   - $K_1 < K_2 < \dots < K_{q-1}$ (i.e. it is an ordered set)
   - for the key values X in the subtree pointed to by $P_i$:
     - $K_{i-1} < X \le K_i$ for $1 < i < q$
     - $X \le K_1$ for $i = 1$
     - $K_{q-1} < X$ for $i = q$

A B+-tree (figure): a B+-tree over a set of records, with $p_{internal} = 3$ and $p_{leaf} = 2$.

Entry deletion, deletion sequence: 8, 12, 9, 7
- Deleting 8 causes the node to redistribute.
- 12 is removed.
- 9 is removed.
- Deleting 7 makes a pointer useless. Therefore, a merge occurs at the level above the leaf level.
- For this merge (of the nodes A, B and C in the figure), 5 will be taken as a key value in A, since any key value in B is less than or equal to 5 but any key value in C is larger than 5. The pointer that has become useless, and the corresponding node, should also be removed.

Data modeling using Relational model; Relational algebra
Relational Data Model
- relation schema, relations
- database schema (relational schema), database state
- integrity constraints and updating
Relational algebra
- select, project, join, cartesian product
- division
- set operations: union, intersection, difference

Integrity Constraints
Any database will have some number of constraints that must be applied to ensure correct data (valid states).
1. domain constraints
   - a domain is a restriction on the set of valid values
   - domain constraints specify that the value of each attribute A must be an atomic value from the domain dom(A)
2. key constraints
   - a superkey is any combination of attributes that uniquely identifies a tuple: $t_1[\mathrm{superkey}] \ne t_2[\mathrm{superkey}]$ (example in Employee)
   - a key is a superkey that has a minimal set of attributes (example in Employee)

Integrity Constraints
- If a relation schema has more than one key, each of them is called a candidate key.
- One candidate key is chosen as the primary key (PK).
- A foreign key (FK) is defined as follows:
  i) Consider two relation schemas $R_1$ and $R_2$.
  ii) The attributes in FK in $R_1$ have the same domain(s) as the primary key attributes PK in $R_2$; the attributes FK are said to reference or refer to the relation $R_2$.
  iii) A value of FK in a tuple $t_1$ of the current state $r(R_1)$ either occurs as a value of PK for some tuple $t_2$ in the current state $r(R_2)$ or is null. In the former case, we have $t_1[FK] = t_2[PK]$, and we say that the tuple $t_1$ references or refers to the tuple $t_2$.
- Example: Employee(SSN, …, Dno), Dept(Dno, …); Dno in Employee is a FK that references Dept.

Integrity Constraints
3. entity integrity
   - no part of a PK can be null
4. referential integrity
   - the domain of FK must be the same as the domain of PK
   - FK must be null or have a value that appears as a PK value
5. semantic integrity
   - other rules that the application domain requires
   - state constraint: gross salary > net income
   - transition constraint: Widowed can only follow Married; the salary of an employee cannot decrease
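The key, entity and referential constraints can be observed on a running system. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the two-column tables, the helper `try_insert`, and the ssn and dno values are all hypothetical simplifications of the Employee/Dept example. Note that SQLite only checks foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite ignores FKs unless enabled
conn.executescript("""
    CREATE TABLE dept (dno INTEGER PRIMARY KEY);
    CREATE TABLE employee (
        ssn TEXT PRIMARY KEY NOT NULL,          -- key + entity integrity
        dno INTEGER REFERENCES dept(dno)        -- referential integrity
    );
    INSERT INTO dept VALUES (5);
""")


def try_insert(ssn, dno):
    """Insert one employee and report whether the DBMS accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO employee VALUES (?, ?)", (ssn, dno))
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    try_insert("111", 5),     # FK value occurs as a PK value in dept
    try_insert("222", None),  # a null FK is allowed
    try_insert("333", 9),     # no dept 9: violates referential integrity
    try_insert("111", 5),     # duplicate ssn: violates the key constraint
    try_insert(None, 5),      # null PK: violates entity integrity
]
print(results)
```

Semantic constraints such as "gross salary > net income" are not covered by this sketch; in SQL they would typically be expressed with `CHECK` clauses or triggers.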

Updating and constraints
- insert: insert a tuple into EMPLOYEE. When inserting, the integrity constraints should be checked: domain, key, entity, referential, semantic integrity.
- update: update the SALARY of the EMPLOYEE tuple with ssn = ‘ ’. When updating, the integrity constraints should be checked: domain, key, entity, referential, semantic integrity.

Updating and constraints
- delete: delete the WORKS_ON tuple with Essn = ‘ ’ and pno = 10. When deleting, the referential constraint will be checked.
- The following deletion is not acceptable as it stands: delete the EMPLOYEE tuple with ssn = ‘ ’. There are three strategies to handle it: reject, cascade, modify.

cascade: a strategy to enforce referential integrity
- (figure) when a tuple is deleted from Employee, the Works-on tuples whose Essn references its ssn are deleted as well.

cascade: a strategy to enforce referential integrity
- (figure) Employee references itself through the supervisor attribute. If the deletion of a supervisor cascades, the supervised employees are deleted too, which is not reasonable.

modify: a strategy to enforce referential integrity
- (figure) when a tuple is deleted from Employee, the Essn values in Works-on that reference it are set to null. This violates the entity constraint, because Essn is part of the primary key of Works-on.
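The cascade and modify strategies correspond to the SQL referential actions `ON DELETE CASCADE` and `ON DELETE SET NULL`. The following sketch, again with Python's `sqlite3` module, applies cascade to works_on and set-null to the supervisor reference, where a null value is acceptable. The reduced table layouts and the ssn values `'100'` and `'200'` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for FK actions in SQLite
conn.executescript("""
    CREATE TABLE employee (
        ssn TEXT PRIMARY KEY NOT NULL,
        -- modify: supervisees keep their tuple, the reference becomes null
        superssn TEXT REFERENCES employee(ssn) ON DELETE SET NULL
    );
    CREATE TABLE works_on (
        -- cascade: assignments of a deleted employee are deleted too
        essn TEXT NOT NULL REFERENCES employee(ssn) ON DELETE CASCADE,
        pno INTEGER NOT NULL,
        PRIMARY KEY (essn, pno)
    );
    INSERT INTO employee VALUES ('100', NULL), ('200', '100');
    INSERT INTO works_on VALUES ('100', 10), ('200', 10);
""")

# Delete the supervisor '100'.
conn.execute("DELETE FROM employee WHERE ssn = '100'")

remaining_work = conn.execute("SELECT essn, pno FROM works_on").fetchall()
employees = conn.execute("SELECT ssn, superssn FROM employee").fetchall()
print(remaining_work, employees)
```

Declaring neither action leaves the default behaviour, under which the DBMS rejects the deletion of a referenced tuple; that is the reject strategy.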

Relational Algebra
- a set of relations
- a set of operations
  - set operations: union, intersection, difference, cartesian product
  - relation-specific operations: select, project, join, division

Relational algebra
Retrieve for each female employee a list of the names of her dependents:
- $\mathrm{FEMALE\_EMPS} \leftarrow \sigma_{SEX = 'F'}(\mathrm{EMPLOYEE})$
- $\mathrm{EMPNAMES} \leftarrow \pi_{FNAME, LNAME, SSN}(\mathrm{FEMALE\_EMPS})$
- $\mathrm{ACTUAL\_DEPENDENTS} \leftarrow \mathrm{EMPNAMES} \bowtie_{SSN = ESSN} \mathrm{DEPENDENT}$
- $\mathrm{RESULT} \leftarrow \pi_{FNAME, LNAME, DEPENDENT\_NAME}(\mathrm{ACTUAL\_DEPENDENTS})$
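The four steps of this query can be mirrored with three small Python functions over lists of dictionaries. This is only a sketch of the operator semantics: the function names `select`, `project` and `join` and the sample tuples are assumptions, and real systems evaluate these operators very differently.

```python
def select(rel, pred):
    """sigma: keep the tuples that satisfy the predicate."""
    return [t for t in rel if pred(t)]


def project(rel, attrs):
    """pi: keep the listed attributes and remove duplicate tuples."""
    seen, out = set(), []
    for t in rel:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(zip(attrs, row)))
    return out


def join(r, s, pred):
    """Theta join: combine every pair of tuples that satisfies pred."""
    return [{**a, **b} for a in r for b in s if pred(a, b)]


# Hypothetical sample state.
EMPLOYEE = [
    {"FNAME": "Alicia", "LNAME": "Zelaya", "SSN": "1", "SEX": "F"},
    {"FNAME": "John", "LNAME": "Smith", "SSN": "2", "SEX": "M"},
]
DEPENDENT = [
    {"ESSN": "1", "DEPENDENT_NAME": "Ann"},
    {"ESSN": "2", "DEPENDENT_NAME": "Bob"},
]

female_emps = select(EMPLOYEE, lambda t: t["SEX"] == "F")
empnames = project(female_emps, ["FNAME", "LNAME", "SSN"])
actual_dependents = join(empnames, DEPENDENT, lambda e, d: e["SSN"] == d["ESSN"])
result = project(actual_dependents, ["FNAME", "LNAME", "DEPENDENT_NAME"])
print(result)
```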

Query: Retrieve the names of the employees who work on all the projects that ‘John Smith’ works on.
- $\mathrm{SMITH} \leftarrow \sigma_{FNAME = 'John' \text{ and } LNAME = 'Smith'}(\mathrm{EMPLOYEE})$
- $\mathrm{SMITH\_PNOs} \leftarrow \pi_{PNO}(\mathrm{WORKS\_ON} \bowtie_{ESSN = SSN} \mathrm{SMITH})$
- $\mathrm{SSN\_PNO} \leftarrow \pi_{ESSN, PNO}(\mathrm{WORKS\_ON})$
- $\mathrm{SSNS}(SSN) \leftarrow \mathrm{SSN\_PNO} \div \mathrm{SMITH\_PNOs}$
- $\mathrm{RESULT} \leftarrow \pi_{FNAME, LNAME}(\mathrm{SSNS} * \mathrm{EMPLOYEE})$

Division
The DIVISION operator can be expressed as a sequence of $\pi$, $\times$, and $-$ operations as follows.
Let $Z = \{A_1, \dots, A_n, B_1, \dots, B_m\}$, $X = \{B_1, \dots, B_m\}$, and $Y = Z - X = \{A_1, \dots, A_n\}$. Then $R(Z) \div S(X)$ is computed as:
- $T_1 \leftarrow \pi_Y(R)$
- $T_2 \leftarrow \pi_Y((S \times T_1) - R)$
- $T \leftarrow T_1 - T_2$ (the result)
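The three-step expansion of division can be checked with Python sets. In this sketch a relation is a set of tuples whose first `n_y` components are the Y attributes, followed by the X attributes; the function name `divide`, that layout convention, and the sample data are assumptions made for illustration.

```python
def divide(r, s, n_y):
    """R(Z) divided by S(X), with Z = Y followed by X.

    r: set of tuples over Y + X (the first n_y components are Y)
    s: set of tuples over X
    """
    t1 = {t[:n_y] for t in r}                    # T1 = pi_Y(R)
    # (S x T1) - R, written with Y first so the tuples line up with R:
    missing = {y + x for x in s for y in t1} - r
    t2 = {t[:n_y] for t in missing}              # T2 = pi_Y((S x T1) - R)
    return t1 - t2                               # T = T1 - T2


# Hypothetical SSN_PNO(ESSN, PNO) and SMITH_PNOs(PNO).
ssn_pno = {("e1", 1), ("e1", 2), ("e2", 1), ("e3", 1), ("e3", 2), ("e3", 3)}
smith_pnos = {(1,), (2,)}

print(sorted(divide(ssn_pno, smith_pnos, n_y=1)))
```

Employee e2 is dropped because the pair (e2, 2) is missing from SSN_PNO, so e2 ends up in $T_2$; e3 qualifies even though it also works on a third project.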

SQL
- DDL
  - creating schemas
  - modifying schemas
- DML
  - select-from-where clause
  - group by, having, order by
  - update
  - view

DDL - Examples:
Create schema:
    CREATE SCHEMA COMPANY AUTHORIZATION JSMITH;
Create table:
    CREATE TABLE EMPLOYEE (
        FNAME    VARCHAR(15)    NOT NULL,
        MINIT    CHAR,
        LNAME    VARCHAR(15)    NOT NULL,
        SSN      CHAR(9)        NOT NULL,
        BDATE    DATE,
        ADDRESS  VARCHAR(30),
        SEX      CHAR,
        SALARY   DECIMAL(10, 2),
        SUPERSSN CHAR(9),
        DNO      INT            NOT NULL,
        PRIMARY KEY (SSN),
        FOREIGN KEY (SUPERSSN) REFERENCES EMPLOYEE(SSN),
        FOREIGN KEY (DNO) REFERENCES DEPARTMENT(DNUMBER));

DDL - Examples:
drop schema:
    DROP SCHEMA COMPANY CASCADE;
    DROP SCHEMA COMPANY RESTRICT;
drop table:
    DROP TABLE DEPENDENT CASCADE;
    DROP TABLE DEPENDENT RESTRICT;
alter table:
    ALTER TABLE COMPANY.EMPLOYEE ADD JOB VARCHAR(12);
    ALTER TABLE COMPANY.EMPLOYEE DROP ADDRESS CASCADE;

DML - select-from-where clause
Retrieve a list of employees and the projects they are working on, ordered by department and, within each department, alphabetically by last name, then first name:
    SELECT DNAME, LNAME, FNAME, PNAME
    FROM DEPARTMENT, EMPLOYEE, WORKS_ON, PROJECT
    WHERE DNUMBER = DNO AND SSN = ESSN AND PNO = PNUMBER
    ORDER BY DNAME, LNAME, FNAME
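To see the select-from-where query run, it can be executed against a small in-memory SQLite database from Python. The table definitions below are cut down to the columns the query needs, and all sample rows are hypothetical; only the query text itself comes from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPARTMENT (DNAME TEXT, DNUMBER INTEGER PRIMARY KEY);
    CREATE TABLE EMPLOYEE (FNAME TEXT, LNAME TEXT, SSN TEXT PRIMARY KEY, DNO INTEGER);
    CREATE TABLE PROJECT (PNAME TEXT, PNUMBER INTEGER PRIMARY KEY);
    CREATE TABLE WORKS_ON (ESSN TEXT, PNO INTEGER);

    -- Hypothetical sample rows.
    INSERT INTO DEPARTMENT VALUES ('Research', 5), ('Administration', 4);
    INSERT INTO EMPLOYEE VALUES ('John', 'Smith', '1', 5),
                                ('Alicia', 'Zelaya', '2', 4),
                                ('Joyce', 'English', '3', 5);
    INSERT INTO PROJECT VALUES ('ProductX', 1), ('Newbenefits', 30);
    INSERT INTO WORKS_ON VALUES ('1', 1), ('2', 30), ('3', 1);
""")

rows = conn.execute("""
    SELECT DNAME, LNAME, FNAME, PNAME
    FROM DEPARTMENT, EMPLOYEE, WORKS_ON, PROJECT
    WHERE DNUMBER = DNO AND SSN = ESSN AND PNO = PNUMBER
    ORDER BY DNAME, LNAME, FNAME
""").fetchall()

for row in rows:
    print(row)
```

The three conditions in the WHERE clause act as join conditions between the four tables; without them the FROM clause would produce the full cartesian product.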