Download presentation
Presentation is loading. Please wait.
Published byRandell Tyler Modified over 9 years ago
1
1 Databases II (Fall 2009) Professor: Iluju Kiringa kiringa@site.uottawa.ca kiringa@site.uottawa.ca http://www.site.uottawa.ca/~kiringa SITE 5072
2
2 Review of Databases I Chapters 2-5
3
3 What is a DBMS? A centralized database is a large collection of integrated data. Organizations face large quantities of data that must be efficiently managed. Many store GBs, even TBs of data Some scientific applications store PBs of data ! A Database Management System (DBMS) is a software package designed to store and manage databases.
4
4 Content of « Databases I » Foundations of DBMSs (Chap. 2-5, 19): : Conceptual design - Chap. 2: Input: requirements of an application Output: ER diagrams (i.e., ER model) Logical design – Chap. 3: Input: ER diagrammes Output: Relational model Normalization – Chap. 19: Input: Relational model Output: Relational model in normal forms Relational algebra and calculus – Chap. 4 SQL – Chap. 5 Application development (Chap. 6-7) Classic applications: Embeded SQL, JDBC, SQLJ, stored procedures Internet applications : HTTP, HTML, XML, 3-tier architecture
5
5 Content of « Databases I » (Cont’d) Storage and Indexing (Chap. 8-11): Disks and files – Chap. 9: Memory hierarchy Disk and file management Tree index – Chap. 10: ISAM tree B+ tree Hasch index – Chap. 11: static extendible linear
6
6 Content of « Databases II » Query evaluation (Chap. 12-15) External sort - Chap. 13 Evaluation of relational operators -- Chap. 14 System R: a sample evaluation algorithm – Chap. 15 Transactions (Chap. 16-18) Concurrency control – Chap. 17 Recovery – Chap. 18 Advanced topics (Chap. 22,25, 26, 23 ou 27) Distributed databases – Chap. 22 Data Warehousing – Chap. 25 Data Mining Chap. 26
7
7 Overview of Database Design Requirements analysis : Find out what users want to do with the database Conceptual design : Use the output of RA to develop a high-level description of the data to be stored, along with their constraints. Output of CD usually is an ER-diagram. Logical design: Choose a DBMS and map the conceptual schema (ER-diagram) into the data model of the DBMS. Output of this step is the logical schema of the data. Schema refinement: Analyze the logical schema, identify potential problems in it, and fix these by refining the logical schema using known normal forms. Physical design: Consider expected workloads that the database will support to further refine the design in order to meet desired performance criteria. Output here is the physical schema.
8
8 ER Model The model Basic elements: Entity: real-world object Relationship: association between entities Attributes: description of an entity or a relationship Advanced elements: Constraints: qualification of a relationship Weak entity : Identifiable only via another entity Hierarchy : similar to OO-languages Agregation: sort of macros Conceptual design using the ER model Should a given concept be modelled as an entity or an attribute? Should a given concept be modelled as an entity or a relationship? Should a given concept be modelled as a binary or as a ternary relationship? etc
9
9 Relational Database Concepts Relation: made up of 2 parts: Instance : a table, with rows and columns. #Rows = cardinality, #fields = degree / arity. Schema : specifies name of relation, plus name and domain (type) of each column (attribute). Can think of a relation as a set of rows or tuples (i.e., all rows are distinct), where each tuple has the same arity as the relation schema. Relational database: a set of relations, each with distinct name. Relational DB schema: set of schemas of relations in the DB. Relational DB instance: set of relation instances in the DB.
10
10 Sample Relation Cardinality = 3, arity = 5, all rows distinct. Commercial systems allow duplicates. Order of attributes may or may not matter! Do all columns in a relation instance have to be distinct? Depends on whether they are ordered or not. Schema : Students(sid: string, name: string, login: string, age: integer, gpa: real). Instance :
11
11 Creation and Alteration of Relations CREATE TABLE Students CREATE TABLE Enrolled (sid: CHAR(20), name: CHAR(20), cid: CHAR(20), login: CHAR(10), grade: CHAR(2)) age: INTEGER, gpa: REAL) DROP TABLE StudentsALTER TABLE Students ADD COLUMN firstYear: integer INSERT INTO Students (sid, name, login, age, gpa) VALUES (53688, ‘Smith’, ‘smith@ee’, 18, 3.2) DELETE FROM Students S WHERE S.name = ‘Smith’
12
12 Primary Key and Foreign Key in SQL CREATE TABLE Enrolled (sid CHAR (20) cid CHAR(20), grade CHAR (2), PRIMARY KEY (sid,cid) ) CREATE TABLE Enrolled (sid CHAR (20) cid CHAR(20), grade CHAR (2), PRIMARY KEY (sid), UNIQUE (cid, grade) ) CREATE TABLE Enrolled (sid CHAR (20), cid CHAR(20), grade CHAR (2), PRIMARY KEY (sid,cid), FOREIGN KEY (sid) REFERENCES Students )
13
13 Logical DB Design: ER to Relational The ER model represent the initial, high-level database design. The task is to generate a relational database schema that is as close as possible to the ER model. The mapping is approximate since it is hard to translate all the constraints of the ER model into an efficient logical (relational) model.
14
14 Formal Relational Query Languages Two mathematical Query Languages form the basis for “real” languages (e.g. SQL), and for implementation: Relational Algebra : More operational, very useful for representing execution plans. Relational Calculus : Lets users describe what they want, rather than how to compute it (Non- operational, declarative ).
15
15 Relational Algebra Basic operations: Selection ( ) Selects a subset of rows from relation. Projection ( ) Deletes unwanted columns from relation. Cross-product ( ) Allows us to combine two relations. Set-difference ( ) Gives tuples in 1 st rel., but not in 2 nd rel. Union ( ) Gives tuples in rel. 1 and in rel. 2. Additional operations: Intersection, join, division, renaming: Not (theoretically) essential, but (practically) very useful. Since each operation returns a relation, operations can be composed ! (Algebra is “closed”.)
16
16 Cross-Product Each row of S1 is paired with each row of R1. Result schema has one field per field of S1 and R1, with field names `inherited’ if possible. Conflict : Both S1 and R1 have a field called sid. Renaming operator :
17
17 Joins Condition Join : Result schema same as that of cross-product. Fewer tuples than cross-product, might be able to compute more efficiently Sometimes called a theta-join.
18
18 Joins (Cont’d) Equi-Join : A special case of condition join where the condition c contains only equalities. Result schema similar to cross-product, but only one copy of fields for which equality is specified. Natural Join : Equijoin on all fields having the same name in both relations.
19
19 Relational Calculus Has two flavors: Tuple relational calculus (TRC) Domain relational calculus (DRC). Has variables, constants, comparison ops, logical connectives, and quantifiers. TRC : Variables range over (i.e., get bound to) tuples. DRC : Variables range over domain elements (= field values). Both TRC and DRC are simple subsets of first-order logic. Expressions in the calculus are called formulas. An answer tuple is essentially an assignment of constants to variables that make the formula evaluate to true.
20
20 Tuple Relational Calculus Query has the form: Answer includes all tuples t that make the formula p(t) be true. Formula is recursively defined, starting with simple atomic formulas (getting tuples from relations or making comparisons of values), and building bigger formulas using the logical connectives.
21
21 TRC Formulas Atomic formula: , or R.a op S.b, or R.a op constant op is one of Formula: an atomic formula, or , where p and q are formulas, or , where variable R is free in p(R), or , where variable R is free in p(R) The use of quantifiers and is said to bind R. A variable that is not bound is free.
22
22 Overview: Features of SQL Data definition language: used to create, destroy, and modify tables and views. Data manipulation language: used to pose queries, and to insert, delete, and modify rows. Triggers and advanced integrity constraints: used to specify actions that the DBMS will execute automatically. Embeddded SQL: allows SQL to be called from a host language, or Dynamic SQL : allows run-time creation and execution of queries.
23
23 Syntax of Basic SQL Query relation-list A list of relation names (possibly with a range-variable after each name). target-list A list of attributes of relations in relation-list qualification Comparisons (Attr op const or Attr1 op Attr2, where op is one of ) combined using AND, OR and NOT. DISTINCT is an optional keyword indicating that the answer should not contain duplicates. SELECT [DISTINCT] target-list FROM relation-list WHERE qualification
24
24 Sample Query and Conceptual Evaluation SELECT S.sname FROM Sailors S, Reserves R WHERE S.sid=R.sid AND R.bid=103
25
25 Aggregate Operators Significant extension of relational algebra COUNT (*) COUNT ( [ DISTINCT ] A) SUM ( [ DISTINCT ] A) AVG ( [ DISTINCT ] A) MAX (A) MIN (A) SELECT AVG (S.age) FROM Sailors S WHERE S.rating=10 SELECT COUNT (*) FROM Sailors S SELECT AVG ( DISTINCT S.age) FROM Sailors S WHERE S.rating=10 SELECT S.sname FROM Sailors S WHERE S.rating= ( SELECT MAX (S2.rating) FROM Sailors S2) single column SELECT COUNT ( DISTINCT S.rating) FROM Sailors S WHERE S.sname=‘Bob’
26
26 Queries With GROUP BY and HAVING The target-list contains (i) attribute names (ii) terms with aggregate operations (e.g., MIN ( S.age )). The attribute list (i) must be a subset of grouping-list. Intuitively, each answer tuple corresponds to a group, and these attributes must have a single value per group. (A group is a set of tuples that have the same value for all attributes in grouping-list.) SELECT [DISTINCT] target-list FROM relation-list WHERE qualification GROUP BY grouping-list HAVING group-qualification
27
27 Find the age of the youngest sailor with age 18, for each rating with at least 2 such sailors Only S.rating and S.age are mentioned in the SELECT, GROUP BY or HAVING clauses; other attributes ` unnecessary ’. 2nd column of result is unnamed. (Use AS to name it.) SELECT S.rating, MIN (S.age) FROM Sailors S WHERE S.age >= 18 GROUP BY S.rating HAVING COUNT (*) > 1 Answer relation
28
28 Summary SQL is more declarative than earlier, procedural query languages. is relationally complete; in fact, significantly more expressive power than relational algebra. In practice, users need to be aware of how queries are optimized and evaluated for best results. has any alternative ways to write a query; optimizer should look for most efficient evaluation plan. allows specification of rich integrity constraints. Provides triggers to respond to changes in the database.
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.