1 COP 4710 Databases Fall, 2000 Today’s Topic Review for Final Exam David A. Gaitros November 6, 2000 Department of Computer Science Copyright by Dr. Greg Riccardi
2 Outline of Course n Study of principles and techniques of databases n Grades assigned as in information sheet n Examples of use of databases n Programming projects in database design and implementation –Programming in Microsoft Access –Programming in Java with a Unix database –Development of a web site with database support n Course notes in n Next class, Chapter 2
3 Representation of Information n Data is collections of bits –physical database n Information is data with meaning –logical database n Representation of meta-data –database system is self-describing n Database Management System (DBMS) –define information content –construct database –manipulate by queries, reports and updates –data plus software
4 Vocabulary n Glossary of terms n Define the terms as used in this subject –Database literature is filled with terms n Example of terms –Data, bits –Information, bits with meaning (type) –Entity –Schema
5 Data Modeling n A data model is a specification of the information content of a system –conceptual data model describes information in terms the users will understand –logical data model describes information in a way that can be used to build a database –physical data model describes information in terms of its representation in physical storage
6 Schemas and Instances n Schema is the structure of a database –intention or meaning of the data –data models are schemas –table definitions are schemas –class definitions are schemas n Instances are the contents of a database –extension or values of the data –objects are instances –objects in a database are typically rows in a table
7 Levels of database schemas n Different schemas are presented to different users
8 Database Languages n DDL, data definition language, conceptual schema –describe conceptual schemas n SDL, storage definition language, internal schema –describe file structures, indexes n VDL, view definition language, external schema n DML, data manipulation language –High-level or non-procedural (e.g. SQL) select lastName from Roster where section = 2 –Low-level or procedural for r in Roster loop if r.section = 2 then result.Add(r.lastName); end if; end loop;
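The contrast between the two DML styles can be sketched in Java. This is only an illustration under stated assumptions: the `Student` class, its fields, and the sample rows are hypothetical stand-ins for the Roster table, and a stream pipeline is used as a rough analogue of a non-procedural query, not as a claim about how a DBMS evaluates SQL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

final class RosterQuery {
    /** One row of a hypothetical Roster table. */
    static final class Student {
        final String lastName;
        final int section;
        Student(String lastName, int section) {
            this.lastName = lastName;
            this.section = section;
        }
    }

    /** Procedural style: visit every row and test it explicitly. */
    static List<String> lastNamesLoop(List<Student> roster, int section) {
        List<String> result = new ArrayList<>();
        for (Student r : roster) {
            if (r.section == section) {
                result.add(r.lastName);
            }
        }
        return result;
    }

    /** Declarative style: state the condition and the wanted column. */
    static List<String> lastNamesStream(List<Student> roster, int section) {
        return roster.stream()
                .filter(r -> r.section == section)
                .map(r -> r.lastName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Student> roster = List.of(
                new Student("Adams", 1),
                new Student("Baker", 2),
                new Student("Chen", 2));
        System.out.println(lastNamesLoop(roster, 2));
        System.out.println(lastNamesStream(roster, 2));
    }
}
```

Both methods return the same rows; the difference is whether the iteration is spelled out by the programmer or left to the library.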
9 Principles of ER Modeling n Entities and classes –Entity, a thing in the real world –Entity Class, the structure of a collection of similar entities n Attributes –Attribute, a property of an entity –Each entity has a value for each of its attributes n Types of attributes –simple vs. composite, single-valued vs. multi-valued, stored vs. derived –domains of attributes
10 Relationships Between Entities n Relationship type defines a set of associations among given types. n Relationship Instances are particular relationships among objects. n Examples of relationship types in company database –Manages: 1:1 between employee and department –Works-for: 1:N between department and employee –Controls: 1:N between department and project
11 Find the Entities, Attributes and Relationships
12 ER schema diagram for BigHit Video
13 Chapter 4 The Relational Data Model n A Relation is a two-dimensional table –Fixed list of columns –One object per row n An attribute represents a single column of a table and has a name and a type n A relation schema is the name and the list of attributes of a relation –Grade (studentId, assignmentId, points, dateSubmitted) n A tuple is a row of a table, one value for each attribute –(123, 14, 27, 5/28/98)
14 Characteristics of Relational Model n Relation is a set of tuples –No ordering of tuples –No duplicate tuples: no two rows have all the same values n Each attribute value is atomic –hence no multiple-valued or composite attributes –called first normal form n Each relation is a set of assertions –Each represents a fact –Some facts are about relationships n That’s it! –no other data structures –no explicit representation of relationships
15 Representing E-R Model as Relations n Entity class → relation schema n Entity → row of table –set of all entities of class → table n Attribute → column definition (attribute) –attribute value → table element n Relationship type → –relation schema, or –attribute(s) of relation schema
16 Rules for Relationship Types n One-to-many –For each one-to-many relationship type R between subject class S and target class T, add the key attributes of class S to class T as foreign keys. Name the attributes using the role that S plays in relationship type R. –Add the attributes of the relationship type R to class T. n One-to-one –choose one side and use above rule n Examples in class
17 Many-to-many relationship types n Create a relation schema for the relationship type –foreign key attributes for the key of the related schema –add attributes of the relationship type n Examples in class!
18 Representing relationships as attributes n One-to-many –For each one-to-many relationship type R between subject class S (one side) and target class T (many side), –add the key attributes of S to the schema of T as foreign keys. –Name the foreign key attributes using the role that S plays in relationship type R. –Add the attributes of the relationship type R to the schema for T. n One-to-one –choose one side and use above rule
19 Representing Weak Entity Classes n Create a relation schema –Add foreign key for each defining relationship type –Key is partial key plus defining foreign keys n Consider Fig. 2.5, weak class Rental n Schema: Rental (videoId,dateDue, dateRented, cost) –key videoId (foreign key)
20 Representing specialization hierarchies n Three possibilities –1. Create a table for the superclass with its attributes and a table for each subclass with its attributes –2. Create a table for the superclass with all of the subclass attributes –3. Create a table for each subclass that includes both subclass and superclass attributes
21 Functional Dependencies and Normalization n Begin by discussing good and bad relation schemas n Informal measures of the quality of relation schema design –Semantics of the attributes –Reducing the redundant values in tuples –Reducing the null values in tuples –Disallowing spurious tuples n Define Normal Forms as formal measures of the quality of schemas –restrictions on the form of relation schemas
22 Update Anomalies n Insertion Anomalies –When inserting a new owner, we must correctly insert the Manuf field, or will create inconsistencies –Cannot create a car without an owner –Cannot create a make without a car and an owner n Deletion Anomalies –Deletion of owner of a car also deletes make and manufacturer of car –Deletion of owner of the last Plymouth deletes relationship between Plymouth and Chrysler n Modification Anomalies –Changing the make of a car requires consistency check –Cannot change so that a Plymouth is made by Ford n Guideline 2: no insertion, deletion, or modification anomalies allowed!
23 Some definitions n superkey: a set of attributes of a relation whose values are unique within the relation. n key, a superkey in which removal of any attribute makes it not a superkey. If there is more than one key, they are called candidate keys. n primary key, arbitrarily designated candidate key, all other candidate keys are secondary keys. n prime attribute, one which is a member of at least one candidate key. n nonprime attribute, one which is not prime.
24 Definition of Functional Dependency n A functional dependency is a constraint between 2 sets of attributes from the database –For each value of the first set there is a unique value of the second set n X-->Y restricts the tuples that can be instances of R n if t1 and t2 are instances of R –t1(X) = t2(X) then t1(Y) = t2(Y) n For example, –{DLNum} --> {Oname} –{CarId} --> {Make, Manuf} –{Make} --> {Manuf} n Candidate keys are left hand sides of functional dependencies
25 Second Normal Form (2NF) n X-->Y is a full functional dependency if the removal of any attribute A from X removes the dependency –not X-{A} --> Y n X-->Y is a partial dependency if some attribute A may be removed without removing the dependency –X-{A} --> Y n A relation schema R is in 2NF if every nonprime attribute is fully functionally dependent on the primary key of R
26 Putting the CarReg Schema into 2NF n Consider the Owner relation schema –{DLNum} is the primary key –Hence Owner is in 2NF n Consider the Car relation schema –{CarId, DLNum} is primary key (multiple owners) –{CarId} --> {Make, Model,...} –Hence Car is not 2NF n Create new relations –CarOwner = {CarId, Owner, PurchDate, TagNum, RegisDate} –Car = {CarId, Make, Model, Manuf, Year, Color} n Is it 2NF?
27 Rules for Functional Dependencies n Given a particular set of functional dependencies, we can find others using inference rules –Splitting/combining rules A -> B1 B2 <=> A -> B1 and A -> B2 –Trivial rules A B -> B, for all A, B –Transitive rule A -> B and B -> C => A -> C n We are interested in the closure of the set of functional dependencies under these (and other) rules
28 Inference Rules for Functional Dependency n There are semantically obvious functional dependencies, usually specified by schema designer n Other functional dependencies can be inferred from those n Inference rules –Reflexive, X includes Y, X-->Y –Augmentation, X-->Y then XZ-->YZ –Transitive, X-->Y-->Z then X-->Z –Decomposition, X-->YZ then X-->Y –Union, X-->Y and X-->Z then X-->YZ –Pseudotransitive, X-->Y and WY-->Z then WX-->Z
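One common way to apply these inference rules mechanically is to compute the closure of a set of attributes: keep adding the right-hand side of any dependency whose left-hand side is already covered. The sketch below is a minimal illustration, not the course's own code. The `Fd` class is hypothetical, and the dependency set is a simplified version of the CarReg example in which {CarId} --> {Make} is given and {CarId} --> {Manuf} has to be inferred through {Make} --> {Manuf}.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class FdClosure {
    /** A functional dependency lhs --> rhs over attribute names. */
    static final class Fd {
        final Set<String> lhs;
        final Set<String> rhs;
        Fd(Set<String> lhs, Set<String> rhs) {
            this.lhs = lhs;
            this.rhs = rhs;
        }
    }

    /** Returns every attribute functionally determined by attrs. */
    static Set<String> closure(Set<String> attrs, List<Fd> fds) {
        Set<String> result = new HashSet<>(attrs);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Fd fd : fds) {
                // If the left side is already determined, so is the right side.
                if (result.containsAll(fd.lhs) && result.addAll(fd.rhs)) {
                    changed = true;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Fd> fds = List.of(
                new Fd(Set.of("DLNum"), Set.of("Oname")),
                new Fd(Set.of("CarId"), Set.of("Make")),
                new Fd(Set.of("Make"), Set.of("Manuf")));
        // Contains CarId, Make and Manuf (set order is unspecified).
        System.out.println(closure(Set.of("CarId"), fds));
    }
}
```

A set of attributes is a superkey exactly when its closure contains every attribute of the relation, so the same routine can be used to test candidate keys.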
29 Definition of Key n A set of one or more attributes {A1,...Ak} is a key for a relation R –Those attributes functionally determine all other attributes of R no 2 distinct tuples can agree on the key –no proper subset of {A1,... Ak} is a key of R a key must be minimal n There can be more than one key in a relation –Department (DeptName, DeptNo,...) since both are unique, both are keys n A superkey (superset of a key) is a set of attributes that functionally determine all other attributes of the relation.
30 Third Normal Form (3NF) n Based on transitive dependency, or non-key dependency n A functional dependency X-->Y is a transitive dependency if there is a set Z which is not a subset of any key, and for which X-->Z and Z-->Y n A relation schema is in 3NF if there is no nonprime attribute which is functionally dependent on a non-key set of attributes. n Example of {make}-->{manuf} violates 3NF since make is not a key.
31 Section 6.1 Relational Algebra n Look at the formal basis for operations on the relational data model n An “algebra” is a collection of operations on some domain n Relational Algebra is a collection of operators –operands and results are relations –operators: projection and selection remove parts of a relation; set operators are union, intersection and difference; joins and products combine the tuples of two relations –other operators follow
32 Join Operations n Natural join is based on the cartesian product –With a restriction on the tuples and attributes: each common attribute appears once in the result, and tuples are included only where the common attributes have the same values –R join S on A has those tuples of R × S where R.A = S.A –Each tuple from R is joined to all tuples of S that have the same value for attribute A n Example –Every combination of Customer and Rental where the accountId fields match
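The definition of natural join can be sketched directly as a filtered cartesian product. In this illustration tuples are modeled as maps from attribute name to value, which is an assumption made for brevity; the sample Customer and Rental rows are invented.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class NaturalJoin {
    /** Joins every pair of tuples that agree on all attributes they share. */
    static List<Map<String, String>> join(List<Map<String, String>> r,
                                          List<Map<String, String>> s) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> rt : r) {
            for (Map<String, String> st : s) {
                // Attributes that appear in both tuples.
                Set<String> common = new HashSet<>(rt.keySet());
                common.retainAll(st.keySet());
                boolean match = true;
                for (String a : common) {
                    if (!rt.get(a).equals(st.get(a))) {
                        match = false;
                        break;
                    }
                }
                if (match) {
                    // Each common attribute appears once in the joined tuple.
                    Map<String, String> joined = new LinkedHashMap<>(rt);
                    joined.putAll(st);
                    result.add(joined);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> customer = List.of(
                Map.of("accountId", "101", "lastName", "Block"),
                Map.of("accountId", "102", "lastName", "Hamilton"));
        List<Map<String, String>> rental = List.of(
                Map.of("accountId", "101", "videoId", "90987"),
                Map.of("accountId", "101", "videoId", "99787"),
                Map.of("accountId", "103", "videoId", "145"));
        System.out.println(join(customer, rental));
    }
}
```

When the two relations share no attribute, every pair matches and the method returns the full cartesian product, which agrees with the definition above.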
33 Combining Operations to Form Queries n Can put all operations together –Names and grades of students who took quiz 1 n We’ll see how this works in Access n In class, time permitting –Demonstration of Queries in Access
34 Relational Expressions Select account 113, project videoId and dateDue –π_{videoId, dateDue}(σ_{accountId=113}(Rental)) VideoId, title and date due for account 113 –π_{videoId, title, dateDue}((σ_{accountId=113}(Rental)) ⋈_{videoId} Videotape ⋈_{movieId} Movie) –π_{videoId, title, dateDue}(σ_{accountId=113}(Rental ⋈_{videoId} Videotape ⋈_{movieId} Movie)) What is the order of evaluation?
35 Chapter 7: SQL n Structured Query Language –ANSI and ISO standard –SQL2 or SQL-92 is current standard n SQL is a data manipulation language (DML) and a data definition language (DDL) and a programming language n We can use SQL for –Logical database specification (database schema definitions) –Physical database specifications (indexes, etc.) –Querying database contents –Modifying database contents
36 Relational Operations in SQL n Select statement –select from where n Projection in SQL using select clause –Select title from Movies n Selection in SQL using where clause –select * from Customer where lastName = 'Doe' –select distinct lastName, firstName from Customer no duplicates with distinct
37 Products and Joins in SQL n Cartesian product in SQL using from clause –Select * from Employee, TimeCard n Join using from and where clauses –Select * from Employee, TimeCard where Employee.ssn = TimeCard.ssn n Join using join and on (part of SQL-92, but not supported by every DBMS) –Select * from Employee join TimeCard on Employee.ssn = TimeCard.ssn
38 Nested Queries n Nested select query –Select videoId, dateAcquired from Videotape where videoId = ( select videoId from Rental where dateRented='1/1/99') n compare with –Select v.videoId, dateAcquired from Videotape v, Rental r where v.videoId = r.videoId and dateRented='1/1/99' n Same result?
39 Select Using Group by and Having n Group by forms groups of rows with the same column values n What is the average hourly rate by store? –select storeId, avg(hourlyRate) from HourlyEmployee e, WorksAt w where e.ssn = w.ssn group by storeId n How many employees work at each store? –select s.storeId, name, count (*) from Store s, WorksAt w where s.storeId = w.storeId group by s.storeId, name n Having filters the groups –having count (*)>2
40 Substrings, arithmetic and order n Find a movie with 'Lion' in the title –select title from Movie where title like '%Lion%' n List the monthly salaries of salaried employees who work in store 3 –select salary/12 from Employees e, WorksAt w where e.ssn=w.ssn and storeId=3 n Give the list of employees in store 3, ordered by salary –select firstName, lastName from Employees e, WorksAt w where e.ssn=w.ssn and storeId=3 order by salary
41 Modifying Content with SQL n Insert queries –insert into Customer values (555, 'Yu', 'Jia','540 Magnolia Hall','Tallahassee', 'FL', '32306') –insert into Customer (firstName, lastName, accountId) values ('Jia', 'Yu', 555) n Update queries –update TimeCard set paid = true where paid = false –update HourlyEmployee set hourlyRate = hourlyRate *1.1 where ssn = ' ' n Samples in Access
42 Creating Pay Statements with SQL n Find the number of hours worked for each employee entry –select TimeCard.ssn, sum((endTime - startTime)*24) as hoursWorked from TimeCard where paid=false group by ssn n Create the Pay Statement entries for each Employee –select ssn, hourlyRate, hoursWorked, hoursWorked * hourlyRate as amountPaid, today from … n Insert into the PayStatement table –Insert into PayStatement select … n Look at the Access example in BigHit.mdb
43 Create Table Statement n create table Customer ( accountId int, lastName varchar(32), firstName varchar(32), street varchar(100), city varchar(32), state char(2), zipcode varchar(9) ) n Note that SQL has specific types
44 Key Constraints in SQL n Key declarations are part of create table –create table Store ( storeId int primary key, –create table Movie ( movieId varchar(10) primary key, –create table Rental ( accountId int, videoId varchar(10), primary key (accountId, videoId)
45 Java Objects and variables n Objects are dynamically allocated –Figures A.1 and A.2 show String variables Assignment (=) and equality (==)
46 Java DB Connectivity (JDBC) n Figure 8.4 Strategies for implementing JDBC packages
47 Executing Insert and Update Statements n Create new customer, using String + int rowcount = stmt.executeUpdate( "insert into Customer " +"(accountId,lastName,firstName) " +"values (1239,'Brown','Mary')"); if (rowcount == 0) // insert failed n Update –String updateSQL = "update TimeCard set " +"TimeCard.paid = 'yes' where " +"paid<>'yes'"; int count = stmt.executeUpdate(updateSQL); // count is number of rows affected
48 Chapter 13 Query Processing n Strategies for processing queries n Query optimization n First: How to represent relational DB? –Each table is a file Record structure to store tuples File is a random access collection of records –Query is executed by reading records from files Read record, create object in memory Process object Write result as a file of records or keep in memory
49 Processing a range query n Figure 13.3 Illustration of query processing for query –select * from Customer where accountId >= 101 and accountId < 300
50 Using hashing to eliminate duplicates n A hash function partitions values so that –All values that are the same are in the same partition –Values that are different are often in different partitions n We can find duplicates by hashing –For each tuple in the table Mash all attribute values in the tuple into a single value Apply hash function –For each partition Compare all pairs of tuples Eliminate duplicates –Why does this work?
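A small in-memory sketch of hash-based duplicate elimination follows. It is an illustration under assumptions: tuples are modeled as lists of strings, `List.hashCode()` plays the role of combining all attribute values into a single value, and the partition count is arbitrary. Unlike the slide's two-pass description, it compares each tuple against its partition as it arrives, which gives the same result.

```java
import java.util.ArrayList;
import java.util.List;

final class HashDedup {
    /** Returns the distinct tuples, grouped by hash partition. */
    static List<List<String>> distinct(List<List<String>> tuples, int partitions) {
        List<List<List<String>>> buckets = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            buckets.add(new ArrayList<>());
        }
        for (List<String> t : tuples) {
            // Equal tuples have equal hash codes, so they share a partition.
            int p = Math.floorMod(t.hashCode(), partitions);
            List<List<String>> bucket = buckets.get(p);
            boolean seen = false;
            // Only tuples in the same partition need to be compared.
            for (List<String> other : bucket) {
                if (other.equals(t)) {
                    seen = true;
                    break;
                }
            }
            if (!seen) {
                bucket.add(t);
            }
        }
        List<List<String>> result = new ArrayList<>();
        for (List<List<String>> bucket : buckets) {
            result.addAll(bucket);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(
                List.of("Doe", "Jane"),
                List.of("Doe", "John"),
                List.of("Doe", "Jane"));
        System.out.println(distinct(rows, 4).size()); // prints 2
    }
}
```

This works because duplicates can never land in different partitions, so no comparison across partitions is needed.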
51 Processing join queries with indexes n Indexed nested loop join while (!customer.eof()) { Customer c = customer.read(); rental.reset(); while (!rental.eof()) { Rental r[] = rental.readByAcctId(c.accountId); for (int i=0; i<r.length; i++) { result.write(c,r[i]); }}} Cost is B_c + R_r instead of B_c + R_c × B_r without index n Reduce cost by processing a block at a time?
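The idea of an indexed nested loop join can be sketched in memory, with a `HashMap` standing in for the index on Rental.accountId. The `Customer` and `Rental` classes and the sample data are hypothetical, and a real DBMS would read blocks from files rather than lists, so this shows only the access pattern: one index lookup per customer instead of a scan of all rentals.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class IndexedJoin {
    static final class Customer {
        final int accountId;
        final String lastName;
        Customer(int accountId, String lastName) {
            this.accountId = accountId;
            this.lastName = lastName;
        }
    }

    static final class Rental {
        final int accountId;
        final int videoId;
        Rental(int accountId, int videoId) {
            this.accountId = accountId;
            this.videoId = videoId;
        }
    }

    /** Builds an in-memory index from accountId to that account's rentals. */
    static Map<Integer, List<Rental>> buildIndex(List<Rental> rentals) {
        Map<Integer, List<Rental>> index = new HashMap<>();
        for (Rental r : rentals) {
            index.computeIfAbsent(r.accountId, k -> new ArrayList<>()).add(r);
        }
        return index;
    }

    /** For each customer, fetches only the matching rentals through the index. */
    static List<String> join(List<Customer> customers, Map<Integer, List<Rental>> index) {
        List<String> result = new ArrayList<>();
        for (Customer c : customers) {
            for (Rental r : index.getOrDefault(c.accountId, List.of())) {
                result.add(c.lastName + " rented " + r.videoId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Customer> customers = List.of(
                new Customer(101, "Block"), new Customer(102, "Hamilton"));
        List<Rental> rentals = List.of(
                new Rental(101, 90987), new Rental(101, 99787), new Rental(103, 145));
        System.out.println(join(customers, buildIndex(rentals)));
    }
}
```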
52 ACID Transactions n Atomicity: the property of a transaction that all of the updates are successful, or there is no update at all. n Consistency: each transaction should leave the database in a consistent state. Properties such as referential integrity must be preserved. n Isolation: each transaction when executed concurrently with other transactions should have the same effect as if it had been executed by itself. n Durability: once a transaction has completed successfully, its changes to the database should be permanent. Even serious failures should not affect the permanence of a transaction.
53 Example of transaction open transaction videoId video1 = select id of a copy of "Star Wars" if (video1 == null) rollback transaction insert row into Reservation for video1 videoId video2 = select id of a copy of "Return of the Jedi" if (video2 == null) rollback transaction insert row into Reservation for video2 videoId video3 = select id of a copy of "The Empire Strikes Back" if (video3 == null) rollback transaction insert row into Reservation for video3 commit transaction
54 Transaction isolation n Consider these transactions –Actions of T1 A: balance1 = (select balance from Customer where accountId = 101); balance1 += 5.00; B: update Customer set balance = ?balance1 where accountId = 101; –Actions of T2 A: balance2 = (select balance from Customer where accountId = 101); balance2 += 10.00; B: update Customer set balance = ?balance2 where accountId = 101; n Problems –Lost update: T1.A, T2.A, T1.B, T2.B –Dirty read: T1.A, T1.B, T2.A, T1 rollback, T2.B, and T2 commit –Incorrect Summary: example in class
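The lost update schedule can be traced step by step in plain Java. This is a deterministic simulation with a local variable standing in for the stored balance, not real concurrent database code; the starting balance of 100.00 is an assumed value.

```java
final class LostUpdate {
    /** Runs the schedule T1.A, T2.A, T1.B, T2.B and returns the final balance. */
    static double interleaved(double start) {
        double balance = start;      // the stored Customer.balance
        double balance1 = balance;   // T1.A: read the balance
        double balance2 = balance;   // T2.A: read the same old balance
        balance1 += 5.00;
        balance2 += 10.00;
        balance = balance1;          // T1.B: write
        balance = balance2;          // T2.B: overwrites T1's update
        return balance;
    }

    /** Runs T1 completely, then T2 completely. */
    static double serial(double start) {
        double balance = start;
        double balance1 = balance;   // T1.A
        balance1 += 5.00;
        balance = balance1;          // T1.B
        double balance2 = balance;   // T2.A sees T1's update
        balance2 += 10.00;
        balance = balance2;          // T2.B
        return balance;
    }

    public static void main(String[] args) {
        System.out.println(interleaved(100.00)); // prints 110.0
        System.out.println(serial(100.00));      // prints 115.0
    }
}
```

The interleaved schedule ends 5.00 short of the serial result because T2 wrote a value computed from a balance that T1 had already replaced.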
55 Locking database objects n Allow transaction operations to lock objects –Read (shared) locks –Write (exclusive) locks n Lock granularity –What size object to lock? –Table, row, field, column n Effect on concurrency –T1: Select sum(balance) from Customers –T2: Update Customers set firstName='Joe' where accountId=101 n Effect on size and cost –Smaller objects = more locks
56 Two phase locking (2PL) n Locks granted and released in two phases –Growing phase Request and upgrade locks –Request read on X –Request write on X –Shrinking phase Release and downgrade locks –Request read on X (downgrade from write) –Release read on X n 2PL guarantees serializability –Any conflicting operation is blocked
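The two-phase rule itself is easy to check for a single transaction: once any lock has been released, no new lock may be requested. The sketch below assumes a made-up encoding of operations as "lock X" and "unlock X" strings and ignores upgrades and downgrades, so it covers only the basic growing and shrinking phases.

```java
import java.util.List;

final class TwoPhaseCheck {
    /**
     * Checks one transaction's lock operations, given in order as
     * "lock X" or "unlock X", against the two-phase rule.
     */
    static boolean isTwoPhase(List<String> ops) {
        boolean shrinking = false;
        for (String op : ops) {
            if (op.startsWith("unlock")) {
                // The first release starts the shrinking phase.
                shrinking = true;
            } else if (op.startsWith("lock") && shrinking) {
                // Acquiring a lock after a release violates 2PL.
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isTwoPhase(
                List.of("lock X", "lock Y", "unlock X", "unlock Y"))); // prints true
        System.out.println(isTwoPhase(
                List.of("lock X", "unlock X", "lock Y", "unlock Y"))); // prints false
    }
}
```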
57 Transaction problems n Lost update –Two transactions update, last one persists n Dirty read –One transaction reads a value written by a transaction that subsequently rolls back n Incorrect summary –One transaction calculates an aggregate while another is updating n Unrepeatable read –One transaction reads the same object twice and receives two different values n Phantom read –A transaction repeats a query and finds rows that were inserted by another transaction in the meantime n Deadlock –Two transactions each hold a lock that the other requests
58 Transactions in SQL n Transaction management statements –set transaction read only; –set transaction read write; –set transaction isolation level serializable; –commit transaction; –rollback transaction; n Executing SQL statement without opening transaction –autocommit mode
59 Causes of Failure, Possibilities of Recovery n Database server –computer crashes –server program crashes –disk drive corruption n Client failure –computer crashes –client program crashes n Network failure –connection fails, often temporary n Transaction failure –executes rollback (voluntary) –executes illegal operation (server created) –deadlock –introduces errors into the database
60 Recovery from failure n Primary technique, restart from consistent backup/checkpoint n Reprocessing –ask all committed transactions to execute again n Roll Forward –Back to consistent backup state –Apply redo transaction log n Roll Back –Remove the effect of each transaction with undo log –Can be used to cancel the effects of rogue transactions
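Roll forward can be sketched as replaying a redo log over a backup copy. Everything here is a simplifying assumption: the database is a map from item name to integer value, each hypothetical `LogEntry` records the new value written by a transaction, and only entries of committed transactions are reapplied.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class RollForward {
    /** One redo record: transaction txId set key to newValue. */
    static final class LogEntry {
        final int txId;
        final String key;
        final int newValue;
        LogEntry(int txId, String key, int newValue) {
            this.txId = txId;
            this.key = key;
            this.newValue = newValue;
        }
    }

    /** Starts from the backup and reapplies the writes of committed transactions. */
    static Map<String, Integer> recover(Map<String, Integer> backup,
                                        List<LogEntry> redoLog,
                                        Set<Integer> committed) {
        Map<String, Integer> db = new HashMap<>(backup);
        for (LogEntry e : redoLog) {
            if (committed.contains(e.txId)) {
                db.put(e.key, e.newValue);
            }
        }
        return db;
    }

    public static void main(String[] args) {
        Map<String, Integer> backup = Map.of("A", 100, "B", 50);
        List<LogEntry> log = List.of(
                new LogEntry(1, "A", 105),
                new LogEntry(2, "B", 60));
        // Transaction 2 never committed, so B keeps its backup value.
        System.out.println(recover(backup, log, Set.of(1)));
    }
}
```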
61 Security in Relational Database Systems n Account security for validation of users –Database accounts –Operating system accounts n SQL statements for security –create user –alter user –create profile –create role –grant privileges to users, roles
62 Stored Procedures n Define numberRented function –create function numberRented (accId int) return int as select count(*) from Rental where Rental.accountId = accId; n Define checkIn procedure –create procedure checkIn (vidId int, cost double) as begin insert into PreviousRental … n Grant privileges to procedures –grant update on PreviousRental to checkIn –grant checkIn to clerk –revoke update on PreviousRental from public n User in the clerk role can update the table, no one else can
63 Distributed Database Systems [Figure: three sites, each with its own database, a DDBMS, two application programs (AP 1 and AP 2, AP 1 and AP 2, AP 2 and AP 3), OS net and OS dm] n OS net = Network communications portion of operating system n OS dm = Data management portion of operating system n DDBMS = Distributed database management system
64 Distributed Databases n Single schema with multiple servers –Not one application connecting to multiple servers –An application connects to a single server n Fragmentation of tables –Horizontal, rows in different servers –Vertical, columns in different servers –Replicated, some rows or columns in multiple servers n Distributed Transactions –Two phase commit –Discussion in class