Database Design: DBS CB, 2nd Edition

Relational Model Overview: Ch. 2.1 – 2.5

Course Content
- Physical and logical database design
- Relational algebra
- Database programming: SQL, constraints and triggers, views and indexes
- SQL environment: embedded SQL, stored procedures, UDFs, CLI, JDBC
- High-level overview of SQL processing and the SQL compiler
- Not covered: concurrency control, SQL compiler details and optimization, security details, storage and buffer management, OLAP, XQuery, and SPARQL

Textbook: Database Systems: The Complete Book, Second Edition. Authors: Hector Garcia-Molina, Jeffrey Ullman, and Jennifer Widom. Publisher: Pearson – Prentice Hall. ISBN-13:

Database Evolution (timeline figure, 1960's to now). Labels in the figure: Hierarchical, Network, Relational, Object Bases, Knowledge Bases, XQuery, SPARQL.

What is an RDBMS?
- Manages very large amounts of data
- Supports efficient access to very large amounts of data
- Supports concurrent access to very large amounts of data
- Supports secure access to very large amounts of data
- Supports atomic (ACID) access to very large amounts of data

Interesting applications of RDBMS
It used to be about boring stuff like employee records. Today, most interesting applications are built on an RDBMS or have one behind them:
- Web search
- Data mining
- Scientific and medical databases
- Information integration
- Google search
- Queries at Amazon, eBay, etc.
- And more…

High-level Overview of RDBMS (1)
- Data Definition Language (DDL): like “type defs” in C; typically handled by the DBA. DDL statements manipulate the metadata to create or modify the schema.
- Data Manipulation Language (DML): used to manipulate the contents of existing tables (INSERT, UPDATE, DELETE).
- SQL processing: users typically interact with the RDBMS through either a query (to answer a question about the data) or a DML statement (to change the existing content of the database). A significant advantage of an RDBMS is that the user specifies the “what” and the query processor decides the “how.”
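The split between DDL, DML, and queries can be seen in a few statements. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. SQLite stands in for a full RDBMS here, and the sample rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema (metadata).
conn.execute("CREATE TABLE Beers (name CHAR(20), manf CHAR(20))")

# DML: change the contents of the existing table.
conn.execute("INSERT INTO Beers VALUES ('Bud', 'Anheuser-Busch')")
conn.execute("INSERT INTO Beers VALUES ('Winterbrew', 'Unknown')")
conn.execute("UPDATE Beers SET manf = 'Pete''s' WHERE name = 'Winterbrew'")
conn.execute("DELETE FROM Beers WHERE name = 'Bud'")

# Query: state what is wanted; the engine decides how to compute it.
rows = conn.execute("SELECT name, manf FROM Beers").fetchall()
print(rows)
```

Only the `CREATE TABLE` statement touches the schema; the other statements read or change tuples.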

A query processor consists of the following two main components:
- Query compiler: translates a SQL statement into an internal representation called a “query plan.” The query plan is a sequence of actions that will be executed by the execution engine. The compiler has three parts:
  - Query parser: builds a tree structure from the SQL text
  - Query preprocessor: performs semantic checks (for example, that the relations accessed by the query actually exist) and transforms the parse tree into a tree of algebraic operators representing the initial query plan
  - Query optimizer: transforms the initial query plan into the best available sequence of operations on the actual data
- Execution engine: executes the sequence of operations in the query plan.
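Many systems let you look at the plan the compiler produced. As a rough illustration, SQLite offers `EXPLAIN QUERY PLAN`. This is a SQLite-specific command, its internal pipeline is not identical to the architecture described above, and the exact wording of its output varies between versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Beers (name CHAR(20) PRIMARY KEY, manf CHAR(20))")

# Ask SQLite to describe its plan instead of running the query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT manf FROM Beers WHERE name = 'Bud'"
).fetchall()

for row in plan:
    # The last column of each row is a human-readable plan step.
    print(row[-1])
```

The query text says nothing about how to find the tuple; the plan shows what the engine chose.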

Transaction processing: queries and DML statements are grouped into transactions to provide the ACID properties:
- Concurrency control assures atomicity and isolation of transactions
- The logging and recovery manager ensures durability

Storage and buffer management: data in the RDBMS resides on secondary storage (disk). To do anything useful, the data must be brought into main memory (the buffer cache). The storage manager controls the placement of data on disk and its movement between disk and main memory.
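Atomicity means that the statements of a transaction take effect together or not at all. A minimal sketch with `sqlite3` follows. The `Accounts` table and its rows are hypothetical and not part of the course material.

```python
import sqlite3

# isolation_level=None: the sqlite3 module issues no implicit BEGIN,
# so the transaction boundaries below are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Accounts (owner TEXT PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO Accounts VALUES ('alice', 100.0), ('bob', 50.0)")

conn.execute("BEGIN")
conn.execute("UPDATE Accounts SET balance = balance - 30 WHERE owner = 'alice'")
conn.execute("UPDATE Accounts SET balance = balance + 30 WHERE owner = 'bob'")
# Simulate a failure before commit: both updates are undone together.
conn.execute("ROLLBACK")

balances = dict(conn.execute("SELECT owner, balance FROM Accounts"))
print(balances)
```

Replacing `ROLLBACK` with `COMMIT` would make both updates permanent instead.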

Overview of Data Models (1)
What is a data model:
- Structure of the data
- Operations allowed on the data
- Constraints on the data

Important data models:
- Relational model → RDBMS
- Semi-structured data model (XML document or tree) → XQuery
- Semantic web (RDF/triples or graph) → SPARQL

Relational data model in brief: a relation is a table. It is important to notice that this tabular representation is not necessarily the physical implementation of a relation.

Relation name: Beers
Attributes (column headers): name, manf
Tuples (rows):

name        manf
Winterbrew  Pete’s
Bud Lite    Anheuser-Busch

- Operations on the relational model form the “relational algebra”; they are table-oriented
- Constraints on the relational data model may limit the values allowed in any column

Semi-structured data model:
- Another data model, based on trees
- A flexible model for representing data
- Motivated by the sharing of documents among systems and databases

Semantic web data model:
- Another data model, based on graphs; nodes are objects and arcs are attribute names or “predicates/properties”
- A more flexible model, meant for the web
- Motivated by machine processing of information

Example of an XML Document:

<?xml version="1.0" encoding="utf-8" ?>
<BARS>
  <BAR><NAME>Joe’s Bar</NAME>
    <BEER><NAME>Bud</NAME> <PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME> <PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR> …
</BARS>

Callouts in the figure: “A NAME subobject” (a NAME element) and “A BEER subobject” (a BEER element).
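To see the tree structure of such a document, it can be walked with Python's standard `xml.etree.ElementTree` module. The sketch below includes only the Joe's Bar element; the second, elided BAR element is left out because its content is not given.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0" encoding="utf-8" ?>
<BARS>
  <BAR><NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
  </BAR>
</BARS>"""

root = ET.fromstring(doc.encode("utf-8"))

# Flatten the tree into (bar, beer) -> price pairs.
prices = {}
for bar in root.findall("BAR"):
    bar_name = bar.findtext("NAME")          # NAME child of the BAR element
    for beer in bar.findall("BEER"):         # each BEER subobject
        prices[(bar_name, beer.findtext("NAME"))] = float(beer.findtext("PRICE"))

print(prices)
```

The nesting (a BAR contains BEERs) is part of the data itself, which is what makes the model convenient for hierarchical information.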

Example of a Semantic Web Graph (figure). Node and arc labels in the figure: root, bar, beer, name, addr, manf, servedAt, award, year, prize. Values: Joe’s, Maple, Bud, A.B., M’lob, 1995, Gold. Callouts: “The bar object for Joe’s Bar”, “The beer object for Bud”, and “Notice a new kind of data.”

How do these models compare?
- The semi-structured data model is more flexible than the relational model for representing hierarchical data
- An RDBMS is still preferred for efficient access to, and modification of, large amounts of data
- The semantic web graph model is the most flexible and the least mature; many current implementations are based on an RDBMS engine. The current W3C specifications are limited to querying RDF stores; work is under way to extend them to include DML statements

Relational Model Basics (1)
- Relation: a table
- Attribute: a column name in a table
- Schema: the name of a relation and its associated attributes, for example Beers(name:string, manf:string). A relational database consists of one or more relations, defined by the database schema
- Tuples: the rows of a relation; one component per attribute
- Domains: each component of a tuple is atomic (integer, string, etc.) and cannot be a record value
- Equivalent representations of a relation: the order of attributes in a relation is not relevant

Relational Model Basics (2)
- Relation instance: the set of tuples that a relation holds at a given time
- Keys of relations: a key is a kind of constraint on a relation. A key is a set of attributes such that no two tuples of a relation instance agree on all of those attributes

Defining Schema in SQL (1)
Relations in SQL:
- Stored relations: traditional tables
- Views: relations defined by a computation and constructed when needed
- Materialized views: views that are stored persistently to speed up access
- Temporary tables: created as a side effect of executing queries and DML statements, then thrown away

Data types:
- Character strings of fixed length (CHAR(n)) or varying length (VARCHAR(n))
- Bit strings of fixed or varying length: similar to fixed and varying character strings, but their values are strings of bits rather than characters

- BOOLEAN: a logical value, one of TRUE, FALSE, or UNKNOWN
- INT or INTEGER: a typical integer
- FLOAT or REAL: a typical floating-point number
- DOUBLE PRECISION: a higher-precision floating-point number
- DECIMAL(n,d): real numbers with a fixed decimal point, n digits in total with d of them after the decimal point. Example is a possible value of type DECIMAL(6,2)
- DATE and TIME: essentially character strings of a special form
  - Example-1: DATE ‘ ’ complies with the format ‘yyyy-mm-dd’ and means September 30,
  - Example-2: TIME ‘15:30:02.5’ complies with the format ‘hh:mm:ss’ (with an optional fractional part) and means 2.5 seconds after 3:30 pm

Simple table declaration. The simplest form is:

CREATE TABLE <name> (
    title CHAR(100),
    year INT,
    ….
);

To delete a relation:

DROP TABLE <name>;

Modifying relation schemas:

ALTER TABLE <name> ADD phone CHAR(16);
ALTER TABLE <name> DROP phone;

Default values:

title CHAR(100) DEFAULT 'UNKNOWN'    (inside a column declaration)
ALTER TABLE <name> ADD phone CHAR(16) DEFAULT 'unlisted';

Declaring keys:
- A key is an attribute or a list of attributes
- Key types: PRIMARY KEY or UNIQUE
- Either declaration says that no two tuples of the relation may agree on all of the key attributes
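The effect of adding a column with a default can be checked with `sqlite3`. This is a sketch only: the `MovieStar` table and its rows are made up for illustration, and support for dropping a column differs between systems and versions, so only `ADD` is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MovieStar (name CHAR(30), address VARCHAR(255))")
conn.execute("INSERT INTO MovieStar VALUES ('Some Star', '123 Main St.')")

# Add a column; existing tuples receive the default value.
conn.execute("ALTER TABLE MovieStar ADD phone CHAR(16) DEFAULT 'unlisted'")

# New tuples that omit the column also receive the default.
conn.execute("INSERT INTO MovieStar (name) VALUES ('Another Star')")

rows = conn.execute(
    "SELECT name, address, phone FROM MovieStar ORDER BY name"
).fetchall()
print(rows)
```

Columns with no declared default, such as `address` for the second tuple, are filled with NULL (`None` in Python).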

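Place PRIMARY KEY or UNIQUE after the type in the declaration of the attribute:

CREATE TABLE Beers (
    name CHAR(20) UNIQUE,
    manf CHAR(20)
);

Alternatively, declare the key as a separate element of the table declaration:

CREATE TABLE Beers (
    name CHAR(20),
    manf CHAR(20),
    PRIMARY KEY (name)
);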

The bar and beer together are the key (a multi-attribute, or composite, key) for Sells:

CREATE TABLE Sells (
    bar CHAR(20),
    beer VARCHAR(20),
    price REAL,
    PRIMARY KEY (bar, beer)
);
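A short `sqlite3` sketch shows what the two-attribute key on Sells permits and what it rejects. The inserted prices are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Sells (
        bar   CHAR(20),
        beer  VARCHAR(20),
        price REAL,
        PRIMARY KEY (bar, beer)
    )
""")
conn.execute("INSERT INTO Sells VALUES ('Joe''s', 'Bud', 2.50)")
# Same bar, different beer: the (bar, beer) pair is new, so this is allowed.
conn.execute("INSERT INTO Sells VALUES ('Joe''s', 'Miller', 2.75)")

try:
    # Same (bar, beer) pair as the first tuple: violates the key.
    conn.execute("INSERT INTO Sells VALUES ('Joe''s', 'Bud', 3.00)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

count = conn.execute("SELECT COUNT(*) FROM Sells").fetchone()[0]
print(duplicate_rejected, count)
```

Neither bar nor beer alone is a key: a bar sells many beers, and a beer is sold at many bars.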

SQL: An Algebraic Query Language (1)
Why do we need a special query language?
- Relational algebra is useful because it is simpler than third-generation languages like C or Java

What is an algebra?
- It consists of operators (arithmetic and logical) and atomic operands (a variable such as x, or a constant)
- Any algebra allows us to build expressions such as “(x + y) * z” or “((x + 7)/(y − 3)) + x”

What is relational algebra?
- It is another algebra, whose atomic operands are:
  - Variables that stand for relations
  - Constants, which are finite relations

Overview of relational algebra:
- Set operations applied to relations: union (R ∪ S), intersection (R ∩ S), and difference (R − S). R and S must have schemas with identical attributes of the same types
- Operations that remove parts of a relation: “selection” eliminates some rows (tuples) and “projection” eliminates some columns
- Operations that combine the tuples of two relations, including join and Cartesian product
- Operations that rename attributes or relations

We shall call expressions of relational algebra queries.

Projection, ∏A1,A2,…,An(R): produces from relation R a new relation that has only some of R’s columns. ∏A1,A2,…,An(R) is a relation that has only the attributes A1,A2,…,An.
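The set operations and projection map directly onto Python sets of tuples. The sketch below models a relation as a set of tuples plus a separate attribute list; the helper `project` and the sample tuples are illustrative assumptions.

```python
# Two relations over the same schema (name, manf).
attrs = ("name", "manf")
R = {("Bud", "Anheuser-Busch"), ("Winterbrew", "Pete's")}
S = {("Bud", "Anheuser-Busch"), ("Bud Lite", "Anheuser-Busch")}

union = R | S          # R ∪ S
intersection = R & S   # R ∩ S
difference = R - S     # R − S

def project(relation, attrs, keep):
    """Keep only the listed attributes; duplicates vanish because the result is a set."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(t[i] for i in idx) for t in relation}

manfs = project(union, attrs, ["manf"])
print(union, intersection, difference, manfs, sep="\n")
```

The union has three tuples, but projecting it onto manf leaves only two, because set semantics removes duplicate rows.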

Selection, σC(R): produces from relation R a new relation containing a subset of R’s tuples. C is a conditional expression that is applied to every tuple t of R. If C is true for t, then t appears in the result; otherwise it does not. In SQL the condition goes in the WHERE clause, for example: SELECT * FROM <name> WHERE salary > 100;

Cartesian (cross) product, R × S: the set of all tuples formed by pairing each tuple of R with each tuple of S. If R has 2 tuples and S has 4 tuples, then R × S has 8 tuples. R × S = S × R, up to the order of the attributes.
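Selection and Cartesian product can be sketched in a few lines of Python over sets of tuples. The helper names and the `Emp` relation are hypothetical.

```python
from itertools import product

def select(relation, condition):
    """Selection: keep the tuples for which the condition is true."""
    return {t for t in relation if condition(t)}

def cross(R, S):
    """Cartesian product: concatenate every tuple of R with every tuple of S."""
    return {r + s for r, s in product(R, S)}

# Hypothetical relation Emp(name, salary).
Emp = {("ann", 120), ("bob", 90)}
high_paid = select(Emp, lambda t: t[1] > 100)

R = {(1,), (2,)}
S = {("a",), ("b",), ("c",), ("d",)}
RxS = cross(R, S)

print(high_paid)
print(len(RxS))
```

The product of a 2-tuple relation and a 4-tuple relation has 2 × 4 = 8 tuples, which is why joins are normally computed without materializing the full product.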

Natural join (R ⋈ S): pair only those tuples from R and S that agree on all attributes common to the schemas of R and S. Each result is called a joined tuple, with one component for each attribute in the union of the schemas of R and S.

Theta join (R ⋈C S): rather than pairing tuples on equality of their common attributes, it is often desirable to join on an arbitrary condition (theta), represented by C. The result can be constructed as follows:
1. Compute R × S
2. Select from the cross product only those tuples that satisfy condition C

SQL: An Algebraic Query Language (5)
Natural join (R ⋈ S) example:

Sells( bar, beer, price )
bar    beer    price
Joe’s  Bud     2.50
Joe’s  Miller  2.75
Sue’s  Bud     2.50
Sue’s  Coors   3.00

Bars( bar, addr )
bar    addr
Joe’s  Maple St.
Sue’s  River Rd.

BarInfo := Sells ⋈ Bars

BarInfo( bar, beer, price, addr )
bar    beer    price  addr
Joe’s  Bud     2.50   Maple St.
Joe’s  Miller  2.75   Maple St.
Sue’s  Bud     2.50   River Rd.
Sue’s  Coors   3.00   River Rd.

Theta join (R ⋈C S) example:

Sells( bar, beer, price )
bar    beer    price
Joe’s  Bud     2.50
Joe’s  Miller  2.75
Sue’s  Bud     2.50
Sue’s  Coors   3.00

Bars( name, addr )
name   addr
Joe’s  Maple St.
Sue’s  River Rd.

BarInfo := Sells ⋈Sells.bar = Bars.name Bars

BarInfo( bar, beer, price, name, addr )
bar    beer    price  name   addr
Joe’s  Bud     2.50   Joe’s  Maple St.
Joe’s  Miller  2.75   Joe’s  Maple St.
Sue’s  Bud     2.50   Sue’s  River Rd.
Sue’s  Coors   3.00   Sue’s  River Rd.
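Both join examples can be reproduced with a small Python sketch. Tuples are modeled as dictionaries keyed by attribute name, and relations as lists of such dictionaries. The helper functions are illustrative, and `theta_join` assumes the two relations have no attribute names in common.

```python
sells = [
    {"bar": "Joe's", "beer": "Bud", "price": 2.50},
    {"bar": "Joe's", "beer": "Miller", "price": 2.75},
    {"bar": "Sue's", "beer": "Bud", "price": 2.50},
    {"bar": "Sue's", "beer": "Coors", "price": 3.00},
]
bars = [
    {"bar": "Joe's", "addr": "Maple St."},
    {"bar": "Sue's", "addr": "River Rd."},
]

def natural_join(R, S):
    """Pair tuples that agree on all common attributes; keep one copy of each."""
    out = []
    for r in R:
        for s in S:
            common = r.keys() & s.keys()
            if all(r[a] == s[a] for a in common):
                out.append({**r, **s})
    return out

def theta_join(R, S, cond):
    """Cross product followed by a selection on cond (attribute names must be disjoint)."""
    return [{**r, **s} for r in R for s in S if cond(r, s)]

bar_info = natural_join(sells, bars)

# For the theta join, Bars uses the attribute `name` instead of `bar`.
bars_named = [{"name": b["bar"], "addr": b["addr"]} for b in bars]
bar_info_theta = theta_join(sells, bars_named, lambda r, s: r["bar"] == s["name"])

for row in bar_info:
    print(row)
```

The natural join yields four attributes per tuple, while the theta join keeps both `bar` and `name`, giving five.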

Precedence of relational operators, from highest to lowest:
1. [σ, π, ρ] (highest)
2. [×, ⋈]
3. ∩
4. [∪, −]

Combining operations to form queries: relational algebra allows us to form expressions of arbitrary complexity by applying operations to the results of other operations.

Example: what are the titles and years of movies made by Fox that are at least 100 minutes long?

SELECT title, year FROM Movies WHERE length >= 100 AND studioName = 'Fox';

Expression tree (root first, children indented):

∏ title, year
  ∩
    σ length ≥ 100
      Movies
    σ studioName = 'Fox'
      Movies

Linear notation: ∏ title,year (σ length≥100 (Movies) ∩ σ studioName='Fox' (Movies))
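The expression tree can be evaluated bottom-up: two selections, an intersection, then a projection. The Python sketch below uses hypothetical Movies rows of the form (title, year, length, studioName) and compares the result with a single selection on the combined condition.

```python
# Hypothetical rows: (title, year, length, studioName)
movies = {
    ("Star Wars", 1977, 124, "Fox"),
    ("Short Film", 1990, 80, "Fox"),
    ("Long Film", 1995, 130, "Disney"),
}

def select(rel, cond):
    return {t for t in rel if cond(t)}

def project(rel, idx):
    return {tuple(t[i] for i in idx) for t in rel}

# Leaves of the tree: two selections on Movies.
long_movies = select(movies, lambda t: t[2] >= 100)
fox_movies = select(movies, lambda t: t[3] == "Fox")

# Intersection, then projection onto (title, year).
answer = project(long_movies & fox_movies, (0, 1))

# Equivalent form: one selection with the conjunction of both conditions.
answer_single = project(
    select(movies, lambda t: t[2] >= 100 and t[3] == "Fox"), (0, 1)
)

print(answer)
```

Both forms give the same relation; choosing between such equivalent expressions is the job of the query optimizer.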

Naming and renaming, ρS(A1,A2,…,An)(R): renames relation R to S and renames its attributes to A1,A2,…,An, in that order.

Relationships among operations:
- R ∩ S = R − (R − S)
- R ⋈C S = σC(R × S)
- R ⋈ S = ∏L(σC(R × S)), where C equates each pair of attributes that R and S have in common, and L is the list of attributes of R followed by those attributes of S that are not also in R
- Example: U ⋈A<D AND U.B≠V.B V = σA<D AND U.B≠V.B (U × V)
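These identities can be checked on small instances. The Python sketch below uses made-up relations; U has schema (A, B, C) and V has schema (B, C, D), an assumption chosen to match the attribute names in the last identity.

```python
from itertools import product

# Identity 1: R ∩ S = R − (R − S)
R = {(1, "a"), (2, "b"), (3, "c")}
S = {(2, "b"), (3, "c"), (4, "d")}
lhs = R & S
rhs = R - (R - S)

# Identity 2: a theta join is a selection applied to the cross product.
U = {(1, 2, 3), (6, 7, 8), (9, 7, 8)}    # U(A, B, C)
V = {(2, 3, 4), (2, 3, 5), (7, 8, 10)}   # V(B, C, D)

def cond(u, v):
    # A < D AND U.B ≠ V.B
    return u[0] < v[2] and u[1] != v[0]

theta = {u + v for u, v in product(U, V) if cond(u, v)}

print(lhs == rhs)
print(theta)
```

Only one pair survives the condition: the U tuple (1, 2, 3) with the V tuple (7, 8, 10).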

Linear notation for algebraic expressions:

SELECT title, year FROM Movies WHERE length >= 100 AND studioName = 'Fox';

Assume the attributes of the Movies relation are abbreviated (t,y,l,i,s,p).

Expression tree (root first, children indented):

∏ title, year
  ∩
    σ length ≥ 100
      Movies
    σ studioName = 'Fox'
      Movies

The tree can then be written as a sequence of assignments:

R(t,y,l,i,s,p) := σ l≥100 (Movies)
S(t,y,l,i,s,p) := σ s='Fox' (Movies)
T(t,y,l,i,s,p) := R ∩ S
Answer(title,year) := ∏ t,y (T)

Constraints on Relations (1)
Relational algebra as a constraint language. There are two ways to use expressions of relational algebra to express constraints:
- If R is an expression, then R = ∅ is a constraint that says there are no tuples in the result of R
- If R and S are expressions of relational algebra, then R ⊆ S is a constraint that says every tuple in the result of R must also be in the result of S. Of course, S may contain additional tuples not in R

Referential integrity constraints: a referential integrity constraint asserts that a value appearing in one context also appears in another, related context:

∏A(R) − ∏B(S) = ∅, or equivalently ∏A(R) ⊆ ∏B(S)
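A referential integrity check is just the set difference of two projections. The Python sketch below tests whether every beer in Sells also appears in Beers. The data is hypothetical, and Coors is deliberately missing from Beers so that the constraint fails.

```python
def project(relation, attrs, keep):
    idx = [attrs.index(a) for a in keep]
    return {tuple(t[i] for i in idx) for t in relation}

sells_attrs = ("bar", "beer", "price")
Sells = {
    ("Joe's", "Bud", 2.50),
    ("Joe's", "Miller", 2.75),
    ("Sue's", "Coors", 3.00),
}

beers_attrs = ("name", "manf")
Beers = {
    ("Bud", "Anheuser-Busch"),
    ("Miller", "Miller Brewing"),  # hypothetical manufacturer value
}

# The constraint requires: project(Sells, beer) − project(Beers, name) = ∅
dangling = project(Sells, sells_attrs, ["beer"]) - project(Beers, beers_attrs, ["name"])
constraint_holds = (dangling == set())

print(dangling, constraint_holds)
```

The non-empty difference lists exactly the values that violate the constraint.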

Constraints on Relations (2)
Key constraints: relational algebra can express that an attribute or set of attributes is a key for a relation. Use two names, R and S, for the same relation; suppose the key attribute is name and address is any other attribute. Then:

σ R.name=S.name AND R.address≠S.address (R × S) = ∅

Additional constraints: assume the gender of a movie star must be 'M' or 'F' only:

σ gender≠'M' AND gender≠'F' (MovieStar) = ∅
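Both constraints say that a particular selection must return no tuples, so each can be checked by computing that selection. In the Python sketch below, the MovieStar tuples (name, address, gender) and the function names are made up for illustration.

```python
from itertools import product

# Hypothetical MovieStar(name, address, gender) instance.
MovieStar = {
    ("Star A", "123 Maple St.", "F"),
    ("Star B", "456 Oak Rd.", "M"),
}

def key_violations(rel):
    """Select pairs from rel × rel with the same name but different addresses."""
    return {(r, s) for r, s in product(rel, rel) if r[0] == s[0] and r[1] != s[1]}

def gender_violations(rel):
    """Select tuples whose gender is neither 'M' nor 'F'."""
    return {t for t in rel if t[2] != "M" and t[2] != "F"}

ok_key = key_violations(MovieStar) == set()
ok_gender = gender_violations(MovieStar) == set()

# Adding a second tuple for "Star A" with another address breaks the key.
bad = MovieStar | {("Star A", "999 Other Ave.", "F")}
bad_pairs = key_violations(bad)

print(ok_key, ok_gender, len(bad_pairs))
```

Each violating pair appears twice, once in each order, because the product pairs the relation with itself.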
