Principles of Database Management Systems CSE 544

Principles of Database Management Systems CSE 544 Introduction March 31st, 1999

Staff
Instructor: Alon Levy, Sieg, Room 310, alon@cs.washington.edu. Office hours: Wednesdays, 2:30-3:30, or by email.
TAs: Zack Ives and Rachel Pottinger. Office hours: Zack, Mondays at noon (224); Rachel, Thursdays at 2:30pm (224).
Mailing list: cse544@cs
Web page (a lot of stuff already there): http://www.cs.washington.edu/education/courses/544/99sp/

Course Times
In general, WF, 12:00-1:20pm (with a 5-minute breather in the middle).
Two special dates: Monday, April 5th, and Monday, April 19th.
No classes in the last week.

Goals of the Course
Purpose:
- Foundations of database management systems.
- Issues in building database systems.
- Introduction to current research issues in databases.
Have fun: databases are not just bunches of tuples.

Grading
- Homeworks: 15% (SQL querying fun, join implementations)
- Project: 25% (a query optimization engine for data integration)
- Midterm: 15%
- Final: 35%
- Participation and intangibles: 10%

Textbook Database System Implementation, Ullman, Widom, and Garcia-Molina, to be published by Prentice-Hall in June; available from the copy center.

Other Useful Texts
- Database Management Systems (Ramakrishnan)
- Foundations of Databases (Abiteboul, Hull & Vianu)
- Parallel and Distributed DBMS (Ozsu and Valduriez)
- Transaction Processing (Gray and Reuter)
- Database Systems (Silberschatz, Korth and Sudarshan)
- Data and Knowledge based Systems, volumes I and II (Ullman)
- Readings in Database Systems (Stonebraker and Hellerstein)
- Proceedings of the SIGMOD, VLDB, and PODS conferences

Prerequisites

Real Prerequisites
- Operating systems
- Data structures and algorithms
- Distributed systems
- Complexity theory
- Mathematical logic
- Knowledge representation
- User interface design
- Programming languages
- Artificial intelligence (search)
- Greek, Hebrew, French

Why Use a DBMS?
All programs manipulate data, so why use a database?
- Large amounts of data (gigabytes, terabytes)
- Data is very structured
- Persistent data
- Valuable data
- Performance requirements
- Concurrent access to the data
- Restricted access to data

Functionality of a DBMS
- Persistent storage management
- Transaction management
- Resiliency: recovery from crashes
- Separation between logical and physical views of the data
- High-level query and data manipulation language
- Efficient query processing
- Interface with programming languages

Persistent Storage
Persistent storage becomes a hard problem because of its interaction with the other levels of the DBMS:
- What are we storing?
- Efficient indexing
- Special issues due to resiliency requirements
- Exploiting “semantic” knowledge
Issue: interaction with the operating system. Should we rely on the OS?

Transaction Processing and Recovery
For efficient use of resources, we want concurrent access to data. Systems sometimes crash.
A “real” database guarantees ACID:
- Atomicity: a transaction happens completely or not at all.
- Consistency: always leave the DB consistent.
- Isolation: every transaction runs as if it’s the only one in the system.
- Durability: if committed, we really mean it.
Do we really want ACID?
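Atomicity is easiest to see when a failure strikes between two updates that belong together. The sketch below uses Python's built-in sqlite3 module as a stand-in DBMS; the Account table, its rows, and the transfer helper are hypothetical. The connection used as a context manager commits on success and rolls back if an exception escapes, so a crash between the debit and the credit leaves no trace.

```python
import sqlite3

# Hypothetical Account table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("Joe", 100), ("Jack", 50)])
conn.commit()

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes
            conn.execute("UPDATE Account SET balance = balance - ? WHERE name = ?", (amount, src))
            if fail_midway:
                raise RuntimeError("simulated crash between the two updates")
            conn.execute("UPDATE Account SET balance = balance + ? WHERE name = ?", (amount, dst))
    except RuntimeError:
        pass  # the debit above has already been rolled back

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM Account ORDER BY name"))

transfer(conn, "Joe", "Jack", 30, fail_midway=True)
print(balances(conn))  # {'Jack': 50, 'Joe': 100}: nothing happened
transfer(conn, "Joe", "Jack", 30)
print(balances(conn))  # {'Jack': 80, 'Joe': 70}: both updates happened
```

SQLite is only a convenient local example here; it does not illustrate the concurrency side of ACID.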

Physical vs. Logical Levels
[Figure: External Schema 1 and External Schema 2 on top of the Relational Schema, which sits on the Physical Schema and the Disk.]
- External schema: views for the different applications and classes of users.
- Conceptual schema: tables and their attributes.
- Physical schema: files, indexes, hash tables.
- System catalog: the component of the database that stores metadata.
Conceptual design: a precursor to the relational schema.
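The three levels can be observed in a real system. The following sketch, assuming SQLite through Python's sqlite3 module, declares a base table (conceptual level), an index (physical level, invisible to queries), and a view acting as one external schema; the view name CheapGadgets, the index name, and the price threshold are hypothetical. SQLite keeps its catalog in the sqlite_master table, which the last query reads.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Conceptual level: a base table.
conn.execute("""CREATE TABLE Product (
    name TEXT PRIMARY KEY, price REAL, category TEXT, manufacturer TEXT)""")
# Physical level: an index changes storage and access paths, not query results.
conn.execute("CREATE INDEX product_by_maker ON Product(manufacturer)")
# External level: a view tailored to one (hypothetical) class of users.
conn.execute("""CREATE VIEW CheapGadgets AS
    SELECT name, price FROM Product
    WHERE category = 'gadgets' AND price < 25""")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)", [
    ("gizmo", 19.99, "gadgets", "GizmoWorks"),
    ("Power gizmo", 29.99, "gadgets", "GizmoWorks"),
    ("SingleTouch", 149.99, "photography", "Canon"),
    ("MultiTouch", 203.99, "household", "Hitachi"),
])

print(conn.execute("SELECT * FROM CheapGadgets").fetchall())  # [('gizmo', 19.99)]

# System catalog: metadata about every table, index, and view
# (SQLite's internal auto-indexes are filtered out).
for kind, name in conn.execute(
        "SELECT type, name FROM sqlite_master WHERE name NOT LIKE 'sqlite_%' ORDER BY type, name"):
    print(kind, name)
```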

The Relational Model Data is organized into tables with attributes. Rows in the tables are tuples. The power of simplicity!

Logical Model Issues
- What data model should we use? Relational, object-oriented, object-relational, deductive, semi-structured.
- How do we design a good schema? (normal forms, index selection)
- Are we really providing an abstraction?
- How does this abstraction interact with the programming language? (the impedance mismatch)

Querying a Database
Find all the students who have taken CSE444 in Winter, 1998.
S(tructured) Q(uery) L(anguage):
    select E.name
    from Enroll E
    where E.course = 'CSE444' and E.quarter = 'Winter, 1998'
SQL also provides update facilities.
SQL: an acquired taste (try datalog first).
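To try the query above, here is a small sketch that runs it against SQLite via Python's sqlite3 module. The Enroll table's column names follow the query, but its types and all the student rows are hypothetical; ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Enroll table with made-up rows.
conn.execute("CREATE TABLE Enroll (name TEXT, course TEXT, quarter TEXT)")
conn.executemany("INSERT INTO Enroll VALUES (?, ?, ?)", [
    ("Alice", "CSE444", "Winter, 1998"),
    ("Bob", "CSE444", "Spring, 1998"),
    ("Carol", "CSE544", "Winter, 1998"),
    ("Dave", "CSE444", "Winter, 1998"),
])

# The slide's query; string literals use single quotes in SQL.
query = """
    SELECT E.name
    FROM Enroll E
    WHERE E.course = 'CSE444' AND E.quarter = 'Winter, 1998'
    ORDER BY E.name
"""
print([row[0] for row in conn.execute(query)])  # ['Alice', 'Dave']
```

The query says only what to find; how the rows are located is left to the system.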

Issues in Query Languages
- Does it provide the appropriate functionality? SQL books get thicker and thicker.
- Expressive power of a query language.
- Ease of use (query by example).
- Declarativity.
- Does it provide guidance in writing “good” queries?

Query Optimization
A query is a declarative specification of “what” you want. A query execution plan is an imperative program that produces the answer.
Query optimization: produce an efficient query execution plan.
Issues: large search space of plans, cost estimation, semantic transformations.
Real goal: avoid the bad plans.
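One declarative query can be executed by very different plans. The sketch below evaluates the join "select buyer, manufacturer from Purchase, Product where product = name" with two hand-written plans, a nested-loop join and a hash join, and counts tuple comparisons or hash probes as a crude stand-in for cost. The rows are hypothetical (the buyer Ann is made up), and real optimizers use far richer cost models.

```python
# Hypothetical (buyer, price, product) and (name, manufacturer) rows.
purchases = [("Joe", 20, "gizmo"), ("Jack", 20, "E-gizmo"), ("Ann", 35, "gizmo")]
products = [("gizmo", "G-sym"), ("E-gizmo", "G-sym"), ("Power gizmo", "GizmoWorks")]

def nested_loop_join(purchases, products):
    """Plan 1: compare every purchase with every product."""
    out, comparisons = [], 0
    for buyer, _price, prod in purchases:
        for name, maker in products:
            comparisons += 1
            if prod == name:
                out.append((buyer, maker))
    return out, comparisons

def hash_join(purchases, products):
    """Plan 2: hash Product on name, then probe once per purchase.

    Only probes are counted; the cost of building the table is ignored here.
    """
    by_name = {}
    for name, maker in products:
        by_name.setdefault(name, []).append(maker)
    out, probes = [], 0
    for buyer, _price, prod in purchases:
        probes += 1
        for maker in by_name.get(prod, []):
            out.append((buyer, maker))
    return out, probes

r1, cost1 = nested_loop_join(purchases, products)
r2, cost2 = hash_join(purchases, products)
assert sorted(r1) == sorted(r2)  # same answer...
print(cost1, cost2)              # 9 3: ...different amounts of work
```

On three rows the gap is small, but nested loops grow with the product of the input sizes, which is why avoiding the bad plans matters.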

Database Industry
Relational databases are a great success of theoretical ideas.
The “Big 3” DBMS companies are among the largest software companies in the world. IBM (with DB2) and Microsoft (SQL Server, Microsoft Access) are also important players.
A $20B industry, moving toward warehousing and decision support.

Course (Rough) Outline
The basics:
- The relational model
- SQL
- Views, integrity constraints
- Conceptual modeling
- Datalog (recursive queries)
Physical representation:
- Index structures

Course Outline (cont.)
Query execution:
- Algorithms for joins, selections, projections
Query optimization
Advanced topics:
- Data integration
- Data mining
- Semi-structured data
Transaction processing

The relational data model

Terminology
Relation name: Product. Attribute names: Name, Price, Category, Manufacturer.
Name          Price     Category      Manufacturer
gizmo         $19.99    gadgets       GizmoWorks
Power gizmo   $29.99    gadgets       GizmoWorks
SingleTouch   $149.99   photography   Canon
MultiTouch    $203.99   household     Hitachi
The rows are tuples; the arity is 4.
Schema: Product(name: string, price: real, category: enum, manufacturer: string)

More Terminology
- Every attribute has an atomic type.
- Relation schema: relation name + attribute names + attribute types.
- Relation instance: a set of tuples. Only one copy of any tuple! (Not so in SQL, where a table may contain duplicate rows.)
- Database schema: a set of relation schemas.
- Database instance: a relation instance for every relation in the schema.
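The “only one copy of any tuple” rule is where the model and SQL part ways. As a sketch, assuming SQLite through Python's sqlite3 module: a table declared without a key keeps a duplicate row, while SELECT DISTINCT and a Python set both give set semantics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No key is declared, so nothing stops the same row from being stored twice.
conn.execute("CREATE TABLE Product (name TEXT, price REAL, category TEXT, manufacturer TEXT)")
row = ("gizmo", 19.99, "gadgets", "GizmoWorks")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)", [row, row])

bag = conn.execute("SELECT * FROM Product").fetchall()              # bag semantics
as_set = conn.execute("SELECT DISTINCT * FROM Product").fetchall()  # duplicates removed
print(len(bag), len(as_set))  # 2 1

# The relational model's view: an instance is a set, so the duplicate collapses.
print(len({row, row}))  # 1
```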

More on Tuples
Formally, a tuple is a mapping from attribute names to (correctly typed) values:
name → gizmo, price → $19.99, category → gadgets, manufacturer → GizmoWorks
Sometimes we refer to a tuple by itself (note the order of attributes):
(gizmo, $19.99, gadgets, GizmoWorks) or Product(gizmo, $19.99, gadgets, GizmoWorks).
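The two views of a tuple differ only in whether attribute order matters. The sketch below models the mapping view as a Python dict and the positional view as a Python tuple; the helper names and the PRODUCT_ATTRS constant are hypothetical, and the attribute order is taken from the Product schema.

```python
from typing import Any, Dict, Sequence, Tuple

# Attribute order from the schema Product(name, price, category, manufacturer).
PRODUCT_ATTRS = ("name", "price", "category", "manufacturer")

def to_positional(t: Dict[str, Any], attrs: Sequence[str] = PRODUCT_ATTRS) -> Tuple[Any, ...]:
    """Mapping view -> positional view; needs a fixed attribute order."""
    return tuple(t[a] for a in attrs)

def to_mapping(values: Sequence[Any], attrs: Sequence[str] = PRODUCT_ATTRS) -> Dict[str, Any]:
    """Positional view -> mapping view; the arity must match the schema."""
    if len(values) != len(attrs):
        raise ValueError(f"expected arity {len(attrs)}, got {len(values)}")
    return dict(zip(attrs, values))

# The key order of a dict is irrelevant to the mapping view.
t = {"manufacturer": "GizmoWorks", "name": "gizmo", "category": "gadgets", "price": 19.99}
print(to_positional(t))  # ('gizmo', 19.99, 'gadgets', 'GizmoWorks')
print(to_mapping(("gizmo", 19.99, "gadgets", "GizmoWorks")) == t)  # True
```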

Integrity Constraints
An important function of a DBMS is to let users specify integrity constraints and to enforce them. Knowledge of integrity constraints is also useful for query optimization.
Examples of constraints:
- keys, superkeys
- foreign keys
- domain constraints, tuple constraints
- functional dependencies, multivalued dependencies
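As an illustration of declaring constraints and letting the DBMS enforce them, here is a sketch using SQLite from Python. It declares a key and two CHECK constraints acting as domain constraints; the allowed category values (standing in for the enum type) and the positive-price rule are assumptions for the example, and try_insert is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Product (
        name         TEXT PRIMARY KEY,        -- key constraint
        price        REAL CHECK (price > 0),  -- domain constraint (assumed rule)
        category     TEXT CHECK (category IN ('gadgets', 'photography', 'household')),
        manufacturer TEXT NOT NULL
    )""")
conn.execute("INSERT INTO Product VALUES ('gizmo', 19.99, 'gadgets', 'GizmoWorks')")

def try_insert(row):
    """Attempt an insert; report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO Product VALUES (?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as e:
        return f"rejected: {e}"

print(try_insert(("gizmo", 29.99, "gadgets", "GizmoWorks")))        # duplicate key
print(try_insert(("widget", -5.0, "gadgets", "GizmoWorks")))        # price not > 0
print(try_insert(("Power gizmo", 29.99, "gadgets", "GizmoWorks")))  # ok
```

Because the constraints live in the schema, every application that writes to the table is checked the same way.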

Keys
A key is a minimal set of attributes that uniquely identifies a tuple (i.e., no two tuples have the same values for the key attributes). Possible choices for Person:
- social security number
- name
- name + address
- name + address + age
Perfect keys are often hard to find, but organizations usually invent something anyway.
Superkey: a set of attributes that contains a key.
A relation may have multiple keys (but only one primary key), e.g., employee number and social-security number.
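A key is a constraint on every possible instance, so one instance can only rule candidates out. With that caveat, the sketch below finds the minimal attribute sets that happen to be unique in one instance, skipping any set that contains a smaller one already found (those are superkeys, not keys). The Person rows are hypothetical, and the result shows how an accidental “key” such as address + age can appear.

```python
from itertools import combinations

def is_superkey(rows, attrs, candidate):
    """True if no two rows agree on all attributes in `candidate` (in this instance)."""
    idx = [attrs.index(a) for a in candidate]
    seen = set()
    for r in rows:
        k = tuple(r[i] for i in idx)
        if k in seen:
            return False
        seen.add(k)
    return True

def keys_of_instance(rows, attrs):
    """Minimal superkeys of this particular instance, smallest first."""
    found = []
    for size in range(1, len(attrs) + 1):
        for cand in combinations(attrs, size):
            if any(set(k) <= set(cand) for k in found):
                continue  # contains a smaller key: a superkey, not a key
            if is_superkey(rows, attrs, cand):
                found.append(cand)
    return found

# Hypothetical Person instance.
attrs = ("ssn", "name", "address", "age")
rows = [
    ("111-22-3333", "Fred", "12 Elm St", 30),
    ("444-55-6666", "Fred", "9 Oak Ave", 30),
    ("777-88-9999", "Joe", "12 Elm St", 41),
]
print(keys_of_instance(rows, attrs))
# [('ssn',), ('name', 'address'), ('address', 'age')]
```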

Foreign Key Constraints
Purchase:
buyer   price   product
Joe     $20     gizmo
Jack    $20     E-gizmo
Product:
name      manufacturer   description
gizmo     G-sym          great stuff
E-gizmo   G-sym          even better
An attribute of a relation R must refer to a key of a relation S: here, every product value in Purchase must be the name of some tuple in Product.
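Here is a sketch of the Purchase/Product example as a declared foreign key in SQLite, driven from Python. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON; the extra purchase by Ann of a nonexistent Z-gizmo is hypothetical and is the row that gets rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.execute("CREATE TABLE Product (name TEXT PRIMARY KEY, manufacturer TEXT, description TEXT)")
conn.execute("""CREATE TABLE Purchase (
    buyer TEXT, price REAL,
    product TEXT REFERENCES Product(name))""")  # Purchase.product refers to the key Product.name

conn.executemany("INSERT INTO Product VALUES (?, ?, ?)",
                 [("gizmo", "G-sym", "great stuff"), ("E-gizmo", "G-sym", "even better")])
conn.executemany("INSERT INTO Purchase VALUES (?, ?, ?)",
                 [("Joe", 20, "gizmo"), ("Jack", 20, "E-gizmo")])

rejected = False
try:
    # Hypothetical purchase of a product that does not exist.
    conn.execute("INSERT INTO Purchase VALUES ('Ann', 20, 'Z-gizmo')")
except sqlite3.IntegrityError as e:
    rejected = True
    print("rejected:", e)
```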

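Functional Dependencies
Definition: if two tuples agree on the attributes $A_1, A_2, \ldots, A_n$, then they must also agree on the attributes $B_1, B_2, \ldots, B_m$. Formally:
$$A_1, A_2, \ldots, A_n \rightarrow B_1, B_2, \ldots, B_m$$
Key of a relation: a set of attributes $A_1, A_2, \ldots, A_n$ such that, in the dependency $A_1, A_2, \ldots, A_n \rightarrow B_1, B_2, \ldots, B_m$, every attribute of the relation appears either on the left or on the right.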
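The definition translates directly into a check on one instance: remember, for each combination of left-hand values, the right-hand values first seen with it, and fail if a later tuple disagrees. Like a key, an FD is a statement about all instances, so this sketch can refute an FD but never prove it. The satisfies_fd helper is hypothetical; the data is the Product instance from the Terminology slide, with prices as plain numbers.

```python
def satisfies_fd(rows, lhs, rhs):
    """True if no two rows agree on every attribute in lhs but differ on rhs."""
    seen = {}  # lhs values -> rhs values of the first row carrying those lhs values
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[b] for b in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True

product = [
    {"name": "gizmo", "price": 19.99, "category": "gadgets", "manufacturer": "GizmoWorks"},
    {"name": "Power gizmo", "price": 29.99, "category": "gadgets", "manufacturer": "GizmoWorks"},
    {"name": "SingleTouch", "price": 149.99, "category": "photography", "manufacturer": "Canon"},
    {"name": "MultiTouch", "price": 203.99, "category": "household", "manufacturer": "Hitachi"},
]
print(satisfies_fd(product, ["name"], ["price", "category", "manufacturer"]))  # True
print(satisfies_fd(product, ["manufacturer"], ["category"]))  # True, but only in this data
print(satisfies_fd(product, ["category"], ["price"]))  # False: two gadgets, two prices
```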

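Some Obvious Properties of FD’s
Splitting rule and combining rule: $A_1, A_2, \ldots, A_n \rightarrow B_1, B_2, \ldots, B_m$ is equivalent to the set of dependencies
$$A_1, A_2, \ldots, A_n \rightarrow B_1, \quad A_1, A_2, \ldots, A_n \rightarrow B_2, \quad \ldots, \quad A_1, A_2, \ldots, A_n \rightarrow B_m$$
Trivial dependencies: $A_1, A_2, \ldots, A_n \rightarrow A_i$ always holds, for any $1 \le i \le n$.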

Comparing Functional Dependencies
Entailment: a set of functional dependencies $S_1$ entails a set $S_2$ if any database that satisfies $S_1$ must also satisfy $S_2$. Example: $\{A \rightarrow B, B \rightarrow C\}$ entails $A \rightarrow C$.
Equivalence: two sets of FD’s are equivalent if each entails the other. Example: $\{A \rightarrow B, B \rightarrow C\}$ is equivalent to $\{A \rightarrow B, A \rightarrow C, B \rightarrow C\}$.
Closure: given a set of attributes $A$ and a set of dependencies $C$, we want to find all the other attributes that are functionally determined by $A$.

Closure Algorithm
Start with Closure = $A$.
Until Closure doesn’t change, do:
if $A_1, A_2, \ldots, A_n \rightarrow B$ is in $C$, and $A_1, A_2, \ldots, A_n$ are all in Closure, and $B$ is not in Closure, then add $B$ to Closure.
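A direct Python rendering of the closure algorithm follows, together with the standard way closure decides entailment: $S_1$ entails $X \rightarrow Y$ exactly when $Y$ lies inside the closure of $X$ under $S_1$. This is a sketch; FDs are represented as (lhs, rhs) pairs of frozensets, and right-hand sides may hold several attributes, which the splitting rule makes equivalent to the single-attribute version. The helper names are hypothetical.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of frozensets."""
    result = set(attrs)
    changed = True
    while changed:  # until Closure doesn't change
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def entails(s1, s2):
    """S1 entails S2 iff every X -> Y in S2 has Y inside closure(X) under S1."""
    return all(rhs <= closure(lhs, s1) for lhs, rhs in s2)

def equivalent(s1, s2):
    return entails(s1, s2) and entails(s2, s1)

def fd(lhs, rhs):
    """Build an FD from strings, treating each character as one attribute name."""
    return (frozenset(lhs), frozenset(rhs))

S1 = [fd("A", "B"), fd("B", "C")]
print(sorted(closure({"A"}, S1)))   # ['A', 'B', 'C']
print(entails(S1, [fd("A", "C")]))  # True
print(equivalent(S1, [fd("A", "B"), fd("A", "C"), fd("B", "C")]))  # True
```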

Problems in Designing Schema
Name   SSN          Phone Number
Fred   123-321-99   (201) 555-1234
Fred   123-321-99   (206) 572-4312
Joe    909-438-44   (908) 464-0028
Joe    909-438-44   (212) 555-4000
Problems:
- redundancy
- update anomalies
- deletion anomalies

Relation Decomposition
Break the relation into two relations:
Name   SSN
Fred   123-321-99
Joe    909-438-44

Name   Phone Number
Fred   (201) 555-1234
Fred   (206) 572-4312
Joe    (908) 464-0028
Joe    (212) 555-4000
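The decomposition is two projections, and the original can be recovered with a natural join on Name. The sketch below checks this on the slide's data, representing relations as Python sets of tuples so that projection removes duplicates automatically.

```python
# The Name/SSN/Phone instance from the schema-design example.
person = {
    ("Fred", "123-321-99", "(201) 555-1234"),
    ("Fred", "123-321-99", "(206) 572-4312"),
    ("Joe", "909-438-44", "(908) 464-0028"),
    ("Joe", "909-438-44", "(212) 555-4000"),
}

# Projections: sets, so the repeated (Name, SSN) pairs collapse.
name_ssn = {(name, ssn) for name, ssn, _ in person}
name_phone = {(name, phone) for name, _, phone in person}

# Natural join of the two projections on Name.
rejoined = {(n1, ssn, phone)
            for n1, ssn in name_ssn
            for n2, phone in name_phone
            if n1 == n2}

print(len(person), len(name_ssn), len(name_phone))  # 4 2 4
print(rejoined == person)  # True
```

The join is lossless here because each Name has a single SSN in this data; if two people shared a name, rejoining on Name would pair each SSN with the other person's phones and produce spurious tuples.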

Boyce-Codd Normal Form
A simple condition for removing anomalies from relations:
A relation $R$ is in BCNF if and only if, whenever there is a nontrivial dependency $A_1, A_2, \ldots, A_n \rightarrow B$ for $R$, the set $\{A_1, A_2, \ldots, A_n\}$ is a superkey for $R$.
In English (though a bit vague): whenever a set of attributes of $R$ determines another attribute, it should determine all the attributes of $R$.
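The BCNF condition can be tested with attribute closure: a left-hand side is a superkey exactly when its closure contains every attribute of $R$. The sketch applies this to the Name/SSN/Phone relation, under the assumption (not stated on the slides) that SSN → Name holds. It also assumes every FD mentions only attributes of the relation being tested; checking a decomposed fragment in general requires projecting the FDs onto it first.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(relation, fds):
    """Nontrivial FDs whose left-hand side is not a superkey of `relation`."""
    violations = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial dependency
        if not relation <= closure(lhs, fds):
            violations.append((lhs, rhs))
    return violations

R = frozenset({"Name", "SSN", "Phone"})
# Assumed FD for illustration: SSN determines Name.
F = [(frozenset({"SSN"}), frozenset({"Name"}))]

for lhs, rhs in bcnf_violations(R, F):
    print(sorted(lhs), "->", sorted(rhs), "violates BCNF")  # ['SSN'] -> ['Name'] violates BCNF

# In the (Name, SSN) fragment, SSN's closure is {SSN, Name}, so SSN is a superkey there.
print(bcnf_violations(frozenset({"Name", "SSN"}), F))  # []
```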