Published by Jane Phillips. Modified over 9 years ago.
1
Principles of Database Management Systems
CSE 544
Introduction
March 31st, 1999
2
Staff
- Instructor: Alon Levy
  - Sieg, Room 310, alon@cs.washington.edu
  - Office hours: Wednesdays, 2:30-3:30, or by email.
- TAs: Zack Ives and Rachel Pottinger
  - Office hours:
    - Zack: Mondays at noon (224)
    - Rachel: Thursdays at 2:30pm (224)
- Mailing list: cse544@cs
- Web page (a lot of stuff already there): http://www.cs.washington.edu/education/courses/544/99sp/
3
Course Times
- In general, WF, 12-1:20pm (with a 5-minute breather in the middle).
- Two special dates:
  - Monday, April 5th
  - Monday, April 19th
- No classes in the last week.
4
Goals of the Course
- Purpose:
  - Foundations of database management systems.
  - Issues in building database systems.
  - Introduction to current research issues in databases.
  - Have fun: databases are not just bunches of tuples.
5
Grading
- Homeworks: 15%
  - SQL querying fun
  - Join implementations
- Project: 25%
  - A query optimization engine for data integration.
- Midterm: 15%
- Final: 35%
- Participation and intangibles: 10%
6
Textbook
- Database System Implementation, Ullman, Widom, and Garcia-Molina, to be published by Prentice-Hall in June; available from the copy center.
7
Other Useful Texts
- Database Management Systems (Ramakrishnan)
- Foundations of Databases (Abiteboul, Hull & Vianu)
- Parallel and Distributed DBMS (Ozsu and Valduriez)
- Transaction Processing (Gray and Reuter)
- Database Systems (Silberschatz, Korth and Sudarshan)
- Data and Knowledge based Systems (volumes I, II) (Ullman)
- Readings in Database Systems (Stonebraker and Hellerstein)
- Proceedings of SIGMOD, VLDB, PODS conferences.
8
Prerequisites
9
Real Prerequisites
- Operating systems
- Data structures and algorithms
- Distributed systems
- Complexity theory
- Mathematical Logic
- Knowledge Representation
- User interface design
- Programming languages
- Artificial Intelligence (Search)
- Greek, Hebrew, French
10
Why Use a DBMS?
All programs manipulate data, so why use a database?
- Large amounts of data (gigabytes, terabytes)
- Data is very structured
- Persistent data
- Valuable data
- Performance requirements
- Concurrent access to the data
- Restricted access to data
11
Functionality of a DBMS
- Persistent storage management
- Transaction management
- Resiliency: recovery from crashes.
- Separation between logical and physical views of the data.
  - High-level query and data manipulation language.
  - Efficient query processing
- Interface with programming languages
12
Persistent Storage
- Becomes a hard problem because of the interaction with the other levels of the DBMS:
  - What are we storing?
  - Efficient indexing
  - Special issues due to resiliency requirements
  - Exploit “semantic” knowledge
- Issue: interaction with the operating system. Should we rely on the OS?
13
Transaction Processing and Recovery
- For efficient use of resources, we want concurrent access to data.
- Systems sometimes crash.
- A “real” database guarantees ACID:
  - Atomicity: all or nothing of a transaction.
  - Consistency: always leave the DB consistent.
  - Isolation: every transaction runs as if it’s the only one in the system.
  - Durability: if committed, we really mean it.
- Do we really want ACID?
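As a small illustration of atomicity, the sketch below uses Python's built-in `sqlite3` module. The `account` table, the names `alice` and `bob`, and the `transfer` helper are all hypothetical and not part of the course material; the point is only that a failed transaction leaves no partial update behind.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` inside a single transaction."""
    # `with conn` commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            # Raising here undoes the debit above: all or nothing.
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
        )

# Hypothetical data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

transfer(conn, "alice", "bob", 30)       # succeeds and is committed
try:
    transfer(conn, "alice", "bob", 500)  # fails and is rolled back
except ValueError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)
```

Only the first transfer is visible afterwards; the debit made by the second one is undone by the rollback.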
14
Physical vs. Logical Levels
[Diagram, layers from top to bottom: External Schema 1 and External Schema 2; Relational Schema; Physical Schema; Disk]
- Conceptual schema: tables and their attributes.
- Physical schema: files, indexes, hash tables.
- External schema: views of the different applications, classes of users.
- System catalog: the component of the database that stores metadata.
- Conceptual design: a precursor to the relational schema.
15
The Relational Model
- Data is organized into tables with attributes.
- Rows in the tables are tuples.
- The power of simplicity!
16
Logical Model Issues
- What data model should we use?
  - Relational, object-oriented, object-relational, deductive database model, semi-structured
- How do we design a good schema? (normal forms, index selection)
- Are we really providing an abstraction?
- How does this abstraction interact with the programming language? (the impedance mismatch)
17
Querying a Database
- Find all the students who have taken CSE444 in Winter, 1998.
- S(tructured) Q(uery) L(anguage):
    select E.name
    from Enroll E
    where E.course = 'CSE444' and
          E.quarter = 'Winter, 1998'
- SQL also provides update facilities.
- SQL: an acquired taste (try datalog first).
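To try the query from this slide, here is a minimal sketch using Python's built-in `sqlite3` module. The column layout of `Enroll` and the sample rows are assumptions made for illustration; the slide only shows the query itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: the slide only uses the name, course and quarter attributes.
conn.execute("CREATE TABLE Enroll (name TEXT, course TEXT, quarter TEXT)")
conn.executemany(
    "INSERT INTO Enroll VALUES (?, ?, ?)",
    [
        ("Fred", "CSE444", "Winter, 1998"),  # hypothetical rows
        ("Joe", "CSE444", "Fall, 1997"),
        ("Sue", "CSE544", "Winter, 1998"),
    ],
)

query = """
    SELECT E.name
    FROM Enroll E
    WHERE E.course = 'CSE444' AND E.quarter = 'Winter, 1998'
"""
students = [name for (name,) in conn.execute(query)]
print(students)
```

Note that string constants are written in single quotes; an unquoted `CSE444` would be read as a column name.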
18
Issues in Query Languages
- Does it provide the appropriate functionality?
  - SQL books get thicker and thicker.
- Expressive power of a query language.
- Ease of use (query by example)
- Declarativity
- Provide guidance in writing “good” queries?
19
Query Optimization
- A query is a declarative specification of “what” you want.
- A query execution plan is an imperative program to produce the answer.
- Query optimization: produce an efficient query execution plan.
- Issues: large search space of plans, cost estimation, semantic transformations
- Real goal: avoid the bad plans.
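The difference between a query and its execution plans can be sketched with a toy example. The two plans below compute the same answer for "students enrolled in CSE444, with their year", but do different amounts of work. The relations, the `year` attribute, and the cost measure (number of tuple pairs compared by a nested-loop join) are assumptions chosen for illustration, not a model of any real optimizer.

```python
def nested_loop_join(left, right, key):
    """Join two lists of dicts on `key`; also return how many pairs were compared."""
    out, compared = [], 0
    for lt in left:
        for rt in right:
            compared += 1
            if lt[key] == rt[key]:
                out.append({**lt, **rt})
    return out, compared

# Hypothetical data: 100 enrollments, 10 of them in CSE444.
enroll = [
    {"name": f"s{i}", "course": "CSE444" if i % 10 == 0 else "CSE100"}
    for i in range(100)
]
student = [{"name": f"s{i}", "year": 1 + i % 4} for i in range(100)]

# Plan 1: join everything, then apply the selection.
joined, cost1 = nested_loop_join(enroll, student, "name")
plan1 = [t for t in joined if t["course"] == "CSE444"]

# Plan 2: apply the selection first, then join the smaller input.
selected = [t for t in enroll if t["course"] == "CSE444"]
plan2, cost2 = nested_loop_join(selected, student, "name")

print(plan1 == plan2, cost1, cost2)
```

Both plans return the same tuples, but pushing the selection below the join cuts the number of comparisons by a factor of ten here, which is the kind of bad plan an optimizer tries to avoid.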
20
Database Industry
- Relational databases are a great success of theoretical ideas.
- “Big 3” DBMS companies are among the largest software companies in the world.
- IBM (with DB2) and Microsoft (SQL Server, Microsoft Access) are also important players.
- $20B industry
- Moving to warehousing, decision support.
21
Course (Rough) Outline
- The basics:
  - The relational model
  - SQL
  - Views, integrity constraints
  - Conceptual modeling
  - datalog (recursive queries)
- Physical representation:
  - Index structures.
22
Course Outline (cont)
- Query execution:
  - Algorithms for joins, selections, projections.
- Query Optimization
- Advanced topics:
  - data integration
  - data mining
  - semi-structured data
- Transaction processing
23
The relational data model
24
Terminology
Relation name: Product
Attribute names: Name, Price, Category, Manufacturer
Tuples (one per row):

Name         Price    Category     Manufacturer
gizmo        $19.99   gadgets      GizmoWorks
Power gizmo  $29.99   gadgets      GizmoWorks
SingleTouch  $149.99  photography  Canon
MultiTouch   $203.99  household    Hitachi

Schema: Product(name: string, Price: real, category: enum, Manufacturer: string) (Arity=4)
25
More Terminology
- Every attribute has an atomic type.
- Relation schema: relation name + attribute names + attribute types.
- Relation instance: a set of tuples. Only one copy of any tuple! (not)
- Database schema: a set of relation schemas.
- Database instance: a relation instance for every relation in the schema.
26
More on Tuples
Formally, a mapping from attribute names to (correctly typed) values:
  name          -> gizmo
  price         -> $19.99
  category      -> gadgets
  manufacturer  -> GizmoWorks
Sometimes we refer to a tuple by itself (note the order of attributes):
  (gizmo, $19.99, gadgets, GizmoWorks)
or
  Product(gizmo, $19.99, gadgets, GizmoWorks).
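The two views of a tuple (a mapping from attribute names to values, and an ordered list of values) can be sketched in Python with `typing.NamedTuple`. The Python types chosen for each attribute are an assumption; in particular the slide's `enum` category is approximated by a plain string.

```python
from typing import NamedTuple

class Product(NamedTuple):
    """Relation schema: relation name + attribute names + attribute types."""
    name: str
    price: float
    category: str      # the slide uses an enum; a string keeps the sketch short
    manufacturer: str

t = Product("gizmo", 19.99, "gadgets", "GizmoWorks")

as_mapping = t._asdict()   # the tuple as a mapping from attribute names to values
as_values = tuple(t)       # the tuple by itself, where attribute order matters

# A relation instance is a set of tuples, so a duplicate is stored only once.
instance = {
    t,
    Product("gizmo", 19.99, "gadgets", "GizmoWorks"),
    Product("SingleTouch", 149.99, "photography", "Canon"),
}
print(len(Product._fields), len(instance))
```

The set holds two tuples even though three were written, matching the "only one copy of any tuple" definition of a relation instance.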
27
Integrity Constraints
An important functionality of a DBMS is to enable the specification of integrity constraints and to enforce them.
Knowledge of integrity constraints is also useful for query optimization.
Examples of constraints:
- keys, superkeys
- foreign keys
- domain constraints, tuple constraints
- functional dependencies, multivalued dependencies
28
Keys
A key is a minimal set of attributes that uniquely identifies the tuple (i.e., there is no pair of tuples with the same values for the key attributes).
Candidates for Person:
- social security number
- name
- name + address
- name + address + age
Perfect keys are often hard to find, but organizations usually invent something anyway.
Superkey: a set of attributes that contains a key.
A relation may have multiple keys (but only one primary key): employee number, social-security number.
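The definitions of key and superkey can be checked against a concrete instance, as in the sketch below. The helper names and the `people` rows are hypothetical. Keep in mind that a key is a constraint on every legal instance of the schema, so a check on one instance can refute a candidate key but cannot prove it.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on all of `attrs`."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def is_key(rows, attrs):
    """True if `attrs` is a superkey and no proper subset of it is one."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for k in range(len(attrs))
        for subset in combinations(attrs, k)
    )

# Hypothetical Person instance.
people = [
    {"ssn": "123-321-99", "name": "Fred", "address": "Seattle"},
    {"ssn": "909-438-44", "name": "Joe", "address": "Seattle"},
    {"ssn": "555-000-11", "name": "Fred", "address": "Tacoma"},
]

print(is_key(people, ("ssn",)))             # ssn alone identifies every row
print(is_superkey(people, ("name",)))       # two rows share the name Fred
print(is_key(people, ("ssn", "name")))      # a superkey, but not minimal
print(is_key(people, ("name", "address")))  # minimal in this instance
```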
29
Foreign Key Constraints
Purchase:

buyer  price  product
Joe    $20    gizmo
Jack   $20    E-gizmo

Product:

name     manufacturer  description
gizmo    G-sym         great stuff
E-gizmo  G-sym         even better

An attribute of a relation R must refer to a key of a relation S.
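A minimal sketch of checking this constraint on the two tables from the slide: every `product` value in Purchase must appear as a `name` in Product. The function name `dangling_references` and the extra row for buyer Ann are hypothetical additions for illustration.

```python
def dangling_references(referencing, attr, referenced, key):
    """Rows of `referencing` whose `attr` value matches no `key` value in `referenced`."""
    known = {row[key] for row in referenced}
    return [row for row in referencing if row[attr] not in known]

product = [
    {"name": "gizmo", "manufacturer": "G-sym", "description": "great stuff"},
    {"name": "E-gizmo", "manufacturer": "G-sym", "description": "even better"},
]
purchase = [
    {"buyer": "Joe", "price": 20, "product": "gizmo"},
    {"buyer": "Jack", "price": 20, "product": "E-gizmo"},
]

# The instance on the slide satisfies the constraint.
ok = dangling_references(purchase, "product", product, "name")

# A hypothetical purchase of an unknown product violates it.
extra = purchase + [{"buyer": "Ann", "price": 5, "product": "widget"}]
bad = dangling_references(extra, "product", product, "name")
print(ok, bad)
```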
30
Functional Dependencies
Definition: If two tuples agree on the attributes $A_1, A_2, \ldots, A_n$, then they must also agree on the attributes $B_1, B_2, \ldots, B_m$.
Formally: $A_1, A_2, \ldots, A_n \rightarrow B_1, B_2, \ldots, B_m$
Key of a relation: all the attributes are either on the left or right.
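The definition translates directly into a check on a relation instance, sketched below on the Product table from the Terminology slide. The function name `satisfies_fd` is hypothetical. As with keys, a dependency is a statement about all legal instances, so one instance can only show that a dependency fails.

```python
def satisfies_fd(rows, lhs, rhs):
    """True if any two rows that agree on `lhs` also agree on `rhs`."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[b] for b in rhs)
        # Remember the first right-hand value seen for each left-hand value.
        if seen.setdefault(left, right) != right:
            return False
    return True

products = [
    {"name": "gizmo", "price": 19.99, "category": "gadgets", "manufacturer": "GizmoWorks"},
    {"name": "Power gizmo", "price": 29.99, "category": "gadgets", "manufacturer": "GizmoWorks"},
    {"name": "SingleTouch", "price": 149.99, "category": "photography", "manufacturer": "Canon"},
    {"name": "MultiTouch", "price": 203.99, "category": "household", "manufacturer": "Hitachi"},
]

holds = satisfies_fd(products, ["name"], ["price", "manufacturer"])
fails = satisfies_fd(products, ["manufacturer"], ["name"])
print(holds, fails)
```

The second check fails because GizmoWorks makes two differently named products.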
31
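Some Obvious Properties of FD’s
- $A_1, A_2, \ldots, A_n \rightarrow B_1, B_2, \ldots, B_m$ is equivalent to the $m$ dependencies
  $A_1, A_2, \ldots, A_n \rightarrow B_1$
  $A_1, A_2, \ldots, A_n \rightarrow B_2$
  ...
  $A_1, A_2, \ldots, A_n \rightarrow B_m$
  (splitting rule and combining rule).
- $A_1, A_2, \ldots, A_n \rightarrow A_i$ always holds.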
32
Comparing Functional Dependencies
- Entailment: a set of functional dependencies S1 entails a set S2 if any database that satisfies S1 must also satisfy S2.
  - Example: $A \rightarrow B$, $B \rightarrow C$ entails $A \rightarrow C$.
- Equivalence: two sets of FD’s are equivalent if each entails the other.
  - Example: $\{A \rightarrow B, B \rightarrow C\}$ is equivalent to $\{A \rightarrow B, A \rightarrow C, B \rightarrow C\}$.
- Closure: given a set of attributes A and a set of dependencies C, we want to find all the other attributes that are functionally determined by A.
33
Closure Algorithm
Start with Closure = A.
Until Closure doesn’t change do:
  if $A_1, A_2, \ldots, A_n \rightarrow B$ is in C, and $A_1, A_2, \ldots, A_n$ are all in Closure, and B is not in Closure, then add B to Closure.
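The closure loop can be sketched in a few lines of Python. The representation of a dependency as a pair of sets, and the helper names `entails` and `equivalent`, are assumptions made for this sketch. Allowing several attributes on the right-hand side is equivalent to the single-attribute form by the splitting and combining rules. Entailment and equivalence of dependency sets then reduce to closure computations.

```python
def closure(attrs, fds):
    """Attributes functionally determined by `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:                  # until the closure doesn't change
        changed = False
        for lhs, rhs in fds:
            # Left side entirely in the closure, right side not yet fully added.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def entails(s1, s2):
    """S1 entails S2 if every dependency of S2 follows from S1."""
    return all(rhs <= closure(lhs, s1) for lhs, rhs in s2)

def equivalent(s1, s2):
    """Two sets of dependencies are equivalent if each entails the other."""
    return entails(s1, s2) and entails(s2, s1)

s1 = [({"A"}, {"B"}), ({"B"}, {"C"})]
s2 = [({"A"}, {"B"}), ({"A"}, {"C"}), ({"B"}, {"C"})]

print(sorted(closure({"A"}, s1)))
print(entails(s1, [({"A"}, {"C"})]), equivalent(s1, s2))
```

The loop always terminates because each pass either adds at least one attribute or stops.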
34
Problems in Designing Schema

Name  SSN         Phone Number
Fred  123-321-99  (201) 555-1234
Fred  123-321-99  (206) 572-4312
Joe   909-438-44  (908) 464-0028
Joe   909-438-44  (212) 555-4000

Problems:
- redundancy
- update anomalies
- deletion anomalies
35
Relation Decomposition
Break the relation into two relations:

Name  SSN
Fred  123-321-99
Joe   909-438-44

Name  Phone Number
Fred  (201) 555-1234
Fred  (206) 572-4312
Joe   (908) 464-0028
Joe   (212) 555-4000
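The decomposition on this slide is a pair of projections, and for this instance joining the two projections on Name gives back the original relation. The sketch below shows both steps; the helper `project` and the lowercase attribute names are choices made for illustration.

```python
def project(rows, attrs):
    """Projection with duplicate elimination (a relation is a set of tuples)."""
    return {tuple(row[a] for a in attrs) for row in rows}

original = [
    {"name": "Fred", "ssn": "123-321-99", "phone": "(201) 555-1234"},
    {"name": "Fred", "ssn": "123-321-99", "phone": "(206) 572-4312"},
    {"name": "Joe", "ssn": "909-438-44", "phone": "(908) 464-0028"},
    {"name": "Joe", "ssn": "909-438-44", "phone": "(212) 555-4000"},
]

name_ssn = project(original, ("name", "ssn"))      # each SSN is now stored once
name_phone = project(original, ("name", "phone"))

# Natural join of the two pieces on name.
rejoined = {
    (n1, ssn, phone)
    for (n1, ssn) in name_ssn
    for (n2, phone) in name_phone
    if n1 == n2
}
print(len(name_ssn), len(name_phone))
print(rejoined == project(original, ("name", "ssn", "phone")))
```

The redundancy is gone (each SSN appears once), and nothing is lost in this instance because the join reproduces exactly the four original tuples.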
36
Boyce-Codd Normal Form
A simple condition for removing anomalies from relations:
A relation R is in BCNF if and only if: whenever there is a nontrivial dependency $A_1, A_2, \ldots, A_n \rightarrow B$ for R, it is the case that $\{A_1, A_2, \ldots, A_n\}$ is a super-key for R.
In English (though a bit vague): whenever a set of attributes of R determines another attribute, it should determine all the attributes of R.
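The BCNF condition can be tested with the attribute closure: a left-hand side is a super-key exactly when its closure is the whole set of attributes. The sketch below is self-contained, so it repeats a small `closure` helper. The dependency SSN → Name used in the example is an assumption about the Name/SSN/Phone relation; the slides do not state its dependencies.

```python
def closure(attrs, fds):
    """Attributes functionally determined by `attrs` under `fds` (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Nontrivial dependencies in `fds` whose left side is not a super-key."""
    violations = []
    for lhs, rhs in fds:
        nontrivial = not rhs <= lhs
        if nontrivial and closure(lhs, fds) != set(attributes):
            violations.append((lhs, rhs))
    return violations

# Assumed dependency: an SSN determines a name.
fds = [({"ssn"}, {"name"})]

# The single wide relation: {ssn} does not determine phone, so it is not a super-key.
wide = bcnf_violations({"name", "ssn", "phone"}, fds)

# The (Name, SSN) piece of the decomposition: {ssn} determines everything.
narrow = bcnf_violations({"name", "ssn"}, fds)
print(wide, narrow)
```

This checks only the dependencies that are listed, which is enough for a relation given together with its full dependency set; the pieces of a decomposition would need the dependencies projected onto their attributes first.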