CENG 352 Database Management Systems İsmail Sengör Altıngövde URL:
CENG 352 Instructor: İsmail Sengör Altıngövde Office: A Lecture Hours: Mon 15:40-17:30 (BMB4); Thu. 16:40 (BMB4) Course Web page: Teaching Assistant: Abdullah Doğan –A-206, 2
Text Books and References 1.Raghu Ramakrishnan, Database Management Systems, McGraw Hill, 3 rd edition, 2003 (text book). 2.M. Kifer, A. Bernstein and P. Lewis, Database Systems: An Application Oriented Approach, 2nd edition, Addison Wesley, A. Silberschatz, H.F. Korth, S. Sudarshan, Database System Concepts, McGraw Hill, 6 th edition, R. Elmasri, S.B. Navathe, Fundamentals of Database Systems, 6 th edition, Addison-Wesley, H. Garcia-Molina, J. D. Ullman, J. Widom, Database Systems The Complete Book, Prentice Hall,
Required Background Material from CENG351 (Entity- Relationship modeling, relational data model, relational algebra, B+-trees and hash-based indexing, algorithms for sorting/merging data files) Intermediate programming skills –C/C++/Java
Course Outline Review (Relational Model) –E/R modeling –SQL Database Application Development and Internet Applications –Embedded SQL, cursors, dynamic SQL, JDBC classes and interfaces, stored procedures. –http protocol, html/xml documents, xml dtd, html forms, javascript, style sheets, cgi, application servers, servlets, java server pages.
Course Outline (cont.) Relational Database Design –Functional Dependency Theory Attribute closure, FD set closure, 3NF, BCNF, decomposition lossless-join decomposition, dependency- preserving decomposition. –Normalization of Relations Decomposition into 3NF, BCNF. –Other Types of Dependencies Multivalued dependencies, join dependencies, 4NF, 5NF.
Course Outline (cont.) Query Processing –Algorithms for relational operators (sorting, indexing (hashing, tree-indexing), nested-loops join, sort-merge join, hash join), query evaluation plans. Query Optimization –Translation of SQL queries to RA, estimating the cost of an execution plan, equivalences of RA expressions, enumeration of alternative plans.
Course Outline (cont.) Transaction Processing –Properties of transactions, concurrent execution, schedules, recoverability. Concurrency Control –Serializability theory, two-phase locking, deadlock problem, timestamp-ordering, optimistic concurrency control. Crash Recovery –Logging, checkpointing, recovering from a system crash, example systems like ARIES.
Grading Midterm 25 % Final Exam 35 % Assignments & Quizzes 20 % Project 20 % Attendance: –Once in every week –50% attendance is mandatory to get the final exam 9
Grading Policies Policy on final exam: –Attendance should be > 50%, AND –The weighted average of the assignments & quizzes, OR the midterm grade should be at least 30 Policy on missed midterm: –make-up exam in the following week (only for the legal excuses and if informed beforehand) Lateness policy: –No late submission for the written assignments. –For the programming homework(s): can be late up to 3 days. Afterwards, will be penalized by 10%* day 2 –All assignments are to be your own work. 10
Projects Projects in groups of two. –Determine your groups till the end of add/drops, i.e., Oct 3 Friday, 24:00 3 stages: –Proposal & Preliminary E/R Report –Design Report –Final Report + Demo 11
Questions?
CENG Files Data is not scattered hither and thither on disk. Instead, it is organized into files. Files are organized into records. Records are organized into fields.
File-Based Systems Collection of application programs that perform services for the end users (e.g. reports). Each program defines and manages its own data. 14
File-Based Processing Pearson Education ©
Limitations of File-Based Approach Separation and isolation of data –Each program maintains its own set of data. –Users of one program may be unaware of potentially useful data held by other programs. Duplication of data –Same data is held by different programs. –Wasted space and potentially different values and/or different formats for the same item. 16
Limitations of File-Based Approach Data dependence –File structure is defined in the program code. Incompatible file formats –Programs are written in different languages, and so cannot easily access each other’s files. Fixed Queries/Proliferation of application programs –Programs are written to satisfy particular functions. –Any new requirement needs a new program. 17
Database Approach Arose because: –Definition of data was embedded in application programs, rather than being stored separately and independently. –No control over access and manipulation of data beyond that imposed by application programs. Result: –the database and Database Management System (DBMS). 18
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke19 Database Management Systems Chapter 1
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke20 What Is a DBMS? A very large, integrated collection of data. Models real-world enterprise. Entities (e.g., students, courses) Relationships (e.g., Madonna is taking CS564) A Database Management System (DBMS) is a software package designed to store and manage databases.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke21 Typical DBMS Functionality Define a database : in terms of data types, structures and constraints Construct or load the database on a secondary storage medium Manipulating the database : querying, generating reports, insertions, deletions and modifications to its content Concurrent Processing and Sharing by a set of users and programs – yet, keeping all data valid and consistent
Database Management System (DBMS) Pearson Education ©
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke23 Why Use a DBMS? Data independence and efficient access. Reduced application development time. Data integrity and security. Uniform data administration. Concurrent access, recovery from crashes.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke24 Why Study Databases?? Shift from computation to information at the “low end”: scramble to webspace (a mess!) at the “high end”: scientific applications Datasets increasing in diversity and volume. Digital libraries, interactive video, Human Genome project, EOS project ... need for DBMS exploding DBMS encompasses most of CS OS, languages, theory, AI, multimedia, logic ?
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke25 Data Models A data model is a collection of concepts for describing data. A schema is a description of a particular collection of data, using a given data model. The relational model of data is the most widely used model today. Main concept: relation, basically a table with rows and columns. Every relation has a schema, which describes the columns, or fields.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke26 Instance of Students Relation Students( sid: string, name: string, login: string, age: integer, gpa: real ) sidnameloginagegpa
Data Models Object-Based Data Models Entity-Relationship Semantic Functional Object-Oriented. Record-Based Data Models Relational Data Model Network Data Model Hierarchical Data Model. Physical Data Models Pearson Education ©
Relational Data Model Pearson Education ©
Network Data Model Pearson Education ©
Hierarchical Data Model Pearson Education ©
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke31 Levels of Abstraction Single conceptual (logical) schema and physical schema, many views * Schemas are defined using DDL; data is modified/queried using DML. Physical Schema Conceptual Schema View 1View 2View 3 Conceptual schema defines logical structure Physical schema describes the files and indexes used External shemata (views) describe how users see the data.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke32 Example: University Database Conceptual schema: Students(sid: string, name: string, login: string, age: integer, gpa:real) Courses(cid: string, cname:string, credits:integer) Enrolled(sid:string, cid:string, grade:string) Physical schema: Relations stored as unordered files. Index on first column of Students. External Schema (View): Course_info(cid:string,enrollment:integer)
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke33 Data Independence * Applications insulated from how data is structured and stored. Logical data independence : Protection from changes in logical structure of data. Physical data independence : Protection from changes in physical structure of data. * One of the most important benefits of using a DBMS!
Data Independence Logical Data Independence –Refers to immunity of external schemas to changes in conceptual schema. –Conceptual schema changes (e.g. addition/removal of entities). –Should not require changes to external schema or rewrites of application programs. 34
Data Independence Physical Data Independence –Refers to immunity of conceptual schema to changes in the internal schema. –Physical schema changes (e.g. using different file organizations, storage structures/devices). –Should not require change to conceptual or external schemas. 35
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke36 Concurrency Control Concurrent execution of user programs is essential for good DBMS performance. Because disk accesses are frequent, and relatively slow, it is important to keep the cpu humming by working on several user programs concurrently. Interleaving actions of different user programs can lead to inconsistency: e.g., check is cleared while account balance is being computed. DBMS ensures such problems don’t arise: users can pretend they are using a single-user system.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke37 Transaction: An Execution of a DB Program Key concept is transaction, which is an atomic sequence of database actions (reads/writes). Each transaction, executed completely, must leave the DB in a consistent state if DB is consistent when the transaction begins. Users can specify some simple integrity constraints on the data, and the DBMS will enforce these constraints. Beyond this, the DBMS does not really understand the semantics of the data. (e.g., it does not understand how the interest on a bank account is computed). Thus, ensuring that a transaction (run alone) preserves consistency is ultimately the user’s responsibility!
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke38 Structure of a DBMS A typical DBMS has a layered architecture. The figure does not show the concurrency control and recovery components. This is one of several possible architectures; each system has its own variations. Query Optimization and Execution Relational Operators Files and Access Methods Buffer Management Disk Space Management DB These layers must consider concurrency control and recovery
Database Languages Data Definition Language (DDL) –Allows the DBA or user to describe and name entities, attributes, and relationships required for the application –plus any associated integrity and security constraints. Data Manipulation Language (DML) –Provides basic data manipulation operations on data held in the database. 39
System Catalog Repository of information (metadata) describing the data in the database. One of the fundamental components of DBMS. Typically stores: –names, types, and sizes of data items; –constraints on the data; –names of authorized users; –data items accessible by a user and the type of access; –usage statistics. 40
The DBMS Marketplace Relational DBMS companies – Oracle, Sybase – are among the largest software companies in the world. IBM offers its relational DB2 system. Microsoft offers SQL-Server, plus Microsoft Access for the cheap DBMS on the desktop, answered by “lite” systems from other competitors. Relational companies also challenged by “object-oriented DB” companies. But countered with “object-relational” systems, which retain the relational core while allowing type extension as in OO systems. 41
Databases make these folks happy... End users and DBMS vendors DB application programmers – E.g., smart webmasters Database administrator (DBA) – Designs logical /physical schemas – Handles security and authorization – Data availability, crash recovery – Database tuning as needs evolve Must understand how a DBMS works!
Summary DBMS used to maintain, query large datasets. Benefits include recovery from system crashes, concurrent access, quick application development, data integrity and security. Levels of abstraction give data independence. A DBMS typically has a layered architecture. DBAs hold responsible jobs and are well-paid! DBMS R&D is one of the broadest, most exciting areas in CS.