Published by Owen Quinn. Modified over 9 years ago.
1
CSCI485 – File & Database Management Systems
Bahram Zartoshty
Office: SAL 346
Phone: TBA
Office Hours: TTH 1:15-2:50pm
Note: Parts of this lecture were developed by Professor Ghandeharizadeh.
2
Logistics
- Required textbook: Database System Concepts, Silberschatz, Korth & Sudarshan, Fifth Edition.
- Prerequisites for the course:
  - CS201: Data Structures
  - Knowledge of an object-oriented programming language such as C++, Java, or C#
3
Teaching Assistant
Shahin Shayande
Office: SAL 200C (Microsoft Lab)
Office Hours: TBA
4
Grading
- Midterm 1: 35%
- Midterm 2: 35%
- Project & Assignments: 30%
5
What to do immediately?
- Register with the web site: http://dblab.usc.edu/csci485
6
Database Management System (DBMS)
- Database: An integrated collection of data, usually stored on secondary storage, typically describing the activities of one or more related organizations.
- Database management system (DBMS): A collection of software/programs designed to assist in maintaining and utilizing large collections of data.
- A DBMS contains information about a particular enterprise.
- DBMSs are used almost daily, for both individual and business purposes.
- Relational database vendors were one of the fastest-growing sectors during the .com boom!
7
BEFORE DBMS
In the early days, database applications were built on top of file systems.
[Figure: User 1 and User 2 run application programs that work directly on separate data files.]
8
AFTER DBMS
[Figure: User 1 and User 2 run application programs; the data is managed by the DBMS.]
9
WHY A DBMS?
1. Reduced application development time.
2. Data independence: application programs do not depend on data representation and storage details.
3. Data sharing: data is better utilized (discovered and reused), and redundancy of data is minimized.
4. Data integrity and consistency: one may enforce consistency constraints on data, e.g., number of seats sold ≤ number of seats on the plane × 1.1.
5. Centralized control: the DBA tunes the database to balance users' needs.
6. Security: mechanisms prevent unauthorized access. These mechanisms are content-based rather than file-oriented.
7. Concurrency control: avoids the undesirable race conditions that arise with simultaneous access to and updates of data.
8. Crash recovery: ensures the integrity of data in the presence of failures.
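Point 4 can be made concrete with a declarative constraint. The sketch below uses Python's built-in sqlite3 module as a stand-in DBMS; the `flight` table, its columns, and the flight number are hypothetical. The DBMS itself rejects any update that would oversell beyond 110% of capacity, so no application program has to re-check the rule.

```python
import sqlite3

# Hypothetical schema: one row per flight. The CHECK clause encodes the
# slide's rule: seats sold <= number of seats on the plane * 1.1.
SCHEMA = """
CREATE TABLE flight (
    flight_no  TEXT PRIMARY KEY,
    capacity   INTEGER NOT NULL,
    seats_sold INTEGER NOT NULL,
    CHECK (seats_sold <= capacity * 1.1)
)
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def try_sell(conn: sqlite3.Connection, flight_no: str, extra: int) -> bool:
    """Try to sell `extra` seats; return True if the DBMS accepted the update."""
    try:
        with conn:  # commit on success, roll back if an exception escapes
            conn.execute(
                "UPDATE flight SET seats_sold = seats_sold + ? WHERE flight_no = ?",
                (extra, flight_no),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the update; the stored row is unchanged.
        return False


conn = make_db()
with conn:
    conn.execute("INSERT INTO flight VALUES ('UA100', 100, 0)")
print(try_sell(conn, "UA100", 110))  # True: 110 is within 100 * 1.1
print(try_sell(conn, "UA100", 1))    # False: 111 would exceed the limit
```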
10
DATABASE MANAGEMENT SYSTEM ARCHITECTURE
[Figure: users 1 through n, the conceptual schema, the DBMS, and the physical data (DB).]
11
Data Models
- A collection of tools for describing:
  - Data
  - Data relationships
  - Data semantics
  - Data constraints
- Relational model
- Entity-Relationship data model (mainly for database design)
- Object-based data models (object-oriented and object-relational)
- Semistructured data model (XML)
- Other, older models:
  - Network model
  - Hierarchical model
12
Challenges
- Conceptual: abstraction, inheritance, encapsulation
- Logical: reduction to tables with minimal data duplication, potential for data loss, and update anomalies
- Physical: effective use of a DBMS, management of the mismatch between tables and OO constructs, index structures, CC (concurrency control) & crash recovery, optimization techniques
13
Conceptual Data Models
- Entity-Relationship (ER) data model
  - Entities, Attributes, Relationships
[Figure: entity set Emp with attributes SS#, name, address.]
14
Conceptual Data Models
- Entity-Relationship (ER) data model
  - Entities, Attributes, Relationships
[Figure: entity set Emp (SS#, name, address) related to entity set Health Plan (name, Co-Pay) through the relationship set Enrolled in.]
15
Conceptual Data Models
- Entity-Relationship (ER) data model
  - Entities, Attributes, Relationships
  - Recursive relationships
[Figure: the relationship set Married to relates entity set Emp (SS#, name, address) to itself.]
16
Conceptual Data Models
- Entity-Relationship (ER) data model
  - Entities, Attributes, Relationships
  - Recursive relationships
[Figure: the relationship set Works for relates entity set Emp (SS#, name, address) to itself.]
17
Conceptual Data Models
- Entity-Relationship (ER) data model
  - Entities, Attributes, Relationships
  - Recursive relationships
[Figure: the relationship set Works for, with attribute date, relates entity set Emp (SS#, name, address) to itself.]
18
Conceptual Data Models
- Entity-Relationship (ER) data model
  - Entities, Attributes, Relationships
  - Recursive relationships
  - Inheritance
[Figure: entity set student (sid, name) with an ISA hierarchy whose subclasses are graduate and Undergrad; the hierarchy is annotated with Specialization and Generalization.]
19
Conceptual Data Models
- Abstraction, Inheritance, Encapsulation
- Exercise these concepts using in-class examples and homework assignments
A library database contains a listing of authors who have written books on various subjects (one author per book). It also contains information about libraries that carry books on various subjects.
20
Conceptual Data Models
- Abstraction, Inheritance, Encapsulation
- Exercise these concepts using in-class examples and homework assignments
A library database contains a listing of authors who have written books on various subjects (one author per book). It also contains information about libraries that carry books on various subjects.
Entity sets: authors, subjects, books, libraries
Relationship sets: wrote, carry, indexed
21
Conceptual Data Models
- Abstraction, Inheritance, Encapsulation
- Exercise these concepts using in-class examples and homework assignments
A library database contains a listing of authors who have written books on various subjects (one author per book). It also contains information about libraries that carry books on various subjects.
[Figure: ER diagram with entity sets authors (SS#, name), books (title, isbn), subjects (subject matter), and libraries (address), connected by the relationship sets wrote, index, and carry.]
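One possible reduction of this exercise to tables is sketched below with Python's sqlite3. It is not the course's model answer. It assumes that `carry` links libraries to books and that both `carry` and the subject index are many-to-many. `lib_id` is a hypothetical surrogate key, and `book_index` stands in for the `indexed` relationship set. The "one author per book" rule lets `wrote` fold into `books` as a foreign key rather than becoming a table of its own.

```python
import sqlite3

# Assumptions (not from the slide): carry links libraries to books, lib_id is
# a hypothetical key, and book_index models the `indexed` relationship set.
DDL = """
CREATE TABLE authors   (ssn TEXT PRIMARY KEY, name TEXT);
CREATE TABLE subjects  (subject_matter TEXT PRIMARY KEY);
CREATE TABLE libraries (lib_id INTEGER PRIMARY KEY, address TEXT);
-- One author per book: `wrote` becomes a foreign-key column of books.
CREATE TABLE books (
    isbn       TEXT PRIMARY KEY,
    title      TEXT,
    author_ssn TEXT NOT NULL REFERENCES authors(ssn)
);
-- Many-to-many relationship sets keep tables of their own.
CREATE TABLE book_index (
    isbn           TEXT REFERENCES books(isbn),
    subject_matter TEXT REFERENCES subjects(subject_matter),
    PRIMARY KEY (isbn, subject_matter)
);
CREATE TABLE carry (
    lib_id INTEGER REFERENCES libraries(lib_id),
    isbn   TEXT REFERENCES books(isbn),
    PRIMARY KEY (lib_id, isbn)
);
"""


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript(DDL)
    return conn


def libraries_carrying(conn: sqlite3.Connection, subject: str) -> list[str]:
    """Addresses of libraries that carry at least one book on `subject`."""
    rows = conn.execute(
        """SELECT DISTINCT l.address
           FROM libraries l
           JOIN carry c      ON c.lib_id = l.lib_id
           JOIN book_index i ON i.isbn = c.isbn
           WHERE i.subject_matter = ?
           ORDER BY l.address""",
        (subject,),
    )
    return [r[0] for r in rows]
```

If `carry` were instead read as linking libraries directly to subjects, the `carry` table would reference `subjects` and the query would lose one join.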
22
Data Models: Logical, Physical
[Figure: Emp (SS#, name, address) with the recursive Works for relationship.]
23
Relational Data Model
- Prevalent in today's marketplace.
  - Why? Performance!
- Everything is a table!
- Logical data design is the process of reducing an ER diagram to a collection of tables.
24
Logical Data Design
- Trivial reduction:
  - An entity set = a table
  - A relationship set = a table
- Pitfalls:
  - Duplication of data
  - Unintentional loss of data
  - Data ambiguity that impacts software design, resulting in update anomalies
25
Data Duplication
[Figure: Emp (SS#, name, address) with the recursive Works for relationship.]

Emp table:
SS#  Name     Address
396  Shahram  Seattle
400  Asoke    Chicago
200  Joe      New York

Works for table:
SS#  MGRSS#
396  400
200  400
120  400
26
Data Duplication
- The SS# column is duplicated across the two tables!
[Figure: Emp (SS#, name, address) with the recursive Works for relationship.]

Emp table:
SS#  Name     Address
396  Shahram  Seattle
400  Asoke    Chicago
200  Joe      New York

Works for table:
SS#  MGRSS#
396  400
200  400
120  400
27
Data Duplication: Solution
- Merge the two tables into one:

SS#  Name     Address   MGRSS#
396  Shahram  Seattle   400
400  Asoke    Chicago   NULL
200  Joe      New York  400

[Figure: Emp (SS#, name, address) with the recursive Works for relationship.]
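The merged table can still answer "who works for whom" by joining the table with itself. Below is a minimal sqlite3 sketch, assuming the column names `ssn` and `mgr_ssn` in place of SS# and MGRSS# (identifiers cannot contain `#`). The `REFERENCES` clause only documents the self-reference here, since SQLite does not enforce foreign keys unless they are enabled.

```python
import sqlite3

# The merged Emp table from the slide; mgr_ssn plays the role of MGRSS#
# and is NULL when no manager is recorded.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE emp (
    ssn     INTEGER PRIMARY KEY,
    name    TEXT,
    address TEXT,
    mgr_ssn INTEGER REFERENCES emp(ssn))""")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [(396, "Shahram", "Seattle", 400),
     (400, "Asoke", "Chicago", None),
     (200, "Joe", "New York", 400)],
)


def reports_of(manager_name: str) -> list[str]:
    """Names of employees who work for the named manager (a self-join on emp)."""
    rows = conn.execute(
        """SELECT e.name
           FROM emp e JOIN emp m ON e.mgr_ssn = m.ssn
           WHERE m.name = ?
           ORDER BY e.name""",
        (manager_name,),
    )
    return [r[0] for r in rows]


print(reports_of("Asoke"))  # ['Joe', 'Shahram']
```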
28
Data Loss
- Ford maintains warehouses containing different automobile parts.
- Records are inserted and deleted based on the availability of a part at a warehouse.

Part#  Description  Location
123    Piston       Tijuana
203    Cylinder     Michigan
877    Bumper       Michigan
389    Seats        Arizona
29
Data Loss (Cont…)
- When a warehouse becomes empty, it is lost from the database:

Part#  Description  Location
123    Piston       Tijuana
389    Seats        Arizona

- Solution: utilize two different tables:

Part#  Description  WHID
123    Piston       12
389    Seats        45

WHID  Location
12    Tijuana
45    Arizona
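The two-table design can be checked directly. The sqlite3 sketch below uses the slide's rows, with the hypothetical column names `part_no` and `whid`. Deleting a warehouse's only part removes a `part` row, but the `warehouse` row survives. An outer join can then still report the warehouse as empty, which the single-table design could not do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE warehouse (whid INTEGER PRIMARY KEY, location TEXT);
CREATE TABLE part (part_no INTEGER PRIMARY KEY, description TEXT,
                   whid INTEGER REFERENCES warehouse(whid));
INSERT INTO warehouse VALUES (12, 'Tijuana'), (45, 'Arizona');
INSERT INTO part VALUES (123, 'Piston', 12), (389, 'Seats', 45);
""")

# Arizona runs out of its only part: the part row is deleted...
conn.execute("DELETE FROM part WHERE part_no = 389")

# ...but the warehouse itself survives in its own table.
locations = [r[0] for r in conn.execute("SELECT location FROM warehouse ORDER BY whid")]

# Empty warehouses can still be found, e.g., with a left outer join.
empty = [r[0] for r in conn.execute(
    """SELECT w.location
       FROM warehouse w LEFT JOIN part p ON p.whid = w.whid
       WHERE p.part_no IS NULL""")]

print(locations)  # ['Tijuana', 'Arizona']
print(empty)      # ['Arizona']
```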
30
Data Ambiguity
- Represent the faculty of a department as:

Faculty          Department  Location
Ghandeharizadeh  Comp Sci    SAL
Zartoshty        Comp Sci    SAL
Bohem            Comp Sci    SAL

- A change of address for a faculty member might apply to the entire department. This table design cannot differentiate the two cases!
31
Data Ambiguity
- Utilize two tables:

Faculty          Department
Ghandeharizadeh  Comp Sci
Zartoshty        Comp Sci
Jenkins          Bio Medical
Bohem            Comp Sci

Department   Location
Comp Sci     SAL
Sex Ed       BOVARD
Bio Medical  HEDCO
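With the split design, a department move is a single-row update, and every member's location follows because it is derived through the department. The sqlite3 sketch below uses three of the slide's faculty rows. The table names and the building `NEW_BLDG` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE faculty_department  (faculty TEXT PRIMARY KEY, department TEXT);
CREATE TABLE department_location (department TEXT PRIMARY KEY, location TEXT);
INSERT INTO faculty_department VALUES
    ('Ghandeharizadeh', 'Comp Sci'), ('Jenkins', 'Bio Medical'), ('Bohem', 'Comp Sci');
INSERT INTO department_location VALUES
    ('Comp Sci', 'SAL'), ('Sex Ed', 'BOVARD'), ('Bio Medical', 'HEDCO');
""")


def location_of(faculty: str):
    """A faculty member's location, derived through their department (None if unknown)."""
    row = conn.execute(
        """SELECT dl.location
           FROM faculty_department fd
           JOIN department_location dl ON dl.department = fd.department
           WHERE fd.faculty = ?""",
        (faculty,),
    ).fetchone()
    return row[0] if row else None


def move_department(department: str, new_location: str) -> int:
    """Relocate a whole department with one UPDATE; return the number of rows changed."""
    with conn:
        cur = conn.execute(
            "UPDATE department_location SET location = ? WHERE department = ?",
            (new_location, department),
        )
    return cur.rowcount


move_department("Comp Sci", "NEW_BLDG")  # NEW_BLDG is a hypothetical building
print(location_of("Bohem"), location_of("Jenkins"))  # NEW_BLDG HEDCO
```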
32
Data Ambiguity (Cont…)
- Employees of a multilingual company have different skills.
- Update anomalies!

Employee  Skill    Language
Asoke     Teach    Hindi
Asoke     Cook     French
Asoke     Null     German
Asoke     Program  English

- Skills and languages are independent facts, yet each row pairs one with the other, so one cannot be added or removed without touching the other (note the Null placeholder).
33
Data Ambiguity: Solution
- Utilize two tables:

Employee  Skill
Asoke     Teach
Asoke     Cook
Asoke     Program

Employee  Language
Asoke     Hindi
Asoke     French
Asoke     German
Asoke     English
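After the split, each independent fact lives in its own table. The sqlite3 sketch below shows that recording a new skill is a single insert with no language, and therefore no Null placeholder. The table names and the skill "Drive" are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_skill (
    employee TEXT, skill TEXT, PRIMARY KEY (employee, skill));
CREATE TABLE employee_language (
    employee TEXT, language TEXT, PRIMARY KEY (employee, language));
INSERT INTO employee_skill VALUES
    ('Asoke', 'Teach'), ('Asoke', 'Cook'), ('Asoke', 'Program');
INSERT INTO employee_language VALUES
    ('Asoke', 'Hindi'), ('Asoke', 'French'), ('Asoke', 'German'), ('Asoke', 'English');
""")


def add_skill(employee: str, skill: str) -> None:
    # One row, no language and no Null placeholder: the facts are stored independently.
    with conn:
        conn.execute("INSERT INTO employee_skill VALUES (?, ?)", (employee, skill))


def skills(employee: str) -> list[str]:
    rows = conn.execute(
        "SELECT skill FROM employee_skill WHERE employee = ? ORDER BY skill", (employee,))
    return [r[0] for r in rows]


def languages(employee: str) -> list[str]:
    rows = conn.execute(
        "SELECT language FROM employee_language WHERE employee = ? ORDER BY language",
        (employee,))
    return [r[0] for r in rows]


add_skill("Asoke", "Drive")  # hypothetical new skill
print(skills("Asoke"))  # ['Cook', 'Drive', 'Program', 'Teach']
```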
34
Logical Data Design
- A quest to flatten objects with minimal data duplication, loss of data, and update anomalies!
- William Kent, “A Simple Guide to Five Normal Forms in Relational Database Theory”, Communications of the ACM 26(2), Feb 1983, 120-125.
35
Data Models: Physical
[Figure: Emp (SS#, name, address) with the recursive Works for relationship, mapped by logical data design to the table below.]

SS#  Name     Address  MGR SS#
396  Shahram  Seattle  400
400  Asoke    Chicago  Null
36
Physical Implementation
- Reconstruct main-memory objects for manipulation and presentation:
  - Specify class definitions
    - Typically correspond to entity sets
  - Populate an instance of a class by issuing SQL queries to a DBMS
  - Update instances in memory
  - Flush dirty instances back to the DBMS
  - Potential use of transactions
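The steps above can be sketched with a Python dataclass and sqlite3. The `Emp` class, its `dirty` flag, and the `load_all`/`flush` helpers are hypothetical names chosen for illustration, not a prescribed design. Instances are populated with a SQL query and changed in memory, and only the dirty ones are written back, all inside one transaction.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Emp:
    """In-memory object for the Emp entity set (field names are illustrative)."""
    ssn: int
    name: str
    address: str
    dirty: bool = field(default=False, compare=False)

    def move(self, new_address: str) -> None:
        self.address = new_address
        self.dirty = True  # must be flushed back to the DBMS


def load_all(conn: sqlite3.Connection) -> list[Emp]:
    # Populate instances by issuing a SQL query to the DBMS.
    rows = conn.execute("SELECT ssn, name, address FROM emp ORDER BY ssn")
    return [Emp(*row) for row in rows]


def flush(conn: sqlite3.Connection, emps: list[Emp]) -> int:
    """Write dirty instances back in one transaction; return how many were written."""
    dirty = [e for e in emps if e.dirty]
    with conn:  # all updates commit together, or none do
        conn.executemany(
            "UPDATE emp SET name = ?, address = ? WHERE ssn = ?",
            [(e.name, e.address, e.ssn) for e in dirty],
        )
    for e in dirty:
        e.dirty = False
    return len(dirty)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ssn INTEGER PRIMARY KEY, name TEXT, address TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(200, "Joe", "New York"), (396, "Shahram", "Seattle")])
emps = load_all(conn)
emps[0].move("Boston")    # hypothetical change made in memory
print(flush(conn, emps))  # 1
```

Tracking a dirty flag per instance keeps the write-back proportional to what actually changed, rather than rewriting every loaded row.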
37
Type Mismatch
- A column of a row must be a primitive such as an integer, real, etc.
  - It may NOT be an array of integers or object pointers.
- A property (attribute) of a class might be of a multi-valued type, e.g., an array, a vector, etc.
- Changes in software may impact the design of tables. (The system designer must manage this type mismatch.)
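One common way to manage this mismatch is to store a multi-valued attribute as rows of a second table. The sketch below assumes a hypothetical `phones` list on an `Emp` class and a hypothetical `emp_phone` table. It also shows a side effect: rows carry no order, so the list comes back sorted rather than in its saved order unless a position column is added.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Emp:
    ssn: int
    name: str
    phones: list[str] = field(default_factory=list)  # multi-valued attribute (hypothetical)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp       (ssn INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE emp_phone (ssn INTEGER REFERENCES emp(ssn), phone TEXT,
                        PRIMARY KEY (ssn, phone));
""")


def save(e: Emp) -> None:
    # No column can hold the list, so each element becomes a row of emp_phone.
    with conn:
        conn.execute("INSERT OR REPLACE INTO emp VALUES (?, ?)", (e.ssn, e.name))
        conn.execute("DELETE FROM emp_phone WHERE ssn = ?", (e.ssn,))
        conn.executemany("INSERT INTO emp_phone VALUES (?, ?)",
                         [(e.ssn, p) for p in e.phones])


def load(ssn: int) -> Emp:
    name = conn.execute("SELECT name FROM emp WHERE ssn = ?", (ssn,)).fetchone()[0]
    # Rows have no inherent order: the list comes back sorted, not as saved.
    phones = [r[0] for r in conn.execute(
        "SELECT phone FROM emp_phone WHERE ssn = ? ORDER BY phone", (ssn,))]
    return Emp(ssn, name, phones)


save(Emp(400, "Asoke", ["555-0199", "555-0100"]))
print(load(400))  # Emp(ssn=400, name='Asoke', phones=['555-0100', '555-0199'])
```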
38
Implementation
- Set operators in the DBMS
  - Does set A contain set B?
  - Does value v1 appear in set A?
- Aggregates in the DBMS
  - Compute the average employee salary
  - Count the number of employees
  - Find the oldest employee
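Each question on this slide can be pushed into the DBMS as a single SQL statement instead of being computed in application code. The sqlite3 sketch below uses hypothetical salaries and ages and two hypothetical one-column tables, `set_a` and `set_b`. Set containment is tested as "B EXCEPT A is empty", and membership is tested with `IN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Salaries and ages are hypothetical sample values.
CREATE TABLE emp (ssn INTEGER PRIMARY KEY, name TEXT, salary REAL, age INTEGER);
INSERT INTO emp VALUES
    (396, 'Shahram', 90000, 45), (400, 'Asoke', 110000, 52), (200, 'Joe', 70000, 30);
-- Two hypothetical sets, stored one element per row.
CREATE TABLE set_a (v INTEGER);
CREATE TABLE set_b (v INTEGER);
INSERT INTO set_a VALUES (1), (2), (3);
INSERT INTO set_b VALUES (2), (3);
""")

# Does set A contain set B?  Yes exactly when B EXCEPT A is empty.
a_contains_b = conn.execute(
    "SELECT NOT EXISTS (SELECT v FROM set_b EXCEPT SELECT v FROM set_a)"
).fetchone()[0] == 1


def in_a(v1: int) -> bool:
    """Does value v1 appear in set A?"""
    return conn.execute("SELECT ? IN (SELECT v FROM set_a)", (v1,)).fetchone()[0] == 1


# Aggregates: average salary and head count in one query, then the oldest employee.
avg_salary, headcount = conn.execute("SELECT AVG(salary), COUNT(*) FROM emp").fetchone()
oldest = conn.execute(
    "SELECT name FROM emp WHERE age = (SELECT MAX(age) FROM emp)").fetchone()[0]

print(a_contains_b, in_a(2), avg_salary, headcount, oldest)  # True True 90000.0 3 Asoke
```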
39
Challenges
- Conceptual: abstraction, inheritance, encapsulation
- Logical: reduction to tables with minimal data duplication, potential for data loss, and update anomalies
- Physical: effective use of a DBMS, management of the mismatch between tables and OO constructs, index structures, CC (concurrency control) & crash recovery, optimization techniques
40
Entity-Relationship Model
Example of a schema in the entity-relationship model
41
Entity Relationship Model (Cont.)
- E-R model of the real world
  - Entities (objects)
    - E.g., customers, accounts, bank branch
  - Relationships between entities
    - E.g., account A-101 is held by customer Johnson
    - Relationship set depositor associates customers with accounts
- Widely used for database design
  - A database design in the E-R model is usually converted to a design in the relational model (coming up next), which is used for storage and processing
42
Relational Model
- Example of tabular data in the relational model
[Figure: a sample table with its columns labeled as attributes.]
43
A Sample Relational Database
44
DATA INDEPENDENCE
1. Physical data independence: modify the physical scheme (data structures, e.g., a B-tree or hash index) without causing application programs to be rewritten. Such modifications may be needed to enhance performance or to accommodate new software releases. Most relational vendors support this kind of data independence.
2. Logical data independence: modify the conceptual scheme (e.g., add a new attribute to a table, rename an attribute) without causing application programs to be rewritten. This kind of data independence is harder to achieve.
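Both kinds of independence can be observed with sqlite3. In the sketch below, the "application program" is a single fixed query, and the index and column names are hypothetical. Adding an index (a physical change) and adding a column (a logical change) both leave its answer unchanged. The second change is harmless only because the query lists its columns. A program that used `SELECT *` would suddenly receive an extra column, which hints at why logical independence is harder to achieve.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ssn INTEGER PRIMARY KEY, name TEXT, address TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(396, "Shahram", "Seattle"), (400, "Asoke", "Chicago"),
                  (200, "Joe", "New York")])

# The "application program": it names columns, never storage structures.
QUERY = "SELECT ssn, name FROM emp WHERE address = ? ORDER BY ssn"


def app_lookup(city: str) -> list[tuple[int, str]]:
    return conn.execute(QUERY, (city,)).fetchall()


before = app_lookup("Chicago")

# Physical change: add an index on address (SQLite implements indexes as B-trees).
conn.execute("CREATE INDEX emp_address_idx ON emp(address)")
after_index = app_lookup("Chicago")

# Logical change: add a new attribute. The query still works because it
# lists its columns; a program relying on SELECT * would now see four columns.
conn.execute("ALTER TABLE emp ADD COLUMN phone TEXT")
after_alter = app_lookup("Chicago")

print(before == after_index == after_alter)  # True
```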
45
DATABASE LANGUAGES
There are several languages associated with a database:
1. Data Definition Language (DDL): The database scheme is specified by a set of definitions expressed in a special language called DDL. The result of compiling DDL statements is a set of tables stored in a file called the data dictionary. This file contains metadata (data about the data stored in the database).
2. Data Manipulation Language (DML): a language that enables users to access or manipulate data (retrieve, insert, replace, delete) as organized by a certain data model. We will look at a commercial DML named SQL. In general, there are two types of DML:
   - Procedural: describes what data is needed and how to get it, e.g., relational algebra
   - Non-procedural: describes what data is needed without specifying how to get it, e.g., tuple relational calculus
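As one concrete instance of a data dictionary, SQLite records every compiled DDL statement in its catalog table `sqlite_master`. It also reports column metadata through `PRAGMA table_info`. Other DBMSs expose their catalogs differently, for example through the SQL-standard `information_schema`. The `account` table in this sketch is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define part of the schema.
conn.execute("""CREATE TABLE account (
    account_number TEXT PRIMARY KEY,
    branch_name    TEXT,
    balance        REAL)""")

# Metadata: SQLite's catalog lists every table created by DDL.
tables = [r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
columns = [(r[1], r[2]) for r in conn.execute("PRAGMA table_info(account)")]

print(tables)   # ['account']
print(columns)  # [('account_number', 'TEXT'), ('branch_name', 'TEXT'), ('balance', 'REAL')]
```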
46
SQL
- SQL: a widely used non-procedural language
  - E.g., find the name of the customer with customer_id 192-83-7465:
      select customer.customer_name
      from customer
      where customer.customer_id = '192-83-7465'
  - E.g., find the balances of all accounts held by the customer with customer_id 192-83-7465:
      select account.balance
      from depositor, account
      where depositor.customer_id = '192-83-7465'
        and depositor.account_number = account.account_number
- Application programs generally access databases through one of:
  - Language extensions that allow embedded SQL
  - Application program interfaces (e.g., ODBC/JDBC), which allow SQL queries to be sent to a database
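Python's DB-API (here via sqlite3) plays a role similar to ODBC/JDBC, because it sends SQL text to the database and returns rows. The sketch below runs the two queries above against a tiny hypothetical bank database. The second account number and both balances are invented sample data. The customer id is passed as a `?` parameter instead of being spliced into the query string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer  (customer_id TEXT PRIMARY KEY, customer_name TEXT);
CREATE TABLE account   (account_number TEXT PRIMARY KEY, balance REAL);
CREATE TABLE depositor (customer_id TEXT, account_number TEXT);
INSERT INTO customer  VALUES ('192-83-7465', 'Johnson');
INSERT INTO account   VALUES ('A-101', 500), ('A-201', 900);
INSERT INTO depositor VALUES ('192-83-7465', 'A-101'), ('192-83-7465', 'A-201');
""")

CID = "192-83-7465"

# Query 1: the customer's name.
name = conn.execute(
    """select customer.customer_name
       from customer
       where customer.customer_id = ?""",
    (CID,),
).fetchone()[0]

# Query 2: balances of all accounts held by the customer.
balances = sorted(r[0] for r in conn.execute(
    """select account.balance
       from depositor, account
       where depositor.customer_id = ?
         and depositor.account_number = account.account_number""",
    (CID,),
))

print(name, balances)  # Johnson [500.0, 900.0]
```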
47
Storage Management
- The storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.
- The storage manager is responsible for the following tasks:
  - Interaction with the file manager
  - Efficient storing, retrieving, and updating of data
- Issues:
  - Storage access
  - File organization
  - Indexing and hashing
48
Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
49
Query Processing (Cont.)
- Alternative ways of evaluating a given query:
  - Equivalent expressions
  - Different algorithms for each operation
- The cost difference between a good and a bad way of evaluating a query can be enormous.
- Need to estimate the cost of operations:
  - Depends critically on statistical information about relations, which the database must maintain
  - Need to estimate statistics for intermediate results to compute the cost of complex expressions
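To illustrate "different algorithms for each operation", the sketch below evaluates the same equi-join two ways: as a nested-loop join and as a hash join. These are two standard textbook algorithms, not necessarily the ones this course covers next. The relations are hypothetical, and "steps" is a crude count of loop iterations rather than a real cost model. Both algorithms produce the same result, but for 100 × 100 tuples the work differs by a factor of 50.

```python
from collections import defaultdict


def nested_loop_join(r, s, key_r, key_s):
    """Compare every tuple of r with every tuple of s: |r| * |s| steps."""
    out, steps = [], 0
    for tr in r:
        for ts in s:
            steps += 1
            if tr[key_r] == ts[key_s]:
                out.append(tr + ts)
    return out, steps


def hash_join(r, s, key_r, key_s):
    """Build a hash table on s, then probe it once per tuple of r: |s| + |r| steps."""
    table = defaultdict(list)
    steps = 0
    for ts in s:                      # build phase
        steps += 1
        table[ts[key_s]].append(ts)
    out = []
    for tr in r:                      # probe phase
        steps += 1
        for ts in table.get(tr[key_r], ()):
            out.append(tr + ts)
    return out, steps


# Hypothetical relations: depositor(customer_id, account_number), account(account_number, balance).
depositor = [(f"c{i}", f"A-{i}") for i in range(100)]
account = [(f"A-{i}", 10.0 * i) for i in range(100)]
nl, nl_steps = nested_loop_join(depositor, account, 1, 0)
hj, hj_steps = hash_join(depositor, account, 1, 0)
print(sorted(nl) == sorted(hj), nl_steps, hj_steps)  # True 10000 200
```

An optimizer would choose between such plans using statistics like relation sizes, which is why the slide stresses that the database must maintain them.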
50
Transaction Management
- A transaction is a collection of operations that performs a single logical function in a database application.
- The transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures.
- The concurrency-control manager controls the interaction among concurrent transactions to ensure the consistency of the database.
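A funds transfer is the classic single logical function that spans several operations. The sqlite3 sketch below, with hypothetical accounts and a hypothetical `fail_midway` switch, credits first and then debits inside one transaction. A simulated crash between the two updates, or a debit that violates the `balance >= 0` constraint, rolls back the credit as well, so the database never shows a half-finished transfer. It does not exercise concurrency control, which needs multiple concurrent sessions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (account_number TEXT PRIMARY KEY,
                      balance REAL CHECK (balance >= 0));
INSERT INTO account VALUES ('A-101', 500), ('A-201', 100);
""")


def transfer(src: str, dst: str, amount: float, fail_midway: bool = False) -> bool:
    """Move `amount` from src to dst as one transaction; return True if it committed."""
    try:
        with conn:  # COMMIT on success, ROLLBACK if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE account_number = ?",
                         (amount, dst))
            if fail_midway:
                raise RuntimeError("simulated failure between the two updates")
            conn.execute("UPDATE account SET balance = balance - ? WHERE account_number = ?",
                         (amount, src))
        return True
    except (sqlite3.IntegrityError, RuntimeError):
        return False  # the credit above was rolled back with everything else


def balances() -> dict:
    return dict(conn.execute("SELECT account_number, balance FROM account ORDER BY account_number"))


print(transfer("A-101", "A-201", 50))                    # True
print(transfer("A-101", "A-201", 10, fail_midway=True))  # False: nothing applied
print(balances())  # {'A-101': 450.0, 'A-201': 150.0}
```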
51
SYSTEM USERS
There are several kinds of users associated with a system:
- Database administrator: defines schemas, storage structures and access-method definitions, physical organization, authorization, and integrity constraints.
- Application programmers: write programs and make them available to end users.
- Sophisticated users: use a query language (SQL) to access the database interactively.
- Naive (end) users: invoke the application programs.
52
Overall System Structure