1 CSCI485 – File & Database Management Systems
Bahram Zartoshty
Office: SAL 346
Phone: TBA
Office Hours: TTH 1:15-2:50pm
Note: Parts of this lecture were developed by Professor Ghandeharizadeh

2 Logistics
Required text book:
- Database System Concepts, Silberschatz, Korth & Sudarshan, Fifth edition.
Pre-req for the course:
- CS201: Data Structures
- Knowledge of an object-oriented programming language such as C++, Java, C#

3 Teaching Assistant
Shahin Shayande
Office: (Microsoft Lab) SAL 200C
Office Hours: TBA

4 Grading
Midterm 1: 35%
Midterm 2: 35%
Project & Assignments: 30%

5 What to do immediately?
Register with the web site: http://dblab.usc.edu/csci485

6 Database Management System (DBMS)
Database: An integrated collection of data, usually stored on secondary storage, typically describing the activities of one or more related organizations.
Database management system (DBMS): A collection of software/programs designed to assist in maintaining and utilizing large collections of data.
- DBMS contains information about a particular enterprise
- Used almost on a daily basis for either individual or business use.
- Relational database vendors were one of the fastest growing sectors during the .COM boom!

7 BEFORE DBMS
In the early days, database applications were built on top of file systems.
[Figure labels: Data, Data, User 1, User 2, Application programs]

8 AFTER DBMS
[Figure labels: User 1, User 2, Application programs, DBMS, Data managed by DBMS]

9 WHY A DBMS?
1. Reduced application development time
2. Data independence: application programs are not dependent on data representation and storage details
3. Data sharing: data is better utilized (discovered and reused), redundancy of data is minimized
4. Data integrity and consistency: one may enforce consistency constraints on data, e.g., number of seats sold ≤ number of seats on the plane × 1.1
5. Centralized control: the DBA tunes the database to balance users' needs
6. Security: mechanisms to prevent unauthorized access. These mechanisms are based on content instead of a file-oriented approach.
7. Concurrency control: avoids undesirable race conditions that arise with simultaneous access/updates to data
8. Crash recovery: ensures the integrity of data in the presence of failures
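
The seat constraint in item 4 can be handed to the DBMS itself. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in DBMS, and the flight table, its column names, and the sample row are assumptions, not part of the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flight table; the CHECK clause encodes the overbooking rule
# "seats sold <= seats on the plane x 1.1".
conn.execute("""
    CREATE TABLE flight (
        flight_no  TEXT PRIMARY KEY,
        seats      INTEGER NOT NULL,
        seats_sold INTEGER NOT NULL,
        CHECK (seats_sold <= seats * 1.1)
    )
""")
conn.execute("INSERT INTO flight VALUES ('F100', 100, 110)")  # at the limit

try:
    # One more ticket would break the rule, so the DBMS refuses the update.
    conn.execute("UPDATE flight SET seats_sold = 111 WHERE flight_no = 'F100'")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Because the rule lives in the table definition, every application program that writes to the table is subject to it without repeating the check in its own code.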

10 DATABASE MANAGEMENT SYSTEMS ARCHITECTURE
[Figure labels: User 1, User n, Conceptual schema, Physical data, DB, DBMS]

11 Data Models
A collection of tools for describing
- Data
- Data relationships
- Data semantics
- Data constraints
Relational model
Entity-Relationship data model (mainly for database design)
Object-based data models (Object-oriented and Object-relational)
Semistructured data model (XML)
Other older models:
- Network model
- Hierarchical model

12 Challenges
Conceptual: Abstraction, Inheritance, Encapsulation
Logical: Reduction to tables with minimal data duplication, potential for data loss, and update anomalies
Physical: Effective use of a DBMS, management of mismatch between tables and OO constructs, index structures, CC & crash recovery, optimization techniques

13 Conceptual Data Models
Entity-Relationship (ER) data model
- Entities, Attributes, Relationships
[Figure: entity Emp with attributes SS#, name, address]

14 Conceptual Data Models
Entity-Relationship (ER) data model
- Entities, Attributes, Relationships
[Figure: entity Emp (SS#, name, address), relationship "Enrolled in", entity Health Plan (name, Co-Pay)]

15 Conceptual Data Models
Entity-Relationship (ER) data model
- Entities, Attributes, Relationships
- Recursive relationships
[Figure: entity Emp (SS#, name, address) with recursive relationship "Married to"]

16 Conceptual Data Models
Entity-Relationship (ER) data model
- Entities, Attributes, Relationships
- Recursive relationships
[Figure: entity Emp (SS#, name, address) with recursive relationship "Works for"]

17 Conceptual Data Models
Entity-Relationship (ER) data model
- Entities, Attributes, Relationships
- Recursive relationships
[Figure: entity Emp (SS#, name, address) with recursive relationship "Works for", which carries the attribute date]

18 Conceptual Data Models
Entity-Relationship (ER) data model
- Entities, Attributes, Relationships
- Recursive relationships
- Inheritance
[Figure: entity student (sid, name) with ISA to graduate and Undergrad; labels Specialization and Generalization]

19 Conceptual Data Models
Abstraction, Inheritance, Encapsulation
Exercise these concepts using in-class examples and homework assignments
- A library database contains a listing of authors who have written books on various subjects (one author per book). It also contains information about libraries that carry books on various subjects.

20 Conceptual Data Models
Abstraction, Inheritance, Encapsulation
Exercise these concepts using in-class examples and homework assignments
- A library database contains a listing of authors who have written books on various subjects (one author per book). It also contains information about libraries that carry books on various subjects.
- Entity sets: authors, subjects, books, libraries
- Relationship sets: wrote, carry, indexed

21 Conceptual Data Models
Abstraction, Inheritance, Encapsulation
Exercise these concepts using in-class examples and homework assignments
- A library database contains a listing of authors who have written books on various subjects (one author per book). It also contains information about libraries that carry books on various subjects.
[Figure: ER diagram with entity sets authors (SS#, name), books (title, isbn), subject (Subject matter), libraries (address) and relationship sets wrote, index, carry]

22 Data Models
[Figure labels: Logical, Physical; ER diagram of Emp (SS#, name, address) with relationship "Works for"]

23 Relational Data Model
Prevalent in today’s marketplace.
- Why? Performance!
Everything is a table!
Logical data design is the process of reducing an ER diagram to a collection of tables.

24 Logical Data Design
Trivial reduction:
- An entity set = a table
- A relationship set = a table
Pitfalls:
- Duplication of data
- Unintentional loss of data
- Data ambiguity that impacts software design, resulting in update anomalies

25 Data Duplication
[Figure: ER diagram of Emp (SS#, name, address) with relationship "Works for"]
SS# | Name | Address
396 | Shahram | Seattle
400 | Asoke | Chicago
200 | Joe | New York

SS# | MGRSS#
396 | 400
200 | 400
120 | 400

26 Data Duplication
The SS# column is duplicated!
[Figure: ER diagram of Emp (SS#, name, address) with relationship "Works for"]
SS# | Name | Address
396 | Shahram | Seattle
400 | Asoke | Chicago
200 | Joe | New York

SS# | MGRSS#
396 | 400
200 | 400
120 | 400

27 Data Duplication: Solution
Merge the two tables into one:
SS# | Name | Address | MGRSS#
396 | Shahram | Seattle | 400
400 | Asoke | Chicago | NULL
200 | Joe | New York | 400
[Figure: ER diagram of Emp (SS#, name, address) with relationship "Works for"]
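
As a rough illustration of the merge, the sketch below builds the two tables and derives the single merged one. It assumes Python's built-in sqlite3 module as the DBMS; the table names emp, works_for, and emp_merged and the column names ss and mgr_ss are made up for the example, and only the "works for" rows whose employee appears in Emp are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Trivial reduction: one table per entity set and per relationship set.
    CREATE TABLE emp (ss INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE works_for (ss INTEGER, mgr_ss INTEGER);
    INSERT INTO emp VALUES (396, 'Shahram', 'Seattle'),
                           (400, 'Asoke', 'Chicago'),
                           (200, 'Joe', 'New York');
    INSERT INTO works_for VALUES (396, 400), (200, 400);

    -- Merged design: the manager's SS# becomes a column of the employee row.
    -- LEFT JOIN keeps employees without a manager and leaves mgr_ss NULL.
    CREATE TABLE emp_merged AS
        SELECT e.ss, e.name, e.address, w.mgr_ss
        FROM emp e LEFT JOIN works_for w ON w.ss = e.ss;
""")
merged = conn.execute("SELECT * FROM emp_merged ORDER BY ss").fetchall()
```

The merged table stores each SS# once as a key, and an employee with no manager simply carries a NULL in the manager column.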

28 Data Loss
Ford maintains warehouses containing different automobile parts
Records are inserted and deleted based on availability of a part at a warehouse
Part# | Description | Location
123 | Piston | Tijuana
203 | Cylinder | Michigan
877 | Bumper | Michigan
389 | Seats | Arizona

29 Data Loss (Cont…)
When a warehouse becomes empty, it is lost from the database:
Part# | Description | Location
123 | Piston | Tijuana
389 | Seats | Arizona

Solution: utilize two different tables
Part# | Description | WHID
123 | Piston | 12
389 | Seats | 45

WHID | Location
12 | Tijuana
45 | Arizona
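
A minimal sketch of the two-table solution, assuming Python's built-in sqlite3 module as the DBMS. The table names part and warehouse and the lowercase column names are illustrative choices; the rows are the ones shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE warehouse (whid INTEGER PRIMARY KEY, location TEXT);
    CREATE TABLE part (part_no INTEGER PRIMARY KEY, description TEXT,
                       whid INTEGER REFERENCES warehouse(whid));
    INSERT INTO warehouse VALUES (12, 'Tijuana'), (45, 'Arizona');
    INSERT INTO part VALUES (123, 'Piston', 12), (389, 'Seats', 45);
""")
# The last part stored in Arizona is removed ...
conn.execute("DELETE FROM part WHERE part_no = 389")
# ... but the warehouse itself is still recorded in its own table.
locations = [row[0] for row in
             conn.execute("SELECT location FROM warehouse ORDER BY whid")]
```

With the single-table design, the same delete would have erased the only row that mentioned Arizona.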

30 Data Ambiguity
Represent faculty of a department as:
Faculty | Department | Location
Ghandeharizadeh | Comp Sci | SAL
Zartoshty | | SAL
Bohem | | SAL
A change of address for a faculty might be for the entire department. This cannot be differentiated with this table design!

31 Data Ambiguity
Utilize two tables:
Faculty | Department
Ghandeharizadeh | Comp Sci
Zartoshty |
Jenkins | Bio Medical
Bohem | Comp Sci

Department | Location
| SAL
Sex Ed | BOVARD
Bio Medical | HEDCO

32 Data Ambiguity (Cont…)
Employees of a bi-lingual company having different skills.
Employee | Skill | Language
Asoke | Teach | Hindi
Asoke | Cook | French
Asoke | Null | German
Asoke | Program | English
Update anomalies!

33 Data Ambiguity: Solution
Utilize two tables:
Employee | Skill
Asoke | Teach
Asoke | Cook
Asoke | Program

Employee | Language
Asoke | Hindi
Asoke | French
Asoke | German
Asoke | English
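
The effect of the split can be sketched as follows, assuming Python's built-in sqlite3 module as the DBMS. The table names emp_skill and emp_language are illustrative, and the extra language 'Farsi' is a made-up value used only to show an update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp_skill (employee TEXT, skill TEXT,
                            PRIMARY KEY (employee, skill));
    CREATE TABLE emp_language (employee TEXT, language TEXT,
                               PRIMARY KEY (employee, language));
    INSERT INTO emp_skill VALUES ('Asoke', 'Teach'), ('Asoke', 'Cook'),
                                 ('Asoke', 'Program');
    INSERT INTO emp_language VALUES ('Asoke', 'Hindi'), ('Asoke', 'French'),
                                    ('Asoke', 'German'), ('Asoke', 'English');
""")
# Recording a new language is a single insert into one table;
# no skill column has to be padded with NULL.
conn.execute("INSERT INTO emp_language VALUES ('Asoke', 'Farsi')")
skill_count = conn.execute("SELECT COUNT(*) FROM emp_skill").fetchone()[0]
language_count = conn.execute("SELECT COUNT(*) FROM emp_language").fetchone()[0]
```

Skills and languages are independent facts about an employee, so keeping them in separate tables means an update to one never touches rows about the other.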

34 Logical Data Design
A quest to flatten objects with minimal data duplication, loss of data, and update anomalies!
William Kent, “A Simple Guide to Five Normal Forms in Relational Database Theory”, Communications of the ACM 26(2), Feb 1983, 120-125.

35 Data Models
[Figure labels: Physical, Logical Data Design; ER diagram of Emp (SS#, name, address) with relationship "Works for"]
SS# | Name | Address | MGR SS#
396 | Shahram | Seattle | 400
400 | Asoke | Chicago | Null

36 Physical Implementation
Reconstruct main memory objects for manipulation and presentation:
- Specify class definitions
  - Typically correspond to entity-sets
- Populate an instance of a class by issuing SQL queries to a DBMS
- Update instances in memory
- Flush dirty instances back to DBMS
  - Potential use of transactions
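
The steps on this slide can be sketched in a few lines. This is an illustration under assumptions, not the course's implementation: Python's built-in sqlite3 module plays the DBMS, and the Emp class, its dirty flag, the helper functions load_emp and flush, and the new address "Los Angeles" are all hypothetical.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Emp:
    """In-memory class corresponding to the Emp entity set."""
    ss: int
    name: str
    address: str
    dirty: bool = False  # True when the in-memory copy differs from the row

    def move_to(self, new_address: str) -> None:
        self.address = new_address
        self.dirty = True

def load_emp(conn, ss):
    # Populate an instance by issuing an SQL query to the DBMS.
    row = conn.execute(
        "SELECT ss, name, address FROM emp WHERE ss = ?", (ss,)).fetchone()
    return Emp(*row)

def flush(conn, emp):
    # Write a dirty instance back inside a transaction.
    if emp.dirty:
        with conn:  # commits on success, rolls back on an exception
            conn.execute("UPDATE emp SET name = ?, address = ? WHERE ss = ?",
                         (emp.name, emp.address, emp.ss))
        emp.dirty = False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ss INTEGER PRIMARY KEY, name TEXT, address TEXT)")
conn.execute("INSERT INTO emp VALUES (396, 'Shahram', 'Seattle')")
conn.commit()

e = load_emp(conn, 396)
e.move_to("Los Angeles")   # update the instance in memory
flush(conn, e)             # flush the dirty instance back to the DBMS
stored_address = conn.execute(
    "SELECT address FROM emp WHERE ss = 396").fetchone()[0]
```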

37 Type Mismatch
A column of a row must be a primitive such as an integer, real, etc.
- It may NOT be an array of integers or object pointers
A property (attribute) of a class might be of a multi-valued type, e.g., an array, a vector, etc.
Changes in software may impact the design of tables. (Management of type mismatch by the system designer.)
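
One common way for a designer to manage this mismatch is to move the multi-valued property into its own table, one row per element. The sketch below assumes Python's built-in sqlite3 module; the Student class, the student_course table, and the sample values are hypothetical.

```python
import sqlite3

class Student:
    def __init__(self, sid, name, courses):
        self.sid = sid
        self.name = name
        self.courses = list(courses)  # multi-valued property

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT);
    -- Each element of the list becomes one row with primitive columns.
    CREATE TABLE student_course (sid INTEGER REFERENCES student(sid),
                                 course TEXT);
""")

def save(conn, s):
    conn.execute("INSERT INTO student VALUES (?, ?)", (s.sid, s.name))
    conn.executemany("INSERT INTO student_course VALUES (?, ?)",
                     [(s.sid, c) for c in s.courses])

def load(conn, sid):
    name = conn.execute(
        "SELECT name FROM student WHERE sid = ?", (sid,)).fetchone()[0]
    courses = [r[0] for r in conn.execute(
        "SELECT course FROM student_course WHERE sid = ? ORDER BY rowid",
        (sid,))]
    return Student(sid, name, courses)

save(conn, Student(1, "Joe", ["CSCI485", "CS201"]))
restored = load(conn, 1)
```

If the class later gains another multi-valued property, the table design has to change too, which is the coupling the slide warns about.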

38 Implementation
Set operators in the DBMS
- Does set A contain set B?
- Does value v1 appear in set A?
Aggregates in the DBMS
- Compute average employee salary
- Count the number of employees
- Find the oldest employee
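
These questions can be pushed down to the DBMS rather than computed in application code. The sketch assumes Python's built-in sqlite3 module; the salary and age columns and all their values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (ss INTEGER PRIMARY KEY, name TEXT, salary REAL, age INTEGER);
    INSERT INTO emp VALUES (396, 'Shahram', 90000, 41),
                           (400, 'Asoke', 110000, 52),
                           (200, 'Joe', 70000, 35);
""")
# Aggregates: average salary, number of employees, oldest employee.
avg_salary, n_emps = conn.execute(
    "SELECT AVG(salary), COUNT(*) FROM emp").fetchone()
oldest = conn.execute(
    "SELECT name FROM emp ORDER BY age DESC LIMIT 1").fetchone()[0]

# Does value v1 (here 400) appear in set A (the employees' SS#s)?
has_400 = conn.execute(
    "SELECT EXISTS (SELECT 1 FROM emp WHERE ss = 400)").fetchone()[0] == 1

# Does set A contain set B = {396, 200}? Yes when B minus A is empty.
b_minus_a = conn.execute(
    "SELECT ss FROM (SELECT 396 AS ss UNION SELECT 200) "
    "EXCEPT SELECT ss FROM emp").fetchall()
a_contains_b = not b_minus_a
```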

39 Challenges
Conceptual: Abstraction, Inheritance, Encapsulation
Logical: Reduction to tables with minimal data duplication, potential for data loss, and update anomalies
Physical: Effective use of a DBMS, management of mismatch between tables and OO constructs, index structures, CC & crash recovery, optimization techniques

40 Entity-Relationship Model
Example of schema in the entity-relationship model

41 Entity Relationship Model (Cont.)
E-R model of real world
- Entities (objects)
  - E.g. customers, accounts, bank branch
- Relationships between entities
  - E.g. Account A-101 is held by customer Johnson
  - Relationship set depositor associates customers with accounts
Widely used for database design
- Database design in E-R model usually converted to design in the relational model (coming up next) which is used for storage and processing

42 Relational Model
Example of tabular data in the relational model
[Figure label: Attributes]

43 A Sample Relational Database

44 DATA INDEPENDENCE
1. Physical data independence: modify the physical scheme (data structures, e.g., B-tree or hash index) without causing application programs to be rewritten. Such modifications are needed to improve performance and to accommodate new software releases. Most relational vendors support this kind of data independence.
2. Logical data independence: modify the conceptual scheme (e.g., add a new attribute to a table, rename an attribute) without causing application programs to be rewritten. This kind of data independence is harder to achieve.
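
A small sketch of both kinds of independence, assuming Python's built-in sqlite3 module as the DBMS. The index name emp_address_idx and the added phone column are hypothetical. The same query string runs unchanged before and after each change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (ss INTEGER, name TEXT, address TEXT);
    INSERT INTO emp VALUES (396, 'Shahram', 'Seattle'), (400, 'Asoke', 'Chicago');
""")
QUERY = "SELECT name FROM emp WHERE address = 'Chicago'"
before = conn.execute(QUERY).fetchall()

# Physical change: add an index. The application's query text is untouched.
conn.execute("CREATE INDEX emp_address_idx ON emp(address)")
after_index = conn.execute(QUERY).fetchall()

# Logical change: add an attribute. A query that names its columns still works.
conn.execute("ALTER TABLE emp ADD COLUMN phone TEXT")
after_column = conn.execute(QUERY).fetchall()
```

Renaming an attribute is the harder case: the query above would break unless something such as a view preserved the old name, which is one reason logical independence is harder to achieve.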

45 DATABASE LANGUAGES
There are several languages associated with a database:
1. Data Definition Language (DDL): The database scheme is specified by a set of definitions expressed in a special language named DDL. The result of compiling DDL statements is a set of tables stored in a file called the data dictionary. This file contains meta-data (data about the data stored in the database).
2. Data Manipulation Language (DML): a language that enables users to access or manipulate data (retrieve, insert, replace, delete) as organized by a certain data model. We will look at a commercial DML named SQL.
In general, there are two types of DML:
- Procedural: Describes what data is needed and how to get it: e.g., relational algebra
- Non-procedural: Describes what data is needed without specifying how to get it: e.g., tuple relational calculus
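
The distinctions on this slide can be made concrete with a short sketch. It assumes Python's built-in sqlite3 module, where the catalog table sqlite_master plays the role of the data dictionary. The customer table, its underscore column names, and the pairing of the id with the name 'Johnson' are illustrative assumptions. The "procedural" part is plain Python that imitates a select followed by a project; it is not relational algebra syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL: defines the scheme.
conn.execute(
    "CREATE TABLE customer (customer_id TEXT PRIMARY KEY, customer_name TEXT)")
# The definition is recorded as meta-data in SQLite's catalog.
catalog = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()

# DML: inserts and retrieves data.
conn.execute("INSERT INTO customer VALUES ('192-83-7465', 'Johnson')")

# Non-procedural: state what is wanted and let the DBMS decide how.
declarative = conn.execute(
    "SELECT customer_name FROM customer "
    "WHERE customer_id = '192-83-7465'").fetchall()

# Procedural flavour: spell out the steps (scan, then select, then project).
rows = conn.execute("SELECT * FROM customer").fetchall()
selected = [r for r in rows if r[0] == "192-83-7465"]
procedural = [(r[1],) for r in selected]
```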

46 SQL
SQL: widely used non-procedural language
- E.g. find the name of the customer with customer-id 192-83-7465
    select customer.customer-name
    from customer
    where customer.customer-id = '192-83-7465'
- E.g. find the balances of all accounts held by the customer with customer-id 192-83-7465
    select account.balance
    from depositor, account
    where depositor.customer-id = '192-83-7465' and depositor.account-number = account.account-number
Application programs generally access databases through one of
- Language extensions to allow embedded SQL
- Application program interface (e.g. ODBC/JDBC) which allows SQL queries to be sent to a database
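
The two queries can be tried through an application program interface. The sketch below uses Python's built-in sqlite3 module in place of ODBC/JDBC. Column names use underscores because an unquoted hyphen is read as a minus sign in most SQL dialects, and the sample rows (including the second account A-201 and both balances) are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer  (customer_id TEXT, customer_name TEXT);
    CREATE TABLE account   (account_number TEXT, balance REAL);
    CREATE TABLE depositor (customer_id TEXT, account_number TEXT);
    INSERT INTO customer  VALUES ('192-83-7465', 'Johnson');
    INSERT INTO account   VALUES ('A-101', 500), ('A-201', 900);
    INSERT INTO depositor VALUES ('192-83-7465', 'A-101'),
                                 ('192-83-7465', 'A-201');
""")
cid = "192-83-7465"

# Name of the customer with the given id (the value is passed as a parameter).
name = conn.execute(
    "SELECT customer.customer_name FROM customer "
    "WHERE customer.customer_id = ?", (cid,)).fetchone()[0]

# Balances of all accounts held by that customer.
balances = sorted(row[0] for row in conn.execute(
    """SELECT account.balance
       FROM depositor, account
       WHERE depositor.customer_id = ?
         AND depositor.account_number = account.account_number""", (cid,)))
```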

47 Storage Management
Storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.
The storage manager is responsible for the following tasks:
- Interaction with the file manager
- Efficient storing, retrieving and updating of data
Issues:
- Storage access
- File organization
- Indexing and hashing

48 Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation

49 Query Processing (Cont.)
Alternative ways of evaluating a given query
- Equivalent expressions
- Different algorithms for each operation
Cost difference between a good and a bad way of evaluating a query can be enormous
Need to estimate the cost of operations
- Depends critically on statistical information about relations which the database must maintain
- Need to estimate statistics for intermediate results to compute cost of complex expressions
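
To give a feel for how far two equivalent expressions can differ in cost, here is a toy sketch in plain Python. It is not how a real optimizer works: the relations are small in-memory lists, the only algorithm is a nested-loop join, and "cost" is simply the number of join comparisons. All names and data are made up.

```python
def join_then_filter(depositor, account, cid):
    """Join every depositor row with every account row, then filter."""
    comparisons = 0
    joined = []
    for d in depositor:
        for a in account:
            comparisons += 1
            if d[1] == a[0]:
                joined.append((d[0], a[1]))
    return [bal for c, bal in joined if c == cid], comparisons

def filter_then_join(depositor, account, cid):
    """Filter depositor rows first, then join only the survivors."""
    comparisons = 0
    result = []
    for d in (d for d in depositor if d[0] == cid):
        for a in account:
            comparisons += 1
            if d[1] == a[0]:
                result.append(a[1])
    return result, comparisons

# 100 customers, each holding exactly one account.
depositor = [(f"C{i}", f"A-{i}") for i in range(100)]
account = [(f"A-{i}", i * 10) for i in range(100)]

r1, cost1 = join_then_filter(depositor, account, "C7")
r2, cost2 = filter_then_join(depositor, account, "C7")
```

Both plans return the same answer, but the first performs 10,000 join comparisons and the second 100. An optimizer needs statistics such as how many rows pass the filter to predict that gap without running both plans.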

50 Transaction Management
A transaction is a collection of operations that performs a single logical function in a database application
Transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures.
Concurrency-control manager controls the interaction among the concurrent transactions, to ensure the consistency of the database.
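
A funds transfer is the usual example of a single logical function made of several operations. The sketch below assumes Python's built-in sqlite3 module; the account rows, the non-negative balance rule, and the transfer helper are all hypothetical. It shows only the all-or-nothing behaviour on a transaction failure, not recovery from a system crash or concurrency control.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, "
             "balance REAL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("A-101", 500), ("A-201", 900)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction."""
    try:
        with conn:  # commit if the block succeeds, roll back otherwise
            conn.execute(
                "UPDATE account SET balance = balance + ? "
                "WHERE account_number = ?", (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? "
                "WHERE account_number = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "A-101", "A-201", 200)       # succeeds
failed = transfer(conn, "A-101", "A-201", 1000)  # would overdraw A-101
balances = dict(conn.execute("SELECT account_number, balance FROM account"))
```

In the failed call the credit to A-201 has already been applied when the debit is rejected; the rollback undoes it, so the database never shows money created out of nothing.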

51 SYSTEM USERS
There are several kinds of users associated with a system:
- Database administrator: defines schemas, storage structures and access method definitions, physical organization, authorization, integrity constraints.
- Application programmers: they write a program and make it available to the end-users
- Sophisticated users: they use a query language (SQL) to access the database interactively
- Naive (end) users: they invoke the application programs

52 Overall System Structure

