Define database A collection of related data, logically connected and organized. A database management system (DBMS) enables users to create and use a database.
Simplified Database System (figure) Application Programs/Queries -> Software to process queries/programs -> Software to access stored data -> Stored Database Definition (meta-data) and Stored Database
Example STUDENT file stores data on each student. COURSE file stores data on each course. SECTION file stores data on each section of a course. GRADE_REPORT file stores the grades that students receive in the various sections they complete. PREREQUISITE file stores the prerequisites for each course.
STUDENT file Each student has Name, StudentID, Class, Major. Define the types of each of these.
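One possible answer to the typing exercise is sketched below in Python. The type choices (text for Name and Major, integers for StudentID and Class) are assumptions made for illustration, not something the slide specifies.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Student:
    name: str        # assumed: free-form text
    student_id: int  # assumed: integer identifier
    class_: int      # assumed: year of study (1 = first year, ...)
    major: str       # assumed: short department code such as "COSC"


# One record of the STUDENT file.
smith = Student(name="Smith", student_id=17, class_=1, major="COSC")

# The declared types can be read back from the class itself.
field_types = {f.name: f.type for f in fields(Student)}
```

Other choices are defensible too, for example storing StudentID as text if identifiers can carry leading zeros.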
Look at relationships between these files: PREREQUISITE, COURSE, STUDENT, SECTION, GRADE_REPORT
Basic operations Queries (list the names of all students enrolled in Databases). Updates (change Smith's section from 1 to 2). These are informal requests; each must be fully specified before a real query can be run against the database.
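To show what "fully specified" can look like, here is a minimal sketch using Python's built-in sqlite3 module. The two-table layout (student, enrollment), the column names and the sample rows are all assumptions for illustration; the slides do not define them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE enrollment (student_id INTEGER, course_name TEXT, section INTEGER);
    INSERT INTO student VALUES (17, 'Smith'), (8, 'Brown');
    INSERT INTO enrollment VALUES (17, 'Databases', 1), (8, 'Databases', 1);
""")

# Query: list the names of all students enrolled in Databases.
names = [row[0] for row in conn.execute(
    "SELECT s.name FROM student s "
    "JOIN enrollment e ON e.student_id = s.student_id "
    "WHERE e.course_name = ? ORDER BY s.name",
    ("Databases",),
)]

# Update: change Smith's section from 1 to 2. The formal version has to say
# which course and which Smith, which the informal request leaves open.
conn.execute(
    "UPDATE enrollment SET section = 2 "
    "WHERE course_name = ? AND section = 1 "
    "AND student_id = (SELECT student_id FROM student WHERE name = ?)",
    ("Databases", "Smith"),
)
smith_section = conn.execute(
    "SELECT section FROM enrollment WHERE student_id = 17"
).fetchone()[0]
```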
Old way to access data Flat file system. Easiest to implement quickly. Each system has its own copy of the data. Every programmer must keep track of their own interfaces, must maintain consistency themselves.
Fundamental characteristic of a database Should be self describing. System catalog contains info about all data, fields, constraints and relationships between data. (Also called meta-data) Database program should be application agnostic. (as long as relationships can be described by meta-data.) Information is looked up according to meta-data definitions. In ordinary files, this information is encoded in accessor routines.
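As one concrete illustration of a self-describing database, SQLite keeps its catalog in a table named sqlite_master and exposes column definitions through PRAGMA table_info. The student table below is a hypothetical example; the point is only that the structure is looked up from the database itself rather than hard-coded in the program.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "student_id INTEGER PRIMARY KEY, name TEXT NOT NULL, class INTEGER, major TEXT)"
)

# The catalog lists every table the database knows about.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]

# Each row of table_info is (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]
```

A generic tool can use these two lookups to browse any SQLite database without knowing its application in advance.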
Databases provide Program-data independence. Change in physical file structure should not change access routines. Data abstraction – all details about how data is stored are hidden (access is via data model). Multiple views – subsets of data and "virtual" data.
Who accesses the database? Database administrators (DBA) – authorizing access, coordinating and monitoring use. Database designers – identify data and how it will be stored in DB, understand requirements, devise views. End users: Casual end users – use query language to access database. Naïve or parametric end users – use canned queries (little variation). Sophisticated end users – thoroughly familiar with intricacies of the database, make sophisticated queries. System analysts – determine the needs of naïve and parametric users. Application programmers – implement the system analysts' requirements.
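The "multiple views" idea can be sketched with SQL views in SQLite. The table, view names and sample rows here are made up for illustration: one view exposes a subset of the stored columns, the other exposes "virtual" data that is computed on demand and never stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, class INTEGER, major TEXT);
    INSERT INTO student VALUES (17, 'Smith', 1, 'COSC'), (8, 'Brown', 2, 'COSC');

    -- Subset view: hides student_id and class from this audience.
    CREATE VIEW student_directory AS SELECT name, major FROM student;

    -- Virtual data: the head count is derived each time the view is read.
    CREATE VIEW major_headcount AS
        SELECT major, COUNT(*) AS n FROM student GROUP BY major;
""")

directory = conn.execute(
    "SELECT name, major FROM student_directory ORDER BY name"
).fetchall()
headcount = conn.execute("SELECT major, n FROM major_headcount").fetchall()
```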
Behind the scenes DBMS designers and implementers – maintain modules inside the database (query processors, data access, security, etc.) Tool developers – Facilitate design and improve performance. (performance monitoring tools, natural language interfaces, etc.) Operators and maintenance personnel
Why use a DBMS? Control redundancy. Share data (concurrent access). Restrict unauthorized access. Provide multiple interfaces to data. Represent complex relationships between data. Enforce integrity constraints. Provide backup and recovery.
Lesser advantages Ability to enforce standards across the organization. Flexibility. Reduced development time for new applications. Work with up-to-date data. Economies of scale.
When not to use a DBMS Database and application are simple, well defined, not expected to change often. System has stringent real-time constraints that a DBMS may not meet. Multiple access to data is not required. DBMSs are high-overhead systems (hardware, planning, training and maintenance).
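As a small illustration of "enforce integrity constraints", the sketch below lets SQLite reject bad rows instead of relying on each application to check them. The course and section tables, the course number and the credit-hour rule are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not check foreign keys by default
conn.executescript("""
    CREATE TABLE course (
        course_number TEXT PRIMARY KEY,
        credit_hours  INTEGER CHECK (credit_hours > 0)
    );
    CREATE TABLE section (
        section_id    INTEGER PRIMARY KEY,
        course_number TEXT NOT NULL REFERENCES course(course_number)
    );
    INSERT INTO course VALUES ('COSC3380', 3);
""")


def try_insert(sql, params):
    """Run one INSERT and report whether the DBMS accepted it."""
    try:
        conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


ok = try_insert("INSERT INTO section VALUES (?, ?)", (1, "COSC3380"))
orphan = try_insert("INSERT INTO section VALUES (?, ?)", (2, "NOPE0000"))  # no such course
bad_credit = try_insert("INSERT INTO course VALUES (?, ?)", ("COSC0000", 0))  # fails CHECK
```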
DBMS Concepts and Architecture
Data model Describes the structure of the database (types, relationships and constraints). High-level (conceptual) vs. low-level (physical) vs. implementation data models. High level (object-based models): Entity – an object. Attribute – some quality or quantity associated with an object. Relationship – how objects are related. Implementation level: relational, network, hierarchical.
Database Schema Description of the database. Specified during design time and not expected to change. Schema Diagram – data model specific convention for displaying schemas. Schema Construct – single object description in a schema.
Schema Diagram example STUDENT(Name, StudentNumber, Class, Major) COURSE(CourseName, Number, CreditHour, Dept) SECTION(SectionID, CourseNumber, Semester, Year)
Understand Data changes frequently; the schema, if well designed, should not. The data in the database at any time is called the database instance, occurrence or state. Any insert, delete or change converts the DB from one instance to another. The DBMS is responsible for verifying that each instance of the DB adheres to the schema. The schema is also called the intension; an instance is an extension of the schema.
Three Schema Architecture Supports program-data independence, multi-user views, and a catalog to store the DB schema. Internal level has an internal schema – physical structure of DB and access paths. Conceptual level has a conceptual schema – describes entities, data types, relationships, and constraints. External or view level has multiple external schemas or user views – views for a particular audience.
Schemas are only descriptions of data Data only exists at the physical level. Mappings are used to convert from schema to schema (walk through an example).
How schemas preserve data independence Logical data independence – changes to the conceptual schema do not affect external schemas. Physical data independence – changes to the internal schema (physical layer) do not affect the conceptual schema. In reality, this is difficult to implement (multiple levels of mapping also add overhead to queries).
Interfaces to Databases Languages: Data Definition Language (DDL) – used to define internal and conceptual schemas. Storage Definition Language (SDL) – mapping between conceptual and internal schema. View Definition Language (VDL) – defines user views and their mapping to the conceptual schema. Data Manipulation Language (DML) – methods for retrieving, inserting, deleting and modifying data. High-level (non-procedural) DMLs are set-at-a-time. Low-level (procedural) DMLs are record-at-a-time and must be embedded in a host language.
How languages are used Data manipulation calls are made either directly (query language) or embedded in a host language (data sublanguage). Naïve and parametric users have user-friendly interfaces.
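Physical data independence can be sketched in SQLite by changing only the access path. Below, an index is added (an internal-level change) and the same query text returns the same answer. The table, index name and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, major TEXT);
    INSERT INTO student VALUES (17, 'Smith', 'COSC'), (8, 'Brown', 'COSC');
""")

# The application only ever sees this query, written against the conceptual schema.
QUERY = "SELECT name FROM student WHERE major = ? ORDER BY name"

before = conn.execute(QUERY, ("COSC",)).fetchall()

# Internal-level change: a new access path. No application code is touched.
conn.execute("CREATE INDEX idx_student_major ON student(major)")

after = conn.execute(QUERY, ("COSC",)).fetchall()
```

Only the way the DBMS may locate the rows changes; the query and its result stay the same.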
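The language categories can be seen side by side in a short sketch with SQL embedded in Python as the host language. The table and rows are hypothetical. The CREATE statement is DDL, the INSERT and UPDATE are DML, the UPDATE works set-at-a-time, and the final loop processes the result record-at-a-time through a cursor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema.
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, class INTEGER)")

# DML: insert data.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(17, "Smith", 1), (8, "Brown", 2)],
)

# Set-at-a-time: one non-procedural statement changes every qualifying row.
conn.execute("UPDATE student SET class = class + 1")

# Record-at-a-time: the host program steps through the result one row at a time.
report = []
cursor = conn.execute("SELECT name, class FROM student ORDER BY student_id")
for name, class_ in cursor:
    report.append(f"{name}:{class_}")
```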
"User Friendly" DBMS Interfaces Menus, graphical interfaces, forms, natural language, command languages for parametric users, DBA interfaces Are these really user friendly?
Classification of DBMS Relational, network, hierarchical, other. Single vs. multi-user. Number of sites (centralized or distributed). Homogeneous vs. heterogeneous (federated). Cost. Types of access path. General vs. special purpose.
Main criteria – the data model Relational – data organized in tables, high-level query language, limited form of user views. Conceptual and internal views are not distinguishable. Network – a set of records; implements limited 1:N relationships. Must be embedded in a host programming language. Hierarchical – includes parent-child relationships.
Database design process Collect requirements and analyze results. Create conceptual schema (conceptual database design) (datatypes, relationships, and constraints.) Implement database (data model mapping) Design physical database (specify internal storage structures and file organizations)
Example: a company database Company is organized in departments. Each department has a name, a number and an employee who manages the department. Track when the employee started managing the department. Department may have several locations. Departments have projects, each has a name, a number and a location. Employees have SSN, address, salary, sex and birthdate. Employee is assigned to one department but may work on multiple projects. Track # of hours worked per week and on which projects. Track employee's direct supervisor.
Entity A "thing" in the real world. Can be real or conceptual. An entity has particular properties: attributes. E.g. e1: Name = John; Addr = 1 J St, Apt 2, NY, NY 10002; Age = 55; Phone = 222 222 2222
Attributes Atomic vs. composite. Atomic or simple are not decomposable. Composite attributes can form a hierarchy (street address – number, street, apartment) Composite attributes useful when parts are referenced as well as whole.
Attributes cont. Single-valued vs. multivalued. Derivable attributes (age from birth date, number of employees by counting). NULL – not applicable or unknown.
Entity Types An entity type groups similar entities (having the same attributes but different values). Described using an Entity Type Schema (intension) – the common structure shared by the entities. E.g. EMPLOYEE has Name, Age, Salary. The set of instances valid at a given point in time is called an extension of the entity type. E.g. (John Smith, 45, 80K) (Fred Brown, 32, 20K). The entity type schema should not change; extensions change often.
Key Attribute Uniqueness constraint on attributes. An entity type usually has an attribute whose values are distinct for each individual entity. A key attribute is used to uniquely identify an entity. E.g. PERSON has SocialSecurityNumber, COMPANY has Name. No two instances can have the same value for a key attribute. Some entity types have more than one key attribute.
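A derivable attribute is computed from stored ones rather than stored itself. The sketch below derives age from a stored birth date and uses Python's None to stand in for NULL. The Employee class, its field names and the sample dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Employee:
    name: str
    birth_date: date                 # stored attribute
    apartment: Optional[str] = None  # None plays the role of NULL (not applicable or unknown)

    def age_on(self, today: date) -> int:
        """Derived attribute: whole years between birth_date and today."""
        years = today.year - self.birth_date.year
        # Subtract one year if this year's birthday has not happened yet.
        if (today.month, today.day) < (self.birth_date.month, self.birth_date.day):
            years -= 1
        return years


john = Employee("John", date(1970, 6, 15))
age_before_birthday = john.age_on(date(2025, 6, 14))
age_on_birthday = john.age_on(date(2025, 6, 15))
```

Storing the birth date and deriving the age avoids a stored value that silently goes stale.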
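The uniqueness idea can be checked mechanically on a given set of entities. The helper below is a hypothetical sketch over made-up PERSON data. Note that it tests only one extension: a key is a constraint on every valid extension, so passing this check is necessary but not sufficient.

```python
def is_unique_in(entities, attribute):
    """Return True if no two entities in this extension share a value for attribute."""
    values = [entity[attribute] for entity in entities]
    return len(values) == len(set(values))


# Made-up extension of a PERSON entity type.
people = [
    {"ssn": "111-11-1111", "name": "John Smith"},
    {"ssn": "222-22-2222", "name": "John Smith"},
]

ssn_is_candidate = is_unique_in(people, "ssn")
name_is_candidate = is_unique_in(people, "name")
```

Here two different people share a name, so Name cannot serve as a key for PERSON, while the identifier still can.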
Value Sets of Attributes A: attribute, E: entity type, V: value set. A : E -> P(V), where P(V) is the power set of V (the set of all subsets of V).
Value Sets of Attributes cont. A(e): value of attribute A for entity e. A(e) is a singleton (has only one element) for single-valued attributes. A(e) may be the empty set, a single element or multiple elements for a multivalued attribute. For a composite attribute, the value set V is the Cartesian product P(V1) x P(V2) x … x P(Vn) of its component value sets (every combination of component values). Use (…) and a comma-separated list to denote a composite attribute. Use {…} and a comma-separated list to denote a multivalued attribute.
Value Set of Entity cont. E.g. if a person can have more than one residence and each residence can have more than one phone, then the attribute AddressPhone is written: {AddressPhone({Phone(AreaCode,PhoneNumber)}, Address(StreetAddress(Number, Street, ApartmentNumber),City,State,Zip))}
Relationships A relationship type R among n entity types E1…En is a set of associations among entities of these types. R is a set of relationship instances ri, where each ri is an n-tuple (e1, e2, … en) and each ej in ri is a member of entity type Ej, 1 ≤ j ≤ n.
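The nested notation maps naturally onto nested types: (…) becomes a tuple-like record and {…} becomes a set. The sketch below models the AddressPhone attribute this way in Python. The class and field names follow the slide's attribute names, but the sample values and the use of strings for every component are assumptions.

```python
from typing import FrozenSet, NamedTuple


class Phone(NamedTuple):            # composite: Phone(AreaCode, PhoneNumber)
    area_code: str
    phone_number: str


class StreetAddress(NamedTuple):    # composite: StreetAddress(Number, Street, ApartmentNumber)
    number: str
    street: str
    apartment_number: str


class Address(NamedTuple):          # composite with a nested composite
    street_address: StreetAddress
    city: str
    state: str
    zip_code: str


class AddressPhone(NamedTuple):     # one residence with its set of phones
    phones: FrozenSet[Phone]        # multivalued: {Phone(...)}
    address: Address


home = AddressPhone(
    phones=frozenset({Phone("222", "222-2222"), Phone("222", "333-3333")}),
    address=Address(StreetAddress("1", "J St", "2"), "NY", "NY", "10002"),
)

# The attribute itself is multivalued: a person may have several residences.
address_phone = frozenset({home})
```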
Example (figure) WORKS_FOR relationship type between EMPLOYEE and DEPARTMENT, with employee entities e1 … e7 and relationship instances r1 … r4.
Degree of a Relationship Type Degree is the number of participating entity types. In the previous example the degree is 2 (binary), the most common form. A ternary relationship type has three entity types (and holds more information than 3 binary relationships). A connection trap can occur when 3 binary relationships are used instead of a ternary relationship.
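The set-of-tuples definition of a relationship type can be written down directly. In the sketch below the entity names e1 … e3 and d1, d2 and the specific pairings are made up; the check simply confirms that each instance has one component per participating entity type and that each component belongs to the right entity set.

```python
# Hypothetical extensions of two entity types.
EMPLOYEE = {"e1", "e2", "e3"}
DEPARTMENT = {"d1", "d2"}

# A binary relationship set: each instance is a 2-tuple (employee, department).
WORKS_FOR = {("e1", "d1"), ("e2", "d2"), ("e3", "d1")}


def is_valid_relationship(instances, *entity_sets):
    """Check that every instance is an n-tuple whose j-th component is in the j-th entity set."""
    return all(
        len(r) == len(entity_sets)
        and all(e in s for e, s in zip(r, entity_sets))
        for r in instances
    )


valid = is_valid_relationship(WORKS_FOR, EMPLOYEE, DEPARTMENT)
invalid = is_valid_relationship({("e1", "d9")}, EMPLOYEE, DEPARTMENT)  # d9 is not a department
```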
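The claim that a ternary relationship holds more information than three binary ones can be checked on a small made-up case. The sketch assumes a hypothetical SUPPLY relationship among suppliers, parts and projects, splits it into its three binary projections, and then tries to rebuild it. The rebuilt set contains a tuple that was never in the original, which is the connection trap.

```python
from itertools import product

# Hypothetical ternary relationship: (supplier, part, project).
supply = {("s1", "p1", "j2"), ("s1", "p2", "j1"), ("s2", "p1", "j1")}

# The three binary relationships obtained by dropping one component each.
supplies = {(s, p) for s, p, j in supply}    # supplier supplies part
used_in = {(p, j) for s, p, j in supply}     # part is used in project
works_with = {(s, j) for s, p, j in supply}  # supplier works with project

suppliers = {s for s, _, _ in supply}
parts = {p for _, p, _ in supply}
projects = {j for _, _, j in supply}

# Rebuild: keep every combination that all three binary relationships allow.
rejoined = {
    (s, p, j)
    for s, p, j in product(suppliers, parts, projects)
    if (s, p) in supplies and (p, j) in used_in and (s, j) in works_with
}

# Tuples the binary relationships imply but the ternary one never stated.
spurious = rejoined - supply
```

The binary facts are individually true (s1 supplies p1, p1 is used in j1, s1 works with j1), yet together they do not establish that s1 supplies p1 to j1.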
Relations as Attributes
Example
Entity-Relationship Model
Example Student
Name   StudentID  Class  Major
Smith  17         1      COSC
Brown  8          2
