Introduction to Data Management Chapter 1, Pratt & Adamski Delivered 8/20/98
Data and Information DATA: Facts concerning people, objects, events or other entities. Databases store data. INFORMATION: Data presented in a form suitable for interpretation. Data is converted into information by programs and queries. Data may be stored in files or in databases. Neither one stores information. KNOWLEDGE: Insights into appropriate actions based on interpreted data.
Knowledge Generation DATA INFORMATION
Using a DBMS [Diagram labels: Data, Metadata, DBMS Engine, Access, Direct access, Host language, Data Management, Database Design]
Basic Principles DATABASE: A shared collection of interrelated data designed to meet the varied information needs of an organization. DATABASE MANAGEMENT SYSTEM: A collection of programs to create and maintain a database: define, construct, manipulate.
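The three DBMS functions named above (define, construct, manipulate) can be sketched with Python's built-in sqlite3 module. This is only an illustration: the student table, its columns, and the sample rows are made up for the example and do not come from the chapter.

```python
import sqlite3

# Throwaway in-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Define: declare the structure of the data (hypothetical table).
conn.execute(
    "CREATE TABLE student ("
    "student_key INTEGER PRIMARY KEY, name TEXT, college TEXT)"
)

# Construct: load the data into the defined structure.
conn.executemany(
    "INSERT INTO student (student_key, name, college) VALUES (?, ?, ?)",
    [(1, "Ada", "Engineering"), (2, "Ben", "Business"), (3, "Cy", "Engineering")],
)

# Manipulate: update and query the stored data.
conn.execute("UPDATE student SET college = 'Science' WHERE student_key = 2")
engineering = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM student WHERE college = 'Engineering' ORDER BY name"
    )
]
print(engineering)
```

The same three steps apply to any relational DBMS; only the connection call would differ.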
Advantages of Database Processing: More information from the same data; shared data; balancing conflicts among users; controlled redundancy; consistency; integrity; security; increased productivity; data independence.
Disadvantages of Database Processing: Increased size; increased complexity; more expensive personnel; increased impact of failure; difficulty of recovery; cost (especially server and mainframe systems).
Objectives of the DBMS Approach: SELF-DESCRIBING; DATA INDEPENDENCE; MULTIPLE VIEWS; MULTIPLE USERS
What is a Database Management System? Data files; directory; access engine; utility programs.
Database: DATA; METADATA; ACCESS ENGINE; UTILITIES
Files and Databases Metadata (“data about data”): description of fields; display and format instructions; structure of files and tables; security and access rules; triggers and operational rules.
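To make "data about data" concrete, the sketch below asks SQLite for the metadata it keeps about a table. The section table and its fields are hypothetical; sqlite_master and PRAGMA table_info are SQLite's own catalog facilities, and other DBMS products expose their metadata differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table, used only to have something to describe.
conn.execute(
    "CREATE TABLE section ("
    "section_key INTEGER PRIMARY KEY, title TEXT NOT NULL, room TEXT)"
)

# sqlite_master is SQLite's catalog: one row per table, index, view or trigger.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info returns one row per field:
# (cid, name, type, notnull, dflt_value, pk)
fields = [
    (name, col_type, bool(notnull), bool(pk))
    for _cid, name, col_type, notnull, _default, pk in conn.execute(
        "PRAGMA table_info(section)"
    )
]

print(tables)
for field in fields:
    print(field)
```

Note that no row of student or section data is read here; the program learns the field names, types, and key rules from the database itself.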
Database Access USER INTERFACE DATABASE PROGRAM
History of Database Management: File Management Systems; Hierarchical Model: IBM “Information Management System (IMS)”, 1966; Network Model: Charles Bachman’s “Integrated Data Store (IDS)”, 1965, and the Conference on Data Systems Languages / Data Base Task Group (CODASYL/DBTG), 1971; Relational Model: E.F. Codd, 1970.
File Management Systems Provided facilities to extract data and share files, but did not implement any way to connect records in one file to those in another. Relationships had to be implemented in application code.
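A minimal sketch of what "relationships implemented in application code" means in practice. The two files are simulated with in-memory strings, and their names, fields, and contents are invented for the example.

```python
import csv
import io

# Two independent "files"; the file system knows nothing about how they relate.
STUDENT_FILE = "student_key,name,college_key\n1,Ada,ENG\n2,Ben,BUS\n"
COLLEGE_FILE = "college_key,college_name\nENG,Engineering\nBUS,Business\n"


def read_records(text):
    """Parse one file into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text)))


students = read_records(STUDENT_FILE)
colleges = read_records(COLLEGE_FILE)

# The application has to build the connection itself: index one file by key,
# then look up each record of the other file against that index.
college_by_key = {c["college_key"]: c["college_name"] for c in colleges}
report = [(s["name"], college_by_key[s["college_key"]]) for s in students]

print(report)
```

Every program that needs the student-to-college connection must repeat this lookup logic, which is the weakness the slide describes.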
Database vs File Systems [Diagram. File system: Program 1, Program 2, Program 3, each with its own Meta-Data; Data. DATABASE: Meta-Data and Data; Program 1, Program 2, Program 3.]
Structured Databases Relationships were implemented by physical pointers (called “sets”), which allowed records in different files to be connected. Hierarchical databases allow only one parent set; networks allow several. These permit efficient processing, but the sets must be constructed on data entry and cannot be rearranged later.
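The pointer idea can be imitated in a few lines of Python. This is a loose analogy, not how IMS or IDS actually stored records: object references stand in for physical pointers, and the Record class, store function, and sample names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Record:
    name: str
    owner: Optional["Record"] = None  # hierarchical rule: a single parent slot
    members: list = field(default_factory=list)  # the "set" of child records


def store(name, owner=None):
    """Create a record and wire its pointers at data entry time."""
    rec = Record(name, owner)
    if owner is not None:
        owner.members.append(rec)
    return rec


section = store("SECTION: Databases")
student = store("STUDENT: Ada", owner=section)
instructor = store("INSTRUCTOR: Lee", owner=section)


def path_to_root(rec):
    """Follow owner pointers upward; no searching or key matching is needed."""
    names = []
    while rec is not None:
        names.append(rec.name)
        rec = rec.owner
    return names


print([m.name for m in section.members])
print(path_to_root(student))
```

Navigation is fast because it only follows pointers, but a relationship that was not wired in by store does not exist for the program to follow.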
Relational Models Relational models implement relationships with matched data values in related files (called primary and foreign keys). Any attributes can be matched. The connection is established at retrieval so interconnections can be developed as needed.
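A small sqlite3 sketch of matching key values at retrieval time. The table names echo the STUDENT and COLLEGE files used in this chapter's diagrams, but the columns and rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE college (
        college_key  TEXT PRIMARY KEY,
        college_name TEXT
    );
    CREATE TABLE student (
        student_key INTEGER PRIMARY KEY,
        name        TEXT,
        college_key TEXT REFERENCES college (college_key)  -- foreign key
    );
    INSERT INTO college VALUES ('ENG', 'Engineering'), ('BUS', 'Business');
    INSERT INTO student VALUES (1, 'Ada', 'ENG'), (2, 'Ben', 'BUS'), (3, 'Cy', 'ENG');
    """
)

# No pointers are stored. The connection is made when the query runs,
# by matching the foreign key value against the primary key value.
rows = conn.execute(
    """
    SELECT s.name, c.college_name
    FROM student AS s
    JOIN college AS c ON s.college_key = c.college_key
    ORDER BY s.student_key
    """
).fetchall()

print(rows)
```

Because the match is expressed in the query, a different query could join on any other pair of comparable attributes without changing the stored data.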
Hierarchy SECTION STUDENT INSTRUCTOR COLLEGE COLLEGE Each file can have only one parent. To implement a second “parent” (COLLEGE) we have to implement a shadow copy.
Network SECTION STUDENT INSTRUCTOR COLLEGE Each file can have several parents. Both SECTION and COLLEGE are “parent” files.
Relational Files: SECTION; SECTION-STUDENT (SECTION-KEY, STUDENT-KEY); SECTION-INSTRUCTOR (SECTION-KEY, INSTRUCTOR-KEY); STUDENT (COLLEGE-KEY); INSTRUCTOR (COLLEGE-KEY); COLLEGE. Each file can have several parents. Both SECTION and COLLEGE are “parent” files.
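The SECTION-STUDENT file in this design holds only a pair of keys, which is enough to connect sections and students in both directions. The sketch below shows that idea with sqlite3; the non-key columns, the sample rows, and the roster and schedule helper functions are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE section (section_key INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE student (student_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE section_student (
        section_key INTEGER REFERENCES section (section_key),
        student_key INTEGER REFERENCES student (student_key),
        PRIMARY KEY (section_key, student_key)
    );
    INSERT INTO section VALUES (10, 'Databases'), (20, 'Networks');
    INSERT INTO student VALUES (1, 'Ada'), (2, 'Ben');
    INSERT INTO section_student VALUES (10, 1), (10, 2), (20, 1);
    """
)


def roster(section_key):
    """Names of the students enrolled in one section."""
    rows = conn.execute(
        """
        SELECT st.name
        FROM section_student AS ss
        JOIN student AS st ON st.student_key = ss.student_key
        WHERE ss.section_key = ?
        ORDER BY st.name
        """,
        (section_key,),
    )
    return [r[0] for r in rows]


def schedule(student_key):
    """Titles of the sections one student is enrolled in."""
    rows = conn.execute(
        """
        SELECT se.title
        FROM section_student AS ss
        JOIN section AS se ON se.section_key = ss.section_key
        WHERE ss.student_key = ?
        ORDER BY se.title
        """,
        (student_key,),
    )
    return [r[0] for r in rows]


print(roster(10))
print(schedule(1))
```

The same key-pair file answers both questions, so neither SECTION nor STUDENT has to be treated as the only parent.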
Relational Terminology Entity: a person, place, thing or event about which we wish to keep data. Attribute: a property of an entity. Relationship: an association among entities (entity records).
Distribution Strategies for Databases Centralized Data and Processing: Dumb terminal with "screen scraping". Intelligent Terminal: Data and processing centralized; data preparation and display on remote devices. Distributed Logic: Data storage distributed; data is processed at the optimal location. A version of parallel processing. Client Server: Data (usually departmental) maintained on a server. Subsetting occurs on the server, processing on client machines. Distributed Database: Data distributed among different locations; processing accesses data wherever it is located. Data may be replicated or partitioned.
Data Management Designing and managing information in a database environment requires: understanding the principles of data modeling in system design; using SQL for data manipulation; understanding the concepts of managing data in a database environment.
Information System Modeling Approaches PROCESS MODELING: The traditional method of designing systems by following the changes to data flows. DATA MODELING: An approach to system development that specifies the file structure that conforms to the things important to the organization. PROTOTYPING: An iterative approach that focuses on building small operating OBJECT MODELING (Event driven design): Defines objects that contain data and associated processing rules encapsulated together.