Presentation is loading. Please wait.

Presentation is loading. Please wait.

Database Management Systems

Similar presentations


Presentation on theme: "Database Management Systems"— Presentation transcript:

1 Database Management Systems
Lecture 1

2 What will we cover? Database Management System (DBMS)
Levels of Abstraction View of data Relational model Structured query language or SQL Database design Entity – relationship model Database architecture Database users

3 Data and Database Data refers to the facts or information that can be recorded and have meaning. E.g. Name and phone # of your friends Database is a collection of related information. E.g. Addressbook database for storing all names and phone numbers of your friends Few points to note about database: A database represents some aspects of real world, sometimes called mini-world. Changes in mini-world get reflected in database. It is a logical collection of data with some inherent meaning It is designed, built and populated with data for a specific purpose. It has some source from which data is derived, some degree of interaction with the events in the real world and an audience that is interested in its contents. For example, customer buying camera, employee having baby, etc. are some real world events which get reflected in database. Databases are widely used. Here are some representative applications: • Banking: For customer information, accounts, and loans, and banking transactions. • Airlines: For reservations and schedule information. Airlines were among the first to use databases in a geographically distributed manner—terminals situated around the world accessed the central database system through phone lines and other data networks. • Universities: For student information, course registrations, and grades. • Credit card transactions: For purchases on credit cards and generation of monthly statements. • Telecommunication: For keeping records of calls made, generating monthly bills, maintaining balances on prepaid calling cards, and storing information about the communication networks. • Finance: For storing information about holdings, sales, and purchases of financial instruments such as stocks and bonds. • Sales: For customer, product, and purchase information. • Manufacturing: For management of supply chain and for tracking production of items in factories, inventories of items in warehouses/stores, and orders for items. • Human resources: For information about employees, salaries, payroll taxes and benefits, and for generation of paychecks.

4 Database Management System (DBMS)
DBMS contains information about a particular enterprise Collection of interrelated data Set of programs to access the data An environment that is both convenient and efficient to use Database Applications: Banking: all transactions Airlines: reservations, schedules Universities: registration, grades Sales: customers, products, purchases Online retailers: order tracking, customized recommendation Manufacturing: production, inventory, orders, supply chain Human resources: employee records, salaries, tax deduction Databases touch all aspects of our lives Databases are widely used. Here are some representative applications: • Banking: For customer information, accounts, and loans, and banking transactions. • Airlines: For reservations and schedule information. Airlines were among the first to use databases in a geographically distributed manner—terminals situated around the world accessed the central database system through phone lines and other data networks. • Universities: For student information, course registrations, and grades. • Credit card transactions: For purchases on credit cards and generation of monthly statements. • Telecommunication: For keeping records of calls made, generating monthly bills, maintaining balances on prepaid calling cards, and storing information about the communication networks. • Finance: For storing information about holdings, sales, and purchases of financial instruments such as stocks and bonds. • Sales: For customer, product, and purchase information. • Manufacturing: For management of supply chain and for tracking production of items in factories, inventories of items in warehouses/stores, and orders for items. • Human resources: For information about employees, salaries, payroll taxes and benefits, and for generation of paychecks.

5 DBMS (contd.) In the early days, database applications were built directly on top of file systems. Drawbacks of using file systems to store data: Data redundancy and inconsistency - Multiple file formats, duplication of information in different files Difficulty in accessing data – To be done manually or need to write a new program to carry out each new task Data isolation — multiple files and formats Integrity problems - Integrity constraints (e.g. account balance > 0) become “buried” in program code rather than being stated explicitly. Hard to add new constraints or change existing ones. Atomicity problems – System failure occurs in the middle of a program execution resulting in inconsistent database state. Concurrent access anomalies – Multiple users trying to update the same data resulting in incorrect or inconsistent state Security problems – imposed user access management • Data redundancy and inconsistency. Since different programmers create the files and application programs over a long period, the various files are likely several places (files). For example, the address and telephone number of a particular languages. Moreover, the same information may be duplicated in to have different formats and the programs may be written in several programming customer may appear in a file that consists of savings-account records that is, the various copies of the same data may no longer agree. For to higher storage and access cost. In addition, it may lead to data inconsistency; and in a file that consists of checking-account records. This redundancy leads example, a changed customer address may be reflected in savings-account find out the names of all customers who live within a particular postal-code • Difficulty in accessing data. Suppose that one of the bank officers needs to records but not elsewhere in the system. Because the designers of the original system did not anticipate this request, area. The officer asks the data-processing department to generate such a list. information manually or ask a system programmer to write the necessary program to generate the list of all customers. The bank officer has now two choices: either obtain the list of all customers and extract the needed there is no application program on hand to meet it. There is, however, an application application program. Both alternatives are obviously unsatisfactory. Suppose balance of $10,000 or more. As expected, a program to generate such a list does needs to trim that list to include only those customers who have an account that such a program is written, and that, several days later, the same officer satisfactory. not exist. Again, the officer has the preceding two options, neither of which is responsive data-retrieval systems are required for general use. needed data to be retrieved in a convenient and efficient manner. More The point here is that conventional file-processing environments do not allow • Data isolation. Because data are scattered in various files, and files may be in • Integrity problems. The data values stored in the database must satisfy certain data is difficult. different formats, writing new application programs to retrieve the appropriate may never fall below a prescribed amount (say, $25). Developers enforce types of consistency constraints. For example, the balance of a bank account to change the programs to enforce them. The problem is compounded when programs. However, when new constraints are added, it is difficult these constraints in the system by adding appropriate code in the various application constraints involve several data items from different files. failure occurs, the data be restored to the consistent state that existed prior to device, is subject to failure. In many applications, it is crucial that, if a • Atomicity problems. A computer system, like any other mechanical or electrical the failure. Consider a program to transfer $50 from account A to account B. resulting in an inconsistent database state. Clearly, it is essential to database that the $50 was removed from account A but was not credited to account B, If a system failure occurs during the execution of the program, it is possible consistency that either both the credit and debit occur, or that neither occur. system. not at all. It is difficult to ensure atomicity in a conventional file-processing That is, the funds transfer must be atomic—it must happen in its entirety or and faster response, many systems allow multiple users to update the • Concurrent-access anomalies. For the sake of overall performance of the system $500. If two customers withdraw funds (say $50 and $100 respectively) from may result in inconsistent data. Consider bank account A, containing data simultaneously. In such an environment, interaction of concurrent updates account A at about the same time, the result of the concurrent executions may that value by the amount being withdrawn, and write the result back. If the executing on behalf of each withdrawal read the old balance, reduce leave the account in an incorrect (or inconsistent) state. Suppose that the programs back $450 and $400, respectively. Depending on which one writes the value last, the account may contain either $450 or $400, rather than the correct value two programs run concurrently, they may both read the value $500, and write accessed by many different application programs that have not been coordinated of supervision. But supervision is difficult to provide because data may be of $350. To guard against this possibility, the system must maintain some form previously. to see only that part of the database that has information about the various access all the data. For example, in a banking system, payroll personnel need • Security problems. Not every user of the database system should be able to But, since application programs are added to the system in an ad hoc bank employees. They do not need access to information about customer accounts. manner, enforcing such security constraints is difficult.

6 DBMS (contd.) DBMS is a collection of programs that enable users to create and maintain a database. It is a general purpose software that facilitates the processes of defining, constructing, manipulating and sharing databases. Defining  Specifying data types, structures and constraints of data to be stored. It is also stored in the database in the form of database catalogue or dictionary, called meta-data. Constructing  Process of storing the data on some storage medium Manipulating  Includes functions like querying the database to retrieve, update and generate reports from data Sharing  Allows multiple users and programs to access the data concurrently (or simultaneously) • Data redundancy and inconsistency. Since different programmers create the files and application programs over a long period, the various files are likely several places (files). For example, the address and telephone number of a particular languages. Moreover, the same information may be duplicated in to have different formats and the programs may be written in several programming customer may appear in a file that consists of savings-account records that is, the various copies of the same data may no longer agree. For to higher storage and access cost. In addition, it may lead to data inconsistency; and in a file that consists of checking-account records. This redundancy leads example, a changed customer address may be reflected in savings-account find out the names of all customers who live within a particular postal-code • Difficulty in accessing data. Suppose that one of the bank officers needs to records but not elsewhere in the system. Because the designers of the original system did not anticipate this request, area. The officer asks the data-processing department to generate such a list. information manually or ask a system programmer to write the necessary program to generate the list of all customers. The bank officer has now two choices: either obtain the list of all customers and extract the needed there is no application program on hand to meet it. There is, however, an application application program. Both alternatives are obviously unsatisfactory. Suppose balance of $10,000 or more. As expected, a program to generate such a list does needs to trim that list to include only those customers who have an account that such a program is written, and that, several days later, the same officer satisfactory. not exist. Again, the officer has the preceding two options, neither of which is responsive data-retrieval systems are required for general use. needed data to be retrieved in a convenient and efficient manner. More The point here is that conventional file-processing environments do not allow • Data isolation. Because data are scattered in various files, and files may be in • Integrity problems. The data values stored in the database must satisfy certain data is difficult. different formats, writing new application programs to retrieve the appropriate may never fall below a prescribed amount (say, $25). Developers enforce types of consistency constraints. For example, the balance of a bank account to change the programs to enforce them. The problem is compounded when programs. However, when new constraints are added, it is difficult these constraints in the system by adding appropriate code in the various application constraints involve several data items from different files. failure occurs, the data be restored to the consistent state that existed prior to device, is subject to failure. In many applications, it is crucial that, if a • Atomicity problems. A computer system, like any other mechanical or electrical the failure. Consider a program to transfer $50 from account A to account B. resulting in an inconsistent database state. Clearly, it is essential to database that the $50 was removed from account A but was not credited to account B, If a system failure occurs during the execution of the program, it is possible consistency that either both the credit and debit occur, or that neither occur. system. not at all. It is difficult to ensure atomicity in a conventional file-processing That is, the funds transfer must be atomic—it must happen in its entirety or and faster response, many systems allow multiple users to update the • Concurrent-access anomalies. For the sake of overall performance of the system $500. If two customers withdraw funds (say $50 and $100 respectively) from may result in inconsistent data. Consider bank account A, containing data simultaneously. In such an environment, interaction of concurrent updates account A at about the same time, the result of the concurrent executions may that value by the amount being withdrawn, and write the result back. If the executing on behalf of each withdrawal read the old balance, reduce leave the account in an incorrect (or inconsistent) state. Suppose that the programs back $450 and $400, respectively. Depending on which one writes the value last, the account may contain either $450 or $400, rather than the correct value two programs run concurrently, they may both read the value $500, and write accessed by many different application programs that have not been coordinated of supervision. But supervision is difficult to provide because data may be of $350. To guard against this possibility, the system must maintain some form previously. to see only that part of the database that has information about the various access all the data. For example, in a banking system, payroll personnel need • Security problems. Not every user of the database system should be able to But, since application programs are added to the system in an ad hoc bank employees. They do not need access to information about customer accounts. manner, enforcing such security constraints is difficult.

7 Levels of Abstraction Physical level: describes how a record (e.g., customer) is stored Logical level: describes data stored in database, and the relationships among the data type customer = record customer_id : string; customer_name : string; customer_street : string; customer_city : integer; end; View level: application programs hide details of data types. Views can also hide information (such as an employee’s salary) for security purposes For the system to be usable, it must retrieve data efficiently. The need for efficiency has led designers to use complex data structures to represent data in the database. Since many database-systems users are not computer trained, developers hide the complexity from users through several levels of abstraction, to simplify users’ interactions with the system: • Physical level. The lowest level of abstraction describes how the data are actually stored. The physical level describes complex low-level data structures in detail. • Logical level. The next-higher level of abstraction describes what data are stored in the database, and what relationships exist among those data. The logical level thus describes the entire database in terms of a small number of relatively simple structures. Although implementation of the simple structures at the logical level may involve complex physical-level structures, the user of the logical level does not need to be aware of this complexity. Database administrators, who must decide what information to keep in the database, use the logical level of abstraction. • View level. The highest level of abstraction describes only part of the entire database. Even though the logical level uses simpler structures, complexity remains because of the variety of information stored in a large database. Many users of the database system do not need all this information; instead, they need to access only a part of the database. The view level of abstraction exists to simplify their interaction with the system. The system may provide many views for the same database. Figure 1.1 shows the relationship among the three levels of abstraction. An analogy to the concept of data types in programming languages may clarify the distinction among levels of abstraction. Most high-level programming languages support the notion of a record type. For example, in a Pascal-like language, we may declare a record as follows: type customer = record customer-id : string; customer-name : string; customer-street : string; customer-city : string; end; This code defines a new record type called customer with four fields. Each field has a name and a type associated with it. A banking enterprise may have several such record types, including • account, with fields account-number and balance • employee, with fields employee-name and salary At the physical level, a customer, account, or employee record can be described as a block of consecutive storage locations (for example, words or bytes). The language compiler hides this level of detail from programmers. Similarly, the database system hides many of the lowest-level storage details from database programmers. Database administrators, on the other hand, may be aware of certain details of the physical organization of the data. At the logical level, each such record is described by a type definition, as in the previous code segment, and the interrelationship of these record types is defined as well. Programmers using a programming language work at this level of abstraction. Similarly, database administrators usually work at this level of abstraction. Finally, at the view level, computer users see a set of application programs that hide details of the data types. Similarly, at the view level, several views of the database are defined, and database users see these views. In addition to hiding details of the logical level of the database, the views also provide a security mechanism to prevent users from accessing certain parts of the database. For example, tellers in a bank see only that part of the database that has information on customer accounts; they cannot access information about salaries of employees.

8 View of Data Architecture of a database system

9 Relational Model Attributes Relational Database
Tabular data in a Relational Model

10 Structured Query Language
Widely used non-procedural language Application programs generally access databases through one of Language extensions to allow embedded SQL Application program interface (e.g., ODBC/JDBC) which allow SQL queries to be sent to a database Example: Find the balances of all accounts held by the customer with customer-id select account.balance from depositor, account where depositor.customer_id = ‘ ’ and depositor.account_number = account.account_number

11 Database Design Logical Design – Deciding on the database schema. Database design requires that we find a “good” collection of relation schemas. Business decision – What attributes should we record in the database? Technology decision – What relation schemas should we have and how should the attributes be distributed among the various relation schemas? Physical Design – Deciding on the physical layout of the database

12 The Entity-Relationship Model
Models an enterprise as a collection of entities and relationships Entity: a “thing” or “object” in the enterprise that is distinguishable from other objects Described by a set of attributes Relationship: an association among several entities Represented diagrammatically by an entity- relationship diagram:

13 Database Architecture
The architecture of a database systems is greatly influenced by the underlying computer system on which the database is running: Centralized Client-server Parallel (multi-processor) Distributed

14 Database Users Users are differentiated by the way they expect to interact with the system Application programmers – interact with system through DML calls Sophisticated users – form requests in a database query language Specialized users – write specialized database applications that do not fit into the traditional data processing framework Naïve users – invoke one of the permanent application programs that have been written previously Examples, people accessing database over the web, bank tellers, clerical staff

15 End of Lecture 1


Download ppt "Database Management Systems"

Similar presentations


Ads by Google