Switch off your Mobile Phones or Change Profile to Silent Mode.

Database Architecture and Query Optimisation

Database Architecture

Topics
  Historical Developments
  Navigational Data Models
  Non-navigational Data Model
  Data Independence
  Database Languages

Historical Developments
  Navigational Data Models:
    Hierarchical model
    Network model
  Non-navigational Data Model:
    Relational model

Hierarchical Model
  Developed in the 1960s to manage large amounts of data for complex manufacturing projects, such as the Apollo rocket that landed on the Moon (1969).
  Its basic logical structure is represented by an upside-down tree.
  The hierarchical structure contains levels, or segments.
  A segment is the equivalent of a file system's record type.

Hierarchical Model
  Within the hierarchy, a higher layer is perceived as the parent of the segment directly beneath it, which is called the child.
  The hierarchical model depicts a set of one-to-many (1:M) relationships between a parent and its child segments.
  Each parent can have many children, but each child has only one parent.

Hierarchical Model
  Depends on every entity being subordinate to a higher one.
  A simple example is genealogy (each parent can be identified from the child and vice versa).
  Another example of a hierarchy of data is a customer invoice system.
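The parent/child rule described above can be sketched in a few lines of Python. This is only an illustration: the segment names (Customer, Invoice, InvoiceLine, Payment) are assumed for the customer invoice example and do not come from the slides.

```python
# Hypothetical segments for a customer invoice hierarchy.
# Each child maps to exactly one parent, which is the defining 1:M constraint.
parent_of = {
    "Invoice": "Customer",
    "InvoiceLine": "Invoice",
    "Payment": "Invoice",
}


def children_of(segment):
    """Return the child segments directly beneath `segment`."""
    return sorted(child for child, parent in parent_of.items() if parent == segment)


def path_to_root(segment):
    """Navigate upward from a segment to the root of the hierarchy."""
    path = [segment]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path
```

Because a dictionary key can hold only one value, a segment such as Payment cannot be attached to a second parent here, which mirrors the restriction of the hierarchical model.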

Hierarchical Model

Hierarchical views can differ between user groups

Hierarchical Model - Drawbacks
  Data is physically stored in hierarchies.
  The structure is difficult to change once a particular hierarchy has been designed, making it less flexible in meeting dynamic needs (e.g. in the customer invoice example, it is not possible to allow a single payment to be made for several invoices).
  Unplanned (ad hoc) queries are difficult to support; they may require major restructuring of the hierarchy.

Hierarchical Model - Drawbacks
  The drawbacks of the Hierarchical Model prompted the development of the Network Model.

Network Model
  The Network Model was created to represent complex data relationships more effectively than the hierarchical model, to improve database performance, and to impose a database standard.
  The user perceives the network database as a collection of records in 1:M relationships.
  Unlike the Hierarchical Model, the Network Model allows a record to have more than one parent.

Network Model
  The network model represents a more complex structure, allowing non-hierarchical structures.
  Within a model, any record may have many immediate parents as well as many dependents, reflecting more real-world scenarios.
  A network of data: customer invoice/payment example.

Network Model

Network Model - Drawbacks
  Data is physically stored in linked sets.
  Pointer technology is used to implement relationships (with overhead and performance issues).
  Unplanned queries are still difficult to support.
  The programmer must be aware of 'sets' (relationships between record types) and of structural changes.
  Users have to 'navigate' through the database (not the most user-friendly way to interact with it).

Relational Model
  The Relational Model was introduced in 1970 by E. F. Codd (of IBM) in his landmark paper "A Relational Model of Data for Large Shared Data Banks".
  It is a data model that represents data in the form of tables, or relations.

Relational Model
  The relational database model consists of the following three components:
  1. Data structure: data are organized in the form of tables or relations.
  2. Data manipulation: powerful operations, expressed in languages such as SQL or Query-by-Example, are used to manipulate data stored in the database.
  3. Data integrity: business rules are specified to maintain the integrity of data when they are manipulated.

Relational Model
  Physical Properties
    A relation consists of 1 or more columns and 0 or more rows. A row is called a tuple.
    Each relation is given a unique name.
    Each column has a name unique within the relation.
    Each row contains an instance of the data associated with the relation.
    A relation with no rows is empty (contains no data), but still exists.

Relational Model
  Logical Properties
    Columns are unordered, left to right. This property is designed to preserve the independence of each column.
    Rows are unordered, top to bottom. This is designed to preserve the independence of each row.
    No row may be duplicated in a given relation.
    Uniqueness in a relation is guaranteed by the designation of a Primary Key for each relation.

Relational Model
  A Candidate Key is an attribute (or set of attributes) that uniquely identifies a row in a relation.
  A Primary Key is a candidate key that has been selected to be the unique identifier for each row.
  Primary key values cannot be null, since they would then not identify a row.
  Columns can be interchanged without changing the meaning or use of the relation.
  It makes no difference whether a new row is inserted at the front or at the end of the table.
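The properties listed above (unordered rows and columns, no duplicate rows, non-null primary key) can be illustrated with a toy in-memory relation. This is a minimal sketch, not how a real DBMS stores data; the class name `Relation`, its methods, and the sample values are all hypothetical.

```python
class Relation:
    """A toy relation: named columns, rows identified by a primary key."""

    def __init__(self, name, columns, primary_key):
        if primary_key not in columns:
            raise ValueError("primary key must be one of the columns")
        self.name = name
        self.columns = tuple(columns)
        self.primary_key = primary_key
        # Rows are keyed by primary-key value, so position carries no meaning.
        self.rows = {}

    def insert(self, **row):
        """Add a row, enforcing column names, non-null key and key uniqueness."""
        if set(row) != set(self.columns):
            raise ValueError("row must supply exactly the relation's columns")
        key = row[self.primary_key]
        if key is None:
            raise ValueError("primary key value cannot be null")
        if key in self.rows:
            raise ValueError(f"duplicate primary key: {key!r}")
        self.rows[key] = row


student = Relation("Student", ["student_no", "student_name"], "student_no")
student.insert(student_no="S1", student_name="Ann")
# Supplying the columns in a different order makes no difference.
student.insert(student_name="Bob", student_no="S2")
```

A second insert with `student_no="S1"`, or one with `student_no=None`, raises `ValueError` in this sketch, which is the uniqueness and non-null behaviour the slide attributes to primary keys.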

Relational Model

Relational Model - Advantages
  Data is stored in relations, and the relation is the only construct in the relational model.
  The concept of relations/attributes/tuples (similar to tables/columns/records) is easy to understand.
  It has a small set of commands in a fully defined relational query language.
  No physical pointers are used for 'navigating' the database.

Relational Model - Advantages
  Relationships between relations are indicated by foreign keys, hence 'non-navigational'.
  Logical and physical aspects of the database are clearly separated, one of the consequences of not using physical pointers.
  It is easy to set up and change the database by using a query language (DDL/DML).

ANSI/SPARC Database Model
  ANSI: the American National Standards Institute
  SPARC: Standards Planning and Requirements Committee
  The ANSI/SPARC model is used as a general framework (benchmark) within which various architectural issues of databases can be discussed on a level playing field.
  However, this is not the only model, and not every database system matches its 'structure'.

ANSI/SPARC Model – 3 Levels
  The model consists of 3 levels, with 3 schemas:
  External Level
    A collection of individual users' views of the database (the database as seen by users), described by the external schema.
  Conceptual Level
    A 'global' definition/description of the database in its entirety (the 'union' of all users' views) at the logical level. It deals with information structure/content and is described by the conceptual schema.

ANSI/SPARC Model – 3 Levels
  Internal Level
    A 'global' definition/description of the database at the physical level. It deals with information format/physical storage and is described by the internal schema.

ANSI/SPARC Model – 3 Levels

ANSI/SPARC – 2 Mappings
  Mapping is the process of transforming requests and results between the levels in the ANSI/SPARC model.
  There are 2 mappings:
    external/conceptual mapping
    conceptual/internal mapping
  Schemas and mappings are built and maintained by the DBA.

ANSI/SPARC – 2 Mappings

Data Independence
  The ability to allow users to take a logical view of the database which is independent of the way that the data is actually stored.
  The ANSI/SPARC model, based on the 3-schema architecture, can be used to explain the concept of Data Independence (DI). Mappings are essential to DI.
  Data Independence can be defined as the capacity to change the schema at one level of a database system without having to change the schema at the next higher level.

Data Independence
  This allows users to take a logical view of the database which is independent of the way that the data is actually stored.
  There are two types of Data Independence:
    Logical data independence
    Physical data independence

Data Independence
  Logical Data Independence
    The capacity to change the conceptual schema without having to change the external schemas and their application programs.
  Physical Data Independence
    The capacity to change the internal schema without having to change the conceptual schema.
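Logical data independence can be approximated with a SQL view acting as the external schema. The sketch below uses Python's built-in `sqlite3` module; the table names (`customer`, `customer_v2`, `account`), the view name `customer_balance`, and the sample row are assumptions made for illustration, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema, version 1 (hypothetical table and column names).
conn.execute("CREATE TABLE customer (cust_no TEXT PRIMARY KEY, name TEXT, balance REAL)")
conn.execute("INSERT INTO customer VALUES ('C1', 'Ann', 120.0)")

# External schema: the view one user group works with.
conn.execute("CREATE VIEW customer_balance AS SELECT cust_no, balance FROM customer")


def read_balances():
    """Application code written only against the external view."""
    query = "SELECT cust_no, balance FROM customer_balance ORDER BY cust_no"
    return conn.execute(query).fetchall()


before = read_balances()

# Change the conceptual schema: move balances into their own table,
# then redefine the external/conceptual mapping (the view definition).
conn.execute("DROP VIEW customer_balance")
conn.execute("CREATE TABLE account (cust_no TEXT PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO account SELECT cust_no, balance FROM customer")
conn.execute("CREATE TABLE customer_v2 (cust_no TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer_v2 SELECT cust_no, name FROM customer")
conn.execute("DROP TABLE customer")
conn.execute("CREATE VIEW customer_balance AS SELECT cust_no, balance FROM account")

after = read_balances()
```

`read_balances` is never edited, yet it returns the same rows before and after the restructuring. Only the mapping (the view definition) changed, which is the role the slides assign to the external/conceptual mapping.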

Data Independence
  Different applications will need different views of the same data, e.g. CUSTOMER BALANCE

Data Independence
  There is a need to change storage structures and access paths without modifying existing database structures or applications.

Front-end / Back-end System Architecture

Two Tier System Architecture

Three Tier System Architecture

Three Tier Approach - Advantages
  'thin' client (compared to the 2-tier architecture), with less expensive hardware
  reduction in client-side administration
  centralised application maintenance
  enhanced modularity and tier independence: easier to modify/replace one tier without affecting others

Three Tier Approach - Advantages
  improved load balancing of business logic, by separating core business logic from database functions
  An added advantage is that the 3-tier architecture maps quite naturally to the Web-enabled database environment.

Web-enabled Database Architecture

Database Languages
  Data Definition Language (DDL): facilitates the creation and description of the database
  Data Manipulation Language (DML): facilitates the manipulation and processing of data
  Host Language (e.g. C, Cobol, Java)
  Query Language: SQL includes both DDL and DML

Query Optimisation

Query Optimisation is an important component of a modern relational database system. Relational database systems provide a system-managed optimisation facility that makes use of the wealth of statistical information (metadata) available to the system.

Query Optimisation
  Description
    A Query Optimiser is essentially a program for the efficient evaluation of relational queries, making use of relevant statistical information.
  Objective
    To choose the most efficient strategy for implementing a given relational query, thereby improving the efficiency and performance of a relational database system.

Need for Query Optimisation
  To perform automatic navigation
    A relational database system (based on the non-navigational relational model) allows users to simply state what data they require and leave the system to locate and process that data in the database.

Need for Query Optimisation
  To achieve acceptable performance
    There may be several different plans (called query plans) for performing a single user query. The query optimiser aims to select and execute the most efficient query plan based on the information available to the system.

Need for Query Optimisation
  To minimise existing discrepancies
    Because of the discrepancy in speed between the CPU and I/O devices, a query optimiser aims to minimise I/O activity by choosing the 'cheapest' query plan for a given query.

Effects of Optimisation - Example
  Consider the following Student, Lending and Book tables:
    Student (student-no, student-name, gender, address)
    Lending (lending-no, student-no, book-no)
    Book (book-no, title, author, edition)

Effects of Optimisation - Example
  Assume that the database tables contain:
    100 students in the Student table
    1000 lendings in the Lending table, of which only 50 are for book 'B1'
    5000 books in the Book table
  Further assume that only results (intermediate relations) of up to 50 tuples can be kept in memory during query processing.

Effects of Optimisation - Example
  Query
    Retrieve the names of students who have borrowed book 'B1'
  SQL
    SELECT DISTINCT student-name
    FROM student, lending
    WHERE student.student-no = lending.student-no
      AND lending.book-no = 'B1'
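The example query can be tried out with Python's built-in `sqlite3` module. This is a small sketch under two assumptions: the hyphenated column names from the slide are written with underscores (a hyphen is not valid in an unquoted SQL identifier), and the handful of sample rows is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (
    student_no   TEXT PRIMARY KEY,
    student_name TEXT,
    gender       TEXT,
    address      TEXT
);
CREATE TABLE lending (
    lending_no INTEGER PRIMARY KEY,
    student_no TEXT,
    book_no    TEXT
);
-- Invented sample rows: Ann borrowed B1 twice, Bob borrowed B2, Cy borrowed B1.
INSERT INTO student VALUES ('S1', 'Ann', 'F', 'addr 1');
INSERT INTO student VALUES ('S2', 'Bob', 'M', 'addr 2');
INSERT INTO student VALUES ('S3', 'Cy',  'M', 'addr 3');
INSERT INTO lending VALUES (1, 'S1', 'B1');
INSERT INTO lending VALUES (2, 'S1', 'B1');
INSERT INTO lending VALUES (3, 'S2', 'B2');
INSERT INTO lending VALUES (4, 'S3', 'B1');
""")

rows = conn.execute("""
    SELECT DISTINCT student_name
    FROM student, lending
    WHERE student.student_no = lending.student_no
      AND lending.book_no = 'B1'
    ORDER BY student_name
""")
names = [row[0] for row in rows]
```

`DISTINCT` matters here: Ann has two lendings of 'B1' but appears only once in `names`. The `ORDER BY` is added only to make the result order predictable.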

Query Plan A – No Optimisation
  Operation Sequence: Join, Select, Project
    Step 1: Join student and lending over student-no giving T1
    Step 2: Select T1 where book-no = 'B1' giving T2
    Step 3: Project T2 over student-name giving result

Query Plan A – No Optimisation
  We calculate the number of database accesses (tuple I/O operations) required for each step.
  The number of tuple I/Os is the number of tuples (records) to be read and written during the operation.

Query Plan A – Calculation
  Step 1: Join student and lending over student-no giving T1
  Step 2: Select T1 where book-no = 'B1' giving T2
  Step 3: Project T2 over student-name giving result
  IR: Intermediate Relation
  Total tuple I/O table (columns: Step, Read, Write, IR, Subtotal) [cell values lost in the transcript; surviving fragment: "1100 x <= 500"]

Query Plan B – With Optimisation
  Operation Sequence: Select, Join, Project
    Step 1: Select lending where book-no = 'B1' giving T1
    Step 2: Join T1 and student over student-no giving T2
    Step 3: Project T2 over student-name giving result

Query Plan B – With Optimisation
  We again calculate the number of tuple I/O operations required for each step.

Query Plan B – Calculation
  Step 1: Select lending where book-no = 'B1' giving T1
  Step 2: Join T1 and student over student-no giving T2
  Step 3: Project T2 over student-name giving result
  IR: Intermediate Relation
  Total tuple I/O table (columns: Step, Read, Write, IR, Subtotal) [cell values lost in the transcript; surviving fragment: "<= 500"]
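The tuple I/O of the two plans can be estimated with a short calculation. The sketch below is not the slide's original table, and its cost model is an assumption: the join in Plan A is taken to read every student/lending pairing, each lending is taken to match exactly one student, and an intermediate relation costs I/O (one write, one later read per tuple) only when it exceeds the 50-tuple memory limit stated earlier.

```python
MEMORY_LIMIT = 50          # max tuples of an intermediate relation held in memory
STUDENTS = 100             # tuples in Student
LENDINGS = 1000            # tuples in Lending
LENDINGS_FOR_B1 = 50       # lendings for book 'B1'


def spill_cost(tuples):
    """Tuples written (and later re-read) if a result does not fit in memory."""
    return tuples if tuples > MEMORY_LIMIT else 0


def plan_a():
    """Join, then select, then project (assumed cost model)."""
    reads = STUDENTS * LENDINGS        # Step 1: assumed to read every pairing
    t1 = LENDINGS                      # assumed: one joined tuple per lending
    writes = spill_cost(t1)            # T1 is too large, so it is written out
    reads += spill_cost(t1)            # Step 2: re-read T1 to apply the selection
    t2 = LENDINGS_FOR_B1
    writes += spill_cost(t2)           # T2 fits in memory
    reads += spill_cost(t2)            # Step 3: project T2 from memory
    return reads + writes


def plan_b():
    """Select, then join, then project (assumed cost model)."""
    reads = LENDINGS                   # Step 1: scan lending once
    t1 = LENDINGS_FOR_B1
    writes = spill_cost(t1)            # T1 fits in memory
    reads += spill_cost(t1) + STUDENTS # Step 2: read student, join against T1
    t2 = LENDINGS_FOR_B1
    writes += spill_cost(t2)
    reads += spill_cost(t2)            # Step 3: project T2 from memory
    return reads + writes
```

Under these assumptions `plan_a()` evaluates to 102000 tuple I/Os and `plan_b()` to 1100. The exact figures depend on the assumed cost model, but the gap shows why applying the selection before the join pays off.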

Comparison: Plan A vs Plan B
  Ratio of I/O tuples (Plan A to Plan B): [values lost in the transcript]
  Intermediate relations in Plan B are much smaller than those in Plan A.
  Tuple I/Os can be further reduced by using indexes.
  If there is an index on book-no in the lending table, the tuples to be read will be just 50 instead of 1000 (the full lending table).

Four Stages of Optimisation
  Stage 1
    Convert the query into some internal form more suitable for machine manipulation, e.g. a query tree or relational algebra.
  Stage 2
    Further convert the internal form into an equivalent and more efficient canonical form, making use of well-defined transformation rules.

Four Stages of Optimisation
  Example of Query Tree – Plan A (Join – Select – Project)
    Result
      Project (over student-name)
        Restrict (where book-no = 'B1')
          Join (over student-no)
            Student
            Lending

Four Stages of Optimisation
  Some Transformation Rules
    Rule 1: (A where Restrict-1) where Restrict-2 = A where (Restrict-1 AND Restrict-2)
    Rule 2: (A [Project]) where Restrict = (A where Restrict) [Project]
    Rule 3: (A [Project-1]) [Project-2] = A [Project-2]
    Rule 4: (A Join B) where Restrict-on-A AND Restrict-on-B = (A where Restrict-on-A) Join (B where Restrict-on-B)

Four Stages of Optimisation
    Rule 5: where p OR (q AND r) = where (p OR q) AND (p OR r)
    Rule 6: (A Join B) Join C = A Join (B Join C)
    Rule 7: Perform projects as early as possible
    Rule 8: Perform restrictions as early as possible
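Two of the transformation rules can be checked on a small example. The sketch below models relations as lists of dictionaries and checks Rule 4 (pushing restrictions below a join) and Rule 5 (the distributive law). The relations `A` and `B`, their attributes, and the predicates are invented for illustration, and a check on one sample is a sanity test, not a proof.

```python
from itertools import product

# Two tiny hypothetical relations sharing the attribute "id".
A = [{"id": 1, "x": 5}, {"id": 2, "x": 9}]
B = [{"id": 1, "y": 0}, {"id": 2, "y": 7}, {"id": 2, "y": 8}]


def join(left, right):
    """Natural join on the shared attribute `id`."""
    return [{**l, **r} for l in left for r in right if l["id"] == r["id"]]


def restrict(relation, predicate):
    """Keep only the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def on_a(t):
    return t["x"] > 6      # restriction that mentions only A's attributes


def on_b(t):
    return t["y"] > 7      # restriction that mentions only B's attributes


# Rule 4: restrict after the join versus restrict each operand first.
late = restrict(join(A, B), lambda t: on_a(t) and on_b(t))
early = join(restrict(A, on_a), restrict(B, on_b))

# Rule 5: p OR (q AND r) equals (p OR q) AND (p OR r) for every truth assignment.
rule5_holds = all(
    (p or (q and r)) == ((p or q) and (p or r))
    for p, q, r in product([False, True], repeat=3)
)
```

Both orders produce the same result, but `early` joins one tuple with one tuple while `late` first builds three joined tuples and then discards two of them. This is the saving that Rule 8 (restrict as early as possible) aims for.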

Four Stages of Optimisation
  Stage 3
    Choose a set of candidate low-level procedures using statistics about the database:
      Low-level operations (e.g. join, select, project)
      Implementation procedures (one for each low-level operation, based on varying conditions)
      Cost formulae (one for each implementation procedure)

Four Stages of Optimisation
  Stage 4
    Generate a set of candidate query plans and choose the best (cheapest) of those plans by evaluating the cost formulae.
    The process of selecting a query plan is also called 'access path' selection.
    The 'cheapest' query plan is normally considered to be the one that produces the minimum number of tuple I/O operations and the smallest set of intermediate relations.

Database Statistics
  Selection of 'optimal' query plans in the optimisation process makes use of database statistics stored in the System Catalogue or Data Dictionary of the database system.
  In other words, without this information (metadata) being available, the query optimiser will not be able to choose the most efficient query plan for implementing a given query.

Database Statistics
  Typical database statistics include:
    For each base table
      Cardinality
      Number of pages for this table
    For each column of each base table
      Number of distinct values
      Maximum, minimum and average value
      Actual values and their frequencies
    For each index
      Number of levels
      Number of leaf pages
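The per-column statistics in the list above are straightforward to compute. The sketch below shows one way to derive them for a single numeric column; the function name `column_statistics`, the dictionary keys, and the sample values are hypothetical, and a real system catalogue would store and refresh such figures in its own way.

```python
from collections import Counter
from statistics import mean


def column_statistics(values):
    """Summarise one numeric column, roughly as a catalogue entry might."""
    return {
        "row_count": len(values),          # cardinality of the table
        "distinct": len(set(values)),      # number of distinct values
        "min": min(values),
        "max": max(values),
        "avg": mean(values),
        "frequencies": Counter(values),    # actual values and their frequencies
    }


stats = column_statistics([3, 1, 3, 5])
```

An optimiser could use figures like these to estimate how many tuples a restriction would return. For instance, `stats["frequencies"][3] / stats["row_count"]` gives the fraction of rows whose value is 3.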

Any Questions?