Introduction to Databases

Slides:



Advertisements
Similar presentations
Chapter 10: Designing Databases
Advertisements

Introduction to Databases
Introduction to Databases
BUSINESS DRIVEN TECHNOLOGY Plug-In T4 Designing Database Applications.
Prentice Hall, Database Systems Week 1 Introduction By Zekrullah Popal.
Chapter 3 Database Management
Introduction to Databases
Ch1: File Systems and Databases Hachim Haddouti
Beyond data modeling Model must be normalised – purpose ? Outcome is a set of tables = logical design Then, design can be warped until it meets the realistic.
Introduction to Databases Transparencies
Chapter 4: Database Management. Databases Before the Use of Computers Data kept in books, ledgers, card files, folders, and file cabinets Long response.
Chapter 1 Introduction to Databases
Chapter 1: The Database Environment
Chapter 4 Relational Databases Copyright © 2012 Pearson Education 4-1.
Introduction to Databases
Introduction to Databases Transparencies 1. ©Pearson Education 2009 Objectives Common uses of database systems. Meaning of the term database. Meaning.
Introduction To Databases IDIA 618 Fall 2014 Bridget M. Blodgett.
Chapter 1 1 © Prentice Hall, 2002 Database Design Dr. Bijoy Bordoloi Introduction to Database Processing.
Introduction to Databases and Database Languages
IST Databases and DBMSs Todd S. Bacastow January 2005.
Objectives of the Lecture :
PHASE 3: SYSTEMS DESIGN Chapter 7 Data Design.
Using MIS 4e Chapter 5 Database Processing
1 DATABASE TECHNOLOGIES BUS Abdou Illia, Fall 2007 (Week 3, Tuesday 9/4/2007)
Concepts of Database Management, Fifth Edition Chapter 1: Introduction to Database Management.
Chapter 1 Introduction to Databases Pearson Education ©
Chapter 1 Overview of Database Concepts Oracle 10g: SQL
Database Technical Session By: Prof. Adarsh Patel.
CST203-2 Database Management Systems Lecture 2. One Tier Architecture Eg: In this scenario, a workgroup database is stored in a shared location on a single.
MIS 385/MBA 664 Systems Implementation with DBMS/ Database Management Dave Salisbury ( )
Introduction: Databases and Database Users
1 Introduction to Database Systems. 2 Database and Database System / A database is a shared collection of logically related data designed to meet the.
Concepts and Terminology Introduction to Database.
Chapter 7: Database Systems Succeeding with Technology: Second Edition.
Databases and Database Management Systems
Lecture 2 An Overview of Relational Database IST 318 – DB Admin.
311: Management Information Systems Database Systems Chapter 3.
© 2007 by Prentice Hall 1 Introduction to databases.
I Information Systems Technology Ross Malaga 4 "Part I Understanding Information Systems Technology" Copyright © 2005 Prentice Hall, Inc. 4-1 DATABASE.
Lecturer: Gareth Jones. How does a relational database organise data? What are the principles of a database management system? What are the principal.
C6 Databases. 2 Traditional file environment Data Redundancy and Inconsistency: –Data redundancy: The presence of duplicate data in multiple data files.
Chapter 12: Designing Databases
Lecture # 3 & 4 Chapter # 2 Database System Concepts and Architecture Muhammad Emran Database Systems 1.
DataBase Management System What is DBMS Purpose of DBMS Data Abstraction Data Definition Language Data Manipulation Language Data Models Data Keys Relationships.
INFO1408 Database Design Concepts Week 15: Introduction to Database Management Systems.
FILES AND DATABASES. A FILE is a collection of records with similar characteristics, e.g: A Sales Ledger Stock Records A Price List Customer Records Files.
1 Chapter 1 Introduction to Databases Transparencies.
Management Information Systems, 4 th Edition 1 Chapter 8 Data and Knowledge Management.
Programming Logic and Design Fourth Edition, Comprehensive Chapter 16 Using Relational Databases.
Physical Database Design Purpose- translate the logical description of data into the technical specifications for storing and retrieving data Goal - create.
IS 4424: Developing the Entreprise Database Frederic Adam.
3/6: Data Management, pt. 2 Refresh your memory Relational Data Model
Introduction to Databases Transparencies © Pearson Education Limited 1995, 2005.
1 Management Information Systems M Agung Ali Fikri, SE. MM.
Managing Data Resources File Organization and databases for business information systems.
Introduction to Databases Transparencies
Introduction to Databases
Introduction to DBMS Purpose of Database Systems View of Data
JAVA Connection The following uses a ‘bridge’ between Java Database Connectivity (JDBC) and ODBC, namely sun.jdbc.odbc.JdbcOdbcDriver Supplied with the.
Introduction to Databases
Introduction to Databases Transparencies
Introduction to Databases
Introduction to Databases
Introduction to Database Systems
Basic Concepts in Data Management
Introduction to Databases
MANAGING DATA RESOURCES
Introduction to Databases
Introduction to Databases
Introduction to Databases Transparencies
Presentation transcript:

Introduction to Databases Data Organisation Definition Data Modelling SQL DBMS Functions

Basics of Data Organisation: DATA HIERARCHY (four categories) Fields = represent a single data item Records = made up of a related set of fields describing one instance of an entity Table = a set of related records Database = a collection of related tables

Example of Data Structure Fields Surname First name Telephone Martin Smith 11 25 65 65 Andrew Healy 11 58 96 63 Peter White 11 25 28 89 Lorene Marsh 11 78 85 85 Records Table + Other Tables => complete data Structure = DB

Database: Definition. "A collection of interrelated data stored together with controlled redundancy, to serve one or more applications in an optimal fashion; the data is stored so that it is independent of the application programs which use it; a common and controlled approach is used in adding new data and in modifying existing data within the database."

Definition - Examined A collection of interrelated data stored together with controlled redundancy to serve one or more applications in an optimal fashion the data is stored so that it is independent of the application programs which use it a common and controlled approach is used in adding new data and in modifying existing data within the database.

Advantages of Databases: data are independent from applications - stored centrally data repository accessible to any new program data are not duplicated in different locations programmers do not have to write extensive descriptions to build up a DBs Physical and logical protection is centralised

Disadvantages of DBs: Centralisation can be a weakness Large DBs require expensive hardware and software Specialised / Scarce personnel are required to develop and maintain large DBs Standardisation of data on a central repository has implications for the format in which it is stored

Characteristics of DBs… High concurrency (high performance under load) Multi-user (read does not interfere with write) Data consistency – changes to data don’t affect running queries + no phantom data changes High degree of recoverability (pull the plug test)

ACID test Atomicity Consistency Isolation Durability All or nothing Preserve consistency of database Transactions are independent Once committed data is preserved

DataBase Management System (DBMS): program that makes it possible to: create Use (insert / update / delete data) maintain a database It provides an interface mechanism between the logical organisation of the data stored in the DB and the physical organisation of the data

SQL Examples SELECT store_name FROM Store_Information Source: http://www.1keydata.com/sql/sqlselect.html

Interacting with the DB: Structured Query language (SQL) –. Data Manipulation Language (DML)- for programmers who want to modify the data model in which the data is stored. Host Language - the language used by programmers to embed SQL statements into their application – e.g. Java

import java.sql.*; public class CreateCoffees { public static void main(String args[]) { String url = "jdbc:mysql://localhost/cerami"; Connection con; String createString; createString = "create table COFFEES " + "(COF_NAME VARCHAR(32), " + "SUP_ID INTEGER, " + "PRICE FLOAT, " + "SALES INTEGER, " + "TOTAL INTEGER)"; Statement stmt;

try { Class.forName("com.mysql.jdbc.Driver"); } catch(java.lang.ClassNotFoundException e) { System.err.print("ClassNotFoundException: "); System.err.println(e.getMessage()); con = DriverManager.getConnection(url); stmt = con.createStatement(); stmt.executeUpdate(createString); stmt.close(); con.close(); catch(SQLException ex) { System.err.println("SQLException: " + ex.getMessage());

Relational DBs: Data items stored in tables Specific fields in a table related to another field in associated tables (joins) infinite number of possible viewpoints on the data (queries) Highly flexible DB but overly slow for complex searches Oracle, SyBase, Ingres, Access, Paradox for Windows...

Describing Relationships Attempt at modelling the business elements (entities) and their relationships (links) Can be based on users’ descriptions of the business processes Specifies dependencies between the data items Coded in an Entity-Relationship Diagram (ERD)

Types of Relationships one-to-one: one instance of one data item corresponds to one instance of another one-to-many: one instance to many instances many-to-many: many instance correspond to many instances Also some relationships may be: compulsory optional

Source: www.smartdraw.com Examples Source: www.smartdraw.com

Entity Relationship Diagram (Generic) Source: www.smartdraw.com

ERD Example – Banking Transaction Deposit Withdrawal Customer Bank Loan

Next Step – Creating the Data Structure Few rules - a lot of experience Can get quite complex (paramount for the speed of the DB) Tables must be normalised – i.e. redundancy is limited to the strict minimum In practice, normalisation is not always the best

Data Structure Diagrams Describe the underlying structure of the DB: the complete logical structure. A DSD provides a graphical notation which documents entities, relationships, and their constraints Data items are stored in tables linked by pointers attribute pointers: data fields in one table that will link it to another (common information) logical pointers: specific links that exist between tables Tables have a key Is it an attribute or an entity?

ORDER order number Item description Item Price Quantity ordered Customer number Item number Customer Customer number Customer name Customer address Customer balance Customer special rate Item Item number Item description Item cost Quantity on hand

Normalisation Process of simplifying the relationships amongst data items as much as possible (see handout) Through an iterative process, structure of data is refined to 1NF, 2NF, 3NF etc. Reasons for normalisation: to simplify retrieval (speed of response) to simplify maintenance (updates, deletion, insertions) to reduce the need to restructure the data for each new application

First Normal Form Design record structure so that each record looks the same (no repeating groups) Repetition within a record means one relation was missed = create new relation Elements of repeating groups are stored as a separate entity, in a separate table Normalised records have a fixed length and expanded primary key

Second Normal Form Record must be in first normal form Each item in the record must be fully dependent on the key for identification Functional dependency means a data item’s value is uniquely associated with another’s Only on-to-one relationship between elements in the same file Otherwise split into more tables

Third Normal Form To remove transitive dependencies When one item is dependent on an item which is dependent from the key in the file Relationship is split to avoid data being lost inadvertently This will give greater flexibility for the design of the application + eliminate deletion problems In practice, 3 NF not used all the time - speed of retrieval can be affected

Beyond Data Modeling Model must be normalised Optimised model “no surprise” model resilience Outcome is a set of tables = logical design Then, design can be warped until it meets the realistic constraints of the system E.g.: what business problem are we trying to solve? Update anomalies Each item should appear only once + you ask many good questions

Realistic Constraints Users cannot cope with too many tables Too much development required in hiding complex data structure Too much administration Optimisation is impossible with too many tables Actually: RDBs can be quite slow!

Key Practical Questions What are the most important tasks that the DB MUST accomplish efficiently? How must the DB be configured physically to address the business needs? What coding practices will keep the coding clean and simple? What additional demands arise from the need for resilience and security?

Analysis - Three Levels of Schema External Schema 1 External Schema 2 External Schema … Tables Logical Schema Disk Array Internal Schema

4 Way Trade-off Security Performance Ease of use Clarity of code

Key Decisions Oracle offers many different ways to do things Indexes Backups… Good analysis is not only about knowing these => understanding whether they are appropriate Failure to think it through => unworkable model Particularly, predicting performance must be done properly Ok on the technical side, tricky on the business side

Design Optimisation Sources of problems: Network traffic Excess CPU usage But physical I/O is greatest threat (different from logical I/O) Disks still the slowest in the loop Solution: minimise or re-schedule access Try to minimise the impact (e.g. mirroring, internal consistency checks…)

Using Scenarios for Analysis Define standard situation for DB use Analyse their specific requirements Understand the implications for DB design Compare and contrast new problems with old ones

Categories of Critical Operations Manual transaction processing = complex DE by small number of operators Automatic transaction processing: large number of concurrent users performing simple DE High batch throughput: automatic batch input into DB of very large number of complex transactions Data warehousing: large volumes of new data thrown on top every day at fixed intervals + intensive querying

Manual Transaction Processing Insurance telemarketing broker Data entry Retrieving reference info Calculations On-line human-computer interaction!! Instant validation (field by field) Drop-down lists (DE accelerators) Quick response time Critical issue = user-friendly front end, but minimise traffic between interface and back end!

Automatic Transaction Processing Large number of user performing simple tasks Real-time credit card system (e.g. authorisation) or check out (EPOS) Human interaction at its most simple – eg typing a code or swiping a card Minimum validation, no complex feed back… Large numbers mean potential problems are: Connection opening / closing rate Contention between concurrent users SQL engine + data consistency costs Design with multiple servers

Automatic Transaction Processing Another e.g.: on-line shopping What specific problems would arise from shopping cart type applications? How do you handle lost customers?

High Batch Throughput E.g. mobile phone network operator Real time + huge volume of simultaneous complex transactions Number checks Account info Price info Pattern checks Large processing capacity required + need to tackle all transactions together in batches DB query may not be only solution (or quickest) Move customer account to cache Copy updated figures for accounts to a log and updated accounts in slack periods (2.5GB an hour!) Indexing or partitioning for quicker access

“Data Warehouse” Huge store of data Large volume added every day 99% new data, 1% corrections to existing data Substantial analysis required prior to development: What to include How to aggregate and organise it Where data comes from Real Oracle territory because schedule is lax – i.e. not a real time application Key issues: Getting partitioning right Deciding how many summary levels Deciding what to hold and what to recalculate

Partitioning Oldest trick in the book to speed up retrieval Smaller bunch of data Well labeled so it can be easily found Smaller index Data manipulation – maintenance, copy and protection far easier Break down big table into small ones

Internet Databases Characteristics Many concurrent sessions Reduced interaction front end back end Internet = Extra response time (2 secs!) In practice, many sites are quite slow Key issues “thin client” Reduced dialogue Management of sessions (e.g. cookies) to avoid multiple restarts

Conclusion: Key issues At one end: very large numbers of small transactions Threat of network or process contention At other end: small number of processes with complex data crunching and time constraints Design of DB and application must reflect these constraints