Introduction to databases


1 Introduction to databases
ARK2100 Digital recordkeeping and preservation I 2017 Thomas Sødring P48-R407

2 What is a database?
A very simple explanation is that a database is an organised collection of interconnected data elements, e.g. your bank account and all its transactions. This is an oversimplification, but an OK start. These data elements need a system that organises them and makes the data searchable and retrievable. This is what we call a database management system, or DBMS.

3 Why do we need databases?
Store data in a reliable and persistent manner. Organise data so that it can easily be retrieved. Save data at a central location so that it can be shared and used by multiple users.

4 The road to modern databases
There has always been a need to store, organise and retrieve information. Historically, this need is visible in libraries and archives. In more modern times we see it in businesses that need to organise data. After the Second World War the US experienced a massive industrial expansion. Large amounts of data were generated. Organising and efficiently searching for information was no trivial task.

5 "As We May Think", Vannevar Bush
In 1945 Bush argued for the need to develop machines to help us organise and store information.

6 The road to modern databases
In the 1960s computers were becoming more cost-effective for businesses, allowing them to store and process data. An early challenge was that the data was dependent on the storage method. The thinking at the time was that data was something to be processed, with little need to think about its structure. Changes to data fields required reprogramming the software, e.g. to allow more digits in a telephone number. A user needed to know the physical structure of the database to search for information.

7 Hierarchical model
Data in the database is organised in a tree-like structure. Data in the real world does not really follow this model.
[Diagram: a tree with the customers Pål Solberg, Nils Nilsen and Ari Hansen and the account balances 2,000, 8,000 and -3,000]

8 Network model
The network model allowed for greater flexibility and a better representation of the relationships between data objects. It is a more natural data model than the hierarchical model.
[Diagram: a network with the customers Pål Solberg, Nils Nilsen and Ari Hansen and the account balances 2,000, 8,000, 8,000 and -3,000]

9 The road to modern databases
E. F. Codd proposes a new model called the relational model. This was a revolutionary way of thinking about storing data. The logical organisation of the data is disconnected from the physical organisation. This model became a de facto standard. Codd's theories are implemented and Relational Database Management Systems (RDBMS) are developed.

10 The relational model
The relational model requires no predetermined relationship between the data, which the other models did require. Without the need to understand the underlying structure, it became easier to work with data. Data is stored in tables, and a table can be seen as a collection of records. The network and hierarchical models connected data at the object level, while the relational model links data based on values.

11 The relational model
Data is stored in tables. Relationships are determined based on values in tables.
Bank accounts: 1: 8,000; 2: 2,000; 3: 8,000; 4: -3,000.
Customers: 1: Pål Solberg; 2: Nils Nilsen; 3: Ari Hansen.
Owners: [table rows not recoverable]
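The idea that relationships come from matching values, not from stored pointers, can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table and column names are made up here, and the rows in `owners` are hypothetical because the slide's Owners table did not survive in the transcript.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (cust_nr INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE accounts  (account_nr INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE owners    (cust_nr INTEGER, account_nr INTEGER);

    INSERT INTO customers VALUES (1, 'Pål Solberg'), (2, 'Nils Nilsen'), (3, 'Ari Hansen');
    INSERT INTO accounts  VALUES (1, 8000), (2, 2000), (3, 8000), (4, -3000);
    -- Hypothetical ownership rows, invented for this example.
    INSERT INTO owners    VALUES (1, 1), (1, 2), (2, 3), (3, 4);
""")

# The join links rows purely by equal values in cust_nr and account_nr.
rows = conn.execute("""
    SELECT c.name, a.account_nr, a.balance
    FROM customers AS c
    JOIN owners   AS o ON o.cust_nr = c.cust_nr
    JOIN accounts AS a ON a.account_nr = o.account_nr
    ORDER BY a.account_nr
""").fetchall()

for name, account_nr, balance in rows:
    print(name, account_nr, balance)
```

Nothing in the query depends on how or where the rows are stored, which is the separation of logical and physical organisation the relational model aims for.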

12 The road to modern databases
1976: P. Chen proposed a model for database design called Entity-Relationship (ER). It allowed database designers to work at a more abstract level with conceptual data models, and simplified the design and implementation of databases. Early 1980s: Chen's and Codd's theories are put into practice. Commercialisation of RDBMS starts to pick up. This really paved the way for the modern IT-based society we have today.

13 ER diagram (Noark 4)
[ER diagram with the entities Classification, Case, CaseParty, RegistryEntry, Extra information, Sender/Receiver, Electronic Document, Precedence, Related Case and Comment]

14 ER diagram (Noark 4)
[ER diagram with the entities Classification, Case, CaseParty, RegistryEntry, Extra information, Sender/Receiver, Electronic Document, Precedence, Related Case and Comment]

15 The road to modern databases
Late 1990s: the dotcom boom fuels investment in Internet companies, creating a boom for Web, Internet and database systems. Open source solutions begin to establish themselves. Early 2000s: the dotcom boom crashes, resulting in a decline for the Internet industry, but the need for database applications continues to grow. Mid 2000s: new gadgets such as mobile phones and tablets become commonplace. The Internet moves to the so-called 'Web 2.0' model.

16 The road to modern databases
The relational model is at the heart of the IT evolution. Codd's theories hold water, even though the relational model does not suit all applications. Eventually the demand for data processing catches up with the relational model and new models are required. Science-based databases, e.g. for genome projects, geological data, space data and simulation data, require something beyond the relational model.

17 The road to modern databases
Big Data is another scenario where relational databases may not be good enough, hence NoSQL (Not Only SQL) databases. Perhaps these take us back to the scenario prior to the development of the relational model: we are now again looking at processing information more than managing it.

18 Flat files
Data can also be stored in a flat file structure, as in a spreadsheet, e.g. an Excel spreadsheet. There are no relationships.
[Example rows: 1 Pål Solberg 8,000; 1 Pål Solberg 2,000; 2 Nils Nilsen 8,000; 2 Nils Nilsen 8,000; 3 Ari Hansen -3,000; 3 Ari Hansen 8,000]

19 RDBMS
A Relational Database Management System (RDBMS) is a database management system based on the relational model. Proprietary: Oracle, IBM, Microsoft. Open source: Postgres, MySQL.

20 Three layers in a database
A DBMS will typically have three layers: View, Logical and Physical.

21 Three layers in a database
View: information customised for different users. Logical: the organisation's data according to a particular data model. Physical: how data is organised and stored on disk. Specialists can customise the database for efficient storage and processing.
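One way to picture the view layer is an SQL view that exposes only part of a table to a group of users. The sketch below uses Python's sqlite3 module; the `employees` table, the `staff_directory` view and the rows are hypothetical, and a full DBMS would normally combine such views with access rights.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Logical layer: the organisation's data as one table.
    CREATE TABLE employees (emp_nr INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
    INSERT INTO employees VALUES (1, 'Employee A', 500000), (2, 'Employee B', 620000);

    -- View layer: a customised presentation that leaves out the salary column.
    CREATE VIEW staff_directory AS SELECT emp_nr, name FROM employees;
""")

directory = conn.execute("SELECT * FROM staff_directory ORDER BY emp_nr").fetchall()
print(directory)
```

The physical layer does not appear in the code at all: how SQLite lays the rows out on disk is hidden from both the table definition and the view.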

22 Why databases?
Strategic use of an organisation's data: sharing and centralised control; reduces data redundancy and inconsistency; increases data integrity; reduces application development and maintenance costs; keeps data independent of the application. This results in flexibility when managing an organisation's information.

23 What to expect from a DBMS
ACID (Atomicity, Consistency, Isolation, Durability) transactions; (automated) optimisation; query and retrieval ability; change and access logging; data security; backup and replication; enforcement of rules; calculations.

24 ACID - Atomicity
When a series of database operations has to be performed, either all are undertaken or none at all. This is defined within a transaction. We need to prevent incomplete updates to the database. Example: transfer kr 2 000,- from Account 1 to Account 2. This is bound within a transaction: take kr 2 000,- out of Account 1; put kr 2 000,- into Account 2. Either both actions are executed or neither.

25 ACID - Atomicity
Transfer kr 2 000,- from Account 1 to Account 2. Before: Account 1 holds 8 000,- and Account 2 holds 1 000,-. Start transaction. 1. Take out 2 000,-: Account 1 goes from 8 000,- to 6 000,-. 2. Put in 2 000,-: Account 2 goes from 1 000,- to 3 000,-. End transaction.
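The transfer can be sketched with Python's sqlite3 module, where `with conn:` commits if the block finishes and rolls back if an exception escapes. The `transfer` and `balances` helpers and the `fail_midway` flag are hypothetical names used only to simulate a crash between the two steps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_nr INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 8000), (2, 1000)")
conn.commit()


def transfer(conn, source, target, amount, fail_midway=False):
    """Move amount from source to target inside a single transaction."""
    try:
        with conn:  # commit on success, roll back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE account_nr = ?",
                (amount, source),
            )
            if fail_midway:
                raise RuntimeError("simulated crash between the two steps")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE account_nr = ?",
                (amount, target),
            )
    except RuntimeError:
        pass  # the half-finished transfer has already been rolled back


def balances(conn):
    """Return {account_nr: balance} for all accounts."""
    return dict(conn.execute("SELECT account_nr, balance FROM accounts"))


transfer(conn, 1, 2, 2000, fail_midway=True)
after_failure = balances(conn)  # nothing changed: neither step took effect

transfer(conn, 1, 2, 2000)
after_success = balances(conn)  # both steps took effect

print(after_failure, after_success)
```

After the simulated crash the withdrawal from Account 1 is undone, so the database never shows money taken out without being put in elsewhere.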

26 ACID - Consistency
An update must not violate any integrity rules. If an operation would result in an illegal state in the database, the operation must be cancelled and an error message generated.
Customer (Firstname, Surname, CustNr): Pål, Solberg, 1.
Products (Name, ProductNr): Tog, 1.
Purchases (CustNr, ProductNr, Amount): 2, 1, 2.
There is no customer with CustNr=2, therefore the database is not consistent.
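A foreign key is one common way to have the DBMS reject the purchase row from the slide. The sketch below uses Python's sqlite3 module with lower-case table and column names chosen for this example; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked to
conn.executescript("""
    CREATE TABLE customer (cust_nr INTEGER PRIMARY KEY, firstname TEXT, surname TEXT);
    CREATE TABLE products (product_nr INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchases (
        cust_nr    INTEGER NOT NULL REFERENCES customer(cust_nr),
        product_nr INTEGER NOT NULL REFERENCES products(product_nr),
        amount     INTEGER NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Pål', 'Solberg');
    INSERT INTO products VALUES (1, 'Tog');
""")

try:
    # CustNr=2 does not exist, so this row would leave the database inconsistent.
    conn.execute("INSERT INTO purchases VALUES (2, 1, 2)")
    rejected = False
except sqlite3.IntegrityError as error:
    rejected = True
    print("Rejected:", error)

purchase_count = conn.execute("SELECT COUNT(*) FROM purchases").fetchone()[0]
```

The operation is cancelled and an error is raised, so the purchases table stays empty.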

27 ACID - Isolation
A database operation cannot access data that is in a temporarily altered state (because another database operation is accessing the data). For example, two people sharing a bank account try to transfer money out of it simultaneously: Paul transfers kr 1 000,- from Account 1 to Account 2, and Ina transfers kr 1 000,- from Account 1 to Account 3. These two transactions cannot occur simultaneously. What happens if we only have kr 1 000,- in the account? First we isolate the amount of money in Account 1 and try to subtract the money Paul wants to transfer. Next we try to subtract the amount Ina wants to transfer, but there is not enough money in the account.

28 ACID - Isolation
Paul transfers kr 1 000,- from Account 1 to Account 2 while Ina transfers kr 1 000,- from Account 1 to Account 3. Start: Account 1 holds 1 000,-, Account 2 holds 3 000,- and Account 3 holds 500,-. Without isolation both transfers could go through, ending with Account 1 at 0,-, Account 2 at 4 000,- and Account 3 at 1 500,-, and the bank could lose 1 000,-. With isolation, we first isolate Account 1 and transfer the money to Account 2, which then holds 4 000,-. Now, when Ina tries to transfer money there is no money left to transfer: Account 1 holds 0,- and Account 3 stays at 500,-.

29 ACID - Durability
Results of updates cannot simply disappear, e.g. due to a system crash. This is achieved by transaction logging and support for backup. This is an essential and fundamental requirement, and one that allows us to trust a particular DBMS.

30 Automatic optimisation
An advanced DBMS can be configured to automatically detect usage patterns or particular requests that occur frequently, and automatically change the structure or other characteristics of the database to improve performance, e.g. adding an index to an often-searched column to speed up queries. The DBMS may also have associated tools that monitor performance and report statistical information on usage, so that database experts can make adjustments to improve performance.
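The effect of an index on an often-searched column can be seen in SQLite's query plan. In this sketch the index is added by hand, which is the database expert's route rather than the automatic one; the table, the generated surnames and the index name are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (cust_nr INTEGER PRIMARY KEY, surname TEXT)")
conn.executemany(
    "INSERT INTO customers (surname) VALUES (?)",
    [(f"Surname{i}",) for i in range(1000)],
)

query = "SELECT cust_nr FROM customers WHERE surname = ?"


def plan(conn):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query, ("Surname500",)).fetchall()
    return " ".join(row[3] for row in rows)  # the fourth column holds the plan text


plan_before = plan(conn)  # a scan of the whole table
conn.execute("CREATE INDEX idx_customers_surname ON customers (surname)")
plan_after = plan(conn)   # a search that uses the new index

print(plan_before)
print(plan_after)
```

The exact wording of the plan differs between SQLite versions, so the useful signal is whether the index name appears in it.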

31 Query and retrieval ability
A DBMS supports a query language and often has associated report tools that allow users to interactively search a database and analyse data. This query language and the associated tools can also allow users to update the database if they have the correct access rights or roles.

32 Change and Access logging
A desirable and extremely useful functionality a DBMS can support is the ability to provide information about: Who has had access to data? What has been changed? When was it changed? This is especially important in connection with records management, from an authenticity point of view.
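The "what" and "when" of a change can be captured with a trigger that writes to a log table. This is a minimal sketch using Python's sqlite3 module; the `records` and `change_log` tables are hypothetical, and because SQLite has no user accounts the "who" would have to come from the application or from a DBMS with its own users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (record_nr INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE change_log (
        record_nr  INTEGER,
        old_title  TEXT,
        new_title  TEXT,
        changed_at TEXT DEFAULT CURRENT_TIMESTAMP  -- when it was changed
    );

    -- Every title update is recorded automatically, whichever application made it.
    CREATE TRIGGER log_title_change AFTER UPDATE OF title ON records
    BEGIN
        INSERT INTO change_log (record_nr, old_title, new_title)
        VALUES (OLD.record_nr, OLD.title, NEW.title);
    END;

    INSERT INTO records VALUES (1, 'Draft letter');
""")

conn.execute("UPDATE records SET title = 'Final letter' WHERE record_nr = 1")
log = conn.execute(
    "SELECT record_nr, old_title, new_title, changed_at FROM change_log"
).fetchall()
print(log)
```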

33 Data security
A DBMS will prevent unauthorised users from creating, updating, viewing or deleting parts (or all) of a database. This is achieved using username/password access control on the entire database or on subsections of it. For example, in a personnel database containing all data about individual employees: all users can get access to their own data; one group of users can be given access to payroll data; another group can have access to see all employee data. User access is often controlled in the application.

34 Backup and replication
A DBMS will usually have functionality to simplify the process of exporting and importing data. This can be a backup: copies of data are regularly made in case a system crash occurs. For some organisations it may not be practical to allow everyone to access the same database, so periodic copies are made and distributed. This is replication: there are multiple copies of the data at different locations.
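Making a copy can be sketched with `Connection.backup` from Python's sqlite3 module (available from Python 3.7). Both databases are in memory here to keep the example self-contained; a real backup would be written to a file or another machine, and the table is invented for the example.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE accounts (account_nr INTEGER PRIMARY KEY, balance INTEGER)")
source.execute("INSERT INTO accounts VALUES (1, 8000), (2, 2000)")
source.commit()

# Copy the whole database into a second one.
copy = sqlite3.connect(":memory:")
source.backup(copy)

# A later change to the original does not affect the copy.
source.execute("UPDATE accounts SET balance = 0 WHERE account_nr = 1")
source.commit()

copied = copy.execute(
    "SELECT account_nr, balance FROM accounts ORDER BY account_nr"
).fetchall()
print(copied)
```

The copy reflects the state at the moment it was taken, which is why periodic copies that are distributed to other locations can drift from the original between runs.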

35 Rule enforcement
Often there is a requirement to add rules to data in the database: a record is required to have a particular value set, e.g. gender for a person; a car can only have one engine. A DBMS can be configured to prevent rules like these from being broken. But rules might have to change as time goes by: hybrid cars can have an electric as well as a fossil fuel engine. The DBMS has to handle this without major changes in the data model. These kinds of rules are often handled by the application.
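Both kinds of rule can be expressed as column constraints. The sketch below uses Python's sqlite3 module with a hypothetical `cars` table: `NOT NULL` requires a value to be set and `CHECK` encodes the one-engine rule. The registration numbers and the `try_insert` helper are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cars (
        reg_nr       TEXT PRIMARY KEY,
        engine_count INTEGER NOT NULL CHECK (engine_count = 1)
    )
""")


def try_insert(reg_nr, engine_count):
    """Return True if the DBMS accepts the row, False if a rule rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO cars VALUES (?, ?)", (reg_nr, engine_count))
        return True
    except sqlite3.IntegrityError:
        return False


accepted_petrol = try_insert("AB12345", 1)      # one engine: allowed
accepted_hybrid = try_insert("EL54321", 2)      # two engines: breaks the CHECK rule
accepted_missing = try_insert("XY00000", None)  # no value: breaks NOT NULL

print(accepted_petrol, accepted_hybrid, accepted_missing)
```

The rejected hybrid shows the slide's point about rules changing over time: to allow it, the constraint itself has to be redefined, which is one reason such rules are often kept in the application instead.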

36 Calculations
A DBMS supports a number of built-in functions for performing calculations on data: counting, summing, finding averages, sorting and grouping. This is useful because applications on top of the database do not have to develop this often-used functionality; they ask the DBMS to carry out the calculations. It is far quicker to get the DBMS to do this, and it reduces the load on the application and DBMS.
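Counting, summing, averaging, grouping and sorting can all be requested in a single query. The sketch uses Python's sqlite3 module; the `accounts` table reuses the balances from the earlier slides, but which customer owns which account is an assumption made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (account_nr INTEGER PRIMARY KEY, owner TEXT, balance INTEGER)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [
        (1, "Pål Solberg", 8000),   # ownership is hypothetical
        (2, "Pål Solberg", 2000),
        (3, "Nils Nilsen", 8000),
        (4, "Ari Hansen", -3000),
    ],
)

# Count, sum and average over all accounts, computed by the DBMS.
totals = conn.execute(
    "SELECT COUNT(*), SUM(balance), AVG(balance) FROM accounts"
).fetchone()

# Grouping and sorting: number of accounts and total balance per owner.
per_owner = conn.execute("""
    SELECT owner, COUNT(*), SUM(balance)
    FROM accounts
    GROUP BY owner
    ORDER BY SUM(balance) DESC
""").fetchall()

print(totals)
print(per_owner)
```

The application receives only the finished results instead of fetching every row and doing the arithmetic itself.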

37 Some database models
Hierarchical; Network; Relational; Object-Oriented; Object-Relational; XML; NoSQL.
[Slide annotations: "These are not really relevant anymore" and "We don't cover these"]

38 Databases and records management
All electronic recordkeeping is undertaken with some form of DBMS. In most cases it is an RDBMS. A records manager has to have a basic understanding of this technology: when discussing the development of a system; when developing requirements specifications; and to understand the challenges relational databases pose to daily use, interoperability, records management and long-term preservation. Some of our students also carry out database projects.

39 Databases and records management
Often the records management department in the public sector lacks a basic understanding of databases. Often the Noark 5 system is seen as the municipality's archive, rather than the database. This is a big problem. A municipality should be able to run queries against its databases and generate extractions that can be preserved in the long term. To do this you need to understand databases.

40 This course
Will teach you about the basics of databases: the relational model; modelling; DDL, DML and SQL; the Noark 5 structure in a database.

