Databases and Statistical Databases Session 4 Mark Viney Australian Bureau of Statistics 5 June 2007.


2 Terms  Database ƒ A shared collection of logically related data (and description of this data), designed to meet the information needs of an organisation  DataBase Management System (DBMS) ƒ A software system that enables users to define, create and maintain the database and provides controlled access to this database

3 Terms (example)  Database ƒ Personnel Database ƒ Stock database ƒ Statistical Database  DataBase Management System (DBMS) ƒ Oracle ƒ DB2 ƒ Access ƒ MySql ƒ FoxPro ƒ Firebird

4 Why keep information in databases?
- Accessibility of data
  - Increased concurrency (reads and writes)
  - Sharing data
- Improved data integrity
- Improved security
  - Access only to necessary data
- Relatable
  - More information from the same amount of data
- Visible

5 Why keep information in databases? (continued)
- Backup and recovery
- Improved productivity
  - Common tools / common processes

6 Disadvantages of databases
- Complexity
- Size
- Cost of DBMS
- Need to upgrade versions
- Additional hardware costs
- Higher impact of failure
www.cableready.net/newsletter/winter99.html

7 Databases  used to be solely mainframe  commonly on minicomputers  increasingly available on microcomputers  mostly accessed by SQL

8 Relational Databases  entities ƒ datatypes ƒ validation  relationships ƒ rules for interaction

9 Database Tables  rows and columns  fixed number of columns  multiple rows (records)  columns are of same datatype

10 Structured Query Language - SQL
- A standard database language that allows:
  - creation of databases and relation structures
  - basic data management tasks
  - both simple and complex queries

11 SQL - Data Definition - DDL
- Allows creation, modification and deletion of database objects
  - Creation: CREATE
      CREATE TABLE TAB1 (COL1 NUMBER, COL2 NUMBER);
  - Modification: ALTER
      ALTER TABLE TAB1 ADD COL3 NUMBER;
  - Deletion: DROP
      DROP TABLE TAB1;
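
The three DDL statements can be tried without an Oracle-style server. This is a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory SQLite database as a stand-in for the DBMS the slides target; SQLite happens to accept the NUMBER type name and the ALTER TABLE ... ADD form, so the statements run unchanged. The helper column_names is hypothetical.

```python
import sqlite3

def column_names(conn, table):
    # PRAGMA table_info returns one row per column: (cid, name, type, notnull, dflt_value, pk)
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

# In-memory SQLite database used as a stand-in for a full DBMS (assumption)
conn = sqlite3.connect(":memory:")

# Creation - CREATE
conn.execute("CREATE TABLE TAB1 (COL1 NUMBER, COL2 NUMBER)")
after_create = column_names(conn, "TAB1")

# Modification - ALTER
conn.execute("ALTER TABLE TAB1 ADD COL3 NUMBER")
after_alter = column_names(conn, "TAB1")

# Deletion - DROP
conn.execute("DROP TABLE TAB1")
tables_left = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

print(after_create)  # ['COL1', 'COL2']
print(after_alter)   # ['COL1', 'COL2', 'COL3']
print(tables_left)   # []
```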

12 Structured Query Language - SQL Data Manipulation - DML
- A standard language for accessing the data stored in databases
  - Extraction: SELECT
      SELECT COL1, COL2 FROM TAB1;
  - Loading: INSERT
      INSERT INTO TAB1 (COL1, COL2) VALUES (7, 22);
  - Manipulation: UPDATE
      UPDATE TAB1 SET COL2 = COL1 + 2;
  - Deletion: DELETE
      DELETE FROM TAB1 WHERE COL2 = 4;
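
A minimal sketch of the four DML statements working together, again assuming SQLite through Python's sqlite3 module. The second row (2, 0) is a hypothetical addition so that the DELETE has something to match: after the UPDATE it becomes (2, 4).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TAB1 (COL1 NUMBER, COL2 NUMBER)")

# Loading - INSERT (the slide's row plus one hypothetical row)
conn.execute("INSERT INTO TAB1 (COL1, COL2) VALUES (7, 22)")
conn.execute("INSERT INTO TAB1 (COL1, COL2) VALUES (2, 0)")

# Manipulation - UPDATE: every row gets COL2 = COL1 + 2, giving (7, 9) and (2, 4)
conn.execute("UPDATE TAB1 SET COL2 = COL1 + 2")

# Deletion - DELETE: removes the (2, 4) row
conn.execute("DELETE FROM TAB1 WHERE COL2 = 4")

# Extraction - SELECT
rows = conn.execute("SELECT COL1, COL2 FROM TAB1").fetchall()
conn.commit()
print(rows)  # [(7, 9)]
```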

13 Database Modeling  representation of "real world"  conceptual model  logical model  physical model

14 Keys  Primary Keys ƒ uniquely identifies a record  Foreign Keys ƒ pointer to a Primary Key in another table

15 Indexes  May be applied to columns to allow fast data access  May be applied to single columns or several columns  Direct pointers to rows containing specific values in the indexed column(s)  may be unique or non-unique  May have more than one index per table

16 Normalisation  A technique for producing a set of relations with desirable properties, given the data requirements of an enterprise

17 Normalisation - unnormalised  A representation of the data that contains repeating groups

18 Normalisation - unnormalised form

19 Normalisation - 1st normal form  A relation in which the intersection of each row and column contains one and only one value 1NF

20 Normalisation - 1st normal form (1NF)
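
The step from unnormalised form to 1NF amounts to flattening each repeating group into its own rows. A minimal sketch in plain Python, where the records, the field names and the helper to_1nf are all hypothetical: the non-repeating attributes are copied onto every member of the group, so each row/column intersection holds a single value and (REGION, PERIOD) can serve as the key.

```python
# Unnormalised: each record holds a repeating group (several readings per region)
unnormalised = [
    {"REGION": "ACT", "READINGS": [{"PERIOD": "2007Q1", "VALUE": 10},
                                  {"PERIOD": "2007Q2", "VALUE": 12}]},
    {"REGION": "NSW", "READINGS": [{"PERIOD": "2007Q1", "VALUE": 40}]},
]

def to_1nf(records, group_field):
    """Flatten one repeating group so every row/column intersection holds one value."""
    rows = []
    for rec in records:
        # Attributes outside the repeating group are copied onto each group member
        base = {k: v for k, v in rec.items() if k != group_field}
        for member in rec[group_field]:
            rows.append({**base, **member})
    return rows

rows = to_1nf(unnormalised, "READINGS")
for r in rows:
    print(r)
# {'REGION': 'ACT', 'PERIOD': '2007Q1', 'VALUE': 10}
# {'REGION': 'ACT', 'PERIOD': '2007Q2', 'VALUE': 12}
# {'REGION': 'NSW', 'PERIOD': '2007Q1', 'VALUE': 40}
```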

21 Normalisation - 2nd normal form  A relation that is ƒ in first normal form ƒ every non-primary key attribute is fully functionally dependent on the primary key 2NF

22 Normalisation - 2nd normal form (2NF)
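
A minimal sketch of spotting and removing a partial dependency, in plain Python with hypothetical data and helper names. fd_holds tests whether a functional dependency is consistent with the sample rows. A sample can refute a dependency but never prove one, because dependencies come from the meaning of the data. Here STUDENT_NAME depends on STUDENT_ID alone, which is only part of the composite key (STUDENT_ID, COURSE_ID), so the relation is not in 2NF; projecting it into two relations removes the partial dependency.

```python
def fd_holds(rows, lhs, rhs):
    """True if, in these rows, each lhs value combination maps to one rhs combination."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        val = tuple(r[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate rows, preserving order."""
    out = []
    for r in rows:
        t = {a: r[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out

# 1NF relation with composite primary key (STUDENT_ID, COURSE_ID), hypothetical data
enrolment = [
    {"STUDENT_ID": 1, "COURSE_ID": "C1", "STUDENT_NAME": "Lee", "GRADE": "A"},
    {"STUDENT_ID": 1, "COURSE_ID": "C2", "STUDENT_NAME": "Lee", "GRADE": "B"},
    {"STUDENT_ID": 2, "COURSE_ID": "C1", "STUDENT_NAME": "Ng", "GRADE": "C"},
]

# STUDENT_NAME depends on part of the key only: a partial dependency
partial = fd_holds(enrolment, ["STUDENT_ID"], ["STUDENT_NAME"])
# GRADE needs the whole key: neither key column alone determines it
grade_partial = (fd_holds(enrolment, ["STUDENT_ID"], ["GRADE"])
                 or fd_holds(enrolment, ["COURSE_ID"], ["GRADE"]))

# 2NF decomposition: move STUDENT_NAME out with the key part it depends on
student = project(enrolment, ["STUDENT_ID", "STUDENT_NAME"])
grades = project(enrolment, ["STUDENT_ID", "COURSE_ID", "GRADE"])
print(partial, grade_partial)  # True False
print(student)  # [{'STUDENT_ID': 1, 'STUDENT_NAME': 'Lee'}, {'STUDENT_ID': 2, 'STUDENT_NAME': 'Ng'}]
```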

23 Normalisation - 3rd normal form  A relation that is ƒ in first and second normal form and ƒ in which no non-primary key attribute is transitively dependent on the primary key 3NF

24 Normalisation - 3rd normal form (3NF)
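
A minimal sketch of removing a transitive dependency, in plain Python with hypothetical data; the helpers are repeated so the block stands alone. STAFF_NO determines BRANCH_NO, and BRANCH_NO determines BRANCH_ADDRESS, so BRANCH_ADDRESS depends on the key only transitively and is repeated for every staff member at a branch. Splitting the branch details into their own relation gives 3NF, and joining the two parts back on BRANCH_NO reproduces the original rows for this sample.

```python
def fd_holds(rows, lhs, rhs):
    """True if, in these rows, each lhs value combination maps to one rhs combination."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        val = tuple(r[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate rows, preserving order."""
    out = []
    for r in rows:
        t = {a: r[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out

# 2NF relation keyed on STAFF_NO, hypothetical data
staff = [
    {"STAFF_NO": "S1", "NAME": "Smith", "BRANCH_NO": "B1", "BRANCH_ADDRESS": "1 Main St"},
    {"STAFF_NO": "S2", "NAME": "Jones", "BRANCH_NO": "B1", "BRANCH_ADDRESS": "1 Main St"},
    {"STAFF_NO": "S3", "NAME": "Brown", "BRANCH_NO": "B2", "BRANCH_ADDRESS": "9 High St"},
]

# BRANCH_NO (a non-key attribute) determines BRANCH_ADDRESS: a transitive dependency
transitive = fd_holds(staff, ["BRANCH_NO"], ["BRANCH_ADDRESS"])

# 3NF decomposition
staff_3nf = project(staff, ["STAFF_NO", "NAME", "BRANCH_NO"])
branch = project(staff, ["BRANCH_NO", "BRANCH_ADDRESS"])

# Joining on BRANCH_NO rebuilds the original rows (lossless for this sample)
rejoined = [{**s, **b} for s in staff_3nf for b in branch
            if s["BRANCH_NO"] == b["BRANCH_NO"]]
print(transitive, len(branch), rejoined == staff)  # True 2 True
```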

25 Loading data into databases
- Bulk loading tool
- Data integrity
- Validation
- Ad-hoc loading
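
A minimal sketch of a validated batch load next to an ad-hoc insert, assuming SQLite via Python's sqlite3 module. executemany stands in for a real bulk loading tool, and the OBS table, the incoming rows and the valid function are hypothetical. Rows are screened before loading, the batch goes in as a single transaction, and the table's own constraints still reject a bad ad-hoc row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE OBS (
    REGION TEXT NOT NULL,
    PERIOD TEXT NOT NULL,
    VALUE  REAL CHECK (VALUE >= 0),  -- integrity rule enforced by the DBMS
    PRIMARY KEY (REGION, PERIOD)
)""")

incoming = [("ACT", "2007Q1", 10.5), ("NSW", "2007Q1", -3.0),
            ("VIC", None, 8.0), ("NSW", "2007Q2", 41.0)]

def valid(row):
    """Hypothetical pre-load validation: no missing fields, no negative values."""
    region, period, value = row
    return None not in row and value >= 0

good = [r for r in incoming if valid(r)]
rejected = [r for r in incoming if not valid(r)]

# Bulk load: the whole batch commits together, or rolls back on any error
with conn:
    conn.executemany("INSERT INTO OBS VALUES (?, ?, ?)", good)
loaded = conn.execute("SELECT COUNT(*) FROM OBS").fetchone()[0]

# Ad-hoc loading: a single statement, still checked by the table's constraints
try:
    conn.execute("INSERT INTO OBS VALUES ('QLD', '2007Q1', -1)")
    adhoc_ok = True
except sqlite3.IntegrityError:
    adhoc_ok = False

print(loaded, len(rejected), adhoc_ok)  # 2 2 False
```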

26 Data Extraction  Assemble data into usable format  Spreadsheet  Timeseries  Data Cube  Publication

27 Data manipulation  Inside database ƒ Sophisticated manipulation language - SQL  Outside database ƒ Timeseries  Seasonal Adjustment  Chain Volume Measures (Constant Price) ƒ SAS, SPSS

28 Transactional Integrity  the ability to apply rules to the data via database constraints  ability to group several discrete data insertion or data manipulation into one logical data change  In SQL, controlled via COMMIT and ROLLBACK statements

29 Transactional Integrity  database constraints ƒ values must conform to specific rules  exist in a specific column  belong to a "set"  uniqueness  If a validation against a constraint fails ƒ the current transaction fails

30 Transactions & Recovery  Each transaction is logged by the DBMS  Backups taken periodically  Data can be recovered ƒ to an archived backup ƒ to a point in time

31 Transaction example
    COMMIT;
    -- transaction 1
    INSERT INTO TABLE1 (COL1,COL2) VALUES(7,22);
    UPDATE TABLE1 SET COL1 = 77 WHERE COL2 = 22;
    DELETE FROM TABLE1 WHERE COL1 = 7;
    ROLLBACK;
    -- transaction 2
    INSERT INTO TABLE2 (COL3,COL4) VALUES('ABC',11);
    UPDATE TABLE2 SET COL3 = 'XYZ';
    DELETE FROM TABLE2 WHERE COL3 = 'xyz';
    COMMIT;
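
A minimal sketch that runs the transaction example, assuming SQLite via Python's sqlite3 module. The slide's DBMS appears to start a new transaction implicitly after each COMMIT or ROLLBACK. SQLite instead needs an explicit BEGIN, so the sketch opens the connection with isolation_level=None and marks the start of each transaction itself; the table definitions are assumptions. Everything in transaction 1 is discarded together, and everything in transaction 2 is kept together.

```python
import sqlite3

# isolation_level=None: the sqlite3 module issues no implicit BEGIN,
# so every transaction boundary below is explicit
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE TABLE1 (COL1 NUMBER, COL2 NUMBER)")
conn.execute("CREATE TABLE TABLE2 (COL3 TEXT, COL4 NUMBER)")

# transaction 1: three changes grouped, then discarded as one unit
conn.execute("BEGIN")
conn.execute("INSERT INTO TABLE1 (COL1,COL2) VALUES(7,22)")
conn.execute("UPDATE TABLE1 SET COL1 = 77 WHERE COL2 = 22")
conn.execute("DELETE FROM TABLE1 WHERE COL1 = 7")
conn.execute("ROLLBACK")

# transaction 2: three changes grouped, then made permanent as one unit
conn.execute("BEGIN")
conn.execute("INSERT INTO TABLE2 (COL3,COL4) VALUES('ABC',11)")
conn.execute("UPDATE TABLE2 SET COL3 = 'XYZ'")
conn.execute("DELETE FROM TABLE2 WHERE COL3 = 'xyz'")  # '=' is case-sensitive here: no match
conn.execute("COMMIT")

table1 = conn.execute("SELECT * FROM TABLE1").fetchall()
table2 = conn.execute("SELECT * FROM TABLE2").fetchall()
print(table1)  # []
print(table2)  # [('XYZ', 11)]
```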

32 References
- Thomas Connolly, Carolyn Begg, Anne Strachan, Database Systems: A Practical Approach to Design, Implementation and Management, Addison-Wesley, 1999
- Cartoons: Randy Glasbergen

33 Questions?

