Information Retrieval and Use
Denormalisation and Distributed Database Systems
Geoff Leese
September 2008, revised October 2009
Mapping the logical model onto physical design
- Entities become tables
  - More often than not!
- Attributes become fields (columns)
- Unique identifiers become primary keys
- Relationships are implemented by foreign key columns
- M:N relationships are resolved by inserting an intersection table
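As a minimal sketch of these mapping rules, assuming a hypothetical Student/Module model with an M:N relationship (the entity, table and column names are illustrative, not taken from the lecture), the Python below uses the standard-library `sqlite3` module to create the tables and resolve the M:N relationship with an `enrolment` intersection table.

```python
import sqlite3

# Hypothetical logical model: Student and Module have an M:N relationship
# (a student takes many modules; a module has many students).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
-- Each entity becomes a table; its unique identifier becomes the primary key.
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE module (
    module_code TEXT PRIMARY KEY,
    title       TEXT NOT NULL
);
-- The M:N relationship is resolved by an intersection table whose
-- foreign key columns point back at both parent tables.
CREATE TABLE enrolment (
    student_id  INTEGER NOT NULL REFERENCES student(student_id),
    module_code TEXT    NOT NULL REFERENCES module(module_code),
    PRIMARY KEY (student_id, module_code)
);
""")
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Asha"), (2, "Ben")])
conn.executemany("INSERT INTO module VALUES (?, ?)",
                 [("IRU", "Information Retrieval and Use"), ("DBD", "Database Design")])
conn.executemany("INSERT INTO enrolment VALUES (?, ?)",
                 [(1, "IRU"), (1, "DBD"), (2, "IRU")])


def modules_for(student_id):
    """Navigate the M:N relationship through the intersection table."""
    rows = conn.execute(
        """SELECT m.title
             FROM enrolment e JOIN module m ON m.module_code = e.module_code
            WHERE e.student_id = ?
            ORDER BY m.title""",
        (student_id,),
    )
    return [title for (title,) in rows]


print(modules_for(1))  # ['Database Design', 'Information Retrieval and Use']
```

Getting from a student to their modules takes one join through the intersection table, and its composite primary key stops the same enrolment being recorded twice.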
Mapping considerations
- Independence
- Privacy
- Efficiency of queries
Denormalisation
- Joins take time!
- Split or merge normalised entities based on frequent associated use:
  - Remove redundant relationships
  - Merge entities with 1:1 relationships
  - Use summary fields
  - Use summary tables and views
Using a summary field (1)
- Consider running the query “give the total value of all orders for customer X”. How many joins does it need?
Using a summary field (2)
- Note the summary field in the Orders table. How many joins are needed now?
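The diagrams for these two slides are not reproduced here, so the sketch below assumes a hypothetical schema (`customer`, `orders`, `order_line`, `product`). It compares the normalised query, which must derive each order's value from its lines and product prices, with a denormalised version that reads an `order_total` summary field kept up to date by a trigger. The trigger only handles inserted order lines; that restriction is a simplifying assumption of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, unit_price REAL NOT NULL);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    order_total REAL NOT NULL DEFAULT 0      -- denormalised summary field
);
CREATE TABLE order_line (
    order_id   INTEGER NOT NULL REFERENCES orders(order_id),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
-- Keep the summary field in step with new order lines (updates and
-- deletes of order lines would need similar triggers; omitted here).
CREATE TRIGGER order_line_added AFTER INSERT ON order_line
BEGIN
    UPDATE orders
       SET order_total = order_total + NEW.quantity *
           (SELECT unit_price FROM product WHERE product_id = NEW.product_id)
     WHERE order_id = NEW.order_id;
END;
""")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "X Ltd"), (2, "Y plc")])
conn.executemany("INSERT INTO product VALUES (?, ?)", [(10, 2.50), (11, 4.00)])
conn.executemany("INSERT INTO orders (order_id, customer_id) VALUES (?, ?)",
                 [(100, 1), (101, 1), (102, 2)])
conn.executemany("INSERT INTO order_line VALUES (?, ?, ?)",
                 [(100, 10, 4), (100, 11, 1), (101, 11, 3), (102, 10, 1)])

# Normalised: the total is derived from order lines and prices (three joins).
NORMALISED = """
SELECT SUM(ol.quantity * p.unit_price)
  FROM customer c
  JOIN orders o      ON o.customer_id = c.customer_id
  JOIN order_line ol ON ol.order_id   = o.order_id
  JOIN product p     ON p.product_id  = ol.product_id
 WHERE c.name = ?"""

# Denormalised: read the stored summary field (one join, only to find the
# customer by name).
DENORMALISED = """
SELECT SUM(o.order_total)
  FROM customer c
  JOIN orders o ON o.customer_id = c.customer_id
 WHERE c.name = ?"""

for sql in (NORMALISED, DENORMALISED):
    print(conn.execute(sql, ("X Ltd",)).fetchone()[0])  # 26.0 both times
```

In this assumed schema the summary field cuts the joins from three to one, at the cost of redundant data that must be maintained whenever an order's lines change.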
Distributed database systems
- Special rules apply!
The traditional model
- One centralised database
- Terminals at remote locations
- Disadvantages:
  - Networks are slow (especially WANs!)
  - The central machine does all the processing
  - If the central machine fails, the database is down
(Integrity, redundancy and disaster recovery are considered in later lectures!)
The client/server model
- Client – application – “front end”
- Server – DBMS – “back end”
- Still dependent on a central database
Client responsibilities
- Manages the user interface
- Accepts user data
- Has local processing capability within the application
- Generates database requests and transmits them via the network to the server
- Receives results from the server and formats them as required by the application
Server responsibilities
- Accepts database requests from the client
- Processes database requests:
  - Handles security issues
  - Deals with concurrency issues
  - Optimises queries
  - Handles recovery/rollback issues
- Returns results to the client
Distributed database architecture
- A collection of logically related “sites”, connected together so that the user’s view is that of a single database at a single location
- Each site is a database in its own right
- Sites are not necessarily physically or geographically separated, although they often are; they are always logically separated
Advantages
- Organisations are distributed, so why shouldn’t their data be?
- Improved efficiency
  - Store data close to where it’s used
Types of DDS
- Homogeneous – the same type of RDBMS at each site (easy!)
- Heterogeneous – different types of DBMS at each site (not so easy!)
Implementation methods (1)
- Fragmentation – splitting data between sites
  - Horizontal – row based, e.g. store all employee records for a location at that location
  - Vertical – column based, e.g. store all payroll columns in the payroll department and all other employee data in HR
- Either way, it must be possible to put the fragments back together!
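A small Python sketch of both kinds of fragmentation, using a hypothetical EMPLOYEE relation held as a list of tuples (the names, locations and pay grades are made up, and plain dictionaries stand in for the sites). Horizontal fragments are reassembled with a union of their rows; vertical fragments each keep the primary key so they can be re-joined.

```python
# Hypothetical EMPLOYEE relation: (emp_id, name, location, salary, pay_grade)
EMPLOYEES = [
    (1, "Ann", "Leeds", 31000, "P3"),
    (2, "Raj", "Derby", 28000, "P2"),
    (3, "Mei", "Leeds", 35000, "P4"),
]


def horizontal_fragments(rows):
    """Row-based split: each location's site stores only its own employees."""
    sites = {}
    for row in rows:
        sites.setdefault(row[2], []).append(row)
    return sites


def vertical_fragments(rows):
    """Column-based split: payroll keeps the pay columns, HR keeps the rest.
    Both fragments keep the primary key (emp_id) so they can be re-joined."""
    payroll = {r[0]: (r[3], r[4]) for r in rows}
    hr = {r[0]: (r[1], r[2]) for r in rows}
    return payroll, hr


def rebuild_from_horizontal(sites):
    """Reconstruct the relation as the UNION of the row fragments."""
    return sorted(row for fragment in sites.values() for row in fragment)


def rebuild_from_vertical(payroll, hr):
    """Reconstruct the relation by JOINing the column fragments on emp_id."""
    return sorted((emp_id, *hr[emp_id], *payroll[emp_id]) for emp_id in hr)


sites = horizontal_fragments(EMPLOYEES)
payroll, hr = vertical_fragments(EMPLOYEES)
print(sorted(sites))                                          # ['Derby', 'Leeds']
print(rebuild_from_horizontal(sites) == sorted(EMPLOYEES))    # True
print(rebuild_from_vertical(payroll, hr) == sorted(EMPLOYEES))  # True
```

If the vertical fragments did not both carry `emp_id`, there would be no way to match payroll columns to HR columns, which is why the key is duplicated across fragments.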
Implementation methods (2)
- Replication
  - Controlled duplication of data at more than one site
- Update propagation?
Objectives (1)
- Local autonomy
  - Local data is locally owned and managed, with minimal data requirements from remote sites
- No reliance on a central site
- Continuous operation
  - Reliability
  - Availability
Objectives (2)
- Location independence
  - From the user’s view, all data is at their site
- Fragmentation independence
  - Needs joins and unions to put fragments back together
- Replication independence
Objectives (3)
- Distributed query processing
- Distributed transaction management
  - Transactions carried out by “agents” at distributed sites
  - Two-phase commit
  - Locking issues (later lecture)
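Two-phase commit can be sketched as below. This is a simplified, in-memory illustration under stated assumptions: the `Agent` class, its `can_commit` flag and the `two_phase_commit` coordinator function are hypothetical, and real implementations also force log records to stable storage and handle timeouts and coordinator failure, all of which are omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical agent running one site's part of a distributed transaction."""
    site: str
    can_commit: bool = True  # simulates whether the local work succeeded
    state: str = "active"
    log: list = field(default_factory=list)  # in-memory stand-in for a site log

    def prepare(self) -> bool:
        """Phase 1: vote yes (entering 'prepared') or vote no (aborting locally)."""
        if self.can_commit:
            self.state = "prepared"
            self.log.append("prepared")
            return True
        self.state = "aborted"
        self.log.append("vote-abort")
        return False

    def commit(self) -> None:
        self.state = "committed"
        self.log.append("commit")

    def abort(self) -> None:
        self.state = "aborted"
        self.log.append("abort")


def two_phase_commit(agents: list) -> str:
    """Coordinator: the transaction commits only if every agent votes yes."""
    votes = [agent.prepare() for agent in agents]  # Phase 1: collect votes
    decision = all(votes)
    for agent in agents:  # Phase 2: send the global decision
        if decision:
            agent.commit()
        elif agent.state == "prepared":
            agent.abort()  # agents that voted no have already aborted
    return "commit" if decision else "abort"


ok = [Agent("Leeds"), Agent("Derby")]
print(two_phase_commit(ok))  # commit

failed = [Agent("Leeds"), Agent("Derby", can_commit=False)]
print(two_phase_commit(failed), [a.state for a in failed])  # abort ['aborted', 'aborted']
```

The key property shown is atomicity across sites: a single "no" vote makes every prepared agent roll back, so no site commits on its own.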
Objectives (4)
- Hardware independence
- Operating system independence
- Network independence
- DBMS independence
DDS issues (1)
- Query processing
  - Optimisation is even more important
- Catalogue (data dictionary) management: should the catalogue be
  - Centralised?
  - Fully replicated?
  - Partitioned?
  - A combination of centralised and partitioned?
DDS issues (2)
- Update propagation
  - An issue where replication is used
  - “Primary copy” system
- Recovery
  - Two-phase commit
- Concurrency
  - Locking strategies
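A minimal sketch of the “primary copy” approach for a single replicated data item, assuming synchronous propagation and hypothetical site names (the `ReplicatedItem` class is illustrative only): updates are accepted only at the item's primary site and are then pushed to every other copy. Many real systems propagate asynchronously instead, so replicas can briefly lag behind the primary.

```python
class ReplicatedItem:
    """Hypothetical primary-copy replication of a single data item."""

    def __init__(self, primary: str, replicas: list, value):
        self.primary = primary
        # Every site, primary included, starts with the same value.
        self.copies = {site: value for site in [primary, *replicas]}

    def update(self, site: str, value) -> None:
        # Only the primary copy may be updated directly.
        if site != self.primary:
            raise PermissionError(
                f"{site} holds a secondary copy; send the update to {self.primary}")
        self.copies[self.primary] = value
        self._propagate()

    def _propagate(self) -> None:
        # Synchronous propagation: every replica is overwritten before update() returns.
        for site in self.copies:
            self.copies[site] = self.copies[self.primary]

    def read(self, site: str):
        # Reads are served from the local copy at the requesting site.
        return self.copies[site]


stock = ReplicatedItem("HQ", ["Leeds", "Derby"], 100)
stock.update("HQ", 120)
print(stock.read("Derby"))  # 120
```

Routing all updates through one site avoids conflicting concurrent updates to different copies, but that site becomes a bottleneck and a point of failure for writes to the item.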
Summary
- Mapping the logical model
- Denormalisation
- Traditional database architecture
- Client/server model
- Distributed database systems
  - Advantages
  - Objectives
  - Implementation methods
  - Issues
Further reading
- Rolland chapter 10
- Hoffer chapters 12
- Denormalisation (online resource)