Databases in Context. Wendy Moncur, Department of Computing Science, University of Aberdeen.
Databases in Context. Database design in a major bank; Database management; 6000-table Personnel database.
Who am I? Wendy Moncur. Database Administrator (DBA) at one of the UK’s largest banks; designed databases for high performance & availability; 19 years’ industry experience; platform: DB2 & SQL; largest database: 6000 tables.
Why listen? DBA Average Minimum Salary £41,896; DBA Average Maximum Salary £47,147. Source:
What does a DBA do? Database design & optimisation; Quality assurance of SQL; Performance management; Database administration.
Case study: the monster database
6,000+ tables, 18,000+ indexes
Part 1: Challenges. “One size fits all”; External supplier; 6,000+ tables; 18,000+ indexes; 1 tablespace; Short timescale.
Challenges: “one size fits all”? One size does not fit all. Performance of SQL statements depends on: database design; index design; the DATA.
Challenges: “one size fits all”? Every company has different requirements. Customers demand high performance... and control the budget. Service Level Agreements (SLAs) dictate: minimum transaction speed; number of concurrent users; number of remote locations; daily system availability. Database must be tailored to achieve site-specific SLAs.
Challenges: external supplier. Software package & database from external supplier. Cannot change this.
Challenges: 6,000+ tables. Cannot change tables: no denormalisation allowed. Supplied program code demands these tables exist. Cannot change supplied program code unless essential.
Challenges: 18,000+ indexes. Can change indexes: unique indexes; clustering indexes; secondary indexes.
Unique index. Defines what makes a row unique. Components of the index cannot be changed. Order of components can be changed.
Unique index. E.g. for table “EMPLOYEE”: unique index = DateOfBirth, Firstname, Surname. Most queries ask for data where only Surname and Firstname are known: SELECT Surname, Firstname, DateOfBirth FROM Employee WHERE Surname = 'Jenkins' AND Firstname = 'Malcolm'; Recommendation: change order of unique index to Surname, Firstname, DateOfBirth.
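The effect of column order in an index can be sketched in a few lines. The example below uses Python's built-in sqlite3 module purely as a stand-in for DB2 (an assumption: the platform described in these slides is DB2, whose optimiser and EXPLAIN output differ), and the index name ix_employee and the helper access_path are hypothetical. It asks the database how it would reach the data for the query above, once with the original column order and once with the recommended one.

```python
import sqlite3

QUERY = (
    "SELECT Surname, Firstname, DateOfBirth FROM Employee "
    "WHERE Surname = 'Jenkins' AND Firstname = 'Malcolm'"
)

def access_path(index_columns):
    """Return SQLite's plan descriptions for QUERY, given a unique index
    on index_columns (hypothetical helper, SQLite stand-in for DB2)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Employee (Surname TEXT, Firstname TEXT, DateOfBirth TEXT)"
    )
    conn.execute(
        f"CREATE UNIQUE INDEX ix_employee ON Employee ({', '.join(index_columns)})"
    )
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable description.
    return [row[-1] for row in rows]

original = access_path(["DateOfBirth", "Firstname", "Surname"])
reordered = access_path(["Surname", "Firstname", "DateOfBirth"])
print("original :", original)
print("reordered:", reordered)
```

With DateOfBirth leading, the known columns cannot be used to position within the index, so SQLite reads through all entries (a SCAN); with Surname and Firstname leading, it can go straight to the matching entries (a SEARCH). The same principle motivates the recommendation above, although DB2 reports its access paths differently.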
Clustering indexes. Defines the physical order in which rows of data should be stored. Components of the index can be changed. Order of components can be changed.
Clustering indexes. E.g. for table “EMPLOYEE”: clustering index = DateOfBirth, yet most queries order by EmploymentStartDate: SELECT EmploymentStartDate, Surname, Firstname FROM Employee WHERE Surname = 'Jenkins' AND Firstname = 'Malcolm' ORDER BY EmploymentStartDate; Recommendation: change clustering index to use EmploymentStartDate.
Secondary indexes. Not unique. Do not dictate how the data is to be held. Created to improve performance of queries and updates. Increase the cost of inserts and updates, as each index must be created and maintained along with the table. Recommendation: drop superfluous secondary indexes.
Challenges: One tablespace
Many tablespaces. Create many new tablespaces. Split the tables between them, according to table function.
Challenge: Short timescale. At least 4 test environments (Vanilla, Unit test, System test, Pre-live): 96,000 objects! ((6,000 tables + 18,000 indexes) * 4 environments). 3 months.
Tools. Use tools to: check performance of each SQL statement; manage change process.
Check performance. “EXPLAIN”: evaluates route to data for every SQL statement; identifies what indexes are used; doesn’t identify redundant indexes; doesn’t identify indexes that need to be changed.
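As a rough illustration of running an EXPLAIN over a set of statements, the sketch below collects the index names that appear in each statement's plan. It again uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in (an assumption: DB2's EXPLAIN writes to plan tables and looks quite different), and the function indexes_used, the index names and the one-statement workload are all hypothetical.

```python
import re
import sqlite3

def indexes_used(conn, statements):
    """Map each SQL statement to the set of index names its plan mentions."""
    used = {}
    for sql in statements:
        plan = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
        names = set()
        for row in plan:
            # row[-1] is the plan text, e.g. "SEARCH ... USING INDEX ix_name (...)".
            names.update(re.findall(r"INDEX (\w+)", row[-1]))
        used[sql] = names
    return used

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Employee (
        Surname TEXT, Firstname TEXT, DateOfBirth TEXT, EmploymentStartDate TEXT
    );
    CREATE UNIQUE INDEX ix_name ON Employee (Surname, Firstname, DateOfBirth);
    CREATE INDEX ix_start ON Employee (EmploymentStartDate);
    """
)
workload = [
    "SELECT DateOfBirth FROM Employee "
    "WHERE Surname = 'Jenkins' AND Firstname = 'Malcolm'",
]
report = indexes_used(conn, workload)
all_used = set().union(*report.values())
print(report)
print("used by this workload:", sorted(all_used))
```

An index that never shows up in such a report is only a candidate for review: as the slide notes, the EXPLAIN output itself does not say that an index is redundant or needs changing, and the result is only as complete as the workload fed in.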
Manage change process. Rigorous control needed, achieved through: consistent naming standards; detailed record of every change; consistent route through environments, no short cuts; DBA tools.
Part 1: Recap of challenges. Can’t change: “One size fits all”; External supplier; 6,000+ tables. Can change: 18,000+ indexes; 1 tablespace; Short timescale.
Part 2: The Production Database. Does it perform? Can the right people use it? If disaster strikes, can the data be recovered?
Does the database perform? Database performance monitored against Service Level Agreements (SLAs). Regular health checks carried out: data stored in sequence? Enough space? If sub-standard performance, further database design work done.
Can the right people access the data? PERSONNEL database. Personnel team: query & update data at individual or regional level. DBA: backup/restore data; reorganise data; change database definitions; update statistics on data. Chief executive: employee statistics. Staff member: their own data.
Can the right people use the database? Different people, different information needs. Sensitive data: salary, health, discipline… Solution: VIEWS; Transaction Management.
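A minimal sketch of the VIEWS idea follows, again using Python's sqlite3 module as a stand-in for DB2 (an assumption). The table layout, the sample rows and the view names EmployeeStats and NorthEmployees are hypothetical. One view gives the chief executive aggregate statistics without individual rows; the other limits a regional personnel team to its own region.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Employee (
        StaffId INTEGER PRIMARY KEY, Surname TEXT, Region TEXT, Salary INTEGER
    );
    INSERT INTO Employee VALUES
        (1, 'Jenkins', 'North', 30000),
        (2, 'Smith',   'North', 40000),
        (3, 'Patel',   'South', 50000);

    -- Chief executive: employee statistics only, no individual rows.
    CREATE VIEW EmployeeStats AS
        SELECT Region, COUNT(*) AS Headcount, AVG(Salary) AS AverageSalary
        FROM Employee
        GROUP BY Region;

    -- Regional personnel team: individual rows, but only for one region.
    CREATE VIEW NorthEmployees AS
        SELECT StaffId, Surname, Region, Salary
        FROM Employee
        WHERE Region = 'North';
    """
)
stats = conn.execute("SELECT * FROM EmployeeStats ORDER BY Region").fetchall()
north = conn.execute("SELECT Surname FROM NorthEmployees ORDER BY StaffId").fetchall()
print(stats)
print(north)
```

SQLite has no user accounts or GRANT statement, so this shows only the shape of the data each audience would see. On a platform such as DB2, access would additionally be granted on the views rather than on the underlying table.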
If disaster strikes, can the data be recovered? Robust backup & recovery strategies for: hardware failure; software failure.
Part 2: Recap of Production Database issues. Database must perform to acceptable level. Only the right people should have access to any data item. No matter what, the data must be recoverable.