Construction of a database. Per Weidenman, PAR AB
Database: a collection of data that belongs together and models the "world". Database management system (DBMS): the database (a collection of interrelated data) plus software to manage and access the data.
[Diagram: transactions are the input to the DBMS, which holds the organised data ("database", data warehouse, etc.). Users do searching, reporting and statistical analysis, and these uses set the DBMS requirements.]
Database management systems (DBMS): Microsoft Access, Microsoft SQL Server, DB2, Oracle, MySQL, FirebirdSQL, etc. SQL (Structured Query Language): a computer language to define and search data.
Relational databases: tables containing data, organised in rows and columns. Keys are used for linking data in different tables.
Example: a simple database for collecting and organising statistical papers, created in Microsoft Access.
[Screenshot labels: paper name and details; link to document (PDF file); authors.]
A database with four tables, each with its keys.
One of the tables, containing paper name and details. One paper on each row; each row contains the paper name, other details and a key.
The keys are used to link data in the four tables
[Figure: table "artiklar" (key: artikel_id) is linked to table "författare2" (keys: artikel_id and person_id), which is linked to table "personer2" (key: person_id). Example key values shown: 56, 111, 123, 44. Example names shown: Aaaa, Bbbb, Cccc, Dddd. One paper has 3 authors, and one person is the author of 2 papers.]
A query: the result of asking the database about papers and authors. One paper and the corresponding 3 authors; one author and the corresponding 2 papers.
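The linking of papers and authors through keys can be sketched with Python's built-in sqlite3 module instead of Microsoft Access. The table names and the keys artikel_id and person_id come from the slides; the columns titel and namn, the third person id (124) and the paper titles are assumptions made only for illustration.

```python
import sqlite3

# Table and key names follow the slides; the columns "titel" and "namn"
# and all row values are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artiklar (artikel_id INTEGER PRIMARY KEY, titel TEXT);
CREATE TABLE personer2 (person_id INTEGER PRIMARY KEY, namn TEXT);
CREATE TABLE "författare2" (
    artikel_id INTEGER REFERENCES artiklar (artikel_id),
    person_id  INTEGER REFERENCES personer2 (person_id)
);
INSERT INTO artiklar VALUES (56, 'Paper X'), (44, 'Paper Y');
INSERT INTO personer2 VALUES (111, 'Aaaa'), (123, 'Bbbb'), (124, 'Cccc');
INSERT INTO "författare2" VALUES (56, 111), (56, 123), (56, 124), (44, 123);
""")


def authors_of(conn, artikel_id):
    """Names of all persons linked to one paper via the link table."""
    rows = conn.execute(
        """
        SELECT p.namn
        FROM "författare2" AS f
        JOIN personer2 AS p ON p.person_id = f.person_id
        WHERE f.artikel_id = ?
        ORDER BY p.namn
        """,
        (artikel_id,),
    ).fetchall()
    return [row[0] for row in rows]


def papers_by(conn, person_id):
    """Titles of all papers linked to one person via the link table."""
    rows = conn.execute(
        """
        SELECT a.titel
        FROM "författare2" AS f
        JOIN artiklar AS a ON a.artikel_id = f.artikel_id
        WHERE f.person_id = ?
        ORDER BY a.titel
        """,
        (person_id,),
    ).fetchall()
    return [row[0] for row in rows]


print(authors_of(conn, 56))   # one paper, three authors
print(papers_by(conn, 123))   # one author, two papers
```

The link table holds only the two keys, which is what lets one paper have several authors and one person appear on several papers.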
[Diagram, repeated: the DBMS overview (input: transactions; users: searching, reporting, statistical analysis; organised data; DBMS requirements), now with two groups added: the IT department and the "business" users.]
DBMS requirements from a statistical / analytical viewpoint: data quality, data types, performance, maximum information, historical data, regulation and secrecy.
DBMS requirements from a statistical / analytical viewpoint: data quality. Instead of entering text/data by typing (Sales System X, "Enter customer name:"), use, if possible, selection from a list of valid values (Sales System X, "Choose customer name:" Volvo Personvagnar AB, Volvo Lastvagnar AB, Volvo Construction AB, Volvo Bussar AB, Volvo Logistics AB, …).
DBMS requirements from a statistical / analytical viewpoint: data quality. Sales System X, "Enter customer age:". Define rules for valid input (values, intervals, etc.). We don't want: negative values.
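A rule for valid input can be declared in the database itself, so that every application writing to the table is held to it. The sketch below uses a CHECK constraint in SQLite via Python's sqlite3 module. The table name customers, the column names and the upper limit of 120 are assumptions for illustration, not part of Sales System X.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the 0..120 interval is an assumed validity rule.
conn.execute(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        age INTEGER CHECK (age IS NULL OR age BETWEEN 0 AND 120)
    )
    """
)


def try_insert(conn, age):
    """Return True if the database accepted the age, False if it was rejected."""
    try:
        conn.execute("INSERT INTO customers (age) VALUES (?)", (age,))
        return True
    except sqlite3.IntegrityError:
        # Raised by SQLite when the CHECK constraint fails.
        return False


print(try_insert(conn, 34))    # valid age
print(try_insert(conn, -5))    # negative value, rejected
print(try_insert(conn, None))  # unknown age is allowed and stored as NULL
```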
DBMS requirements from a statistical / analytical viewpoint: data quality. Handling of missing values: missing values should be stored as "null" in the database, not as 0 (digit zero).
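Why the distinction matters for analysis can be shown with a small sketch: SQL aggregate functions such as AVG skip NULL values, while a stored 0 is treated as a real observation and pulls the mean down. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same three observations stored twice: the missing one as NULL and as 0.
conn.execute("CREATE TABLE orders (value_null REAL, value_zero REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(100.0, 100.0), (200.0, 200.0), (None, 0.0)],
)

mean_null, mean_zero = conn.execute(
    "SELECT AVG(value_null), AVG(value_zero) FROM orders"
).fetchone()

print(mean_null)  # mean of the two known values
print(mean_zero)  # mean distorted by the fake zero
```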
DBMS requirements from a statistical / analytical viewpoint: data types. Text, numeric.
DBMS requirements from a statistical / analytical viewpoint: performance. Searching and reporting: searching for individual records; creating "prepared" reports by counting or summing. Statistical analysis: large datasets, multivariate methods, iterative estimation, etc.
DBMS requirements from a statistical / analytical viewpoint: maximum information. Sales System X, "Enter customer age: 34". We need to report on age groups: … Thus we store age as an interval, not as a value! This is the fallacy of being too user oriented: once only the interval is stored, the exact age and every other grouping are lost.
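The alternative is to store the exact value and derive the groups when reporting. A minimal sketch, assuming a hypothetical customers table and equal-width age groups, shows that any grouping can then be produced from the same stored data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table storing the exact age, not a pre-defined interval.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, age INTEGER)")
conn.executemany(
    "INSERT INTO customers (age) VALUES (?)", [(34,), (29,), (41,), (67,)]
)


def age_group_counts(conn, width):
    """Count customers per age group of the given width (in years).

    Returns (lower_bound, count) pairs; integer division maps each
    exact age to the lower bound of its group.
    """
    return conn.execute(
        """
        SELECT (age / ?) * ? AS lower_bound, COUNT(*)
        FROM customers
        GROUP BY lower_bound
        ORDER BY lower_bound
        """,
        (width, width),
    ).fetchall()


print(age_group_counts(conn, 10))  # ten-year groups
print(age_group_counts(conn, 20))  # twenty-year groups from the same data
```

Had only a ten-year interval been stored, the twenty-year grouping would happen to be recoverable, but a five-year one would not.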
DBMS requirements from a statistical / analytical viewpoint: historical data. Sales System X fields: customer name, customer address, order date, order value. Table Orders: Customer ID, Order date, Order value. Each new order for a specific customer will be added to table Orders and stored as a "new row".
DBMS requirements from a statistical / analytical viewpoint: historical data. Table Customers: Customer ID, Customer name, Customer address. But a new address will probably UPDATE the existing record (row) for the specific customer. Thus, the old value of "customer address" will be deleted and replaced with the new value. But this will do fine for users focusing on searching / reporting!
DBMS requirements from a statistical / analytical viewpoint: historical data. Table Customers: Customer ID, Customer name, Customer address. Table Customers_history: Customer ID, Customer name, Customer address, From, To. Create a new table to contain historic records. Each time a value is UPDATED for a certain customer, the complete (previous) record is transferred to the table Customers_history.
DBMS requirements from a statistical / analytical viewpoint: historical data. This structure (Customers plus Customers_history) will make analysis of processes possible. But not easy!
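One way to transfer the previous record automatically is a database trigger. The sketch below uses SQLite through Python's sqlite3 module. The table names come from the slides; the column names, the valid_from column added to Customers (needed to fill in From) and the sample rows are assumptions. The lookup function hints at why the analysis is possible but not easy: a question about a past date has to consult both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (
    customer_id INTEGER PRIMARY KEY, name TEXT, address TEXT,
    valid_from TEXT                      -- assumed column, ISO dates
);
CREATE TABLE Customers_history (
    customer_id INTEGER, name TEXT, address TEXT,
    valid_from TEXT, valid_to TEXT       -- "From" and "To" on the slide
);
-- On every UPDATE, copy the complete previous record to the history table.
CREATE TRIGGER keep_history AFTER UPDATE ON Customers
BEGIN
    INSERT INTO Customers_history
    VALUES (OLD.customer_id, OLD.name, OLD.address,
            OLD.valid_from, NEW.valid_from);
END;
""")

conn.execute(
    "INSERT INTO Customers VALUES (1, 'Volvo Bussar AB', 'Old Street 1', '2005-01-01')"
)
conn.execute(
    "UPDATE Customers SET address = ?, valid_from = ? WHERE customer_id = ?",
    ("New Street 2", "2008-06-01", 1),
)


def address_on(conn, customer_id, date):
    """Address valid on an ISO date, looked up in history and current rows."""
    row = conn.execute(
        """
        SELECT address FROM Customers_history
        WHERE customer_id = ? AND valid_from <= ? AND ? < valid_to
        UNION ALL
        SELECT address FROM Customers
        WHERE customer_id = ? AND valid_from <= ?
        """,
        (customer_id, date, date, customer_id, date),
    ).fetchone()
    return row[0] if row else None


print(address_on(conn, 1, "2006-03-15"))  # address before the move
print(address_on(conn, 1, "2009-01-01"))  # current address
```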
DBMS requirements from a statistical / analytical viewpoint: regulation and secrecy.
DBMS requirements from a statistical / analytical viewpoint: current data versus current + historical data; operating on individual records versus operating on many records. Next on this channel…
[Diagram, repeated: the DBMS overview.] A database containing historic transactions.
PAR / Bisnode database. Tables:
FTG: basic company data. One record per company. Contains name, address, startdate, enddate, line of business, etc.
FTG_H: historic company data. Many records per company. Contains the accumulated historic records from table FTG.
BOKSLUT: balance sheet data. One record per annual report (thus many records per company). Turnover, profit, key ratios, etc.
FUNKTION_PERIOD: board member data. Many records per company and person.
And many more tables!
Serrano. Statistical analysis: how? Historic names etc. Sampling for time series statistics.
END
Serrano: balance sheet data from different periods transformed to yearly data records.
Serrano: historic transactions from FTG_H transformed to yearly data records.
Serrano: board member data from any mix of startdate, enddate and period length transformed to yearly data records.
Summing up register data to annual figures. Example: a register containing balance sheet data: number of employees, turnover, profit, tangible assets, etc. [Figure: company A on a timeline of years 3, 2, 1 and now.]
Summing up register data to annual figures. [Figure: companies A and B on the timeline.] B: split financial year (a financial year that does not follow the calendar year).
Summing up register data to annual figures. [Figure: companies A, B and C on the timeline.] C: changed financial years.
Summing up register data to annual figures. [Figure: companies A, B, C and D on the timeline.] D: missing data.
Summing up register data to annual figures. [Figure: company B on the timeline.] Proposal: break down the flow variables (turnover, profit, etc.) into monthly values …
Summing up register data to annual figures. [Figure: company B on the timeline.] Proposal: … and sum the monthly values to a "fictitious" calendar-year value. Proposal: … and impute for full coverage during the last year.
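The first two steps of the proposal can be sketched as follows. The sketch assumes that each annual report covers whole months and that a flow variable is spread evenly over those months; both are simplifying assumptions, and the function name and input layout are hypothetical. The imputation step for the last year is not shown.

```python
from collections import defaultdict


def to_calendar_years(reports):
    """Convert reported flow values to calendar-year values.

    reports: iterable of (start_year, start_month, n_months, value),
    where value is a flow variable (e.g. turnover) for the whole period.
    Each value is split evenly into monthly values, which are then
    summed per calendar year.
    """
    totals = defaultdict(float)
    for start_year, start_month, n_months, value in reports:
        monthly = value / n_months
        for i in range(n_months):
            month_index = (start_month - 1) + i  # months since January of start_year
            totals[start_year + month_index // 12] += monthly
    return dict(totals)


# A company with a split financial year, July to June (illustrative numbers).
company_b = [
    (2005, 7, 12, 1200.0),  # July 2005 to June 2006
    (2006, 7, 12, 2400.0),  # July 2006 to June 2007
]
print(to_calendar_years(company_b))
```

In this example only 2006 is fully covered. The values for 2005 and 2007 rest on six months each, which is the gap the imputation step is meant to fill for the last year.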
Summing up register data to annual figures. [Figure: company B on the timeline, together with the database.]
First example: register-based transport statistics for SIKA. Decreased response burden. Increased understanding of the transporting companies (as a complement to the "usual" focus on type of goods). Time series describing economic status and change.
Objective: describing economic status and change in transporting companies during the last ten years. Total number of employees and turnover … or turnover growth compared to GDP … or profit development for different types of freight companies … or the number of employees in a cohort of new companies.
Tables based on balance sheet data from each company. [Table columns: Year; Active companies (Total; Of which limited companies); Active limited companies (Number of employees; Net turnover, SEK million); GDP (Current prices, SEK million).]
What data is needed? Company data including micro-level history: exactly which companies were active in transport during each year? Balance sheet data from all transporting companies for each year. Faster access to "last year's" data compared to taxation-based registers.
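The question of which companies were active in a given year can be answered from start and end dates. The sketch below assumes a hypothetical companies table with ISO-formatted startdate and enddate columns, where a NULL enddate means the company is still active. A company counts as active in a year if its lifetime overlaps any part of that year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows; a NULL enddate means "still active".
conn.execute("CREATE TABLE companies (company_id TEXT, startdate TEXT, enddate TEXT)")
conn.executemany(
    "INSERT INTO companies VALUES (?, ?, ?)",
    [
        ("A", "2001-03-01", None),
        ("B", "2003-07-01", "2005-02-28"),
        ("C", "2006-01-15", None),
    ],
)


def active_in_year(conn, year):
    """Ids of companies whose lifetime overlaps the given calendar year."""
    first_day, last_day = f"{year}-01-01", f"{year}-12-31"
    rows = conn.execute(
        """
        SELECT company_id FROM companies
        WHERE startdate <= ?
          AND (enddate IS NULL OR enddate >= ?)
        ORDER BY company_id
        """,
        (last_day, first_day),
    ).fetchall()
    return [row[0] for row in rows]


for year in (2004, 2005, 2006):
    print(year, active_in_year(conn, year))
```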
Sampling companies for time series statistics. [Figure: companies A, B, C and D on a timeline of years 3, 2, 1 and now. The sets of companies shown for the years are ACD, ABCD and ABC.]