Statistical file systems and archive statistics – Experiences from Statistics Sweden
Contribution to the Nordbotten Seminar at the Nordic Statistical Meeting in Copenhagen 2010
Bo Sundgren
"Special thanks to Statistics Denmark for having sponsored my participation in this meeting"

Archive-statistical principles (by and large not technology-dependent)
Reuse existing raw data from administrative and statistical sources for statistical purposes
Continuous inflow of data (more or less)
Organise data in a systematic way: statistical file system, databases, data warehouse
Ad hoc production of statistics
Systematic descriptions and definitions of data:
  – data and table definition languages; Nordbotten (1967)
  – metadata; Sundgren (1973)
Standardised definitions and identifiers enabling flexible integration and combination of data: registers, classifications, standard variables
Generalised software

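The principles above can be illustrated with a minimal sketch, assuming two small register extracts that share a standardised identifier. All names, identifiers, and values below are hypothetical and only show the idea of integrating data by a common key, documenting it with metadata, and producing a table ad hoc; they do not describe any actual Statistics Sweden system.

```python
from collections import Counter

# Hypothetical register extracts, keyed by a standardised person identifier.
population_register = {
    "P001": {"region": "North", "age_group": "20-39"},
    "P002": {"region": "North", "age_group": "40-64"},
    "P003": {"region": "South", "age_group": "20-39"},
}
employment_register = {
    "P001": {"employed": True},
    "P002": {"employed": False},
    "P003": {"employed": True},
}

# Hypothetical metadata: systematic descriptions kept alongside the data.
metadata = {
    "region": "Region of residence (standard regional classification)",
    "age_group": "Age group (standard age classification)",
    "employed": "Employment status from an administrative source",
}


def integrate(*registers):
    """Combine records that share the same standard identifier."""
    combined = {}
    for register in registers:
        for ident, record in register.items():
            combined.setdefault(ident, {}).update(record)
    return combined


def tabulate(records, row_var, col_var):
    """Count records for each (row value, column value) combination."""
    return Counter(
        (rec[row_var], rec[col_var])
        for rec in records.values()
        if row_var in rec and col_var in rec
    )


microdata = integrate(population_register, employment_register)
table = tabulate(microdata, "region", "employed")

for (region, employed), count in sorted(table.items()):
    print(f"{region:6} employed={employed!s:5} {count}")
```

Because both extracts use the same identifier and the variables are described once in the metadata, a new table only requires a new call to `tabulate`, not a new data collection.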
Benefits
Much lower costs for collection of raw data: up to 99% cost reduction (Statistics Netherlands)
Reduced response burden
Faster data collection
Faster and more flexible response to new demands
More coherent data and statistics: statistical systems
Potential for better quality, but some new quality problems have to be tackled, or maybe rather the same old problems in new shapes

The 1960s
Seminal papers by Svein Nordbotten from 1960 (Helsinki) onwards (1963, 1965, 1966, 1967a,b,c, 1968)
Enthusiastic interest from Ingvar Ohlsson and Lennart Fastbom, Director General and Deputy Director General of Statistics Sweden
I was recruited in 1968 and started to work with Christer Arvas, the new head of a new unit created for this development
Svein Nordbotten and Börje Langefors, founder of information systems (informatics) as an academic discipline in Sweden

Seminal Nordbotten papers
Nordbotten, S. (1960). Elektronmaskinene og statistikkens utforming i årene framover. De Nordiske Statistikermøter i Helsingfors 1960, Helsinki 1961. Available for free download.
Nordbotten, S. (1963). Automatic Editing of Individual Statistical Observations. Statistical Standards and Studies, No. 3, United Nations. Available for free download.
Nordbotten, S. (1965). The Efficiency of Automatic Detection and Correction of Errors in Individual Observations as Compared with other Means of Improving the Quality of Statistics. Proceedings from the 35th Session of the International Statistical Institute, Beograd. Available for free download.
Nordbotten, S. (1966). A Statistical File System. Statistisk Tidskrift, Stockholm. Available for free download.
Nordbotten, S. (1967a). On Statistical File System II. Statistisk Tidskrift, Stockholm. Available for free download.
Nordbotten, S. (1967b). Automatic Files in Statistical Systems. Statistical Standards and Studies, Handbook No. 9, United Nations, N.Y. Available for free download.
Nordbotten, S. (1967c). Purposes, Problems and Ideas Related to Statistical File Systems. Invited paper, Proceedings from the 36th Session of the International Statistical Institute, Sydney. Available for free download.
Nordbotten, S. (1968). Konfidensiell behandling av data, informasjonsnytte og klassifisering av data. Statistisk Tidskrift, Nr. 5, Stockholm. In Norwegian.

Nordbotten "data space" and Langefors "e-message"
Cf. also the relational data model: time missing
[Figure: data space; e-message]

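The contrast with a relation that has no time attribute can be sketched as follows. This is an illustration only, assuming the usual reading of a Langefors e-message as a triple of object, property, and time; the class, field names, and data are hypothetical and are not taken from the slide.

```python
from typing import NamedTuple, Tuple


class EMessage(NamedTuple):
    """Hypothetical representation of an elementary message."""
    obj: str                 # the object the message is about
    prop: Tuple[str, str]    # the property stated: (attribute, value)
    time: str                # the time the statement refers to


messages = [
    EMessage("P001", ("region", "North"), "2009"),
    EMessage("P001", ("region", "South"), "2010"),
]


def as_timeless_rows(msgs):
    """Store facts by (object, attribute) only, as a relation without a
    time attribute would; a later message overwrites an earlier one."""
    rows = {}
    for m in msgs:
        attribute, value = m.prop
        rows[(m.obj, attribute)] = value
    return rows


def state_at(msgs, obj, time):
    """Return the properties stated for one object at one point in time."""
    return [m.prop for m in msgs if m.obj == obj and m.time == time]


print(as_timeless_rows(messages))
print(state_at(messages, "P001", "2009"))
```

With the time component kept, both the 2009 and the 2010 statements remain retrievable; once time is dropped, only the latest value survives.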
ARKSY development projects
TAB68, a non-procedural language for easy, fast, and flexible production of statistical tables
VARKAT, a metadata system for documentation of variables, classifications, and populations
ARKDABA, a microdatabase prototype
RSDB and TSDB, multidimensional macrodatabases based on the αβγτ-model and the metadata-driven software AXIS
Ongoing development of statistical (base) registers
Planning a reorganisation of Statistics Sweden based on archive-statistical principles (including a data warehouse) and input-throughput-output

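What "non-procedural" means for table production can be sketched as below: the user states what the table should contain, and generalised software decides how to compute it. This is not TAB68 syntax, which is not shown in this presentation; the specification format, the function name, and the data are all hypothetical.

```python
from collections import defaultdict

# Hypothetical microdata records.
records = [
    {"region": "North", "sex": "F", "income": 300},
    {"region": "North", "sex": "M", "income": 280},
    {"region": "South", "sex": "F", "income": 310},
    {"region": "South", "sex": "F", "income": 290},
]

# Declarative table request: what the table should contain, not how to compute it.
table_spec = {
    "rows": "region",
    "columns": "sex",
    "measure": "income",
    "statistic": "mean",  # one of "count", "sum", "mean"
}


def produce_table(data, spec):
    """Generic table producer driven entirely by the specification."""
    cells = defaultdict(list)
    for rec in data:
        key = (rec[spec["rows"]], rec[spec["columns"]])
        cells[key].append(rec[spec["measure"]])
    statistics = {
        "count": len,
        "sum": sum,
        "mean": lambda values: sum(values) / len(values),
    }
    compute = statistics[spec["statistic"]]
    return {cell: compute(values) for cell, values in cells.items()}


table = produce_table(records, table_spec)
for cell in sorted(table):
    print(cell, table[cell])
```

Changing the table means editing `table_spec` only; the procedure in `produce_table` stays the same, which is the point of generalised, specification-driven software.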
Major obstacles
The privacy debate provoked by FoB70: development of microdatabases was stopped, resources redirected towards protection of statistical confidentiality
Internal resistance against documentation: protection of the information monopoly of survey owners
Internal resistance against the proposed new organisation based on a centralised data warehouse, separation of input from output, and dismantling of the traditional stovepipes

Reorientation after 1974
Leaving the subject matter organisation as it was
Merging the programming centre and the database centre into a new systems department
Standardising data structures (flat files and relational databases)
Maximum use of generalised software, including AXIS and the TAB68 software family, interfacing standardised data
A model for systems development based on archive-statistical and infological principles
Metadata-driven systems: AXIS, the CONDUCTOR SCBDOK (1991)
Steadily growing use of administrative data (97-99%)
Introduction of microcomputers (1980s) and the Internet (1990s)

2007: A new attempt to reorganise
2006: Kjell Jansson new Director General
Focus on customers, processes, and architecture
The Lotta project
A new process department responsible for standardising the processes
A standardised architecture based on SOA
Customers, process owners, architects
Proposed outsourcing of most IT people (IT operations had already been successfully outsourced in the early 1990s)
2008: Kjell Jansson leaves Statistics Sweden

Some possible future developments with Svein Nordbotten
New data sources: the Internet
Participative design of statistical systems: tackling the problem of reconciling conflicting interests "within" and between stakeholders in production of official statistics

References
Most papers referred to in my paper, even the oldest ones, are available for free downloading from:
  –
  –