Developing Statistical Information Systems and XML Information Technologies - Possibilities and Practicable Solutions Geneva, 8-10 May 2007 Heikki Rouhuvirta, Statistical Methodology R&D
Approaches to Statistics Production: Sources to statistics – Data Processing; Sources to statistics – Statistical Methodology; Statistics as Information
IT in Statistics Production
[Process diagram. Labels: registers; inquiries; other statistical data; dirty data; compilation / combining of data; logical verifications; imputation etc.; processing into statistical concepts; quality control and approval of data for the purpose of statistics compilation; statistical data set (tilastoaineisto); datum; further processing; protection of unit-level data; analyses; reporting; release]
Methodological Processing of Statistical Data in Statistics Production
Statistical Information
Challenge: create solutions that unite the foregoing points of view. The solutions offer the services that statistics production needs; they are easily recognisable to the user and offer an adequate informative basis for each individual task; and they make the entirety of tasks manageable for the statistician. Key to the solution: exploitation of XML technology.
XML Specification for Statistical Information: Common Structure of Statistical Information (CoSSI). Basics of XML
… the result from a statistics standpoint …
Statistics Production and Statistical Information
Stages of Processing: 0. Defining; 1. Collecting; 2. Editing; 3. Producing public statistics; 4. Using
Basic format: data matrix and description (matrix model including statmeta)
Condensed format: table and description (table model including statmeta)
Descriptions in different documents (statistical metadata model)
Transitions between formats: condensing, interpreting
Model of Data Organisation: matrix module, table module, statmeta module
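The relation between the basic format (data matrix and description) and the condensed format (table and description) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the class names `StatMeta`, `Matrix` and `Table`, the function `condense`, and the reading of "condensing" as a simple frequency tabulation are hypothetical and are not taken from the CoSSI specification.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class StatMeta:
    # Hypothetical statmeta module: variable name -> description.
    descriptions: dict


@dataclass
class Matrix:
    # Hypothetical matrix module: one dict per statistical unit.
    rows: list
    meta: StatMeta


@dataclass
class Table:
    # Hypothetical table module: counts per class value of one variable.
    variable: str
    counts: dict
    meta: StatMeta


def condense(matrix, variable):
    """Condense unit-level rows into a frequency table.

    The same StatMeta object travels with the data, so the table keeps
    the descriptions that belonged to the matrix.
    """
    counts = Counter(row[variable] for row in matrix.rows)
    return Table(variable=variable, counts=dict(counts), meta=matrix.meta)


# Illustrative data only.
meta = StatMeta(descriptions={"region": "Region of residence"})
matrix = Matrix(
    rows=[{"region": "north"}, {"region": "south"}, {"region": "north"}],
    meta=meta,
)
table = condense(matrix, "region")
print(table.counts, table.meta.descriptions["region"])
```

The point of the sketch is only that the metadata is a separate module referenced by both the matrix and the table, rather than being copied into either.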
… case studies of XML in statistics production …
XML Database and Statistical Information
Retrieval of Statistical Metadata for a Variable - Simple User Interface
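As a rough illustration of retrieving statistical metadata for a variable, the following Python sketch looks up one variable in a small XML document using the standard library. The document, its element names (`statmeta`, `variable`, `conceptDefinition`, `measurementUnit`) and the function `variable_metadata` are hypothetical; they are not the CoSSI schema and not the interface shown on the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata document; element and attribute names are
# illustrative only and are NOT taken from the CoSSI specification.
SAMPLE = """
<statmeta>
  <variable id="income">
    <name>Disposable income</name>
    <conceptDefinition>Income after taxes and transfers.</conceptDefinition>
    <measurementUnit>EUR</measurementUnit>
  </variable>
  <variable id="age">
    <name>Age</name>
    <conceptDefinition>Age in full years at the end of the year.</conceptDefinition>
    <measurementUnit>years</measurementUnit>
  </variable>
</statmeta>
"""


def variable_metadata(xml_text, variable_id):
    """Return the metadata of one variable as a dict, or None if absent."""
    root = ET.fromstring(xml_text.strip())
    for var in root.iter("variable"):
        if var.get("id") == variable_id:
            # Each child element becomes one metadata field.
            return {child.tag: (child.text or "").strip() for child in var}
    return None


info = variable_metadata(SAMPLE, "income")
print(info["name"], "-", info["measurementUnit"])
```

In an XML database the same lookup would more likely be expressed as a query against stored documents; the in-memory version here only shows the shape of the request and the result.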
Turn over the Documents in XML Database
Saving Documents to XML Database
Event log of XML Database

/db/logs/contents.xml...
STORE /db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4.xml
STORE /db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4_001.gif
STORE /db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4_002.gif
STORE /db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4_002.png
STORE /db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4_eq_00.gif
UPDATE /db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu1.xml

/db
/system admin dba
/config admin dba
users.xml admin dba rwurwu---
/Tilastot admin dba
/logs admin dba
contents.xml admin dba rwurwur--
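An event log of this kind can be summarised with a few lines of Python. The sketch below assumes that each entry has the form "ACTION path", which is all the slide shows; the real log format may carry more fields. The names `parse_event` and `summarize` are hypothetical helpers, not part of any database API.

```python
from collections import Counter

# Entries as they appear on the slide; the "ACTION path" layout is an
# assumption based on that excerpt.
LOG_LINES = [
    "STORE /db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4.xml",
    "STORE /db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4_001.gif",
    "STORE /db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4_002.gif",
    "STORE /db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4_002.png",
    "STORE /db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu4_eq_00.gif",
    "UPDATE /db/Tilastot/Arbortext-koulutus/Julkaisut/Julkaisu1.xml",
]


def parse_event(line):
    """Split one log line into (action, path)."""
    action, _, path = line.strip().partition(" ")
    return action, path


def summarize(lines):
    """Count STORE and UPDATE events, ignoring anything else."""
    counts = Counter()
    for line in lines:
        action, path = parse_event(line)
        if action in {"STORE", "UPDATE"} and path:
            counts[action] += 1
    return counts


summary = summarize(LOG_LINES)
print(dict(summary))
```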
Tabulation Application Architecture in SAS
Tabulation Wizard User Interface in SAS EG
SAS Data Editing Process
Statistical data Logical schema of an XML file
Archiving and Backing Up to XML
Example of XQuery/SQL
Content of XML file
Production and Dissemination of Tables in Publishing Process
XML Publication Editor - User Interface
Retrieval of Statistical Information
… and statistical information in tables

Table 1. Statistical Metadata in an informative statistical table (I)
[Table schematic with labels: Variable 1, Variable 2, Variable 3; Class value 1, Class value 2; Statistical figure 1 to Statistical figure 8]
Statistical metadata: title, subtitle, footnote, metadata reference (quality declaration)
Document metadata elements: subject, keywords, content description, date, identifier
Statistical metadata elements: name, specification, concept definition, concept definition description, operational definition, operational definition description, calculation name, calculation formula, calculation description, measurement unit, measurement description
Statistical metadata elements: code, name, description
Document metadata elements: classification id, type, author, date
Statistical metadata elements: note
Register metadata elements: name, concept definition, formation instruction, law, interpretation of law, law cases, etc.

Table 1. Statistical Metadata in an informative statistical table (II)
[Table schematic with labels: Variable 1, Variable 2, Variable 3; Class value 1, Class value 2; Statistical figure 1 to Statistical figure 8]
Quality declaration
Quality Indicators: Coefficient of Variation, Value=0.92
Quality Indicators: Coefficient of Variation, Value=0.87

Table 1. Statistical Metadata in an informative statistical table (III)
[Table schematic with labels: Variable 1, Variable 2, Variable 3; Class value 1, Class value 2; Statistical figure 1 to Statistical figure 8]
Quality declaration
Quality Indicators: Coefficient of Variation, Value=0.92
Quality Indicators: Coefficient of Variation, Value=0.87

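The slides attach a coefficient of variation to statistical figures as a quality indicator but do not say how it was calculated. As a hedged illustration, the sketch below uses the textbook definition, the sample standard deviation divided by the mean; in survey practice the indicator is often the standard error of an estimate divided by the estimate, which may be what the slides refer to. The function name and the data are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(values) / mean


# Hypothetical observations behind one table cell (not from the slides).
observations = [2, 4, 4, 4, 5, 5, 7, 9]
cv = coefficient_of_variation(observations)
print(f"Coefficient of Variation: {cv:.2f}")
```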
Conclusions: XML-Based Service Environment in Statistics Production

The statistics production solution briefly described above indicates the kinds of services that a statistical information system could offer in future, both to statisticians and to users of statistical data. The foundation for statistics production is an XML-based information architecture and standard applications that exploit it. Basing the information architecture on XML allows the use of standard and standard-like specifications, but the special characteristics of statistical information must be taken into account when applying and implementing them. If, for instance, the possibilities of a semantic structural specification are not exploited in the structural analysis and the final structure of statistical data, the solutions become complicated from the point of view of information management and ineffective in practice. From the perspective of application development, it seems especially important that the information architecture itself contains no application-specific data specifications, because a single monolithic application serving both statistics production and information service provision is unlikely. A semantically relevant structure helps the statistician and the user of statistics to check the correctness of contents.
Thank you for your attention!