Generic Statistical Information Model (GSIM) Jenny Linnerud
This webinar on GSIM (Generic Statistical Information Model) is part of a series of lectures on the main projects undertaken by the High Level Group for the Modernization of Official Statistics (HLG-MOS).
Vision of the High Level Group
Why do we need to modernise? We have: rigid processes and methods; inflexible ageing technology; increased cost of traditional data collection methods; slow response to emerging information needs; slow adoption of new and alternative sources of data (such as sensor and satellite data); difficulty in attracting and retaining skilled staff in the competitive labour market. In an increasingly digital and data-rich environment, statistical organizations are struggling to remain timely and relevant.
What is GSIM? It is a strategic approach and a new way of thinking, designed to bring together statisticians, methodologists and IT specialists to modernize and streamline the production of official statistics. It is a reference framework of internationally agreed definitions, attributes and relationships that describe the pieces of information used in the production of official statistics (information objects). This framework enables generic descriptions of the definition, management and use of data and metadata throughout the statistical production process.
What is the relationship between GSIM and GSBPM? GSIM and GSBPM are complementary models for the production and management of statistical information. GSBPM models the statistical production process and identifies the activities undertaken by producers of official statistics that result in information outputs. GSIM helps describe GSBPM sub-processes by defining the information objects that are used by them, that flow between them, and that are created in them in order to produce official statistics.
What is an information object? GSIM is a model of objects that specify information about the real world (“information objects”). Examples include data and metadata (such as classifications), as well as rules and parameters needed for production processes to run (e.g. data editing rules). GSIM identifies about 110 information objects, which are grouped into four broad categories:
Business: Statistical Support Program; Business Process; Process Step; Statistical Program; Statistical Need
Exchange: Product; Exchange Channel; Administrative Register; Web Scraper Channel; Questionnaire
Concepts: Variable; Concept; Statistical Classification; Unit; Population
Structures: Data Set; Data Structure; Information Resource; Referential Metadata Set; Referential Metadata Structure
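As an illustration only, the idea of grouped information objects that are used by, created in and flow between sub-processes could be sketched in Python as follows. The class names, the `flows_between` helper and the two example sub-processes are assumptions made for this sketch; they are not part of GSIM or GSBPM, and the group assigned to each object is our reading of the four categories.

```python
from dataclasses import dataclass, field
from enum import Enum


class Group(Enum):
    """The four broad GSIM categories named on the slide."""
    BUSINESS = "Business"
    EXCHANGE = "Exchange"
    CONCEPTS = "Concepts"
    STRUCTURES = "Structures"


@dataclass(frozen=True)
class InformationObject:
    """Hypothetical stand-in for a GSIM information object."""
    name: str
    group: Group


@dataclass
class SubProcess:
    """Hypothetical stand-in for a GSBPM sub-process."""
    name: str
    inputs: list[InformationObject] = field(default_factory=list)
    outputs: list[InformationObject] = field(default_factory=list)


# A few objects from the slide; the group assignment is an assumption.
QUESTIONNAIRE = InformationObject("Questionnaire", Group.EXCHANGE)
CLASSIFICATION = InformationObject("Statistical Classification", Group.CONCEPTS)
DATA_SET = InformationObject("Data Set", Group.STRUCTURES)

# Illustrative sub-processes: what each one uses and what it creates.
collect = SubProcess("Run collection", inputs=[QUESTIONNAIRE], outputs=[DATA_SET])
classify = SubProcess(
    "Classify and code", inputs=[DATA_SET, CLASSIFICATION], outputs=[DATA_SET]
)


def flows_between(producer: SubProcess, consumer: SubProcess) -> list[InformationObject]:
    """Return the objects created by `producer` that `consumer` uses."""
    return [obj for obj in producer.outputs if obj in consumer.inputs]


flowing = flows_between(collect, classify)
print([obj.name for obj in flowing])
```

Describing each sub-process purely by the information objects it consumes and produces is what makes the description independent of subject matter area.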
GSIM Development 2012: GSIM sprint in Slovenia, February; GSIM sprint in Republic of Korea, March; integration workshop in the Netherlands, November; GSIM v1.0, December.
Developing the GSIM: 17 different organisations
What are the benefits of using GSIM? GSIM enables statistical organizations to rethink how their business could be more efficiently organized, by defining information objects common to all statistical production, regardless of the subject matter area; improves communication between the different disciplines involved in statistical production, within and between statistical organizations, and between users and producers of official statistics; generates economies of scale, as reuse of information can improve comparability of statistics; enables greater automation of the statistical production process; validates existing information systems. In Statistics Norway we are also using GSIM to communicate with other government agencies and with IT consultants.
Statistics Norway’s participation in GSIM Implementation: GSIM v1.0 Brochure and Communication document in Norwegian; informal task force on metadata flows in the GSBPM (ca. 20 GSIM information objects were mapped to the phases in GSBPM v4); informal task force GSIM v1.0 discussion forum; GSIM Statistical Classification Model -> GSIM v1.1, December 2013; trying out GSIM v1.1 within the RAIRD project.
GSIM implementation: countries provided GSIM case studies (Canada, Finland, France, Germany, Italy, New Zealand, Norway, Sweden). GSIM Statistical Classifications is the part of the model that statistical organisations have implemented most.
GSIM in Statistics Norway - Vision. GSIM should lead to: a foundation for standardised statistical metadata use throughout systems; a standardised framework for consistent and coherent design of statistical production; increased sharing of system components.
Remote Access Infrastructure for Register Data (RAIRD) Statistics Norway and the Norwegian Social Science Data Services (NSD) aim to establish a national research infrastructure providing easy access to large amounts of rich high-quality statistical data for scientific research, while at the same time managing statistical confidentiality and protecting the integrity of the data subjects. The work is organized as a project, RAIRD – Remote Access Infrastructure for Register Data, and funded by the Research Council of Norway. See:
RAIRD Information Model (RIM): RIM is an implementation of the Generic Statistical Information Model (GSIM) v1.1. We have based RIM on the GSIM Design Principles. RIM extends GSIM with 27 information objects that are mainly specialisations, e.g. to include different types of agents (producers, administrators and researchers). RAIRD is a project that is still in progress, with completion planned in 2017.
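Extending a model through specialisation can be pictured as subclassing. The sketch below is a loose illustration, not the RIM definition: the `Agent` base class, its `name` attribute, the `institution` field and the `describe` helper are all hypothetical; only the three agent types are taken from the slide.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical generic object that the specialisations build on."""
    name: str


@dataclass
class Producer(Agent):
    """Specialisation for an agent that produces data."""


@dataclass
class Administrator(Agent):
    """Specialisation for an agent that administers the infrastructure."""


@dataclass
class Researcher(Agent):
    """Specialisation for an agent that analyses data."""
    institution: str = ""  # hypothetical extra attribute


def describe(agent: Agent) -> str:
    """Works for the generic object and for every specialisation of it."""
    return f"{type(agent).__name__}: {agent.name}"


print(describe(Researcher("Kari", institution="Example University")))
```

Because each specialisation is still an `Agent`, anything written against the generic model keeps working with the extended one, which is the usual reason for extending by specialisation rather than by redefining objects.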
Potential Benefits of RAIRD: simplify the approval process; provide quicker access to analysis results; more Masters students will use our data; simplify large, complicated studies by providing exploratory analysis in an early phase; more research and use of our data.
Contents in 2017: demography; education; income; labour market; social security and benefits. Better transfer of knowledge within Statistics Norway.
Overview of the main components (diagram labels): Load API; SSB Data Mgt. System; Event History Data Store; Data Catalogue; Virtual Statistical Machine; Disclosure Control System; Analysis Data Set; User Operations; Provisional Output; Final Output; User Views; Event History; Input Data Set; Input Metadata Set; Browser; Browse Data Catalogue; VIRTUAL RESEARCH ENVIRONMENT.
Metadata: the researcher cannot see the data -> simplifies the approval process. Metadata is the interface to the data.
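The principle that metadata is the interface to the data can be sketched as an object that exposes variable descriptions and aggregate results but never returns individual records. This is a toy illustration under our own assumptions, not how RAIRD is implemented: the `ProtectedDataSet` class and its methods are hypothetical, a leading underscore in Python is only a naming convention, and real protection would also need disclosure control on the outputs.

```python
from statistics import mean


class ProtectedDataSet:
    """Hypothetical data set whose records are never handed to the caller."""

    def __init__(self, variables: dict[str, str], rows: list[dict[str, float]]):
        self._rows = rows                   # microdata, kept internal
        self._variables = dict(variables)   # variable name -> description

    def describe(self) -> dict[str, str]:
        """Return the metadata only: what variables exist and what they mean."""
        return dict(self._variables)

    def mean(self, variable: str) -> float:
        """Return an aggregate for a variable that is listed in the metadata."""
        if variable not in self._variables:
            raise KeyError(f"unknown variable: {variable}")
        return mean(row[variable] for row in self._rows)


# Made-up example values, for illustration only.
data_set = ProtectedDataSet(
    {"income": "Annual income"},
    [{"income": 100.0}, {"income": 300.0}],
)
print(data_set.describe())
print(data_set.mean("income"))
```

The researcher chooses variables from `describe()` and receives aggregates, so every request is phrased in terms of metadata rather than records.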
Analyse data
Statistical Confidentiality in RAIRD. Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality, 5-7 October 2015, Statistics Finland. Topic (v): Practicum: Case Studies and Software.
How do I find out more? UNECE - GSIM Wiki
Questions? Thank you to Peter Frayne for contributing questions in advance.
The End