Dissemination and use of aggregate data: structures and functionality


Dissemination and use of aggregate data: structures and functionality
Andrew Westlake
Survey & Statistical Computing
ssc@count.com
www.sasc.co.uk
5/13/2019 Meta-data & Functionality

Aggregate data: structures and functionality
- What are the objectives
  - Systems to support the preparation, processing and dissemination of statistics in the form of aggregated data
  - Appropriate tool set
  - Automation of production processes
  - Dynamic access and ‘analysis’
- Developments on the Database side
  - Statistical Database proposals from Computer Science
  - Commercial development of Data Warehouses (OLAP)
- Requirements
  - Structure
  - Functionality - Manipulation, Dissemination

Processing Aggregate Data

Aggregated Results, as Multi-way Table
[Figure: three-dimensional table; the labels recovered from it are listed here]
- Dimension: Period (levels: Day, Week, Month, Year)
- Dimension: Location (levels: District, Region, Country)
- Dimension: Disease Classification (ICD) (levels: Detail, Minor Group, Major Group)
- Measures in each cell: Reports received, Population at risk, Estimated Incidence rate, SD of Incidence rate
This example has three dimensions (so that it can be visualised). In reality, for this application, we would need at least two more, Age and Gender.
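One possible in-memory representation of such a multi-way table is sketched below: each cell is keyed by one value per dimension and holds the stored measures, with the incidence rate derived inside the cell. The names (Cell, table, incidence_rate), the key order, the per-100,000 scaling and all figures are assumptions made for illustration; none of them come from the slides.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    """Stored measures for one cell of the table (names are illustrative)."""
    reports: int      # Reports received
    population: int   # Population at risk


# Key order: (period, location, disease code). All values are invented.
table: dict[tuple[str, str, str], Cell] = {
    ("2001-W01", "District A", "A00"): Cell(reports=3, population=10_000),
    ("2001-W01", "District B", "A00"): Cell(reports=1, population=20_000),
    ("2001-W02", "District A", "A00"): Cell(reports=5, population=10_000),
}


def incidence_rate(cell: Cell, per: int = 100_000) -> float:
    """Estimated incidence rate per `per` people at risk, derived within a cell."""
    return cell.reports * per / cell.population


rate = incidence_rate(table[("2001-W01", "District A", "A00")])
```

Adding Age and Gender would simply lengthen the key tuple; the measures in each cell stay the same.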

Statistical Databases
- SSDBM conferences, from early ‘80s
  - STORM model, Rafanelli & Shoshani, ‘90
  - Summarizability, Lenz & Shoshani, ‘97
- National Statistical Offices
- Research Projects, particularly Eurostat
  - Idaresa, Addsia, Rainbow, IMIM
- Concern for concepts, structure, rules, validity
- No Money

Commercial developments
- Data Warehouse
  - DB with emphasis on performance with fixed data, no transactional requirements
  - Star schema for multi-way tables, Data Cubes
  - Products from mainstream DB vendors, and specialists
- OLAP (On-Line Analytical Processing)
  - Term invented by Codd
  - Emphasis on exploration of aggregate structure, selection of sub-groups, change of focus between detail and broad groups
- Lots of Money
- Products
  - DB Vendors, e.g. Oracle Express, Pivot tables in MS Excel 2000, Informix Red Brick
  - Specialists, e.g. Beyond 20/20, Super-Star
- Standardisation proposals

Aggregation Functionality
- Store information with minimal aggregation
  - Maximum detail in classifications
  - Further aggregation (to less detail) on demand (may pre-compute for efficiency)
- Algebra for aggregating classifications and measures is basically straightforward
- Aggregation of Measures
  - Everything based on summation can be regrouped (cf. updating algorithms, sufficient statistics)
  - Some others, e.g. Range
  - Special issues for time, aggregate or cross sectional measures
- All aggregated tables are proper tables
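The point about summation-based measures can be illustrated with a small sketch. It assumes each detail cell stores the sufficient statistics n, sum of x and sum of x squared, plus the minimum and maximum for the range. The cell layout, the names detail, region_of, roll_up and mean_sd, and the figures are all hypothetical.

```python
import math

# Hypothetical detail cells: district -> (n, sum_x, sum_x2, min_x, max_x)
detail = {
    "District A": (3, 6.0, 14.0, 1.0, 3.0),   # underlying values 1, 2, 3
    "District B": (2, 9.0, 41.0, 4.0, 5.0),   # underlying values 4, 5
}
# Assumed mapping from the detailed classification to the next level up.
region_of = {"District A": "Region 1", "District B": "Region 1"}


def roll_up(cells, mapping):
    """Regroup cells to a coarser classification by adding the sums
    and combining the extremes; no access to the raw data is needed."""
    out = {}
    for key, (n, s, s2, lo, hi) in cells.items():
        parent = mapping[key]
        if parent in out:
            pn, ps, ps2, plo, phi = out[parent]
            out[parent] = (pn + n, ps + s, ps2 + s2, min(plo, lo), max(phi, hi))
        else:
            out[parent] = (n, s, s2, lo, hi)
    return out


def mean_sd(n, s, s2):
    """Mean and sample standard deviation recovered from the sums alone."""
    mean = s / n
    variance = (s2 - n * mean * mean) / (n - 1)
    return mean, math.sqrt(variance)


region = roll_up(detail, region_of)
n, s, s2, lo, hi = region["Region 1"]
mean, sd = mean_sd(n, s, s2)
value_range = hi - lo
```

The mean and SD themselves are not stored and cannot be added across cells; they are derived after regrouping, which is why the sums are kept at the finest level.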

Manipulation Functionality - for Processing
- Manipulation of Measures
  - Introduce measures from other tables with similar structure
  - Derive measures within cells
  - Not all combinations are meaningful
- Combination of two tables
  - Find common dimensions and classifications (may require some aggregation or mapping)
  - Choose one table as the detail table
  - Aggregate all non-common dimensions out of the 2nd table
  - Transfer measures from 2nd table, repeating values over missing classifications
- Meta-data to control validity of operations
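The steps for combining two tables can be sketched as follows, assuming a detail table of reports by (district, disease) and a second table of population by (district, age group), so that district is the only common dimension. Table names, keys and figures are invented for illustration.

```python
# Detail table: reports received, keyed by (district, disease code).
reports = {
    ("District A", "A00"): 3,
    ("District A", "B05"): 7,
    ("District B", "A00"): 1,
}
# Second table: population at risk, keyed by (district, age group).
population = {
    ("District A", "0-14"): 4_000,
    ("District A", "15+"): 6_000,
    ("District B", "0-14"): 5_000,
    ("District B", "15+"): 15_000,
}

# Aggregate the non-common dimension (age group) out of the second table.
pop_by_district = {}
for (district, _age_group), count in population.items():
    pop_by_district[district] = pop_by_district.get(district, 0) + count

# Transfer the measure into the detail table, repeating each district's
# value over the classification the second table lacks (disease).
combined = {
    (district, disease): {"reports": n, "population": pop_by_district[district]}
    for (district, disease), n in reports.items()
}
```

After the transfer, population is repeated across diseases and must not be summed over that dimension; this is the kind of restriction that meta-data about each measure would have to record.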

Rules for proper table structure
- Well-defined base population from which measures are computed
  - May include a selection rule w.r.t. a wider population
- Classification
  - Categories must be exclusive and exhaustive w.r.t. the base population
  - Cannot have its own selection rule (but might have a residual category)
- Measure
  - May have a selection rule (e.g. count with a property)
- Care is sometimes needed to distinguish between classifications and measures
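The exclusive-and-exhaustive rule lends itself to a mechanical check. The sketch below assumes a classification is given as a set of named membership tests over the units of the base population; check_classification, the age bands and the sample ages are hypothetical.

```python
def check_classification(population, categories):
    """Return (unclassified, overlapping) units.

    A proper classification leaves both lists empty: every unit of the
    base population falls into exactly one category.
    """
    unclassified, overlapping = [], []
    for unit in population:
        hits = [name for name, belongs in categories.items() if belongs(unit)]
        if not hits:
            unclassified.append(unit)      # not exhaustive
        elif len(hits) > 1:
            overlapping.append(unit)       # not exclusive
    return unclassified, overlapping


ages = [3, 17, 18, 40, 65, 90]  # invented base population

# Improper: age 18 matches two categories, ages 65 and over match none.
bad = {
    "child": lambda a: a <= 18,
    "adult": lambda a: 18 <= a < 65,
}
# Proper: boundaries do not overlap and a residual category catches the rest.
good = {
    "child": lambda a: a < 18,
    "adult": lambda a: 18 <= a < 65,
    "other": lambda a: a >= 65,
}

bad_result = check_classification(ages, bad)
good_result = check_classification(ages, good)
```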

Confusion between classification and measure
- Wrong
  - Subject classification is not exclusive if students can register for more than one course
- Correct
  - Counts selected by subject are different measures
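A small worked example, with invented registrations, shows why the first reading is wrong: if subject is treated as a classification, the cell counts add up to more than the base population.

```python
# Hypothetical registrations: student -> set of subjects taken.
students = {
    "s1": {"Maths"},
    "s2": {"Maths", "Physics"},
    "s3": {"Physics"},
    "s4": {"History"},
}
subjects = ["History", "Maths", "Physics"]
base_population = len(students)

# Wrong: subject used as a classification of students. The categories
# overlap, so the cells do not partition the base population.
as_classification = {
    subject: sum(subject in taken for taken in students.values())
    for subject in subjects
}
cells_total = sum(as_classification.values())

# Correct: one table cell for the whole base population, carrying several
# measures, each a count with its own selection rule.
measures = {"students": base_population}
for subject in subjects:
    measures[f"registered_{subject.lower()}"] = as_classification[subject]
```

The numbers are the same in both readings; what changes is that the measures are not expected to sum to the total, so no invalid marginal is ever computed.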

Presentation Functionality
- Layout
  - Mapping from dimensions to Rows, Columns, Pages
  - Improper table combinations
    - Combination of dissimilar dimensions, e.g. Age groups by (SEG + Housing)
  - Distinction between Classification and Measure is less important for presentation
- Medium
  - Paper, Web, often with analysis (commentary)
  - Machine readable (take away, not linked)
  - Dynamic, for local or remote manipulation
- Associated material
  - Generation of descriptions, footnotes, indexes, content lists
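The mapping from dimensions to rows, columns and pages might look like the sketch below. It assumes a table held as a dictionary keyed by one value per dimension, with a single measure per cell, and a page specification that fixes every dimension not used for rows or columns. The function layout and all data are hypothetical.

```python
dims = ("year", "region", "disease")
cells = {
    ("2001", "North", "A00"): 3,
    ("2001", "South", "A00"): 1,
    ("2002", "North", "A00"): 5,
    ("2002", "South", "A00"): 2,
    ("2001", "North", "B05"): 9,
}


def layout(cells, dims, rows, cols, page):
    """Arrange one page of the table as a grid of rows by columns.

    `page` maps each remaining dimension to the single value shown.
    """
    r, c = dims.index(rows), dims.index(cols)
    selected = {
        key: value
        for key, value in cells.items()
        if all(key[dims.index(d)] == v for d, v in page.items())
    }
    row_labels = sorted({key[r] for key in selected})
    col_labels = sorted({key[c] for key in selected})
    by_rc = {(key[r], key[c]): value for key, value in selected.items()}
    # Cells absent from the data are shown as None.
    grid = [[by_rc.get((rl, cl)) for cl in col_labels] for rl in row_labels]
    return row_labels, col_labels, grid


row_labels, col_labels, grid = layout(
    cells, dims, rows="region", cols="year", page={"disease": "A00"}
)
```

Swapping the rows and cols arguments transposes the presentation without touching the stored table, which is the sense in which layout is separate from structure.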

Manipulation Functionality - for Exploration
- Dynamic viewing, linked to source aggregations
- Selection
  - Subset of classification cells, and of measures
- Dynamic regrouping
  - Roll up to combine existing groups to next level
  - Drill down to get more detail in groups at lower level
  - Operate independently, i.e. not all parts of a classification at the same level
  - User-defined groupings
- All derivation and presentation facilities
- Specialist browsers, available for local data or over the Internet
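Independent roll up and drill down can be sketched by treating the current view as a list of nodes from a classification hierarchy, where different branches may sit at different levels. The hierarchy, the counts and the function names below are assumptions for illustration, and roll_up assumes all children of the chosen parent are currently in the view.

```python
# Hypothetical hierarchy and leaf-level counts.
children = {
    "Country": ["North", "South"],
    "North": ["N1", "N2"],
    "South": ["S1", "S2"],
}
leaf_counts = {"N1": 3, "N2": 4, "S1": 1, "S2": 2}


def total(node):
    """Count for any node, summed on demand from the stored detail."""
    if node in leaf_counts:
        return leaf_counts[node]
    return sum(total(child) for child in children[node])


def drill_down(view, node):
    """Replace one node in the view by its children."""
    i = view.index(node)
    return view[:i] + children[node] + view[i + 1:]


def roll_up(view, parent):
    """Replace the children of `parent` in the view by `parent` itself."""
    kids = children[parent]
    i = min(view.index(kid) for kid in kids)
    rest = [node for node in view if node not in kids]
    return rest[:i] + [parent] + rest[i:]


view = ["Country"]
view = drill_down(view, "Country")   # both regions
mixed = drill_down(view, "North")    # North in detail, South still rolled up
mixed_table = {node: total(node) for node in mixed}
back = roll_up(mixed, "North")
```

Because every displayed value is computed from the leaves, the view stays linked to the source aggregations however the user regroups it.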

Discovery through Meta-data
- Generic descriptions
  - Population, Classifications, Measures linked to concept definitions for searching
- Specific topics
  - Formal definitions of standard components
    - selection rules, standard classifications, measure types
  - Specific descriptions of substantive content
    - source variable definitions, questionnaire structure, etc.
- Accessibility
  - Information must be available to search engines and users

Meta-data & Functionality Conclusions
- Good analysis of structural and functionality requirements can produce good products for automated and individual use
- Further academic work on structures and functionality needed
- Commercial products are useful but lack many obvious features - we should demand more
- Commercially driven standards concentrate on basic functionality and overlook statistical and practical validity - we should get more involved