Dissemination and use of aggregate data: structures and functionality


1 Dissemination and use of aggregate data: structures and functionality
Andrew Westlake Survey & Statistical Computing 5/13/2019 Meta-data & Functionality

2 Aggregate data: structures and functionality
What are the objectives?
  Systems to support the preparation, processing and dissemination of statistics in the form of aggregated data
  Appropriate tool set
  Automation of production processes
  Dynamic access and ‘analysis’
Developments on the Database side
  Statistical Database proposals from Computer Science
  Commercial development of Data Warehouses (OLAP)
Requirements
  Structure
  Functionality - Manipulation, Dissemination

3 Processing Aggregate Data

4 Aggregated Results, as Multi-way Table
Dimensions
  Period: Year, Week, Month, Day
  Location: District, Region, Country
  Disease Classification (ICD): Detail, Minor Group, Major Group
Measures
  Reports received
  Population at risk
  Estimated Incidence rate
  SD of Incidence rate
This example has three dimensions (so that it can be visualised). In reality, for this application, we would need at least two more, Age and Gender.
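As a rough illustration of the multi-way table described on this slide, the sketch below stores each cell under a (period, location, disease) key and derives an incidence rate within each cell. The cell values, the key labels and the function name `incidence_rate` are all invented for the example; the slides do not prescribe any particular storage format.

```python
# Hypothetical cells of a three-way table: (period, location, disease) -> measures.
# All values are made up for illustration.
cells = {
    ("2019-W01", "District A", "ICD A00"): {"reports": 3, "population": 1000},
    ("2019-W01", "District B", "ICD A00"): {"reports": 1, "population": 2000},
    ("2019-W02", "District A", "ICD A00"): {"reports": 2, "population": 1000},
}


def incidence_rate(cell):
    """Derived measure within a cell: reports per 1000 population at risk."""
    return 1000 * cell["reports"] / cell["population"]


# One derived value per cell, keyed exactly like the source table.
rates = {key: incidence_rate(cell) for key, cell in cells.items()}
```

Adding Age and Gender would simply lengthen the key tuple; the measures attached to each cell stay the same.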

5 Statistical Databases
SSDBM conferences, from early ‘80s
  STORM model, Rafanelli & Shoshani, ‘90
  Summarizability, Lenz & Shoshani, ‘97
National Statistical Offices
Research Projects, particularly Eurostat
  Idaresa, Addsia, Rainbow, IMIM
Concern for concepts, structure, rules, validity
No Money

6 Commercial developments
Data Warehouse
  DB with emphasis on performance with fixed data, no transactional requirements
  Star schema for multi-way tables, Data Cubes
  Products from mainstream DB vendors, and specialists
OLAP (On-Line Analytical Processing)
  Term invented by Codd
  Emphasis on exploration of aggregate structure, selection of sub-groups, change of focus between detail and broad groups
Lots of Money
Products
  DB Vendors, e.g. Oracle Express, Pivot tables in MS Excel 2000, Informix Red Brick
  Specialists, e.g. Beyond 20/20, Super-Star
Standardisation proposals

7 Aggregation Functionality
Store information with minimal aggregation
  Maximum detail in classifications
  Further aggregation (to less detail) on demand (may pre-compute for efficiency)
Algebra for aggregating classifications and measures is basically straightforward
Aggregation of Measures
  Everything based on summation can be regrouped (cf. updating algorithms, sufficient statistics)
  Some others, e.g. Range
  Special issues for time, aggregate or cross-sectional measures
All aggregated tables are proper tables
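The point about summation-based measures can be sketched as follows: if each cell keeps the count, sum and sum of squares (sufficient statistics) plus the minimum and maximum, then cells can be merged into coarser groups and the mean, variance and range recomputed without returning to the unit records. The class name `Stats` and its fields are assumptions made for this example only.

```python
import math
from dataclasses import dataclass


@dataclass
class Stats:
    """Per-cell sufficient statistics; every field can be regrouped by merging."""
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0
    lo: float = math.inf
    hi: float = -math.inf

    @classmethod
    def of(cls, values):
        """Build the statistics for one cell from its unit values."""
        stats = cls()
        for v in values:
            stats = stats.merge(cls(1, v, v * v, v, v))
        return stats

    def merge(self, other):
        """Aggregate two cells: sums add, extremes combine via min and max."""
        return Stats(
            self.n + other.n,
            self.total + other.total,
            self.total_sq + other.total_sq,
            min(self.lo, other.lo),
            max(self.hi, other.hi),
        )

    @property
    def mean(self):
        return self.total / self.n

    @property
    def variance(self):
        # Population variance from the sums; simple, but it can lose
        # precision when the mean is large relative to the spread.
        return self.total_sq / self.n - self.mean ** 2

    @property
    def value_range(self):
        return self.hi - self.lo


detailed_a = Stats.of([1.0, 2.0, 3.0])
detailed_b = Stats.of([4.0, 5.0])
merged = detailed_a.merge(detailed_b)
```

Here `merged` equals the statistics computed directly from all five values, which is what makes aggregation on demand possible. A median, by contrast, could not be regrouped this way.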

8 Manipulation Functionality - for Processing
Manipulation of Measures
  Introduce measures from other tables with similar structure
  Derive measures within cells
  Not all combinations are meaningful
Combination of two tables
  Find common dimensions and classifications (may require some aggregation or mapping)
  Choose one table as the detail table
  Aggregate all non-common dimensions out of the 2nd table
  Transfer measures from 2nd table, repeating values over missing classifications
Meta-data to control validity of operations
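The steps for combining two tables might look like the sketch below. It assumes tables are dictionaries keyed by tuples of category labels, that both tables already share the same classification on their common dimensions, and that the second table's measures are additive. The function names `aggregate` and `combine` and the sample data are hypothetical.

```python
from collections import defaultdict


def aggregate(table, dims, keep):
    """Sum every measure over the dimensions that are not listed in keep."""
    idx = [dims.index(d) for d in keep]
    out = defaultdict(lambda: defaultdict(float))
    for key, measures in table.items():
        reduced_key = tuple(key[i] for i in idx)
        for name, value in measures.items():
            out[reduced_key][name] += value
    return {k: dict(v) for k, v in out.items()}


def combine(detail, detail_dims, second, second_dims):
    """Transfer the second table's measures into the detail table."""
    # Step 1: find the common dimensions.
    common = [d for d in detail_dims if d in second_dims]
    # Step 3: aggregate the non-common dimensions out of the second table.
    reduced = aggregate(second, second_dims, common)
    # Step 4: copy measures across, repeating them over the detail table's
    # extra dimensions.
    idx = [detail_dims.index(d) for d in common]
    result = {}
    for key, measures in detail.items():
        reduced_key = tuple(key[i] for i in idx)
        result[key] = {**measures, **reduced.get(reduced_key, {})}
    return result


# Step 2: the reports table is chosen as the detail table.
reports = {
    ("A", "flu"): {"reports": 3},
    ("A", "measles"): {"reports": 1},
    ("B", "flu"): {"reports": 2},
}
population = {
    ("A", "F"): {"population": 500},
    ("A", "M"): {"population": 450},
    ("B", "F"): {"population": 300},
    ("B", "M"): {"population": 310},
}
combined = combine(reports, ["district", "disease"],
                   population, ["district", "gender"])
```

The population for district A is summed over gender and then repeated for each disease. Whether that repetition is statistically meaningful is exactly what the slide's meta-data would have to control; the sketch performs no such check.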

9 Rules for proper table structure
Well-defined base population from which measures are computed
  May include a selection rule w.r.t. a wider population
Classification
  Categories must be exclusive and exhaustive w.r.t. the base population
  Cannot have its own selection rule (but might have a residual category)
Measure
  May have a selection rule (e.g. count with a property)
Care is sometimes needed to distinguish between classifications and measures
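The exclusive-and-exhaustive rule for classifications can be checked mechanically. The sketch below assumes the base population is a set of unit identifiers and each category is a set of members; the function name `check_classification` and the data are invented for the example.

```python
def check_classification(population, categories):
    """Return (overlaps, unclassified) for a candidate classification.

    overlaps: units that appear in more than one category (not exclusive).
    unclassified: units of the base population in no category (not exhaustive).
    """
    seen = set()
    overlaps = set()
    for members in categories.values():
        overlaps |= seen & members
        seen |= members
    unclassified = set(population) - seen
    return overlaps, unclassified


population = {1, 2, 3, 4}
age_groups = {"young": {1, 2}, "old": {3}}
overlaps, unclassified = check_classification(population, age_groups)

# A residual category makes the classification exhaustive.
with_residual = {**age_groups, "not stated": {4}}
fixed = check_classification(population, with_residual)
```

A classification is proper only when both returned sets are empty, as for `with_residual`.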

10 Confusion between classification and measure
Wrong
  Subject classification is not exclusive if students can register for more than one course
Correct
  Counts selected by subject are different measures
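A small sketch of the student example, with made-up registrations: treating subject as a classification would imply that the subject counts add up to the number of students, which fails as soon as someone registers twice. Treating each subject count as its own measure (a count with a selection rule) avoids that implication.

```python
# Hypothetical registrations: student -> set of subjects.
registrations = {
    "ann": {"maths", "physics"},
    "bob": {"maths"},
    "cy": {"history"},
}

base_population = len(registrations)  # 3 students

# One measure per subject: count of students with the property
# "registered for this subject".
subjects = sorted(set().union(*registrations.values()))
counts = {
    subject: sum(subject in taken for taken in registrations.values())
    for subject in subjects
}

# If subject were a proper classification this would equal base_population.
total_of_counts = sum(counts.values())
```

Here `total_of_counts` is 4 while the base population is 3, so the subject counts cannot be read as cells of one classification.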

11 Presentation Functionality
Layout
  Mapping from dimensions to Rows, Columns, Pages
Improper table combinations
  Combination of dissimilar dimensions, e.g. Age groups by (SEG + Housing)
  Distinction between Classification and Measure is less important for presentation
Medium
  Paper, Web, often with analysis (commentary)
  Machine readable (take away, not linked)
  Dynamic, for local or remote manipulation
Associated material
  Generation of descriptions, footnotes, indexes, content lists
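The layout mapping can be illustrated with a minimal sketch that assigns each dimension of a table to pages, rows or columns and nests the cells accordingly. The function name `layout`, the dimension names and the values are assumptions for the example, not part of any product mentioned in the slides.

```python
def layout(cells, dims, rows, cols, pages):
    """Arrange cells as pages -> rows -> columns.

    cells: dict keyed by tuples of category labels, in the order given by dims.
    rows, cols, pages: lists of dimension names, together covering dims.
    """
    position = {name: i for i, name in enumerate(dims)}

    def pick(key, names):
        return tuple(key[position[name]] for name in names)

    out = {}
    for key, value in cells.items():
        page = out.setdefault(pick(key, pages), {})
        row = page.setdefault(pick(key, rows), {})
        row[pick(key, cols)] = value
    return out


dims = ["period", "location", "disease"]
cells = {
    ("W01", "A", "flu"): 3,
    ("W01", "B", "flu"): 1,
    ("W02", "A", "flu"): 2,
}
# One page per disease, locations down the side, periods across the top.
table = layout(cells, dims, rows=["location"], cols=["period"], pages=["disease"])
```

Changing the three lists re-pivots the same cells without touching the underlying data.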

12 Manipulation Functionality - for Exploration
Dynamic viewing, linked to source aggregations
Selection
  Subset of classification cells, and of measures
Dynamic regrouping
  Roll up to combine existing groups to next level
  Drill down to get more detail in groups at lower level
  Operate independently, i.e. not all parts of a classification at the same level
  User-defined groupings
All derivation and presentation facilities
Specialist browsers, available for local data or over the Internet
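Roll up and drill down can be sketched over a one-level hierarchy, here a hypothetical mapping from districts to regions with invented counts. The optional `groups` argument illustrates operating independently, so that some regions are rolled up while others stay at district level.

```python
from collections import defaultdict

# Hypothetical hierarchy (district -> region) and detailed counts.
parent = {"District A": "North", "District B": "North", "District C": "South"}
counts = {"District A": 3, "District B": 1, "District C": 4}


def roll_up(counts, parent, groups=None):
    """Combine children into their parent group.

    If groups is given, only those parents are rolled up; all other
    categories stay at their current level of detail.
    """
    out = defaultdict(int)
    for child, value in counts.items():
        group = parent.get(child)
        if group is not None and (groups is None or group in groups):
            out[group] += value
        else:
            out[child] += value
    return dict(out)


def drill_down(group, counts, parent):
    """Return the detailed cells that make up one parent group."""
    return {child: value for child, value in counts.items()
            if parent.get(child) == group}


by_region = roll_up(counts, parent)
mixed_levels = roll_up(counts, parent, groups={"North"})
north_detail = drill_down("North", counts, parent)
```

Because the detailed counts are kept as the source, drilling down is just a selection on them; user-defined groupings would amount to supplying a different `parent` mapping.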

13 Discovery through Meta-data
Generic descriptions
  Population, Classifications, Measures linked to concept definitions for searching
  Specific topics
Formal definitions of standard components
  Selection rules, standard classifications, measure types
Specific descriptions of substantive content
  Source variable definitions, questionnaire structure, etc.
Accessibility
  Information must be available to search engines and users

14 Meta-data & Functionality
Conclusions
  Good analysis of structural and functionality requirements can produce good products for automated and individual use
  Further academic work on structures and functionality needed
  Commercial products are useful but lack many obvious features - we should demand more
  Commercially driven standards concentrate on basic functionality and overlook statistical and practical validity - we should get more involved

