Published by Randolf Andrew Hawkins. Modified over 6 years ago.
1
"Production of social statistics… goes social!"
J.Grazzini & P.Lamarche
2
Agenda: i. rationale ii. framework iii. implementation iv. buzzwords v. voilà
3
An example of actual production environment
Low-level implementation: all in-house!
- raw microdata validation
- microdata integration
- indicator calculation
- aggregates' estimation
- anonymisation and production of User Database
- management of requests (e.g., ad-hoc extraction/estimation)
High-level vision (CSPA, GSBPM, …): not much…
Heavy legacy: little genericity, ad-hoc development, sparse documentation, …
High complex-ity/-ification: acts like a black-box…
Still: operational!
4
"Wait and see" approach
ESS SERV initiative
- immediate needs/issues?
- routine operations?
- ad-hoc requests?
5
Our approach (1/2)
Objective:
- start small (not from scratch however)
- build quickly and continuously
- develop a library of validated statistical methods and IT components
- gain experience from deployment
Model:
- open, shared, and collaborative
- agile and flexible
Vision:
- users/producers shall become "produsers"
- knowledge community
6
Our approach (2/2)
Practical organisation:
- focus on "performing layer"
- provide guidelines and best practices
- any skilled person can modify the code to suit own needs, learn from its use and contribute to its improvement
Flexible design and continuous implementation:
- support whole production cycle
- solve immediate ad-hoc needs and legacy issues
7
Agenda: i. rationale ii. framework iii. implementation iv. buzzwords v. voilà
8
Hybrid "design"
- top-down: Focus on overall workflow and generic processes instead of single tasks
- bottom-up: Incorporate considerations on specific methodological/technological aspects of the processes
- modular: Build modular and customisable components that encapsulate statistical methods
- granular: Build complex operations from simple and small parts
- agnostic: Release constraint on programming language
9
Open framework (1/4)
Can we agree that "open algorithms" guarantee:
- transparency
- reusability
- reproducibility
- verifiability
- …?
What about software then?
- though susceptible to downsides, "open software" is obviously preferred
- … but legacy proprietary software is still in prominent use
"Open source code" is adopted for implementation
10
Open framework (2/4)
Comprehensiveness of the statistical information, e.g.:
- what (method/technique) is actually implemented?
- is that done consistently?
- black-box?
Example: quantile, Gini, aggregates…
Be aware of methodological choices!
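To illustrate why the implemented method matters, here is a minimal sketch (not taken from the presentation) of a Gini coefficient computed from the mean absolute difference. The function name `gini` and the flag `small_sample_correction` are hypothetical. The sketch only shows that one documented option (the n/(n-1) small-sample correction) already changes the published figure.

```python
from itertools import combinations


def gini(values, small_sample_correction=False):
    """Gini coefficient based on the mean absolute difference.

    Hypothetical illustration: the flag switches between the plain
    "population" formula and the n/(n-1) small-sample correction.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    if mean <= 0:
        raise ValueError("mean must be positive")
    # Sum of |x_i - x_j| over all ordered pairs = 2 * sum over unordered pairs.
    total_abs_diff = 2 * sum(abs(a - b) for a, b in combinations(values, 2))
    g = total_abs_diff / (2 * n * n * mean)
    if small_sample_correction:
        g *= n / (n - 1)
    return g


incomes = [1, 2, 3, 4]
print(gini(incomes))                                # plain formula
print(gini(incomes, small_sample_correction=True))  # corrected formula
```

Both results are legitimate "Gini" values for the same data; without access to the code or its documentation, a user cannot tell which one a black-box tool returns.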
11
Open framework (3/4)
Control of the statistical implementation, e.g.:
- what is the default configuration?
- how is the operation set up?
- what are the ad-hoc parameters?
Example: quantile implementations
Review configuration/parameters settings!
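As a hedged illustration of the quantile example, the sketch below implements the continuous sample quantile definitions usually numbered 4 to 9 after Hyndman and Fan (1996). It is not the authors' implementation; the names `POSITION_RULES`, `quantile` and `qtype` are assumptions made for this example. Type 7 is, for instance, the default in R and NumPy, while other tools default to other definitions.

```python
import math

# 1-based position h(n, p) for Hyndman & Fan types 4 to 9.
POSITION_RULES = {
    4: lambda n, p: n * p,
    5: lambda n, p: n * p + 0.5,
    6: lambda n, p: (n + 1) * p,
    7: lambda n, p: (n - 1) * p + 1,
    8: lambda n, p: (n + 1 / 3) * p + 1 / 3,
    9: lambda n, p: (n + 1 / 4) * p + 3 / 8,
}


def quantile(values, p, qtype=7):
    """Sample quantile by linear interpolation between order statistics."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("empty sample")
    h = POSITION_RULES[qtype](n, p)
    h = min(max(h, 1.0), float(n))   # clamp to the observed range
    lower = math.floor(h)            # 1-based index of the lower neighbour
    upper = min(lower + 1, n)
    frac = h - lower
    return data[lower - 1] + frac * (data[upper - 1] - data[lower - 1])


data = [1, 2, 3, 4, 10]
for qtype in sorted(POSITION_RULES):
    print(qtype, quantile(data, 0.9, qtype))
```

On this small sample the 90% quantile ranges from about 7 to 10 depending on the definition, which is why the default configuration deserves an explicit review.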
12
Open framework (4/4)
Maintenance (continuity and/or migration) of operations, e.g.:
- did we keep track of the producer's actions?
- can we regenerate the outputs (time)?
- is the consistency of the outputs guaranteed (platform)?
Example: sampling
Deal with technological constraints!
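One possible way to make a sampling step regenerable over time and consistent across platforms is to avoid the platform's random number generator altogether and rank units by a cryptographic hash of a recorded seed and the unit identifier. This is a swapped-in technique used for illustration, not the method described in the presentation; `hash_sample`, `unit_ids` and the seed label are hypothetical names. The result approximates a simple random sample without replacement only under the assumption that the hash values behave like uniform random numbers.

```python
import hashlib


def hash_sample(unit_ids, sample_size, seed):
    """Select `sample_size` units by ranking SHA-256(seed:unit_id).

    The selection depends only on the seed and the textual form of each
    identifier, not on the input order or on a language-specific RNG.
    """
    if not 0 <= sample_size <= len(unit_ids):
        raise ValueError("sample_size must be between 0 and the population size")

    def rank_key(unit_id):
        payload = f"{seed}:{unit_id}".encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    return sorted(unit_ids, key=rank_key)[:sample_size]


population = list(range(100))
sample = hash_sample(population, 10, seed="2017-wave1")
print(sample)
```

Storing the seed label together with the output is what keeps track of the producer's action: anyone holding the same identifiers and seed can regenerate the same selection.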
13
Agenda: i. rationale ii. framework iii. implementation iv. buzzwords v. voilà
14
What we practically aim at
low-level black-box: data, programs, projects, documentation
low-level modules:
- modular and customisable processes: IT modules, statistical modules
- documented, tested, exemplified source code
- organised data workflow
- guarantee robustness of statistical processes
- help maintain and update, prepare migration
- build a community of produsers
high-level model: web-services (SOA)?
15
Walk the talk
https://gjacopo.github.io/PING/
- documented: enhances reproducibility, supports multiple platforms, enforces quality assurance
- versioned: helps differentiate between configurations used in production and in development
- tested: guarantees reliability and prepares future migration
- exemplified: supports sharing and reuse of modules/processes
16
Judgmental choices/arbitrary decisions?
There is no one-to-one matching between statistical processes and IT modules
IT modules: trade-off between
- scope of application (flexibility)
- efficiency
- ease of application (reusability)
- simplicity
Statistical processes: degree to which modules are configured and parameterised?
Preference for deploying simple and small IT modules in which the statistical processes are encapsulated
17
Proof of concept
https://gjacopo.github.io/quantile/
- software-agnostic: traditional quantile estimation technique is implemented robustly on different platforms
- controlled: parameters are not ad-hoc anymore but are reviewed to correspond to state-of-the-art literature
- transparent: a quick & dirty web-app provides plug & play quantile estimation service to user who can focus on methods (rather than underlying technology)
18
Agenda: i. rationale ii. framework iii. implementation iv. buzzwords v. voilà
19
Scenarios for inclusion of BD sources in statistical production – IT infrastructure perspective (1/2)
Different scenarios depending on location of data and algorithms, between the BD provider infrastructure and the statistical office infrastructure:
- BD source transferred/synchronised to the statistical office; statistical production process and final statistical product on the statistical office side
- reduction (filtering, aggregation…) on the BD provider side; reduced BD source transferred; statistical production process and final statistical product on the statistical office side
- statistical production process on the BD provider side; final statistical product transferred to the statistical office
A = algorithmic & methodological knowledge
Courtesy of J.Gaffuri (DG ESTAT)
20
Scenarios for inclusion of BD sources in statistical production – IT infrastructure perspective (2/2)
No knowledge of the runtime target
- algorithms to be controlled by the statistical office, even when run on data provider side?
- acquisition and transfer/synchronisation of raw data?
- processing model: sequential? parallel? single-server multi-process? distributed (multi-server) cluster processing?
- database-specific implementations: distributed file system? cluster based?
Fast evolving technology
- existence of adapted infrastructure? which one(s)? tomorrow?
21
Where to put our effort?
[Diagram: Traditional statistics vs. Smart statistics, across domains such as Environment, Mobility and transportation, Quality of life, …; labels: core expertise; dumb followers, late adopters, …]
22
Agenda: i. rationale ii. framework iii. implementation iv. buzzwords v. voilà
23
More keywords
- Open and documented
- Reproducible and reusable
- Tested and verifiable
- Software-agnostic
- Modular and granular
24
Current status and future plans
- methodology: Currently, specific statistical production processes are still being incorporated within this framework, while shared IT modules are being developed.
- implementation: The former becomes more attractive the more standard IT modules are available to support it. The benefit of the latter increases with each process that is designed to use it.
- strategy: Exploring produser sharing solutions to further reduce risk and cost of testing and audit