1
Statistics Canada and CSPA
Confidentiality, Picasso, and Other Activities
Robert McLellan, Chief Enterprise Architect
June 2016
Translation note: I believe a related version of much of this material was translated for a course in September. I will try to track it down.
2
Purpose of this presentation
Why do we care about CSPA?
Summarize what we have learned from our CSPA experiences to date
Share what we are currently working on
Pose some interesting questions for Friday’s session
  Methodology and Metadata
  Architecture and IT
Statistics Canada • Statistique Canada
3
Agency Context – Outcomes
Deliver the ongoing statistical program in conformity with Statistics Canada’s Quality Assurance Framework
Respond to the emerging and evolving information needs of data users and stakeholders
Operate a responsive program that effectively satisfies ad hoc statistical requests on a cost-recovery basis
Enhance the efficiency, responsiveness and robustness of the Agency’s operations
Innovate through staff engagement and adoption of alternative information sources, e.g. administrative data, satellite imagery, private sector “big data”
“Deliverology” support: outcome-driven government mandates
Source: Statistics Canada Report on Plans and Priorities, , various National Media
4
Flexibility, Agility, Stability, Quality, Efficiency, Change, Innovation
5
IT Questions
How do you deliver transformational projects (with agility) while supporting ongoing IT activities for programs (stability)?
How do you successfully refactor your IT solutions to deliver platform-based services with economies of scale?
How should you embrace the opportunities of big data, data science, and innovation?
What are the opportunities of cloud (IaaS, PaaS, SaaS), and how can you benefit from them?
How should we work with “citizen developers”?
6
Statistics Canada’s Picasso Project (CSPA compliant)
Statistical metadata and data management
7
Picasso Project
Statistical Metadata and Data Management
  IMDB replacement (legacy): ISO/IEC 11179 schema
  Data Service Centre functionality: CBA Transformation
  Information Management: EDRMS linkage
  Search, Discover, Navigate, Register, Report, Innovate
  Aggregator / linkage focus, e.g. National Accounts, Record Linkage
Linked Data and Metadata
  Oracle 12c RDF graph database
  URI-linked data sets, documents
StatCan SOA integration
  Metadata services (CSPA compliant): QSCV…
  Data Asset management services: ILCM linkage
  Oracle BPM Suite workflow, Oracle Service Bus
  Statistical Functions Services, Statistical Entity Services roadmaps
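To make the "URI-linked data sets, documents" idea concrete, here is a minimal sketch of linked metadata as subject-predicate-object triples. It uses plain Python rather than Oracle's RDF graph store, and every URI, property name, and title below is a hypothetical placeholder, not taken from Picasso.

```python
from typing import Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespace and triples, for illustration only.
BASE = "http://example.org/"
TRIPLES: List[Triple] = [
    (BASE + "dataset/1", BASE + "prop/title", "Example survey microdata"),
    (BASE + "dataset/1", BASE + "prop/describedBy", BASE + "doc/7"),
    (BASE + "dataset/2", BASE + "prop/derivedFrom", BASE + "dataset/1"),
    (BASE + "doc/7", BASE + "prop/title", "Example methodology guide"),
]


def match(triples: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) \
                and (o is None or t[2] == o):
            yield t


def linked_to(triples: List[Triple], uri: str) -> Set[str]:
    """Return every URI directly linked to `uri`, in either direction."""
    outgoing = {o for _, _, o in match(triples, s=uri) if o.startswith(BASE)}
    incoming = {s for s, _, _ in match(triples, o=uri)}
    return outgoing | incoming


if __name__ == "__main__":
    for uri in sorted(linked_to(TRIPLES, BASE + "dataset/1")):
        print(uri)
```

Because data sets and documents share one URI space, a single traversal finds both the document that describes a data set and the data sets derived from it.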
8
CSPA Compliance – Picasso Project
BA decision principles (most)
  Capitalize on and influence national and international developments
  Deliver enterprise-wide benefits
  Increase the value of our statistical assets
  Maximize the use of existing data / minimize respondent load
IA principles (all)
  Manage information as an asset
  Manage the information lifecycle
  Protect information appropriately
  Use agreed models and standards
  Capture information as early as possible
  Describe to ensure reuse
  Ensure there is an authoritative source
  Preserve information input into Statistical Services
  Describe information by metadata
BA design principles (most)
  Re-use existing before designing new
  Design new for re-use and easy assembly
  Processes are metadata driven
  Adopt available standards
  Enable discoverability and accessibility
Application design principles (all)
  Maintain independence between design and implementation
  Use available standards
  Use architecture patterns
  Implement using GSIM (modulo some renaming)
  Minimize coupling
  Maximize Service Autonomy
  Include non-functional requirements (in progress)
We also follow the CSPA approach to specifications by having three levels: conceptual, logical and physical.
9
Picasso Wireframe – Concept
10
The Power of (Metadata) Visualization
Translation note: the terms in the diagrams should not be translated, as they are not relevant; the graphs are examples borrowed from the realm of bioinformatics. The only relevant thing here is the actual circles and lines.
Source: Cytoscape.org
11
Picasso R1.0 – Main Search
12
Picasso R1.0 – Main Search
Search verticals: all, ICN, IMDB, other
Search bar
Pop-up hover panels
Search results
Metadata icon
Refiners: metadata types (supports faceted search)
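The combination of search verticals and refiners can be illustrated with a small faceted-search sketch. This is not the Picasso search implementation; the records, field names, and matching rule (a simple substring match on the title) are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical metadata records; titles and types are invented.
RECORDS: List[Dict[str, str]] = [
    {"title": "Labour force survey", "vertical": "IMDB", "type": "survey"},
    {"title": "Labour force variables", "vertical": "IMDB", "type": "variable"},
    {"title": "Labour market article", "vertical": "ICN", "type": "document"},
    {"title": "Retail trade survey", "vertical": "IMDB", "type": "survey"},
]


def search(records: List[Dict[str, str]], query: str, vertical: str = "all",
           refiners: Optional[Dict[str, str]] = None) -> List[Dict[str, str]]:
    """Filter by query text, then by vertical, then by each active refiner."""
    refiners = refiners or {}
    q = query.lower()
    hits = []
    for r in records:
        if q not in r["title"].lower():
            continue
        if vertical != "all" and r["vertical"] != vertical:
            continue
        if any(r.get(field) != value for field, value in refiners.items()):
            continue
        hits.append(r)
    return hits


def facet_counts(hits: List[Dict[str, str]], field: str) -> Counter:
    """Count hits per facet value; these counts would label the refiners."""
    return Counter(h[field] for h in hits)


if __name__ == "__main__":
    hits = search(RECORDS, "labour")
    print(facet_counts(hits, "type"))
    print(search(RECORDS, "labour", vertical="IMDB", refiners={"type": "survey"}))
```

The facet counts are computed from the current result set, so each refiner shows how many results would remain if it were selected.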
13
Picasso Search Architecture
14
Metadata Core Solution Architecture
External systems access the Metadata core via Entity Services.
The Data Access Layer (DAL) provides access to the Picasso Registry and Repository via Common Information Exchange Models (CIEMs), for both Entity Services and Picasso RDF components.
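The layering described above (external callers, then Entity Services, then the DAL, then the registry) might be sketched as follows. All class, method, and field names are hypothetical; `EntityRecord` merely stands in for a CIEM, and an in-memory dictionary stands in for the Picasso Registry and Repository.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class EntityRecord:
    """Stand-in for a Common Information Exchange Model (hypothetical fields)."""
    uri: str
    entity_type: str
    label: str


class DataAccessLayer:
    """Only this layer touches the store; callers exchange EntityRecord objects."""

    def __init__(self) -> None:
        self._store: Dict[str, EntityRecord] = {}

    def register(self, record: EntityRecord) -> None:
        self._store[record.uri] = record

    def fetch(self, uri: str) -> Optional[EntityRecord]:
        return self._store.get(uri)


class EntityService:
    """The entry point for external systems; it never reads the store directly."""

    def __init__(self, dal: DataAccessLayer) -> None:
        self._dal = dal

    def register_entity(self, uri: str, entity_type: str, label: str) -> EntityRecord:
        record = EntityRecord(uri, entity_type, label)
        self._dal.register(record)
        return record

    def get_entity(self, uri: str) -> EntityRecord:
        record = self._dal.fetch(uri)
        if record is None:
            raise KeyError(uri)
        return record


if __name__ == "__main__":
    service = EntityService(DataAccessLayer())
    service.register_entity("http://example.org/var/1", "Variable", "Age")
    print(service.get_entity("http://example.org/var/1"))
```

Keeping the exchange model as the only type that crosses the DAL boundary is what would let a second consumer, such as an RDF component, reuse the same access path.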
15
Entity Roadmap (GSIM-based)
16
Common data types & controlled vocabularies - Logical
17
Common data types and controlled vocabularies – RDF/OWL
18
Bimodal IT and CSPA Collaboration
“Bimodal IT” (Gartner): important implications for Governance, Capability Delivery, Collaboration
  Mode 1: predictability, stability. Requirements are well understood through a process of analysis; includes investment to open up the legacy environment (“standard development”)
  Mode 2: exploratory. Requirements are not well understood in advance, not enough is known about the area, and the future reveals itself in small pieces; agile, iterative, pilots, proofs of concept, experimental approaches
  What is the path from Mode 2 (new, exploratory capabilities) to Mode 1 “production” delivery?
CSPA Collaboration
  Extend an existing function or capability, replace, renew (Mode 1)?
  Identify new capability gaps, work collaboratively to experiment, explore, pilot, introduce new capabilities (Mode 2)?
Questions
  Where is the greatest opportunity?
  What is the historic reality?
  What is the new reality with Big Data, Data Science, agile statistics?
19
Conclusion
Statistics Canada continues to roll out its service-based approach (SOA)
  Current users include business statistics processing, social statistics processing, Census 2016, corporate services (process automation)
Statistical function and entity services are designed to be CSPA-compliant
  Use of standards; SOA governance incorporates CSPA elements
Production components (especially complicated ones) can take time to “prove in” before they are placed into production
  Especially if configuration-driven: need to establish the knowledge of how to tune and program the configurations
We have a strong focus on management of statistical data and metadata in support of a wide variety of processing, analysis, and information management tasks
  CSPA-compliant services
  Future focus on data science activities
20
Appendix
21
Confidentiality-on-the-fly
22
What business problem are we trying to solve?
The need for more: tabular data does not meet the requirements of many users
Options
  Research Data Centers (RDC)
  Data Liberation Initiative (DLI)
  Real Time Remote Access (RTRA/G-Tab)
  Remote Access
  Custom Tabulations
Constraints
  Requirements not met
  Price, time …
23
Solution – Building Statistical Models
Examine the data
Build a model
Check the model fit
Why not allow users to build statistical models directly?
Problem:
  Cannot guarantee confidentiality of the users
  Not in line with the basic premise of classical statistics
24
Confid-on-the-fly (RTRDA)
Disclosure control mechanism
Allows users to build (selected classes of) statistical models
  Descriptive + inferential statistics
  Statistical models
    Robust Linear Regression
    Logistic Regression
    Multinomial Regression
Protects confidentiality of respondents
Results obtained in real time
No manual checks at StatCan
25
Confid-on-the-Fly Process Flow
26
Capability Opportunity
RTRA (existing approach)
  Is a confid-on-the-fly solution
  Data kept at StatCan
  External user submits code
  Code parsed, validated and executed at StatCan
  Results are parsed, validated and returned to the user
  Descriptive statistics (frequencies, means, medians, proportions, percentiles, shares)
RTRDA (new approach)
  User interacts with the solution and builds a statistical model
    Interactive process
    Iterative process
  Statistical model returned to the user
  Data kept at StatCan
  Controls put in place to guarantee confidentiality
    Control the class of statistical methods available to the user
    Control the model building process
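The RTRA flow described above (the request is validated, executed where the data lives, and the result is validated before it is returned) can be sketched in a few lines. This is an illustration only: the slides do not say which checks StatCan applies, so the whitelist of statistics, the small-cell suppression rule, the `MIN_CELL_COUNT` threshold, and the toy microdata are all assumptions.

```python
from collections import Counter
from typing import Any, Dict

# Hypothetical rules; the real RTRA validation rules are not given in the slides.
VARIABLES = {"region": "categorical", "income": "numeric"}
ALLOWED = {"frequency": "categorical", "mean": "numeric"}
MIN_CELL_COUNT = 5  # assumed suppression threshold

# Toy microdata that never leaves the "server" side of this sketch.
MICRODATA = (
    [{"region": "East", "income": 40 + i} for i in range(6)]
    + [{"region": "West", "income": 50 + i} for i in range(2)]
)


def validate_request(req: Dict[str, str]) -> None:
    """Reject any statistic or variable outside the controlled class."""
    stat, var = req.get("stat"), req.get("variable")
    if stat not in ALLOWED or var not in VARIABLES:
        raise ValueError("request not permitted")
    if ALLOWED[stat] != VARIABLES[var]:
        raise ValueError("statistic does not apply to this variable")


def execute(req: Dict[str, str]) -> Dict[str, Any]:
    """Run the validated request against the data held on the server."""
    values = [row[req["variable"]] for row in MICRODATA]
    if req["stat"] == "frequency":
        return dict(Counter(values))
    return {"mean": sum(values) / len(values)}


def validate_result(req: Dict[str, str], result: Dict[str, Any]) -> Dict[str, Any]:
    """Suppress small frequency cells before anything is returned."""
    if req["stat"] == "frequency":
        return {k: (v if v >= MIN_CELL_COUNT else None) for k, v in result.items()}
    return result


def handle(req: Dict[str, str]) -> Dict[str, Any]:
    validate_request(req)
    return validate_result(req, execute(req))


if __name__ == "__main__":
    print(handle({"stat": "frequency", "variable": "region"}))
```

The same shape would extend to the RTRDA case by widening the whitelist to the permitted model classes and adding checks at each step of the model-building process.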
27
Confid-on-the-Fly Layers
28
Extending the business capability
The class of RTRA problems is a subset of the class of Confid-on-the-fly problems: RTRA ⊂ RTRDA
[Diagram: within STATISTICS, RTRA/G-Tab covers descriptive statistics; Confid-on-the-Fly (RTRDA) covers descriptive and inferential statistics]
29
Configuring Confid-on-the-Fly
Elaborate configuration
Configuration service
  Users can change configuration parameters
  Count and continuous perturbation tables also accessible (currently read-only)
Configuration parameters in demo GUI
  Note the Methodologist configuration work needed to “set up” the service (sample GUI shown)
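A configuration service with editable parameters and read-only perturbation tables could look roughly like this. The parameter names and table contents are invented for the example; the slides do not list the actual configuration parameters.

```python
from types import MappingProxyType
from typing import Any, Dict, Mapping


class ConfidConfig:
    """Editable parameters plus read-only perturbation tables (hypothetical)."""

    def __init__(self, parameters: Dict[str, Any],
                 count_table: Mapping[int, int],
                 continuous_table: Mapping[str, float]) -> None:
        self._parameters = dict(parameters)
        # MappingProxyType exposes a view that rejects item assignment.
        self.count_perturbation = MappingProxyType(dict(count_table))
        self.continuous_perturbation = MappingProxyType(dict(continuous_table))

    def get(self, name: str) -> Any:
        return self._parameters[name]

    def set(self, name: str, value: Any) -> None:
        """Change an existing parameter; unknown names are rejected."""
        if name not in self._parameters:
            raise KeyError(name)
        self._parameters[name] = value


if __name__ == "__main__":
    cfg = ConfidConfig(
        parameters={"min_cell_count": 5, "max_model_terms": 10},
        count_table={1: 0, 2: 3},
        continuous_table={"low": 0.95, "high": 1.05},
    )
    cfg.set("min_cell_count", 10)
    print(cfg.get("min_cell_count"), dict(cfg.count_perturbation))
```

Exposing the tables through a read-only view matches the "accessible (currently read-only)" wording: callers can inspect the perturbation values without being able to alter them.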