Adaptive Protocols and the DDI 4 Process Model Jay Greenfield Booz Allen Hamilton NSF-Census Research Network Meeting Spring 2014 Suitland, MD.


Agenda
Describe several uses of an emerging process model that is part of the future DDI 4 specification. The uses we will cover today include:
– The representation and execution of multi-mode contact procedures
– The representation and execution of data transformation pipelines
– Putting the two together in a process model able to both represent and facilitate the execution of data-driven adaptive protocols

Disclaimer
GSIM 1.1 is a mature specification. So is OWL-S. So is DDI 3, which uses OWL-S. However, DDI 4 is in the early stages of development. While it integrates with mature specifications, it is very much a work in progress. Many of its details remain to be finalized.

Background: What is DDI 4?
It is the next major version of the DDI specification. Instead of being XML-based it is information-model based. Being model-based enables DDI 4 to more easily:
– Interact with other UML models, including GSIM
– Modularize the specification by resolving it into a set of more or less specialized, extensible functional views
– Create functional views that support the use of heterogeneous data sources (including auxiliary data, process data and registries) across the data lifecycle

Background: What is GSIM?
Modernization of statistics requires:
– Reuse and sharing of methods, components, processes and data repositories
– Definition of a shared "plug-and-play" modular component architecture
The Generic Statistical Business Process Model (GSBPM) will help determine which components are required; GSIM will help to specify the interfaces. We need consistent information.
From "What is GSIM?" at the GSIM 1.1 portal

Background: What are DDI 4 Functional Views?
The primary publications of the DDI Alliance are functional views. From these views various artifacts, including XML schemas and RDF vocabularies, are generated.

Our use case: meet the players…
– Application Layer
– Information Model
– GLBPM
– GSBPM

GSBPM
From "Introducing the GSBPM" at the GSBPM portal

In the adaptive protocol use case…
The relationship between Collect and Process is non-linear:
– We use administrative and other data collected and processed previously, together with almost real-time processing of current data, to guide data collection now
– Collect can depend on Process, and vice versa
Although GSBPM and the business processes that GSBPM represents were not intended to form a linear model, the systems that we Build behave, by and large, in a linear way.

GLBPM
From the Generic Longitudinal Business Process Model

Because the L in GLBPM is longitudinal, GLBPM thinks in circles. And the systems we Build to support longitudinal studies are formed around an Archive. There are many lessons learned from ordering processes around archives and archival services.

Meet the Information Model we will use to design, build and collect the adaptive protocol use case…
– GSIM provides the core conceptual model (Business Process, Process Design, Process Control, Process Step): human readable
– OWL-S implements GSIM (Process, Atomic Process, Composite Process): machine readable
– DDI specializes OWL-S atomic processes with contact procedures and data transformation objects

Using GSIM, how might we represent an adaptive design?
Imagine that we want to use the current representativeness of study participants to decide if, for a specific individual, we want to engage in a refusal conversion. In such a design we would conditionally attempt to convert based on the calculation of a statistic. Let's first examine how GSIM might represent the calculation of this statistic…

The design of a calculation…
Following FDA guidelines for adaptive designs, we might undertake "analyses of the accumulating study data…at prospectively planned time points within the study". This might entail comparisons over time of historical demographic data with current demographic data. In the process we might want to integrate data (GSBPM 5.1) and calculate aggregates (GSBPM 5.7).
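To make the calculation concrete, here is a minimal Python sketch of how such a prospectively planned statistic might drive a refusal-conversion decision. The group labels, counts and threshold are all invented for illustration; none of this comes from the DDI or GSIM specifications.

```python
# Illustrative sketch only: a simple representativeness shortfall statistic
# driving a conditional refusal-conversion rule. All names and numbers
# here are hypothetical.

# Historical (target) demographic proportions, e.g. from administrative data
target_shares = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

# Current respondents collected so far, by demographic group
current_counts = {"18-34": 40, "35-54": 90, "55+": 70}

def representativeness(target, counts):
    """Return each group's shortfall: target share minus achieved share."""
    total = sum(counts.values())
    return {g: target[g] - counts.get(g, 0) / total for g in target}

def attempt_refusal_conversion(group, shortfall, threshold=0.05):
    """Conditionally convert: pursue only groups underrepresented by more
    than the threshold (a prospectively planned decision rule)."""
    return shortfall[group] > threshold

shortfall = representativeness(target_shares, current_counts)
print(shortfall)
print(attempt_refusal_conversion("18-34", shortfall))
```

With these invented numbers, the 18-34 group falls short of its target share, so a conversion attempt would be triggered for its members, while the over-represented 35-54 group would not warrant one.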

GSIM Process Step (Top Level)
– FDA Adaptive Design Guidelines
– GSBPM 5.1, 5.7
– ???

A tale of two process designs and the case for data federation
From "Best Practices in SAS Management for Big Data"

A tale of two process designs and the case for data federation
From "Best Practices in SAS Management for Big Data"
– Data is too sensitive: organizations don't want to provide direct access to data sources.
– Data is too diverse: data is stored in multiple source systems that all have different security models, duplicate users and different permissions.
– Data is too ad hoc: when data is changing frequently, constant updates are needed to maintain integration logic. It becomes difficult to make a repeatable integration process, especially if there are many data integration applications that need access to the same data.

Our process design is…
…a data federation "data lake" that collects everything and supports dive-in-anywhere, flexible-access data processing. The data lake is, in turn, traversed in a series of sub-steps that together form a map-reduce algorithm for numerical summarization. The algorithm both integrates and aggregates…
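As a rough illustration of the "integrate then aggregate" sub-steps (GSBPM 5.1 and 5.7), the following Python sketch joins records from two hypothetical federated sources and then counts by a grouping key. The source names and fields are invented for this example.

```python
# Hedged sketch: integrate (GSBPM 5.1-style) then aggregate (GSBPM 5.7-style)
# over two invented federated sources.

survey_source = [
    {"id": 1, "age_group": "18-34", "responded": True},
    {"id": 2, "age_group": "55+", "responded": False},
]
admin_source = [
    {"id": 1, "region": "North"},
    {"id": 2, "region": "South"},
]

def integrate(survey, admin):
    """Join records from heterogeneous sources on a shared id."""
    admin_by_id = {r["id"]: r for r in admin}
    return [{**s, **admin_by_id.get(s["id"], {})} for s in survey]

def aggregate(records, key):
    """Count records per value of `key`."""
    counts = {}
    for r in records:
        counts[r[key]] = counts.get(r[key], 0) + 1
    return counts

integrated = integrate(survey_source, admin_source)
print(aggregate(integrated, "region"))   # {'North': 1, 'South': 1}
```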


GSIM Process Step (Top Level)
– FDA Adaptive Design Guidelines
– GSBPM 5.1, 5.7
– Data Lake
– Numerical summarization map-reduce algorithm

Enter OWL-S…
From "OWL-S: Semantic Markup for Web Services"

Enter OWL-S…
Previously, in DDI 3, OWL-S information objects were utilized to specify control constructs and the skip logic of questionnaires. OWL-S did this with such specificity that software connectors were developed by the Australian Bureau of Statistics and others that take DDI 3 OWL-S as input and code complex instruments in Blaise as well as other survey systems.
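To illustrate the idea, and not the actual DDI 3 serialization, here is a small Python sketch in which OWL-S-style control constructs (a Sequence and an If-Then-Else) drive the skip logic of a toy questionnaire. The question texts and scripted responses are invented.

```python
# Illustrative only: skip logic expressed with OWL-S-style control
# constructs (Sequence, If-Then-Else). This mimics the concept, not the
# DDI 3 OWL-S serialization consumed by real connectors.

skip_logic = ("sequence", [
    ("ask", "Q1: Are you currently employed?"),
    ("if", lambda answers: answers["Q1"] == "yes",
        ("ask", "Q2: What is your occupation?"),   # then-branch
        ("ask", "Q3: Are you seeking work?")),     # else-branch
])

def run(construct, answers, responses):
    """Walk the construct tree, answering questions from scripted responses."""
    kind = construct[0]
    if kind == "sequence":
        for step in construct[1]:
            run(step, answers, responses)
    elif kind == "ask":
        qid = construct[1].split(":")[0]
        answers[qid] = responses[qid]   # in real use: prompt the respondent
    elif kind == "if":
        _, condition, then_branch, else_branch = construct
        run(then_branch if condition(answers) else else_branch,
            answers, responses)

answers = {}
run(skip_logic, answers, {"Q1": "yes", "Q2": "engineer", "Q3": "n/a"})
print(answers)   # {'Q1': 'yes', 'Q2': 'engineer'}
```

Because the control constructs are data rather than code, a connector can translate the same tree into Blaise or another survey system, which is the specificity the slide alludes to.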

Enter OWL-S…
Now, in DDI 4, OWL-S information objects are about to be put into play alongside GSIM, where they can specify both parallel and sequential processing. OWL-S describes parallel processing in enough detail that, in the adaptive design use case, it will be able to assist in the automatic production of map-reduce algorithms for constructing numerical summarizations.

Enter OWL-S…
Also, OWL-S describes sequential processing in enough detail that, in the adaptive design use case, it will be able to assist in the automatic production of schedulers that conditionally determine and produce next actions, such as refusal conversion, in multi-mode contact procedures.
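A hedged sketch of what such a generated scheduler might decide, with case fields and decision rules invented for illustration:

```python
# Hypothetical sketch of a sequential scheduler for a multi-mode contact
# procedure. The case fields and the rules are invented; a real scheduler
# would be produced from an OWL-S process description.

def next_action(case):
    """Conditionally determine the next contact action for a sample case."""
    if case["completed"]:
        return "close"
    if case["refused"]:
        # Attempt conversion only when the adaptive design flags the case
        return "refusal_conversion" if case["underrepresented"] else "close"
    if case["attempts"] >= 3:
        return "switch_mode"   # e.g. web -> telephone -> face-to-face
    return "retry_current_mode"

case = {"completed": False, "refused": True,
        "underrepresented": True, "attempts": 2}
print(next_action(case))   # refusal_conversion
```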

Enter OWL-S…
Finally, using OWL-S, it is possible to define a service profile for each process. In conjunction with a software agent, the service profile enables:
– Automatic web service discovery
– Automatic web service invocation
– Automatic web service composition and interoperation
In this context a software agent might be a Web Service Business Process Execution Language (WSBPEL) program.

In summary, in DDI 4…
– GSIM provides the core conceptual model (Business Process, Process Design, Process Control, Process Step): human readable
– OWL-S implements GSIM (Process, Atomic Process, Composite Process): machine readable
– DDI specializes OWL-S atomic processes with contact procedures and data transformation objects


BIG DATA PRIMER (Appendix)

Big Data Primer (1)
From "The Data Lake: Turning Big Data Into Opportunity"

Big Data Primer (1)
MapReduce works in conjunction with a file system; the data lake is this file system.
– One metaphor for the data lake might be a giant collection grid, like a spreadsheet, one with billions of rows and billions of columns available to hold data
– Each cell of the grid contains a piece of data
– Cells might contain names, photographs, incident reports, Twitter feeds, OAIS submission information packages: anything and everything

Big Data Primer (2)
The image of the grid helps describe the difference between data mining and the data lake. With the data lake there is definitely an ingest process.
– During the ingest process certain details called "metadata" are added so that the basic information can be quickly located and identified

Big Data Primer (3)
– These metadata tags serve the same purpose as old-style card catalogues, which allow a reader to find a book by searching the author, title or subject
A MapReduce job uses these metadata tags first to retrieve the appropriate data from the data lake. Next, MapReduce maps each piece of information to a processor, assigning the piece of information a key (K1), one for each processor.

Big Data Primer (4)
Next the processor runs a user-defined program on the many pieces of information it receives.
– During processing it very likely shreds each piece of information into other units, producing for each initial key/value pair one or more additional key/value pairs
– So the map process looks like this: map(k1, v1) -> list(k2, v2)
Finally the reduce() function is applied to all the lists, aggregating them and returning the value v3.
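The map(k1, v1) -> list(k2, v2) and reduce shape described above can be sketched in a few lines of single-process Python. A word count stands in for the user-defined program, purely for illustration:

```python
# Minimal single-process sketch of the MapReduce contract:
# map(k1, v1) -> list(k2, v2), then reduce(k2, list(v2)) -> v3.
from collections import defaultdict

def map_fn(k1, v1):
    """Map one input record to a list of (k2, v2) pairs.
    Here: tokenize a line of text into (word, 1) pairs."""
    return [(word, 1) for word in v1.split()]

def reduce_fn(k2, values):
    """Aggregate all v2 values sharing key k2 into a single v3."""
    return sum(values)

inputs = {1: "big data big lake", 2: "data lake"}

# Map phase: every (k1, v1) yields list(k2, v2)
pairs = [pair for k1, v1 in inputs.items() for pair in map_fn(k1, v1)]

# Shuffle: group the v2 values by k2 before reduction
groups = defaultdict(list)
for k2, v2 in pairs:
    groups[k2].append(v2)

# Reduce phase: one v3 per key
result = {k2: reduce_fn(k2, vs) for k2, vs in groups.items()}
print(result)   # {'big': 2, 'data': 2, 'lake': 2}
```

In a real cluster the map and reduce phases run on many processors in parallel and the shuffle moves data between them, but the key/value contract is exactly the one shown.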

Data Lake MapReduce Pipeline
– Ingest into Data Lake
– Metadata tag for retrieval
– Invoke MapReduce
– MapReduce retrieves data from data lake
– Map() assigns data to a processor
– Processor processes data
– Reduce() aggregates data across processors

Big Data Primer References
– The Data Lake: Turning Big Data Into Opportunity
– The Data Lake: Taking Big Data Beyond the Cloud
– NoSQL Data Modeling Techniques (of interest for those who want to work through how data structures might be described and annotated using metadata tags in a data lake environment where, on the surface, the lake consists of just a big bag of key/value pairs)