Working Group: Data Foundations and Terminology (Practical Policy Considerations) Reagan Moore.

Presentation transcript:

Working Group: Data Foundations and Terminology (Practical Policy Considerations) Reagan Moore

Slide 2: Mapping Terminology to Use Cases

- Consider a hydrologist who needs to:
  - Acquire the data sets needed for research
  - Execute an analysis
  - Save the research results
  - Enable another hydrologist to re-execute the analysis
- Embed the goal of data discovery, access, analysis, and management in the larger context of reproducible data-driven research:
  - Where did the data come from?
  - How was the data created?
  - How was the data managed?

Slide 3: Concepts

- There is a duality between:
  - Procedures that generate data objects
  - Data objects generated by a procedure
- Terminology is needed that describes:
  - Operations executed by a researcher to create data objects
  - Operations executed by a repository to manage data objects

Slide 4: Eco-Hydrology

[Figure: RHESSys workflow to develop a nested watershed parameter file (worldfile) containing a nested ecogeomorphic object framework and the full initial system state.]

Labels in the figure: Choose gauge or outlet (HIS); Extract drainage area (NHDPlus); Digital Elevation Model (DEM); Worldfile; Flowtable; RHESSys; Slope; Aspect; Streams (NHD); Roads (DOT); Strata; Hillslope; Patch; Basin; Stream network; Nested watershed structure; Land Use; Leaf Area Index; Phenology; Soil Data; NLCD (EPA); Landsat TM; MODIS; USDA; Soil and vegetation parameter files.

Slide 5: Researcher operations vs. Repository operations

- Researcher operations
  - Pick the location of a stream gauge and a date
  - Access USGS data sets to determine the watershed that surrounds the stream gauge
  - Access USDA for soils data for the watershed
  - Access NASA for Landsat data
  - Access NOAA for precipitation data
  - Access USDOT for roads and dams
  - Project each data set to the region of interest
  - Generate the appropriate environment variables
  - Conduct the watershed analysis
  - Store the workflow, the input files, and the results
- Data repository management operations
  - Authenticate the user
  - Authorize the deposition
  - Add a retention period
  - Extract descriptive metadata
  - Record provenance information
  - Log the event
  - Create derived data products (image thumbnails)
  - Add access controls (collection sticky bits)
  - Verify checksum
  - Version
  - Replicate
  - Index
  - Choose a storage location
  - Choose the physical path name
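A few of the repository management operations listed above can be sketched as a fixed sequence applied to each deposit. This is only an illustration: the `Deposit` record, the function names, the `/vault` storage root, and the resource names `resc-a` and `resc-b` are all hypothetical and do not come from the slides or from any particular repository system.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class Deposit:
    """State a hypothetical repository keeps for one deposited file."""
    user: str
    name: str
    content: bytes
    checksum: str = ""
    physical_path: str = ""
    replicas: List[str] = field(default_factory=list)
    log: List[str] = field(default_factory=list)


def verify_user(deposit, allowed_users):
    # Stands in for "authenticate the user" and "authorize the deposition".
    if deposit.user not in allowed_users:
        raise PermissionError(f"{deposit.user} may not deposit")
    deposit.log.append("authorized")


def add_checksum(deposit):
    # Record a SHA-256 digest so later integrity checks have a reference value.
    deposit.checksum = hashlib.sha256(deposit.content).hexdigest()
    deposit.log.append("checksum")


def choose_path(deposit, storage_root="/vault"):
    # The repository, not the researcher, picks the physical path name.
    deposit.physical_path = f"{storage_root}/{deposit.user}/{deposit.name}"
    deposit.log.append("path")


def replicate(deposit, resources=("resc-a", "resc-b")):
    # Only records where copies would go; no data is actually moved here.
    deposit.replicas = list(resources)
    deposit.log.append("replicate")


def ingest(deposit, allowed_users):
    """Run the management operations in a fixed order and return the deposit."""
    verify_user(deposit, allowed_users)
    add_checksum(deposit)
    choose_path(deposit)
    replicate(deposit)
    return deposit


result = ingest(Deposit("alice", "worldfile.txt", b"basin 1"), {"alice"})
print(result.log)
```

The researcher only supplies the file and an identity; every other field is filled in by the repository, which is the split the slide draws between the two kinds of operations.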

Slide 6: Concepts needed for reproducible research

- Data: Bits (0s and 1s)
- Digital object: Named bits
- Data object: Named bits plus representation object
- Representation object: Context containing provenance, description, structural, and administrative information
- Operations: Data manipulation function
- Workflow: Set of chained operations
- Workflow object: Text file listing the chained operations
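One way to read these definitions is as a small set of nested types. The sketch below assumes a workflow object is a text file with one operation name per line; that file format, the class names, and the two toy operations `upper` and `reverse` are illustrative assumptions, not part of the working group's terminology.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DigitalObject:
    """Named bits."""
    name: str
    bits: bytes


@dataclass
class RepresentationObject:
    """Context: provenance, description, structural, administrative."""
    provenance: Dict[str, str] = field(default_factory=dict)
    description: Dict[str, str] = field(default_factory=dict)
    structural: Dict[str, str] = field(default_factory=dict)
    administrative: Dict[str, str] = field(default_factory=dict)


@dataclass
class DataObject:
    """Named bits plus a representation object."""
    digital_object: DigitalObject
    representation: RepresentationObject


# An operation is modelled here as a function from bits to bits.
Operation = Callable[[bytes], bytes]

# Hypothetical registry of operations, for illustration only.
OPERATIONS: Dict[str, Operation] = {
    "upper": lambda bits: bits.upper(),
    "reverse": lambda bits: bits[::-1],
}


def parse_workflow_object(text: str) -> List[str]:
    """Read a workflow object: one operation name per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def run_workflow(operation_names: List[str], bits: bytes) -> bytes:
    """Apply the chained operations in order."""
    for name in operation_names:
        bits = OPERATIONS[name](bits)
    return bits


workflow = parse_workflow_object("upper\nreverse\n")
source = DigitalObject("gauge.txt", b"abc")
derived = DataObject(
    DigitalObject("gauge-derived.txt", run_workflow(workflow, source.bits)),
    RepresentationObject(provenance={"source": source.name,
                                     "workflow": ",".join(workflow)}),
)
print(derived.digital_object.bits)
```

Recording the workflow in the representation object of the derived data object is what would let a second researcher re-execute the analysis from the same inputs.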

Slide 7: Definition of operation

- An operation on a digital entity involves the following elements:
  - EntityID: the identifier of the digital entity requesting invocation of the operation;
  - TargetEntityID: the identifier of the digital entity to be operated upon;
  - OperationID: the identifier that specifies the operation to be performed;
  - Input: a sequence of bits containing the input to the operation, including any parameters, content or other information; and
  - Output: a sequence of bits containing the output of the operation, including any content or other information.
- The challenge is how to characterize the response of the data management system to a requested operation. The repository may authenticate and authorize, modify state information, log information, add retention, …
  - Pre-process workflow that controls the input (access control, error checking, logging)
  - Operation
  - Post-process workflow that controls the output (changes to state information, audits)
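The pre-process, operation, post-process pattern can be sketched as a wrapper around each requested operation. The request fields below follow the five elements named on the slide; the `PolicyAwareRepository` class, its access-control table, and the audit-log layout are hypothetical and only show where each kind of repository response would sit.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class OperationRequest:
    entity_id: str          # who asks for the operation
    target_entity_id: str   # what is operated upon
    operation_id: str       # which operation to perform
    input: bytes            # parameters and content


class PolicyAwareRepository:
    def __init__(self,
                 acl: Dict[Tuple[str, str], Set[str]],
                 operations: Dict[str, Callable[[bytes], bytes]]):
        self.acl = acl                  # (entity, target) -> allowed operation ids
        self.operations = operations    # operation id -> function
        self.state: Dict[str, dict] = {}
        self.audit_log: List[tuple] = []

    def pre_process(self, request: OperationRequest) -> None:
        # Error checking, access control, and logging before the operation.
        if request.operation_id not in self.operations:
            raise KeyError(f"unknown operation {request.operation_id}")
        allowed = self.acl.get((request.entity_id, request.target_entity_id), set())
        if request.operation_id not in allowed:
            self.audit_log.append(("denied", request.entity_id, request.operation_id))
            raise PermissionError(f"{request.entity_id} may not run {request.operation_id}")
        self.audit_log.append(("accepted", request.entity_id, request.operation_id))

    def post_process(self, request: OperationRequest, output: bytes) -> None:
        # State changes and audit records after the operation.
        self.state[request.target_entity_id] = {
            "last_operation": request.operation_id,
            "output_size": len(output),
        }
        self.audit_log.append(("completed", request.entity_id, request.operation_id))

    def invoke(self, request: OperationRequest) -> bytes:
        self.pre_process(request)
        output = self.operations[request.operation_id](request.input)
        self.post_process(request, output)
        return output


repo = PolicyAwareRepository(
    acl={("user:alice", "obj:42"): {"op:uppercase"}},
    operations={"op:uppercase": bytes.upper},
)
output = repo.invoke(OperationRequest("user:alice", "obj:42", "op:uppercase", b"flow data"))
print(output, repo.state["obj:42"])
```

Keeping the operation itself free of access checks and bookkeeping is the design choice this sketch illustrates: the same function can be wrapped by different pre- and post-process workflows in different repositories.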

Slide 8: Data Access Steps

- Access a known repository. The researcher has an explicit repository in mind for each data set.
- Query the repository for data sets that satisfy spatial/temporal relationships.
- Either
  - get a list of identifiers, retrieve the data sets, and apply a data subsetting algorithm locally,
  - or apply the data subsetting algorithm at the remote repository.
- Name the local data subset for processing within the research workflow. This can be
  - a local collection name
  - or a global persistent identifier.
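The two subsetting routes can be compared with a toy in-memory repository. Everything in the sketch is an assumption made for illustration: the `ToyRepository` class, the bounding-box and year-range query form, the `first_two_values` subsetting function, and the collection name `local/hydro/subset-1`. No real repository API is implied.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class DataSet:
    identifier: str
    lat: float
    lon: float
    year: int
    values: Tuple[float, ...]


class ToyRepository:
    def __init__(self, data_sets: List[DataSet]):
        self._by_id = {d.identifier: d for d in data_sets}

    def query(self, bbox, years) -> List[str]:
        """Identifiers of data sets inside the bounding box and year range."""
        min_lat, min_lon, max_lat, max_lon = bbox
        first, last = years
        return sorted(
            d.identifier for d in self._by_id.values()
            if min_lat <= d.lat <= max_lat
            and min_lon <= d.lon <= max_lon
            and first <= d.year <= last
        )

    def retrieve(self, identifier: str) -> DataSet:
        return self._by_id[identifier]

    def query_subset(self, bbox, years, subset: Callable[[DataSet], tuple]) -> Dict[str, tuple]:
        """Apply the subsetting algorithm at the repository."""
        return {i: subset(self._by_id[i]) for i in self.query(bbox, years)}


def first_two_values(data_set: DataSet) -> tuple:
    # Stand-in for a real data subsetting algorithm.
    return data_set.values[:2]


repo = ToyRepository([
    DataSet("gauge-1", 35.9, -79.0, 2012, (1.0, 2.0, 3.0)),
    DataSet("gauge-2", 47.6, -122.3, 2012, (4.0, 5.0, 6.0)),
])
bbox, years = (35.0, -80.0, 37.0, -78.0), (2010, 2013)

# Route 1: get identifiers, retrieve the data sets, subset locally.
ids = repo.query(bbox, years)
local_subset = {i: first_two_values(repo.retrieve(i)) for i in ids}

# Route 2: let the repository apply the subsetting algorithm.
remote_subset = repo.query_subset(bbox, years, first_two_values)

# Name the subset so the research workflow can refer to it.
named = {"local/hydro/subset-1": local_subset}
print(ids, local_subset == remote_subset)
```

Both routes yield the same subset; they differ in where the work happens and in how much data crosses the network, since route 1 transfers whole data sets and route 2 transfers only the subset.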

Slide 9: Interactions with collections: remote metadata catalog and remote data repository

[Figure: two models of user interaction with remote collections. Boxes in both panels: User; Data Collection; Data Sets; Remote Data repository; Local Data repository. The DataONE panel also shows a Remote MD catalog holding Related Metadata for Data Sets.]

- DataONE model: the user queries a remote metadata (MD) repository using spatial/temporal parameters. The repository sends identifiers and metadata for the files that satisfy the spatial/temporal requirements. The user retrieves the files using the identifiers.
- OPeNDAP model: the user queries a remote data repository using spatial/temporal parameters for the desired physical variables. The desired data sets are generated by the remote data repository and returned to the user.

Slide 10: Policy-Based Data Management

Policy: an assertion or assurance that is enforced about a collection or a dataset.

- Purpose: reason a collection is assembled
- Properties: attributes needed to ensure the purpose
- Policies: enforce and maintain collection properties
- Procedures: functions that implement the policies
- Persistent state information: results of applying procedures
- Property assessment criteria: validation that state information conforms to the desired purpose
- Federation: controlled sharing of logical name spaces
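The chain from property to policy, procedure, persistent state, and assessment can be made concrete with one property, integrity. The sketch below is a minimal illustration under stated assumptions: the collection is an in-memory dictionary, the function names are hypothetical, and the attribute name `DATA_CHECKSUM` is borrowed from the labels on the concept graph slide of this deck rather than read from any real catalog.

```python
import hashlib
from typing import Dict, List

# Collection: logical name -> bits (a stand-in for stored files).
collection: Dict[str, bytes] = {"a.dat": b"alpha", "b.dat": b"beta"}

# Persistent state information: results of applying procedures.
state: Dict[str, Dict[str, str]] = {}


def checksum_procedure(name: str, bits: bytes, state: Dict[str, Dict[str, str]]) -> None:
    """Procedure: compute a SHA-256 digest and record it as state."""
    state.setdefault(name, {})["DATA_CHECKSUM"] = hashlib.sha256(bits).hexdigest()


def enforce_checksum_policy(collection, state) -> None:
    """Policy: every object in the collection must have a recorded checksum."""
    for name, bits in collection.items():
        if "DATA_CHECKSUM" not in state.get(name, {}):
            checksum_procedure(name, bits, state)


def assess_integrity(collection, state) -> List[str]:
    """Assessment: names whose bits no longer match the recorded checksum."""
    return sorted(
        name for name, bits in collection.items()
        if state.get(name, {}).get("DATA_CHECKSUM") != hashlib.sha256(bits).hexdigest()
    )


enforce_checksum_policy(collection, state)
before = assess_integrity(collection, state)

collection["b.dat"] = b"bet4"          # simulate silent corruption
after = assess_integrity(collection, state)
print(before, after)
```

The assessment function never recomputes state; it only compares the current bits against what the procedure recorded, which mirrors the slide's separation of persistent state information from property assessment criteria.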

Slide 11: Policy Concept Graph

[Figure: concept graph relating the terms of policy-based data management.]

- Concept labels: Collection; Purpose; Property; Attribute; Policy; Procedure; Workflow; Function; Operation; Client Action; Periodic Assessment Criteria; Policy Enforcement Point; Persistent State Information; Digital Object
- Relationship labels: Has; HasFeature; Defines; Controls; Updates; Invokes; Chains; Isa; SubType
- Other labels: Completeness; Correctness; Consensus; Consistency; Integrity; Authenticity; Access control
- Policy examples: Replication Policy; Checksum Policy; Quota Policy; Data Type Policy
- Operation examples: GetUserACL; SetDataType; SetQuota; DataObjRepl; SysChksumDataObj
- State information examples: DATA_ID; DATA_REPL_NUM; DATA_CHECKSUM