National Science Foundation Cooperative Agreement: OCI-0940841
Reagan Moore, PI
Mary Whitton, Project Manager



Policy Topics
Policy-based Data Management
Practical Policy Working Group outcomes
– Data Center policies
Applications
– DataNet Federation Consortium analyzed 175 policies for
  Data sharing (research collaborations)
  SILS Digital library (personal collections)
  RDA Practical Policy (data centers)
  UNC-CH Protected data (secure medical workspace)
  Odum/Dataverse (archive)
  NSF data management plans (publication)
– Science Observatory Network (real-time sensor data)
– PECE/RPI (anthropology)
– NOAA NCDC (archive)

Policy-based Data Management

Summary of the Problem
Practical Policy
– Assertion or assurance that is enforced about a (data) collection (data set, digital object, file) by the creators of the collection
Computer-actionable policies are used to
– enforce data management
– automate administrative tasks
– validate compliance with assessment criteria
– automate scientific data processing and analyses
Users are motivated by issues related to scale and distribution
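One way to picture a computer-actionable policy is as a condition paired with an action that is applied to the metadata of each file. The sketch below is illustrative only: the `Policy` class, the `apply_policies` function, and the 365-day default are assumptions made for this example and are not taken from the slides or from any data grid's rule language.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Policy:
    """A hypothetical policy: a condition plus an enforcement action."""
    name: str
    applies_to: Callable[[dict], bool]   # condition on a file's metadata
    enforce: Callable[[dict], None]      # action that updates the metadata


def apply_policies(record: dict, policies: list[Policy]) -> list[str]:
    """Run every matching policy on one record and return the names applied."""
    applied = []
    for policy in policies:
        if policy.applies_to(record):
            policy.enforce(record)
            applied.append(policy.name)
    return applied


def _set_retention(record: dict) -> None:
    # Assumed default; a real collection would choose its own period.
    record["retention_days"] = 365


# Hypothetical policy: any file without a retention period gets a default one.
default_retention = Policy(
    name="default-retention",
    applies_to=lambda r: "retention_days" not in r,
    enforce=_set_retention,
)

record = {"path": "/zone/home/demo/a.csv"}
applied = apply_policies(record, [default_retention])
```

Running the same policy a second time applies nothing, because the condition no longer holds. That is the property that lets such rules be re-run periodically to validate compliance.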

Practical Policy Working Group

Practical Policy members represented
– 11 types of data management systems
– 30 institutions
– 2 testbeds
  iRODS Renaissance Computing Institute, DataNet Federation Consortium – DFC
  GPFS Institute of Physics of the Academy of Sciences, CESNET Garching Computing Centre – RZG
Published two documents
– Moore, R., R. Stotzka, C. Cacciari, P. Benedikt, “Practical Policy Templates”, February 2015
– Moore, R., R. Stotzka, C. Cacciari, P. Benedikt, “Practical Policy Implementations”, February 2015
Policy Templates

Data Center Policies
Contextual metadata extraction – Automate extraction of metadata from files
Data access control – Automate application of appropriate access controls
Data backup – Automate creation of replicas
Data format control – Automate identification of data format
Data retention – Apply a retention period
Disposition – Apply a disposition policy at end of retention period
INLS 624
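The retention and disposition policies above reduce to date arithmetic: record when a file was ingested, add the retention period, and trigger disposition once that date has passed. A minimal sketch, assuming the retention period is expressed in days; the function names are hypothetical and do not come from the Practical Policy documents.

```python
from datetime import date, timedelta


def retention_expiry(ingested_on: date, retention_days: int) -> date:
    """Return the date on which the retention period ends."""
    return ingested_on + timedelta(days=retention_days)


def disposition_due(ingested_on: date, retention_days: int, today: date) -> bool:
    """True once the retention period has ended and disposition should run."""
    return today >= retention_expiry(ingested_on, retention_days)
```

What the disposition step then does (delete, archive, or send for review) is a separate decision and is left out of the sketch.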

Data Center Policies
Integrity (including replication) – Verify integrity and replace bad copies
Notification – Manage events about changes to the collection
Restricted searching – Manage searches on collection
Storage cost reports – Generate cost report
Use agreements – Manage use agreements before data are retrieved
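The integrity policy (verify integrity and replace bad copies) can be sketched as a checksum comparison across replicas. The example below keeps replicas in memory as a dictionary of bytes purely for illustration. A real data grid would read them from storage resources, and the choice of SHA-256 is an assumption.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Checksum used to compare a replica against the registered value."""
    return hashlib.sha256(data).hexdigest()


def repair_replicas(replicas: dict[str, bytes], expected: str) -> list[str]:
    """Replace corrupted replicas from a good copy; return the names repaired."""
    good = [name for name, data in replicas.items() if sha256_hex(data) == expected]
    if not good:
        # Nothing trustworthy to copy from, so report rather than guess.
        raise ValueError("no valid replica available for repair")
    source = replicas[good[0]]
    repaired = []
    for name in list(replicas):
        if sha256_hex(replicas[name]) != expected:
            replicas[name] = source
            repaired.append(name)
    return repaired
```

The function raises an error when no replica matches the registered checksum, since overwriting copies from an unverified source would defeat the purpose of the check.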

Digital Library Management

LifeTime Library Policies
Requirements
– Enable students to create a personal digital collection
– Provide pedagogy mechanisms for experimenting with:
  Naming – File names
  Arrangement – Organization in collections
  Description – Tags and metadata
  Access controls – Sharing and publication
  Ingestion – Controlled loading of data
  Distribution – Storage locations

Student Experiences
Students invariably:
– Changed their minds about the purpose of the collection
– Changed their minds about the description
  Term definitions tended to drift over the semester
– Changed their minds about the arrangement
  Added new collections for additional types of data
Resulting collections had:
– 1,000 to 10,000 files
– 2 gigabytes to 150 gigabytes in size
– 4 to 10 metadata attributes per file

Protected Data

Protected Data Management
UNC-CH has published an administrator’s guide for the management of protected data. This includes:
– PII: Personally Identifiable Information
– PHI: Protected Health Information
– PCI: Payment Card Industry information
The question is whether each of the tasks specified in the guide can be automated as policies enforced by the data grid.
See Chapter 6 of the Policy Examples Workbook
– This specifies 51 tasks that should be managed by the administrator

Protected Data Tasks
1 Check for presence of PII on ingestion
2 Check for viruses on ingestion
3 Check passwords for required attributes
4 Encrypt data on ingestion
5 Encrypt data transfers
6 Federation - control data copies (access control)
7 Federation - manage remote data grid interactions (update rule base)
8 Federation - periodically copy data
9 Federation - manage data retrieval (update access controls)
10 Generate checksum on ingestion
11 Generate report of corrections to data sets or access controls
12 Generate report for cost (time) required to audit events
13 Generate report of types of protected assets present within a collection
14 Generate report of all security and corruption events
15 Generate report of the policies that are applied to the collections
16 List all storage systems being used
17 List persons who can access a collection

Protected Data Tasks
18 List staff by position and required training courses
19 List versions of technology that are being used
20 Maintain document on independent assessment of software
21 Maintain log of all software changes, OS upgrades
22 Maintain log of disclosures
23 Maintain password history on user name
24 Parse event trail for all accessed systems
25 Parse event trail for all persons accessing collection
26 Parse event trail for all unsuccessful attempts to access data
27 Parse event trail for changes to policies
28 Parse event trail for inactivity
29 Parse event trail for updates to rule bases
30 Parse event trail to correlate data accesses with client actions
31 Provide test environment to verify policies on new systems
32 Provide test system for evaluating a recovery procedure
33 Provide training courses for users
34 Replicate data sets on ingestion

Protected Data Tasks
35 Replicate iCAT periodically
36 Set access approval flag
37 Set access controls
38 Set access restriction until approval flag is set
39 Set approval flag per collection for enabling bulk download
40 Set asset protection classifier for data sets based on type of PII
41 Set flag for whether tickets can be used on files in a collection
42 Set lockout flag and period on user name - counting number of tries
43 Set password update flag on user name
44 Set retention period for data reviews
45 Set retention period on ingestion
46 Track systems by type (server, laptop, router, ...)
47 Verify approval flags within a collection
48 Verify files have not been corrupted
49 Verify presence of required replicas
50 Verify that no controlled data collections have public or anonymous access
51 Verify that protected assets have been encrypted
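As an illustration of how one of these tasks might be automated, the sketch below covers task 50: verifying that no controlled collection grants public or anonymous access. The access-control representation (a dictionary mapping collection names to sets of principals) and the principal names `public` and `anonymous` are assumptions made for the example, not the data grid's actual schema.

```python
# Assumed names for the principals that represent unauthenticated access.
PUBLIC_PRINCIPALS = {"public", "anonymous"}


def collections_with_open_access(
    acls: dict[str, set[str]], controlled: set[str]
) -> list[str]:
    """Return controlled collections whose ACL includes an open principal."""
    return sorted(
        name
        for name in controlled
        if acls.get(name, set()) & PUBLIC_PRINCIPALS
    )
```

An empty result means the collection set passes the check. A non-empty result lists the collections that need their access controls corrected.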

Task Automation
There are some unifying requirements across tasks:
– Checking material for PII, viruses
– Management of passwords
– Generation of log files for all actions done
– Creation of state information to track processes
– Management of encryption
– Management of access controls
– Generation of audit trails
– Parsing of events to demonstrate compliance over time
– Verification that processes were correctly applied
Many of these requirements can also be applied to digital libraries and research collaborations
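Parsing an event trail to demonstrate compliance can be sketched as a filter and a count over logged events. The event format used here (dictionaries with `user`, `action`, and `success` keys) is hypothetical. A real audit trail would have its own schema.

```python
from collections import Counter


def failed_access_counts(events: list[dict]) -> Counter:
    """Count unsuccessful access attempts per user in an event trail."""
    return Counter(
        event["user"]
        for event in events
        if event.get("action") == "access" and not event.get("success", False)
    )
```

The same pattern, with a different filter, would cover the other event-trail tasks such as listing persons who accessed a collection or detecting changes to policies.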

Preservation

Cross-Disciplinary Data Discovery and Geographically Distributed Preservation (DFC April 2013 NSF Review)

Archive Policies
The Dataverse network has about 800 gigabytes of data that may contain protected information.
An archive is needed with independent management of the material to ensure recovery in the case of a disaster.
– Digital objects and provenance metadata must be reloadable into Dataverse.
– Assessment criteria need to be evaluated to verify integrity.
– Access controls must be enforced on restricted data.
– Dataverse naming convention must be retained.
The approach is to replicate the data holdings into an iRODS data grid.

Policies
See Chapter 5 of the Policy Examples Workbook
– Odum preservation policies
Preservation tasks include:
– Staging files between Dataverse and iRODS
– Checking data for presence of protected information
– Periodic verification of integrity and replicas
– Verification of access controls
– Reports on usage statistics

NSF Data Management Plans

The National Science Foundation has mandated that every project provide a 2-page description of how data will be managed.
Each NSF directorate published guidelines on what the data management plan should include.
An analysis of 12 sets of requirements identified 38 data management tasks that could be automated.
See Chapter 7 of Policy Template Workbook

NSF DMP Requirements

NSF DMP Requirements (continued)

Science Observatory Network

Real-Time Sensor Data
Harvest sensor data from the Antelope Real Time Sensor orb.
– Manages environmental, oceanic, seismic data
– More than 3,000 sensors across the US
Register each sensor as an independent collection
– Retrieve the most recent sensor data
– Harvest sensor data periodically
– Transform to JSON, netCDF
– Provide access to archived data
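The "retrieve the most recent sensor data" and "transform to JSON" steps can be sketched with plain dictionaries. This example does not use the Antelope orb interface. The reading format (`sensor_id`, `timestamp`, `value`) is a made-up stand-in for whatever the harvester actually returns.

```python
import json


def latest_per_sensor(readings: list[dict]) -> dict[str, dict]:
    """Keep only the most recent reading for each sensor."""
    latest: dict[str, dict] = {}
    for reading in readings:
        current = latest.get(reading["sensor_id"])
        if current is None or reading["timestamp"] > current["timestamp"]:
            latest[reading["sensor_id"]] = reading
    return latest


def to_json(readings_by_sensor: dict[str, dict]) -> str:
    """Serialize the per-sensor readings as a JSON document."""
    return json.dumps(readings_by_sensor, sort_keys=True)
```

Keying the result by sensor mirrors the slide's choice to register each sensor as an independent collection.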

PECE / RPI

Collection Management Policies
– Contextual metadata extraction
– Data access control
– Data backup
– Data format control
– Data retention
– Disposition
– Integrity (including replication)
– Notification
– Restricted searching
– Storage cost reports
– Use agreements

NOAA NCDC

NOAA Climatic Data Center
Manages an archive of climate data records received from multiple sources
– Uses a staging area to
  Check input data for viruses
  Manage ingestion into a tape archive
Challenges
– Needed a way to improve security
  Eliminate direct access to storage within the NOAA firewall
– Needed a way to automate management of each file
  Verify archival storage before file is deleted
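The last requirement, verifying archival storage before a file is deleted, can be sketched as a checksum comparison between the staged file and its archived copy. This is a local-filesystem illustration under assumed paths and function names. It does not represent the NCDC workflow or the iRODS API.

```python
import hashlib
import os


def file_sha256(path: str) -> str:
    """Compute a SHA-256 checksum by streaming the file in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def delete_if_archived(staged_path: str, archived_path: str) -> bool:
    """Delete the staged file only when the archived copy exists and matches."""
    if not os.path.exists(archived_path):
        return False
    if file_sha256(staged_path) != file_sha256(archived_path):
        return False
    os.remove(staged_path)
    return True
```

The staged copy is kept whenever the archived copy is missing or differs, so a failed or partial archive write never results in data loss.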

[Figure: iRODS Secure Ingest. Labels: External Providers (FTP/FTPS); NCDC External Firewall; FTP Load Balance (ftp1 to ftp5); DMZ Landing Zone: Open for data delivery; DMZ Firewall; NCDC Internal Network (FTP PUSH/PULL); ingest1, ingest2; Disk Cache; Tape; HDSS; iRODS DMZ Grid (/DMZ, /Archive, /NR2, /NR3); iRODS NCDC Grid (/NCDC, /Ingest, /Archive, /NR2, /NR3)]
iRODS is:
– Secure authentication
– Security via Obscurity (one to bind them)
– Uses a pull mechanism to move data into NCDC grid
– A virtual management tool (clean-up)
– Scope is entire grid

Policy Examples Workbook
Policy Templates Workbook