Policy Based Data Management Environments (iRODS)
Reagan W. Moore, Arcot Rajasekar, Mike Wan, Mike Conway, Antoine de Torcy, Richard Marciano, Jewel Ward



Data Management Applications
Data grids - Share data
– RENCI regional data grid
– TeraGrid
– Australian Research Collaboration Service
Digital libraries - Publish data
– Texas Digital Library
– French National Library
Persistent archives - Preserve data
– Carolina Digital Repository
– NARA Transcontinental Persistent Archive Prototype
Data processing pipelines - Analyze data

[Diagram] Categories: Data Processing Pipelines, Preservation Environment, Digital Library, Data Grid
Systems shown: Ocean Observatories Initiative, NARA Transcontinental Persistent Archive Prototype, Carolina Digital Repository, Large Synoptic Survey Telescope, Texas Digital Library, French National Library, TeraGrid, Temporal Dynamics of Learning Center, Australian Research Collaboration Service, Taiwan National Archive

[Diagram] Cloud Storage
Institutional Repositories: Carolina Digital Repository, Texas Digital Library
Federal Repositories: National Climatic Data Center, National Optical Astronomy Observatory

Overview of iRODS Architecture
User w/ Client: can search, access, add, and manage data and metadata
Access distributed data with a Web-based browser, the iRODS GUI, or command-line clients.
iRODS Data System
– iRODS Data Server: disk, tape, etc.
– iRODS Metadata Catalog: track information
– iRODS Rule Engine: tracks policies

Shared Collection

Ocean Observatories Initiative
[Diagram] Labels: Sensors, Cloud Computing, External Repositories, Cloud Storage, Cache, Message Bus, Aggregate sensor data in cache, Supercomputer, Event Detection, Remote locations, Simulations, Digital Library, Archive, Clients, Remote Users, Data Grid, Multiple Protocols

Policy-based Data Management
integrated Rule Oriented Data System
Purpose - reason a data collection is assembled
Properties - attributes needed to ensure the purpose
Policies - controls for enforcing desired properties
Procedures - functions that implement the policies
State information - results of applying the procedures
Assessment criteria - validation that state information conforms to the desired purpose
Federation - controlled sharing of logical name spaces
These are the essential elements for policy-based data management.

Data Life Cycle
Project Collection - Private - Local Policy
Data Grid - Shared - Distribution Policy
Digital Library - Published - Description Policy
Data Processing Pipeline - Analyzed - Service Policy
Reference Collection - Preserved - Representation Policy
Federation - Sustained - Re-purposing Policy
Stages correspond to the addition of new policies for a broader community.
Virtualize the stages of the data life cycle through policy evolution.
The driving purpose changes at each stage of the data life cycle.

Infrastructure Independence
[Diagram] Layers: Access Interface, Policy Enforcement Points, Standard Micro-services, Standard I/O Operations, Storage Protocol, Storage System (label: Data Grid)
– Map from the actions requested by the client to multiple policy enforcement points.
– Map from policy to standard micro-services.
– Map from micro-services to standard POSIX I/O operations.
– Map from standard POSIX I/O operations to the protocol supported by the storage system.
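The chain of mappings above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the iRODS implementation: every name here (`MemoryFileSystem`, `ObjectStoreDriver`, `msi_store`, `msi_checksum`, `handle`, the `pre_put`/`put`/`post_put` enforcement points) is hypothetical, and SHA-256 is an arbitrary choice of checksum.

```python
import hashlib

# Two stand-in storage systems with different native protocols (hypothetical).
class MemoryFileSystem:
    """File-system-like back end: already speaks read/write by path."""
    def __init__(self):
        self._files = {}
    def write(self, path, data):
        self._files[path] = data
    def read(self, path):
        return self._files[path]

class MemoryObjectStore:
    """Object-store-like back end with its own put/get protocol."""
    def __init__(self):
        self._objects = {}
    def put_object(self, key, body):
        self._objects[key] = body
    def get_object(self, key):
        return self._objects[key]

class ObjectStoreDriver:
    """Maps the standard read/write operations onto the object-store protocol."""
    def __init__(self, store):
        self._store = store
    def write(self, path, data):
        self._store.put_object(path.lstrip("/"), data)
    def read(self, path):
        return self._store.get_object(path.lstrip("/"))

# Micro-services call only the standard operations, never a storage protocol.
def msi_store(driver, path, data, state):
    driver.write(path, data)

def msi_checksum(driver, path, data, state):
    state["checksum"] = hashlib.sha256(driver.read(path)).hexdigest()

# Policy enforcement point -> micro-services it runs (illustrative policy).
POLICIES = {"pre_put": [], "put": [msi_store], "post_put": [msi_checksum]}

# Client action -> policy enforcement points checked for that action.
ACTIONS = {"put": ["pre_put", "put", "post_put"]}

def handle(action, driver, path, data):
    """Run every enforcement point for an action; return the resulting state."""
    state = {"trace": []}
    for pep in ACTIONS[action]:
        state["trace"].append(pep)
        for microservice in POLICIES[pep]:
            microservice(driver, path, data, state)
    return state
```

Because the micro-services see only `read` and `write`, the same policy runs unchanged against either back end, for example `handle("put", MemoryFileSystem(), "/zone/f.txt", b"hello")` and `handle("put", ObjectStoreDriver(MemoryObjectStore()), "/zone/f.txt", b"hello")`.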

Lessons Learned
Scalability / Reliability / Robustness
– Maintain interactivity through appropriate catalog indexing
– Manage network outages at the client level
– Minimize memory used
Assessment / Audit trails
– Query preservation metadata to validate properties
– Map audit trails to standard events
– Track actions from policy enforcement to procedure execution
Federation / Extensibility / Ease of Use
– Replicate metadata and records to a deep archive
– Extensible architecture that enables software evolution
– Provide simplest possible mapping from policies to record series

Level of Sophistication
– How many different access mechanisms are needed?
– How many different policy enforcement points should be checked?
– How many standard functions are needed to implement preservation procedures?
– How many preservation metadata attributes are needed to track preservation properties?
– How many preservation policies are needed?

Data Grid Clients (35)

Policy Enforcement Points (71)

ACTION
acCreateUser
acDeleteUser
acGetUserbyDN
acTrashPolicy
acAclPolicy
acSetCreateConditions
acDataDeletePolicy
acRenameLocalZone
acSetRescSchemeForCreate
acRescQuotaPolicy
acSetMultiReplPerResc
acSetNumThreads
acVacuum
acSetResourceList
acSetCopyNumber
acVerifyChecksum
acCreateUserZoneCollections
acDeleteUserZoneCollections
acPurgeFiles
acRegisterData
acGetIcatResults
acSetPublicUserPolicy
acCreateDefaultCollections
acDeleteDefaultCollections

POST-ACTION POLICY
acPostProcForCreateUser
acPostProcForDeleteUser
acPostProcForModifyUser
acPostProcForModifyUserGroup
acPostProcForDelete
acPostProcForCollCreate
acPostProcForRmColl
acPostProcForModifyAVUMetadata
acPostProcForModifyCollMeta
acPostProcForModifyDataObjMeta
acPostProcForModifyAccessControl
acPostProcForOpen
acPostProcForObjRename
acPostProcForCreateResource
acPostProcForDeleteResource
acPostProcForModifyResource
acPostProcForModifyResourceGroup
acPostProcForCreateToken
acPostProcForDeleteToken
acPostProcForFilePathReg
acPostProcForGenQuery
acPostProcForPut
acPostProcForCopy
acPostProcForCreate

PRE-ACTION POLICY
acPreProcForCreateUser
acPreProcForDeleteUser
acPreProcForModifyUser
acPreProcForModifyUserGroup
acChkHostAccessControl
acPreProcForCollCreate
acPreProcForRmColl
acPreProcForModifyAVUMetadata
acPreProcForModifyCollMeta
acPreProcForModifyDataObjMeta
acPreProcForModifyAccessControl
acPreprocForDataObjOpen
acPreProcForObjRename
acPreProcForCreateResource
acPreProcForDeleteResource
acPreProcForModifyResource
acPreProcForModifyResourceGroup
acPreProcForCreateToken
acPreProcForDeleteToken
acNoChkFilePathPerm
acPreProcForGenQuery
acSetReServerNumProc
acSetVaultPathPolicy

Rules Applied at Each Policy Enforcement Point
Example: disposition procedure composed from micro-services

acPurgeFiles() {
    msiGetIcatTime(*Time,unix);
    acGetIcatResults(remove,DATA_EXPIRY < '*Time',*List);
    forEachExec(*List) {
        msiDataObjUnlink(*List,*Status);
        msiGetValByKey(*List,DATA_NAME,*D);
        msiGetValByKey(*List,COLL_NAME,*E);
        writeLine(stdout,Purged File *E/*D at *Time );
    }
}
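For readers unfamiliar with the rule language, the logic of the disposition rule can be mirrored in plain Python. This is a sketch under assumptions, not iRODS code: the catalog is modelled as a list of dictionaries, and the field names `data_expiry`, `coll_name`, and `data_name` are hypothetical stand-ins for the catalog attributes the rule queries.

```python
import time
from typing import Dict, List, Optional

def purge_expired(catalog: List[Dict], now: Optional[float] = None) -> List[str]:
    """Remove every record whose expiry time has passed; return log lines."""
    # Step 1: get the current time (the rule asks the catalog for it).
    now = time.time() if now is None else now
    # Step 2: query for records with an expiry time earlier than now.
    expired = [rec for rec in catalog if rec["data_expiry"] < now]
    log = []
    # Step 3: unlink each expired record and write one line per purge.
    for rec in expired:
        catalog.remove(rec)
        log.append(
            f"Purged File {rec['coll_name']}/{rec['data_name']} at {int(now)}"
        )
    return log
```

A call such as `purge_expired(records, now=1000)` leaves unexpired records in place and returns one "Purged File" line for each record it removed.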

iput ../src/irm.c checks 10 policy enforcement points

srbbrick14:10900:ApplyRule#116:: acChkHostAccessControl
srbbrick14:10900:GotRule#117:: acChkHostAccessControl
srbbrick14:10900:ApplyRule#118:: acSetPublicUserPolicy
srbbrick14:10900:GotRule#119:: acSetPublicUserPolicy
srbbrick14:10900:ApplyRule#120:: acAclPolicy
srbbrick14:10900:GotRule#121:: acAclPolicy
srbbrick14:10900:ApplyRule#122:: acSetRescSchemeForCreate
srbbrick14:10900:GotRule#123:: acSetRescSchemeForCreate
srbbrick14:10900:execMicroSrvc#124:: msiSetDefaultResc(demoResc,null)
srbbrick14:10900:ApplyRule#125:: acRescQuotaPolicy
srbbrick14:10900:GotRule#126:: acRescQuotaPolicy
srbbrick14:10900:execMicroSrvc#127:: msiSetRescQuotaPolicy(off)
srbbrick14:10900:ApplyRule#128:: acSetVaultPathPolicy
srbbrick14:10900:GotRule#129:: acSetVaultPathPolicy
srbbrick14:10900:execMicroSrvc#130:: msiSetGraftPathScheme(no,1)
srbbrick14:10900:ApplyRule#131:: acPreProcForModifyDataObjMeta
srbbrick14:10900:GotRule#132:: acPreProcForModifyDataObjMeta
srbbrick14:10900:ApplyRule#133:: acPostProcForModifyDataObjMeta
srbbrick14:10900:GotRule#134:: acPostProcForModifyDataObjMeta
srbbrick14:10900:ApplyRule#135:: acPostProcForCreate
srbbrick14:10900:GotRule#136:: acPostProcForCreate
srbbrick14:10900:ApplyRule#137:: acPostProcForPut
srbbrick14:10900:GotRule#138:: acPostProcForPut
srbbrick14:10900:GotRule#139:: acPostProcForPut
srbbrick14:10900:GotRule#140:: acPostProcForPut
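A trace like this one can be summarized with a short script. The sketch below assumes each entry has the shape `host:pid:Event#N:: target`, as in the excerpt; the function name `summarize` and the regular expression are illustrative and not part of iRODS. `SAMPLE_TRACE` holds five entries copied from the trace.

```python
import re

# Assumed entry shape, taken from the excerpt: host:pid:Event#seq:: target
ENTRY = re.compile(
    r"^(?P<host>[^:]+):(?P<pid>\d+):(?P<event>\w+)#(?P<seq>\d+)::\s*(?P<target>.+)$"
)

def summarize(lines):
    """Return (distinct rules applied, micro-services executed), in order."""
    rules, microservices = [], []
    for line in lines:
        match = ENTRY.match(line.strip())
        if match is None:
            continue  # skip anything that is not a trace entry
        event, target = match.group("event"), match.group("target")
        if event == "ApplyRule" and target not in rules:
            rules.append(target)
        elif event == "execMicroSrvc":
            microservices.append(target)
    return rules, microservices

SAMPLE_TRACE = [
    "srbbrick14:10900:ApplyRule#116:: acChkHostAccessControl",
    "srbbrick14:10900:GotRule#117:: acChkHostAccessControl",
    "srbbrick14:10900:ApplyRule#122:: acSetRescSchemeForCreate",
    "srbbrick14:10900:GotRule#123:: acSetRescSchemeForCreate",
    "srbbrick14:10900:execMicroSrvc#124:: msiSetDefaultResc(demoResc,null)",
]
```

Counting distinct `ApplyRule` targets is one way to check how many policy enforcement points a single client command touches.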

Micro-services - How many are needed?

Micro-services (229)

State Information Attributes - How Many?

State Information Attributes (205)

Management Policies
Arrangement
– Organize records in series, and manage policies on each series
Authenticity
– For every record, record provenance metadata
Chain of custody
– For every record, manage an audit trail
Integrity
– For every record, manage two replicas and verify checksums
– For every record, enforce retention and disposition
Trustworthiness
– For each series, validate ISO MOIMS-rac assessment criteria
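The integrity policy (two replicas, verified checksums) can be expressed as a small assessment function. This is a minimal sketch, assuming replicas are available as byte strings keyed by a hypothetical resource name and that SHA-256 is the checksum; the slides do not say which algorithm is used.

```python
import hashlib
from typing import Dict

def checksum(data: bytes) -> str:
    """Checksum used for verification (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()

def verify_integrity(replicas: Dict[str, bytes], expected: str, required: int = 2) -> dict:
    """Check each replica against the registered checksum.

    The record is compliant when at least `required` replicas verify.
    """
    valid = sorted(name for name, data in replicas.items() if checksum(data) == expected)
    corrupt = sorted(set(replicas) - set(valid))
    return {
        "valid_replicas": valid,
        "corrupt_replicas": corrupt,
        "compliant": len(valid) >= required,
    }
```

The returned dictionary plays the role of state information: an assessment step can query it later instead of re-reading the data.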

Create and Apply Management Policies
Maintain a repository of rules in iRODS
– Rules characterized as XML files with input parameters explicitly characterized as rule parameters
Build policy template and save in policy repository
– Select which rule sets will be applied
Map policy template to a record series (collection)
– Add policy parameters as metadata on the record series
On ingestion of a file into the record series:
– Retrieve policies and parameters from the record series metadata
– Invoke associated rules and apply preservation procedures
This decouples the choice of access interface from the execution of your preservation policies.
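The workflow above can be sketched as follows. Everything in this example is hypothetical: the rule repository is a Python dictionary rather than XML files, and the names `RecordSeries`, `apply_template`, `ingest`, and the `policy.rules` / `policy.param.*` metadata keys are invented for illustration.

```python
from typing import Callable, Dict, List

# Rule repository: rule name -> procedure taking (record, parameters).
def rule_set_retention(record: dict, params: Dict[str, str]) -> None:
    record["retain_until"] = record["ingested"] + int(params["retention_days"])

def rule_require_replicas(record: dict, params: Dict[str, str]) -> None:
    record["replicas_required"] = int(params["replica_count"])

RULE_REPOSITORY: Dict[str, Callable[[dict, Dict[str, str]], None]] = {
    "retention": rule_set_retention,
    "replication": rule_require_replicas,
}

class RecordSeries:
    """A collection whose metadata carries its policies and parameters."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.metadata: Dict[str, str] = {}
        self.records: List[dict] = []

def apply_template(series: RecordSeries, rule_names: List[str], params: dict) -> None:
    """Map a policy template to a series by writing it into series metadata."""
    series.metadata["policy.rules"] = ",".join(rule_names)
    for key, value in params.items():
        series.metadata[f"policy.param.{key}"] = str(value)

def ingest(series: RecordSeries, record: dict) -> dict:
    """On ingestion, read policies from series metadata and invoke the rules."""
    prefix = "policy.param."
    names = [n for n in series.metadata.get("policy.rules", "").split(",") if n]
    params = {k[len(prefix):]: v for k, v in series.metadata.items() if k.startswith(prefix)}
    for name in names:
        RULE_REPOSITORY[name](record, params)
    series.records.append(record)
    return record
```

Since `ingest` reads everything it needs from the series metadata, any access interface that triggers it applies the same policies, which is the decoupling the slide describes.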

iRODS Distributed Data Management

Supported Storage Systems
File systems - Windows, Linux, Mac
Tape archives - HPSS, SAM-QFS
Repositories - Flickr, Web sites
Cloud storage - Amazon S3, EC2
Relational database - PostgreSQL, Oracle, MySQL
Under development - Table driven resources

Modular Architecture
Authentication
– GSI (PKI), Kerberos, Shibboleth, Challenge-response
Authorization
– Roles, user groups, resource groups, policy constraints, ACLs
Transport
– TCP/IP (parallel I/O streams), Reliable Blast UDP
Metadata catalog
– PostgreSQL, MySQL, Oracle
Distributed rule engine
– Scheduler, messaging system, execution engine, rule base

Data Management Environments
Production Systems - archives
– Carolina Digital Repository
– NARA Transcontinental Persistent Archive Prototype
Production Systems - digital libraries
– French National Library
– Texas Digital Library
Production Systems - data grids
– Australian Research Collaboration Service
– French National Institute for Nuclear Physics and Particle Physics
– NSF iPlant Collaborative
– NSF TeraGrid
– RENCI regional data grid
– NSF Temporal Dynamics of Learning Center
– National Optical Astronomy Observatories

iRODS is a "coordinated NSF/OCI-Nat'l Archives research activity" under the auspices of the President's NITRD Program and is identified as among the priorities underlying the President's 2011 Budget Supplement in the area of Human Computer Interaction and Information Management technology research.
Reagan W. Moore
NSF OCI “NARA Transcontinental Persistent Archives Prototype”
NSF SDCI “Data Grids for Community Driven Applications”