Published by Kaylee Kent. Modified over 11 years ago.
1
© 2006 Open Grid Forum
Preservation Environments Research Group
Rule-based preservation
2
OGF IPR Policies Apply
I acknowledge that participation in this meeting is subject to the OGF Intellectual Property Policy.
Intellectual Property Notices
Note Well: All statements related to the activities of the OGF and addressed to the OGF are subject to all provisions of Appendix B of GFD-C.1, which grants to the OGF and its participants certain licenses and rights in such statements. Such statements include verbal statements in OGF meetings, as well as written and electronic communications made at any time or place, which are addressed to:
  the OGF plenary session,
  any OGF working group or portion thereof,
  the OGF Board of Directors, the GFSG, or any member thereof on behalf of the OGF,
  the ADCOM, or any member thereof on behalf of the ADCOM,
  any OGF mailing list, including any group list, or any other list functioning under OGF auspices,
  the OGF Editor or the document authoring and review process.
Statements made outside of an OGF meeting, mailing list or other function, that are clearly not intended to be input to an OGF activity, group or function, are not subject to these provisions.
Excerpt from Appendix B of GFD-C.1: Where the OGF knows of rights, or claimed rights, the OGF secretariat shall attempt to obtain from the claimant of such rights, a written assurance that upon approval by the GFSG of the relevant OGF document(s), any party will be able to obtain the right to implement, use and distribute the technology or works when implementing, using or distributing technology based upon the specific specification(s) under openly specified, reasonable, non-discriminatory terms. The working group or research group proposing the use of the technology with respect to which the proprietary rights are claimed may assist the OGF secretariat in this effort. The results of this procedure shall not affect advancement of document, except that the GFSG may defer approval where a delay may facilitate the obtaining of such assurances.
The results will, however, be recorded by the OGF Secretariat, and made available. The GFSG may also direct that a summary of the results be included in any GFD published containing the specification. OGF Intellectual Property Policies are adapted from the IETF Intellectual Property Policies that support the Internet Standards Process.
3
OGF19 Preservation Environments Research Group
Organizers: Reagan Moore (moore@sdsc.edu), "Bruce.Barkstrom"
Goals:
  Analyze capabilities required by a preservation environment
  Define rule-based preservation environment - iRODS
  NARA Electronic Records Archive capability requirements
  RLG/NARA assessment criteria for a Trusted Digital Repository
  Barkstrom GGF paper - based on NASA Langley preservation model
  Demonstrate creation of a preservation environment based on data grid technology
  Demonstrate creation of preservation rules controlling a preservation environment
  Demonstrate replication of a collection
  Demonstrate federation of 17 SRB data grids (shared name spaces)
  Analyze capabilities that can be based on grid technology
  iRODS rule-oriented data system
Participants:
  19 contributors to data grid federation for GIN
  MIT - PLEDGE project on preservation policies
  SDSC - NARA research prototype persistent archive
  U Md - Producer Archive Workflow Network
  EU CASPAR, PLANETS; UK Digital Curation Centre
4
Managing Preservation Environments
iRODS - integrated Rule-Oriented Data System
5
Community-specific Management
  Assessment criteria --> set of persistent state information
  Management policies --> set of rules
  Capabilities --> set of micro-services
The challenge is to define the appropriate sets of rules and micro-services that simplify the creation of the preferred preservation capabilities and the associated management policies.
6
Preservation Management Policies
Authenticity
  Validate assertions made at the time of data ingestion
  Validate existence of the descriptive (provenance) metadata
  Validate that the retention policy is consistent with the submission agreement
Integrity
  Maintain information about the management of the data
  Assertions made by the archivist
  Access controls, audit trails, checksums, replication, synchronization, federation
Infrastructure independence
  Manage properties of records independently of the choice of storage system
Scalability
  Manage large collections (billions of records, petabytes of data, thousands of attributes)
  Aggregations across name spaces
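The authenticity and integrity checks listed above reduce to comparisons against stored state. The Python sketch below is a minimal illustration, not iRODS code: the `Record` fields, the required provenance attributes (`creator`, `ingest_date`), the use of SHA-256, and the reading of "consistent with the submission agreement" as "retention at least as long as agreed" are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical provenance attributes; a real archive would take these from its
# submission agreement or assessment criteria.
REQUIRED_PROVENANCE = ("creator", "ingest_date")


@dataclass
class Record:
    """Illustrative record model, not an iRODS schema."""
    content: bytes
    registered_checksum: str                      # checksum recorded at ingestion
    metadata: dict = field(default_factory=dict)  # descriptive (provenance) metadata
    retention_years: int = 0


def validate(record: Record, agreement_retention_years: int) -> list:
    """Return the list of failed assertions; an empty list means all passed."""
    failures = []
    # Integrity: recompute the checksum and compare it with the registered value.
    if hashlib.sha256(record.content).hexdigest() != record.registered_checksum:
        failures.append("checksum mismatch")
    # Authenticity: the descriptive (provenance) metadata must exist.
    for attr in REQUIRED_PROVENANCE:
        if attr not in record.metadata:
            failures.append(f"missing provenance attribute: {attr}")
    # Authenticity: retention must be at least what the submission agreement
    # requires (our assumed reading of "consistent with").
    if record.retention_years < agreement_retention_years:
        failures.append("retention shorter than submission agreement")
    return failures


data = b"record body"
rec = Record(data, hashlib.sha256(data).hexdigest(), {"creator": "archivist"}, 10)
print(validate(rec, 5))  # ['missing provenance attribute: ingest_date']
```

Each failure is reported rather than raised, so one pass can audit a whole collection and store the results as persistent state.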
7
iRODS
  Separate the definition of management policies (rules) from the definition of remote operations (micro-services)
  Control execution of all micro-services through application of rules
  Manage persistent state information for the results
  Query the persistent state information to validate assertions on preservation properties
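A toy Python model of this separation, assuming nothing about the real iRODS internals: micro-services are registered functions, a rule is an ordered list of micro-service names bound to an event, every execution is recorded as persistent state, and an assertion is checked by querying that state. The event name `onPut` and the micro-service names `computeChecksum` and `replicate` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Capabilities: micro-services registered by name (names here are hypothetical).
MICRO_SERVICES: Dict[str, Callable[[dict], None]] = {}


def micro_service(name: str):
    """Decorator that registers a function as a named micro-service."""
    def register(fn):
        MICRO_SERVICES[name] = fn
        return fn
    return register


@micro_service("computeChecksum")
def compute_checksum(obj: dict) -> None:
    obj["checksum"] = sum(obj["data"]) % 65536  # toy checksum, for illustration only


@micro_service("replicate")
def replicate(obj: dict) -> None:
    obj["replicas"] = obj.get("replicas", 1) + 1


# Management policy: an event maps to the ordered micro-services it triggers.
RULES: Dict[str, List[str]] = {"onPut": ["computeChecksum", "replicate"]}

# Persistent state information: (object path, micro-service, outcome).
STATE: List[Tuple[str, str, str]] = []


def fire(event: str, obj: dict) -> None:
    """Run the micro-services named by the rule for `event`, recording each outcome."""
    for name in RULES.get(event, []):
        try:
            MICRO_SERVICES[name](obj)
            STATE.append((obj["path"], name, "ok"))
        except Exception as exc:
            STATE.append((obj["path"], name, f"failed: {exc}"))
            raise


def is_replicated(path: str) -> bool:
    """Validate an assertion by querying the recorded state, not the data itself."""
    return any(p == path and ms == "replicate" and status == "ok"
               for p, ms, status in STATE)


obj = {"path": "/tempZone/home/rods/foo1", "data": b"abc"}
fire("onPut", obj)
print(obj["replicas"], is_replicated(obj["path"]))  # 2 True
```

Changing policy here means editing `RULES` only; the micro-service functions stay untouched, which is the point of keeping the two definitions apart.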
8
iRODS - integrated Rule-Oriented Data System
[Figure: iRODS architecture. Components shown: Client Interface, Admin Interface, Service Manager, Rule Invoker, Rule Engine with Rule Base and Current State, Rule Modifier Module, Config Modifier Module (Confs), Metadata Modifier Module, Consistency Check Modules, Micro Service Modules (resource-based services and metadata-based services), Metadata Persistent Repository, Resources.]
9
Managing Preservation Policies
Require at least six name spaces for managing identity
  Logical storage resource name space
  Logical user name space
  Logical file name space
  Logical rule name space
  Logical micro-service name space
  Logical persistent state name space
Require ability to federate name spaces
  Cross-register identity of object from each of the name spaces
Require multiple levels of aggregation for each name space
  Typically three levels of aggregation
Trust virtualization
  Ownership of the collection entities by the data grid
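One way to picture the six name spaces and cross-registration is sketched below in Python. The `DataGrid` class, the binding values, and the `name#zone` qualification scheme are illustrative assumptions, not the SRB or iRODS federation mechanism.

```python
from dataclasses import dataclass, field

# The six logical name spaces listed on the slide.
NAME_SPACES = ("resource", "user", "file", "rule", "micro_service", "state")


@dataclass
class DataGrid:
    zone: str
    # Each name space maps a logical name to whatever the grid binds it to
    # (physical path, account record, rule text, ...). The grid owns these bindings.
    spaces: dict = field(default_factory=lambda: {ns: {} for ns in NAME_SPACES})

    def register(self, space: str, logical_name: str, binding) -> None:
        self.spaces[space][logical_name] = binding

    def cross_register(self, other: "DataGrid", space: str, logical_name: str) -> str:
        """Federation sketch: expose another grid's entry under a zone-qualified name."""
        qualified = f"{logical_name}#{other.zone}"  # hypothetical naming scheme
        self.spaces[space][qualified] = other.spaces[space][logical_name]
        return qualified


grid_a, grid_b = DataGrid("zoneA"), DataGrid("zoneB")
grid_b.register("user", "alice", {"institution": "SDSC", "type": "curator"})
name = grid_a.cross_register(grid_b, "user", "alice")
print(name, grid_a.spaces["user"][name]["type"])  # alice#zoneB curator
```

Qualifying the name by zone keeps a federated identity from colliding with a local one of the same name.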
10
Metadata Attributes
Associate state information with each name space
  User name
    Address, institution
    Group membership
    Type (administrator, curator, owner, public)
  Logical file name
    System attributes: location, size, owner, checksum, container, …
    User-defined attributes: descriptive information
  Logical resource name
    Type of system
    Quotas
11
Federation Between Data Grids
[Figure: two data grids, each with its own logical resource, user, file, rule, micro-service, and persistent state name spaces, holding Data Collection A and Data Collection B respectively. Data access methods shown: Web Browser, DSpace, OAI-PMH.]
12
Aggregation of Identifiers
  Users {single user, group, federation}
  Resources {single storage system, cached system, cluster}
  Files {single file, container, directory}
  Metadata {single attribute, hierarchical table, collection}
  Management policies {single capability, set of capabilities, nested rules}
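Aggregation means an identifier at any level has to resolve to its members. A small recursive Python sketch for the user name space (federation, group, single user); the `AGGREGATES` table and its names are hypothetical, and the same expansion would apply to files in containers and directories or to nested rules.

```python
# Hypothetical aggregation table: each aggregate names its members, which may
# themselves be aggregates (federation -> groups -> single users).
AGGREGATES = {
    "gin": {"curators", "public"},
    "curators": {"alice", "bob"},
    "public": {"carol"},
}


def expand(name, seen=None):
    """Resolve an identifier at any aggregation level to its single members."""
    seen = set() if seen is None else seen
    if name in seen:            # guard against cyclic definitions
        return set()
    seen.add(name)
    if name not in AGGREGATES:  # not an aggregate: it is a single identifier
        return {name}
    members = set()
    for child in AGGREGATES[name]:
        members |= expand(child, seen)
    return members


print(sorted(expand("gin")))  # ['alice', 'bob', 'carol']
```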
13
Demonstration of Rules
Rule specified in four parts
  Single line, parts separated by the symbol |
  Name
  Conditions
  Function calls
  Recovery calls
Support multiple functions, separated by the symbol ##
Currently 15 rules
  Administrative
  Storage selection
  Data pre-processing
  Data post-processing
  Data deletion
  Parallel I/O
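A Python sketch of a parser for the four-part, single-line rule format, assuming `|` and `##` never appear inside conditions or arguments. It also enforces the convention, stated in the core.irb header comments on the demonstration slides, that the number of recovery calls matches the number of calls. The irule input example later in the deck has only three parts, so this strict sketch would reject it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    name: str
    condition: str        # empty string means "always applies"
    calls: List[str]
    recoveries: List[str]


def parse_rule(line: str) -> Rule:
    """Parse one rule line of the form: name | condition | calls | recoveries."""
    parts = [p.strip() for p in line.split("|")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 '|'-separated parts, got {len(parts)}")
    name, condition, calls, recoveries = parts
    call_list = [c.strip() for c in calls.split("##")]
    recovery_list = [r.strip() for r in recoveries.split("##")]
    # Each call needs a matching recovery call, by position.
    if len(call_list) != len(recovery_list):
        raise ValueError(f"{name}: {len(call_list)} calls but {len(recovery_list)} recoveries")
    return Rule(name, condition, call_list, recovery_list)


rule = parse_rule("acCreateUser | | msiCreateUser ##acCreateDefaultCollections "
                  "##msiCommit | msiRollback ##msiRollback##nop")
print(rule.calls)       # ['msiCreateUser', 'acCreateDefaultCollections', 'msiCommit']
print(rule.recoveries)  # ['msiRollback', 'msiRollback', 'nop']
```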
14
Rule-based Data Management
Associate rules with combinations of name spaces
  Rule set for a particular collection
  Rule set for a particular user group
  Rule set for a particular user group when accessing a particular collection
  Rule set for a particular storage system
  Rule set for a particular micro-service
  Generic rules based on SRB operations
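The slide does not say how an engine chooses among rule sets scoped to different combinations of name spaces. The Python sketch assumes one plausible policy: the matching rule set that constrains the most name spaces wins, with the generic set as the fallback. The scopes and rule-set labels are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical rule sets, each scoped by zero or more name-space values.
RULE_SETS: List[Tuple[Dict[str, str], str]] = [
    ({}, "generic rules"),
    ({"collection": "/tempZone/home/rods/nvo"}, "nvo collection rules"),
    ({"group": "curators"}, "curator group rules"),
    ({"group": "curators", "collection": "/tempZone/home/rods/nvo"}, "curators-on-nvo rules"),
    ({"resource": "demoResc"}, "demoResc storage rules"),
]


def select_rule_set(context: Dict[str, str]) -> str:
    """Return the matching rule set that constrains the most name spaces.

    The empty-scope generic set always matches; ties go to the earlier entry.
    """
    matching = [(scope, rules) for scope, rules in RULE_SETS
                if all(context.get(key) == value for key, value in scope.items())]
    return max(matching, key=lambda entry: len(entry[0]))[1]


print(select_rule_set({"group": "curators", "collection": "/tempZone/home/rods/nvo"}))
# curators-on-nvo rules
```

Breaking ties by table order is arbitrary; a real deployment would need an explicit precedence between, for example, group-scoped and storage-scoped rules.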
15
Administration Creation Rules
acCreateUser | | msiCreateUser ##acCreateDefaultCollections ##msiCommit | msiRollback ##msiRollback##nop
acVacuum(*arg1) | | delayExec(msiVacuum,*arg1) | nop
acCreateDefaultCollections | | acCreateUserZoneCollections | nop
acCreateUserZoneCollections | | acCreateCollByAdmin(/$rodsZoneProxy/home,$otherUserName) ##acCreateCollByAdmin(/$rodsZoneProxy/trash/home,$otherUserName) | nop##nop
acCreateCollByAdmin(*parColl,*childColl) | | msiCreateCollByAdmin(*parColl,*childColl) | nop
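These creation rules call one another: `acCreateUser` invokes `acCreateDefaultCollections`, which invokes `acCreateUserZoneCollections`, and so on. The Python sketch walks that call tree and applies the recovery convention stated in the core.irb header comments, where a failure of the i-th call triggers the i-th recovery call. Re-raising the failure to the calling rule, ignoring argument binding, and treating every non-rule name as a micro-service are assumptions of this sketch, not documented iRODS behavior.

```python
from typing import Dict, List, Set, Tuple

# Creation rules from the slide, as name -> (calls, recovery calls).
RULE_BASE: Dict[str, Tuple[List[str], List[str]]] = {
    "acCreateUser": (["msiCreateUser", "acCreateDefaultCollections", "msiCommit"],
                     ["msiRollback", "msiRollback", "nop"]),
    "acCreateDefaultCollections": (["acCreateUserZoneCollections"], ["nop"]),
    "acCreateUserZoneCollections": (
        ["acCreateCollByAdmin(/$rodsZoneProxy/home,$otherUserName)",
         "acCreateCollByAdmin(/$rodsZoneProxy/trash/home,$otherUserName)"],
        ["nop", "nop"]),
    "acCreateCollByAdmin": (["msiCreateCollByAdmin(*parColl,*childColl)"], ["nop"]),
}


class MicroServiceError(Exception):
    pass


def execute(rule: str, failing: Set[str], log: List[str]) -> None:
    """Run a rule's calls in order; if call i fails, run recovery call i and re-raise.

    Calls naming another rule are expanded recursively; anything else is treated
    as a micro-service that succeeds unless its name is in `failing` (a test hook).
    """
    calls, recoveries = RULE_BASE[rule]
    for call, recovery in zip(calls, recoveries):
        name = call.split("(", 1)[0]  # drop the argument list
        try:
            if name in RULE_BASE:
                execute(name, failing, log)
            elif name in failing:
                raise MicroServiceError(name)
            else:
                log.append(name)
        except MicroServiceError:
            log.append(f"recovery:{recovery}")
            raise  # assumption: the failure propagates to the calling rule


log: List[str] = []
execute("acCreateUser", set(), log)
print(log)  # ['msiCreateUser', 'msiCreateCollByAdmin', 'msiCreateCollByAdmin', 'msiCommit']
```

Passing `{"msiCreateCollByAdmin"}` as `failing` makes the first collection creation fail; the log then records `recovery:nop` from each of the three nested rules, followed by `recovery:msiRollback` from `acCreateUser`.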
16
Administration Deletion Rules
acDeleteUser | | acDeleteDefaultCollections ##msiDeleteUser## msiCommit | msiRollback##msiRollback##nop
acDeleteDefaultCollections | | acDeleteUserZoneCollections | nop
acDeleteUserZoneCollections | | acDeleteCollByAdmin(/$rodsZoneProxy/home,$otherUserName) ##acDeleteCollByAdmin(/$rodsZoneProxy/trash/home,$otherUserName) | nop##nop
acDeleteCollByAdmin(*parColl,*childColl) | | msiDeleteCollByAdmin(*parColl,*childColl) | nop
17
Data Manipulation Rules
Rule for storage resource selection when data is created
acSetRescSchemeForCreate | | msiSetDefaultResc(demoResc,noForce) ##msiSetRescSortScheme(random) ##msiSetRescSortScheme(byRescType) | nop##nop##nop
Rule for pre-processing on data reads
acPreprocForDataObjOpen | | msiSortDataObj(random) | nop
Rules for post-processing data writes
acPostProcForPut | | nop | nop
acPostProcForCopy | | nop | nop
Rule for setting the number of threads for parallel I/O
acSetNumThreads | | msiSetNumThreads(default,default,default) | nop
Rule for setting the data deletion policy
acDataDeletePolicy | | nop | nop
18
iRODS Demonstration
Demonstrate generic put command
ilsresc
ils -l nvo
iput -R demoResc ../src/icd.c nvo
ils -l nvo
Revise put command to automatically create a replica
cp core.irb.1 ../../../server/config/reConfigs/core.irb
ils -l nvo
iput -R demoResc ../src/ipwd.c nvo
ils -l nvo
Illustrate execution of a user-defined rule
icd
iput carl.ged foo1
irule -vF ruleInp3
19
iRODS Demonstration
# iRODS Rule Base - core.irb
# Each rule consists of four parts separated by |
# The four parts are: name, conditions, function calls, and recovery.
# The calls and recoveries can be multiple ones, separated by ##.
# For each rule, the number of recovery calls should match the calls;
# for example, if the 2nd call fails, the 2nd recovery call is made.
#
acPreprocForDataObjOpen | | msiSortDataObj(random) | nop
acSetRescSchemeForCreate | | msiSetDefaultResc(demo2Resc,noForce) ##msiSetRescSortScheme(random) ##msiSetRescSortScheme(byRescType) | nop##nop##nop
acDataDeletePolicy | | nop | nop
acPostProcForPut | | nop | nop
21
iRODS Demonstration
# iRODS Rule Base
# Each rule consists of four parts separated by |
# The four parts are: name, conditions, function calls, and recovery.
# The calls and recoveries can be multiple ones, separated by ##.
# For each rule, the number of recovery calls should match the calls;
# for example, if the 2nd call fails, the 2nd recovery call is made.
#
acPreprocForDataObjOpen | | msiSortDataObj(random) | nop
acSetRescSchemeForCreate | | msiSetDefaultResc(demo2Resc,noForce) ##msiSetRescSortScheme(random) ##msiSetRescSortScheme(byRescType) | nop##nop##nop
acDataDeletePolicy | | nop | nop
acPostProcForPut | $objPath like /tempZone/home/rods/nvo/* | msiSysReplDataObj(nvoReplResc) | nop
acPostProcForPut | | nop | nop
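The revised rule base lists two `acPostProcForPut` rules. Assuming the engine tries same-named rules in file order and uses the first whose condition holds, the Python sketch shows why only puts into the nvo collection trigger replication. `fnmatchcase` is a stand-in for the rule language's `like` operator, whose exact wildcard semantics may differ.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple

# The two acPostProcForPut rules from the revised rule base, in file order:
# (condition pattern, or None for an empty condition; function call).
POST_PUT_RULES: List[Tuple[Optional[str], str]] = [
    ("/tempZone/home/rods/nvo/*", "msiSysReplDataObj(nvoReplResc)"),
    (None, "nop"),
]


def post_proc_for_put(obj_path: str) -> str:
    """Return the call of the first acPostProcForPut rule whose condition holds."""
    for pattern, call in POST_PUT_RULES:
        # fnmatchcase's `*` also matches '/', so sub-collections of nvo match too.
        if pattern is None or fnmatchcase(obj_path, pattern):
            return call
    return "nop"


print(post_proc_for_put("/tempZone/home/rods/nvo/icd.c"))  # msiSysReplDataObj(nvoReplResc)
print(post_proc_for_put("/tempZone/home/rods/foo1"))       # nop
```

Placing the unconditional rule last is what makes it a default; in the reverse order it would shadow the replication rule.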
22
iRODS Demonstration
# This is an example of an input for the irule command.
# The first input line is the rule body.
# The second input line is the input parameter in the format of label=value.
# Multiple inputs can be specified using the '%' character as the separator.
# The third input line is the output description. For multiple outputs use '%' as the separator.
myTestRule | | msiDataObjOpen(*A,*S_FD) ##msiDataObjCreate(*B,null,*D1_FD) ##msiDataObjRead(*S_FD,100,*R1_BUF) ##msiDataObjWrite(*D1_FD,*R1_BUF,*W1_LEN) ##msiDataObjClose(*D1_FD,*junk2) ##msiDataObjCreate(*C,null,*D2_FD) ##msiDataObjRead(*S_FD,50000,*R2_BUF) ##msiDataObjWrite(*D2_FD,*R2_BUF,*W2_LEN) ##msiDataObjClose(*D2_FD,*junk3) ##msiDataObjClose(*S_FD,*junk4)
*A=/tempZone/home/rods/foo1%*B=/tempZone/home/rods/foo2%*C=/tempZone/home/rods/foo3
*R1_BUF%*W2_LEN%*A
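A Python sketch of how the three-line irule input file splits apart, assuming lines starting with `#` are comments (as in the example) and that values never contain `%`. The sample uses a shortened rule body, and `parse_irule_input` is a hypothetical helper, not part of the irule client.

```python
from typing import Dict, List, Tuple


def parse_irule_input(text: str) -> Tuple[str, Dict[str, str], List[str]]:
    """Split an irule input file into (rule body, inputs, output names).

    Comment lines start with '#'. Line 1 is the rule body, line 2 holds
    label=value inputs separated by '%', line 3 holds outputs separated by '%'.
    """
    lines = [ln.strip() for ln in text.splitlines()
             if ln.strip() and not ln.strip().startswith("#")]
    if len(lines) != 3:
        raise ValueError(f"expected 3 non-comment lines, got {len(lines)}")
    body, input_line, output_line = lines
    inputs = {}
    for pair in input_line.split("%"):
        label, _, value = pair.partition("=")  # split on the first '=' only
        inputs[label] = value
    return body, inputs, output_line.split("%")


sample = """# irule input
myTestRule | | msiDataObjOpen(*A,*S_FD) ##msiDataObjClose(*S_FD,*junk4)
*A=/tempZone/home/rods/foo1%*B=/tempZone/home/rods/foo2
*R1_BUF%*W2_LEN%*A
"""
body, inputs, outputs = parse_irule_input(sample)
print(inputs["*B"], outputs)  # /tempZone/home/rods/foo2 ['*R1_BUF', '*W2_LEN', '*A']
```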
23
iRODS Demonstration
Add and query metadata
imeta add -d foo1 speed 100 "mph"
imeta add -d foo1 length 200 "ft"
imeta add -d foo2 speed 300 "mph"
imeta add -d foo3 length 400 "ft"
imeta ls -d foo1
imeta qu -d speed = 100
imeta qu -d speed ">=" 100
imeta qu -d length ">=" 100
Copy metadata
imeta ls -d foo1
imeta ls -d foo3
imeta cp -d -d foo1 foo3
imeta ls -d foo3
24
iRODS Demonstration
Copy metadata attributes on a file to a collection
imeta ls -C /tempZone/home/rods
imeta cp -d -C foo1 /tempZone/home/rods
imeta ls -C /tempZone/home/rods
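The imeta demonstration attaches attribute-value-unit triples to files and collections, queries them, and copies them. The in-memory Python model below mirrors those steps; it is not the iRODS catalog, and comparing values numerically is an assumption (a real catalog may compare values as strings unless told otherwise).

```python
import operator
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# Hypothetical in-memory catalog: object or collection name -> list of
# (attribute, value, unit) triples, mirroring the imeta commands.
CATALOG: DefaultDict[str, List[Tuple[str, str, str]]] = defaultdict(list)

OPS = {"=": operator.eq, ">=": operator.ge, "<=": operator.le,
       ">": operator.gt, "<": operator.lt}


def add(obj: str, attr: str, value, unit: str = "") -> None:
    CATALOG[obj].append((attr, str(value), unit))


def query(attr: str, op: str, value) -> List[str]:
    """Names whose `attr` satisfies `op value`, compared numerically (an assumption)."""
    return sorted(name for name, avus in CATALOG.items()
                  if any(a == attr and OPS[op](float(v), float(value))
                         for a, v, _ in avus))


def copy(src: str, dst: str) -> None:
    """Copy every triple from src to dst, skipping ones dst already has."""
    for avu in list(CATALOG[src]):
        if avu not in CATALOG[dst]:
            CATALOG[dst].append(avu)


add("foo1", "speed", 100, "mph")
add("foo1", "length", 200, "ft")
add("foo2", "speed", 300, "mph")
add("foo3", "length", 400, "ft")
print(query("speed", ">=", 100))  # ['foo1', 'foo2']
copy("foo1", "foo3")
copy("foo1", "/tempZone/home/rods")  # file-to-collection copy, as on the slide
```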
25
Preservation Environments
Working group task
  Define the sets of
    Assertions --> set of persistent state
    Management policies --> set of rules
    Capabilities --> set of micro-services
  Solicit groups willing to contribute to development of rule-based technology
    CASPAR
    PLANETS
    NARA
    UK e-Science data grid
    IN2P3
    ARROW
26
Preservation Interoperability
  Preserve rules as a property of each record
  Register versions of micro-services used to manipulate each record
  Register versions of persistent state information associated with each record
  When a record is migrated to a new preservation environment, migrate its rules, micro-services, and persistent state information with it
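Carrying rules, micro-service versions, and persistent state with each record can be sketched as a serializable bundle. In this Python illustration the JSON format, the `PreservedRecord` fields, and the version strings are assumptions; the import step simply refuses a migration when the target environment lacks a micro-service version the record depends on.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class PreservedRecord:
    path: str
    rules: List[str]                    # rules kept as a property of the record
    micro_services: Dict[str, str]      # micro-service name -> version used on it
    state: Dict[str, str] = field(default_factory=dict)  # persistent state


def export_record(rec: PreservedRecord) -> str:
    """Serialize a record with its rules, micro-service versions, and state."""
    return json.dumps(asdict(rec), sort_keys=True)


def import_record(blob: str, available: Dict[str, str]) -> PreservedRecord:
    """Rebuild a record in the target environment.

    Refuses the migration if the target lacks any micro-service version the
    record depends on.
    """
    rec = PreservedRecord(**json.loads(blob))
    missing = {name: version for name, version in rec.micro_services.items()
               if available.get(name) != version}
    if missing:
        raise LookupError(f"target environment lacks micro-service versions: {missing}")
    return rec


record = PreservedRecord(
    path="/tempZone/home/rods/foo1",
    rules=["acPostProcForPut | | nop | nop"],
    micro_services={"msiSysReplDataObj": "v1"},  # hypothetical version label
    state={"checksum": "hypothetical-checksum"},
)
blob = export_record(record)
print(import_record(blob, {"msiSysReplDataObj": "v1"}) == record)  # True
```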
27
Preservation Evolution
Can define new
  Rules
  Micro-services
  Persistent state information
Can apply new rules in parallel with old rules, and take the most restrictive rule.
As a result, preservation management policies, capabilities, and assertions can evolve over time.
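Taking the most restrictive of old and new rules requires deciding, per setting, what "restrictive" means. This Python sketch assumes three hypothetical settings in which more replicas, longer retention, and denying public read are each the stricter choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    min_replicas: int
    retention_years: int
    public_read: bool


def most_restrictive(*policies: Policy) -> Policy:
    """Combine policies applied in parallel, keeping the strictest value of each setting."""
    return Policy(
        min_replicas=max(p.min_replicas for p in policies),        # more copies is stricter
        retention_years=max(p.retention_years for p in policies),  # longer retention is stricter
        public_read=all(p.public_read for p in policies),          # any denial wins
    )


old_policy = Policy(min_replicas=2, retention_years=10, public_read=True)
new_policy = Policy(min_replicas=3, retention_years=5, public_read=False)
print(most_restrictive(old_policy, new_policy))
# Policy(min_replicas=3, retention_years=10, public_read=False)
```

Because the combination is computed field by field, the result can be stricter than either input policy taken alone.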
28
More Information
moore@sdsc.edu
SRB: http://www.sdsc.edu/srb
iRODS: http://irods.sdsc.edu/
29
Full Copyright Notice
Copyright (C) Open Grid Forum (applicable years). All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. The limited permissions granted above are perpetual and will not be revoked by the OGF or its successors or assignees.