1
DCAPE Project Update
Richard Marciano, University of North Carolina, SALT
Chien-Yi Hou, University of North Carolina, SALT
Caryn Wojcik, State of Michigan, Records Management Services
2
NHPRC Issued a Call…
Design a digital preservation service with a business model for the archival community
Fill the needs of archival repositories that cannot build and sustain their own electronic records archive
3
DCAPE Project
Distributed Custodial Archival Preservation Environments
Project was funded by NHPRC in 2008 (RE10010-08)
Officially started in December 2008
Project extended through April 2012
http://www.dcape.org/
4
What is Distributed Custodial Preservation?
Physical custody of archival collections is distributed outside of the archival repository to a trusted preservation service
Archival repository retains legal custody
Archival repository remains responsible for archival functions, including preservation and access
Access to collections is controlled by archival repository
5
DCAPE Partners
28 people across 9 institutions and 2 staff at UNC, for a total of 32 participants
Cultural Entity: Getty Research Institute
Cyberinfrastructure: West Virginia University, Carleton University (Canada)
State Archives: California, Kansas, Michigan, Kentucky, North Carolina, New York
State Library: North Carolina
University Archives: Tufts
UNC: School of Information and Library Science (SILS), Sustainable Archives and Leveraging Technologies (SALT)
6
DCAPE Goals
Build a preservation environment that meets the needs of archival repositories for trusted archival preservation services.
Services are based on policies (rules) that are defined by the archivist
Over 250 rules have been developed for the iRODS library that can be leveraged for DCAPE
A series of rules might "look" like this:
  When files are ingested, replicate them in three different locations and run a checksum on each file. Bit-check files every month. Send an alert about any changes to the files.
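The rule series described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DCAPE or iRODS code: the function names `ingest` and `bit_check`, the use of SHA-256, and the use of local directories as the three "locations" are all hypothetical choices made for the example.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def ingest(source: Path, locations: list[Path]) -> dict[Path, str]:
    """Copy the source file to every location and record one checksum per replica."""
    manifest = {}
    for location in locations:
        location.mkdir(parents=True, exist_ok=True)
        replica = location / source.name
        shutil.copy2(source, replica)
        manifest[replica] = sha256_of(replica)
    return manifest


def bit_check(manifest: dict[Path, str]) -> list[str]:
    """Recompute each replica's checksum and return an alert for every mismatch."""
    alerts = []
    for replica, expected in manifest.items():
        if not replica.exists():
            alerts.append(f"MISSING: {replica}")
        elif sha256_of(replica) != expected:
            alerts.append(f"CHANGED: {replica}")
    return alerts
```

In this sketch the monthly schedule is left to an external scheduler that calls `bit_check` and forwards any returned alerts; in DCAPE the equivalent policy would be expressed as iRODS rules.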
7
DCAPE Goals
The trusted digital repository infrastructure will be assembled from state-of-the-art rule-based data management systems, commodity storage systems, and sustainable preservation services.
The software infrastructure will automate many of the administrative tasks associated with the management of archival repositories.
Tasks will include: authentication, replication, migration, obsolete file management, preservation metadata management, etc.
8
Project Tasks
Execute service agreements between UNC and partners to govern use of the test collections.
Define rules and services (organized according to the OAIS framework) for iRODS to perform on test collections.
Ingest test collections into iRODS and validate the rules and services.
Develop business model (including costs) for sustaining a repository service based on iRODS.
Develop model service agreements that define the standard and optional services of the repository.
13
Role of iRODS
Preservation environment provides rule-based automation of archival functions (repeatable services)
Standard and optional services will be available
Shared service should reduce costs for each archival repository compared to the cost of building in-house preservation capabilities
14
Life Cycle of Data
[Diagram: Virtual Loading Dock, Preservation Area, Reference Room; packages labeled SIP, AIP, DIP]
15
DCAPE Framework
[Diagram: iRODS hosting the Virtual Loading Dock (V1, V2, V3), Preservation Area (P1, P2, P3), and Reference Room (R1, R2); packages labeled SIP, AIP, DIP]
16
DCAPE Capabilities
[Diagram: iRODS hosting the Virtual Loading Dock (V1, V2, V3), Preservation Area (P1, P2, P3), and Reference Room (R1, R2); packages labeled SIP, AIP, DIP; numbered capabilities placed on the diagram: 1, 10, 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 14, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 24, 15]
17
DCAPE Capabilities: Replication
[Diagram: DCAPE capabilities diagram (iRODS: Virtual Loading Dock V1-V3, Preservation Area P1-P3, Reference Room R1-R2; SIP, AIP, DIP; numbered capabilities)]
18
Sample Rule
sampleRule||delayExec( 1m 2m,
    assign(*path,/samplePath)##
    msiMakeGenQuery("COLL_NAME","COLL_PARENT_NAME = '*path' AND META_COLL_ATTR_NAME = 'DCAPE_COLL_TYPE' AND META_COLL_ATTR_VALUE = 'AIP'",*GenQInp)##
    msiExecGenQuery(*GenQInp,*GenQOut)##
    forEachExec(*GenQOut,
        msiGetValByKey(*GenQOut,"COLL_NAME",*DataObj)##
        msiSplitPath(*DataObj,*p,*c)##
        assign(*newpath,SamplePath2*c)##
        msiDataObjRename(*DataObj,*newpath,1,*result)##
        acAddLog(Move_Collection,"*DataObj")##
        acCheckPolicy(*newpath,DCAPE_POLICY_REPLICA,*pResult)##
        ifExec((*pResult == Yes),
            msiCollRepl(*newpath,destRescName=resource,*status)##
            acAddLog(Replicate_Coll,"*newpath"),
            nop,nop,nop),
        nop),
    nop)|nop
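Read step by step, the sample rule queries for AIP sub-collections of a source path, moves each one to a new parent, logs the move, checks the replication policy, and replicates when that policy says yes. The Python sketch below mirrors that control flow on an in-memory catalog so the logic is easier to follow. It is not iRODS code: `Collection`, `move_and_replicate`, and the `check_policy` callable are hypothetical stand-ins for the microservices and actions named in the rule.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Collection:
    path: str
    metadata: dict = field(default_factory=dict)
    replicas: list = field(default_factory=list)


def move_and_replicate(
    catalog: list[Collection],
    source_parent: str,
    target_parent: str,
    check_policy: Callable[[str, str], str],
    resource: str,
) -> list[tuple[str, str]]:
    """Mirror the sample rule's steps and return the log entries it would write."""
    log = []
    for coll in catalog:
        parent, _, name = coll.path.rpartition("/")
        # Query step: only AIP collections directly under the source parent.
        if parent != source_parent or coll.metadata.get("DCAPE_COLL_TYPE") != "AIP":
            continue
        # Rename step: move the collection under the target parent and log it.
        old_path = coll.path
        coll.path = f"{target_parent}/{name}"
        log.append(("Move_Collection", old_path))
        # Policy step: replicate only when the replication policy answers "Yes".
        if check_policy(coll.path, "DCAPE_POLICY_REPLICA") == "Yes":
            coll.replicas.append(resource)
            log.append(("Replicate_Coll", coll.path))
    return log
```

The delayed execution wrapper (`delayExec`) and the recovery actions (`nop`) of the original rule are deliberately left out of this sketch.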
19
An interface that makes the policies easy to manage!
[Diagram: DCAPE capabilities diagram (iRODS: Virtual Loading Dock V1-V3, Preservation Area P1-P3, Reference Room R1-R2; SIP, AIP, DIP; numbered capabilities), with capability 24 called out]
20
Interface - Requirements
Hide the technical details
Show the information that archivists want to know
Be able to customize policies easily
Web-based, no installation required
21
Demo I: Checksum, Replication
[Diagram: DCAPE capabilities diagram (iRODS: Virtual Loading Dock V1-V3, Preservation Area P1-P3, Reference Room R1-R2; SIP, AIP, DIP; numbered capabilities)]
32
Demo II: Checksum & Virus Check, No Replication
[Diagram: DCAPE capabilities diagram (iRODS: Virtual Loading Dock V1-V3, Preservation Area P1-P3, Reference Room R1-R2; SIP, AIP, DIP; numbered capabilities)]
42
DCAPE is More
More than a storage service or environment
More than a reference tool
DCAPE will provide the capability for all archival repositories to fulfill their responsibility to preserve electronic records
43
DCAPE Interface
Platform
Navigation
Login
Audit Trail
Appearance
Reporting
Batch Operations
Screen Views
44
DCAPE Metadata
Follow Dublin Core model
Allow customization
Encourage standardization
Define
  Source: creator, system, archivist
  Level: collection, accretion, item
  Accessibility: internal vs. public
  Fields: required vs. optional
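One way to picture the four attributes listed under "Define" is as a small field-definition structure. The sketch below is an assumption-laden illustration, not the DCAPE schema: the class names and the `missing_required` helper are hypothetical, and only the enumerated values (creator/system/archivist, collection/accretion/item, internal/public, required/optional) come from the slide.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    CREATOR = "creator"
    SYSTEM = "system"
    ARCHIVIST = "archivist"


class Level(Enum):
    COLLECTION = "collection"
    ACCRETION = "accretion"
    ITEM = "item"


class Accessibility(Enum):
    INTERNAL = "internal"
    PUBLIC = "public"


@dataclass(frozen=True)
class FieldDefinition:
    name: str                      # e.g. a Dublin Core element name
    source: Source                 # who supplies the value
    level: Level                   # where in the hierarchy the field applies
    accessibility: Accessibility   # internal-only or publicly visible
    required: bool                 # required vs. optional


def missing_required(definitions: list[FieldDefinition], record: dict) -> list[str]:
    """Return the names of required fields that are absent or empty in a record."""
    return [d.name for d in definitions if d.required and not record.get(d.name)]
```

For example, a definition list that marks the Dublin Core element `title` as required would cause `missing_required` to report `title` for any record that omits it; which elements are actually required is a choice the slide leaves to each repository.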
45
DCAPE Workflow
Define functionality at each stage
Virtual Loading Dock
  Pre-accessioning
  Ingestion
Preservation Area
  Archival storage
  Data management
  Administration
  Preservation planning
Reference Room
  Access
Common services
Management
47
DCAPE Business Model
Non-profit
Fees for services
Fees for storage
Storage and disaster prevention services
Software maintenance
Access and connectivity
48
MetaArchive Cooperative
Encourage organizations to build their own preservation infrastructures rather than outsourcing to external vendors
3 levels of membership: 3 yr commitment
Basic costs:
  Equipment: 1st year, $4.6K server purchase
  Staffing: 2% of a sys. admin's time + POC admin + software eng. for content ingestion preparation
  Storage: $1.00 / GB / year for content stored in the network
Yearly dues:
  Sustaining Members: $5.5K / yr
  Preservation Members: $3K / yr
  Collaborative Members: varies
Cost scenarios: 2 TB of content
  Sustaining Member: $27.1K / 3 yrs = ($5.5K (membership) + $2K (space)) x 3 yrs + $4.6K (server)
  Preservation Member: $19.6K / 3 yrs = ($3K (membership) + $2K (space)) x 3 yrs + $4.6K (server)
  Collaborative Member: $22.6K / 3 yrs = ($4K (membership) + $2K (space)) x 3 yrs + $4.6K (server)
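The three-year totals on this slide follow one formula: (yearly dues + yearly storage) x 3 years + the one-time server purchase. A minimal sketch of that arithmetic is below. The function name is hypothetical, 2 TB is treated as 2,000 GB to match the slide's $2K space figure, and the $4K collaborative dues value is the one used in the slide's scenario, since the slide otherwise lists those dues as "varies".

```python
SERVER_COST = 4_600          # one-time server purchase, first year (USD)
STORAGE_PER_GB_YEAR = 1      # USD per GB per year
TERM_YEARS = 3               # membership commitment


def three_year_cost(yearly_dues: int, content_gb: int) -> int:
    """Total cost over the three-year commitment, in USD."""
    yearly_storage = STORAGE_PER_GB_YEAR * content_gb
    return (yearly_dues + yearly_storage) * TERM_YEARS + SERVER_COST


scenarios = {"Sustaining": 5_500, "Preservation": 3_000, "Collaborative": 4_000}
for level, dues in scenarios.items():
    print(f"{level}: ${three_year_cost(dues, 2_000):,}")
```

Staffing costs are listed on the slide but are not part of its scenario totals, so they are left out here as well.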
49
Archive-It
Subscription service from the Internet Archive, allowing institutions to build and preserve collections of born digital content
Allows users to crawl, scope, catalog, manage, and browse their archived collections
Collections are hosted at the IA data center and are available through URL and full-text search
A minimum of 2 copies of each collection are kept online
Cost Scenarios
50
Storage Cost Model Scenarios
1. Question: What is the yearly charge for a customer with 4,000 files and 1.5 TB of storage, assuming the need for two copies, one on disk and one on tape (iRODS)?
   Answer: $2,900 + $1,400 x 1.5 = $5,000
2. Question: What is the yearly cost of 6 million files (web crawl scenario) and 1 TB of storage, assuming the need for two tape copies (using iRODS)?
   Answer: $2,900 + $550 + 6 x $870 + $5,165 = $13,835
3. Question: What is the yearly cost of 100,000 files and 20 TB of storage with two tape copies (using iRODS)?
   Answer: $2,900 + 20 x $550 + 0.1 x $870 + $5,165 = $19,152
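The three answers can be reproduced with a single expression. The slide does not label the individual terms, so the parameter names below are an inferred reading, not the authors' terminology: a fixed yearly amount ($2,900), a per-TB rate ($1,400 for disk plus tape, $550 for two tape copies), a rate per million files ($870), and an additional fixed amount ($5,165) that appears only in the two-tape scenarios.

```python
def yearly_cost(
    fixed: float,
    per_tb: float,
    terabytes: float,
    per_million_files: float = 0,
    million_files: float = 0,
    extra_fixed: float = 0,
) -> int:
    """Sum the terms of one scenario and round to whole dollars."""
    total = fixed + per_tb * terabytes + per_million_files * million_files + extra_fixed
    return round(total)


# Scenario 1: disk + tape, 1.5 TB (the slide's answer has no per-file term)
print(yearly_cost(2_900, 1_400, 1.5))
# Scenario 2: two tape copies, 1 TB, 6 million files
print(yearly_cost(2_900, 550, 1, 870, 6, 5_165))
# Scenario 3: two tape copies, 20 TB, 0.1 million files
print(yearly_cost(2_900, 550, 20, 870, 0.1, 5_165))
```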
51
DCAPE Project http://dcape.org