Chronopolis™ and the Digital Preservation Imperative. Brian E. C. Schottlaender, The Audrey Geisel University Librarian. ECAR Symposium, 4 December.

1 Chronopolis™ and the Digital Preservation Imperative
Brian E. C. Schottlaender
The Audrey Geisel University Librarian
ECAR Symposium, 4 December 2009

2 The Branscomb* Pyramid for Computing
High-End
  Facilities: “Leadership-class” facilities. Maintained by national labs and centers. Substantive professional workforce.
  Applications: Community codes and professional software. Maintained by large groups of professionals (NASTRAN, PowerPoint, WRF, EverQuest).
Campus, Research Lab
  Facilities: Mid-range university and research lab facilities. Maintained by professionals and non-professionals.
  Applications: Community software and highly-used project codes. Developed and maintained by some professionals and academics (CHARMM, GAMESS, etc.).
Small-Scale, Home
  Facilities: Private, home, and personal facilities. Supported by users or their proxies.
  Applications: Research and individual codes. Supported by developers or their proxies.
*Chairman, NSF Blue-Ribbon Panel on High-Performance Computing (1993)
BECS|||ES09

3 The Berman* Pyramid for Data
High-End
  Facilities: National-scale data repositories, archives, and libraries. High capacity, high reliability environment maintained by professional workforce.
  Collections: Reference, important, and irreplaceable data collections (PDB, PSID, Shoah, HathiTrust, etc.).
Campus, Library, Data Center
  Facilities: Local libraries and data centers. Commercial data storage. Medium capacity, medium-high reliability. Maintained by professionals.
  Collections: Research data collections. Developed and maintained by some professionals and academics.
Small-Scale, Home
  Facilities: Private repository. Supported by users or their proxies. Low-medium reliability, low capacity.
  Collections: Personal data collections. Supported by developers or their proxies.
*VC/R, RPI

4 Tick … Tick … Tick … Tick … Tick
There is a pressing need to preserve digital assets that represent the intellectual capital of scientific disciplines, educational communities, and government and cultural agencies.
Many of these assets are increasingly at risk, whether as a consequence of:
  – lack of financial support;
  – evolution of storage and delivery systems, access mechanisms, or encoding formats;
  – calamity; or,
  – neglect.

5 Issue: Fragility
Dynamic:
  – May be revised or updated → instances, versions, editions
  – May change cumulatively or interactively → e.g., contributions to a listserv
  – May be available in various “views”
More easily altered [without recognition]
More easily corrupted
Storage media have shorter life spans

6 Issue: Complexity
Linkages between and amongst digital resources may change
Increasingly, data and associated metadata cannot, or should not, be separated
Some resources, like multimedia, are so closely linked to the software and hardware technologies that render them that they cannot be used outside those proprietary environments
Need to be renderable on a variety of delivery devices
Require access technologies that are changing at an ever-increasing pace

7 Issue: Selection
Intellectual question → What is “worth” archiving?
  – Business content (e.g., patient records)
  – Scientific content (e.g., PDB)
  – Scholarly content (e.g., Electronic Cultural Atlas)
  – Cultural content (e.g., Shoah)
  – “Official” content (e.g., Govt. docs.)
Physical question → What is the “archival unit”?
  – What is its extent?
  – What are its boundaries? Links? Content of links?
Intellectual and physical selection dimensions are not separate, but interrelated. E.g., the extent of a digital object must be determined before harvest-based selection can take place.
Selection criteria cannot be generalized because they depend on the goals and policies of the particular stakeholder.

8 Questions, Questions, Questions
Who gets to decide what’s worth preserving?
Who’s responsible for preserving it?
  – Where?
  – How?
  – For how long?
Who gets access?
  – Why?
  – When?
Who pays?
  – Content creators?
  – Content users?
  – The government?

9 It Takes a Village …
“The successful preservation of valuable digital assets will require the expertise and collaboration of many institutions, both public and commercial, to help craft the reliable, economically sustainable, and trusted environments necessary for housing, managing, and ensuring our global knowledge over time.”
Fran Berman et al., “The Need to Formalize Trust Relationships in Digital Repositories.” EDUCAUSE Review, Vol. 43, No. 3 (May/June 2008)

10 Who is Chronopolis?
Chronopolis is being developed by a national consortium, with initial funding from the Library of Congress/NDIIPP
Chronopolis partners are:
  – San Diego Supercomputer Center (SDSC) and the UC San Diego Libraries (UCSDL)
  – University of Maryland Institute for Advanced Computer Studies (UMIACS)
  – National Center for Atmospheric Research (NCAR) in Boulder, Colorado

11 What is Chronopolis?
Digital preservation environment using a data grid framework
Designed to leverage capabilities at SDSC/UCSDL, UMIACS, and NCAR
Emphasizes heterogeneous and redundant data storage systems
Has a current storage capacity of 50 TB x 3 nodes
Has three geographically distributed copies of data
Includes detailed auditing and monitoring
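The combination of three distributed copies with auditing can be illustrated with a small sketch. It is not the Chronopolis implementation: it assumes each replica is reachable as a local directory tree, uses SHA-256 as the fixity algorithm, and the function names `digest_tree` and `compare_replicas` are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of one file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def digest_tree(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_replicas(replicas: Dict[str, Path]) -> Dict[str, Dict[str, Optional[str]]]:
    """Report files that are missing or differ at any replica.

    `replicas` maps a site label (an assumption, e.g. "sdsc") to the
    directory holding that site's copy. A digest of None means the file
    is absent at that site.
    """
    inventories = {name: digest_tree(root) for name, root in replicas.items()}
    all_files = set().union(*(inv.keys() for inv in inventories.values()))
    problems: Dict[str, Dict[str, Optional[str]]] = {}
    for rel in sorted(all_files):
        seen = {name: inv.get(rel) for name, inv in inventories.items()}
        if len(set(seen.values())) > 1:  # disagreement or a missing copy
            problems[rel] = seen
    return problems
```

An empty result means every file is present at every site with an identical digest. With three copies, a file that disagrees at only one site can be repaired from the two that still match.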

12 Partner Roles
SDSC, UMIACS, NCAR
  – Storage and network support
  – Complete copy of all data
  – Network testing
  – SRB support
  – Advanced data services
UCSD Libraries
  – Metadata services, including analysis and specification

13 Data Providers
California Digital Library
  – 12 TB of data
  – Crawls of political and government Web sites
  – ARC files, uniform size
  – BagIt protocol for data transfer
Inter-University Consortium for Political and Social Research (ICPSR)
  – 10 TB of data
  – 40+ years of social science research
  – Millions of files
  – Already using SRB
North Carolina State University Libraries
  – 6 TB of data
  – State and local geospatial data
  – BagIt protocol for data transfer
Scripps Institution of Oceanography
  – 1 TB of data
  – 50 years of data from SIO research cruises
  – Already using SRB
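As a rough check, the provider volumes on this slide can be set against the capacity quoted on slide 11 (50 TB x 3 nodes). The sketch assumes the slide figures are rounded, that each node holds a complete copy of all data (as slide 12 states), and that nothing else occupies the nodes. The variable names are illustrative only.

```python
# Volumes per data provider, in TB, as listed on the slide.
PROVIDER_TB = {
    "California Digital Library": 12,
    "ICPSR": 10,
    "North Carolina State University Libraries": 6,
    "Scripps Institution of Oceanography": 1,
}

NODE_CAPACITY_TB = 50  # per node, from slide 11
REPLICA_COUNT = 3      # one complete copy per node

total_tb = sum(PROVIDER_TB.values())        # 29 TB of unique data
headroom_tb = NODE_CAPACITY_TB - total_tb   # 21 TB free on each node
stored_tb = total_tb * REPLICA_COUNT        # 87 TB held across all nodes
utilization = total_tb / NODE_CAPACITY_TB   # 0.58 of each node in use

print(f"unique data: {total_tb} TB, per-node headroom: {headroom_tb} TB")
print(f"total stored across replicas: {stored_tb} TB ({utilization:.0%} of each node)")
```

On these figures the four initial collections fill a little over half of each node.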

14 Core Chronopolis Tools
Storage Resource Broker (SRB) → Data Grid Management System that provides hierarchical logical namespaces to manage the organization of data
Auditing Control Environment (ACE) → Software to protect the integrity of digital assets in the long term
SRB Replication Monitor → Webapp that watches registered directories and ensures that copies exist at designated mirrors
BagIt → Hierarchical file packaging format for the exchange of generalized digital content
Web Portal → Supplies data providers with transparent, in-depth look at their holdings in all locations
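To make the BagIt entry concrete, here is a minimal sketch of writing and checking a bag with the Python standard library. It follows the basic layout of the BagIt specification: a `bagit.txt` declaration, a `data/` payload directory, and a checksum manifest. The version string "1.0", the choice of SHA-256, and the helper names `make_bag` and `verify_bag` are assumptions for illustration and do not describe how Chronopolis or its data providers build bags. Path encoding rules and optional tag files are left out.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def make_bag(source_dir: Path, bag_dir: Path) -> Path:
    """Copy source_dir into bag_dir/data and write the declaration and manifest."""
    source_dir, bag_dir = Path(source_dir), Path(bag_dir)
    payload = bag_dir / "data"
    shutil.copytree(source_dir, payload)  # also creates bag_dir; fails if data/ exists
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    # One manifest line per payload file: "<digest>  <path relative to bag root>".
    lines = [
        f"{sha256_of(p)}  {p.relative_to(bag_dir).as_posix()}"
        for p in sorted(payload.rglob("*"))
        if p.is_file()
    ]
    (bag_dir / "manifest-sha256.txt").write_text(
        "".join(line + "\n" for line in lines), encoding="utf-8"
    )
    return bag_dir


def verify_bag(bag_dir: Path) -> List[str]:
    """Return manifest paths that are missing or whose digest no longer matches."""
    bag_dir = Path(bag_dir)
    failures = []
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel_path)
    return failures
```

A receiving site would call `verify_bag` after transfer and accept the bag only if the returned list is empty. This sketch checks only the files named in the manifest, so it does not detect extra files added under `data/`. A production workflow would more likely use an established BagIt library than hand-rolled code.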

15 Coming Up
Updated auditing procedures
Updated Web portal
Automation of collection ingest
New collections and storage nodes
TRAC certification
Fully-fledged business model

16 http://chronopolis.sdsc.edu
minor@sdsc.edu

