Chronopolis™ and the Digital Preservation Imperative
Brian E. C. Schottlaender, The Audrey Geisel University Librarian
ECAR Symposium, 4 December 2009

Presentation transcript:

Chronopolis™ and the Digital Preservation Imperative
Brian E. C. Schottlaender, The Audrey Geisel University Librarian
ECAR Symposium, 4 December 2009

The Branscomb* Pyramid for Computing

High-end
  Facilities: “Leadership-class” facilities. Maintained by national labs and centers. Substantive professional workforce.
  Applications: Community codes and professional software. Maintained by large groups of professionals (NASTRAN, PowerPoint, WRF, EverQuest).
Campus, Research Lab
  Facilities: Mid-range university and research lab facilities. Maintained by professionals and non-professionals.
  Applications: Community software and highly used project codes. Developed and maintained by some professionals and academics (CHARMM, GAMESS, etc.).
Small-Scale, Home
  Facilities: Private, home, and personal facilities. Supported by users or their proxies.
  Applications: Research and individual codes. Supported by developers or their proxies.

*Chairman, NSF Blue-Ribbon Panel on High-Performance Computing (1993)
BECS|||ES09

The Berman* Pyramid for Data

High-end
  Facilities: National-scale data repositories, archives, and libraries. High capacity, high reliability environment maintained by professional workforce.
  Collections: Reference, important, and irreplaceable data collections (PDB, PSID, Shoah, HathiTrust, etc.).
Campus, Library, Data Center
  Facilities: Local libraries and data centers. Commercial data storage. Medium capacity, medium-high reliability. Maintained by professionals.
  Collections: Research data collections. Developed and maintained by some professionals and academics.
Small-Scale, Home
  Facilities: Private repository. Supported by users or their proxies. Low-medium reliability, low capacity.
  Collections: Personal data collections. Supported by developers or their proxies.

*VC/R, RPI

Tick … Tick … Tick … Tick … Tick

There is a pressing need to preserve digital assets that represent the intellectual capital of scientific disciplines, educational communities, and government and cultural agencies.

Many of these assets are increasingly at risk, whether as a consequence of:
  – lack of financial support;
  – evolution of storage and delivery systems, access mechanisms, or encoding formats;
  – calamity; or,
  – neglect.

Issue: Fragility

Dynamic:
  – May be revised or updated → instances, versions, editions
  – May change cumulatively or interactively → e.g., contributions to a listserv
  – May be available in various “views”
More easily altered [without recognition]
More easily corrupted
Storage media have shorter life spans

Issue: Complexity

Linkages between and amongst them may change
Increasingly, data and associated metadata cannot, or should not, be separated
Some resources, like multimedia, are so closely linked to the software and hardware technologies that render them that they cannot be used outside those proprietary environments
Need to be renderable on a variety of delivery devices
Require access technologies that are changing at an ever-increasing pace

Issue: Selection

Intellectual question → What is “worth” archiving?
  – Business content (e.g., patient records)
  – Scientific content (e.g., PDB)
  – Scholarly content (e.g., Electronic Cultural Atlas)
  – Cultural content (e.g., Shoah)
  – “Official” content (e.g., Govt. docs.)
Physical question → What is the “archival unit”?
  – What is its extent?
  – What are its boundaries? Links? Content of links?

Intellectual and physical selection dimensions are not separate, but interrelated. E.g., determination of extent of digital object is necessary before harvest-based selection can take place.

Selection criteria cannot be generalized because they are dependent on the goals and policies of the particular stakeholder.

Questions, Questions, Questions

Who gets to decide what’s worth preserving?
Who’s responsible for preserving it?
  – Where?
  – How?
  – For how long?
Who gets access?
  – Why?
  – When?
Who pays?
  – Content creators?
  – Content users?
  – The government?

It Takes a Village …

“The successful preservation of valuable digital assets will require the expertise and collaboration of many institutions, both public and commercial, to help craft the reliable, economically sustainable, and trusted environments necessary for housing, managing, and ensuring our global knowledge over time.”

Fran Berman et al., “The Need to Formalize Trust Relationships in Digital Repositories,” EDUCAUSE Review, Vol. 43, No. 3 (May/June 2008)

Who is Chronopolis?

Chronopolis is being developed by a national consortium, with initial funding from the Library of Congress/NDIIPP

Chronopolis partners are:
  – San Diego Supercomputer Center (SDSC) and the UC San Diego Libraries (UCSDL)
  – University of Maryland Institute for Advanced Computer Studies (UMIACS)
  – National Center for Atmospheric Research (NCAR) in Boulder, Colorado

What is Chronopolis?

Digital preservation environment using a data grid framework
Designed to leverage capabilities at SDSC/UCSDL, UMIACS, and NCAR
Emphasizes heterogeneous and redundant data storage systems
Has a current storage capacity of 50 TB x 3 nodes
Has three geographically distributed copies of data
Includes detailed auditing and monitoring
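Keeping three distributed copies under audit can be pictured as a digest comparison across nodes. The following is a minimal sketch, not Chronopolis's actual SRB or ACE interface: it assumes each node is simply a local directory, uses SHA-256 as the digest, and the names `sha256_of` and `compare_replicas` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_replicas(relative_name: str, node_roots: list[Path]) -> dict[str, list[str]]:
    """Group node roots by the digest each one holds for a single file.

    A node without the file is recorded under the key "MISSING".
    One key in the result means all nodes agree; more than one key
    means a copy is missing or has diverged (hypothetical policy).
    """
    groups: dict[str, list[str]] = {}
    for root in node_roots:
        candidate = root / relative_name
        key = sha256_of(candidate) if candidate.is_file() else "MISSING"
        groups.setdefault(key, []).append(str(root))
    return groups
```

A result with a single key indicates that every node holds the same bytes; any additional key identifies which nodes need repair from a good copy.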

Partner Roles

SDSC, UMIACS, NCAR
  – Storage and network support
  – Complete copy of all data
  – Network testing
  – SRB support
  – Advanced data services
UCSD Libraries
  – Metadata services, including analysis and specification

Data Providers

California Digital Library
  – 12 TB of data
  – Crawls of political and government Web sites
  – ARC files, uniform size
  – BagIt protocol for data transfer
Inter-University Consortium for Political and Social Research (ICPSR)
  – 10 TB of data
  – 40+ years of social science research
  – Millions of files
  – Already using SRB
North Carolina State University Libraries
  – 6 TB of data
  – State and local geospatial data
  – BagIt protocol for data transfer
Scripps Institution of Oceanography
  – 1 TB of data
  – 50 years of data from SIO research cruises
  – Already using SRB
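The provider figures can be set against the stated capacity with a little arithmetic. This is a rough sketch under two assumptions that the slides do not spell out: that "50 TB x 3 nodes" means 50 TB per node, and that every node holds one complete copy of all four collections. The function name `capacity_summary` is hypothetical.

```python
# Collection sizes as listed on the Data Providers slide (terabytes).
HOLDINGS_TB = {
    "California Digital Library": 12,
    "ICPSR": 10,
    "North Carolina State University Libraries": 6,
    "Scripps Institution of Oceanography": 1,
}
NODE_CAPACITY_TB = 50  # assumption: "50 TB x 3 nodes" read as 50 TB per node
REPLICAS = 3           # three geographically distributed copies


def capacity_summary(holdings_tb, node_capacity_tb, replicas):
    """Summarize storage use, assuming each node stores one full copy."""
    per_copy = sum(holdings_tb.values())
    return {
        "per_copy_tb": per_copy,
        "all_copies_tb": per_copy * replicas,
        "node_utilization": per_copy / node_capacity_tb,
        "node_headroom_tb": node_capacity_tb - per_copy,
    }


summary = capacity_summary(HOLDINGS_TB, NODE_CAPACITY_TB, REPLICAS)
print(summary)
```

Under these assumptions the four collections come to 29 TB per copy and 87 TB across all three copies, which fills 58% of a node and leaves 21 TB of headroom on each.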

Core Chronopolis Tools

Storage Resource Broker (SRB) → Data Grid Management System that provides hierarchical logical namespaces to manage the organization of data
Auditing Control Environment (ACE) → Software to protect the integrity of digital assets in the long term
SRB Replication Monitor → Webapp that watches registered directories and ensures that copies exist at designated mirrors
BagIt → Hierarchical file packaging format for the exchange of generalized digital content
Web Portal → Supplies data providers with transparent, in-depth look at their holdings in all locations
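Two of these ideas, packaging content with a checksum manifest and later re-checking it, can be illustrated together. The sketch below is a simplified, BagIt-style layout (a `data/` payload directory, a `bagit.txt` declaration, and a `manifest-sha256.txt` file) followed by a fixity audit. It is an illustration only: the choice of SHA-256 and the function names `make_bag` and `audit_bag` are assumptions, and it does not reproduce how ACE or the Chronopolis ingest pipeline actually work.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 hex digest of a (small) file, read in one go."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def make_bag(bag_dir: Path, payload: dict[str, bytes]) -> None:
    """Write payload files under data/ plus bagit.txt and a manifest."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for name, content in sorted(payload.items()):
        target = data_dir / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        # Manifest line: digest, whitespace, path relative to the bag root.
        manifest_lines.append(f"{file_digest(target)}  data/{name}")
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )


def audit_bag(bag_dir: Path) -> list[str]:
    """Return payload paths that are missing or no longer match the manifest."""
    failures = []
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue  # tolerate blank lines, e.g. an empty payload
        expected, relative = line.split(maxsplit=1)
        target = bag_dir / relative
        if not target.is_file() or file_digest(target) != expected:
            failures.append(relative)
    return failures
```

An empty list from `audit_bag` means every payload file still matches the digest recorded at packaging time; any returned path marks a file that needs to be restored from another copy.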

Coming Up

Updated auditing procedures
Updated Web portal
Automation of collection ingest
New collections and storage nodes
TRAC certification
Fully-fledged business model
