National Geospatial Digital Archive Greg Janée University of California at Santa Barbara.

Presentation transcript:

National Geospatial Digital Archive Greg Janée University of California at Santa Barbara

Greg Janée DCC seminar
A misadventure in preservation
1976
 –Viking probes go to Mars
 –soil data is analyzed for evidence of life
1999
 –USC neurobiologist Joseph Miller asks for data
 –NASA has data on tape! But...
 –tapes coded “in a format so old that the programmers who knew it had died”

Paradox of preservation
Is the data valuable?
 –yes: had to travel to another planet to get it
Is the data being used?
 –no
 –perhaps never again
How much am I willing to pay for its preservation?
 –as close to zero as possible

Is it worth preserving?
Keith’s equation*:
 –$(\text{current value}) = (\text{intrinsic value}) - (\text{cost to use})$
Greg’s equation:
 –item is worth preserving for time duration $T$ if:
  $(\text{intrinsic value}) \cdot \mathrm{Prob}_T(\text{usage}) > \sum_T (\text{preservation costs}) + (\text{cost to use})$
*apologies to Keith Johnson, Stanford libraries
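As a rough illustration of Greg's equation, the check can be written as a small function. This is a minimal sketch, not part of the talk: the function name, the parameter names, and the reading of preservation cost over T as a sum of per-period costs are all assumptions.

```python
def worth_preserving(intrinsic_value, prob_usage, preservation_costs, cost_to_use):
    """Return True if an item is worth preserving for a duration T.

    intrinsic_value    -- value of the item if it is ever used
    prob_usage         -- probability that the item is used within T
    preservation_costs -- per-period preservation costs over T (assumed additive)
    cost_to_use        -- one-time cost of making the item usable
    """
    expected_benefit = intrinsic_value * prob_usage
    total_cost = sum(preservation_costs) + cost_to_use
    return expected_benefit > total_cost
```

With made-up numbers, an item of value 1,000,000 with a 1% chance of use, 30 periods at cost 50 each, and a cost to use of 5,000 passes the test; halving the usage probability makes it fail.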

Project genesis
NDIIPP
 –Library of Congress, 2000
 –$100M
 –
NGDA
 –UCSB (MIL) & Stanford (Branner Library)
 –$2.6M, 3 years
 –geospatial data
 –


Project goal
“How can we preserve geospatial data on a national scale and make it available to future generations?”
No focus on a particular collection
Geospatial data
 –discrete chunks
 –relatively highly-structured, well-defined
 –but 90% of our work is generic

Idea #1
Archival has to be cheap & easy
 –must be distributed
 –little incentive, no funding
 –not sexy

NGDA approach
Compromise: define cheap archive
 –fundamental approach: preservation by co-archival of object semantics
 –ingest: one step up from crawling
 –web access
 –notable for what’s missing: discovery, usability
Foundation for additional functionality
 –e.g., migration
 –prototype archives will offer ADL, OAI access

Idea #2
Archival systems must be designed with their own demise in mind
 –archival objects will long outlive any system that manages them
 –system-level migrations will occur
 –at inopportune times

Typical repository architecture
[Diagram labels: system, database, storage, handle resolver; annotated “fragile”]

NGDA architecture
[Diagram labels: storage subsystem (standard, public data model); archival system (databases, caches, etc.); bulk loader; ingest; ADL, OAI, Web access]

Post-NGDA architecture
[Diagram labels: storage subsystem (standard, public data model); Web]

Storage system requirements
Req’s:
 –associate UUIDs/RIDs with bitstreams
 –retrieve global/local bitstream by UUID/RID
 –determine (parent) UUID of any bitstream
 –list all UUIDs
Satisfied by:
 –any filesystem
 –any kind of UUIDs
  tag:library.ucsb.edu,2005:identifier
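To make the claim that "any filesystem" satisfies these requirements concrete, here is a minimal sketch of a directory-backed store. The class name, method names, and on-disk layout (one directory per object UUID, one file per component RID) are hypothetical and are not the NGDA implementation. It also assumes a UUID identifies an object and a RID identifies a component within it.

```python
import tempfile
from pathlib import Path
from urllib.parse import quote, unquote


class FilesystemStore:
    """Hypothetical layout: one directory per UUID, one file per RID."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _object_dir(self, uuid):
        # Percent-encode so identifiers such as tag URIs are safe directory names.
        return self.root / quote(uuid, safe="")

    def put(self, uuid, rid, data):
        """Associate a (UUID, RID) pair with a bitstream; return its path."""
        directory = self._object_dir(uuid)
        directory.mkdir(exist_ok=True)
        path = directory / quote(rid, safe="")
        path.write_bytes(data)
        return path

    def get(self, uuid, rid):
        """Retrieve a bitstream by UUID and RID."""
        return (self._object_dir(uuid) / quote(rid, safe="")).read_bytes()

    def parent_uuid(self, path):
        """Determine the UUID of the object that owns a stored bitstream."""
        return unquote(Path(path).parent.name)

    def list_uuids(self):
        """List all object UUIDs in the store."""
        return sorted(unquote(p.name) for p in self.root.iterdir() if p.is_dir())
```

Because the identifier is only ever percent-encoded into a directory name, any identifier scheme works, including the tag URI shown on the slide.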

Archival objects
[Diagram labels: manifest, UUID, component, RID]

Archival object representation
Components are files
Manifest is an XML document
Other approaches
 –OAIS: archival information packages (AIPs)
 –XMLtape
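A manifest of this kind might be generated as follows. This is only a sketch: the talk does not give the manifest schema, so the element and attribute names (`manifest`, `component`, `uuid`, `rid`) and the function name are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def build_manifest(uuid, component_rids):
    """Build an XML manifest listing an object's components.

    Element and attribute names are hypothetical, not the NGDA schema.
    """
    manifest = ET.Element("manifest", {"uuid": uuid})
    for rid in component_rids:
        # Each component is a file in the store, referenced by its RID.
        ET.SubElement(manifest, "component", {"rid": rid})
    return ET.tostring(manifest, encoding="unicode")
```

Calling `build_manifest("tag:library.ucsb.edu,2005:identifier", ["data.tif", "metadata.xml"])` returns a single XML string that can itself be stored as one more file alongside the components.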

Ingest
Ingest template defines
 –common structure of objects to be ingested
 –necessary validations
 –associations to other objects
  assumes pre-loading of semantic definitions
 –policies, rights, etc.
Represents choke point
 –requires human evaluation

Format registry
We’re developing one
 –who isn’t?
Serves as archive of format specifications
How broadly to interpret “format”?
 –traditional file format
 –product
 –series, collection, arbitrary set

Format dependencies
Consider dependency graph induced by format specifications
Def: a format is recoverable if the format of its specification is recoverable
Axioms: plain text, HTML are recoverable
[Diagram: dependency graph over PDF, HTML, GIF, GeoTIFF, CSS, plain text, TIFF; annotated “desiccated” version]
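The recursive definition of recoverability can be sketched as a walk over the dependency graph. The function name and data layout below are assumptions. The sketch also generalizes the slide's definition slightly: a specification may be written in several formats, all of which must be recoverable. The edges in the usage note are illustrative only, since the slide's actual graph is not recoverable from this transcript.

```python
def is_recoverable(fmt, spec_formats, axioms, _path=frozenset()):
    """Return True if `fmt` can be traced back to axiomatically recoverable formats.

    spec_formats -- maps a format to the formats its specification is written in
    axioms       -- formats assumed recoverable without further justification
    """
    if fmt in axioms:
        return True
    deps = spec_formats.get(fmt)
    # No archived specification, or a cycle on the current path: not grounded.
    if fmt in _path or not deps:
        return False
    path = _path | {fmt}
    return all(is_recoverable(dep, spec_formats, axioms, path) for dep in deps)
```

For example, with axioms `{"plain text", "HTML"}` and hypothetical edges `{"GeoTIFF": ["HTML", "TIFF"], "TIFF": ["PDF"], "PDF": ["PDF"]}`, GeoTIFF is not recoverable because the PDF specification depends on PDF itself. Supplying a plain-text version of the TIFF specification breaks the cycle.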

Challenges
Making ingest easy, easier, easier-er,...
GIS formats
 –very complex: topology, layer, coverage, project
 –proprietary
MODIS
 –multiple petabytes
 –format (HDF) is not well-defined
 –moving to on-demand computation of products
 –lineage important
 –copious additional semantics

Misadventure, redux
What if there had been an NGDA-like solution?
 –format specification would have been archived
Limitations
 –data not necessarily immediately usable
 –format specification itself not necessarily viewable
But limitations can be addressed according to usage, available resources

Questions for you
Archival systems
 –definition? functionality?
Storage systems
 –definition? functionality?
Archival object representation
 –discrete files vs. AIPs?
GIS formats
 –“desiccated” form?