Beyond Borders SAA Annual Meeting San Diego, August 5-9, 2012 University of California Curation Center California Digital Library Stephen Abrams Unified.

Presentation transcript:

Beyond Borders SAA Annual Meeting San Diego, August 5-9, 2012 University of California Curation Center California Digital Library Stephen Abrams Unified Digital Format Registry (UDFR) A Community Resource for Effective Preservation

Why are formats important?
“Format” is the dividing line between bits and information
- A set of syntactic and semantic rules for mapping between bits and information
Bits: ffd8ffe000104a ffed0fb f746f73686f e d03e90a e e666f f40240ffeeffee fc d
Information: SOI APP0 JFIF 1.2 APP13 IPTC APP2 ICC DQT SOF0 183x512 DRI DHT SOS ECS0 RST0 ECS1 RST1 ECS2 ...
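The mapping from bits to information on this slide can be illustrated with a minimal Python sketch that walks the marker segments at the start of a JPEG stream. It is not part of the presentation or of UDFR: the function name `list_segments`, the partial marker table, and the hand-built sample bytes are all assumptions made for illustration.

```python
import struct

# Names for a few common JPEG markers (a subset chosen for this example).
MARKER_NAMES = {
    0xD8: "SOI", 0xE0: "APP0", 0xE2: "APP2", 0xED: "APP13",
    0xDB: "DQT", 0xC0: "SOF0", 0xDD: "DRI", 0xC4: "DHT",
    0xDA: "SOS", 0xD9: "EOI",
}


def list_segments(data: bytes) -> list:
    """Return marker names from the start of a JPEG stream up to SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream: missing SOI marker")
    names = ["SOI"]
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        code = data[pos + 1]
        names.append(MARKER_NAMES.get(code, f"0x{code:02X}"))
        if code == 0xDA:
            # Entropy-coded data follows SOS; this sketch stops here.
            break
        # Each segment has a 2-byte big-endian length that counts itself.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length
    return names


# Hypothetical hand-built stream: SOI, one APP0 (JFIF) segment, then SOS.
sample = (
    b"\xff\xd8"
    b"\xff\xe0\x00\x10JFIF\x00\x01\x02\x00\x00\x01\x00\x01\x00\x00"
    b"\xff\xda\x00\x02"
)
print(list_segments(sample))
```

The syntactic rules (marker byte, length field) are what let a program turn the raw bytes into named segments; the semantic rules for each segment would be layered on top of this.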

Unified Digital Format Registry
“A reliable, publicly accessible, and sustainable knowledge base of file format representation information for use by the digital preservation community”
- “Unification” of the function and holdings of
  - PRONOM
  - GDFR (Global Digital Format Registry)
- Library of Congress/NDIIPP funding
- Open source platform
- Semantic wiki
- Open contribution and editing / strong provenance

Representation information
What you need to know about something in order to exploit that thing meaningfully [OAIS/ISO 14721]
Information that lets you answer important preservation questions
- What format is it?
- What are its significant properties?
- Is it valid?
- Is it at risk?
- How can I read it? Render it? Play it?
- What can it be transformed into, and how?

Technology stack
- OntoWiki
- Virtuoso quadstore
- Zend framework
- PHP
- Apache httpd
- RDF
- RDFauthor / JavaScript
- HTTP / SPARQL
- Erfurt API
- Noid

Ontology (class diagram)
Classes: Abstract Base, Abstract Product, Abstract Format, File Format, Character Encoding, Compression Algorithm, Media, Hardware, Software, Document, File, Agent, IPR, Controlled Vocabulary, Holding, Process, Abstract Signature, External Signature, Internal Signature, Digest, Assessment, Grammar
Relationships: specification, reference, file, holder, owner, creator, maintainer, ipr, embodies, product, input / output, dependency, signature, digest, grammar, assessment
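The ontology names both external and internal signatures. As a rough illustration of how the two kinds of evidence can answer "What format is it?", the following Python sketch prefers a byte pattern at the start of the file (internal signature) and falls back to the filename extension (external signature). The `FormatEntry` type, the three-entry `REGISTRY`, and the `identify` function are hypothetical and do not reflect the UDFR data model or its contents.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import Optional, Tuple


@dataclass(frozen=True)
class FormatEntry:
    name: str
    internal_signature: bytes              # byte pattern expected at offset 0
    external_signatures: Tuple[str, ...]   # lowercase filename extensions


# Hypothetical three-entry registry using well-known magic numbers.
REGISTRY = (
    FormatEntry("JPEG", b"\xff\xd8\xff", (".jpg", ".jpeg")),
    FormatEntry("PNG", b"\x89PNG\r\n\x1a\n", (".png",)),
    FormatEntry("PDF", b"%PDF-", (".pdf",)),
)


def identify(filename: str, head: bytes) -> Tuple[Optional[str], str]:
    """Return (format name, basis), preferring internal evidence."""
    for entry in REGISTRY:
        if head.startswith(entry.internal_signature):
            return entry.name, "internal signature"
    suffix = PurePath(filename).suffix.lower()
    for entry in REGISTRY:
        if suffix in entry.external_signatures:
            return entry.name, "external signature"
    return None, "no match"


print(identify("scan.bin", b"\xff\xd8\xff\xe0"))
print(identify("report.PDF", b""))
```

Checking the internal signature first is a design choice in this sketch: file contents are usually stronger evidence than a name that a user can change freely.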

Initial data loads
PRONOM as of [date missing]
- 846 file formats
- 28 character encodings
- 17 compression algorithms
- 1,237 identifiers
- 548 external signatures
- 494 internal signatures
- 71 MIME types (not in IANA)
- 156 agents
- 268 software packages
- 2,080 software processes
- 23 IPR statements
- 217 relationships
7,816
Special thanks to TNA
- Tim Gollins
- Tracey Powell
- Spencer Ross

Initial data loads
MIME types from Appspot as of [date missing]
- “Routinely scraped from IANA using code in the mediatypes Google Code project”
- 809 application/*
- 125 audio/*
- 39 image/*
- 19 message/*
- 14 model/*
- 14 multipart/*
- 51 text/*
- 56 video/*
- 1,127 total
- Plus 71 defined by PRONOM
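The per-type counts on this slide are a tally of media types by their top-level type. A small Python sketch of that grouping follows, together with a check that the slide's eight counts add up to 1,127. The function name `tally_by_top_level` and the four sample media types are assumptions for illustration; the `slide_counts` values are copied from the slide.

```python
from collections import Counter


def tally_by_top_level(media_types):
    """Count media types by the part before the slash."""
    return Counter(mt.split("/", 1)[0] for mt in media_types)


# Hypothetical sample input, not the actual load.
sample = ["application/pdf", "image/jpeg", "image/png", "text/plain"]
counts = tally_by_top_level(sample)
print(counts["image"])

# Counts as listed on the slide.
slide_counts = {
    "application": 809, "audio": 125, "image": 39, "message": 19,
    "model": 14, "multipart": 14, "text": 51, "video": 56,
}
print(sum(slide_counts.values()))
```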

Data licensing
PRONOM data contributed under the UK Open Government Licence (OGL)
Other submissions contributed under the Creative Commons Attribution license (CC-BY)

Search or browse for information

Review provenance

Annotate information

Contribute or edit information

Next steps
Operational control
- CDL will continue to host the UDFR for one year while a more permanent hosting strategy can be identified
Administrative control
- The “admin” role, necessary for adding user privileges, modifying the ontologies, and bulk imports, is held by CDL staff
- How can this responsibility be shared?
Technical control
- Who will share “committer” responsibility for the codebase?
- How to coordinate additional development activity?

Next steps
Technical development
- Synchronization with PRONOM and other external sources of bulk imports
- UI enhancements to lower the barrier to learning the system
- RESTful API (in addition to the SPARQL endpoint)
- Replication to mirror sites
- Others?
Bring under the OPF code repository/issue tracking umbrella
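For readers unfamiliar with querying a SPARQL endpoint over HTTP, the following Python sketch builds (but does not send) a GET request in the style of the standard SPARQL protocol, where the query text travels in a `query` parameter. The endpoint URL, the class IRI in the query, and the `build_request` helper are all hypothetical; they are not the UDFR endpoint or vocabulary.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; replace with a real SPARQL endpoint URL.
ENDPOINT = "https://example.org/sparql"

# Hypothetical query: the class IRI below is a placeholder, not a UDFR term.
QUERY = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?format ?label WHERE {
  ?format a <https://example.org/onto#FileFormat> ;
          rdfs:label ?label .
} LIMIT 10
"""


def build_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL-protocol GET request asking for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_request(ENDPOINT, QUERY)
print(req.get_method())
print(req.full_url[:60])
```

Passing `req` to `urllib.request.urlopen` would send it; that step is left out here because the endpoint is a placeholder.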

Next steps
Import additional data sources
- Library of Congress Sustainability of Digital Formats
- IT History Society hardware database
- National Library of Australia Mediapedia
- NIST NSRL (National Software Reference Library)
- Stanford CPUdb
- TOTEM (Trustworthy Online Technical Environment Metadata) database
- Other candidates?

Next steps
- Use it
- Contribute or refine information
- Contribute to open source development
- Tell us what you think

For more information
UDFR
UC Curation Center: Stephen Abrams, Lisa Dawn Colvin, Patricia Cruse, John Kunze, Margaret Low, Mark Reyes, Abhishek Salve, Marisa Strong
AKSW, Universität Leipzig: Philipp Frischmuth, Norman Heino, Sebastian Tramp
Library of Congress: Martha Anderson, Leslie Johnston
National Archives [UK]: Tim Gollins, Tracey Powell, Spenser Ross