Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.


I love standards. There are so many of them to choose from.

Standards & Sustainability
• Disclosure: Are complete specifications available? For free?
• Adoption: To what extent is the standard already used?
• Documentation: Is the specification clear and straightforward? Are there additional resources to assist in understanding the standard?

Standards & Sustainability
• External Dependencies: To what extent does use of the standard rely on particular hardware or software? On other standards? On other non-standards?
• Impact of Patents: If patents cover some or all of the standard, are licensing issues likely to complicate use of the standard?
• Technological Protection Measures: Does the standard rely on technological protection measures which will inhibit your ability to preserve data?
Tip of the hat to the Library of Congress Sustainability of Digital Formats site.

Part I: How to Operate an Archive

Open Archival Information System Reference Model
• Developed by the Consultative Committee for Space Data Systems
• Adopted as ISO 14721:2003
• Available at /650x0b1.pdf
• Provides definitions of components of an archive, their relationship to each other, a set of mandatory responsibilities for an archive, and both functional and data models.

OAIS Reference Model: Mandatory Responsibilities
• Negotiate for and accept information from producers
• Obtain sufficient control of information to ensure long-term preservation (including necessary IP permissions and authority to migrate)
• Determine which communities should be the Designated Communities and should be able to understand the information provided
• Ensure that the information to be preserved is independently understandable to the Designated Community (i.e., they can understand it without the assistance of the experts who created it)
• Follow documented policies and procedures ensuring information is preserved against reasonable contingencies
• Make the information available to the Designated Community

OAIS Functional Model

OAIS Functional Model: Ingest

OAIS Functional Model: Archival Storage

OAIS Functional Model: Data Management

OAIS Functional Model: Access

OAIS Functional Model: Preservation Planning

OAIS Functional Model: Administration

OAIS Data Model

Part II: How to Create Content for an Archive

Archival Content
• A syllogism to ponder:
  • No digital media can be read without a hardware device designed to read the media format.
  • It is exceedingly rare for a hardware device intended to read a specific digital media format to be manufactured for more than 30 years, and many have had shorter lifespans.
  • Therefore, if your content is not device independent, it is not really archival.

Archival Content: Text
• Some Issues to Consider When Examining Text Standards
  • Technical aspects of character encoding
  • Character Repertoire (Script & Language Support)
  • Line Break Handling & Line Orientation
  • Indexing
  • Formatting
  • Other processing

Archival Content: Text
• A Standard for Characters
  • Unicode ISO/IEC
  • Two variable-length encodings (UTF-8, UTF-16) and a fixed-length encoding (UTF-32). In UTF-8, byte order is not an issue. In UTF-16 and UTF-32, both big-endian and little-endian encodings are supported.
  • Over 100K characters, supporting 75 different scripts and many additional symbols and diacritics, with room for expansion to 1,114,112 code points.
  • Support for a variety of line breaking mechanisms
  • Support for different text directionality, including algorithms specifying the appropriate handling of text of mixed directionality
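The encoding points on this slide can be checked directly. The following is a minimal sketch using Python's built-in codecs; the sample string and the helper name `encoded_sizes` are illustrative choices, not part of the original slides.

```python
# Three sample characters: U+00E9, U+20AC, and U+1D11E (outside the
# Basic Multilingual Plane, so UTF-16 needs a surrogate pair for it).
SAMPLE = "\u00e9\u20ac\U0001d11e"

def encoded_sizes(text):
    """Byte length of text in each encoding form (explicit byte order, no BOM)."""
    return {name: len(text.encode(name))
            for name in ("utf-8", "utf-16-be", "utf-32-be")}

sizes = encoded_sizes(SAMPLE)
# UTF-8 is variable length: 2 + 3 + 4 bytes.
# UTF-16 is variable length: 2 + 2 + 4 bytes.
# UTF-32 is fixed length: 4 bytes per character.

# Byte order changes the serialized bytes in UTF-16 (and UTF-32).
euro_be = "\u20ac".encode("utf-16-be")
euro_le = "\u20ac".encode("utf-16-le")

# The code space runs from U+0000 to U+10FFFF inclusive.
CODE_SPACE = 0x10FFFF + 1

print(sizes, euro_be.hex(), euro_le.hex(), CODE_SPACE)
```

For an archive, the practical consequence is that the encoding form and byte order of stored text should be recorded, or a byte order mark retained, since the same characters serialize to different bytes.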

Archival Content: Text
• A Standard for Syntax
  • XML (World Wide Web Consortium)
• Standards for Semantics
  • Chemical Markup Language, Chemical Industry Data Exchange
  • Astronomical Markup Language, Astronomical Dataset Markup Language, Astronomical Instrument Markup Language
  • Earth Science Markup Language, Geography Markup Language, NetCDF Markup Language, ArcGIS Markup Language
  • MathML
  • Etc., etc., etc.

Archival Content: Images
• Some Issues to Consider When Examining Image Standards
  • Color Depth
  • Color Space
  • Color Management
  • Image Resolution Scalability
  • Compression

Archival Content: Images
• Tagged Image File Format (TIFF): up to 64-bit color depth; supports grayscale, RGB, YCbCr, CMYK and CIELab color spaces; supports embedded ICC color profiles; raster format; supports uncompressed as well as lossless and lossy DCT-based compression
• JPEG 2000 (ISO/IEC 15444): bits per channel with multiple channels (including alpha & transparency); supports a wide array of color spaces with sRGB and sYCC as defaults; supports ICC color profiles; raster format; supports uncompressed as well as lossless and lossy wavelet-based compression
• Scalable Vector Graphics: uses sRGB color spaces; supports ICC color profiles; vector format
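Before an archive can reason about an image's color depth or compression, it has to know which format it is holding, independent of the file extension. The sketch below identifies classic TIFF and JP2 files from their leading bytes. The function name is hypothetical. BigTIFF and raw JPEG 2000 codestreams are deliberately not handled.

```python
import struct

# Signature box that opens a JP2 file: box length 12, box type "jP  ",
# followed by a fixed four-byte payload.
JP2_SIGNATURE = bytes.fromhex("0000000C6A5020200D0A870A")

def sniff_image_format(header: bytes) -> str:
    """Guess the container format from the first bytes of a file."""
    if header.startswith(JP2_SIGNATURE):
        return "JPEG 2000 (JP2)"
    # Classic TIFF starts with a byte-order mark ("II" or "MM") followed
    # by the number 42 stored in that byte order.
    if len(header) >= 4 and header[:2] in (b"II", b"MM"):
        little = header[:2] == b"II"
        (magic,) = struct.unpack("<H" if little else ">H", header[2:4])
        if magic == 42:
            return "TIFF (little-endian)" if little else "TIFF (big-endian)"
    return "unknown"

print(sniff_image_format(b"II*\x00\x08\x00\x00\x00"))
```

In practice the first 12 bytes of a file, read in binary mode, are enough input for this check.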

Archival Content: Audio/Video
• Some Issues to Consider When Examining Audio/Video Standards
  • Audio sampling rate
  • Audio bit depth
  • Video frame rate
  • Video color space/depth
  • Compression
• Good News: Audio/Video is a bit more standardized than the text/image world
• Bad News: Lossless digital audio is rare; lossless digital video is almost nonexistent.
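Sampling rate and bit depth translate directly into storage cost for uncompressed audio. The sketch below works through that arithmetic and checks it against a plain PCM WAV file written with Python's standard `wave` module. The helper names are illustrative, and the file written is ordinary WAV, not Broadcast WAVE.

```python
import io
import wave

def pcm_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Size in bytes of uncompressed PCM audio data."""
    return sample_rate_hz * (bit_depth // 8) * channels * seconds

def silent_wav(sample_rate_hz, bit_depth, channels, seconds):
    """Build an in-memory WAV file containing digital silence."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(bit_depth // 8)
        writer.setframerate(sample_rate_hz)
        writer.writeframes(bytes(pcm_bytes(sample_rate_hz, bit_depth, channels, seconds)))
    return buffer.getvalue()

# One minute of CD-quality stereo versus 96 kHz / 24-bit stereo.
cd_minute = pcm_bytes(44100, 16, 2, 60)
hires_minute = pcm_bytes(96000, 24, 2, 60)

# Read the technical parameters back from a generated file.
with wave.open(io.BytesIO(silent_wav(48000, 16, 2, 1)), "rb") as reader:
    params = (reader.getframerate(), reader.getsampwidth() * 8,
              reader.getnchannels(), reader.getnframes())

print(cd_minute, hires_minute, params)
```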

Archival Content: Audio/Video
• Broadcast WAVE Audio (EBU Standard N )
• For video, the picture is less clear. Proprietary solutions dominate the market. Many of these (e.g., QuickTime, WMV) do support lossless image frame and audio data. MXF, a SMPTE standard, is gaining some traction in digital library circles (and the movie industry).

Archival Content: Data
• Some disciplinary de facto standards (e.g., Chemical Markup Language). Cover Pages is a good source for information on many of the major ones.
• No single standard for general use for data encoding, although there are many contenders

Archival Content: Data
• Binary Format Description (BFD) Language -- XML language based on the Extensible Scientific Interchange Language (XSIL) that supports documentation of binary and ASCII data
• eXtensible Data Format (XDF) -- scientific data format supporting hierarchical data structures, N-dimensional arrays, scalar and vector fields, user-defined coordinate systems

Archival Content: Data
• Data Format Description Language (DFDL) -- A language for describing the structure of binary and character-encoded data to expose their structure, format and metadata so that machine processes can work upon them.
• Data Documentation Initiative (DDI) -- An effort by the ICPSR at Univ. of Michigan to develop an XML format for documenting social science data sets. XML files can be used to produce either bibliographic descriptions of data sets or SAS/SPSS/STATA data definition statements.
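The idea behind format description languages can be illustrated without any of the standards above: keep a declarative description of a binary layout separate from the data, and let generic code decode the data from that description. The sketch below is not DFDL or BFD syntax. The record layout, field names, and function name are all hypothetical.

```python
import struct

# Hypothetical layout: (field name, struct type code) in on-disk order.
RECORD_LAYOUT = [
    ("station_id", "H"),   # unsigned 16-bit integer
    ("timestamp", "I"),    # unsigned 32-bit integer (seconds since epoch)
    ("temperature", "f"),  # 32-bit IEEE float
]

def decode_record(raw, layout=RECORD_LAYOUT, byte_order=">"):
    """Decode one binary record using only the declarative layout."""
    fmt = byte_order + "".join(code for _, code in layout)
    values = struct.unpack(fmt, raw)
    return dict(zip((name for name, _ in layout), values))

# Build a sample big-endian record, then decode it from the description.
raw = struct.pack(">HIf", 7, 1212537600, 21.5)
record = decode_record(raw)
print(len(raw), record)
```

If the layout description is preserved alongside the data, a future reader can decode the bytes without the software that originally wrote them.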

Archival Content: Data
• Hierarchical Data Format (HDF5) -- General purpose file format (with supporting software library) for storing scientific data, developed by NCSA. Uses two fundamental structures, groups and data sets, where a data set is an N-dimensional array of data elements with metadata.

Archival Content: Paper
• ANSI/NISO Z , Permanence of Paper for Publications and Documents in Libraries and Archives
• ISO , Information and documentation -- Paper for documents -- Requirements for permanence

Part III: How to Create Metadata for an Archive

Metadata: Identifiers
• Persistence is important, but…
• Clarity on what is being identified may be more important (or, why an OpenURL is not a call number).
• Standards proliferate in this space; choice of any identifier may depend on:
  • Social concerns (for whom am I identifying something?)
  • Identifier/address resolution (how do I find a copy/item using this identifier?)

Metadata: Structural
• Metadata intended to identify the components of an object and their relationship to each other in order to support the object’s navigation and use
• Metadata Encoding & Transmission Standard (METS)
• MPEG-21 Digital Item Declaration Language
• XML Formatted Data Units (XFDU)
• OAI-ORE
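As a concrete illustration of structural metadata, the sketch below builds a very small METS-style document: a file section listing the component files and a structural map saying how they fit together as ordered pages. The function name, file IDs, and file names are hypothetical, and the output has not been validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)

def q(namespace, tag):
    """Qualified name in ElementTree's {namespace}tag notation."""
    return f"{{{namespace}}}{tag}"

def build_mets(pages):
    """pages: list of (file_id, href, label) tuples, in reading order."""
    mets = ET.Element(q(METS_NS, "mets"))
    file_sec = ET.SubElement(mets, q(METS_NS, "fileSec"))
    group = ET.SubElement(file_sec, q(METS_NS, "fileGrp"), {"USE": "master"})
    struct_map = ET.SubElement(mets, q(METS_NS, "structMap"), {"TYPE": "physical"})
    book = ET.SubElement(struct_map, q(METS_NS, "div"), {"TYPE": "book"})
    for order, (file_id, href, label) in enumerate(pages, start=1):
        # Inventory entry: where the file lives.
        entry = ET.SubElement(group, q(METS_NS, "file"), {"ID": file_id})
        ET.SubElement(entry, q(METS_NS, "FLocat"),
                      {"LOCTYPE": "URL", q(XLINK_NS, "href"): href})
        # Structural entry: which page of the object the file represents.
        page = ET.SubElement(book, q(METS_NS, "div"),
                             {"TYPE": "page", "ORDER": str(order), "LABEL": label})
        ET.SubElement(page, q(METS_NS, "fptr"), {"FILEID": file_id})
    return mets

mets = build_mets([("F1", "page1.tif", "Page 1"), ("F2", "page2.tif", "Page 2")])
xml_text = ET.tostring(mets, encoding="unicode")
print(xml_text)
```

The separation is the point: the file section answers "what files exist", while the structural map answers "how do they relate", which is what supports navigation of the object.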

Metadata: Provenance
• Metadata documenting the origins and life-cycle of a digital object
• PREMIS Data Dictionary for Preservation Metadata 2.0
  • Joint project of OCLC & RLG
  • Defines a metadata element set that “supports the viability, renderability, understandability, authenticity and integrity of digital objects in a preservation context.”

Metadata: Provenance The PREMIS Data Model

Metadata: Provenance
• PREMIS Object Metadata:
  • Identifier
  • Category
  • Preservation Level
  • Significant Properties
  • Characteristics (fixity, size, format, etc.)
  • Original Name
  • Storage
  • Environment
  • Signature
  • Relationships to other Objects, Events, Rights
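Of the object characteristics listed, fixity and size are the easiest to show in code. The sketch below records them for a byte stream and later re-checks them. The dictionary keys loosely echo the slide's categories but are not the PREMIS schema, and the function names and sample values are hypothetical.

```python
import hashlib

def describe_object(identifier, original_name, content: bytes):
    """Record size and a SHA-256 fixity digest for an object's bytes."""
    return {
        "identifier": identifier,
        "original_name": original_name,
        "size": len(content),
        "fixity": {
            "algorithm": "SHA-256",
            "digest": hashlib.sha256(content).hexdigest(),
        },
    }

def fixity_ok(record, content: bytes) -> bool:
    """True if the bytes still match the recorded size and digest."""
    return (len(content) == record["size"]
            and hashlib.sha256(content).hexdigest() == record["fixity"]["digest"])

original = b"example object content"
record = describe_object("obj-0001", "example.dat", original)
unchanged = fixity_ok(record, original)
corrupted = fixity_ok(record, b"example object c0ntent")
print(record["size"], unchanged, corrupted)
```

Running the check on a schedule, and logging each run, is one way an archive could produce the kind of event history that the PREMIS Event entity is meant to capture.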

Metadata: Provenance
• PREMIS Event Metadata
  • Identifier
  • Type
  • Date & Time
  • Details
  • Outcome
  • Relationship to Agents and Objects

Metadata: Provenance
• PREMIS Agent Metadata
  • Identifier
  • Name
  • Type

Metadata: Provenance
• PREMIS Rights Metadata
  • Rights Statement
  • Rights Basis
  • Copyright Information
  • License Information
  • Statute Information
  • Rights Granted
  • Relationship to Objects and Agents

Metadata: Administrative
• Technical Metadata
  • Z39.87 and MIX
  • Technical Metadata for Text (TextMD)
  • AES-X098 Administrative Metadata for Audio Objects
  • SMPTE RP Metadata Dictionary
• Rights Metadata
  • Standards, yes. That you want to use, no.

Metadata: Descriptive
• Issues to consider:
  • Nature of object to be described
  • Real purpose(s) of description
  • Community(ies) that will utilize description
  • Supporting standards of descriptive practice and controlled vocabularies

Metadata: Descriptive
• Library/Archives/Museums/Educators
  • MARC, MODS, Dublin Core
  • EAD
  • VRA Core, CDWA
  • IEEE LOM
• Data Repositories
  • Data Documentation Initiative
  • Content Standard for Digital Geospatial Metadata
  • Darwin Core
  • Access to Biological Collection Data (ABCD)
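Dublin Core is the simplest of the descriptive schemes listed, so it makes a convenient illustration of what a descriptive record looks like. The sketch below describes this presentation using a few elements from the Dublin Core element set. The wrapper element `record` and the function name are hypothetical, since Dublin Core itself does not prescribe a container.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

def dublin_core_record(fields):
    """fields: list of (element name, value) pairs; elements may repeat."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields:
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root

record = dublin_core_record([
    ("title", "Summer Institute on Data Curation: Digital Preservation & Standards"),
    ("creator", "McDonough, Jerome"),
    ("date", "2008-06-04"),
    ("subject", "Digital preservation"),
    ("subject", "Metadata standards"),
])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The free-text subject values here are exactly where the slide's point about controlled vocabularies applies: without an agreed vocabulary, two records about the same topic may not match in a search.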

Part IV: How to Evaluate an Archive

Evaluating Archives
• Trustworthy Repositories Audit & Certification (TRAC) Criteria & Checklist
  • =58&l3=162&l4=91
• Digital Repository Audit Method Based on Risk Assessment (DRAMBORA)

Exercise: URLs
• Images
  • yce.tif
  • yce.jp2

Exercise: URLs
• METS Schema, Documentation, Namespace
• PREMIS Schema, Documentation, Namespace
• MIX Schema, Documentation, Namespace
  • oject_key=b897b0cf3e2ee526252d9f830207b3cc9f3b6c2c