Published by Marjorie Watkins. Modified over 9 years ago.
1
Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008
2
I love standards. There are so many of them to choose from.
3
Standards & Sustainability Disclosure: Are complete specifications available? For free? Adoption: To what extent is the standard already used? Documentation: Is the specification clear and straightforward? Are there additional resources to assist in understanding the standard?
4
Standards & Sustainability External Dependencies: To what extent does use of the standard rely on particular hardware or software? On other standards? On other non-standards? Impact of Patents: If patents cover some or all of the standard, are licensing issues likely to complicate use of the standard? Technological Protection Measures: Does the standard rely on technological protection measures which will inhibit your ability to preserve data? Tip of the hat to Library of Congress Sustainability Of Digital Formats Site http://www.digitalpreservation.gov/formats/sustain/sustain.shtml
5
Part I: How to Operate an Archive
6
Open Archival Information System Reference Model Developed by the Consultative Committee For Space Data Systems Adopted as ISO 14721:2003 Available at http://public.ccsds.org/publications/archive/650x0b1.pdf Provides definitions of components of an archive, their relationship to each other, a set of mandatory responsibilities for an archive, and both functional and data models.
7
OAIS Reference Model: Mandatory Responsibilities Negotiate for and accept information from producers Obtain sufficient control of information to ensure long-term preservation (including necessary IP permissions and authority to migrate) Determine which communities should be the Designated Communities and should be able to understand the information provided Ensure that the information to be preserved is independently understandable to the designated community (i.e., they can understand it without the assistance of the experts who created it). Follow documented policies and procedures ensuring information is preserved against reasonable contingencies Make the information available to the designated community
8
OAIS Functional Model
9
OAIS Functional Model: Ingest
10
OAIS Functional Model: Archival Storage
11
OAIS Functional Model: Data Management
12
OAIS Functional Model: Access
13
OAIS Functional Model: Preservation Planning
14
OAIS Functional Model: Administration
15
OAIS Data Model
16
Part II: How to Create Content for an Archive
17
Archival Content A syllogism to ponder: No digital media can be read without a hardware device designed to read the media format. It is exceedingly rare for a hardware device intended to read a specific digital media format to be manufactured for more than 30 years, and many have had shorter lifespans. Therefore, if your content is not device independent, it is not really archival.
18
Archival Content: Text Some Issues to Consider When Examining Text Standards Technical aspects of character encoding Character Repertoire (Script & Language Support) Line Break Handling & Line Orientation Indexing Formatting Other processing
19
Archival Content: Text A Standard for Characters Unicode 5.1 - ISO/IEC 10646 Two variable length encodings (UTF-8, UTF-16) and a fixed length encoding (UTF-32). In UTF-8, byte order is not an issue. In UTF-16 and UTF-32, big-endian and little-endian encodings are supported. Over 100K characters, supporting 75 different scripts and many additional symbols and diacritics, with room for expansion to 1,114,112 code points. Support for a variety of line breaking mechanisms Support for different text directionality, including algorithms specifying the appropriate handling of text of mixed directionality
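The difference between the variable length and fixed length encodings, and the byte order question for UTF-16 and UTF-32, can be seen with a short Python sketch. The sample string and the helper name `encoded_lengths` are illustrative choices, not part of the original slides.

```python
# Sample text (an illustrative choice): an ASCII letter, an accented Latin
# letter, a CJK ideograph, and a character outside the Basic Multilingual Plane.
text = "A\u00e9\u4e2d\U0001D11E"


def encoded_lengths(s):
    """Return, per encoding, the number of bytes each character occupies."""
    result = {}
    for codec in ("utf-8", "utf-16-be", "utf-32-be"):
        result[codec] = [len(ch.encode(codec)) for ch in s]
    return result


lengths = encoded_lengths(text)
# utf-8     -> [1, 2, 3, 4]  (variable length)
# utf-16-be -> [2, 2, 2, 4]  (variable length: surrogate pair for the last one)
# utf-32-be -> [4, 4, 4, 4]  (fixed length)

# Byte order changes the bytes produced for UTF-16 (and UTF-32), but UTF-8
# has a single byte sequence for each character.
utf16_be = "A".encode("utf-16-be")  # bytes 00 41
utf16_le = "A".encode("utf-16-le")  # bytes 41 00
```

An archive that records the encoding and byte order alongside the text avoids having to guess them later from the bytes alone.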
20
Archival Content: Text A Standard for Syntax XML (World Wide Web Consortium) Standards for Semantics Chemical Markup Language, Chemical Industry Data Exchange Astronomical Markup Language, Astronomical Dataset Markup Language, Astronomical Instrument Markup Language Earth Science Markup Language, Geography Markup Language, NetCDF Markup Language, ArcGIS Markup Language MathML Etc., etc., etc….
21
Archival Content: Images Some Issues to Consider When Examining Image Standards Color Depth Color Space Color Management Image Resolution Scalability Compression
22
Archival Content: Images Tagged Image File Format (TIFF) 6.0 -- 1 to 64-bit color depth, supports grayscale, RGB, YCbCr, CMYK and CIELab color spaces, supports embedded ICC color profiles, raster format, supports uncompressed as well as lossless and lossy DCT-based compression JPEG 2000 (ISO/IEC 15444) -- 1-48 bits per channel with multiple channels (including alpha & transparency), supports wide array of color spaces with sRGB and sYCC as defaults, supports ICC color profiles, raster format, supports uncompressed as well as lossless and lossy wavelet based compression Scalable Vector Graphics 1.2 -- uses sRGB color spaces, supports ICC Color Profiles, vector format
23
Archival Content: Audio/Video Some Issues to Consider when Examining Audio/Video Standards Audio sampling rate Audio bit depth Video frame rate Video color space/depth Compression Good News: Audio/Video is a bit more standardized than text/image world Bad News: Lossless digital audio is rare; lossless digital video is almost nonexistent.
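To make the audio parameters above concrete, here is a minimal sketch using Python's standard `wave` module. It writes an uncompressed PCM WAV file in memory and reads the sampling rate and bit depth back. The function names and the 48 kHz / 24-bit / stereo values are assumptions chosen for illustration, and the output is a plain WAV file with none of the extra metadata a broadcast-oriented format would carry.

```python
import io
import wave


def pcm_bytes_per_second(sample_rate_hz, bit_depth, channels):
    """Uncompressed PCM data rate: samples/sec x bytes/sample x channels."""
    return sample_rate_hz * (bit_depth // 8) * channels


def write_silence(sample_rate_hz, bit_depth, channels, seconds):
    """Build an in-memory PCM WAV file holding `seconds` of digital silence.

    Zero-valued samples are silence for 16-bit and wider signed PCM.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(bit_depth // 8)
        wav.setframerate(sample_rate_hz)
        frame_count = sample_rate_hz * seconds
        wav.writeframes(bytes(frame_count * channels * (bit_depth // 8)))
    return buffer.getvalue()


def read_parameters(wav_bytes):
    """Recover the technical parameters stored in the WAV header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return {
            "sample_rate_hz": wav.getframerate(),
            "bit_depth": wav.getsampwidth() * 8,
            "channels": wav.getnchannels(),
            "frames": wav.getnframes(),
        }


wav_bytes = write_silence(48000, 24, 2, seconds=1)
params = read_parameters(wav_bytes)
rate = pcm_bytes_per_second(48000, 24, 2)  # 288000 bytes per second
```

At this data rate one hour of stereo audio is roughly 1 GB uncompressed, which is part of why compression choices matter for audio and even more for video.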
24
Archival Content: Audio/Video Broadcast WAVE Audio (EBU Standard N22 - 1997) For video, picture is less clear. Proprietary solutions dominate market. Many of these (e.g., QuickTime, WMV) do support lossless image frame and audio data. MXF, a SMPTE standard, is gaining some traction in digital library circles (and the movie industry)
25
Archival Content: Data Some disciplinary de facto standards (e.g., Chemical Markup Language). Cover Pages (http://xml.coverpages.org) is a good source for information on many of the major ones. No single standard for general use for data encoding, although many contenders
26
Archival Content: Data Binary Format Description (BFD) Language -- XML language based on the Extensible Scientific Interchange Language (XSIL) that supports documentation of binary and ASCII data eXtensible Data Format (XDF) -- scientific data format supporting hierarchical data structures, N-dimensional arrays, scalar and vector fields, user-defined coordinate systems
27
Archival Content: Data Data Format Description Language (DFDL) -- A language for describing the structure of binary and character encoded data to expose their structure, format and metadata so that machine processes can work upon them. Data Documentation Initiative (DDI) -- An effort by the ICPSR at Univ. of Michigan to develop an XML format for documenting social science data sets. XML files can be used to produce either bibliographic descriptions of data sets or SAS/SPSS/STATA data definition statements.
28
Archival Content: Data Hierarchical Data Format (HDF5) -- General purpose file format (with supporting software library) for storing scientific data, developed by NCSA. Uses two fundamental structures, groups and data sets, where a data set is an N-dimensional array of data elements with metadata.
29
Archival Content: Paper ANSI/NISO Z39.48-1992, Permanence of Paper for Publications and Documents in Libraries and Archives ISO 9706:1994, Information and documentation -- Paper for documents -- Requirements for permanence
30
Part III: How to Create Metadata for an Archive
31
Metadata: Identifiers Persistence is important, but… Clarity on what is being identified may be more important (or, why an OpenURL is not a call number). Standards proliferate in this space; choice of any identifier may depend on: Social concerns (for whom am I identifying something?) Identifier/address resolution (how do I find a copy/item using this identifier?)
32
Metadata: Structural Metadata intended to identify the components of an object and their relationship to each other in order to support the object’s navigation and use Metadata Encoding & Transmission Standard (METS) MPEG-21 Digital Item Declaration Language XML Formatted Data Units (XFDU) OAI-ORE
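As a rough illustration of what structural metadata looks like in practice, the sketch below builds a skeletal METS document with Python's standard `xml.etree.ElementTree`. It lists two files in a file section and ties them to one division of a structural map. The element and attribute names come from the METS schema, but the document is not validated here and omits parts a real METS record would carry; the helper names, the file IDs, the `TYPE` and `LABEL` values, and the MIME types are assumptions for illustration. The two image URLs are the exercise files from this presentation.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(namespace, tag):
    """Return a namespace-qualified name in ElementTree's {ns}tag notation."""
    return f"{{{namespace}}}{tag}"


def build_mets(files, label):
    """Build a skeletal METS tree.

    files: list of (file_id, mime_type, url) tuples (hypothetical inputs).
    """
    root = ET.Element(q(METS_NS, "mets"))

    # File section: an inventory of the files that make up the object.
    file_sec = ET.SubElement(root, q(METS_NS, "fileSec"))
    file_grp = ET.SubElement(file_sec, q(METS_NS, "fileGrp"))
    for file_id, mime_type, url in files:
        file_el = ET.SubElement(
            file_grp, q(METS_NS, "file"), {"ID": file_id, "MIMETYPE": mime_type}
        )
        ET.SubElement(
            file_el,
            q(METS_NS, "FLocat"),
            {"LOCTYPE": "URL", q(XLINK_NS, "href"): url},
        )

    # Structural map: how the files relate to the logical object.
    struct_map = ET.SubElement(root, q(METS_NS, "structMap"), {"TYPE": "physical"})
    div = ET.SubElement(struct_map, q(METS_NS, "div"), {"TYPE": "image", "LABEL": label})
    for file_id, _mime_type, _url in files:
        ET.SubElement(div, q(METS_NS, "fptr"), {"FILEID": file_id})
    return root


mets_root = build_mets(
    [
        ("FILE-TIFF", "image/tiff", "http://people.lis.uiuc.edu/~jmcdonou/Bryce.tif"),
        ("FILE-JP2", "image/jp2", "http://people.lis.uiuc.edu/~jmcdonou/Bryce.jp2"),
    ],
    label="Bryce",
)
mets_xml = ET.tostring(mets_root, encoding="unicode")
```

The point of the structure is that the `fptr` elements refer to files by ID, so the same inventory can support more than one structural view of the object.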
33
Metadata: Provenance Metadata documenting the origins and life-cycle of a digital object PREMIS Data Dictionary for Preservation Metadata 2.0 Joint project of OCLC & RLG Defines metadata element set that “supports the viability, renderability, understandability, authenticity and integrity of digital objects in a preservation context.”
34
Metadata: Provenance The PREMIS Data Model
35
Metadata: Provenance PREMIS Object Metadata: Identifier Category Preservation Level Significant Properties Characteristics (fixity, size, format, etc.) Original Name Storage Environment Signature Relationships to other Objects, Events, Rights
36
Metadata: Provenance PREMIS Event Metadata Identifier Type Date & Time Details Outcome Relationship to Agents and Objects
37
Metadata: Provenance PREMIS Agent Metadata Identifier Name Type
38
Metadata: Provenance PREMIS Rights Metadata Rights Statement Rights Basis Copyright Information License Information Statute Information Rights Granted Relationship to Objects and Agents
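The Object, Event, and Agent entities described above can be sketched as plain Python data classes, with a fixity check recorded as an Event linking an Object to the Agent that performed it. This is a simplification for illustration only: the class and field names are hypothetical and do not reproduce the semantic unit names in the PREMIS Data Dictionary, Rights are omitted, and SHA-256 is an assumed choice of fixity algorithm.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class Agent:
    identifier: str
    name: str
    agent_type: str


@dataclass
class PreservationObject:
    identifier: str
    original_name: str
    size: int
    fixity_algorithm: str
    fixity_digest: str


@dataclass
class Event:
    identifier: str
    event_type: str
    date_time: str
    outcome: str
    object_ids: List[str] = field(default_factory=list)
    agent_ids: List[str] = field(default_factory=list)


def describe_object(identifier, original_name, content):
    """Record size and a fixity digest for the object's bytes."""
    digest = hashlib.sha256(content).hexdigest()
    return PreservationObject(identifier, original_name, len(content), "SHA-256", digest)


def fixity_check(obj, content, agent, event_id):
    """Recompute the digest and record the result as an Event."""
    matches = hashlib.sha256(content).hexdigest() == obj.fixity_digest
    return Event(
        identifier=event_id,
        event_type="fixity check",
        date_time=datetime.now(timezone.utc).isoformat(),
        outcome="pass" if matches else "fail",
        object_ids=[obj.identifier],
        agent_ids=[agent.identifier],
    )


# Hypothetical usage with placeholder content.
agent = Agent("agent-1", "Example fixity checker", "software")
original = b"example bytes"
obj = describe_object("obj-1", "Bryce.tif", original)
good_event = fixity_check(obj, original, agent, "event-1")
bad_event = fixity_check(obj, original + b"!", agent, "event-2")
```

Keeping the event separate from the object means the object's description stays stable while its history accumulates as a growing list of events.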
39
Metadata: Administrative Technical Metadata Z39.87 and MIX Technical Metadata for Text (TextMD) AES-X098 Administrative Metadata for Audio Objects SMPTE RP210.10-2007 Metadata Dictionary Rights Metadata Standards, yes. That you want to use, no.
40
Metadata: Descriptive Issues to consider: Nature of object to be described Real purpose(s) of description Community(ies) that will utilize description Supporting standards of descriptive practice and controlled vocabularies
41
Metadata: Descriptive Library/Archives/Museums/Educators MARC, MODS, Dublin Core EAD VRA Core, CDWA IEEE LOM Data Repositories Data Documentation Initiative Content Standard for Digital Geospatial Metadata Darwin Core Access to Biological Collection Data (ABCD)
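For a sense of how lightweight the simplest of these descriptive schemes is, the following sketch serializes a few Dublin Core elements describing this presentation itself. The Dublin Core element namespace is the standard one; the `record` wrapper element and the helper name are assumptions for illustration, since Dublin Core does not prescribe a container.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a record from (element_name, value) pairs; elements may repeat."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields:
        element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        element.text = value
    return record


title = "Summer Institute on Data Curation: Digital Preservation & Standards"
dc_record = dublin_core_record(
    [
        ("title", title),
        ("creator", "Jerome McDonough"),
        ("date", "2008-06-04"),
    ]
)
# The serializer escapes the ampersand in the title as &amp;.
dc_xml = ET.tostring(dc_record, encoding="unicode")
```

Richer schemes such as MODS or DDI follow the same pattern of namespaced elements but add structure and controlled vocabularies that a flat record like this cannot express.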
42
Part IV: How to Evaluate an Archive
43
Evaluating Archives Trustworthy Repositories Audit & Certification (TRAC) Criteria & Checklist http://www.crl.edu/content.asp?l1=13&l2=58&l3=162&l4=91 Digital Repository Audit Method Based on Risk Assessment (DRAMBORA) http://www.repositoryaudit.eu/
44
Exercise: URLs Images http://people.lis.uiuc.edu/~jmcdonou/Bryce.tif http://people.lis.uiuc.edu/~jmcdonou/Bryce.jp2
45
Exercise: URLs METS Schema, Documentation, Namespace http://www.loc.gov/standards/mets/mets.xsd http://www.loc.gov/standards/mets/docs/mets.v1-7.html http://www.loc.gov/METS/ PREMIS Schema, Documentation, Namespace http://www.loc.gov/standards/premis/v1/PREMIS-v1-1.xsd http://www.loc.gov/standards/premis/ http://www.loc.gov/standards/premis/v1 MIX Schema, Documentation, Namespace http://www.loc.gov/standards/mix/mix20/mix20.xsd http://www.niso.org/kst/reports/standards?step=2&gid=None&project_key=b897b0cf3e2ee526252d9f830207b3cc9f3b6c2c http://www.loc.gov/mix/v20