The Library of Congress Audio-Visual Prototyping Project Carl Fleischhauer Office of Strategic Initiatives, Library of Congress Sound Savings Conference University of Texas, Austin July 25, 2003 This slide show: lcweb.loc.gov/rr/mopic/avprot/SoundSavings03.ppt
National Audio-Visual Conservation Center New Library of Congress facility for the Motion Picture, Broadcasting, and Recorded Sound Division (M/B/RS) Facility funded by the Packard Humanities Institute Will be in Culpeper, Virginia, 70 miles from Washington Planned to go operational 2005
Audio-Visual Prototyping Project Collections from the M/B/RS division and the American Folklife Center at LC Emphasis: reformatting endangered materials, especially magnetic tapes and instantaneous discs Current work: audio Future activities: video, copyright MP3s, content from web sites Prototyping period:
Motive 1: Alternative Preservation Approach Shortcomings of conventional practice: reformatting onto analog magnetic tape 1 Short life expectancy 2 Generation loss with each copy 3 Cessation of manufacture of analog tape and tape recorders
Motive 1: Alternative Preservation Approach Desire to work in the digital realm Emerging issues –Deterioration of tangible born-digital items, e.g., CD-Rs acquired by LC from music composer copyrights –Preserving intangible born-digital content, e.g., MP3s from copyright and other acquisitions
Motive 2: Provide Access Limited access, since most items protected by copyright or require consideration of folk performer prerogatives LC researchers on Capitol Hill, collections in Culpeper Possible future authorized remote research sites
Illustration : sample preserved item We want to reproduce the artifact as a whole This example is a Marine Corps recording from the South Pacific in WW II –Audio from a disc copy of an Amertape Recording Film original (film-with-grooves) –Images depict the film container and the disc label
Initial display of navigation tree and thumbnails
Close-up display of image & file-level metadata
Preservation Concept Content takes the form of information packages, aka digital objects Information packages consist of data (e.g., audio and image files) and metadata
Preservation Concept Not a CD or DVD approach Packages managed in digital repository Repository is server and storage-system based Paradox: –Content at any given moment depends upon systems and media –Content must be system and media independent
Four issues 1. Selecting the target format for reformatting 2. Quality of the reformatted copy 3. Shaping the object/package and the importance of metadata 4. Longevity in “media-less” environment
Issue 1 Selecting the format
Selecting the format Disclosure –are specifications and tools available? Adoption –is the format already in wide use? Transparency –is encoding open to analysis with basic tools?
Selecting the format Self-documentation –does object include metadata that explains how to render or understand context? Fidelity –support for high resolution audio Sound field –support for stereo and/or surround sound
Audio formats Audio masters –Bitstream: PCM sampling, uncompressed –File format: WAVE (higher res) –One-bit-deep formats (e.g., SONY DSD) of interest but “ahead of the game” for us Service files –WAVE (lower res) and MP3
Image formats Image Masters –Bitstream: Uncompressed bitmapped –File format: TIFF Service copies –JPEGs
Issue 2 Quality of the Reformatted Copy
Key Parameters Sampling frequency –Render the waveform as “dots” –More dots contribute to greater accuracy, capable of rendering high frequency sounds –Expressed as kilocycles per second or kilohertz –Compare to spatial resolution for images –Higher “pixels or dots per inch” contribute to better clarity
Key Parameters Word length, bit depth –Greater bit depth means greater precision in locating the sample in terms of amplitude –Greater bit depth means greater capacity to represent dynamic range –Expressed as bits per sample –Compare to tonal resolution (color) for images –Higher “bits per pixel” mean more accurate color
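The link between bit depth and dynamic range can be put in rough numbers. A minimal sketch, assuming the common textbook approximation that an ideal N-bit PCM quantizer gives about 6.02 x N + 1.76 dB of signal-to-quantization-noise ratio for a full-scale sine wave; the function name is illustrative and is not part of the project's tooling:

```python
def pcm_dynamic_range_db(bits: int) -> float:
    """Approximate SNR in dB of an ideal N-bit PCM quantizer (full-scale sine)."""
    return 6.02 * bits + 1.76


if __name__ == "__main__":
    for bits in (16, 24):
        print(f"{bits}-bit: about {pcm_dynamic_range_db(bits):.1f} dB")
```

Real converters fall short of these ideal figures, so the numbers are an upper bound rather than a measurement.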
Staff discussion of parameters... Consensus on word length –Everyone is sold that 24 bit is better than 16 –Based on listening, objective measurement possible –“Extra data will protect you when the original has wide or varying dynamics, or if an operator makes a mistake.” –Compare to imaging and a downstream benefit Master image at 12 or 16 bits per channel Manipulate for aesthetic effect, save at 8 bits No gaps in your histogram
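The imaging comparison ("no gaps in your histogram") can be illustrated with a small sketch. It assumes a simple tonal stretch of the middle half of the range as a stand-in for "manipulate for aesthetic effect"; the function names and the chosen band are hypothetical. Stretching an 8-bit master leaves many of the 256 output levels unused, while stretching a 16-bit master and then saving at 8 bits fills them all:

```python
def stretch_to_8bit(value: int, source_bits: int,
                    low: float = 0.25, high: float = 0.75) -> int:
    """Map the tonal band [low, high] of the source range onto 0..255."""
    full_scale = (1 << source_bits) - 1
    normalized = value / full_scale
    stretched = (normalized - low) / (high - low)
    stretched = min(max(stretched, 0.0), 1.0)  # clip values outside the band
    return round(stretched * 255)


def levels_used(source_bits: int) -> int:
    """Count distinct 8-bit output levels reachable from every source value."""
    return len({stretch_to_8bit(v, source_bits)
                for v in range(1 << source_bits)})


if __name__ == "__main__":
    print("8-bit master: ", levels_used(8), "of 256 output levels used")
    print("16-bit master:", levels_used(16), "of 256 output levels used")
```

The same reasoning is what motivates the extra headroom of a 24-bit audio master: later processing has more source values to draw on.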
Staff discussion of parameters... Less consensus on sampling frequency –Some of us thought this was the relevant question: “What is the range of frequencies we might expect in this item?” 78 rpm disc from the acoustic era –8-10 kilocycles per second, or less –Rule of thumb: digitally sample at 2x frequency –Will 25 kilocycles per second suffice? Folk music collector with a Nagra in 1970s –14-18 kilocycles per second –Will 44 or 48 kilocycles suffice?
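The rule of thumb on this slide is simple arithmetic. A minimal sketch, with illustrative function names, that applies "sample at 2x the highest expected frequency" to the two examples given; it covers only this one argument, not the production and downstream considerations raised elsewhere in the staff discussion:

```python
def minimum_sampling_rate_khz(highest_frequency_khz: float) -> float:
    """Rule of thumb: sample at twice the highest frequency of interest."""
    return 2.0 * highest_frequency_khz


def rate_suffices(sampling_rate_khz: float, highest_frequency_khz: float) -> bool:
    """True if the sampling rate meets the 2x rule for the given content."""
    return sampling_rate_khz >= minimum_sampling_rate_khz(highest_frequency_khz)


if __name__ == "__main__":
    # Acoustic-era 78 rpm disc: content up to about 10 kHz.
    print("25 kHz for acoustic 78:", rate_suffices(25.0, 10.0))
    # 1970s Nagra field recording: content up to about 18 kHz.
    print("44.1 kHz for Nagra tape:", rate_suffices(44.1, 18.0))
    print("48 kHz for Nagra tape:", rate_suffices(48.0, 18.0))
```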
Staff discussion of parameters... Engineers advocated sampling frequencies of 96 or even 192 kHz Discussion tended to look at practical production issues and possible downstream options Objective measurement is not relevant to some of these factors
Staff discussion of parameters... Very high resolution desired because: –“There may be hard-to-hear harmonics that you won’t want to lose.” –“Copies with less noise and less distortion can more successfully be restored in a post-process.” –“In the future we’ll have better enhancement tools and post-processing, so save as much raw information as you can.” –“What if you need extra data to support certain types of resource discovery?”
Inherent fidelity of the original items not decisive. Informal A-B listening comparisons were helpful but not conclusive. Proposal to carry out empirical comparison of restoration actions applied to a high-res and a medium-res master.
Audio resolution for prototyping project Result of preceding discussion: the engineers work at the upper limit of the tools they have Reformatted content –Audio masters 96 kHz/24 bit mono or stereo (some at 48/24) –Service files 44.1 kHz/16 bit WAVE 256 kbps MP3 (if stereo)
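The storage consequences of these choices follow from the parameters. A rough sketch, assuming uncompressed PCM payload only (WAVE header overhead ignored), decimal kilobits for the MP3 rate, and a hypothetical one-hour stereo item; the function names are illustrative:

```python
def pcm_bytes(sample_rate_hz: int, bits: int, channels: int, seconds: int) -> int:
    """Size of the raw PCM audio payload in bytes."""
    return sample_rate_hz * (bits // 8) * channels * seconds


def mp3_bytes(kilobits_per_second: int, seconds: int) -> int:
    """Size of a constant-bit-rate MP3 stream in bytes."""
    return kilobits_per_second * 1000 * seconds // 8


if __name__ == "__main__":
    one_hour = 3600
    sizes = {
        "master 96 kHz/24 bit stereo": pcm_bytes(96_000, 24, 2, one_hour),
        "service WAVE 44.1 kHz/16 bit stereo": pcm_bytes(44_100, 16, 2, one_hour),
        "service MP3 256 kbps": mp3_bytes(256, one_hour),
    }
    for label, size in sizes.items():
        print(f"{label}: {size / 1e6:,.0f} MB")
```

For one hour of stereo this works out to roughly 2,074 MB for the master, 635 MB for the service WAVE, and 115 MB for the MP3.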
Image resolution for prototyping project Reformatted content –Borrow approach from other digitization projects –Image Masters lines/pixels per inch 24 bit color –Service copies Same-size JPEGs
Two Sidebars
Sidebar on practices Professional equipment –For example, professional analog-to-digital converters Some details –Masters as flat transfers, avoid/minimize cleanup –Copy mono discs with stereo cartridge, hope for future process to “find the best groove wall”
Sidebar on practices Professional workers –Supervise and perform expert work Work requires knowledge and skills with antique formats and new digital technology
Sidebar on practices Some ideas for the future –Include apprentice workers in work team –Sort originals by “transfer efficiency” category –Use expert systems to help monitor transfers, spot anomalies –For some categories, copy two or three items at once Inspired by –PRESTO project in Europe ( –Image-based recovery from discs (
Sidebar on objective measurement Imaging: targets Audio: test tones Outputs from targets/tones measure the performance of equipment They do not measure actual “content” images or sounds directly.
Sidebar on objective measurement Tools and practices not mature, even for imaging Need performance measures for digital systems –You can’t believe your scanner when it says 300 ppi Measure what actually comes through the system –Imaging example: use modulation transfer function (MTF) as a yardstick for delivered spatial resolution –Pass-fail point not yet established for image reformatting projects
Sidebar on objective measurement Tentative use of standard ITU test sequences known as CCITT O.33 –28-second series of tones to test satellite broadcast transmissions, mono and stereo –Recordings of the tones can be used to determine the frequency response, distortion, and signal-to-noise ratio produced in a given recording system –Pass-fail point not yet established for sound reformatting projects
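How a recorded test tone yields a signal-to-noise figure can be sketched in a few lines. This is a simulation under stated assumptions, not the ITU test sequence and not the project's procedure: a single 997 Hz tone stands in for the test signal, and an ideal quantizer stands in for the recording system. All function names are hypothetical.

```python
import math


def sine_tone(freq_hz: float, sample_rate_hz: int, n_samples: int) -> list:
    """Generate a full-scale reference tone as floats in [-1, 1]."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]


def quantize(samples: list, bits: int) -> list:
    """Simulated recording system: round each sample to an N-bit grid."""
    scale = (1 << (bits - 1)) - 1
    return [round(s * scale) / scale for s in samples]


def snr_db(reference: list, measured: list) -> float:
    """Ratio of reference signal power to error power, in decibels."""
    signal_power = sum(r * r for r in reference) / len(reference)
    noise_power = sum((m - r) ** 2
                      for r, m in zip(reference, measured)) / len(reference)
    return 10 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    tone = sine_tone(997.0, 48_000, 48_000)  # one second of reference tone
    for bits in (16, 24):
        print(f"{bits}-bit chain: {snr_db(tone, quantize(tone, bits)):.1f} dB")
```

In a real transfer chain the measured samples would come from a recording of the tone played through the equipment, and the result would reflect analog noise and converter performance rather than quantization alone.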
Issue 3 Shaping the information package and the importance of metadata
Information package Complex entity with multiple parts Data and Metadata Data in this context means the audio, video, or image bitstreams Metadata includes –Descriptive –Administrative –Structural
Descriptive metadata in the AV project For object as a whole –Often copy of descriptive data in LC central catalog –MODS XML schema Optional additional descriptive metadata for individual parts of object –Song titles, artists for disc sides or cuts –Names of writers in manuscript file folder –MODS “related items”
Administrative metadata in the AV project Persistent identifier, “ownership” info Documentation of reformatting today and digital migration tomorrow About the source and actions taken to prepare items for digitization, e.g., clean, bake About the digitizing process Rights data or at least categorization of objects for management of access
Structural metadata in the AV project Relationships between parts of objects Example: long-playing record album –Box, front –Three discs, two sides each (audio segments) –Disc label (images) –Booklet, cover and 28 pages (images)
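The album example can be modeled as a small tree of parts. A minimal sketch, assuming one label image per disc side (the slide does not say how many label images exist); the class and function names are illustrative and do not reflect the project's actual data model:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Part:
    label: str
    kind: str  # "group", "image", or "audio"
    children: list[Part] = field(default_factory=list)


def build_boxed_set() -> Part:
    """Three-disc boxed set with a booklet, as in the slide's example."""
    album = Part("Long-playing record album", "group")
    album.children.append(Part("Box, front", "image"))
    for disc_no in (1, 2, 3):
        disc = Part(f"Disc {disc_no}", "group")
        for side in ("A", "B"):
            disc.children.append(Part(f"Disc {disc_no} side {side}", "audio"))
            # Assumption: one label image per side.
            disc.children.append(Part(f"Disc {disc_no} side {side} label", "image"))
        album.children.append(disc)
    booklet = Part("Booklet", "group")
    booklet.children.append(Part("Cover", "image"))
    booklet.children.extend(Part(f"Page {n}", "image") for n in range(1, 29))
    album.children.append(booklet)
    return album


def count_kind(part: Part, kind: str) -> int:
    """Count parts of a given kind anywhere in the tree."""
    own = 1 if part.kind == kind else 0
    return own + sum(count_kind(child, kind) for child in part.children)


if __name__ == "__main__":
    album = build_boxed_set()
    print("audio segments:", count_kind(album, "audio"))
    print("images:", count_kind(album, "image"))
```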
Illustration: three-lp-disc boxed set with booklet
Encoding the metadata AV project is using the emerging Metadata Encoding and Transmission Standard (METS)
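To give a feel for what a METS document carries, here is a minimal sketch that builds a file section and a structural map with Python's standard library. The element and attribute names (mets, fileSec, fileGrp, file, FLocat, structMap, div, fptr) come from the METS schema; the object identifier, file names, and USE values are invented for illustration, and the output is not validated against the schema or any LC profile.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(object_id: str, files: list) -> ET.Element:
    """files: (file_id, use, href, part_label) tuples; all values hypothetical."""
    root = ET.Element(q("mets"), {"OBJID": object_id})
    file_sec = ET.SubElement(root, q("fileSec"))
    struct_map = ET.SubElement(root, q("structMap"), {"TYPE": "physical"})
    top_div = ET.SubElement(struct_map, q("div"), {"LABEL": object_id})
    groups, divs = {}, {}
    for file_id, use, href, part_label in files:
        if use not in groups:  # one fileGrp per use, e.g. master or service
            groups[use] = ET.SubElement(file_sec, q("fileGrp"), {"USE": use})
        file_el = ET.SubElement(groups[use], q("file"), {"ID": file_id})
        ET.SubElement(file_el, q("FLocat"),
                      {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})
        if part_label not in divs:  # one div per part of the object
            divs[part_label] = ET.SubElement(top_div, q("div"),
                                             {"LABEL": part_label})
        ET.SubElement(divs[part_label], q("fptr"), {"FILEID": file_id})
    return root


if __name__ == "__main__":
    demo = build_mets("demo-object", [
        ("F1", "master", "masters/side_a.wav", "Side A"),
        ("F2", "service", "service/side_a.mp3", "Side A"),
        ("F3", "master", "masters/label_a.tif", "Label A"),
    ])
    print(ET.tostring(demo, encoding="unicode"))
```

The structural map points at files by ID, which is what lets one package describe masters and service copies of the same part side by side.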
METS XML output (partial) displayed in Internet Explorer
Added metadata for long-term preservation To support long term content management Examples: –“Fixity” info, e.g., checksums to monitor file changes –Pointers to documentation for file formats –Pointers to documentation of the hardware/software environment required to render files No practice yet in AV prototyping project See RLG-OCLC preservation metadata report –
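The "fixity" idea can be sketched briefly: compute a checksum when a file is stored, record it, and recompute later to detect change. The slide notes that the project had no established practice yet, so this is purely illustrative; the choice of SHA-256 and the function names are assumptions.

```python
import hashlib


def file_checksum(path: str, algorithm: str = "sha256",
                  chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large audio masters need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: str, expected_hex: str,
                  algorithm: str = "sha256") -> bool:
    """True if the file still matches the checksum recorded earlier."""
    return file_checksum(path, algorithm) == expected_hex
```

In use, the recorded checksum would live in the package's administrative metadata and be checked on a schedule or after any copy or migration.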
Overall anxiety... Are we trying to capture too much metadata? Tools to automate the creation of metadata, especially administrative metadata, are critical
Issue 4 Longevity in a media-less environment
Future LC repository Intersection of the AV project and Culpeper center with LC-wide digital planning (NDIIPP) LC repository design will be in terms of the NASA Open Archival Information System (OAIS) reference model
Reference Model for an Open Archival Information System (OAIS) [Diagram: Producers; Ingest; Archival Storage; Data Management; Administration; Preservation Planning; Access; Consumers] SIP: Submission information package
Reference Model for an Open Archival Information System (OAIS) [Same diagram] AIP: Archival information package
Reference Model for an Open Archival Information System (OAIS) [Same diagram] DIP: Dissemination information package
Reference Model for an Open Archival Information System (OAIS) [Same diagram] Current plan: The Culpeper facility will produce and submit packages to LC’s future digital repository.
While we wait for the OAIS-compliant repository... Continue to use UNIX-filesystem based storage Orderly file storage, masters segregated from service copies METS metadata stored for now as individual XML files Virtual information packages are “ready to submit” METS also supports end-user display
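One way to picture "orderly file storage, masters segregated from service copies" is a fixed mapping from an item identifier to file locations. The directory names, identifier format, and function name below are invented for illustration; the slide does not describe the project's actual layout.

```python
from pathlib import PurePosixPath


def package_paths(root: str, item_id: str, part: str) -> dict:
    """Hypothetical layout: separate trees for masters, service copies, METS."""
    base = PurePosixPath(root)
    return {
        "master": base / "masters" / item_id / f"{part}.wav",
        "service_wave": base / "service" / item_id / f"{part}.wav",
        "service_mp3": base / "service" / item_id / f"{part}.mp3",
        "mets": base / "mets" / f"{item_id}.xml",
    }


if __name__ == "__main__":
    for role, path in package_paths("/data/avproto", "item0001", "side_a").items():
        print(f"{role}: {path}")
```

Keeping masters in their own tree makes it straightforward to apply stricter permissions and backup policy to them than to the service copies.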
What about smaller archives and libraries? The digital approach to content preservation depends on significant computer infrastructure Will we have a few consortial repositories to serve many smaller archives? Who would make such arrangements, and how?
What about smaller archives and libraries? Holding action? For audio, make multiple CD-Rs or DVD-Rs? Write to data tape? LC is challenged to give good advice today
Web Sites LC audio-visual prototyping project – LC enterprise-wide digital preservation planning – Metadata Encoding and Transmission Standard (METS) –
Thank you...