Collecting History: Profiles in Science
Alexa T. McCray
National Library of Medicine, Bethesda, MD
Stanford University, August 21, 1999
Design and Development of a Digital Library System
• Early experiences in digital conversion
• Development of the Profiles in Science system
• Critical role of metadata in system design
  – Framework for collection management
  – Foundation for Web delivery
  – Standards for resource description
Lessons Learned from an Early Digital Library Project
• Digital conversion work began in 1992
• Some 1,500 historical documents (about 40,000 pages), including
  – mimeographs
  – oddly sized documents
  – interview transcripts
• Some audio segments
• Photographs and other memorabilia
Processing the Early Collection
• Created index (metadata) records
  – templates varied by document type
  – included topical index terms
  – compensated for poor OCR quality
• Created digital master copies of the documents (TIFF)
  – GIF and, later, PDF versions subsequently derived from the masters
Profiles in Science
• Profiles in Science Web site launched in September 1998
• Archival collections of eminent biomedical scientists donated to the NLM
  – Include text, audio, still images, and video
  – Include books, journal volumes, pamphlets, diaries, letters, manuscripts, and photographs
  – Metadata for each item in the collection
Profiles in Science
• Educational and research applications
  – Scholars of the history of medicine and the biological sciences
  – Students gain an appreciation of the methods and success of science
• Allows anyone to look “behind the scenes” at how science is done
Profiles Collections
• Two collections currently available
  – Oswald T. Avery, Joshua Lederberg
• Careful attention to copyright and intellectual property concerns
• Electronic “exhibit”
  – Initial set of digitized items for exhibit
• Papers continue to be digitized
  – Full paper collections available for scholarly use at NLM
Wide Range of Document Types
Collection-specific Categories of Information
Contextualizing the Content
TIFF and PDF Documents
Zoom PDF for Detail
TIFF & JPEG Documents
TIFF & JPEG Photographs
High-Resolution TIFF
Streaming Video
Search across Full Data Set
Experiment: Profiled Scientist as Interactive User
• Digital documents available before release to the public
• Online annotation capability
• Annotations complement the original document
  – Give additional detail, set the document in context, add keywords
Sample Annotation
Design of the Profiles in Science System
• A single underlying system designed to handle the entire life cycle of a large-scale digital conversion project
• Principles
  – modularity
  – adherence to standards
  – extensibility
• Metadata forms the core of the system
System Architecture
Metadata-driven Document Conversion
• Interpret metadata in its broadest sense
  – data about data
• Use metadata to drive the entire system
• The metadata record is the basic unit in the system, managing
  – the digitization process
  – display and organization of the data
  – network-based resource discovery
Metadata: Framework for Collection Management
• Metadata entry system manages all aspects of the digitization process
  – Unique identifiers bind digital master files, Web derivatives, and metadata records
  – Enforces quality control (pull-down menus, validation, error messages)
  – Reports that manage workflow
  – Security measures
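The bullets above can be illustrated with a minimal Python sketch of an entry system that binds files through a unique identifier, validates records against controlled vocabularies (the code equivalent of pull-down menus), and produces a simple workflow report. Everything specific here is hypothetical: the `item-NNNN` identifier form, the field names, the vocabularies, the workflow stages, and the `masters/` and `web/` file-naming rule. The slides do not describe the actual schema.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass

# Hypothetical controlled vocabularies, standing in for the entry form's pull-down menus.
RESOURCE_TYPES = {"letter", "manuscript", "photograph", "audio", "video"}
WORKFLOW_STAGES = ("cataloged", "scanned", "derived", "published")


@dataclass
class MetadataRecord:
    item_id: str        # unique identifier shared by every file for this item
    title: str
    date: str           # assumed to begin with a four-digit year
    resource_type: str
    stage: str = "cataloged"

    def master_file(self) -> str:
        # Hypothetical naming rule: the identifier links the TIFF master file
        return f"masters/{self.item_id}.tif"

    def derivatives(self) -> list[str]:
        # and each Web derivative to this metadata record.
        return [f"web/{self.item_id}.pdf", f"web/{self.item_id}.jpg"]


def validate(rec: MetadataRecord, seen_ids: set[str]) -> list[str]:
    """Return error messages for the entry form; an empty list means the record passes."""
    errors = []
    if rec.item_id in seen_ids:
        errors.append(f"{rec.item_id}: duplicate identifier")
    if not rec.title.strip():
        errors.append(f"{rec.item_id}: title is required")
    if rec.resource_type not in RESOURCE_TYPES:
        errors.append(f"{rec.item_id}: unknown resource type {rec.resource_type!r}")
    if rec.stage not in WORKFLOW_STAGES:
        errors.append(f"{rec.item_id}: unknown workflow stage {rec.stage!r}")
    year = rec.date[:4]
    if not (len(year) == 4 and year.isdigit()):
        errors.append(f"{rec.item_id}: date must begin with a four-digit year")
    return errors


def workflow_report(records: list[MetadataRecord]) -> dict[str, int]:
    """Count how many records sit at each workflow stage, in stage order."""
    counts = Counter(r.stage for r in records)
    return {stage: counts[stage] for stage in WORKFLOW_STAGES}


if __name__ == "__main__":
    records = [
        MetadataRecord("item-0001", "Sample letter", "1950-01-01", "letter", "published"),
        MetadataRecord("item-0002", "", "19xx", "memo"),
    ]
    seen: set[str] = set()
    for rec in records:
        for message in validate(rec, seen):
            print(message)
        seen.add(rec.item_id)
    print(workflow_report(records))
```

Because every file name is computed from `item_id`, a record can never point at a master or derivative that belongs to a different item; that is the binding role the slide assigns to unique identifiers.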
Metadata: Display and Organization of the Data
• A series of programs generates HTML from the metadata RDBMS
  – Includes consistency checking
• Programs generate alternative views
  – alphabetical, chronological, resource type, content area
• Filtering mechanisms for access management
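A minimal sketch of this display-generation step, assuming a hypothetical in-memory list in place of the RDBMS query and a hypothetical `public` flag as the access filter. It checks consistency, filters, and renders three of the four views listed (alphabetical, chronological, resource type); a content-area view would need a subject field the sketch omits. Field names, page names, and markup are illustrative only, and the chronological sort assumes dates are stored in one uniform, sortable format.

```python
from html import escape
from itertools import groupby

# Hypothetical rows standing in for a query against the metadata RDBMS.
RECORDS = [
    {"id": "item-0003", "title": "Laboratory notebook", "date": "1931", "type": "manuscript", "public": True},
    {"id": "item-0001", "title": "Correspondence", "date": "1944", "type": "letter", "public": True},
    {"id": "item-0002", "title": "Draft manuscript", "date": "1928", "type": "manuscript", "public": False},
]


def check_consistency(records):
    """Raise if any record lacks a field that every view depends on."""
    required = ("id", "title", "date", "type")
    for rec in records:
        missing = [f for f in required if not rec.get(f)]
        if missing:
            raise ValueError(f"{rec.get('id', '?')}: missing {', '.join(missing)}")


def visible(records):
    """Access filter: keep only records cleared for public release."""
    return [r for r in records if r.get("public")]


def item_link(rec):
    return f'<li><a href="{escape(rec["id"])}.html">{escape(rec["title"])}</a> ({escape(rec["date"])})</li>'


def render_list(heading, records):
    items = "\n".join(item_link(r) for r in records)
    return f"<h2>{escape(heading)}</h2>\n<ul>\n{items}\n</ul>"


def alphabetical_view(records):
    return render_list("Alphabetical", sorted(records, key=lambda r: r["title"].lower()))


def chronological_view(records):
    # String sort is chronological only if all dates share one format (assumption).
    return render_list("Chronological", sorted(records, key=lambda r: r["date"]))


def by_type_view(records):
    ordered = sorted(records, key=lambda r: (r["type"], r["title"].lower()))
    return "\n".join(
        render_list(rtype.title(), list(group))
        for rtype, group in groupby(ordered, key=lambda r: r["type"])
    )


def build_pages(records):
    """Check, filter, then render each view as a separate page."""
    check_consistency(records)
    public = visible(records)
    return {
        "alpha.html": alphabetical_view(public),
        "chrono.html": chronological_view(public),
        "type.html": by_type_view(public),
    }
```

Regenerating every view from the same records keeps the pages consistent with one another; a correction made once in the database reaches all views on the next run.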
Metadata: Network-based Resource Discovery
• Dublin Core elements derived from the metadata entry system
  – simplicity
  – semantic interoperability
  – international consensus
  – modularity
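A minimal sketch of deriving Dublin Core from an internal record. The slides do not say how the Dublin Core elements were exposed; this sketch assumes the convention of RFC 2731 (Dublin Core encoded as HTML `meta` tags with a `schema.DC` link). The internal field names and the `FIELD_TO_DC` mapping are hypothetical; the target names are elements of the Dublin Core Metadata Element Set, version 1.1.

```python
from html import escape

# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Hypothetical mapping from internal metadata fields to Dublin Core elements.
FIELD_TO_DC = {
    "title": "title",
    "author": "creator",
    "keywords": "subject",
    "summary": "description",
    "date": "date",
    "resource_type": "type",
    "mime_type": "format",
    "item_id": "identifier",
    "rights_statement": "rights",
}

# Guard against typos in the mapping: every target must be a DCMES 1.1 element.
assert set(FIELD_TO_DC.values()) <= DC_ELEMENTS


def to_dublin_core(record):
    """Return (element, value) pairs; list-valued fields yield one pair per value."""
    pairs = []
    for field_name, element in FIELD_TO_DC.items():
        value = record.get(field_name)
        if value is None:
            continue
        values = value if isinstance(value, list) else [value]
        pairs.extend((element, str(v)) for v in values)
    return pairs


def dc_meta_tags(record):
    """Render the pairs as HTML meta tags in the RFC 2731 style."""
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, value in to_dublin_core(record):
        lines.append(f'<meta name="DC.{element}" content="{escape(value)}">')
    return "\n".join(lines)
```

Keeping the mapping in one table means the richer local record stays the source of truth, while the small, widely understood Dublin Core set is what external search services see.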
Sample Metadata Record on Web Site
Digital Conversion Projects
• Conversion projects involve extensive human and computational resources
• Therefore, it is important to design systems that
  – are extensible
  – automate processes whenever possible
  – adhere to standards
  – ensure the persistence of the data
Profiles in Science