NLM Digital Collections Update for DCFedoraUsersGroup January 22, 2013 John Doyle National Library of Medicine.

The Story So Far
• Texts
  – 7,866 books, incl. 225 multi-vol sets
  – Medical Heritage Library
    ▪ 1.7m pages
    ▪ In-house digitization
  – 1 multi-part report
• Audiovisuals
  – 70 films
  – 2 thematic collections

The Saga Continues
• Serials
  – NIH Institute annual reports
  – 61-volume printed index of historical citations
  – Journals may be coming soon
• Oral Histories
• Still Images
• Born-digital resources
• Citation dataset

Public Interface: “Digital Collections”
• Browse & Search (Muradora)
  ▪ Supports multiple collections, diverse content
  ▪ Resource display page: metadata, datastreams
• Book Viewer (NWU)
  ▪ Open source software from Northwestern University
  ▪ Open source JPEG2000 server (Djatoka)
• Video Player with Search (NLM)
  ▪ Features video transcript search and play-ahead jump
  ▪ HHS Innovates finalist (top 6), Fall

Replacing Muradora
• Muradora codebase is aging
  – No community development or support
• Newer community projects reaching maturity
  – Islandora
  – Hydra
• Priority is to preserve/enhance resource search and browse
• Probably retain the book and video viewing applications

Current Developments
• Workflows
  – Increasingly concurrent content projects
  – Moving from project-specific to project-agnostic
• Data Services
  – Programmatic access – search web service
  – Bulk data
  – Need to pin down use cases
• Fedora framework upgrading
  – Journaling for propagating changes across multiple Fedora instances

Current Developments
• Periodic checksum checking
  – Make use of recent Fedora enhancements in this area
• Third copy of content
  – “Just in case” copy, not primary disaster recovery
  – Amazon Glacier seems to be a good fit
• Descriptive Metadata
  – More automated updating of ILS
  – Need to update Fedora/Solr post-ingest
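
The periodic checksum check mentioned above can be pictured with a small fixity routine. This is only a sketch: the slides say the check would rely on Fedora's own enhancements, so the SHA-256 choice, the function names, and the idea of comparing against a separately recorded digest are all assumptions made for illustration.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops at end of file (empty bytes).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, expected_hex, algorithm="sha256"):
    """True if the file's current checksum matches the recorded digest.

    expected_hex is a hypothetical stored value, e.g. captured at ingest.
    """
    return file_checksum(path, algorithm) == expected_hex.lower()
```

A scheduled job could call `verify_fixity` for each stored file and report any mismatch for manual follow-up.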

Related Activities
• Internet Archive
  – Over 6,500 books uploaded as part of MHL project
  – Only selected datastreams going up
  – Expect to continue sending books to IA going forward
• Hathi Trust
  – Working group delivered recommendations last year
  – Participation could involve an IA-to-HT path
  – Some bibliographic challenges to be met

NLM Digital Collections Support for Multi-volume texts January 22, 2013 Nancy Fallgren, Doron Shalvi National Library of Medicine

Outline
• Regular book processing
• Regular book data model and presentation
• What is a multi-volume?
• Multi-volume metadata issues
• Multi-volume scanning and identifiers
• Multi-volume metadata generation and workflow
• Asynchronous volume processing (a.k.a. Jail)
• Multi-volume data model and presentation
• Software adjustments
• Questions

Regular book processing
• Voyager record
  – One-to-one relationship between BIB record and digital object
• Metadata processing
  – MARCXML to OAI-DC and DMDINDEX
• Preingest process
  – Create derivatives
  – Generate FOXML
  – Locate files
• Ingest into Fedora

Regular book data model

ID        TYPE  MIMETYPE             LABEL
PID       -     -                    Fedora persistent identifier
DC        X     text/xml             Dublin Core metadata for this object
RELS-EXT  X     application/rdf+xml  RDF statements about this object
MARCXML   M     text/xml             MARCXML metadata
DMDINDEX  X     text/xml             DMDINDEX descriptive metadata
METS      M     text/xml             METS file for entire book
OCR       E     text/plain           Book OCR - full text of entire book
PDF       E     application/pdf      PDF of entire book
THUMB     E     image/jpeg           JPG Thumbnail image of selected page in book
Preview   E     image/jpeg           JPG Preview image of selected page in book

Regular book presentation

What is a Multi-volume?
• Multiple volume monographic series
  – All volumes share the same series title
  – Each volume may or may not have a unique title
  – The series has a finite beginning and end
• Unanalyzed cataloging, i.e., the entire set is cataloged as a single unit; individual volumes do not have their own catalog/BIB records
• Not journals or serials

Multi-volume metadata issues
• One-to-many relationship between the Voyager BIB record (for the series) and the digital objects (each volume)
  – NLM UID (MARC 035$9) is the basis for each digital object’s PID
  – Disambiguating volume titles
• Distinguishing multi-vol pre- and post-ingest processing workflows from monograph workflows
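
Since the NLM UID in MARC 035$9 is the basis for each object's PID, the lookup can be sketched against a MARCXML record. The slides do not give the actual PID format, so the `demo:` prefix and both function names below are hypothetical; only the MARCXML namespace and the 035$9 location come from the standard and the slide.

```python
import xml.etree.ElementTree as ET

# Standard MARCXML ("MARC21 slim") namespace.
MARC_NS = "http://www.loc.gov/MARC21/slim"


def nlm_uid(marcxml_text):
    """Return the first 035$9 value in a MARCXML record, or None if absent."""
    root = ET.fromstring(marcxml_text)
    for field in root.iter(f"{{{MARC_NS}}}datafield"):
        if field.get("tag") != "035":
            continue
        for sub in field.findall(f"{{{MARC_NS}}}subfield"):
            if sub.get("code") == "9" and sub.text:
                return sub.text.strip()
    return None


def object_pid(uid, prefix="demo:"):
    """Build a PID from a UID. The prefix is a placeholder, not NLM's scheme."""
    return f"{prefix}{uid}"
```

Because a record can carry several 035 fields, the function checks the subfield code rather than taking the first 035 it sees.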

Scanning Spreadsheets: UIDs and volume nos.

From spreadsheet to XML

Set/Parent MARCXML

New child/volume MARCXML

Set/Parent DC

Child/Volume DC

Disambiguating Multi-volume workflows
• Transform pre-ingest manifests (UID lists)
  – Remove all UIDs with “X#” suffix
• Transform post-ingest manifests
  – Remove all “X#” suffixes from UIDs
  – De-dupe the remaining list
  – Add only set/parent url to BIB records
• DREPSERIES code
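
The two manifest transformations above can be sketched as follows. The sketch assumes that “X#” means a capital X followed by one or more digits at the end of a UID; the function names and the sample UIDs are invented for illustration and are not taken from NLM's scripts.

```python
import re

# Assumption: a volume UID is the set UID plus "X" and a volume number.
VOLUME_SUFFIX = re.compile(r"X\d+$")


def pre_ingest_manifest(uids):
    """Keep only UIDs without an X# suffix (drop volume-level UIDs)."""
    return [uid for uid in uids if not VOLUME_SUFFIX.search(uid)]


def post_ingest_manifest(uids):
    """Strip X# suffixes, then de-duplicate, preserving first-seen order."""
    stripped = (VOLUME_SUFFIX.sub("", uid) for uid in uids)
    # dict.fromkeys keeps insertion order, so it doubles as an ordered de-dupe.
    return list(dict.fromkeys(stripped))


manifest = ["1111111", "2222222X1", "2222222X2", "2222222X10"]
monographs_only = pre_ingest_manifest(manifest)
set_level_uids = post_ingest_manifest(manifest)
```

After the post-ingest transformation each multi-volume set appears once, which matches the goal of adding only the set/parent URL to the BIB record.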

Asynchronous Volume processing a.k.a. Jail
• Do not pass GO, do not collect $200
• Volumes are scanned and processed asynchronously
• Set object created for first child part
• Standard processing and review workflow
• Volumes held in Jail – no further processing – until all volumes pass manual review on Fedora QA system
• Once all volumes reviewed, full set promoted to Production
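
The Jail rule amounts to an all-or-nothing gate on a set. A minimal sketch of that gate follows, assuming the expected number of volumes is known in advance (for example from the scanning spreadsheet); the `Volume` class and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    uid: str
    reviewed: bool = False  # True once manual review on the QA system passes


def ready_for_production(volumes, expected_count):
    """A set leaves Jail only when every expected volume is present and reviewed."""
    return len(volumes) == expected_count and all(v.reviewed for v in volumes)
```

Checking the count as well as the review flags guards against promoting a set while some volumes have not yet been scanned.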

Multi-volume set data model

ID        TYPE  MIMETYPE             LABEL
PID       -     -                    Fedora persistent identifier
DC        X     text/xml             Dublin Core metadata for this object
RELS-EXT  X     application/rdf+xml  RDF statements about this object
MARCXML   M     text/xml             MARCXML metadata
DMDINDEX  X     text/xml             DMDINDEX descriptive metadata
THUMB     E     image/jpeg           JPG Thumbnail image of selected page in set
Preview   E     image/jpeg           JPG Preview image of selected page in set

Same data model as book, but no METS, OCR or PDF

Multi-volume part data model

ID        TYPE  MIMETYPE             LABEL
PID       -     -                    Fedora persistent identifier
DC        X     text/xml             Dublin Core metadata for this object
RELS-EXT  X     application/rdf+xml  RDF statements about this object
MARCXML   M     text/xml             MARCXML metadata
DMDINDEX  X     text/xml             DMDINDEX descriptive metadata
METS      M     text/xml             METS file for entire book
OCR       E     text/plain           Book OCR - full text of entire book
PDF       E     application/pdf      PDF of entire book
THUMB     E     image/jpeg           JPG Thumbnail image of selected page in book
Preview   E     image/jpeg           JPG Preview image of selected page in book

Same data model as book
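
The three data models differ only in which datastreams they require, which can be expressed compactly. The sketch below copies the IDs, TYPE codes and MIME types from the slides; the class, the constant names, and the `missing_datastreams` helper are assumptions added to show how a pre-ingest check might use the models.

```python
from typing import NamedTuple


class Datastream(NamedTuple):
    ds_id: str
    ds_type: str  # TYPE column from the slides (X, M or E)
    mime_type: str


BOOK_MODEL = (
    Datastream("DC", "X", "text/xml"),
    Datastream("RELS-EXT", "X", "application/rdf+xml"),
    Datastream("MARCXML", "M", "text/xml"),
    Datastream("DMDINDEX", "X", "text/xml"),
    Datastream("METS", "M", "text/xml"),
    Datastream("OCR", "E", "text/plain"),
    Datastream("PDF", "E", "application/pdf"),
    Datastream("THUMB", "E", "image/jpeg"),
    Datastream("Preview", "E", "image/jpeg"),
)

# A part uses the book model unchanged; a set drops METS, OCR and PDF.
PART_MODEL = BOOK_MODEL
SET_MODEL = tuple(ds for ds in BOOK_MODEL if ds.ds_id not in {"METS", "OCR", "PDF"})


def missing_datastreams(model, present_ids):
    """List the datastream IDs a model requires that an object does not have."""
    present = set(present_ids)
    return [ds.ds_id for ds in model if ds.ds_id not in present]
```

Deriving the set model from the book model keeps the two in step if a datastream is later added to both.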

Multi-volume relationships
• Set fedora:hasPart Part
• Part fedora:isPartOf Set
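
The set/part relationships would be recorded as RDF in each object's RELS-EXT datastream. The following sketch builds such RDF/XML with the standard library. The relationship namespace URI is the one generally used for Fedora 3 external relationships and should be treated as an assumption here, as should the function name and the `demo:` PIDs.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumed Fedora 3 relationship ontology namespace (not stated in the slides).
REL_NS = "info:fedora/fedora-system:def/relations-external#"

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("fedora", REL_NS)


def rels_ext(subject_pid, predicate, object_pids):
    """Return RDF/XML asserting `predicate` from one object to each of the others."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": f"info:fedora/{subject_pid}"},
    )
    for pid in object_pids:
        ET.SubElement(
            description,
            f"{{{REL_NS}}}{predicate}",
            {f"{{{RDF_NS}}}resource": f"info:fedora/{pid}"},
        )
    return ET.tostring(root, encoding="unicode")


# Hypothetical PIDs: one set with two parts, and one part pointing back.
set_rdf = rels_ext("demo:set", "hasPart", ["demo:part1", "demo:part2"])
part_rdf = rels_ext("demo:part1", "isPartOf", ["demo:set"])
```

Storing both directions lets the interface link from a set to all of its parts and from a part back to its set without an extra lookup.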

Multi-volume presentation - set

Multi-volume presentation - part

Software adjustments
• Creation of new content models – mvset, mvpart
• New process to generate FOXML, capture thumb
• New relationships in RELS-EXT
• Adjustment of UI and business logic to handle sets – link to all parts, query part names from Solr
• Adjustment of UI to handle child parts – link back to set
• Hide basic display of dc.relation – info in hotlinks instead
• More abstract content models, to reduce redundant changes, would have helped

Demonstration