Digital Libraries, Archives, and Large Data Sets
Alexa T. McCray, National Library of Medicine, Bethesda, Maryland, USA
WHOI, June 3, 2004

What is a digital library?
- "… an electronic information access system that offers the user a coherent view of an organized, selected, and managed body of information." (Lynch, 1995)
- An organization that provides the resources "… to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily and economically available for use…" (Waters, 1998)

Conceptual Model of a Digital Library (figure; labels: Content Creators; Data Creation; Data Capture, Management, and Preservation; Data Access; Users)

Long-Term Preservation & Archiving
- OAIS (Open Archival Information System) standard
  - Developed by NASA for long-term preservation, archiving, data management, and access
  - Covers both digital and physical archives
  - Addresses the impacts of changing technology
    - New media and data formats
    - Changing user community

OAIS
- Framework for data management
- Functional model for
  - Preservation planning
  - Data management
  - Archival storage
  - Persistent access
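The functional areas above can be made concrete with a toy example. The sketch below assumes a hypothetical in-memory `ArchivalStore` class (OAIS defines concepts, not code): archival storage keeps each object with a SHA-256 fixity value, data management keeps its descriptive metadata, persistent access re-verifies the checksum before delivering content, and an audit produces the kind of report preservation planning would act on.

```python
import hashlib


class ArchivalStore:
    """Hypothetical toy archive: each object is kept with metadata and a SHA-256 fixity value."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}
        self._fixity: dict[str, str] = {}
        self._metadata: dict[str, dict[str, str]] = {}

    def ingest(self, object_id: str, content: bytes, metadata: dict[str, str]) -> str:
        """Archival storage plus data management: store content, metadata, and checksum."""
        if object_id in self._objects:
            raise ValueError(f"duplicate identifier: {object_id}")
        digest = hashlib.sha256(content).hexdigest()
        self._objects[object_id] = content
        self._fixity[object_id] = digest
        self._metadata[object_id] = dict(metadata)
        return digest

    def access(self, object_id: str) -> bytes:
        """Persistent access: refuse to deliver content whose checksum has changed."""
        content = self._objects[object_id]
        if hashlib.sha256(content).hexdigest() != self._fixity[object_id]:
            raise OSError(f"fixity check failed for {object_id}")
        return content

    def audit(self) -> list[str]:
        """Input for preservation planning: identifiers whose fixity no longer matches."""
        return [oid for oid, content in self._objects.items()
                if hashlib.sha256(content).hexdigest() != self._fixity[oid]]
```

A real archive would persist to durable, replicated storage; the in-memory dictionaries here only keep the example self-contained.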

Digital Libraries Initiative
- Research initiative led by the National Science Foundation in collaboration with a number of other Federal agencies
- Research goal: investigate improved methods for creating, managing, and accessing large information resources and repositories

Research Foci
- Content and Collections
- Systems-centered digital library research
- Human-centered digital library research
- Testbeds and Applications

Content and Collections
- Data capture, representation, preservation
- Metadata
- Domain-specific information objects
- Intellectual property rights
- New economic and business models for digital libraries

Systems-centered Research
- Open, networked architectures
- System scalability
- Intelligent agents
- Systems evaluation and performance studies
- Data compression
- Authentication

Human-centered Research
- Information discovery and retrieval methods
- Intelligent user interfaces
- Information visualization
- User and usability studies
- Social implications of digital libraries

Testbeds and Applications
- Specialized tools, e.g., for
  - Document markup
  - Metadata encoding
- Specialized applications for specific domains
- Allow development of new methods for knowledge discovery and data mining

The Vocabulary Problem
- Same string, different meaning
- Different string, same meaning
- Different string, similar meaning
  - Unrecognized relationship
- Implicit conventions
- Implicit hierarchies
  - Variety of relationships
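The first two cases of the vocabulary problem can be shown with a small, entirely hypothetical set of string-to-concept assertions (the concept labels are illustrative, not drawn from any real vocabulary). Indexing in both directions exposes ambiguous strings (same string, different meaning) and synonyms (different string, same meaning), which a lookup keyed only on raw strings cannot distinguish.

```python
from collections import defaultdict

# Hypothetical (string, concept) assertions; concept labels are illustrative only.
ASSERTIONS = [
    ("cold", "common cold"),                      # same string ...
    ("cold", "low temperature"),                  # ... different meaning
    ("hypernephroma", "renal cell carcinoma"),    # different strings ...
    ("Grawitz tumor", "renal cell carcinoma"),    # ... same meaning
    ("renal cell carcinoma", "renal cell carcinoma"),
]


def build_indexes(assertions):
    """Index assertions by string and by concept."""
    string_to_concepts = defaultdict(set)
    concept_to_strings = defaultdict(set)
    for string, concept in assertions:
        string_to_concepts[string].add(concept)
        concept_to_strings[concept].add(string)
    return string_to_concepts, concept_to_strings


def ambiguous_strings(string_to_concepts):
    """Strings with more than one meaning: 'same string, different meaning'."""
    return sorted(s for s, concepts in string_to_concepts.items() if len(concepts) > 1)


def synonyms(string, string_to_concepts, concept_to_strings):
    """Other strings sharing a concept with `string`: 'different string, same meaning'."""
    result = set()
    for concept in string_to_concepts.get(string, ()):
        result |= concept_to_strings[concept]
    result.discard(string)
    return sorted(result)
```

For instance, `ambiguous_strings` flags "cold", and `synonyms("hypernephroma", ...)` returns "Grawitz tumor" and "renal cell carcinoma".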

Unified Medical Language System (UMLS)
- Long-term National Library of Medicine project
- Problem the UMLS is attempting to solve:
  - Provide integrated access to biomedical information in disparate biomedical information systems
    - Bibliographic and factual databases, decision support systems, knowledge-based systems

UMLS Knowledge Sources
- Metathesaurus
  - Large number of biomedical concepts
- SPECIALIST Lexicon
  - General English and biomedical lexical items; tools for recognizing linguistic variation
- Semantic Network
  - Conceptual framework for the UMLS

Metathesaurus
- Over one million concepts; 90 families of vocabularies
- Broad coverage of the vocabulary used in the biomedical sciences
  - Basic science research
  - Clinical medicine
  - Health services

Broad Coverage of Biomedicine
- Several perspectives
  - Clinical terms (SNOMED)
  - Information sciences (MeSH, CRISP)
  - Administrative terminologies (ICD-CM, CPT-4)
- Specialized vocabularies
  - Genomics (Gene Ontology, NCBI organism taxonomy)
  - Medical devices (UMD)
  - Anatomy (UWDA, Neuronames)

From the Vocabularies to the Metathesaurus
- Vocabularies
  - Terms
  - Hierarchies
- Metathesaurus
  - Organizes terms
  - Organizes concepts
  - Relates concepts to other concepts
- Metathesaurus = Thesaurus of Thesauri
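One way to picture a "thesaurus of thesauri" is to treat each source vocabulary's synonymy statements as links and merge every connected group of strings into one concept. The sketch below does this with a union-find (disjoint-set) structure. It is an illustration under that assumption, not the Metathesaurus production process, which combines algorithmic lexical matching with human editorial review. The source names `VOCAB_A` and `VOCAB_B` are hypothetical, and lowercasing stands in for real normalization.

```python
class UnionFind:
    """Disjoint-set structure used to merge synonymous strings into concepts."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, item: str) -> str:
        self.parent.setdefault(item, item)
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while item != root:
            next_item = self.parent[item]
            self.parent[item] = root
            item = next_item
        return root

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def merge_vocabularies(vocabularies: dict[str, list[tuple[str, str]]]) -> list[list[str]]:
    """Merge (term, synonym) pairs from several source vocabularies into concepts."""
    uf = UnionFind()
    for pairs in vocabularies.values():
        for term, synonym in pairs:
            # Lowercasing is a crude stand-in for lexical normalization.
            uf.union(term.lower(), synonym.lower())
    concepts: dict[str, list[str]] = {}
    for term in list(uf.parent):
        concepts.setdefault(uf.find(term), []).append(term)
    return sorted(sorted(terms) for terms in concepts.values())
```

With `{"VOCAB_A": [("renal cell carcinoma", "hypernephroma")], "VOCAB_B": [("hypernephroma", "Grawitz tumor")]}`, the merged result links "renal cell carcinoma" to "grawitz tumor" even though neither source states that pair directly.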

Common UMLS Representation
- One concept, multiple terms and strings
  - renal cell carcinoma (CUI: C…, LUI: L…, SUI: S…)
  - renal cell carcinomas (CUI: C…, LUI: L…, SUI: S…)
  - hypernephroma (CUI: C…, LUI: L…, SUI: S…)
  - Grawitz tumor (CUI: C…, LUI: L…, SUI: S…)
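In the UMLS, a CUI identifies a concept, a LUI identifies a term (strings that are lexical variants of one another, such as singular and plural), and a SUI identifies each distinct string. The sketch below assigns that three-level grouping to the four strings of the example. The identifiers it generates ("C1", "L1", "S1", ...) are placeholders, not real UMLS identifiers, and `crude_normalize` is a hypothetical stand-in for the SPECIALIST lexical tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomString:
    text: str
    cui: str  # concept identifier: shared by all synonyms
    lui: str  # term identifier: shared by strings with the same normalized form
    sui: str  # string identifier: one per distinct character sequence


def crude_normalize(text: str) -> str:
    """Hypothetical stand-in for lexical normalization: lowercase, strip a plural 's'."""
    return " ".join(w[:-1] if w.endswith("s") else w for w in text.lower().split())


def assign_ids(strings: list[str], cui: str) -> list[AtomString]:
    """Give every string of one concept a placeholder LUI and SUI."""
    luis: dict[str, str] = {}
    suis: dict[str, str] = {}
    result = []
    for text in strings:
        lui = luis.setdefault(crude_normalize(text), f"L{len(luis) + 1}")
        sui = suis.setdefault(text, f"S{len(suis) + 1}")
        result.append(AtomString(text, cui, lui, sui))
    return result
```

Here "renal cell carcinoma" and "renal cell carcinomas" normalize alike and so share a LUI while keeping separate SUIs; "hypernephroma" and "Grawitz tumor" each get their own LUI, and all four share the concept's CUI. The plural rule is deliberately crude (it would also clip words such as "virus").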

Lexical Tools
- Manage lexical variation
  - Perform lexical transformations
  - Generate inflectional variants and normalized forms
- Depend on the SPECIALIST Lexicon
- Used for preliminary algorithmic mapping as new vocabularies are added to the Metathesaurus
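The value of normalized forms for mapping can be sketched with a toy normalizer. It is written in the spirit of the lexical tools, not as a reimplementation of them: it lowercases, drops a possessive 's, turns punctuation into spaces, removes a few stop words, uninflects words through a tiny hand-made table (a hypothetical stand-in for the SPECIALIST Lexicon), and sorts the words so word order is ignored.

```python
import re

# Hypothetical stand-in for the SPECIALIST Lexicon: inflected form -> base form.
UNINFLECT = {
    "carcinomas": "carcinoma",
    "tumors": "tumor",
    "cells": "cell",
    "kidneys": "kidney",
}

# Hypothetical stop-word list.
STOP_WORDS = {"of", "the", "and"}


def normalize(text: str) -> str:
    """Toy normalization: case-, punctuation-, inflection-, and word-order-insensitive."""
    text = text.lower()
    text = re.sub(r"'s\b", "", text)        # drop possessive 's
    text = re.sub(r"[^\w\s]", " ", text)    # punctuation -> space
    words = [UNINFLECT.get(w, w) for w in text.split() if w not in STOP_WORDS]
    return " ".join(sorted(words))


def same_normalized_form(a: str, b: str) -> bool:
    """True when two strings would map to the same normalized form."""
    return normalize(a) == normalize(b)
```

Normalization catches variants such as "Carcinomas, Renal Cell" versus "renal cell carcinoma", but not true synonyms such as "hypernephroma"; recognizing those is the job of the Metathesaurus itself.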

Digital Library Case Study: ClinicalTrials.gov
- Centralized system at NLM
  - Content provided by individual data providers, from both Federal agencies and the private sector
- Standard set of data elements in XML (eXtensible Markup Language) format
  - Summary; recruitment information; eligibility criteria; study design; intervention being studied; location and contact information
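Because every provider submits the same set of data elements, one parser can check and read any submitted record. The sketch below uses Python's standard `xml.etree.ElementTree`; the element names and the sample record are hypothetical, chosen to mirror the data elements listed above, and do not reproduce the actual ClinicalTrials.gov schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; element names mirror the listed data elements,
# not the actual ClinicalTrials.gov schema.
SAMPLE = """
<study id="EXAMPLE-0001">
  <summary>Example study of an investigational treatment.</summary>
  <recruitment_status>Recruiting</recruitment_status>
  <eligibility>Adults aged 18 to 65</eligibility>
  <design>Randomized</design>
  <intervention>Drug: example</intervention>
  <location>
    <facility>Example Hospital</facility>
    <contact>trials@example.org</contact>
  </location>
</study>
"""

REQUIRED = ["summary", "recruitment_status", "eligibility", "design", "intervention", "location"]


def parse_study(xml_text: str) -> dict:
    """Reject records missing a required element, then flatten the rest into a dict."""
    root = ET.fromstring(xml_text.strip())
    missing = [tag for tag in REQUIRED if root.find(tag) is None]
    if missing:
        raise ValueError(f"missing required elements: {missing}")
    return {
        "id": root.get("id"),
        "summary": root.findtext("summary", "").strip(),
        "status": root.findtext("recruitment_status", "").strip(),
        "eligibility": root.findtext("eligibility", "").strip(),
        "design": root.findtext("design", "").strip(),
        "intervention": root.findtext("intervention", "").strip(),
        "locations": [
            {"facility": loc.findtext("facility", ""), "contact": loc.findtext("contact", "")}
            for loc in root.findall("location")
        ],
    }
```

Rejecting incomplete records at submission time is what keeps a centrally hosted collection consistent when content comes from many independent providers.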

ClinicalTrials.gov

System Architecture: ClinicalTrials.gov

Digital Library Case Study: Profiles in Science
- Large-scale digital conversion project
- Archival collections of eminent biomedical scientists of the twentieth century
  - Books, journal volumes, pamphlets, diaries, letters, manuscripts, photographs
- Materials in a variety of formats
  - Text, audio, still images, video
- Testbed for experiments in digital preservation

Profiles in Science

Metadata-driven Document Conversion
- Interpret metadata in the broadest sense
- Use metadata to drive the entire system
- The metadata record is the basic unit in the system, managing the
  - Digitization process
  - Display and organization of the data
  - Network-based resource discovery
  - Archiving and preservation

Metadata: Framework for Collection Management
- Metadata entry system manages all aspects of the digitization process
  - Unique identifiers bind digital master files, Web derivatives, and metadata records
  - Enforces quality control (pull-down menus, validation, error messages)
  - Reports that manage workflow
  - Security measures
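Two of these functions, identifier binding and quality control, can be sketched together. In the example below one identifier names both the master file and the Web derivative of a record, and `validate` returns error messages instead of accepting bad input. The field names, the allowed resource types (standing in for a pull-down menu), the identifier format, and the file paths are all hypothetical, not those of the Profiles in Science system.

```python
import re
from dataclasses import dataclass

# Hypothetical controlled values, as a pull-down menu would offer them.
RESOURCE_TYPES = {"letter", "photograph", "manuscript", "audio", "video"}


@dataclass
class MetadataRecord:
    identifier: str      # hypothetical format, e.g. "ABC-0001"
    title: str
    resource_type: str
    date: str            # "YYYY" or "YYYY-MM-DD"

    def master_file(self) -> str:
        # The same identifier names the archival master ...
        return f"masters/{self.identifier}.tif"

    def web_derivative(self) -> str:
        # ... and the Web derivative, binding both files to this record.
        return f"web/{self.identifier}.png"


def validate(record: MetadataRecord, existing_ids: set[str]) -> list[str]:
    """Return human-readable error messages; an empty list means the record is valid."""
    errors = []
    if record.identifier in existing_ids:
        errors.append(f"identifier {record.identifier} is already in use")
    if not record.title.strip():
        errors.append("title is required")
    if record.resource_type not in RESOURCE_TYPES:
        errors.append(f"unknown resource type: {record.resource_type!r}")
    if not re.fullmatch(r"\d{4}(-\d{2}-\d{2})?", record.date):
        errors.append(f"date must be YYYY or YYYY-MM-DD: {record.date!r}")
    return errors
```

Returning all errors at once, rather than stopping at the first, matches the slide's emphasis on error messages that guide the person entering the record.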

Metadata: Display and Organization of the Data
- A series of programs generates Web pages from the metadata database
  - Includes consistency checking and validation
- Programs generate alternative views
  - Alphabetical, chronological, resource type, content area
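When pages are generated from one metadata database, each alternative view is just a different sort or grouping of the same records. The sketch below shows alphabetical, chronological, and resource-type views over a few hypothetical records and renders a view as a simple HTML list; the record fields and the page-naming scheme are assumptions for illustration.

```python
import html
from collections import defaultdict

# Hypothetical metadata records; in practice these come from the metadata database.
RECORDS = [
    {"id": "r3", "title": "Lecture notes", "date": "1961", "type": "manuscript"},
    {"id": "r1", "title": "Letter to a colleague", "date": "1952", "type": "letter"},
    {"id": "r2", "title": "Laboratory photograph", "date": "1948", "type": "photograph"},
]


def alphabetical(records):
    return sorted(records, key=lambda r: r["title"].lower())


def chronological(records):
    return sorted(records, key=lambda r: r["date"])


def by_type(records):
    """Group records by resource type, with the groups in alphabetical order."""
    groups = defaultdict(list)
    for r in records:
        groups[r["type"]].append(r)
    return dict(sorted(groups.items()))


def render_list(records) -> str:
    """Render one view as an HTML list; escaping keeps metadata text from breaking markup."""
    items = "".join(
        f'<li><a href="{html.escape(r["id"])}.html">{html.escape(r["title"])}</a></li>'
        for r in records
    )
    return f"<ul>{items}</ul>"
```

Because every view is regenerated from the same records, a correction made once in the metadata appears consistently in all views.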

Metadata: Network-based Resource Discovery
- "Dublin Core" metadata elements derived from the metadata entry system
  - Simplicity
  - Semantic interoperability
  - International consensus
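Deriving Dublin Core from a richer internal record amounts to a field mapping. The sketch below uses the Dublin Core element namespace `http://purl.org/dc/elements/1.1/` with the standard library's ElementTree. The internal field names, the mapping table, and the `metadata` wrapper element are hypothetical. Internal fields without a Dublin Core counterpart (such as a scan operator) are simply dropped, which is where the simplicity comes from.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical mapping from internal metadata fields to Dublin Core elements.
FIELD_TO_DC = {
    "title": "title",
    "author": "creator",
    "date": "date",
    "resource_type": "type",
    "subject_terms": "subject",
    "identifier": "identifier",
}


def to_dublin_core(record: dict) -> ET.Element:
    """Build a flat Dublin Core record; list-valued fields become repeated elements."""
    root = ET.Element("metadata")
    for field, dc_name in FIELD_TO_DC.items():
        values = record.get(field)
        if values is None:
            continue
        if isinstance(values, str):
            values = [values]
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{dc_name}").text = value
    return root
```

Calling `ET.tostring(to_dublin_core(record), encoding="unicode")` serializes the result with `dc:` prefixed elements, ready for harvesting by other systems.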

Metadata: Ensuring Preservation and Persistence
- Archiving responsibility
- Permanence rating
- Preservation actions
- History of origin

Broad Categories of Metadata Elements
- Content specific
- Medium specific
- Process specific
- Storage information
- Physical characteristics
- Preservation/provenance information

System Architecture: Profiles in Science

Digital Resources at the National Library of Medicine
- Four levels of permanence
  - Permanent: unchanging content, e.g., Profiles in Science scanned document
  - Permanent: stable content, e.g., MEDLINE record
  - Permanent: dynamic content, e.g., NLM home page
  - Permanence not guaranteed, e.g., fact sheets
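The preservation elements listed earlier (archiving responsibility, permanence rating, preservation actions, history of origin) and the four permanence levels fit naturally in a small record type. In the sketch below only the four levels come from the slides; the class and field names and the dated-action format are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class Permanence(Enum):
    """The four NLM permanence levels."""
    UNCHANGING = "Permanent: unchanging content"
    STABLE = "Permanent: stable content"
    DYNAMIC = "Permanent: dynamic content"
    NOT_GUARANTEED = "Permanence not guaranteed"

    @property
    def is_permanent(self) -> bool:
        return self is not Permanence.NOT_GUARANTEED


@dataclass
class PreservationMetadata:
    identifier: str
    archiving_responsibility: str    # who has committed to keeping the object
    permanence: Permanence
    history_of_origin: str           # provenance: where the object came from
    actions: list[tuple[date, str]] = field(default_factory=list)

    def record_action(self, when: date, description: str) -> None:
        """Append a preservation action (e.g., a format migration), kept in date order."""
        self.actions.append((when, description))
        self.actions.sort()
```

Keeping the permanence rating as an enumeration, rather than free text, lets programs decide mechanically which resources fall under a preservation commitment.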

Preservation of Digital Information
- "The conclusion reached by the impressive group of 21 experts was alarming – there is, at present no way to guarantee the preservation of digital information." (Rothenberg, 1999)
- "Technological obsolescence [is] the greatest threat to digital collections." (Kenney & Rieger, 2000)

Preservation Research at the National Library of Medicine
- Image Migration Framework
  - Prototype for image conversion, analysis, and preservation
  - Associated preservation metadata
  - Current experiments converting from one image format to another

TIFF to PNG to TIFF
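A format migration such as TIFF to PNG to TIFF is only safe if the pixels survive unchanged. The sketch below shows one way to check that: decode, re-encode, decode again, and compare digests of the raw pixel data. It is not NLM's Image Migration Framework. Reading TIFF requires an image library and is not shown; the sketch assumes the master has already been decoded into rows of 8-bit grayscale values and uses a minimal hand-written PNG codec (8-bit grayscale, no interlacing, filter type 0 only) from the standard library.

```python
import hashlib
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, data: bytes) -> bytes:
    # Each chunk: 4-byte length, 4-byte type, data, CRC-32 over type + data.
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def encode_png_gray(pixels: list[list[int]]) -> bytes:
    height, width = len(pixels), len(pixels[0])
    # IHDR: width, height, bit depth 8, color type 0 (grayscale), compression, filter, interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    raw = b"".join(b"\x00" + bytes(row) for row in pixels)  # filter type 0 per scanline
    return (PNG_SIGNATURE + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(raw)) + _chunk(b"IEND", b""))


def decode_png_gray(data: bytes) -> list[list[int]]:
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos, idat, width, height = len(PNG_SIGNATURE), b"", 0, 0
    while pos < len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        kind = data[pos + 4:pos + 8]
        body = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[pos + 8 + length:pos + 12 + length])
        if zlib.crc32(kind + body) & 0xFFFFFFFF != crc:
            raise ValueError("CRC mismatch in chunk " + kind.decode("ascii", "replace"))
        if kind == b"IHDR":
            width, height, depth, color, _, _, interlace = struct.unpack(">IIBBBBB", body)
            if (depth, color, interlace) != (8, 0, 0):
                raise ValueError("only 8-bit non-interlaced grayscale is supported")
        elif kind == b"IDAT":
            idat += body
        elif kind == b"IEND":
            break
        pos += 12 + length
    raw, stride, rows = zlib.decompress(idat), width + 1, []
    for y in range(height):
        line = raw[y * stride:(y + 1) * stride]
        if line[0] != 0:
            raise ValueError("only filter type 0 is supported")
        rows.append(list(line[1:]))
    return rows


def pixel_digest(pixels: list[list[int]]) -> str:
    """SHA-256 of the raw pixel data, independent of the container format."""
    return hashlib.sha256(b"".join(bytes(row) for row in pixels)).hexdigest()
```

Comparing digests of decoded pixels rather than of files matters here: the TIFF and PNG files differ byte for byte, yet the migration is lossless exactly when the pixel digests match.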

Concluding Remarks
- Digital library data management
  - Requires technical decisions
    - Adherence to standards, planning for change
  - Involves social issues
    - Sharing of data and knowledge
    - Open access to information
  - Implies promises to our users
    - Integrity, currency, and persistence of data