Findings from the Mellon Metadata Harvesting Initiative Martin Halbert, Joanne Kaczmarek, and Kat Hagedorn Monday 18-Aug-2003 ECDL 2003.

Mellon Metadata Initiative – Slide 2 – ECDL 2003 – Trondheim, Norway
Overview
– Highlights of the Mellon projects
– Findings regarding metadata harvesting
– Questions about the context of metadata and metadata harvesting
– Next steps, subsequent research projects

Highlights of the Projects

Slide 4: Andrew W. Mellon Foundation
– Mellon is a major U.S. private philanthropic foundation that has been involved with the OAI-PMH from the beginning
– Sought to foster projects exploring how libraries and other organizations supporting research could use the OAI-PMH to make metadata about scholarly collections more visible to users
– Funded seven projects in 2001 with a total of US $1.5M

Slide 5: Seven Projects
1. University of Illinois at Urbana-Champaign
2. The University of Michigan (OAIster)
3. Emory University (MetaArchive)
4. SOLINET / ASERL (AmericanSouth)
5. The Research Libraries Group (RLG)
6. University of Virginia
7. (Woodrow Wilson International Center for Scholars at the Smithsonian)

Slide 6: Highlights of Projects
– OAIster and the UIUC repository harvested millions of records and developed sophisticated search tools
– The Emory and SOLINET MetaScholar projects harvested focused collections, enhanced existing OSS harvesting tools, and formed teams of scholars and librarians to study the process and context of metadata harvesting for research portals
– Other projects examined internal uses of OAI-PMH for cultural scholarship

Findings Concerning Metadata Harvesting

Slide 8: Metadata Harvesting Findings: Slow Adoption of the OAI-PMH
– Most institutions with cultural materials collections had not yet implemented the protocol during the project period
– Reasons include lack of institutional priority, insufficient technical staff, and little organizational understanding of the benefits of the protocol
– However, both Emory and Illinois found that regional centers providing relatively modest OAI technical expertise to other libraries were very effective in fostering adoption of the protocol

Slide 9: Metadata Harvesting Findings: Problems with Institutional Metadata
– Wide variations in implementation of Unqualified Dublin Core (UDC) descriptive metadata elements
– Duplication of records between collaborating institutions, difficult to de-dupe because there are no unique inter-institutional identifiers
– Format incompatibilities/collisions, especially between Encoded Archival Description (EAD) and UDC record perspectives
– Inconsistent access restrictions on content confuse users
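
The de-duplication difficulty noted on this slide can be illustrated with a minimal Python sketch. It assumes harvested records are plain dictionaries of UDC fields and, lacking shared identifiers, builds a match key from the normalized title, creator, and date. The record layout, the `match_key` and `dedupe` names, the choice of fields, and the sample records are all invented for illustration; this is not a method reported by any of the projects.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def match_key(record):
    """Hypothetical match key: normalized title, creator, and date."""
    return tuple(normalize(record.get(f, "")) for f in ("title", "creator", "date"))


def dedupe(records):
    """Keep the first record seen for each match key."""
    seen = {}
    for record in records:
        seen.setdefault(match_key(record), record)
    return list(seen.values())


# Invented sample records from two hypothetical contributing libraries.
records = [
    {"title": "Letters from Atlanta, 1864", "creator": "Smith, John",
     "date": "1864", "source": "library-a"},
    {"title": "letters from Atlanta 1864", "creator": "Smith, John",
     "date": "1864", "source": "library-b"},
    {"title": "Diary of a Planter", "creator": "Jones, Mary",
     "date": "1859", "source": "library-a"},
]
unique = dedupe(records)
```

A key like this only catches near-identical descriptions. Records that describe the same item with different titles or name forms would still slip through, which is why the lack of shared identifiers matters.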

Slide 10: Metadata Harvesting Findings: Problems with Inst. Metadata (cont.)
– No controlled vocabulary in effect for any UDC field, nor would this make sense for most fields
– Although universal systems such as US Library of Congress Subject Headings (LCSH) exist, they are not granular enough for most repositories
– No uniform mechanism in place to express dates or locations (coverage), which can mean many things in UDC, and no authority control for creator field
– 96% of institutional repositories using Eprints software do not use standard controlled vocabularies
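
To make the date problem concrete, here is a small Python sketch of one possible remediation step: pulling a four-digit year out of free-text date values. The `extract_year` name, the accepted year range (1000 to 2099), and the sample strings are assumptions made for illustration, not something the projects describe.

```python
import re

# Assumed range: a standalone four-digit year between 1000 and 2099.
YEAR = re.compile(r"(?<!\d)(1\d{3}|20\d{2})(?!\d)")


def extract_year(value):
    """Return the first plausible year in a free-text date, or None."""
    match = YEAR.search(value)
    return int(match.group(1)) if match else None


# Invented examples of the variety a harvester might meet in a date field.
samples = ["1864-05-12", "May 12, 1864", "ca. 1860", "[1859?]", "n.d."]
years = [extract_year(s) for s in samples]
```

Reducing a date to a year discards precision and uncertainty markers such as "ca." or "?", so a real service would probably keep the original string alongside the normalized value.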

Slide 11: Metadata Harvesting Findings: Need for Metadata Gardening
– The best way to make metadata effective cross-institutionally is to coordinate the entire life cycle of metadata production
– Uncoordinated harvesting is relatively easy to do, but the resulting metadata aggregation then suffers from all the problems previously described and needs remediation (which may be effectively impossible)
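
The remark that uncoordinated harvesting is relatively easy can be seen in a short Python sketch of the client side of an OAI-PMH `ListRecords` exchange. It builds the request URL and parses one response into identifiers and UDC fields using only the standard library. The function names, the `https://example.org/oai` base URL, and the sample response are hypothetical; a real harvester would also fetch over HTTP, follow resumption tokens in a loop, and handle errors and deleted records.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request; a resumption token is sent on its own."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        fields = {}
        for elem in rec.iterfind(".//oai_dc:dc/*", NS):
            name = elem.tag.split("}", 1)[1]  # drop the namespace part
            fields.setdefault(name, []).append((elem.text or "").strip())
        records.append({"identifier": identifier, "fields": fields})
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or "").strip() or None


# Invented, heavily trimmed response used only to exercise the parser.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample title</dc:title>
          <dc:type>Text</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
next_url = list_records_url("https://example.org/oai", token)
```

Nothing in this exchange checks what the fields mean or how they were filled in, which is the gap that coordinated metadata production is meant to close.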

Slide 12: Metadata Harvesting Findings: Need for Metadata Gardening (cont.)
– Coordinated gardening of metadata is the long-standing solution to this problem
– Examples include virtually any community of information users that has come up with consistent standards for the metadata it shares
– The problem is that new information communities are still forming, having been enabled by the OAI-PMH
– Mature information communities are mature precisely because they have well-understood standards and practice in using and sharing information

Findings Concerning Metadata Context

Slide 14: Metadata Context
– Metadata without a context is useless, much like encrypted information without the key
– Metadata is considered useful precisely because it is created in particular contexts by particular communities
– OAI-PMH prescribes only the UDC format
– UDC is some context, and is (probably?) better than nothing, but many groups mistakenly thought that it was enough context to build robust discovery systems around

Slide 15: Metadata Context Findings: Recovering Context
– The projects held different opinions on how to recover context for aggregated heterogeneous metadata
– OAIster made some efforts to normalize some UDC metadata fields after harvesting (UDC type field)
– Illinois developed a mechanism for displaying the original EAD context of records disaggregated from finding aid series information
– Emory/SOLINET AmericanSouth has a team of nationally renowned scholars studying how online scholarship can contextualize metadata and vice versa
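
As an illustration of post-harvest normalization of the UDC type field, the Python sketch below maps free-text type values onto the DCMI Type Vocabulary. The vocabulary terms are the published DCMI ones; the `SYNONYMS` table and the `normalize_type` function are invented for this example and do not reproduce OAIster's actual rules.

```python
# Terms of the DCMI Type Vocabulary.
DCMI_TYPES = [
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
]
CANONICAL = {term.lower(): term for term in DCMI_TYPES}

# Hypothetical synonym table; a real one would be built from harvested data.
SYNONYMS = {
    "article": "Text",
    "manuscript": "Text",
    "photograph": "StillImage",
    "video": "MovingImage",
    "audio": "Sound",
}


def normalize_type(value):
    """Map a free-text type value to a DCMI term, or None if unknown."""
    key = " ".join(value.lower().split())
    compact = key.replace(" ", "")  # "still image" -> "stillimage"
    if compact in CANONICAL:
        return CANONICAL[compact]
    return SYNONYMS.get(key)


normalized = [normalize_type(v) for v in ["TEXT", "still image", " Photograph ", "map"]]
```

Values that match nothing come back as `None` rather than being guessed at, so they can be reviewed by hand or left as harvested.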

Slide 16: Metadata Context Findings: Harvesters vs. other Discovery Systems
– How do we understand harvesters vs. online catalogs, Google, and commercial databases?
– How do we articulate the difference to users?
– What information should we aggregate and make searchable? Metadata and crawled web content?
– Very different information realms need to be bridged through new federated search mechanisms

Next Steps and Subsequent Research

Slide 18: Next Steps for Emory, Michigan, and Illinois
– All of these projects learned a great deal during the Mellon Metadata Harvesting Initiative that has informed their subsequent planning for new services
– All of these projects are in the process of being mainstreamed using various strategies
– All of these projects continue to grapple with metadata quality and context issues

Slide 19: Next Steps: Illinois
– Additional research is being undertaken on the integration of EAD and OAI
– Beginning a three-year collaboration with the research libraries of other Committee on Institutional Cooperation (CIC) institutions to study the potential of OAI-PMH to facilitate resource sharing
– NSF grant to develop digital libraries for scientific communities in connection with the National Science Digital Library (NSDL)
– Institute of Museum and Library Services (IMLS) grant to develop an OAI-based registry of IMLS projects

Slide 20: Next Steps: Michigan
– Working on further techniques for metadata remediation
  – De-duplication
  – Normalization of more UDC fields
  – Further tailoring of metadata for research purposes
– Exploring use of OAIster in connection with campus courseware initiatives

Slide 21: Next Steps: Emory
– Undertaking further modeling of scholarly portals based on metadata harvesting, with application to an international Irish Literature portal
– New grant from the Mellon Foundation to build on previous projects
  – Experiments in semantic clustering of metadata using support vector machines
  – Exploration of combining metadata harvesting and web crawling
  – Developing frameworks for federating loosely-coupled digital library components

Slide 22: Appreciation
– Enormous thanks go to the Andrew W. Mellon Foundation for advancing the understanding of metadata harvesting applications through these projects
– Mellon continues to be a driving force in the United States and internationally for research into digital library experiments benefiting scholarly communication

Slide 23: Contacts
– Martin Halbert
– Kat Hagedorn
– Joanne Kaczmarek