OAI-PMH Projects in the University Library
Briefing for GSLIS Digital Library Fellows
Timothy W. Cole (t-cole3@uiuc.edu)
Jenny Benevento (benevent@uiuc.edu)
Muriel Foulonneau (mfoulonn@uiuc.edu)
University of Illinois at Urbana-Champaign
Grainger Library | 28 October 2005
http://dli.grainger.uiuc.edu/Publications/TWCole/OAI-Projects.ppt

Projects
IMLS Digital Collections & Content (http://imlsdcc.grainger.uiuc.edu/)
- Funded by IMLS through September 2007
- Project Coordinator: Jenny Benevento
- Near-term: add LSTA-funded projects (including a spring workshop); integrate collection- and item-level interfaces
OAI-CIC Collaborative Metadata Sharing Project (http://cicharvest.grainger.uiuc.edu/)
- Funded by CIC through June 2006
- Project Coordinator: Muriel Foulonneau
- Near-term: one-on-one usability testing; further research on metadata enrichment; metadata analysis
28 Oct 2005 t-cole3@uiuc.edu

IMLS DCC – Context
Project began December 2002
Project objectives:
- Implement a collection registry of digital collections created or developed with funding from the IMLS NLG program
- Use OAI-PMH to implement an item-level metadata repository for content objects contained in NLG collections
- Carry out associated research on:
  - Utility and usability of the Registry & Repository
  - Current metadata practices of IMLS NLG grantees
  - Implications for interoperability (Framework of Guidance for Building Good Digital Collections)

Accomplishments to Date
- Collection Registry & item-level repository publicly accessible:
  http://imlsdcc.grainger.uiuc.edu/collections/
  http://imlsdcc.grainger.uiuc.edu/search/
- Selected research publications to date:
  Shreeves, S.L. & Cole, T.W. 2003. Developing a Collection Registry for IMLS NLG Digital Collections [Poster Abstract]. In DC-2003: Proceedings of the International DCMI Metadata Conference and Workshop, pp. 241-242.
  Cole, T.W. & Shreeves, S.L. 2004. Search and discovery across collections: The IMLS Digital Collections and Content Project. Library Hi Tech 22(3): 307-322.
  Palmer, C.L. & Knutson, E.M. 2004. Metadata practices and implications for federated collections. In L. Schamber & C.L. Barry (Eds.), Proceedings of the 67th Annual Meeting of the American Society for Information Science and Technology. Medford, NJ: Information Today, Inc., pp. 456-462.
  Shreeves, S.L., Knutson, E.M., Stvilia, B., Palmer, C.L., Twidale, M.B., & Cole, T.W. 2005. Is 'quality' metadata 'shareable' metadata? The implications of local metadata practice on federated collections. In H.A. Thompson (Ed.), Proceedings of the Twelfth National Conference of the Association of College and Research Libraries, April 7-10, 2005, Minneapolis, MN. Chicago, IL: Association of College and Research Libraries.

IMLS-DCC Collection Registry
- 108 primary NLG collection records
- An additional 40 records for sub-collections
- Also 29 associated and 61 physical collections
- 158 projects represented
Objects represented in registered collections:
- Images: prints, posters, photographs
- Text: books, archival finding aids, newspapers
- Digital surrogates of physical objects: artifacts, specimens, textiles
- Sound: oral histories, sound files, wax cylinders
- Interactive resources
- Moving images
- Datasets

Collection-Level Description

IMLS-DCC Metadata Repository
- 30 collections represented
- 35 repositories harvested
- ~200,000 item-level records indexed
- Harvested in Simple Dublin Core & MARC 21 XML; testing with Qualified Dublin Core
Issues explored:
- How metadata best practices are being adopted
- Interface customization based on audience
- Local context versus global context
- Intentional creation and collection development of digital collections

Sample of Metadata Statistics (http://imlsdcc.grainger.uiuc)
                          | Collection 1 | Collection 2 | Collection 3 | Collection 4
Number of records         | 27,444 | 14,425 | 1,599 | 35
Type of collection        | Large collaborative digitization project | Large academic library | Small academic library and public library collaboration | Small academic library
% of records with DC element <title>: 100, 99
% of records with DC element <creator>: 59, 57, 83
% of records with DC element <subject>: 97
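
Percentages like these can be computed directly from harvested records. The sketch below is a minimal illustration, assuming each record is an oai_dc XML string whose elements use the Dublin Core namespace http://purl.org/dc/elements/1.1/; the function name element_coverage is hypothetical and not part of the project's tools. A record counts as having an element only if at least one instance has non-blank text.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

def element_coverage(records, elements=("title", "creator", "subject")):
    """Hypothetical helper: percentage of records containing at least one
    non-blank instance of each Dublin Core element."""
    counts = {name: 0 for name in elements}
    for xml_text in records:
        root = ET.fromstring(xml_text)
        for name in elements:
            # iter() walks all descendants looking for the namespaced DC element
            if any((el.text or "").strip() for el in root.iter(f"{{{DC_NS}}}{name}")):
                counts[name] += 1
    total = len(records)
    return {name: (100.0 * c / total if total else 0.0) for name, c in counts.items()}
```

Treating whitespace-only values as missing is a design choice; an aggregator may prefer to count them separately as a data-quality signal.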

Plans for the next 2 years
- IMLS has extended the grant through 2007
- A handful of Illinois LSTA-funded projects will be added in 2006
- Workshop in spring 2006 for LSTA projects
- More LSTA projects added in 2007
Other themes for the extension phase:
- Integration of item-level & collection-level services
- Metadata normalization, transformation, and enrichment
- Interface & metadata design for targeted audiences (GEM)
- Collection identity & metadata granularity
- Knowledge diffusion of metadata best practices

IMLS DCC project contact information
Tim Cole
PI, IMLS Digital Collections and Content
University of Illinois Library at Urbana-Champaign
t-cole3@uiuc.edu
Jenny Benevento
Project Coordinator, IMLS Digital Collections and Content
University of Illinois at Urbana-Champaign
benevent@uiuc.edu
217-244-7809

OAI-CIC collaborative metadata sharing project
- Sept. 2003 – June 2006
- 10 to 13 participants
- 19 data providers in 10 CIC institutions (the University of Iowa has 2 repositories in testing)
- From 350,000 to 520,000 resources
Objectives:
- Reinforcing regional collaboration
- Investigating metadata shareability
- OAI-PMH as technical infrastructure for metadata sharing
- A Web interface to CIC aggregated material

Repository behavior
- By collection + regular increase
- By collection only
- Regular increase
- No change
- Dead

Configuration of harvests
- Handle deleted records
- Harvest set descriptions
- 3 metadata formats: UDC, QDC, MODS
- Different degrees of XML validation
- Harvest either by set or the full repository
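
Handling deleted records means honoring the OAI-PMH convention that a deleted item is returned as a record whose header carries status="deleted" and no metadata. A minimal sketch, assuming the local store is a plain dict keyed by OAI identifier; the function name apply_list_records is hypothetical.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def apply_list_records(xml_text, store):
    """Hypothetical helper: update store (identifier -> <metadata> element)
    from one ListRecords response, dropping records flagged as deleted."""
    root = ET.fromstring(xml_text)
    for record in root.iterfind(".//oai:ListRecords/oai:record", OAI_NS):
        header = record.find("oai:header", OAI_NS)
        identifier = header.findtext("oai:identifier", namespaces=OAI_NS)
        if header.get("status") == "deleted":
            store.pop(identifier, None)  # remove any previously harvested copy
        else:
            store[identifier] = record.find("oai:metadata", OAI_NS)
    return store
```

This only works if the repository actually reports deletions (its deletedRecord policy is "persistent" or "transient"); otherwise stale records can only be detected by periodic full re-harvests.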

We have particularly encouraged:
- Use of rich metadata formats
- Division of repositories into sets
- Creation of set descriptions
- Use of resumption tokens
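
Sets and resumption tokens together make selective, incremental harvesting practical. The sketch below shows the basic OAI-PMH ListRecords loop: request one set in a given metadataPrefix, then keep following the resumptionToken until the repository returns an empty one. The fetch parameter is an assumption added so the HTTP layer can be swapped out; harvest and default_fetch are hypothetical names.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def default_fetch(url):
    with urllib.request.urlopen(url) as resp:
        return resp.read()

def harvest(base_url, metadata_prefix="oai_dc", set_spec=None, fetch=default_fetch):
    """Yield every <record> from ListRecords, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # selective harvest of a single set
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urllib.parse.urlencode(params)))
        list_records = root.find(OAI + "ListRecords")
        if list_records is None:
            break  # e.g. an <error code="noRecordsMatch"/> response
        yield from list_records.iter(OAI + "record")
        token = list_records.findtext(OAI + "resumptionToken")
        if not token or not token.strip():
            break  # an empty or absent token marks the last page
        # resumptionToken is an exclusive argument: send it alone with the verb
        params = {"verb": "ListRecords", "resumptionToken": token.strip()}
```

A production harvester would also handle HTTP 503 with Retry-After and check error codes explicitly rather than treating any non-ListRecords response as the end.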

Areas of ongoing research
- Clustering
- Topicality?
- Multiple levels of granularity
- Multiple metadata formats
- Data normalization and enrichment
- Collection definition
- The resources behind the URLs

Workflow: 3 streams
(Diagram) Steps shown: harvest; SQL database archive (by institution / repository); normalize data; rebuild tables and indexes; classify for DLXS (institution / type, institution / collection, digital only); rebuild DLXS indexes

Additional workflow for collections
(Diagram) Components shown: harvest set descriptions; DC collection records; item database merge; Collections.xml; SQL db enrich; DLXS MARC records; grab thumbshots of collection homepages
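
One way to picture the "merge / enrich" step is copying fields from a collection (set) description into each item harvested from that set, while tagging every added value with where it came from. This is a minimal sketch under stated assumptions, not the project's implementation: items carry their setSpecs, collection descriptions are dicts of element lists keyed by setSpec, and the names Value, ItemRecord, and enrich_with_collections are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Value:
    text: str
    source: str  # provenance, e.g. "item" or "collection:<setSpec>"

@dataclass
class ItemRecord:
    identifier: str
    set_specs: list
    fields: dict = field(default_factory=dict)  # element name -> list of Value

def enrich_with_collections(items, collections, elements=("title", "subject")):
    """Copy selected elements of each item's collection description into the
    item under a 'collection_' prefix, tagged with their provenance."""
    for item in items:
        for spec in item.set_specs:
            coll = collections.get(spec)
            if coll is None:
                continue  # set has no description; leave the item unchanged
            for name in elements:
                for text in coll.get(name, []):
                    item.fields.setdefault("collection_" + name, []).append(
                        Value(text, source="collection:" + spec))
    return items
```

Keeping collection-derived values under separate keys avoids overwriting the data provider's original values, at the cost of the verbosity and redundancy noted as a problem in this deck.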

Types of reprocessing
- Selection
- Cleaning
- Normalization
- Augmentation
- Customization for ingest into applications
We are looking at ways to reprocess metadata to support specific services and functions for end users.
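
Two of these reprocessing types, cleaning and normalization, can be sketched in a few lines. The example assumes values arrive as lists of strings per element; clean_values and normalize_date are hypothetical names, and reducing dates to a four-digit year (1000-2099) is just one illustrative normalization target.

```python
import re

YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")

def clean_values(values):
    """Cleaning: collapse whitespace, drop empty strings and exact duplicates,
    keeping the first-seen order."""
    seen, out = set(), []
    for v in values:
        v = " ".join(v.split())
        if v and v not in seen:
            seen.add(v)
            out.append(v)
    return out

def normalize_date(value):
    """Normalization: reduce a free-text date to a four-digit year, or None
    when no standalone year is found."""
    m = YEAR.search(value)
    return m.group(1) if m else None
```

Forms such as "1890s" return None here; a real normalizer would need rules for decades, ranges, and uncertain dates.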

Metadata processing by DL function
(Diagram) Functions shown: FIND, CO-LOCATE

More DL functions
(Diagram) Functions shown: CO-LOCATE, SELECT, INTERPRET, OBTAIN / IDENTIFY

Faceted access points

Potential objectives & obstacles
Objectives / benefits:
- Experiment with a faceted interface
- Should maximize the utility of metadata reprocessing
- Should avoid the need for data providers (DPs) to repeat values for multiple purposes
- Retain the provenance of metadata elements
Problems:
- Verbose enriched records with a lot of redundancy
- Unclear how to share enriched records back out with others

Resources behind the URLs
(Figure: several sample metadata records, each with <title>My resource</title> and a <date> value, linked to the resources they describe; one link returns "404 Page not found")
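
Checking the resources behind the URLs starts with asking each one for an HTTP status. A minimal sketch using only the standard library, assuming a HEAD request is enough to detect dead links; check_url and broken_links are hypothetical names, and the opener parameter exists only so the network call can be replaced in testing.

```python
import urllib.error
import urllib.request

def check_url(url, opener=urllib.request.urlopen, timeout=10):
    """Return (url, status): an HTTP status code, or an error string when
    the server could not be reached at all."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as resp:
            return url, resp.status
    except urllib.error.HTTPError as err:  # must precede URLError (subclass)
        return url, err.code  # e.g. 404 Page not found
    except urllib.error.URLError as err:
        return url, "unreachable: " + str(err.reason)

def broken_links(urls, opener=urllib.request.urlopen):
    """List URLs whose resource answered with an HTTP error status (>= 400)."""
    results = (check_url(u, opener) for u in urls)
    return [u for u, s in results if isinstance(s, int) and s >= 400]
```

Some servers reject HEAD (405) or return a 200 "soft 404" page, so a status check catches only the clearest failures.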

Thumbnails
- Metadata schema enrichment: one element from the Picture Australia schema
- The Thumbgrabber application: thumbnails and thumbshots (currently 35,000)
- How to convey the information? Jump-off pages? An additional metadata record?
- The same problems arise when trying to grab full content

Integrated Access to CIC Metadata
- 4 views / filtering
- http://nergal.grainger.uiuc.edu/cgi/b/bib/oaister

Geographic access to resources

Experiment using collection-level descriptions

Collection-enabled functions
- Co-locating resources
- Grouping results
- Browsing collections
- Filtering results
- Selecting relevant search results
- Interpreting item descriptions
- Search granularity:
  - Search collections only
  - Search items with collection information
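
Grouping and filtering results by collection can be illustrated with item hits that carry the setSpec of the collection they came from. A minimal sketch, assuming each hit is a dict with "id" and "set" keys and that collection titles are available from set descriptions; group_by_collection and filter_by_collection are hypothetical names.

```python
from collections import defaultdict

def group_by_collection(results, collection_titles):
    """Group item hits under their collection, largest groups first.
    Falls back to the raw setSpec when no collection title is known."""
    groups = defaultdict(list)
    for hit in results:
        groups[hit["set"]].append(hit["id"])
    # sorted() is stable, so ties keep first-seen collection order
    ordered = sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True)
    return [(collection_titles.get(spec, spec), ids) for spec, ids in ordered]

def filter_by_collection(results, wanted_sets):
    """Keep only hits from the selected collections."""
    return [hit for hit in results if hit["set"] in wanted_sets]
```

Ranking groups by hit count is only one option; a portal might instead order collections alphabetically or by relevance of the collection description.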

Adding context

Filtering / selecting

Searching / selecting source / co-locating

Suggesting complementary resources

Match cases for multiple-term queries
Case   | Item description | Collection description
Case A | Part of query    | Rest of query
Case B | No match         | All of query
Also defined: Case C, Case D, Case E, Case F, Case G, Case H, Case J

Test with real-life queries
Case   | # of queries with at least 1 item-level match of the case | % of queries with at least 1 item-level match of the case
Case A | 287   | 17.00%
Case B | 21    | 1.20%
Case C | 761   | 45.10%
Case D | 25    | 1.50%
Case E | 20    |
Case F | 222   | 13.10%
Case G | 1,639 | 97.00%
Case H | 940   | 55.70%
Case J | 945   | 56.00%
Legend: partial match, rest of match, full match, no match
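
The two cases whose definitions survive in the table, Case A (part of the query matches the item description, the rest matches the collection description) and Case B (nothing matches the item, all of the query matches the collection), can be tested mechanically. This sketch assumes naive lowercase whitespace tokenization and does not attempt to reproduce Cases C through J; match_case is a hypothetical name.

```python
def match_case(query, item_text, collection_text):
    """Classify a multi-term query by where its terms occur:
    'full item match', 'A', 'B', or 'other' (not mapped to Cases C-J)."""
    terms = set(query.lower().split())
    in_item = terms & set(item_text.lower().split())
    in_coll = terms & set(collection_text.lower().split())
    if terms and in_item == terms:
        return "full item match"
    if in_item and (terms - in_item) <= in_coll:
        return "A"  # part in item description, rest in collection description
    if terms and not in_item and in_coll == terms:
        return "B"  # no item match, whole query in collection description
    return "other"
```

Case A is where collection-level context can rescue an item that the item-level description alone would miss.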

Outputs to date
- Contributions to the DLF-NSDL best practices: http://oai-best.comm.nsdl.org/cgi-bin/wiki.pl?TableOfContents
- Articles:
  - JCDL 2005, Denver: Using Collection Descriptions to Enhance an Aggregation of Harvested Item-Level Metadata
  - ECDL 2005, Vienna: Strategies for reprocessing aggregated metadata
  - ICDAT 2005, Taipei: Metadata aggregation for digital libraries
  - D-Lib, Jan. 2006: Automated capture of thumbnails and thumbshots for use by metadata aggregation services
  - Sci Tech Lib., 2006: The CIC metadata portal: A collaborative effort in the area of digital libraries

DLXS interface
- Integration of collection and item features in DLXS
- Collection-level descriptions, and classification by type and/or subject?
- Faceted search
- Additional access points
- OAI provider repository
- Federated search target

Metadata
Topicality
- At collection level
- Automatic classification (might work better with full content)
Granularity
- A format and guidelines for collection / set descriptions?
- Can collections live without items?
- What about Web sites -> descriptions?
- Is there anything we can do with EAD files?
=> the DLF-NSDL best practices for shareable metadata

Usability testing
- Quantitative information on discoverability, using OAIster logs
- User testing: Indiana University helped build a plan to test:
  - Collections and context
  - Thumbnails

CIC metadata project contact information
Tim Cole
UIUC PI for OAI-CIC metadata sharing project
University of Illinois Library at Urbana-Champaign
t-cole3@uiuc.edu
Muriel Foulonneau
Project Coordinator, OAI-CIC metadata harvesting service
University of Illinois at Urbana-Champaign
mfoulonn@uiuc.edu
217-244-7809