1
OAIster: What’s with the Weird Name?
Kat Hagedorn
UM Library Information Technology
November 28, 2005
2
What is OAIster?
- Is (or was) a means for UM to test the OAI protocol… (hence the name)
- A method for sharing metadata among institutions and groups of people
- A means of developing a search service for end-users worldwide
3
Basics of OAI
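Some background for this slide: under OAI-PMH, a harvester sends HTTP requests to a data provider's base URL. A typical request is verb=ListRecords with metadataPrefix=oai_dc, and a large result set is paged through with a resumptionToken. The code below is a minimal Python sketch of that exchange using only the standard library. It is not the UM harvester. The base URL, the sample response and the `fetch` callable are hypothetical stand-ins, which let the example run without network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def list_records_url(base_url, token=None):
    """Build a ListRecords request; a resumptionToken is an exclusive argument."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)

def parse_list_records(xml_text):
    """Return (records, token) for one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI + "record"):
        header = rec.find(OAI + "header")
        if header.get("status") == "deleted":
            continue  # deleted records carry a header but no metadata
        records.append({
            "id": header.findtext(OAI + "identifier"),
            "titles": [t.text for t in rec.iter(DC + "title")],
        })
    # An empty or missing resumptionToken means the list is complete.
    token_el = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return records, token

def harvest(base_url, fetch):
    """Yield every live record; `fetch` (hypothetical) maps a URL to response text."""
    token = None
    while True:
        records, token = parse_list_records(fetch(list_records_url(base_url, token)))
        yield from records
        if token is None:
            break

# Hypothetical single-page response used for illustration.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample photograph</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header status="deleted"><identifier>oai:example.org:2</identifier></header>
    </record>
    <resumptionToken></resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list(harvest("http://example.org/oai", lambda url: SAMPLE)))
```

Against a live provider, `fetch` could wrap `urllib.request.urlopen(url).read()`. A real harvester would also have to handle OAI-PMH errors such as `noRecordsMatch`.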
4
What does OAIster collect?
- Harvests all metadata from all OAI data providers (within reason)
- Keeps only metadata that points to digital objects, e.g., articles, photographs, and datasets in digitized form
- All of it is available via the search service…
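The slide does not say how OAIster decides that a record points to a digital object. The code below is a minimal sketch of one possible rule, and the rule is a hypothetical assumption: a record is kept when at least one of its dc:identifier values is an absolute http(s) URL.

```python
from urllib.parse import urlparse

def points_to_digital_object(identifiers):
    """Hypothetical rule: keep a record if any dc:identifier is an absolute http(s) URL."""
    for value in identifiers:
        parsed = urlparse(value.strip())
        if parsed.scheme in ("http", "https") and parsed.netloc:
            return True
    return False

# Hypothetical harvested records (title plus dc:identifier values).
records = [
    {"title": "Scanned letter", "identifiers": ["http://example.org/item/1"]},
    {"title": "Print-only book", "identifiers": ["Call no. QA76 .S3"]},
    {"title": "Survey dataset", "identifiers": ["local-42", "https://example.org/data/42"]},
]
kept = [r["title"] for r in records if points_to_digital_object(r["identifiers"])]
print(kept)  # ['Scanned letter', 'Survey dataset']
```

Under this rule, an identifier such as `www.example.org/x` that lacks `http://` would be rejected. That is one reason the URL clean-up described under transformation of metadata matters.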
5
Searching OAIster
- Time to show off OAIster… http://www.oaister.org/
6
A little history
- The service is now 3.5 years old
- Started with 66 data providers and a little over 200K records
- Now has 572 data providers and “a little” over 6 million records
- 37% US, 63% international
7
Visibility of OAI
- Surprising who hasn’t made their metadata shareable through OAI: Harvard, Yale, Stanford… the big ones
- Initially perplexing, but now clearer:
  - metadata sharing was always done at the end of a project
  - only recently has it been considered at a project’s initiation
  - truthfully, many institutions are not collaborative…
8
Examples of data providers
- Many data providers are huge, e.g.,
  - arXiv: physics preprint and postprint articles
  - pubmed: medical articles, although restricted
  - pictureaustralia: images from government and academic institutions in Australia
  - lcoa: Library of Congress digital archives
  - usc: University of Southern California census data
9
Examples of data providers
- Most are small, though; many have around 100 records
- Value of making their records available:
  - increased visibility
  - inclusion in a bigger search service than their own
  - incorporation in Yahoo! Search
10
Yahoo! Search
- Two years ago, we collaborated with a team at Yahoo! Search to send our metadata to them for indexing
- e.g., search for “gardens at albury” in Yahoo! Search: we know this is not from robots crawling static HTML, since the hit reads “IsPartOf Victorian Railways collection.”
- Many, many more hits
- We also send metadata to Google
11
System design
- Sources: OAI-enabled DC records and non-OAI-enabled DC records
- UM harvester
- Record storage
- XSLT transformation tool, driven by XSL stylesheets (one per source type)
- BibClass indexes
- Search interface (XPAT)
12
Transformation of metadata
- Most metadata needs to be brushed off: adding an http:// to the front of URLs
- Or raked: removing instances of <![CDATA[
- Or wrung out: instead of “Where’s Waldo,” it’s “Where’s the incorrect UTF-8 character?”
- And it should be normalized…
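The three clean-up chores above can be sketched with the Python standard library. The rules below are illustrative assumptions and not OAIster's actual XSLT stylesheets. Only bare `www.` addresses get a scheme prefix. Literal CDATA markers are stripped. For encoding problems, the code reports the byte offset of the first invalid UTF-8 sequence.

```python
def add_scheme(url):
    """Hypothetical rule: prefix http:// to bare addresses that start with 'www.'."""
    url = url.strip()
    if url.lower().startswith("www."):
        return "http://" + url
    return url

def strip_cdata_markers(text):
    """Remove literal CDATA wrappers that leaked into a field value as text."""
    return text.replace("<![CDATA[", "").replace("]]>", "").strip()

def first_bad_utf8_offset(raw):
    """Return the byte offset of the first invalid UTF-8 sequence, or None if valid."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None

print(add_scheme("www.example.org/photo/7"))  # http://www.example.org/photo/7
print(strip_cdata_markers("<![CDATA[Map of the campus]]>"))  # Map of the campus
# "Café" saved as Latin-1 but labelled UTF-8: byte 0xE9 at offset 3 is invalid UTF-8.
print(first_bad_utf8_offset("Café".encode("latin-1")))  # 3
```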
13
Why normalize? Sample date values:
<date>2-12-01</date>
<date>2002-01-01</date>
<date>0000-00-00</date>
<date>1822</date>
<date>between 1827 and 1833</date>
<date>18--?</date>
<date>November 13, 1947</date>
<date>SEP 1958</date>
<date>235 bce</date>
<date>Summer, 1948</date>
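One way to normalize values like these is to map each one to an inclusive year range and leave unrecoverable values empty. The code below is a minimal sketch of that approach. It assumes year-level precision is enough for sorting and for limiting searches by date, and its rules are hypothetical: they cover only the patterns on this slide. BCE years are stored as plain negative numbers. This differs from ISO 8601 astronomical numbering, where 1 BCE is year 0.

```python
import re

def normalize_date(value):
    """Map a free-text date to an inclusive (start_year, end_year) tuple, or None."""
    v = value.strip().lower()
    # Placeholder dates such as 0000-00-00 carry no information.
    if re.fullmatch(r"0+(-0+)*", v):
        return None
    # BCE years, e.g. "235 bce", stored as negative integers.
    m = re.fullmatch(r"(\d+)\s*(bce|bc)", v)
    if m:
        year = -int(m.group(1))
        return (year, year)
    # Uncertain decade or century, e.g. "18--?" (1800-1899) or "194-?" (1940-1949).
    m = re.fullmatch(r"(\d{2,3})(-{1,2})\??", v)
    if m and len(m.group(1)) + len(m.group(2)) == 4:
        span = 10 ** len(m.group(2))
        start = int(m.group(1)) * span
        return (start, start + span - 1)
    # Otherwise use any four-digit years in the text ("SEP 1958", "between 1827 and 1833").
    years = [int(y) for y in re.findall(r"\b\d{4}\b", v)]
    if years:
        return (min(years), max(years))
    # Ambiguous values such as "2-12-01" are left unnormalized.
    return None

for raw in ["2-12-01", "between 1827 and 1833", "18--?", "235 bce", "Summer, 1948"]:
    print(raw, "->", normalize_date(raw))
```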
14
Why use a controlled vocabulary (CV)? Sample subject values:
<subject>30,51,52</subject>
<subject>1852, Apr. 22. E[veritt] Judson, letter to Philuta [Judson].</subject>
<subject>Slavery--United States--Controversial literature</subject>
<subject>view of interior with John Henry sculpture</subject>
<subject>Particles (Nuclear physics) -- Research.</subject>
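A controlled vocabulary lets a service check each subject term against an authority list. The code below is a minimal sketch of such a check. It assumes LCSH-style (Library of Congress Subject Headings) `--` subdivisions. The tiny hard-coded vocabulary is hypothetical and stands in for a real authority file. Free-text values such as the numeric codes on this slide come back as unmatched.

```python
import re

# Hypothetical mini-vocabulary standing in for a real authority file such as LCSH.
VOCAB = {"slavery", "united states", "controversial literature",
         "particles (nuclear physics)", "research"}

def split_heading(value):
    """Split an LCSH-style heading on '--', tolerating stray spaces and a final period."""
    parts = re.split(r"\s*--\s*", value.strip().rstrip("."))
    return [p.strip() for p in parts if p.strip()]

def check_subject(value):
    """Return (terms, unmatched): the split terms and those missing from VOCAB."""
    terms = split_heading(value)
    return terms, [t for t in terms if t.lower() not in VOCAB]

for s in ["Slavery--United States--Controversial literature",
          "Particles (Nuclear physics) -- Research.",
          "30,51,52"]:
    print(check_subject(s))
```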
15
Best practices
- Fixing the metadata of more than half of the data providers is cumbersome
- Individuals at OAI-enabled institutions started a “Best Practices” group to inform data providers what they ought to do: http://oai-best.comm.nsdl.org/cgi-bin/wiki.pl?TableOfContents
16
2nd phase
- The OAI “Best Practices” group is sponsored by the Digital Library Federation, which also sponsors our latest grant:
  - better and more easily calculated statistics
  - search interface improvements
  - clustering / classification techniques
  - using richer metadata
17
Clustering / classification
- Using automated means to take a selection of metadata and determine “what it’s about”
- Working with Emory University (one of our grant partners) to test their tool
- Results will be integrated into the search interface so users can search within a smaller group of OAIster records
18
Using richer metadata
- Data providers must use simple Dublin Core
- A very sparse schema for describing objects:
  - dc:title must contain the main title, sorted title, and alternative titles
  - dc:subject doesn’t distinguish between geographical, hierarchical, temporal… subjects
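The sketch below shows what simple DC loses by flattening a small MODS record into dc:title and dc:subject. It is a deliberately naive mapping written for illustration, not the Library of Congress MODS-to-DC crosswalk, and the sample record is hypothetical. MODS keeps main and alternative titles apart (`titleInfo` with `type="alternative"`), marks non-sorting words (`nonSort`), and separates subject kinds (`topic`, `geographic`, `temporal`). All of those distinctions vanish in the output.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"

# Hypothetical MODS record used for illustration.
MODS_SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><nonSort>The </nonSort><title>sample atlas</title></titleInfo>
  <titleInfo type="alternative"><title>Atlas, sample</title></titleInfo>
  <subject><topic>Maps</topic></subject>
  <subject><geographic>Michigan</geographic></subject>
  <subject><temporal>1900-1950</temporal></subject>
</mods>"""

def mods_to_simple_dc(xml_text):
    """Naively flatten MODS titles and subjects into simple DC lists."""
    root = ET.fromstring(xml_text)
    dc = {"title": [], "subject": []}
    for ti in root.findall(MODS + "titleInfo"):
        # Main and alternative titles both become dc:title; @type is discarded,
        # and joining nonSort to the title loses the separate sort form.
        title = (ti.findtext(MODS + "nonSort") or "") + (ti.findtext(MODS + "title") or "")
        dc["title"].append(title)
    for subj in root.findall(MODS + "subject"):
        # topic, geographic and temporal all become plain dc:subject strings.
        dc["subject"].extend(child.text.strip() for child in subj if child.text)
    return dc

print(mods_to_simple_dc(MODS_SAMPLE))
```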
19
Using richer metadata
- Encouraging use of richer metadata, especially MODS (Metadata Object Description Schema) from the Library of Congress (LOC)
- Developed a testbed for grant deliverables; it currently shows only MODS work… http://www.hti.umich.edu/m/mods/
20
Other stuff
- Well, make it smaller somehow…
- Clean up the Boolean interface
- Squinch fields together
- Include more normalization
- Make it available through federated search
- Proselytize sharing metadata
- Test, test, test
21
Contact me
Kat Hagedorn
UM Library Information Technology
khage@umich.edu
www.oaister.org