Mixed content, mixed metadata: Information discovery in the NSDL.


1 Mixed content, mixed metadata: Information discovery in the NSDL

2 Experience from American Memory and NSDL
Caroline R. Arms and William Y. Arms, "Mixed content, mixed metadata: information discovery in a messy world." In Metadata in Practice, editors Diane Hillmann and Elaine Westbrooks, ALA Editions (forthcoming).

3 The Integration Task is to provide a coherent set of collections and services across great diversity (all digital collections relevant to science education).
The National Science Digital Library, http://nsdl.org/

4 Mixed Content
Examples: NSDL-funded collections at Cornell
Atlas. Data sets of earthquakes, volcanoes, etc.
Reuleaux. Digitized kinematics models from the nineteenth century.
Laboratory of Ornithology. Sound recordings, images, and videos of birds and other animals.
Nuprl. Logic-based tools to support programming and to implement formal computational mathematics.

5 Effective Information Discovery Before Digital Information Searching
(a) Resources separated into categories of related materials. Each category organized, indexed and searched separately.
(b) Catalogs and indexes built on tightly controlled metadata standards, e.g., MARC, MeSH headings, etc.
(c) Search engines used Boolean operators and fielded searching.
(d) Query languages and search interfaces assumed a trained user.
(e) Resources were physical items.

6 Effective Information Discovery With Homogeneous Digital Information
Comprehensive metadata with Boolean retrieval: can be excellent for well-understood categories of material, but requires standardized metadata and relatively homogeneous content (e.g., MARC catalog).
Full text indexing with ranked retrieval: can be excellent, but the methods were developed and validated for relatively homogeneous textual material (e.g., TREC ad hoc track).
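The two approaches on this slide can be contrasted with a minimal sketch. The records, field names, and the TF-IDF-style weighting below are illustrative assumptions, not taken from the presentation or from any real catalog: Boolean retrieval needs an exact match on controlled metadata, while ranked retrieval scores free text and orders the hits.

```python
import math
from collections import Counter

# Hypothetical toy records; the field names are illustrative, not MARC fields.
RECORDS = [
    {"id": 1, "subject": "earthquakes", "text": "seismic data sets of earthquakes and volcanoes"},
    {"id": 2, "subject": "birds", "text": "sound recordings and videos of birds"},
    {"id": 3, "subject": "volcanoes", "text": "volcanoes eruption images and earthquakes records"},
]

def boolean_and(records, **fields):
    """Boolean retrieval: a record matches only if every queried field equals the given value."""
    return [r["id"] for r in records if all(r.get(k) == v for k, v in fields.items())]

def ranked(records, query):
    """Ranked retrieval: score each record by the summed TF-IDF-style weight of the query terms."""
    docs = {r["id"]: Counter(r["text"].split()) for r in records}
    n = len(docs)
    scores = {}
    for doc_id, tf in docs.items():
        score = 0.0
        for term in query.split():
            df = sum(1 for d in docs.values() if term in d)  # document frequency
            if df:
                score += tf[term] * math.log(1 + n / df)     # rarer terms weigh more
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)

print(boolean_and(RECORDS, subject="birds"))
print(ranked(RECORDS, "earthquakes seismic"))
```

The Boolean query returns an unordered exact-match set and depends entirely on consistent metadata; the ranked query tolerates free text but assumes the documents are comparable enough for one scoring scheme to make sense.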

7 Mixed Metadata: the Chimera of Standardization
Technical reasons
(a) Characteristics of formats and genres
(b) Differing user needs
Social and cultural reasons
(a) Economic factors
(b) Installed base

8 Cross-Domain Metadata
Dublin Core
"... indexes [such as Lycos] are most useful in small collections within a given domain. As the scope of their coverage expands, indexes succumb to problems of large retrieval sets and problems of cross-disciplinary semantic drift. Richer records, created by content experts, are necessary to improve search and retrieval." [Weibel 1995]

9 Information Discovery in a Messy World
Web search engines have adapted to a very large scale. Other techniques, such as cross-domain metadata and federated searching, have failed to scale up.
What new concepts and techniques have enabled this adaptation?
What can we learn that is applicable to other information discovery tasks?
How is NSDL making use of this understanding?

10 Information Discovery in a Messy World
Building blocks
Brute force computation
The expertise of users -- human in the loop
Methods
(a) Better understanding of how and why users seek information
(b) Relationships and context information
(c) Multi-modal information discovery
(d) User interfaces for exploring information

11 Understanding How and Why Users Seek Information
Homogeneous content
All documents are assumed equal
Criterion is relevance (binary measure)
Goal is to find all relevant documents (high recall)
Hits ranked in order of similarity to query
Mixed content
Some documents are more important than others
Goal is to find the most useful documents on a topic and then browse
Hits ranked in an order that combines importance and similarity to query
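A ranking that combines importance with similarity to the query might be sketched as a weighted sum. The slide does not say how the two signals are combined; the linear formula, the `weight` parameter, and the sample scores here are assumptions for illustration only.

```python
def combined_rank(hits, weight=0.5):
    """Order hits by a weighted sum of query similarity and importance (both assumed in [0, 1])."""
    def score(hit):
        return weight * hit["similarity"] + (1 - weight) * hit["importance"]
    return [h["id"] for h in sorted(hits, key=score, reverse=True)]

# Hypothetical hits: "a" matches the query best but is an obscure document.
hits = [
    {"id": "a", "similarity": 0.9, "importance": 0.1},
    {"id": "b", "similarity": 0.7, "importance": 0.8},
    {"id": "c", "similarity": 0.2, "importance": 0.9},
]

print(combined_rank(hits, weight=1.0))  # similarity only, as for homogeneous content
print(combined_rank(hits, weight=0.5))  # importance and similarity weighted equally
```

With `weight=1.0` the order is pure similarity; lowering the weight lets a well-regarded document overtake a closer textual match, which is the mixed-content behaviour the slide describes.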

12 Relationship and Contextual Information
Methods for capturing context
Analysis of citations and links (e.g., PageRank)
Mining usage logs (e.g., customers who buy the same product)
Reviews (e.g., reputation management)
Structural relationships (e.g., domain names)
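Link analysis of the PageRank kind can be sketched with a short power iteration. This is a simplified textbook version run on a made-up three-page graph, not the algorithm as deployed by any search engine; the damping factor of 0.85 and the even redistribution of rank from pages without outgoing links are conventional assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration over a dict of page -> list of linked pages."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page receives a small baseline share regardless of links.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Page with no outgoing links: spread its rank over all pages.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph: "a" and "b" both link to "c", and "c" links back to "a".
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(sorted(ranks, key=ranks.get, reverse=True))
```

Page "c" ranks highest because two pages link to it, and "a" outranks "b" because its single incoming link comes from the highly ranked "c"; no metadata about the pages themselves is needed.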

13 Multi-Modal Information Discovery
With mixed content and mixed metadata, the amount of information about the various resources varies greatly, but clues from many different sources can be combined.
"The fundamental premise of the research was that the integration of these technologies, all of which are imperfect and incomplete, would overcome the limitations of each, and improve the overall performance in the information retrieval task." [Wactlar, 2000]

14 User Interfaces for Exploring Information
[Diagram labels: Search index, Return hits, Browse content, Return objects]

15 NSDL: The Spectrum of Interoperability
Level        Agreements                                   Example
Federation   Strict use of standards (syntax, semantic,   AACR, MARC, Z 39.50
             and business)
Harvesting   Digital libraries expose metadata; simple    Open Archives metadata harvesting
             protocol and registry
Gathering    Digital libraries do not cooperate;          Web crawlers and search engines
             services must seek out information

16 The NSDL Repository
[Diagram labels: Users, Collections, NSDL Repository, Services]
The repository is a resource for service providers. It holds information about every collection and item known to the NSDL, including contextual information.

17 NSDL Search Service: First Phase
[Diagram labels: Portal, Search and Discovery Service, Collections, SDLIP, harvest, crawl, NSDL Repository, Inquery -> Lucene]

18 NSDL Search Service: First Phase
Approach
(a) Collections map metadata to Dublin Core, provide via Open Archives protocol.
(b) Search service augments Dublin Core metadata with indexing of full-text where available.
(c) User interface returns snippets derived from the metadata, links to full content and to metadata.
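Steps (a) and (b) can be illustrated with a small sketch. The native field names, the crosswalk table, and the sample record are hypothetical; only the target element names (title, creator, subject, description) are real Dublin Core elements. Harvesting over the Open Archives protocol is left out so that the example stays self-contained.

```python
# Hypothetical crosswalk from one collection's native field names to Dublin Core elements.
CROSSWALK = {"name": "title", "author": "creator", "keywords": "subject", "summary": "description"}

def to_dublin_core(native):
    """Map a native record to Dublin Core; fields with no mapping are dropped."""
    return {CROSSWALK[k]: v for k, v in native.items() if k in CROSSWALK}

def index_terms(dc_record, full_text=None):
    """Collect index terms from the metadata and, where available, from the full text."""
    words = " ".join(str(v) for v in dc_record.values()).lower().split()
    if full_text:
        words += full_text.lower().split()
    return set(words)

# Made-up record; "shelf" has no Dublin Core mapping in this crosswalk and is lost.
native = {"name": "Bird Songs", "author": "Laboratory of Ornithology", "shelf": "B12"}
dc = to_dublin_core(native)
print(dc)
print("warbler" in index_terms(dc))
print("warbler" in index_terms(dc, full_text="Recording of a warbler at dawn"))
```

The dropped "shelf" field shows the cost of mapping to a common element set, and the last two lines show an item becoming findable only once its full text is indexed alongside the metadata.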

19 NSDL Search Service: First Phase
Weaknesses
(a) Ranking by similarity to query not sufficient.
(b) Snippets do not indicate why item was returned (e.g., terms in full text but not in metadata).
(c) Dublin Core records provide limited information.
(d) Browsing environment limited.
(e) Most users begin their search with a Web search engine (e.g., Google).

20 NSDL Search Service: Second Phase Developments
Metadata
(a) Accept any metadata that is available in a range of formats
(b) System for reviews and annotations, with reputation management
Search system
(a) Multimodal retrieval and ranking
(b) Dynamic generation of snippets by search engine
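Dynamic snippet generation might look like the following sketch, which addresses the first-phase weakness that snippets did not show why an item was returned. The function name, the fixed character window, and the sample texts are assumptions; the presentation does not describe how the NSDL search engine builds its snippets.

```python
def make_snippet(query, description, full_text="", width=40):
    """Return text around the first query term found, checking the metadata before the full text."""
    terms = [t.lower() for t in query.split()]
    for source in (description, full_text):
        lowered = source.lower()
        for term in terms:
            pos = lowered.find(term)
            if pos != -1:
                start = max(0, pos - width)
                end = min(len(source), pos + len(term) + width)
                prefix = "..." if start > 0 else ""
                suffix = "..." if end < len(source) else ""
                return prefix + source[start:end] + suffix
    # No query term found anywhere: fall back to the start of the description.
    return description[:2 * width]

# The query term appears only in the (made-up) full text, not in the metadata description.
snippet = make_snippet(
    "kinematics",
    "Digitized models",
    full_text="Nineteenth century kinematics models by Reuleaux",
    width=10,
)
print(snippet)
```

A static snippet taken from the description alone would show "Digitized models" and leave the user guessing; the dynamic version surfaces the matching passage from the full text.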

21 NSDL Search Service: Second Phase Developments (cont.)
Usability and human factors
(a) Wider range of browsing tools (e.g., collection visualization)
(b) Filters by education level and education quality, where known
Web compatibility
(a) Expose records for Web crawlers to index
(b) Browser bookmarklet to add NSDL information to Web pages

22 Mixed content, mixed metadata: Information discovery in the NSDL

