Mixed content, mixed metadata: Information discovery in the NSDL.




Experience from American Memory and NSDL
Caroline R. Arms and William Y. Arms, "Mixed content, mixed metadata: information discovery in a messy world." In Metadata in Practice, editors Diane Hillmann and Elaine Westbrooks, ALA Editions (forthcoming).

The National Science Digital Library
The integration task is to provide a coherent set of collections and services across great diversity (all digital collections relevant to science education).

Mixed Content
Examples: NSDL-funded collections at Cornell
Atlas. Data sets of earthquakes, volcanoes, etc.
Reuleaux. Digitized kinematics models from the nineteenth century.
Laboratory of Ornithology. Sound recordings, images, and videos of birds and other animals.
Nuprl. Logic-based tools to support programming and to implement formal computational mathematics.

Effective Information Discovery Before Digital Information Searching
(a) Resources separated into categories of related materials. Each category organized, indexed and searched separately.
(b) Catalogs and indexes built on tightly controlled metadata standards, e.g., MARC, MeSH headings, etc.
(c) Search engines used Boolean operators and fielded searching.
(d) Query languages and search interfaces assumed a trained user.
(e) Resources were physical items.

Effective Information Discovery With Homogeneous Digital Information
Comprehensive metadata with Boolean retrieval
Can be excellent for well-understood categories of material, but requires standardized metadata and relatively homogeneous content (e.g., MARC catalog).
Full text indexing with ranked retrieval
Can be excellent, but methods were developed and validated for relatively homogeneous textual material (e.g., TREC ad hoc track).
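The contrast between the two approaches on this slide can be sketched in a few lines. This is a minimal illustration, not the method of any particular catalog or search engine: the toy records, the field names, and the simple tf-idf weighting are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical toy records; field names and values are illustrative only.
RECORDS = {
    "r1": {"title": "earthquake data sets", "subject": "geology"},
    "r2": {"title": "bird sound recordings", "subject": "ornithology"},
    "r3": {"title": "volcano and earthquake atlas", "subject": "geology"},
}

def boolean_fielded_search(records, field, terms):
    """Boolean AND over one metadata field: a record either matches or it does not."""
    return sorted(
        rid for rid, rec in records.items()
        if all(t in rec.get(field, "").split() for t in terms)
    )

def ranked_search(records, query):
    """Ranked retrieval: score every record with a simple tf-idf sum, best first."""
    docs = {rid: " ".join(rec.values()).split() for rid, rec in records.items()}
    n = len(docs)
    # Document frequency: in how many records each term appears.
    df = Counter(t for words in docs.values() for t in set(words))
    scores = {}
    for rid, words in docs.items():
        tf = Counter(words)
        score = sum(tf[t] * math.log(1 + n / df[t]) for t in query.split() if t in tf)
        if score > 0:
            scores[rid] = score
    return sorted(scores, key=scores.get, reverse=True)
```

Boolean search returns an unordered match set and depends on consistent field contents, while ranked search tolerates partial matches but assumes the text of each record is comparable.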

Mixed Metadata: the Chimera of Standardization
Technical reasons
(a) Characteristics of formats and genres
(b) Differing user needs
Social and cultural reasons
(a) Economic factors
(b) Installed base

Cross-Domain Metadata
Dublin Core
"... indexes [such as Lycos] are most useful in small collections within a given domain. As the scope of their coverage expands, indexes succumb to problems of large retrieval sets and problems of cross-disciplinary semantic drift. Richer records, created by content experts, are necessary to improve search and retrieval." [Weibel 1995]

Information Discovery in a Messy World
Web search engines have adapted to a very large scale. Other techniques, such as cross-domain metadata and federated searching, have failed to scale up.
What new concepts and techniques have enabled this adaptation?
What can we learn that is applicable to other information discovery tasks?
How is NSDL making use of this understanding?

Information Discovery in a Messy World
Building blocks
Brute force computation
The expertise of users -- human in the loop
Methods
(a) Better understanding of how and why users seek information
(b) Relationships and context information
(c) Multi-modal information discovery
(d) User interfaces for exploring information

Understanding How and Why Users Seek Information
Homogeneous content
All documents are assumed equal
Criterion is relevance (binary measure)
Goal is to find all relevant documents (high recall)
Hits ranked in order of similarity to query
Mixed content
Some documents are more important than others
Goal is to find the most useful documents on a topic and then browse
Hits ranked in an order that combines importance and similarity to query
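A ranking that combines importance with similarity to the query can be sketched as a weighted sum. The slides do not say how the two signals are combined, so the linear blend, the `weight` parameter, and the assumption that both scores are already normalized to [0, 1] are illustrative choices only.

```python
def combined_rank(hits, weight=0.5):
    """Order hits by a blend of query similarity and query-independent importance.

    hits   -- list of (doc_id, similarity, importance), each score in [0, 1]
    weight -- share given to similarity; 1.0 reproduces pure similarity ranking
    """
    scored = [
        (weight * similarity + (1 - weight) * importance, doc_id)
        for doc_id, similarity, importance in hits
    ]
    # Highest combined score first.
    return [doc_id for _score, doc_id in sorted(scored, reverse=True)]


# Hypothetical hits: "a" matches the query best but is an obscure document.
hits = [("a", 0.9, 0.1), ("b", 0.7, 0.8), ("c", 0.2, 0.9)]
by_similarity_only = combined_rank(hits, weight=1.0)
by_blend = combined_rank(hits, weight=0.5)
```

With `weight=1.0` the order is the homogeneous-content case (similarity only); lowering the weight lets an important document overtake a closer but obscure match.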

Relationship and Contextual Information
Methods for capturing context
Analysis of citations and links (e.g., PageRank)
Mining usage logs (e.g., customers who buy the same product)
Reviews (e.g., reputation management)
Structural relationships (e.g., domain names)
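As an illustration of link analysis, the following is a small power-iteration sketch of the PageRank idea. It is a textbook simplification, not the production algorithm of any search engine; the damping value of 0.85, the fixed iteration count, and the handling of pages without outgoing links are assumptions of this example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Estimate link-based importance by repeated redistribution of rank.

    links -- dict mapping each page to the list of pages it links to
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {node: (1 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page graph: "a" and "b" both link to "c"; "c" links to "a".
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

In this toy graph "c" ends up with the highest rank because two pages point to it, and "b" the lowest because nothing links to it.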

Multi-Modal Information Discovery
With mixed content and mixed metadata, the amount of information about the various resources varies greatly, but clues from many different sources can be combined.
"The fundamental premise of the research was that the integration of these technologies, all of which are imperfect and incomplete, would overcome the limitations of each, and improve the overall performance in the information retrieval task." [Wactlar, 2000]
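One simple way to combine clues when some are missing is a weighted average over whatever evidence exists for a resource. This is only a sketch of the general idea; it is not the method described by Wactlar, and the source names and weights below are hypothetical.

```python
def combine_evidence(evidence, weights):
    """Weighted average of the evidence scores that are actually available.

    evidence -- dict of source name -> score in [0, 1], or None if unavailable
    weights  -- dict of source name -> relative trust in that source
    """
    available = {src: score for src, score in evidence.items() if score is not None}
    if not available:
        return 0.0
    total_weight = sum(weights[src] for src in available)
    return sum(weights[src] * score for src, score in available.items()) / total_weight


# Hypothetical weights: trust curated metadata twice as much as other clues.
WEIGHTS = {"metadata": 2.0, "full_text": 1.0, "links": 1.0}

# A resource with metadata and link evidence but no indexable full text.
score = combine_evidence({"metadata": 0.8, "full_text": None, "links": 0.4}, WEIGHTS)
```

Renormalizing by the weights of the available sources means a resource is not penalized simply because one kind of evidence (for example, full text) does not exist for it.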

User Interfaces for Exploring Information
[Diagram labels: Search index; Return hits; Browse content; Return objects]

NSDL: The Spectrum of Interoperability
Level        Agreements                                                             Example
Federation   Strict use of standards (syntax, semantic, and business)               AACR, MARC, Z
Harvesting   Digital libraries expose metadata; simple protocol and registry        Open Archives metadata harvesting
Gathering    Digital libraries do not cooperate; services must seek out information Web crawlers and search engines

The NSDL Repository
[Diagram labels: Users; Services; NSDL Repository; Collections]
The repository is a resource for service providers. It holds information about every collection and item known to the NSDL, including contextual information.

NSDL Search Service: First Phase
[Diagram labels: Portal; Search and Discovery Service; Collections; NSDL Repository; SDLIP; harvest; crawl; Inquery -> Lucene]

NSDL Search Service: First Phase
Approach
(a) Collections map metadata to Dublin Core and provide it via the Open Archives protocol.
(b) Search service augments Dublin Core metadata with indexing of full text where available.
(c) User interface returns snippets derived from the metadata, with links to full content and to metadata.
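The first two steps of this approach can be sketched as follows. The crosswalk table, the native field names, and the base URL are hypothetical; only the `verb=ListRecords` and `metadataPrefix=oai_dc` request parameters come from the OAI-PMH protocol itself. This is an illustration of the idea, not the NSDL implementation.

```python
from urllib.parse import urlencode

# Hypothetical crosswalk from one collection's native field names
# to Dublin Core element names.
CROSSWALK = {
    "name": "title",
    "author": "creator",
    "keywords": "subject",
    "summary": "description",
}

def to_dublin_core(native_record):
    """Step (a): keep only the fields that have a Dublin Core equivalent."""
    return {CROSSWALK[k]: v for k, v in native_record.items() if k in CROSSWALK}

def list_records_url(base_url):
    """Step (a): build an OAI-PMH request for all records in Dublin Core."""
    return base_url + "?" + urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})

def text_to_index(dc_record, full_text=None):
    """Step (b): index the metadata, plus the full text when it is available."""
    parts = list(dc_record.values())
    if full_text:
        parts.append(full_text)
    return " ".join(parts).lower()


dc = to_dublin_core({"name": "Kinematic models", "keywords": "mechanisms", "shelf": "12"})
url = list_records_url("http://example.org/oai")  # placeholder repository address
```

Note that the crosswalk silently drops fields with no Dublin Core equivalent (here `shelf`), which is one reason the slides that follow describe Dublin Core records as providing limited information.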

NSDL Search Service: First Phase
Weaknesses
(a) Ranking by similarity to query is not sufficient.
(b) Snippets do not indicate why an item was returned (e.g., terms in full text but not in metadata).
(c) Dublin Core records provide limited information.
(d) Browsing environment is limited.
(e) Most users begin their search with a Web search engine (e.g., Google).

NSDL Search Service: Second Phase
Developments
Metadata
(a) Accept any metadata that is available in a range of formats
(b) System for reviews and annotations, with reputation management
Search system
(a) Multimodal retrieval and ranking
(b) Dynamic generation of snippets by search engine
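Dynamic snippet generation can be sketched as choosing a window of text around the first query term that matches, looking in the metadata first and then in the full text. The window size, the punctuation handling, and the fallback order are assumptions of this example, not a description of how the NSDL search engine builds snippets.

```python
def make_snippet(text, query, width=5):
    """Return a few words around the first query term found in text, or None."""
    words = text.split()
    terms = {t.lower() for t in query.split()}
    for i, word in enumerate(words):
        if word.lower().strip(".,;:") in terms:
            start = max(0, i - width)
            end = min(len(words), i + width + 1)
            return " ".join(words[start:end])
    return None

def snippet_for_hit(metadata_text, full_text, query):
    """Prefer a metadata match, then a full-text match, then the plain metadata."""
    return (
        make_snippet(metadata_text, query)
        or make_snippet(full_text or "", query)
        or metadata_text
    )


# Hypothetical hit whose matching term appears only in the full text.
snippet = snippet_for_hit(
    "Digitized kinematic models",
    "The collection includes a detailed model of a four-bar linkage built for teaching.",
    "linkage",
)
```

Because the snippet is built from wherever the query term actually occurs, it shows the user why the item was returned even when the term is absent from the metadata.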

NSDL Search Service: Second Phase
Developments (cont.)
Usability and human factors
(a) Wider range of browsing tools (e.g., collection visualization)
(b) Filters by education level and education quality, where known
Web compatibility
(a) Expose records for Web crawlers to index
(b) Browser bookmarklet to add NSDL information to Web pages
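The phrase "where known" matters for filtering: with mixed metadata, many records will not state an education level at all. A minimal sketch follows, assuming a hypothetical `education_level` field and a caller-chosen policy for records that lack it.

```python
def filter_by_level(records, level, keep_unknown=True):
    """Keep records at the requested education level.

    Records with no stated level are kept or dropped according to keep_unknown,
    since dropping them would hide every resource with sparse metadata.
    """
    kept = []
    for record in records:
        record_level = record.get("education_level")
        if record_level is None:
            if keep_unknown:
                kept.append(record)
        elif record_level == level:
            kept.append(record)
    return kept


# Hypothetical records; the second has no education level in its metadata.
records = [
    {"id": "r1", "education_level": "high school"},
    {"id": "r2"},
    {"id": "r3", "education_level": "undergraduate"},
]
strict = filter_by_level(records, "high school", keep_unknown=False)
lenient = filter_by_level(records, "high school")
```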
