JISC/NSF PI Meeting, June 24-25 Archon - A Digital Library that Federates Physics Collections with Varying Degrees of Metadata Richness Department of Computer.

Presentation transcript:

JISC/NSF PI Meeting, June
Archon - A Digital Library that Federates Physics Collections with Varying Degrees of Metadata Richness
Department of Computer Science, Old Dominion University, Norfolk, VA
K. Maly, M. Zubair, M. Nelson
In Collaboration With Los Alamos National Laboratory (R. Luce) & American Physical Society (M. Doyle)

Motivation
Lack of a federation service that provides a unified interface to diverse physics collections whose metadata differ in richness, syntax, and semantics

Motivation
Dissemination and discovery of physics resources
– Contributors: LANL, APS, AIP, CERN, researchers, teachers
– Users: students, teachers, researchers

Arc: The Basic Federation Engine

Challenges
Resource Discovery
– Diversity in metadata richness
– Lack of controlled vocabulary
– Ease of discovery (formula-based discovery)
– Cross-linking support
– Classification
Creation and Maintenance
– Freshness of metadata
– Dynamic nature of collections
– Filtering
Economic Sustainability
– Rights management
– Who pays? For what?

Issues – No controlled vocabulary
– Different subject classifications
– Same authors but different rendering
– Same affiliation but different form

Interactive resource discovery approach components

Issues - Equation-based search
– Representing the search query
– Rendering equations and embedding them into the HTML display
– Integrating into the search interface
– Identifying equations inside the metadata
– Filtering equations
– Equation storage

Filtering Equations
Errors in equation encoding, for example:
– missing "$" in the LaTeX representation
– illegal LaTeX symbols
Simple equations such as "n=3"

Filtering/categorizing Equations
Approach: use a "Stop Equation File", similar to the "Stop Word File" used for indexing. In the equation-filtering context, the stop equation file consists of rules in the form of regular expressions that describe the LaTeX strings to be dropped. The regular-expression approach makes it easy to describe a wide variety of strings to be filtered.
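The stop-equation idea can be sketched in a few lines of Python. This is only an illustration: the two rules in `STOP_EQUATION_RULES`, the helper names, and the simple "$" balance check are assumptions made for the example, not the rules or code used in Archon.

```python
import re

# Hypothetical stop-equation rules (the slides do not list the real ones).
# Each rule is a regular expression describing a LaTeX string to drop.
STOP_EQUATION_RULES = [
    r"^\s*[A-Za-z]\s*=\s*\d+(\.\d+)?\s*$",  # trivial assignments such as "n=3"
    r"^\s*\d+(\.\d+)?\s*$",                 # bare numbers
]
STOP_PATTERNS = [re.compile(rule) for rule in STOP_EQUATION_RULES]


def strip_delimiters(latex):
    """Return the text between $ delimiters, or None if a "$" is missing.

    Simplification: escaped dollar signs (\\$) are not handled.
    """
    text = latex.strip()
    if text.count("$") % 2 != 0:
        return None  # unbalanced delimiters, treated as an encoding error
    return text.strip("$").strip()


def keep_equation(latex):
    """Return True if the equation should be indexed, False if filtered out."""
    body = strip_delimiters(latex)
    if not body:
        return False  # encoding error or empty equation
    return not any(pattern.match(body) for pattern in STOP_PATTERNS)
```

Keeping the rules in a plain list mirrors a stop-word file: new patterns can be appended without touching the filtering logic.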

How to search for records using equations?
Three search alternatives (or any combination of them) for the user:
– Search for documents containing all formulae found in the (a) abstracts or (b) subject fields of documents that match the user's input keywords
– Search for documents containing formulae of a given category (e.g. integrals, moments, limits)
– Browse formulae by various categorizations and search for documents containing the selected formulae
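A category-based search of this kind could rest on a small index from formula category to document. The sketch below is an assumption-laden illustration: the category names, the regular expressions in `CATEGORY_RULES`, and the `(doc_id, equations)` record shape are all hypothetical and do not come from the slides.

```python
import re
from collections import defaultdict

# Hypothetical category rules keyed on LaTeX commands. The negative lookahead
# keeps "\int" from matching longer command names that start with "\int".
CATEGORY_RULES = {
    "integral": re.compile(r"\\o?int(?![A-Za-z])"),
    "limit": re.compile(r"\\lim(?![A-Za-z])"),
    "sum": re.compile(r"\\sum(?![A-Za-z])"),
}


def categorize(latex):
    """Return the set of category names whose rule matches the LaTeX string."""
    return {name for name, rule in CATEGORY_RULES.items() if rule.search(latex)}


def build_category_index(records):
    """Map each category to the ids of documents containing such a formula.

    records: iterable of (doc_id, list of LaTeX strings).
    """
    index = defaultdict(set)
    for doc_id, equations in records:
        for equation in equations:
            for category in categorize(equation):
                index[category].add(doc_id)
    return index
```

A search for "documents containing integrals" then reduces to a lookup such as `index["integral"]`, which can be intersected with keyword results to combine the alternatives.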

Issues - Cross Linking
References
– Obtaining references from full-text documents or parallel metadata sets
– References obtained from full text are often badly formatted
– A standard way to represent references across collections is needed

Issues – Name similarity
– Authors use different names for themselves and their affiliations
– Authority files could be used, but they are difficult to create and maintain across different collections

Similarity approach
Clustering
– Iterative refinement approach:
– Coarse-level clusters based on approximate string matching (edit distance, Soundex, n-gram)
– Clusters refined based on affiliation where available
Presentation
– Allow the user to follow a search by clicking on authors and then selecting the appropriate one, i.e., no authority files
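The coarse clustering step can be illustrated with character bigrams, one of the n-gram measures named above. This is a minimal sketch, not the Archon implementation: the Dice coefficient, the greedy single-pass grouping, and the 0.5 threshold are assumptions chosen for the example, and the affiliation-based refinement is left out.

```python
def bigrams(name):
    """Return the set of character bigrams of a name, ignoring case and punctuation."""
    text = "".join(ch for ch in name.lower() if ch.isalnum())
    return {text[i:i + 2] for i in range(len(text) - 1)}


def dice_similarity(a, b):
    """Dice coefficient over the bigram sets of two names (0.0 to 1.0)."""
    grams_a, grams_b = bigrams(a), bigrams(b)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def coarse_clusters(names, threshold=0.5):
    """Greedily group names: each name joins the first cluster whose first
    member is similar enough, otherwise it starts a new cluster."""
    clusters = []
    for name in names:
        for cluster in clusters:
            if dice_similarity(name, cluster[0]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters
```

For instance, `coarse_clusters(["Maly, Kurt", "Maly, K.", "Zubair, Mohammad", "Zubair, M."])` groups the two renderings of each author together; a refinement pass could then split a cluster whose members carry different affiliations.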

Homogenizing User Space
– Enabling Web users to discover information in OAI collections (DP-9 Service)
– Enabling OAI users to discover information in Web-enabled non-OAI-compliant collections/databases/web sites

DP-9 Service for Exposing OAI Collections to Web

Vac: Gateway Service for Harvesting Non-OAI Collections
[Diagram labels: OAI Service Provider; Gateway to Non-OAI Collections; three WIDL Descriptions (XML-based language), one for each of three Web-Enabled Non-OAI-Compliant Collections/Databases/Web Sites]
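From the service provider's side, a gateway of this kind would presumably be harvested like any other OAI repository. The sketch below only builds a standard OAI-PMH `ListRecords` request URL; it assumes the gateway answers ordinary OAI-PMH requests, and the base URL in the usage note is a made-up placeholder, not the address of Vac.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    from_date (YYYY-MM-DD) limits the harvest to records changed since that
    date, which is how a harvester keeps its copy of the metadata fresh.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)
```

Usage, with a hypothetical gateway address: `list_records_url("http://gateway.example.org/oai", from_date="2003-01-01")` yields a URL that a harvester can fetch and parse as XML.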

Sample Description in WIDL of a Web enabled Non-OAI Collection

Federation/archives Consistency

Future Tasks
– Post-processing of search results for easier navigation
– Exploiting richer metadata and handling diversity in metadata across all participating collections
– Concentrating on an interactive search interface for resource discovery
– Data normalization, authority files, filtering
– Investigating different schemes for maintaining federation/archives consistency
– More high-level services beyond formula-based search and cross-linking
– User testing!!!!

Links ODU DL research group: – Main federation engine: – NSDL research: – ITR/IM research –

Not used

Automated metadata mapping approach