MIND: An architecture for multimedia information retrieval in federated digital libraries Henrik Nottelmann University of Dortmund, Germany.

Slides:



Advertisements
Similar presentations
OAF Workshop, May 13-14, 2002, Pisa.CYCLADES IST CYCLADES An Open Collaborative Virtual Archive Environment Umberto Straccia.
Advertisements

CYCLADES Kickoff 19/02/01 Gudrun Fischer, Norbert Fuhr University of Dortmund (Germany) CYCLADES Acess Service.
Relevance Feedback User tells system whether returned/disseminated documents are relevant to query/information need or not Feedback: usually positive sometimes.
Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
Retrieval of Information from Distributed Databases By Ananth Anandhakrishnan.
Chapter 5: Introduction to Information Retrieval
The Probabilistic Model. Probabilistic Model n Objective: to capture the IR problem using a probabilistic framework; n Given a user query, there is an.
The OCLC Metadata Switch Project Jean Godby, Thomas Hickey, Diane Vizine-Goetz OCLC Office of Research Digital Library Federation May 14, 2003.
Language Model based Information Retrieval: University of Saarland 1 A Hidden Markov Model Information Retrieval System Mahboob Alam Khalid.
Information Retrieval Review
ISP 433/533 Week 2 IR Models.
1 CS 502: Computing Methods for Digital Libraries Lecture 20 Multimedia digital libraries.
Modern Information Retrieval Chapter 2 Modeling. Probabilistic model the appearance or absent of an index term in a document is interpreted either as.
Relevance Feedback based on Parameter Estimation of Target Distribution K. C. Sia and Irwin King Department of Computer Science & Engineering The Chinese.
Architecture & Data Management of XML-Based Digital Video Library System Jacky C.K. Ma Michael R. Lyu.
ReQuest (Validating Semantic Searches) Norman Piedade de Noronha 16 th July, 2004.
U of R eXtensible Catalog Team MetaCat. Problem Domain.
Presented by Zeehasham Rasheed
Information retrieval Finding relevant data using irrelevant keys Example: database of photographic images sorted by number, date. DBMS: Well structured.
Automatic Indexing (Term Selection) Automatic Text Processing by G. Salton, Chap 9, Addison-Wesley, 1989.
Xiaomeng Su & Jon Atle Gulla Dept. of Computer and Information Science Norwegian University of Science and Technology Trondheim Norway June 2004 Semantic.
1 - Fuhr: Information Retrieval Methods for XML Documents XIRQL: Eine Anfragesprache für Information Retrieval in XML- Dokumenten Norbert Fuhr Universität.
Chapter 5: Information Retrieval and Web Search
Information Retrieval – Introduction and Survey Norbert Fuhr University of Duisburg-Essen Germany
Utilising software to enhance your research Eamonn Hynes 5 th November, 2012.
CONTI’2008, 5-6 June 2008, TIMISOARA 1 Towards a digital content management system Gheorghe Sebestyen-Pal, Tünde Bálint, Bogdan Moscaliuc, Agnes Sebestyen-Pal.
Information Retrieval – Introduction and Survey Norbert Fuhr University of Duisburg-Essen Germany
GESIS Dr. Maximilian Stempfhuber Head of Research and Development Social Science Information Centre, Bonn, Germany How to deal with heterogeneity when.
©2008 Srikanth Kallurkar, Quantum Leap Innovations, Inc. All rights reserved. Apollo – Automated Content Management System Srikanth Kallurkar Quantum Leap.
CLEF 2005: Multilingual Retrieval by Combining Multiple Multilingual Ranked Lists Luo Si & Jamie Callan Language Technology Institute School of Computer.
Chapter 2 Architecture of a Search Engine. Search Engine Architecture n A software architecture consists of software components, the interfaces provided.
CSE 6331 © Leonidas Fegaras Information Retrieval 1 Information Retrieval and Web Search Engines Leonidas Fegaras.
Modern Information Retrieval: A Brief Overview By Amit Singhal Ranjan Dash.
Xiaoying Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.
By Chung-Hong Lee ( 李俊宏 ) Assistant Professor Dept. of Information Management Chang Jung Christian University 資料庫與資訊檢索系統的整合 - 一個文件資料庫系統的開發研究.
Design of a Search Engine for Metadata Search Based on Metalogy Ing-Xiang Chen, Che-Min Chen,and Cheng-Zen Yang Dept. of Computer Engineering and Science.
Chapter 6: Information Retrieval and Web Search
Search Engines. Search Strategies Define the search topic(s) and break it down into its component parts What terms, words or phrases do you use to describe.
Distributed Information Retrieval Server Ranking for Distributed Text Retrieval Systems on the Internet B. Yuwono and D. Lee Siemens TREC-4 Report: Further.
Introduction to Digital Libraries hussein suleman uct cs honours 2003.
Information Retrieval Model Aj. Khuanlux MitsophonsiriCS.426 INFORMATION RETRIEVAL.
ICDL 2004 Improving Federated Service for Non-cooperating Digital Libraries R. Shi, K. Maly, M. Zubair Department of Computer Science Old Dominion University.
Accessing a national digital library: an architecture for the UK DNER Andy Powell ELAG 2001, Prague 7 June 2001 UKOLN, University of Bath
Evaluation of Agent Building Tools and Implementation of a Prototype for Information Gathering Leif M. Koch University of Waterloo August 2001.
Vector Space Models.
Automatic Metadata Discovery from Non-cooperative Digital Libraries By Ron Shi, Kurt Maly, Mohammad Zubair IADIS International Conference May 2003.
Web Search and Text Mining Lecture 5. Outline Review of VSM More on LSI through SVD Term relatedness Probabilistic LSI.
CIS 530 Lecture 2 From frequency to meaning: vector space models of semantics.
Digital Video Library Network Supervisor: Prof. Michael Lyu Student: Ma Chak Kei, Jacky.
More on Document Similarity and Clustering How similar are these two documents (Again) ? Are these two documents about the same topic ?
Ranking of Database Query Results Nitesh Maan, Arujn Saraswat, Nishant Kapoor.
Feb 24-27, 2004ICDL 2004, New Dehli Improving Federated Service for Non-cooperating Digital Libraries R. Shi, K. Maly, M. Zubair Department of Computer.
DISTRIBUTED INFORMATION RETRIEVAL Lee Won Hee.
Xiaoying Gao Computer Science Victoria University of Wellington COMP307 NLP 4 Information Retrieval.
Introduction to Information Retrieval Introduction to Information Retrieval Lecture Probabilistic Information Retrieval.
June 3-6, 2003E-Society Lisbon Automatic Metadata Discovery from Non-cooperative Digital Libraries R. Shi, K. Maly, M. Zubair Department of Computer Science.
1 DAFFODIL Effective Support for Using Digital Libraries Norbert Fuhr University of Duisburg-Essen, Germany.
Improvement of Semantic Interoperability based on Metadata Registry(MDR) Doo-Kwon Baik Dept. of CSE Korea University.
Introduction to Information Retrieval Probabilistic Information Retrieval Chapter 11 1.
Information Storage and Retrieval Fall Lecture 1: Introduction and History.
Accessing a national digital library: an architecture for the UK DNER
Panagiotis G. Ipeirotis Tom Barry Luis Gravano
Information Retrieval on the World Wide Web
Information Retrieval
Data Model.
CYCLADES An Open Collaborative Virtual Archive Environment
Authors: Wai Lam and Kon Fan Low Announcer: Kyu-Baek Hwang
Recuperação de Informação B
Recuperação de Informação B
Information Retrieval and Web Design
Presentation transcript:

MIND: An architecture for multimedia information retrieval in federated digital libraries Henrik Nottelmann University of Dortmund, Germany

MIND: An architecture for multimedia information retrieval in federated digital libraries Henrik Nottelmann, University of Dortmund, Germany 2 Synopsis 1.Retrieval in Digital Libraries 2.Architecture 3.Terminology 4. Query process in detail query transformation resource selection data fusion 5. Resource gathering in detail 6.Project organisation

MIND: An architecture for multimedia information retrieval in federated digital libraries Henrik Nottelmann, University of Dortmund, Germany 3 Retrieval in Digital Libraries ? ! ! ? ? ? ! query 1 query 3 query 4 DL 1 DL 3 DL 4 DL 2 result 1 result 3 result 4

MIND: An architecture for multimedia information retrieval in federated digital libraries Henrik Nottelmann, University of Dortmund, Germany 4 Retrieval in Digital Libraries result 1 result 3 result 4 query 1 query 3 query 4 result 1 result 3 result 4 MIND ? ! ! ? ? ? ! query result query 1 query 3 query 4 DL 1 DL 3 DL 4 DL 2 ? ! ! ? ? ? !

MIND: An architecture for multimedia information retrieval in federated digital libraries Henrik Nottelmann, University of Dortmund, Germany 5 Federated Digital Libraries Database-oriented approaches: –heterogeneity Information retrieval approaches: –vagueness and imprecision MIND bases on information retrieval approaches, extensions: –heterogeneity (e.g. query language, schema) –multimedia (text, facts, images, speech) –non-co-operative libraries (query interface only)

MIND: An architecture for multimedia information retrieval in federated digital libraries Henrik Nottelmann, University of Dortmund, Germany 6 MIND Architecture Dispatcher: –library-independet work Co-operating proxies: –extend functionality of non- co-operating library –provide all information required by the dispatcher –standard implementation with textual resource descriptions (XML)

MIND: An architecture for multimedia information retrieval in federated digital libraries Henrik Nottelmann, University of Dortmund, Germany 7 Terminology Media typePredicateDomainPredicate NameData type Schema Attribute Condition Query Condition PredicateAttributeValue (0.4,year,>,1998) Number of documents Attribute Document Value Attribute Value

MIND: An architecture for multimedia information retrieval in federated digital libraries Henrik Nottelmann, University of Dortmund, Germany 8 Heterogenous schemas Required: uncertain mapping between schemas, used to transform user query to proprietary query Query Transformation titlecreator Dublin Core titleauthor RFC MARC 21 uncertain

MIND: An architecture for multimedia information retrieval in federated digital libraries Henrik Nottelmann, University of Dortmund, Germany 9 Query Transformation Task: –transform user query to proprietary query Proxy: –transforms query condition by condition titleidentifiercreator

MIND: An architecture for multimedia information retrieval in federated digital libraries Henrik Nottelmann, University of Dortmund, Germany 10 Query Transformation Attribute/Predicate: –mapping modeled in probabilistic Datalog probabilistic extension to Horn predicate logic weights for facts and rules –certain mapping rules dc_creator_equals(D,V)  marc_100_equals(D,V) dc_creator_equals(D,V)  marc_700_equals(D,V) dc_creator_equals(D,V)  marc_710_equals(D,V) –uncertain mapping rules 0.4 marc_100_equals(D,V)  dc_creator_equals(D,V) –rules and probabilities will be learned

MIND: An architecture for multimedia information retrieval in federated digital libraries Henrik Nottelmann, University of Dortmund, Germany 11 Query Transformation Comparison value: –necessary, when domains do not match dates: “ ” versus “September 9, 2001” authors: “Fuhr, N.” versus “Norbert Fuhr” classification schemas: DDC versus ACM languages: German versus English image colour histogram: different dimensions –transformation: goal: automatic transformation several methods possible, unclear which will be used possibly: simple hardcoding in proxy

MIND: An architecture for multimedia information retrieval in federated digital libraries Henrik Nottelmann, University of Dortmund, Germany 12 Resource Selection Task: –find relevant libraries w.r.t. the query Method: –decision-theoretic model –cost factors computation and communication time charges for delivery retrieval quality –goal: retrieve many relevant documents at low expected costs

MIND: An architecture for multimedia information retrieval in federated digital libraries Henrik Nottelmann, University of Dortmund, Germany 13 Resource Selection DL 1 DL 2 DL 3 DL 4 s=(3,0,1,2) T s 1 =3 s 2 =0 s 3 =1 s 4 =2 Task: –calculate optimum selection vector s=(s 1,...,s l ) T expected retrieval costs EC i (s i ) minimal overal (summed up) expected costs Proxies: –calculate EC i (j), 1  j  n Dispatcher: –calculates optimum selection s=(s 1,...,s l ) T

MIND: An architecture for multimedia information retrieval in federated digital libraries Henrik Nottelmann, University of Dortmund, Germany 14 Resource Selection

MIND: An architecture for multimedia information retrieval in federated digital libraries Henrik Nottelmann, University of Dortmund, Germany 15 Resource Selection expected number of relevant documents in library –required: last sum of indexing weights indexing weightcondition weight estimate (relevance feedback if available) from resouce description

MIND: An architecture for multimedia information retrieval in federated digital libraries Henrik Nottelmann, University of Dortmund, Germany 16 sum of indexing weights: –text, speech: e.g. normalised tf idf values as indexing weight –facts, images: feature vectors over continuous domain V clusters V j  V, centroid v i f: V  V  [0,1] retrieval metric approximation for indexing weight sum: Resource Selection V2 VV2 V centroid v 2 V1 VV1 V

MIND: An architecture for multimedia information retrieval in federated digital libraries Henrik Nottelmann, University of Dortmund, Germany 17 Data Fusion Task: –optimise overall retrieval quality Proxies: –modify weights of their documents (normalisation) based on global idf values –provide local df values –create summaries Dispatcher: –merges results w.r.t. normalised document weights –computes global idf values

MIND: An architecture for multimedia information retrieval in federated digital libraries Henrik Nottelmann, University of Dortmund, Germany 18 Resource Description Schema Uncertain schema mapping Statistical description of collection: –text, speech: terms document frequencies (df) sum of indexing weights –facts, images: clusters of feature vectors centroid vector, cluster radius (number of clusters determines granularity of metadata) number of vectors in cluster

MIND: An architecture for multimedia information retrieval in federated digital libraries Henrik Nottelmann, University of Dortmund, Germany 19 Resource Gathering Task: –create and update resource description Proxy: –uses query-based sampling for statistical descriptions iterative retrieval of documents assumption: union of results is representative for whole collection extract resource description w.r.t. document sample –learns uncertain schema mapping rules –goal: learns library schema

MIND: An architecture for multimedia information retrieval in federated digital libraries Henrik Nottelmann, University of Dortmund, Germany 20 Project Organisation Funded by the EU commision (FP 5) Duration: –January June 2003 Project participants: –University of Strathclyde (UK) (Coordinator) –University of Dortmund (Germany) –University of Florence (Italy) –University of Sheffield (UK) –Carnegie Mellon University (USA)

MIND: An architecture for multimedia information retrieval in federated digital libraries Henrik Nottelmann, University of Dortmund, Germany 21 Conclusion MIND deals with vagueness and imprecision heterogeneity multimedia resource selection data fusion non-co-operation (resource descriptions) in federated digital libraries