CADIAL search engine at INEX


CADIAL search engine at INEX

Jure Mijić (1), Marie-Francine Moens (2), Bojana Dalbelo Bašić (1)
(1) Faculty of Electrical Engineering and Computing: jure.mijic@fer.hr, bojana.dalbelo@fer.hr
(2) Department of Computer Science, Katholieke Universiteit Leuven: sien.moens@cs.kuleuven.be

INEX 2008, Schloss Dagstuhl Conference Center, Wadern, Germany, 2008-12-16
ITI2008 Cavtat 2008-06-25

Presentation overview
- What is the CADIAL project?
- System overview
- Ranking model
- Ad hoc results
- Conclusion
- Future work

What is the CADIAL project?
- Bilateral project between the Government of Flanders and the Ministry of Science, Education and Sports of the Republic of Croatia
- Aims of the CADIAL project:
  - Provide access to a collection of Croatian legislative documents
  - Enable the use of the Eurovoc thesaurus, an EU standard thesaurus for document indexing and retrieval

System overview
- Built with expandability in mind
  - Supports multiple information retrieval models
  - Supports morphological normalization modules
- An indexer tool is used for document indexing
  - Input documents are in XML format
  - Output is an index database (a base structure for every search engine model)
  - The index database is then extended with additional data required by the model (various statistical information)
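
The indexing step described above can be illustrated with a minimal sketch. It is not the CADIAL indexer: the function names (`build_index`, `tokenize`), the path notation for elements, and the in-memory dictionaries standing in for the index database are all assumptions made for illustration. It indexes every XML element separately and collects the kind of statistics (element lengths, collection term frequencies) that a ranking model might need later.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def tokenize(elem):
    # Text of the element and all its descendants, lowercased and
    # split on whitespace (no stemming in this sketch).
    return " ".join(elem.itertext()).lower().split()


def build_index(documents):
    """Build a toy element-level inverted index.

    documents: dict mapping a document id to its XML source.
    Returns (index, lengths, collection_tf), all hypothetical structures:
      index[term][(doc_id, path)] -> term frequency in that element
      lengths[(doc_id, path)]     -> number of terms in that element
      collection_tf[term]         -> term frequency over all documents
    """
    index = defaultdict(dict)
    lengths = {}
    collection_tf = Counter()
    for doc_id, xml_text in documents.items():
        root = ET.fromstring(xml_text)
        # Collection statistics are counted once per document (at the root)
        # so that nested elements are not counted twice.
        collection_tf.update(tokenize(root))
        stack = [(root, f"/{root.tag}[1]")]
        while stack:
            elem, path = stack.pop()
            terms = tokenize(elem)
            lengths[(doc_id, path)] = len(terms)
            for term, tf in Counter(terms).items():
                index[term][(doc_id, path)] = tf
            # Address children by tag and 1-based position among
            # siblings with the same tag.
            seen = Counter()
            for child in elem:
                seen[child.tag] += 1
                stack.append((child, f"{path}/{child.tag}[{seen[child.tag]}]"))
    return index, lengths, collection_tf


docs = {"d1": "<article><title>search engine</title>"
              "<sec><p>xml search</p></sec></article>"}
index, lengths, collection_tf = build_index(docs)
```

Keeping one posting per element, rather than per document, is what makes element retrieval possible at query time, at the cost of a larger index.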

Ranking model
- Language model
  - Element priors based on element location and depth
  - Smoothing on document and collection level
- Additional features
  - Support for CAS queries
  - Support for +/- keyword operators
  - Simple overlapping element removal
  - Stemming
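
A minimal sketch of how such a ranking model could look follows. The slides do not give the formula, so everything here is an assumption: the linear (Jelinek-Mercer style) mixture of element, document, and collection models, the default weights, the depth-based prior `1 / (1 + depth)`, and the function names `log_score` and `remove_overlap` are all hypothetical. Overlap removal is shown as a simple greedy filter that drops any element whose ancestor or descendant has already been kept.

```python
import math
from collections import Counter


def log_score(query_terms, elem_terms, doc_terms, coll_tf, coll_len,
              lam_doc=0.2, lam_coll=0.5, depth=0):
    """Log query likelihood of an element, smoothed with its document
    and with the whole collection (assumed mixture, not the CADIAL one)."""
    elem_tf, doc_tf = Counter(elem_terms), Counter(doc_terms)
    lam_elem = 1.0 - lam_doc - lam_coll
    # Hypothetical element prior: shallower elements are preferred.
    score = math.log(1.0 / (1.0 + depth))
    for term in query_terms:
        p_elem = elem_tf[term] / len(elem_terms) if elem_terms else 0.0
        p_doc = doc_tf[term] / len(doc_terms) if doc_terms else 0.0
        p_coll = coll_tf.get(term, 0) / coll_len
        p = lam_elem * p_elem + lam_doc * p_doc + lam_coll * p_coll
        if p == 0.0:
            # Term unseen even in the collection: no smoothing can help.
            return float("-inf")
        score += math.log(p)
    return score


def remove_overlap(ranked):
    """Greedy overlap removal over (score, doc_id, path) tuples that are
    assumed to be sorted by descending score."""
    kept = []
    for score, doc_id, path in ranked:
        overlaps = any(
            d == doc_id and (path == p
                             or path.startswith(p + "/")
                             or p.startswith(path + "/"))
            for _, d, p in kept
        )
        if not overlaps:
            kept.append((score, doc_id, path))
    return kept


coll_tf = {"xml": 10, "search": 20, "engine": 5, "index": 5}
s = log_score(["xml"], ["xml", "search"],
              ["xml", "search", "engine", "index"], coll_tf, coll_len=100)
ranked = [(-1.0, "d1", "/a[1]/s[1]"), (-2.0, "d1", "/a[1]"),
          (-3.0, "d1", "/a[1]/s[2]"), (-4.0, "d2", "/a[1]")]
focused = remove_overlap(ranked)
```

With these weights the document and collection models together contribute more than the element itself, which keeps short elements from being ranked on a handful of term occurrences alone.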

Ad hoc results
- Our runs:
  - Three CO runs
    - One returning only documents
    - Two returning elements
  - Three CAS runs with various smoothing factors

No.  Run                  iP[0.00]  iP[0.01]  iP[0.05]  iP[0.10]  MAiP
1    co-document-lc6      0.6389    0.5949    0.5051    0.4699    0.2551
2    cas-element-ld5-lc4  0.6684    0.5530    0.4048    0.3248    0.1440
3    co-element-ld2-lc5   0.6907    0.5417    0.4007    0.2920    0.0994
4    co-element-ld2-lc1   0.6718    0.5241    0.3922    0.2963    0.0929
5    cas-element-ld2-lc5  0.6494    0.5203    0.3569    0.2593    0.1134
6    cas-element-ld1-lc6  0.6642    0.5063    0.3652    0.2610    0.1133
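
For readers unfamiliar with the column headers, the sketch below shows how interpolated precision at a recall level (iP[x]) and its average over recall levels can be computed for one topic. It assumes the usual definitions: iP[x] is the highest precision reached at any rank where recall is at least x, and the average is taken over the 101 recall levels 0.00, 0.01, ..., 1.00, with MAiP being the mean of that average over all topics. The function names and the input format (precomputed recall and precision pairs per rank) are assumptions; the official INEX evaluation derives these pairs from the amount of relevant text retrieved, which is not reproduced here.

```python
def interpolated_precision(points, recall_level):
    """points: (recall, precision) pairs measured after each rank of one run
    for one topic. Returns the best precision at recall >= recall_level,
    or 0.0 if that recall level is never reached."""
    candidates = [p for r, p in points if r >= recall_level]
    return max(candidates, default=0.0)


def average_interpolated_precision(points):
    # Average of iP over the 101 standard recall levels 0.00 .. 1.00.
    levels = [i / 100 for i in range(101)]
    return sum(interpolated_precision(points, x) for x in levels) / len(levels)


# Toy example with made-up values, not taken from the table above.
points = [(0.2, 1.0), (0.2, 0.5), (0.6, 0.67), (1.0, 0.5)]
ip_000 = interpolated_precision(points, 0.00)
ip_050 = interpolated_precision(points, 0.50)
aip = average_interpolated_precision(points)
```

Because iP[0.01] only looks at the very start of the ranking while MAiP covers all recall levels, a run can do well on one and poorly on the other, which is the pattern visible between the element and document runs in the table.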

Ad hoc results (continued)
[Figure not preserved in the transcript]

Conclusion
- Retrieving whole documents performed better than element retrieval at higher levels of recall
- CAS queries performed slightly better than CO queries
- Higher smoothing at the document level contributed to better performance

Future work
- Other smoothing techniques
- Pseudo relevance feedback
- Incorporating link evidence
- Information extraction methods

The End
Thank you

Language model