1
CADIAL search engine at INEX
Jure Mijić (1), Marie-Francine Moens (2), Bojana Dalbelo Bašić (1)
(1) Faculty of Electrical Engineering and Computing
(2) Department of Computer Science, Katholieke Universiteit Leuven
INEX Schloss Dagstuhl Conference Center, Wadern, Germany
ITI2008 Cavtat
2
Presentation overview
What is the CADIAL project?
System overview
Ranking model
Ad hoc results
Conclusion
Future work
INEX 2008 Dagstuhl
3
What is the CADIAL project?
Bilateral project between the Government of Flanders and the Ministry of Science, Education and Sports of the Republic of Croatia
Aims of the CADIAL project:
  Provide access to a collection of Croatian legislative documents
  Enable the use of the Eurovoc thesaurus, an EU standard thesaurus for document indexing and retrieval
4
System overview
Built with expandability in mind
  Supports multiple information retrieval models
  Supports morphological normalization modules
An indexer tool is used for document indexing
  Input documents are in XML format
  Output is an index database (a base structure for every search engine model)
  Index database is upgraded with additional data required by the model (various statistical information)
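The indexing step described on this slide can be illustrated with a minimal sketch. It assumes an in-memory index keyed by XPath-like element paths; the function names, the index layout, and the regex tokenizer (a stand-in for the morphological normalization modules) are all hypothetical and are not taken from the CADIAL system.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def new_index():
    """Create an empty index (hypothetical layout, not the CADIAL schema)."""
    return {
        "postings": defaultdict(dict),  # term -> {(doc_id, path): term frequency}
        "elements": {},                 # (doc_id, path) -> {"depth", "length"}
        "collection_tf": Counter(),     # term -> frequency over all documents
        "collection_length": 0,         # total number of tokens indexed
    }


def tokenize(text):
    """Lowercase and split into word tokens (stand-in for real normalization)."""
    return re.findall(r"\w+", text.lower())


def index_document(doc_id, xml_text, index):
    """Index every element of one XML document, then update collection statistics."""
    root = ET.fromstring(xml_text)

    def visit(elem, path, depth):
        # An element's text includes the text of all its descendants.
        terms = Counter(tokenize(" ".join(elem.itertext())))
        key = (doc_id, path)
        index["elements"][key] = {"depth": depth, "length": sum(terms.values())}
        for term, tf in terms.items():
            index["postings"][term][key] = tf
        # Number children per tag to build XPath-like positions.
        seen = Counter()
        for child in elem:
            seen[child.tag] += 1
            visit(child, f"{path}/{child.tag}[{seen[child.tag]}]", depth + 1)
        return terms

    doc_terms = visit(root, f"/{root.tag}[1]", 0)
    # Statistics of this kind are what a ranking model would add to the base index.
    index["collection_tf"].update(doc_terms)
    index["collection_length"] += sum(doc_terms.values())
```

Calling `index_document("d1", xml_text, index)` on an index from `new_index()` fills the postings and per-element statistics. Storing depth and length per element keeps the base index independent of any single retrieval model.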
5
Ranking model
Language model
  Element priors based on element location and depth
  Smoothing on document and collection level
Additional features
  Support for CAS queries
  Support for +/- keyword operators
  Simple overlapping element removal
  Stemming
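Two of the additional features listed above, the +/- keyword operators and simple overlap removal, can be sketched as follows. The slides do not define the operator semantics or the overlap rule, so this sketch assumes that `+term` means the element must contain the term, that `-term` means it must not, and that an element overlaps another when one is an ancestor of the other in the same document. All function names are hypothetical.

```python
def parse_query(query):
    """Split a keyword query into required (+), excluded (-), and optional terms."""
    required, excluded, optional = [], [], []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            optional.append(token)
    return required, excluded, optional


def passes_operators(element_terms, required, excluded):
    """Assumed strict reading: all + terms present, no - term present."""
    terms = set(element_terms)
    return all(t in terms for t in required) and not any(t in terms for t in excluded)


def remove_overlap(ranked):
    """Keep the best-scoring element on each path; drop its ancestors and descendants.

    ranked: list of (doc_id, path, score) tuples sorted by descending score,
    where path is an XPath-like string such as "/article[1]/sec[2]".
    """
    kept = []
    for doc_id, path, score in ranked:
        overlaps = any(
            d == doc_id
            and (path == p or path.startswith(p + "/") or p.startswith(path + "/"))
            for d, p, _ in kept
        )
        if not overlaps:
            kept.append((doc_id, path, score))
    return kept
```

Comparing paths with a trailing `/` avoids treating `sec[1]` as an ancestor of `sec[10]`. The greedy pass favors whichever element ranks highest, which is one simple policy among several possible ones.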
6
Ad hoc results
Our runs:
  Three CO runs
    One returning only documents
    Two returning elements
  Three CAS runs with various smoothing factors

No.  Run                  iP[0.00]  iP[0.01]  iP[0.05]  iP[0.10]  MAiP
1    co-document-lc6      0.6389    0.5949    0.5051    0.4699    0.2551
2    cas-element-ld5-lc4  0.6684    0.5530    0.4048    0.3248    0.1440
3    co-element-ld2-lc5   0.6907    0.5417    0.4007    0.2920    0.0994
4    co-element-ld2-lc1   0.6718    0.5241    0.3922    0.2963    0.0929
5    cas-element-ld2-lc5  0.6494    0.5203    0.3569    0.2593    0.1134
6    cas-element-ld1-lc6  0.6642    0.5063    0.3652    0.2610    0.1133
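For readers unfamiliar with the column headers, the following sketch shows how interpolated precision iP[x] and MAiP are commonly computed: iP[x] is the highest precision reached at any rank whose recall is at least x, and MAiP averages iP over 101 recall levels (0.00 to 1.00) and then over topics. The sketch takes the per-rank (recall, precision) points as given and says nothing about how they were measured; the function names are hypothetical, and this is not the official evaluation tool.

```python
def interpolated_precision(points, x):
    """iP[x]: best precision at any rank whose recall is at least x.

    points: list of (recall, precision) pairs, one per rank in the result list.
    Returns 0.0 when recall never reaches x.
    """
    return max((p for r, p in points if r >= x), default=0.0)


def average_interpolated_precision(points):
    """AiP for one topic: mean of iP[x] over x = 0.00, 0.01, ..., 1.00."""
    levels = [i / 100 for i in range(101)]
    return sum(interpolated_precision(points, x) for x in levels) / len(levels)


def mean_average_interpolated_precision(points_per_topic):
    """MAiP: mean of the per-topic AiP values."""
    values = [average_interpolated_precision(pts) for pts in points_per_topic]
    return sum(values) / len(values)
```

Under this definition iP[0.00] rewards a single precise result near the top, while MAiP rewards precision sustained across all recall levels. That is consistent with the table, where the element runs lead at iP[0.00] and the document run leads on MAiP.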
7
Ad hoc results
8
Conclusion
Retrieving whole documents performed better than element retrieval at higher levels of recall
CAS queries performed slightly better than CO queries
Higher smoothing at the document level contributed to better performance
9
Future work
Other smoothing techniques
Pseudo relevance feedback
Incorporating link evidence
Information extraction methods
10
The End
Thank you
11
Language model
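The formula on this slide did not survive in the transcript. As a rough illustration only, the sketch below scores an element with a language model that interpolates element, document, and collection estimates and adds an element prior. The interpolation form, the parameter names `lambda_d` and `lambda_c`, and the prior function are assumptions chosen to match the phrases "smoothing on document and collection level" and "element priors based on element location and depth"; they are not the authors' formula.

```python
import math


def term_probability(term, elem_tf, doc_tf, coll_tf, lambda_d, lambda_c):
    """Assumed linear interpolation of element, document, and collection models."""
    elem_len = sum(elem_tf.values())
    doc_len = sum(doc_tf.values())
    coll_len = sum(coll_tf.values())
    p_elem = elem_tf.get(term, 0) / elem_len if elem_len else 0.0
    p_doc = doc_tf.get(term, 0) / doc_len if doc_len else 0.0
    p_coll = coll_tf.get(term, 0) / coll_len if coll_len else 0.0
    return (1.0 - lambda_d - lambda_c) * p_elem + lambda_d * p_doc + lambda_c * p_coll


def element_prior(depth, position):
    """Hypothetical prior that favors shallow, early elements."""
    return 1.0 / ((1 + depth) * (1 + position))


def score_element(query_terms, elem_tf, doc_tf, coll_tf, depth, position,
                  lambda_d=0.2, lambda_c=0.5):
    """Log prior plus summed log term probabilities; -inf if a term is unseen everywhere."""
    score = math.log(element_prior(depth, position))
    for term in query_terms:
        p = term_probability(term, elem_tf, doc_tf, coll_tf, lambda_d, lambda_c)
        if p == 0.0:
            return float("-inf")
        score += math.log(p)
    return score
```

The document-level weight lets an element benefit from query terms that appear elsewhere in its document, and the collection-level weight keeps the probability nonzero for terms missing from both. The default weights here are arbitrary placeholders.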