Experiments for the CL-SR task at CLEF 2006

Presentation transcript:

Experiments for the CL-SR task at CLEF 2006
Muath Alzghool and Diana Inkpen
University of Ottawa, Canada
Track: Cross-Language Speech Retrieval (CL-SR)

Experiments
- Results for submitted runs: English collection
- Results for submitted runs: Czech collection
- Segmentation issues, evaluation score
- Results for different systems: SMART, Terrier
- Query expansion with log-likelihood collocation scores (sketched below)
- Terrier: divergence from randomness
- Small improvements
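The query-expansion runs rank candidate expansion terms by a log-likelihood collocation score. As a point of reference (and not necessarily the exact variant computed in these runs), Dunning's log-likelihood ratio over the 2x2 co-occurrence table of a word pair is

$$
G^2 = 2 \sum_{i,j} O_{ij} \ln \frac{O_{ij}}{E_{ij}}, \qquad E_{ij} = \frac{R_i \, C_j}{N}
$$

where $O_{ij}$ are the observed pair counts, $R_i$ and $C_j$ the row and column totals, and $N$ the total number of word pairs; terms that score highly with the original query terms are added to the query.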

Results for the submitted runs for the English collection

Language   MAP      Fields   Description
English    0.2902   TDN      Terrier: MANUALKEYWORD + SUMMARY
           0.0768            SMART: NSP query expansion (LL), ASRTEXT2004A + AUTOKEYWORD2004A1, A2
French     0.0637            ASRTEXT2004A + AUTOKEYWORD2004A1, A2
Spanish    0.0619
           0.0565   TD       Terrier: ASRTEXT2004A + ASRTEXT2006A + AUTOKEYWORD2004A1, A2
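Results for the English collection are reported as mean average precision (MAP), whose standard definition is

$$
\mathrm{MAP} = \frac{1}{|Q|} \sum_{q \in Q} \frac{1}{|R_q|} \sum_{d \in R_q} P@\mathrm{rank}_q(d)
$$

where $Q$ is the topic set, $R_q$ the set of segments judged relevant for topic $q$, and $P@\mathrm{rank}_q(d)$ the precision at the rank where relevant segment $d$ is retrieved (zero if it is not retrieved).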

Results for the submitted runs for the Czech collection

Language   GAP      Fields   Description
Czech      0.0039   TDN      SMART: ASRTEXT, CZECH AUTOKEYWORD, CZECH MANUKEYWORD, ENGLISH MANUKEYWORD, ENGLISH AUTOKEYWORD
           0.0005            SMART: ASRTEXT, CZECH AUTOKEYWORD, CZECH MANUKEYWORD
           0.0004            SMART: ASRTEXT, CZECH AUTOKEYWORD
                    TD       Terrier: ASRTEXT, CZECH AUTOKEYWORD

MAP scores for Terrier and SMART, with or without relevance feedback, for English topics

                 Training                  Test
   System        TDN     TD      T         TDN     TD      T
1  SMART         0.0954  0.0906  0.0873    0.0766  0.0725  0.0759
   SMARTnsp      0.0923  0.0901  0.0870    0.0768  0.0754  0.0769
2  Terrier       0.0913  0.0834  0.0760    0.0651  0.0560  0.0656
   TerrierKL     0.0915  0.0952  -         0.0654  0.0565  0.0685
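The feedback variants (SMARTnsp with its expansion step, TerrierKL presumably with Terrier's KL-divergence expansion model) follow the usual blind relevance-feedback loop: retrieve, pick expansion terms from the top-ranked documents, retrieve again. The Python sketch below shows only that generic loop; `index.search` and `index.score_expansion_terms` are hypothetical stand-ins, not the actual SMART or Terrier interfaces or the settings used in these runs.

```python
def pseudo_relevance_feedback(query_terms, index, k_docs=10, m_terms=5):
    """Generic blind relevance feedback: retrieve, expand, retrieve again.

    `index.search` and `index.score_expansion_terms` are hypothetical helpers
    standing in for the initial retrieval run and for whatever term-scoring
    model (Rocchio, KL divergence, log-likelihood, ...) is plugged in.
    """
    # 1) Initial retrieval with the original query.
    initial_hits = index.search(query_terms)[:k_docs]

    # 2) Score candidate terms from the top-ranked documents, keep the best m.
    candidates = index.score_expansion_terms(initial_hits, exclude=query_terms)
    expansion = [term for term, _ in sorted(candidates.items(),
                                            key=lambda kv: kv[1],
                                            reverse=True)[:m_terms]]

    # 3) Re-run retrieval with the expanded query.
    return index.search(query_terms + expansion)
```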

Experiments
- Various ASR transcripts (2003, 2004, 2006)
  - New ASR 2006 transcripts do not help
  - Combinations do not help
  - Automatic keywords help
- Cross-language
  - Results are good for French-to-English topic translations
  - Not for Spanish, German, Czech
- Manual summaries and manual keywords: best results

MAP scores for Terrier, with various ASR transcript combinations

                          Training                  Test
Segment fields            TDN     TD      T         TDN     TD      T
ASR 2003A                 0.0733  0.0658  0.0684    0.0560  0.0473  0.0526
ASR 2004A                 0.0794  0.0742  0.0722    0.0670  0.0569  0.0604
ASR 2006A                 0.0799  0.0731  0.0741    0.0656  0.0575  0.0576
ASR 2006B                 0.0840  0.0770  0.0776    0.0665  0.0591  -
ASR 2003A+2004A           0.0759  0.0705  -         0.0596  0.0472  0.0542
ASR 2004A+2006A           0.0811  0.0743  0.0730    0.0638  0.0492  0.0559
ASR 2004A+2006B           0.0804  0.0735  0.0732    0.0628  0.0494  0.0558
ASR 2003A+AUTOK           0.0873  0.0859  0.0789    0.0657  0.0570  0.0671
ASR 2004A+AUTOK           0.0915  0.0952  0.0906    0.0654  0.0565  0.0685
ASR 2006B+AUTOK           0.0926  0.0932  0.0909    0.0717  0.0608  0.0661
ASR 2004A+2006A+AUTOK     0.0925  -       -         0.0715  -       -
ASR 2004A+2006B+AUTOK     0.0899  0.0890  -         0.0640  0.0556  0.0692

MAP scores for SMART, with various ASR transcript combinations

                          Training                  Test
Segment fields            TDN     TD      T         TDN     TD      T
ASR 2003A                 0.0625  0.0586  0.0585    0.0508  0.0418  0.0457
ASR 2004A                 0.0701  0.0657  0.0637    0.0614  0.0546  0.0540
ASR 2006A                 0.0537  0.0594  0.0608    0.0455  0.0434  0.0491
ASR 2006B                 0.0582  0.0635  0.0642    0.0484  0.0459  0.0505
ASR 2003A+2004A           0.0685  0.0646  0.0636    0.0533  0.0442  0.0503
ASR 2004A+2006A           0.0686  0.0699  0.0696    0.0543  0.0490  0.0555
ASR 2004A+2006B           0.0713  0.0702  -         0.0542  0.0494  0.0553
ASR 2003A+AUTOK           0.0923  0.0847  0.0839    0.0674  0.0616  0.0690
ASR 2004A+AUTOK           0.0954  0.0906  0.0873    0.0766  0.0725  0.0759
ASR 2006B+AUTOK           0.0869  0.0892  0.0895    0.0650  0.0659  0.0734
ASR 2004A+2006A+AUTOK     0.0903  0.0932  0.0915    0.0654  -       0.0777
ASR 2004A+2006B+AUTOK     0.0931  0.0919  -         0.0652  0.0655  0.0742

Results of the cross-language experiments

                   Training                  Test
   Topic language  TDN     TD      T         TDN     TD      T
1  English         0.0954  0.0906  0.0873    0.0766  0.0725  0.0759
2  French          0.0950  0.0904  0.0814    0.0637  0.0566  0.0483
3  Spanish         0.0773  0.0702  0.0656    0.0619  0.0589  0.0488
4  German          0.0653  0.0622  0.0611    0.0674  0.0605  0.0618
5  Czech           0.0585  0.0506  0.0421    0.0400  0.0309  0.0385

Indexed fields: ASRTEXT2004A and automatic keywords, using SMART with the lnn.ntn weighting scheme.
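In SMART's three-letter notation, lnn.ntn denotes lnn document weights and ntn query weights. Under the usual reading of that notation (log tf, no idf, no normalization for documents; raw tf times idf, no normalization for queries), which is the standard interpretation rather than a detail stated on the slides, the retrieval score is the inner product

$$
w_{d,t} = 1 + \ln(\mathrm{tf}_{d,t}), \qquad
w_{q,t} = \mathrm{tf}_{q,t} \cdot \ln\frac{N}{\mathrm{df}_t}, \qquad
\mathrm{score}(q,d) = \sum_{t \in q} w_{q,t} \, w_{d,t}
$$

where $\mathrm{tf}$ is term frequency, $\mathrm{df}_t$ the number of documents containing term $t$, and $N$ the collection size.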

Results of indexing the manual keywords and summaries, using SMART with weighting scheme lnn.ntn and Terrier with In(exp)C2

                         Training                  Test
   Language / System     TDN     TD      T         TDN     TD      T
1  English   SMART       0.3097  0.2829  0.2564    0.2654  0.2344  0.2258
2  English   Terrier     0.3242  0.3227  0.2944    0.2902  0.2710  0.2489
3  French    SMART       0.2920  0.2731  0.2465    0.1861  0.1582  0.1495
4  French    Terrier     0.3043  0.3066  0.2896    0.1977  0.1909  0.1651
5  Spanish   SMART       0.2502  0.2324  0.2108    0.2204  0.1779  0.1513
6  Spanish   Terrier     0.2899  0.2711  0.2834    0.2444  0.2165  0.1740
7  German    SMART       0.2232  0.2182  0.1831    0.2059  0.1811  0.1868
8  German    Terrier     0.2356  0.2317  0.2055    0.2294  0.2116  0.2179
9  Czech     SMART       0.1766  0.1687  0.1416    0.1275  0.1014  0.1177
10 Czech     Terrier     0.1822  0.1765  0.1480    0.1411  0.1092  0.1201

Conclusion and future work
- Low retrieval results, except when using manual summaries and keywords
- Future work:
  - Filter out potential speech recognition errors: semantic outliers whose PMI scores with neighboring words (computed on a large Web corpus) are low (see the sketch below)
  - Index using speech lattices
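As a rough illustration of the proposed filtering step, the sketch below scores each transcript word by its average pointwise mutual information with the surrounding words, using counts from a large background corpus; `cooc_count`, `unigram_count`, and the threshold values are hypothetical placeholders, not parts of the CLEF system.

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information: log( P(x, y) / (P(x) * P(y)) )."""
    return math.log((count_xy * total) / (count_x * count_y))

def flag_speech_errors(words, cooc_count, unigram_count, total,
                       window=3, threshold=0.0, floor=-10.0):
    """Flag words whose average PMI with their neighbours falls below `threshold`.

    `cooc_count(x, y)` and `unigram_count(x)` are assumed to return counts from
    a large background (Web) corpus; `threshold` and `floor` are illustrative
    values, not tuned parameters from these experiments.
    """
    flagged = []
    for i, word in enumerate(words):
        neighbours = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
        if not neighbours:
            continue
        scores = []
        for n in neighbours:
            c_xy = cooc_count(word, n)
            # Pairs that never co-occur get a fixed low score instead of -infinity.
            scores.append(pmi(c_xy, unigram_count(word), unigram_count(n), total)
                          if c_xy > 0 else floor)
        if sum(scores) / len(scores) < threshold:
            flagged.append((i, word))  # likely ASR error / semantic outlier
    return flagged
```

Words flagged this way could then be removed from, or down-weighted in, the index before retrieval.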