Published by Cory Jacobs. Modified over 9 years ago.
1
Microsoft Research India’s Participation in FIRE2008
Raghavendra Udupa
raghavu@microsoft.com
2
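CLIR System
Components: Query Translator, Dictionary, Document Ranker, Inverted Index
Document collection: LA Times 2002 articles
Example query: CLEF’07 Query #10.2452/447-AH
Hindi title: पिम फोरत् यून की राजनीति (Pim Fortuyn's politics)
Hindi description: ऐसे दस् तावेज खोजिए जिनमें पिम फोरत् यून के राजनैतिक विचारों पर चर्चा की गई हो। (Find documents that discuss Pim Fortuyn's political views.)
Translated query: Pim Fortuyn politics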
3
System components: Inverted Index, Dictionary, Document Collection, Document Ranker, Query Translator
Research topics:
- Domain Adaptation
- Mining Translation Lexicon from Comparable Corpora
- Mining transliterations of OOV words
- Cross-Language Ranking Model
- Mining NETE Transliterations from Comparable Corpora
4
System components: Inverted Index, Dictionary, Document Collection, Document Ranker, Query Translator
Research topics and venues:
- Domain Adaptation
- Mining transliterations of OOV terms (ECIR 2009)
- Cross-Language Ranking Models
- Mining NETE Transliterations from Comparable Corpora (CIKM’08)
- Mining Translation Lexicon from Comparable Corpora (MT Summit 2007)
5
Baseline Retrieval System
- Language model-based retrieval
- Probabilistic translation lexicon
- ~100K parallel sentences
- IBM Model 3 alignment
- GIZA++
Reference: J. Jagarlamudi and A. Kumaran, Cross-Lingual Information Retrieval System for Indian Languages. Working Notes for the CLEF 2007 Workshop.
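One common way to combine a language-model ranker with a probabilistic translation lexicon is cross-language query likelihood: each source-language query term is scored by summing, over its translations, the translation probability times the smoothed probability of the translation in the document. The sketch below illustrates that idea only. The Jelinek-Mercer smoothing, the function names, and the toy lexicon and documents are assumptions made for illustration and are not taken from the slides.

```python
import math
from collections import Counter

def smoothed_prob(term, doc_counts, doc_len, coll_counts, coll_len, lam=0.5):
    """Jelinek-Mercer smoothed P(term | document); lam is an assumed mixing weight."""
    p_doc = doc_counts[term] / doc_len if doc_len else 0.0
    p_coll = coll_counts[term] / coll_len if coll_len else 0.0
    return lam * p_doc + (1.0 - lam) * p_coll

def clir_score(query_terms, doc_tokens, lexicon, coll_counts, coll_len, lam=0.5):
    """Log-likelihood of a source-language query given a target-language document.

    lexicon maps a source term to {target term: translation probability}.
    Terms missing from the lexicon (OOV terms) are skipped, so they cannot
    influence the ranking at all.
    """
    doc_counts = Counter(doc_tokens)
    score = 0.0
    for q in query_terms:
        translations = lexicon.get(q)
        if not translations:
            continue  # OOV query term: contributes nothing to the score
        p = sum(
            p_t * smoothed_prob(t, doc_counts, len(doc_tokens), coll_counts, coll_len, lam)
            for t, p_t in translations.items()
        )
        score += math.log(max(p, 1e-12))  # floor avoids log(0)
    return score

# Toy data, purely illustrative.
docs = {
    "d1": "fortuyn politics party election politics".split(),
    "d2": "football match goal season".split(),
}
coll_counts = Counter(tok for toks in docs.values() for tok in toks)
coll_len = sum(coll_counts.values())
lexicon = {"राजनीति": {"politics": 0.8, "policy": 0.2}}
query = ["पिम", "फोरत्यून", "राजनीति"]  # the first two terms are OOV here

ranking = sorted(
    docs,
    key=lambda d: clir_score(query, docs[d], lexicon, coll_counts, coll_len),
    reverse=True,
)
print(ranking)
```

In this toy run the two name terms are silently dropped because the lexicon has no entry for them, which is the gap that mining transliterations of OOV terms is meant to close.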
6
FIRE Fighting
- Mining Transliterations of Out-Of-Vocabulary Query Terms
- Date-Based Document Restriction
7
Mining Transliterations of Out-Of-Vocabulary Query Terms
Raghavendra Udupa
8
OOV Query Terms
- Many OOV query terms are named entities (NEs).
- NEs are often the focus of a query.
- NEs form an open class of terms in all languages.
- Getting their transliterations right is extremely important.
- Many OOV query terms are not NEs but transliterations of English words, e.g. सेमिनार (seminar), कार्पोंरेशन (corporation), चैम्पियन (champion), फिल्म (film).
9
A Hypothesis
The transliterations of most of the transliteratable OOV terms of a query can be found in documents relevant to the query.
10
Empirical Validation
Collection | Transliteratable OOV terms | Terms with transliterations in at least one relevant document | Terms with transliterations in at least 50% of relevant documents
CLEF 2006 (Hindi) | 62 | 58 (94%) | 49 (79%)
CLEF 2007 (Hindi) | 47 | 42 (89%) | 34 (72%)
CLEF 2007 (Tamil) | 43 | 42 (98%) | 39 (89%)
11
A Practical Hypothesis
The transliterations of many of the transliteratable OOV terms of a query can be found in the top results of the CLIR system for the query.
12
Mining OOV Transliteration Equivalents
Basic idea:
- Pair the query with each of the top N results.
- Treat each pair as a comparable document pair.
- Mine transliteration equivalents from the comparable document pairs.
Reference: “They are out there, if you know where to look”: Mining Transliterations of OOV Query Terms for Cross-Language Information Retrieval. ECIR 2009, Toulouse.
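The three steps of the basic idea can be sketched as a small loop over the top N results. The slides do not say how candidate pairs are scored, so this sketch swaps in a plain string-similarity ratio (`difflib.SequenceMatcher`) over a romanized form of the OOV term. The `romanize` callback, the romanization strings, the threshold, and the sample documents are all hypothetical stand-ins, not the method from the ECIR 2009 paper.

```python
from difflib import SequenceMatcher

def mine_transliterations(oov_terms, top_docs, romanize, threshold=0.6):
    """Return {oov_term: best-matching word} mined from the top-N result texts.

    Each (query, document) pair is treated as a comparable document pair;
    a document word is accepted if its similarity to the romanized OOV term
    exceeds the (assumed) threshold.
    """
    mined = {}
    for term in oov_terms:
        key = romanize(term).lower()
        best_word, best_sim = None, threshold
        for doc in top_docs:
            for word in sorted(set(doc.lower().split())):
                sim = SequenceMatcher(None, key, word).ratio()
                if sim > best_sim:
                    best_word, best_sim = word, sim
        if best_word is not None:
            mined[term] = best_word
    return mined

# Hypothetical romanizations and top-N results, for illustration only.
ROMAN = {"फिल्म": "philm", "चैम्पियन": "caimpiyan"}
top_docs = [
    "the champion praised the film festival",
    "a new corporation was formed",
]

mined = mine_transliterations(list(ROMAN), top_docs, lambda t: ROMAN[t])
print(mined)
```

The mined pairs would then be added to the translation lexicon for the query and the retrieval step run again; a real system would replace the similarity ratio with a trained transliteration similarity model.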
13
Long Queries: MAP
Collection | Baseline | Transliterations Mining | % change over baseline
CLEF 2006 (Hindi) | 0.1463 | 0.2476 | +69.24*
CLEF 2007 (Hindi) | 0.2521 | 0.3389 | +34.43*
CLEF 2007 (Tamil) | 0.1848 | 0.2270 | +22.84*
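The "% change over baseline" column is the relative improvement of the mined-transliteration run over the baseline MAP. A short check of that arithmetic, using the long-query values from the slide (the function and variable names are illustrative):

```python
def percent_change(baseline, system):
    """Relative change of `system` over `baseline`, in percent."""
    return 100.0 * (system - baseline) / baseline

# (baseline MAP, MAP with transliterations mining) for long queries
long_query_map = {
    "CLEF 2006 (Hindi)": (0.1463, 0.2476),
    "CLEF 2007 (Hindi)": (0.2521, 0.3389),
    "CLEF 2007 (Tamil)": (0.1848, 0.2270),
}

for collection, (baseline, mined) in long_query_map.items():
    print(f"{collection}: {percent_change(baseline, mined):+.2f}%")
```

Rounded to two decimals, this reproduces the +69.24, +34.43 and +22.84 figures reported for long queries.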
14
Short Queries: MAP
Collection | Baseline | Transliterations Mining | % change over baseline
CLEF 2006 (Hindi) | 0.0877 | 0.1467 | 67.3
CLEF 2007 (Hindi) | 0.1829 | 0.2323 | 27.0
CLEF 2007 (Tamil) | 0.1024 | 0.1265 | 23.5
15
FIRE 2008: MAP
Run | Baseline | Transliterations Mining | % change over baseline
Short (unofficial) | 0.2616 | 0.3191 | 22
Long (unofficial) | 0.4351 | 0.4871 | 12
Long (official) | 0.4140 | 0.4526 | 9
16
FIRE2008: MAP Difference (Long, official)
17
FIRE 2008: Num_Rel_Ret
Run | Baseline | Transliterations Mining
Short (unofficial) | 70.60 | 80.0
Long (unofficial) | 84.55% | 88.54%
Long (official) | 79.68% | 82.11%
18
FIRE 2008: P@10
Run | Baseline | Transliterations Mining
Short (unofficial) | 0.1000 | 0.4320
Long (unofficial) | 0.6260 | 0.6540
Long (official) | 0.6120 | 0.6480
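For reference, the three measures reported for FIRE 2008 (MAP, Num_Rel_Ret and P@10) can be computed from a ranked list and a set of relevance judgments as sketched below. These are the standard textbook definitions; the slides do not say which evaluation tool produced the numbers, and the document IDs here are made up.

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k

def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values at the rank of each relevant document retrieved."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0

def num_rel_ret(ranked_ids, relevant_ids):
    """Number of relevant documents that appear anywhere in the ranked list."""
    return len(set(ranked_ids) & set(relevant_ids))

def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Made-up example: two of three relevant documents are retrieved.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2", "d9"}

print(precision_at_k(ranked, relevant, k=2))
print(average_precision(ranked, relevant))
print(num_rel_ret(ranked, relevant))
```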
19
Mining Transliterations @ FIRE2008
Worked.
20
Date-Based Document Restriction
Raghavendra Udupa
21
Dates
Some queries contain dates:
- CLEF 2007, Topic 407: Who was the Australian Prime Minister in 2002?
- CLEF 2007, Topic 411: …terrorist car bomb in Bali, Indonesia, in 2002.
- CLEF 2006, Topic 326: …winners in any category of the 1995 Emmy Awards.
- CLEF 2006, Topic 327: …earthquakes in Mexico City in 1995.
22
Hypothesis
If a query contains a date, then the relevant documents for the query are likely to be from the same time period.
23
Empirical Validation
- CLEF’07: LATimes 2002
- CLEF’06: GH 95, LATimes 1994
24
CLEF’06: C327
Title: Earthquakes in Mexico City
Description: Find documents that provide details on the impact of or the damage caused by earthquakes in Mexico City in 1995.
Narrative: Relevant document should contain some information on earthquakes in Mexico City in 1995, such as their magnitude, damages caused, panic of the inhabitants, etc. Documents on earthquakes in other places in Mexico are not relevant unless the seismic impact was also felt in Mexico City.
25
Relevant Document
LA121194-0313 107228
December 11, 1994, Sunday, Home Edition
A magnitude 6.3 earthquake rocked Mexico City, causing people to flee their homes in fear. There were no immediate reports of injuries or severe damage. The U.S. Geological Survey's National Earthquake Information Center in Golden, Colo., said the quake's epicenter was in Petatlan in the southwestern state of Guerrero.
26
Date-Based Document Restriction
- Identify dates (if any) in the query.
- Restrict candidate documents to the set of documents coming from the same time period.
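The two steps above can be sketched as a year extractor plus a filter on publication dates. This is a minimal illustration under stated assumptions: dates are recognized only as four-digit years, "same time period" is read as "same calendar year" with an optional `slack_years` tolerance, and the second document ID is hypothetical. None of these choices is specified in the slides.

```python
import re
from datetime import date

# Assumption: a "date" in a query is a four-digit year between 1900 and 2099.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")

def query_years(query):
    """Return the set of years mentioned in the query text."""
    return {int(m.group(0)) for m in YEAR_RE.finditer(query)}

def restrict_by_date(query, docs, slack_years=0):
    """Keep documents published within `slack_years` of any year in the query.

    docs is a list of (doc_id, publication_date) pairs. Queries without a
    year are left unrestricted.
    """
    years = query_years(query)
    if not years:
        return [doc_id for doc_id, _ in docs]
    return [
        doc_id
        for doc_id, published in docs
        if any(abs(published.year - y) <= slack_years for y in years)
    ]

query = "Find documents on earthquakes in Mexico City in 1995."
docs = [
    ("LA121194-0313", date(1994, 12, 11)),
    ("DOC-1995-EXAMPLE", date(1995, 9, 15)),  # hypothetical document
]

print(restrict_by_date(query, docs))                 # strict: same year only
print(restrict_by_date(query, docs, slack_years=1))  # one year of tolerance
```

With the strict setting, LA121194-0313 is filtered out even though it is judged relevant to Topic C327, which shows how a hard restriction can discard relevant documents published just outside the period named in the query.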
27
FIRE 2008: Relevant Docs
Topic | Relevant docs from a different time period
44 | 11/56
47 | 23/32
48 | 70/76
50 | 18/61
52 | 2/38
73 | 10/53
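Reading each entry as "relevant documents from a different time period / all relevant documents for the topic" (an interpretation of the slide, not something it states outright), the share of relevant documents that a strict date restriction would exclude can be tallied as follows:

```python
# topic -> (relevant docs outside the query's time period, all relevant docs)
outside_period = {
    44: (11, 56),
    47: (23, 32),
    48: (70, 76),
    50: (18, 61),
    52: (2, 38),
    73: (10, 53),
}

for topic, (outside, total) in outside_period.items():
    print(f"Topic {topic}: {100.0 * outside / total:.1f}% outside the period")

total_outside = sum(o for o, _ in outside_period.values())
total_relevant = sum(t for _, t in outside_period.values())
print(f"Overall: {total_outside}/{total_relevant} "
      f"({100.0 * total_outside / total_relevant:.1f}%)")
```

Under that reading, a sizeable fraction of the relevant documents for these six topics lies outside the period named in the query, and for topics 47 and 48 it is the majority.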
28
FIRE 2008: Hindi-English MAP
Queries | Without DR | With DR
Short | 0.2616 (unofficial) | 0.2601 (unofficial)
Long | 0.4351 (unofficial) | 0.4140 (official)
29
Date-Based Document Restriction @ FIRE2008
Hurt us. Deeper investigation needed.