Published by Abner Lang. Modified over 8 years ago.
1
A merging strategy proposal: The 2-step retrieval status value method
Fernando Martínez-Santiago · L. Alfonso Ureña-López · Maite Martín-Valdivia
Department of Computer Science, University of Jaén, Jaén, Spain
Inf Retrieval (2006) 9: 71–93
2
Merging problem
The query is run against each language collection, giving one result list per language:
–Language 1: d11 d12 d13 …
–Language 2: d21 d22 d23 …
–Language 3: d31 d32 d33 …
A merge strategy combines them into a single result list, for example: d31 d32 d21 d11 d12 d23 d13 …
3
Traditional solution
Round-Robin
–Language 1 list: d11 d12 d13 …
–Language 2 list: d21 d22 d23 …
–Language 3 list: d31 d32 d33 …
–Merge: d11 d21 d31 d12 d22 d32 …
Raw scoring
Normalized scoring
– 1)
– 2)
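The two simplest strategies on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function names `round_robin_merge` and `raw_score_merge` and the sample scores are hypothetical, and raw scoring is assumed to mean pooling every document and sorting by its original retrieval score.

```python
from itertools import zip_longest


def round_robin_merge(result_lists):
    """Interleave ranked lists: rank 1 of every list, then rank 2, and so on."""
    merged = []
    for rank_group in zip_longest(*result_lists):
        # zip_longest pads exhausted lists with None; skip the padding.
        merged.extend(doc for doc in rank_group if doc is not None)
    return merged


def raw_score_merge(scored_lists):
    """Pool (doc_id, score) pairs from all lists and sort by score, highest first."""
    pooled = [pair for lst in scored_lists for pair in lst]
    return [doc for doc, _ in sorted(pooled, key=lambda p: p[1], reverse=True)]


lists = [["d11", "d12", "d13"], ["d21", "d22", "d23"], ["d31", "d32", "d33"]]
print(round_robin_merge(lists))

# Hypothetical scores, only to show the pooling behaviour.
scored = [[("d11", 2.0), ("d12", 1.0)], [("d21", 1.5)]]
print(raw_score_merge(scored))
```

Raw scoring only makes sense when scores from different collections are comparable, which is the weakness the normalized and learned variants try to address.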
4
Traditional solution
Logistic regression (Calvé and Savoy (2000), Savoy (2003a))
LVQ neural networks (Martín et al. 2003)
5
2-step retrieval status value method
Step 1:
–Translating the query and searching each monolingual collection produces two results:
a) T’, the set of concepts, each consisting of a query term together with its corresponding translations;
b) a multilingual collection D’, the union of the 1000 documents retrieved for each language.
6
2-step retrieval status value method
Step 2:
–Re-index D’, considering solely the T’ vocabulary.
–Given a concept, its document frequency is obtained by grouping together the document frequencies of the terms that make up the concept.
7
2-step retrieval status value method
For example: the Spanish word “casa” translates to the English words “house” and “home”.
Given a document, term frequency is calculated as usual; document frequency is the sum of the document frequencies of “casa”, “house” and “home”.
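The grouping in the example above can be sketched as follows. The function name `concept_document_frequency` and the document-frequency counts are made up for illustration; only the rule itself (a concept's document frequency is the sum over its terms) comes from the slide.

```python
def concept_document_frequency(concept_terms, term_df):
    """Sum the document frequencies of all terms that make up one concept."""
    return sum(term_df.get(term, 0) for term in concept_terms)


# Hypothetical per-term document frequencies in the merged collection D'.
term_df = {"casa": 120, "house": 300, "home": 450}

# One concept from T': a Spanish term with its English translations.
concept = ("casa", "house", "home")

print(concept_document_frequency(concept, term_df))  # 870
```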
8
Mixed 2-step RSV
Non-aligned words
Raw mixed 2-step RSV method
–For a given $\tau_{ij}$ (term j in monolingual collection i), the document frequency value will be:
a) the value computed by the 2-step method, if $\tau_{ij}$ is aligned;
b) the initial weight from the first step of the method, if the translation of $\tau_{ij}$ into the other languages is unknown.
–$RSV_i = \alpha \cdot RSV_i^{\text{align}} + (1 - \alpha) \cdot RSV_i^{\text{nonalign}}$
–α = 0.75
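As a small worked illustration of the raw mixed formula, the sketch below combines the two partial scores for one document. The function name `mixed_rsv` and the input scores are hypothetical; only the linear combination and the default α = 0.75 come from the slide.

```python
def mixed_rsv(rsv_aligned, rsv_nonaligned, alpha=0.75):
    """Linear mix of the score from aligned terms and the score from non-aligned terms."""
    return alpha * rsv_aligned + (1.0 - alpha) * rsv_nonaligned


# Hypothetical partial scores for a single document.
print(mixed_rsv(8.0, 4.0))  # 0.75 * 8.0 + 0.25 * 4.0 = 7.0
```

With α = 0.75 the aligned part of the query dominates, so a document that matches only non-aligned terms can contribute at most a quarter of its original score.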
9
Mixed 2-step RSV Normalized mixed 2-step RSV method –α = 0.75
10
Mixed 2-step RSV
Learning-based algorithms
–Logistic regression: α, β1, β2 and β3 must be estimated using the iteratively re-weighted least squares method
–LVQ neural network (Martín et al. 2003)
11
Use machine translation to align words
P_en = “Pesticides in baby food”
–Unigrams P_en = {Pesticides, baby, food}
–Bigrams P_en = {Pesticides baby, baby food}
The expression passed to the translator is:
–EXP_en = {Pesticides in baby food} {Pesticides, baby, food} {Pesticides baby, baby food}
Then we have:
–P_sp = {Pesticidas alimento niños}
–Unigrams P_sp = {Pesticidas, bebé, alimento} (the translation of Unigrams P_en)
–Bigrams P_sp = {Pesticidas bebés, alimento niños} (the translation of Bigrams P_en)
12
Use machine translation to align words
For each word_i^sp ∈ Unigrams P_sp do:
–(a) if word_i^sp ∈ P_sp, then remove word_i^sp from P_sp and add (word_i^sp, word_i^en) to the set of aligned words ALIGNED.
Thus, we obtain:
–P_sp = {niños}
–ALIGNED = {(pesticidas, pesticides), (alimento, food)}
13
Use machine translation to align words
For each bigram_i^sp ∈ Bigrams P_sp:
–(a) if (word_1^sp, word_1^en) ∈ ALIGNED (word_1^sp is aligned with word_1^en) and word_2^sp ∈ P_sp, then remove word_2^sp from P_sp and add (word_2^sp, word_2^en) to the ALIGNED set.
–(b) if (word_1^sp, word_2^en) ∈ ALIGNED and word_2^sp ∈ P_sp, then remove word_2^sp from P_sp and add (word_2^sp, word_1^en) to the ALIGNED set.
14
Use machine translation to align words
–(c) if (word_2^sp, word_1^en) ∈ ALIGNED and word_1^sp ∈ P_sp, then remove word_1^sp from P_sp and add (word_1^sp, word_2^en) to the ALIGNED set.
–(d) if (word_2^sp, word_2^en) ∈ ALIGNED and word_1^sp ∈ P_sp, then remove word_1^sp from P_sp and add (word_1^sp, word_1^en) to the ALIGNED set.
Result:
–P_sp = ∅
–ALIGNED = {(pesticidas, pesticides), (alimento, food), (niños, baby)}
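The unigram pass and the four bigram rules (a) to (d) can be sketched as one Python function and checked against the “Pesticides in baby food” example. This is an illustrative reading of the slides, not the authors' implementation: the function name `align_terms` is hypothetical, words are lowercased for comparison, translations are assumed to be positionally paired with their source unigrams and bigrams, and the sketch assumes that at most one rule fires per bigram (the slides do not say).

```python
def align_terms(phrase_sp, unigrams_en, unigrams_sp, bigrams_en, bigrams_sp):
    """Return (ALIGNED, remaining P_sp) following the unigram and bigram passes."""
    remaining = set(phrase_sp)   # P_sp: words of the translated full phrase
    aligned = set()              # ALIGNED: (spanish_word, english_word) pairs

    # Unigram pass: a translated unigram that also occurs in P_sp is aligned
    # with the English unigram it was translated from.
    for w_sp, w_en in zip(unigrams_sp, unigrams_en):
        if w_sp in remaining:
            remaining.discard(w_sp)
            aligned.add((w_sp, w_en))

    # Bigram pass: if one word of the bigram is already aligned, align the other.
    for (sp1, sp2), (en1, en2) in zip(bigrams_sp, bigrams_en):
        rules = [
            (sp1, en1, sp2, en2),  # (a)
            (sp1, en2, sp2, en1),  # (b)
            (sp2, en1, sp1, en2),  # (c)
            (sp2, en2, sp1, en1),  # (d)
        ]
        for known_sp, known_en, new_sp, new_en in rules:
            if (known_sp, known_en) in aligned and new_sp in remaining:
                remaining.discard(new_sp)
                aligned.add((new_sp, new_en))
                break  # assumption: stop at the first rule that applies
    return aligned, remaining


aligned, remaining = align_terms(
    phrase_sp=["pesticidas", "alimento", "niños"],
    unigrams_en=["pesticides", "baby", "food"],
    unigrams_sp=["pesticidas", "bebé", "alimento"],
    bigrams_en=[("pesticides", "baby"), ("baby", "food")],
    bigrams_sp=[("pesticidas", "bebés"), ("alimento", "niños")],
)
print(sorted(aligned))
print(remaining)
```

In this run the unigram pass aligns "pesticidas" and "alimento", and rule (b) on the second bigram aligns "niños" with "baby", which leaves P_sp empty as on the slide.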
15
Method conclusion
Fully aligned words
–2-step method
Partially aligned words
–Raw mixed 2-step RSV
–Normalized mixed 2-step RSV
–Logistic regression mixed 2-step RSV
–Neural network mixed 2-step RSV
Algorithm to align phrases and their translations
16
Experiment
Documents
–CLEF 2003 has two tasks, CLEF 2003-8 and CLEF 2003-4. CLEF 2003-4 is limited to four languages (English, French, German and Spanish).
Queries (Title + Description)
17
Experiment
The collections are indexed with the Zprise IR system, using the OKAPI probabilistic model (fixed at b = 0.75 and k1 = 1.2).
Translation strategies
–Machine Readable Dictionary (MDR, Babylon), picking either the first translation available (under the heading “Babylon 1”) or the first two terms (under the label “Babylon 2”)
–Machine Translation (MT, Babelfish)
–Mixed MT and MDR, taking the Babelfish and Babylon 1 translations together
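To show what the two fixed parameters control, here is a sketch of a commonly cited textbook form of the Okapi BM25 term weight. It is an assumption that this matches what the Zprise system computes; the function name `bm25_term_weight` and the statistics in the usage lines are hypothetical.

```python
import math


def bm25_term_weight(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Textbook BM25 weight of one term in one document (assumed variant).

    k1 controls how quickly repeated occurrences saturate;
    b controls how strongly long documents are penalized.
    """
    idf = math.log((n_docs - df + 0.5) / (df + 0.5))
    length_norm = k1 * ((1.0 - b) + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + length_norm)


# Hypothetical statistics: 1000 documents, average-length document.
rare = bm25_term_weight(tf=2, df=10, n_docs=1000, doc_len=100, avg_doc_len=100)
common = bm25_term_weight(tf=2, df=100, n_docs=1000, doc_len=100, avg_doc_len=100)
print(rare > common)  # a rarer term gets the higher weight
```

In the 2-step method the `df` argument is where the concept-level document frequency from the re-indexing step would be used instead of a single term's count.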
18
Experiment 1 – multilingual results with fully aligned queries
20
Experiment 1 – analysis of failures
Too many documents from the Spanish collection for this query
21
Experiment 1 – analysis of failures
22
Experiment 2 – multilingual results with partially aligned queries
Based on the MDR translation approach
23
Experiment 2 – multilingual results with partially aligned queries
Based on the MDR translation approach
24
Experiment 2 – multilingual results with partially aligned queries
Based on the MT translation approach, with the CLEF 2001–2002 test collection and the CLEF2001+CLEF2002+CLEF2003 query set (160 queries, five languages: EN, SP, DE, FR, IT)
25
Conclusion
Future effort
–Dealing with translation probabilities.
–Testing the method with other translation strategies, such as the Multilingual Similarity Thesaurus.
–n-gram indexing.
–Continuing to study strategies for queries that mix aligned and non-aligned terms: integrating both sorts of terms by means of Bayesian networks.