Why Spectral Retrieval Works
Holger Bast & Debapriyo Majumdar, Why spectral retrieval works, in Proceedings of SIGIR'05, pages 11-18, August 15-19, 2005. Presenter: Suhan Yu
Abstract
Spectral retrieval: in LSI the characteristics are latent topics; the weights on these characteristics are combined into a vector, and terms (or documents) are then compared with that vector. This kind of processing is called spectral retrieval.
Instead of a fixed low dimension, vary the dimension and study, for each term pair, the resulting curve of relatedness scores.
Speaker note: "spectral" means the weight of each frequency, which is the signal-processing interpretation; each frequency can be viewed as a feature. In IR the features of a document are its terms; in LSA the features of a document are the latent topics. Each feature has a weight, and since there are many features, stringing the weights together gives a vector. Comparison is then done with this feature vector, hence the name spectral retrieval.
What we mean by spectral retrieval
Ranked retrieval in the term space: a small example with terms internet, web, surfing, beach, documents d1 ... d5, and a query q.
[Table on the slide: the term-document matrix, the "true" similarities of the documents to the query (1.00, 1.00, 0.00, 0.50, 0.00), and the cosine similarities qᵀd_i / (|q| |d_i|) = 0.82, 0.00, 0.00, 0.38, 0.00.]
Speaker note: let me first explain the classical view on spectral retrieval, and introduce some basic notation on the way. This will be our example for the next 5 minutes, so let me try to make it perfectly clear.
What we mean by spectral retrieval
Same example as before. Spectral retrieval = linear projection to an eigensubspace: the query and the documents are mapped with a projection matrix L to Lq and Ld1 ... Ld5 in a low-dimensional (here 2-dimensional) subspace.
[Table on the slide: the projection matrix L, the projected vectors Ld1 ... Ld5 and Lq, and the cosine similarities in the subspace, (Lq)ᵀ(Ld_i) / (|Lq| |Ld_i|) = 0.98, 0.98, -0.25, 0.73, 0.01.]
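To make the projection view concrete, here is a minimal numpy sketch; the toy term-document matrix, the query, and the choice k = 2 are stand-ins, since the exact numbers on the slide are not recoverable from this transcript.

```python
# Minimal sketch of spectral retrieval as projection to an eigensubspace.
import numpy as np

terms = ["internet", "web", "surfing", "beach"]
A = np.array([
    [1, 0, 1, 0, 0],   # internet
    [0, 1, 1, 0, 0],   # web
    [1, 1, 0, 1, 0],   # surfing
    [0, 0, 0, 1, 1],   # beach
], dtype=float)                                # toy term-document matrix (terms x docs)
q = np.array([0, 0, 1, 0], dtype=float)        # toy query: "surfing"

def cosine(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

# Cosine similarities in the full term space.
print([round(cosine(q, A[:, j]), 2) for j in range(A.shape[1])])

# Projection matrix L: the k most significant left singular vectors, transposed.
k = 2
U, S, Vt = np.linalg.svd(A, full_matrices=False)
L = U[:, :k].T

# Cosine similarities in the k-dimensional eigensubspace.
print([round(cosine(L @ q, L @ A[:, j]), 2) for j in range(A.shape[1])])
```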
Viewing LSI as document expansion
The cosine similarity of a query q with a document can be rewritten using the SVD of the term-document matrix; the diagonal matrix (of singular values) that appears as a factor can be dropped.
Viewing LSI as document expansion
1. The projected query Lq has to be computed online (at query time).
2. The expanded documents LᵀLd_i can be computed offline; this amounts to document expansion. (Each row of L is an eigenvector of AAᵀ, i.e., a left singular vector of A.)
Spectral retrieval — alternative view
Ranked retrieval in the term space, same example (terms internet, web, surfing, beach; documents d1 ... d5; query q). Spectral retrieval = linear projection to an eigensubspace with projection matrix L. The cosine similarity in the subspace can be rewritten:
(Lq)ᵀ(Ld1) / (|Lq| |Ld1|) = qᵀ(LᵀLd1) / (|Lq| |LᵀLd1|).
Spectral retrieval — alternative view
Same example, now with the expansion matrix LᵀL, a symmetric term-by-term matrix (the slide shows entries such as 0.29, 0.36, 0.25, -0.12, 0.44, 0.30, -0.17, 0.84). The similarity of interest is qᵀ(LᵀLd1) / (|Lq| |LᵀLd1|).
Spectral retrieval — alternative view
Applying the expansion matrix LᵀL to the documents gives expanded documents LᵀLd1 ... LᵀLd5 (the slide shows their entries over the terms internet, web, surfing, beach). The similarities after document expansion are qᵀ(LᵀLd_i) / (|q| |LᵀLd_i|), which produce the same ranking as the cosine similarities in the subspace, (Lq)ᵀ(Ld_i) / (|Lq| |Ld_i|).
Spectral retrieval = document expansion (not query expansion)
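A small sketch (same toy data as above, so all numbers are illustrative) of the equivalence this slide states: expanding the documents offline with LᵀL and comparing them against the unprojected query gives the same ranking as projecting both query and documents into the subspace.

```python
import numpy as np

A = np.array([[1, 0, 1, 0, 0],
              [0, 1, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 0, 1, 1]], dtype=float)   # toy term-document matrix
q = np.array([0, 0, 1, 0], dtype=float)        # toy query
k = 2
U, _, _ = np.linalg.svd(A, full_matrices=False)
L = U[:, :k].T                 # projection matrix (k x m), rows are left singular vectors
E = L.T @ L                    # expansion matrix L^T L (m x m), can be computed offline
A_expanded = E @ A             # expanded documents, also computed offline

for j in range(A.shape[1]):
    d = A[:, j]
    in_subspace = (L @ q) @ (L @ d) / (np.linalg.norm(L @ q) * np.linalg.norm(L @ d))
    expanded = (q @ A_expanded[:, j]) / (np.linalg.norm(L @ q) * np.linalg.norm(A_expanded[:, j]))
    assert np.isclose(in_subspace, expanded)   # |L^T L d| = |L d| since L has orthonormal rows

# Dividing by |q| instead of |Lq| (the document-expansion formula on the slide) rescales
# every score by the same query-dependent constant, so the ranking stays the same.
```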
The curve of relatedness scores
Instead of looking at all entries of the expansion matrix for one fixed dimension, look at one fixed entry for all dimensions: for two terms i and j, follow the entry (i, j) of the expansion matrix, i.e., the relatedness of the two terms in the latent semantic space at dimension k, as k runs over all dimensions.
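A short numpy sketch of how such a curve can be computed; the random matrix and the term indices are arbitrary placeholders.

```python
# The "curve of relatedness scores": fix one entry (i, j) of the expansion matrix
# U_k U_k^T and follow it while the dimension k grows from 1 to the rank of A.
import numpy as np

def relatedness_curve(A, i, j):
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    # Entry (i, j) of U_k U_k^T is the running sum of U[i, l] * U[j, l] over l < k.
    return np.cumsum(U[i, :] * U[j, :])

A = np.random.rand(50, 200)            # toy term-document matrix
curve = relatedness_curve(A, 3, 7)     # relatedness of terms 3 and 7 as a function of k
```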
The curve of relatedness scores
Most curves quite naturally fall into one of three categories. [Figure: example curves; the panel shown here is for unrelated terms.]
Perfectly related terms
Definition: two terms are perfectly related if they have identical co-occurrence patterns, that is, the two corresponding rows of the term-document matrix are identical.
Perfectly related terms (cont.)
If A = UΣVᵀ is the singular value decomposition of A, define C = AAᵀ = UΣ²Uᵀ (using VᵀV = I, since the singular vectors are orthogonal). We can therefore work with C instead of A: the eigenvectors of C are the left singular vectors of A.
Perfectly related terms (cont.)
Write C in block form, with the two perfectly related terms as the last two rows and columns: an (m-2)×(m-2) block, (m-2)×1 blocks (and their 1×(m-2) transposes), and 1×1 entries x (diagonal) and y (off-diagonal) in the lower-right corner. (Presenter's note: the corresponding formula in the paper is wrong.)
Perfectly related terms (cont.)
Calculate the eigenvectors of C: one of the eigenvalues equals x − y.
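For reference, the elementary step behind this eigenvalue, assuming the lower-right 2×2 block of C has the symmetric form with diagonal entry x and off-diagonal entry y:

```latex
\begin{pmatrix} x & y \\ y & x \end{pmatrix}
\begin{pmatrix} 1 \\ -1 \end{pmatrix}
= \begin{pmatrix} x - y \\ -(x - y) \end{pmatrix}
= (x - y)\begin{pmatrix} 1 \\ -1 \end{pmatrix}
```

so (1, −1)ᵀ is an eigenvector of that block with eigenvalue x − y.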
Perfectly related terms (cont.)
Substitute the values of x and y.
Perfectly related terms (cont.)
Calculate the eigenvector, normalized to length 1.
Perfectly related terms (cont.)
Then this vector is an eigenvector of C; consider any other eigenvector of C (eigenvectors of the symmetric matrix C are orthogonal, so it is orthogonal to this one).
Perfectly related terms (cont.)
We get (Lemma 1): the vector (0, …, 0, 1, −1)ᵀ/√2 is a left singular vector of A, with corresponding singular value √(x − y); for all other left singular vectors of A the last two entries are equal.
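A quick numerical illustration of Lemma 1 under the definition above (two identical rows); this is only a sanity check with a random matrix, not the paper's argument.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 20))
A[-1] = A[-2]                               # make the last two terms perfectly related

U, S, Vt = np.linalg.svd(A, full_matrices=True)
diff = np.abs(U[-2, :] - U[-1, :])          # last two entries of each left singular vector
# All but one left singular vector have equal last two entries; the remaining one is
# (up to sign) the vector (0, ..., 0, 1, -1)/sqrt(2).
print(np.sum(diff > 1e-8))                  # expected output: 1
```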
Perfectly related terms (cont.)
[Figure: curves of relatedness scores for perfectly related term pairs (term 1, term 2).] The point of fall-off is different for every term pair!
Adding perturbations
The curve for perfectly related terms is robust under small perturbations of the term-document matrix (Lemma 2).
Speaker note: if perfectly related terms can tolerate some noise, then we do not have to find exactly perfectly related terms; slight differences are also tolerable.
Adding perturbations
Take the k most significant left singular vectors of A and of the perturbed matrix. By Stewart's theorem on the perturbation of symmetric matrices, the perturbed singular vectors can be expressed through the original ones via an orthogonal k×k matrix plus a correction matrix F with Frobenius norm ‖F‖_F < 1/4.
Adding perturbations (cont.)
The resulting change in the expansion-matrix entries is then bounded using Cauchy's inequality.
The up-and-then-down shape remains
Sufficiently small perturbations change the curve of relatedness scores only a little at any dimension before its point of fall-off, so the up-and-then-down shape remains.
Curves for unrelated terms
Co-occurrence graph: vertices = terms; edge = the two terms co-occur in some document.
Lemma 3: we call two terms perfectly unrelated if no path connects them in the graph.
[Figure: expansion-matrix entry vs. subspace dimension (200, 400, 600), three panels: proven shape for perfectly unrelated terms; provably small change after slight perturbation; half way to a real matrix.]
Curves for unrelated terms are random oscillations around zero.
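A self-contained sketch of the perfectly-unrelated test from Lemma 3: build the co-occurrence graph and check whether a path exists. The tiny matrix at the bottom is a made-up example.

```python
import numpy as np
from collections import deque

def perfectly_unrelated(A, i, j):
    """True if no path connects terms i and j in the co-occurrence graph of A."""
    cooc = (A @ A.T) > 0                     # cooc[s, t]: terms s and t share a document
    seen, queue = {i}, deque([i])
    while queue:                             # breadth-first search from term i
        s = queue.popleft()
        if s == j:
            return False
        for t in np.nonzero(cooc[s])[0]:
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return True

A = np.array([[1, 1, 0, 0],                  # term 0
              [0, 1, 0, 0],                  # term 1 (co-occurs with term 0)
              [0, 0, 1, 1],                  # term 2
              [0, 0, 0, 1]], dtype=float)    # term 3 (connected to term 2 only)
print(perfectly_unrelated(A, 0, 1))          # False: they share a document
print(perfectly_unrelated(A, 0, 3))          # True: the two components are disconnected
```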
Review of the three lemmas
Lemma 1 (perfectly related terms): the vector (0, …, 0, 1, −1)ᵀ/√2 is a left singular vector of A; for all other left singular vectors of A the last two entries are equal.
Lemma 2 (adding perturbations): these properties change only little under small perturbations of the term-document matrix.
Lemma 3 (definition of unrelated terms): we call two terms perfectly unrelated if no path connects them in the co-occurrence graph.
Dimensionless algorithm
Choosing a dimension: every choice is inappropriate for a significant fraction of the term pairs.
Dimensionless algorithm
Algorithm TN (dimensionless)
Normalize each row of A (each term vector) to length 1. Compute the SVD of the normalized A. For each pair of terms i, j, compute the size of the set of dimensions at which their curve of relatedness scores is negative. Perform document expansion with the zero-one matrix T, where T_ij = 1 (meaning the two terms are related) if and only if the corresponding curve of relatedness scores is never negative.
Telling the shapes apart — TN
Normalize the term-document matrix so that the theoretical point of fall-off is equal for all term pairs. For each term pair: if the curve is never negative before this point, set the entry in the expansion matrix to 1, otherwise to 0.
[Figure: three example curves of the expansion-matrix entry vs. subspace dimension (200, 400, 600); the first two curves get entry 1, the third gets entry 0.]
A simple 0-1 classification, no fractional entries!
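A minimal sketch of the TN test as described on this slide. One simplification, flagged in the comments: the curve is checked over all dimensions up to the rank, since the slide does not spell out how the theoretical point of fall-off is computed.

```python
import numpy as np

def tn_expansion_matrix(A, tol=1e-12):
    # Assumes no all-zero term rows; each term row is normalized to length 1.
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    m = A.shape[0]
    T = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            # Curve of relatedness scores for the pair (i, j), one value per dimension.
            curve = np.cumsum(U[i, :] * U[j, :])
            # Simplification: test non-negativity over all dimensions, not only up to
            # the theoretical point of fall-off.
            T[i, j] = 1.0 if np.all(curve >= -tol) else 0.0
    return T

# Document expansion then uses the 0-1 matrix: expanded documents = T @ A.
```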
Algorithm TN (dimensionless)
By Lemma 1, all perfectly related terms receive T_ij = 1. By Lemma 2, these assignments to T are invariant under small perturbations of the underlying term-document matrix. By Lemma 3, completely unrelated terms have all-zero curves.
Algorithm TS (dimensionless)
Compute the same matrix U as for TN. For each pair of terms i, j, compute the smoothness of their curve: it equals 1 if and only if the scores go only up or only down, and it decreases with the number of zig-zags. Perform document expansion with the zero-one matrix T, where T_ij = 1 if and only if the smoothness is above a threshold s.
Disadvantage: need to choose s. Advantage: choosing s is more intuitive than choosing a fixed dimension as in previous methods.
In these experiments, s was set so that 0.2% of the entries in T are 1.
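The slide does not show the paper's exact smoothness formula, so the sketch below uses a purely illustrative stand-in (the fraction of consecutive steps that keep the same direction); only the overall shape of the test, comparing a smoothness score against the threshold s, follows the slide.

```python
import numpy as np

def smoothness(curve):
    # Illustrative stand-in, not the paper's formula: fraction of consecutive
    # steps of the curve that keep going in the same direction.
    steps = np.sign(np.diff(curve))
    if len(steps) < 2:
        return 1.0
    return float(np.mean(steps[1:] == steps[:-1]))   # 1.0 = goes only up or only down

def ts_entry(curve, s):
    # T_ij = 1 iff the smoothness of the curve for terms i, j reaches the threshold s.
    return 1 if smoothness(curve) >= s else 0
```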
An alternative algorithm — TM
Again, normalize the term-document matrix so that the theoretical point of fall-off is equal for all term pairs. For each term pair compute the monotonicity of its initial curve (= 1 if the curve is perfectly monotone, approaching 0 as the number of turns increases). If the monotonicity is above some threshold, set the entry in the expansion matrix to 1, otherwise to 0.
[Figure: three example curves over subspace dimensions 200, 400, 600; the pairs with monotonicity 0.69 and 0.82 get entry 1, the pair with monotonicity 0.07 gets entry 0.]
Again: a simple 0-1 classification!
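Along the same lines, a hypothetical sketch of the monotonicity test; the concrete formula (one over one plus the number of turns) and the threshold value are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def monotonicity(curve):
    # Illustrative: 1 for a perfectly monotone curve, shrinking as turns accumulate.
    steps = np.sign(np.diff(curve))
    turns = int(np.sum(steps[1:] != steps[:-1]))   # number of direction changes
    return 1.0 / (1.0 + turns)

def tm_entry(curve, threshold=0.5):
    # The threshold here is an arbitrary illustrative value.
    return 1 if monotonicity(curve) >= threshold else 0
```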
Computational complexity
Original LSI: the cost is governed by the number of nonzero entries of the (expanded) representation. With TN/TS one can save work by discarding pairs of terms that do not co-occur in at least one document; the cost then depends on the average number of related terms per term.
Experimental evaluation
Three collections:
Time collection (3882 terms × 425 docs), 83 queries
Reuters collection (5701 terms × 21578 docs), 120 queries (topic labels)
Ohsumed collection (99117 terms × 233445 docs), 63 queries
Experimental evaluation
Other spectral retrieval schemes from the literature:
LSI
LSI-RN: term-normalized variant
CORR: correlation method
IRR: iterative residual rescaling
Baseline method: COS (cosine similarity)
COS, LSI, IRR, LSI-RN use the standard tf-idf matrix; CORR, TN, TS use the row-normalized matrix.
Result of the experiments
[Result plots at subspace dimensions 300 and 400.]
Result of the experiments
[Result plots at subspace dimensions 800/1000 and 1000/1200.]
Experimental results (average precision)

                                 COS     LSI*    LSI-RN*  CORR*   IRR*    TN      TM
TIME   (3882 terms, 425 docs)    63.2%   62.8%   58.6%    59.1%   62.2%   64.9%   64.1%

COS: baseline, cosine similarity in the term space. LSI: Latent Semantic Indexing (Dumais et al. 1990). LSI-RN: term-normalized LSI (Ding et al. 2001). CORR: correlation-based LSI (Dupret et al. 2001). IRR: Iterative Residual Rescaling (Ando & Lee 2001). TN: our non-negativity test. TM: our monotonicity test.
Note the unpredictable effects of LSI and its relatives: sometimes even below the COS baseline!
* The numbers for LSI, LSI-RN, CORR, and IRR are for the best subspace dimension.
Experimental results (average precision)

                                      COS     LSI*    LSI-RN*  CORR*   IRR*    TN      TM
TIME     (3882 terms, 425 docs)       63.2%   62.8%   58.6%    59.1%   62.2%   64.9%   64.1%
REUTERS  (5701 terms, 21578 docs)     36.2%   32.0%   37.0%    32.3%   ——      41.9%   42.9%
OHSUMED  (99117 terms, 233445 docs)   13.2%   6.9%    13.0%    10.9%   ——      14.4%   15.3%

Note the unpredictable effects of LSI and its relatives: sometimes even below the COS baseline!
* The numbers for LSI, LSI-RN, CORR, and IRR are for the best subspace dimension.
Binary vs. fractional relatedness
The finding is that algorithms which do a simple binary classification into related and unrelated term pairs outperform schemes which seem to have additional power by giving a fractional assessment for each term pair.
Binary vs. fractional relatedness
Most curves have scores at or below zero at either very few dimensions or at quite a lot of dimensions. That is, most term pairs behave either like perfectly related or like completely unrelated terms.
Conclusions and outlook
This paper introduced the curves of relatedness scores as a new way of looking at spectral retrieval.
The dimensionless algorithms outperform previous schemes and are more intuitive.
The expansion matrix used here is symmetric, but some relations between terms are asymmetric, such as "nucleic" and "acid": "nucleic" essentially implies "acid", but not the other way around.