Why Spectral Retrieval Works
Holger Bast & Debapriyo Majumdar, Why spectral retrieval works, in Proceedings of SIGIR'05, pages 11-18, August 15-19, 2005. Presenter: Suhan Yu
Abstract
Spectral retrieval: in LSI the characteristics are latent topics; the weights on these characteristics are combined into a vector, and terms (or documents) are then compared with that vector. This kind of processing is called spectral retrieval.
Instead of a fixed low dimension, vary the dimension and study, for each term pair, the resulting curve of relatedness scores.
Speaker note: "spectral" means the weight of each frequency, which is the signal-processing interpretation; each frequency can be viewed as a feature. In IR the features of a document are its terms; in LSA the features of a document are the latent topics. Each feature has a weight, and since there are many features, stringing the weights together gives a vector. Comparison is then done with this feature vector, hence the name spectral retrieval.
What we mean by spectral retrieval
Ranked retrieval in the term space: a small example with terms internet, web, surfing, beach, documents d1 ... d5, and a query q.
[Table on the slide: the term-document matrix, the "true" similarities of the documents to the query (1.00, 1.00, 0.00, 0.50, 0.00), and the cosine similarities qᵀd_i / (|q| |d_i|) = 0.82, 0.00, 0.00, 0.38, 0.00.]
Speaker note: let me first explain the classical view on spectral retrieval, and introduce some basic notation on the way. This will be our example for the next 5 minutes, so let me try to make it perfectly clear.
What we mean by spectral retrieval
Same example as before. Spectral retrieval = linear projection to an eigensubspace: the query and the documents are mapped with a projection matrix L to Lq and Ld1 ... Ld5 in a low-dimensional (here 2-dimensional) subspace.
[Table on the slide: the projection matrix L, the projected vectors Ld1 ... Ld5 and Lq, and the cosine similarities in the subspace, (Lq)ᵀ(Ld_i) / (|Lq| |Ld_i|) = 0.98, 0.98, -0.25, 0.73, 0.01.]
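To make the projection view concrete, here is a minimal numpy sketch; the toy term-document matrix, the query, and the choice k = 2 are stand-ins, since the exact numbers on the slide are not recoverable from this transcript.

```python
# Minimal sketch of spectral retrieval as projection to an eigensubspace.
import numpy as np

terms = ["internet", "web", "surfing", "beach"]
A = np.array([
    [1, 0, 1, 0, 0],   # internet
    [0, 1, 1, 0, 0],   # web
    [1, 1, 0, 1, 0],   # surfing
    [0, 0, 0, 1, 1],   # beach
], dtype=float)                                # toy term-document matrix (terms x docs)
q = np.array([0, 0, 1, 0], dtype=float)        # toy query: "surfing"

def cosine(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

# Cosine similarities in the full term space.
print([round(cosine(q, A[:, j]), 2) for j in range(A.shape[1])])

# Projection matrix L: the k most significant left singular vectors, transposed.
k = 2
U, S, Vt = np.linalg.svd(A, full_matrices=False)
L = U[:, :k].T

# Cosine similarities in the k-dimensional eigensubspace.
print([round(cosine(L @ q, L @ A[:, j]), 2) for j in range(A.shape[1])])
```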
Viewing LSI as document expansion
The cosine similarity of a query q with a document can be rewritten using the SVD of the term-document matrix; the diagonal matrix (of singular values) that appears as a factor can be dropped.
Viewing LSI as document expansion
1. The projected query Lq has to be computed online (at query time).
2. The expanded documents LᵀLd_i can be computed offline; this amounts to document expansion. (Each row of L is an eigenvector of AAᵀ, i.e., a left singular vector of A.)
Spectral retrieval — alternative view
Ranked retrieval in the term space, same example (terms internet, web, surfing, beach; documents d1 ... d5; query q). Spectral retrieval = linear projection to an eigensubspace with projection matrix L. The cosine similarity in the subspace can be rewritten:
(Lq)ᵀ(Ld1) / (|Lq| |Ld1|) = qᵀ(LᵀLd1) / (|Lq| |LᵀLd1|).
Spectral retrieval — alternative view
Same example, now with the expansion matrix LᵀL, a symmetric term-by-term matrix (the slide shows entries such as 0.29, 0.36, 0.25, -0.12, 0.44, 0.30, -0.17, 0.84). The similarity of interest is qᵀ(LᵀLd1) / (|Lq| |LᵀLd1|).
Spectral retrieval — alternative view
Applying the expansion matrix LᵀL to the documents gives expanded documents LᵀLd1 ... LᵀLd5 (the slide shows their entries over the terms internet, web, surfing, beach). The similarities after document expansion are qᵀ(LᵀLd_i) / (|q| |LᵀLd_i|), which produce the same ranking as the cosine similarities in the subspace, (Lq)ᵀ(Ld_i) / (|Lq| |Ld_i|).
Spectral retrieval = document expansion (not query expansion)
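A small sketch (same toy data as above, so all numbers are illustrative) of the equivalence this slide states: expanding the documents offline with LᵀL and comparing them against the unprojected query gives the same ranking as projecting both query and documents into the subspace.

```python
import numpy as np

A = np.array([[1, 0, 1, 0, 0],
              [0, 1, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 0, 1, 1]], dtype=float)   # toy term-document matrix
q = np.array([0, 0, 1, 0], dtype=float)        # toy query
k = 2
U, _, _ = np.linalg.svd(A, full_matrices=False)
L = U[:, :k].T                 # projection matrix (k x m), rows are left singular vectors
E = L.T @ L                    # expansion matrix L^T L (m x m), can be computed offline
A_expanded = E @ A             # expanded documents, also computed offline

for j in range(A.shape[1]):
    d = A[:, j]
    in_subspace = (L @ q) @ (L @ d) / (np.linalg.norm(L @ q) * np.linalg.norm(L @ d))
    expanded = (q @ A_expanded[:, j]) / (np.linalg.norm(L @ q) * np.linalg.norm(A_expanded[:, j]))
    assert np.isclose(in_subspace, expanded)   # |L^T L d| = |L d| since L has orthonormal rows

# Dividing by |q| instead of |Lq| (the document-expansion formula on the slide) rescales
# every score by the same query-dependent constant, so the ranking stays the same.
```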
The curve of relatedness scores
Instead of looking at all entries of the expansion matrix for one fixed dimension, look at one fixed entry for all dimensions: for two terms i and j, follow the entry (i, j) of the expansion matrix, i.e., the relatedness of the two terms in the latent semantic space at dimension k, as k runs over all dimensions.
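A short numpy sketch of how such a curve can be computed; the random matrix and the term indices are arbitrary placeholders.

```python
# The "curve of relatedness scores": fix one entry (i, j) of the expansion matrix
# U_k U_k^T and follow it while the dimension k grows from 1 to the rank of A.
import numpy as np

def relatedness_curve(A, i, j):
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    # Entry (i, j) of U_k U_k^T is the running sum of U[i, l] * U[j, l] over l < k.
    return np.cumsum(U[i, :] * U[j, :])

A = np.random.rand(50, 200)            # toy term-document matrix
curve = relatedness_curve(A, 3, 7)     # relatedness of terms 3 and 7 as a function of k
```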
The curve of relatedness scores
Most curves quite naturally fall into one of three categories. [Figure: example curves; the panel shown here is for unrelated terms.]
Perfectly related terms
Definition: two terms are perfectly related if they have identical co-occurrence patterns, that is, the two corresponding rows of the term-document matrix are identical.
Perfectly related terms (cont.)
If A = UΣVᵀ is the singular value decomposition of A, define C = AAᵀ = UΣ²Uᵀ (using VᵀV = I, since the singular vectors are orthogonal). We can therefore work with C instead of A: the eigenvectors of C are the left singular vectors of A.
Perfectly related terms (cont.)
Write C in block form, with the two perfectly related terms as the last two rows and columns: an (m-2)×(m-2) block, (m-2)×1 blocks (and their 1×(m-2) transposes), and 1×1 entries x (diagonal) and y (off-diagonal) in the lower-right corner. (Presenter's note: the corresponding formula in the paper is wrong.)
Perfectly related terms (cont.)
Calculate the eigenvectors of C: one of the eigenvalues equals x − y.
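For reference, the elementary step behind this eigenvalue, assuming the lower-right 2×2 block of C has the symmetric form with diagonal entry x and off-diagonal entry y:

```latex
\begin{pmatrix} x & y \\ y & x \end{pmatrix}
\begin{pmatrix} 1 \\ -1 \end{pmatrix}
= \begin{pmatrix} x - y \\ -(x - y) \end{pmatrix}
= (x - y)\begin{pmatrix} 1 \\ -1 \end{pmatrix}
```

so (1, −1)ᵀ is an eigenvector of that block with eigenvalue x − y.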
Perfectly related terms (cont.)
Substitute the values of x and y.
Perfectly related terms (cont.)
Calculate the eigenvector, normalized to length 1.
Perfectly related terms (cont.)
Then this vector is an eigenvector of C; consider any other eigenvector of C (eigenvectors of the symmetric matrix C are orthogonal, so it is orthogonal to this one).
Perfectly related terms (cont.)
We get (Lemma 1): the vector (0, …, 0, 1, −1)ᵀ/√2 is a left singular vector of A, with corresponding singular value √(x − y); for all other left singular vectors of A the last two entries are equal.
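A quick numerical illustration of Lemma 1 under the definition above (two identical rows); this is only a sanity check with a random matrix, not the paper's argument.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 20))
A[-1] = A[-2]                               # make the last two terms perfectly related

U, S, Vt = np.linalg.svd(A, full_matrices=True)
diff = np.abs(U[-2, :] - U[-1, :])          # last two entries of each left singular vector
# All but one left singular vector have equal last two entries; the remaining one is
# (up to sign) the vector (0, ..., 0, 1, -1)/sqrt(2).
print(np.sum(diff > 1e-8))                  # expected output: 1
```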
Perfectly related terms (cont.)
[Figure: curves of relatedness scores for perfectly related term pairs (term 1, term 2).] The point of fall-off is different for every term pair!
Adding perturbations
The curve for perfectly related terms is robust under small perturbations of the term-document matrix (Lemma 2).
Speaker note: if perfectly related terms can tolerate some noise, then we do not have to find exactly perfectly related terms; slight differences are also tolerable.
Adding perturbations
Take the k most significant left singular vectors of A and of the perturbed matrix. By Stewart's theorem on the perturbation of symmetric matrices, the perturbed singular vectors can be expressed through the original ones via an orthogonal k×k matrix plus a correction matrix F with Frobenius norm ‖F‖_F < 1/4.
Adding perturbations (cont.)
The resulting change in the expansion-matrix entries is then bounded using Cauchy's inequality.
The up-and-then-down shape remains
Sufficiently small perturbations change the curve of relatedness scores only a little at any dimension before its point of fall-off, so the up-and-then-down shape remains.
Curves for unrelated terms
Co-occurrence graph: vertices = terms; edge = the two terms co-occur in some document.
Lemma 3: we call two terms perfectly unrelated if no path connects them in the graph.
[Figure: expansion-matrix entry vs. subspace dimension (200, 400, 600), three panels: proven shape for perfectly unrelated terms; provably small change after slight perturbation; half way to a real matrix.]
Curves for unrelated terms are random oscillations around zero.
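A self-contained sketch of the perfectly-unrelated test from Lemma 3: build the co-occurrence graph and check whether a path exists. The tiny matrix at the bottom is a made-up example.

```python
import numpy as np
from collections import deque

def perfectly_unrelated(A, i, j):
    """True if no path connects terms i and j in the co-occurrence graph of A."""
    cooc = (A @ A.T) > 0                     # cooc[s, t]: terms s and t share a document
    seen, queue = {i}, deque([i])
    while queue:                             # breadth-first search from term i
        s = queue.popleft()
        if s == j:
            return False
        for t in np.nonzero(cooc[s])[0]:
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return True

A = np.array([[1, 1, 0, 0],                  # term 0
              [0, 1, 0, 0],                  # term 1 (co-occurs with term 0)
              [0, 0, 1, 1],                  # term 2
              [0, 0, 0, 1]], dtype=float)    # term 3 (connected to term 2 only)
print(perfectly_unrelated(A, 0, 1))          # False: they share a document
print(perfectly_unrelated(A, 0, 3))          # True: the two components are disconnected
```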
Review of the three lemmas
Lemma 1 (perfectly related terms): the vector (0, …, 0, 1, −1)ᵀ/√2 is a left singular vector of A; for all other left singular vectors of A the last two entries are equal.
Lemma 2 (adding perturbations): these properties change only little under small perturbations of the term-document matrix.
Lemma 3 (definition of unrelated terms): we call two terms perfectly unrelated if no path connects them in the co-occurrence graph.
Dimensionless algorithm
Choosing a dimension: every choice is inappropriate for a significant fraction of the term pairs.
Dimensionless algorithm
Algorithm TN (dimensionless)
Normalize each row of A (each term vector) to length 1. Compute the SVD of the normalized A. For each pair of terms i, j, compute the size of the set of dimensions at which their curve of relatedness scores is negative. Perform document expansion with the zero-one matrix T, where T_ij = 1 (meaning the two terms are related) if and only if the corresponding curve of relatedness scores is never negative.
Telling the shapes apart — TN
Normalize the term-document matrix so that the theoretical point of fall-off is equal for all term pairs. For each term pair: if the curve is never negative before this point, set the entry in the expansion matrix to 1, otherwise to 0.
[Figure: three example curves of the expansion-matrix entry vs. subspace dimension (200, 400, 600); the first two curves get entry 1, the third gets entry 0.]
A simple 0-1 classification, no fractional entries!
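A minimal sketch of the TN test as described on this slide. One simplification, flagged in the comments: the curve is checked over all dimensions up to the rank, since the slide does not spell out how the theoretical point of fall-off is computed.

```python
import numpy as np

def tn_expansion_matrix(A, tol=1e-12):
    # Assumes no all-zero term rows; each term row is normalized to length 1.
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    m = A.shape[0]
    T = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            # Curve of relatedness scores for the pair (i, j), one value per dimension.
            curve = np.cumsum(U[i, :] * U[j, :])
            # Simplification: test non-negativity over all dimensions, not only up to
            # the theoretical point of fall-off.
            T[i, j] = 1.0 if np.all(curve >= -tol) else 0.0
    return T

# Document expansion then uses the 0-1 matrix: expanded documents = T @ A.
```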
Algorithm TN (dimensionless)
By Lemma 1, all perfectly related terms receive T_ij = 1. By Lemma 2, these assignments to T are invariant under small perturbations of the underlying term-document matrix. By Lemma 3, completely unrelated terms have all-zero curves.
Algorithm TS (dimensionless)
Compute the same matrix U as for TN. For each pair of terms i, j, compute the smoothness of their curve: it equals 1 if and only if the scores go only up or only down, and it decreases with the number of zig-zags. Perform document expansion with the zero-one matrix T, where T_ij = 1 if and only if the smoothness is above a threshold s.
Disadvantage: need to choose s. Advantage: choosing s is more intuitive than choosing a fixed dimension as in previous methods.
In these experiments, s was set so that 0.2% of the entries in T are 1.
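The slide does not show the paper's exact smoothness formula, so the sketch below uses a purely illustrative stand-in (the fraction of consecutive steps that keep the same direction); only the overall shape of the test, comparing a smoothness score against the threshold s, follows the slide.

```python
import numpy as np

def smoothness(curve):
    # Illustrative stand-in, not the paper's formula: fraction of consecutive
    # steps of the curve that keep going in the same direction.
    steps = np.sign(np.diff(curve))
    if len(steps) < 2:
        return 1.0
    return float(np.mean(steps[1:] == steps[:-1]))   # 1.0 = goes only up or only down

def ts_entry(curve, s):
    # T_ij = 1 iff the smoothness of the curve for terms i, j reaches the threshold s.
    return 1 if smoothness(curve) >= s else 0
```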
An alternative algorithm — TM
Again, normalize the term-document matrix so that the theoretical point of fall-off is equal for all term pairs. For each term pair compute the monotonicity of its initial curve (= 1 if the curve is perfectly monotone, approaching 0 as the number of turns increases). If the monotonicity is above some threshold, set the entry in the expansion matrix to 1, otherwise to 0.
[Figure: three example curves over subspace dimensions 200, 400, 600; the pairs with monotonicity 0.69 and 0.82 get entry 1, the pair with monotonicity 0.07 gets entry 0.]
Again: a simple 0-1 classification!
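Along the same lines, a hypothetical sketch of the monotonicity test; the concrete formula (one over one plus the number of turns) and the threshold value are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def monotonicity(curve):
    # Illustrative: 1 for a perfectly monotone curve, shrinking as turns accumulate.
    steps = np.sign(np.diff(curve))
    turns = int(np.sum(steps[1:] != steps[:-1]))   # number of direction changes
    return 1.0 / (1.0 + turns)

def tm_entry(curve, threshold=0.5):
    # The threshold here is an arbitrary illustrative value.
    return 1 if monotonicity(curve) >= threshold else 0
```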
Computational complexity
Original LSI: the cost is governed by the number of nonzero entries of the (expanded) representation. With TN/TS one can save work by discarding pairs of terms that do not co-occur in at least one document; the cost then depends on the average number of related terms per term.
Experimental evaluation
Three collections:
Time collection (3882 terms × 425 docs), 83 queries
Reuters collection (5701 terms × 21578 docs), 120 queries (topic labels)
Ohsumed collection (99117 terms × 233445 docs), 63 queries
Experimental evaluation
Other spectral retrieval schemes from the literature:
LSI
LSI-RN: term-normalized variant
CORR: correlation method
IRR: iterative residual rescaling
Baseline method: COS (cosine similarity)
COS, LSI, IRR, LSI-RN use the standard tf-idf matrix; CORR, TN, TS use the row-normalized matrix.
Result of the experiments
[Result plots at subspace dimensions 300 and 400.]
Result of the experiments
[Result plots at subspace dimensions 800/1000 and 1000/1200.]
Experimental results (average precision)

                                 COS     LSI*    LSI-RN*  CORR*   IRR*    TN      TM
TIME   (3882 terms, 425 docs)    63.2%   62.8%   58.6%    59.1%   62.2%   64.9%   64.1%

COS: baseline, cosine similarity in the term space. LSI: Latent Semantic Indexing (Dumais et al. 1990). LSI-RN: term-normalized LSI (Ding et al. 2001). CORR: correlation-based LSI (Dupret et al. 2001). IRR: Iterative Residual Rescaling (Ando & Lee 2001). TN: our non-negativity test. TM: our monotonicity test.
Note the unpredictable effects of LSI and its relatives: sometimes even below the COS baseline!
* The numbers for LSI, LSI-RN, CORR, and IRR are for the best subspace dimension.
Experimental results (average precision)

                                      COS     LSI*    LSI-RN*  CORR*   IRR*    TN      TM
TIME     (3882 terms, 425 docs)       63.2%   62.8%   58.6%    59.1%   62.2%   64.9%   64.1%
REUTERS  (5701 terms, 21578 docs)     36.2%   32.0%   37.0%    32.3%   ——      41.9%   42.9%
OHSUMED  (99117 terms, 233445 docs)   13.2%   6.9%    13.0%    10.9%   ——      14.4%   15.3%

Note the unpredictable effects of LSI and its relatives: sometimes even below the COS baseline!
* The numbers for LSI, LSI-RN, CORR, and IRR are for the best subspace dimension.
Binary vs. fractional relatedness
The finding is that algorithms which do a simple binary classification into related and unrelated term pairs outperform schemes which seem to have additional power by giving a fractional assessment for each term pair.
Binary vs. fractional relatedness
Most curves have scores at or below zero at either very few dimensions or at quite a lot of dimensions. That is, most term pairs behave either like perfectly related or like completely unrelated terms.
Conclusions and outlook
This paper introduced the curves of relatedness scores as a new way of looking at spectral retrieval.
The dimensionless algorithms outperform previous schemes and are more intuitive.
The expansion matrix used here is symmetric, but some relations between terms are asymmetric, such as "nucleic" and "acid": "nucleic" essentially implies "acid", but not the other way around.