Spectral Approaches to Nearest Neighbor Search [FOCS 2014]
Robert Krauthgamer (Weizmann Institute)
Joint with: Amirali Abdullah, Alexandr Andoni, Ravi Kannan
Simons Institute, Nov. 2016
Nearest Neighbor Search (NNS)
Preprocess: a set P of n points in ℝ^d
Query: given a query point q, report a point p* ∈ P with the smallest distance to q
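As a concrete reference point, here is a minimal brute-force sketch of the problem statement in Python (numpy assumed); it is only an illustration of the definition, not of the data structures discussed later.

```python
import numpy as np

def nearest_neighbor(P, q):
    """Brute-force linear scan: return the point of P closest to q in Euclidean distance."""
    P = np.asarray(P, dtype=float)          # n x d array of data points
    q = np.asarray(q, dtype=float)          # query point in R^d
    dists = np.linalg.norm(P - q, axis=1)   # distance from q to every point of P
    return P[np.argmin(dists)]
```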
Motivation
Generic setup:
points model objects (e.g. images)
distance models (dis)similarity measure
Application areas:
machine learning: k-NN rule
signal processing, vector quantization, bioinformatics, etc.
Distance can be:
Hamming, Euclidean, edit distance, earth-mover distance, …
Curse of Dimensionality
All exact algorithms degrade rapidly with the dimension d:
Full indexing: query time O(log n · d), space n^O(d) (Voronoi diagram size)
No indexing (linear scan): query time O(n · d), space O(n · d)
Approximate NNS
Given a query point q, report p′ ∈ P s.t. ‖p′ − q‖ ≤ c · min_{p* ∈ P} ‖p* − q‖
c ≥ 1: approximation factor
randomized: return such a p′ with probability ≥ 90%
Heuristic perspective: gives a set of candidates (hopefully small)
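A hedged sketch of what the guarantee means: given a candidate answer p′, check whether it is within factor c of the true nearest distance (the function name and default c are illustrative, not from the talk).

```python
import numpy as np

def is_c_approximate(P, q, p_prime, c=1.1):
    """Check whether ||p' - q|| <= c * min_{p* in P} ||p* - q||."""
    P = np.asarray(P, dtype=float)
    q = np.asarray(q, dtype=float)
    opt = np.linalg.norm(P - q, axis=1).min()          # exact nearest-neighbor distance
    return np.linalg.norm(np.asarray(p_prime) - q) <= c * opt
```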
NNS algorithms
It's all about space partitions!
Low-dimensional: [Arya-Mount'93], [Clarkson'94], [Arya-Mount-Netanyahu-Silverman-Wu'98], [Kleinberg'97], [Har-Peled'02], [Arya-Fonseca-Mount'11], …
High-dimensional: [Indyk-Motwani'98], [Kushilevitz-Ostrovsky-Rabani'98], [Indyk'98, '01], [Gionis-Indyk-Motwani'99], [Charikar'02], [Datar-Immorlica-Indyk-Mirrokni'04], [Chakrabarti-Regev'04], [Panigrahy'06], [Ailon-Chazelle'06], [Andoni-Indyk'06], [Andoni-Indyk-Nguyen-Razenshteyn'14], [Andoni-Razenshteyn'15]
Low-dimensional
kd-trees, …
c = 1 + ε
runtime: ε^−O(d) · log n
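For the low-dimensional regime, an off-the-shelf kd-tree already behaves this way in practice; a minimal usage example with SciPy's cKDTree (a generic kd-tree, not the specific algorithms cited above):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
P = rng.standard_normal((10_000, 3))                  # 10k points in low dimension d = 3
tree = cKDTree(P)                                     # preprocess: build the kd-tree
dist, idx = tree.query(rng.standard_normal(3), k=1)   # nearest-neighbor query
print(dist, P[idx])
```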
High-dimensional
Locality-Sensitive Hashing
Crucial use of random projections
Johnson-Lindenstrauss Lemma: project to a random subspace of dimension O(ε^−2 · log n) for 1 + ε approximation
Runtime: n^1/c for c-approximation
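A minimal sketch of the random-projection step behind JL: multiply by a Gaussian matrix into m ≈ ε^−2 log n dimensions; pairwise distances are then preserved up to 1 ± ε with high probability (the constant 8 below is an illustrative choice, not the talk's).

```python
import numpy as np

def jl_project(P, eps, rng=np.random.default_rng(0)):
    """Random Gaussian projection to m = O(eps^-2 * log n) dimensions."""
    n, d = P.shape
    m = int(np.ceil(8 * np.log(n) / eps**2))        # illustrative target dimension
    G = rng.standard_normal((d, m)) / np.sqrt(m)    # random projection matrix
    return P @ G                                    # n x m projected points
```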
Practice
Data-aware partitions: optimize the partition to your dataset
PCA-tree [Sproull'91, McNames'01, Verma-Kpotufe-Dasgupta'09]
randomized kd-trees [Silpa-Anan-Hartley'08, Muja-Lowe'09]
spectral/PCA/semantic/WTA hashing [Weiss-Torralba-Fergus'08, Wang-Kumar-Chang'09, Salakhutdinov-Hinton'09, Yagnik-Strelow-Ross-Lin'11]
Practice vs Theory
Data-aware projections often outperform (vanilla) random-projection methods
But no guarantees (correctness or performance)
JL is generally optimal [Alon'03, Jayram-Woodruff'11]
Even for some NNS setups! [Andoni-Indyk-Patrascu'06]
Why do data-aware projections outperform random projections?
Algorithmic framework to study this phenomenon?
Plan for the rest
Model
Two spectral algorithms
Conclusion
Our model
"Low-dimensional signal + large noise" inside a high-dimensional space
Signal: P ⊂ U for a subspace U ⊂ ℝ^d of dimension k ≪ d
Data: each point in P is perturbed by full-dimensional Gaussian noise N_d(0, σ² I_d)
Model properties
Data: P̃ = P + G, where each point in P has at least unit norm
Query: q̃ = q + g_q s.t.:
‖q − p*‖ ≤ 1 for the "nearest neighbor" p*
‖q − p‖ ≥ 1 + ε for everybody else
Noise entries N(0, σ²) with σ ≈ 1/d^1/4, up to a factor poly(ε^−1 k log n)
Claim: the exact nearest neighbor is still the same
Noise is large:
has magnitude σ√d ≈ d^1/4 ≫ 1
the top k dimensions of P capture sub-constant mass
JL would not work: after the noise, the gap is very close to 1
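A sketch of a generator for this model under the stated parameters (k-dimensional signal subspace, per-coordinate noise σ ≈ d^−1/4); all names and the default sizes are illustrative.

```python
import numpy as np

def make_instance(n=1000, d=400, k=10, rng=np.random.default_rng(0)):
    """Signal in a random k-dim subspace U of R^d, plus full-dimensional Gaussian noise."""
    sigma = d ** -0.25                                        # noise level sigma ~ 1/d^(1/4)
    U, _ = np.linalg.qr(rng.standard_normal((d, k)))          # orthonormal basis of the subspace U
    signal = rng.standard_normal((n, k))
    signal /= np.linalg.norm(signal, axis=1, keepdims=True)   # unit-norm signal points
    P = signal @ U.T                                          # n x d signal matrix, rows lie in U
    G = sigma * rng.standard_normal((n, d))                   # noise N(0, sigma^2 I_d) per point
    return P + G, U                                           # noisy data and the true subspace
```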
Algorithms via PCA
Find the "signal subspace" U?
Then we can project everything to U and solve NNS there
Use Principal Component Analysis (PCA)?
≈ extract top direction(s) from the SVD
e.g., the k-dimensional subspace S that minimizes ∑_{p∈P} d²(p, S)
If PCA removes the noise "perfectly", we are done: S = U
Can reduce to k-dimensional NNS
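A minimal PCA-via-SVD sketch for this step: the k-dimensional subspace (through the origin) minimizing ∑_p d²(p, S) is spanned by the top-k right singular vectors of the data matrix. Function names are illustrative.

```python
import numpy as np

def top_k_subspace(P_noisy, k):
    """Top-k right singular vectors: the k-dim subspace S minimizing sum_p d^2(p, S)."""
    _, _, Vt = np.linalg.svd(P_noisy, full_matrices=False)
    return Vt[:k].T                      # d x k orthonormal basis of S

def project_onto(P_noisy, S):
    """Coordinates of the points inside span(S) (k-dimensional representation)."""
    return P_noisy @ S
```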
Best we can hope for
The dataset contains a "worst-case" k-dimensional instance
Effectively reduce dimension d to k
NNS performance as if we are in k dimensions, for the full model?
Spoiler: Yes
PCA under noise fails
Does PCA find the "signal subspace" U under noise?
PCA minimizes ∑_{p∈P} d²(p, S):
good only on "average", not "worst-case"
weak signal directions are overpowered by noise directions
a typical noise direction contributes ∑_{i=1}^n g_i² = Θ(n σ²)
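A small illustrative experiment (an assumed setup, not from the talk) of a weak signal direction being outranked: each noise direction carries mass about nσ², so a signal direction holding much less than that falls below the noise in the spectrum.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 400
sigma = d ** -0.25
e1 = np.eye(d)[0]
P = np.zeros((n, d))
P[:20] = e1                                   # weak signal: only 20 points lie along e1
G = sigma * rng.standard_normal((n, d))
_, s, Vt = np.linalg.svd(P + G, full_matrices=False)
print("signal mass along e1:", 20.0)
print("typical noise mass per direction ~", n * sigma**2)   # here ~100, much more than 20
print("|<top PCA direction, e1>| =", abs(Vt[0] @ e1))        # small: noise wins
```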
1st Algorithm: intuition
Extract "well-captured points":
points with signal mostly inside the top PCA space
should work for a large fraction of the points
Iterate on the rest
Iterative PCA
Algorithm:
Find the top PCA subspace S
C = points well-captured by S
Build an NNS data structure on {C projected onto S}
Iterate on the remaining points, P ∖ C
Query: query each NNS data structure separately
To make this work:
Nearly no noise in S: ensure S is close to U
S is determined by heavy-enough spectral directions (its dimension may be less than k)
Capture only points whose signal is fully inside S
well-captured: distance to S explained by noise only
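A hedged sketch of the preprocessing loop just described; the "heavy direction" cutoff and the capture radius below are illustrative stand-ins for the talk's precise settings.

```python
import numpy as np

def iterative_pca_index(P_noisy, k, capture_radius, max_iters=50):
    """Peel off well-captured points round by round; store a low-dim structure per round."""
    remaining = np.arange(len(P_noisy))
    structures = []                                # (basis S, projected points, indices) per round
    for _ in range(max_iters):
        if len(remaining) == 0:
            break
        X = P_noisy[remaining]
        _, s, Vt = np.linalg.svd(X, full_matrices=False)
        heavy = s**2 >= s[0]**2 / (2 * k)          # illustrative "heavy direction" cutoff
        S = Vt[heavy][:k].T                        # top subspace S (dimension <= k)
        resid = np.linalg.norm(X - (X @ S) @ S.T, axis=1)
        captured = resid <= capture_radius         # well-captured: residual explained by noise
        if not np.any(captured):
            captured[:] = True                     # fallback so the sketch always terminates
        structures.append((S, X[captured] @ S, remaining[captured]))
        remaining = remaining[~captured]
    return structures

def iterative_pca_query(structures, q):
    """Query every per-round low-dimensional structure and keep the best candidate."""
    best_idx, best_dist = None, np.inf
    for S, pts, idx in structures:
        dists = np.linalg.norm(pts - q @ S, axis=1)
        j = int(np.argmin(dists))
        if dists[j] < best_dist:
            best_idx, best_dist = idx[j], dists[j]
    return best_idx
```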
Simpler model
Find the top-k PCA subspace S
C = points well-captured by S
Build NNS on {C projected onto S}
Iterate on the remaining points, P ∖ C
Query: query each NNS separately
Assume small noise: p̃_i = p_i + α_i, where ‖α_i‖ ≪ ε (can even be adversarial)
Algorithm: well-captured if d(p̃, S) ≤ 2α
Claim 1: if p* is captured by C, we will find it in that NNS
for any captured p̃: ‖p̃_S − q_S‖ = ‖p̃ − q‖ ± 4α = ‖p − q‖ ± 5α
Claim 2: the number of iterations is O(log n)
∑_{p̃} d²(p̃, S) ≤ ∑_{p̃} d²(p̃, U) ≤ n · α²
so for at most a 1/4-fraction of the points, d²(p̃, S) ≥ 4α²
hence a constant fraction is captured in each iteration
Analysis of the general model
Noise is larger; must use the fact that it is random
"Signal" should be stronger than "noise" (on average)
Use random matrix theory:
P̃ = P + G
G is a random n × d matrix with entries N(0, σ²)
all its squared singular values are ≤ σ² n ≈ n/√d
P has rank ≤ k and (Frobenius norm)² ≥ n
important directions have squared singular value ≥ Ω(n/k)
can ignore directions with squared singular value ≪ εn/k
Important signal directions are stronger than the noise!
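The bound on the noise spectrum can be sanity-checked numerically: for an n × d matrix with N(0, σ²) entries, the top singular value concentrates near σ(√n + √d), so its square is ≈ σ²n when n ≫ d (a standard random-matrix fact, not specific to the talk).

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 5000, 200
sigma = d ** -0.25
G = sigma * rng.standard_normal((n, d))
top = np.linalg.svd(G, compute_uv=False)[0]      # largest singular value of the noise
print(top, sigma * (np.sqrt(n) + np.sqrt(d)))    # empirically these two are close
```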
Closeness of subspaces?
Trickier than singular values
The top singular vector is not stable under perturbation!
Only stable if the second singular value is much smaller
How to even define "closeness" of subspaces?
To the rescue: Wedin's sin-theta theorem
sin θ(S, U) = max_{x∈S, ‖x‖=1} min_{y∈U} ‖x − y‖
Wedin's sin-theta theorem
Developed by [Davis-Kahan'70], [Wedin'72]
Theorem: consider P̃ = P + G, where
S is the top-k subspace of P̃
U is the k-dimensional space containing P
Then: sin θ(S, U) ≤ ‖G‖ / σ_k(P)
Another way to see why we need to take directions with sufficiently heavy singular values
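For a numerical sanity check, the sin-theta distance can be computed from principal angles: with orthonormal bases, cos of the largest angle is the smallest singular value of SᵀU, so sin θ(S, U) = √(1 − σ_min²). The helper below is illustrative.

```python
import numpy as np

def sin_theta(S, U):
    """sin of the largest principal angle between span(S) and span(U)."""
    QS, _ = np.linalg.qr(S)                                    # orthonormalize both bases
    QU, _ = np.linalg.qr(U)
    smin = np.linalg.svd(QS.T @ QU, compute_uv=False).min()    # cos(theta_max)
    return float(np.sqrt(max(0.0, 1.0 - smin**2)))
```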
Additional issue: Conditioning
After an iteration, the noise is not random anymore!
non-captured points might be "biased" by the capturing criterion
Fix: estimate the top PCA subspace from a small sample of the data
This might be needed purely for the analysis, but it does not sound like a bad idea in practice either
Performance of Iterative PCA
Can prove there are O(√d · log n) iterations
In each, we do NNS in a space of dimension ≤ k
Overall query time: O((1/ε) · k · d · √d · log^3/2 n)
Reduced to O(√d · log n) instances of k-dimensional NNS!
2nd Algorithm: PCA-tree
Closer to algorithms used in practice
Find the top PCA direction v
Partition into slabs ⊥ v, of width ≈ ε/√k
Snap points to the ⊥ hyperplane of their slab
Recurse on each slab
Query: follow all tree paths that may contain p*
Two algorithmic modifications
Centering:
need to use centered PCA (subtract the average)
otherwise errors from the perturbations accumulate
Sparsification:
need to sparsify the set of points in each node of the tree
otherwise we can get a "dense" cluster: not enough variance in the signal, lots of noise
With these changes the algorithm is: find the top centered PCA direction v, partition into slabs ⊥ v, snap points to the ⊥ hyperplanes, recurse on each slab; a query follows all tree paths that may contain p*
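A hedged sketch of the tree construction with the centering modification (the sparsification step is only marked by a comment); slab width, leaf size, and stopping rule are illustrative stand-ins for the talk's parameters, and the query routine is omitted.

```python
import numpy as np

def build_pca_tree(points, slab_width, leaf_size=32, depth=0, max_depth=40):
    """Recursive PCA-tree: split along the top *centered* PCA direction into slabs."""
    if len(points) <= leaf_size or depth >= max_depth:
        return {"leaf": True, "points": points}
    mean = points.mean(axis=0)                        # centering (modification 1)
    X = points - mean
    # (modification 2, sparsification, omitted: a dense cluster would be thinned here)
    v = np.linalg.svd(X, full_matrices=False)[2][0]   # top centered PCA direction
    t = X @ v                                         # coordinate of each point along v
    slabs = np.floor(t / slab_width).astype(int)      # partition into slabs perpendicular to v
    children = {}
    for b in np.unique(slabs):
        mask = slabs == b
        snapped = points[mask] - np.outer(t[mask], v) # snap points onto the slab's hyperplane
        children[int(b)] = build_pca_tree(snapped, slab_width, leaf_size, depth + 1, max_depth)
    return {"leaf": False, "mean": mean, "v": v, "children": children}
```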
Analysis
An "extreme" version of the Iterative PCA algorithm: just use the top PCA direction, which is guaranteed to contain signal!
Main lemma: the tree depth is ≤ 2k
each discovered direction is close to U
snapping is like orthogonalizing with respect to each one
so there cannot be too many such directions
Query runtime: (k/ε)^2k
Overall performs like an O(k · log k)-dimensional NNS!
Wrap-up
Why do data-aware projections outperform random projections? Algorithmic framework to study this phenomenon?
Recent development: data-aware worst-case algorithm [Andoni-Razenshteyn'15]
Here:
Model: "low-dimensional signal + large noise"
NNS as if in a low-dimensional space, via the "right" adaptation of PCA
Immediate questions:
Other, less-structured signal/noise models?
Algorithms with runtime dependent on the spectrum?
Broader question: analysis that explains the empirical success?