1
Cache-Conscious Performance Optimization for Similarity Search
Maha Alabduljalil, Xun Tang, Tao Yang
Department of Computer Science, University of California at Santa Barbara
36th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2013)
2
All Pairs Similarity Search (APSS)
Definition: find all pairs of objects whose similarity is above a given threshold: Sim(d_i, d_j) = cos(d_i, d_j) ≥ τ.
Application examples: collaborative filtering, spam and near-duplicate detection, image search, query suggestions.
Motivation: APSS is still time-consuming for large datasets.
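For reference, here is a minimal Python sketch of the thresholded cosine test above; the function name, the example vectors, and the threshold value are illustrative choices, not from the paper.

```python
import math

def cosine_sim(d_i, d_j):
    """Cosine similarity of two sparse vectors given as {term: weight} dicts."""
    dot = sum(w * d_j[t] for t, w in d_i.items() if t in d_j)
    norm_i = math.sqrt(sum(w * w for w in d_i.values()))
    norm_j = math.sqrt(sum(w * w for w in d_j.values()))
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0

# APSS keeps the pair (d_i, d_j) only if Sim(d_i, d_j) >= tau.
tau = 0.8
d1 = {"cache": 0.6, "similarity": 0.8}
d2 = {"cache": 0.5, "search": 0.4, "similarity": 0.7}
print(cosine_sim(d1, d2) >= tau)
```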
3
Previous Work
Approaches to speed up APSS:
Exact APSS:
– Dynamic computation filtering [Bayardo et al. WWW'07]
– Inverted indexing [Arasu et al. VLDB'06]
– Parallelization with MapReduce [Lin SIGIR'09]
– Partition-based similarity comparison [Alabduljalil et al. WSDM'13]
Approximate APSS via LSH: trades off precision against recall and adds redundant computation.
Approaches that exploit the memory hierarchy:
– General query processing [Manegold et al. VLDB'02]
– Other computing problems
4
Baseline: Partition-based Similarity Search (PSS)
Partitioning with dissimilarity detection.
Similarity comparison with parallel tasks. [WSDM'13]
5
PSS Task
Memory areas: S = vectors owned by the task, B = other (candidate) vectors, C = temporary accumulators.
Task steps:
Read the assigned partition into area S.
Repeat:
  Read some vectors v_i from other partitions into B.
  Compare v_i with S.
  Output similar vector pairs.
Until all potentially similar vectors have been compared.
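A minimal Python sketch of the task loop above, assuming the helpers read_partition, read_candidate_block, and similarity are supplied by the surrounding system; these names are hypothetical placeholders, since the slide only gives the high-level steps.

```python
def pss_task(own_partition_id, candidate_partition_ids, tau,
             read_partition, read_candidate_block, similarity):
    """One PSS task (sketch). Memory areas from the slide:
    S - the vectors this task owns, held in memory for the task's lifetime;
    B - a buffer of other vectors streamed in from candidate partitions;
    C - temporary storage for partial scores (here just local variables)."""
    S = read_partition(own_partition_id)                 # area S
    results = []
    for pid in candidate_partition_ids:                  # partitions not pruned as dissimilar
        for B in read_candidate_block(pid):              # area B: one block of other vectors
            for j_id, vec_j in B:
                for i_id, vec_i in S:
                    score = similarity(vec_i, vec_j)     # area C: temporary scores
                    if score >= tau:
                        results.append((i_id, j_id, score))
    return results
```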
6
Focus and Contribution
Contribution: analyze the memory hierarchy behavior of PSS tasks and develop new data layout/traversal techniques for speedup:
① Splitting data blocks to fit in cache.
② Coalescing: reading a block of vectors from other partitions and processing them together.
Algorithms:
Baseline: PSS [WSDM'13]
Cache-conscious designs: PSS1 & PSS2
7
PROBLEM 1: PSS area S is too big to fit in cache
[Figure: area B (other vectors), the inverted index of vectors in area S, and the accumulator for S in area C; S is too long to fit in cache.]
8
PSS1: Cache-conscious data splitting
[Figure: after splitting, S is divided into splits S1, S2, ..., Sq; each split Si is compared against area B with its own accumulator in area C. Open question: what split size?]
9
PSS1 Task:
Read S and divide it into many splits.
Read other vectors into B.
For each split S_x: Compare(S_x, B).
Output similarity scores.

Compare(S_x, B):
for d_i in S_x
  for d_j in B
    sim(d_i, d_j) += w_{i,t} * w_{j,t}   (for each shared feature t)
    if sim(d_i, d_j) + maxw_{d_i} * sum_{d_j} < τ then skip this pair (dynamic filtering)
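The sketch below is one possible Python rendering of the loop above: S is cut into splits sized to fit the cache, each split is indexed and compared against B, and a pair is dropped once its partial score plus the bound maxw(d_i) * sum(d_j) can no longer reach τ. The data layout (doc_id, {term: weight}), the precomputed maxw and vec_sum tables, and the split_size parameter are assumptions for illustration, not the paper's exact implementation.

```python
from collections import defaultdict

def pss1_task(S, B, tau, split_size, maxw, vec_sum):
    """S, B: lists of (doc_id, {term: weight}) sparse vectors.
    maxw[d]: largest weight in vector d; vec_sum[d]: sum of weights in d.
    Yields (id_i, id_j, score) for pairs whose score reaches tau
    (vectors are assumed to be pre-normalized to unit length)."""
    for start in range(0, len(S), split_size):        # divide S into splits S_x
        S_x = S[start:start + split_size]
        index = defaultdict(list)                      # inverted index of split S_x
        for i_id, vec in S_x:
            for t, w in vec.items():
                index[t].append((i_id, w))
        for j_id, vec_j in B:                          # area B: other vectors
            acc = defaultdict(float)                   # area C: accumulators for S_x
            pruned = set()
            for t, w_j in vec_j.items():
                for i_id, w_i in index.get(t, ()):
                    if i_id in pruned:
                        continue
                    acc[i_id] += w_i * w_j
                    # dynamic filtering: drop the pair once even an upper bound
                    # on the remaining contribution cannot lift it to tau
                    if acc[i_id] + maxw[i_id] * vec_sum[j_id] < tau:
                        pruned.add(i_id)
            for i_id, score in acc.items():
                if i_id not in pruned and score >= tau:
                    yield (i_id, j_id, score)
```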
10
Modeling Memory/Cache Access of PSS1
Areas accessed: S_i, B, and C, during sim(d_i, d_j) += w_{i,t} * w_{j,t} and the filtering test sim(d_i, d_j) + maxw_{d_i} * sum_{d_j} < τ.
Total number of data accesses: D0 = D0(S_i) + D0(B) + D0(C)
11
Cache misses and data access time
Memory and cache access counts:
D0: total memory data accesses.
D1: accesses that miss in L1.
D2: accesses that miss in L2.
D3: accesses that miss in L3.
Memory and cache access time:
Total data access time = (D0 - D1)·δ1 + (D1 - D2)·δ2 + (D2 - D3)·δ3 + D3·δmem
where δi is the access time at cache level i and δmem is the access time of main memory.
12
Total data access time, data found in L1: the (D0 - D1)·δ1 term, at δ1 ≈ 2 cycles per access.
13
Total data access time, data found in L2: the (D1 - D2)·δ2 term, at δ2 ≈ 6-10 cycles per access.
14
Total data access time, data found in L3: the (D2 - D3)·δ3 term, at δ3 ≈ 30-40 cycles per access.
15
Total data access time, data found in memory: the D3·δmem term, at δmem ≈ 100-300 cycles per access.
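A small sketch of the data-access-time formula above; the default latencies are mid-range picks from the cycle counts quoted on the slides and depend on the actual processor.

```python
def data_access_time(D0, D1, D2, D3,
                     delta1=2, delta2=8, delta3=35, delta_mem=200):
    """Total data access time in cycles:
    (D0 - D1) accesses served by L1, (D1 - D2) by L2, (D2 - D3) by L3,
    and D3 accesses that go all the way to main memory.
    Default latencies are mid-range picks from the slides' ranges."""
    return ((D0 - D1) * delta1 + (D1 - D2) * delta2 +
            (D2 - D3) * delta3 + D3 * delta_mem)

# Example: the same D0 is far cheaper when most accesses stop at the caches.
print(data_access_time(D0=1_000_000, D1=100_000, D2=50_000, D3=40_000))
print(data_access_time(D0=1_000_000, D1=100_000, D2=50_000, D3=2_000))
```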
16
Actual vs. Predicted
Avg. task time ≈ #features × (lookup + multiply + add) + memory access time.
[Chart: measured task time versus the model's prediction.]
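A hedged sketch of how the prediction on this slide could be computed; the per-operation cycle costs and the example numbers are placeholders, not measurements from the paper.

```python
def predicted_task_time(num_feature_ops, lookup=4, multiply=1, add=1,
                        data_access_time=0):
    """Avg. task time ~ #features * (lookup + multiply + add) + data access time.
    lookup/multiply/add are assumed per-operation cycle costs (placeholders)."""
    return num_feature_ops * (lookup + multiply + add) + data_access_time

# Hypothetical usage: 1M feature-pair operations plus ~5M cycles of data access.
print(predicted_task_time(1_000_000, data_access_time=5_000_000))
```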
17
RECALL: split size s
[Figure: S divided into splits S1, ..., Sq of size s, each with its own accumulator in C, compared against area B.]
18
Ratio of Data Access to Computation
Avg. task time ≈ #features × (lookup + add + multiply) + memory access time.
[Chart: data access time versus computation time as the split size s varies.]
19
PSS2: Vector coalescing
Issues: PSS1 focuses on splitting S so that it fits into cache, but it does not consider cache reuse to improve temporal locality in memory areas B and C.
Solution: coalesce multiple vectors in B (a sketch follows below).
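One way to picture coalescing in Python, under the assumption that PSS2 builds a small inverted index over each group of b vectors from B so that the cached split and its accumulators are reused while still hot; the helper names and the omission of dynamic filtering are simplifications of this sketch, not the paper's exact design.

```python
from collections import defaultdict

def pss2_compare(S_x, B, tau, b):
    """PSS2-style traversal (sketch): process area B in coalesced groups of b
    vectors. A size-controlled inverted index is built per group, so features
    shared within a group touch the cached split S_x and its accumulators in C
    back-to-back. Vectors are (doc_id, {term: weight}) pairs; dynamic filtering
    is omitted for brevity."""
    for g_start in range(0, len(B), b):
        group = B[g_start:g_start + b]                 # b coalesced vectors from B
        group_index = defaultdict(list)                # small inverted index of the group
        for j_id, vec_j in group:
            for t, w_j in vec_j.items():
                group_index[t].append((j_id, w_j))
        acc = defaultdict(float)                       # area C: accumulators for this group
        for i_id, vec_i in S_x:                        # the cached split S_x
            for t, w_i in vec_i.items():
                for j_id, w_j in group_index.get(t, ()):
                    acc[(i_id, j_id)] += w_i * w_j
        for (i_id, j_id), score in acc.items():
            if score >= tau:
                yield (i_id, j_id, score)
```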
20
PSS2: Example of improved locality
[Figure: a split Si, a group of coalesced vectors from B, and their accumulators in C; the striped areas stay resident in cache.]
21
Evaluation
Implementation: Hadoop MapReduce.
Objectives: effectiveness of PSS1 and PSS2 over PSS; benefits of the cost model.
Datasets: Twitter, ClueWeb, Enron emails, Yahoo! Music, Google News.
Preprocessing: stopword removal + df-cut; static partitioning for dissimilarity detection.
22
Improvement ratio of PSS1 and PSS2 over PSS
[Chart: improvement ratios across datasets; the best case is about 2.7x.]
23
RECALL: coalescing size b
[Figure: split Si compared against coalesced groups of b vectors from B, with accumulators in C; avg. # of sharing = 2.]
24
Average number of shared features [chart]
25
Overall performance
26
ClueWeb
27
Impact of split size s in PSS1
[Charts for ClueWeb, Twitter, and Enron emails.]
28
RECALL: split size s & coalescing size b
[Figure: a split of size s from S compared against a coalesced group of b vectors from B, with accumulators in C.]
29
Effect of s & b on PSS2 performance (Twitter)
[Chart: task time for combinations of s and b; the fastest configuration is highlighted.]
30
Conclusions
Splitting hosted partitions to fit into cache reduces slow memory data accesses (PSS1).
Coalescing vectors with size-controlled inverted indexing improves the temporal locality of visited data (PSS2).
Cost modeling of memory hierarchy access guides the choice of parameter settings.
Experiments show the cache-conscious designs can be up to 2.74x as fast as the cache-oblivious baseline.