Cache-Conscious Performance Optimization for Similarity Search
Maha Alabduljalil, Xun Tang, Tao Yang
Department of Computer Science, University of California at Santa Barbara
36th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2013)
All Pairs Similarity Search (APSS)
Definition: finding all pairs of objects whose similarity is above a given threshold: Sim(d_i, d_j) = cos(d_i, d_j) ≥ τ.
Application examples: collaborative filtering, spam and near-duplicate detection, image search, query suggestions.
Motivation: APSS is still time-consuming for large datasets.
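As a concrete illustration, here is a minimal sketch (not from the paper) of the pairwise test, assuming each document is a sparse term-to-weight dictionary that is already L2-normalized so that the dot product equals the cosine:

# Minimal sketch: cosine similarity of two sparse, L2-normalized term-weight vectors.
def cosine(d_i, d_j):
    # d_i, d_j: dict mapping term -> normalized weight
    if len(d_i) > len(d_j):
        d_i, d_j = d_j, d_i              # iterate over the shorter vector
    return sum(w * d_j.get(t, 0.0) for t, w in d_i.items())

tau = 0.8                                 # example threshold (illustrative value)
# keep pair (d_i, d_j) when cosine(d_i, d_j) >= tau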
Previous Work
Approaches to speed up APSS:
Exact APSS:
– Dynamic computation filtering [Bayardo et al. WWW'07]
– Inverted indexing [Arasu et al. VLDB'06]
– Parallelization with MapReduce [Lin SIGIR'09]
– Partition-based similarity comparison [Alabduljalil et al. WSDM'13]
Approximate APSS via LSH: a tradeoff between precision and recall, plus added redundant computation.
Approaches that exploit the memory hierarchy:
– General query processing [Manegold VLDB'02]
– Other computing problems.
Baseline: Partition-based Similarity Search (PSS) [WSDM'13]
Partitioning with dissimilarity detection.
Similarity comparison with parallel tasks.
PSS Task
Memory areas: S = vectors owned by the task, B = buffer for other vectors, C = temporary accumulators.
Task steps:
Read the assigned partition into area S.
Repeat:
  Read some vectors v_i from other partitions into B.
  Compare the vectors in B with S.
  Output similar vector pairs.
Until all potentially similar vectors have been compared.
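A minimal sketch of this baseline task loop (not the authors' implementation; it reuses the cosine helper above and assumes S is a list of owned vectors and blocks is an iterable of vector lists read from the other partitions):

# Sketch of the PSS task loop: compare every incoming vector with the owned area S.
def pss_task(S, blocks, tau):
    for block in blocks:                  # area B: vectors read from other partitions
        for v in block:
            for d in S:                   # area S: vectors owned by this task
                score = cosine(d, v)      # accumulation corresponds to area C
                if score >= tau:
                    yield (d, v, score)   # output similar vector pairs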
Focus and Contribution
Contribution: analyze the memory-hierarchy behavior of PSS tasks and propose new data layout/traversal techniques for speedup:
① Splitting data blocks to fit in cache.
② Coalescing: read a block of vectors from other partitions and process them together.
Algorithms:
Baseline: PSS [WSDM'13]
Cache-conscious designs: PSS1 & PSS2
Problem 1: the PSS area S is too big to fit in cache
[Diagram: memory areas of a PSS task; S holds the inverted index of the owned vectors, B holds other vectors, C holds the accumulators for S. S is too long to fit in cache.]
PSS1: Cache-conscious data splitting
[Diagram: after splitting, S is divided into splits S_1, S_2, ..., S_q, each with its own accumulators in C and compared against B. Open question: what split size?]
PSS1 Task
Read S and divide it into many splits.
Read other vectors into B.
For each split S_x: Compare(S_x, B).
Output similarity scores.

Compare(S_x, B):
  for d_i in S_x:
    for d_j in B:
      for each shared feature t: sim(d_i, d_j) += w_{i,t} * w_{j,t}
      if sim(d_i, d_j) + maxw_{d_i} * sum_{d_j} < τ then skip d_j (dynamic filtering)
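A minimal sketch of the splitting idea (not the authors' code; the dynamic maxw/sum filtering is omitted, vectors are sparse dictionaries as above, and the split size s is assumed to be chosen so that one split plus its accumulators fits in cache):

# Sketch of PSS1: divide S into splits of s vectors and compare each split against B.
def pss1_task(S, B, tau, s):
    for start in range(0, len(S), s):
        split = S[start:start + s]        # split S_x, sized to stay cache-resident
        for d_i in split:
            for d_j in B:
                score = cosine(d_i, d_j)  # accumulates products over shared features
                if score >= tau:
                    yield (d_i, d_j, score)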
Modeling Memory/Cache Access of PSS1
Areas touched per comparison: S_i, B, and C (accumulation sim(d_i, d_j) += w_{i,t} * w_{j,t} and the filtering test sim(d_i, d_j) + maxw_{d_i} * sum_{d_j} < τ).
Total number of data accesses: D_0 = D_0(S_i) + D_0(B) + D_0(C).
Cache misses and data access time
Memory and cache access counts:
D_0: total memory data accesses.
D_1: accesses missed at L1; D_2: accesses missed at L2; D_3: accesses missed at L3.
Memory and cache access time:
δ_i: access time at cache level i; δ_mem: access time in memory.
Total data access time = (D_0 - D_1)δ_1 + (D_1 - D_2)δ_2 + (D_2 - D_3)δ_3 + D_3 δ_mem.
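A small sketch of this cost model as code (the L1 and L2 latencies match the cycle counts quoted on the following slides; the L3 and memory latencies are illustrative assumptions, not values from the paper):

# Data-access-time model; default latencies are in cycles and partly assumed.
def data_access_time(D0, D1, D2, D3,
                     d1=2, d2=8, d3=35, dmem=200):   # d3, dmem: assumed placeholders
    return (D0 - D1) * d1 + (D1 - D2) * d2 + (D2 - D3) * d3 + D3 * dmem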
Total data access time = (D_0 - D_1)δ_1 + (D_1 - D_2)δ_2 + (D_2 - D_3)δ_3 + D_3 δ_mem.
Data found in L1 costs ~2 cycles; data found in L2 costs 6-10 cycles; data found only in L3 or in memory costs correspondingly more cycles (latencies δ_3 and δ_mem).
Actual vs. Predicted
Avg. task time ≈ #features × (lookup + multiply + add) + memory access time.
Recall: split size s
[Diagram: S divided into splits S_1, ..., S_q of size s, each with its accumulators in C, compared against B.]
Ratio of Data Access to Computation
Avg. task time ≈ #features × (lookup + add + multiply) + memory access time.
[Plot: ratio of data access time to computation time as a function of split size s.]
PSS2: Vector coalescing
Issue: PSS1 focuses on splitting S to fit into cache, but does not consider cache reuse to improve temporal locality in memory areas B and C.
Solution: coalesce multiple vectors in B and process them together (a code sketch follows below).
PSS2: Example for improved locality
[Diagram: striped areas of S_i, B, and C stay in cache while a coalesced group of B vectors is compared against the split S_i.]
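A minimal sketch of PSS2-style coalescing (not the authors' code): b vectors from B are grouped, a small inverted index is built over the group, and the whole group is traversed against a cache-resident split so the split entries and the accumulators in C are reused while still in cache. Vectors are sparse dictionaries as in the earlier sketches.

from collections import defaultdict

# Sketch of coalesced comparison of one split against B, b vectors at a time.
def pss2_compare(split, B, tau, b):
    for start in range(0, len(B), b):
        group = B[start:start + b]                 # coalesced block of b vectors
        # small inverted index over the block: term -> [(j, w_{j,t}), ...]
        inv = defaultdict(list)
        for j, d_j in enumerate(group):
            for t, w in d_j.items():
                inv[t].append((j, w))
        for d_i in split:                          # d_i stays cache-resident
            acc = [0.0] * len(group)               # accumulators (area C)
            for t, w_i in d_i.items():
                for j, w_j in inv.get(t, ()):
                    acc[j] += w_i * w_j
            for j, score in enumerate(acc):
                if score >= tau:
                    yield (d_i, group[j], score)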
Evaluation
Implementation: Hadoop MapReduce.
Objectives: effectiveness of PSS1 and PSS2 over PSS; benefits of the cost modeling.
Datasets: Twitter, Clueweb, Enron emails, YahooMusic, Google news.
Preprocessing: stopword removal + df-cut; static partitioning for dissimilarity detection.
Improvement ratio of PSS1, PSS2 over PSS
[Chart: PSS1 and PSS2 achieve up to a 2.7x improvement over PSS.]
Recall: coalescing size b
[Diagram: a coalesced group of b vectors from B compared against split S_i; average number of sharing ≈ 2.]
Average number of shared features
Overall performance (Clueweb)
[Chart: overall performance comparison on Clueweb.]
Impact of split size s in PSS1
[Charts for Clueweb and Twitter; x-axis: split size s.]
Recall: split size s & coalescing size b
[Diagram: split S_i of size s compared against a coalesced group of b vectors from B.]
Effect of s & b on PSS2 performance (Twitter)
[Chart: the fastest setting of s and b is marked.]
Conclusions
Splitting hosted partitions to fit into cache reduces slow memory data accesses (PSS1).
Coalescing vectors with size-controlled inverted indexing improves the temporal locality of visited data (PSS2).
Cost modeling of memory-hierarchy access guides the choice of parameter settings.
Experiments show the cache-conscious designs can be up to 2.74x as fast as the cache-oblivious baseline.