© 2009 IBM Corporation IBM Research Xianglong Liu 1, Junfeng He 2,3, and Bo Lang 1 1 Beihang University, Beijing, China 2 Columbia University, New York, NY, USA 3 Facebook, Menlo Park, CA, USA Reciprocal Hash Tables for Nearest Neighbor Search

Outline
– Introduction: Nearest Neighbor Search, Motivation
– Reciprocal Hash Tables: Formulation, Solutions
– Experiments
– Conclusion

Introduction: Nearest Neighbor Search (1)

Introduction: Nearest Neighbor Search (2)
Hash-based nearest neighbor search
– Locality sensitive hashing [Indyk and Motwani, 1998]: close points in the original space have similar hash codes
[Figure: example points x1…x5 mapped by hash functions h1…hk to binary codes]
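A minimal sketch of the locality-sensitive hashing idea above, using the random-hyperplane (sign-of-projection) family; the slide does not fix a particular LSH family, so the function name and seed here are illustrative:

```python
import numpy as np

def lsh_codes(X, k, seed=0):
    """Map n points to k-bit binary codes via random hyperplanes.

    Points separated by a small angle agree on most bits with high
    probability -- the locality-sensitive property: close points in
    the original space get similar hash codes.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], k))  # one random hyperplane per bit
    return (X @ W > 0).astype(np.uint8)       # n x k matrix of 0/1 bits

X = np.random.default_rng(1).standard_normal((5, 8))
codes = lsh_codes(X, k=16)                    # 5 points -> 16-bit codes
```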

Introduction: Nearest Neighbor Search (3)
Hash-based nearest neighbor search
– Compressed storage: binary codes
– Efficient computation: hash table lookup, or Hamming distance ranking based on binary operations
[Figure: an image hashed into a binary code that indexes a hash table bucket]
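The Hamming distance ranking mentioned above reduces to XOR plus popcount once codes are packed into integers; a small sketch (the function name and packing convention are assumptions, not the slide's notation):

```python
def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query code.

    Codes are Python ints packing the binary hash bits; XOR marks the
    differing bits and counting them gives the Hamming distance, so
    the whole ranking uses only binary operations.
    """
    dists = [(bin(query_code ^ c).count("1"), i) for i, c in enumerate(db_codes)]
    dists.sort()
    return [i for _, i in dists]

order = hamming_rank(0b1011, [0b1011, 0b0000, 0b1010])
# nearest first: item 0 (distance 0), item 2 (distance 1), item 1 (distance 3)
```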

Introduction: Motivation
Problems
– Build multiple hash tables and probe multiple buckets to improve search performance [Gionis, Indyk, and Motwani, 1999; Lv et al. 2007]
– Little research studies a general strategy for constructing multiple hash tables
– Random selection: the widely used general strategy, but it usually needs a large number of hash tables
Motivation
– Similar to the well-studied feature selection problem: select the most informative and independent hash functions
– Support various types of hashing algorithms, different data sets and scenarios, etc.

Reciprocal Hash Tables: Formulation (1)

Reciprocal Hash Tables: Formulation (2)

Reciprocal Hash Tables: Formulation (3)

Reciprocal Hash Tables: Solutions (1)

Reciprocal Hash Tables: Solutions (2)

Reciprocal Hash Tables: Solutions (3)

Sequential Strategy: Boosting
Boosting style: try to correct the previous mistakes by updating weights on neighbor pairs in each round
[Figure: neighbor-pair similarities (> 0, < 0, = 0), prediction error, and updated similarities across rounds]
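The reweighting step can be sketched as an exponential update over labeled neighbor pairs; this is a generic boosting-style update under assumed conventions (+1 for neighbor pairs, -1 for non-neighbors), not the paper's exact formula:

```python
import numpy as np

def update_pair_weights(weights, similarities, predictions, alpha=0.5):
    """Boosting-style reweighting over neighbor pairs.

    similarities: +1 for neighbor pairs, -1 for non-neighbor pairs.
    predictions:  +1 if the current table puts the pair in the same
                  bucket, -1 otherwise.
    Pairs the current table gets wrong (sign mismatch) have their weight
    increased, so the next table focuses on correcting those mistakes.
    """
    agree = similarities * predictions       # > 0 correct, < 0 mistake
    w = weights * np.exp(-alpha * agree)     # up-weight the mistakes
    return w / w.sum()                       # keep weights a distribution

w0 = np.ones(4) / 4
sim = np.array([1, 1, -1, -1])
pred = np.array([1, -1, -1, 1])              # pairs 1 and 3 are mistakes
w1 = update_pair_weights(w0, sim, pred)
```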

Reciprocal Hash Tables: Solutions (4)

Experiments
Datasets
– SIFT-1M: 1 million 128-D SIFT descriptors
– GIST-1M: 1 million 960-D GIST descriptors
Baselines
– Random selection
Settings
– 10,000 training samples and 1,000 queries on each set
– 100 neighbors and 200 non-neighbors for each training sample
– The ground truth for each query is defined as its top 5 nearest neighbors by Euclidean distance
– Average performance over 10 independent runs
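The ground-truth protocol above (top 5 Euclidean neighbors per query) can be sketched in a few lines; the function and array names are assumptions:

```python
import numpy as np

def top_k_ground_truth(queries, database, k=5):
    """Ground truth per query: indices of its top-k Euclidean neighbors."""
    # squared Euclidean distances, shape (num_queries, num_database)
    d2 = ((queries[:, None, :] - database[None, :, :]) ** 2).sum(-1)
    return np.argsort(d2, axis=1)[:, :k]     # k nearest, closest first

q = np.array([[0.0, 0.0]])
db = np.array([[0, 0], [1, 0], [2, 0], [3, 0], [4, 0], [5, 0]], dtype=float)
gt = top_k_ground_truth(q, db, k=5)
```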

Experiments: Over Basic Hashing Algorithms (1)
Hash Lookup Evaluation
– The precision of RAND decreases dramatically with more hash tables, while (R)DHF first increases its performance and attains significant gains over RAND
– Both methods faithfully improve performance over RAND in terms of hash lookup

Experiments: Over Basic Hashing Algorithms (2)
Hamming Ranking Evaluation
– DHF and RDHF consistently achieve the best performance over LSH, KLSH, and RMMH in most cases
– RDHF gains significant performance improvements over DHF

Experiments: Over Multiple Hashing Algorithms
– Build multiple hash tables using different hashing algorithms with different settings, because many hashing algorithms cannot be used directly to construct multiple tables, due to the upper limit on the number of hash functions
– Double-bit (DB) quantization [Liu et al. 2011] on PCA-based Random Rotation hashing (PCARDB) and Iterative Quantization (ITQDB) [Gong and Lazebnik 2011]

Conclusion
Summary and contributions
– A unified strategy for hash table construction supporting different hashing algorithms and various scenarios
– Two important selection criteria for hashing performance
– Formalize table construction as the dominant set problem in a vertex- and edge-weighted graph representing all pooled hash functions
– A reciprocal strategy based on boosting to reduce the redundancy between hash tables
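Dominant sets of a weighted graph are commonly extracted with replicator dynamics; the sketch below shows that generic method, not the paper's vertex- and edge-weighted variant, and all names are illustrative:

```python
import numpy as np

def dominant_set(A, iters=500, tol=1e-8):
    """Approximate a dominant set of a graph with replicator dynamics.

    A: symmetric nonnegative affinity matrix with zero diagonal.
    Vertices keeping non-negligible weight at convergence form a
    cohesive, mutually similar cluster (the dominant set).
    """
    n = A.shape[0]
    x = np.ones(n) / n                   # start from the barycenter
    for _ in range(iters):
        x_new = x * (A @ x)              # replicator dynamics step
        s = x_new.sum()
        if s == 0:
            break
        x_new /= s
        if np.abs(x_new - x).sum() < tol:
            x = x_new
            break
        x = x_new
    return np.flatnonzero(x > 1.0 / (10 * n))   # support of the weights

# a tight triangle {0, 1, 2} plus a weakly attached vertex 3
A = np.array([[0, 1, 1, 0.05],
              [1, 0, 1, 0.05],
              [1, 1, 0, 0.05],
              [0.05, 0.05, 0.05, 0]])
members = dominant_set(A)
```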

Thank you!