Proximity Searching in High Dimensional Spaces with a Proximity Preserving Order. Edgar Chávez, Karina Figueroa, Gonzalo Navarro. UNIVERSIDAD MICHOACANA, MEXICO; UNIVERSIDAD DE CHILE, CHILE.

Content: 1. About the problem 2. Basic concepts 3. Previous work 4. Our technique 5. Experiments 6. Conclusion and future work

Proximity Searching: huge database; exact searching is not possible; expensive distance computation.

Applications: information retrieval; classification; people finder through the web; clustering. Currently used on: classification of spider webs; face recognition on the Chilean Web.

Problems (metric spaces): index; feature extraction; complex objects; high dimension; limited memory; huge databases.

Terminology. Queries: range query; k-nearest-neighbor query. Properties: symmetry; strict positiveness; triangle inequality.
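The two query types can be illustrated with a brute-force scan. This is a minimal sketch, not part of the slides: the Euclidean distance and the names `euclidean`, `range_query`, and `knn_query` are assumptions chosen for illustration, and any function satisfying the three metric properties could replace the distance.

```python
import heapq
import math

def euclidean(a, b):
    # One possible metric; it is symmetric, strictly positive, and
    # satisfies the triangle inequality.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def range_query(db, q, r, dist=euclidean):
    # Range query: every element within distance r of the query q.
    return [u for u in db if dist(q, u) <= r]

def knn_query(db, q, k, dist=euclidean):
    # k-nearest-neighbor query: the k elements closest to q.
    return heapq.nsmallest(k, db, key=lambda u: dist(q, u))

db = [(0, 0), (1, 0), (3, 4), (10, 10)]
print(range_query(db, (0, 0), 1.0))
print(knn_query(db, (0, 0), 3))
```

Both functions compute one distance per database element, which is the cost an index tries to avoid when the distance is expensive.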

Previous work: pivot based; partition based. (Figure labels: pivot, distance, q.)
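The slide only names the pivot-based family. As a hedged illustration, the sketch below shows the usual triangle-inequality filter such indexes rely on: an element u can be discarded for a range query (q, r) when |d(q, p) - d(u, p)| > r for some pivot p. The function names and the Euclidean distance are assumptions, not taken from the talk.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_pivot_table(db, pivots, dist=euclidean):
    # Preprocessing: store the distance from every element to every pivot.
    return [[dist(u, p) for p in pivots] for u in db]

def pivot_range_query(db, table, pivots, q, r, dist=euclidean):
    # Query time: compute d(q, p) once per pivot, then use the triangle
    # inequality to skip elements without computing d(q, u).
    dq = [dist(q, p) for p in pivots]
    computed = len(pivots)  # distance evaluations performed so far
    result = []
    for u, du in zip(db, table):
        if any(abs(a - b) > r for a, b in zip(dq, du)):
            continue  # u cannot lie within distance r of q
        computed += 1
        if dist(q, u) <= r:
            result.append(u)
    return result, computed

db = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (9.0, 9.0)]
pivots = [(0.0, 0.0)]
table = build_pivot_table(db, pivots)
print(pivot_range_query(db, table, pivots, (0.5, 0.0), 1.0))
```

In this toy run two of the four elements are discarded by the filter, so only three distances are computed at query time, one of them to the pivot.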

Previous work: pivot based; partition based. (Figure labels: center, q.)

Our technique: permutation. (Figure labels: permutants p3, p2, p5, p4, p6, p1; u.)
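A minimal sketch of how an element's permutation could be computed, assuming it is the list of permutants ordered by increasing distance from the element. The slide shows this only as a figure, so the function name `permutation`, the index-based tie-breaking rule, and the Euclidean distance are illustrative assumptions.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def permutation(u, permutants, dist=euclidean):
    # Indices of the permutants sorted by increasing distance to u.
    # Ties are broken by permutant index (an assumption of this sketch).
    return sorted(range(len(permutants)),
                  key=lambda i: (dist(u, permutants[i]), i))

permutants = [(0, 0), (10, 0), (0, 10)]
print(permutation((1, 0), permutants))
print(permutation((9, 1), permutants))
```

Each element sees the permutants in its own order, and that order is what the index stores in place of the distances themselves.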

Our technique: Exact matching elements have the same permutation. Similar elements must have a similar permutation (we guess). Spearman footrule metric: measures the similarity of the permutations; promising elements first.

Spearman footrule metric. Example: 3-1, 6-2, 3-2, 4-1, 5-5, 6-4 (difference of positions).
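The Spearman footrule sums, over all permutants, the absolute difference between the positions a permutant occupies in the two permutations. The sketch below follows that definition. The function name and the string labels are illustrative, and the example uses two three-permutant orderings rather than the six-permutant example on the slide, whose original permutations are not recoverable from the transcript.

```python
def spearman_footrule(perm_a, perm_b):
    # Position of every permutant in the second permutation.
    pos_b = {p: i for i, p in enumerate(perm_b)}
    # Sum of absolute position differences, permutant by permutant.
    return sum(abs(i - pos_b[p]) for i, p in enumerate(perm_a))

print(spearman_footrule(["p3", "p1", "p2"], ["p2", "p1", "p3"]))  # 4
print(spearman_footrule(["p2", "p1", "p3"], ["p2", "p1", "p3"]))  # 0
```

Identical permutations are at distance 0, and the value grows as permutants move further from their positions in the other ordering.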

Searching process (1st part): preprocessing time. Permutants: p1, p2, p3. (Figure labels: p3,p1,p2; p3,p2,p1; p2,p1,p3; p2,p3,p1.)

Searching process (2nd part): query time. Permutants: p1, p2, p3. (Figure labels: p3,p1,p2; p3,p2,p1; p2,p1,p3; p2,p3,p1; query q: p2,p1,p3.) Sorting elements by Spearman footrule metric: p2,p1,p3; p2,p3,p1; ...; p3,p1,p2.
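The two parts of the searching process can be sketched end to end as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the names `build_index` and `knn_search`, the `fraction` parameter (the share of the database actually compared against the query), and the Euclidean distance are all hypothetical.

```python
import heapq
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def permutation(u, permutants, dist):
    # Permutant indices ordered by increasing distance to u.
    return sorted(range(len(permutants)),
                  key=lambda i: (dist(u, permutants[i]), i))

def footrule(perm_a, perm_b):
    pos_b = {p: i for i, p in enumerate(perm_b)}
    return sum(abs(i - pos_b[p]) for i, p in enumerate(perm_a))

def build_index(db, permutants, dist=euclidean):
    # Preprocessing time: one permutation per database element.
    return [permutation(u, permutants, dist) for u in db]

def knn_search(db, index, permutants, q, k, fraction, dist=euclidean):
    # Query time: compute the query's permutation, then sort the elements
    # by footrule distance between their permutation and the query's.
    q_perm = permutation(q, permutants, dist)
    order = sorted(range(len(db)), key=lambda i: footrule(index[i], q_perm))
    # Compare only the most promising share of the database with q.
    budget = max(k, math.ceil(fraction * len(db)))
    candidates = (db[i] for i in order[:budget])
    return heapq.nsmallest(k, candidates, key=lambda u: dist(q, u))

permutants = [(0, 0), (10, 0), (0, 10)]
db = [(1, 1), (9, 1), (1, 9), (2, 1), (8, 2)]
index = build_index(db, permutants)
print(knn_search(db, index, permutants, (1.2, 1.0), k=2, fraction=0.5))
```

Because only a share of the database is compared, the answer is probabilistic: a true neighbor whose permutation differs a lot from the query's can be missed, which is the trade-off the experiments measure.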

Experiments: 93% retrieved, comparing 10% of the database; 90% retrieved, comparing 60% of the database. (Plot labels: pivot-based algorithm; retrieved 48%; % retrieved.)

Experiments: 100% retrieved, comparing 15% of the database; 100% retrieved, comparing 90% of the database. (Plot label: % retrieved.)

How good is our prediction? Dimension 256, using 256 pivots. Metric algorithms are using one of them. (Plot axes: retrieved; percentage of the database compared.)

Similarities between permutations Almost the same value

Conclusion: A new probabilistic algorithm for proximity searching in metric spaces. Our technique is based on permutations. Close elements will have similar permutations. This technique is the fastest known algorithm for high dimensions. Permutations are a good predictor.

Future work: Can non-metric spaces be tackled with this technique? An approximate all-k-nearest-neighbors algorithm. Improving other metric indexes.

Thank you UNIVERSIDAD MICHOACANA, MEXICO UNIVERSIDAD DE CHILE, CHILE Kfiguero@dcc.uchile.cl
