A FAIR ASSIGNMENT FOR MULTIPLE PREFERENCE QUERIES


A FAIR ASSIGNMENT FOR MULTIPLE PREFERENCE QUERIES Rohit Anurag, Rahul Nayak

Scenario
Some users want to select objects with specific features, based on their preferences. These requests are performed as database queries. Queries express users' preferences by assigning different weights to the attributes of the searched objects. These are the so-called preference queries.

Scenario
The result of a preference query is the object in the database with the highest aggregate score. If multiple preference queries are issued simultaneously, an object may be the best solution for many of them. Who will be assigned the object? Which results will the other users receive? This is a fair assignment problem.

Scenario - Example
Internship assignment, based on students' preferences in terms of nature of the job, salary, office location, and other features. For a single student, the system returns a set of top-k results with respect to his/her preference function. An available internship position could be the top-1 choice of many interested students, but it can only be assigned to one of them.

Scenario - Example
Internship assignment, based on students' preferences in terms of nature of the job, salary, office location, and other features.
Users' preference functions: f1 = 0.8X + 0.2Y, f2 = 0.5X + 0.5Y
Positions' attributes: a = (0.5, 0.6), b = (0.2, 0.7), c = (0.8, 0.2), d = (0.4, 0.4)
[Figure: positions a, b, c, d plotted with X = salary and Y = standing, together with the directions of f1 and f2 and the best point.]
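The scores in this example can be checked with a few lines of Python. This is only an illustrative sketch: the dictionary layout and the helper names `score` and `top1` are assumptions made here, not part of the presentation.

```python
# Weights (on X = salary, Y = standing) and position attributes from the slide.
functions = {"f1": (0.8, 0.2), "f2": (0.5, 0.5)}
positions = {"a": (0.5, 0.6), "b": (0.2, 0.7), "c": (0.8, 0.2), "d": (0.4, 0.4)}

def score(weights, point):
    """Aggregate score: weighted sum of the attribute values."""
    return sum(w * x for w, x in zip(weights, point))

def top1(weights, objects):
    """Name of the object with the highest aggregate score."""
    return max(objects, key=lambda name: score(weights, objects[name]))

for name, weights in functions.items():
    best = top1(weights, positions)
    print(name, best, round(score(weights, positions[best]), 2))
# f1 c 0.68
# f2 a 0.55
```

With these particular numbers the two students have different top-1 positions, so no conflict arises. The assignment problem becomes interesting as soon as two functions share the same top-1 object.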

Related Algorithms
The 1-1 assignment problem is related to three types of search:
Spatial assignment problem (model: SMP, the stable marriage problem): Chain algorithm
Skyline queries: Branch-and-Bound Skyline (BBS) algorithm
Top-k search: Threshold Algorithm (TA)

Spatial Assignment Problem – Chain Algorithm
The goal is to find stable pairs between two object sets A and B. The preference function is based on Euclidean distance: a prefers b' to b if dist(a, b') < dist(a, b). A pair (a, b) is stable if and only if a's closest object is b and b's closest object is a, where a and b are among the unassigned (remaining) objects in A and B.
Chain algorithm:
1. Pick an object from the queue Q or, if Q is empty, randomly from A. The steps below are written for an object a ∈ A; for an object of B the roles of A and B are swapped.
2. Find the NN (nearest neighbour) aB ∈ B of a.
3. Find the NN a' ∈ A of aB.
4. If a ≠ a', push aB into Q; otherwise output (a, aB) as a result pair and remove a and aB from A and B.
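The chain steps above can be sketched in memory as follows. This is a simplified illustration under stated assumptions: nearest neighbours are found by a linear scan instead of a spatial index, the queue is a plain Python list used as a stack, and the names `nearest` and `chain_matching` are hypothetical.

```python
import math

def nearest(p, candidates):
    """Linear-scan nearest neighbour; a spatial index would replace this."""
    return min(candidates, key=lambda q: math.dist(p, q))

def chain_matching(A, B):
    """Return stable (a, b) pairs between the point lists A and B."""
    remaining = {"A": list(A), "B": list(B)}
    other = {"A": "B", "B": "A"}
    queue = []   # (side, point) entries waiting to be processed
    pairs = []
    while remaining["A"] and remaining["B"]:
        # Take the next object from the queue, otherwise an arbitrary one from A.
        side, x = queue.pop() if queue else ("A", remaining["A"][0])
        if x not in remaining[side]:
            continue  # already assigned in the meantime
        nn = nearest(x, remaining[other[side]])
        back = nearest(nn, remaining[side])
        if math.dist(nn, back) < math.dist(nn, x):
            # nn strictly prefers another object: continue the chain from nn.
            queue.append((other[side], nn))
        else:
            # x and nn are each other's nearest neighbours: a stable pair.
            pairs.append((x, nn) if side == "A" else (nn, x))
            remaining[side].remove(x)
            remaining[other[side]].remove(nn)
    return pairs

print(chain_matching([(0, 0), (3, 0)], [(2, 0), (10, 0)]))
# [((3, 0), (2, 0)), ((0, 0), (10, 0))]
```

Comparing distances rather than object identity treats ties as mutual nearest neighbours, which keeps the chain strictly decreasing in distance and therefore guarantees termination.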

Skyline Queries – BBS Algorithm
A different approach exploits the concept of the skyline of a set. The skyline of O consists of all points o ∈ O that are not dominated by any other point in O. A point dominates another if it is at least as good in every attribute and strictly better in at least one. Skyline computation is faster if the objects are indexed by an R-tree.
BBS (Branch-and-Bound Skyline) algorithm: it computes the skyline of O by accessing the minimum number of R-tree nodes, so it is I/O optimal. It accesses the nodes of the tree in ascending order of distance from the sky point, which is the (imaginary) most preferable object possible. Once a data object is found, it is added to the skyline and all R-tree nodes/subtrees dominated by it are pruned.
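To make the dominance test concrete, here is a minimal sketch of a skyline computation. It is deliberately not BBS: it compares every pair of objects in memory and uses no R-tree. It assumes that larger attribute values are preferable, and it reuses the position attributes of the internship example; the function names are hypothetical.

```python
def dominates(p, q):
    """p dominates q: at least as good everywhere, strictly better somewhere."""
    return (all(x >= y for x, y in zip(p, q))
            and any(x > y for x, y in zip(p, q)))

def skyline(objects):
    """Objects that are not dominated by any other object (quadratic scan)."""
    return {name: p for name, p in objects.items()
            if not any(dominates(q, p) for q in objects.values())}

positions = {"a": (0.5, 0.6), "b": (0.2, 0.7), "c": (0.8, 0.2), "d": (0.4, 0.4)}
print(sorted(skyline(positions)))  # ['a', 'b', 'c']
```

Position d is dominated by a, so it cannot be the top-1 object of any monotone preference function while a is still available.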

BBS Algorithm – Example
[Figure: BBS example on an R-tree with root entries M1, M2, M3, leaf nodes m1 to m7 and data objects a to m, with the sky point marked. INN heap states shown: {M1, M2, M3}, {m1, m2, m3, M2, M3}, {e, i, m1, m2, M2, M3}, {m2}, {a}. Skyline states shown: Osky = {e}, Osky = {e, a}.]

Top-k Search – Threshold Algorithm
O is a collection of n objects; each object o has D attributes. S1, S2, …, SD are sorted lists, one for each attribute, ordered by the atomic scores. A top-k query, based on an aggregate function f, retrieves a k-subset Otopk of O (k < n) such that f(o) ≥ f(o') for all o ∈ Otopk, o' ∈ (O − Otopk).
The most widely used algorithm for top-k queries is the Threshold Algorithm (TA). It pops objects from the sorted lists in a round-robin manner. For each object o, f(o) is computed. The set of k objects with the highest scores is maintained. The search terminates when the k-th score is greater than or equal to the threshold T, which is the value of f applied to the last atomic scores seen in each list.
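A compact in-memory sketch of TA for a weighted-sum aggregate is shown below. It assumes non-negative weights (so that f is monotone), keeps all lists in memory, and accesses the lists level by level, which is a simple form of round-robin. The function name and data layout are assumptions of this sketch.

```python
import heapq

def threshold_algorithm(objects, weights, k):
    """Top-k (name, score) pairs for f(o) = sum of weights[i] * o[i]."""
    def f(values):
        return sum(w * x for w, x in zip(weights, values))

    dims = range(len(weights))
    # One list per attribute, sorted by descending atomic score.
    lists = [sorted(objects, key=lambda o: -objects[o][i]) for i in dims]
    seen = {}
    for depth in range(len(objects)):
        last = []
        for i in dims:
            name = lists[i][depth]                    # sorted access
            last.append(objects[name][i])
            seen.setdefault(name, f(objects[name]))   # random access for the full score
        threshold = f(last)  # best score any unseen object could still reach
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])

positions = {"a": (0.5, 0.6), "b": (0.2, 0.7), "c": (0.8, 0.2), "d": (0.4, 0.4)}
print([name for name, _ in threshold_algorithm(positions, (0.8, 0.2), 1)])  # ['c']
```

For f1 = 0.8X + 0.2Y on the internship data, the search stops after the second level of the lists, without ever reading d.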

Problem Statement
Given a set of user preference functions F over a set of multidimensional objects O, the score f(o) of an object o is the weighted sum of its attribute values: f(o) = w1·o1 + w2·o2 + … + wD·oD, where wi is the weight that f assigns to attribute i and oi is the value of o in attribute i.
Our goal is to find a stable 1-1 matching between F and O. A function-object pair (f, o) in F × O is stable if there is no function f' ∈ F, f' ≠ f, with f'(o) > f(o), and there is no object o' ∈ O, o' ≠ o, with f(o') > f(o), where F and O are the sets of the unassigned (remaining) functions and objects.

Algorithms – Brute Force Search
Assumption: F is kept in memory and O is indexed by an R-tree (RO) on disk.
Progressive technique: issue top-1 queries against O, one for every function in F (|F| pairs). The pair (f, o) with the highest f(o) value must be stable: o is the top-1 preference of f, and f'(o) cannot be greater than f(o) for any function f' ≠ f. After the pair (f, o) is added to the query result, o is removed from RO. If o was the top-1 object of another function f' ≠ f, the top-1 search must be re-applied for f'.
Improvement: by maintaining the search heap of each top-1 query, the search can resume instead of restarting.
Drawback: the heaps require a large amount of memory.
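The brute-force idea can be sketched in memory as follows. This is an assumption-laden illustration: the R-tree and the per-query search heaps are replaced by plain dictionaries and linear scans, and the names `score` and `brute_force_assignment` are hypothetical.

```python
def score(weights, point):
    """Aggregate score: weighted sum of the attribute values."""
    return sum(w * x for w, x in zip(weights, point))

def brute_force_assignment(functions, objects):
    """Repeatedly output the unassigned pair (f, o) with the highest f(o)."""
    F, O = dict(functions), dict(objects)
    top1 = {}     # cached top-1 object of each function
    result = []
    while F and O:
        for f in F:
            # Re-run the top-1 search only if the cached object was taken.
            if top1.get(f) not in O:
                top1[f] = max(O, key=lambda o: score(F[f], O[o]))
        best_f = max(F, key=lambda f: score(F[f], O[top1[f]]))
        best_o = top1[best_f]
        result.append((best_f, best_o))
        del F[best_f]
        del O[best_o]
    return result

functions = {"f1": (0.8, 0.2), "f2": (0.5, 0.5)}
positions = {"a": (0.5, 0.6), "b": (0.2, 0.7), "c": (0.8, 0.2), "d": (0.4, 0.4)}
print(brute_force_assignment(functions, positions))
# [('f1', 'c'), ('f2', 'a')]
```

The pair (f1, c) is output first because 0.68 is the highest score over all top-1 results; f2 keeps its cached top-1 object a, since a was not taken.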

Algorithms – Skyline-Based Search
Assumption: if F contains only monotone functions, then the top-1 object of every function must be in Osky.
Stable function-object pairs between Osky and F are found and output, while Osky is computed and maintained. First, the skyline Osky is computed. Then, while there are unassigned functions, the pair (f, o) with the highest f(o) score is found and output. Finally, f and o are removed from F and O, and Osky is updated.
SB(set F, R-tree RO)
  Osky := ∅
  while |F| > 0 do
    if Osky = ∅ then
      Osky := ComputeSkyline(RO)
    else
      UpdateSkyline(Osky, o, RO)
    (f, o) := BestPair(F, Osky)
    Output (f, o)
    F := F − f; O := O − o; Osky := Osky − o

Algorithms – Skyline-Based Search (Example)

Implementation – UpdateSkyline (Example)
To minimize the tree traversal cost during skyline maintenance, the entries dominated by a skyline object o are pruned and added to the pruned list o.plist. To minimize the required memory, each pruned entry is kept in the plist of only one skyline object.
algorithm UpdateSkyline(set Osky, object o, R-tree RO)
  Scand := ∅
  Scand := {E | E ∈ o.plist, E ∉ o'.plist, ∀o' ∈ Osky}
  new Osky := ResumeSkyline(Scand, Osky)
algorithm ResumeSkyline(set Scand, set Osky)
  while Scand is not empty do
    de-heap top entry E of Scand
    if E is dominated by any o ∈ Osky then
      add E to o.plist
    else    ⊳ not dominated by any skyline object
      if E is non-leaf entry then
        visit node N pointed by E
        for all entries E' ∈ N do
          push E' into Scand
      else
        Osky := Osky ∪ E
[Figure: example trace. Scand = {m1, c, M2, M3}, {c, M2, M3, a, b, d}, {M2, M3, a, b, d}, {M3, a, b, d}, {a, b, d}, {b, d}, {d}, {}. Osky = {c}, {a, c}, {a, b, c}. c.plist = {M2}, {M2, M3}. b.plist = {d}.]

Algorithms – Skyline-Based Search (Optimization)
The number of loops required can be reduced if multiple stable function-object pairs are output in each loop.
SB(set F, R-tree RO)
  Osky := ∅; Odel := ∅
  while |F| > 0 do    ⊳ more unassigned functions
    if Osky = ∅ then
      Osky := ComputeSkyline(RO)
    else
      UpdateSkyline(Osky, Odel, RO)
    Odel := ∅; Fbest := ∅
    for all o ∈ Osky do
      find function o.fbest ∈ F that maximizes f(o)
      Fbest := Fbest ∪ o.fbest
    for all f ∈ Fbest do
      find object f.obest ∈ Osky that maximizes f(o)
    for all f ∈ Fbest do
      if (f.obest).fbest = f then
        Output (f, f.obest)
        F := F − f; O := O − f.obest
        Osky := Osky − f.obest; Odel := Odel ∪ f.obest
Fbest is the subset of F that contains, for each skyline object o, the function o.fbest that maximizes f(o). For each f ∈ Fbest, the object f.obest that maximizes f(o) is coupled with the function f. If (f.obest).fbest = f, then (f, f.obest) is stable, and the function and the object are removed from F, O and Osky. At least one pair is guaranteed to be output in each loop.
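The mutual-best test of the optimized loop can be sketched in memory as follows. This is an illustration only, under several assumptions: the skyline is recomputed from scratch in every loop instead of being maintained incrementally with UpdateSkyline and pruned lists, no R-tree is involved, weights are non-negative, and all function names are hypothetical.

```python
def score(weights, point):
    return sum(w * x for w, x in zip(weights, point))

def dominates(p, q):
    return (all(x >= y for x, y in zip(p, q))
            and any(x > y for x, y in zip(p, q)))

def skyline(objects):
    return {n: p for n, p in objects.items()
            if not any(dominates(q, p) for q in objects.values())}

def sb_assignment(functions, objects):
    """Output every mutually best (function, skyline object) pair per loop."""
    F, O = dict(functions), dict(objects)
    result = []
    while F and O:
        sky = skyline(O)  # stand-in for ComputeSkyline / UpdateSkyline
        # o.fbest: the function that scores skyline object o highest.
        fbest = {o: max(F, key=lambda f: score(F[f], p)) for o, p in sky.items()}
        # f.obest: the skyline object that f scores highest, for f in Fbest.
        obest = {f: max(sky, key=lambda o: score(F[f], sky[o]))
                 for f in sorted(set(fbest.values()))}
        for f, o in obest.items():
            if fbest[o] == f:          # mutual best, hence a stable pair
                result.append((f, o))
                del F[f]
                del O[o]
    return result

functions = {"f1": (0.8, 0.2), "f2": (0.5, 0.5)}
positions = {"a": (0.5, 0.6), "b": (0.2, 0.7), "c": (0.8, 0.2), "d": (0.4, 0.4)}
print(sb_assignment(functions, positions))
# [('f1', 'c'), ('f2', 'a')]
```

On the internship data both pairs are found in a single loop, because c and f1 are each other's best choice and so are a and f2. The basic SB loop would need two iterations for the same result.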

Experiments – |F| and |O| Dependency

Experiments – Dimensionality D

Conclusions
SB is I/O optimal, because it uses an incremental skyline maintenance algorithm that is proven to be I/O optimal.
SB is CPU efficient, because it accelerates the matching between functions and skyline objects and identifies multiple stable pairs in each iteration.

THANK YOU FOR YOUR ATTENTION
The end. Dedicated to Chip. RIP