1 A FAIR ASSIGNMENT FOR MULTIPLE PREFERENCE QUERIES Leong Hou U, Nikos Mamoulis, Kyriakos Mouratidis Gruppo 10: Paolo Barboni, Tommaso Campanella, Simone.

Presentation transcript:

1 A FAIR ASSIGNMENT FOR MULTIPLE PREFERENCE QUERIES Leong Hou U, Nikos Mamoulis, Kyriakos Mouratidis Gruppo 10: Paolo Barboni, Tommaso Campanella, Simone Manco

2 Scenario
Some users want to select objects with specific features, based on their preferences.
These requests are issued as database queries.
Queries express users' preferences as different weights on the attributes of the searched objects.
These are the so-called Preference Queries.

3 Scenario
The result of a preference query is the object in the database with the highest aggregate score.
If multiple preference queries are issued simultaneously, an object may be the best solution for many of them:
–Who will be coupled to the object?
–Which results will the other users receive?
A FAIR ASSIGNMENT PROBLEM

4 Scenario - Example
Internship assignment, based on students' preferences in terms of:
–nature of the job
–salary
–office location
–other features…
For a single student, the system returns a set of top-k results with respect to his/her preference function.
An available internship position could be the top-1 choice of many interested students, but it can only be assigned to one of them.
The system must look for a fair 1-1 matching between the users and the objects:
–Stable Marriage Problem (SMP)

5 Scenario - Example
Internship assignment, based on students' preferences in terms of:
–nature of the job
–salary
–office location
–other features…
Users' preference functions: f_1 = 0.8X + 0.2Y, f_2 = 0.5X + 0.5Y
Positions' attributes: a = (0.5, 0.6), b = (0.2, 0.7), c = (0.8, 0.2), d = (0.4, 0.4)
[Figure: positions a, b, c, d plotted with X = salary and Y = standing, together with f_1, f_2 and the best point.]
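
The numbers on this slide can be checked directly. The following is a minimal sketch, assuming each preference function is the weighted sum of the two attributes given above; the helper names `score` and `top1` are illustrative and not part of the presentation.

```python
# Preference functions and positions as listed on the slide.
functions = {"f1": (0.8, 0.2), "f2": (0.5, 0.5)}
positions = {"a": (0.5, 0.6), "b": (0.2, 0.7), "c": (0.8, 0.2), "d": (0.4, 0.4)}

def score(weights, obj):
    """Aggregate score: weighted sum of the attribute values."""
    return sum(w * v for w, v in zip(weights, obj))

def top1(weights, objects):
    """Name of the object with the highest aggregate score."""
    return max(objects, key=lambda name: score(weights, objects[name]))

for name, weights in functions.items():
    best = top1(weights, positions)
    print(name, "->", best, round(score(weights, positions[best]), 2))
```

Under these assumptions f1 ranks c first and f2 ranks a first, so in this small example the two users do not compete for the same position.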

6 Related Algorithms
The 1-1 assignment problem is related to three types of search:
–Spatial Assignment problem (model: SMP): Chain Algorithm
–Skyline Queries: Branch-and-Bound Skyline Algorithm
–Top-k Search: Threshold Algorithm
Stable matching: given two datasets A and B, a 1-1 matching M is stable if there are no two pairs (a, b) and (a', b') in M such that a prefers b' to b and b' prefers a to a' (where a, a' ∈ A and b, b' ∈ B).
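
The stability condition can be tested mechanically. The sketch below assumes preferences are given as rank tables (a lower rank means more preferred); the function name `is_stable` and the table layout are illustrative assumptions, not taken from the slides. It looks for a blocking pair, that is, two elements that both prefer each other to their assigned partners.

```python
def is_stable(matching, rank_a, rank_b):
    """matching: dict a -> b. rank_a[a][b] and rank_b[b][a]: lower = more preferred."""
    partner_of = {b: a for a, b in matching.items()}
    for a, b in matching.items():
        for b2, a2 in partner_of.items():
            if b2 == b:
                continue
            a_prefers_b2 = rank_a[a][b2] < rank_a[a][b]
            b2_prefers_a = rank_b[b2][a] < rank_b[b2][a2]
            if a_prefers_b2 and b2_prefers_a:
                return False  # (a, b2) blocks the matching
    return True

# Hypothetical 2x2 instance: everyone ranks a1 and b1 first.
rank_a = {"a1": {"b1": 0, "b2": 1}, "a2": {"b1": 0, "b2": 1}}
rank_b = {"b1": {"a1": 0, "a2": 1}, "b2": {"a1": 0, "a2": 1}}
print(is_stable({"a1": "b1", "a2": "b2"}, rank_a, rank_b))
print(is_stable({"a1": "b2", "a2": "b1"}, rank_a, rank_b))
```

In the second matching a1 and b1 would both rather be paired with each other, so the check fails.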

7 Spatial Assignment Problem – Chain Algorithm
Its goal is to find stable pairs.
Its preference function is based on Euclidean distance:
–a prefers b' to b if dist(a,b') < dist(a,b)
A pair (a,b) is stable if and only if a's closest object is b and b's closest object is a, where a and b are among the unassigned (remaining) objects in A and B.
Chain algorithm:
1. Pick an object a from A (randomly) or from the queue Q.
2. Find the NN (Nearest Neighbour) of a in B, denoted a_B ∈ B.
3. Find the NN a' ∈ A of a_B.
4. If a ≠ a', a_B is pushed into the queue Q; otherwise the pair (a, a_B) is output as a result pair and a, a_B are removed from A and B.
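
The four steps can be sketched in memory as follows. This is an illustration under stated assumptions rather than the published algorithm: nearest neighbours are found by a linear scan instead of a spatial index, points are coordinate tuples, ties are broken by coordinates so that the chain cannot cycle, and the names `nearest` and `chain` are hypothetical.

```python
import math

def nearest(p, candidates):
    """Nearest neighbour by Euclidean distance; ties broken by coordinates."""
    return min(candidates, key=lambda c: (math.dist(p, c), c))

def chain(A, B):
    remaining = {"A": set(A), "B": set(B)}
    queue, pairs = [], []
    while remaining["A"] and remaining["B"]:
        if queue:
            x, side = queue.pop()               # continue the current chain
        else:
            x, side = min(remaining["A"]), "A"  # any unassigned object works
        other = "B" if side == "A" else "A"
        y = nearest(x, remaining[other])        # NN of x in the other set
        back = nearest(y, remaining[side])      # NN of y in x's own set
        if back == x:                           # mutual NNs form a stable pair
            pairs.append((x, y) if side == "A" else (y, x))
            remaining[side].remove(x)
            remaining[other].remove(y)
        else:
            queue.append((y, other))            # follow the chain from y
    return pairs

print(chain([(0, 0), (3, 0)], [(2, 0), (10, 0)]))
```

In the printed example the chain starts at (0, 0), moves to (2, 0), and discovers that (2, 0) and (3, 0) are mutual nearest neighbours; only then is (0, 0) paired with the remaining point.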

8 Skyline Queries – BBS Algorithm
A different approach exploits the concept of the skyline of a set:
–The skyline of O consists of all points o ∈ O that are not dominated by any other point in O.
It is faster if the objects are indexed by an R-tree.
BBS algorithm:
1. Compute the skyline of O by accessing the minimum number of R-tree nodes (it is I/O optimal).
2. Access the nodes of the tree in ascending order of distance from the sky point. The sky point is the (imaginary) most preferable object possible.
3. Once a data object is found, it is added to the skyline and all R-tree nodes/subtrees dominated by it are pruned.
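
To make the dominance test and the visiting order concrete, here is a small in-memory sketch. It is not BBS itself, since it scans a sorted list instead of an R-tree; it assumes larger attribute values are better, uses the L1 distance to the sky point as the ordering, and the names `dominates` and `skyline` are illustrative.

```python
def dominates(p, q):
    """p dominates q: at least as good everywhere and strictly better somewhere."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def skyline(points, sky):
    """Visit points in ascending L1 distance from the sky point.

    With this order a visited point can never be dominated by a later one,
    so it only has to be compared against the skyline found so far.
    """
    result = []
    for p in sorted(points, key=lambda p: sum(s - v for s, v in zip(sky, p))):
        if not any(dominates(q, p) for q in result):
            result.append(p)
    return result

# Positions a, b, c, d of the internship example, sky point (1, 1).
points = [(0.5, 0.6), (0.2, 0.7), (0.8, 0.2), (0.4, 0.4)]
print(skyline(points, (1.0, 1.0)))
```

Position d = (0.4, 0.4) is dominated by a = (0.5, 0.6) and is therefore left out of the skyline.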

9 BBS Algorithm – Example
[Figure: R-tree with root entries M1, M2, M3, leaf nodes m1 to m7 and data objects a to m, with the sky point marked.]
Heap and skyline states shown on the slide:
INN Heap = {M1, M2, M3}
INN Heap = {m1, m2, m3, M2, M3}
INN Heap = {e, i, m1, m2, M2, M3}
O_sky = {e}
O_sky = {e, a}
INN Heap = {m2}
INN Heap = {a}

10 Skyline Queries – DeltaSky Algorithm
It is used on a dynamic dataset, where objects can be added/removed.
It determines the intersection between an MBR and the EDR without explicitly calculating the EDR itself.
For each deletion in O_sky, DeltaSky traverses the R-tree once.
If many deletions are performed, DeltaSky incurs a high I/O cost.
EDR: Exclusive Dominance Region
MBR: Minimum Bounding Rectangle

11 Top-k search – Threshold Algorithm
O is a collection of n objects; each object o has D attributes.
S_1, S_2, …, S_D are sorted lists, one for each attribute, ordered by the atomic scores.
A top-k query, based on an aggregate function f, retrieves a k-subset O_topk of O (k < n), such that f(o) ≥ f(o') ∀ o ∈ O_topk, o' ∈ (O − O_topk).
The most used algorithm for top-k queries is the Threshold Algorithm (TA):
–it pops objects from the sorted lists in a round-robin manner
–for each object o, f(o) is computed
–the set of k objects with the highest scores is maintained
–the search terminates when the k-th score is greater than or equal to the threshold T
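
A compact sketch of TA follows. It assumes a weighted-sum aggregate with non-negative weights and takes the threshold T to be the aggregate of the last values seen in each list, which is the usual definition; the function name and the data layout are illustrative choices.

```python
import heapq

def threshold_algorithm(objects, weights, k):
    """objects: dict name -> tuple of D attribute values. Returns the top-k (score, name)."""
    D = len(weights)
    f = lambda vals: sum(w * v for w, v in zip(weights, vals))
    # One list per attribute, sorted on that attribute in descending order.
    lists = [sorted(objects, key=lambda n, i=i: objects[n][i], reverse=True)
             for i in range(D)]
    seen, topk = set(), []                  # topk is a min-heap of (score, name)
    for depth in range(len(objects)):
        for i in range(D):                  # round-robin sorted access
            name = lists[i][depth]
            if name not in seen:
                seen.add(name)
                heapq.heappush(topk, (f(objects[name]), name))
                if len(topk) > k:
                    heapq.heappop(topk)     # keep only the k best so far
        last_seen = [objects[lists[i][depth]][i] for i in range(D)]
        if len(topk) == k and topk[0][0] >= f(last_seen):
            break                           # k-th score >= threshold T
    return sorted(topk, reverse=True)

positions = {"a": (0.5, 0.6), "b": (0.2, 0.7), "c": (0.8, 0.2), "d": (0.4, 0.4)}
print([name for _, name in threshold_algorithm(positions, (0.8, 0.2), 2)])
```

With the internship data and f = 0.8X + 0.2Y the search stops after two rounds, before position d is ever read from a list.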

12 Top-k search - BRS & Onion
Branch-and-bound Ranked Search (BRS):
1. Visit R-tree nodes in an order determined by a preference function f.
2. maxscore(M) is an upper bound of the score of any object inside the MBR M.
3. Nodes are accessed in descending maxscore order.
4. Terminate when the score of the k-th best object is no smaller than the next node's maxscore.
Onion:
1. Compute the convex hull of the data objects and set it as a layer.
2. Remove the hull objects.
3. Expand the layers from the first one, moving inwards.
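
The BRS termination rule can be illustrated with a one-level stand-in for the R-tree. The sketch assumes a weighted sum with non-negative weights, so the maxscore of an MBR is the score of its upper corner; `mbr_of`, `maxscore` and `brs_top1` are hypothetical helper names, and only the top-1 case (k = 1) is shown.

```python
import heapq

def score(w, p):
    return sum(a * b for a, b in zip(w, p))

def mbr_of(points):
    """Minimum bounding rectangle as (lower corner, upper corner)."""
    return tuple(map(min, zip(*points))), tuple(map(max, zip(*points)))

def maxscore(w, mbr):
    """Upper bound for any object inside the MBR (valid for non-negative weights)."""
    return score(w, mbr[1])

def brs_top1(w, nodes):
    """nodes: list of leaf nodes, each a list of points. Returns (best point, nodes visited)."""
    heap = [(-maxscore(w, mbr_of(n)), i) for i, n in enumerate(nodes)]
    heapq.heapify(heap)                     # descending maxscore order
    best, best_score, visited = None, float("-inf"), 0
    while heap and -heap[0][0] > best_score:
        _, i = heapq.heappop(heap)
        visited += 1
        for p in nodes[i]:
            if score(w, p) > best_score:
                best, best_score = p, score(w, p)
    return best, visited

nodes = [[(0.5, 0.6), (0.2, 0.7)], [(0.8, 0.2), (0.4, 0.4)]]
print(brs_top1((0.8, 0.2), nodes))
```

Here the second node has the larger maxscore and is visited first; its best object already scores at least as much as the first node's maxscore, so the first node is never opened.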

13 Problem Statement
A set F of user preference functions over a set O of multidimensional objects.
The score f(o) of an object o is: $f(o) = \sum_{i=1}^{D} f.\alpha_i \cdot o_i$
Our goal is to find a stable 1-1 matching between F and O.
A function-object pair (f, o) in F × O is stable if there is no function f' ∈ F, f' ≠ f, such that f'(o) > f(o), and there is no object o' ∈ O, o' ≠ o, such that f(o') > f(o), where F and O are the sets of the unassigned (remaining) functions and objects.

14 Algorithms – Brute Force Search
Assumption: F is kept in memory, O is indexed by an R-tree (R_O) on the disk.
Progressive technique:
Issue top-1 queries against O, one for every function in F (|F| pairs).
The pair (f,o) with the highest f(o) value must be stable:
–o is the top-1 preference of f
–f'(o) cannot be greater than f(o) for any function f' ≠ f
After the pair (f,o) is added to the query result:
–o is removed from R_O
–if o was the top-1 object for another function f' ≠ f, the top-1 search must be re-applied for f'
Improvement: by maintaining the search heap of each top-1 query, the search can be resumed.
–Drawback: large amount of memory!
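
The brute force loop can be sketched in memory as below. This is an illustration only: a linear scan replaces the top-1 query on the R-tree, weighted-sum functions are assumed, and the names `score` and `brute_force` are not from the slides. A top-1 result is recomputed only when its object has been assigned to another function.

```python
def score(w, p):
    return sum(a * b for a, b in zip(w, p))

def brute_force(functions, objects):
    """functions: name -> weights, objects: name -> attributes. Returns stable pairs."""
    F, O = dict(functions), dict(objects)
    top1, result = {}, []
    while F and O:
        for f in F:
            # (Re)issue the top-1 query only if f has no valid top-1 object.
            if f not in top1 or top1[f] not in O:
                top1[f] = max(O, key=lambda o: score(F[f], O[o]))
        # The pair with the highest score overall is stable.
        f = max(F, key=lambda g: score(F[g], O[top1[g]]))
        o = top1[f]
        result.append((f, o))
        del F[f], O[o], top1[f]
    return result

functions = {"f1": (0.8, 0.2), "f2": (0.5, 0.5)}
positions = {"a": (0.5, 0.6), "b": (0.2, 0.7), "c": (0.8, 0.2), "d": (0.4, 0.4)}
print(brute_force(functions, positions))
```

On the internship example the pair (f1, c) is output first with score 0.68, and f2 keeps its existing top-1 object a without a new search.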

15 Algorithms – Skyline-Based Search
Assumption: if F contains only monotone functions, then the top-1 objects must be in O_sky.
Stable function-object pairs between O_sky and F are found and output:
–O_sky is computed and maintained
SB(set F, R-tree R_O)
  O_sky := ∅
  while |F| > 0 do
    if O_sky = ∅ then O_sky := ComputeSkyline(R_O)
    else UpdateSkyline(O_sky, o, R_O)
    (f,o) := BestPair(F, O_sky)
    Output (f,o)
    F := F − f; O := O − o; O_sky := O_sky − o
First, we compute the skyline O_sky.
Then, while there are unassigned functions, the pair (f,o) with the highest f(o) score is found.
Finally, f and o are removed from F and O, and O_sky is updated.

16 Algorithms – Skyline-Based Search (Example)
[Figure: skyline-based search example, with the sky point marked.]

17 Implementation - BestPair
A brute force implementation is not efficient:
–it requires |F| * |O_sky| comparisons (cross product F × O_sky)
Another approach is to index either F or O_sky:
–indexing O_sky is not practical (number of updates)
–F is indexed, since only one deletion is performed on it at each loop
Functions are indexed as sorted lists, one for each coefficient.
A reverse top-1 search is applied on the lists, where the roles of objects and functions are swapped.
Each list L_1, …, L_D (D is the dimensionality) holds the (f.α_i, f) pairs of all functions f ∈ F, sorted on f.α_i in descending order.
The threshold T can be calculated as $T = \sum_{i=1}^{D} l_i \cdot o_i$, where l_i is the last value seen in L_i.
–The sum of the coefficients l_i could be greater than 1, so a normalization of the function is required
Normalization algorithm:
1. Rank the dimensions in descending order based on o's corresponding values.
2. Set B = 1; for each dimension i in that order: β_i = min{B, l_i}, B = B − β_i.

18 Implementation - BestPair (Example)
o = (10, 6, 8)
f_a = 0.8X + 0.1Y + 0.1Z
f_b = 0.2X + 0.8Y + 0.0Z
f_c = 0.5X + 0.4Y + 0.1Z
f_d = 0.0X + 0.1Y + 0.9Z
f_e = 0.2X + 0.4Y + 0.4Z
Sorted lists:
L_1          L_2          L_3
f_a (0.8)    f_b (0.8)    f_d (0.9)
f_c (0.5)    f_e (0.4)    f_e (0.4)
f_e (0.2)    f_c (0.4)    f_c (0.1)
f_b (0.2)    f_d (0.1)    f_a (0.1)
f_d (0.0)    f_a (0.1)    f_b (0.0)
After accessing f_a, f_b and f_d: f_a(o) = 9.4, f_b(o) = 6.8, f_d(o) = 7.8; l_1 = 0.8, l_2 = 0.8, l_3 = 0.9
B = 1; β_1 = min{B, l_1} = 0.8, B = B − 0.8 = 0.2; β_3 = min{B, l_3} = 0.2, B = 0; so β_1 = 0.8, β_2 = 0, β_3 = 0.2 and T_tight = 9.6
After accessing f_c: f_c(o) = 8.2; l_1 = 0.5, l_2 = 0.8, l_3 = 0.9
B = 1; β_1 = min{B, l_1} = 0.5, B = B − 0.5 = 0.5; β_3 = min{B, l_3} = 0.5, B = 0; so β_1 = 0.5, β_2 = 0, β_3 = 0.5 and T_tight = 9
f_best = f_a (score 9.4)
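
The example can be replayed with a short sketch of the reverse top-1 search. Several points are assumptions made for illustration: object values are non-negative, the coefficients of each function sum to 1, lists are read round-robin with the threshold checked before every access, and the head of a list serves as l_i until that list has been read. The names `tight_threshold` and `best_function` are hypothetical.

```python
def tight_threshold(o, last_seen):
    """Normalized threshold: spend a budget B = 1 on the dimensions where o is largest."""
    order = sorted(range(len(o)), key=lambda i: o[i], reverse=True)
    budget, threshold = 1.0, 0.0
    for i in order:
        beta = min(budget, last_seen[i])
        threshold += beta * o[i]
        budget -= beta
    return threshold

def best_function(o, functions):
    """Return (name, score, accessed names) of the function maximizing f(o)."""
    D = len(o)
    lists = [sorted(functions.items(), key=lambda kv, i=i: kv[1][i], reverse=True)
             for i in range(D)]
    last = [lists[i][0][1][i] for i in range(D)]   # upper bounds before any access
    best, best_score, accessed = None, float("-inf"), []
    for step in range(D * len(functions)):
        if best_score >= tight_threshold(o, last):
            break                                  # no unseen function can do better
        i, depth = step % D, step // D             # round-robin over the lists
        name, coeffs = lists[i][depth]
        last[i] = coeffs[i]
        accessed.append(name)
        s = sum(c * v for c, v in zip(coeffs, o))
        if s > best_score:
            best, best_score = name, s
    return best, best_score, accessed

functions = {"f_a": (0.8, 0.1, 0.1), "f_b": (0.2, 0.8, 0.0), "f_c": (0.5, 0.4, 0.1),
             "f_d": (0.0, 0.1, 0.9), "f_e": (0.2, 0.4, 0.4)}
print(best_function((10, 6, 8), functions))
```

As on the slide, the search reads f_a, f_b, f_d and f_c, at which point the tight threshold has dropped from 9.6 to 9 and f_a with score 9.4 can be returned without reading f_e.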

19 Implementation - BestPair (Improvements)
TA access order:
–the access order changes from round-robin to descending order of l_i * o_i (l_i is the last value seen in each L_i)
Resuming search:
–the state of the previously applied search for each object in O_sky is stored, and the search can be resumed if necessary
–the drawback of this method is the extra memory required
Iterative solution: the queue's maximum capacity is set to Ω = ω * |F|
–the queue stores only the top-Ω functions
–Ω is decreased by 1 when an element is popped from the queue; if Ω = 0, its value is reset to ω * |F|
–this allows control of the tradeoff between execution time and memory usage

20 Implementation – UpdateSkyline (Example)
To minimize the tree traversal cost during skyline maintenance, the objects dominated by o are pruned and these entries are added to the pruned list o.plist.
To minimize the required memory, each pruned object is kept in the plist of only one skyline object.
[Figure: entries m1, M2, M3 and objects a, b, c, d.]
Trace shown on the slide:
S_cand = {m1, c, M2, M3}
S_cand = {c, M2, M3, a, b, d}
S_cand = {M2, M3, a, b, d}; O_sky = {c}
S_cand = {M3, a, b, d}; c.plist = {M2}
S_cand = {a, b, d}; c.plist = {M2, M3}
S_cand = {b, d}; O_sky = {a, c}
S_cand = {d}; O_sky = {a, b, c}
S_cand = {}; b.plist = {d}
algorithm UpdateSkyline(set O_sky, object o, R-tree R_O)
  S_cand := ∅
  S_cand := {E | E ∈ o.plist, E ∉ o'.plist, ∀ o' ∈ O_sky}
  new O_sky := ResumeSkyline(S_cand, O_sky)
algorithm ResumeSkyline(set S_cand, set O_sky)
  while S_cand is not empty do
    de-heap top entry E of S_cand
    if E is dominated by any o ∈ O_sky then
      add E to o.plist
    else ⊳ not dominated by any skyline object
      if E is non-leaf entry then
        visit node N pointed by E
        for all entries E' ∈ N do
          push E' into S_cand
      else
        O_sky := O_sky ∪ E

21 Algorithms – Skyline-Based Search (Optimization)
The number of loops required can be reduced if multiple stable object-function pairs are output at each loop.
SB(set F, R-tree R_O)
  O_sky := ∅; O_del := ∅
  while |F| > 0 do ⊳ more unassigned functions
    if O_sky = ∅ then
      O_sky := ComputeSkyline(R_O)
    else
      UpdateSkyline(O_sky, O_del, R_O)
    O_del := ∅; F_best := ∅
    for all o ∈ O_sky do
      find function o.f_best ∈ F that maximizes f(o)
      F_best := F_best ∪ o.f_best
    for all f ∈ F_best do
      find object f.o_best ∈ O_sky that maximizes f(o)
    for all f ∈ F_best do
      if (f.o_best).f_best = f then
        Output (f, f.o_best)
        F := F − f; O := O − f.o_best
        O_sky := O_sky − f.o_best; O_del := O_del ∪ f.o_best
F_best is the subset of F that includes the functions o.f_best that maximize f(o).
For each f ∈ F_best, the object f.o_best that maximizes f(o) is coupled with the function f.
If (f.o_best).f_best = f, then (f, f.o_best) is stable and the function/object is removed from F/O and O_sky.
At least one pair is guaranteed to be output.
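
The mutual-best test at the heart of this loop can be sketched in memory. The sketch makes simplifying assumptions that differ from the slides: the skyline is recomputed from scratch in every loop rather than maintained incrementally, the best function and best object are found by linear scans, weighted-sum functions with non-negative weights are used, and the name `sb_multi` is illustrative.

```python
def score(w, p):
    return sum(a * b for a, b in zip(w, p))

def dominates(p, q):
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def sb_multi(functions, objects):
    """Returns (stable pairs, number of loops)."""
    F, O = dict(functions), dict(objects)
    result, loops = [], 0
    while F and O:
        loops += 1
        # Naive skyline of the remaining objects (stand-in for UpdateSkyline).
        sky = [o for o in O if not any(dominates(O[q], O[o]) for q in O)]
        # o.f_best: best function for every skyline object.
        f_best = {o: max(F, key=lambda f: score(F[f], O[o])) for o in sky}
        # f.o_best: best skyline object for every function in F_best.
        o_best = {f: max(sky, key=lambda o: score(F[f], O[o]))
                  for f in set(f_best.values())}
        for f, o in o_best.items():
            if f_best[o] == f:          # mutual best choice: the pair is stable
                result.append((f, o))
                del F[f], O[o]
    return result, loops

functions = {"f1": (0.8, 0.2), "f2": (0.5, 0.5)}
positions = {"a": (0.5, 0.6), "b": (0.2, 0.7), "c": (0.8, 0.2), "d": (0.4, 0.4)}
print(sb_multi(functions, positions))
```

On the internship example both pairs, (f1, c) and (f2, a), are mutual best choices, so they are reported in a single loop, whereas the basic version needs one loop per pair.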

22 Problem Variants
Objects and functions with capacities:
–multiple objects/functions may share the same features; they are represented by only one object/function with a capacity attribute
–once a pair is found, the capacities of f and o are reduced by 1
Functions with different priorities:
–f.γ is the priority of the function
–to increase the efficiency of TA, a skyline F_sky is built on the functions
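
The capacity variant can be illustrated with a small greedy sketch. It assumes weighted-sum functions and positive integer capacities, and it selects the best remaining pair by exhaustive comparison rather than with the skyline-based machinery; the name `assign_with_capacities` is hypothetical.

```python
def score(w, p):
    return sum(a * b for a, b in zip(w, p))

def assign_with_capacities(functions, objects, f_cap, o_cap):
    """f_cap / o_cap: name -> remaining capacity (positive integers)."""
    f_cap, o_cap = dict(f_cap), dict(o_cap)
    result = []
    while f_cap and o_cap:
        # Highest-scoring pair among functions and objects with capacity left.
        f, o = max(((f, o) for f in f_cap for o in o_cap),
                   key=lambda fo: score(functions[fo[0]], objects[fo[1]]))
        result.append((f, o))
        for cap, key in ((f_cap, f), (o_cap, o)):
            cap[key] -= 1               # both capacities are reduced by 1
            if cap[key] == 0:
                del cap[key]            # exhausted entries leave the problem
    return result

functions = {"f1": (0.8, 0.2), "f2": (0.5, 0.5)}
objects = {"c": (0.8, 0.2)}
print(assign_with_capacities(functions, objects, {"f1": 1, "f2": 1}, {"c": 2}))
```

Because position c has capacity 2 in this made-up instance, it can be assigned to both functions, first to f1 and then to f2.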

23 Experiments
Three types of synthetic datasets:
–independent: values are generated uniformly and independently
–correlated: an object's values are close in all dimensions (if an object is good in one dimension, it is likely to be good in the other ones too)
–anti-correlated: objects that are good in one dimension tend to be poor in the other ones
Parameter               Values
|F| (in thousands)      1, 2.5, 5, 10, 20
|O| (in thousands)      10, 50, 100, 200, 400
Dimensionality D        3, 4, 5, 6
Capacity k              1, 2, 4, 8, 16
Function priority γ     1, 2, 4, 8, 16

24 Experiments – |F| and |O| Dependency

25 Experiments – Dimensionality D

26 Experiments – Capacity k and Priority γ

27 Experiments – Real Data (Zillow and NBA)

28 Conclusions
SB is proven to be:
–I/O optimal, thanks to an incremental skyline maintenance algorithm that is itself proven to be I/O optimal
–CPU optimal, by accelerating the matching between functions and skyline objects and identifying multiple stable pairs in each iteration

29 Conclusions
THANK YOU FOR YOUR ATTENTION
Dedicated to Chip…. RIP