Published by Augustus Gaines. Modified over 9 years ago.
1
Information Technology
Selecting Representative Objects Considering Coverage and Diversity
Shenlu Wang (1), Muhammad Aamir Cheema (2), Ying Zhang (3), Xuemin Lin (1)
(1) The University of New South Wales, Australia
(2) Monash University, Australia
(3) The University of Technology, Australia
2
Faculty of Information Technology
Outline
- Influence Sets
- Reverse k Nearest Neighbors Queries
- Reverse Top-k Queries
- Reverse Skyline Queries
- Representative Objects using Influence Sets
- Techniques
- Experimental Results
- Summary
3
Influence Set
[Figure: labels "Influence" and "Influence Set"]
4
Influence Set
A facility f is important for a user u if it is one of the top-k facilities for u considering her preferences, e.g.,
- Distance
- Rating
- Price
[Figure: label "Important facility?"]
5
Influence Set
Significance
- Important for identifying potential users/customers
- Used in various applications such as marketing, cluster and outlier analysis, and decision support systems
Types
7
Reverse k Nearest Neighbors (RkNN)
Definition of importance
- A facility f is important to a user u if f is one of u's k closest facilities
Reverse k Nearest Neighbors
- Find every user u for which the query facility q is important, i.e., q is one of u's k closest facilities.
Example (k = 1)
- Influence set of f1 is {u1, u2}
- Influence set of f2 is {u3}
[Figure: users u1, u2, u3 and facilities f1, f2]
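The RkNN definition above can be illustrated with a brute-force sketch. The coordinates are hypothetical (the slide's figure is not preserved) and were chosen only so that the influence sets match the example; the function names are illustrative, not taken from the talk.

```python
import math

def k_nearest_facilities(user_pos, facilities, k):
    """Return the names of the k facilities closest to user_pos."""
    ranked = sorted(facilities, key=lambda name: math.dist(user_pos, facilities[name]))
    return ranked[:k]

def influence_set(query, users, facilities, k):
    """Users that have `query` among their k closest facilities (the RkNN of query)."""
    return {name for name, pos in users.items()
            if query in k_nearest_facilities(pos, facilities, k)}

# Hypothetical coordinates, not taken from the slide's figure.
facilities = {"f1": (0.0, 0.0), "f2": (10.0, 0.0)}
users = {"u1": (1.0, 1.0), "u2": (-1.0, 2.0), "u3": (9.0, 1.0)}

print(influence_set("f1", users, facilities, k=1))
print(influence_set("f2", users, facilities, k=1))
```

This scans every user for every query, which is exactly the cost the pruning-based algorithms on the following slides try to avoid.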
8
RkNN Algorithms
Phases: Pruning, Verification
Pruning approaches
- Half-space: TPL (VLDB 2004), FINCH (VLDB 2008), InfZone (ICDE 2011)
- Region-based: Six-regions (SIGMOD 2000), SLICE (ICDE 2014)
Algorithms
- Six-regions (Stanoi et al., SIGMOD 2000)
- TPL (Tao et al., VLDB 2004)
- FINCH (Wu et al., VLDB 2008)
- Boost (Emrich et al., SIGMOD 2010)
- InfZone (Cheema et al., ICDE 2011)
- SLICE (Yang et al., ICDE 2014)
9
RkNN Algorithms
Region-based pruning: Six-regions [Stanoi et al., SIGMOD 2000]
1. Divide the whole space centred at the query q into six equal regions
2. Find the k-th nearest facility of q in each region
3. The k-th nearest facility of q in each region defines the area that can be pruned
The user points that cannot be pruned should be verified by a range query
[Figure: k = 2; query q, points a, b, c, d, and users u1, u2]
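The three steps above can be sketched as follows. This is a minimal illustration, assuming 2D points, 60-degree sectors measured from the positive x-axis, and a facility list that excludes q itself; the function names and test points are hypothetical.

```python
import math

def sector(q, p):
    """Index (0..5) of the 60-degree region around q that contains p."""
    angle = math.atan2(p[1] - q[1], p[0] - q[0]) % (2 * math.pi)
    return min(int(angle // (math.pi / 3)), 5)

def six_region_candidates(q, users, facilities, k):
    """Keep users that are not farther from q than the k-th nearest facility in their region."""
    dists = [[] for _ in range(6)]
    for f in facilities:
        dists[sector(q, f)].append(math.dist(q, f))
    # Regions with fewer than k facilities cannot prune anything.
    radius = [sorted(d)[k - 1] if len(d) >= k else math.inf for d in dists]
    return [u for u in users if math.dist(q, u) <= radius[sector(q, u)]]

def is_rknn(q, u, facilities, k):
    """Verification: u is an RkNN of q if fewer than k facilities are closer to u than q."""
    closer = sum(1 for f in facilities if math.dist(u, f) < math.dist(u, q))
    return closer < k

q = (0.0, 0.0)
facilities = [(2.0, 0.5), (3.0, 1.0)]              # hypothetical, both in region 0
users = [(1.0, 0.2), (5.0, 1.0), (-1.0, 0.5)]      # hypothetical
candidates = six_region_candidates(q, users, facilities, k=1)
result = [u for u in candidates if is_rknn(q, u, facilities, k=1)]
print(candidates, result)
```

The verification step here counts facilities by scanning the list; an index-backed range query, as the slide suggests, would serve the same purpose.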
10
RkNN Algorithms
Half-space pruning: the space that is contained in at least k half-spaces can be pruned
TPL [Tao et al., VLDB 2004]
1. Find the nearest facility f in the unpruned area
2. Draw the bisector between q and f, and prune using the resulting half-space
3. Go to step 1 unless all facilities in the unpruned area have been accessed
Checking which k half-spaces prune a point/node is expensive
TPL++ [Yang et al., PVLDB 2015]
[Figure: k = 2; query q, points a, b, c, d, and user u]
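The half-space test itself is simple to state in code. The sketch below only shows the pruning condition for a single point, not the TPL traversal; names and coordinates are hypothetical.

```python
import math

def in_pruning_half_space(p, q, f):
    """True if p lies on f's side of the perpendicular bisector between q and f."""
    return math.dist(p, f) < math.dist(p, q)

def is_pruned(p, q, facilities, k):
    """A point is pruned once at least k half-spaces contain it."""
    return sum(in_pruning_half_space(p, q, f) for f in facilities) >= k

q = (0.0, 0.0)
facilities = [(4.0, 0.0), (0.0, 4.0)]   # hypothetical facilities

print(is_pruned((3.0, 3.0), q, facilities, k=2))  # closer to both facilities than to q
print(is_pruned((3.0, 0.0), q, facilities, k=2))  # closer to only one facility
```

Counting over all facilities, as done here, is the expensive check the slide refers to; the cited algorithms differ mainly in how they avoid it.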
11
RkNN Algorithms
FINCH [Wu et al., VLDB 2008]
- Approximate the unpruned area by a convex polygon
[Figure: k = 2; query q and points a, b, c, d]
12
RkNN Algorithms
InfZone [Cheema et al., ICDE 2011]
1. The influence zone is the unpruned area that remains once the bisectors of all facilities have been considered for pruning
2. A user u is an RkNN of q if and only if u lies inside the influence zone
3. No verification phase is needed
[Figure: k = 2; query q and points a, b, c, d]
13
RkNN Algorithms
SLICE [Yang et al., ICDE 2014]
1. Divide the whole space centred at the query q into t equal regions
2. Draw arcs for each facility
3. The k-th arc in each region defines the pruning region
Pruning requires checking only one distance
[Figure: k = 2; query q and facilities f1, f2]
15
Influence Set based on Reverse Top-k
Definition of importance
- Each user u has a preference function
- A facility f is important to a user u if f is one of the top-k facilities for u
Reverse Top-k Query (RTk)
- Find every user u for which the query facility q is one of her top-k facilities.
Example (k = 1)
- Influence set of f1 is {u2}
- Influence set of f2 is {u1, u3}
[Figure: users u1, u2, u3 and facilities f1, f2; labels Price=1, Price=2, 2, 3; preference functions 0.9*price + 0.1*distance, 0.5*price + 0.5*distance, and 1*distance]
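A brute-force sketch of a reverse top-k query follows, assuming linear preference functions like those in the figure and assuming that lower weighted cost is better. The attribute values are hypothetical and do not reproduce the slide's example, whose figure is not fully recoverable.

```python
def cost(weights, attributes):
    """Weighted sum of a facility's attributes under one user's preference weights."""
    return sum(w * a for w, a in zip(weights, attributes))

def top_k(weights, facilities, k):
    """Names of the k facilities with the lowest weighted cost for this user."""
    return sorted(facilities, key=lambda name: cost(weights, facilities[name]))[:k]

def reverse_top_k(query, users, facilities, k):
    """Users for which `query` is one of their top-k facilities."""
    return {u for u, w in users.items() if query in top_k(w, facilities, k)}

# Hypothetical data: each facility is (price, second attribute); lower is better.
facilities = {"f1": (1.0, 3.0), "f2": (2.0, 1.0)}
users = {"u1": (0.9, 0.1), "u2": (0.5, 0.5), "u3": (0.0, 1.0)}

print(reverse_top_k("f1", users, facilities, k=1))
print(reverse_top_k("f2", users, facilities, k=1))
```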
16
Existing work on Reverse Top-k
- Vlachou et al., "Reverse top-k queries", ICDE 2010
- Chester et al., "Indexing reverse top-k queries in two dimensions", DASFAA 2013
- Cheema et al., "A Unified Framework for Efficiently Processing Ranking Related Queries", EDBT 2014
- Vlachou et al., "Branch-and-bound algorithm for reverse top-k queries", SIGMOD 2013
- Ge et al., "Efficient all top-k computation: A unified solution for all top-k, reverse top-k and top-m influential queries", TKDE 2013
- Vlachou et al., "Monitoring reverse top-k queries over mobile devices", MobiDE 2011
- Yu et al., "Processing a large number of continuous preference top-k queries", SIGMOD 2012
17
Faculty of Information Technology Outline Influence Sets Reverse k Nearest Neighbors Queries Reverse top-k Queries Reverse Skyline Queries Representative Objects using Influence Sets Techniques Experiment Results Summary
18
Influence Set based on Reverse Skyline
Dominance
- A facility x dominates another facility y w.r.t. a user u if, for every attribute, u prefers x over y
Definition of importance
- A facility f is important to a user u if f is not dominated by any other facility w.r.t. u
Reverse Skyline
- Find every user u for which the query facility q is not dominated by any other facility.
Example
- Influence set of f1 is {u1, u2}
- Influence set of f2 is {u1, u2, u3}
[Figure: users u1, u2, u3 and facilities f1, f2; labels Price=1, Price=2]
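The dominance test and the reverse skyline can be sketched by brute force. One assumption is made explicit here: a user "prefers" the facility whose attribute value is closer to the user's own ideal value, with at least one attribute strictly closer, which is a common way to define dynamic dominance but is not spelled out on the slide. All data is hypothetical.

```python
def dominates(x, y, u):
    """x dominates y w.r.t. u: x is at least as close to u on every attribute, closer on one."""
    dx = [abs(a - b) for a, b in zip(x, u)]
    dy = [abs(a - b) for a, b in zip(y, u)]
    return all(a <= b for a, b in zip(dx, dy)) and any(a < b for a, b in zip(dx, dy))

def reverse_skyline(query, users, facilities):
    """Users for which no other facility dominates the query facility."""
    q = facilities[query]
    return {
        name for name, u in users.items()
        if not any(dominates(f, q, u) for other, f in facilities.items() if other != query)
    }

# Hypothetical attribute vectors for facilities and ideal values for users.
facilities = {"f1": (1.0, 1.0), "f2": (4.0, 4.0)}
users = {"u1": (0.0, 0.0), "u2": (5.0, 5.0), "u3": (0.0, 5.0)}

print(reverse_skyline("f1", users, facilities))
print(reverse_skyline("f2", users, facilities))
```

Unlike RkNN and reverse top-k, there is no parameter k: a user can belong to the influence sets of many facilities at once, as u3 does here.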
19
Existing work on Reverse Skylines
- Dellis et al., "Efficient computation of reverse skyline queries", VLDB 2007
- Lian et al., "Reverse skyline search in uncertain databases", TODS 2010
- Prasad et al., "Efficient reverse skyline retrieval with arbitrary non-metric similarity measures", EDBT 2011
- Wang et al., "Energy-efficient reverse skyline queries processing over wireless sensor networks", TKDE 2012
- Wu et al., "Finding the influence set through skylines", EDBT 2009
21
Representative Objects
Given a set of facilities and a set of users, choose t representative facilities considering coverage and diversity
Coverage
- Let I(f) denote the influence set of a facility f.
- Given a set of facilities F, its coverage is a measure of the total number of distinct users that are influenced by the facilities in F
- Koh et al., "Finding k most favorite products based on reverse top-t queries", VLDB J. 2014
- Gkorgkas et al., "Finding the most diverse products using preference queries", EDBT 2015
22
Representative Objects
Diversity
- Let I(f) denote the influence set of a facility f.
- Dissimilarity between two facilities is defined based on the Jaccard similarity of their influence sets
- Diversity of a set of facilities F is the minimum of the pairwise dissimilarities between the facilities in the set
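Coverage and diversity can be sketched directly from influence sets. Two assumptions are made: coverage is taken as the raw count of distinct influenced users (the slides do not say whether it is normalized), and dissimilarity is taken as 1 minus the Jaccard similarity, since the slide only says it is "based on" Jaccard. The handling of empty sets and single-facility sets is also an assumption.

```python
from itertools import combinations

def coverage(selected, influence):
    """Number of distinct users influenced by at least one selected facility."""
    covered = set()
    for f in selected:
        covered |= influence[f]
    return len(covered)

def dissimilarity(a, b):
    """Assumed form: 1 - Jaccard similarity of two influence sets."""
    union = a | b
    if not union:
        return 0.0  # assumption: two empty influence sets count as identical
    return 1.0 - len(a & b) / len(union)

def diversity(selected, influence):
    """Minimum pairwise dissimilarity among the selected facilities."""
    pairs = list(combinations(selected, 2))
    if not pairs:
        return 1.0  # assumption: fewer than two facilities is maximally diverse
    return min(dissimilarity(influence[x], influence[y]) for x, y in pairs)

# Hypothetical influence sets.
influence = {"f1": {"u1", "u2"}, "f2": {"u2", "u3"}, "f3": {"u4"}}

print(coverage(["f1", "f2"], influence))
print(diversity(["f1", "f2", "f3"], influence))
```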
23
Representative Objects
Problem Definition
- Score of a set of facilities F is [formula not preserved in this transcript]
- Given a set of facilities and a set of users, return a set of t facilities with maximum score.
25
Techniques
Challenges
- Problem is NP-hard
- Requires computing influence sets for many facilities
- Requires set intersection and union operations to compute diversity
26
Techniques
Phase 1: Compute influence sets
- Prune the facilities that cannot be among the representative facilities
- Compute influence sets of remaining facilities
Phase 2: Greedy Algorithm
- Iteratively select a facility f that maximizes the score of the current set
- Stop when t facilities have been selected
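Phase 2 can be sketched as a generic greedy loop. Because the score formula is not preserved in this transcript, the sketch takes the scoring function as a parameter; the coverage-only score used in the example is a stand-in for illustration, not the score defined in the talk.

```python
def greedy_select(candidates, t, score):
    """Repeatedly add the facility that maximizes the score of the current set."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < t:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical influence sets and a stand-in score (coverage only).
influence = {"a": {1, 2, 3}, "b": {3, 4}, "c": {1, 2}}

def coverage_score(facility_names):
    covered = set()
    for name in facility_names:
        covered |= influence[name]
    return len(covered)

chosen = greedy_select(influence, t=2, score=coverage_score)
print(chosen)
```

Each iteration re-evaluates the score for every remaining facility, which is why the cost of the set operations inside the score matters so much in this phase.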
27
Techniques
Phase 1: Compute influence sets
- Prune the facilities that cannot be among the representative facilities
- Compute influence sets of remaining facilities
1. Apply an existing reverse top-k algorithm for each remaining facility (RTK)
2. Compute top-k facilities for each user and populate the influence sets of each facility
   a) Use a branch-and-bound top-k algorithm for each user (TK)
   b) Use a brute-force algorithm to compute top-k for each user (NBF)
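Option 2 above, in its brute-force form, can be sketched as follows: compute each user's top-k once and add the user to the influence set of every facility in that list. Linear preference weights with lower cost being better are assumed, and the data is hypothetical.

```python
import heapq
from collections import defaultdict

def influence_sets_from_top_k(users, facilities, k):
    """Build all influence sets in one pass over the users."""
    influence = defaultdict(set)
    for user, weights in users.items():
        def cost(name, w=weights):
            return sum(wi * ai for wi, ai in zip(w, facilities[name]))
        # The k facilities with the lowest weighted cost for this user.
        for name in heapq.nsmallest(k, facilities, key=cost):
            influence[name].add(user)
    return dict(influence)

# Hypothetical data: facility attributes and user preference weights.
facilities = {"f1": (1.0, 3.0), "f2": (2.0, 1.0), "f3": (5.0, 5.0)}
users = {"u1": (0.9, 0.1), "u2": (0.5, 0.5), "u3": (0.0, 1.0)}

sets_k1 = influence_sets_from_top_k(users, facilities, k=1)
print(sets_k1)
```

Facilities that appear in no user's top-k, such as f3 here, never receive an entry, so their influence sets are implicitly empty.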
28
Techniques
Phase 2: Greedy Algorithm
- Iteratively select a facility f that maximizes the score of the current set
- Stop when t facilities have been selected
Selecting f requires computing set intersection and union operations
1. Compute exact set operations (ESO)
2. Compute approximate set intersection and union (MK)
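The slides do not say which approximation technique is used. As one commonly used possibility, and purely as an assumption, the sketch below estimates the Jaccard similarity of two influence sets with MinHash signatures, so that similarities can be compared without intersecting the full sets. The hash construction and parameter names are illustrative.

```python
import hashlib

def _hash(seed, item):
    """Deterministic 64-bit hash of an item under a given seed."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def signature(items, num_hashes=64):
    """MinHash signature of a non-empty set: the minimum hash value per seed."""
    return [min(_hash(seed, x) for x in items) for seed in range(num_hashes)]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two sets agree."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)

# Hypothetical influence sets of user identifiers.
set_a = {f"u{i}" for i in range(0, 100)}
set_b = {f"u{i}" for i in range(50, 150)}   # true Jaccard similarity is 50/150

estimate = estimate_jaccard(signature(set_a), signature(set_b))
print(estimate)
```

The estimate becomes tighter as `num_hashes` grows, trading signature size for accuracy; exact set operations remain the reference when the sets are small.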
30
Experimental Results
31
Experimental Results
33
Summary
- We studied the problem of computing representative objects using influence sets based on reverse top-k queries
- We proposed a two-phase greedy algorithm with an approximation guarantee
- Experimental results demonstrate that the greedy algorithms produce high-quality results
34
Thanks
35
Future Work
- Compute representative objects where influence sets are computed based on RkNN queries or reverse skyline queries
- Develop efficient techniques to compute the influence sets of a large number of facilities in a batch
- Develop techniques to compute reverse top-k queries where the distance between users and facilities is considered