1 Information Technology
Selecting Representative Objects Considering Coverage and Diversity
Shenlu Wang (1), Muhammad Aamir Cheema (2), Ying Zhang (3), Xuemin Lin (1)
(1) The University of New South Wales, Australia
(2) Monash University, Australia
(3) The University of Technology, Australia

2 Faculty of Information Technology
Outline
• Influence Sets
• Reverse k Nearest Neighbors Queries
• Reverse Top-k Queries
• Reverse Skyline Queries
• Representative Objects using Influence Sets
• Techniques
• Experiment Results
• Summary

3 Influence Set
Figure labels: Influence; Influence Set

4 Influence Set
Important facility?
A facility f is important for a user u if it is one of the top-k facilities for u considering her preferences, e.g.,
• Distance
• Rating
• Price

5 Influence Set
Significance
• Important for identifying potential users/customers
• Used in various applications such as marketing, cluster and outlier analysis, and decision support systems
Types
• Reverse k nearest neighbors, reverse top-k, and reverse skyline (covered in the following slides)

7 Reverse k Nearest Neighbors (RkNN)
Definition of importance
- A facility f is important to a user u if f is one of u's k closest facilities
Reverse k Nearest Neighbors
- Find every user u for which the query facility q is important, i.e., q is one of u's k closest facilities.
Example (k=1, users u1, u2, u3 and facilities f1, f2)
- Influence set of f1 is {u1, u2}
- Influence set of f2 is {u3}
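
The RkNN definition above can be illustrated with a brute-force sketch. This is not one of the algorithms discussed in the talk; it simply computes, for every user, its k closest facilities and records the user in their influence sets. The function name and the coordinates are hypothetical, chosen only so that the result matches the k=1 example on the slide.

```python
from math import dist

def rknn_influence_sets(users, facilities, k):
    """Return {facility_id: set of user_ids} under the RkNN definition."""
    influence = {fid: set() for fid in facilities}
    for uid, upos in users.items():
        # The k facilities closest to this user are the ones important to it.
        closest = sorted(facilities, key=lambda fid: dist(upos, facilities[fid]))[:k]
        for fid in closest:
            influence[fid].add(uid)
    return influence

# Hypothetical coordinates (not taken from the slide's figure).
users = {"u1": (0.0, 0.0), "u2": (1.0, 1.0), "u3": (5.0, 0.0)}
facilities = {"f1": (0.5, 0.5), "f2": (4.0, 0.0)}
influence = rknn_influence_sets(users, facilities, k=1)
print(influence)
```

With these coordinates, u1 and u2 are closest to f1 and u3 is closest to f2, so the influence sets agree with the slide.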

8 RkNN Algorithms
Phases: Pruning, Verification
Pruning approaches
- Half-space: TPL (VLDB 2004), FINCH (VLDB 2008), InfZone (ICDE 2011)
- Region-based: Six-regions (SIGMOD 2000), SLICE (ICDE 2014)
Algorithms
- Six-regions (Stanoi et al., SIGMOD 2000)
- TPL (Tao et al., VLDB 2004)
- FINCH (Wu et al., VLDB 2008)
- Boost (Emrich et al., SIGMOD 2010)
- InfZone (Cheema et al., ICDE 2011)
- SLICE (Yang et al., ICDE 2014)

9 RkNN Algorithms
Region-based pruning: Six-regions [Stanoi et al., SIGMOD 2000]
1. Divide the whole space centred at the query q into six equal regions
2. Find the k-th nearest facility of q in each region
3. The k-th nearest facility of q in each region defines the area that can be pruned
The user points that cannot be pruned are then verified by a range query.
Figure (k=2): points a, b, c, d, q, u1, u2
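
The three steps and the verification can be sketched as follows. This is a minimal in-memory illustration, not the indexed implementation from the cited paper: the function names are hypothetical, the query q is assumed not to be in the facility list, and the range query is replaced by a linear scan.

```python
from math import atan2, dist, pi

def region_of(q, p):
    """Index (0-5) of the 60-degree region around q that contains p."""
    angle = atan2(p[1] - q[1], p[0] - q[0]) % (2 * pi)
    return min(int(angle // (pi / 3)), 5)

def six_regions_rknn(q, users, facilities, k):
    """Users for which q is among the k closest facilities."""
    # Steps 1-2: distance from q to its k-th nearest facility in each region.
    per_region = [[] for _ in range(6)]
    for f in facilities:
        per_region[region_of(q, f)].append(dist(q, f))
    kth = [sorted(d)[k - 1] if len(d) >= k else float("inf") for d in per_region]

    result = []
    for u in users:
        d_uq = dist(u, q)
        # Step 3: a user farther from q than the k-th facility of its region is pruned.
        if d_uq > kth[region_of(q, u)]:
            continue
        # Verification (linear scan standing in for a range query):
        # count the facilities strictly closer to u than q is.
        closer = sum(1 for f in facilities if dist(u, f) < d_uq)
        if closer < k:
            result.append(u)
    return result

# Hypothetical data: one query, two competing facilities, three users.
answer = six_regions_rknn((0, 0), [(0.5, 0.1), (5, 0.5), (-1, -1)], [(2, 0), (0, 3)], k=1)
print(answer)
```

The pruning is safe because, inside a 60-degree region, any facility that is closer to q than the user is also closer to the user than q is.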

10 RkNN Algorithms
Half-space pruning: a region that is contained in at least k half-spaces can be pruned
TPL [Tao et al., VLDB 2004]
1. Find the nearest facility f in the unpruned area
2. Draw the bisector between q and f, and prune using the resulting half-space
3. Go to step 1 unless all facilities in the unpruned area have been accessed
Checking which k half-spaces prune a point/node is expensive
TPL++ [Yang et al., PVLDB 2015]
Figure (k=2): points a, b, c, d, q, u
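
The half-space test itself is simple to state in code. The sketch below is a naive check under assumed names (`in_half_space`, `pruned`); it scans every facility for a single point and does not reproduce TPL's incremental, index-based traversal.

```python
from math import dist

def in_half_space(p, f, q):
    """True if p lies on f's side of the perpendicular bisector between q and f."""
    return dist(p, f) < dist(p, q)

def pruned(p, q, facilities, k):
    """A point is pruned if it lies in at least k half-spaces,
    i.e. at least k facilities are closer to it than q."""
    return sum(in_half_space(p, f, q) for f in facilities) >= k

q = (0, 0)
facilities = [(2, 0), (0, 2)]
print(pruned((3, 3), q, facilities, k=2))  # both facilities are closer than q
print(pruned((3, 0), q, facilities, k=2))  # only (2, 0) is closer than q
```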

11 RkNN Algorithms
FINCH [Wu et al., VLDB 2008]
- Approximate the unpruned area by a convex polygon
Figure (k=2): points a, b, c, d, q

12 RkNN Algorithms
InfZone [Cheema et al., ICDE 2011]
1. The influence zone corresponds to the unpruned area when the bisectors of all the facilities have been considered for pruning.
2. A user u is an RkNN of q if and only if u lies inside the influence zone
3. No verification phase.
Figure (k=2): points a, b, c, d, q

13 RkNN Algorithms
SLICE [Yang et al., ICDE 2014]
1. Divide the whole space centred at the query q into t equal regions
2. Draw arcs for each facility
3. The k-th arc in each partition defines the pruning region
Pruning requires checking only one distance
Figure (k=2): points q, f1, f2

15 Influence Set based on Reverse Top-k
Definition of importance
- Each user u has a preference function
- A facility f is important to a user u if f is one of the top-k facilities for u
Reverse Top-k Query (RTk)
- Find every user u for which the query facility q is one of her top-k facilities.
Example (k=1, users u1, u2, u3 and facilities f1, f2)
- Preference functions shown: 0.9*price + 0.1*distance; 0.5*price + 0.5*distance; 1*distance
- Facility prices shown: Price=1, Price=2
- Influence set of f1 is {u2}
- Influence set of f2 is {u1, u3}
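
A brute-force sketch of a reverse top-k query follows, assuming linear preference functions over price and distance where a lower score is better. The positions, prices, and the assignment of preference functions to users are hypothetical (the slide's figure is not recoverable); they were chosen so the result matches the influence sets stated above.

```python
from math import dist

def score(user, facility):
    """Weighted sum of price and distance; lower is better (assumed convention)."""
    w_price, w_dist = user["weights"]
    return w_price * facility["price"] + w_dist * dist(user["pos"], facility["pos"])

def reverse_top_k(q_id, users, facilities, k):
    """Users for which facility q_id is among their k best-scoring facilities."""
    result = set()
    for uid, user in users.items():
        q_score = score(user, facilities[q_id])
        # q is in the user's top-k if fewer than k facilities score strictly better.
        better = sum(1 for fid, f in facilities.items()
                     if fid != q_id and score(user, f) < q_score)
        if better < k:
            result.add(uid)
    return result

# Hypothetical data, not taken from the slide's figure.
facilities = {"f1": {"pos": (0, 0), "price": 2}, "f2": {"pos": (4, 0), "price": 1}}
users = {
    "u1": {"pos": (1, 0), "weights": (0.9, 0.1)},  # 0.9*price + 0.1*distance
    "u2": {"pos": (1, 1), "weights": (0.0, 1.0)},  # 1*distance
    "u3": {"pos": (3, 0), "weights": (0.5, 0.5)},  # 0.5*price + 0.5*distance
}
print(reverse_top_k("f1", users, facilities, k=1))
print(reverse_top_k("f2", users, facilities, k=1))
```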

16 Existing work on Reverse Top-k
• Vlachou et al., “Reverse top-k queries”, ICDE 2010
• Chester et al., “Indexing reverse top-k queries in two dimensions”, DASFAA 2013
• Cheema et al., “A Unified Framework for Efficiently Processing Ranking Related Queries”, EDBT 2014
• Vlachou et al., “Branch-and-bound algorithm for reverse top-k queries”, SIGMOD 2013
• Ge et al., “Efficient all top-k computation: A unified solution for all top-k, reverse top-k and top-m influential queries”, TKDE 2013
• Vlachou et al., “Monitoring reverse top-k queries over mobile devices”, MobiDE 2011
• Yu et al., “Processing a large number of continuous preference top-k queries”, SIGMOD 2012

18 Influence Set based on Reverse Skyline
Dominance
• A facility x dominates another facility y w.r.t. a user u if, for every attribute, u prefers x over y
Definition of importance
• A facility f is important to a user u if f is not dominated by any other facility w.r.t. u
Reverse Skyline
• Find every user u for which the query facility q is not dominated by any other facility.
Example (users u1, u2, u3 and facilities f1, f2)
• Facility prices shown: Price=1, Price=2
• Influence set of f1 is {u1, u2}
• Influence set of f2 is {u1, u2, u3}
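
The dominance test and the reverse skyline can be sketched as below. The slide does not say how a user's preference on an attribute is measured, so this sketch assumes that each user has a preferred value per attribute and prefers the facility whose value is closer to it. The data is hypothetical and was chosen so the result matches the influence sets stated above.

```python
def dominates(x, y, u):
    """x dominates y w.r.t. u: on every attribute, x is strictly closer
    to u's preferred value than y (assumed preference model)."""
    return all(abs(xi - ui) < abs(yi - ui) for xi, yi, ui in zip(x, y, u))

def reverse_skyline(q_id, users, facilities):
    """Users for which facility q_id is not dominated by any other facility."""
    q = facilities[q_id]
    return {
        uid for uid, u in users.items()
        if not any(dominates(f, q, u) for fid, f in facilities.items() if fid != q_id)
    }

# Hypothetical two-attribute data (e.g. price and location on a line).
facilities = {"f1": (1.0, 0.0), "f2": (2.0, 4.0)}
users = {"u1": (1.0, 3.0), "u2": (2.0, 1.0), "u3": (2.5, 5.0)}
print(reverse_skyline("f1", users, facilities))
print(reverse_skyline("f2", users, facilities))
```

Here f2 is closer than f1 to u3's preferred value on both attributes, so u3 drops out of f1's influence set, while neither facility dominates the other for u1 and u2.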

19 Existing work on Reverse Skylines
• Dellis et al., “Efficient computation of reverse skyline queries”, VLDB 2007
• Lian et al., “Reverse skyline search in uncertain databases”, TODS 2010
• Prasad et al., “Efficient reverse skyline retrieval with arbitrary non-metric similarity measures”, EDBT 2011
• Wang et al., “Energy-efficient reverse skyline queries processing over wireless sensor networks”, TKDE 2012
• Wu et al., “Finding the influence set through skylines”, EDBT 2009

21 Representative Objects
Given a set of facilities and a set of users, choose t representative facilities considering coverage and diversity
Coverage
• Let I(f) denote the influence set of a facility f.
• Given a set of facilities F, its coverage is a measure of the total number of distinct users that are influenced by the facilities in F
Related work
• Koh et al., “Finding k most favorite products based on reverse top-t queries”, VLDB J. 2014
• Gkorgkas et al., “Finding the most diverse products using preference queries”, EDBT 2015

22 Representative Objects
Diversity
• Let I(f) denote the influence set of a facility f.
• Dissimilarity between two facilities is defined based on the Jaccard similarity of their influence sets
• Diversity of a set of facilities F is the minimum of the pair-wise dissimilarities between the facilities in the set
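
Coverage and diversity as described on the last two slides can be sketched with plain set operations. Two points are assumptions, since the slides' formulas are not in the transcript: dissimilarity is taken to be 1 minus the Jaccard similarity, and coverage is returned as a raw count without normalisation. The function names and sample influence sets are hypothetical.

```python
from itertools import combinations

def coverage(selected, influence):
    """Number of distinct users influenced by at least one selected facility."""
    return len(set().union(*(influence[f] for f in selected)))

def dissimilarity(a, b):
    """1 - Jaccard similarity of two influence sets (assumed form)."""
    union = a | b
    if not union:
        return 0.0  # assumption: two empty influence sets count as identical
    return 1.0 - len(a & b) / len(union)

def diversity(selected, influence):
    """Minimum pairwise dissimilarity; 1.0 for fewer than two facilities (assumption)."""
    pairs = combinations(selected, 2)
    return min((dissimilarity(influence[a], influence[b]) for a, b in pairs), default=1.0)

# Hypothetical influence sets.
influence = {"f1": {"u1", "u2"}, "f2": {"u2", "u3"}, "f3": {"u4"}}
print(coverage(["f1", "f2"], influence))
print(diversity(["f1", "f2"], influence))
print(diversity(["f1", "f3"], influence))
```

Because diversity is a minimum over pairs, adding a facility can never increase it, whereas coverage can never decrease. This is the tension that the score has to balance.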

23 Representative Objects
Problem Definition
• The score of a set of facilities F combines its coverage and its diversity
• Given a set of facilities and a set of users, return a set of t facilities with maximum score.

25 Techniques
Challenges
• Problem is NP-hard
• Requires computing influence sets for many facilities
• Requires set intersection and union operations to compute diversity

26 Techniques
Phase 1: Compute influence sets
• Prune the facilities that cannot be among the representative facilities
• Compute influence sets of remaining facilities
Phase 2: Greedy Algorithm
• Iteratively select a facility f that maximizes the score of the current set
• Stop when t facilities have been selected

27 Techniques
Phase 1: Compute influence sets
• Prune the facilities that cannot be among the representative facilities
• Compute influence sets of remaining facilities
  1. Apply an existing reverse top-k algorithm for each remaining facility (RTK)
  2. Compute top-k facilities for each user and populate the influence sets of each facility
     a) Use a branch-and-bound top-k algorithm for each user (TK)
     b) Use a brute-force algorithm to compute top-k for each user (NBF)
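
Option 2 above (compute each user's top-k and populate the influence sets) can be sketched in its brute-force form. This is only an illustration under assumptions: the pruning step is omitted, the scoring function is passed in by the caller, lower scores are treated as better, and all names and data are hypothetical.

```python
import heapq

def influence_sets_via_top_k(users, facilities, k, score):
    """Compute every user's top-k facilities and add the user to the
    influence set of each of them. `score(user, facility)` is lower-is-better."""
    influence = {fid: set() for fid in facilities}
    for uid, user in users.items():
        top_k = heapq.nsmallest(k, facilities, key=lambda fid: score(user, facilities[fid]))
        for fid in top_k:
            influence[fid].add(uid)
    return influence

def weighted_sum(weights, attributes):
    """Hypothetical linear preference function over facility attributes."""
    return sum(w * a for w, a in zip(weights, attributes))

# Hypothetical data: users are weight vectors, facilities are attribute vectors.
users = {"u1": (0.9, 0.1), "u2": (0.1, 0.9)}
facilities = {"f1": (1.0, 5.0), "f2": (3.0, 1.0), "f3": (4.0, 4.0)}
influence = influence_sets_via_top_k(users, facilities, k=2, score=weighted_sum)
print(influence)
```

One pass over the users fills the influence sets of all facilities at once, in contrast to option 1, which issues a separate reverse top-k query per facility.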

28 Techniques
Phase 2: Greedy Algorithm
• Iteratively select a facility f that maximizes the score of the current set
• Stop when t facilities have been selected
• Selecting f requires computing set intersection and union operations
  1. Compute exact set operations (ESO)
  2. Compute approximate set intersection and union (MK)
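
The greedy phase with exact set operations can be sketched as follows. The score formula is not preserved in this transcript, so `example_score` is a hypothetical stand-in (an equally weighted sum of normalised coverage and diversity) and should not be read as the authors' definition. Dissimilarity is again assumed to be 1 minus the Jaccard similarity.

```python
from itertools import combinations

def coverage(selected, influence):
    """Number of distinct users influenced by the selected facilities."""
    return len(set().union(*(influence[f] for f in selected)))

def diversity(selected, influence):
    """Minimum pairwise (1 - Jaccard) dissimilarity; 1.0 for fewer than two facilities."""
    def dissim(a, b):
        union = a | b
        return 1.0 - len(a & b) / len(union) if union else 0.0
    pairs = combinations(selected, 2)
    return min((dissim(influence[a], influence[b]) for a, b in pairs), default=1.0)

def example_score(selected, influence, num_users, lam=0.5):
    """HYPOTHETICAL score, for illustration only."""
    return (lam * coverage(selected, influence) / num_users
            + (1 - lam) * diversity(selected, influence))

def greedy_select(influence, t, score):
    """Repeatedly add the facility that maximizes the score of the current set."""
    selected = []
    remaining = set(influence)
    while remaining and len(selected) < t:
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda f: score(selected + [f], influence))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical influence sets over four users.
influence = {"f1": {"u1", "u2", "u3"}, "f2": {"u1", "u2"}, "f3": {"u4"}}
chosen = greedy_select(influence, t=2, score=lambda s, inf: example_score(s, inf, num_users=4))
print(chosen)
```

In this example f1 is picked first for its coverage, and f3 is then preferred over f2 because it adds a new user and shares no users with f1.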

30-31 Experimental Results (result plots; figures not preserved in this transcript)

33 Summary
• We studied the problem of computing representative objects using influence sets based on reverse top-k queries
• We proposed a two-phase greedy algorithm with an approximation guarantee
• Experimental results demonstrate that the greedy algorithms produce high-quality results

34 Faculty of Information Technology Thanks

35 Future Work
• Compute representative objects where influence sets are computed based on RkNN queries or reverse skyline
• Develop efficient techniques to compute the influence sets of a large number of facilities in a batch
• Develop techniques to compute reverse top-k queries where distance between users and facilities is considered

