Information Technology Selecting Representative Objects Considering Coverage and Diversity Shenlu Wang 1, Muhammad Aamir Cheema 2, Ying Zhang 3, Xuemin.

Presentation transcript:

Selecting Representative Objects Considering Coverage and Diversity
Shenlu Wang (1), Muhammad Aamir Cheema (2), Ying Zhang (3), Xuemin Lin (1)
(1) The University of New South Wales, Australia
(2) Monash University, Australia
(3) The University of Technology, Australia

Faculty of Information Technology
Outline
• Influence Sets
• Reverse k Nearest Neighbors Queries
• Reverse Top-k Queries
• Reverse Skyline Queries
• Representative Objects using Influence Sets
• Techniques
• Experiment Results
• Summary

Influence Set
[Figure: diagram labelled "Influence" and "Influence Set"]

Influence Set
Important facility?
A facility f is important for a user u if it is one of the top-k facilities for u considering her preferences, e.g.,
• Distance
• Rating
• Price

Influence Set
Significance
• Important to identify potential users/customers
• Used in various applications such as marketing, cluster and outlier analysis, and decision support systems
Types
• Reverse k nearest neighbors
• Reverse top-k
• Reverse skyline

Reverse k Nearest Neighbors (RkNN)
Definition of importance
– A facility f is important to a user u if f is one of the k closest facilities to u
Reverse k Nearest Neighbors
– Find every user u for which the query facility q is important, i.e., q is one of the k closest facilities to u.
Example (K=1)
– Influence set of f1 is {u1, u2}
– Influence set of f2 is {u3}
[Figure: users u1, u2, u3 and facilities f1, f2]
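
The definition above can be checked directly by brute force. The sketch below is only an illustration and not one of the algorithms discussed in the talk: the function name `rknn_brute_force`, the 2-D coordinates and the use of Euclidean distance are assumptions made for the example.

```python
import math

def rknn_brute_force(q, facilities, users, k):
    """Return the names of users that have facility q among their k closest facilities."""
    result = []
    for name, u in users.items():
        d_q = math.dist(u, q)
        # Count the facilities that are strictly closer to u than q is.
        closer = sum(1 for f in facilities if f != q and math.dist(u, f) < d_q)
        if closer < k:
            result.append(name)
    return result

# Hypothetical coordinates, chosen so that the result mirrors the K=1 example.
f1, f2 = (0.0, 0.0), (10.0, 0.0)
users = {"u1": (1.0, 0.0), "u2": (2.0, 1.0), "u3": (9.0, 0.0)}
print(rknn_brute_force(f1, [f1, f2], users, k=1))  # ['u1', 'u2']
print(rknn_brute_force(f2, [f1, f2], users, k=1))  # ['u3']
```

This costs one distance computation per user-facility pair, which is the work the pruning techniques on the following slides try to avoid.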

RkNN Algorithms
Phases: Pruning, Verification
Pruning approaches
– Half-space: TPL (VLDB 2004), FINCH (VLDB 2008), InfZone (ICDE 2011)
– Region-based: Six-regions (SIGMOD 2000), SLICE (ICDE 2014)
Algorithms
• Six-regions (Stanoi et al., SIGMOD 2000)
• TPL (Tao et al., VLDB 2004)
• FINCH (Wu et al., VLDB 2008)
• Boost (Emrich et al., SIGMOD 2010)
• InfZone (Cheema et al., ICDE 2011)
• SLICE (Yang et al., ICDE 2014)

RkNN Algorithms
Regions-based Pruning: Six-regions [Stanoi et al., SIGMOD 2000]
1. Divide the whole space centred at the query q into six equal regions
2. Find the k-th nearest neighbor in each partition.
3. The k-th nearest facility of q in each region defines the area that can be pruned
The user points that cannot be pruned should be verified by range query
[Figure: k=2; query q, points a, b, c, d and users u1, u2]
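
The three steps above, followed by verification, can be sketched as below. This is a simplified illustration rather than the published algorithm: it scans all facilities instead of using a spatial index, the verification is a plain count standing in for the range query, and all function names and coordinates are hypothetical.

```python
import math

def region_of(q, p):
    """Index (0-5) of the 60-degree region around q that contains point p."""
    angle = math.atan2(p[1] - q[1], p[0] - q[0]) % (2 * math.pi)
    return min(int(angle // (math.pi / 3)), 5)

def six_regions_candidates(q, facilities, users, k):
    """Filter step: keep users no farther from q than the k-th nearest facility in their region."""
    dists = [[] for _ in range(6)]
    for f in facilities:
        if f != q:
            dists[region_of(q, f)].append(math.dist(q, f))
    # A region holding fewer than k facilities cannot prune anything.
    bounds = [sorted(d)[k - 1] if len(d) >= k else math.inf for d in dists]
    return [n for n, u in users.items() if math.dist(q, u) <= bounds[region_of(q, u)]]

def verify(q, facilities, u, k):
    """Verification step: fewer than k facilities are strictly closer to u than q is."""
    d_q = math.dist(u, q)
    return sum(1 for f in facilities if f != q and math.dist(u, f) < d_q) < k

# Hypothetical data: b is pruned, c survives pruning but fails verification.
q = (0.0, 0.0)
facilities = [q, (2.0, 0.0), (0.0, 3.0)]
users = {"a": (0.5, 0.1), "b": (5.0, 0.5), "c": (0.5, 2.0)}
candidates = six_regions_candidates(q, facilities, users, k=1)
print(candidates)                                                    # ['a', 'c']
print([n for n in candidates if verify(q, facilities, users[n], 1)])  # ['a']
```

Pruning is safe because two points in the same 60-degree region subtend an angle below 60 degrees at q, so a facility closer to q than the user is also closer to the user than q is.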

RkNN Algorithms
Half-space Pruning: the space that is contained by k half-spaces can be pruned
TPL [Tao et al., VLDB 2004]
1. Find the nearest facility f in the unpruned area.
2. Draw a bisector between q and f, prune by using the half-space
3. Go to step 1 unless all facilities in the unpruned area have been accessed
Checking which k half-spaces prune a point/node is expensive
TPL++ [Yang et al., PVLDB 2015]
[Figure: k=2; query q and points a, b, c, d, u]
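
The half-space test itself is a small piece of geometry. The sketch below shows only that test, assuming 2-D points; it checks every facility rather than following TPL's incremental nearest-facility loop, and the names `in_half_space` and `is_pruned` are hypothetical.

```python
def in_half_space(p, q, f):
    """True if p lies strictly on f's side of the perpendicular bisector between q and f."""
    mid = ((q[0] + f[0]) / 2, (q[1] + f[1]) / 2)
    # The sign of (p - mid) . (f - q) tells which side of the bisector p is on.
    return (p[0] - mid[0]) * (f[0] - q[0]) + (p[1] - mid[1]) * (f[1] - q[1]) > 0

def is_pruned(p, q, facilities, k):
    """A point is pruned when at least k half-spaces contain it."""
    return sum(in_half_space(p, q, f) for f in facilities if f != q) >= k

# Hypothetical data with k = 2.
q = (0.0, 0.0)
facilities = [(4.0, 0.0), (0.0, 4.0)]
print(is_pruned((3.0, 3.0), q, facilities, k=2))  # True: closer to both facilities than to q
print(is_pruned((3.0, 0.0), q, facilities, k=2))  # False: only one half-space contains it
```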

RkNN Algorithms
FINCH [Wu et al., VLDB 2008]
– Approximate the unpruned area by a convex polygon
[Figure: k=2; query q and points a, b, c, d]

RkNN Algorithms
InfZone [Cheema et al., ICDE 2011]
1. The influence zone corresponds to the unpruned area when the bisectors of all the facilities have been considered for pruning.
2. A user u is a RkNN of q if and only if u lies inside the influence zone
3. No verification phase.
[Figure: k=2; query q and points a, b, c, d]

RkNN Algorithms
SLICE [Yang et al., ICDE 2014]
1. Divide the whole space centred at the query q into t equal regions
2. Draw arcs for each facility
3. k-th arc in each partition defines the pruning region
Pruning requires checking only one distance
[Figure: k=2; query q and facilities f1, f2]

Influence Set based on Reverse Top-k
Definition of importance
– Each user u has a preference function
– A facility f is important to a user u if f is one of the top-k facilities for u
Reverse Top-k Query (RTk)
– Find every user u for which the query facility q is one of her top-k facilities.
Example (K=1)
– Influence set of f1 is {u2}
– Influence set of f2 is {u1, u3}
[Figure: users u1, u2, u3 and facilities f1, f2, labelled with prices (Price=1 and a second value lost in extraction) and per-user preference functions: (weight lost) *price + 0.1*distance, 0.5*price + 0.5*distance, 1*distance]
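
A brute-force reading of the reverse top-k definition is sketched below. It assumes linear preference functions over static facility attributes where a lower weighted sum is better; the attribute values, weights and the name `reverse_top_k` are hypothetical and do not reproduce the figure on the slide.

```python
def score(weights, attrs):
    """Weighted sum of attributes; lower is assumed to be better."""
    return sum(w * a for w, a in zip(weights, attrs))

def reverse_top_k(q, facilities, users, k):
    """Return the users for which facility q ranks among their top-k facilities."""
    result = []
    for name, weights in users.items():
        s_q = score(weights, facilities[q])
        # Count facilities that this user strictly prefers over q.
        better = sum(1 for f, attrs in facilities.items()
                     if f != q and score(weights, attrs) < s_q)
        if better < k:
            result.append(name)
    return result

# Hypothetical attributes (e.g. price and a second cost) and user weights.
facilities = {"f1": (1.0, 5.0), "f2": (2.0, 1.0)}
users = {"u1": (0.9, 0.1), "u2": (0.5, 0.5), "u3": (0.0, 1.0)}
print(reverse_top_k("f1", facilities, users, k=1))  # ['u1']
print(reverse_top_k("f2", facilities, users, k=1))  # ['u2', 'u3']
```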

Existing work on Reverse Top-k
• Vlachou et al., “Reverse top-k queries”, ICDE 2010
• Chester et al., “Indexing reverse top-k queries in two dimensions”, DASFAA 2013
• Cheema et al., “A Unified Framework for Efficiently Processing Ranking Related Queries”, EDBT 2014
• Vlachou et al., “Branch-and-bound algorithm for reverse top-k queries”, SIGMOD 2013
• Ge et al., “Efficient all top-k computation: A unified solution for all top-k, reverse top-k and top-m influential queries”, TKDE
• Vlachou et al., “Monitoring reverse top-k queries over mobile devices”, MobiDE 2011
• Yu et al., “Processing a large number of continuous preference top-k queries”, SIGMOD 2012

Influence Set based on Reverse Skyline
Dominance
• A facility x dominates another facility y w.r.t. a user u, if for every attribute, u prefers x over y
Definition of importance
• A facility f is important to a user u if f is not dominated by any other facility
Reverse Skyline
• Find every user u for which the query facility q is not dominated by any other facility.
Example
– Influence set of f1 is {u1, u2}
– Influence set of f2 is {u1, u2, u3}
[Figure: users u1, u2, u3 and facilities f1, f2, labelled Price=1 and Price=2]
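
The dominance test and the reverse skyline definition can be sketched as follows. One assumption is made loudly here: "u prefers x over y" on an attribute is modelled as x being strictly closer than y to the user's own ideal value on that attribute. The slide does not spell this out, and the data and function names are hypothetical.

```python
def dominates(x, y, u):
    """True if, on every attribute, x is strictly closer to user u's ideal value than y (assumed preference model)."""
    return all(abs(xi - ui) < abs(yi - ui) for xi, yi, ui in zip(x, y, u))

def reverse_skyline(q, facilities, users):
    """Return the users for which no other facility dominates q."""
    return [name for name, u in users.items()
            if not any(dominates(f, q, u) for f in facilities if f != q)]

# Hypothetical two-attribute facilities and user ideal points.
f1, f2 = (1.0, 1.0), (2.0, 5.0)
users = {"u1": (1.0, 2.0), "u2": (2.0, 4.0), "u3": (1.0, 5.0)}
print(reverse_skyline(f1, [f1, f2], users))  # ['u1', 'u3']
print(reverse_skyline(f2, [f1, f2], users))  # ['u2', 'u3']
```

Unlike the top-k based definitions, a user such as u3 can belong to the influence sets of both facilities, because neither facility dominates the other from that user's point of view.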

Existing work on Reverse Skylines
• Dellis et al., “Efficient computation of reverse skyline queries”, VLDB 2007
• Lian et al., “Reverse skyline search in uncertain databases”, TODS 2010
• Prasad et al., “Efficient reverse skyline retrieval with arbitrary non-metric similarity measures”, EDBT 2011
• Wang et al., “Energy-efficient reverse skyline queries processing over wireless sensor networks”, TKDE 2012
• Wu et al., “Finding the influence set through skylines”, EDBT 2009

Representative Objects
Given a set of facilities and a set of users, choose t representative facilities considering coverage and diversity
Coverage
• Let I(f) denote the influence set of a facility f.
• Given a set of facilities F, its coverage measures the total number of distinct users that are influenced by the facilities in F
Related work
• Koh et al., “Finding k most favorite products based on reverse top-t queries”, VLDB J
• Gkorgkas et al., “Finding the most diverse products using preference queries”, EDBT 2015
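
Coverage as described above is the size of the union of the influence sets. A minimal sketch, assuming influence sets are stored as Python sets keyed by facility name; whether the authors normalise the count (for example by the number of users) is not stated in the transcript, so the raw count is returned.

```python
def coverage(selected, influence):
    """Number of distinct users influenced by at least one facility in `selected`."""
    covered = set()
    for f in selected:
        covered |= influence[f]
    return len(covered)

# Hypothetical influence sets.
influence = {"f1": {"u1", "u2"}, "f2": {"u2", "u3"}, "f3": set()}
print(coverage(["f1", "f2"], influence))  # 3 (u2 is counted once)
print(coverage(["f1", "f3"], influence))  # 2
```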

Representative Objects
Diversity
• Let I(f) denote the influence set of a facility f.
• Dissimilarity between two facilities is defined based on the Jaccard similarity of their influence sets
• Diversity of a set of facilities F is the minimum of the pair-wise dissimilarities between the facilities in the set
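
A sketch of the diversity measure follows. The slide says only that dissimilarity is "based on" the Jaccard similarity, so taking it as 1 minus the Jaccard similarity is an assumption, as are the conventions chosen for empty sets and for selections with fewer than two facilities.

```python
from itertools import combinations

def jaccard_dissimilarity(a, b):
    """Assumed form: 1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def diversity(selected, influence):
    """Minimum pairwise dissimilarity; 1.0 by convention when there are no pairs."""
    pairs = list(combinations(selected, 2))
    if not pairs:
        return 1.0
    return min(jaccard_dissimilarity(influence[x], influence[y]) for x, y in pairs)

# Hypothetical influence sets.
influence = {"f1": {"u1", "u2"}, "f2": {"u2", "u3"}, "f3": {"u4"}}
print(round(diversity(["f1", "f2", "f3"], influence), 3))  # 0.667, set by the pair (f1, f2)
print(diversity(["f1", "f3"], influence))                  # 1.0, disjoint influence sets
```

Each pairwise term needs one set intersection and one union, which is why these set operations show up among the challenges later in the talk.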

Representative Objects
Problem Definition
• Score of a set of facilities F is [formula not preserved in the transcript]
• Given a set of facilities and a set of users, return a set of t facilities with maximum score.

Techniques
Challenges
• Problem is NP-Hard
• Requires computing influence sets for many facilities
• Requires set intersection and union operations to compute diversity

Techniques
Phase 1: Compute influence sets
• Prune the facilities that cannot be among the representative facilities
• Compute influence sets of remaining facilities
Phase 2: Greedy Algorithm
• Iteratively select a facility f that maximizes the score of current set
• Stop when t facilities have been selected
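
Phase 2 can be sketched as a generic greedy loop. Because the score formula is not preserved in this transcript, the sketch takes the score as a caller-supplied function; the stand-in used in the example (plain coverage) is an assumption for demonstration only and is not the authors' score. Phase 1 pruning is omitted, and all names are hypothetical.

```python
def greedy_select(influence, t, score):
    """Pick up to t facilities, each time adding the one that maximizes score(current set + f)."""
    selected = []
    remaining = set(influence)
    while remaining and len(selected) < t:
        # sorted() makes tie-breaking deterministic (smallest name wins).
        best = max(sorted(remaining), key=lambda f: score(selected + [f], influence))
        selected.append(best)
        remaining.remove(best)
    return selected

def coverage_only(selected, influence):
    """Stand-in score: number of distinct influenced users (assumption, see lead-in)."""
    return len(set().union(*(influence[f] for f in selected)))

# Hypothetical influence sets.
influence = {"f1": {1, 2, 3}, "f2": {3, 4}, "f3": {1, 2}}
print(greedy_select(influence, 2, coverage_only))  # ['f1', 'f2']
```

Each iteration evaluates the score once per remaining facility, so the cost of the set operations inside the score dominates the running time.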

Techniques
Phase 1: Compute influence sets
• Prune the facilities that cannot be among the representative facilities
• Compute influence sets of remaining facilities
1. Apply existing reverse top-k algorithm for each remaining facility (RTK)
2. Compute top-k facilities for each user and populate the influence sets of each facility
a) Use branch-and-bound top-k algorithm for each user (TK)
b) Use brute-force algorithm to compute top-k for each user (NBF)
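
The second option, computing each user's top-k and populating the influence sets from it, can be sketched in brute-force form as below. The sketch assumes linear preference functions where a lower weighted sum is better, omits the pruning step, and uses hypothetical names and data; it is not the authors' implementation.

```python
def influence_sets_via_top_k(facilities, users, k):
    """Build every facility's influence set from each user's brute-force top-k list."""
    influence = {f: set() for f in facilities}
    for u, weights in users.items():
        def cost(f):
            # Weighted sum of the facility's attributes for this user (lower is better).
            return sum(w * a for w, a in zip(weights, facilities[f]))
        # Rank all facilities for this user; ties are broken by facility name.
        for f in sorted(facilities, key=lambda f: (cost(f), f))[:k]:
            influence[f].add(u)
    return influence

# Hypothetical attributes and user weights.
facilities = {"f1": (1.0, 5.0), "f2": (2.0, 1.0)}
users = {"u1": (0.9, 0.1), "u2": (0.5, 0.5), "u3": (0.0, 1.0)}
result = influence_sets_via_top_k(facilities, users, k=1)
print(sorted(result["f1"]), sorted(result["f2"]))  # ['u1'] ['u2', 'u3']
```

One pass over the users fills the influence sets of all facilities at once, whereas the first option runs a separate reverse top-k query per facility.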

Techniques
Phase 2: Greedy Algorithm
• Iteratively select a facility f that maximizes the score of current set
• Stop when t facilities have been selected
• Selecting f requires computing set intersection and union operations
1. Compute exact set operations (ESO)
2. Compute approximate set intersection and union (MK)

Experimental Results

Summary
• We studied the problem of computing representative objects using influence sets based on reverse top-k queries
• We proposed a two-phase greedy algorithm with an approximation guarantee
• Experimental results demonstrate that the greedy algorithms produce high-quality results

Thanks

Future Work
• Compute representative objects where influence sets are computed based on RkNN queries or reverse skyline queries
• Develop efficient techniques to compute the influence sets of a large number of facilities in a batch
• Develop techniques to compute reverse top-k queries where the distance between users and facilities is considered