A Unified Framework for Efficiently Processing Ranking Related Queries

Presentation transcript:

A Unified Framework for Efficiently Processing Ranking Related Queries
Muhammad Aamir Cheema (1), Zhitao Shen (2), Xuemin Lin (2), Wenjie Zhang (2)
(1) Monash University, Australia; (2) University of New South Wales, Australia

Outline
  Dual mapping and ranking
  k-lower envelope and its application in ranking
  Our contributions
  Highlights of our algorithms
  Experimental results
  Conclusions and future work
Slide # 2

Dual mapping and ranking
Given a point a = (u, v) and a weighting vector W = (w1, w2), a.score = u*w1 + v*w2.
A point a = (u, v) is mapped to the line a*: y = ux + v in dual space.
The weighting vector W = (w1, w2) is mapped to the vertical line W*: x = w1/w2.
The intersection of a* and W* is the point where y = u(w1/w2) + v = (u*w1 + v*w2)/w2 = a.score/w2.
[Figure: primal space with points a and b; dual space with lines a* and b* crossing W*: x = w1/w2 at heights ya = a.score/w2 and yb = b.score/w2.]
Slide # 3
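The identity on this slide can be checked numerically. The following is a minimal sketch, assuming w2 > 0; the function names `score` and `dual_height` and the sample values are illustrative and do not come from the slides.

```python
def score(point, w):
    """Primal score of a point (u, v) under weights (w1, w2)."""
    u, v = point
    w1, w2 = w
    return u * w1 + v * w2


def dual_height(point, w):
    """Height at which the dual line y = u*x + v crosses W*: x = w1/w2.

    Assumes w2 > 0 so that the vertical line W* is well defined.
    """
    u, v = point
    w1, w2 = w
    x = w1 / w2
    return u * x + v


a = (2.0, 3.0)   # hypothetical point
w = (1.0, 4.0)   # hypothetical weighting vector
# The dual height equals the score divided by w2.
print(score(a, w), dual_height(a, w), score(a, w) / w[1])
```

Because every height is the score divided by the same positive w2, ordering the lines by height along W* gives the same order as ordering the points by score.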

Ranking in dual space
Example query: Given a weighting vector W = (w1, w2), return the k objects with the smallest scores.
Solution: Map W and all the objects to dual space, then return the k lowest lines intersecting W*.
[Figure: primal and dual views of objects a, b, c, d. Two vertical lines, W*: x = w1/w2 and W*: x = w3/w4, yield two different rankings: a, b, c, d and d, b, a, c.]
Slide # 4
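The solution on this slide can be sketched as a brute-force scan over all lines. This is only an illustration, assuming w2 > 0 and no ties; the function name `top_k_dual` and the sample objects are hypothetical.

```python
def top_k_dual(points, w, k):
    """Return the names of the k objects whose dual lines are lowest on W*.

    points maps a name to a primal point (u, v); w = (w1, w2) with w2 > 0.
    This is a plain sort over all lines, not an index-based method.
    """
    x = w[0] / w[1]  # position of the vertical line W*

    def height(name):
        u, v = points[name]
        return u * x + v  # height of the dual line y = u*x + v at W*

    return sorted(points, key=height)[:k]


# Hypothetical objects, chosen so that no two scores tie.
objects = {"a": (1.0, 5.0), "b": (2.0, 2.0), "c": (4.0, 1.0), "d": (5.0, 5.0)}
print(top_k_dual(objects, (1.0, 1.0), 2))
print(top_k_dual(objects, (1.0, 3.0), 2))
```

Changing the weighting vector only moves the vertical line W*, which is why the same set of dual lines can serve every linear scoring function.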

k-lower envelope
Given a set of lines L, the mass of a point p is the number of lines that lie strictly below p.
The k-lower envelope consists of every point p that lies on one of the lines in L and has mass equal to k-1.
[Figure: a 2-lower envelope with example points p and p'.]
Slide # 5
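The two definitions above translate directly into a membership test. The sketch below assumes lines are given as (slope, intercept) pairs and uses a small tolerance `eps` for floating-point comparisons; both choices and the function names are assumptions made for illustration.

```python
def mass(lines, p, eps=1e-9):
    """Number of lines y = u*x + v that lie strictly below the point p."""
    x, y = p
    return sum(1 for (u, v) in lines if u * x + v < y - eps)


def on_k_lower_envelope(lines, p, k, eps=1e-9):
    """True if p lies on some line in `lines` and has mass exactly k - 1."""
    x, y = p
    on_a_line = any(abs(u * x + v - y) <= eps for (u, v) in lines)
    return on_a_line and mass(lines, p, eps) == k - 1


# Hypothetical lines: y = 0, y = x, y = -x + 2.
L = [(0.0, 0.0), (1.0, 0.0), (-1.0, 2.0)]
print(mass(L, (0.5, 0.5)))                    # one line (y = 0) is strictly below
print(on_k_lower_envelope(L, (0.5, 0.5), 2))  # the point is on y = x with mass 1
```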

k-lower envelope and ranking
Top-k queries: Any top-k query involving any linear scoring function can be answered using the k-lower envelope.
[Figure: objects a, b, c, d in primal and dual space.]
Slide # 6

k-lower envelope and ranking
Reverse top-k query: Given an object q, return the set of weighting vectors for which q is one of the top-k objects.
Applications: Identify the users that may prefer the product q.
Solution: Compute the intersection between q* and the k-lower envelope.
[Figure: objects a, b, c, d and query q in primal and dual space, with W*: x = w1/w2.]
Slide # 7
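To make the reverse top-k query concrete, the sketch below finds the ranges of x = w1/w2 in which q is among the top-k objects. It is a brute-force stand-in that tests every interval between crossings of q* with the other lines; it does not compute the k-lower envelope and is not the algorithm proposed in the talk. The function name and the search range [x_lo, x_hi] are assumptions.

```python
def reverse_top_k_intervals(points, q, k, x_lo=0.0, x_hi=10.0):
    """Intervals of x = w1/w2 within [x_lo, x_hi] where q is in the top-k.

    points: primal points (u, v) of the other objects; q: the query point.
    q is in the top-k at x when fewer than k dual lines are strictly below q*.
    """
    uq, vq = q
    cuts = {x_lo, x_hi}
    for (u, v) in points:
        if u != uq:  # parallel lines never cross q*
            x = (vq - v) / (u - uq)  # solve u*x + v = uq*x + vq
            if x_lo < x < x_hi:
                cuts.add(x)
    cuts = sorted(cuts)

    result = []
    for left, right in zip(cuts, cuts[1:]):
        mid = (left + right) / 2.0  # the rank of q is constant inside an interval
        below = sum(1 for (u, v) in points if u * mid + v < uq * mid + vq)
        if below < k:
            if result and result[-1][1] == left:
                result[-1] = (result[-1][0], right)  # merge adjacent intervals
            else:
                result.append((left, right))
    return result


# Hypothetical data: dual lines y = x and y = -x + 2, query line y = 0.5.
print(reverse_top_k_intervals([(1.0, 0.0), (-1.0, 2.0)], (0.0, 0.5), k=1))
```

Each returned interval of x corresponds to the set of weighting vectors whose ratio w1/w2 falls inside it.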

k-lower envelope and ranking
k-snippet: Return all valuable objects, where an object o is called valuable if it is among the top-k objects for at least one scoring function.
Applications: A data summary such that every top-m (m ≤ k) query can be answered using this summary.
Solution: Return the objects that lie on or below the k-lower envelope.
[Figure: objects a, b, c, d, e, f in primal and dual space.]
Slide # 8

k-lower envelope and other applications
k-depth contour: Return an area such that an object o is valuable if and only if o is outside this area. Applications: ranking, outlier detection, reverse k furthest neighbors, and more.
Other applications: Voronoi diagrams, half-space range searching, and more …
Slide # 9

Our contributions
Existing algorithms to compute the k-lower envelope
  assume the data can fit in main memory
  are index-agnostic
We propose two efficient index-aware secondary-memory algorithms
  SkyRider: I/O and CPU efficient
  KnightRider: I/O optimal
As a result, we are able to compute
  k-snippet (I/O optimal)
  k-depth contour (I/O optimal when node size > k)
  reverse top-k queries (up to two orders of magnitude better than the state of the art)
Slide # 11

Rider: The Basic Idea
Start from the leftmost point on the k-lower envelope and always move towards the right.
Upon reaching an intersection, make a turn (i.e., leave the current road).
The path travelled is the k-lower envelope.
[Figure: objects a, b, c, d in primal and dual space.]
Slide # 12

Implementing Rider
Start from the leftmost point on the k-lower envelope and always move towards the right. The starting road is the line with the k-th largest slope, i.e., the point in primal space with the k-th largest x-value, because a point (u, v) in primal is mapped to the line y = ux + v.
Upon reaching an intersection, make a turn (i.e., leave the current road).
The path travelled is the k-lower envelope.
[Figure: objects a, b, c, d in primal and dual space.]
Slide # 13
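The walk described on the two Rider slides can be sketched in memory as follows. This is a simplified illustration, assuming the lines are in general position (no two parallel, no three meeting at one point) and scanning all lines at every step; the talk's algorithms are index-aware and work in secondary memory, which this sketch does not attempt. The function name `rider` and the sample lines are hypothetical.

```python
def rider(lines, k):
    """Walk the k-lower envelope of `lines` from left to right.

    lines: list of (slope, intercept) pairs in general position.
    Returns (path, vertices): the indices of the lines travelled, in order,
    and the (x, y) turning points between consecutive lines.
    """
    # At x -> -infinity the k-th lowest line is the one with the k-th largest slope.
    by_slope = sorted(range(len(lines)), key=lambda i: lines[i][0], reverse=True)
    cur, prev = by_slope[k - 1], None
    x = float("-inf")
    path, vertices = [cur], []

    while True:
        u, v = lines[cur]
        best = None  # nearest intersection to the right of the current position
        for j, (u2, v2) in enumerate(lines):
            if j == cur or j == prev or u2 == u:
                continue
            xi = (v2 - v) / (u - u2)  # solve u*x + v = u2*x + v2
            if xi > x and (best is None or xi < best[0]):
                best = (xi, j)
        if best is None:
            return path, vertices  # no more intersections: the walk is complete
        x = best[0]
        vertices.append((x, u * x + v))
        prev, cur = cur, best[1]  # make a turn onto the crossing line
        path.append(cur)


# Hypothetical lines: y = x, y = -x, y = -2.
roads = [(1.0, 0.0), (-1.0, 0.0), (0.0, -2.0)]
print(rider(roads, 1))
print(rider(roads, 2))
```

Turning at every intersection is correct under the general-position assumption because the crossing line is either rising from rank k-1 or falling from rank k+1, so it takes over rank k right after the crossing.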

SkyRider: An I/O efficient version of Rider
Main observation: Only the points in primal space that are among the k-skyband points are required to compute the k-lower envelope.
Algorithm:
  Compute the k-skyband using BBS
  Run Rider on the k-skyband
Slide # 14
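For readers unfamiliar with the term, the k-skyband is the set of points dominated by fewer than k other points. The sketch below computes it with a quadratic brute-force scan, which is swapped in here for illustration in place of BBS. It assumes smaller attribute values are better, matching the smallest-score queries earlier in the talk; the function names are hypothetical.

```python
def dominates(p, q):
    """p dominates q if p is no worse in every attribute and better in one.

    Assumes smaller values are better.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def k_skyband(points, k):
    """Points dominated by fewer than k other points (brute force, O(n^2))."""
    return [
        p for p in points
        if sum(1 for other in points if dominates(other, p)) < k
    ]


# Hypothetical data set.
data = [(1, 1), (2, 2), (3, 3), (1, 3), (3, 1)]
print(k_skyband(data, 1))  # the ordinary skyline
print(k_skyband(data, 2))
```

Under these assumptions, a point dominated by k or more others can never be among the k best for a scoring function with non-negative weights, which is the intuition behind pruning it before running Rider.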

KnightRider: An I/O optimal algorithm
Must-first paradigm: An entry is called a must entry if correctness cannot be guaranteed without accessing it.
Algorithm:
  Insert the root node of the R-tree in Q
  While Q is not empty
    Access the entries in Q
    Compute two approximations of the k-lower envelope using the accessed entries
    Q ← the unaccessed must entries
  Return the k-lower envelope
Slide # 15

Experiments: Data
Real data: 5 million POIs on the road network of California. Each POI has two attributes: distance to the nearest beach and distance to the nearest airport.
Synthetic data
Slide # 16

Experiments: Competitors
BELT [H. Edelsbrunner and E. Welzl, “Constructing belts in two dimensional arrangements with applications,” SIAM J. Comput., 1986]
FDC [T. Johnson, I. Kwok, and R. T. Ng, “Fast computation of 2-dimensional depth contours,” in KDD, 1998]
FDC-Index (same as FDC but uses an index for computing the convex hull)
Slide # 17

Experiments: Results Effect of data size Slide # 18

Experiments: Results Effect of k Slide # 19

Experiments: Results Effect of data distribution Slide # 20

Experiments: Results
Reverse top-k queries
MRTopK [A. Vlachou, C. Doulkeridis, Y. Kotidis, and K. Nørvåg, “Reverse top-k queries,” in ICDE, 2010]
Slide # 21

Conclusions and Future Work
Contributions
  First to study index-aware algorithms for the k-lower envelope with applications in ranking related queries
  Propose two efficient algorithms, SkyRider and KnightRider
  Proof of I/O optimality
  Algorithms can be extended to higher dimensionality
Future work
  Propose approximate but efficient algorithms for higher dimensionality
Slide # 22

Presented by Muhammad Aamir Cheema
aamir.cheema@monash.edu
http://users.monash.edu.au/~aamirc
Twitter handle: @cheema154
Slide # 23