Abolfazl Asudeh, Azade Nazi, Nan Zhang, Gautam Das

Presentation transcript:

Efficient Computation of Regret-ratio Minimizing Set: A Compact Maxima Representative. Abolfazl Asudeh, Azade Nazi, Nan Zhang, Gautam Das. University of Texas at Arlington; George Washington University. SIGMOD'17 © 2017 ACM. ISBN 978-1-4503-4197-4/17/05

Outline: Motivation and problem statement; 2D-RRMS (Two-Dimensional Regret-Ratio Minimizing Set); HD-RRMS (Higher-Dimensional Regret-Ratio Minimizing Set); Experiments

Maxima Queries: … to give the best trade-off between price, duration, number of stops, … Ranking function: $f = \sum_i w_i A_i$
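A minimal sketch of a maxima query under a linear ranking function, added for illustration. It assumes every attribute is normalized so that larger values are better; the names `score` and `maxima_query` and the sample tuples are hypothetical and do not come from the slides.

```python
def score(weights, tup):
    """Linear ranking function f = sum_i w_i * A_i."""
    return sum(w * a for w, a in zip(weights, tup))


def maxima_query(weights, tuples):
    """Return the tuple with the highest score under the given weights."""
    return max(tuples, key=lambda t: score(weights, t))


# Hypothetical two-attribute data; larger is assumed to be better.
data = [(1, 5), (4, 4), (6, 1)]
top_balanced = maxima_query((1, 1), data)   # equal weight on both attributes
top_first = maxima_query((1, 0), data)      # only the first attribute matters
```

Different weight vectors pick different tuples, which is why a compact representative has to serve every possible function rather than a single one.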

Example: a tuple $t_i$ and its score $f(t_i)$ under the function $f = x + y$

Example: Convex hull (sky convex)

Example: A subset of the skyline (the skyline is the set of non-dominated points)
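The notion of non-dominated points can be sketched as follows. This is a quadratic-time illustration only, assuming larger attribute values are better; the function names and sample points are hypothetical.

```python
def dominates(a, b):
    """a dominates b if a is at least as good everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(points):
    """Return the non-dominated points, preserving input order (O(n^2) sketch)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


points = [(1, 5), (2, 4.5), (4, 4), (3, 3), (6, 1)]
sky = skyline(points)
```

Here (3, 3) is dominated by (4, 4) and drops out. The point (2, 4.5) stays on the skyline even though it lies below the segment joining (1, 5) and (4, 4), so no linear function with non-negative weights ranks it first: it is a skyline point that is not on the convex hull.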

Example: Convex hull (sky convex), shown with a function $f$

Convex hull size problem: curvature effect

Convex hull size problem: effect of the number of attributes ($m$)

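Regret-Ratio Minimizing Set. The regret-ratio of a subset for a function $f$ is $\frac{f(t) - f(t')}{f(t)}$, where $t$ is the top tuple of the full dataset under $f$ and $t'$ is the top tuple of the subset. Problem: find a subset of size at most $r$ that minimizes the maximum regret-ratio over all functions $f$.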
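The regret-ratio can be sketched in a few lines. This sketch evaluates the maximum over a finite list of sample weight vectors, which only approximates the maximum over all functions; it also assumes positive scores and larger-is-better attributes. All names and data are hypothetical.

```python
def score(weights, tup):
    """Linear ranking function f = sum_i w_i * A_i."""
    return sum(w * a for w, a in zip(weights, tup))


def regret_ratio(weights, data, subset):
    """(f(t) - f(t')) / f(t): t is the best tuple in data, t' the best in subset."""
    best_all = max(score(weights, t) for t in data)
    best_sub = max(score(weights, t) for t in subset)
    return (best_all - best_sub) / best_all


def max_regret_ratio(functions, data, subset):
    """Worst regret-ratio over a finite sample of functions (an approximation)."""
    return max(regret_ratio(w, data, subset) for w in functions)


data = [(1, 5), (4, 4), (6, 1)]
sample_functions = [(1, 0), (0, 1), (1, 1)]
worst = max_regret_ratio(sample_functions, data, [(4, 4)])
```

Keeping only (4, 4) costs nothing for the balanced function (1, 1) but loses one third of the best score when only the first attribute counts.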

Overview of the literature, Our contributions. The regret-ratio notion and the problem were first proposed in [Nanongkai et al. VLDB 2010]. In two-dimensional data: [Chester et al. VLDB 2014] give a sweeping-line algorithm in $O(r \cdot n^2)$; we give a dynamic programming algorithm in $O(r \cdot s \cdot \log s \cdot \log c) < O(r \cdot n \cdot (\log n)^2)$, where $s$ is the skyline size and $c$ is the convex hull size. In higher-dimensional data: the problem is NP-complete, shown for arbitrary dimensions in [Chester et al. VLDB 2014] and recently for fixed dimensions in [W. Cao et al. ICDT 2017] and [P. K. Agarwal et al. arXiv:1702.01446, 2017]. Existing work: (a) a greedy heuristic with unproven theoretical guarantee, (b) a simple attribute space discretization with a fixed upper bound on the regret-ratio of the output [Nanongkai et al. VLDB 2010]. We: a linearithmic-time approximation algorithm that guarantees a regret ratio within any arbitrarily small user-controllable distance from the optimal regret ratio. Assumption: fixed number of dimensions.

Outline: Motivation and problem statement; 2D-RRMS (Two-Dimensional Regret-Ratio Minimizing Set); HD-RRMS (Higher-Dimensional Regret-Ratio Minimizing Set); Experiments

High-level idea (figure: skyline points $t_0, t_1, \dots, t_7$). Order the skyline points from top-left to bottom-right, add two dummy points $t_0$ and $t_{s+1}$, and construct a complete weighted graph on these points. The weight of an edge is the maximum regret ratio of removing all the points in its top-right half-space; use binary search to compute it. Apply dynamic programming, where $DP(t_i, r')$ is the optimal solution from $t_i$ to $t_{s+1}$ with at most $r'$ intermediate steps. Running time: $O(r \cdot s \cdot \log s \cdot \log c)$.
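The dynamic programming step can be sketched as a bottleneck-path search on the ordered points. This is not the authors' implementation: it assumes the edge weights are already available in a matrix `weight` (how they are computed is left out), and it runs in roughly O(r·s²) rather than the bound stated on the slide. The function name and the toy weights are hypothetical.

```python
from functools import lru_cache


def min_max_path(weight, s, r):
    """Pick at most r intermediate nodes among 1..s on a path from node 0 to
    node s+1 so that the largest edge weight on the path is minimized.

    weight[i][j] (for i < j) is assumed to hold the maximum regret ratio
    caused by skipping every point strictly between i and j.
    """
    end = s + 1

    @lru_cache(maxsize=None)
    def dp(i, budget):
        # Option 1: jump straight from node i to the end dummy node.
        best = (weight[i][end], ())
        # Option 2: keep some node j next, then solve the rest recursively.
        if budget > 0:
            for j in range(i + 1, end):
                sub_cost, sub_path = dp(j, budget - 1)
                cand = max(weight[i][j], sub_cost)
                if cand < best[0]:
                    best = (cand, (j,) + sub_path)
        return best

    cost, path = dp(0, r)
    return cost, list(path)


# Toy weights: skipping k points costs k (purely illustrative numbers).
toy = [[max(j - i - 1, 0) for j in range(5)] for i in range(5)]
result = min_max_path(toy, 3, 1)
```

With three real points and a budget of one, the sketch keeps the middle point, since that balances the number of points skipped on either side.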

Outline: Motivation and problem statement; 2D-RRMS (Two-Dimensional Regret-Ratio Minimizing Set); HD-RRMS (Higher-Dimensional Regret-Ratio Minimizing Set); Experiments

Steps (RRMS, DMM, MRST): Start with a conceptual model. Discuss its problems. Propose the idea of function space discretization. Transform RRMS to a min-max problem. Define the intermediate problem "Min Rows Satisfying a Threshold" (MRST). Transform MRST to a fixed-size instance of the set-cover problem.

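Conceptual Model: a matrix with one row per tuple $t_1, t_2, \dots, t_s$ and one column per function $f$ in $F$ (all possible functions). Each cell holds the regret-ratio on $f$ if only that tuple remains; for example, the cell for $t_2$ and $f$ is the regret-ratio on $f$ if only $t_2$ remains. Transform the problem to a min-max problem over this matrix. Problem 1: $F$ is continuous, so there is an infinite number of columns. Fix: matrix discretization. Problem 2: even if we could construct the matrix, it takes $n^r$ to solve it. Fix: transform to fixed-size set-cover instances.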

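Matrix Discretization: a function $f$ is described by its angles $\theta_1, \theta_2, \dots$, discretized with step $\alpha = \frac{\pi}{2\gamma}$. This gives an arbitrarily small, user-controllable distance from the optimal solution.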
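One way to picture the discretization is to enumerate weight vectors on an angular grid with step α = π/(2γ). The sketch below is an assumption-laden illustration, not the paper's construction: it uses standard hyperspherical coordinates, includes both grid endpoints, and the name `discretized_functions` is hypothetical.

```python
import itertools
import math


def discretized_functions(m, gamma):
    """Enumerate unit weight vectors in m dimensions on an angular grid.

    Each of the m - 1 angles ranges over {0, alpha, 2*alpha, ..., pi/2}
    with alpha = pi / (2 * gamma), giving (gamma + 1) ** (m - 1) vectors.
    """
    alpha = math.pi / (2 * gamma)
    grid = [k * alpha for k in range(gamma + 1)]
    functions = []
    for angles in itertools.product(grid, repeat=m - 1):
        weights = []
        sin_prod = 1.0
        for theta in angles:
            # Hyperspherical coordinates: peel off one cosine per angle.
            weights.append(sin_prod * math.cos(theta))
            sin_prod *= math.sin(theta)
        weights.append(sin_prod)
        functions.append(tuple(weights))
    return functions


grid_2d = discretized_functions(2, 2)   # angles 0, pi/4, pi/2
```

The number of columns depends only on the number of attributes and on γ, not on the number of tuples, which is what makes the later set-cover instances fixed-size.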

DMM: Discretized Min Max Problem. The matrix $M$ has one row per tuple $t_1, t_2, \dots, t_s$ and one column per function $f$ in the discretized function space $F$, in place of all possible functions. Observation: the optimal regret-ratio is one of the cell values! Order the values in $M$ and do a binary search over them. For each value, define an intermediate problem: Min. rows satisfying the threshold (MRST). Convert $M$ to a (fixed-size) binary matrix: a cell is 1 if the regret-ratio of $t$ for $f$ is at most the threshold, 0 otherwise. Convert MRST to a (fixed-size) set-cover instance. For fixed values of $m$ and $\gamma$, it can be solved in constant time, so the running time of HD-RRMS is $O(n \log n)$. Practical HD-RRMS: use the greedy approximation algorithm for solving the set-cover instances. Either accept a result if its size is at most $r \cdot m \cdot \log(\gamma)$: index size increases, no change in quality of output. Or accept the result if its size is at most $r$: index size does not change, output quality may increase.
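The binary search over cell values combined with a greedy set cover can be sketched as below. This follows the "accept the result if size is at most r" variant and is only an illustration: the matrix is assumed to be given, greedy set cover is not guaranteed to be optimal (so the search is a heuristic), and the function names and sample matrix are hypothetical.

```python
def greedy_cover(binary):
    """Greedy set cover: rows are sets, columns are elements to cover.

    Returns the chosen row indices, or None if some column cannot be covered.
    """
    uncovered = set(range(len(binary[0])))
    chosen = []
    while uncovered:
        best = max(range(len(binary)),
                   key=lambda i: sum(binary[i][j] for j in uncovered))
        gained = {j for j in uncovered if binary[best][j]}
        if not gained:
            return None
        chosen.append(best)
        uncovered -= gained
    return chosen


def dmm_greedy(M, r):
    """Binary-search the sorted cell values of M for the smallest threshold
    at which greedy cover needs at most r rows. Returns (threshold, rows)."""
    values = sorted({v for row in M for v in row})
    lo, hi, best = 0, len(values) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        eps = values[mid]
        # Cell is 1 if the row's regret-ratio for that column is <= eps.
        binary = [[1 if v <= eps else 0 for v in row] for row in M]
        rows = greedy_cover(binary)
        if rows is not None and len(rows) <= r:
            best, hi = (eps, rows), mid - 1
        else:
            lo = mid + 1
    return best


# Hypothetical regret-ratio matrix: 3 tuples (rows) x 3 functions (columns).
M = [[0.0, 0.5, 0.9],
     [0.3, 0.0, 0.4],
     [0.8, 0.6, 0.0]]
one_row = dmm_greedy(M, 1)
two_rows = dmm_greedy(M, 2)
```

On this small matrix the sketch keeps row 1 alone at threshold 0.4, and rows 1 and 2 together at threshold 0.3.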

Outline: Motivation and problem statement; 2D-RRMS (Two-Dimensional Regret-Ratio Minimizing Set); HD-RRMS (Higher-Dimensional Regret-Ratio Minimizing Set); Experiments

Setup. Synthetic data: three datasets (correlated, independent, and anti-correlated), 10M tuples over 10 ordinal attributes. Real-world datasets: Airline dataset, 5.8M records over two ordinal attributes; US Department of Transportation (DOT) dataset, 457K records over 7 ordinal attributes; NBA dataset, 21K tuples over 17 ordinal attributes.

2D-RRMS results (figure panels: NBA dataset, Airline dataset)

HD-RRMS results (figure panels: DOT dataset, NBA dataset)

Thank You!