VLDB’2007 review Denis Mindolin. VLDB’07 program.

Outline
- Probabilistic Skylines on Uncertain Data, Jian Pei et al.
- Lazy Maintenance of Materialized Views, Jingren Zhou et al.

Probabilistic Skylines on Uncertain Data Based on the VLDB’07 paper of Jian Pei et al

Skyline. General picture
For a dataset D = {p_1, ..., p_n}, the skyline S is the set of all p_i such that no other p_j dominates p_i.
p_i dominates p_j if p_i is
- better than p_j in at least one dimension, and
- not worse than p_j in all other dimensions.
Single game results: S = {Eddie, Carl}
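The dominance test and the skyline definition can be sketched by brute force. The point names, the coordinates, and the convention that smaller values are better are illustrative assumptions; they are not the game data from the slide's figure.

```python
def dominates(p, q):
    """True if p is not worse than q in every dimension and better in at least one.

    Assumption: smaller values are better in every dimension.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Return the names of all points that no other point dominates."""
    return {
        name
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    }


# Hypothetical two-dimensional data set.
points = {"a": (1, 4), "b": (2, 2), "c": (3, 3), "d": (4, 1)}
print(sorted(skyline(points)))  # "c" is dominated by "b"
```

This pairwise comparison costs O(n^2) dominance tests and is meant only to make the definition concrete.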

Uncertain data
Multiple game results: S = ?
Use some aggregate function?
- Can’t capture the distribution!
- Can be biased by outliers!

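Probabilistic dominance relation
Uncertain data
- An uncertain object is a set of instances U = {u_1, ..., u_l}
- Uncertain objects are independent
- All instances of an object are equally probable: Pr(u_i) = Pr(u_j)
Probabilistic dominance relation
- Given two uncertain objects U = {u_1, ..., u_l1} and V = {v_1, ..., v_l2}, the probability that V dominates U is the fraction of instance pairs (v_j, u_i) in which v_j dominates u_i:
  Pr[V dominates U] = (1 / (l1 · l2)) · Σ_{i=1..l1} |{v_j ∈ V : v_j dominates u_i}|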

Probabilistic dominance relation. Example Smaller values of X and Y are better
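Since the slide's figure is not reproduced here, the following sketch works through a small hypothetical case. It assumes, as the slides state, that objects are independent, that all instances of an object are equally probable, and that smaller values of X and Y are better. Under those assumptions the probability that V dominates U is the fraction of instance pairs in which the instance of V dominates the instance of U. The instance coordinates are invented for illustration.

```python
def dominates(p, q):
    """True if p dominates q (smaller values are better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def prob_dominates(V, U):
    """Probability that object V dominates object U.

    Counts the (v, u) instance pairs where v dominates u and divides by the
    number of pairs, which is valid when instances are equally probable and
    the two objects are independent.
    """
    dominating_pairs = sum(1 for u in U for v in V if dominates(v, u))
    return dominating_pairs / (len(U) * len(V))


# Hypothetical instances (X, Y) of two uncertain objects.
U = [(3, 3), (1, 5)]
V = [(2, 2), (4, 4)]

print(prob_dominates(V, U))  # only (2, 2) vs (3, 3) counts: 1 of 4 pairs
print(prob_dominates(U, V))  # only (3, 3) vs (4, 4) counts: 1 of 4 pairs
```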

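p-Skyline
Let U = {u_1, ..., u_l}.
- For each instance u ∈ U, the skyline probability of u is the probability that u is not dominated by any instance of any other object:
  Pr(u) = Π_{V ≠ U} (1 − |{v ∈ V : v dominates u}| / |V|)
- Skyline probability of U: Pr(U) = (1 / l) · Σ_{u ∈ U} Pr(u)
- p-Skyline: the set of uncertain objects U with Pr(U) ≥ p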
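A brute-force sketch of these definitions follows. It assumes independent objects with equally probable instances, takes the skyline probability of an object to be the average over its instances, and treats an object as a member of the p-skyline when that probability is at least p. The object names and coordinates are hypothetical, and smaller values are assumed to be better.

```python
from math import prod


def dominates(p, q):
    """True if p dominates q (smaller values are better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def instance_prob(u, others):
    """Probability that instance u is not dominated by any other object."""
    return prod(1 - sum(dominates(v, u) for v in V) / len(V) for V in others)


def skyline_prob(name, objects):
    """Skyline probability of an object: the average over its instances."""
    U = objects[name]
    others = [V for other, V in objects.items() if other != name]
    return sum(instance_prob(u, others) for u in U) / len(U)


def p_skyline(objects, p):
    """Names of all objects whose skyline probability is at least p."""
    return {name for name in objects if skyline_prob(name, objects) >= p}


# Hypothetical uncertain objects, each a list of equally likely instances.
objects = {
    "U": [(3, 3), (1, 5)],
    "V": [(2, 2), (4, 4)],
    "W": [(5, 5), (6, 6)],
}
for name in objects:
    print(name, skyline_prob(name, objects))
print(sorted(p_skyline(objects, 0.5)))
```

Here every instance of W is dominated by both instances of V, so W has skyline probability 0 and drops out for any positive threshold.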

The bottom-up skyline algorithm
- Bounding: compute upper and lower bounds of the skyline probability of each object.
- Pruning: if the lower bound of Pr(U) is larger than p, then U is in the p-skyline. If the upper bound of Pr(U) is smaller than p, then U is not in the p-skyline.
- Refining: if p lies between the lower and upper bounds, tighter bounds on the skyline probability are computed in the next iteration of the algorithm.
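The control flow of bounding, pruning, and refining can be sketched as a loop over successively tighter bounds. How the bounds are tightened is not shown: the sequence of (lower, upper) pairs is a hypothetical stand-in for the algorithm's refinement step, and the function name `decide` is invented for this illustration.

```python
def decide(bounds_per_round, p):
    """Classify one object from successively tighter bounds on Pr(U).

    bounds_per_round: iterable of (lower, upper) pairs, one per iteration.
    Returns (verdict, rounds_used).
    """
    rounds = 0
    for lower, upper in bounds_per_round:
        rounds += 1
        # The slide says "larger than p"; >= is used here on the assumption
        # that membership in the p-skyline means Pr(U) >= p.
        if lower >= p:
            return "in p-skyline", rounds
        if upper < p:
            return "not in p-skyline", rounds
        # Otherwise p lies between the bounds: refine in the next round.
    return "undecided", rounds


print(decide([(0.2, 0.9), (0.4, 0.7), (0.55, 0.6)], p=0.5))
print(decide([(0.2, 0.9), (0.3, 0.45)], p=0.5))
```

The point of the scheme is that many objects are decided after the first cheap round, so exact skyline probabilities are needed only for the few whose bounds straddle p.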

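Bounding
For U = {u_1, ..., u_l} over dimensions D_1, ..., D_n, define two corner points:
- U_min = (min_{i=1..l} u_i.D_1, ..., min_{i=1..l} u_i.D_n)
- U_max = (max_{i=1..l} u_i.D_1, ..., max_{i=1..l} u_i.D_n)
Lemma
- If u_i1 < u_i2 (u_i1 dominates u_i2), then Pr(u_i1) ≥ Pr(u_i2)
- Pr(U_min) ≥ Pr(U) ≥ Pr(U_max)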

Pruning
Rule 1. For an uncertain object U and probability threshold p:
- if Pr(U_min) < p, then U is not in the p-skyline;
- if Pr(U_max) ≥ p, then U is in the p-skyline.
Rule 2. For each instance u ∈ U, let Pr+(u) and Pr-(u) be the upper and lower bounds of Pr(u):
- if (1 / |U|) · Σ_{u ∈ U} Pr+(u) < p, then U is not in the p-skyline;
- if (1 / |U|) · Σ_{u ∈ U} Pr-(u) ≥ p, then U is in the p-skyline.
Rule 3. Let U and V be two different uncertain objects. If u ∈ U and V_max < u (V_max dominates u), then Pr(u) = 0.

Pruning
Rule 4. Let U and V be two uncertain objects and let U' ⊆ U be a subset of instances of U such that U'_max dominates V_min. If |U'| / |U| > 1 − p, then Pr(V) < p and thus V is not in the p-skyline.
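Rule 1 can be sketched with the two corner points of an object. The sketch assumes that U_min and U_max take the per-dimension minimum and maximum over the instances of U, that the skyline probability of a point is the probability that no instance of any other object dominates it, and that smaller values are better. The function names and the data are hypothetical.

```python
from math import prod


def dominates(p, q):
    """True if p dominates q (smaller values are better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def corner(U, pick):
    """Per-dimension min (pick=min) or max (pick=max) over the instances of U."""
    return tuple(pick(coords) for coords in zip(*U))


def point_prob(u, others):
    """Probability that point u is not dominated by any other object."""
    return prod(1 - sum(dominates(v, u) for v in V) / len(V) for V in others)


def rule1(U, others, p):
    """Apply pruning Rule 1; return 'in', 'out', or 'undecided'."""
    if point_prob(corner(U, min), others) < p:
        return "out"        # even the best corner falls below the threshold
    if point_prob(corner(U, max), others) >= p:
        return "in"         # even the worst corner reaches the threshold
    return "undecided"      # other rules or refinement are needed


U = [(3, 3), (1, 5)]
V = [(2, 2), (4, 4)]
W = [(5, 5), (6, 6)]
print(rule1(U, [V, W], p=0.5))
print(rule1(U, [V, W], p=0.8))
print(rule1(W, [U, V], p=0.5))
```

Only two probability computations per object are needed here, compared with one per instance for the exact skyline probability.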

Refinement Partition instances into layers

Algorithm summary
Complexity: O(W_total * R)
- W_total: number of instances whose skyline probabilities are computed by the algorithm
- R: average cost of querying the local R-tree of possible dominating objects
- W_total is much smaller than the total number of instances
Top-down algorithm: see the paper

Lazy Maintenance of Materialized Views Based on the VLDB’07 paper of Jingren Zhou et al

Eager and Deferred Materialized View Maintenance
(Diagram: view V over base tables T1 and T2.)
Eager:
- User tran: {upd(T1), upd(T2)}
- Executed: {upd(T1), upd(T2), recomp(V)}
Deferred:
- User tran: {upd(T1), upd(T2)}
- Executed: {upd(T1), upd(T2)}
  …
- User tran: {recomp(V)}
  …
- User tran: {Q(V)}
- Executed: {Q(V)}

Lazy Materialized View Maintenance
(Diagram: view V over base tables T1 and T2.)
Lazy:
- User tran: {upd(T1), upd(T2)}
- Executed: {upd(T1), upd(T2)}
  …
- Executed: {recomp(V)}
  …
- User tran: {Q(V)}
- Executed: {Q(V)}
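The three policies differ only in who triggers recomp(V) and when. The toy trace generator below illustrates this; it is not the system's implementation. It models lazy maintenance as a system-initiated recomp(V) that runs no later than the next query on V, which is an assumption about the "…" gaps on the slide. The function name and the "[stale]" marker are invented for illustration.

```python
def executed_ops(policy, user_trans):
    """Return the operations the system executes for a list of user transactions.

    policy: "eager", "deferred", or "lazy".
    user_trans: list of transactions, each a list of operation strings.
    """
    out = []
    stale = False  # True while V does not reflect the latest base-table updates
    for tran in user_trans:
        for op in tran:
            if op.startswith("upd"):
                out.append(op)
                stale = True
            elif op == "recomp(V)":      # explicit refresh issued by the user
                out.append(op)
                stale = False
            elif op == "Q(V)":
                if policy == "lazy" and stale:
                    out.append("recomp(V)")  # system-initiated, outside the user tran
                    stale = False
                out.append(op + " [stale]" if stale else op)
        if policy == "eager" and stale:
            out.append("recomp(V)")      # part of the updating transaction itself
            stale = False
    return out


updates, query = ["upd(T1)", "upd(T2)"], ["Q(V)"]
print(executed_ops("eager", [updates, query]))
print(executed_ops("deferred", [updates, ["recomp(V)"], query]))
print(executed_ops("deferred", [updates, query]))
print(executed_ops("lazy", [updates, query]))
```

In the trace, eager charges the refresh to the updating transaction, deferred depends on the user issuing recomp(V), and lazy keeps updates cheap while the query still sees a fresh view.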

System architecture Based on MS SQL Server 2005

How it works

Delta tables
Table_1: {(transID_i, stmtID_i, rowID_i, action_i)}
…
Table_n: {(transID_i, stmtID_i, rowID_i, action_i)}
- transID: transaction id
- stmtID: statement id
- rowID: updated row id
- action = (ins|del)
- All “update” actions are converted into pairs of del/ins actions
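The delta-table layout and the update-to-del/ins conversion can be sketched as follows. The field names mirror the slide's transID, stmtID, rowID, and action; the `DeltaRow` type and the `log_change` helper are hypothetical names for this illustration and are not SQL Server APIs.

```python
from typing import NamedTuple


class DeltaRow(NamedTuple):
    trans_id: int   # transaction id
    stmt_id: int    # statement id within the transaction
    row_id: int     # id of the affected base-table row
    action: str     # "ins" or "del"


def log_change(trans_id, stmt_id, kind, row_id):
    """Translate one row-level change into delta rows.

    An update is recorded as a del of the old row followed by an ins of the new one.
    """
    actions = {"insert": ["ins"], "delete": ["del"], "update": ["del", "ins"]}[kind]
    return [DeltaRow(trans_id, stmt_id, row_id, action) for action in actions]


delta_table = []
delta_table += log_change(trans_id=7, stmt_id=1, kind="insert", row_id=41)
delta_table += log_change(trans_id=7, stmt_id=2, kind="update", row_id=42)
for row in delta_table:
    print(row)
```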

Maintenance and its optimization
- A maintenance task is created for each view affected by a transaction
- Views are updated incrementally using delta tables
- “Smart” maintenance task scheduler
  - Maintenance tasks are scheduled as low-priority jobs
  - Maintenance tasks are combined using the Condense operator
  - A proper time slot is allocated for each task

Delta stream Condense operator
Intuition: Tran: {A:=1, …, A:=2, …, A:=3} => {…, A:=3}
Operator definition
- INS/INS condense: {ins_1(row_a), …, ins_k(row_a)} => {…, ins_k(row_a)}
- INS/DEL condense: {ins_1(row_a), …, del_k(row_a)} => {…}
- DEL/DEL condense: {del_1(row_a), …, del_k(row_a)} => {…, del_k(row_a)}
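The three condense rules can be sketched over a delta stream of (action, row_id, value) tuples. The tuple layout is an assumption made for this illustration. A del followed by an ins on the same row is not covered by the rules on the slide, so the sketch leaves such a pair unchanged.

```python
def condense(stream):
    """Condense a delta stream of (action, row_id, value) tuples."""
    out = []    # surviving actions in stream order; cancelled slots become None
    live = {}   # row_id -> stack of indices in `out` still surviving for that row
    for action, row_id, value in stream:
        stack = live.setdefault(row_id, [])
        prev = out[stack[-1]][0] if stack else None
        if prev == "ins" and action == "del":
            out[stack.pop()] = None   # INS/DEL: the pair cancels out
            continue
        if prev == action:
            out[stack.pop()] = None   # INS/INS or DEL/DEL: keep only the later one
        stack.append(len(out))
        out.append((action, row_id, value))
    return [entry for entry in out if entry is not None]


stream = [
    ("ins", "a", 1), ("ins", "a", 2),        # INS/INS on row a
    ("ins", "b", 9), ("del", "b", None),     # INS/DEL on row b
    ("del", "c", None), ("del", "c", None),  # DEL/DEL on row c
]
print(condense(stream))
```

Shrinking the stream this way means the maintenance task applies fewer delta rows to the view, which is why combining tasks before running them pays off.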

Performance results
- Response time is low
- Query response time is low
- Maintenance cost vs. eager view update cost
- Overhead is low