Probabilistic Contextual Skylines D. Sacharidis 1, A. Arvanitis 12, T. Sellis 12 1 Institute for the Management of Information Systems — “Athena” R.C.,

Presentation transcript:

Probabilistic Contextual Skylines. D. Sacharidis (1), A. Arvanitis (1,2), T. Sellis (1,2). (1) Institute for the Management of Information Systems, “Athena” R.C., Greece; (2) National Technical University of Athens, Greece

static vs. dynamic skyline: (another) hotels example. Price and Distance are Statically Preferred (SP) attributes with fixed preferences: lower is better. Ignoring Amenity (that is, assuming all amenities are equally preferable), h4 and h5 are in the static skyline. Amenity is a Relatively Preferred (RP) attribute whose preferences are defined per query; once these are taken into account, h3, h4 and h5 are in the dynamic skyline.
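The difference between the two skylines can be sketched in a few lines of Python. The hotel values below are hypothetical (the slide's figure is not available) and were chosen only so that the result matches the slide; the `prefers` callback and all function names are likewise illustrative assumptions.

```python
from typing import Callable, Dict, Set, Tuple

Hotel = Tuple[float, float, str]  # (price, distance, amenity)

# Hypothetical data: the actual values on the slide are not known.
HOTELS: Dict[str, Hotel] = {
    "h1": (90, 4.0, "gym"),
    "h2": (100, 2.0, "gym"),
    "h3": (60, 3.5, "pool"),
    "h4": (50, 3.0, "gym"),
    "h5": (80, 1.0, "gym"),
}

def sp_no_worse(a: Hotel, b: Hotel) -> bool:
    """a is at least as good as b on both SP attributes (lower is better)."""
    return a[0] <= b[0] and a[1] <= b[1]

def sp_better(a: Hotel, b: Hotel) -> bool:
    """a is strictly better than b on at least one SP attribute."""
    return a[0] < b[0] or a[1] < b[1]

def static_dominates(a: Hotel, b: Hotel) -> bool:
    """Dominance on Price and Distance only; Amenity is ignored."""
    return sp_no_worse(a, b) and sp_better(a, b)

def dynamic_dominates(a: Hotel, b: Hotel,
                      prefers: Callable[[str, str], bool]) -> bool:
    """Dominance that also respects the per-query amenity preference."""
    if not sp_no_worse(a, b):
        return False
    if a[2] == b[2]:
        return sp_better(a, b)
    # Different amenities: a must be strictly preferred on the RP attribute.
    return prefers(a[2], b[2])

def skyline(hotels: Dict[str, Hotel],
            dominates: Callable[[Hotel, Hotel], bool]) -> Set[str]:
    """Names of all hotels that no other hotel dominates."""
    return {name for name, h in hotels.items()
            if not any(dominates(o, h) for o in hotels.values())}

def pool_over_gym(u: str, v: str) -> bool:
    """Example query preference: a pool is preferred to a gym."""
    return (u, v) == ("pool", "gym")

static = skyline(HOTELS, static_dominates)
dynamic = skyline(HOTELS, lambda a, b: dynamic_dominates(a, b, pool_over_gym))
```

With this data, h3 is dominated by h4 on the SP attributes alone, but re-enters the skyline once the query states that its amenity is preferred.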

contextual skyline: just like the dynamic skyline, but preferences are associated with some context. What if no preferences are specified for the current context? Two issues arise: can we extract them from previous situations, and what does it mean to be in the skyline?

extract preferences: the key idea is to combine preferences from contexts similar to the current one. First, assess the similarity between the current context C_q and all past contexts C_j [formula omitted]. Contexts may have conflicting preferences, so uncertainty is modeled with probabilities: value u is better than v for the context C_i with some probability. These probabilities can be extracted based on context similarities.
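One way to picture this step is a similarity-weighted vote over past contexts. The sketch below is an assumption throughout: the slide's similarity measure is not preserved, so Jaccard similarity over sets of context features stands in for it, and the weighting scheme and the names `jaccard` and `preference_probability` are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Context = FrozenSet[str]
Preference = Tuple[str, str]  # (better value, worse value)

def jaccard(a: Context, b: Context) -> float:
    """Stand-in similarity measure (an assumption, not the slide's formula)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def preference_probability(current: Context,
                           past: List[Tuple[Context, Set[Preference]]],
                           u: str, v: str) -> float:
    """Similarity-weighted share of past contexts stating that u beats v."""
    total = sum(jaccard(current, ctx) for ctx, _ in past)
    if total == 0:
        return 0.0
    support = sum(jaccard(current, ctx)
                  for ctx, prefs in past if (u, v) in prefs)
    return support / total

# Two past contexts with conflicting preferences (illustrative values).
current = frozenset({"business", "summer"})
past = [
    (frozenset({"business", "summer"}), {("wifi", "pool")}),
    (frozenset({"family", "summer"}), {("pool", "wifi")}),
]
p_wifi = preference_probability(current, past, "wifi", "pool")
p_pool = preference_probability(current, past, "pool", "wifi")
```

The more similar past context carries more weight, so the conflicting preferences resolve to probabilities of 0.75 and 0.25 rather than to a hard choice.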

probabilistic contextual skylines: dominance relationships are uncertain. Assuming independence among attributes, tuple t dominates t' for the context C_i with probability [formula omitted]. The skyline probability of a tuple is defined as [formula omitted]. A probabilistic contextual skyline query, p-CSQ, returns all tuples whose skyline probability is not below a given threshold.
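Since the formulas themselves are missing, the following is only a plausible reading, not the authors' definition. It assumes that the dominance probability is the product of the per-attribute preference probabilities (given no-worse SP values), and that the skyline probability is the product over all other tuples of one minus their dominance probability. The tuple layout and the `pref` table are hypothetical.

```python
from typing import Dict, List, Tuple

# A tuple is (SP values, RP values); lower SP values are better.
Row = Tuple[Tuple[float, ...], Tuple[str, ...]]
Pref = Dict[Tuple[str, str], float]  # (u, v) -> Pr[u is better than v]

def dominance_probability(a: Row, b: Row, pref: Pref) -> float:
    """Assumed form: Pr[a dominates b], attributes treated as independent."""
    if any(x > y for x, y in zip(a[0], b[0])):
        return 0.0  # a is worse on some SP attribute
    sp_strict = any(x < y for x, y in zip(a[0], b[0]))
    if not sp_strict and a[1] == b[1]:
        return 0.0  # identical tuples do not dominate each other
    p = 1.0
    for u, v in zip(a[1], b[1]):
        if u != v:
            p *= pref.get((u, v), 0.0)
    return p

def skyline_probability(t: Row, data: List[Row], pref: Pref) -> float:
    """Assumed form: probability that no other tuple dominates t."""
    p = 1.0
    for other in data:
        if other is not t:
            p *= 1.0 - dominance_probability(other, t, pref)
    return p

def p_csq(data: List[Row], pref: Pref, threshold: float) -> List[Row]:
    """Tuples whose skyline probability is not below the threshold."""
    return [t for t in data
            if skyline_probability(t, data, pref) >= threshold]

t1: Row = ((50, 3.0), ("gym",))
t2: Row = ((60, 3.5), ("pool",))
data = [t1, t2]
pref = {("gym", "pool"): 0.4, ("pool", "gym"): 0.6}
answer = p_csq(data, pref, threshold=0.7)
```

Here t1 beats t2 on both SP attributes, so t2 survives only if its amenity is not worse, which under the assumed table happens with probability 0.6.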

example: skyline probability, steps (1), (2), (3) [figures omitted]

non-indexed algorithms (1/2): for RP attributes (unlike standard and tuple-probabilistic skylines), no monotonic visit order exists and transitivity is not preserved, so we have to apply BNL-like methods (not SFS, BBS, etc.). Basic Iterative Algorithm (BIA): for each tuple, scan the database and compute its skyline probability, aborting when it falls below the threshold.
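A minimal sketch of BIA's nested scan with early abort follows. It assumes the skyline probability is a running product of (1 - dominance probability) over the other tuples, and it takes the dominance-probability function as a parameter; the toy `toy_dom_prob` on integers is purely illustrative.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")

def bia(data: List[T],
        dom_prob: Callable[[T, T], float],
        threshold: float) -> List[Tuple[T, float]]:
    """Return (tuple, skyline probability) pairs that reach the threshold."""
    result = []
    for i, t in enumerate(data):
        prob = 1.0
        for j, other in enumerate(data):
            if i == j:
                continue
            # Assumed definition: multiply in the chance that `other`
            # does NOT dominate t.
            prob *= 1.0 - dom_prob(other, t)
            if prob < threshold:
                break  # early abort: t can no longer qualify
        else:
            # The scan finished without aborting, so t qualifies.
            result.append((t, prob))
    return result

def toy_dom_prob(a: int, b: int) -> float:
    """Toy model: a smaller number dominates a larger one half the time."""
    return 0.5 if a < b else 0.0

answer = bia([1, 2, 3], toy_dom_prob, threshold=0.5)
```

The early abort is safe because the running product can only shrink as more tuples are scanned.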

non-indexed algorithms (2/2): Candidate Selection Algorithm (CSA). It identifies candidates by grouping tuples by their values on the RP attributes. Tuples that are dominated within an RP group have probability 0; tuples that are in the skyline w.r.t. only the SP attributes have probability 1. CSA applies BIA only to the candidates (it still needs to check them against all tuples, though).
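The two pruning rules can be sketched as a filtering pass that runs before BIA. The tuple layout, the function names, and the simplifying assumption that no two tuples tie on all SP attributes are mine, not the slides'.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A tuple is (SP values, RP value); lower SP values are better.
Row = Tuple[Tuple[float, ...], str]

def sp_dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """a is no worse on every SP attribute and strictly better on one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def select_candidates(data: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Split the data into (certain skyline tuples, uncertain candidates).

    Assumes no two tuples tie on all SP attributes.
    """
    groups: Dict[str, List[Row]] = defaultdict(list)
    for t in data:
        groups[t[1]].append(t)
    # Rule 1: a tuple dominated inside its own RP group has probability 0.
    survivors = [t for g in groups.values() for t in g
                 if not any(sp_dominates(o[0], t[0]) for o in g)]
    # Rule 2: a tuple in the SP-only skyline has probability 1.
    certain = [t for t in survivors
               if not any(sp_dominates(o[0], t[0]) for o in data)]
    candidates = [t for t in survivors if t not in certain]
    return certain, candidates

data: List[Row] = [
    ((50, 3.0), "gym"),
    ((90, 4.0), "gym"),
    ((60, 3.5), "pool"),
    ((80, 1.0), "gym"),
]
certain, candidates = select_candidates(data)
```

Only the remaining candidates need a full probability computation, although each must still be compared against every tuple in the database.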

index-based algorithms (1/3): the algorithms only consider the candidates. Basic Group Counting (BGC). Idea: tuples in an RP group that dominate a candidate t contribute the same probability, so use a COUNT aR-tree per RP group. But don't just issue a range query per tree: we don't care about the exact count if the tuple's probability is below the threshold. Instead, visit nodes from all trees in parallel using a single priority queue, in which the node from the tree with the highest expected probability of dominating tuple t has the largest priority.
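The counting idea can be illustrated without the index. In this simplified sketch a linear scan stands in for the COUNT aR-tree range query, whole groups (rather than tree nodes) are placed in the priority queue, and each dominating tuple is assumed to contribute an independent factor of (1 - p) to the skyline probability. All names and values are hypothetical.

```python
import heapq
from typing import List, Optional, Tuple

Point = Tuple[float, ...]
# A group is (p, SP points), where p is the probability that the group's
# RP values are preferred to those of the candidate.
Group = Tuple[float, List[Point]]

def sp_no_worse(a: Point, b: Point) -> bool:
    """a is at least as good as b on every SP attribute (lower is better)."""
    return all(x <= y for x, y in zip(a, b))

def group_counting(t_sp: Point, groups: List[Group],
                   threshold: float) -> Optional[float]:
    """Skyline probability of the candidate, or None once it is pruned.

    `groups` must exclude the candidate's own RP group.
    """
    # heapq is a min-heap, so negate p to pop the most dangerous group first.
    heap = [(-p, i) for i, (p, _) in enumerate(groups)]
    heapq.heapify(heap)
    prob = 1.0
    while heap:
        neg_p, i = heapq.heappop(heap)
        # Linear scan standing in for a COUNT range query on an aR-tree.
        count = sum(1 for s in groups[i][1] if sp_no_worse(s, t_sp))
        # Every counted tuple contributes the same factor (1 - p).
        prob *= (1.0 - (-neg_p)) ** count
        if prob < threshold:
            return None  # exact counts of remaining groups no longer matter
    return prob

groups: List[Group] = [
    (0.4, [(50, 3.0), (80, 1.0)]),
    (0.5, [(55, 3.0), (58, 3.2), (70, 2.0)]),
]
kept = group_counting((60, 3.5), groups, threshold=0.1)
pruned = group_counting((60, 3.5), groups, threshold=0.3)
```

Visiting the group with the highest dominance probability first tends to push the running product below the threshold sooner, which is what allows the remaining counts to be skipped.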

index-based algorithms (2/3): Super Group Counting (SGC). There can be a lot of RP groups with only a few tuples each; to mitigate this, assign groups to super groups and use a GROUP-COUNT aR-tree per super group. Each entry e_i stores [entry layout omitted], where c[g_j] is the number of tuples beneath node e_i that belong to the j-th group. The algorithm is the same as BGC; you only need to redefine the expected dominance probability to take multiple groups into account.

index-based algorithms (3/3): Batch Counting Algorithm (BCA). All previous algorithms compute the skyline probability of one tuple at a time; BCA examines candidates in batches (as many as fit in memory). Extra bookkeeping is kept with each heap entry to avoid double counting. Example: e1 is deheaped; e1+ dominates t1 but not t2. The entire e1 contributes to t1, but for t2 we need to expand e1 and enheap its children e2, e3, also remembering e1+ with e2, e3.

Experiments. Non-indexed: BIA, CSA. Index-based: SGC, BCA.

Total Time vs. Dataset Cardinality [plot omitted]. Non-indexed: BIA, CSA. Index-based: SGC, BCA.

Total Time vs. RP domain size [plot omitted]. Non-indexed: CSA. Index-based: SGC, BCA.

Total Time vs. Dimensionality [plot omitted]. Non-indexed: CSA. Index-based: SGC, BCA.

thank you!