ISAC 教育學術資安資訊分享與分析中心研發專案 The Skyline Operator Stephan B¨orzs¨onyi, Donald Kossmann, Konrad Stocker EDBT 2009 1.

Slides:



Advertisements
Similar presentations
Introduction to Algorithms Quicksort
Advertisements

Two-Pass Algorithms Based on Sorting
Introduction to Algorithms Rabie A. Ramadan rabieramadan.org 2 Some of the sides are exported from different sources.
The Skyline Operator (Stephan Borzsonyi, Donald Kossmann, Konrad Stocker) Presenter: Shehnaaz Yusuf March 2005.
Anany Levitin ACM SIGCSE 1999SIG. Outline Introduction Four General Design Techniques A Test of Generality Further Refinements Conclusion.
Maintaining Sliding Widow Skylines on Data Streams.
Mining Distance-Based Outliers in Near Linear Time with Randomization and a Simple Pruning Rule Stephen D. Bay 1 and Mark Schwabacher 2 1 Institute for.
S. J. Shyu Chap. 1 Introduction 1 The Design and Analysis of Algorithms Chapter 1 Introduction S. J. Shyu.
Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 14, Part B.
CS4413 Divide-and-Conquer
Genome-scale disk-based suffix tree indexing Benjarath Phoophakdee Mohammed J. Zaki Compiled by: Amit Mahajan Chaitra Venus.
July 29HDMS'08 Caching Dynamic Skyline Queries D. Sacharidis 1, P. Bouros 1, T. Sellis 1,2 1 National Technical University of Athens 2 Institute for Management.
1 External Sorting Chapter Why Sort?  A classic problem in computer science!  Data requested in sorted order  e.g., find students in increasing.
Lecture 8 Jianjun Hu Department of Computer Science and Engineering University of South Carolina CSCE350 Algorithms and Data Structure.
Stabbing the Sky: Efficient Skyline Computation over Sliding Windows COMP9314 Lecture Notes.
1 Shooting Stars in the Sky:An Online Algorithm for Skyline Queries 作者: Donald Kossmann Frank Ramsak Steffen Rost 報告:黃士維.
Lecture 14 Go over midterm results Algorithms Efficiency More on prime numbers.
TTIT33 Algorithms and Optimization – Dalg Lecture 2 HT TTIT33 Algorithms and optimization Lecture 2 Algorithms Sorting [GT] 3.1.2, 11 [LD] ,
CS 104 Introduction to Computer Science and Graphics Problems Data Structure & Algorithms (3) Recurrence Relation 11/11 ~ 11/14/2008 Yang Song.
Query Optimization 3 Cost Estimation R&G, Chapters 12, 13, 14 Lecture 15.
B + -Trees (Part 1) Lecture 20 COMP171 Fall 2006.
B + -Trees (Part 1). Motivation AVL tree with N nodes is an excellent data structure for searching, indexing, etc. –The Big-Oh analysis shows most operations.
Probabilistic Skyline Operator over sliding Windows Wan Qian HKUST DB Group.
B + -Trees (Part 1) COMP171. Slide 2 Main and secondary memories  Secondary storage device is much, much slower than the main RAM  Pages and blocks.
1 External Sorting Chapter Why Sort?  A classic problem in computer science!  Data requested in sorted order  e.g., find students in increasing.
Introduction to Database Systems 1 Join Algorithms Query Processing: Lecture 1.
Divide and Conquer The most well known algorithm design strategy: 1. Divide instance of problem into two or more smaller instances 2. Solve smaller instances.
Summary of Algo Analysis / Slide 1 Algorithm complexity * Bounds are for the algorithms, rather than programs n programs are just implementations of an.
Analysis of Algorithms COMP171 Fall Analysis of Algorithms / Slide 2 Introduction * What is Algorithm? n a clearly specified set of simple instructions.
Cmpt-225 Simulation. Application: Simulation Simulation  A technique for modeling the behavior of both natural and human-made systems  Goal Generate.
Design and Analysis of Algorithms - Chapter 41 Divide and Conquer The most well known algorithm design strategy: 1. Divide instance of problem into two.
External Sorting Chapter 13.. Why Sort? A classic problem in computer science! Data requested in sorted order  e.g., find students in increasing gpa.
Wook-Shin Han, Sangyeon Lee POSTECH, DGIST
1 Chapter 24 Developing Efficient Algorithms. 2 Executing Time Suppose two algorithms perform the same task such as search (linear search vs. binary search)
Maximal Vector Computation in Large Data Sets The 31st International Conference on Very Large Data Bases VLDB 2005 / VLDB Journal 2006, August Parke Godfrey,
Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin “ Introduction to the Design & Analysis of Algorithms, ” 2 nd ed., Ch. 1 Chapter.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Introduction Algorithms and Conventions The design and analysis of algorithms is the core subject matter of Computer Science. Given a problem, we want.
Towards Robust Indexing for Ranked Queries Dong Xin, Chen Chen, Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign VLDB.
Analysis of algorithms Analysis of algorithms is the branch of computer science that studies the performance of algorithms, especially their run time.
Major objective of this course is: Design and analysis of modern algorithms Different variants Accuracy Efficiency Comparing efficiencies Motivation thinking.
CSC 211 Data Structures Lecture 13
B + -Trees. Motivation An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. The Big-Oh analysis shows that most operations.
Algorithms & Flowchart
The Skyline Operator Borzsonyi S, Kossmann D, Stocker K. Classic Paper for Reference Noted by David Hsu.
CPSC 404, Laks V.S. Lakshmanan1 External Sorting Chapter 13: Ramakrishnan & Gherke and Chapter 2.3: Garcia-Molina et al.
Zhuo Peng, Chaokun Wang, Lu Han, Jingchao Hao and Yiyuan Ba Proceedings of the Third International Conference on Emerging Databases, Incheon, Korea (August.
Spatial Query Processing Spatial DBs do not have a set of operators that are considered to be basic elements in a query evaluation. Spatial DBs handle.
Chapter 12 Query Processing. Query Processing n Selection Operation n Sorting n Join Operation n Other Operations n Evaluation of Expressions 2.
Chapter 5 Algorithms (2) Introduction to CS 1 st Semester, 2015 Sanghyun Park.
1 Algorithms  Algorithms are simply a list of steps required to solve some particular problem  They are designed as abstractions of processes carried.
Lecture 24 Query Execution Monday, November 28, 2005.
The σ-neighborhood skyline queries Chen, Yi-Chung; LEE, Chiang. The σ-neighborhood skyline queries. Information Sciences, 2015, 322: 張天彥 2015/12/05.
CSCE Database Systems Chapter 15: Query Execution 1.
Authors: Kenneth S.Bogh, Sean Chester, Ira Assent (Data-Intensive Systems Group, Aarhus University). Type: Research Paper Presented by: Dardan Xhymshiti.
Finding skyline on the fly HKU CS DB Seminar 21 July 2004 Speaker: Eric Lo.
Data Structures and Algorithms Searching Algorithms M. B. Fayek CUFE 2006.
Relational Operator Evaluation. overview Projection Two steps –Remove unwanted attributes –Eliminate any duplicate tuples The expensive part is removing.
Search Algorithms Written by J.J. Shepherd. Sequential Search Examines each element one at a time until the item searched for is found or not found Simplest.
MEMORY MANAGEMENT Operating System. Memory Management Memory management is how the OS makes best use of the memory available to it Remember that the OS.
HKU CSIS DB Seminar Skyline Queries HKU CSIS DB Seminar 9 April 2003 Speaker: Eric Lo.
Online Skyline Queries. Agenda Motivation: Top N vs. Skyline „Classic“ Algorithms –Block-Nested Loop Algorithm –Divide & Conquer Algorithm Online Algorithm.
Evaluation of Relational Operations: Other Operations
Algorithm Analysis (not included in any exams!)
Database Management Systems (CS 564)
Database Applications (15-415) DBMS Internals- Part VI Lecture 15, Oct 23, 2016 Mohammad Hammoud.
Objective of This Course
Lecture 2- Query Processing (continued)
Skyline query with R*-Tree: Branch and Bound Skyline (BBS) Algorithm
Database Systems (資料庫系統)
Presentation transcript:

ISAC 教育學術資安資訊分享與分析中心研發專案 The Skyline Operator Stephan B¨orzs¨onyi, Donald Kossmann, Konrad Stocker EDBT

Outline 2 Introduction Algorithm Block-nested-loop algorithm Divide and Conquer algorithm Other Skyline algorithm Experiment Conclusion

Introduction 3 Skyline  Skyline is defined as those points which are not dominated by any other point For example, a hotel with Hotel A: price = $50 and distance= 0.8 miles Hotel B: price = $100 and distance = 1.0 mile

(Cont.) 4 Skyline of Hotels

Block-nested-loop algorithm 5 The block-nested-loops algorithm repeatedly reads the set of tuples. The idea of this algorithm is to keep a window of incomparable tuples in main memory.

(Cont.) 6 Obviously this algorithm works particularly well if the Skyline is small. In the best case, the Skyline fits into the window and the algorithm terminates in one or two iterations; thus, the best case complexity is of the order of O(n); In the worst case, the complexity is of the order of O(n2). {n being the number of tuples in the input}.

(Cont.) Maintaining the Window as a Self-organizing List A great deal of time in the basic algorithm is spent for comparing a tuple with the tuples in the window 7 To speed up these comparisons, we propose to organize the window as a self-organizing list. When tuple w of the window is found to dominate another tuple, then w is moved to the beginning of the window. As a result, the next tuple from the input will be compared to w first. This variant is particularly attractive if the data is skewed; i.e., if there are a couple of killer tuples which dominate many other tuples and a significant number of neutral tuples which are part of the Skyline, but do not dominate other tuples

8 (Cont.)

Basic Divide and Conquer Algorithm 9 We will now turn to the divide-and-conquer we proposed extensions to make that algorithm more efficient in the database context (i.e., limited main memory). This algorithm is theoretically the best known algorithm for the worst case. In the worst case, its asymptotic complexity is of the order of O(n (log n)d−2) +O(n log n) ; n is the number of input tuples d is the number of dimensions in the Skyline. Unfortunately, this is also the complexity of the algorithm in the best case; so, we expect this algorithm to outperform our block-nested-loops algorithm in bad cases and to be worse in good cases

10 (Cont.)

11 (Cont.)

Extention to the Basic Algorithm 12

13 (Cont.)

14 (Cont.) Early Skyline We propose another very simple extension to the divide-and-conquer algorithm In situations in which the available main memory is limited. This extension concerns the first step in which the data is partitioned into m partitions. We propose to carry out this step as follows: 1.Load a large block of tuples from the input; more precisely, load as many tuples as fit into the available main-memory buffers. 2. Apply the basic divide-and-conquer algorithm to this block of tuples in order to immediately eliminate tuples which are dominated by others. We refer to this as an “Early Skyline” 3.Partition the remaining tuples into m partitions. Clearly, applying an Early Skyline incurs additional CPU cost, but it also saves I/O because less tuples need to be written and reread in the partitioning steps. In general an Early Skyline is attractive if the Skyline is selective; i.e., if the result of the whole Skyline operation is small.

Other Skyline Algorithm Using B-trees 15

(Cont.) 16 Using R-trees Given a hotel h, we need not search in any branches of the R-tree which are guaranteed to contain only hotels that are dominated by h. For example, if we know that there is a hotel that costs $30 and is located 1.0 miles from the beach, then we need not consider any branches in the R-tree which include, for instance, hotels in the price range of ($40,$60) and distance range of (2.0 miles, 3.5 miles). As a consequence, the idea is to traverse the R-tree in a depth first way and prune branches of the R-tree with every new hotel found

(Cont.) 17

(Cont.) 18 Hotel A Hotel B Hotel C

Experiment 19

20

Conclusion The Skyline operation is useful for a number of database applications, including decision support and visualization. Our experimental results indicated that a database system should implement a block-nested-loops algorithm for good cases and a divide-and-conquer algorithm for tough cases. More specifically, we propose to implement a block-nested loops algorithm with a window that is organized as a self-organizing list and a divide-and-conquery algorithm that carries out m-way partitioning and “Early Skyline “ computation. 21