The Skyline Operator Borzsonyi S, Kossmann D, Stocker K. Classic Paper for Reference Noted by David Hsu.


Content: Introduction; SQL Extensions; Implementation of the Skyline Operator; Other Skyline Algorithms; Performance Experiments and Results; Related Work; Conclusion

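For a broader, general understanding of Skyline queries, these two Chinese-language papers are highly recommended. [1] 朱琳, 关佶红, 周水庚. A survey of Skyline computation (Skyline 计算综述) [J]. Computer Engineering and Applications (计算机工程与应用), 2008. [2] 魏小娟, 杨婧, 李翠平, 陈红. Skyline query processing (Skyline 查询处理) [J]. Journal of Software (软件学报), 2008, 6.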

Introduction The database system at your travel agent's is unable to decide which hotel is best for you, but it can at least present all interesting hotels. A hotel is interesting if it is not worse than any other hotel in both dimensions (here, price and distance to the beach). We call this set of interesting hotels the Skyline. From the Skyline, you can now make your final decision, weighing your personal preferences for price and distance to the beach.

Introduction: Example Computing the Skyline is known as the maximum vector problem [KLP75, PS85]. We use the term Skyline because of its graphical representation. More formally, the Skyline is defined as those points which are not dominated by any other point. A point dominates another point if it is as good or better in all dimensions and better in at least one dimension. For example, a hotel with price = $50 and distance = 0.8 miles dominates a hotel with price = $100 and distance = 1.0 miles.
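The dominance definition above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: it assumes every dimension is minimized (price, distance), and the names `dominates`, `skyline`, and the sample `hotels` list are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p is as good or better than q everywhere and better somewhere.

    Assumption: smaller values are better in every dimension.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Naive quadratic Skyline: keep points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels as (price in $, distance to the beach in miles).
hotels = [(50, 0.8), (100, 1.0), (40, 2.0), (120, 0.3)]
result = skyline(hotels)
```

With this sample data, the $100 / 1.0 mile hotel drops out because the $50 / 0.8 mile hotel dominates it, exactly as in the example from the slide.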

Introduction: nice properties One of the nice properties of the Skyline of a set M is that for any monotone scoring function M → R, if p ∈ M maximizes that scoring function, then p is in the Skyline. In other words, no matter how you weigh your personal preferences towards price and distance of hotels, you will find your favorite hotel in the Skyline. In addition, for every point p in the Skyline, there exists a monotone scoring function such that p maximizes that scoring function. In other words, the Skyline does not contain any hotels which are nobody's favorite.

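Introduction From the definition of the Skyline, the following important properties follow. Property 1: if point p dominates point q, then any function that is monotone in all dimensions scores p better than q. Property 2: if a function that is monotone in all dimensions attains its maximum over all points of the database at point p, then p must be a Skyline point of that database. The converse also holds. Property 3: for any Skyline point p of the database, there exists a function that is monotone in all dimensions such that the value at p is the maximum over all points of the database.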
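The claim that the maximizer of any monotone scoring function lies in the Skyline can be checked on a small example. The sketch below assumes both dimensions are minimized and uses a weighted sum with positive weights as one particular family of monotone scoring functions; the helper names and the hotel data are hypothetical.

```python
def dominates(p, q):
    """p dominates q when smaller is better in every dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    return [p for p in points if not any(dominates(q, p) for q in points)]


def favorite(points, weights):
    """Best point under a weighted-sum score.

    Lower cost is better, so the score is the negated weighted cost;
    with positive weights this score is monotone in every dimension.
    """
    return max(points, key=lambda p: -sum(w * x for w, x in zip(weights, p)))


# Hypothetical hotels as (price in $, distance in miles).
hotels = [(50, 0.8), (100, 1.0), (40, 2.0), (120, 0.3)]
sky = skyline(hotels)

# Different users weigh price against distance differently.
preferences = [(1, 1), (1, 10), (1, 100), (1, 1000)]
favorites = [favorite(hotels, w) for w in preferences]
```

Every entry of `favorites` is a member of `sky`, whatever positive weights are chosen. The dominated hotel (100, 1.0) can never win, because the hotel that dominates it has a strictly lower weighted cost.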

Introduction: organization of this paper In this work, we show how the Skyline operation can be integrated into a database system. We will first describe possible SQL extensions in order to specify Skyline queries. Then, we will present and evaluate alternative algorithms to compute the Skyline; it will become clear that the original algorithm of [KLP75, PS85] has terrible performance in the database context. We will also briefly discuss how standard index structures such as B-trees and R-trees can be exploited to evaluate Skyline queries. In addition, we will show how the Skyline operation can interact with other query operations; e.g., joins, group-by, and Top N. At the end, we will discuss related work, give conclusions, and make suggestions for future work.

SQL Extensions SQL's SELECT statement is extended by an optional clause of the form SKYLINE OF [DISTINCT] d1 [MIN | MAX | DIFF], ..., dm [MIN | MAX | DIFF]. Here d1,...,dm denote the dimensions of the Skyline; e.g., price, distance to the beach, or rating. MIN, MAX, and DIFF specify whether the value in that dimension should be minimized, maximized, or simply be different. For example, the price of a hotel should be minimized (MIN annotation) whereas the rating should be maximized (MAX annotation). In our Skyline of Manhattan example, two buildings that have different x coordinates can both be seen and therefore both may be part of the Skyline; as a result, the x dimension is listed in the SKYLINE OF clause of that query with a DIFF annotation. The optional DISTINCT specifies how to deal with duplicates (described below).

SQL Extensions The semantics of the SKYLINE OF clause are straightforward. The SKYLINE OF clause is executed after the SELECT ... FROM ... WHERE ... GROUP BY ... HAVING ... part of the query, but before the ORDER BY clause and possibly other clauses that follow (e.g., STOP AFTER for Top N [CK97]). The SKYLINE OF clause selects all interesting tuples; i.e., tuples which are not dominated by any other tuple. Extending our definition from the introduction, tuple p = (p1,...,pk,pk+1,...,pl,pl+1,...,pm,pm+1,...,pn) dominates tuple q = (q1,...,qk,qk+1,...,ql,ql+1,...,qm,qm+1,...,qn) for a Skyline query SKYLINE OF d1 MIN, ..., dk MIN, dk+1 MAX, ..., dl MAX, dl+1 DIFF, ..., dm DIFF if the following three conditions hold: pi ≤ qi for all i = 1,...,k; pi ≥ qi for all i = k+1,...,l; and pi = qi for all i = l+1,...,m.

SQL Extensions If pi = qi for all i = 1,...,m, then p and q are incomparable and may both be part of the Skyline if no DISTINCT is specified. With DISTINCT, either p or q is retained (the choice of which is unspecified). The values of the attributes dm+1,...,dn are irrelevant for the Skyline computation, but these attributes are of course part of the tuples of the Skyline (i.e., there is no implicit projection). Note that it does not matter in which order the dimensions are specified in the SKYLINE OF clause; for ease of presentation, we put the MIN dimensions first and the DIFF dimensions last. In addition, a one-dimensional Skyline is equivalent to a min, max, or distinct SQL query without a SKYLINE OF clause.
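The MIN, MAX, DIFF, and DISTINCT semantics described above can be sketched as follows. This is an illustrative reading of the rules, not code from the paper: the `spec` format (a list of column index and annotation pairs), the function names, and the sample rows are all assumptions.

```python
def dominates(p, q, spec):
    """True if row p dominates row q under the SKYLINE OF annotations in spec.

    spec is a list of (column_index, annotation) pairs, where annotation is
    "MIN", "MAX", or "DIFF". Rows that agree on every listed column are
    incomparable, so neither dominates the other.
    """
    strictly_better = False
    for i, kind in spec:
        if kind == "MIN":
            if p[i] > q[i]:
                return False
            strictly_better = strictly_better or p[i] < q[i]
        elif kind == "MAX":
            if p[i] < q[i]:
                return False
            strictly_better = strictly_better or p[i] > q[i]
        elif kind == "DIFF":
            if p[i] != q[i]:
                return False
        else:
            raise ValueError(f"unknown annotation: {kind}")
    return strictly_better


def skyline(rows, spec, distinct=False):
    """Naive Skyline; whole rows are returned (no implicit projection)."""
    result, seen = [], set()
    for r in rows:
        if any(dominates(o, r, spec) for o in rows):
            continue
        key = tuple(r[i] for i, _ in spec)
        if distinct and key in seen:
            continue  # DISTINCT keeps one of several rows equal on d1..dm
        seen.add(key)
        result.append(r)
    return result


# Hypothetical hotels: (name, price, rating); price MIN, rating MAX.
hotels = [("A", 50, 3), ("B", 50, 3), ("C", 80, 5), ("D", 90, 4)]
hotel_spec = [(1, "MIN"), (2, "MAX")]
all_rows = skyline(hotels, hotel_spec)
distinct_rows = skyline(hotels, hotel_spec, distinct=True)

# Hypothetical buildings: (x, height); height MAX, x DIFF.
buildings = [(1, 10), (1, 7), (2, 5)]
visible = skyline(buildings, [(1, "MAX"), (0, "DIFF")])
```

Hotels A and B agree on price and rating, so both survive without DISTINCT and only one survives with it (this sketch keeps the first one seen, although the choice is unspecified). In the buildings example the DIFF annotation on x means a building can only be hidden by a taller one at the same x coordinate.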

SQL Extensions Query 1: cheap hotels near the beach in Nassau. Query 2: the Skyline of Manhattan. Query 3: salespersons who were very successful in 1999 and have a low salary; these people might be eligible for a raise. Query 4: cheap hotels near the beach in Nassau; this time, however, at most two hotels are returned because the query specifies price categories; within each price category the user is only interested in the hotel with the smallest distance to the beach. Naturally, attributes which are specified in the SKYLINE OF clause may also be used in any other clause. To eliminate outrageously expensive hotels, for example, the WHERE clause of the first query of Figure 3 could be extended by a price predicate.

Implementation of the Skyline Operator Our approach is to extend an existing (relational, object-oriented, or object-relational) database system with a new logical operator that we refer to as the Skyline operator. The Skyline operator encapsulates the implementation of the SKYLINE OF clause. The implementation of other operators (e.g., join) need not be changed. According to the semantics of Skyline queries, the Skyline operator is typically executed after scan, join, and group-by operators and before a final sort operator, if the query has an ORDER BY clause.

Implementation of the Skyline Operator As with join and most other logical operators, there are several different (physical) ways to implement the Skyline operator. In this section, we will describe seven variants: three variants based on a block-nested-loops algorithm; three variants based on divide-and-conquer; and one special variant for two-dimensional Skylines. Furthermore, we will show how Skyline queries can be implemented on top of a relational database system, without changing the database system at all; it will become clear, however, that this approach performs very poorly.

Implementation: Translating a Skyline Query into a Nested SQL Query
※ Essentially, this approach corresponds to the naive "nested-loops" way to compute the Skyline because this query cannot be unnested [GKG+97, BCK98]; as we will see in the following subsections, we can do much better.
※ If the Skyline query involves a join or group-by (e.g., the third query of Figure 3), this join or group-by would have to be executed as part of the outer query and as part of the subquery.
※ As we will see in Section 4, the Skyline operation can be combined with other operations (e.g., join or Top N) in certain cases, resulting in little additional cost to compute the Skyline.
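The kind of nested query this slide refers to can be sketched with standard SQL and a NOT EXISTS subquery, run here through Python's built-in `sqlite3` module. The `Hotels` schema, the sample rows, and the exact query text are assumptions made for illustration, and may differ from the figure in the original paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Hotels (name TEXT, city TEXT, price REAL, distance REAL)"
)
conn.executemany(
    "INSERT INTO Hotels VALUES (?, ?, ?, ?)",
    [
        ("h1", "Nassau", 50, 0.8),
        ("h2", "Nassau", 100, 1.0),
        ("h3", "Nassau", 40, 2.0),
        ("h4", "Nassau", 120, 0.3),
        ("h5", "Miami", 10, 0.1),  # different city, must not affect Nassau
    ],
)

# A hotel is in the Skyline if no other Nassau hotel is at least as good
# in both dimensions and strictly better in one of them.
query = """
SELECT h.name
FROM Hotels h
WHERE h.city = 'Nassau'
  AND NOT EXISTS (
        SELECT 1
        FROM Hotels h1
        WHERE h1.city = 'Nassau'
          AND h1.price <= h.price
          AND h1.distance <= h.distance
          AND (h1.price < h.price OR h1.distance < h.distance))
ORDER BY h.name
"""
names = [row[0] for row in conn.execute(query)]
conn.close()
```

Note how the city predicate has to appear in both the outer query and the subquery; this mirrors the point above that a join or group-by would have to be evaluated twice.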

Implementation: Two-dimensional Skyline Operator A two-dimensional Skyline can be computed by sorting the data. If the data is topologically sorted according to the two attributes of the SKYLINE OF clause, the test of whether a tuple is part of the Skyline is very cheap: you simply need to compare a tuple with its predecessor. More precisely, you need to compare a tuple with the last previous tuple which is part of the Skyline. Figure 4 illustrates this approach. h2 can be eliminated because it is dominated by h1, its predecessor. Likewise, h3 can be eliminated because it is dominated by h1, its predecessor after h2 has been eliminated.
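The sort-then-compare idea for two dimensions can be sketched as below. It assumes both attributes are minimized and uses a plain lexicographic sort on (price, distance) as the sort order; the function name and sample data are hypothetical, and duplicates are kept (no DISTINCT).

```python
def skyline_2d(points):
    """Two-dimensional Skyline by sorting, both dimensions minimized.

    After sorting by (first, second) attribute, each tuple only needs to be
    compared with the last tuple already accepted into the Skyline.
    """
    result = []
    for p in sorted(points):
        if not result:
            result.append(p)
            continue
        last = result[-1]
        # last has a first attribute <= p's, so p survives only if it is
        # strictly better in the second attribute, or an exact duplicate.
        if p[1] < last[1] or p == last:
            result.append(p)
    return result


def skyline_naive(points):
    """Quadratic reference implementation used to cross-check the result."""
    def dominates(a, b):
        return a[0] <= b[0] and a[1] <= b[1] and a != b
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels as (price in $, distance in miles).
hotels = [(50, 0.8), (100, 1.0), (40, 2.0), (120, 0.3), (60, 0.8)]
fast = skyline_2d(hotels)
slow = skyline_naive(hotels)
```

After the sort, the whole Skyline is found in a single pass, so the cost is dominated by the sort itself.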

Implementation: Two-dimensional Skyline Operator Figure 5 shows why sorting does not work if the Skyline involves more than two dimensions. In this example, we are interested in hotels with a low price, a short distance to the beach, and a high rating (many stars). The only hotel which can be eliminated is h3: h3 is dominated by h1, but h1 is not h3's direct predecessor. In this example, there is just one hotel between h1 and h3; in general, however, there might be many hotels so that sorting does not help. There are special algorithms to deal with three-dimensional Skylines [KLP75], but for brevity we will not discuss such algorithms in this work.

Implementation: Block-nested-loops Algorithm