Yue (Jenny) Cui and William Perrizo North Dakota State University


Aggregate Function Computation and Iceberg Querying in Vertical Databases
Yue (Jenny) Cui and William Perrizo, North Dakota State University

Outline
Introduction
    Review of Aggregate Functions
    Review of Iceberg Queries
Algorithms of Aggregate Function Computation Using P-trees
    SUM, COUNT, and AVERAGE
    MAX, MIN, MEDIAN, RANK, and TOP-K
Iceberg Query Operation Using P-trees
    An Iceberg Query Example
Performance Analysis
Conclusion

Introduction
Commonly used aggregate functions include COUNT, SUM, AVERAGE, MIN, MAX, MEDIAN, RANK, and TOP-K. An iceberg query computes an aggregate function over one or more grouping attributes and then eliminates the aggregate values that fall below a specified threshold T. We use an example to review iceberg queries:
    SELECT Location, Product Type, SUM(# Product)
    FROM Sales
    GROUP BY Location, Product Type
    HAVING SUM(# Product) >= T

Introduction (Cont.)
We illustrate the calculation in three steps.
Step one: Generate the Location-list.
    SELECT Location, SUM(# Product)
    FROM Sales
    GROUP BY Location
    HAVING SUM(# Product) >= T
Step two: Generate the Product Type-list.
    SELECT Product Type, SUM(# Product)
    FROM Sales
    GROUP BY Product Type
    HAVING SUM(# Product) >= T
Step three: Generate the Location & Product Type pair groups. Using the Location-list and the Product Type-list generated in the first two steps, we can eliminate many of the Location & Product Type pair groups: because # Product is never negative, a pair can reach the threshold T only if its location is in the Location-list and its product type is in the Product Type-list.
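The three steps above can be sketched in plain Python. This is only an illustration of the pruning idea, not the P-tree implementation from the slides: the rows in `SALES`, the threshold of 15, and the function name `iceberg` are all made up for the example.

```python
from collections import defaultdict

# Hypothetical (location, product_type, num_product) rows; NOT the slides' data.
SALES = [
    ("New York", "Notebook", 10),
    ("Minneapolis", "Desktop", 5),
    ("New York", "Printer", 6),
    ("Chicago", "Notebook", 7),
    ("New York", "Notebook", 11),
    ("Chicago", "Fax", 9),
    ("Minneapolis", "Fax", 3),
]


def iceberg(rows, threshold):
    """Return {(location, product_type): total} for totals >= threshold."""
    # Steps one and two: per-attribute totals and the two candidate lists.
    loc_sum = defaultdict(int)
    type_sum = defaultdict(int)
    for loc, ptype, n in rows:
        loc_sum[loc] += n
        type_sum[ptype] += n
    loc_list = {k for k, v in loc_sum.items() if v >= threshold}
    type_list = {k for k, v in type_sum.items() if v >= threshold}

    # Step three: aggregate only pairs whose two parts both survived.
    pair_sum = defaultdict(int)
    for loc, ptype, n in rows:
        if loc in loc_list and ptype in type_list:
            pair_sum[(loc, ptype)] += n
    return {pair: total for pair, total in pair_sum.items() if total >= threshold}


RESULT = iceberg(SALES, threshold=15)
print(RESULT)
```

With these made-up rows only New York and Chicago pass step one and only Notebook passes step two, so just two of the six pairs present in the data are aggregated in step three.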

Algorithms of Aggregate Function Computation Using P-trees
The dataset used in our example. We use the data in relation Sales to illustrate the aggregate function algorithms. A dash marks a cell whose value is not preserved in this transcript.
    Id  Mon  Loc          Type      On line  # Product
    1   Jan  New York     Notebook  Y        10
    2   -    Minneapolis  Desktop   N        5
    3   Feb  -            Printer   -        6
    4   Mar  -            -         -        7
    5   -    -            -         -        11
    6   -    Chicago      -         -        9
    7   Apr  -            Fax       -        3
Table 1. Relation Sales.

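Algorithms of Aggregate Function Computation Using P-trees (Cont.)
Table 2 shows the binary representation of the data in relation Sales. Each attribute is encoded in binary and each bit position becomes one bit column Pj,i, where j identifies the attribute and i the bit position (i = 0 is the least significant bit). A dash marks a cell whose value is not preserved in this transcript.
    Id  Mon            Loc            Type           On line  # Product
        P0,3 .. P0,0   P1,4 .. P1,0   P2,2 .. P2,0   P3,0     P4,3 .. P4,0
    1   0001           00001          001            -        1010
    2   -              00101          010            -        0101
    3   0010           -              100            -        0110
    4   0011           -              -              -        0111
    5   -              -              -              -        1011
    6   -              00110          -              -        1001
    7   0100           -              101            -        0011
Table 2. Binary Form of Sales.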

Algorithm of Aggregate Function COUNT
COUNT function: No special function needs to be written for COUNT, because the P-tree RootCount function already provides the mechanism to implement it. Given a P-tree Pi, RootCount(Pi) returns the number of 1s in Pi.
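A minimal Python sketch of the vertical bit columns and of RootCount, using the # Product values from the slides. It assumes flat Python integers used as bit vectors in place of real P-trees (which are compressed tree structures); `bit_slices`, `root_count`, and `column_string` are illustrative names, not part of any P-tree library.

```python
# Values of the "# Product" column for Ids 1..7, as used in the slides.
NUM_PRODUCT = [10, 5, 6, 7, 11, 9, 3]


def bit_slices(values, width):
    """Return slices where slices[i] packs bit i of every value.

    Bit r of slices[i] is 1 exactly when bit i of values[r] is 1, so
    slices[3] plays the role of P4,3 and slices[0] the role of P4,0.
    """
    slices = [0] * width
    for row, value in enumerate(values):
        for i in range(width):
            if (value >> i) & 1:
                slices[i] |= 1 << row
    return slices


def root_count(p):
    """Number of 1 bits in a bit slice (stand-in for P-tree RootCount)."""
    return bin(p).count("1")


def column_string(p, n_rows):
    """Render a slice top to bottom (Id 1 first) as a string of 0s and 1s."""
    return "".join(str((p >> row) & 1) for row in range(n_rows))


P4 = bit_slices(NUM_PRODUCT, width=4)
ROOT_COUNTS = {f"P4,{i}": root_count(P4[i]) for i in reversed(range(4))}

for i in reversed(range(4)):
    print(f"P4,{i}", column_string(P4[i], len(NUM_PRODUCT)), ROOT_COUNTS[f"P4,{i}"])
```

Counting the tuples that satisfy a predicate then reduces to one `root_count` call on the corresponding bit vector.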

Algorithm of Aggregate Function SUM
SUM function: The Sum function totals a field of numerical values.
Algorithm 4.1. Sum Aggregate: evaluating sum() with P-trees.
    total = 0.00;
    For i = 0 to n {
        total = total + 2^i * RootCount(Pi);
    }
    Return total;
Here Pi is the P-tree for bit position i of the field, and n is the position of the most significant bit.

Algorithm of Aggregate Function SUM
For example, to find the total number of products sold in relation Sales, we take the root count of each bit column of # Product (values 10, 5, 6, 7, 11, 9, 3):
    RootCount(P4,3) = 3, RootCount(P4,2) = 3, RootCount(P4,1) = 5, RootCount(P4,0) = 5
    2^3 * 3 + 2^2 * 3 + 2^1 * 5 + 2^0 * 5 = 51

Algorithm of Aggregate Function AVERAGE
Average function: The Average function returns the average value in a field. It is calculated from the COUNT and SUM functions: Average() = Sum() / Count().
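The SUM and AVERAGE slides can be checked with a short Python sketch. As an assumption for illustration, each bit column is a plain integer bit vector rather than a real P-tree, and `bit_slices`, `root_count`, `p_sum`, and `p_average` are hypothetical helper names.

```python
NUM_PRODUCT = [10, 5, 6, 7, 11, 9, 3]  # "# Product" values from the slides
WIDTH = 4                              # bit positions P4,3 .. P4,0


def bit_slices(values, width):
    """slices[i] has bit r set when bit i of values[r] is 1."""
    return [
        sum(((v >> i) & 1) << row for row, v in enumerate(values))
        for i in range(width)
    ]


def root_count(p):
    """Number of 1 bits in a slice (stand-in for P-tree RootCount)."""
    return bin(p).count("1")


def p_sum(slices):
    """Sum aggregate: total of 2^i * RootCount(Pi) over all bit positions."""
    return sum((1 << i) * root_count(p) for i, p in enumerate(slices))


def p_average(slices, n_rows):
    """Average aggregate: Sum() / Count(), with Count() = number of rows."""
    return p_sum(slices) / n_rows


SLICES = bit_slices(NUM_PRODUCT, WIDTH)
TOTAL = p_sum(SLICES)
AVERAGE = p_average(SLICES, len(NUM_PRODUCT))
print(TOTAL, round(AVERAGE, 2))
```

The sum is computed from four root counts without ever reading an individual row value.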

Algorithm of Aggregate Function MAX
Max function: The Max function returns the largest value in a field.
Algorithm 4.2. Max Aggregate: evaluating max() with P-trees.
    max = 0.00; c = 0;
    Pc is set to all 1s;
    For i = n to 0 {
        c = RootCount(Pc AND Pi);
        If (c >= 1) {
            Pc = Pc AND Pi;
            max = max + 2^i;
        }
    }
    Return max;

Algorithm of Aggregate Function MAX
Example on # Product (values 10, 5, 6, 7, 11, 9, 3; bit columns P4,3 P4,2 P4,1 P4,0). P' denotes the complement of P.
Step 1. Pc = P4,3; RootCount(Pc) = 3 >= 1, so bit 3 = 1.
Step 2. RootCount(Pc AND P4,2) = 0 < 1, so Pc = Pc AND P'4,2 and bit 2 = 0.
Step 3. RootCount(Pc AND P4,1) = 2 >= 1, so Pc = Pc AND P4,1 and bit 1 = 1.
Step 4. RootCount(Pc AND P4,0) = 1 >= 1, so bit 0 = 1.
Result: 2^3 * 1 + 2^2 * 0 + 2^1 * 1 + 2^0 * 1 = 11.
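The MAX walkthrough above can be reproduced with a small Python sketch. It is a stand-in under stated assumptions: bit columns are plain integers rather than P-trees, and `bit_slices`, `root_count`, and `p_max` are names invented for this example.

```python
NUM_PRODUCT = [10, 5, 6, 7, 11, 9, 3]  # "# Product" values from the slides


def bit_slices(values, width):
    """slices[i] has bit r set when bit i of values[r] is 1."""
    return [
        sum(((v >> i) & 1) << row for row, v in enumerate(values))
        for i in range(width)
    ]


def root_count(p):
    """Number of 1 bits in a slice (stand-in for P-tree RootCount)."""
    return bin(p).count("1")


def p_max(slices, n_rows):
    """Scan bit positions from most to least significant.

    pc marks the rows still tied for the maximum; a bit of the result is 1
    whenever at least one of those rows has that bit set.
    """
    pc = (1 << n_rows) - 1  # all rows are candidates at the start
    result = 0
    for i in reversed(range(len(slices))):
        candidate = pc & slices[i]
        if root_count(candidate) >= 1:
            pc = candidate
            result += 1 << i
        # otherwise no candidate has bit i set, so pc stays unchanged
    return result


MAX_VALUE = p_max(bit_slices(NUM_PRODUCT, 4), len(NUM_PRODUCT))
print(MAX_VALUE)
```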

Algorithm of Aggregate Function MIN
Min function: The Min function returns the smallest value in a field.
Algorithm 4.3. Min Aggregate: evaluating min() with P-trees.
    min = 0.00; c = 0;
    Pc is set to all 1s;
    For i = n to 0 {
        c = RootCount(Pc AND NOT(Pi));
        If (c >= 1)
            Pc = Pc AND NOT(Pi);
        Else
            min = min + 2^i;
    }
    Return min;

Algorithm of Aggregate Function MIN
Example on # Product (values 10, 5, 6, 7, 11, 9, 3; bit columns P4,3 P4,2 P4,1 P4,0). P' denotes the complement of P.
Step 1. Pc = P'4,3; RootCount(Pc) = 4 >= 1, so bit 3 = 0.
Step 2. RootCount(Pc AND P'4,2) = 1 >= 1, so Pc = Pc AND P'4,2 and bit 2 = 0.
Step 3. RootCount(Pc AND P'4,1) = 0 < 1, so Pc = Pc AND P4,1 and bit 1 = 1.
Step 4. RootCount(Pc AND P'4,0) = 0 < 1, so bit 0 = 1.
Result: 2^3 * 0 + 2^2 * 0 + 2^1 * 1 + 2^0 * 1 = 3.
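A matching Python sketch for the MIN walkthrough, again assuming integer bit vectors in place of P-trees; `bit_slices`, `root_count`, and `p_min` are hypothetical names for this example.

```python
NUM_PRODUCT = [10, 5, 6, 7, 11, 9, 3]  # "# Product" values from the slides


def bit_slices(values, width):
    """slices[i] has bit r set when bit i of values[r] is 1."""
    return [
        sum(((v >> i) & 1) << row for row, v in enumerate(values))
        for i in range(width)
    ]


def root_count(p):
    """Number of 1 bits in a slice (stand-in for P-tree RootCount)."""
    return bin(p).count("1")


def p_min(slices, n_rows):
    """Scan bit positions from most to least significant.

    pc marks the rows still tied for the minimum; a bit of the result is 0
    whenever at least one of those rows has that bit clear.
    """
    full = (1 << n_rows) - 1
    pc = full
    result = 0
    for i in reversed(range(len(slices))):
        candidate = pc & ~slices[i] & full  # rows in pc with bit i clear
        if root_count(candidate) >= 1:
            pc = candidate
        else:
            result += 1 << i  # every remaining row has bit i set
    return result


MIN_VALUE = p_min(bit_slices(NUM_PRODUCT, 4), len(NUM_PRODUCT))
print(MIN_VALUE)
```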

Algorithms of Aggregate Function MEDIAN and RANK
The Median function returns the median value in a field. The Rank(K) function returns the Kth largest value in a field.
Algorithm 4.4. Median Aggregate: evaluating median() with P-trees.
    median = 0.00;
    pos = N/2;    (for Rank, pos = K)
    c = 0;
    Pc is set to all 1s for a single attribute;
    For i = n to 0 {
        c = RootCount(Pc AND Pi);
        If (c >= pos) {
            median = median + 2^i;
            Pc = Pc AND Pi;
        } Else {
            pos = pos - c;
            Pc = Pc AND NOT(Pi);
        }
    }
    Return median;

Algorithm of Aggregate Function MEDIAN
Example on # Product (values 10, 5, 6, 7, 11, 9, 3; bit columns P4,3 P4,2 P4,1 P4,0), starting with pos = 4 for the 7 tuples. P' denotes the complement of P.
Step 1. Pc = P4,3; RootCount(Pc) = 3 < 4, so Pc = P'4,3, pos = 4 - 3 = 1, and bit 3 = 0.
Step 2. RootCount(Pc AND P4,2) = 3 >= 1, so Pc = Pc AND P4,2 and bit 2 = 1.
Step 3. RootCount(Pc AND P4,1) = 2 >= 1, so Pc = Pc AND P4,1 and bit 1 = 1.
Step 4. RootCount(Pc AND P4,0) = 1 >= 1, so bit 0 = 1.
Result: 2^3 * 0 + 2^2 * 1 + 2^1 * 1 + 2^0 * 1 = 7.

Rank example on the values 7, 3, 11, 1, 5, 14, 10, 6, 4 (sorted: 1, 3, 4, 5, 6, 7, 10, 11, 14), with bit columns P3 P2 P1 P0. The search descends a tree of P-tree combinations from the root; at each node R is the remaining rank and C is the root count. P' denotes the complement of P.
Rank = 3:
1. P3: Count = 3, 3 >= 3, so B3 = 1.
2. P3 P2: Count = 1, 1 < 3, so B2 = 0 and r = r - c = 3 - 1 = 2.
3. P3 P2' P1: Count = 2, 2 >= 2, so B1 = 1.
4. P3 P2' P1 P0: Count = 1, 1 < 2, so B0 = 0.
Result: P3 P2 P1 P0 = 1 0 1 0, that is, 10.
Rank = 4:
1. P3: Count = 3, 3 < 4, so B3 = 0 and r = r - c = 4 - 3 = 1.
2. P3' P2: Count = 4, 4 >= 1, so B2 = 1.
3. P3' P2 P1: Count = 2, 2 >= 1, so B1 = 1.
4. P3' P2 P1 P0: Count = 1, 1 >= 1, so B0 = 1.
Result: P3 P2 P1 P0 = 0 1 1 1, that is, 7.
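The median and rank walkthroughs above can be reproduced with one Python sketch. Assumptions for illustration: bit columns are plain integers rather than P-trees, the helper names are invented, and the median position is taken as (N + 1) // 2, which matches the starting value pos = 4 used for the 7 Sales tuples.

```python
def bit_slices(values, width):
    """slices[i] has bit r set when bit i of values[r] is 1."""
    return [
        sum(((v >> i) & 1) << row for row, v in enumerate(values))
        for i in range(width)
    ]


def root_count(p):
    """Number of 1 bits in a slice (stand-in for P-tree RootCount)."""
    return bin(p).count("1")


def p_rank(slices, n_rows, k):
    """Return the k-th largest value (k = 1 is the maximum)."""
    pc = (1 << n_rows) - 1  # rows that can still hold the answer
    pos = k                 # remaining rank inside pc
    result = 0
    for i in reversed(range(len(slices))):
        candidate = pc & slices[i]
        c = root_count(candidate)
        if c >= pos:
            # Enough rows have bit i set: the answer has this bit.
            result += 1 << i
            pc = candidate
        else:
            # Skip the c larger rows and continue among rows with bit i clear.
            pos -= c
            pc &= ~slices[i]
    return result


def p_median(slices, n_rows):
    """Median as the value at position (N + 1) // 2 (assumed rounding)."""
    return p_rank(slices, n_rows, (n_rows + 1) // 2)


SALES_VALUES = [10, 5, 6, 7, 11, 9, 3]       # "# Product" from the slides
RANK_VALUES = [7, 3, 11, 1, 5, 14, 10, 6, 4]  # values of the rank example

MEDIAN = p_median(bit_slices(SALES_VALUES, 4), len(SALES_VALUES))
RANK_3 = p_rank(bit_slices(RANK_VALUES, 4), len(RANK_VALUES), 3)
RANK_4 = p_rank(bit_slices(RANK_VALUES, 4), len(RANK_VALUES), 4)
print(MEDIAN, RANK_3, RANK_4)
```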

Algorithm of Aggregate Function TOP-K
Top-k function: To get the largest k values in a field, we first find the rank-k value Vk using the function Rank(K). We then find all the tuples whose values are greater than or equal to Vk, using the ENRING technology of P-trees.
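One way to sketch the two top-k steps in Python is to collect the qualifying rows during the rank descent itself. This does not implement ENRING, which the slides name but do not describe; it is a substitute technique on plain integer bit vectors, and `bit_slices`, `root_count`, `top_k_mask`, and `rows_in` are hypothetical names.

```python
NUM_PRODUCT = [10, 5, 6, 7, 11, 9, 3]  # "# Product" values from the slides


def bit_slices(values, width):
    """slices[i] has bit r set when bit i of values[r] is 1."""
    return [
        sum(((v >> i) & 1) << row for row, v in enumerate(values))
        for i in range(width)
    ]


def root_count(p):
    """Number of 1 bits in a slice (stand-in for P-tree RootCount)."""
    return bin(p).count("1")


def top_k_mask(slices, n_rows, k):
    """Bit vector of the rows whose value is >= the k-th largest value.

    Same descent as the rank algorithm. Rows skipped on the way down are
    strictly greater than Vk; the rows left in pc at the end equal Vk.
    """
    pc = (1 << n_rows) - 1
    pos = k
    greater = 0
    for i in reversed(range(len(slices))):
        candidate = pc & slices[i]
        c = root_count(candidate)
        if c >= pos:
            pc = candidate
        else:
            greater |= candidate
            pos -= c
            pc &= ~slices[i]
    return greater | pc


def rows_in(mask, n_rows):
    """1-based Ids of the rows selected by a bit vector."""
    return [row + 1 for row in range(n_rows) if (mask >> row) & 1]


TOP3_IDS = rows_in(top_k_mask(bit_slices(NUM_PRODUCT, 4), 7, 3), 7)
TOP3_VALUES = [NUM_PRODUCT[i - 1] for i in TOP3_IDS]
print(TOP3_IDS, TOP3_VALUES)
```

When several tuples share the value Vk, the mask selects all of them, so the result can contain more than k rows.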

Performance Analysis
Figure 15. Iceberg query with multi-attribute aggregation: performance time comparison.

Performance Analysis
Our experiments are implemented in C++ on a 1 GHz Pentium PC with 1 GB of main memory running Red Hat Linux. Figure 15 compares the running time of the P-tree method and the bitmap method on a multi-attribute iceberg query. In this case the P-tree method is substantially faster.

Conclusion
We believe our study confirms that the P-tree approach is superior to the bitmap approach for aggregations of all types and for multi-attribute iceberg queries. It also shows two advantages of the basic P-tree representation of files. First, there is no need for redundant, auxiliary structures. Second, basic P-trees are good at calculating multi-attribute aggregations and numeric values, and are fair to all attributes.