Storing and Querying Tree-Structured Records in Dremel Foto N. Afrati^, Dan Delorey*, Mosha Pasumansky*, Jeffrey D. Ullman+ *Google, Inc. +Stanford University.

Presentation transcript:

Storing and Querying Tree-Structured Records in Dremel
Foto N. Afrati^, Dan Delorey*, Mosha Pasumansky*, Jeffrey D. Ullman+
*Google, Inc. +Stanford University ^National Technical University of Athens
VLDB 2014
January 23, 2015
Heymo Kou

2/17 Outline
- Introduction
- Trees as Data and as Data Types
- Querying Tree-Structured Data
- Filter Queries
- The Dominance Relation
- Semi-flattening and Repetition Context
- Efficient Data Storage and Retrieval
- Conclusion

3/17 Introduction
Dremel [Melnik et al., VLDB '10]
- Distributed system for interactively querying large datasets
- Developed at Google
- Column-store oriented
- Google BigQuery is powered by Dremel
- Data is stored as nested relations

4/17 Introduction Nested Relations [1/3]
- Remember 1NF? 1NF requires that all attributes have atomic (indivisible) domains.
- Nested relations are non-first-normal-form relations.
- Put simply, a cell may hold more than one value.
[Figure: two example relations over attributes A and B, one a 1NF relation and one a non-1NF relation]

5/17 Introduction Nested Relations [2/3]
- Comparison of the 1NF, 4NF, and nested-relation versions of the same data

1NF version

Title      Author  Pub-name     Pub-branch  Keyword
Compilers  Smith   McGraw-Hill  New York    Parsing
Compilers  Jones   McGraw-Hill  New York    Parsing
Compilers  Smith   McGraw-Hill  New York    Analysis
Compilers  Jones   McGraw-Hill  New York    Analysis
Networks   Jones   Oxford       London      Internet
Networks   Frick   Oxford       London      Internet
Networks   Jones   Oxford       London      Web
Networks   Frick   Oxford       London      Web

4NF version

Title      Author
Compilers  Smith
Compilers  Jones
Networks   Jones
Networks   Frick

Title      Keyword
Compilers  Parsing
Compilers  Analysis
Networks   Internet
Networks   Web

Title      Pub-name     Pub-branch
Compilers  McGraw-Hill  New York
Networks   Oxford       London

Non-1NF version

Title      Author-set      Publisher (name, branch)  Keyword-set
Compilers  {Smith, Jones}  (McGraw-Hill, New York)   {Parsing, Analysis}
Networks   {Jones, Frick}  (Oxford, London)          {Internet, Web}

- More space-efficient than the 1NF version
- Fewer joins than the 4NF version
- Querying and storing the data become considerably more complicated
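The growth from the nested form to the 1NF form can be reproduced in a few lines. The Python sketch below takes the two nested records from the comparison and expands each into the cross product of its author set and keyword set. The function name `to_1nf` and the dictionary layout are illustrative assumptions, not something taken from the paper or from Dremel.

```python
from itertools import product

# The two nested (non-1NF) records from the comparison.
# The dictionary layout is an assumption made for this sketch.
nested = [
    {"title": "Compilers", "authors": ["Smith", "Jones"],
     "publisher": ("McGraw-Hill", "New York"),
     "keywords": ["Parsing", "Analysis"]},
    {"title": "Networks", "authors": ["Jones", "Frick"],
     "publisher": ("Oxford", "London"),
     "keywords": ["Internet", "Web"]},
]

def to_1nf(records):
    """Expand each nested record into one flat row per (author, keyword) pair."""
    rows = []
    for r in records:
        name, branch = r["publisher"]
        for author, keyword in product(r["authors"], r["keywords"]):
            rows.append((r["title"], author, name, branch, keyword))
    return rows

flat = to_1nf(nested)
print(len(nested), "nested records ->", len(flat), "1NF rows")
```

Each record with two authors and two keywords becomes four flat rows, which is where the repeated publisher values in the 1NF version come from.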

6/17 Trees as Data and as Data Types
- Tuple type – a list of attribute names and a type for each attribute
- Type of an attribute
  – Basic type – integer, real number, string, etc.
  – Tuple type
- Required – exactly 1 occurrence
- Optional – 0 or 1 occurrence
- Repeated – 0 or more occurrences
- Required and repeated – 1 or more occurrences
- Relation type (schema) – a repeated tuple type

7/17 Trees as Data and as Data Types Representing Schemas
- A tuple type is denoted $T = \{A_1 : T_1, \ldots, A_n : T_n\}$
- Repeated type: T*
- Optional type: T?
- One or more occurrences: T+
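One way to make this notation concrete is a small recursive data structure. The Python sketch below is only an illustration: the class names `TupleType` and `Field`, the helper functions, and the `book` schema are all hypothetical and are not Dremel's actual schema representation.

```python
from dataclasses import dataclass
from typing import Dict, Union

@dataclass
class TupleType:
    # Maps each attribute name A_i to its field (type T_i plus a marker).
    attributes: Dict[str, "Field"]

@dataclass
class Field:
    # A basic type is a plain string such as "integer" or "string".
    type: Union[str, TupleType]
    # "" required, "?" optional, "*" repeated, "+" required and repeated.
    marker: str = ""

def allowed_occurrences(marker):
    """Return (min, max) occurrence counts; max is None when unbounded."""
    return {"": (1, 1), "?": (0, 1), "*": (0, None), "+": (1, None)}[marker]

def render(t):
    """Render a type in the {A1 : T1, ..., An : Tn} notation."""
    if isinstance(t, str):
        return t
    parts = [f"{name} : {render(f.type)}{f.marker}"
             for name, f in t.attributes.items()]
    return "{" + ", ".join(parts) + "}"

# Hypothetical schema for illustration only.
book = TupleType({
    "Title": Field("string"),
    "Author": Field("string", "+"),
    "Publisher": Field(TupleType({"Name": Field("string"),
                                  "Branch": Field("string")})),
    "Keyword": Field("string", "*"),
})

print(render(book))
```

A relation type would then be a `Field` whose type is a `TupleType` and whose marker is `*`, matching the definition of a schema as a repeated tuple type.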

8/17 Trees as Data and as Data Types Instances of a Schema
- Example data for a schema
[Figure: a schema and an example instance of it]

9/17 Querying Tree-Structured Data
- Query languages in Dremel
- Fundamentally navigation languages on trees
- Flattening (unnesting)
  – Ordinary SQL cannot be applied directly to a tree
  – The tree must be flattened before SQL can be applied

10/17 Flatten  R = {Name, , {Campaign}}  Flatten(R) = {Name, , CID, Budget, Bid, Word, Fee, Date}

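11/17 Querying Tree-Structured Data Flattening [1/2]
- Flattening a nested relation
- Nesting does not, in general, undo flattening: $\mathrm{NEST}_{\mathit{Attribute}}(\mathrm{FLATTEN}_{\mathit{Attribute}}(\mathit{Relation})) \neq \mathit{Relation}$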
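A small example shows why nesting need not undo flattening. The Python sketch below uses simplified `flatten` and `nest` functions over lists of dictionaries. Both functions and the sample data are assumptions made for illustration and are not Dremel's operators.

```python
def flatten(relation, attr):
    """Unnest the repeated attribute `attr`: one output row per element."""
    out = []
    for row in relation:
        rest = {k: v for k, v in row.items() if k != attr}
        for value in row[attr]:
            out.append({**rest, attr: value})
    return out

def nest(relation, attr):
    """Group rows that agree on every other attribute; collect `attr` into a list."""
    groups = {}
    for row in relation:
        key = tuple(sorted((k, v) for k, v in row.items() if k != attr))
        groups.setdefault(key, []).append(row[attr])
    return [{**dict(key), attr: values} for key, values in groups.items()]

# Hypothetical data: two records share the same Name but keep separate lists.
R = [
    {"Name": "a", "Word": ["x", "y"]},
    {"Name": "a", "Word": ["z"]},
]

round_trip = nest(flatten(R, "Word"), "Word")
print(round_trip)       # the two records have been merged into one
print(round_trip == R)
```

Flattening discards the boundary between the two original records, so nesting can only regroup by the remaining attribute values. A record whose repeated attribute is empty would likewise disappear under this `flatten` and could not be recovered.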

12/17 Querying Tree-Structured Data Flattening [2/2] Flatten

13/17 Filter Queries
- Filter
  – A conjunction of comparisons A θ B
  – A: any attribute
  – B: an attribute or a constant value
  – θ: any comparison of two values that yields a Boolean, such as =, ≠, or ≤
- Ordinary SQL can be applied to the flattened relation
- However, two problems arise
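A filter of this shape is straightforward to evaluate on flat tuples. The Python sketch below represents each comparison as a triple (A, θ, B), where B is either a constant or an attribute reference. The `Attr` wrapper, the function names, and the sample rows are hypothetical and serve only to illustrate the definition.

```python
import operator

# Comparison operators available as theta in "A theta B".
OPS = {"=": operator.eq, "≠": operator.ne, "≤": operator.le}

class Attr:
    """Marks the right-hand side of a comparison as an attribute name, not a constant."""
    def __init__(self, name):
        self.name = name

def satisfies(row, conditions):
    """True when the flat tuple `row` passes every comparison (A, theta, B)."""
    for a, theta, b in conditions:
        rhs = row[b.name] if isinstance(b, Attr) else b
        if not OPS[theta](row[a], rhs):
            return False
    return True

def apply_filter(rows, conditions):
    """Keep only the rows that satisfy the whole conjunction."""
    return [row for row in rows if satisfies(row, conditions)]

# Hypothetical flat tuples for illustration.
rows = [
    {"Name": "a", "Budget": 5, "Bid": 3},
    {"Name": "a", "Budget": 2, "Bid": 3},
    {"Name": "b", "Budget": 9, "Bid": 1},
]

# Bid ≤ Budget AND Name = "a"
conditions = [("Bid", "≤", Attr("Budget")), ("Name", "=", "a")]
kept = apply_filter(rows, conditions)
print(kept)
```

Only the first row passes both comparisons: the second fails `Bid ≤ Budget` and the third fails `Name = "a"`.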

14/17 Filter Queries Two problems with applying SQL
- Flattening greatly increases the space needed to hold the tuples
- Flattening a relation and then applying the filter
  – Leaves no way to prune unnecessary nodes before they are expanded
- The paper aims to resolve these problems by
  – Investigating when the result of filtering a flattened relation equals the result of flattening a filtered (pruned) relation
  – Giving an algorithm that performs the filtering on the tree itself
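The equivalence in question can be seen in a simple case. The Python sketch below compares flatten-then-filter with prune-then-flatten for a single comparison against a constant on one repeated child. It is not the paper's algorithm; the record layout, the function names, and the choice to drop records with no remaining children are assumptions made for this illustration.

```python
def flatten(record, attr):
    """One flat row per child under `attr`; no rows when the list is empty."""
    rest = {k: v for k, v in record.items() if k != attr}
    return [{**rest, **child} for child in record[attr]]

def prune(record, attr, keep):
    """Drop the children under `attr` that fail `keep`; the rest of the tree is unchanged."""
    return {**record, attr: [c for c in record[attr] if keep(c)]}

# Hypothetical tree-structured records for illustration.
records = [
    {"Name": "a", "Keyword": [{"Word": "x", "Bid": 1}, {"Word": "y", "Bid": 7}]},
    {"Name": "b", "Keyword": [{"Word": "z", "Bid": 9}]},
]

def cheap(row):
    # A single comparison of one attribute against a constant.
    return row["Bid"] <= 5

# Materialize every flat row, then discard the ones that fail.
flatten_then_filter = [row for r in records
                       for row in flatten(r, "Keyword") if cheap(row)]

# Remove failing children on the tree, then flatten what is left.
filter_then_flatten = [row for r in records
                       for row in flatten(prune(r, "Keyword", cheap), "Keyword")]

print(flatten_then_filter == filter_then_flatten)
```

The first order builds three flat rows and keeps one, while the second builds only the surviving row. Filters that compare attributes from different repeated subtrees are harder, and the sketch says nothing about when the two orders agree in those cases.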

15/17 Filter Queries Reduced and full flattening

16/17 Experiments  No graphs, no environment, Google Style

17/17 Conclusion
- Dremel is used in BigQuery
- Columnar storage alone is not enough for Google's services
- The tree-structured model reduces redundancy
- Query evaluation and processing become harder
- Still, it outperforms ordinary columnar storage