FlexTable: Using a Dynamic Relation Model to Store RDF Data 2010. 7. 14 IDS Lab. Seungseok Kang.

Copyright © 2008 by CEBT

Outline
- Introduction
- Preliminary
- Schema Evolution
  - Similarity Measurement
  - Lattice-Based Algorithm
  - Control Parameter
- Modification of Physical Storage
- Experiment and Analysis

Introduction
- Resource Description Framework (RDF)
  - Flexible model for representing information about resources
- Solutions to store and query RDF data
  - TripleStore: predicates are stored as values in a table
  - VertPart: statistics of predicate correlation are lost

Introduction
- Requirements for reducing scan and join cost
  - Triples should be organized into triple groups
    - How should triples be grouped to reduce query cost?
  - All triples sharing the same subject should be stored in one page
    - How can this process be supported dynamically?
- FlexTable
  - Dynamic relation model
  - Contributions of FlexTable
    - A lattice-based method for designing evolving triple groups
    - A new data page layout for reducing the cost of schema evolution

Preliminaries
- Triple
  - (s, p, v) ∈ (U ∪ B) × U × (U ∪ B ∪ L)
  - U: a set of URIs, B: a set of blank nodes, L: a set of literals
- RDF tuple
  - A tuple formed by coalescing a set of triples that have the same subject
- RDF schema
  - A set of RDF tuples stored as a table in FlexTable
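The coalescing step defined above can be sketched in a few lines. This is a minimal illustration, not FlexTable's implementation: the function names `coalesce` and `extract_schema`, the `ex:` subjects, and the choice to keep a list of values per predicate are all assumptions made for the example.

```python
from collections import defaultdict

def coalesce(triples):
    """Group (subject, predicate, value) triples into one RDF tuple per subject."""
    tuples = defaultdict(dict)
    for s, p, v in triples:
        # A predicate may repeat for one subject, so keep a list of values
        # (an assumption of this sketch).
        tuples[s].setdefault(p, []).append(v)
    return dict(tuples)

def extract_schema(rdf_tuple):
    """Take the schema of an RDF tuple to be its set of predicates."""
    return frozenset(rdf_tuple)

# Hypothetical sample data.
triples = [
    ("ex:kate", "name", "Kate"),
    ("ex:kate", "age", "53"),
    ("ex:jim", "name", "Jim"),
    ("ex:jim", "sex", "MEN"),
    ("ex:jim", "univ", "UCLA"),
]
rdf_tuples = coalesce(triples)
schemas = {s: extract_schema(t) for s, t in rdf_tuples.items()}
```

Here `rdf_tuples` holds one entry per subject, and `schemas` maps each subject to the attribute set that later slides compare and merge.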

Schema Evolution
- Classification of triples
  - When all triples are treated as a whole, the correlations among all predicates are difficult to compute (e.g. for queries with joins)
  - Predicates can be clustered into several classes
    - Join order and predicate correlation statistics have a large effect on query performance
- Schema evolution
  - Extract an RDF schema from each RDF tuple
  - Similar schemas are merged automatically according to their similarity
    - Similarity measurement
    - Lattice-based algorithm (LBA)
    - Control parameter

Similarity Measurement
- The two schemas with the maximum similarity value are merged
  - This is checked when a new RDF tuple is inserted
- Cosine-distance measure
  - Compute the importance of an attribute in one schema
    - Example: if attribute "a_1" exists in fewer schemas than "a_2", two schemas sharing "a_1" are more similar than two schemas sharing only "a_2" (e.g. "inUniversity" vs. "name")
  - The cosine distance denotes the similarity of two schemas
  - Ratio of RDF tuples that have values in attribute a_j to all RDF tuples contained in s_i
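The slide's formulas did not survive in the transcript, so the following is only a sketch of the idea under stated assumptions. It weights each attribute by its fill ratio in the schema (from the slide) multiplied by an IDF-style log factor (an assumption standing in for the lost "importance" formula), then compares two schemas by cosine similarity. All names and the sample statistics are hypothetical.

```python
import math

def attribute_weights(fill_ratio, schema_count, total_schemas):
    """Weight each attribute of one schema.

    fill_ratio:   attribute -> share of the schema's tuples with a value for it.
    schema_count: attribute -> number of schemas that contain the attribute.
    The log factor is an IDF-style assumption, not the slide's formula.
    """
    return {
        a: ratio * math.log(1 + total_schemas / schema_count[a])
        for a, ratio in fill_ratio.items()
    }

def cosine_similarity(w1, w2):
    """Cosine of the angle between two sparse attribute-weight vectors."""
    dot = sum(w1[a] * w2[a] for a in w1.keys() & w2.keys())
    n1 = math.sqrt(sum(x * x for x in w1.values()))
    n2 = math.sqrt(sum(x * x for x in w2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

# Hypothetical statistics: "name" is in every schema, "inUniversity" is rare.
counts = {"name": 10, "inUniversity": 2, "age": 2, "sex": 2}
total = 10

def weights(*attrs):
    return attribute_weights({a: 1.0 for a in attrs}, counts, total)

sim_common = cosine_similarity(weights("name", "age"), weights("name", "sex"))
sim_rare = cosine_similarity(weights("inUniversity", "age"),
                             weights("inUniversity", "sex"))
```

With these numbers the pair sharing the rare attribute scores higher than the pair sharing only the common one, which is the behaviour the slide's example describes.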

Lattice-Based Algorithm
- A straightforward method
  - Compute the similarity of every pair and pick the most similar pair
    - O(n) time complexity / O(n^2) space complexity
- Lattice-based algorithm (LBA)
  - Each RDF schema corresponds to a node in the lattice
  - If all attributes of schema A are contained in the attribute set of schema B, A is an ancestor (parent) of B
    - The upper node is the parent node; a dashed line marks brother nodes
  - Only the similarities between parent-child schema pairs or brother schema pairs are computed
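The ancestor relation above is a containment test on attribute sets, which can be sketched as follows. The helper names are hypothetical, and using a proper subset (so that a schema is not its own ancestor) is an assumption of this sketch. Brother nodes are not modelled because the transcript does not define them.

```python
def is_ancestor(a, b):
    """A is an ancestor of B when A's attributes are a proper subset of B's."""
    return a < b

def direct_parents(schema, schemas):
    """Ancestors of `schema` that have no other ancestor between them and it."""
    ancestors = [s for s in schemas if is_ancestor(s, schema)]
    return [p for p in ancestors
            if not any(is_ancestor(p, q) for q in ancestors)]

# Hypothetical lattice nodes.
s_name = frozenset({"name"})
s_name_age = frozenset({"name", "age"})
s_name_univ = frozenset({"name", "univ"})
s_all = frozenset({"name", "age", "univ"})
nodes = [s_name, s_name_age, s_name_univ, s_all]

parents_of_all = direct_parents(s_all, nodes)
```

Here `{name}` is an ancestor of the three-attribute schema but not a direct parent, because `{name, age}` and `{name, univ}` lie between them.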

Lattice-Based Approach

Algorithm EvolutionLattice(tuple, lattice)
  Input:  tuple - an RDF tuple
          lattice - an RDF schema lattice
  Output: lattice
    schema <- ExtractSchema(tuple);
    AddSchema(schema, lattice);
    schemaPair <- GetMaxSimPair(lattice);
    if (NeedMerge(schemaPair))
        newSchema <- MergeSchema(schemaPair);
        AddSchema(newSchema, lattice);
    InsertTuple(tuple);
    return lattice;

Algorithm AddSchema(schema, lattice)
  Input:  schema - a new schema
          lattice - an RDF schema lattice
  Output: lattice
    bottom <- getBottomNode(lattice);
    stack <- new Stack(bottom);
    while (!isEmpty(stack))
        temp <- pop(stack);
        if (schema is ancestor of temp)
            push all parents of temp into stack;
        else
            AddChildren(temp’s children, schema);
            compute similarity between temp’s children and schema;
    top <- getTopNode(lattice);
    push top into stack;
    while (!isEmpty(stack))
        temp <- pop(stack);
        if (temp is ancestor of schema)
            push all children of temp into stack;
        else
            AddParents(temp’s parents, schema);
            compute similarity between temp’s parents and schema;
            compute similarity between temp and schema;
            compute similarity between temp’s brothers and schema;
    return lattice;
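The control flow of EvolutionLattice can be tried out with a much simpler stand-in. The sketch below keeps schemas in a flat set and compares every pair by brute force, so it is not the lattice-based algorithm itself. The Jaccard similarity, the 0.5 threshold, and the choice to drop the two merged schemas are all assumptions made for illustration, and tuple insertion is omitted.

```python
from itertools import combinations

def jaccard(a, b):
    """Stand-in similarity (assumption): shared attributes over all attributes."""
    return len(a & b) / len(a | b)

def evolve(rdf_tuple, schemas, similarity=jaccard, threshold=0.5):
    """Add the tuple's schema, then merge the most similar pair if it qualifies.

    Brute-force pair search replaces the lattice; the threshold replaces
    the storage-gain test. Both are simplifications of this sketch.
    """
    schemas.add(frozenset(rdf_tuple))
    pairs = list(combinations(schemas, 2))
    if pairs:
        a, b = max(pairs, key=lambda pair: similarity(*pair))
        if similarity(a, b) > threshold:
            # Assumption: the merged schema replaces both originals.
            schemas.difference_update({a, b})
            schemas.add(a | b)
    return schemas

# Hypothetical run: one existing schema, one newly inserted tuple.
schemas = {frozenset({"name", "age"})}
result = evolve({"name": "Jim", "age": "30", "univ": "UCLA"}, schemas)
```

The point of the lattice in the original algorithm is to avoid exactly the all-pairs comparison that this stand-in performs.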

Control Parameter
- Problem of schema evolution
  - When to stop merging: compute the storage gain of a merge
    - If the storage cost of the new schema is smaller than that of the two existing schemas, merge the two schemas into the new one
    - Otherwise, no action is needed
  - Storage cost of a schema
  - Storage gain for schema merging
    - If C_gain > 0, NeedMerge is true; otherwise it is false
  - Symbols
    - a: storage cost of schema information
    - b: storage cost of each attribute in one schema
    - |A|: number of attributes
    - |N|: number of RDF tuples
    - r: storage cost of each bitmap
    - C_val: storage cost of actual values
- Summary
  - Compute the similarity between two schemas
  - Lattice-based algorithm for dynamic relational schemas
  - A formula to determine when to merge two schemas
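The merge test can be sketched as a comparison of storage costs. The cost and gain formulas from the slide are missing from the transcript, so `storage_cost` below is a placeholder built from the slide's symbols, not the paper's formula. Charging one bitmap bit per attribute per tuple, and defining the gain as the two old costs minus the merged cost, are both assumptions of this sketch.

```python
def storage_cost(num_attrs, num_tuples, value_cost, a=128, b=64, r=1):
    """Placeholder cost model (NOT the paper's formula).

    a: cost of schema information, b: cost per attribute,
    r: cost per bitmap bit, value_cost: cost of the stored values (C_val).
    """
    return a + b * num_attrs + r * num_attrs * num_tuples + value_cost

def need_merge(cost_s1, cost_s2, cost_merged):
    """Merge only when the storage gain is positive (gain definition assumed)."""
    gain = cost_s1 + cost_s2 - cost_merged
    return gain > 0

# Hypothetical schemas: 3 attributes and 100 tuples each.
c1 = storage_cost(3, 100, value_cost=1000)
c2 = storage_cost(3, 100, value_cost=1000)

# Heavily overlapping attributes: the merged schema has 4 attributes.
merge_overlapping = need_merge(c1, c2, storage_cost(4, 200, value_cost=2000))

# Disjoint attributes: the merged schema has 6 attributes.
merge_disjoint = need_merge(c1, c2, storage_cost(6, 200, value_cost=2000))
```

Under this placeholder model, merging overlapping schemas saves one schema header and the duplicated attribute entries, while merging disjoint schemas loses more to wider bitmaps than it saves.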

Physical Storage
- A tuple’s values are stored in the same order as the attributes in the schema (as in traditional databases)
  - Benefit: reduces storage space
  - Inefficient when schema evolution happens frequently
    - {name,age,univ}{Kate,53}(110) + {name,sex,univ}{Jim,MEN,UCLA}(111) -> {name,age,univ,sex}(1100)(1011)
- Problem
  - The cost of schema merging is prohibitively high
- Solution
  - The system must “interpret” the attribute names and values of each tuple at query access time
  - Page-interpret layout divides a data page into three regions
    - Page header, attribute interpreted area, data value area
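The merge example on this slide can be reproduced with a short sketch, which also shows why merging is costly in a traditional layout: every stored tuple needs a new bitmap and possibly reordered values. The function names are hypothetical, and bitmaps are modelled as strings of "0" and "1" for readability.

```python
def merge_schemas(s1, s2):
    """Append the attributes of s2 that s1 lacks, preserving s1's order."""
    merged = list(s1)
    for attr in s2:
        if attr not in merged:
            merged.append(attr)
    return merged

def remap_tuple(old_schema, bitmap, values, new_schema):
    """Rewrite one stored tuple (bitmap plus ordered values) for a new schema."""
    present = [attr for attr, bit in zip(old_schema, bitmap) if bit == "1"]
    by_attr = dict(zip(present, values))
    new_bitmap = "".join("1" if attr in by_attr else "0" for attr in new_schema)
    new_values = [by_attr[attr] for attr in new_schema if attr in by_attr]
    return new_bitmap, new_values

s1 = ["name", "age", "univ"]
s2 = ["name", "sex", "univ"]
merged = merge_schemas(s1, s2)

kate = remap_tuple(s1, "110", ["Kate", "53"], merged)
jim = remap_tuple(s2, "111", ["Jim", "MEN", "UCLA"], merged)
```

Kate's bitmap becomes 1100 and Jim's becomes 1011, matching the slide, and Jim's values must also be reordered to follow the merged attribute order.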

Physical Storage
- Physical storage design of FlexTable

Experiment and Analysis
- Setting
  - 1 GB RAM, 160 GB SATA
  - FreeToGovCyc with 45,823 triples, 10,905 instances
  - Yago with 1,000,000 triples, 152,362 instances
- Analysis
  - Analysis of triples import
  - Analysis of storage cost
  - Analysis of query performance

Experiment and Analysis
- Analysis of triples import
- Analysis of storage cost

Experiment and Analysis
- Analysis of query performance
  - Test queries
    - Search all instances having the predicates in the query
    - “SELECT ?x WHERE {?x pred1 ?val1. {?x pred2 ?val2} … {?x predN ?valN} }”
    - Add predicates to the query pattern one by one
  - The number of joins increases as predicates are added to the sequence
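A test series of this shape can be generated mechanically. The sketch below is an assumption about the workload, not the authors' script: it emits a flat SPARQL basic graph pattern joined with "." rather than the nested braces shown on the slide, and the predicate IRIs are placeholders.

```python
def build_query(predicates):
    """Build one star-shaped query: every pattern shares the subject ?x."""
    patterns = " . ".join(
        f"?x <{p}> ?val{i}" for i, p in enumerate(predicates, start=1)
    )
    return f"SELECT ?x WHERE {{ {patterns} }}"

def query_series(predicates):
    """Add predicates one by one, yielding queries with 1..N patterns."""
    return [build_query(predicates[:n]) for n in range(1, len(predicates) + 1)]

# Placeholder predicate names.
series = query_series(["pred1", "pred2", "pred3"])
```

Each query in `series` has one more triple pattern than the previous one, so a store that keeps one table per predicate needs one more join per step.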

Conclusion
- FlexTable
  - RDF storage system using a dynamic relation model
  - Supports efficient storage and querying of RDF data
  - Features of the paper
    - Mechanism to support dynamic schema evolution
    - Novel page layout that avoids rewriting physical data
    - Comprehensive experiments
  - Advantages of FlexTable
    - Lower storage cost than the state of the art
    - Better triple import time, storage, and query performance
- Future work
  - Extending FlexTable to column-oriented databases