Hexastore: Sextuple Indexing for Semantic Web Data Management

Presentation transcript:

Hexastore: Sextuple Indexing for Semantic Web Data Management
Presented by Cathrin Weiss, Panagiotis Karras, Abraham Bernstein
Department of Informatics, University of Zurich
Session: Indexing and Query Processing, VLDB 2008
2010-01-22
Summarized by Jaeseok Myung
Intelligent Database Systems Lab, School of Computer Science & Engineering, Seoul National University, Seoul, Korea

Overview
- Hexastore: sextuple indexing
  - A triple (S, P, O) can be indexed in six orders (3! = 6): SPO, SOP, PSO, POS, OSP, OPS
  - Every possible indexing scheme is materialized
  - Allows quick and scalable query processing
  - Needs up to five times the index space of a conventional triples table
- In this presentation
  - Review of conventional RDF storage structures
  - Introduction to Hexastore
  - Discussion
Center for E-Business Technology

Physical Designs for RDF Storage (1/4): Giant Triples Table
- Example query: SELECT ?title WHERE { ?book <title> ?title . ?book <author> <Fox, Joe> . ?book <copyright> <2001> }
- Drawbacks called out on the slide
  - Two self-joins on the triples table (one per additional triple pattern)
  - Entire table scan
  - Redundancy
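
To make the join cost concrete, here is a minimal sketch of the example query run against a single triples table. It assumes SQLite as a stand-in relational engine; the table name `triples`, the column names, and the sample rows are hypothetical, and only the shape of the query comes from the slide.

```python
import sqlite3

# Hypothetical sample data; only the query shape comes from the slide.
TRIPLES = [
    ("book1", "title", "XYZ"),
    ("book1", "author", "Fox, Joe"),
    ("book1", "copyright", "2001"),
    ("book2", "title", "ABC"),
    ("book2", "author", "Orr, Tim"),
    ("book2", "copyright", "1985"),
]


def titles_by_author_and_year(triples, author, year):
    """Answer the three-pattern query with two self-joins on one table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (subj TEXT, prop TEXT, obj TEXT)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    rows = conn.execute(
        """
        SELECT t.obj
        FROM triples AS t
        JOIN triples AS a ON a.subj = t.subj   -- first self-join
        JOIN triples AS c ON c.subj = t.subj   -- second self-join
        WHERE t.prop = 'title'
          AND a.prop = 'author' AND a.obj = ?
          AND c.prop = 'copyright' AND c.obj = ?
        ORDER BY t.obj
        """,
        (author, year),
    ).fetchall()
    conn.close()
    return [title for (title,) in rows]


print(titles_by_author_and_year(TRIPLES, "Fox, Joe", "2001"))
```

Every triple pattern beyond the first adds another copy of the same table to the join, which is the cost the slide points at.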

Physical Designs for RDF Storage (2/4): Clustered Property Table
- Contains clusters of properties that tend to be defined together

Physical Designs for RDF Storage (3/4): Property-Class Table
- Exploits the type property of subjects to cluster similar sets of subjects together in the same table
- Unlike the clustered property table, a property may exist in multiple property-class tables
- [Figure label: values of the type property]

Physical Designs for RDF Storage (4/4): Vertically Partitioned Table
- The giant table is rewritten into n two-column tables, where n is the number of unique properties in the data
- We don't have to
  - maintain null values
  - choose a clustering algorithm
- [Figure labels: property, subject, object]

Motivation
- The problem of non-property-bound queries (queries in which the property is not fixed)
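
A small sketch of why non-property-bound queries are a problem for vertical partitioning. It assumes each property table is an in-memory list of (subject, object) pairs; the function names and the sample triples are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample triples.
TRIPLES = [
    ("object214", "hasColor", "blue"),
    ("object214", "belongsTo", "object352"),
    ("object352", "hasColor", "red"),
]


def vertically_partition(triples):
    """Build one two-column (subject, object) table per property."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    for rows in tables.values():
        rows.sort()  # keep each property table sorted by subject
    return dict(tables)


def objects_of(tables, subject, prop):
    """Property-bound lookup: touches exactly one table."""
    return [o for s, o in tables.get(prop, []) if s == subject]


def properties_of(tables, subject):
    """Non-property-bound lookup: has to visit every property table."""
    visited = 0
    found = []
    for prop, rows in tables.items():
        visited += 1
        found.extend((prop, o) for s, o in rows if s == subject)
    return sorted(found), visited


tables = vertically_partition(TRIPLES)
print(objects_of(tables, "object214", "hasColor"))
print(properties_of(tables, "object214"))
```

With the 258 properties of the Barton data set, the second kind of lookup would touch 258 tables and union the results, even if the subject uses only a handful of them.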

Hexastore: Sextuple Indexing

Hexastore: Sextuple Indexing

Five-fold Increase in Index Space
- Sharing the same terminal lists: SPO-PSO, SOP-OSP, POS-OPS
- The key of each of the three resources in a triple appears in two headers and two vectors, but only in one list
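
The header, vector, and list layout with shared terminal lists can be sketched as a toy in-memory structure. This is an illustration under assumptions, not the authors' implementation: the class name, method names, and sample triples are hypothetical, nested dictionaries stand in for headers and vectors, and the vectors are not kept sorted here.

```python
from collections import defaultdict


class Hexastore:
    """Toy sextuple index: header -> vector -> terminal list."""

    ORDERS = ("spo", "sop", "pso", "pos", "osp", "ops")

    def __init__(self, triples):
        self.index = {order: defaultdict(dict) for order in self.ORDERS}
        # Inserting in sorted order keeps every terminal list sorted.
        for s, p, o in sorted(set(triples)):
            self._link("spo", "pso", s, p, o)  # shared object lists
            self._link("sop", "osp", s, o, p)  # shared property lists
            self._link("pos", "ops", p, o, s)  # shared subject lists

    def _link(self, first, second, a, b, item):
        # Both orderings point at the same Python list object.
        shared = self.index[first][a].get(b)
        if shared is None:
            shared = []
            self.index[first][a][b] = shared
            self.index[second][b][a] = shared
        shared.append(item)

    def lookup(self, order, first, second):
        """Return the terminal list for two bound components."""
        return self.index[order].get(first, {}).get(second, [])


# Hypothetical sample triples.
store = Hexastore([
    ("object214", "hasColor", "blue"),
    ("object214", "belongsTo", "object352"),
    ("object352", "hasColor", "blue"),
])
print(store.lookup("pos", "hasColor", "blue"))
print(store.lookup("osp", "blue", "object214"))
```

Because `spo[s][p]` and `pso[p][s]` are the same list, each key is stored in two headers, two vectors, and one list, which is where the worst-case factor of five (2 + 2 + 1) over a plain triples table comes from.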

Mapping Dictionary
- Replacing all literals by unique IDs using a mapping dictionary
- The mapping dictionary compresses the triple store
  - Reduced redundancy, saving a lot of physical space
- We can concentrate on a logical index structure rather than the physical storage design
- [Figure: an (S, P, O) table holding string values such as object214, hasColor, blue, belongsTo, object352 is rewritten into an (S, P, O) table of integer IDs, next to an (ID, Value) mapping dictionary]
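
A minimal sketch of dictionary encoding, assuming IDs are assigned from 0 in order of first appearance (the slide does not fix the numbering). The function names are hypothetical, and the second triple's subject is an assumption made for the example.

```python
def encode_triples(triples):
    """Replace every string by an integer ID, assigned on first use."""
    ids = {}
    encoded = []
    for triple in triples:
        # setdefault returns the existing ID or stores the next free one.
        encoded.append(tuple(ids.setdefault(term, len(ids)) for term in triple))
    return encoded, ids


def decode_triples(encoded, ids):
    """Invert the mapping dictionary to get the original strings back."""
    values = {term_id: term for term, term_id in ids.items()}
    return [tuple(values[term_id] for term_id in triple) for triple in encoded]


triples = [
    ("object214", "hasColor", "blue"),
    ("object214", "belongsTo", "object352"),
]
encoded, ids = encode_triples(triples)
print(encoded)
print(ids)
```

The index structures then only ever compare and store small integers; the strings live once in the dictionary.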

Clustered B+-Tree (RDF-3X, VLDB 2008)
- Store everything in a clustered B+-tree
- Triples are sorted in lexicographical order
  - Allows the conversion of SPARQL patterns into range scans
  - We don't have to scan the entire table
- [Figure: the ID-encoded (S, P, O) table, annotated "Actually, we don't need this table!", next to the mapping dictionary (ID, Value) and the sorted keys in the B+-tree leaves]
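
The conversion of a triple pattern into a range scan can be sketched with a sorted list standing in for the B+-tree leaf level. This is an assumption for illustration only and not how RDF-3X is implemented; `prefix_scan` and the ID-encoded sample triples are hypothetical.

```python
from bisect import bisect_left


def prefix_scan(sorted_triples, prefix):
    """Return all triples whose leading components equal `prefix`."""
    # A shorter tuple sorts before any longer tuple it is a prefix of,
    # so bisect_left lands on the first matching triple.
    start = bisect_left(sorted_triples, prefix)
    result = []
    for triple in sorted_triples[start:]:
        if triple[:len(prefix)] != prefix:
            break  # left the matching range, stop scanning
        result.append(triple)
    return result


# ID-encoded triples in SPO order (hypothetical values).
spo = sorted([(0, 1, 2), (0, 3, 4), (4, 1, 5)])
print(prefix_scan(spo, (0,)))    # pattern: subject 0, any property, any object
print(prefix_scan(spo, (0, 3)))  # pattern: subject 0, property 3, any object
```

A pattern with bound components that form a prefix of the sort order touches only a contiguous run of entries, so no full scan is needed.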

Argumentation
- Concise and efficient handling of multi-valued resources
  - An index can contain multiple items (cf. multi-valued property table)
- Avoidance of NULLs
  - Only those RDF elements that are relevant to a particular other element need to be stored in a particular index
- No ad-hoc choices needed
  - Most other RDF data storage schemes require several ad-hoc decisions about their data representation architecture
  - e.g., clustered property table (which properties to store together)

Argumentation
- Reduced I/O cost
  - Other RDF storage schemes may need to access multiple tables that are irrelevant to a query
  - This applies to queries that are not bound by property
- All first-step pairwise joins are fast merge-joins
  - The keys of resources in all vectors and lists used in a Hexastore are sorted
- Reduction of unions and joins
  - e.g., for a list of subjects related to two particular objects through any property, Hexastore can use the osp index
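
A sketch of the merge-join that sorted vectors make possible, applied to the example of subjects related to two particular objects through any property. The nested dictionary `osp` is a hypothetical stand-in for the osp index, and its contents are made up for the example.

```python
def merge_join(left, right):
    """Intersect two sorted, duplicate-free lists in one linear pass."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            out.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1
        else:
            j += 1
    return out


# Hypothetical osp index: object -> subject -> list of properties.
osp = {
    "blue": {"object214": ["hasColor"], "object352": ["hasColor"]},
    "object352": {"object214": ["belongsTo"]},
}

# Subject vectors for the two objects; sorted() mimics the sorted vectors.
subjects_blue = sorted(osp["blue"])
subjects_352 = sorted(osp["object352"])
print(merge_join(subjects_blue, subjects_352))
```

No union over property tables is needed: each object's subject vector is read once and the two are intersected in time linear in their lengths.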

Treating the Path Expression Problem
- Example query: SELECT B.subj FROM triples AS A, triples AS B WHERE A.prop = 'wasBorn' AND A.obj = '1860' AND A.subj = B.obj AND B.prop = 'Author'
- A path expression requires (n-1) subject-object self-joins, where n is the length of the path
- Vertical partitioning
  - Materialized path expressions (A.author:wasBorn = '1860')
  - $\binom{n-1}{2} = O(n^2)$ possible additional properties
- Hexastore
  - (n-1) merge-joins using the pso and pos indices

Experimental Evaluation
- Setup
  - 2.8GHz dual core, 16GB RAM
- Competitors
  - Column-oriented vertical partitioning approaches
    - COVP1: PSO index
    - COVP2: PSO index + POS index (second copy)
  - Hexastore: SPO, SOP, PSO, POS, OSP, OPS
- Datasets
  - Barton: MIT library data, 61 mil. triples, 258 properties
  - LUBM: a synthetic benchmark data set (10 univ.), 6.8 mil. triples, 18 predicates

Performance (Barton Data)

Performance (LUBM, 10)

Memory Usage
- In practice, Hexastore requires a four-fold increase in memory in comparison to COVP1, which is an affordable cost for the derived advantages

Conclusion
- Hexastore: sextuple-indexing scheme
  - Worst-case five-fold storage increase in comparison to a conventional triples table
  - Quick and scalable general-purpose query processing
  - All pairwise joins in a Hexastore can be rendered as merge joins
- My question
  - Main-memory indexing (is it possible?)
  - 7GB RAM for 6 mil. triples
  - Other options?