Published by Darrell Bailey. Modified over 9 years ago.
1
FlexTable: Using a Dynamic Relation Model to Store RDF Data
2010. 7. 14
IDS Lab.
Seungseok Kang
2
Copyright 2008 by CEBT
Outline
  Introduction
  Preliminaries
  Schema Evolution
  Similarity Measurement
  Lattice-Based Algorithm
  Control Parameter
  Modification of Physical Storage
  Experiment and Analysis
3
Introduction
  Resource Description Framework (RDF)
    Flexible model for representing information about resources
  Solutions to store and query RDF data
    TripleStore
      – Stores predicates as values in a table
    VertPart
      – Statistics of predicate correlation are lost
4
Introduction
  Requirements for reducing scan and join cost
    Triples should be organized as triple groups
      – How to group the triples to reduce query cost?
    All triples sharing the same subject should be stored in one page
      – How to support this process dynamically?
  FlexTable
    Dynamic relation model
    Contributions of FlexTable
      – A lattice-based method for designing evolving triple groups
      – A new data page layout for reducing the cost of schema evolution
5
Preliminaries
  Triple
    (s, p, v) ∈ (U ∪ B) × U × (U ∪ B ∪ L)
    U: a set of URIs, B: a set of blank nodes, L: a set of literals
  RDF tuple
    A tuple coalesced from the set of triples that share the same subject
  RDF schema
    A set of RDF tuples stored as a table in FlexTable
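The definitions above can be illustrated with a minimal Python sketch. It assumes triples arrive as plain (subject, predicate, value) string tuples and that the schema of a single RDF tuple is simply its set of predicates. The function names `coalesce` and `extract_schema` and the `ex:` subjects are hypothetical, not taken from FlexTable.

```python
from collections import defaultdict

def coalesce(triples):
    """Group (subject, predicate, value) triples into one RDF tuple per subject."""
    tuples = defaultdict(dict)
    for s, p, v in triples:
        # A predicate may repeat for one subject, so values are kept in a list.
        tuples[s].setdefault(p, []).append(v)
    return dict(tuples)

def extract_schema(rdf_tuple):
    """Assumption: the schema of one RDF tuple is its set of predicates."""
    return frozenset(rdf_tuple)

# Illustrative data only (hypothetical subjects).
triples = [
    ("ex:Kate", "name", "Kate"),
    ("ex:Kate", "age", "53"),
    ("ex:Jim", "name", "Jim"),
    ("ex:Jim", "sex", "MEN"),
    ("ex:Jim", "univ", "UCLA"),
]
rdf_tuples = coalesce(triples)
schemas = {subject: extract_schema(t) for subject, t in rdf_tuples.items()}
```

Tuples whose schemas are equal (or later merged) would then share one table.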
6
Schema Evolution
  Classification of triples
    When triples are considered as a whole, the correlations among all predicates are difficult to compute (e.g. for queries with joins)
    Predicates can be clustered into several classes
      – Join order and predicate correlation statistics have a great effect on query performance
  Schema evolution
    Extract an RDF schema from each RDF tuple
    Similar schemas are merged automatically according to their similarity
      – Similarity measurement
      – Lattice-based algorithm (LBA)
      – Control parameter
7
Similarity Measurement
  The two schemas with the maximum similarity value are merged
    Checked whenever a new RDF tuple is inserted
  Cosine-distance measure
    Compute the importance of an attribute in one schema
      – Example: if attribute “a_1” exists in fewer schemas than “a_2”, two schemas sharing “a_1” are more similar than two schemas sharing only “a_2” (e.g. “inUniversity” vs. “name”)
    Cosine distance denotes the similarity of two schemas
    Vector component: the ratio of RDF tuples that have a value in attribute a_j to all RDF tuples contained in s_i
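The slide's formulas did not survive extraction, so the following Python sketch is only one plausible reading of the measure. It assumes each schema is represented by the fill ratio of every attribute (the ratio described above) and that attribute importance is an IDF-style weight, `log(1 + n / df)`, which grows as an attribute appears in fewer schemas. That weighting and all names here are assumptions, not the authors' exact definition.

```python
import math

def attribute_weights(schemas):
    """Assumed IDF-style importance: attributes found in fewer schemas weigh more."""
    n = len(schemas)
    df = {}
    for fill in schemas.values():
        for attr in fill:
            df[attr] = df.get(attr, 0) + 1
    return {attr: math.log(1 + n / count) for attr, count in df.items()}

def cosine_similarity(fill_a, fill_b, weights):
    """Cosine of the two weighted fill-ratio vectors."""
    va = {a: r * weights[a] for a, r in fill_a.items()}
    vb = {a: r * weights[a] for a, r in fill_b.items()}
    dot = sum(va[a] * vb[a] for a in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(x * x for x in va.values()))
    norm_b = math.sqrt(sum(x * x for x in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# schema id -> {attribute: ratio of tuples in the schema that have a value for it}
schemas = {
    "s1": {"name": 1.0, "inUniversity": 0.8},
    "s2": {"name": 1.0, "inUniversity": 0.9},
    "s3": {"name": 1.0, "age": 0.5},
    "s4": {"name": 1.0, "sex": 0.7},
}
weights = attribute_weights(schemas)
sim_12 = cosine_similarity(schemas["s1"], schemas["s2"], weights)
sim_13 = cosine_similarity(schemas["s1"], schemas["s3"], weights)
```

Here s1 and s2 share the rarer attribute "inUniversity", so they score higher than s1 and s3, which share only the ubiquitous "name".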
8
Lattice-Based Algorithm
  A straightforward method
    Compute the similarity of every pair and pick the most similar pair
      – O(n) time complexity / O(n^2) space complexity
  Lattice-based algorithm (LBA)
    Each RDF schema corresponds to a node in the lattice
    If all attributes of schema A are contained in the attribute set of schema B, A is an ancestor (parent) of B
      – In the figure, the upper node is the parent node and a dashed line connects brother nodes
    Only the similarities between parent-child schema pairs or brother schema pairs are computed
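The pruning idea can be sketched in Python under two assumptions: "ancestor" means proper subset of the attribute set, and "brothers" are nodes that share at least one direct parent. The helper names are hypothetical, and the sketch recomputes parents by brute force rather than maintaining the lattice incrementally as the actual algorithm does.

```python
from itertools import combinations

def is_ancestor(a, b):
    """A is an ancestor of B when A's attributes are a proper subset of B's."""
    return a < b

def direct_parents(node, nodes):
    """Ancestors of node with no other ancestor strictly between them and node."""
    ancestors = [n for n in nodes if is_ancestor(n, node)]
    return [p for p in ancestors if not any(is_ancestor(p, q) for q in ancestors)]

def candidate_pairs(nodes):
    """Parent-child and brother pairs: the only pairs whose similarity is computed."""
    parent_map = {n: direct_parents(n, nodes) for n in nodes}
    pairs = set()
    for child, parents in parent_map.items():
        for parent in parents:
            pairs.add(frozenset((parent, child)))
    for a, b in combinations(nodes, 2):
        if set(parent_map[a]) & set(parent_map[b]):  # share a direct parent
            pairs.add(frozenset((a, b)))
    return pairs

nodes = [frozenset(s) for s in (
    {"name"},
    {"name", "age"},
    {"name", "univ"},
    {"name", "age", "univ"},
    {"title"},
)]
pairs = candidate_pairs(nodes)
all_pairs = len(nodes) * (len(nodes) - 1) // 2
```

For these five schemas, only 5 of the 10 possible pairs are candidates, and the unrelated schema {title} is never compared with anything.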
9
Lattice-Based Approach

Algorithm EvolutionLattice(tuple, lattice)
  Input: tuple – An RDF tuple
         lattice – An RDF schema lattice
  Output: lattice
  schema <- ExtractSchema(tuple);
  AddSchema(schema, lattice);
  schemaPair <- GetMaxSimPair(lattice);
  if (NeedMerge(schemaPair))
    newSchema <- MergeSchema(schemaPair);
    AddSchema(newSchema, lattice);
  InsertTuple(tuple);
  return lattice;

Algorithm AddSchema(schema, lattice)
  Input: schema – A new schema
         lattice – An RDF schema lattice
  Output: lattice
  bottom <- getBottomNode(lattice);
  stack <- new Stack(bottom);
  while (!isEmpty(stack))
    temp <- pop(stack);
    if (schema is ancestor of temp)
      push all parents of temp into stack;
    else
      AddChildren(temp’s children, schema);
      compute similarity between temp’s children and schema;
  top <- getTopNode(lattice);
  push top into stack;
  while (!isEmpty(stack))
    temp <- pop(stack);
    if (temp is ancestor of schema)
      push all children of temp into stack;
    else
      AddParents(temp’s parents, schema);
      compute similarity between temp’s parents and schema;
      compute similarity between temp and schema;
      compute similarity between temp’s brothers and schema;
  return lattice;
10
Control Parameter
  Problem of schema evolution
    When to stop merging: compute the storage gain of the evolution
      – If the storage cost of the new schema is smaller than that of the two existing schemas, merge the two schemas into the new one
      – Otherwise, no action is needed
  Storage cost of a schema
    a: storage cost of schema information
    b: storage cost of each attribute in one schema
    |A|: number of attributes
    |N|: number of RDF tuples
    r: storage cost of each bitmap
    C_val: storage cost of actual values
  Storage gain for schema merging
      – While C_gain > 0, NeedMerge is true; otherwise it is false
  Summary
    Compute similarity between two schemas
    Lattice-based algorithm for dynamic relational schemas
    A formula to determine when to merge two schemas
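The cost and gain formulas themselves are missing from the transcript, so the Python sketch below assumes an additive form built from the symbols listed on this slide: cost = a + b·|A| + r·|N| + C_val, with gain = cost(s1) + cost(s2) − cost(merged). It further assumes one null bitmap per tuple of ceil(|A| / 8) bytes, and the constants `a=16` and `b=8` are arbitrary. None of this is confirmed by the source.

```python
import math

def schema_cost(num_attrs, num_tuples, value_cost, a=16, b=8):
    """Assumed additive cost: a + b*|A| + r*|N| + C_val (all in bytes)."""
    r = math.ceil(num_attrs / 8)  # assumption: one bit per attribute in each bitmap
    return a + b * num_attrs + r * num_tuples + value_cost

def storage_gain(s1, s2, merged_attrs):
    """Assumed gain: cost of the two separate schemas minus cost of the merged one."""
    separate = (schema_cost(s1["attrs"], s1["tuples"], s1["values"])
                + schema_cost(s2["attrs"], s2["tuples"], s2["values"]))
    merged = schema_cost(merged_attrs,
                         s1["tuples"] + s2["tuples"],
                         s1["values"] + s2["values"])
    return separate - merged

def need_merge(s1, s2, merged_attrs):
    return storage_gain(s1, s2, merged_attrs) > 0

s1 = {"attrs": 8, "tuples": 100, "values": 4000}
s2 = {"attrs": 8, "tuples": 100, "values": 4000}

gain_same_attrs = storage_gain(s1, s2, merged_attrs=8)  # identical attribute sets
gain_wider = storage_gain(s1, s2, merged_attrs=9)       # merged bitmap needs a 2nd byte
```

Under these assumptions, merging two schemas with identical attributes saves the duplicated schema information, while a merge that widens every tuple's bitmap can cost more than it saves.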
11
Physical Storage
  A tuple’s values are stored in the same order as the attributes in the schema (as in traditional databases)
    Benefit: reduces storage space
    Inefficient when schema evolution happens frequently
      – {name,age,univ}{Kate,53}(110) + {name,sex,univ}{Jim,MEN,UCLA}(111) -> {name,age,univ,sex}(1100)(1011)
    Problem
      – The cost of schema merging is prohibitively high
    Solutions
      – The system must “interpret” the attribute names and values for each tuple at query access time
      – Page-interpret: divide the data page into three regions (page header, attribute interpreted area, data value area)
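The merge example on this slide can be reproduced with a short Python sketch. It assumes bitmaps are strings with one character per schema attribute, "1" meaning a value is present, and that the merged schema appends the second schema's new attributes after the first schema's. The function names are hypothetical.

```python
def merge_schemas(attrs_a, attrs_b):
    """Keep the first schema's order, then append attributes new in the second."""
    merged = list(attrs_a)
    merged += [a for a in attrs_b if a not in merged]
    return merged

def remap_bitmap(old_attrs, bitmap, merged_attrs):
    """Rewrite one tuple's presence bitmap against the merged attribute order."""
    present = {a for a, bit in zip(old_attrs, bitmap) if bit == "1"}
    return "".join("1" if a in present else "0" for a in merged_attrs)

schema_a = ["name", "age", "univ"]
schema_b = ["name", "sex", "univ"]
merged = merge_schemas(schema_a, schema_b)

kate = remap_bitmap(schema_a, "110", merged)  # Kate has name and age
jim = remap_bitmap(schema_b, "111", merged)   # Jim has name, sex and univ
```

With positional storage, this rewrite has to be applied to every stored tuple of both schemas, which is the cost that the page-interpret layout is meant to avoid.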
12
Physical Storage
  Physical storage design of FlexTable
13
Experiment and Analysis
  Setting
    T2390@1.86GHz, 1GB RAM, 160GB SATA
    FreeToGovCyc with 45,823 triples, 10,905 instances
    Yago with 1,000,000 triples, 152,362 instances
  Analysis
    Analysis of triples import
    Analysis of storage cost
    Analysis of query performance
14
Experiment and Analysis
  Analysis of triples import
  Analysis of storage cost
15
Experiment and Analysis
  Analysis of query performance
    Test queries
      – Search for all instances that have the predicates in the query
      – “SELECT ?x WHERE {?x pred1 ?val1. {?x pred2 ?val2} … {?x predN ?valN} }”
      – Add predicates to the query pattern one by one
    The number of joins increases with each predicate added to the sequence
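The test-query construction can be sketched in Python as follows. It assumes the nested groups in the query template amount to a plain conjunction of triple patterns on ?x, and uses `pred1`, `pred2`, ... as placeholder predicate names. The helper names are hypothetical.

```python
def build_query(predicates):
    """One triple pattern per predicate, all sharing the subject variable ?x."""
    patterns = " . ".join(
        f"?x {pred} ?val{i}" for i, pred in enumerate(predicates, start=1)
    )
    return f"SELECT ?x WHERE {{ {patterns} }}"

def query_sequence(predicates):
    """Add predicates one by one, so the n-th query has n patterns to join."""
    return [build_query(predicates[:n]) for n in range(1, len(predicates) + 1)]

queries = query_sequence(["pred1", "pred2", "pred3"])
```

Each successive query adds one more pattern, so a store that splits predicates across tables needs one more join per step.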
16
Conclusion
  FlexTable
    RDF storage system using a dynamic relation model
    Supports efficient storage and querying of RDF data
  Features of the paper
    – Mechanism to support dynamic schema evolution
    – Novel page layout that avoids rewriting physical data
    – Comprehensive experiments
  Advantages of FlexTable
    – Lower storage cost than the state of the art
    – Better triple import time, storage cost, and query performance
  Future work
    Extending FlexTable to column-oriented databases