RDF storages and indexes Maciej Janik September 1, 2005 Enterprise Integration – Semantic Web.

Outline
RDF storages
–Jena
–Sesame
–Redland
–Brahms
Indexing RDF
–difference from DB indexing
–what to index
–examples of index types

Storages
Jena
–Implemented in Java
–Supports RDF, RDFS and OWL
–In-memory and persistent storage (Oracle, MySQL, PostgreSQL)
–RDQL
–Reasoning/inference engine
–Optimization for common statement patterns: grouping of properties
–Powerful, but slow and memory-intensive

Storages
Sesame
–Implemented in Java
–Modules (HTTP/SOAP handler, admin, query, export, Repository Abstraction Layer)
–Persistent RDF store: traditional DBMS or dedicated RDF triple storage
–Database independent
–Scalable architecture
–Node-centric approach
–Fast and efficient for a Java implementation

Storages
Redland, together with Rasqal and Raptor
–Modular approach
–Redland: only storage for RDF triples + low-level API
–Implemented in pure C for portability
–Rich API and bindings to other languages
–Rasqal: RDF query module (RDQL, SPARQL)
–Raptor: a very fast RDF parser
–Average performance

Storages
Brahms (from LSDIS lab)
–Read-only main-memory storage for RDF: reads RDF and saves an optimized snapshot
–Written in C++, optimized for speed; additional bindings to Java
–Full indexing of Subject-Predicate-Object
–Uses Raptor as RDF parser
–Rich low-level API for graph manipulation
–Very fast and memory efficient
–Waiting for SPARQL implementation

Brahms
Separation of different resource types:
–InstanceNode, Literal, SchemaClass, SchemaProperty
–Statements: InstanceStatement (instance – property – instance), LiteralStatement (instance – property – literal), TypeOfStatement (instance – type – class)
–Taxonomy for classes and properties
Iterators deal with only one type of resource
–no time is wasted during an instance search algorithm checking for literal or type relations
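The separation of statement types can be illustrated with a minimal sketch. It is not Brahms's C++ API: the class `TypedStore`, its method names, and the `o_is_literal` flag are hypothetical, chosen only to show why an iterator over instance statements never has to test for literals or type relations.

```python
from collections import defaultdict

class TypedStore:
    """Hypothetical store that keeps each statement type in its own container."""

    def __init__(self):
        self.instance_stmts = defaultdict(list)  # subject -> [(property, instance)]
        self.literal_stmts = defaultdict(list)   # subject -> [(property, literal)]
        self.type_stmts = defaultdict(list)      # subject -> [class]

    def add(self, s, p, o, o_is_literal=False):
        # The type check happens once, at load time, not during every search.
        if p == "rdf:type":
            self.type_stmts[s].append(o)
        elif o_is_literal:
            self.literal_stmts[s].append((p, o))
        else:
            self.instance_stmts[s].append((p, o))

    def instance_neighbors(self, s):
        # Touches instance statements only; no per-item filtering is needed.
        return [o for _, o in self.instance_stmts.get(s, ())]

store = TypedStore()
store.add("ex:alice", "rdf:type", "ex:Person")
store.add("ex:alice", "ex:name", "Alice", o_is_literal=True)
store.add("ex:alice", "ex:knows", "ex:bob")
neighbors = store.instance_neighbors("ex:alice")
```

A graph search that only follows instance-to-instance edges calls `instance_neighbors` and skips the other two containers entirely.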

Indexing of RDF
RDF = Graph
–traditional DB indexes may not be sufficient
XML cannot be indexed directly like a relational DB
Indexing may take advantage of tree structure
–depth of node
–common path from the root
–convert each path to a string expression
–precalculate the path tree
Simple indexes on statements may also be powerful

What to index?
Most straightforward approach
Statements: subject –[predicate]→ object
Possibilities:
–Single (Brahms): S → PO, S → OP, O → SP, O → PS, P → SO, P → OS
–Double (Redland): SO → P, SP → O, PO → S
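As a minimal sketch of the two index families, the following builds one single-key index (S → PO) and one double-key index (SP → O) over a handful of triples. The function name `build_indexes` and the `ex:` sample data are assumptions made for illustration; neither Redland nor Brahms is implemented this way in detail.

```python
from collections import defaultdict

def build_indexes(triples):
    """Build a single-key index (S -> PO) and a double-key index (SP -> O)."""
    s_to_po = defaultdict(list)  # subject -> [(predicate, object), ...]
    sp_to_o = defaultdict(list)  # (subject, predicate) -> [object, ...]
    for s, p, o in triples:
        s_to_po[s].append((p, o))
        sp_to_o[(s, p)].append(o)
    return s_to_po, sp_to_o

# Hypothetical sample statements.
triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:knows", "ex:carol"),
]
s_to_po, sp_to_o = build_indexes(triples)
```

The single-key index answers "everything about this subject" in one lookup, while the double-key index answers "the objects for this subject and predicate" without scanning the subject's other statements. The remaining variants (O → SP, PO → S, and so on) follow the same pattern with the key and value roles swapped.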

Single indexes in Brahms [design]

Power of single indexes
Full indexing of statements
–SPO, SOP, PSO, POS, OSP, OPS
–indexes for each type of statement (InstanceStatements, LiteralStatements, ...)
–fast check whether a given resource is connected to another, or uses a given property: use of binary search
–merge of 2-hop path elements in linear time
All RDF storages are based on simple indexes and their extensions
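The two operations named above can be sketched over sorted lists of resource ids. This is one reading of the slide, not Brahms code: the connection check is a binary search in a sorted neighbor list, and the 2-hop merge is taken to mean finding the middle nodes m of paths x → m → y by intersecting two sorted lists in a single pass. All names and the integer ids are hypothetical.

```python
from bisect import bisect_left

def contains(sorted_ids, target):
    """Binary search: is target present in the sorted id list?"""
    i = bisect_left(sorted_ids, target)
    return i < len(sorted_ids) and sorted_ids[i] == target

def merge_common(a, b):
    """Intersect two sorted lists in O(len(a) + len(b)) time."""
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return common

# Hypothetical data: integer resource ids, kept sorted by the index.
out_neighbors_of_x = [2, 5, 7, 11]  # objects reachable from x
in_neighbors_of_y = [3, 5, 11, 20]  # subjects that point at y
middle_nodes = merge_common(out_neighbors_of_x, in_neighbors_of_y)
```

Here `middle_nodes` holds every resource that lies on a 2-hop path from x to y. Both operations rely only on the index keeping its lists sorted.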

Schema vs. instances [Brahms]
Schema is small compared to instances
Instance to taxonomy
–know or check the type of an instance
Taxonomy index (classes and properties)
–direct subtypes/supertypes
–all ancestors/descendants
–dynamically built index of instances for a given type and all its subtypes
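A taxonomy index of this kind might look like the following sketch. The class `Taxonomy` and its methods are hypothetical and only mirror the three bullets: direct subtype/supertype links, transitive ancestors/descendants, and an instance set collected on demand for a type and all its subtypes.

```python
from collections import defaultdict

class Taxonomy:
    """Hypothetical class taxonomy with direct links and on-demand closures."""

    def __init__(self):
        self.parents = defaultdict(set)           # class -> direct supertypes
        self.children = defaultdict(set)          # class -> direct subtypes
        self.direct_instances = defaultdict(set)  # class -> instances typed directly

    def add_subclass(self, sub, sup):
        self.parents[sub].add(sup)
        self.children[sup].add(sub)

    def add_instance(self, inst, cls):
        self.direct_instances[cls].add(inst)

    def _closure(self, start, edges):
        # Depth-first walk over direct links; start itself is not included.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def ancestors(self, cls):
        return self._closure(cls, self.parents)

    def descendants(self, cls):
        return self._closure(cls, self.children)

    def instances_of(self, cls):
        # Instances of the class itself plus those of every subtype.
        result = set(self.direct_instances.get(cls, ()))
        for sub in self.descendants(cls):
            result |= self.direct_instances.get(sub, set())
        return result

tax = Taxonomy()
tax.add_subclass("ex:Student", "ex:Person")
tax.add_subclass("ex:PhDStudent", "ex:Student")
tax.add_instance("ex:alice", "ex:Person")
tax.add_instance("ex:bob", "ex:PhDStudent")
people = tax.instances_of("ex:Person")
```

Because the schema is small compared to the instance data, walking the taxonomy at query time is cheap; a real store could also cache the result per class.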

Tree-based index
Idea is based on the Patricia trie
Index should scale with the growth of data
Path together with leaf is encoded into a string -> the Index Fabric
"A Fast Index for Semistructured Data" - Brian F. Cooper et al.

Index fabrics
Index is used to accelerate path expressions, mainly for queries that ask for a root-to-leaf path
Idea of prefix encoding
–xml: <A>alpha<B>beta<C>gamma</C></B></A>
–paths: <A>alpha ; <A><B>beta ; <A><B><C>gamma
–encoded: A alpha ; A B beta ; A B C gamma
–infix (not common): A alpha B beta C gamma
Convert path to string for fast searches
Replace tags with 'non-terminal' characters (like in automata)
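The prefix encoding can be sketched with Python's standard `xml.etree.ElementTree`. The sketch assumes the slide's example is a nested document with elements A, B and C holding the texts alpha, beta and gamma; the function name `prefix_encode` is hypothetical, and tag names stand in for the single-character designators.

```python
import xml.etree.ElementTree as ET

def prefix_encode(xml_text):
    """Return one string key per text node: the tags on its root path, then the text."""
    root = ET.fromstring(xml_text)
    keys = []

    def walk(elem, designators):
        path = designators + [elem.tag]
        text = (elem.text or "").strip()
        if text:
            keys.append(" ".join(path + [text]))
        for child in elem:
            walk(child, path)

    walk(root, [])
    return keys

# Assumed form of the slide's example document.
keys = prefix_encode("<A>alpha<B>beta<C>gamma</C></B></A>")
```

Each key is an ordinary string, so the keys can be stored in any string index, and a root-to-leaf path query becomes a prefix search on that index.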

Indexing of graphs
Backbone
http://www.aisee.com/

Indexing of graphs
Tree-type: prefixes, tries
http://www.aisee.com/
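To make the prefix idea concrete, here is a minimal sketch of a plain (uncompressed) trie keyed by path labels. It is not a Patricia trie, which would additionally collapse single-child chains, and the names `TrieNode`, `PathTrie` and the sample paths are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # label -> TrieNode
        self.values = []    # payloads stored where a path ends

class PathTrie:
    """Paths that share a prefix share the nodes for that prefix."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, labels, value):
        node = self.root
        for label in labels:
            node = node.children.setdefault(label, TrieNode())
        node.values.append(value)

    def with_prefix(self, labels):
        # Walk down the prefix, then collect every payload in that subtree.
        node = self.root
        for label in labels:
            node = node.children.get(label)
            if node is None:
                return []
        found, stack = [], [node]
        while stack:
            current = stack.pop()
            found.extend(current.values)
            stack.extend(current.children.values())
        return found

trie = PathTrie()
trie.insert(["A"], "alpha")
trie.insert(["A", "B"], "beta")
trie.insert(["A", "B", "C"], "gamma")
trie.insert(["A", "D"], "delta")  # hypothetical extra path
under_ab = sorted(trie.with_prefix(["A", "B"]))
```

All four paths share the single node for label A, so the cost of finding a prefix depends on the prefix length rather than on the number of stored paths.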

Indexing of graphs
"Index Structures for Path Expressions" - Tova Milo, Dan Suciu
–1-index
–2-index
–T-index
–Path templates

Indexing of graphs
Landmarks
http://www.aisee.com/

Indexing of graphs
Indexing semistructured data
–index fabric: encoding, multilayered
–common prefixes: trie structure
–backbone: highways between points
–landmarks: county division
–path templates: precalculated expressions
–clustering: grouping by theme access
Indexing such data is NOT easy; the solution depends on how you want to search the graph

References
–Beckett, D., "The Design and Implementation of the Redland RDF Application Framework"
–Cooper et al., "A Fast Index for Semistructured Data"
–Janik, M. and Kochut, K., "BRAHMS: A WorkBench RDF Store And High Performance Memory System for Semantic Association Discovery"
–Milo, T. and Suciu, D., "Index Structures for Path Expressions"
–Wilkinson et al., "Efficient RDF Storage and Retrieval in Jena2"
–Jena - http://jena.sourceforge.net/
–Raptor - http://librdf.org/raptor/
–Redland - http://librdf.org/
–Sesame - http://www.openrdf.org/
