On Triple Dissemination, Forward-Chaining and Load Balancing in DHT Based RDF Stores
  Dominic Battre, Felix Heine, Andre Höing, and Odej Kao
  Presented by Aldarwich Yaser
  Albert-Ludwigs-University Freiburg, SS 2009
  Department of Computer Science, Computer Networks and Telematics
  Prof. Christian Schindelhauer
Overview
  Motivation
  Introduction: RDF, DHT, Pastry
  Triple dissemination
  Reasoning
  Load balancing
  References
Motivation
  Centralized databases
    Shortcomings: unable to handle high load; limited capacity (e.g., Sesame, Jena)
  Decentralized databases
    Examples: BabelPeers, RDFPeers, and Edutella
    Provide scalability, efficiency, and capacity
  Reasoning
    Infer new data from existing information
  Load balancing
RDF Introduction
  Resource Description Framework (RDF)
  Used for representing information on the Web
  RDFS provides a powerful model for storing knowledge and drawing inferences from it
  In RDF, everything is represented by triples of the form (S, P, O)
  Example: Germany (S) hasCapital (P) Berlin (O)
DHT Introduction
  Solves the item location problem in a distributed network of nodes
  Uses a key k to calculate the ID: ID = hash(k)
  Operations: Put(k, x), Get(k)
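The Put/Get interface above can be sketched in a few lines. This is a minimal single-process illustration, not the store described in the talk: the class `ToyDHT`, the helpers `key_id` and `ring_distance`, the use of SHA-1, and the "numerically closest node ID is responsible" rule are all assumptions made for the example.

```python
import hashlib

ID_BITS = 128                 # assumed ID width (Pastry uses 128-bit IDs)
ID_SPACE = 1 << ID_BITS


def key_id(key: str) -> int:
    """ID = hash(k): SHA-1 digest (160 bits) truncated to ID_BITS bits."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (160 - ID_BITS)


def ring_distance(a: int, b: int) -> int:
    """Distance between two IDs on a circular ID space."""
    d = abs(a - b)
    return min(d, ID_SPACE - d)


class ToyDHT:
    """All nodes live in one process; a real DHT routes messages between peers."""

    def __init__(self, node_names):
        self.node_ids = {name: key_id(name) for name in node_names}
        self.storage = {name: {} for name in node_names}

    def responsible_node(self, key: str) -> str:
        # Assumption: the node whose ID is closest to hash(key) stores the key.
        target = key_id(key)
        return min(self.node_ids,
                   key=lambda name: ring_distance(self.node_ids[name], target))

    def put(self, key: str, value) -> None:
        self.storage[self.responsible_node(key)].setdefault(key, []).append(value)

    def get(self, key: str) -> list:
        return list(self.storage[self.responsible_node(key)].get(key, []))


dht = ToyDHT(["node-a", "node-b", "node-c"])
dht.put("Germany", ("Germany", "hasCapital", "Berlin"))
print(dht.get("Germany"))
```

Because both `put` and `get` hash the same key, they always arrive at the same node without any central index.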
Triple dissemination
  Triple T = (s, p, o) is stored under three identifiers:
    identifier = hash(s): responsible node for s
    identifier = hash(p): responsible node for p
    identifier = hash(o): responsible node for o
  Query q = (s, p, o): identifier = hash(p)
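The dissemination scheme above can be illustrated as follows. This is a hedged sketch: `TripleStoreDHT`, `node_for`, and the modulo placement are hypothetical stand-ins for real DHT routing, and the query is routed by the first known component, whereas the slide's example hashes the predicate.

```python
import hashlib
from collections import defaultdict


def term_hash(term: str) -> int:
    return int(hashlib.sha1(term.encode("utf-8")).hexdigest(), 16)


class TripleStoreDHT:
    def __init__(self, node_count: int):
        self.node_count = node_count
        # One index per node: term -> set of triples stored under that term.
        self.nodes = [defaultdict(set) for _ in range(node_count)]

    def node_for(self, term: str) -> int:
        # Assumption: modulo placement replaces routing to the responsible node.
        return term_hash(term) % self.node_count

    def insert(self, triple) -> None:
        # Disseminate the triple three times: by subject, predicate, and object.
        for term in triple:
            self.nodes[self.node_for(term)][term].add(triple)

    def query(self, s=None, p=None, o=None) -> list:
        pattern = (s, p, o)
        bound = [term for term in pattern if term is not None]
        if not bound:
            raise ValueError("at least one component must be known")
        # Route the query to the node responsible for one known component.
        lookup = bound[0]
        candidates = self.nodes[self.node_for(lookup)].get(lookup, set())
        # Filter candidates so every known component matches its position.
        return sorted(
            t for t in candidates
            if all(want is None or want == got for want, got in zip(pattern, t))
        )


store = TripleStoreDHT(node_count=4)
store.insert(("Germany", "hasCapital", "Berlin"))
store.insert(("France", "hasCapital", "Paris"))
print(store.query(p="hasCapital"))
```

Storing each triple three times costs space, but any query with at least one known component can be answered by a single lookup.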
Pastry Protocol
  Each peer has a 128-bit ID, the nodeID
    Unique and uniformly distributed
    Computed by applying a cryptographic hash function to the IP address
  A message takes O(log N) steps to reach its destination
  Node state contains:
    Leaf set
    Routing table
    Neighborhood set
Pastry (prefix matching)
  Route(m, )?
  Figure: a message is routed from a node ID toward a key by prefix matching
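Prefix-matching routing can be sketched as below. This is a simplified illustration under stated assumptions: IDs are short hex strings, every node sees one global list `known_nodes` instead of Pastry's routing table and leaf set, and the function names are invented for the example.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that two IDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def numeric_distance(a: str, b: str) -> int:
    return abs(int(a, 16) - int(b, 16))


def next_hop(current: str, key: str, known_nodes):
    """Return the next node for the key, or None if `current` is the destination."""
    cur_len = shared_prefix_len(current, key)
    # Preferred step: a node whose ID shares a longer prefix with the key.
    better = [n for n in known_nodes if shared_prefix_len(n, key) > cur_len]
    if better:
        # Take the smallest improvement, mimicking one more digit per hop.
        return min(better, key=lambda n: shared_prefix_len(n, key))
    # Fallback: same prefix length but numerically closer to the key.
    closer = [
        n for n in known_nodes
        if shared_prefix_len(n, key) >= cur_len
        and numeric_distance(n, key) < numeric_distance(current, key)
    ]
    return min(closer, key=lambda n: numeric_distance(n, key)) if closer else None


def route(start: str, key: str, known_nodes) -> list:
    path = [start]
    while (hop := next_hop(path[-1], key, known_nodes)) is not None:
        path.append(hop)
    return path


nodes = ["65a1", "d13d", "d4a2", "d462", "d467"]
print(route("65a1", "d46a", nodes))
```

Each hop either lengthens the shared prefix or moves numerically closer to the key, so the loop always terminates. With real routing tables, each prefix-extending hop resolves one more digit, which is where the O(log N) bound comes from.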
RDF Reasoning
  Queries are formulated in general terms
  RDFS retrieves data even if the description does not exactly match the query
  Example:
    Christian fatherOf Schindelhauer
    fatherOf rdfs:subPropertyOf relativeOf
    => Christian relativeOf Schindelhauer
RDFS Rules
  Rule name   Preconditions                                        Generated triple
  rdfs2       (a, rdfs:domain, x), (u, a, v)                       (u, rdf:type, x)
  rdfs3       (a, rdfs:range, x), (u, a, v)                        (v, rdf:type, x)
  rdfs5       (u, rdfs:subPropertyOf, v), (v, rdfs:subPropertyOf, x)   (u, rdfs:subPropertyOf, x)
  rdfs9       (u, rdfs:subClassOf, x), (v, rdf:type, u)            (v, rdf:type, x)
  rdfs11      (u, rdfs:subClassOf, v), (v, rdfs:subClassOf, x)     (u, rdfs:subClassOf, x)
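Forward chaining over these five rules can be sketched as a naive fixpoint loop. This is an illustrative single-node version, not the distributed procedure from the talk; the function name `forward_chain` and the sample triples are assumptions, and only the rules listed on this slide (rdfs2, rdfs3, rdfs5, rdfs9, rdfs11) are implemented.

```python
TYPE = "rdf:type"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"
SUBPROP = "rdfs:subPropertyOf"
SUBCLASS = "rdfs:subClassOf"


def forward_chain(triples) -> set:
    """Apply the rules repeatedly until no new triple is generated."""
    closure = set(triples)
    while True:
        new = set()
        for (s1, p1, o1) in closure:
            for (s2, p2, o2) in closure:
                # rdfs2: (a domain x), (u a v) => (u type x)
                if p1 == DOMAIN and p2 == s1:
                    new.add((s2, TYPE, o1))
                # rdfs3: (a range x), (u a v) => (v type x)
                if p1 == RANGE and p2 == s1:
                    new.add((o2, TYPE, o1))
                # rdfs5: (u subPropertyOf v), (v subPropertyOf x) => (u subPropertyOf x)
                if p1 == SUBPROP and p2 == SUBPROP and o1 == s2:
                    new.add((s1, SUBPROP, o2))
                # rdfs9: (u subClassOf x), (v type u) => (v type x)
                if p1 == SUBCLASS and p2 == TYPE and o2 == s1:
                    new.add((s2, TYPE, o1))
                # rdfs11: (u subClassOf v), (v subClassOf x) => (u subClassOf x)
                if p1 == SUBCLASS and p2 == SUBCLASS and o1 == s2:
                    new.add((s1, SUBCLASS, o2))
        if new <= closure:      # fixpoint reached
            return closure
        closure |= new


facts = {
    ("hasCapital", DOMAIN, "Country"),
    ("hasCapital", RANGE, "City"),
    ("Country", SUBCLASS, "Place"),
    ("Place", SUBCLASS, "Thing"),
    ("Germany", "hasCapital", "Berlin"),
}
for triple in sorted(forward_chain(facts) - facts):
    print(triple)
```

The nested loop is quadratic per pass, which is acceptable for a demonstration but is exactly the cost that a distributed store spreads across the nodes responsible for each term.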
Node Architecture
  Each node hosts multiple RDF databases:
    Local triples database
    Received triples database
    Replica database
    Generated triples database
Triple dissemination in DHT
  Figure: Node1 to Node4, each with Generated Triples, Local Triples, Received Triples, and Replica databases
Triple life cycle
  Triples are subject to events such as node joins and departures
  Triple lifetime
    Long-lived triples need few refreshes
    Short-lived triples (generated triples) are refreshed
  Updates: updating triples also updates the inferred triples
  Soft state
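The soft-state idea can be illustrated with a store in which every triple carries an expiry time and disappears unless it is refreshed. This is a rough sketch under assumptions: the class `SoftStateStore`, its method names, and the lifetimes 100 and 10 are invented for the example and are not taken from the talk.

```python
class SoftStateStore:
    """Triples expire unless they are refreshed before their deadline."""

    def __init__(self):
        self._expires_at = {}   # triple -> absolute expiry time

    def put(self, triple, now: float, ttl: float) -> None:
        # Inserting and refreshing are the same operation: reset the deadline.
        self._expires_at[triple] = now + ttl

    def purge(self, now: float) -> set:
        """Drop every triple whose deadline has passed and return the dropped set."""
        expired = {t for t, deadline in self._expires_at.items() if deadline <= now}
        for t in expired:
            del self._expires_at[t]
        return expired

    def triples(self) -> set:
        return set(self._expires_at)


store = SoftStateStore()
explicit = ("Germany", "hasCapital", "Berlin")
generated = ("Germany", "rdf:type", "Country")

store.put(explicit, now=0, ttl=100)    # long-lived: few refreshes needed
store.put(generated, now=0, ttl=10)    # short-lived generated triple
store.put(generated, now=8, ttl=10)    # refresh moves the deadline to 18
print(store.purge(now=12))             # nothing has expired yet
print(store.purge(now=20))             # the generated triple has expired
```

Time is passed in explicitly so the behaviour is deterministic. With this design, a generated triple whose source triples vanish simply stops being refreshed and ages out on its own.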
Node Departure
  Node substitution
  Correction of the routing table
  Replica duty
  Decreasing number of replicas
  Figure: nodes n1, n2, n3, n4, n9
Node Arrival
  More complicated than node departure
  Receiving queries
  Task of the replica nodes
  Time reduction
  Figure: nodes n1, n2, n3, n4, n6, n9
Load balancing
  A major criticism of DHT-based RDF stores
  Many collisions are unavoidable
  Example: the DHT stores many triples with the predicate rdf:type, and rdfs:subClassOf creates many more triples with the predicate rdf:type
  Overlay tree
    Built for discrete DHT positions, such as the one that stores the triples with rdf:type
Load balancing with remote triples database
  Figure: Node1 to Node4, each with Local Triples, Received Triples, Generated Triples, and Remote Triples databases; the remote triples databases hold references
Replicated overlay tree
  Figure: overlay tree with the levels Root, Rank1, and Rank2
Query routing in overlay tree
  Figure: a query and its result travelling through the levels Root, Rank1, and Rank2
Handling RDFS rules in load balancing: problem
  When a node is overloaded, its triples are split across other nodes
  The preconditions of a rule may then end up on different nodes
  Example: (a, rdfs:domain, x) and (u, a, v) are spread over Node1, Node2, and Node3
Handling RDFS rules in load balancing: solution
  Copy the most common RDFS schema triples to every node in the overlay tree
  Example: Node1 to Node4 each hold a copy of (a, rdfs:domain, x) next to their share of the (u, a, v) triples
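The effect of copying schema triples can be shown with a small sketch. It is only an illustration of the idea above: the round-robin split, the helper names `split_with_schema` and `apply_rdfs2_locally`, and the sample data are assumptions, and only rule rdfs2 is applied.

```python
SCHEMA_PREDICATES = {
    "rdfs:domain", "rdfs:range", "rdfs:subClassOf", "rdfs:subPropertyOf",
}


def split_with_schema(triples, node_names) -> dict:
    """Split instance triples across nodes; copy every schema triple to each node."""
    schema = sorted(t for t in triples if t[1] in SCHEMA_PREDICATES)
    instances = sorted(t for t in triples if t[1] not in SCHEMA_PREDICATES)
    shares = {name: set(schema) for name in node_names}
    # Assumption: a simple round-robin split of the instance triples.
    for i, triple in enumerate(instances):
        shares[node_names[i % len(node_names)]].add(triple)
    return shares


def apply_rdfs2_locally(local) -> set:
    """rdfs2 using only the triples one node holds: (a domain x), (u a v) => (u type x)."""
    domains = [(a, x) for (a, p, x) in local if p == "rdfs:domain"]
    return {(u, "rdf:type", x) for (u, a, v) in local for (d, x) in domains if a == d}


triples = {
    ("hasCapital", "rdfs:domain", "Country"),
    ("Germany", "hasCapital", "Berlin"),
    ("France", "hasCapital", "Paris"),
    ("Italy", "hasCapital", "Rome"),
}
shares = split_with_schema(triples, ["Node1", "Node2", "Node3"])
for name, local in shares.items():
    print(name, sorted(apply_rdfs2_locally(local)))
```

Because each node holds the rdfs:domain triple next to its own share of the instance triples, every node can fire the rule locally and no inference is lost by the split.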
Conclusion
  P2P-based distributed databases offer better scalability and source integration
  The real power of RDF stems from the possibility of deriving new data from explicit knowledge
  The overlay tree addresses the overload problem
References
  Battre, Heine, Kao: Top k RDF query evaluation in P2P
Thanks for your Attention