Peer-to-Peer Discovery of Semantic Associations

1 Peer-to-Peer Discovery of Semantic Associations
Matthew Perry, Maciej Janik, Cartic Ramakrishnan, Conrad Ibanez, Budak Arpinar, Amit Sheth 2nd International Workshop on Peer-to-Peer Knowledge Management, San Diego, California, July 17, 2005

2 Semantic Discovery
From finding things to finding out about things: relationships!

3 Semantic Associations
Relationship-centric nature of Semantic Web data models
We can ask questions about the relationships between objects: how is entity A related to entity B?
Applications:
  National security: Insider Threat [1]
  Improved searching: BioPatentMiner [2]
[1] B. Aleman-Meza, P. Burns, M. Eavenson, D. Palaniswami, A. Sheth, An Ontological Approach to the Document Access Problem of Insider Threat, Proceedings of the IEEE Intl. Conference on Intelligence and Security Informatics (ISI-2005), May 19-20, 2005.
[2] Sougata Mukherjea, Bhuvan Bamba, BioPatentMiner: An Information Retrieval System for BioMedical Patents, VLDB 2004.

4 Semantic Associations
Define a set of operators ρ for querying complex relationships between entities (Semantic Associations) [1]
[Figure: example RDF graph. Entity &r1 (fname "Matt", lname "Perry") is linked by worksFor and associatedWith properties to entities &r5 and &r6, which carry the name literals "LSDIS Lab" and "The University of Georgia". The connecting ρ-path is a Semantic Association.]
[1] Adapted from: Kemafor Anyanwu and Amit Sheth, ρ-Queries: Enabling Querying for Semantic Associations on the Semantic Web, The Twelfth International World Wide Web Conference, Budapest, Hungary, pp

5 Uniqueness of Semantic Association Queries
Simple query specification (only the two endpoints): ρ-path (A, B)
Does not require extensive knowledge of the schema

6 Difficult to express with existing Query Languages
RDQL: find paths of length at most 2 from startURI to endURI
Paths of length 1 (one query per FROM clause, one per edge direction):
SELECT ?startURI, ?property_1, ?endURI
  FROM (?startURI ?property_1 ?endURI)
  FROM (?endURI ?property_1 ?startURI)
Paths of length 2 (one query per FROM clause, one per combination of edge directions):
SELECT ?startURI, ?property_1, ?x, ?property_2, ?endURI
  FROM (?startURI ?property_1 ?x)(?x ?property_2 ?endURI)
  FROM (?startURI ?property_1 ?x)(?endURI ?property_2 ?x)
  FROM (?x ?property_1 ?startURI)(?x ?property_2 ?endURI)
  FROM (?x ?property_1 ?startURI)(?endURI ?property_2 ?x)
  WHERE ?startURI ne ?x && ?endURI ne ?x
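The slide needs six separate queries for paths of length at most 2 because every edge on an undirected path can point either way. The following Python sketch (not from the talk) enumerates those direction combinations to show how quickly the count grows. The function names and the intermediate variable names `?x1`, `?x2`, ... are illustrative assumptions.

```python
from itertools import product


def direction_patterns(length):
    """Yield one list of triple patterns per direction assignment of a path."""
    # Node variables along the path (intermediate names are illustrative).
    nodes = ["?startURI"] + [f"?x{i}" for i in range(1, length)] + ["?endURI"]
    for flips in product([False, True], repeat=length):
        pattern = []
        for i, flipped in enumerate(flips):
            subj, obj = nodes[i], nodes[i + 1]
            if flipped:
                # The edge is stored in the opposite direction.
                subj, obj = obj, subj
            pattern.append(f"({subj} ?property_{i + 1} {obj})")
        yield pattern


def count_queries(max_length):
    """Number of fixed-direction queries needed for paths up to max_length."""
    return sum(len(list(direction_patterns(n))) for n in range(1, max_length + 1))


if __name__ == "__main__":
    for pattern in direction_patterns(2):
        print("".join(pattern))
    print(count_queries(2), count_queries(5))
```

A path of length n has 2^n direction assignments, so a hop limit of 2 already needs 2 + 4 = 6 queries, and the total roughly doubles with each additional hop.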

7 Why Semantic Associations in P2P?
Data on the web is distributed by its nature
Knowledge will be stored in multiple stores and multiple ontologies
Search for semantic paths will have to include many knowledge sources
In the spirit of the Semantic Web (collaborative knowledge discovery)

8 Contributions
Super-Peer architecture for querying Semantic Associations
Knowledgebase borders and distances between borders
Query planning algorithm based on knowledgebase borders and distances

9 Assumptions
Pair-wise mapping of resources between peers (solution to the Entity Disambiguation / Reference Reconciliation problem)
We use URIs to solve the Entity Disambiguation problem
Main focus is query planning over the P2P network
Not concerned with fault tolerance, details of network formation, etc. at this point

10 RDF Instance Graph
[Figure: RDF schema and instance graph. Schema classes include Passenger, Ticket, Flight, Customer, Client, FFlyer, Payment, CCard, Cash and Bank Account, connected by properties such as fname, lname, purchased, for, forflight, paidby, creditedto, holder, amount, number, ffid and fflierno. Legend: typeOf (instance), subClassOf (isA), subPropertyOf. Instance nodes (&r1, &r2, ..., &r11) carry literals such as "John" "Smith", "Jeff" "Brown", "Bill" "Jones" and ffid "XYZ123".]

11 ρ-path Problem (k-hop limited)
Given: an RDF instance graph G, vertices a and b in G, and an integer k
Find: all simple, undirected paths p of length at most k that connect a and b
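As a minimal sketch of the problem statement above (not the authors' implementation), the following Python function enumerates all simple, undirected paths of at most k hops with a depth-first search. The function name `rho_path` and the sample triples are illustrative assumptions, loosely modeled on the example graph of slide 4.

```python
from collections import defaultdict


def rho_path(triples, a, b, k):
    """Return all simple undirected paths of at most k edges between a and b.

    Each path is a list alternating nodes and properties:
    [node, property, node, property, node, ...].
    """
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((p, o))
        adj[o].append((p, s))  # edge direction is ignored

    results = []

    def extend(node, visited, path):
        if node == b:
            results.append(list(path))
            return
        if (len(path) - 1) // 2 == k:  # hop limit reached
            return
        for prop, nxt in adj[node]:
            if nxt not in visited:  # simple paths only: never revisit a node
                visited.add(nxt)
                path.extend([prop, nxt])
                extend(nxt, visited, path)
                del path[-2:]
                visited.remove(nxt)

    extend(a, {a}, [a])
    return results


if __name__ == "__main__":
    # Illustrative data only.
    triples = [
        ("r1", "worksFor", "r5"),
        ("r5", "associatedWith", "r6"),
        ("r1", "fname", "Matt"),
    ]
    print(rho_path(triples, "r1", "r6", 2))
```

The search is exponential in k in the worst case, which is one reason a hop limit is part of the problem definition.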

12 Distributed ρ-path Problem
Find all paths from a start node to an end node over the distributed RDF graphs (knowledge bases, ontologies)

13 What do we need?
Efficiently explore node neighborhoods
Decide when to stop a search in one peer and continue it in another
Determine the search distance in each peer
Determine which peers to include in the search

14 Approach
Peer: RDF data store (Sesame, BRAHMS); ρ-path (a, b, k) returns a subgraph
Super-Peer: no data store; responsible for query planning
[Figure: several super-peers, each connected to a group of peers with their own KBs. A ρ-path request reaches a super-peer, the ρ-plan is broken into ρ-sub-plans exchanged between super-peers, and peers answer ρ-path queries with subgraphs.]

15 Knowledgebase Borders
[Figure: the knowledge bases of Peer 1 and Peer 2 overlap. The overlap is the Peer_1:Peer_2 border, and each shared node is a border node.]

16 Distance Between Borders
[Figure: Peer 1, Peer 2 and Peer 3 with borders P1:P2, P1:P3 and P2:P3. Border nodes and the query end points (Start, End) are marked.]
dist (P1:P2, P1:P3) = 3
dist (P1:P2, P2:P3) = 1
dist (P1:P3, P2:P3) = 1
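A rough sketch of how a border and a border-to-border distance could be computed, under two assumptions that the slides do not spell out: a border is taken to be the set of URIs two peers share, and the distance between two distinct borders is the fewest hops between any node of one and any node of the other inside a single peer's graph. All names and the sample graph are hypothetical.

```python
from collections import deque


def border(uris_1, uris_2):
    """Border between two peers: the URIs that appear in both knowledge bases."""
    return set(uris_1) & set(uris_2)


def border_distance(adj, border_x, border_y):
    """Fewest hops from any node of border_x to any node of border_y.

    adj is an undirected adjacency dict for one peer's graph.
    Returns float('inf') when the borders are not connected in this peer.
    """
    frontier = deque((node, 0) for node in border_x)
    seen = set(border_x)
    while frontier:
        node, dist = frontier.popleft()
        if node in border_y:
            return dist
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return float("inf")


if __name__ == "__main__":
    # Hypothetical peer graph: a chain n0 - n1 - n2 - n3.
    adj = {"n0": ["n1"], "n1": ["n0", "n2"], "n2": ["n1", "n3"], "n3": ["n2"]}
    print(border({"n0", "n1", "x"}, {"n0", "y"}))
    print(border_distance(adj, {"n0"}, {"n3"}))
```

This multi-source breadth-first search only covers distances between two different borders. The distance from a border to itself, which appears later in the deck, would need a variant that looks for two distinct border nodes.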

17 Query Planning Graph
Directed graph
One node for each distinct border
For each pair of connected borders, create two edges (one in each direction)
Edge weight is the smallest of the minimum distances reported by the peers
For example, you can get from A:B to A:B:C through either A or B

18 Query Planning Graph
[Figure: peers A, B and C and the resulting query planning graph over the borders AB, AC, BC and ABC, with edges weighted by the minimum distances listed below.]
Borders: AB, AC, BC, ABC
Minimum distances:
dist (AB, BC) = 4
dist (AB, AC) = 3
dist (AB, ABC) = 2
dist (BC, AC) = 5
dist (BC, ABC) = 3
dist (AC, ABC) = 2
dist (AB, AB) = 3
dist (AC, AC) = 3
dist (BC, BC) = 2
dist (ABC, ABC) = ∞
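The construction of the query planning graph can be sketched as follows. This is an illustration rather than the authors' code: `build_qpg` and the per-peer report format are assumptions, and the individual values reported by each peer are made up. Only the resulting minimum for (AB, ABC) and (AB, AC) matches the numbers on the slide.

```python
def build_qpg(peer_reports):
    """Build a directed query planning graph as {(border_x, border_y): weight}.

    peer_reports maps a peer name to the minimum border-to-border distances
    measured inside that peer: {peer: {(border_x, border_y): distance}}.
    """
    weights = {}
    for report in peer_reports.values():
        for (x, y), dist in report.items():
            if dist == float("inf"):
                continue  # borders not connected inside this peer
            # Two edges per connected pair, one in each direction; keep the
            # smallest distance reported by any peer.
            for edge in ((x, y), (y, x)):
                weights[edge] = min(dist, weights.get(edge, float("inf")))
    return weights


if __name__ == "__main__":
    # Hypothetical reports: A and B both connect AB to ABC.
    reports = {
        "A": {("AB", "ABC"): 2, ("AB", "AC"): 3},
        "B": {("AB", "ABC"): 3, ("ABC", "ABC"): float("inf")},
    }
    for edge, weight in sorted(build_qpg(reports).items()):
        print(edge, weight)
```

Keeping only the minimum is what lets the planner treat a border pair as a single edge even when several peers can carry the search between them.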

19 Using the Query Planning Graph
Example query: ρ-path (start, end, 10)
1) Find start and end points
2) Compute distances to borders
[Figure: peers A, B and C with the start and end nodes located and their distances to the surrounding borders marked.]

20 3) Add this Information to QPG
4) Find all paths from start to end (including cycles) with weight <= k (10)
In this case: 22 paths
[Figure: the QPG over the borders AB, AC, BC and ABC, extended with temporary start and end nodes and their edge weights.]
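Step 4 can be sketched as a bounded search over the query planning graph. The code below is a minimal illustration, assuming strictly positive edge weights (so the search terminates) and assuming a path stops as soon as it reaches the end node. The small graph is a hypothetical fragment, not the full example from the slide, so it does not reproduce the 22 paths.

```python
def paths_within(qpg, start, end, k):
    """Enumerate paths from start to end with total weight <= k.

    qpg is {(u, v): weight} with positive weights. Borders may repeat on a
    path (cycles are allowed); the weight bound keeps the search finite.
    Returns a list of (node_list, total_weight) tuples.
    """
    out_edges = {}
    for (u, v), w in qpg.items():
        out_edges.setdefault(u, []).append((v, w))

    found = []
    stack = [([start], 0)]
    while stack:
        path, weight = stack.pop()
        if path[-1] == end:
            found.append((path, weight))
            continue
        for v, w in out_edges.get(path[-1], []):
            if weight + w <= k:
                stack.append((path + [v], weight + w))
    return found


if __name__ == "__main__":
    # Hypothetical QPG fragment with temporary start and end nodes.
    qpg = {
        ("start", "AB"): 2,
        ("AB", "AC"): 3,
        ("AC", "AB"): 3,
        ("AB", "AB"): 3,
        ("AC", "end"): 3,
    }
    for path, weight in paths_within(qpg, "start", "end", 10):
        print(" -> ".join(path), weight)
```

Allowing cycles matters because a real path may leave a border, wander inside one peer, and come back to the same border before moving on.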

21 5) Convert Set of Paths to Set of Queries
start –2→ Peer_A:Peer_B –3→ Peer_A:Peer_C –3→ end
start –2→ Peer_B:Peer_C –2→ Peer_B:Peer_C –2→ end
[Figure: the two paths drawn over peers A, B and C.]

22 Converting Paths to Queries
Path: start –2→ A:B –3→ A:C –3→ end, with k = 10
Each edge (pair of endpoints) represents a query, for example ρ-path (start, Peer_A:Peer_B, 2)
What is the correct hop-limit?
hop-limit = edge weight + (k – path weight)
Here the path weight is 2 + 3 + 3 = 8, so each query gets 10 – 8 = 2 extra hops:
ρ-path (start, Peer_A:Peer_B, 4)
ρ-path (Peer_A:Peer_B, Peer_A:Peer_C, 5)
ρ-path (Peer_A:Peer_C, end, 5)
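The hop-limit rule on this slide is easy to check in code. The sketch below applies hop-limit = edge weight + (k – path weight) to the example path; the function name `hop_limits` is an assumption.

```python
def hop_limits(path, weights, k):
    """Turn one QPG path into (from, to, hop_limit) queries.

    path is the list of nodes, weights the edge weights between consecutive
    nodes. Every edge receives the unused budget k - sum(weights) on top of
    its own weight.
    """
    slack = k - sum(weights)
    return [(u, v, w + slack) for u, v, w in zip(path, path[1:], weights)]


if __name__ == "__main__":
    path = ["start", "Peer_A:Peer_B", "Peer_A:Peer_C", "end"]
    for query in hop_limits(path, [2, 3, 3], 10):
        print(query)
```

Giving every edge the full slack is deliberately generous: the extra hops could be spent on any single segment, and the planner cannot know in advance which one.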

23 Find the maximum hop-limit for each pair of end points
(start, Peer_A:Peer_B): 5
(start, Peer_A:Peer_B:Peer_C): 7
(start, Peer_B:Peer_C): 8
(Peer_A:Peer_B, Peer_A:Peer_C): 5
(Peer_A:Peer_B, Peer_A:Peer_B:Peer_C): 5
(Peer_A:Peer_B, Peer_A:Peer_B): 3
(Peer_A:Peer_B, Peer_B:Peer_C): 6
(Peer_A:Peer_C, Peer_A:Peer_B:Peer_C): 3
(Peer_A:Peer_C, Peer_B:Peer_C): 5
(Peer_A:Peer_C, end): 5
(Peer_B:Peer_C, end): 8
(Peer_B:Peer_C, Peer_B:Peer_C): 6
(Peer_B:Peer_C, Peer_A:Peer_B:Peer_C): 5
(Peer_A:Peer_B:Peer_C, end): 6
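Because the same pair of end points can occur on many paths, only the largest hop-limit per pair needs to be kept. The following sketch aggregates that maximum over a set of paths. It uses just three paths for brevity: the two shown on slide 21 plus the direct route start → Peer_B:Peer_C → end implied by the same edge weights. It therefore reproduces only some of the values above; the remaining paths of the full example could raise other limits further.

```python
def max_hop_limits(paths, k):
    """Keep the largest hop-limit seen for each (from, to) pair.

    paths is a list of (node_list, weight_list) tuples; each edge's limit is
    edge weight + (k - path weight), as on the previous slide.
    """
    best = {}
    for nodes, weights in paths:
        slack = k - sum(weights)
        for u, v, w in zip(nodes, nodes[1:], weights):
            best[(u, v)] = max(best.get((u, v), 0), w + slack)
    return best


if __name__ == "__main__":
    AB, AC, BC = "Peer_A:Peer_B", "Peer_A:Peer_C", "Peer_B:Peer_C"
    paths = [
        (["start", AB, AC, "end"], [2, 3, 3]),
        (["start", BC, BC, "end"], [2, 2, 2]),
        (["start", BC, "end"], [2, 2]),
    ]
    for pair, limit in max_hop_limits(paths, 10).items():
        print(pair, limit)
```

Merging by maximum means each peer runs one query per pair of end points instead of one per path.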

24 Which Peer gets each query?
ρ-path (Peer_B:Peer_A, Peer_A:Peer_C, 5) goes to Peer_A
ρ-path (Peer_B:Peer_C, Peer_B:Peer_C, 5) goes to Peer_B and Peer_C
[Figure: the two queries drawn over peers A, B and C.]
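The two examples suggest a simple rule: a query goes to every peer that takes part in both of its end points. The sketch below encodes that reading, which is an inference from the examples rather than a rule stated in the talk. The `endpoint_peer` mapping for start and end nodes is a hypothetical helper.

```python
def peers_of(endpoint, endpoint_peer=None):
    """Peers that hold an end point: a border 'X:Y' belongs to X and Y."""
    if endpoint_peer and endpoint in endpoint_peer:
        return {endpoint_peer[endpoint]}  # start/end live in a single peer
    return set(endpoint.split(":"))


def peers_for_query(src, dst, endpoint_peer=None):
    """Peers that can answer rho-path(src, dst, k): those holding both ends."""
    return peers_of(src, endpoint_peer) & peers_of(dst, endpoint_peer)


if __name__ == "__main__":
    print(peers_for_query("Peer_B:Peer_A", "Peer_A:Peer_C"))
    print(peers_for_query("Peer_B:Peer_C", "Peer_B:Peer_C"))
    # Assumes the start node lives in Peer_B.
    print(peers_for_query("Peer_A:Peer_B", "start", {"start": "Peer_B"}))
```

Under this rule a query between two copies of the same border is sent to every peer on that border, since the connecting path may run through any of them.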

25 Final Query Plan
Queries for Peer_B:
FROM: Peer_B:Peer_C TO: Peer_B:Peer_C Hop Limit: 6
FROM: Peer_B:Peer_C TO: start Hop Limit: 8
FROM: Peer_A:Peer_B TO: start Hop Limit: 5
FROM: Peer_A:Peer_B TO: Peer_A:Peer_B:Peer_C Hop Limit: 5
FROM: Peer_A:Peer_B TO: Peer_A:Peer_B Hop Limit: 3
FROM: Peer_A:Peer_B:Peer_C TO: Peer_B:Peer_C Hop Limit: 5
FROM: Peer_A:Peer_B TO: Peer_B:Peer_C Hop Limit: 6
FROM: Peer_A:Peer_B:Peer_C TO: start Hop Limit: 7
Queries for Peer_C:
FROM: Peer_B:Peer_C TO: end Hop Limit: 8
FROM: Peer_B:Peer_C TO: Peer_B:Peer_C Hop Limit: 6
FROM: Peer_A:Peer_C TO: Peer_B:Peer_C Hop Limit: 5
FROM: Peer_A:Peer_B:Peer_C TO: end Hop Limit: 6
FROM: Peer_A:Peer_B:Peer_C TO: Peer_A:Peer_C Hop Limit: 3
FROM: Peer_A:Peer_C TO: end Hop Limit: 5
FROM: Peer_A:Peer_B:Peer_C TO: Peer_B:Peer_C Hop Limit: 5
Queries for Peer_A:
FROM: Peer_A:Peer_B:Peer_C TO: Peer_A:Peer_C Hop Limit: 3
FROM: Peer_A:Peer_B TO: Peer_A:Peer_C Hop Limit: 5
FROM: Peer_A:Peer_B TO: Peer_A:Peer_B:Peer_C Hop Limit: 5
FROM: Peer_A:Peer_B TO: Peer_A:Peer_B Hop Limit: 3

26 Query Execution at Peer
Input: set of queries { ρ-path ({uri, …}, {uri, …}, k), … }
Algorithm: graph traversal of the main-memory representation using bi-directional BFS; each query results in a set of statements
Output: union of the sets of statements
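One way to obtain a set of statements without enumerating paths is to run a breadth-first search from each end-point set and keep every triple that lies on some connection of at most k hops. The sketch below does this. It is an approximation of the slide's bi-directional BFS, not the authors' algorithm, and it may keep a few triples that only lie on non-simple walks. All function names and the sample data are assumptions.

```python
from collections import defaultdict, deque


def bfs_depths(adj, sources, limit):
    """Hop distance from the nearest source for every node within limit hops."""
    depth = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        if depth[node] == limit:
            continue
        for nxt in adj[node]:
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return depth


def rho_subgraph(triples, sources, targets, k):
    """Triples lying on some undirected connection of <= k hops."""
    adj = defaultdict(set)
    for s, _, o in triples:
        adj[s].add(o)
        adj[o].add(s)
    from_src = bfs_depths(adj, sources, k)
    from_dst = bfs_depths(adj, targets, k)
    inf = float("inf")
    keep = set()
    for s, p, o in triples:
        forward = from_src.get(s, inf) + 1 + from_dst.get(o, inf)
        backward = from_src.get(o, inf) + 1 + from_dst.get(s, inf)
        if min(forward, backward) <= k:
            keep.add((s, p, o))
    return keep


def execute(triples, queries):
    """Union of the subgraphs of all (sources, targets, k) queries."""
    result = set()
    for sources, targets, k in queries:
        result |= rho_subgraph(triples, sources, targets, k)
    return result


if __name__ == "__main__":
    triples = [("a", "p", "b"), ("b", "q", "c"), ("c", "r", "d"), ("b", "s", "e")]
    print(sorted(execute(triples, [({"a"}, {"c"}, 2)])))
```

Returning the union of statements means a triple shared by many paths is transferred once, which is the saving the next slide points to.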

27 Query Execution at Peer
Peer does not enumerate paths
Returns a subgraph (set of triples)
Benefits:
  Eliminates redundant data transfer
  Saves computation time

28 Scalability: Multiple Super-Peers
Super-Peer/Super-Peer borders: Super-Peer_1:Super-Peer_2, Super-Peer_1:Super-Peer_3, Super-Peer_2:Super-Peer_3
Super-Peer/Peer borders: Peer_B:Super-Peer_2, Peer_A:Super-Peer_3, Peer_C:Super-Peer_3
[Figure: Super-Peer_1 with Peer_A, Peer_B and Peer_C, shown next to Super-Peer_2 and Super-Peer_3.]

29 Integration of SP graph and Peer Graph
Super-Peer_1's new peer-level QPG
[Figure: weighted graph over the borders A:B, A:C, B:C, A:B:C, A:SP3, B:SP2, C:SP3, SP1:SP2 and SP1:SP3.]

30 Query Planning Algorithm
1) Find start and end points
2) Compute distances to borders
[Figure: super-peers SP1, SP2 and SP3 with peers A, B, C, D and E; the start and end nodes are marked.]

31 3) Add temporary information for endpoints (both peer and super-peer QPG)
4) Find all directed paths with weight <= k (k = 10) connecting start to end in the Super-Peer QPG
start –6→ SP1/SP3 –2→ SP1/SP3 –2→ end
start –6→ SP1/SP3 –2→ end
start –3→ SP1/SP2 –6→ end
start –10→ end
[Figure: Super-Peer QPG over the borders SP1:SP2, SP1:SP3 and SP2:SP3, extended with temporary start and end nodes.]

32 5) Form a list of sub-query-plan requests for each super-peer
FROM: start TO: end Hop-Limit: 10
FROM: start TO: Super-Peer_1:Super-Peer_2 Hop-Limit: 4
FROM: Super-Peer_1:Super-Peer_2 TO: end Hop-Limit: 7
FROM: start TO: Super-Peer_1:Super-Peer_3 Hop-Limit: 8
FROM: Super-Peer_1:Super-Peer_3 TO: Super-Peer_1:Super-Peer_3 Hop-Limit: 2
FROM: Super-Peer_1:Super-Peer_3 TO: end Hop-Limit: 4
Super-Peer_3:
FROM: Super-Peer_1:Super-Peer_3 TO: Super-Peer_1:Super-Peer_3 Hop-Limit: 2

33 7) Each super-peer goes through the previous process on its peer QPG to form a list of ρ-path queries for its peers
Queries for Peer E:
FROM: E:SP1 TO: E:SP1 Hop Limit: 2
Queries for Peer B:
FROM: A:B TO: A:B Hop Limit: 3
FROM: A:B TO: B:C Hop Limit: 6
FROM: A:B:C TO: B:SP2 Hop Limit: 4
FROM: A:B TO: B:SP2 Hop Limit: 2
FROM: A:SP2 TO: start Hop Limit: 4
FROM: B:C TO: B:SP2 Hop Limit: 5
FROM: B:C TO: start Hop Limit: 8
FROM: B:C TO: B:C Hop Limit: 6
FROM: A:B TO: start Hop Limit: 5
FROM: A:B TO: A:B:C Hop Limit: 5
FROM: A:B:C TO: start Hop Limit: 7
FROM: A:B:C TO: B:C Hop Limit: 5
Queries for Peer A:
FROM: A:B TO: A:B Hop Limit: 3
FROM: A:B:C TO: A:SP3 Hop Limit: 4
FROM: A:B TO: A:SP3 Hop Limit: 6
FROM: A:B TO: A:C Hop Limit: 5
FROM: A:B TO: A:B:C Hop Limit: 5
FROM: A:B:C TO: A:C Hop Limit: 3
FROM: A:C TO: A:SP3 Hop Limit: 3
Queries for Peer C:
FROM: A:B TO: B:C Hop Limit: 5
FROM: A:B TO: end Hop Limit: 5
FROM: A:B:C TO: end Hop Limit: 6
FROM: B:C TO: end Hop Limit: 8
FROM: B:C TO: B:C Hop Limit: 6
FROM: B:C TO: C:SP3 Hop Limit: 6
FROM: A:C TO: C:SP3 Hop Limit: 3
FROM: A:B:C TO: A:C Hop Limit: 3
FROM: A:B:C TO: B:C Hop Limit: 5
FROM: A:B:C TO: C:SP3 Hop Limit: 4
FROM: C:SP3 TO: end Hop Limit: 4
8) Querying peer now communicates directly with other peers to execute the ρ-path queries

34 Conclusions and Future Work
Presented a query-planning algorithm for ρ-path queries over a distributed data set
Problems:
  Efficiently compute node neighborhoods
  How to continue searches across KBs
  How to check for the many possible cases
  How to determine search length in each KB

35 Conclusions and Future Work
Performance testing
Effect of relative border size
Different criteria for group formation
How to accommodate other types of queries

36 Questions?

37 Computing Borders
Super-Peer maintains a sorted map of URIs
Peer border: traverse the new list and update the sorted map
Super-peer border: don't care about other URIs not in this group; keep total data transferred to a minimum
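The sorted-map bookkeeping could look roughly like the sketch below. It assumes the super-peer stores every URI of its peers together with the set of peers that hold it. `UriIndex` and its methods are hypothetical names, and a sorted Python list with `bisect` stands in for the sorted map. Each lookup is a binary search, so checking a new list of n URIs against k stored URIs takes O(n log k) comparisons; inserting into a Python list costs more than that, which a tree-based map would avoid.

```python
import bisect


class UriIndex:
    """Sorted URI index kept by a super-peer (illustrative sketch)."""

    def __init__(self):
        self.uris = []    # sorted list of known URIs
        self.owners = []  # owners[i] is the set of peers holding uris[i]

    def _find(self, uri):
        i = bisect.bisect_left(self.uris, uri)
        return i, i < len(self.uris) and self.uris[i] == uri

    def overlap(self, new_uris):
        """Count how many of a candidate peer's URIs are already known."""
        return sum(1 for uri in set(new_uris) if self._find(uri)[1])

    def add_peer(self, peer, new_uris):
        """Traverse the new list, update the index, and return the borders
        of the new peer as {existing_peer: set_of_shared_uris}."""
        borders = {}
        for uri in set(new_uris):
            i, present = self._find(uri)
            if present:
                for other in self.owners[i]:
                    borders.setdefault(other, set()).add(uri)
            else:
                self.uris.insert(i, uri)
                self.owners.insert(i, set())
            self.owners[i].add(peer)
        return borders


if __name__ == "__main__":
    index = UriIndex()
    index.add_peer("P1", ["u:a", "u:b", "u:c"])
    print(index.overlap(["u:b", "u:c", "u:d"]))
    print(index.add_peer("P2", ["u:b", "u:c", "u:d"]))
```

The `overlap` count is the only figure a joining peer needs in order to choose a super-peer, so the full URI comparison never has to leave the super-peer.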

38 Forming the Network
1) Broadcast: "I want to join the network"
2) "I am a super-peer"
3) List of URIs
[Figure: the new peer (P New), super-peers SP1, SP2 and SP3, and existing peers P1 and P2.]

39 Forming the Network
4) SPs compute overlap, O(n log k) (maintain border information)
5) Send overlap count to new peer
6) New peer picks one super-peer (accept) and rejects the others
[Figure: P New accepts SP1 and rejects SP2 and SP3; P1 and P2 are existing peers.]

40 Forming the Network
7) SP1 updates permanent URI index
8) SP1 recomputes SP borders
9) "Here are your borders"
10) Peers send minimum distances
[Figure: P New joined to SP1, alongside SP2 and SP3.]

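41 Computing Super-Peer Borders
[Figure: SP1 and SP2 shown with the URI lists C, E, L, M, U and A, B, G, J, S. The super-peers exchange tuples (SP1, K, false), (SP1, H, false), (SP1, C, false), (SP1, U, true) and (SP2, G, false), (SP2, null, null), (SP2, J, true), (SP2, R, true) while comparing the URIs H, K and R.]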

42 Super-Peer Level QPG
[Figure: Super-Peer 1 containing peers A, B and C, next to Super-Peer 2 and Super-Peer 3.]
Borders: AB, AC, BC, A/SP3, B/SP2, C/SP3
Minimum distances:
dist (AB, BC) = 4
dist (AB, AC) = 3
dist (AB, ABC) = 2
dist (BC, AC) = 5
dist (BC, ABC) = 3
dist (AC, ABC) = 2
dist (AC, A/SP3) = 3
dist (AB, A/SP3) = 4
dist (ABC, A/SP3) = 3
dist (AC, C/SP3) = 2
dist (BC, C/SP3) = 4
dist (ABC, C/SP3) = 2
dist (AB, B/SP2) = 2
dist (BC, B/SP2) = 2
dist (ABC, B/SP2) = 2

