Peer-to-Peer Discovery of Semantic Associations Matthew Perry, Maciej Janik, Cartic Ramakrishnan, Conrad Ibanez, Budak Arpinar, Amit Sheth 2 nd International.

Presentation transcript:

Peer-to-Peer Discovery of Semantic Associations
Matthew Perry, Maciej Janik, Cartic Ramakrishnan, Conrad Ibanez, Budak Arpinar, Amit Sheth
2nd International Workshop on Peer-to-Peer Knowledge Management, San Diego, California, July 17, 2005

Semantic Discovery
From … finding things
To … finding out about things: relationships!

Semantic Associations
Relationship-centric nature of Semantic Web data models
We can ask questions about the relationships between objects: how is entity A related to entity B?
Applications
– National Security – Insider Threat [1]
– Improved Searching – BioPatentMiner [2]
1. B. Aleman-Meza, P. Burns, M. Eavenson, D. Palaniswami, A. Sheth, An Ontological Approach to the Document Access Problem of Insider Threat, Proceedings of the IEEE Intl. Conference on Intelligence and Security Informatics (ISI-2005), May 19-20, 2005.
2. Sougata Mukherjea, Bhuvan Bamba, BioPatentMiner: An Information Retrieval System for BioMedical Patents, VLDB 2004.

Semantic Associations
[Figure: RDF graph with resources &r1, &r5 and &r6; property labels fname ("Matt"), lname ("Perry"), worksFor, name ("LSDIS Lab"), name ("The University of Georgia") and associatedWith; the connecting ρ-path is marked as a Semantic Association.]
Define a set of operators ρ for querying complex relationships between entities (Semantic Associations) [1]
1. Adapted from: Kemafor Anyanwu and Amit Sheth, ρ-Queries: Enabling Querying for Semantic Associations on the Semantic Web, The Twelfth International World Wide Web Conference, Budapest, Hungary, pp

Uniqueness of Semantic Association Queries Simple query specification (only the two endpoints) Doesn’t require extensive knowledge of schema ρ-path (A, B)

Difficult to express with existing Query Languages
RDQL: Find paths of length at most 2 from startURI to endURI

SELECT ?startURI, ?property_1, ?endURI
FROM (?startURI ?property_1 ?endURI)

SELECT ?startURI, ?property_1, ?endURI
FROM (?endURI ?property_1 ?startURI)

SELECT ?startURI, ?property_1, ?x, ?property_2, ?endURI
FROM (?startURI ?property_1 ?x)(?x ?property_2 ?endURI)
WHERE ?startURI ne ?x && ?endURI ne ?x

SELECT ?startURI, ?property_1, ?x, ?property_2, ?endURI
FROM (?startURI ?property_1 ?x)(?endURI ?property_2 ?x)
WHERE ?startURI ne ?x && ?endURI ne ?x

SELECT ?startURI, ?property_1, ?x, ?property_2, ?endURI
FROM (?x ?property_1 ?startURI)(?x ?property_2 ?endURI)
WHERE ?startURI ne ?x && ?endURI ne ?x

SELECT ?startURI, ?property_1, ?x, ?property_2, ?endURI
FROM (?x ?property_1 ?startURI)(?endURI ?property_2 ?x)
WHERE ?startURI ne ?x && ?endURI ne ?x

Why Semantic Associations in P2P?
– Data on the web is by its nature distributed
– Knowledge will be stored in multiple stores and multiple ontologies
– Search for semantic paths will have to include many knowledge sources
– In the spirit of the Semantic Web (collaborative knowledge discovery)

Contributions
– Super-Peer Architecture for Querying Semantic Associations
– Knowledgebase Borders and Distances between Borders
– Query Planning Algorithm based on Knowledgebase Borders and Distances

Assumptions
– Pair-wise mapping of resources between peers (solution to the Entity Disambiguation / Reference Reconciliation problem)
– We use URIs to solve the Entity Disambiguation problem
– Main focus is Query Planning over the P2P network
– Not concerned with fault tolerance, details of network formation, etc. at this point

RDF Instance Graph
[Figure: example RDF schema and instance graph. Schema classes include Client, Customer, Passenger, Ticket, Flight, Payment, Cash, CCard, Bank Account and FFlyer, with properties such as fname, lname, purchased, forflight, number, paidby, holder, creditedto, fflierno, ffid and amount. Instance resources &r1 to &r11 carry literals such as "John" "Smith", "Bill" "Jones", "Jeff" "Brown" and "XYZ123". Edge types shown: typeOf (instance), subClassOf (isA), subPropertyOf.]

ρ-path Problem (k-hop limited)
Given:
– An RDF instance graph G, vertices a and b in G, an integer k
Find:
– All simple, undirected paths p, with length less than or equal to k, which connect a and b
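The problem statement above can be illustrated with a small depth-first enumeration. This is only a sketch under stated assumptions: the function name `rho_path`, the triple layout `(subject, property, object)`, and the sample triples are all hypothetical and are not taken from the slides.

```python
from collections import defaultdict

def rho_path(triples, a, b, k):
    """Enumerate simple, undirected paths of length <= k between a and b.

    Hypothetical sketch: `triples` is an iterable of (subject, property,
    object) tuples; edges may be followed in either direction.
    """
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((p, o, "->"))   # follow the property forwards
        adj[o].append((p, s, "<-"))   # follow the property backwards

    paths = []

    def extend(node, visited, path):
        if node == b and path:
            paths.append(list(path))
            return
        if len(path) == k:            # hop limit reached
            return
        for prop, nxt, direction in adj[node]:
            if nxt not in visited:    # keep the path simple
                visited.add(nxt)
                path.append((node, prop, direction, nxt))
                extend(nxt, visited, path)
                path.pop()
                visited.remove(nxt)

    extend(a, {a}, [])
    return paths

# Made-up example graph (not from the presentation).
example = [
    ("a", "cites", "b"),
    ("a", "knows", "x"),
    ("b", "worksWith", "x"),
    ("x", "p", "y"),
    ("y", "q", "b"),
]
print(len(rho_path(example, "a", "b", 2)))
```

Exhaustive enumeration like this grows quickly with k, which is one reason the later slides have peers return subgraphs instead of path lists.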

Distributed ρ-path problem: Find all paths from a start node to an end node over the distributed RDF graphs
[Figure: several knowledge bases (ontologies) spanned by a path]

What do we need?
– Efficiently explore node neighborhoods
– When to stop a search in one peer and continue it in another
– Determine the search distance in each peer
– Determine which peers to include in the search

Super-Peer Approach
[Figure: one super-peer connected to six peers, each with its own KB; the messages exchanged are labeled ρ-path, ρ-plan, ρ-sub-plan and subgraph.]
Peer
– RDF data store (Sesame, BRAHMS)
– ρ-path (a, b, k) returns subgraph
Super-Peer
– No data store
– Responsible for Query Planning

Knowledgebase Borders
[Figure: the graphs of Peer 1 and Peer 2 overlap; the nodes in the overlap are border nodes and together form the Peer_1:Peer_2 border.]

Distance Between Borders
[Figure: Peer 1, Peer 2 and Peer 3 with borders P1:P2, P1:P3 and P2:P3; border nodes and the query end points Start and End are marked.]
dist (P1:P2, P1:P3) = 3
dist (P1:P2, P2:P3) = 1
dist (P1:P3, P2:P3) = 1
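One plausible way for a peer to compute such a minimum distance is a breadth-first search from each node of one border until a node of the other border is reached. The slides do not say how peers compute these values, so the function `border_distance`, the adjacency-dict layout and the toy graph below are assumptions made purely for illustration.

```python
import math
from collections import deque

def border_distance(adj, border_x, border_y):
    """Minimum hop count between a node of border_x and a different node
    of border_y inside one peer's (undirected) graph.

    Hypothetical sketch: `adj` maps each node to its neighbours and is
    assumed to be symmetric. Returns math.inf if no such pair is connected.
    """
    best = math.inf
    for src in border_x:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node != src and node in border_y:
                best = min(best, dist[node])
                break  # BFS reaches nodes in order of distance
            for nxt in adj.get(node, ()):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
    return best

# Made-up peer graph: a - b - c - d
toy = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(border_distance(toy, {"a"}, {"d"}))
```

Because the source node itself is skipped, the same function also gives a distance from a border to itself (two distinct nodes of one border), and returns infinity when a border has a single node.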

Query Planning Graph
– Directed Graph
– Node for each distinct border
– For each pair of connected borders, create 2 edges (one in each direction)
– Weight is the minimum of the minimum distances (reported by peers)
– For example, you can get from A:B to A:B:C through either A or B
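The construction rule above can be sketched as follows. The function name `build_qpg`, the report format and the per-peer distance values are hypothetical; the slides only give the resulting minimum distances, not what each individual peer reported.

```python
import math

def build_qpg(peer_reports):
    """Build a query planning graph as a dict {(from_border, to_border): weight}.

    Hypothetical sketch: `peer_reports` maps a peer name to the minimum
    distances it measured between pairs of its own borders. Each connected
    pair yields two directed edges whose weight is the smallest value
    reported by any peer.
    """
    qpg = {}
    for report in peer_reports.values():
        for (x, y), d in report.items():
            if math.isinf(d):
                continue  # unconnected borders get no edge
            for edge in ((x, y), (y, x)):
                qpg[edge] = min(d, qpg.get(edge, math.inf))
    return qpg

# Illustrative reports: both Peer_A and Peer_B contain the borders AB and
# ABC, so both report a distance and the smaller one becomes the weight.
reports = {
    "Peer_A": {("AB", "ABC"): 2, ("AB", "AC"): 3},
    "Peer_B": {("AB", "ABC"): 4, ("AB", "BC"): 4},
}
qpg = build_qpg(reports)
print(qpg[("AB", "ABC")], qpg[("ABC", "AB")])
```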

[Figure: peers A, B and C with overlapping knowledge bases]
Borders: AB, AC, BC, ABC
Minimum Distances
dist (AB, BC) = 4
dist (AB, AC) = 3
dist (AB, ABC) = 2
dist (BC, AC) = 5
dist (BC, ABC) = 3
dist (AC, ABC) = 2
dist (AB, AB) = 3
dist (AC, AC) = 3
dist (BC, BC) = 2
dist (ABC, ABC) = ∞
Query Planning Graph
[Figure: QPG with nodes AB, AC, ABC and BC]

Using the Query Planning Graph
Example Query: ρ-path (start, end, 10)
1) Find Start and End Points
2) Compute Distances to Borders
[Figure: peers A, B and C with the start and end points marked]

3) Add this Information to QPG
[Figure: QPG with nodes AB, AC, ABC and BC plus the temporary nodes start and end]
4) Find all paths from start to end (including cycles) <= k (10)
In this case 22 paths
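Step 4 amounts to enumerating every walk from start to end whose total weight stays within k, with revisits allowed. A minimal sketch is given below; the function `paths_within` and the tiny three-node graph are assumptions for illustration and do not reproduce the 22 paths of the slide, whose start and end distances are not fully listed in the transcript.

```python
def paths_within(qpg, start, end, k):
    """Enumerate all directed walks start -> end with total weight <= k.

    Hypothetical sketch: `qpg` is a dict {(u, v): weight}. Nodes and edges
    may repeat (cycles are allowed); positive weights guarantee termination.
    A walk stops as soon as it reaches `end`.
    """
    out = {}
    for (u, v), w in qpg.items():
        out.setdefault(u, []).append((v, w))

    found = []

    def walk(node, cost, path):
        if node == end:
            found.append((cost, list(path)))
            return
        for nxt, w in out.get(node, ()):
            if cost + w <= k:
                path.append(nxt)
                walk(nxt, cost + w, path)
                path.pop()

    walk(start, 0, [start])
    return found

# Made-up graph: one border X with a self-loop of weight 3.
toy_qpg = {("start", "X"): 2, ("X", "X"): 3, ("X", "end"): 2}
for cost, path in paths_within(toy_qpg, "start", "end", 10):
    print(cost, path)
```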

5) Convert Set of Paths to Set of Queries
start –2→ Peer_A:Peer_B –3→ Peer_A:Peer_C –3→ end
start –2→ Peer_B:Peer_C –2→ Peer_B:Peer_C –2→ end
[Figure: peers A, B and C with the start and end points marked]

Converting Paths to Queries
Each edge (pair of endpoints) represents a query
For example, ρ-path (start, Peer_A:Peer_B, 2)
[Figure: path start –2→ A:B –3→ A:C –3→ end, with k = 10]
What is the correct hop-limit?
hop-limit = edge weight + (k – path weight)
ρ-path (start, Peer_A:Peer_B, 4)
ρ-path (Peer_A:Peer_B, Peer_A:Peer_C, 5)
ρ-path (Peer_A:Peer_C, end, 5)
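The hop-limit rule can be checked with a few lines of code. The helper name `hop_limits` is hypothetical; the path, the weights 2, 3, 3 and k = 10 are the values from the slide.

```python
def hop_limits(path, weights, k):
    """Apply hop-limit = edge weight + (k - path weight) to every edge.

    Hypothetical helper: `path` is a list of QPG nodes, `weights[i]` is the
    weight of the edge path[i] -> path[i + 1].
    """
    slack = k - sum(weights)          # unused hops on this path
    if slack < 0:
        raise ValueError("path is longer than k")
    edges = zip(path, path[1:])
    return [(edge, w + slack) for edge, w in zip(edges, weights)]

path = ["start", "Peer_A:Peer_B", "Peer_A:Peer_C", "end"]
for (src, dst), limit in hop_limits(path, [2, 3, 3], 10):
    print(f"rho-path({src}, {dst}, {limit})")
```

The path weight is 8, so each edge may use the 2 spare hops: the limits come out as 4, 5 and 5, matching the three queries on the slide.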

Find the maximum hop-limit for each pair of end points
Pair                                          Hop-limit
(start, Peer_A:Peer_B)                        5
(start, Peer_A:Peer_B:Peer_C)                 7
(start, Peer_B:Peer_C)                        8
(Peer_A:Peer_B, Peer_A:Peer_C)                5
(Peer_A:Peer_B, Peer_A:Peer_B:Peer_C)         5
(Peer_A:Peer_B, Peer_A:Peer_B)                3
(Peer_A:Peer_B, Peer_B:Peer_C)                6
(Peer_A:Peer_C, Peer_A:Peer_B:Peer_C)         3
(Peer_A:Peer_C, Peer_B:Peer_C)                6
(Peer_A:Peer_C, end)                          5
(Peer_B:Peer_C, end)                          8
(Peer_B:Peer_C, Peer_B:Peer_C)                6
(Peer_B:Peer_C, Peer_A:Peer_B:Peer_C)         5
(Peer_A:Peer_B:Peer_C, end)                   6

Which Peer gets each query?
ρ-path (Peer_B:Peer_A, Peer_A:Peer_C, 5) → Peer_A
ρ-path (Peer_B:Peer_C, Peer_B:Peer_C, 5) → Peer_B and Peer_C
[Figure: Peer_A, Peer_B and Peer_C with the borders involved in each query]
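The two examples suggest that a query is sent to every peer that contains both of its endpoints. A sketch of that reading follows; the function `peers_for_query`, the convention of parsing border names by splitting on ":", and the `holders` map for non-border endpoints such as start and end are assumptions, not details given in the slides.

```python
def peers_for_query(src, dst, holders=None):
    """Return the peers that should receive rho-path(src, dst, ...).

    Hypothetical sketch: a border name such as "Peer_A:Peer_B" lists the
    peers that share it; `holders` maps other endpoints (e.g. "start")
    to the set of peers storing them.
    """
    holders = holders or {}

    def peers_of(endpoint):
        if endpoint in holders:
            return set(holders[endpoint])
        return set(endpoint.split(":"))

    # Only peers that hold both endpoints can answer the query locally.
    return sorted(peers_of(src) & peers_of(dst))

print(peers_for_query("Peer_B:Peer_A", "Peer_A:Peer_C"))
print(peers_for_query("Peer_B:Peer_C", "Peer_B:Peer_C"))
print(peers_for_query("Peer_B:Peer_C", "end", {"end": {"Peer_C"}}))
```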

Final Query Plan

Queries for Peer_A
FROM: Peer_A:Peer_B:Peer_C TO: Peer_A:Peer_C Hop Limit: 3
FROM: Peer_A:Peer_B TO: Peer_A:Peer_C Hop Limit: 5
FROM: Peer_A:Peer_B TO: Peer_A:Peer_B:Peer_C Hop Limit: 5
FROM: Peer_A:Peer_B TO: Peer_A:Peer_B Hop Limit: 3

Queries for Peer_B
FROM: Peer_B:Peer_C TO: Peer_B:Peer_C Hop Limit: 6
FROM: Peer_B:Peer_C TO: start Hop Limit: 8
FROM: Peer_A:Peer_B TO: start Hop Limit: 5
FROM: Peer_A:Peer_B TO: Peer_A:Peer_B:Peer_C Hop Limit: 5
FROM: Peer_A:Peer_B TO: Peer_A:Peer_B Hop Limit: 3
FROM: Peer_A:Peer_B:Peer_C TO: Peer_B:Peer_C Hop Limit: 5
FROM: Peer_A:Peer_B TO: Peer_B:Peer_C Hop Limit: 6
FROM: Peer_A:Peer_B:Peer_C TO: start Hop Limit: 7

Queries for Peer_C
FROM: Peer_B:Peer_C TO: end Hop Limit: 8
FROM: Peer_B:Peer_C TO: Peer_B:Peer_C Hop Limit: 6
FROM: Peer_A:Peer_C TO: Peer_B:Peer_C Hop Limit: 5
FROM: Peer_A:Peer_B:Peer_C TO: end Hop Limit: 6
FROM: Peer_A:Peer_B:Peer_C TO: Peer_A:Peer_C Hop Limit: 3
FROM: Peer_A:Peer_C TO: end Hop Limit: 5
FROM: Peer_A:Peer_B:Peer_C TO: Peer_B:Peer_C Hop Limit: 5

Query Execution at Peer
Input: Set of Queries: { ρ-path ({uri, …}, {uri, …}, k), …}
Algorithm: Graph Traversal of Main Memory representation
– Bi-directional BFS
– Results in a set of statements
Output: Union of each set of statements
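The slides name bi-directional BFS but give no details, so the following is only one simplified way to realise the idea: run a depth-limited BFS from the source set and another from the target set, then keep every triple that lies on some connection of at most k hops. The names `bfs_depths` and `rho_path_subgraph` and the sample triples are hypothetical, and the result can include triples that lie only on non-simple walks.

```python
from collections import defaultdict, deque

def bfs_depths(adj, sources, limit):
    """Hop distance from the nearest source, explored up to `limit` hops."""
    depth = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        if depth[node] == limit:
            continue
        for nxt in adj[node]:
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return depth

def rho_path_subgraph(triples, sources, targets, k):
    """Return the set of triples on source-target connections of <= k hops."""
    adj = defaultdict(set)
    for s, _, o in triples:           # treat the graph as undirected
        adj[s].add(o)
        adj[o].add(s)
    from_src = bfs_depths(adj, sources, k)
    from_tgt = bfs_depths(adj, targets, k)
    result = set()
    for s, p, o in triples:
        for u, v in ((s, o), (o, s)):
            # The triple is usable if reaching u, crossing it, and then
            # continuing from v to a target fits within k hops.
            if u in from_src and v in from_tgt and from_src[u] + 1 + from_tgt[v] <= k:
                result.add((s, p, o))
    return result

sample = [("a", "p", "x"), ("x", "q", "b"), ("x", "r", "y"), ("y", "s", "z")]
print(sorted(rho_path_subgraph(sample, {"a"}, {"b"}, 2)))
```

Running several queries and taking the union of the returned sets gives the peer's output as described on the slide.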

Query Execution at Peer
Peer does not enumerate paths
Returns a subgraph (set of triples)
Benefits
– Eliminates redundant data transfer
– Saves computation time

Scalability: Multiple Super-Peers
[Figure: Super-Peer_1 with its peers Peer_A, Peer_B and Peer_C, connected to Super-Peer_2 and Super-Peer_3]
Super-Peer/Super-Peer Borders
– Super-Peer_1:Super-Peer_2
– Super-Peer_1:Super-Peer_3
– Super-Peer_2:Super-Peer_3
Super-Peer/Peer Borders (Super-Peer_1)
– Peer_B:Super-Peer_2
– Peer_A:Super-Peer_3
– Peer_C:Super-Peer_3

Integration of SP graph and Peer Graph
Super-Peer_1’s new Peer-Level QPG
[Figure: QPG with nodes A:B, A:B:C, B:C, A:C, A:SP3, B:SP2, C:SP3, SP1:SP3 and SP1:SP]

Query Planning Algorithm
1) Find start and end points
2) Compute distances to borders
[Figure: super-peers SP1, SP2 and SP3 and peers A, B, C, D and E, with the start and end points marked]

3) Add temporary information for endpoints (both peer and super-peer QPG)
[Figure: Super-Peer QPG with nodes SP1:SP2, SP1:SP3 and SP2:SP3 plus the temporary nodes start and end]
4) Find all directed paths <= k connecting start to end in the Super-Peer QPG (k = 10)
start –6→ SP1/SP3 –2→ SP1/SP3 –2→ end
start –6→ SP1/SP3 –2→ end
start –3→ SP1/SP2 –6→ end
start –10→ end

5) Form a list of sub-query-plan requests for each super-peer

Super-Peer_1
FROM: start TO: end Hop-Limit: 10
FROM: start TO: Super-Peer_1:Super-Peer_2 Hop-Limit: 4
FROM: Super-Peer_1:Super-Peer_2 TO: end Hop-Limit: 7
FROM: start TO: Super-Peer_1:Super-Peer_3 Hop-Limit: 8
FROM: Super-Peer_1:Super-Peer_3 TO: Super-Peer_1:Super-Peer_3 Hop-Limit: 2
FROM: Super-Peer_1:Super-Peer_3 TO: end Hop-Limit: 4

Super-Peer_3
FROM: Super-Peer_1:Super-Peer_3 TO: Super-Peer_1:Super-Peer_3 Hop-Limit: 2
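The Super-Peer_1 hop-limits listed above are consistent with applying hop-limit = edge weight + (k – path weight) to each of the four super-peer paths and keeping the maximum per pair of endpoints. The sketch below checks this; the function `max_hop_limits` and the data layout are hypothetical, while the paths and weights are the ones from the deck (k = 10).

```python
def max_hop_limits(weighted_paths, k):
    """Largest hop-limit per (from, to) pair over a set of paths.

    Hypothetical sketch: each entry is (nodes, weights) where weights[i]
    belongs to the edge nodes[i] -> nodes[i + 1].
    """
    best = {}
    for nodes, weights in weighted_paths:
        slack = k - sum(weights)      # spare hops on this path
        for pair, w in zip(zip(nodes, nodes[1:]), weights):
            best[pair] = max(best.get(pair, 0), w + slack)
    return best

sp_paths = [
    (["start", "SP1:SP3", "SP1:SP3", "end"], [6, 2, 2]),
    (["start", "SP1:SP3", "end"], [6, 2]),
    (["start", "SP1:SP2", "end"], [3, 6]),
    (["start", "end"], [10]),
]
for (src, dst), limit in max_hop_limits(sp_paths, 10).items():
    print(f"FROM: {src} TO: {dst} Hop-Limit: {limit}")
```

Taking the maximum means one query per pair is enough to cover every path that uses that pair.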

7) Each super-peer goes through the previous process on its peer QPG to form a list of ρ-path queries for its peers
8) Querying peer now communicates directly with other peers to execute the ρ-path queries

Queries for Peer B:
FROM: A:B TO: A:B Hop Limit: 3
FROM: A:B TO: B:C Hop Limit: 6
FROM: A:B:C TO: B:SP2 Hop Limit: 4
FROM: A:B TO: B:SP2 Hop Limit: 2
FROM: A:SP2 TO: start Hop Limit: 4
FROM: B:C TO: B:SP2 Hop Limit: 5
FROM: B:C TO: start Hop Limit: 8
FROM: B:C TO: B:C Hop Limit: 6
FROM: A:B TO: start Hop Limit: 5
FROM: A:B TO: A:B:C Hop Limit: 5
FROM: A:B:C TO: start Hop Limit: 7
FROM: A:B:C TO: B:C Hop Limit: 5

Queries for Peer A:
FROM: A:B TO: A:B Hop Limit: 3
FROM: A:B:C TO: A:SP3 Hop Limit: 4
FROM: A:B TO: A:SP3 Hop Limit: 6
FROM: A:B TO: A:C Hop Limit: 5
FROM: A:B TO: A:B:C Hop Limit: 5
FROM: A:B:C TO: A:C Hop Limit: 3
FROM: A:C TO: A:SP3 Hop Limit: 3

Queries for Peer C:
FROM: A:B TO: B:C Hop Limit: 5
FROM: A:B TO: end Hop Limit: 5
FROM: A:B:C TO: end Hop Limit: 6
FROM: B:C TO: end Hop Limit: 8
FROM: B:C TO: B:C Hop Limit: 6
FROM: B:C TO: C:SP3 Hop Limit: 6
FROM: A:C TO: C:SP3 Hop Limit: 3
FROM: A:B:C TO: A:C Hop Limit: 3
FROM: A:B:C TO: B:C Hop Limit: 5
FROM: A:B:C TO: C:SP3 Hop Limit: 4
FROM: C:SP3 TO: end Hop Limit: 4

Queries for Peer E:
FROM: E:SP1 TO: E:SP1 Hop Limit: 2

Conclusions and Future Work
Presented a Query-Planning Algorithm for ρ-path queries over a distributed data set
Problems
– Efficiently compute node neighborhoods
– How to continue searches across KBs
– How to check for the many possible cases
– How to determine search length in each KB

Conclusions and Future Work
Future Work
– Performance Testing
– Effect of relative border size
– Different criteria for group formation
– How to accommodate other types of queries

Questions?

Computing Borders
Super-Peer maintains Sorted Map of URIs
Peer Border
– Traverse new list and update Sorted Map
Super Peer Border
– Don’t care about other URIs not in this group
– Keep total data transferred at a minimum
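The peer-border bookkeeping can be sketched as a URI index kept by the super-peer. Everything named here (`SuperPeerIndex`, `add_peer`, `borders`, the sample URIs) is hypothetical, and a plain dict stands in for the sorted map mentioned on the slide, since Python's standard library has no sorted-map type.

```python
from collections import defaultdict

class SuperPeerIndex:
    """Hypothetical URI index: maps each URI to the peers that hold it."""

    def __init__(self):
        self.owners = defaultdict(set)

    def add_peer(self, peer, uris):
        # Traverse the new peer's URI list and update the map.
        for uri in uris:
            self.owners[uri].add(peer)

    def borders(self):
        """Group shared URIs by the exact set of peers sharing them."""
        result = defaultdict(set)
        for uri, peers in self.owners.items():
            if len(peers) > 1:
                result[":".join(sorted(peers))].add(uri)
        return dict(result)

index = SuperPeerIndex()
index.add_peer("Peer_A", ["u1", "u2", "u3"])
index.add_peer("Peer_B", ["u2", "u3", "u4"])
index.add_peer("Peer_C", ["u3", "u5"])
print(index.borders())
```

Grouping by the full set of peers keeps a border such as Peer_A:Peer_B:Peer_C distinct from Peer_A:Peer_B, in line with the "node for each distinct border" rule of the query planning graph.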

Forming the Network
[Figure: super-peers SP1, SP2 and SP3, peers P1 and P2, and a new peer P New]
New peer: "I want to join the network"
1) Broadcast
2) I am a super-peer
3) List of URIs

Forming the Network
4) SPs compute overlap, O(n log k) (maintain border information)
5) Send overlap count to new peer
6) New peer picks one super-peer (accept / reject)

Forming the Network
7) SP1 updates permanent URI index
8) SP1 recomputes SP borders
9) Here are your borders
10) Peers send minimum distances

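Computing Super-Peer Borders
[Figure: SP1 and SP2 exchange messages over their URI lists; URI labels shown are C, E, L, M, U, A, B, G, J, S, H, K and R, with H, K and R highlighted.]
Messages, in order:
(SP1, C, false)
(SP2, G, false)
(SP1, H, false)
(SP2, J, true)
(SP1, K, false)
(SP2, R, true)
(SP1, U, true)
(SP2, null, null)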
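The message trace is compatible with a leapfrog-style intersection of two sorted URI lists, where each message carries the sender's next candidate URI and a flag saying whether the URI it just received was found locally. That reading is an inference from the slide, not something the deck states. The function `exchange_borders` and the assumed list contents (SP1 holding C, E, H, K, L, M, R, U and SP2 holding A, B, G, H, J, K, R, S) are likewise assumptions.

```python
from bisect import bisect_left

def exchange_borders(uris_1, uris_2, names=("SP1", "SP2")):
    """Simulate an alternating exchange that finds the shared URIs.

    Hypothetical sketch: each message is (sender, candidate_uri, matched),
    where `matched` tells whether the previously received URI was found.
    """
    lists = (sorted(set(uris_1)), sorted(set(uris_2)))
    shared, messages = [], []
    if not lists[0]:
        return shared, messages
    cursor = [0, 0]
    sender, uri = 0, lists[0][0]      # SP1 opens with its smallest URI
    messages.append((names[0], uri, False))
    while uri is not None:
        receiver = 1 - sender
        own = lists[receiver]
        # Jump to the first local URI that is >= the received one.
        i = bisect_left(own, uri, cursor[receiver])
        matched = i < len(own) and own[i] == uri
        if matched:
            shared.append(uri)
            i += 1                    # answer with the URI after the match
        cursor[receiver] = i
        uri = own[i] if i < len(own) else None
        messages.append((names[receiver], uri, matched))
        sender = receiver
    return shared, messages

shared, messages = exchange_borders("CEHKLMRU", "ABGHJKRS")
print(shared)
for message in messages:
    print(message)
```

Under these assumptions the shared URIs come out as H, K and R and the first seven messages match the slide; the final message differs only in reporting `False` where the slide shows null for the flag. Only the candidate URIs cross the network, which fits the stated goal of keeping the data transferred small.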

Super-Peer Level QPG
[Figure: Super-Peer 1 with peers A, B and C, connected to Super-Peer 2 and Super-Peer 3]
Borders: AB, AC, BC, A/SP3, B/SP2, C/SP3
Minimum Distances
dist (AB, BC) = 4
dist (AB, AC) = 3
dist (AB, ABC) = 2
dist (BC, AC) = 5
dist (BC, ABC) = 3
dist (AC, ABC) = 2
dist (AC, A/SP3) = 3
dist (AB, A/SP3) = 4
dist (ABC, A/SP3) = 3
dist (AC, C/SP3) = 2
dist (BC, C/SP3) = 4
dist (ABC, C/SP3) = 2
dist (AB, B/SP2) = 2
dist (BC, B/SP2) = 2
dist (ABC, B/SP2) = 2