1
The National Centres of Competence in Research are managed by the Swiss National Science Foundation on behalf of the Federal Authorities
DB Lunch @ Berkeley, 10.28.05
Semantic Interoperability in Large Scale Heterogeneous Networks
Philippe Cudré-Mauroux, EPFL
Joint work with:
Karl Aberer (advisor @ EPFL)
Manfred Hauswirth (Semantic Gossiping)
T. van Pelt, L. Zhou & A. Feher (Implementation)
2
Overview
1. Motivation: Picture Sharing in Decentralized Settings
2. Decentralized Data Integration
   2.1 Peer Data Management Systems
   2.2 Probabilistic Message-passing
   2.3 Aspects of self-organization
   2.4 Studying semantic interoperability in the large
3. Applications
   3.1 GridVine
   3.2 PicShark
4. Conclusions
3
1. Motivation: Picture Sharing
Profusion of digital images
– Variety of powerful devices
– Gigabytes of pictures are the new norm
Most of the images are kept local
Some are shared
– Mostly point-to-point
– Primitive search capabilities
[Figure labels: MMS, HTTP, SMTP]
4
Opportunity
More and more software uses metadata to organize images locally
– (Semi-)structured metadata (e.g., XML, PSA)
– Ontological metadata (e.g., RDF, XMP)
– Type-based metadata (e.g., WinFS)
[Example RDF fragment; only the opening tag and the field values survive]
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'> 2001-12-19T18:49:03Z 2001-12-19T20:09:28Z John Doe …
5
Hurdle: Metadata Heterogeneity
Why not take advantage of those metadata in a distributed setting?
X Syntactic discrepancies
X Semantic heterogeneity
All the aforementioned standards are extensible
Shared representation is not enough
[Figure: a table with columns ImageGUID and cDate (rows A0657B25 / 05.08.04 and 109E7A25 / 05.08.04) vs. the date format 05/08/2004; the attribute Width vs. Length-Y]
6
Beyond Keyword Search
Searching semantically richer objects in large scale heterogeneous networks
[Figure: a query on "date?" faces differently formatted values at other peers: 2001-12-19T18:49:03Z, 2001-12-19T20:09:28Z, 05/08/2004, Jan 1, 2005]
7
2. Decentralized Semantics
Traditional database techniques (e.g., LAV/GAV) rely on centralized schemas to integrate data sources
Not applicable to our context
– Scale (upper ontologies?)
– Churn
– Autonomy
How can we foster semantic interoperability in decentralized settings?
[Figure: a central attribute Date linked to the local attributes myDate and yourDate, with m(Date) = yourDate and m(Date) = myDate]
8
Semantic Interoperability
Photoshop (own schema), sample record: 178A8CD8865, Robinson, Tunbridge Wells Royal Council …
WinFS (known schema), sample record: 178A8CD8866, Henry Peach Robinson, Photographer, Tunbridge Council …
Q1 = $p/GUID FOR $p IN /Photoshop_Image WHERE $p/Creator LIKE "%Robi%"
T12 = $fs/GUID $fs/Author/DisplayName FOR $fs IN /WinFSImage
Q2 = $p/GUID FOR $p IN T12 WHERE $p/Creator LIKE "%Robi%"
Extending semantic interoperability techniques to decentralized settings
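The reformulation from Q1 to Q2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: records are modeled as plain dictionaries, the view T12 is assumed to expose the WinFS field Author/DisplayName under the Photoshop name Creator (which is what Q2 suggests), and the function names `q1` and `t12` are made up for this sketch.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]

def q1(photoshop_images: Iterable[Record]) -> List[str]:
    """Q1: GUIDs of images whose Creator matches LIKE "%Robi%"."""
    return [p["GUID"] for p in photoshop_images if "Robi" in p["Creator"]]

def t12(winfs_images: Iterable[dict]) -> List[Record]:
    """T12 (assumed shape): present WinFS records in the Photoshop schema."""
    return [
        {"GUID": fs["GUID"], "Creator": fs["Author"]["DisplayName"]}
        for fs in winfs_images
    ]

# Sample records taken from the slide; all other fields are omitted.
photoshop = [{"GUID": "178A8CD8865", "Creator": "Robinson"}]
winfs = [{"GUID": "178A8CD8866",
          "Author": {"DisplayName": "Henry Peach Robinson"}}]

local_hits = q1(photoshop)    # Q1 evaluated on the local schema
remote_hits = q1(t12(winfs))  # Q2: the same query evaluated over the view T12
```

The point of the sketch is that the query itself never changes: only the data source is wrapped by the mapping.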
9
2.1 Peer Data Management Systems
Local pairwise mappings
– Peer Data Management Systems (PDMS)
Pairwise mappings overcome global schema heterogeneity
– Transitive closures on mapping operations
[Figure: the "date?" query is propagated through pairwise mappings between attributes labeled es:cDate, xap:CreateDate, xap:ModifyDate and myRDF:Date, reaching the values 2001-12-19T18:49:03Z, 2001-12-19T20:09:28Z, 05/08/2004 and Jan 1, 2005; other labels: article, weather]
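A transitive closure over pairwise mappings can be sketched as a breadth-first traversal of the mapping graph. The sketch below is illustrative only: the schema names, the direction of the mappings, and the helper `reachable_attributes` are assumptions, loosely based on the attribute labels in the figure.

```python
from collections import deque

# Hypothetical pairwise mappings:
# (source schema, target schema) -> {source attribute: target attribute}
mappings = {
    ("myRDF", "es"): {"Date": "cDate"},
    ("es", "xap"): {"cDate": "CreateDate"},
}

def reachable_attributes(schema, attr, mappings):
    """Follow pairwise mappings transitively from (schema, attr).

    Returns {schema: attribute name in that schema} for every schema
    that can be reached without losing the attribute along the way.
    """
    seen = {schema: attr}
    queue = deque([schema])
    while queue:
        src = queue.popleft()
        for (s, t), m in mappings.items():
            if s == src and t not in seen and seen[src] in m:
                seen[t] = m[seen[src]]
                queue.append(t)
    return seen

result = reachable_attributes("myRDF", "Date", mappings)
```

Here a query on myRDF:Date reaches the xap schema even though no direct mapping between the two schemas exists.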
10
Problem: Precision/Recall Tradeoff
Semantic query routing
– To whom shall I forward a query posed against my local schema?
Some (most) mappings will be (partially) faulty
– Low expressive power of mappings
– Automatic schema alignment techniques
– Granularity of conceptualizations…
Local query resolution
– Low recall
Flooding (PDMS)
– Low precision
Standard deductive integration is not sufficient
– Uncertainty on mappings and conceptualizations
– Abductive reasoning (on transitive closures of mappings)
11
2.2. Probabilistic Message Passing
Link-based analysis of the PDMS:
- Mapping cycles
- Parallel paths
Semantics as global agreement
[Figure: mapping graph with mappings m_0 … m_5; a query q is compared with its reformulation along a cycle, q vs. m_3(m_4(m_0(q)))]
12
Computing a Marginal for one cycle
$$P(m_0, m_1, m_2, m_3, f_0) = P(m_0)\, P(m_1)\, P(m_2)\, P(m_3)\, P(f_0 \mid m_0, m_1, m_2, m_3)$$
$$P(m_0 \mid f_0) = \sum_{m_1, m_2, m_3} P(m_0, m_1, m_2, m_3, f_0)\; P(f_0)^{-1}$$
[Figure legend: observed / unknown variables]
But: feedbacks on different cycles are correlated
– Need to express a global probabilistic model for the mapping graph
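The single-cycle marginal can be checked numerically by brute-force enumeration. The sketch below is not the system's implementation. In particular, the slide does not spell out the feedback model P(f_0 | m_0, …, m_3); the code assumes that feedback is positive with probability 1 when all mappings are correct, 0 when exactly one is incorrect, and delta when two or more are incorrect (errors may compensate each other). The function name and parameters are hypothetical.

```python
from itertools import product

def posterior_m0_correct(priors, feedback_positive, delta):
    """P(m0 = correct | f0), marginalizing over the other mappings.

    priors[i] is the prior probability that mapping m_i is correct.
    The feedback model below is an assumption made for this sketch.
    """
    def p_feedback(assignment):
        wrong = assignment.count(False)
        if wrong == 0:
            p_pos = 1.0
        elif wrong == 1:
            p_pos = 0.0
        else:
            p_pos = delta
        return p_pos if feedback_positive else 1.0 - p_pos

    joint_m0_correct = 0.0  # sum over m1..mn of P(m0=correct, m1..mn, f0)
    evidence = 0.0          # P(f0)
    for assignment in product([True, False], repeat=len(priors)):
        p = p_feedback(assignment)
        for correct, prior in zip(assignment, priors):
            p *= prior if correct else 1.0 - prior
        evidence += p
        if assignment[0]:
            joint_m0_correct += p
    return joint_m0_correct / evidence

# Cycle of four mappings, uniform prior 0.5, delta 0.1 (illustrative values).
pos = posterior_m0_correct([0.5] * 4, feedback_positive=True, delta=0.1)
neg = posterior_m0_correct([0.5] * 4, feedback_positive=False, delta=0.1)
```

Under these assumptions, positive feedback on the cycle raises the belief in m_0 above its prior and negative feedback lowers it.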
13
A Brief Intro to Factor-Graphs
$$g(x_1, x_2, x_3, x_4) = f_A(x_1, x_2)\, f_B(x_2, x_3, x_4)$$
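For this factorization, the sum-product idea can be illustrated on binary variables: the marginal of x_1 is obtained by first summing x_3 and x_4 out of f_B (a message to x_2) and then summing x_2 out of f_A, instead of enumerating all four variables. The factor tables below are arbitrary values chosen for the sketch, not taken from the talk.

```python
from itertools import product

VALUES = (0, 1)

def f_a(x1, x2):
    """Arbitrary illustrative factor over (x1, x2)."""
    return 0.9 if x1 == x2 else 0.1

def f_b(x2, x3, x4):
    """Arbitrary illustrative factor over (x2, x3, x4)."""
    return 1.0 + x2 + 2 * x3 * x4

def marginal_x1_bruteforce():
    """Unnormalized marginal of x1: sum g over x2, x3, x4 directly."""
    m = {v: 0.0 for v in VALUES}
    for x1, x2, x3, x4 in product(VALUES, repeat=4):
        m[x1] += f_a(x1, x2) * f_b(x2, x3, x4)
    return m

def marginal_x1_sum_product():
    """Same marginal, computed with two local messages."""
    # Message f_B -> x2: sum out x3 and x4.
    msg_b_to_x2 = {
        x2: sum(f_b(x2, x3, x4) for x3, x4 in product(VALUES, repeat=2))
        for x2 in VALUES
    }
    # Message f_A -> x1: sum out x2, weighted by the incoming message.
    return {
        x1: sum(f_a(x1, x2) * msg_b_to_x2[x2] for x2 in VALUES)
        for x1 in VALUES
    }

brute = marginal_x1_bruteforce()
fast = marginal_x1_sum_product()
```

Because this factor graph is a tree, the two results coincide exactly; on cyclic graphs the iterative variant only yields approximations.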
14
Deriving PDMS Factor-Graphs
15
PDMS Factor-Graphs
Cyclic graph
– Junction tree? Clustering / stretching of variables? Not applicable (decentralization)
– Iterative sum-product: approximate results
How to perform iterative sum-product by message passing on the mapping graph?
– Message passing in the factor graph does not correspond to the connectivity of the mapping graph
– We want to rely on decentralized computations only
Locality vs. globality of nodes in the factor graph
– Mappings: local
– Feedback factor: common, global knowledge
– Observed feedback variables: neighborhood
16
Embedded Message-Passing (1)
17
Embedded Message-Passing (2)
18
Sending Messages in the Mapping Graph
Message-passing schedules
– Periodic
– Lazy (piggybacking on query forwarding): no message overhead
19
Implemented System
Schemas
– Import from OWL (Web Ontology Language)
Mappings
– KnowledgeWeb Ontology Alignment API
– Import from RDF/XML
– Automated on-the-fly creation
– Comparison to standard alignments
Automatic derivation of quality measures P(m=correct | {F}) for the mappings using iterative message-passing
Per-hop forwarding behaviors (Semantic Gossiping)
20
Some (Preliminary) Results: Convergence
(undirected example graph, prior 0.7, delta 0.1)
21
Impact of Cycle Length
(simple cycle, prior 0.5)
22
Fault-tolerance (faulty links)
(undirected example graph, prior 0.8, delta 0.1)
23
Preliminary Results: EON (Alignment contest)
Worst-case scenario: no prior knowledge
Set of 6 schemas on bibliographic data (approx. 30-40 attributes)
396 generated attribute mappings (84 incorrect)
24
2.3. Semantic Gossiping
Selectively reformulate queries through mapping links
– Semantic distances
   Cycles analysis
   Results analysis
– Syntactic distance
   Lost predicates
[Figure: the query π Title Author=Joe (R2) is reformulated along mapping links into π Titre Auteur=Joe (R1), π Title Creator=Joe (R3), π Title Creator=Joe (R4) and π Title Creature=Joe (R5), plus a partially extracted expression ending in Author=Joe (R4); two reformulations are crossed out (X)]
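The "lost predicates" idea behind the syntactic distance can be sketched as follows. This is a simplification under stated assumptions: queries are reduced to lists of attribute names, a mapping is a dictionary from source to target attribute names, and the distance is taken to be the plain fraction of attributes without a counterpart. The actual measure used in Semantic Gossiping may weight attributes differently; `to_r5` is a hypothetical mapping that drops Author.

```python
def reformulate(query_attrs, mapping):
    """Translate attribute names through a mapping.

    Returns (translated attributes, attributes lost in translation).
    """
    translated = [mapping[a] for a in query_attrs if a in mapping]
    lost = [a for a in query_attrs if a not in mapping]
    return translated, lost

def lost_fraction(query_attrs, mapping):
    """Fraction of query attributes that have no image under the mapping."""
    _, lost = reformulate(query_attrs, mapping)
    return len(lost) / len(query_attrs)

query = ["Title", "Author"]
to_r1 = {"Title": "Titre", "Author": "Auteur"}  # complete mapping
to_r5 = {"Title": "Title"}                      # hypothetical: Author is lost

full = reformulate(query, to_r1)
partial = lost_fraction(query, to_r5)
```

A peer could compare such a value against a threshold before forwarding the reformulated query any further.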
25
Self-Organization
Two types of self-organization
– Static network: self-organizing dissemination of queries
– Dynamic network: self-organizing network of mappings
Idea:
– Quality evaluation of mappings through Semantic Gossiping
– Drop low-quality links
– Reorganized network leads to different quality evaluation
– Dynamic network changes: self-organizing, self-referential semantic network
26
Some Results (1)
Sensitivity to TTL (cycle analysis only, 25 schemas, 4 concepts)
27
Some Results (2)
Scalability (results analysis only, 4 concepts, TTL=3, misclassification rate=0.1, 2 documents/peer on avg.)
28
2.4. Semantic Interoperability in the Large
Do we have enough (good) mappings?
Modeling semantic interoperability: the semantic connectivity graph
– Idea: as for physical network analyses, define a connectivity layer
– Unweighted, non-redundant version of the schema-to-schema graph
– Observation: peers in a set $P_s$ are semantically interoperable iff $S_s$ is strongly connected, with $S_s \equiv \{ s \mid \exists p \in P_s,\ p \leftrightarrow s \}$
Schema-to-schema graph
– Logical model
– Directed
– Weighted
– Redundant
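The strong-connectivity test from the observation above can be sketched with two breadth-first traversals, one on the mapping edges and one on the reversed edges. The representation (a set of schema names and a list of directed mapping pairs) and the function names are assumptions made for this sketch.

```python
from collections import deque

def reachable(start, edges):
    """Set of nodes reachable from start in an adjacency-set graph."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def is_strongly_connected(schemas, mappings):
    """True iff every schema can reach every other one via directed mappings."""
    schemas = set(schemas)
    if not schemas:
        return True
    forward, backward = {}, {}
    for src, dst in mappings:
        if src in schemas and dst in schemas:
            forward.setdefault(src, set()).add(dst)
            backward.setdefault(dst, set()).add(src)
    root = next(iter(schemas))
    # Strongly connected iff the root reaches all nodes and all nodes reach the root.
    return (reachable(root, forward) == schemas
            and reachable(root, backward) == schemas)

cycle = is_strongly_connected({"a", "b", "c"},
                              [("a", "b"), ("b", "c"), ("c", "a")])
chain = is_strongly_connected({"a", "b", "c"}, [("a", "b"), ("b", "c")])
```

In the chain example, queries posed against schema c can never be reformulated for a or b, so the three schemas are not interoperable in the sense of the slide.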
29
Analyzing Semantic Interoperability in the Large
Analyzing semantic interoperability in large-scale, decentralized networks
– Percolation theory for directed graphs
– Based on recent graph-theoretic frameworks
– Random graphs with specific degree distributions $p_{jk}$, clustering coefficients $cc$ and bidirectionality coefficient $bc$
Necessary condition for semantic interoperability in the large:
$$\sum_{j,k} \left( jk - j(bc + cc) - k \right) p_{jk} \geq 0$$
Excellent approximations of the size of semantically interoperable clusters in the graph
Analysis: Sequence Retrieval System
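The necessary condition is cheap to evaluate once the degree distribution is known. The sketch below assumes that $p_{jk}$ is given as a dictionary mapping (j, k) degree pairs to probabilities, with j read as in-degree and k as out-degree (the usual convention for directed random graphs, not stated on the slide); the function name and the sample distributions are illustrative.

```python
def connectivity_indicator(p_jk, bc, cc):
    """Left-hand side of the condition: sum over (j, k) of (jk - j(bc + cc) - k) * p_jk.

    p_jk: dict mapping (j, k) degree pairs to their probability.
    bc, cc: bidirectionality and clustering coefficients.
    """
    return sum((j * k - j * (bc + cc) - k) * p for (j, k), p in p_jk.items())

# Every schema has two incoming and two outgoing mappings, no clustering
# and no bidirectional links (illustrative values).
dense = connectivity_indicator({(2, 2): 1.0}, bc=0.0, cc=0.0)

# Every schema has one incoming and one outgoing mapping, with strongly
# bidirectional and clustered links (illustrative values).
sparse = connectivity_indicator({(1, 1): 1.0}, bc=0.5, cc=0.5)
```

A non-negative value means the necessary condition holds; a negative value rules out semantic interoperability in the large for that network.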
30
3. Applications
3.1 GridVine: self-organizing semantic overlay network
3.2 PicShark: self-organizing middleware to export pictures and create mappings
31
3.1 GridVine
Building large-scale semantic systems
– Self-organizing semantic overlay network
32
Semantic Mediation Layer
[Figure: three stacked layers, “Physical” layer, Overlay Layer and Semantic Mediation Layer; the two transitions between layers are each labeled Correlated / Uncorrelated]
33
Features
Based on the P-Grid P2P structure
– Distributed Hash Table developed at EPFL
– Self-organized, scalable, decentralized
– Resolves key-based searches in O(log(n)), even for unbalanced trees
Semantic Web compliant
– RDF triples, RDFS schemas, OWL mappings
Structured searches
– RDQL queries
Semantic Gossiping
– Fosters semantic interoperability
34
GridVine: Annotating Content
35
Decentralized Query Resolution: Overview
36
3.2 PicShark
Where do the translation links come from?
Middleware for sharing semi-structured metadata attached to pictures and creating translation links
[Figure: PicShark architecture, with components labeled PSP, XMP, WinFS, Metadata Extractor, Features Extractor (60 moments), Information Tracker, and Insert / Retrieve operations on a (Distributed) Hashtable (e.g., GridVine)]
37
Features
Self-organization of mappings
– Based on low-level features extracted from
   Picture (color moment, textures)
   Structured metadata (lexicographical analysis)
Self-organization of annotations
– Probabilistic propagation of annotations between similar individuals
Self-organization of query propagation
– Schema distance based on probabilistic subsumption
– Propagation within a certain diameter
Driven by user interaction
Scalable
– Computationally expensive operations are local at the peers
– Only simple in-network operations (look-ups)
(Ongoing) collaborative effort with Microsoft Research Asia
38
PicShark Prototype
39
4. Conclusions
Fundamental issue: interoperability in large scale (semi-)structured environments
– Content sharing
– Information search
– Semantic Web?
Traditional techniques are not sufficient
– Scale
– Autonomy
– Uncertainty
Self-organizing, decentralized stochastic processes
– Data indexation
– Data integration
– Query dissemination
40
Some References (1)
Semantic Gossiping
– A Framework for Semantic Gossiping. Karl Aberer, Philippe Cudré-Mauroux, Manfred Hauswirth. SIGMOD Record, 31(4), December 2002.
– The Chatty Web: Emergent Semantics through Gossiping. Karl Aberer, Philippe Cudré-Mauroux, Manfred Hauswirth. International World Wide Web Conference (WWW 03).
– Probabilistic Message-Passing in Peer-Data Management Systems. Philippe Cudré-Mauroux, Karl Aberer, and Andras Feher. International Conference on Data Engineering (ICDE 06).
Self-Organizing Semantics
– Start making sense: The Chatty Web approach for global semantic agreements. Karl Aberer, Philippe Cudré-Mauroux, Manfred Hauswirth. Journal of Web Semantics, 1(1), December 2003.
– Emergent Semantics Principles and Issues. Karl Aberer, Philippe Cudré-Mauroux and Aris M. Ouksel (editors), Tiziana Catarci, Mohand-Said Hacid, Arantza Illarramendi, Vipul Kashyap, Massimo Mecella, Eduardo Mena, Erich J. Neuhold, Olga De Troyer, Thomas Risse, Monica Scannapieco, Fèlix Saltor, Luca de Santis, Stefano Spaccapietra, Steffen Staab and Rudi Studer. International Conference on Database Systems for Advanced Applications (DASFAA 04).
41
Some References (2)
Semantic Interoperability in the Large
– A Necessary Condition For Semantic Interoperability In The Large. Philippe Cudré-Mauroux and Karl Aberer. International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE 04).
– Analyzing Semantic Interoperability in Bioinformatic Database Networks. Philippe Cudré-Mauroux, Julien Gaugaz, Adriana Budura and Karl Aberer. Semantic Network Analysis (SNA 05).
– GridVine: Building Internet-Scale Semantic Overlay Networks. Karl Aberer, Philippe Cudré-Mauroux, Manfred Hauswirth and Tim van Pelt. International Semantic Web Conference (ISWC 04).
– Semantic Overlay Networks (tutorial). Karl Aberer and Philippe Cudré-Mauroux. International Conference on Very Large Data Bases (VLDB 05).
… more references at http://lsirpeople.epfl.ch/pcudre/
42
Questions?