©2004, Philippe Cudré-Mauroux Semantic Interoperability for Global Information Systems Microsoft Research Asia 08.20.04 Philippe Cudré-Mauroux Distributed.

Presentation transcript:

©2004, Philippe Cudré-Mauroux Semantic Interoperability for Global Information Systems Microsoft Research Asia Philippe Cudré-Mauroux Distributed Information Systems Laboratory (LSIR) Swiss Federal Institute of Technology, Lausanne (EPFL)

Outline
I. Classical Information Integration (overview)
  – Global Schema
  – Multidatabase Language Approach
  – Federated Databases
II. Information Integration in the Large
  – Context: The Semantic Web
  – Shared ontologies
  – The Chatty Web
III. State of the Art in Ontology Alignment (overview)
IV. Semantic Integration in a Large-Scale Image Sharing Scenario

I. Classical Information Integration
Goal: providing uniform access to multiple heterogeneous information sources
More than data exchange (e.g., ASCII, EDI, XML)
An old, difficult problem with well-known (partial) solutions

Global Schema Integration
Merge multiple databases into one global database
Performed by a human expert
Time-consuming and error-prone
Local autonomy lost
Static solution
Figure (schemas, sources S1 and S2): Book(ISBN, Title, Price, Author); Author(Name, ISBN); Livre(ISBN, Prix, Titre); Auteur(Prenom, Nom, ISBN); Book(ISBN, Title); Author(Name, ISBN)

Multidatabase Language Approach
No attempt at integrating schemas
Language (e.g., MSQL) used to integrate information sources at run-time
Simple example:
    Use S1, S2
    Select Titre
    From S1.Book, S2.Livre
    Where S1.Book.ISBN = S2.Livre.ISBN
Not transparent
Heavy burden on (expert) users
Global queries subject to local changes

Federated Databases
Idea: each information source exports a schema specifying shared relations
Tight coupling:
  – Global schema integration on all export schemas (cf. global schema integration)
Loose coupling:
  – Dynamic add / drop, e.g., by creating views (logical relations)

GAV (Global as View)
Global (mediated) schema is expressed as a view on local schemas
Mediated schema: Book(ISBN, Title, Author) […]
Sources: S1, S2, S3
S1: Book(ISBN, Title), Author(Name, ISBN)
    Create VIEW Book As
    Select S1.Book.ISBN, Title, Name As Author
    From S1.Book, S1.Author
    Where S1.Book.ISBN = S1.Author.ISBN
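As a rough illustration of the GAV idea, the sketch below uses Python's built-in sqlite3 module. Two tables stand in for source S1's Book and Author relations, and the mediated Book relation is defined as a view over them, so a query on the mediated schema is answered by unfolding the view. The table names s1_book and s1_author, the helper functions, and the sample rows are assumptions made for this example; a real federation would span several independent databases.

```python
import sqlite3


def build_gav_demo():
    """Create an in-memory database with S1's relations and a mediated view."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Local schema of source S1 (hypothetical table names).
    cur.execute("CREATE TABLE s1_book (isbn TEXT PRIMARY KEY, title TEXT)")
    cur.execute("CREATE TABLE s1_author (name TEXT, isbn TEXT)")
    # Made-up sample rows.
    cur.executemany("INSERT INTO s1_book VALUES (?, ?)",
                    [("111", "Dune"), ("222", "Emma")])
    cur.executemany("INSERT INTO s1_author VALUES (?, ?)",
                    [("Herbert", "111"), ("Austen", "222")])
    # GAV: the mediated relation Book(ISBN, Title, Author) is a view on S1.
    cur.execute("""
        CREATE VIEW book AS
        SELECT b.isbn AS isbn, b.title AS title, a.name AS author
        FROM s1_book AS b JOIN s1_author AS a ON b.isbn = a.isbn
    """)
    conn.commit()
    return conn


def authors_of(conn, title):
    """Query the mediated schema only; the view hides the local relations."""
    rows = conn.execute(
        "SELECT author FROM book WHERE title = ? ORDER BY author", (title,)
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_gav_demo()
    print(authors_of(conn, "Dune"))
```

The user never names s1_book or s1_author, which is the transparency GAV aims for. The cost is that the view definition must be revised whenever a source changes.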

LAV (Local as View)
Local schemas are expressed as views on the global schema
Mediated schema: Book(ISBN, Title, Author) […]
Sources: S1, S2, S3
S1: Book(ISBN, Title), Author(Name, ISBN)
    Create VIEW S1.Book As
    Select ISBN, Title
    From Book

LAV / GAV (cont.)
Transparent access to heterogeneous databases in the federation
Local autonomy is (usually) preserved
Query processing through query reformulation
Requires global agreement on the mediated schema (tight semantic coupling)
Does not scale well

II. Information Integration in the Large
Goal: providing uniform access to many heterogeneous information sources
Traditional approaches are inadequate
  – Lack of adaptability
  – Lack of transparency
  – Lack of scalability
Hot research area

Some Applications
Agent communication
Web services integration
Information retrieval from heterogeneous databases
Catalog matching
P2P information sharing
Personal information delivery
Vertical information publishing

General Context: The Semantic Web
Providing machine-processable data to the Web

RDF/RDFS 2’ Overview
RDF triple: Subject, Property, Object
RDF Schemas
  – Classes of resources
  – Classes of properties
  – Constraints on the subject (domain) or object (range)
  – Subclassing
Extensible!
  – Full-fledged ontological language: OWL
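To make the triple model and the schema notions above concrete, here is a small Python sketch. It represents triples as named tuples, follows rdfs:subClassOf links, and flags statements whose subject does not carry the class declared as the property's domain. Treating the domain as a check follows the slide's wording ("constraints"); under the actual RDFS semantics a domain declaration is used to infer the subject's type instead. The ex: identifiers and function names are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    prop: str
    obj: str


RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"


def superclasses(cls, triples):
    """Return cls plus every class reachable through rdfs:subClassOf."""
    seen = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for t in triples:
            if t.prop == SUBCLASS_OF and t.subject == current and t.obj not in seen:
                seen.add(t.obj)
                frontier.append(t.obj)
    return seen


def types_of(resource, triples):
    """Classes of a resource, including inherited ones."""
    result = set()
    for t in triples:
        if t.prop == RDF_TYPE and t.subject == resource:
            result |= superclasses(t.obj, triples)
    return result


def domain_violations(triples):
    """Triples whose subject lacks the class declared as the property's domain."""
    domains = {t.subject: t.obj for t in triples if t.prop == DOMAIN}
    return [
        t for t in triples
        if t.prop in domains and domains[t.prop] not in types_of(t.subject, triples)
    ]


# Hypothetical sample data.
EXAMPLE = [
    Triple("ex:Novel", SUBCLASS_OF, "ex:Book"),
    Triple("ex:author", DOMAIN, "ex:Book"),
    Triple("ex:dune", RDF_TYPE, "ex:Novel"),
    Triple("ex:dune", "ex:author", "Herbert"),
    Triple("ex:cat", "ex:author", "Nobody"),
]

if __name__ == "__main__":
    print(domain_violations(EXAMPLE))
```

Here ex:dune passes because ex:Novel is a subclass of ex:Book, while the untyped ex:cat is reported.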

Example: CreativeCommons
[RDF/XML metadata record (rdf:RDF element with default, dc and rdf namespaces); element markup and namespace URIs not preserved. Literal values: "Compilers in the Key of C"; "A lovely classical work on compiling code."; "Yo-Yo Dyne"; "Gnomophone"; "1842"; "audio/mpeg"]

Semantic Interoperability in the Semantic Web
Common ontologies provide for shared context
  – Requires global agreement!
      Intractable standardization effort!
      Back to stage 1…
Two plausible solutions:
  – Agreed-upon corpuses of basic concepts
      IEEE SUMO
      Stanford TAP
      …
  – Local federation of ontologies fostering global interoperability
      EPFL Chatty Web
      U. Washington Piazza
      …
Complementary approaches

The Chatty Web
Local translations enabling global agreements
Analyzing transitive closures of local mappings
Figure (peers and attributes): a lab in Trondheim (species); EMBLChange site at Cambridge; Swissprot site at Geneva; a lab at MIT (organism); query posted at EPFL (organism). EMBLChange peers: species, …; SwissProt peers: authors, titles, organism, …; other peers: authors, …. Mappings shown: organism → authors; organism → species; species → organism.
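One simple reading of "analyzing transitive closures of local mappings" is to push an attribute around a cycle of peers and see whether it comes back unchanged. The Python sketch below does only that. It is not the probabilistic analysis of the Chatty Web papers, and the peer names, mapping tables and function names are hypothetical.

```python
def compose_along(path, mappings, attribute):
    """Follow `attribute` through the mappings between consecutive peers in `path`.

    `mappings` maps a (source_peer, target_peer) pair to a dict that translates
    attribute names. Returns the final attribute, or None if a mapping drops it.
    """
    current = attribute
    for src, dst in zip(path, path[1:]):
        current = mappings.get((src, dst), {}).get(current)
        if current is None:
            return None
    return current


def cycle_feedback(cycle, mappings, attribute):
    """Classify what happens to `attribute` after one trip around `cycle`."""
    result = compose_along(cycle, mappings, attribute)
    if result is None:
        return "lost"
    return "consistent" if result == attribute else "inconsistent"


# Hypothetical pairwise mappings between three peers.
GOOD = {
    ("EPFL", "EMBL"): {"organism": "species"},
    ("EMBL", "SwissProt"): {"species": "organism"},
    ("SwissProt", "EPFL"): {"organism": "organism"},
}
# Same network, but one mapping wrongly sends organism to authors.
BAD = dict(GOOD)
BAD[("SwissProt", "EPFL")] = {"organism": "authors"}

CYCLE = ["EPFL", "EMBL", "SwissProt", "EPFL"]

if __name__ == "__main__":
    print(cycle_feedback(CYCLE, GOOD, "organism"))
    print(cycle_feedback(CYCLE, BAD, "organism"))
```

A cycle that returns a different attribute signals that at least one mapping on it is suspect, without any peer needing a global schema.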

On Translation Links (ontology mappings)

III. State of the Art on Ontology Alignment
Problem: given two ontologies, each describing a set of discrete entities, find the relationships holding between the entities
Alignments can then be used to foster interoperability locally
Difficult problem (fully automatic solutions?)
Active area of research

Local Ontology Alignment Techniques
1. Terminological methods
  – String-based
  – Language-based
      Intrinsic
      Extrinsic
      Multilingual
2. Structural methods
  – Internal
  – External
3. Others
  – Extensional
  – Semantic
  – User feedback

1. Terminological Methods
String-based: compare labels of entities
  – (Sub-)string equality
  – Edit distances
  – Token-based distances (e.g., TF/IDF on substrings)
Language-based
  – Intrinsic
      Terminological matching with morphological / syntactic analysis (allomorphies)
  – Extrinsic
      Use of external resources (e.g., WordNet synsets)
  – Multilingual methods
      Match terms in different languages
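As an illustration of the string-based family, the following Python sketch computes the Levenshtein edit distance between two labels and turns it into a similarity score in [0, 1]. Lower-casing the labels and normalizing by the longer label's length are choices made for this example, not something the slides prescribe.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def label_similarity(a, b):
    """Case-insensitive similarity in [0, 1]; 1.0 means identical labels."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


if __name__ == "__main__":
    for pair in [("Title", "Titre"), ("Author", "Auteur"), ("Price", "Prix")]:
        print(pair, round(label_similarity(*pair), 2))
```

Labels such as Title / Titre score high, whereas Price / Prix already shows the limits of purely string-based matching across languages.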

2. Structural Methods
Internal (constraint-based):
  – Data-based domain comparison
  – Multiplicities / properties comparison
  – Similarity between collections
External
  – Mereologic structures
  – Taxonomic structures
  – Relations between similar entities

3. Other
Extensional methods
  – Extension ≡ set of instances of a class
  – Jaccard similarity: |A ∩ B| / |A ∪ B| for extensions A and B
  – Similarity-based extension comparison
Semantic methods
  – Based on model-theoretic semantics
  – SAT problem (e.g., subsumption)
User feedback
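The extensional comparison can be sketched in a few lines of Python: two classes are compared through the sets of instances they contain, using the standard Jaccard coefficient (size of the intersection over size of the union). The instance identifiers are made up, and returning 0.0 for two empty extensions is a convention chosen for this sketch.

```python
def jaccard(extension_a, extension_b):
    """Jaccard similarity between two class extensions (collections of instance ids)."""
    a, b = set(extension_a), set(extension_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here: no evidence means no similarity
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Hypothetical ISBNs classified under "Book" by one peer and "Livre" by another.
    book = {"111", "222", "333"}
    livre = {"222", "333", "444"}
    print(jaccard(book, livre))
```

The measure only works when the two sources share identifiable instances, which is why extensional methods are usually combined with the other techniques.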

A Handful of Systems
APrompt (Stanford) [T,I,S,U]
Cupid (Microsoft Research) [T,I,S]
Bibster (U. Karlsruhe) [T,I,S]
Glue (U. Washington) [E]
S-Match (U. Trento) [T,S,M]
…
⇒ Typically: a mix of techniques
[Terminological, Internal structure, external Structure, Extensional, seMantic, User]

IV. Semantic Integration in a Large-Scale Image Sharing Scenario
Problem: retrieve a specific image from a large collection of shared images
So far: most applications mix content-based (CB) and text analysis
  – CB image analysis provides a low-level, objective representation of an image
      Good for comparing image features
      Not so good w.r.t. end-users' needs expressed in natural language (N.L.)
  – Surrounding text / filenames might sometimes be a high-level, subjective view of the image
      Incomplete, out-of-context description
      Good w.r.t. N.L. (cf. Google Images)

Potential Opportunity
Emerging applications make use of high-level, local and semi-contextualized image metadata
  – Structured metadata (Photoshop Album, XML)
  – Ontological metadata (RDF, Adobe XMP)
  – Type-based metadata (Microsoft WinFS)
Paradigm shift from the old metadata standards (e.g., keywords, EXIF)
  – Extensible formats
Personal conjecture:
  – Metadata will be prominent in a few years
  – Huge opportunity for image retrieval

Structured Metadata
Ex.: Photoshop Album
Hierarchy of tags
Stored in a relational, proprietary, local database
Non-exportable

Ontological Metadata (1)
Ex.: Extensible Metadata Platform (XMP)
Subset of RDF/S
Metadata might be embedded into the file
Supported by a wide range of Adobe applications
  – Adobe® Acrobat®
  – Adobe FrameMaker®
  – Adobe GoLive®
  – Adobe Illustrator®
  – Adobe InCopy®
  – Adobe InDesign®
  – Adobe LiveMotion™
  – Adobe Photoshop®
  – Adobe Document Server
  – Adobe Graphics Server
  – Version Cue™

Ontological Metadata (2)
Ex.: Photoshop XMP schema

Type-Based Metadata (1)
New file system for Longhorn (NTFS +++)
No more hierarchies (i.e., folders) but metadata
Items – Attributes – Relationships – Schemas – Sub-Schemas (extensions)
  – Déjà vu?

Type-Based Metadata (2)
Ex.: image schema in WinFS

Observations
So far, all applications using these metadata are local
  – It is a typical semantic interoperability problem!
Efficient, distributed WinFS is not for tomorrow…
Image metadata will always be incomplete and subjective
#images >> #peers >> #schemas
All these formats can be formally described by a subset of Description Logics
  – Use them all and in the darkness bind them!

Outline of my Project
Objective: large-scale image retrieval framework taking advantage of metadata
Outline
  – Import images
  – Import metadata
  – Extract low-level features (thanks to Lei :-)
  – Store everything in a common, scalable representation
  – Export data in a shared repository
      SQL server
      P2P network (SP2 ?)
  – Infer metadata / schema mappings locally
  – Cross-validate mappings
  – Cluster peers / images vis-à-vis their subjective views

Specificities
Different metadata models
Incompleteness of metadata (e.g., WinFS dangling links)
Metadata sparseness
Few (but widely used) core classes
Many custom extensions
Many resources
Low-level representation of the resources
Embedded user feedback
⇒ Unique application

Finding Mapping Candidates (sketch)
[T,I,U]
U-Inference based on mutual information (scalable!)
Figure labels: schema, metadata; low-level features; feedback
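The slide does not say how mutual information is applied, so the following Python sketch is only a generic illustration. It assumes that two peers have annotated the same images and estimates, from observed frequencies, the mutual information between the values of one attribute at each peer. A high value suggests the two attributes carry the same information and are a mapping candidate. The attribute values and function name are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two aligned value sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must describe the same instances")
    n = len(xs)
    if n == 0:
        return 0.0
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), with counts substituted in
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


if __name__ == "__main__":
    # Values of "place" at peer A and two candidate attributes at peer B,
    # recorded for the same four (made-up) images.
    place_a = ["beach", "beach", "city", "city"]
    lieu_b = ["plage", "plage", "ville", "ville"]     # tracks place_a exactly
    rating_b = ["good", "bad", "good", "bad"]         # unrelated to place_a
    print(mutual_information(place_a, lieu_b))
    print(mutual_information(place_a, rating_b))
```

Because the estimate only needs co-occurrence counts, it can be computed locally between pairs of peers, which fits the slide's remark about scalability.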

Cross-Validating Mappings (sketch)
[S,M]
Cross-validation based on graph partitioning, semantic gossiping or SAT techniques
Ref.: Jiying Wang, Ji-Rong Wen, Frederick H. Lochovsky, Wei-Ying Ma. Instance-based Schema Matching for Web Databases by Domain-specific Query Probing. VLDB 2004.

Conclusions
Leveraging local metadata produced by end-users
  – Complex problem!
      Good heuristics could take years to be developed…
Local communications / computations
  – Scalability
Hopefully, better results than keywords / low-level analyses even for simple heuristics
  – Take advantage of context
      Images given local semantics
Analyze the dynamics of the overall system
Objectivity vs. subjectivity of interpretation becomes a measurable quality

References
RDF/S:
XMP:
WinFS:
Chatty Web:
Ontology Alignment: knowledgeweb D
  – (send me an e-mail to f-pcudre)