09/12/2003 Peer-to-Peer Information Systems – WS 03/04 1 Piazza: Data Management Infrastructure for Semantic Web Applications Alon Y. Halevy, Zachary G.

Presentation transcript:

09/12/2003 Peer-to-Peer Information Systems – WS 03/04
Piazza: Data Management Infrastructure for Semantic Web Applications
Alon Y. Halevy, Zachary G. Ives, Peter Mork, Igor Tatarinov
Speaker: Sergey Chernov
Tutor: Jens Graupmann

Outline
1. INTRODUCTION. SEMANTIC WEB
2. PIAZZA: SYSTEM OVERVIEW
3. IMPLEMENTATION DETAILS
   3.1 MAPPING LANGUAGE
   3.2 QUERY ANSWERING ALGORITHM
4. CONCLUSIONS

Introduction
► Goal:
  - Data Integration and Knowledge Management
► Problem:
  - Web data lacks machine-understandable semantics
► Solution:
  - Semantic Web?

The Semantic Web *
► Web sites include structural annotations
  - You can pose meaningful queries on them.
  - Ontologies provide the semantic glue.
  - Internal implementation of web sites left open.
► Agents perform tasks:
  - Query one or more web sites
  - Perform updates (e.g., set schedules)
  - Coordinate actions
  - Trust each other (or not).
► I.e., agents operating on a gigantic heterogeneous distributed database.
(* View by A. Halevy)

General requirements
► Robust infrastructure for querying
  - Peer data management systems.
► Facilitate mapping between different structures. Need tools for:
  - Locating relevant structures
  - Easily joining the semantic web.
► Get data into structured form
  - Should we worry about the legacy web?

Using views for specifying mappings
► Local-As-View (LAV). Data sources can be described as views over the mediated schema.
► Global-As-View (GAV). Mediated schema can be described as a set of views over the data sources.
[Figure: two diagrams, each showing a Mediated Schema connected to Site A, Site B and Site C]
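The difference between the two styles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mediated relation `Paper(title, year)`, the two sites, their rows, and the year ranges used as LAV view descriptions are all hypothetical and do not come from the slides.

```python
# Hypothetical source contents: (title, year) rows held by two sites.
SITE_A = [("Mediators", 1999), ("Wrappers", 2001)]
SITE_B = [("PDMS", 2003)]
SOURCES = {"site_a": SITE_A, "site_b": SITE_B}


def paper_gav():
    """GAV: the mediated relation Paper(title, year) is defined as a
    view (here a plain union) over the data sources."""
    return SITE_A + SITE_B


# LAV: each source is described as a view over the mediated relation.
# Assumed description: "this site holds papers with lo <= year <= hi".
LAV_VIEWS = {"site_a": (1990, 2001), "site_b": (2002, 2010)}


def answer_lav(lo, hi):
    """Answer the query 'Paper with lo <= year <= hi' by consulting only
    the sources whose view description can overlap the query range."""
    rows = []
    for name, (view_lo, view_hi) in LAV_VIEWS.items():
        if view_lo <= hi and lo <= view_hi:  # ranges overlap
            rows += [r for r in SOURCES[name] if lo <= r[1] <= hi]
    return rows


print(paper_gav())
print(answer_lav(2002, 2005))
```

With GAV, a query over `Paper` is answered by unfolding the view definition; with LAV, the system has to reason about the source descriptions to decide which sources are relevant, which the overlap test stands in for here.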

Mapping
► Mapping AB specifies how structured data in the schema of node A is represented in the schema of node B
[Figure: Mediated Schema (MS) and Sites A, B and C, connected by mappings "AB", "BA", "BC", "CB", "A-MS", "MS-A", "C-MS" and "MS-C"]

Piazza: Peer Data-Management System
► Goal:
  - Large-scale autonomous sharing of structured data
► Peer data management system (PDMS)
  - Autonomous peers export data in their own schemas
  - Pair-wise mappings between peers
  - Generalization of a data integration system
  - NOT a P2P file sharing system

Relationship of PDMS to…
► P2P overlay networks (the “Structured World”)
► Data integration systems (no central logical mediated schema)
► Federated databases (scale, ad-hoc nature)
► Distributed databases (no central administration)

Representing Data
► A spectrum of possibilities:
  - Relational tables, some integrity constraints
  - XML: can encode relational, hierarchical
    ► XQuery: emerging standard query language (SQL for XML)
  - RDF: “XML on drugs”.
    ► Sees only the logic; ignores other aspects.
  - DAML+OIL
    ► Full-blown knowledge representation language.
► They all have semantics; just different expressive powers.
► We keep the data simple. Mappings between data at different peers are more complex.

Peer Data Management
► Mappings are query expressions
  - DbResearcher(x)  Researcher(x), Area(x, DB)
  - DbResearcher(x), Office(x, DBLab) = DbLabMember(x)
[Figure: peers DB Projects, MIT, UW, UCB and Stanford, each exporting its own schema. Relations shown:]
  Area(areaID, name, descr), Project(projID, name, sponsor), ProjArea(projID, areaID), Pubs(pubID, projName, title, venue, year), Author(pubID, author), Member(projName, member)
  Project(projID, name, descr), Student(studID, name, status), Faculty(facID, name, rank, office), Advisor(facID, studID), ProjMember(projID, memberID), Paper(papID, title, forum, year), Author(authorID, paperID)
  Area(areaID, name, descr), Project(projID, areaID, name), Pub(pubID, title, venue, year), PubAuthor(pubID, authorID), PubProj(pubID, projID), Member(memID, projID, name, pos), Alumn(name, year, thesis)
  Members(memID, name), Projects(projID, name, startDate), ProjFaculty(projID, facID), ProjStudents(projID, studID), …
  Direction(dirID, name), Project(pID, dirID, name), …
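To make "mappings are query expressions" concrete, the two mappings on this slide can be evaluated over a toy instance. The sketch below is hedged: the tuples are invented, and the first mapping is read as "x is a DbResearcher whenever Researcher(x) and Area(x, DB) hold", which is an assumption because the slide text does not show the relation symbol between the two sides.

```python
# Hypothetical toy instance. Relation names come from the slide,
# the tuples themselves are made up for illustration.
researcher = {"ann", "bob", "eve"}
area = {("ann", "DB"), ("bob", "AI"), ("eve", "DB")}
office = {("ann", "DBLab"), ("eve", "Room12")}

# First mapping, under the assumed reading:
# Researcher(x), Area(x, DB) yields DbResearcher(x).
db_researcher = {x for x in researcher if (x, "DB") in area}

# Second mapping (an equality on the slide):
# DbResearcher(x), Office(x, DBLab) = DbLabMember(x)
db_lab_member = {x for x in db_researcher if (x, "DBLab") in office}

print(sorted(db_researcher))
print(sorted(db_lab_member))
```

Each mapping is just a conjunctive query over one peer's relations whose result is related to a relation at another peer.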

Piazza mapping language (1)
► XML/XML Example
Target:
  pubs
    book*
      title
      author*
        name
      publisher*
        name
Source:
  authors
    author*
      full-name
      publication*
        title
        pub-type
Mapping:
  {: $a IN document("source.xml")/authors/author,
     $t IN $a/publication/title,
     $typ IN $a/publication/pub-type
     WHERE $typ = "book" :}
  { $t }
  {: $a/full-name :}
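The effect of this mapping can be imitated in Python with `xml.etree.ElementTree`. This is a rough sketch, not the Piazza implementation: the sample source document is invented, the output elements (`pubs`, `book`, `title`, `author`, `name`) are assumed from the Target outline, and the code assumes that a title and a pub-type are meant to come from the same publication.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document following the Source outline.
SOURCE = """
<authors>
  <author>
    <full-name>Ann Author</full-name>
    <publication><title>XML Basics</title><pub-type>book</pub-type></publication>
    <publication><title>A Short Note</title><pub-type>article</pub-type></publication>
  </author>
</authors>
"""


def apply_mapping(source_xml):
    """Build a target <pubs> tree from a source <authors> document."""
    src = ET.fromstring(source_xml.strip())
    pubs = ET.Element("pubs")
    for a in src.findall("author"):              # $a IN /authors/author
        for pub in a.findall("publication"):
            t = pub.findtext("title")            # $t
            typ = pub.findtext("pub-type")       # $typ
            if typ != "book":                    # WHERE $typ = "book"
                continue
            book = ET.SubElement(pubs, "book")
            ET.SubElement(book, "title").text = t
            author = ET.SubElement(book, "author")
            ET.SubElement(author, "name").text = a.findtext("full-name")
    return pubs


result = apply_mapping(SOURCE)
print(ET.tostring(result, encoding="unicode"))
```

The source is organised by author and the target by book, so the mapping inverts the nesting while filtering on the publication type.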

Piazza mapping language (2)
► piazza:id attribute
Target:
  pubs
    book*
      title
      author*
        name
      publisher*
        name
Source:
  authors
    author*
      full-name
      publication*
        title
        pub-type
Mapping:
  {: $a IN document("source.xml")/authors/author,
     $t IN $a/publication/title,
     $typ IN $a/publication/pub-type
     WHERE $typ = "book" :}
  { $t }
  {: $a/full-name :}

Piazza mapping language (3)
► Partial mapping
Target:
  pubs
    book*
      title
      author*
        name
      publisher*
        name
Source:
  authors
    author*
      full-name
      publication*
        title
        pub-type
Mapping:
  {: $a IN document("source.xml")/authors/author,
     $t IN $a/publication/title,
     $typ IN $a/publication/pub-type
     WHERE $typ = "book"
     PROPERTY $t >= 'A' AND $t < 'B' :}
  [: {: PROPERTY $this IN {"PrintersInc", "PubsInc"} :} :]

Query Answering Algorithm
► Problem
  - Evaluate query Q at P1 given a network of mappings
► Reformulate the query over all relevant peers
  - Chaining of mappings using a combination of query composition and query rewriting
► Q_P1(x) :- DbResearcher(x)
  - Query Composition
► M: DbResearcher(x)  Researcher(x), Area(x, DB)
  - Q_P2(x)  Researcher(x), Area(x, DB)
  - Query Rewriting
► M: DbResearcher(x), Office(x, DBLab) = DbLabMember(x)
  - Q_P3(x)  DbLabMember(x)
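The chaining idea can be sketched as a recursive expansion over a rule table. This is a deliberately simplified illustration, not the algorithm from the paper: composition and rewriting are flattened into one hypothetical table `RULES`, variable names are assumed to line up so no substitution is needed, there is no containment checking, and the table must be acyclic for the recursion to terminate.

```python
from itertools import product

# Hypothetical rule table: an atom over the key predicate may be replaced
# by any of the listed bodies, each expressed over another peer's schema.
RULES = {
    "DbResearcher": [
        [("Researcher", ("x",)), ("Area", ("x", "DB"))],  # via peer P2
        [("DbLabMember", ("x",))],                        # via peer P3
    ],
}


def reformulate(query):
    """Return every rewriting of `query` (a list of atoms) in which all
    atoms are over predicates that have no further rule."""
    alternatives = []
    for pred, args in query:
        if pred in RULES:
            expanded = []
            for body in RULES[pred]:
                expanded.extend(reformulate(body))  # follow the chain
            alternatives.append(expanded)
        else:
            alternatives.append([[(pred, args)]])   # stored relation
    # Combine one alternative per query atom into full reformulations.
    return [[atom for part in combo for atom in part]
            for combo in product(*alternatives)]


q_p1 = [("DbResearcher", ("x",))]
for rewriting in reformulate(q_p1):
    print(rewriting)
```

For the query Q_P1 this yields one reformulation over Researcher and Area and one over DbLabMember, mirroring Q_P2 and Q_P3 on the slide. The DbLabMember reformulation can only return researchers who also sit in DBLab, so it contributes some of the answers rather than all of them.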

Query Reformulation (1)
Mapping:
  {: $people = /S1/people :}
  {: $name = $people/faculty/name/text() :}
  { $name }
  {: $student = $people/student/text() :}
  { $student }
  {: $faculty = $people/faculty,
     $name = $faculty/name/text(),
     $advisee = $faculty/advisee/text()
     where $advisee = $student :}
  { $name }
Query:
  { for $faculty in /S1/people/faculty,
        $name in $faculty/name/text(),
        $advisee in $faculty/advisee/text()
    where $name = "Ullman"
    return {$advisee} }

Query Reformulation (2)
Query:
  { for $faculty in /S1/people/faculty,
        $name in $faculty/name/text(),
        $advisee in $faculty/advisee/text()
    where $name = "Ullman"
    return {$advisee} }
Query tree pattern:
  S1
    people
      faculty
        name      $name = "Ullman"
        advisee   {$advisee}
Mapping tree pattern:
  S1
    people
      faculty
        name      {$name}
      student     {$student}
      faculty
        name      {$name}
        advisee   $advisee = $student

Query Reformulation (3)
Query tree pattern:
  S1
    people
      faculty
        name      $name = "Ullman"
        advisee   {$advisee}
Mapping tree pattern:
  S1
    people
      faculty
        name      {$name}
      student     {$student}
      faculty
        name      {$name}
        advisee   $advisee = $student
Reformulated query:
  { for $student in /S2/people/student,
        $advisor in $student/advisor/text(),
        $name in $student/name/text()
    where $advisor = "Ullman"
    return { $name } }
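A small Python check shows why the two queries are interchangeable. It is a sketch under assumptions: the S1 sample data is invented, and the S2 layout (a `student` element with `name` and `advisor` children) is inferred from the paths used in the reformulated query. The `faculty` part of the mapping is left out because neither query needs it.

```python
import xml.etree.ElementTree as ET

# Hypothetical S1 instance: faculty list their advisees, students are bare names.
S1 = ET.fromstring(
    "<S1><people>"
    "<faculty><name>Ullman</name><advisee>Ann</advisee><advisee>Bob</advisee></faculty>"
    "<faculty><name>Widom</name><advisee>Eve</advisee></faculty>"
    "<student>Ann</student><student>Bob</student><student>Eve</student>"
    "</people></S1>"
)


def s1_to_s2(s1):
    """Student part of the mapping: each S1 student becomes an S2 student
    with a name and one advisor per faculty member listing them as advisee."""
    s2 = ET.Element("S2")
    people = ET.SubElement(s2, "people")
    for st in s1.findall("people/student"):
        student = ET.SubElement(people, "student")
        ET.SubElement(student, "name").text = st.text
        for fac in s1.findall("people/faculty"):
            if any(adv.text == st.text for adv in fac.findall("advisee")):
                ET.SubElement(student, "advisor").text = fac.findtext("name")
    return s2


def query_s1(s1, who):
    """Original query: advisees of the faculty member named `who`."""
    return [adv.text
            for fac in s1.findall("people/faculty")
            if fac.findtext("name") == who
            for adv in fac.findall("advisee")]


def query_s2(s2, who):
    """Reformulated query: names of students whose advisor is `who`."""
    return [st.findtext("name")
            for st in s2.findall("people/student")
            if any(a.text == who for a in st.findall("advisor"))]


print(query_s1(S1, "Ullman"))
print(query_s2(s1_to_s2(S1), "Ullman"))
```

Both calls return the same students, once from the faculty-centric S1 layout and once from the student-centric S2 layout.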

Reformulation times
► Table 1: The test queries and their respective running times.

Query  Description                                        Reformulation time  # of reformulations
Q1     XML-related projects.                              0.5 sec             12
Q2     Co-authors who reviewed each other's work.         0.9 sec             25
Q3     PC members with a paper at the same conference.    0.2 sec             3
Q4     PC chairs of recent conferences + their projects.  0.5 sec             24
Q5     Conflicts-of-interest of PC members.               0.7 sec             36

Current and the Future
► Current status
  - Demo scenario using XML
  - Looking at real domains (Bio dbs, NASA dbs)
► Future Work
  - More efficient reformulation algorithm
  - Semantic network analysis: eliminate redundant and inconsistent mappings
  - Query caching to speed up query evaluation

Conclusions
► Mapping language for mapping between sets of XML source nodes with different document structures
► Architecture that uses the transitive closure of mappings to answer queries
► Algorithm for query answering over this transitive closure of mappings, which is able to follow mappings in both forward and reverse directions
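The phrase "transitive closure of mappings, followed in both directions" can be pictured as reachability in a graph of peers. The sketch below is illustrative only: the peer names are taken from the earlier Peer Data Management slide, but which peers are actually connected is an assumption made up for the example.

```python
from collections import deque

# Hypothetical pairwise mappings (source peer, target peer).
MAPPINGS = [
    ("UW", "DB Projects"),
    ("MIT", "DB Projects"),
    ("Stanford", "UW"),
    ("UCB", "Stanford"),
]


def reachable_peers(start, mappings):
    """Return all peers whose data can contribute to a query posed at
    `start`, treating every mapping as usable forwards and backwards."""
    neighbours = {}
    for src, dst in mappings:
        neighbours.setdefault(src, set()).add(dst)  # forward direction
        neighbours.setdefault(dst, set()).add(src)  # reverse direction
    seen = {start}
    queue = deque([start])
    while queue:
        peer = queue.popleft()
        for nxt in neighbours.get(peer, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(reachable_peers("UCB", MAPPINGS)))
```

A query posed at UCB reaches MIT even though no mapping connects them directly, because the path runs through Stanford, UW and DB Projects and uses one mapping in reverse.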

Thank You!
