1
Peer Data Management, Concluded and Model Management Zachary G. Ives University of Pennsylvania CIS 650 – Database & Information Systems April 18, 2005
2
2 Administrivia
Next readings and summaries: Dong and Halevy on Personal Info Management
  2-paragraph summary of the problems they focus on and their key contributions
From Piazza to pizza … and scheduling
3
3 Today’s Trivia Question
4
4 Our Discussion
The Semantic Web (SW) as originally posed:
  RDF as “semantic” format
  Also RDFS schema format
  Ontologies as the standard way of defining concepts
  Description logics are the way most ontologies are defined (OWL language)
Piazza PDMS:
  Relations and views
  Query language as mapping language
  Transitive closure of composition of mappings
5
5 Peer Data Management: Decentralized Mediation for Ad Hoc Extensibility
[Figure: peer network with labels DB Projects, UPenn, UW, Stanford, IIT Mumbai]
Data integration: 1 mediated schema, m mappings to sources
Peer data management system (PDMS): n mediated “peer schemas,” as few as (n - 1) mappings between them (evaluated transitively), m mappings to sources
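A minimal sketch of why (n - 1) mappings can suffice: if the peer mappings form a connected chain, any peer can reach any other by following mappings transitively. The chain of mappings below is a hypothetical example (the slide does not say which peers are mapped to which), and it assumes each mapping can be followed in either direction.

```python
from collections import deque

# Peer names taken from the slide; the chain of mappings is an assumption.
peers = ["UPenn", "UW", "Stanford", "IIT Mumbai"]
mappings = [("UPenn", "UW"), ("UW", "Stanford"), ("Stanford", "IIT Mumbai")]


def mapping_path(src, dst, mappings):
    """Breadth-first search for a chain of peer mappings from src to dst."""
    neighbors = {}
    for a, b in mappings:
        # Assumption: every mapping is usable in both directions.
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in neighbors.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of mappings connects the two peers


path = mapping_path("UPenn", "IIT Mumbai", mappings)
```

With n = 4 peers and 3 mappings, a query posed at UPenn reaches IIT Mumbai through UW and Stanford. Dropping any one mapping from the chain disconnects some pair of peers.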
6
6 Example Rule-Goal Tree Expansion
q: Q(a1, a2) :- SameProject(a1,a2,p), Author(a1,w), Author(a2,w)
[Figure: rule-goal tree rooted at q with rule nodes r0, r1, r2, r3. Goal nodes: SameProject(a1,a2,p), Author(a1,w), Author(a2,w); ProjMember(a1,p), ProjMember(a2,p); CoAuthor(a1,a2), CoAuthor(a2,a1). Leaves: S1(a1,p,_), S1(a2,p,_), S2(a1,a2), S2(a2,a1)]
Q’(a1,a2) :- S1(a1,p,_), S1(a2,p,_), S2(a1,a2) ∪ S1(a1,p,_), S1(a2,p,_), S2(a2,a1)
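The expansion can be sketched as a recursive unfolding of goals into stored relations. This is a simplified illustration, not the Piazza algorithm: atoms are treated as opaque strings (no unification), the two Author subgoals are grouped into a single goal because they are covered together, and the RULES table is a hypothetical encoding of the tree on this slide.

```python
from itertools import product

# Hypothetical encoding: goal -> list of alternative expansions,
# where each expansion is a conjunction (list) of subgoals.
# Goals that do not appear as keys are stored relations (leaves).
RULES = {
    "SameProject(a1,a2,p)": [["ProjMember(a1,p)", "ProjMember(a2,p)"]],
    "Author(a1,w),Author(a2,w)": [["CoAuthor(a1,a2)"], ["CoAuthor(a2,a1)"]],
    "ProjMember(a1,p)": [["S1(a1,p,_)"]],
    "ProjMember(a2,p)": [["S1(a2,p,_)"]],
    "CoAuthor(a1,a2)": [["S2(a1,a2)"]],
    "CoAuthor(a2,a1)": [["S2(a2,a1)"]],
}


def conjoin(subgoals):
    """Combine one rewriting of each subgoal into full conjunctions."""
    return [
        [atom for part in combo for atom in part]
        for combo in product(*(expand(g) for g in subgoals))
    ]


def expand(goal):
    """Return every conjunction of stored relations the goal unfolds to."""
    if goal not in RULES:
        return [[goal]]  # leaf: a stored relation
    rewritings = []
    for alternative in RULES[goal]:  # alternatives become a union
        rewritings.extend(conjoin(alternative))
    return rewritings


query_body = ["SameProject(a1,a2,p)", "Author(a1,w),Author(a2,w)"]
rewritings = conjoin(query_body)
```

Each inner list in rewritings is one conjunctive query over stored relations; together they form the union shown as Q’ above.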
7
7 RDF vs. XML
RDF explicitly names relationships:
  (book, title, “ABC”)
  (book, writtenBy, author)
  (author, name, “John Smith”)
XML does not always:
  1. ABC John Smith
  2. ABC John Smith
[Figure: graph with nodes book and author and edges title, name, writtenBy]
8
8 RDF vs. XML 2
RDF is subject-neutral (a graph)
XML centers around a subject (a tree):
  1. ABC John Smith
  2. John Smith ABC
This may result in duplication of contained objects
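To make the contrast concrete, here is a small sketch that writes the same facts once as RDF-style triples and twice as XML trees, one centered on the book and one on the author. The element names (book, title, author, name) are assumptions chosen to match the triples; the original markup on the slides is not available.

```python
import xml.etree.ElementTree as ET

# RDF-style triples: every relationship carries an explicit name.
triples = [
    ("book", "title", "ABC"),
    ("book", "writtenBy", "author"),
    ("author", "name", "John Smith"),
]

# XML tree centered on the book (assumed element names).
book = ET.Element("book")
ET.SubElement(book, "title").text = "ABC"
book_author = ET.SubElement(book, "author")
ET.SubElement(book_author, "name").text = "John Smith"

# XML tree centered on the author: same facts, different root.
author = ET.Element("author")
ET.SubElement(author, "name").text = "John Smith"
author_book = ET.SubElement(author, "book")
ET.SubElement(author_book, "title").text = "ABC"

book_xml = ET.tostring(book, encoding="unicode")
author_xml = ET.tostring(author, encoding="unicode")

# The relationship names that appear explicitly in the triples.
named_relationships = {predicate for (_, predicate, _) in triples}
```

In both trees the book-author relationship is expressed only by nesting, so the name writtenBy never appears in the XML. An author of several books would also be repeated under each book in the book-centered tree.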
9
9 An XML Version of the Semantic Web
Data model: XML + Schema
  Vast volumes of data already in XML (or exported as XML)
  CAVEAT: not all relationships are labeled in XML (“XML has no semantics.”)
Concepts: Views ≈ classes; schemas ≈ ontologies
  Views define membership via queries; can reason about containment
  CAVEAT: less expressive than OWL classes
Schema mappings: target schema as query over source
  Sophisticated reasoning about mappings is possible by extending existing data integration techniques
  Can use mappings in “forward” and “reverse” directions
  Allows for “chaining” of mappings to answer queries
10
10 Piazza with XML (WWW03)
Goals:
  Build on XQuery and XML (extended with RDF-style identity, following the lead of [Patel-Schneider & Simeon 02])
  Remain computationally inexpensive
  Capture the common mapping types
Directional mapping language based on templates
  {: $var IN document(“doc”)/path WHERE condition :} $var
  Translates between parts of data instances
  Restricted subset of XQuery that’s decidable to reason about
  Supports special annotations and object fusion
  Can map XML-XML, XML-RDF, RDF-XML (at data level)
11
11 Mapping Example between XML Schemas
Target:
  pubs
    book*
      title
      author*
        name
Source:
  authors
    author*
      full-name
      publication*
        title
        pub-type
[Figure labels: pub-type, name, publication, author, writtenBy, title]
12
12 Example Piazza Mapping
{: $a IN document(“…”)/authors/author,
   $an IN $a/full-name,
   $t IN $a/publication/title,
   $typ IN $a/publication/pub-type
   WHERE $typ = “book”
   PROPERTY $t >= ‘A’ AND $t {$t} {$an}
13
13 Challenges
Query reformulation for XML is significantly harder
  Hierarchy, 1:n schema constraints, ability to map from values to tags, …
  Redundant paths
  Can only handle roughly the XML equivalent of conjunctive queries
See the WWW03 paper (plus later work by Yu and Popa, Deutsch et al., many others) for details
14
14 What about Values?
Thus far, we’ve focused on schema mappings
Almost as important in the real world: mappings of values to values
  Proteins to binding sites
  SSNs to customer IDs
  etc.
The Hyperion system (KAM 03) focuses on computing transitive relationships between mappings
  In many cases, we only have partial transitive mappings
  Key idea: divide all of the mappings into partitions, each of which can compute transitive closures separately
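A toy illustration of transitive value mappings, assuming each mapping is stored as a table of (source value, target value) pairs. This is plain relational composition, not Hyperion's actual algorithm, and all identifiers and values below are made up for the example.

```python
def compose(first, second):
    """Join two value-mapping tables on the shared middle value."""
    return {(a, c) for (a, b) in first for (b2, c) in second if b == b2}


# Hypothetical mapping tables: SSN -> customer ID -> account number.
ssn_to_customer = {
    ("123-45-6789", "C17"),
    ("987-65-4321", "C42"),
}
customer_to_account = {
    ("C17", "A-001"),  # no entry for C42: the second table is incomplete
}

ssn_to_account = compose(ssn_to_customer, customer_to_account)
```

Only the first SSN reaches an account, because the second table has no row for C42. This is the sense in which the transitive mapping is partial.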
15
15 Assessment: The Semantic Web
The KB world focuses on expressively capturing concepts
The DB world focuses on integrating and restructuring data (but views are less expressive in certain ways)
Does either of these seem likely to change the world? What barriers need to be removed?
16
16 From Managing the Web as a Database to Managing Databases of Databases
Many common operations in:
  Data integration
  Data interchange
  Schema design
  Semantic Web
  Schema maintenance/evolution
For instance:
  Creating a mediated schema
  Defining mappings between schemas
  Seeing what’s different between schemas
The vision: let’s build a system to manage metadata, not data!
17
17 Metadata Management
The challenges:
  There are lots of metadata representations
    Different data models; different definition types (e.g., Java classes, XML Schemas, SQL DDL, …)
  Many of the problems are unsolvable in the abstract
    e.g., schema matching
    But maybe we can customize tools for each task
    And maybe we can get user input to help
  We want to create a clean, composable model of operators
    Should be “algebraic” in some sense, with nice properties
    Operators need to be generic but extensible
18
18 Data vs. Metadata vs. …
Data
  We know what this is
Metadata (models)
  Schemas, types, classes, etc.
Metamodels
  Things like the relational model, O-R model, …
Bernstein focuses on managing models, with customization for each metamodel (and perhaps special domains)
19
19 Models
A model is a set of objects with identity
Objects have at least extended ER-style traits:
  attributes/properties
  is-a, has-a relationships
  loose associations
All of these are assumed to have types
20
20 Mappings
A mapping describes a correspondence between parts of two models; it may be annotated with information about computing the transformation
[Figure: mapping Map_ee between model Emp (Emp#, Name, Address) and model Employee (EmployeeID, FirstName, LastName, Phone), with correspondences labeled = and ≈]
21
21 The Basic Algebraic Operators
Match
  Basically, schema matching: takes two models and returns a mapping between them
  Elementary vs. complex match; reliance on morphisms
Compose
  Takes two mappings and composes them
Diff
  Takes a model A, a mapping A → B, and returns the part of A that’s not mapped
ModelGen
  Takes model A, creates new model B plus mapping A → B
Merge
  Takes models A, B, mapping between them, returns the union C, plus mappings A → C, B → C
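A toy sketch of two of these operators, assuming a model is just a set of element names and a mapping is a set of (source element, target element) pairs. Real model management treats mappings as richer objects with expressions attached, so this only shows the shape of Compose and Diff. The specific correspondences and the Staff model are hypothetical.

```python
def compose(map_ab, map_bc):
    """Compose: chain A -> B and B -> C correspondences into A -> C."""
    return {(a, c) for (a, b) in map_ab for (b2, c) in map_bc if b == b2}


def diff(model_a, map_ab):
    """Diff: the elements of model A that the mapping A -> B does not cover."""
    mapped = {a for (a, _) in map_ab}
    return model_a - mapped


# Element names follow the Emp / Employee example; the pairings are assumed.
emp = {"Emp#", "Name", "Address"}
employee = {"EmployeeID", "FirstName", "LastName", "Phone"}
map_emp_employee = {
    ("Emp#", "EmployeeID"),
    ("Name", "FirstName"),
    ("Name", "LastName"),
}

# A hypothetical third model and a mapping into it.
map_employee_staff = {("EmployeeID", "StaffNo")}

unmapped_in_emp = diff(emp, map_emp_employee)
map_emp_staff = compose(map_emp_employee, map_employee_staff)
```

Here Diff reports Address as the part of Emp with no counterpart, and Compose yields a single Emp# to StaffNo correspondence, since nothing in the second mapping continues from FirstName or LastName.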
22
22 Model Management in Action
23
23 Schematic of Changes
[Figure annotations: “the new parts in S2 that need to be propagated to d2”; “Dest. w/o deleted items from s1”; “the XML version of s2”]
24
24 Actual Operations
25
25 What’s Hard?
Match
  We saw that LSD is far from perfect, and it’s the best out there…
Merge
  Can we make (A merge B) merge C = A merge (B merge C)? (Buneman, Davidson, Kosky 92)
Diff
  How do we ensure a well-formed model as the result?
  They return a copy of the model, plus mappings showing what is actually part of the diff
Composition
  It isn’t always closed within the mapping language!
26
26 More Challenges
What about:
  Semantics of the meta-model: how do we handle, e.g., constraints?
  What to do about approximate correspondences?
  Can we actually make these things generic but expressive enough to be useful?
Do you think this vision is feasible?