1
A Semantic Approach to Discovering Schema Mapping
Yuan An, Alex Borgida, Renee J. Miller, and John Mylopoulos
Presented by: Kristine Monteith
2
Overview
Goal of the paper: match schemas using more than simple element correspondences (e.g., can we improve on a naïve mapping?)
3
Overview
Approach: derive a conceptual model for the semantics of each table, then match the conceptual model of the source schema to the conceptual model of the target schema.
e.g., can we figure out that a source schema like this: [figure: source schema, not reproduced] can match a target schema like this: hasBookSoldAt(aname, sid)
4
Example 1
5
Baseline Solution: Referential Integrity Constraints
Find correspondences
- v1: connect person.pname to hasBookSoldAt.aname
- v2: connect bookstore.sid to hasBookSoldAt.sid
Create logical relations using referential constraints
- S1: person(pname) |X| writes(pname, bid) |X| book(bid)
- S2: book(bid) |X| soldAt(bid, sid) |X| bookstore(sid)
- S3: person(pname)
- S4: bookstore(sid)
Look at the target
- T1: hasBookSoldAt(aname, sid)
Look at each pair of source and target relations and check which correspondences are “covered”
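The coverage check on this slide can be sketched in a few lines of Python. This is only an illustration, not the baseline tool's actual implementation: each logical relation is reduced to the set of columns it exposes, and the names `source_relations`, `target_relations`, `correspondences`, and `covered` are assumptions introduced for the example.

```python
# Each logical relation is modeled as the set of qualified columns it exposes.
# (Hypothetical encoding; the slide lists the relations only as join expressions.)
source_relations = {
    "S1": {"person.pname", "writes.pname", "writes.bid", "book.bid"},
    "S2": {"book.bid", "soldAt.bid", "soldAt.sid", "bookstore.sid"},
    "S3": {"person.pname"},
    "S4": {"bookstore.sid"},
}
target_relations = {"T1": {"hasBookSoldAt.aname", "hasBookSoldAt.sid"}}

# Correspondences v1 and v2 as (source column, target column) pairs.
correspondences = {
    "v1": ("person.pname", "hasBookSoldAt.aname"),
    "v2": ("bookstore.sid", "hasBookSoldAt.sid"),
}


def covered(source_cols, target_cols):
    """Return the names of correspondences whose two ends both appear in the pair."""
    return sorted(
        name
        for name, (src, tgt) in correspondences.items()
        if src in source_cols and tgt in target_cols
    )


# Check every (source relation, target relation) pair.
coverage = {
    (s, t): covered(s_cols, t_cols)
    for s, s_cols in source_relations.items()
    for t, t_cols in target_relations.items()
}

for pair, names in coverage.items():
    print(pair, names)
```

In this sketch no single source relation covers both v1 and v2, which is why the baseline cannot fill both columns of hasBookSoldAt from one logical relation.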
6
Ask the user about the following:
[figure: candidate mappings, not reproduced]
None of the candidates presents an entire tuple to match the target query: hasBookSoldAt(aname, sid)
7
What this paper seeks to accomplish:
Generate the following: compose “writes” and “soldAt” to produce a new semantic connection between “person” and “bookstore”
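To make the composition concrete, here is a minimal sketch that joins two small in-memory relations on the shared book identifier. The sample tuples and the helper name `compose` are invented for illustration; only the relation names come from the slides.

```python
# Hypothetical sample data: writes(pname, bid) and soldAt(bid, sid).
writes = [("Ann", "b1"), ("Bob", "b2")]
sold_at = [("b1", "s1"), ("b1", "s2"), ("b2", "s2")]


def compose(left, right):
    """Join two binary relations on left's second column and right's first column,
    then project away the shared column."""
    return sorted({(a, c) for a, b in left for b2, c in right if b == b2})


# Each result tuple pairs an author with a store that sells one of their books,
# which is the shape of the target relation hasBookSoldAt(aname, sid).
has_book_sold_at = compose(writes, sold_at)
print(has_book_sold_at)
```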
8
Approach: Representing Semantics of Schemas
Create a Conceptual Model (CM) graph
- Create nodes for classes and attributes
- Create directed edges for relationships and inverses
  - C1 ---ISA--- C2: subclasses
  - C ---p--- D: relationships
  - C ---p->-- D: functional relationships
- Duplicate concept nodes to represent recursive relationships
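One possible in-memory encoding of such a CM graph is sketched below. The class names `Edge` and `CMGraph`, the `^-` suffix for inverse edges, and the choice to mark ISA edges as functional are all assumptions made for this example, not notation taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    source: str
    label: str
    target: str
    functional: bool = False  # True when each source instance has at most one target


@dataclass
class CMGraph:
    attributes: dict = field(default_factory=dict)  # class name -> attribute names
    edges: list = field(default_factory=list)

    def add_class(self, name, attrs=()):
        self.attributes[name] = list(attrs)

    def add_relationship(self, source, label, target,
                         functional=False, inverse_functional=False):
        # Store the relationship together with its inverse, as on the slide.
        self.edges.append(Edge(source, label, target, functional))
        self.edges.append(Edge(target, label + "^-", source, inverse_functional))

    def add_isa(self, subclass, superclass):
        # Assumption: an ISA edge is treated as functional in this sketch.
        self.edges.append(Edge(subclass, "ISA", superclass, True))


graph = CMGraph()
graph.add_class("Person", ["pname"])
graph.add_class("Book", ["bid"])
graph.add_class("Bookstore", ["sid"])
graph.add_relationship("Person", "writes", "Book")
graph.add_relationship("Book", "soldAt", "Bookstore")

functional_edges = [e for e in graph.edges if e.functional]
print(len(graph.edges), len(functional_edges))
```

Both relationships here are many-to-many, so the sketch records four directed edges and none of them is functional.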
9
Generating Mapping Candidates
Problem description
Inputs:
- A source relational schema S and a target relational schema T
- A conceptual model (G_S and G_T respectively) associated with each relational schema via table semantic mappings
- A set of correspondences L linking a set L(S) of columns in S to a set L(T) of columns in T
Goal:
- A pair of expressions which are “semantically similar” in terms of modeling the subject matter
10
Marked Nodes
- The set L(S) of columns gives rise to a set C_S of marked class nodes in the graph G_S
- Likewise, the set L(T) gives rise to a set C_T of marked class nodes in the graph G_T
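As a rough illustration, marking can be viewed as a lookup from linked columns to the classes that own the corresponding attributes. The dictionary `column_to_class` stands in for the table semantics and is a hypothetical simplification, as is the function name `marked_nodes`.

```python
# Hypothetical table semantics: which CM class each column's attribute belongs to.
column_to_class = {
    "person.pname": "Person",
    "book.bid": "Book",
    "bookstore.sid": "Bookstore",
}


def marked_nodes(linked_columns, column_to_class):
    """Return the class nodes reached from the columns that take part in correspondences."""
    return {column_to_class[c] for c in linked_columns if c in column_to_class}


# L(S) for the running example: the source columns used by v1 and v2.
linked_source_columns = {"person.pname", "bookstore.sid"}
marked_source = marked_nodes(linked_source_columns, column_to_class)
print(sorted(marked_source))
```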
11
Basic Algorithm
Create conceptual subgraphs
- Find a subgraph D1 connecting concept nodes in C_S, and a subgraph D2 connecting concept nodes in C_T, such that D1 and D2 are “semantically similar”
Suggest possible mapping candidates
- Translate D1 and D2 into algebraic expressions E1 and E2 and return the triple as a mapping candidate
12
Creating Conceptual Subgraphs
Notice simple matches
- A node v in C_S corresponds to a node u in C_T when v and u have attributes that are associated with corresponding columns via the table semantics
More complicated rules
- The connections (v1, v2) and (u1, u2) should be “semantically similar” or at least “compatible” (cardinality constraints, relationships like “is-a” or “part of”)
Use edges from pre-selected trees
- They represent “intuitively meaningful” concepts
Favor smaller trees (Occam’s razor)
Other considerations
- Favor lossless joins
- Reject contradictions
13
Example
Looking for a functional tree with a root corresponding to the anchor Proj
14
Example
- Notice simple matches
- Find a tree with minimal cost (edges in pre-selected trees don’t contribute to cost)
- Find a tree containing the largest number of edges from the pre-selected trees
Project ---controlledBy->-- Department --hasManager->-- Employee
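The minimal-cost criterion can be sketched as a shortest-path search in which pre-selected edges cost 0 and all other edges cost 1. This is a simplification under stated assumptions: with a single node to reach, the tree degenerates to a path, so plain Dijkstra is used; the `hasSupervisor` edge is invented to give the search an alternative; and the function name `cheapest_path` is hypothetical. The second criterion on the slide (maximizing pre-selected edges) is not implemented here.

```python
import heapq

# Functional edges as (source, label, target). The hasSupervisor edge is hypothetical.
edges = [
    ("Project", "controlledBy", "Department"),
    ("Department", "hasManager", "Employee"),
    ("Project", "hasSupervisor", "Employee"),
]
# Edges that already appear in pre-selected trees contribute no cost.
preselected = {
    ("Project", "controlledBy", "Department"),
    ("Department", "hasManager", "Employee"),
}


def cheapest_path(edges, preselected, root, goal):
    """Dijkstra from root to goal; ties on cost are broken by fewer edges."""
    adjacency = {}
    for edge in edges:
        adjacency.setdefault(edge[0], []).append(edge)
    queue = [(0, 0, root, [])]  # (cost, number of edges, node, path so far)
    settled = set()
    while queue:
        cost, hops, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for edge in adjacency.get(node, []):
            step = 0 if edge in preselected else 1
            heapq.heappush(queue, (cost + step, hops + 1, edge[2], path + [edge]))
    return None  # goal is not reachable from root


result = cheapest_path(edges, preselected, "Project", "Employee")
print(result)
```

With these inputs the two-edge route through Department has cost 0 and is preferred over the direct but non-pre-selected edge, which has cost 1.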
15
More Complicated Example
Same answer: Project ---controlledBy->-- Department --hasManager->-- Employee
Still looking for low-cost, minimal trees to connect Employee to Project
16
Dealing with N-ary Relations
StoreSells(Person, Product)
17
Considerations for Reified Relationships
- A path of length 2 passing through a reified relationship node should be considered to be of length 1
- The semantic category of a target tree rooted at a reified relationship induces preferences for similarly rooted (minimal) functional trees in the source (cardinality restrictions, number of roles, subclass relationship to a top-level ontology concept)
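The length adjustment in the first bullet might be implemented along these lines. This is a sketch that assumes a path is given as a list of node names and that reified relationship nodes never appear consecutively; `path_length` is a hypothetical helper.

```python
def path_length(path_nodes, reified):
    """Count edges along the path, treating the two edges that pass through
    an interior reified relationship node as a single edge."""
    edge_count = len(path_nodes) - 1
    interior_reified = sum(1 for node in path_nodes[1:-1] if node in reified)
    return edge_count - interior_reified


reified = {"StoreSells"}
through_reified = path_length(["Person", "StoreSells", "Product"], reified)
through_class = path_length(["Project", "Department", "Employee"], reified)
print(through_reified, through_class)
```

Here the path through the reified StoreSells node counts as length 1, while an ordinary two-edge path between classes keeps length 2.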
18
Obtaining Relational Expressions
19
Experimental Results
20
Average Precision
21
Average Recall
22
Conclusions
- The semantic approach performs at least as well as the RIC-based approach on the datasets studied
- The semantic approach made significant improvements in some cases
- Many of the datasets did not have complicated schemas; the semantic approach didn’t provide as much benefit in those cases
23
Strengths/Weaknesses
Strengths
- Lots of examples
- Provides a useful solution to a common problem
Weaknesses
- Formalism sometimes made things more complicated rather than clearer
- Assumes a lot of background knowledge
24
Future Work
- Embed this functionality into pre-existing mapping tools (they suggest Clio, since a lot of their work is based on it)
- Add negation to the semantic representation
- Investigate more complex semantic mappings
25
Questions?