1
Extracting Information from Heterogeneous Information Sources Using Ontologically Specified Target Views
Joachim Biskup, Universität Dortmund, and David W. Embley, Brigham Young University
Funded by NSF
2
Information Exchange (Source to Target)
– Information Extraction
– Schema Matching
Leverage this … to do this
3
Presentation Outline
– Overview
– Matching (Direct)
– Matching (Derived)
– Matching Algorithm
– Summary
5
Requirements
1. f is an injective function.
2. f maps object sets to object sets and relationship sets to relationship sets.
3. f respects relationship-set arities.
4. f respects referential integrity.
5. f respects types.
6. f respects real-world identity.
7. f's coercions are generalization/specialization (G/S) compatible.
8. f respects subset constraints.
9. f respects mutual-exclusion constraints.
10. f respects union constraints.
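The first five requirements are structural enough to check mechanically. The Python sketch below is illustrative only: it assumes a hypothetical in-memory representation (`ObjectSet`, `RelSet`, and `check_mapping` are invented names, not from the paper) in which f is a dict from target elements to source elements. Requirement 5 is checked as strict type equality, which deliberately ignores the coercions allowed by requirement 7.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class ObjectSet:
    name: str
    value_type: str  # e.g. "String", "Integer"


@dataclass(frozen=True)
class RelSet:
    name: str
    members: Tuple[ObjectSet, ...]  # connected object sets, in order


Element = Union[ObjectSet, RelSet]


def check_mapping(f: Dict[Element, Element]) -> List[str]:
    """Report violations of requirements 1-5 for f: target element -> source element."""
    problems = []
    # 1. Injective: no two target elements share a source element.
    if len(set(f.values())) != len(f):
        problems.append("not injective")
    for t, s in f.items():
        # 2. Object sets go to object sets, relationship sets to relationship sets.
        if type(t) is not type(s):
            problems.append(f"{t.name}: object set/relationship set mix-up")
            continue
        if isinstance(t, RelSet):
            # 3. Arities must agree.
            if len(t.members) != len(s.members):
                problems.append(f"{t.name}: arity mismatch")
                continue
            # 4. Referential integrity: connected object sets must map consistently.
            for tm, sm in zip(t.members, s.members):
                if f.get(tm) != sm:
                    problems.append(f"{t.name}: {tm.name} does not map to {sm.name}")
        elif t.value_type != s.value_type:
            # 5. Types (strict equality here; coercions would relax this).
            problems.append(f"{t.name}: {t.value_type} vs. {s.value_type}")
    return problems


# Hypothetical target and source elements for illustration.
t_city, t_airport = ObjectSet("City", "String"), ObjectSet("Airport", "String")
s_city, s_airport = ObjectSet("City", "String"), ObjectSet("Airport Name", "String")
f = {t_city: s_city, t_airport: s_airport,
     RelSet("serves", (t_airport, t_city)): RelSet("serves", (s_airport, s_city))}
print(check_mapping(f))  # []
```

Requirements 6 to 10 depend on semantics (real-world identity, constraints) that this toy representation does not carry.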
6
User Interaction (IDS Statements)
Issue
– Explains the issue
– Example: units; values may need transformation
Default
– Explains the default option
– Example: if no transformation is given, no conversion
Suggestion
– Gives a suggestion about how to resolve the issue
– Example: if needed, specify the conversion
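The slide does not say how IDS statements are represented internally. As a minimal sketch, assuming a hypothetical `IDSStatement` record and a hypothetical `apply_units` helper, the units example can be modeled so that the default (no conversion) lets processing run to completion while a user-supplied conversion follows the suggestion:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class IDSStatement:
    issue: str       # explains the issue
    default: str     # explains what happens if the user does nothing
    suggestion: str  # how the user might resolve the issue


def apply_units(values: List[float],
                conversion: Optional[Callable[[float], float]] = None) -> List[float]:
    """Default: copy values unchanged. Following the suggestion means supplying a conversion."""
    if conversion is None:
        return list(values)
    return [conversion(v) for v in values]


units = IDSStatement(
    issue="Source and target units may differ; values may need transformation.",
    default="If no transformation is given, values are copied without conversion.",
    suggestion="If needed, specify the conversion.",
)
print(units.issue)
print(apply_units([1.5, 2.0]))                        # [1.5, 2.0]
print(apply_units([1.5, 2.0], lambda km: km * 1000))  # [1500.0, 2000.0]
```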
7
Theorem: Let f be the generated mapping from target t to source s, where s is populated such that it has a valid interpretation. Let t′ be the submodel of t populated from s by f. Then t′ has a valid interpretation.
Proof: the paper is the proof …
8
Target (Graphical View)
9
Target (Textual View)
10
Source Example (Assumed to be Populated)
11
Matching (Direct)
– Object Sets
– Relationship Sets
12
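Object-Set Type Compatibility (a = target, b = source)
1. $\mathrm{type}(a) = \mathrm{type}(b)$
2. $\mathrm{type}(a) \supset \mathrm{type}(b)$
3. $\mathrm{type}(a) \subset \mathrm{type}(b)$
4. $\mathrm{type}(a) \cap \mathrm{type}(b) = \emptyset$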
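One way to sort a candidate object-set match into the four cases is to walk a type hierarchy. The sketch below is hypothetical: `SUPERTYPE`, `generalizations`, and `type_case` are invented names, the hierarchy contains only the slides' Integer/Real and Image/Video examples, and case 2 is read as "the target type is more general than the source type."

```python
from typing import Dict, List

# Hypothetical hierarchy: each type maps to its next more general type,
# following the slides' examples (Integer within Real, Image within Video).
SUPERTYPE: Dict[str, str] = {"Integer": "Real", "Image": "Video"}


def generalizations(t: str) -> List[str]:
    """All strictly more general types of t, following SUPERTYPE upward."""
    result = []
    while t in SUPERTYPE:
        t = SUPERTYPE[t]
        result.append(t)
    return result


def type_case(a: str, b: str) -> int:
    """Compatibility case 1-4 for target type a and source type b."""
    if a == b:
        return 1  # same type
    if a in generalizations(b):
        return 2  # target type is more general (e.g. Real vs. Integer)
    if b in generalizations(a):
        return 3  # source type is more general (e.g. Integer vs. Real)
    return 4      # unrelated types: likely a mismatch


print(type_case("Real", "Integer"))  # 2
print(type_case("Image", "String"))  # 4
```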
13
$\mathrm{type}(a) = \mathrm{type}(b)$
Same type
– String = String, but Airport vs. Head Of State
– Need better matching techniques
Same type, different units
– Size vs. Nr Sq Km
– Need unit conversion
Same type, different format
– Date = Date, but 01/02/2002 vs. Jan 2, 2002
– Need format conversion
Same type, same units and format, different assumptions
– Altitude = Altitude, but aircraft and spacecraft altitudes differ
– Need the same assumptions
Same type, same units and format, same assumptions, OIDs
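The unit and format conversions above can be sketched as small functions. This is an illustration under stated assumptions: the slide does not give the target unit for Size, so square miles is a hypothetical choice, and source dates are assumed to be MM/DD/YYYY (the slide's example equates 01/02/2002 with Jan 2, 2002). The function names are invented.

```python
from datetime import datetime

# 1 mi = 1.609344 km exactly, so 1 sq mi = 1.609344**2 sq km.
SQ_KM_PER_SQ_MI = 1.609344 ** 2


def sq_km_to_sq_mi(area_sq_km: float) -> float:
    """Unit conversion: source "Nr Sq Km" to a (hypothetical) target size in sq mi."""
    return area_sq_km / SQ_KM_PER_SQ_MI


def numeric_to_long_date(text: str) -> str:
    """Format conversion: "01/02/2002" (assumed MM/DD/YYYY) to "Jan 2, 2002"."""
    d = datetime.strptime(text, "%m/%d/%Y")
    return f"{d:%b} {d.day}, {d.year}"


print(numeric_to_long_date("01/02/2002"))  # Jan 2, 2002
print(round(sq_km_to_sq_mi(100.0), 2))     # 38.61
```

The month abbreviation from `%b` follows the process locale; under the default C locale it is "Jan".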
14
$\mathrm{type}(a) \supset \mathrm{type}(b)$ and $\mathrm{type}(a) \subset \mathrm{type}(b)$
Real vs. Integer, or Video vs. Image
– Target has greater discriminating power
– Can add ".0" or make a video of a single image (?)
Integer vs. Real, or Image vs. Video
– Source has greater discriminating power
– Can round off or select one of the frames (?)
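For the numeric examples, the two coercion directions look like this in a minimal Python sketch (function names are hypothetical). Widening Integer to Real is lossless; narrowing Real to Integer rounds off and discards information.

```python
def integer_to_real(n: int) -> float:
    """Target Real, source Integer: lossless, just "add .0"."""
    return float(n)


def real_to_integer(x: float) -> int:
    """Target Integer, source Real: round off, losing the fraction.

    Python's round() sends exact halves to the nearest even integer.
    """
    return round(x)


print(integer_to_real(2))    # 2.0
print(real_to_integer(2.7))  # 3
print(real_to_integer(2.5))  # 2 (half to even)
```

Whether half-to-even rounding is acceptable is exactly the kind of choice a default plus a user suggestion could settle.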
15
$\mathrm{type}(a) \cap \mathrm{type}(b) = \emptyset$
Image vs. String
– Mismatch, even if the attribute is the same (e.g., both City)
– Types can help discard potential matches
String(5) vs. Integer
– But suppose the integer is 2
– Might work, but is "2.000" OK?
16
Relationship Match Requirements
Referential integrity
Constraints
– Cardinality
– Mandatory/Optional
17
Referential Integrity
[Figure (target vs. source): a, b; a′, b′; … a″]
The types of a, a′, and a″ can all be different, but not arbitrary. Example: a (String), a′ (Integer), a″ (Real).
18
Relationship-Set Constraint Compatibility
1. constr(a) constr(b)
2. (constr(a) constr(b))
3. (constr(a) constr(b))
4. (constr(a) constr(b))
19
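constr(a) constr(b)
[Figure: one schema relates Person and Car by "owns" and "drives"; the other relates Person and Car by an unknown relationship "?"; all participation optional (o)]
Need more information to resolve: perhaps "?" is "purchased."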
20
(constr(a) constr(b))
[Figure: (a) City and City Map; (b) City and City Map]
The target (a) expects many maps, but the source can't supply them.
21
(constr(a) constr(b))
[Figure: (a) City and City Map; (b) City and City Map]
The target (a) expects one map, but the source can supply many.
22
(constr(a) constr(b))
[Figure: (a) City and City Map; (b) City and City Map]
The target (a) expects at least one and potentially many maps, but the source may have none or at most one.
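The three City/City Map cases compare participation constraints, i.e., a minimum (optional or mandatory) and a maximum (one or many) on each side. The sketch below assumes a hypothetical `Participation` record and `cardinality_issues` function; it only reports the mismatch, it does not decide how to resolve it.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Participation:
    min_count: int    # 0 = optional, 1 = mandatory
    max_count: float  # 1 = at most one, math.inf = many


def cardinality_issues(target: Participation, source: Participation) -> List[str]:
    """Compare how many maps a City may have in the target (a) vs. the source (b)."""
    issues = []
    if target.max_count > source.max_count:
        issues.append("target allows more than the source can supply")
    elif target.max_count < source.max_count:
        issues.append("source may supply more than the target allows")
    if target.min_count > source.min_count:
        issues.append("target requires more than the source guarantees")
    return issues


MANY = math.inf
# Target expects many, source supplies at most one.
print(cardinality_issues(Participation(0, MANY), Participation(0, 1)))
# Target expects one, source may supply many.
print(cardinality_issues(Participation(0, 1), Participation(0, MANY)))
# Target expects at least one and possibly many; source has none or at most one.
print(cardinality_issues(Participation(1, MANY), Participation(0, 1)))
```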
23
Matching (Derived)
– Generalization/Specialization
– Composite Values
– Derived Relationship Sets
– Displayable/Nondisplayable Object Sets
24
Generalization/Specialization
For a target object set, a source object set may:
– have no overlap (just ignore)
– be a proper subset (accept, or find the missing generalization)
– have the same values (direct match)
– be a proper superset (hard, except for roles)
– overlap (like proper subset and proper superset)
Consider roles and missing generalizations.
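If the populations of both object sets are available, the five cases reduce to set comparisons. This minimal Python sketch assumes the values are known up front (in practice they may only be sampled); `classify` and `ACTION` are hypothetical names, and the actions are copied from the list above.

```python
from typing import Set

ACTION = {
    "no overlap": "just ignore",
    "proper subset": "accept, or find the missing generalization",
    "same values": "direct match",
    "proper superset": "hard, except for roles",
    "overlap": "treat like proper subset and proper superset",
}


def classify(target_values: Set[str], source_values: Set[str]) -> str:
    """Relate a source object set's values to a target object set's values."""
    if not (target_values & source_values):
        return "no overlap"
    if source_values == target_values:
        return "same values"
    if source_values < target_values:
        return "proper subset"
    if source_values > target_values:
        return "proper superset"
    return "overlap"


cities = {"Provo", "Dortmund", "Salt Lake City"}
case = classify(cities, {"Provo", "Dortmund"})
print(case, "->", ACTION[case])  # proper subset -> accept, or find the missing generalization
```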
25
Roles
[Figure (target vs. source): City, Travel Video, CityClip: Video, Video With City Scene; optional participation (o)]
26
Missing Generalization
[Figure (target vs. source): City Map, Country Map; City Map: Image, Country Map: Image, Map: Image]
27
Composite Values
– Composite in Source (split)
– Composite in Target (merge)
– Examples of Derived Relationships
28
Composite in Source
[Figure (target vs. source): Video, Time, Nr Hours, Nr Minutes]
Note also that we generated a source path.
29
Composite in Source
[Figure (target vs. source): Video, Nr Hours, Nr Minutes]
30
Composite in Target
[Figure (target vs. source): Video, Nr Hours, Nr Minutes, Time]
31
Composite in Target
[Figure (target vs. source): Video, Time]
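Splitting a composite Time into Nr Hours and Nr Minutes, and merging them back, can be sketched as a pair of inverse functions. The "H:MM" string format and the function names are assumptions for illustration; the slides do not specify how Time is represented.

```python
from typing import Tuple


def split_time(time: str) -> Tuple[int, int]:
    """Split a composite "H:MM" Time value into (Nr Hours, Nr Minutes)."""
    hours, minutes = time.split(":")
    return int(hours), int(minutes)


def merge_time(nr_hours: int, nr_minutes: int) -> str:
    """Merge Nr Hours and Nr Minutes into one composite "H:MM" Time value."""
    return f"{nr_hours}:{nr_minutes:02d}"


print(split_time("1:45"))  # (1, 45)
print(merge_time(2, 5))    # 2:05
```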
32
Displayable/Nondisplayable Object-Set Matches
– Nondisplayable in Source: find a key
– Nondisplayable in Target: create a key
33
Nondisplayable in Source
[Figure (target vs. source): Airport, City, Airline; relationship sets "flys to" and "serves"]
No key: discard the match.
35
Nondisplayable in Source
[Figure (target vs. source): Airport, City, Airline, Airport Name; relationship sets "flys to" and "serves"]
One key: choose it.
37
Nondisplayable in Source
[Figure (target vs. source): Airport, City, Airline, Airport Name, Airport Code; relationship sets "flys to" and "serves"]
Two or more keys: choose one.
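The three key cases for a nondisplayable source object set (such as Airport) can be written as one decision function. This is a sketch only: `choose_key` is a hypothetical name, and falling back to the first candidate when the user does not choose is an assumed default, not something the slides specify.

```python
from typing import Optional, Sequence


def choose_key(candidate_keys: Sequence[str],
               user_choice: Optional[str] = None) -> Optional[str]:
    """Pick a displayable key for a nondisplayable source object set.

    Returns None when there is no key, meaning the match is discarded.
    """
    if not candidate_keys:
        return None               # no key: discard the match
    if len(candidate_keys) == 1:
        return candidate_keys[0]  # one key: choose it
    if user_choice in candidate_keys:
        return user_choice        # two or more keys: the user picks one
    return candidate_keys[0]      # assumed default when the user does not choose


print(choose_key([]))                                  # None
print(choose_key(["Airport Name"]))                    # Airport Name
print(choose_key(["Airport Name", "Airport Code"], "Airport Code"))  # Airport Code
```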
39
Matching Algorithm
41
Sample Match Table
42
Pictorial View of Match Table
[Figure: target and source]
43
Summary
44
Concluding Remarks
QED (the theorem holds)
Let f be the generated mapping from target t to source s, where s is populated such that it has a valid interpretation. Let t′ be the submodel of t populated from s by f. Then t′ has a valid interpretation.
Proof: the paper is the proof …
45
Pictorial View of Match Table
– t = target
– s = source
– f = the mapping
– t′ has a valid interpretation
– t′ = submodel
46
Concluding Remarks
QED (the theorem holds)
Merge (several sources)
– All sources extracted to the same view
– Union merge
– Object identity problems
– Constraint problems
Source Modeling (convert to OSM)
– Framework defined, but not implemented
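A union merge of several sources extracted to the same view can be sketched with plain sets of facts, which also shows why object identity is a problem: identical facts collapse, but the same real-world object under two different names does not. The fact shape (an airport/city pair) and the `union_merge` name are hypothetical.

```python
from typing import Iterable, Set, Tuple

Fact = Tuple[str, str]  # hypothetical (airport, city) fact in the target view


def union_merge(extractions: Iterable[Set[Fact]]) -> Set[Fact]:
    """Union the facts that each source contributed to the same target view."""
    merged: Set[Fact] = set()
    for facts in extractions:
        merged |= facts
    return merged


source1 = {("SLC", "Salt Lake City")}
source2 = {("SLC", "Salt Lake City"), ("Salt Lake City International", "Salt Lake City")}
merged = union_merge([source1, source2])
print(len(merged))  # 2: the duplicate collapses, but one airport under two names does not
```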