A Graph-Based Approach to Learn Semantic Descriptions of Data Sources


1 A Graph-Based Approach to Learn Semantic Descriptions of Data Sources
Mohsen Taheriyan, Craig A. Knoblock, Pedro Szekely, and José Luis Ambite
University of Southern California, Information Sciences Institute and Department of Computer Science

2 Outline
Introduction
An example
Problem formulation
Learning Semantic Descriptions
Evaluation
Conclusion

3 Introduction
Data source integration involves building a global model and constructing source descriptions that specify mappings between the sources and the global model.
Source descriptions can be global-as-view or local-as-view descriptions, or a semantic model.
Building a source description means characterizing the source in terms of the concepts and properties in the domain ontology: determine the semantic types of the attributes, then determine the relationships between the attributes in terms of properties in the ontology.

4 In this work… Data sources in the same domain usually provide similar or overlapping data. It should be possible to exploit knowledge of previously modeled sources to learn descriptions for new sources. We can leverage the knowledge of previously described sources to limit the search space and get some hints to hypothesize more plausible candidates.

5 An example
s1 = personalInfo(name, birthdate, city, state, workplace)
s2 = getCities(state, city)
s3 = businessInfo(company, ceo, city, state)
s4 = getEmployees(employer, employee)
s5 = postalCodeLookup(zipcode, city, state)

6 s1 = personalInfo(name, birthdate, city, state, workplace)
s2 = getCities(state, city)
s3 = businessInfo(company, ceo, city, state)
s4 = getEmployees(employer, employee)
s5 = postalCodeLookup(zipcode, city, state)

7 Problem formulation
A source $s(a_1, \dots, a_n)$ has attributes $Attributes(s) = \{a_1, \dots, a_n\}$.
A semantic model $m$ consists of class nodes and data nodes.
An attribute mapping function $\phi : Attributes(s) \to Nodes(m)$ maps each attribute to a node of the model.
A source description is a triple $(s, m, \phi)$.
Input: a domain ontology $O$, a set of source descriptions $S = \{(s_1, m_1, \phi_1), \dots, (s_k, m_k, \phi_k)\}$, and a new source $\hat{s}$.
Output: $\hat{m}$ and $\hat{\phi}$ such that $(\hat{s}, \hat{m}, \hat{\phi})$ is an appropriate source description.
Example source: s1 = personalInfo(name, birthdate, city, state, workplace)
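The triple (s, m, φ) can be pictured as a small data structure. The following is a minimal sketch, not the authors' implementation: the class names, the node identifiers such as `Organization1.name`, and the `worksFor` property are hypothetical and chosen only for illustration. It uses the source s4 = getEmployees(employer, employee) from the example slide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Source:
    name: str
    attributes: tuple  # Attributes(s)


@dataclass
class SemanticModel:
    class_nodes: set = field(default_factory=set)
    data_nodes: set = field(default_factory=set)
    # Each link is (source_node, property, target_node).
    links: set = field(default_factory=set)

    def nodes(self):
        # Nodes(m): class nodes together with data nodes.
        return self.class_nodes | self.data_nodes


@dataclass
class SourceDescription:
    source: Source
    model: SemanticModel
    phi: dict  # attribute name -> node of the model

    def is_valid(self):
        # Every attribute must be mapped to a node that exists in the model.
        return all(
            a in self.phi and self.phi[a] in self.model.nodes()
            for a in self.source.attributes
        )


# Hypothetical description of s4; node and property names are assumptions.
s4 = Source("getEmployees", ("employer", "employee"))
m4 = SemanticModel(
    class_nodes={"Organization1", "Person1"},
    data_nodes={"Organization1.name", "Person1.name"},
    links={
        ("Organization1", "name", "Organization1.name"),
        ("Person1", "name", "Person1.name"),
        ("Person1", "worksFor", "Organization1"),
    },
)
d4 = SourceDescription(
    s4, m4, {"employer": "Organization1.name", "employee": "Person1.name"}
)
```

The learning problem is then to produce the `model` and `phi` fields for a new `Source`, given a collection of such descriptions and the ontology.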

8 Learning Semantic Descriptions
Building a Graph from Known Semantic Models
Semantic Labeling of Source Attributes
Generating Candidate Models
Ranking Source Descriptions

9 Building a Graph from Known Semantic Models

10 Add the known semantic models to the graph.
Traverse the ontology O to find the classes that do not map to any node in the graph but are connected to nodes in the graph through a path in the ontology.
Use the properties defined in O to join the disconnected components.
Assign weights to the links.
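The graph construction might look roughly like the sketch below. It is a simplification under stated assumptions: links are plain (subject, property, object) triples, every supplied ontology link is added without the path search described above, and the two weight values (`w_known`, `w_ontology`) are hypothetical, since the slides do not give the actual weighting scheme. Each link also records which known models it came from.

```python
def build_graph(known_models, ontology_links, w_known=1.0, w_ontology=10.0):
    """Merge known semantic models and ontology links into one weighted graph.

    known_models: dict mapping a model id to an iterable of
        (subject, property, object) links.
    ontology_links: iterable of (subject, property, object) links
        permitted by the ontology O.
    Returns a dict: link -> {"weight": float, "models": set of model ids}.
    """
    graph = {}
    # Step 1: add the links of the known models, tagging each with its model id.
    for model_id, links in known_models.items():
        for link in links:
            entry = graph.setdefault(link, {"weight": w_known, "models": set()})
            entry["models"].add(model_id)
    # Steps 2-3 (simplified): add ontology links that no known model uses.
    # They get a higher weight (an assumption) and no model ids.
    for link in ontology_links:
        graph.setdefault(link, {"weight": w_ontology, "models": set()})
    return graph


# Hypothetical input; class and property names are illustrative only.
known = {
    "m2": [("City", "state", "State")],
    "m5": [("City", "state", "State"), ("City", "postalCode", "Zipcode")],
}
ontology = [("Person", "bornIn", "City"), ("City", "state", "State")]
graph = build_graph(known, ontology)
```

Recording the model identifiers on each link is what later makes it possible to judge how coherent a candidate model is.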

11 Semantic Labeling of Source Attributes
Label the source attributes with semantic types:
Use a class as the semantic type for attributes whose values are URIs for instances of that class.
Use a data property/domain pair as the semantic type for attributes containing literal values, e.g. employer: <Organization, name>.
The labels are assigned by a supervised machine learning technique.
Example: Attributes(s4) = {employer, employee}, Labels(s4) = {<Organization, name>, <Person, name>}.

12 Generating Candidate Models
Map the semantic types to the nodes of the graph.
Good patterns: popular, coherent.
Compute the minimal tree that connects those nodes.
A blocking method eliminates mappings.
Different patterns: high weight.
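Computing a minimal tree that connects a chosen set of nodes is the Steiner tree problem. The sketch below uses a standard shortest-path heuristic (grow the tree by repeatedly attaching the nearest unconnected terminal); it is not necessarily the algorithm used in this work. It treats the graph as undirected, which is an assumption, and the node names and weights in the example are hypothetical.

```python
import heapq


def shortest_paths(adj, sources):
    """Multi-source Dijkstra: distance and parent pointers from the nearest source."""
    dist = {s: 0.0 for s in sources}
    parent = {s: None for s in sources}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent


def approximate_steiner_tree(edges, terminals):
    """edges: {(u, v): weight}, undirected. Returns a set of frozenset({u, v}) edges."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    terminals = list(terminals)
    tree_nodes = {terminals[0]}
    tree_edges = set()
    remaining = set(terminals) - tree_nodes
    while remaining:
        dist, parent = shortest_paths(adj, tree_nodes)
        reachable = [t for t in remaining if t in dist]
        if not reachable:
            raise ValueError("terminals are not connected")
        # Attach the terminal closest to the current tree along its shortest path.
        node = min(reachable, key=lambda t: (dist[t], t))
        while parent[node] is not None:
            tree_edges.add(frozenset((node, parent[node])))
            tree_nodes.add(node)
            node = parent[node]
        remaining -= tree_nodes
    return tree_edges


# Hypothetical graph: low weights for links seen in known models, high otherwise.
edges = {
    ("Person", "Person.name"): 1,
    ("Organization", "Organization.name"): 1,
    ("Person", "Organization"): 1,
    ("Person", "City"): 10,
    ("City", "Organization"): 10,
}
tree = approximate_steiner_tree(edges, ["Person.name", "Organization.name"])
```

Here the two terminals are the nodes that the semantic types <Person, name> and <Organization, name> were mapped to, and the resulting tree connects them through the Person and Organization class nodes.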

13 Ranking Source Descriptions
Candidates: $(\hat{s}, \hat{m}, \hat{\phi})$
Cost: the sum of the link weights, $\sum_{e \in \hat{m}} weight(e)$.
Coherence: $I = (\langle x_1, y_1 \rangle, \dots, \langle x_n, y_n \rangle)$, where $x_i$ is the size of a group of links sharing a model identifier and $y_i$ is the number of model identifiers shared.
Examples: I = {3, 1}, I = {3, 1}, I = {3, 2}, I = {3, 1}
Ordering: with $z_i = \langle x_i, y_i \rangle$, $z_i > z_j$ if $(x_i > x_j) \lor (x_i = x_j \land y_i > y_j)$.
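The cost and coherence measures can be sketched in a few lines. Several details here are assumptions, because the slide does not spell them out: links are grouped by their exact set of model identifiers, the coherence list is sorted with the largest pair first, candidates are compared on coherence first, and cost is only a tie-breaker. The pair ordering itself matches Python's built-in tuple comparison, which compares x first and then y.

```python
from collections import Counter


def cost(candidate):
    # Sum of the link weights of the candidate model.
    return sum(link["weight"] for link in candidate)


def coherence(candidate):
    # Group links by the set of model ids they share (an assumed grouping rule).
    # Each group yields <x, y>: x links in the group, y shared model ids.
    groups = Counter(
        frozenset(link["models"]) for link in candidate if link["models"]
    )
    return sorted(((size, len(ids)) for ids, size in groups.items()), reverse=True)


def rank(candidates):
    def key(candidate):
        # Negate the pairs so that more coherent candidates sort first;
        # lower cost breaks ties (an assumed combination of the two measures).
        negated = tuple((-x, -y) for x, y in coherence(candidate))
        return (negated, cost(candidate))

    return sorted(candidates, key=key)


def link(weight, *model_ids):
    # Helper for building hypothetical links.
    return {"weight": weight, "models": set(model_ids)}


a = [link(1, "m1"), link(1, "m1"), link(1, "m1")]  # I = (<3, 1>)
b = [link(1, "m1", "m2"), link(1, "m1", "m2"), link(1, "m1", "m2")]  # I = (<3, 2>)
c = [link(1, "m1"), link(1, "m1"), link(10)]  # I = (<2, 1>), one ontology link
ranked = rank([a, c, b])
```

With these inputs, `b` ranks ahead of `a` because its three links are shared by two known models rather than one, and `c` comes last because only two of its links come from a known model.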

14 Evaluation
17 data sources containing overlapping data.
Gold standard: source descriptions created manually for them using the DBpedia, FOAF, GeoNames, and WGS84 ontologies.
A description is learned for each data source, with the 16 other data sources as training data.
Measure: graph edit distance (GED) between the top-ranked description and the manually created one (given the correct semantic type for each attribute).
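To give a feel for the measure, the sketch below computes a simplified edit distance between two models. It assumes that nodes in both models can be matched directly by their labels, so the distance reduces to counting the nodes and links that appear in only one of the two models. General graph edit distance does not have this shortcut, and the evaluation may have computed it differently. The class and property names are hypothetical.

```python
def labeled_edit_distance(model_a, model_b):
    """Count node and link insertions/deletions between two labeled models.

    Each model is a pair (nodes, links), where links are
    (subject, property, object) triples. Assumes nodes match by label.
    """
    nodes_a, links_a = model_a
    nodes_b, links_b = model_b
    # Symmetric difference: elements present in exactly one of the models.
    return len(set(nodes_a) ^ set(nodes_b)) + len(set(links_a) ^ set(links_b))


gold = (
    {"Person", "Organization", "City"},
    {("Person", "worksFor", "Organization"), ("Organization", "location", "City")},
)
learned = (
    {"Person", "Organization", "City"},
    {("Person", "worksFor", "Organization"), ("Person", "bornIn", "City")},
)
distance = labeled_edit_distance(gold, learned)
```

In this example the learned model has one wrong link and misses one correct link, so turning it into the gold model takes one deletion and one insertion.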

15 Compare with the results of Karma (a data integration tool that allows users to semi-automatically create source descriptions for sources and services).

16 Conclusion
We presented a novel approach to automatically learn the semantic description of a new source, given a set of known semantic descriptions as the training set and the domain ontology as the background knowledge. These precise descriptions of data sources make it possible to automatically integrate the data across sources and provide rich support for source discovery. We plan to investigate the idea of creating a more compact graph by consolidating the overlapping segments of the known semantic models.

17 Thanks.

