A Graph-Based Approach to Learn Semantic Descriptions of Data Sources

Presentation transcript:

A Graph-Based Approach to Learn Semantic Descriptions of Data Sources
Mohsen Taheriyan, Craig A. Knoblock, Pedro Szekely, and José Luis Ambite
University of Southern California, Information Sciences Institute and Department of Computer Science

Outline
- Introduction
- An example
- Problem formulation
- Learning Semantic Descriptions
- Evaluation
- Conclusion

Introduction
- Data source integration requires:
  - building a global model
  - constructing source descriptions that specify mappings between the sources and the global model
- Source descriptions take the form of:
  - global-as-view or local-as-view descriptions
  - a semantic model
- Building a source description means characterizing the source in terms of the concepts and properties in the domain ontology:
  - determine the semantic types of the attributes
  - determine the relationships between attributes in terms of properties in the ontology

In this work… Data sources in the same domain usually provide similar or overlapping data. It should be possible to exploit knowledge of previously modeled sources to learn descriptions for new sources. We can leverage the knowledge of previously described sources to limit the search space and get some hints to hypothesize more plausible candidates.

An example
s1 = personalInfo(name, birthdate, city, state, workplace)
s2 = getCities(state, city)
s3 = businessInfo(company, ceo, city, state)
s4 = getEmployees(employer, employee)
s5 = postalCodeLookup(zipcode, city, state)

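Problem formulation
- A source $s(a_1, \dots, a_n)$ has the attribute set $Attributes(s) = \{a_1, \dots, a_n\}$.
- A semantic model $m$ consists of class nodes and data nodes.
- An attribute mapping function $\phi : Attributes(s) \to Nodes(m)$ maps each attribute to a node of the model.
- A source description is a triple $(s, m, \phi)$.
- Input: a domain ontology $O$, a set of source descriptions $S = \{(s_1, m_1, \phi_1), \dots, (s_k, m_k, \phi_k)\}$, and a new source $\hat{s}$.
- Output: $\hat{m}$ and $\hat{\phi}$ such that $(\hat{s}, \hat{m}, \hat{\phi})$ is an appropriate source description.
- Example source: s1 = personalInfo(name, birthdate, city, state, workplace)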
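The triple $(s, m, \phi)$ can be pictured as a small data structure. The following is a minimal sketch, not the authors' implementation: the class names, the `links` field, the node naming scheme `Class.property`, and the `worksFor` property are assumptions made for illustration. Only the attribute names of s4 and its semantic types come from the slides.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    attributes: tuple  # Attributes(s)


@dataclass
class SemanticModel:
    class_nodes: set
    data_nodes: set
    links: set  # (source node, property, target node); assumed representation

    def nodes(self):
        return self.class_nodes | self.data_nodes


@dataclass
class SourceDescription:
    source: Source
    model: SemanticModel
    phi: dict  # attribute -> node of the model

    def is_valid(self):
        # phi must cover every attribute and map only to nodes of the model
        return (set(self.phi) == set(self.source.attributes)
                and set(self.phi.values()) <= self.model.nodes())


s4 = Source("getEmployees", ("employer", "employee"))
m4 = SemanticModel(
    class_nodes={"Organization", "Person"},
    data_nodes={"Organization.name", "Person.name"},
    links={
        ("Organization", "name", "Organization.name"),
        ("Person", "name", "Person.name"),
        ("Person", "worksFor", "Organization"),  # hypothetical property
    },
)
desc = SourceDescription(
    s4, m4, {"employer": "Organization.name", "employee": "Person.name"}
)
```

Calling `desc.is_valid()` checks that every attribute of the source is mapped and that each mapping target is a node of the model.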

Learning Semantic Descriptions
1. Building a Graph from Known Semantic Models
2. Semantic Labeling of Source Attributes
3. Generating Candidate Models
4. Ranking Source Descriptions

Building a Graph from Known Semantic Models

- Add the known semantic models to the graph.
- Traverse the ontology O to find the classes that do not map to any node in the graph but are connected to graph nodes through a path in the ontology.
- Use the properties defined in O to join the disconnected components.
- Assign weights to the links.
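The graph-building steps can be sketched roughly as follows. This is a simplified illustration under several assumptions: links are `(source, property, target)` tuples, only ontology properties that touch an existing graph node are added (a one-hop stand-in for the path traversal), and the two weight values are arbitrary placeholders chosen so that links seen in known models are cheaper than links inferred from the ontology. The property names in the example are hypothetical.

```python
def build_graph(known_models, ontology_properties,
                model_weight=1.0, ontology_weight=10.0):
    """Merge known semantic models into one weighted graph.

    known_models: dict mapping a model identifier to a list of links.
    ontology_properties: iterable of (domain, property, range) tuples.
    Returns a dict: link -> {"weight": float, "models": set of identifiers}.
    """
    graph = {}
    # Step 1: add the known semantic models, tagging links with model ids.
    for model_id, links in known_models.items():
        for link in links:
            entry = graph.setdefault(
                link, {"weight": model_weight, "models": set()})
            entry["models"].add(model_id)

    # Steps 2-3 (simplified): add ontology properties that touch a node
    # already in the graph, so that separate components can be joined.
    nodes = {n for (src, _, dst) in graph for n in (src, dst)}
    for src, prop, dst in ontology_properties:
        link = (src, prop, dst)
        if link not in graph and (src in nodes or dst in nodes):
            # Step 4: inferred links get the higher (assumed) weight.
            graph[link] = {"weight": ontology_weight, "models": set()}
    return graph


known = {
    "s1": [("Person", "bornIn", "City"), ("City", "state", "State")],
    "s3": [("Organization", "location", "City"), ("City", "state", "State")],
}
ontology = [
    ("Person", "worksFor", "Organization"),
    ("Country", "capital", "Town"),  # touches no graph node, so it is skipped
]
graph = build_graph(known, ontology)
```

Keeping the set of model identifiers on each link is what later makes it possible to measure how coherent a candidate model is.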

Semantic Labeling of Source Attributes
- Label the source attributes with semantic types.
  - Use a class as the semantic type for attributes whose values are URIs for instances of that class.
  - Use a data property/domain pair as the semantic type for attributes containing literal values, e.g. employer: <Organization, name>.
- Labels are assigned with a supervised machine learning technique.
- Example: Attributes(s4) = {employer, employee}, Labels(s4) = {<Organization, name>, <Person, name>}.

Generating Candidate Models
- Map the semantic types to the nodes of the graph.
- Good patterns are popular and coherent.
- Compute the minimal tree that connects those nodes.
- Use a blocking method to eliminate mappings.
- Different patterns: high weight.
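Computing a minimal tree that connects a chosen set of nodes is a Steiner tree problem. The slides do not say which algorithm is used, so the sketch below swaps in a common greedy heuristic: start from one terminal and repeatedly attach the nearest remaining terminal via a shortest path. It treats links as undirected and yields an approximation, not a guaranteed minimum. The function names and the example weights are assumptions.

```python
import heapq


def nearest_terminal_path(adj, tree_nodes, remaining):
    """Dijkstra from all tree nodes to the closest node in `remaining`."""
    dist = {n: 0.0 for n in tree_nodes}
    prev = {}
    heap = [(0.0, n) for n in sorted(tree_nodes)]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        if u in remaining:
            path = []
            while u in prev:  # walk back to the tree
                p, w = prev[u]
                path.append((p, u, w))
                u = p
            return path
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                prev[v] = (u, w)
                heapq.heappush(heap, (d + w, v))
    return []  # no remaining terminal is reachable


def approximate_minimal_tree(edges, terminals):
    """Greedy shortest-path heuristic for a tree spanning the terminals."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    terminals = list(terminals)
    tree_nodes = {terminals[0]}
    tree_edges = []
    remaining = set(terminals[1:]) - tree_nodes
    while remaining:
        path = nearest_terminal_path(adj, tree_nodes, remaining)
        if not path:
            raise ValueError("terminals are not connected")
        tree_edges.extend(path)
        for u, v, _ in path:
            tree_nodes.update((u, v))
        remaining -= tree_nodes
    return tree_edges


edges = [
    ("Person", "Organization", 10),  # e.g. a link inferred from the ontology
    ("Person", "City", 1),
    ("Organization", "City", 1),
    ("City", "State", 1),
]
tree = approximate_minimal_tree(edges, ["Person", "Organization", "State"])
tree_cost = sum(w for _, _, w in tree)
```

In this example the tree routes through City and avoids the expensive direct Person to Organization link, which is the effect that low weights on links from known models are meant to produce.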

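Ranking Source Descriptions
- Candidates: $(\hat{s}, \hat{m}, \hat{\phi})$
- Cost: the sum of the link weights, $\sum_{e \in \hat{m}} weight(e)$.
- Coherence: $I = (\langle x_1, y_1 \rangle, \dots, \langle x_n, y_n \rangle)$, where $x_i$ is the size of a group of links sharing a model identifier and $y_i$ is the number of model identifiers shared.
- Example values from the slide: I = {3, 1}, I = {3, 1}, I = {3, 2}, I = {3, 1}
- Ordering of pairs $z_i = \langle x_i, y_i \rangle$: $z_i > z_j$ if $(x_i > x_j) \lor (x_i = x_j \land y_i > y_j)$.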
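The two ranking criteria can be sketched as below. Several points are assumptions rather than statements from the slides: links are grouped by the exact set of model identifiers they carry, the coherence list is sorted in descending order and compared lexicographically, and coherence is compared first with cost as the tie-breaker. The pair ordering itself matches Python's built-in tuple comparison, which is why plain tuples are used.

```python
from collections import Counter


def cost(links):
    """Sum of the link weights of a candidate model."""
    return sum(link["weight"] for link in links)


def coherence(links):
    """List of (group size, number of shared model ids), largest first."""
    groups = Counter(
        frozenset(link["models"]) for link in links if link["models"])
    return sorted(((size, len(ids)) for ids, size in groups.items()),
                  reverse=True)


def rank(candidates):
    """Most coherent first; lower cost breaks ties (assumed order)."""
    by_cost = sorted(candidates, key=lambda c: cost(c["links"]))
    # sorted() is stable, even with reverse=True, so the cost order
    # survives among candidates with equal coherence.
    return sorted(by_cost, key=lambda c: coherence(c["links"]), reverse=True)


def link(weight, *models):
    return {"weight": weight, "models": set(models)}


candidates = [
    {"name": "A", "links": [link(1, "m1"), link(1, "m1"), link(1, "m1")]},
    {"name": "B", "links": [link(1, "m1", "m2")] * 3},
    {"name": "C", "links": [link(1, "m1"), link(1, "m1"), link(10)]},
]
ranking = [c["name"] for c in rank(candidates)]
```

Here B has coherence [(3, 2)], A has [(3, 1)], and C has [(2, 1)] plus one expensive link that appears in no known model, so the ranking is B, A, C.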

Evaluation
- 17 data sources containing overlapping data.
- Gold standard: source descriptions created manually for them using the DBpedia, FOAF, GeoNames, and WGS84 ontologies.
- A description is learned for each data source, with the 16 other data sources as training data.
- Measure: graph edit distance (GED) between the top-ranked description and the manually created one (given the correct semantic type for each attribute).
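As a rough illustration of the measure, the sketch below computes an edit distance between two models under a strong simplifying assumption: nodes are identified by their labels, so the distance reduces to counting the node and link insertions and deletions needed to turn one graph into the other. General graph edit distance does not assume a known node correspondence and is much harder to compute. The function name and example graphs are hypothetical.

```python
def labeled_graph_edit_distance(nodes_a, links_a, nodes_b, links_b):
    """Count node and link insertions/deletions between two labeled graphs.

    Assumes node labels are unique and shared across both graphs.
    """
    node_edits = len(set(nodes_a) ^ set(nodes_b))
    link_edits = len(set(links_a) ^ set(links_b))
    return node_edits + link_edits


learned_nodes = {"Person", "Organization", "City"}
learned_links = {("Person", "worksFor", "Organization"),
                 ("Person", "bornIn", "City")}

gold_nodes = {"Person", "Organization", "City"}
gold_links = {("Person", "worksFor", "Organization"),
              ("Organization", "location", "City")}

distance = labeled_graph_edit_distance(
    learned_nodes, learned_links, gold_nodes, gold_links)
```

The learned model differs from the gold one by a single link, which costs one deletion plus one insertion.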

The results are compared with those of Karma, a data integration tool that allows users to semi-automatically create source descriptions for sources and services.

Conclusion
- We presented a novel approach to automatically learn the semantic description of a new source, given a set of known semantic descriptions as the training set and the domain ontology as the background knowledge.
- These precise descriptions of data sources make it possible to automatically integrate data across sources and provide rich support for source discovery.
- Future work: we plan to investigate creating a more compact graph by consolidating the overlapping segments of the known semantic models.

Thanks.