Learning to Match Ontologies on the Semantic Web. AnHai Doan, Jayant Madhavan, Robin Dhamankar, Pedro Domingos, Alon Halevy.

Presentation transcript:


GLUE: identifies mappings between websites; uses machine learning; uses common-sense knowledge; domain constraints.

Motivation: data comes from different ontologies; answers come from multiple web pages; manual matching is very tedious, error-prone, and does not scale well.

Outline: Overview of GLUE; GLUE Architecture; Case Studies; CGLUE; Case Studies; Conclusion; Assessment.

Overview: assumes two ontologies; 1-1 matching; similarity between two concepts, computed from the joint distribution P(A,B), P(A,~B), P(~A,B), P(~A,~B); machine learning: multistrategy learning, exploiting domain constraints, data instances.

Overview (architecture figure): Taxonomy O1 and Taxonomy O2 -> base learners L1, ..., Lk -> Meta Learner M -> joint distributions -> Similarity Estimator (with a similarity function) -> similarity matrix -> Relaxation Labeler (with common knowledge and domain constraints) -> mappings for taxonomies.

Distribution Estimator (figure): Taxonomy O1 and Taxonomy O2 -> Base Learner L1, ..., Base Learner Lk -> Meta Learner M -> joint distributions.

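Distribution Estimator (figure): a taxonomy with nodes R, A, C, D, E, F and instances t1-t7; the instances are split into {t1, t2, t3, t4} and {t5, t6, t7} to train learner L.

Distribution Estimator (figure): a taxonomy with nodes G, B, H, I, J and instances s1-s6, split into {s1, s2, s3, s4} and {s5, s6}; the trained learner L is applied to these instances.

Distribution Estimator (figure): the instances end up in four groups: {s1, s3}, {s5}, {s6}, {s2, s4}.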
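The counting step behind the Distribution Estimator slides can be sketched in Python. This is a minimal illustration, not the authors' implementation: it assumes the four joint probabilities are estimated as the fraction of instances falling into each group, that the {t1..t4} and {s1..s4} splits are the known members of concepts A and B, and that a trained learner supplies the missing membership for the other taxonomy. The function name `estimate_joint` and the set `predicts_b` are hypothetical; `predicts_a` follows one possible reading of the last figure.

```python
from collections import Counter

def estimate_joint(u1, u2, in_a, in_b, predicts_a, predicts_b):
    """Estimate P(A,B), P(A,~B), P(~A,B), P(~A,~B) by counting instances.

    u1, u2      : instances of the first and second taxonomy
    in_a, in_b  : known members of concept A (in u1) and concept B (in u2)
    predicts_a  : u2 instances that a learner trained on A classifies as A
    predicts_b  : u1 instances that a learner trained on B classifies as B
    """
    counts = Counter()
    for t in u1:
        counts[(t in in_a, t in predicts_b)] += 1
    for s in u2:
        counts[(s in predicts_a, s in in_b)] += 1
    total = len(u1) + len(u2)
    keys = [(True, True), (True, False), (False, True), (False, False)]
    return {key: counts[key] / total for key in keys}

u1 = ["t1", "t2", "t3", "t4", "t5", "t6", "t7"]
u2 = ["s1", "s2", "s3", "s4", "s5", "s6"]
in_a = {"t1", "t2", "t3", "t4"}
in_b = {"s1", "s2", "s3", "s4"}
predicts_a = {"s1", "s3", "s5"}   # assumed reading of the figure
predicts_b = {"t1", "t2", "t5"}   # hypothetical learner output

joint = estimate_joint(u1, u2, in_a, in_b, predicts_a, predicts_b)
print(joint[(True, True)])  # estimated P(A,B)
```

The keys of the returned dictionary are (in A, in B) pairs, so `joint[(True, False)]` is the estimate of P(A,~B).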

Multistrategy Learning: base learners: Content Learner (word frequency, Naïve Bayes); Name Learner (full name; specific and descriptive elements).

MetaLearner: combines the base learners; gives each learner a weight; weights come from user input.
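A rough Python sketch of the two ideas on these slides: a word-frequency Naïve Bayes content learner and a meta-learner that combines base-learner predictions with user-supplied weights. Everything here is an assumption made for illustration: the class and function names, the Laplace smoothing, the toy training texts, the hand-written name-learner scores, and the 0.6/0.4 weights. The weighted sum is one simple way to "combine the base learners".

```python
import math
from collections import Counter, defaultdict

class NaiveBayesContentLearner:
    """Multinomial Naive Bayes over whitespace tokens (hypothetical sketch)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(docs, labels):
            tokens = text.lower().split()
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict_proba(self, text):
        tokens = text.lower().split()
        n_docs = sum(self.class_counts.values())
        log_scores = {}
        for label, n in self.class_counts.items():
            total = sum(self.token_counts[label].values())
            score = math.log(n / n_docs)  # class prior
            for tok in tokens:
                # Laplace-smoothed token likelihood
                count = self.token_counts[label][tok] + 1
                score += math.log(count / (total + len(self.vocab)))
            log_scores[label] = score
        # Normalize in a numerically stable way
        top = max(log_scores.values())
        exp = {k: math.exp(v - top) for k, v in log_scores.items()}
        z = sum(exp.values())
        return {k: v / z for k, v in exp.items()}

def meta_combine(predictions, weights):
    """Weighted sum of base-learner confidence scores, per label."""
    labels = set().union(*(p.keys() for p in predictions.values()))
    return {
        label: sum(weights[name] * predictions[name].get(label, 0.0)
                   for name in predictions)
        for label in labels
    }

docs = ["database systems query", "query optimization database",
        "medieval history europe", "history of art"]
labels = ["CS", "CS", "History", "History"]
content = NaiveBayesContentLearner().fit(docs, labels)

content_pred = content.predict_proba("database query")
name_pred = {"CS": 0.3, "History": 0.7}   # hypothetical name-learner output
combined = meta_combine({"content": content_pred, "name": name_pred},
                        {"content": 0.6, "name": 0.4})
print(combined)
```

Because both base learners return scores that sum to 1 and the weights sum to 1, the combined scores also sum to 1.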

Similarity Estimator (figure): joint distributions + similarity function -> Similarity Estimator -> similarity matrix.

Similarity Estimator: applies a function supplied by the user (e.g. Jaccard-sim); outputs a similarity matrix between concepts.
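As an illustration of the similarity step, the Jaccard coefficient can be computed directly from three of the four joint probabilities: P(A,B) divided by the probability of the union of A and B. The sketch below is a minimal Python version; the function names, the concept names, and the probability values are made up for the example.

```python
def jaccard_sim(p_ab, p_a_notb, p_nota_b):
    """Jaccard-sim(A, B) = P(A,B) / (P(A,B) + P(A,~B) + P(~A,B))."""
    denom = p_ab + p_a_notb + p_nota_b
    return p_ab / denom if denom > 0 else 0.0

def similarity_matrix(joint):
    """joint maps (concept_a, concept_b) -> (P(A,B), P(A,~B), P(~A,B))."""
    return {pair: jaccard_sim(*probs) for pair, probs in joint.items()}

# Hypothetical joint-distribution estimates for two concept pairs
joint_estimates = {
    ("Courses", "Classes"): (0.4, 0.05, 0.05),
    ("Courses", "Staff"): (0.0, 0.45, 0.3),
}
sim_matrix = similarity_matrix(joint_estimates)
print(sim_matrix)
```

P(~A,~B) is not needed for this particular measure; a different user-supplied similarity function could use all four values.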

Where are we? Find similarities; compute similarities; satisfy constraints.

Relaxation Labeler (figure): similarity matrix + common knowledge + domain constraints -> Relaxation Labeler -> mappings for taxonomies.

Constraints: domain-independent (general knowledge); domain-dependent (interaction between two nodes); model each as a feature f().

Domain Independent

Relaxation Labeler: searches for the best mapping given the constraints; a node's label is influenced by its “neighborhood”; performs local optimization.

Local Optimization: 1. Assign initial labels. 2. Perform optimization. 3. Use a formula to change a label. 4. Repeat steps 2-3.
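The four steps above can be sketched as a simple iterative relabeling loop in Python. This is a simplified stand-in, not the update formula from the slides: it assumes each node greedily takes the label with the best score, where the score is the similarity plus a fixed bonus when a hypothetical parent-child constraint is satisfied. The names `relax_labels`, `o1_parent`, `o2_parent`, the concept names, and the 0.2 bonus are all invented for the example.

```python
def relax_labels(similarity, score_fn, max_iters=20):
    """similarity: {node: {label: sim}}; returns {node: label}."""
    # Step 1: initial labels are the most similar label for each node
    assignment = {node: max(sims, key=sims.get)
                  for node, sims in similarity.items()}
    for _ in range(max_iters):              # Step 4: repeat
        changed = False
        for node, sims in similarity.items():
            # Steps 2-3: re-score every label given the current neighborhood
            best = max(sims, key=lambda label: score_fn(
                node, label, assignment, sims[label]))
            if best != assignment[node]:
                assignment[node] = best
                changed = True
        if not changed:                      # stop at a local optimum
            break
    return assignment

# Hypothetical taxonomies: child -> parent
o1_parent = {"Grad-Courses": "Courses"}
o2_parent = {"Graduate-Classes": "Classes"}

def score(node, label, assignment, sim):
    """Bonus if the label's parent is the label assigned to the node's parent."""
    parent = o1_parent.get(node)
    if parent is not None and o2_parent.get(label) == assignment[parent]:
        return sim + 0.2
    return sim

similarity = {
    "Courses": {"Classes": 0.9, "Graduate-Classes": 0.3, "Staff": 0.1},
    "Grad-Courses": {"Classes": 0.5, "Graduate-Classes": 0.45, "Staff": 0.1},
}
mapping = relax_labels(similarity, score)
print(mapping)
```

In this toy run, Grad-Courses starts out mapped to Classes by similarity alone and is moved to Graduate-Classes once the neighborhood constraint is taken into account.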

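Local Optimization (legend for the label-update formula; the formula itself is not in the transcript): X is a node in taxonomy O1; the label is a node in taxonomy O2; the conditioning term is everything we know; the remaining term covers the other label assignments to all nodes besides X.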

Local Optimization

Where are we? (architecture figure): Taxonomy O1 and Taxonomy O2 -> base learners L1, ..., Lk -> Meta Learner M -> joint distributions -> Similarity Estimator (with a similarity function) -> similarity matrix -> Relaxation Labeler (with common knowledge and domain constraints) -> mappings for taxonomies.

Case Study: university catalogs; business profiles. For each one: took the entire set of data instances; cleaned it up.

Results

Improvements: insufficient training data; local optimization; additional base learners; ambiguous best match.

CGLUE

Beam Search: uses structure and data; no relaxation labeling (no constraints).
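A hedged Python sketch of how a beam search over candidate mappings might look. It assumes, for illustration only, that a complex mapping is a union of concepts from the second taxonomy and that candidates are scored by Jaccard overlap of instance sets; the slides do not give these details. The function names, concept names, and instance ids are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two instance sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def beam_search_union(target, concepts, beam_width=2, max_size=3):
    """Find a union of concepts whose instances best match `target`."""
    def score(candidate):
        merged = set().union(*(concepts[name] for name in candidate))
        return jaccard(target, merged)

    def rank(candidate):
        # Highest score first; break ties by name for determinism
        return (-score(candidate), sorted(candidate))

    # Start from single concepts and keep the best `beam_width`
    beam = sorted((frozenset([name]) for name in concepts), key=rank)
    beam = beam[:beam_width]
    best = beam[0]
    for _ in range(max_size - 1):
        # Expand every candidate in the beam by one more concept
        expanded = {cand | {name}
                    for cand in beam for name in concepts
                    if name not in cand}
        if not expanded:
            break
        beam = sorted(expanded, key=rank)[:beam_width]
        if score(beam[0]) <= score(best):   # no improvement: stop
            break
        best = beam[0]
    return best, score(best)

# Hypothetical instance ids for one target concept and three candidates
target = {1, 2, 3, 4, 5, 6}
concepts = {
    "Undergrad-Courses": {1, 2, 3},
    "Grad-Courses": {4, 5, 6},
    "Staff": {7, 8},
}
best_union, best_score = beam_search_union(target, concepts)
print(sorted(best_union), best_score)
```

The beam width trades search cost against the risk of discarding a candidate that would only pay off after a later expansion.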

CGLUE Case Study

Improvements: incorporate domain constraints; object identification.

Conclusion: semantic similarity; multistrategy learning; relaxation labeling; CGLUE.

Assessment: data instances; additional sites?; CGLUE; future work.