Multifaceted Exploitation of Metadata for Attribute Match Discovery in Information Integration Li Xu David W. Embley David Jackman.

Slides:



Advertisements
Similar presentations
Methods of Enumeration Counting tools can be very important in probability … particularly if you have a finite sample space with equally likely outcomes.
Advertisements

IMAP: Discovering Complex Semantic Matches Between Database Schemas Ohad Edry January 2009 Seminar in Databases.
Arc-length computation and arc-length parameterization
Change Detection C. Stauffer and W.E.L. Grimson, “Learning patterns of activity using real time tracking,” IEEE Trans. On PAMI, 22(8): , Aug 2000.
Schema Matching and Data Extraction over HTML Tables Cui Tao Data Extraction Research Group Department of Computer Science Brigham Young University supported.
So What Does it All Mean? Geospatial Semantics and Ontologies Dr Kristin Stock.
Leveraging Data and Structure in Ontology Integration Octavian Udrea 1 Lise Getoor 1 Renée J. Miller 2 1 University of Maryland College Park 2 University.
Ontologies - Design principles Cartic Ramakrishnan LSDIS Lab University of Georgia.
Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 3 of Data Mining by I. H. Witten, E. Frank and M. A. Hall.
CSCI 347 / CS 4206: Data Mining Module 02: Input Topic 03: Attribute Characteristics.
USC Graduate Student DayColumbia, SCMarch 2006 Presented by: Jingshan Huang Computer Science & Engineering Department University of South Carolina PhD.
Distributed DBMS© M. T. Özsu & P. Valduriez Ch.4/1 Outline Introduction Background Distributed Database Design Database Integration ➡ Schema Matching ➡
Reducing the Cost of Validating Mapping Compositions by Exploiting Semantic Relationships Eduard C. Dragut Ramon Lawrence Eduard C. Dragut Ramon Lawrence.
Schema Matching and Data Extraction over HTML Tables Cui Tao Data Extraction Research Group Department of Computer Science Brigham Young University supported.
Affinity-based Schema Matching Silvana Castano Università di Milano D2I –– Modena, 27 aprile 2001.
Schema Matching Helen Chen CS652 Project 3 06/14/2002.
Multifaceted Exploitation of Metadata for Attribute Match Discovery in Information Integration David W. Embley David Jackman Li Xu.
BYU 2003BYU Data Extraction Group Combining the Best of Global-as-View and Local-as-View for Data Integration Li Xu Brigham Young University Funded by.
Direct and Indirect Matching of Schema Elements for Data Integration on the Web Li Xu Data Extraction Group Brigham Young University Sponsored by NSF.
Recognizing Ontology-Applicable Multiple-Record Web Documents David W. Embley Dennis Ng Li Xu Brigham Young University.
BYU 2003BYU Data Extraction Group Automating Schema Matching David W. Embley, Cui Tao, Li Xu Brigham Young University Funded by NSF.
ER 2002BYU Data Extraction Group Automatically Extracting Ontologically Specified Data from HTML Tables with Unknown Structure David W. Embley, Cui Tao,
From OSM-L to JAVA Cui Tao Yihong Ding. Overview of OSM.
DASFAA 2003BYU Data Extraction Group Discovering Direct and Indirect Matches for Schema Elements Li Xu and David W. Embley Brigham Young University Funded.
UFMG, June 2002BYU Data Extraction Group Automating Schema Matching for Data Integration David W. Embley Brigham Young University Funded by NSF.
A Probabilistic Classifier for Table Visual Analysis William Silversmith TANGO Research Project NSF Grant # and Greetings Prof. Embley!
Filtering Multiple-Record Web Documents Based on Application Ontologies Presenter: L. Xu Advisor: D.W.Embley.
By ANDREW ZITZELBERGER A Framework for Extraction Ontology Based Information Management.
Scheme Matching and Data Extraction over HTML Tables from Heterogeneous Sources Cui Tao March, 2002 Founded by NSF.
Discovering Direct and Indirect Matches for Schema Elements Li Xu Data Extraction Group Brigham Young University Sponsored by NSF.
BYU Data Extraction Group Automating Schema Matching David W. Embley, Cui Tao, Li Xu Brigham Young University Funded by NSF.
1 A Tool to Support Ontology Creation Based on Incremental Mini-ontology Merging Zonghui Lian.
QoM: Qualitative and Quantitative Measure of Schema Matching Naiyana Tansalarak and Kajal T. Claypool (Kajal Claypool - presenter) University of Massachusetts,
fleckvelter gonsity (ld/gg) hepth (gd) burlam falder multon repeat: 1.understand table 2.generate mini-ontology 3.match with growing.
BYU Data Extraction Group Funded by NSF1 Brigham Young University Li Xu Source Discovery and Schema Mapping for Data Integration.
EXAMPLE 1 Apply the distributive property
Feature Selection for Automatic Taxonomy Induction The Features Input: Two terms Output: A numeric score, or. Lexical-Syntactic Patterns Co-occurrence.
Presented to: By: Date: Federal Aviation Administration Enterprise Information Management SOA Brown Bag #2 Sam Ceccola – SOA Architect November 17, 2010.
Predicting Highly Connected Proteins in PIN using QSAR Art Cherkasov Apr 14, 2011 UBC / VGH THE UNIVERSITY OF BRITISH COLUMBIA.
OMAP: An Implemented Framework for Automatically Aligning OWL Ontologies SWAP, December, 2005 Raphaël Troncy, Umberto Straccia ISTI-CNR
Ontology Matching Basics Ontology Matching by Jerome Euzenat and Pavel Shvaiko Parts I and II 11/6/2012Ontology Matching Basics - PL, CS 6521.
Semantic Matching Pavel Shvaiko Stanford University, October 31, 2003 Paper with Fausto Giunchiglia Research group (alphabetically ordered): Fausto Giunchiglia,
Intrusion Detection Jie Lin. Outline Introduction A Frame for Intrusion Detection System Intrusion Detection Techniques Ideas for Improving Intrusion.
IARPA-BAA Question Period: 22 Dec 09 – 2 Feb 10 Proposal Due Date: 16 Feb 10.
Ontology Alignment/Matching Prafulla Palwe. Agenda ► Introduction  Being serious about the semantic web  Living with heterogeneity  Heterogeneity problem.
Automatic Lexical Annotation Applied to the SCARLET Ontology Matcher Laura Po and Sonia Bergamaschi DII, University of Modena and Reggio Emilia, Italy.
Semantic Matching Fausto Giunchiglia work in collaboration with Pavel Shvaiko The Italian-Israeli Forum on Computer Science, Haifa, June 17-18, 2003.
A geo-ontology to support the semantic integration of geoinformation from the National Spatial Data Infrastructure Authors: Paulo J. A. Gimenez, Mastering.
1 Knowledge Discovery Transparencies prepared by Ho Tu Bao [JAIST] ITCS 6162.
Definition of a taxonomy “System for naming and organizing things into groups that share similar characteristics” Taxonomy Architectures Applications.
Revelytix SICoP Presentation DRM 3.0 with WordNet Senses in a Semantic Wiki Michael Lang February 6, 2007.
Describing Data Using Numerical Measures. Topics.
Exploitation of Structural Similarity in Semi-Structured Bioinformatics Data for Efficient Storage Construction Dongkyoo Shin Sejong.
A Declarative Similarity Framework for Knowledge Intensive CBR by Díaz-Agudo and González-Calero Presented by Ida Sofie G Stenerud 25.October 2006.
Interoperable Visualization Framework towards enhancing mapping and integration of official statistics Haitham Zeidan Palestinian Central.
Christoph F. Eick Questions and Topics Review November 11, Discussion of Midterm Exam 2.Assume an association rule if smoke then cancer has a confidence.
A Hybrid Match Algorithm for XML Schemas Ray Dos Santos Aug 21, 2009 K. Claypool, V. Hegde, N. Tansalarak UMass – Lowell - ICDE ‘06.
Data Modelling and Cleaning CMPT 455/826 - Week 8, Day 2 Sept-Dec 2009 – w8d21.
Data Preprocessing Compiled By: Umair Yaqub Lecturer Govt. Murray College Sialkot.
Aim Ability to automate the detection of financial inconsistency and irregularity Problem Need to create a unified and logically rigorous terminology.
1.2 Data Classification Qualitative Data consist of attributes, labels, or non-numerical entries. – Examples are bigger, color, names, etc. Quantitative.
Tuning using Synthetic Workload Summary & Future Work Experimental Results Schema Matching Systems Tuning Schema Matching Systems Formalization of Tuning.
Ontology Engineering and Feature Construction for Predicting Friendship Links in the Live Journal Social Network Author:Vikas Bahirwani 、 Doina Caragea.
Statistical Schema Matching across Web Query Interfaces
The number of times a number or expression (called a base) is used as a factor of repeated multiplication. Also called the power. 5⁴= 5 X 5 X 5 X 5 = 625.
2-4 The Distributive Property
METADATA SERVICES’ NEW CAPACITIES
PRAKASH CHOCKALINGAM, NALIN PRADEEP, AND STAN BIRCHFIELD
Automating Schema Matching for Data Integration
INSTRUCTOR: MRS T.G. ZHOU
Presentation transcript:

Multifaceted Exploitation of Metadata for Attribute Match Discovery in Information Integration Li Xu David W. Embley David Jackman

Background Problem: Attribute matching Techniques Data values Data-dictionary information Structural properties Ontologies Terminological relationships

Approach Target Schema T Source Schema S Framework Individual Facet Matching; Combining Multiple Facets; Iteration.

Example Source Schema S Car Year has 0:1 Make has 0:1 Model has 0:1 Cost Style has 0:1 0:* Year has 0:1 Feature has 0:* Cost has 0:1 Car Mileage has Phone has 0:1 Model has 0:1 Target Schema T Make has 0:1 Miles has 0:1 Year Model Make Year Make Model Car MileageMiles

Individual Facet Matching Terminological relationships Data value characteristics Target-specific, regular-expression matches

Terminological Relationships Names of Attributes T : A S : B WordNet C4.5 Decision Tree Feature selection f0: Same word f1: Synonym f2: Sum of the distances of A and B to a common hypernym root f3: Number of different common hypernym roots of A and B f4: Sum of the number of senses of A and B

WordNet Rule The number of different common hypernym roots of A and B Sum of distances of A and B to a common hypernym The sum of the number of senses of A and B

WordNet Confidences

Data-Value Characteristics C4.5 Decision Tree Features [LC94] Numeric data Mean, variation, coefficient variation, standard deviation; Alphanumeric data String length, numeric ratio, space ratio.

Value-Characteristics Confidences

Expected Data Values Target Schema T Data frame Source Schema S Data instances Hit Ratio = N’/N (A, B) N’ : number of B data instances consistent with specifications of A data frame; N: number of B data instances.

Expected-Values Confidences

Combined Confidences Threshold:

Final Confidences

Experimental Results Matched Attributes 100% (32 of 32); Unmatched Attributes 99.5% (374 of 376); “Feature” ---”Color”; “Feature” ---”Body Type”. F % F2 84% F3 92% F1 98.9% F2 97.9% F3 98.4%

Future Work Additional facets of metadata More sophisticated combinations Additional application domains Automating feature selection

Questions?