Discovering Complex Matchings across Web Query Interfaces: A Correlation Mining Approach Bin He, Kevin Chen-Chuan Chang, Jiawei Han Presented by Dayi Zhou.


Discovering Complex Matchings across Web Query Interfaces: A Correlation Mining Approach Bin He, Kevin Chen-Chuan Chang, Jiawei Han Presented by Dayi Zhou 10/2005

DCM
- Problem: schema matching
  - Complex matchings across different deep web data sources
  - Most existing techniques focus on 1:1 matching
- Solution: a correlation mining approach
- Motivation
  - Grouping attributes -> co-present (positively correlated), e.g. first name, last name
  - Synonyms -> negatively correlated, e.g. departing, from
- Considers both positive and negative correlation
- Instead of matching two schemas at a time, matches all the schemas at the same time

Formal Schema Matching Problem
- Note: a schema is viewed as a transaction, which is a set of items.
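To make the "schema as transaction" view concrete, here is a minimal sketch in Python. The schemas and attribute names are hypothetical examples, not data from the paper; each query interface is just a set of attribute names, and the collection of interfaces is the transaction database to be mined.

```python
from collections import Counter

# Hypothetical query-interface schemas (illustrative attribute names only).
# Each schema is one transaction; each attribute is one item.
schemas = [
    {"author", "title", "subject", "isbn"},
    {"first name", "last name", "title", "category"},
    {"author", "title", "category"},
]

def attribute_frequencies(transactions):
    """Count in how many schemas each attribute appears."""
    counts = Counter()
    for schema in transactions:
        counts.update(schema)  # a set contributes each attribute once
    return counts

freq = attribute_frequencies(schemas)
print(freq["title"], freq["author"], freq["first name"])
```

Using sets means a repeated attribute within one interface counts only once, which matches the item-set view of a transaction.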

DCM framework
- Automatic data preparation
- Correlation mining

Correlation Measure
- Contingency table
- Co-presence, co-absence, only one present
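The contingency table for a pair of attributes can be sketched as follows. The field names `f11`, `f10`, `f01`, `f00` (co-presence, only the first present, only the second present, co-absence) are a naming assumption for this illustration, and the schemas are hypothetical.

```python
from collections import namedtuple

# f11: both present, f10: only a, f01: only b, f00: both absent.
Contingency = namedtuple("Contingency", "f11 f10 f01 f00")

def contingency(transactions, a, b):
    """Build the 2x2 contingency counts for attributes a and b."""
    f11 = f10 = f01 = f00 = 0
    for schema in transactions:
        has_a, has_b = a in schema, b in schema
        if has_a and has_b:
            f11 += 1
        elif has_a:
            f10 += 1
        elif has_b:
            f01 += 1
        else:
            f00 += 1
    return Contingency(f11, f10, f01, f00)

# Hypothetical schemas, for illustration only.
schemas = [
    {"author", "title"},
    {"first name", "last name", "title"},
    {"author", "subject"},
    {"keyword"},
]
table = contingency(schemas, "author", "first name")
print(table)
```

Here "author" and "first name" never co-occur, which is the pattern the slides associate with synonyms.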

Correlation Measure (cont.)
- The sparseness problem
  - High co-absence
- Rare attribute problem
  - False negative correlation
  - H-measure
- Frequent attribute problem
  - False positive correlation
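The slides name the H-measure without stating its formula. The sketch below assumes the form H = (f01 * f10) / (f+1 * f1+), where f1+ and f+1 are the counts of schemas containing the first and second attribute; this is believed to match the DCM paper but is not confirmed by these slides, so treat it as an assumption. The point it illustrates is the sparseness issue: co-absence (f00) does not enter the score at all.

```python
def h_measure(f11, f10, f01, f00=0):
    """Assumed H-measure: negative-correlation score that ignores f00."""
    a_present = f11 + f10  # f1+: schemas containing the first attribute
    b_present = f11 + f01  # f+1: schemas containing the second attribute
    if a_present == 0 or b_present == 0:
        # Assumption: an attribute that never appears gives no evidence.
        return 0.0
    return (f10 * f01) / (a_present * b_present)

# Two attributes that never co-occur score the maximum...
never_together = h_measure(f11=0, f10=2, f01=1, f00=1)
# ...and adding many schemas that contain neither changes nothing.
sparse = h_measure(f11=0, f10=2, f01=1, f00=1000)
# Occasional co-occurrence lowers the score.
sometimes_together = h_measure(f11=1, f10=1, f01=1, f00=1)
print(never_together, sparse, sometimes_together)
```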

Matching Discovery
- Measure correlations between two groups
- C_min: the minimal value of the pairwise correlation measurement
- Positively (negatively) correlated: under the measure for positive (negative) correlation, C_min is greater than some threshold
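A minimal sketch of the C_min test described above, assuming the correlation measure is supplied as a function of two items. The score table and the threshold value are hypothetical and only serve to exercise the function.

```python
from itertools import combinations

def c_min(items, measure):
    """Minimal pairwise correlation value over all pairs of items."""
    return min(measure(a, b) for a, b in combinations(items, 2))

def is_correlated(items, measure, threshold):
    """Items are correlated if their C_min is greater than the threshold."""
    return c_min(items, measure) > threshold

# Hypothetical pairwise scores (not values from the paper).
scores = {
    frozenset({"first name", "last name"}): 0.9,
    frozenset({"first name", "author"}): 0.1,
    frozenset({"last name", "author"}): 0.2,
}

def measure(a, b):
    return scores[frozenset({a, b})]

pair_ok = is_correlated(["first name", "last name"], measure, threshold=0.5)
triple_ok = is_correlated(["first name", "last name", "author"], measure, threshold=0.5)
print(pair_ok, triple_ok)
```

Taking the minimum means a single weakly correlated pair is enough to reject a candidate, so every member must be correlated with every other member.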

Matching Selection
- Rank the discovered matchings
  - Maximal measurement value -> rank
  - Top-k values to break ties
  - If still tied, choose the one with richer semantic information
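The ranking rule can be sketched as a sort key. This is an illustration under assumptions, not the authors' implementation: each matching is a hypothetical dict holding its groups and its pairwise correlation scores, ties on the maximal score fall through to the next-highest scores, and "richer semantic information" is approximated here by the number of attributes the matching covers.

```python
def rank_key(matching):
    """Sort key: scores in descending order, then attribute count."""
    top_scores = sorted(matching["scores"], reverse=True)
    # Assumption: more attributes stands in for richer semantics.
    size = sum(len(group) for group in matching["groups"])
    return (top_scores, size)

def rank_matchings(matchings):
    """Return matchings from best to worst."""
    return sorted(matchings, key=rank_key, reverse=True)

# Hypothetical discovered matchings.
m1 = {"groups": [{"author"}, {"first name", "last name"}], "scores": [0.9]}
m2 = {"groups": [{"author"}, {"last name"}], "scores": [0.9]}
m3 = {"groups": [{"subject"}, {"category"}], "scores": [0.7]}

ranked = rank_matchings([m3, m2, m1])
print([sorted(map(sorted, m["groups"])) for m in ranked])
```

Python compares the score lists element by element, so the maximal value decides first and later values only matter when earlier ones tie.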

Data Preparation
- Form extraction
- Type recognition
- Syntactic merging
- Name-based merging
- Domain-based merging

Experiments
- Datasets
  - TEL-8: 447 deep web sources in 8 domains
  - BAMM: 211 deep web sources in 4 domains
- Metrics
  - Target accuracy
  - Target question: given any attribute, find its synonyms, hyponyms, and hypernyms
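One way to read the target question is as a per-attribute retrieval task: for a given attribute, compare the related attributes implied by the discovered matchings against the correct ones. The sketch below computes plain precision and recall for a single attribute. How the paper aggregates these across attributes is not stated in the slides, so no aggregation is shown, and the example sets are hypothetical.

```python
def precision_recall(found, correct):
    """Precision and recall of found related attributes for one attribute."""
    hits = len(found & correct)
    # Assumption: an empty set is scored as 1.0 rather than undefined.
    precision = hits / len(found) if found else 1.0
    recall = hits / len(correct) if correct else 1.0
    return precision, recall

# Hypothetical answer for the attribute "author".
found = {"writer", "name"}
correct = {"writer", "first name", "last name"}

precision, recall = precision_recall(found, correct)
print(precision, recall)
```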

Target Accuracy

Comparing the H-measure and Jaccard