Finding Hidden Correlations and Filtering out Incorrect Matchings with Compatibility Detection across Web Query Interfaces Lei Lei June 11, 2004

Presentation transcript:

Finding Hidden Correlations and Filtering out Incorrect Matchings with Compatibility Detection across Web Query Interfaces
Lei Lei
June 11, 2004

Introduction
- The Deep Web scales rapidly
- Proliferating sources with structured information
- Vocabulary converges to a small size
- Dynamic queries instead of URLs

Complex Matching
- Traditional methods focus on 1:1 matching
- Query schemas form complex matchings
- m:n

Web Query Interfaces
[Figure: web query interfaces; label: Attribute Group]

Problems to Solve
- Relations are complicated and multi-ary
- How to judge the relations of synonyms?
- How to pick out incorrect matchings?

Statement
- Find the hidden synonyms and build correlations to solve the m:n matching problem
- Filter out false matchings and partially incorrect ones with the three-step "compatibility detection"

MGSsd and Improved Model
- Original hidden model from MGSsd

Find Hidden Synonyms
- Assume the existence of hidden synonyms
- Correlations between synonyms
- Function: HC(bi, bj)
- Apply HC directly

Example
- Synonyms in the air booking domain
- Set a threshold
- HC(b2, b4)
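The thresholding step in this example can be sketched as follows. The slides do not define how HC is computed, so the scores here are made-up placeholders, and the names `hc_scores` and `pairs_above_threshold` and the threshold value 0.5 are assumptions for illustration only.

```python
from itertools import combinations

# Hypothetical HC scores between synonym groups b1..b4 (made-up numbers;
# the slides do not specify how HC is computed).
hc_scores = {
    frozenset({"b1", "b2"}): 0.10,
    frozenset({"b1", "b3"}): 0.05,
    frozenset({"b1", "b4"}): 0.20,
    frozenset({"b2", "b3"}): 0.15,
    frozenset({"b2", "b4"}): 0.80,
    frozenset({"b3", "b4"}): 0.30,
}


def pairs_above_threshold(groups, scores, threshold):
    """Return (bi, bj, score) for every group pair whose score meets the threshold."""
    selected = []
    for bi, bj in combinations(sorted(groups), 2):
        score = scores.get(frozenset({bi, bj}), 0.0)  # unknown pairs count as 0
        if score >= threshold:
            selected.append((bi, bj, score))
    return selected


correlated = pairs_above_threshold({"b1", "b2", "b3", "b4"}, hc_scores, 0.5)
print(correlated)
```

With these placeholder scores, only the pair (b2, b4) passes the threshold, which mirrors the HC(b2, b4) case on the slide.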

Compatibility Detection
- Not all raw matchings are correct
- Clean partially correct or inaccurate ones
- Three steps:
  1. Transitivity check
  2. Examine confidence
  3. Subsumption

Compatibility Detection (Cont.)
- Raw matching results
  1. Check transitivity
  2. Choose confidence
  3. Subsumption
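One possible reading of the three steps is sketched below. The slides name the steps but not their rules, so everything here is an assumption: a raw matching is modeled as a pair of attribute groups with a confidence, the transitivity check only reports missing links, the confidence step is a plain cutoff, and subsumption drops a matching whose attributes are a strict subset of another's. The data and the names `RAW`, `missing_transitive_links`, `drop_low_confidence`, and `drop_subsumed` are hypothetical.

```python
from itertools import combinations

# Made-up raw matchings: (group_a, group_b, confidence).
RAW = [
    (frozenset({"from"}), frozenset({"departing"}), 0.9),
    (frozenset({"departing"}), frozenset({"leaving from"}), 0.8),
    (frozenset({"from"}), frozenset({"return"}), 0.2),
    (frozenset({"first name", "last name"}), frozenset({"passenger"}), 0.7),
    (frozenset({"last name"}), frozenset({"passenger"}), 0.6),
]


def missing_transitive_links(matchings):
    """Step 1 (assumed rule): if A~B and B~C are present, A~C should be too."""
    linked = {frozenset({a, b}) for a, b, _ in matchings}
    missing = set()
    for p, q in combinations(linked, 2):
        if len(p & q) == 1:        # the two links share exactly one group
            implied = p ^ q        # the two groups that should also match
            if len(implied) == 2 and implied not in linked:
                missing.add(implied)
    return missing


def drop_low_confidence(matchings, min_conf):
    """Step 2 (assumed rule): keep matchings whose confidence reaches min_conf."""
    return [m for m in matchings if m[2] >= min_conf]


def drop_subsumed(matchings):
    """Step 3 (assumed rule): drop a matching strictly contained in another one."""
    attrs = [a | b for a, b, _ in matchings]
    return [m for i, m in enumerate(matchings)
            if not any(attrs[i] < other for other in attrs)]


gaps = missing_transitive_links(RAW)
kept = drop_subsumed(drop_low_confidence(RAW, 0.5))
print(len(gaps), "missing transitive links;", len(kept), "matchings kept")
```

In this toy run the low-confidence pair from/return is removed by the cutoff, and last name/passenger is removed because it is contained in the larger first name, last name/passenger matching.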

Evaluation
- Using recall and precision
- Compare with MGSsd data
- Perform correlation and compatibility detection on matching results from other research
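For reference, precision and recall over a set of matchings can be computed as in the sketch below. The gold and found matchings are made-up examples, and representing a matching as a frozenset of attribute names is an assumption made here for simplicity.

```python
def precision_recall(found, correct):
    """Return (precision, recall) of the found matchings against the correct ones."""
    found, correct = set(found), set(correct)
    true_positives = len(found & correct)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(correct) if correct else 0.0
    return precision, recall


# Made-up example data.
gold = {
    frozenset({"from", "departing"}),
    frozenset({"to", "destination"}),
    frozenset({"adults", "passengers"}),
}
found = {
    frozenset({"from", "departing"}),
    frozenset({"to", "destination"}),
    frozenset({"from", "return"}),
    frozenset({"class", "cabin"}),
}

p, r = precision_recall(found, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the four found matchings are correct (precision 0.5) and two of the three gold matchings are recovered (recall about 0.67).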

Contributions
- m:n mapping rather than only 1:1 mapping
- Present a hidden synonym approach to statistically compute the correlation between synonym groups
- Develop the "compatibility detection" approach to refine the raw mapping data
- Suitable and efficient as the Web scales

Future Work
- Figure out the HC function
  - "Minimum" is feasible
- Distinguish trivial differences in confidence
  - Set up a proper threshold
- Space complexity
- Type subsumption
  - Departing: datetime
  - Departing: string

Questions?