MetaQuerier Mid-flight: Toward Large-Scale Integration for the Deep Web Kevin C. Chang.

Presentation transcript:

MetaQuerier Mid-flight: Toward Large-Scale Integration for the Deep Web Kevin C. Chang

MetaQuerier 2 The previous Web: things are just on the surface

MetaQuerier 3 The current Web: Getting “deeper” with non-trivial access

MetaQuerier 4 MetaQuerier: Exploring and integrating the deep Web
- Explorer (FIND sources): source discovery, source modeling, source indexing; builds a db of dbs
- Integrator (QUERY sources): source selection, schema integration, query mediation; offers a unified query interface
- Example sources: Amazon.com, Cars.com, 411localte.com, Apartments.com

MetaQuerier 5 Toward large scale integration
We are facing very different “large scale” scenarios!
- Many sources on the Web, on the order of 10^5
- Such integration must be dynamic and ad hoc:
  - Dynamic discovery: sources are dynamically changing
  - On-the-fly integration: queries are ad hoc and need different sources

MetaQuerier 6 Our proposal: MetaQuerier for the deep Web
- MetaExplorer: April  IIS CAREER: Dynamic Ad-hoc Information Integration across the Internet
- MetaIntegrator: August  IIS ITR: Shallow Integration over the Deep Web: A Holistic Approach
- This talk: midterm report. Lessons learned!

MetaQuerier 7 Lesson #1: Be careful with what you propose. Because you may actually get it.

MetaQuerier 8 The challenge boils down to: how to deal with “deep” semantics across a large scale?
“Semantics” is the key in integration!
- How to understand a query interface? Where is the first condition? What is its attribute?
- How to match query interfaces? What does “author” on this source match on that one?
- How to translate queries? How to ask this query on that source?

MetaQuerier 9 Lesson #2: Think not only of the right techniques but also of the right goals. “As needs are so great, compromise is possible.” -- Carey and Haas

MetaQuerier 10 Our goals defined
- Domain-based integration
  - Sources in the same domain are simpler to integrate
  - Such sources are useful to integrate
- Semi-transparent integration
  - Bring users to the right sources
  - Help users to interact as automatically as possible

MetaQuerier 11 Lesson #3: Send your scouts. Survey the frontier before you go to the battle.

MetaQuerier 12 Our survey found…
- Challenge reassured:
  - 450,000 online databases
  - 1,258,000 query interfaces
  - 307,000 deep web sites
  - 3-7 times increase in 4 years
- Insight revealed:
  - Web sources are not arbitrarily complex
  - “Amazon effect”: convergence and regularity naturally emerge

MetaQuerier 13 “Amazon effect” in action… Attributes converge in a domain! Constraint patterns converge even across domains!
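
One way to picture the attribute convergence claimed on this slide is to track how the vocabulary of distinct attribute names grows as more query interfaces from one domain are examined. The sketch below is only an illustration: the function name `vocabulary_growth` and the four book-domain interfaces are hypothetical and are not taken from the survey. If attributes converge, the curve should flatten as sources are added.

```python
def vocabulary_growth(sources):
    """Return the cumulative number of distinct attribute names after each source."""
    seen = set()
    growth = []
    for attrs in sources:
        # Normalize case so "Author" and "author" count as one attribute.
        seen.update(a.lower() for a in attrs)
        growth.append(len(seen))
    return growth


# Hypothetical book-domain query interfaces (illustrative, not survey data).
book_sources = [
    ["author", "title", "isbn"],
    ["author", "title", "subject"],
    ["title", "isbn", "publisher"],
    ["author", "title", "isbn", "subject"],
]

growth = vocabulary_growth(book_sources)
print(growth)  # the last source adds no new attribute
```

In this toy input the fourth interface contributes nothing new, which is the flattening one would expect to see if a domain's vocabulary converges.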

MetaQuerier 14 Lesson #4: The challenge may as well be an opportunity. Large scale is not only a challenge but also an opportunity.

MetaQuerier 15 Large scale itself presents an opportunity: shallow integration across holistic sources
- Shallow observable clues: the “underlying” semantics often relate to the “observable” presentations through some way of connection.
- Holistic hidden regularities: such connections often follow implicit properties, which reveal themselves holistically across sources.
[Figure: semantics (to be discovered) and presentations (observed) are linked by some way of connection that follows hidden regularities; integration works by reverse analysis from presentations back to semantics.]

MetaQuerier 16 Some evidence for holistic integration
- Evidence 1: [SIGMOD04] Query interface understanding: hidden-syntax parsing
- Evidence 2: [SIGMOD03, KDD04] Matching query interfaces: hidden-model discovery
[Figure labels: attribute, operator, value]

MetaQuerier 17 Evidence for holistic integration
- Evidence 1: [SIGMOD04] Query interface understanding by hidden-syntax parsing
  [Figure labels: Query Capabilities, Visual Patterns, Hidden Syntax (Grammar), Syntactic Composer, Syntactic Analyzer]
- Evidence 2: [SIGMOD03, KDD04] Query interface matching by hidden-model discovery
  [Figure labels: Attribute Matchings, Attribute Occurrences, Hidden Generative Model, Statistic Generator, Statistic Analyzer]
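
To give a feel for how attribute occurrences across many interfaces can hint at matchings, here is a deliberately simplified sketch. It is not the hidden-model discovery method cited on the slide; it swaps in a plain co-occurrence heuristic, assuming that two attribute names which each appear often but never appear together in one interface are candidates for the same concept. The function name `synonym_candidates`, the `min_support` threshold, and the sample interfaces are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def synonym_candidates(interfaces, min_support=2):
    """Return pairs of frequent attributes that never co-occur in any interface."""
    freq = Counter()
    cooccur = Counter()
    for attrs in interfaces:
        unique = sorted(set(attrs))
        freq.update(unique)
        # Pairs are built from a sorted list, so each pair has one canonical order.
        cooccur.update(combinations(unique, 2))
    frequent = sorted(a for a, n in freq.items() if n >= min_support)
    return [(a, b) for a, b in combinations(frequent, 2) if cooccur[(a, b)] == 0]


# Hypothetical query interfaces from one domain (illustrative only).
interfaces = [
    ["author", "title"],
    ["writer", "title"],
    ["author", "title", "isbn"],
    ["writer", "title", "isbn"],
]

candidates = synonym_candidates(interfaces)
print(candidates)
```

The heuristic only proposes candidates; with few sources it will also pair unrelated attributes that happen never to meet, which is one reason a large number of sources helps.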

MetaQuerier 18 Putting together: The MetaQuerier system
[Figure: MetaQuerier system architecture over the Deep Web.
- Back-end (Semantics Discovery): Database Crawler, Interface Extraction, Source Clustering, Schema Matching
- Front-end (Query Execution): Source Selection, Query Translation, Result Compilation
- Deep Web Repository: Query Interfaces, Query Capabilities, Subject Domains, Unified Interfaces
- Other labels: Grammar, Type Patterns, Find Web databases, Query Web databases]
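
The front-end steps named on this slide, source selection followed by query translation, can be sketched roughly as below. This is a toy model under stated assumptions, not the MetaQuerier implementation: the `Source` class, its `attribute_map` field (unified attribute name to source-local name), the selection rule, and the sample repository are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    domain: str
    attribute_map: dict  # hypothetical: unified attribute -> source-local attribute


def select_sources(repository, domain, query):
    """Keep sources in the domain that can express every attribute in the query."""
    return [
        s for s in repository
        if s.domain == domain and all(a in s.attribute_map for a in query)
    ]


def translate(query, source):
    """Rewrite a unified query using the source's own attribute names."""
    return {source.attribute_map[a]: v for a, v in query.items()}


# Hypothetical repository entries (illustrative only).
repository = [
    Source("books-a", "books", {"author": "writer", "title": "name"}),
    Source("books-b", "books", {"title": "title"}),
    Source("cars-a", "cars", {"make": "brand"}),
]

query = {"author": "Tom Clancy", "title": "Red Storm"}
plans = {
    s.name: translate(query, s)
    for s in select_sources(repository, "books", query)
}
print(plans)
```

Here `books-b` is skipped because it cannot express the `author` condition; a real system would more likely relax or split the query than drop the source outright.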

MetaQuerier 19 Lesson #5: Use undergraduates. Then it might be possible to build systems at schools.

MetaQuerier 20 Conclusion: Toward large scale integration
Status: completed or in progress
- Deep Web survey [SIGMOD-Record Sep’04]
- Query-interface understanding [SIGMOD’04]
- Schema matching [SIGMOD’03, KDD’04]
- Source clustering [CIKM’04]
- Query translation [VLDB-IIWeb’04]
- Shallow, holistic integration approach [VLDB-IIWeb’04, SIGMOD-Record Dec’04]
Current focus:
- System integration for building an integration system

MetaQuerier 21 Thank You! For more information: Welcome to see our demo tomorrow!