Information Integration for Digital Libraries


1 Information Integration for Digital Libraries
August 10, 2000
Prof. Sang Ho Lee
Soongsil University, Seoul, Korea

2 Information integration
Provision of integrated access to multiple, distributed, heterogeneous databases and other information sources
- Mediator approach
  - More up-to-date data
  - No need to copy data
  - Query needs can be unknown
- Data warehouse approach
  - High query performance
  - Can operate when sources are unavailable
  - Extra information at the warehouse: modify, summarize (store aggregates), add historical information
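The trade-off between the two approaches can be illustrated with a minimal sketch. All names here (`Mediator`, `Warehouse`, the callable sources) are hypothetical and chosen only for illustration; a real system would put wrappers and a query planner between the client and the sources.

```python
from typing import Callable, Dict, List

# Assumption: a "source" is any callable that returns its current rows.
Source = Callable[[], List[dict]]


class Mediator:
    """Queries the live sources at request time and keeps no copy of the data."""

    def __init__(self, sources: Dict[str, Source]):
        self.sources = sources

    def query(self, predicate: Callable[[dict], bool]) -> List[dict]:
        results = []
        for name, fetch in self.sources.items():
            for row in fetch():  # fetched now, so the data is up to date
                if predicate(row):
                    results.append({**row, "source": name})
        return results


class Warehouse:
    """Copies data from the sources ahead of time and answers from the copy."""

    def __init__(self, sources: Dict[str, Source]):
        self.sources = sources
        self.rows: List[dict] = []

    def load(self) -> None:
        # Integration step: pull everything once and tag each row with its origin.
        self.rows = [
            {**row, "source": name}
            for name, fetch in self.sources.items()
            for row in fetch()
        ]

    def query(self, predicate: Callable[[dict], bool]) -> List[dict]:
        # No source access here: fast, and works while sources are unavailable,
        # but only as fresh as the last load().
        return [row for row in self.rows if predicate(row)]
```

If a source gains a row after `load()` has run, the mediator sees it on the next query while the warehouse does not until it is reloaded, which is the freshness versus performance trade-off listed above.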

3 Mediator Approach
[Diagram; labels: Client, Wrapper, Mediator, Source]

4 Data Warehouse Approach
[Diagram; labels: Client, Client, Query & Analysis, Warehouse, Metadata, Integration, Source, Source, Source]

5 Web Searching Practice
- Approx. 800 million indexable Web pages (Feb. 1999)
- Low coverage of the Web
  - No engine indexes more than 16% of indexable Web pages
- Out of date
  - New pages take months to be indexed
- Low metadata use
  - 34% use "keywords" or "description" metatags
  - 0.3% use the Dublin Core metadata standard
- Simple queries
  - Most queries use 1-3 search words
- Poor relevancy ranking and precision

6 Meta Search engines
- USA: SavvySearch, MetaCrawler, Ask Jeeves, ProFusion, Mamma, Ixquick
- Korea: Wakano, Ms. DaChanni
- Over 3000 metasearch engines around the world

7 Operation Flow and Technical Issues
1. User query
2. Decompose and format queries
3. Send queries and get results
4. Post processing (ranking, clustering, etc.)
5. Output result
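The flow above can be sketched in a few lines. This is only an illustration under stated assumptions: the engine callables, the `"plus"` and `"and"` query syntaxes, and the reciprocal-rank merge used for post processing are all hypothetical choices, not methods taken from the slides.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: an engine takes a formatted query string and returns ranked URLs.
Engine = Callable[[str], List[str]]


def format_query(terms: List[str], syntax: str) -> str:
    """Translate the user's terms into one engine's (hypothetical) query syntax."""
    if syntax == "plus":
        return " ".join("+" + t for t in terms)
    if syntax == "and":
        return " AND ".join(terms)
    return " ".join(terms)


def metasearch(user_query: str, engines: Dict[str, Tuple[str, Engine]]) -> List[str]:
    # Step 2: decompose the user query into terms.
    terms = user_query.lower().split()

    scores: Dict[str, float] = {}
    for _name, (syntax, search) in engines.items():
        # Step 3: send the engine-specific query and collect its ranked results.
        results = search(format_query(terms, syntax))
        # Step 4: post processing, here a simple reciprocal-rank merge:
        # a URL earns 1/(rank+1) from every engine that returns it.
        for rank, url in enumerate(results):
            scores[url] = scores.get(url, 0.0) + 1.0 / (rank + 1)

    # Step 5: output, best merged score first (ties broken alphabetically).
    return sorted(scores, key=lambda u: (-scores[u], u))
```

A URL returned by several engines accumulates score from each, so agreement between engines pushes it up the merged list.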

8 Current Practice of Metasearch Engines
- Tend toward a least-common-denominator interface
  - Do not fully utilize the functions of individual sources
- Cover general areas rather than a specific domain
  - Little utilization of domain knowledge
- Little consideration of personal profiles

9 Proposed Research Topics (1)
Theme: mediator-based integration techniques (in particular, metasearch engines)
- Intelligent wrapper techniques
  - Extract, combine, and reconcile information from external sources
  - Exploit user profiles and utilize the functions of each source as much as possible
  - Should be flexible and adaptable, as external sources change
- Several approaches
  - Formal language based, machine learning based, heuristic based, extended CFG based, …
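As a concrete picture of what a heuristic-based wrapper might look like, the sketch below pulls (title, URL) pairs out of a result page. The heuristic (every result is an anchor with an absolute http(s) link), the `Hit` type, and the `wrap` function are assumptions made for illustration; a pattern like this is exactly the kind of thing that breaks when a source changes its page layout, which is why adaptability is listed as a requirement.

```python
import re
from html import unescape
from typing import List, NamedTuple


class Hit(NamedTuple):
    title: str
    url: str


# Heuristic (assumption): each result is an <a> tag with an absolute http(s) href.
LINK = re.compile(
    r'<a\s+[^>]*href="(https?://[^"]+)"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)
TAG = re.compile(r"<[^>]+>")


def wrap(page: str) -> List[Hit]:
    """Extract result hits from one source's HTML page, dropping duplicate URLs."""
    hits: List[Hit] = []
    seen = set()
    for url, inner in LINK.findall(page):
        # Strip nested markup such as <b>...</b> and decode entities like &amp;.
        title = unescape(TAG.sub("", inner)).strip()
        if title and url not in seen:
            seen.add(url)
            hits.append(Hit(title, url))
    return hits
```

Relative links and repeated URLs are ignored, so navigation links on the page are less likely to be mistaken for results.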

10 Proposed Research Topics (2)
Efficiency issues
- How to cache results and queries to provide a fast response to users
- How to access external sources in parallel
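One simple way to combine both ideas is sketched below, using Python's standard `concurrent.futures.ThreadPoolExecutor`. The `CachingDispatcher` class and the engine callables are hypothetical, and the cache here never expires or evicts entries, which a real system would have to address given how quickly Web results go stale.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Assumption: an engine takes a query string and returns a ranked list of URLs.
Engine = Callable[[str], List[str]]


class CachingDispatcher:
    """Sends a query to all engines in parallel and caches per-engine results."""

    def __init__(self, engines: Dict[str, Engine], max_workers: int = 8):
        self.engines = engines
        self.max_workers = max_workers
        self.cache: Dict[Tuple[str, str], List[str]] = {}

    def search(self, query: str) -> Dict[str, List[str]]:
        # Normalise case and whitespace so trivial variants share a cache entry.
        key = " ".join(query.lower().split())

        # Only contact engines whose answer for this query is not cached yet.
        missing = [name for name in self.engines if (name, key) not in self.cache]

        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            futures = {name: pool.submit(self.engines[name], key) for name in missing}
            for name, future in futures.items():
                self.cache[(name, key)] = future.result()

        return {name: self.cache[(name, key)] for name in self.engines}
```

With threads, the total wait for a fresh query is roughly that of the slowest engine rather than the sum of all engines, and a repeated query is answered without contacting any source.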

11

12 Research/Development Strategies
- Categorize objects and develop a specialized search mechanism for each category
- Build a working system to test theories experimentally
- Experiment with new ranking methods
  - Google, Goto, …

