Information Integration for Digital Libraries

August 10, 2000
Prof. Sang Ho Lee
Soongsil University, Seoul, Korea
shlee@computing.soongsil.ac.kr

Information integration
Provision of integrated access to multiple, distributed, heterogeneous databases and other information sources.
Mediator approach:
- More up-to-date data
- No need to copy data
- Query needs need not be known in advance
Data warehouse approach:
- High query performance
- Can operate when sources are unavailable
- Extra information at the warehouse: modify, summarize (store aggregates), add historical information

Mediator Approach
(Figure: architecture diagram showing Client, Mediator, Wrapper, and Source components)
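
To make the mediator architecture concrete, here is a minimal Python sketch. It assumes in-memory lists stand in for remote sources; the names `Record`, `Wrapper`, and `Mediator`, the field names, and the sample data are all hypothetical. Each wrapper maps its source's native field names onto a common schema, and the mediator forwards every query to the wrappers at query time, which is why no data is copied and answers reflect the sources' current contents.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Record:
    """Common (mediated) schema that every wrapper must produce."""
    title: str
    author: str
    source: str

class Wrapper:
    """Hypothetical wrapper: translates a mediator query into a lookup on one
    source and maps that source's native rows into the common Record schema."""

    def __init__(self, name: str, rows: List[dict], title_key: str, author_key: str):
        self.name = name
        self.rows = rows              # stands in for a live remote source
        self.title_key = title_key    # source-specific field names
        self.author_key = author_key

    def query(self, keyword: str) -> List[Record]:
        kw = keyword.lower()
        return [
            Record(row[self.title_key], row[self.author_key], self.name)
            for row in self.rows
            if kw in row[self.title_key].lower()
        ]

class Mediator:
    """Answers each client query by fanning out to the wrappers at query time;
    nothing is copied, so results track the sources' current state."""

    def __init__(self, wrappers: List[Wrapper]):
        self.wrappers = wrappers

    def query(self, keyword: str) -> List[Record]:
        results: List[Record] = []
        for w in self.wrappers:
            results.extend(w.query(keyword))
        return results

# Two heterogeneous sources with different field names (hypothetical data).
lib_a = Wrapper("LibA", [{"ttl": "Digital Libraries", "by": "Lee"}], "ttl", "by")
lib_b = Wrapper("LibB", [{"name": "Library Metadata", "creator": "Kim"},
                         {"name": "Query Processing", "creator": "Park"}],
                "name", "creator")
mediator = Mediator([lib_a, lib_b])

for rec in mediator.query("librar"):
    print(rec)
```

Appending a row to a source's list and re-running the query returns the new row immediately, which illustrates the "more up-to-date data" advantage; the cost is that every query pays the latency of contacting each source.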

Data Warehouse Approach
(Figure: architecture diagram showing clients, a Query & Analysis layer, the Warehouse with its Metadata, an Integration layer, and three sources)
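
For contrast, the warehouse approach copies and integrates source data ahead of time. The sketch below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the warehouse; the table names, columns, and sample rows are hypothetical. It shows rows from two differently shaped sources being reconciled into one table, a precomputed aggregate, and queries that keep working after a source disappears.

```python
import sqlite3

# Hypothetical source extracts: each source exports rows in its own layout.
source_a = [("Digital Libraries", "Lee", 1999), ("Web Search", "Kim", 2000)]
source_b = [{"name": "Metadata", "creator": "Park", "yr": 2000}]

def build_warehouse(conn: sqlite3.Connection) -> None:
    """Integration step: copy and reconcile source data into one local schema,
    then precompute an aggregate table (a typical warehouse summary)."""
    conn.execute("CREATE TABLE docs (title TEXT, author TEXT, year INTEGER, source TEXT)")
    conn.executemany("INSERT INTO docs VALUES (?, ?, ?, 'A')", source_a)
    conn.executemany(
        "INSERT INTO docs VALUES (?, ?, ?, 'B')",
        [(r["name"], r["creator"], r["yr"]) for r in source_b],
    )
    # Stored aggregate: documents per year, computed once at load time.
    conn.execute(
        "CREATE TABLE docs_per_year AS "
        "SELECT year, COUNT(*) AS n FROM docs GROUP BY year"
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
build_warehouse(conn)

# Queries hit only the local copy, so they stay fast and keep working even if
# a source goes offline; the trade-off is staleness until the next reload.
source_a.clear()  # simulate source A becoming unavailable
print(conn.execute("SELECT year, n FROM docs_per_year ORDER BY year").fetchall())
```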

Web Searching Practice
- Approx. 800 million indexable Web pages (Feb. 1999)
- Low coverage of the Web: no engine indexes more than 16% of indexable Web pages
- Out of date: new pages take months to be indexed
- Low metadata use: 34% use "keywords" or "description" metatags; 0.3% use the Dublin Core metadata standard
- Simple queries: most queries use 1-3 search words
- Poor relevancy ranking and precision

Metasearch Engines
USA:
- SavvySearch (www.savvysearch.com)
- MetaCrawler (www.go2net.com/search.html)
- Ask Jeeves (www.askjeeves.com)
- ProFusion (www.profusion.com)
- Mamma (www.mamma.com)
- Ixquick (www.ixquick.com)
Korea:
- Wakano (www.wakano.co.kr)
- Ms. DaChanni (www.mochanni.com)
Over 3,000 metasearch engines around the world

Operation Flow and Technical Issues
1. User query
2. Decompose and format queries
3. Send queries and get results
4. Post-processing (ranking, clustering, etc.)
5. Output results
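
The five-step flow can be sketched in Python as follows. This is an illustration under stated assumptions: the engines are stub functions returning ranked URL lists, the per-engine query syntaxes in the hypothetical `format_query` are invented, and the merging step uses a Borda count, one simple rank-fusion method chosen here for illustration rather than one the slides specify. Clustering is omitted.

```python
from typing import Callable, Dict, List

# Hypothetical engine adapters: each takes a formatted query string and
# returns a ranked list of URLs (best first). Real engines would be HTTP calls.
Engine = Callable[[str], List[str]]

def format_query(engine_name: str, terms: List[str]) -> str:
    """Decompose/format step: adapt the user's terms to each engine's syntax
    (the syntaxes here are assumed, for illustration only)."""
    if engine_name == "engine_and":
        return " AND ".join(terms)
    return " ".join(terms)

def borda_merge(ranked_lists: List[List[str]]) -> List[str]:
    """Post-processing step: merge ranked lists with a Borda count, where an
    item at position i in a list of length n earns n - i points."""
    scores: Dict[str, int] = {}
    for ranking in ranked_lists:
        n = len(ranking)
        for i, url in enumerate(ranking):
            scores[url] = scores.get(url, 0) + (n - i)
    # Sort by descending score; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda u: (-scores[u], u))

def metasearch(user_query: str, engines: Dict[str, Engine]) -> List[str]:
    terms = user_query.split()                 # 1. user query
    results = []
    for name, engine in engines.items():
        q = format_query(name, terms)          # 2. decompose and format
        results.append(engine(q))              # 3. send and get results
    return borda_merge(results)                # 4. post-process (ranking)

engines: Dict[str, Engine] = {
    "engine_and": lambda q: ["a.org", "b.org", "c.org"],
    "engine_plain": lambda q: ["b.org", "d.org"],
}
print(metasearch("digital libraries", engines))  # 5. output results
```

Here `b.org` scores 2 + 2 = 4 and rises to the top because both engines return it, even though neither engine's own top hit is the merged winner in the first list.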

Current Practice of Metasearch Engines
- Tend toward a least-common-denominator interface
- Do not fully utilize the functions of individual sources
- Cover general areas rather than specific domains
- Make little use of domain knowledge
- Give little consideration to personal profiles

Proposed Research Topics (1)
Theme: mediator-based integration techniques (in particular, metasearch engines)
Intelligent wrapper techniques:
- Extract, combine, and reconcile information from external sources
- Exploit user profiles and utilize the functions of each source as much as possible
- Be flexible and adaptable as external sources change
- Several approaches: formal-language based, machine-learning based, heuristic based, extended-CFG based, …
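
One way to read the "heuristic based" wrapper approach is as a set of per-source extraction rules. The sketch below assumes hypothetical HTML result formats and regular-expression rules; `RULES`, `extract`, and `reconcile` are illustrative names, not an established API. Keeping the rules in a table rather than in parsing code is one way to stay adaptable when an external source changes its layout, and the reconciliation step removes duplicates by normalized URL.

```python
import re
from typing import Dict, List

# Per-source extraction rules: regular expressions with named groups.
# Adapting to a changed page layout means editing one rule, not the code.
# The page formats and rules below are hypothetical.
RULES: Dict[str, re.Pattern] = {
    "source_x": re.compile(r'<a class="hit" href="(?P<url>[^"]+)">(?P<title>[^<]+)</a>'),
    "source_y": re.compile(r"<li>(?P<title>[^|<]+)\|(?P<url>[^<]+)</li>"),
}

def extract(source: str, page: str) -> List[Dict[str, str]]:
    """Extract records from a result page into a common {title, url} form."""
    rule = RULES[source]
    return [
        {"title": m.group("title").strip(), "url": m.group("url").strip()}
        for m in rule.finditer(page)
    ]

def reconcile(*record_lists: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Combine records from several sources, dropping duplicate URLs
    (trailing slash and case ignored) and keeping the first occurrence."""
    seen = set()
    merged = []
    for records in record_lists:
        for rec in records:
            key = rec["url"].rstrip("/").lower()
            if key not in seen:
                seen.add(key)
                merged.append(rec)
    return merged

page_x = '<a class="hit" href="http://dl.example/1">Digital Libraries</a>'
page_y = ("<li>Digital Libraries | http://dl.example/1/</li>"
          "<li>Mediators | http://m.example</li>")
print(reconcile(extract("source_x", page_x), extract("source_y", page_y)))
```

Regular expressions are brittle against real-world HTML; the point here is only the separation of per-source rules from the shared extraction and reconciliation logic.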

Proposed Research Topics (2)
Efficiency issues:
- How to cache results and queries to provide fast responses to users
- How to exploit parallelism when accessing external sources
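
Both efficiency questions can be illustrated together in a small Python sketch: the external sources are queried concurrently with `concurrent.futures.ThreadPoolExecutor`, and merged results are cached per query string with `functools.lru_cache`. The sources, the 0.2-second latency, and the names `fetch`, `search`, and `CALLS` are hypothetical stand-ins.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from typing import Tuple

# Hypothetical slow external sources; each fetch simulates network latency.
SOURCES = ("src1", "src2", "src3")
CALLS = {"count": 0}          # number of real (uncached) source fetches
_calls_lock = threading.Lock()

def fetch(source: str, query: str) -> Tuple[str, ...]:
    time.sleep(0.2)                       # simulated round trip to the source
    with _calls_lock:
        CALLS["count"] += 1
    return (f"{source}:{query}:hit1", f"{source}:{query}:hit2")

@lru_cache(maxsize=256)
def search(query: str) -> Tuple[str, ...]:
    """Query all sources in parallel and cache the merged result per query
    string. A tuple is returned so the cached value cannot be mutated."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        parts = list(pool.map(lambda s: fetch(s, query), SOURCES))
    return tuple(hit for part in parts for hit in part)

start = time.perf_counter()
first = search("digital libraries")    # three fetches run concurrently
elapsed = time.perf_counter() - start  # about one source's latency, not three
second = search("digital libraries")   # cache hit: no new fetches
print(len(first), CALLS["count"], second is first)  # 6 3 True
```

A real metasearch cache would also need an expiry policy so that cached results do not go stale; `lru_cache` only bounds the number of entries.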

Research/Development Strategies
- Categorize objects and develop a specialized search mechanism for each category
- Build a working system on which to test theories experimentally
- Experiment with new ranking methods (e.g., Google, Goto, …)