University of Kansas Data Discovery on the Information Highway Susan Gauch University of Kansas.

Presentation transcript:

Introduction
Information overload on the Web
Many possible search engines
Need intelligent help to
  – select best information sources
  – customize results
  – browse the Web
  – handle non-textual information

ProFusion: Searching the Web
Many search engines
  – different spiders
  – different retrieval algorithms
  – different results
Which to use?
  – differs depending on query
  – generally want information from more than one

Distributed Agent Approach
ProFusion is an agent-based meta-search engine which communicates with multiple, distributed search engines
  – routes user queries to most appropriate search engines
  – communicates in parallel
  – fuses results returned
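
The "communicates in parallel" step can be illustrated with a minimal sketch. It assumes each search engine is represented by a plain Python callable; the function name `query_engines_in_parallel` and the engine callables are hypothetical stand-ins for real network requests and are not taken from ProFusion itself.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def query_engines_in_parallel(query, engines):
    """Send `query` to every engine concurrently and collect the results.

    `engines` maps an engine name to a callable that takes the query string
    and returns a list of results (hypothetical stand-ins for HTTP requests).
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {pool.submit(fn, query): name for name, fn in engines.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception:
                # A failing engine contributes no results instead of
                # aborting the whole search.
                results[name] = []
    return results
```

Threads suit this kind of work because each request spends most of its time waiting on the network; the results can then be handed to a fusion step.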

Architecture
Knowledge Sources
  – no private index
  – meta-knowledge about strengths of search engines with respect to a collection of categories
  – lexicon which associates words with the same collection of categories

Architecture (cont.)
Agents
  – one Broker Agent which controls search
      routes query to most appropriate search agents
      fuses information returned
  – one Facilitator Agent per search engine which communicates with it
  – one User Information Filtering Agent which identifies new information for registered users

Dispatch Agent: Query Routing
for each word in query
  – use lexicon to map from word -> categories
  – use meta-knowledge to map from categories -> top three search engines
if no query words are in the dictionary, use default best three
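
The routing steps above can be sketched as follows. All data here is made up for illustration: the lexicon entries, category names, engine names, and scores are hypothetical. The slide does not say how scores are combined when a query maps to several categories, so summing them is an assumption of this sketch.

```python
from collections import defaultdict

# Hypothetical lexicon: word -> categories.
LEXICON = {
    "guitar": ["music"],
    "vaccine": ["medicine"],
    "python": ["computing", "science"],
}

# Hypothetical meta-knowledge: category -> {engine: strength in that category}.
META_KNOWLEDGE = {
    "music": {"engine_a": 0.9, "engine_b": 0.4, "engine_c": 0.6, "engine_d": 0.2},
    "medicine": {"engine_a": 0.3, "engine_b": 0.8, "engine_c": 0.5, "engine_d": 0.7},
    "computing": {"engine_a": 0.5, "engine_b": 0.5, "engine_c": 0.9, "engine_d": 0.6},
    "science": {"engine_a": 0.4, "engine_b": 0.7, "engine_c": 0.6, "engine_d": 0.8},
}

# Hypothetical default "best three" used when no query word is known.
DEFAULT_ENGINES = ["engine_a", "engine_b", "engine_c"]


def route_query(query, lexicon=LEXICON, meta=META_KNOWLEDGE,
                default=DEFAULT_ENGINES, top_n=3):
    """Return the names of the engines judged most appropriate for `query`."""
    totals = defaultdict(float)
    for word in query.lower().split():
        for category in lexicon.get(word, []):
            for engine, strength in meta.get(category, {}).items():
                # Assumption: strengths are summed across matching categories.
                totals[engine] += strength
    if not totals:
        return list(default)
    # Highest total first; ties broken by engine name for determinism.
    ranked = sorted(totals, key=lambda engine: (-totals[engine], engine))
    return ranked[:top_n]
```

With this sample data, `route_query("guitar")` picks the three engines with the highest "music" strength, and a query with no known words falls back to the default list.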

Dispatch Agent: Fusing Results
rank order results
  – normalize scores for all retrieved URLs
      search engines report match values differently
  – multiply score by confidence factor for each search engine
      average value of performance over 13 categories
  – rank order based on result
remove duplicates and broken links
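
A minimal sketch of the fusion steps above. The slide does not specify the normalization formula or how duplicates are merged, so two assumptions are made here: each engine's scores are divided by that engine's top score, and a duplicate URL keeps its highest weighted score. Broken-link checking is left out because it needs network access. The function name `fuse_results` is hypothetical.

```python
def fuse_results(results_by_engine, confidence):
    """Merge per-engine result lists into one ranked list.

    results_by_engine: {engine: [(url, raw_score), ...]}
    confidence:        {engine: confidence factor}
    Returns a list of (url, fused_score), best first.
    """
    best = {}
    for engine, hits in results_by_engine.items():
        if not hits:
            continue
        top = max(score for _, score in hits)
        for url, score in hits:
            # Assumption: normalize by the engine's own top score so that
            # engines reporting on different scales become comparable.
            normalized = score / top if top > 0 else 0.0
            weighted = normalized * confidence.get(engine, 0.0)
            # Assumption: a duplicate URL keeps its highest weighted score.
            if weighted > best.get(url, -1.0):
                best[url] = weighted
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))
```

Because the confidence factor multiplies the normalized score, a mediocre hit from a trusted engine can outrank a top hit from an engine with a low confidence factor.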

Search Agent
encapsulate knowledge for each underlying search engine in a “competence module”
map from standard query representation to specific syntax for each search engine
connect to, and receive results from, search engines
parse result page and extract contents into standard format (URL, weight, title, summary...)
normalize weights
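
One way to picture a "competence module" is as a small class holding what is specific to one engine. This is only a sketch: the class name, its fields, the example URL, and the assumption that raw scores are scaled by a known per-engine maximum are all hypothetical. Fetching and parsing the result page are omitted; `to_standard` starts from fields that have already been extracted.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass
class Result:
    """Standard result format shared by all search agents."""
    url: str
    weight: float
    title: str
    summary: str


class CompetenceModule:
    """Hypothetical wrapper holding engine-specific knowledge."""

    def __init__(self, base_url, query_param, max_weight):
        self.base_url = base_url        # engine's search endpoint
        self.query_param = query_param  # name of its query parameter
        self.max_weight = max_weight    # largest score the engine reports

    def build_url(self, terms):
        """Map a standard query (list of terms) to this engine's URL syntax."""
        return f"{self.base_url}?{urlencode({self.query_param: ' '.join(terms)})}"

    def to_standard(self, raw):
        """Convert one extracted hit (a dict) to the standard format."""
        return Result(
            url=raw["url"],
            # Normalize the engine's score into the range 0..1.
            weight=raw["score"] / self.max_weight,
            title=raw.get("title", ""),
            summary=raw.get("summary", ""),
        )
```

Keeping these details in one object per engine means the rest of the system only ever sees `Result` values, whatever the underlying engine looks like.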

Learning Agent: Adaptation
adapt to network load
  – monitor and set individual time-out values
adapt to broken search engines
  – identify down search engines
  – prevent them from being selected
  – invoke guarding agent to periodically check status
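
The two adaptations above can be sketched with a small per-engine health record. Everything specific here is an assumption made for illustration: the class name `EngineHealth`, the failure threshold, and the smoothing rule that moves the time-out toward twice the observed latency. A periodic status check (the "guarding agent") would simply call `record_success` when a probe of a down engine succeeds.

```python
class EngineHealth:
    """Tracks time-out and availability for one search engine (hypothetical)."""

    def __init__(self, base_timeout=5.0, max_failures=3):
        self.timeout = base_timeout
        self.max_failures = max_failures
        self.failures = 0
        self.down = False

    def record_success(self, elapsed):
        """Call after a response that took `elapsed` seconds."""
        self.failures = 0
        self.down = False
        # Assumption: move the time-out toward twice the observed latency.
        self.timeout = 0.8 * self.timeout + 0.2 * (2.0 * elapsed)

    def record_failure(self):
        """Call after a time-out or error; marks the engine down after repeats."""
        self.failures += 1
        if self.failures >= self.max_failures:
            self.down = True


def selectable(engines):
    """Return the names of engines that are not currently marked down."""
    return [name for name, health in engines.items() if not health.down]
```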

Adaptation (cont.)
adapt to changing search engine protocol
  – generic pattern matching grammar for parsing search engine results
adapt to changing search engine performance
  – automatically calibrate quality of search engine results in each category
  – adjust confidence factors based on observations of user behavior (which item in ranked list they select first)
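
The slide does not give the rule used to adjust confidence factors, so the following is only one plausible sketch for a single category. It assumes the engine that supplied the first-selected item is rewarded and engines whose items were ranked above it but skipped are penalized; the function name `update_confidence` and the learning rate are hypothetical.

```python
def update_confidence(confidence, ranked_engines, clicked_index, rate=0.1):
    """Return updated confidence factors after one observed selection.

    confidence:     {engine: factor in the range 0..1} for one category
    ranked_engines: engine that supplied each item, in ranked order
    clicked_index:  position of the item the user selected first
    """
    updated = dict(confidence)
    clicked_engine = ranked_engines[clicked_index]
    # Reward: move the selected engine's factor a step toward 1.
    updated[clicked_engine] += rate * (1.0 - updated[clicked_engine])
    # Penalty: engines ranked above the selection were passed over,
    # so move their factors a step toward 0.
    for engine in set(ranked_engines[:clicked_index]):
        if engine != clicked_engine:
            updated[engine] -= rate * updated[engine]
    return updated
```

Both updates move a factor only part of the way toward its limit, so the values stay between 0 and 1 and a single observation cannot swing a factor drastically.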

User Agents: Personalized Search
Users may register personal queries with ProFusion to be automatically re-run on a periodic basis
Query results are presented in three categories
  – new
  – relevant
  – possibly relevant

ProFusion: Current Thrusts
index own collection to support searching personal collection
characterize personal collection with respect to personal taxonomy
  – basis of browsing contents of personal collection
incorporate user’s feedback to filter out and prioritize new results

Extension: Distributed Search
currently, spiders collect all information centrally
  – lots of traffic, disk space, overloaded sites
  – “supermarket” approach
dispatch queries to “best” sites
  – “specialty store” approach
challenges
  – identify the best sites for each query

Distributed Search: Site Agents
index own site to support local searches
characterize site with respect to global taxonomy
  – meta-knowledge for routing queries to this site
  – basis of browsing contents of a specific site

Distributed Search: Brokers
collect meta-information from Site Agents
route queries to most appropriate sites for distributed processing
browse Web via meta-knowledge (taxonomy of sites/pages automatically collated from collected meta-information)

Discovering Video Information
VISION: Video Indexing for SearchIng Over Networks
  – create a database of video clips indexed by their associated closed captions
  – locate related information via Web searching to augment video clips
Goals: entirely automatic, real time

Summary
many sources of information
need a consistent interface to locate information regardless of
  – where it is
  – what format it is in
one source is not enough
  – locate and fuse information from multiple sources