Next Generation Search Engines Ehsun Daroodi 1 Feb, 2003.

Agenda: Answerer, Brilliant, Conclusion

Answerer: Answerer Co.

Answerer: 1st generation: Yahoo!, keyword-matching technology. 2nd generation: AskJeeves, which uses multi-keyword search technology, shows similar (proximate) questions, and shows sites that might contain the sought information.

Answerer (cont.): 3rd generation: Answerer, which uses a more complex natural language analyzer together with AI data-mining and inferencing technology. It gives exact answers to the questions asked, along with the sites that might contain more information. Yahoo! => AskJeeves => Answerer

Example: Input question: “Who is the chairman of Microsoft Corporation?” Output result: “The chairman of Microsoft Corporation is Bill Gates,” plus related sites.

Brilliant: Microsoft Research China

1st Generation: Keyword-based. Problem: a simple keyword may not be able to convey the complex search semantics a user wishes to express, so the engine returns many irrelevant documents and eventually disappoints users. Examples: Yahoo!, MSN, ...
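To make the keyword problem concrete, here is a minimal sketch of first-generation Boolean keyword matching over an inverted index. The toy corpus, the function names `build_index` and `keyword_search`, and the AND semantics are illustrative assumptions, not a description of how Yahoo! or MSN actually worked.

```python
from collections import defaultdict

# Hypothetical toy corpus: two pages share the keyword "jaguar"
# but are about completely different things.
DOCS = {
    "d1": "jaguar speed in the rainforest",
    "d2": "jaguar car dealer prices",
    "d3": "rainforest animals and their habitat",
}

def build_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def keyword_search(index, query):
    """Return ids of documents containing every query keyword (Boolean AND)."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

index = build_index(DOCS)
# A user who wants the animal but types only "jaguar" also gets the car page.
print(sorted(keyword_search(index, "jaguar")))  # ['d1', 'd2']
```

The single keyword cannot express "the animal, not the car", which is exactly the semantic gap the slide describes.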

2nd Generation: FAQ-based. FAQs are extracted, and these questions and their answers are indexed manually. Users are asked to confirm one or more rephrased questions in order to find their answers. The result is a few very precise answers.

2nd Generation (cont.): Limited to domain-specific applications such as web-based technical support. Prime example: AskJeeves
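The FAQ-based approach can be sketched as: score the user's question against manually indexed FAQ questions, show the best-scoring rephrasings for the user to confirm, and return the stored answer. This is a minimal sketch under assumptions: the FAQ entries are hypothetical technical-support examples, and Jaccard token overlap is a stand-in similarity measure, since the slides do not say how AskJeeves matched questions.

```python
import re

# Hypothetical FAQ database: manually indexed questions mapped to answers.
FAQ = {
    "how do i reset my password": "Use the 'Forgot password' link on the login page.",
    "how do i install the printer driver": "Download the driver from the support site and run the installer.",
    "why is my computer running slowly": "Close unused programs and check for malware.",
}

def tokens(text):
    """Lowercase word tokens with punctuation stripped."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def candidate_questions(user_question, k=2):
    """Rank FAQ questions by token overlap; the user confirms one of them."""
    q = tokens(user_question)
    ranked = sorted(FAQ, key=lambda faq_q: jaccard(q, tokens(faq_q)), reverse=True)
    return ranked[:k]

choices = candidate_questions("How can I reset my password?")
print(choices[0])       # how do i reset my password
print(FAQ[choices[0]])  # the precise, pre-written answer for the confirmed question
```

Because answers are written by hand per FAQ entry, precision is high but coverage is limited to the indexed domain, which matches the slide's "limited domain application" caveat.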

3rd Generation (Brilliant): Deals with concepts. Accepts natural language queries. Extracts syntactic as well as semantic information. Robustness: partial parsing whenever possible. Interacts with the user to confirm the concept when facing ambiguity.

Hypothesis: Concept-space coverage hypothesis: a small subset of concepts can cover most user queries. Track this small subset and use semi-automated methods to index those concepts precisely. The result is a search engine that satisfies most users most of the time.

Hypothesis (cont.): To support their hypothesis, they took a one-day query log from MSN.com and manually mapped queries to pre-defined concept categories. A set of distinct queries, representing the 418,248 queries issued on Sep 4, 1999, was taken and classified.

Hypothesis (cont.): Examples of concepts: “Finding computer and Internet related products and services”, “Finding movies and toys on the Internet”, and so on.

Hypothesis (cont.): Both the keyword and the concept distributions follow a pattern in which the first few most popular categories cover most of the queries. The concept distribution converges much faster than the keyword distribution. This shows that their hypothesis holds, at least for the MSN.com query log data.
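The coverage argument can be made concrete by computing, for each k, the fraction of queries covered by the k most frequent categories. This is a minimal sketch: `top_k_coverage` is a hypothetical helper, and the log below is invented toy data, not the MSN.com log.

```python
from collections import Counter

def top_k_coverage(labels, k):
    """Fraction of queries covered by the k most frequent labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for _, n in counts.most_common(k))
    return covered / total

# Hypothetical toy log of (keyword, concept) pairs; NOT the MSN.com data.
LOG = [
    ("mp3", "music"), ("napster", "music"), ("winamp", "music"),
    ("mp3", "music"), ("cars", "autos"), ("toyota", "autos"),
    ("dvd", "movies"), ("movies", "movies"),
]
keywords = [kw for kw, _ in LOG]
concepts = [c for _, c in LOG]

for k in (1, 2, 3):
    print(k, top_k_coverage(keywords, k), top_k_coverage(concepts, k))
```

On this toy log the concept curve reaches full coverage at k = 3 while the keyword curve covers only half the queries, because many different keywords collapse into the same concept. That is the "converges much faster" effect the slide reports for the real data.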

Architecture Overview [figure]: modules: User Interface, NLP, Meta Search, FAQ Matching, Answer Finding, Log Writer, Crawler, Tools. Data stores: FAQ Database, Template Database, Answer Database, Question Log, Dictionary. Data passed between them: question string, keywords (and other…), question list, answer, search result, feedback, web sites/pages.

Parsing: Natural language is parsed using grammatical knowledge obtained by analyzing query log data. Query logs are processed to obtain new question templates with indexed answers, which supports relevance feedback.

Robust Parsing: Robust parsing handles ill-formed inputs. The robust parser attempts to overcome extra-grammaticality by ignoring unparsable words and fragments and conducting a search with the maximal subset of the original input.
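The "ignore what cannot be parsed, search with the rest" strategy can be sketched at the word level. This is a simplified assumption: a real robust parser works with grammar fragments, whereas this sketch uses a hypothetical `LEXICON` and treats every word missing from it as unparsable.

```python
import re

# Hypothetical lexicon: word -> category. Any other word is treated as unparsable.
LEXICON = {
    "how": "WH", "to": "TO", "go": "VERB", "from": "FROM",
    "beijing": "PLACE", "shanghai": "PLACE",
}

def robust_parse(question):
    """Split a question into recognized (word, category) pairs and ignored words.

    The recognized words are the largest subset of the input this lexicon
    understands; the search is then run with those and the rest is dropped.
    """
    words = re.findall(r"[a-z]+", question.lower())
    parsed = [(w, LEXICON[w]) for w in words if w in LEXICON]
    ignored = [w for w in words if w not in LEXICON]
    return parsed, ignored

parsed, ignored = robust_parse("uh how to go pls from Beijing to Shanghai??")
print([w for w, _ in parsed])  # ['how', 'to', 'go', 'from', 'beijing', 'to', 'shanghai']
print(ignored)                 # ['uh', 'pls']
```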

LEAP: The rule for traveling from one place to another:
TravelPath => from |... ;... }
Place { Beijing | Shanghai |... ; }

Example: “How to go from Beijing to Shanghai?” The LEAP parser returns the following result: How to go from place to place place
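The LEAP rule's intent, matching "from Place to Place" and abstracting the concrete places into a template, can be sketched with a regular expression. The slide's rule syntax is partly elided, so this is a hypothetical reconstruction of the idea only, not LEAP's actual notation or output format; `PLACES`, `TRAVEL_PATH` and `match_travel_path` are illustrative names.

```python
import re

# Hypothetical reconstruction of the rule's intent (not LEAP syntax):
#   TravelPath => from Place to Place
#   Place      => Beijing | Shanghai
PLACES = ["Beijing", "Shanghai"]
PLACE = "(" + "|".join(map(re.escape, PLACES)) + ")"
TRAVEL_PATH = re.compile(r"\bfrom\s+" + PLACE + r"\s+to\s+" + PLACE + r"\b", re.IGNORECASE)

def match_travel_path(question):
    """Return (template, slots) if the question matches TravelPath, else None."""
    m = TRAVEL_PATH.search(question)
    if m is None:
        return None
    # Replace the concrete places with the category name to obtain a template.
    template = question[:m.start()] + "from place to place" + question[m.end():]
    slots = {"origin": m.group(1), "destination": m.group(2)}
    return template.rstrip("?"), slots

print(match_travel_path("How to go from Beijing to Shanghai?"))
# ('How to go from place to place', {'origin': 'Beijing', 'destination': 'Shanghai'})
```

Keeping the slot values alongside the template is what lets many surface questions share one indexed template while still retrieving a place-specific answer.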

NL-Processor

Question Matching: Mapping from the question space to the concept space (using the concept-FAQ table); mapping from the FAQ space to the template space; mapping from the template space to the answer space.
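The chain of mappings can be sketched as successive lookups, reusing the Answerer example question. This is a simplified sketch under assumptions: the pattern, template and answer tables are hypothetical stand-ins for the concept-FAQ table, template database and answer database, and the FAQ step is folded into the concept pattern.

```python
import re

# Hypothetical stand-ins for the concept-FAQ table, template database and
# answer database; none of these names or contents come from Brilliant.
CONCEPT_PATTERNS = {
    "company_chairman": re.compile(r"who is the chairman of (?P<company>.+?)\??$", re.IGNORECASE),
}
TEMPLATES = {
    "company_chairman": "The chairman of {company} is {answer}",
}
ANSWERS = {
    ("company_chairman", "microsoft corporation"): "Bill Gates",
}

def answer_question(question):
    """Question space -> concept space -> template space -> answer space."""
    for concept, pattern in CONCEPT_PATTERNS.items():
        m = pattern.match(question.strip())
        if m is None:
            continue  # question does not map to this concept
        company = m.group("company")
        answer = ANSWERS.get((concept, company.lower()))
        if answer is None:
            return None  # concept recognized, but no indexed answer
        return TEMPLATES[concept].format(company=company, answer=answer)
    return None

print(answer_question("Who is the chairman of Microsoft Corporation?"))
# The chairman of Microsoft Corporation is Bill Gates
```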

Query Log Mining: The system is purely data-driven. How can the frequently asked questions be found among a large number of user questions? Statistical query co-occurrence analysis; clustering; classification.
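Of the three techniques listed, clustering is the simplest to sketch: group near-duplicate questions and rank the groups by size, so the largest groups are candidate FAQs. This is a minimal sketch under assumptions: the stopword list, the Jaccard threshold of 0.5, the greedy single-pass strategy and the toy log are all hypothetical choices, not the system's actual method.

```python
import re

# Hypothetical stopword list and toy question log.
STOPWORDS = {"the", "a", "an", "is", "of", "to", "i", "do", "how", "can", "what"}
LOG = [
    "How do I download mp3 files?",
    "download mp3 files",
    "where to download free mp3 files",
    "Who is the chairman of Microsoft?",
    "chairman of Microsoft",
]

def key_terms(question):
    """Content words of a question (lowercased, stopwords removed)."""
    return frozenset(w for w in re.findall(r"[a-z0-9]+", question.lower()) if w not in STOPWORDS)

def jaccard(a, b):
    """Jaccard similarity of two term sets (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_questions(log, threshold=0.5):
    """Greedy single-pass clustering: each question joins the first cluster whose
    representative terms are similar enough, otherwise it starts a new cluster.
    Returns (representative question, count) pairs, largest cluster first."""
    clusters = []  # each entry: [representative_terms, representative_text, count]
    for q in log:
        terms = key_terms(q)
        for c in clusters:
            if jaccard(terms, c[0]) >= threshold:
                c[2] += 1
                break
        else:
            clusters.append([terms, q, 1])
    clusters.sort(key=lambda c: c[2], reverse=True)
    return [(text, n) for _, text, n in clusters]

print(cluster_questions(LOG))
# [('How do I download mp3 files?', 3), ('Who is the chairman of Microsoft?', 2)]
```

A greedy pass is order-dependent, so a production system would more likely use a proper clustering algorithm; it is used here only to keep the idea visible.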

Multimedia Search: Initially limited to image-based search only. Query by either text or example. User interface issues. Similarity measures: –Color space measures –Feature space measures
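As one example of a color-space similarity measure for query by example, here is a sketch of color-histogram intersection, a well-known technique chosen here for illustration; the slides do not say which measure was used. The bin count, the function names and the toy "images" (lists of RGB tuples) are assumptions.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized RGB histogram: each 0-255 channel is quantized into
    bins_per_channel bins, giving bins_per_channel**3 buckets."""
    def q(c):
        # Maps 0..255 onto 0..bins_per_channel-1 for any bin count.
        return c * bins_per_channel // 256
    hist = [0.0] * bins_per_channel ** 3
    for r, g, b in pixels:
        idx = (q(r) * bins_per_channel + q(g)) * bins_per_channel + q(b)
        hist[idx] += 1
    n = len(pixels)
    return [h / n for h in hist] if n else hist

def histogram_intersection(h1, h2):
    """Sum of bin-wise minima: 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Hypothetical toy images given as lists of (r, g, b) pixels.
query_img = [(250, 10, 10)] * 8 + [(10, 10, 250)] * 2  # mostly red example image
database = {
    "red_car": [(240, 20, 20)] * 10,
    "blue_sky": [(10, 10, 250)] * 10,
}

query = color_histogram(query_img)
ranked = sorted(
    database,
    key=lambda name: histogram_intersection(query, color_histogram(database[name])),
    reverse=True,
)
print(ranked)  # ['red_car', 'blue_sky']
```

Histograms ignore where colors occur in the image, which makes them cheap and robust to small shifts but blind to shape; that limitation is one reason feature-space measures are listed separately on the slide.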

Content-based Image/Video Retrieval [figure]: the user's query (query by content) goes to the search engine, which retrieves results from an image database or the WWW.

Face-based Retrieval

Conclusion: In the future, the knowledge of mankind will be truly unmanageable by current approaches. Future users will want precise answers to their questions, not millions of relevant or irrelevant web pages.

Conclusion (cont.): I think the next generation of search engines will be a mixture of QA systems and current keyword-based search engines such as Google. This strictly depends on future developments in AI, IR, and NLP techniques. Future search engines won't be just machines: they will read a web page, understand it, and answer our questions intelligently, like humans or maybe better!