Search Engine Architecture


Search Engine Architecture Hongning Wang CS@UVa

Recap: why information retrieval
Information overload
  - “It refers to the difficulty a person can have understanding an issue and making decisions that can be caused by the presence of too much information.” - wiki
CS@UVa CS4501: Information Retrieval

Recap: IR vs. DBs
Information Retrieval:
  - Unstructured data
  - Semantics of objects are subjective
  - Simple keyword queries
  - Relevance-driven retrieval
  - Effectiveness is the primary issue, though efficiency is also important
Database Systems:
  - Structured data
  - Semantics of each object are well defined
  - Structured query languages (e.g., SQL)
  - Exact retrieval
  - Emphasis on efficiency
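The contrast between exact retrieval and relevance-driven retrieval can be sketched in a few lines of Python. This is only an illustration under assumptions: the records, the function names `exact_select` and `ranked_search`, and the term-overlap score are all made up here and are not taken from the slides.

```python
# Toy records: each row has a structured field (year) plus free text.
records = [
    {"id": 1, "year": 1998, "text": "anatomy of a large-scale hypertextual web search engine"},
    {"id": 2, "year": 2008, "text": "introduction to information retrieval"},
    {"id": 3, "year": 2010, "text": "search engines information retrieval in practice"},
]

def exact_select(rows, year):
    """Database-style exact retrieval: a row either satisfies the predicate or it does not."""
    return [row["id"] for row in rows if row["year"] == year]

def ranked_search(rows, query):
    """IR-style retrieval: score rows by query-term overlap and return the best first."""
    terms = set(query.lower().split())
    scored = []
    for row in rows:
        overlap = len(terms & set(row["text"].split()))
        if overlap > 0:
            scored.append((overlap, row["id"]))
    # Sort by descending score, breaking ties by ascending id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]

db_hits = exact_select(records, 2008)
ir_hits = ranked_search(records, "information retrieval search")
```

The exact query returns an unordered set of rows that satisfy the condition, while the keyword query returns every partially matching row in an order that reflects an estimate of relevance.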

Classical search engine architecture
“The Anatomy of a Large-Scale Hypertextual Web Search Engine” - Sergey Brin and Lawrence Page, Computer Networks and ISDN Systems 30.1 (1998): 107-117.
Citation count: 17,463 (as of September 6, 2017)
[Figure: architecture diagram annotated with: Crawler and indexer; Query parser; Document analyzer; Ranking model]

[Figure: architecture diagram annotated with: User input; Result display; Result post-processing; Query parser; Ranking model; Domain specific databases; Crawler & Indexer; Document analyzer & auxiliary database]

Abstraction of search engine architecture
[Figure: block diagram. Components: User, Query, Query Rep, Ranker, results, Crawler, Doc Analyzer, Doc Representation, Indexer, Index, Evaluation, Feedback. Region labels: Indexed corpus, Ranking procedure, Research attention]

Core IR concepts
Information need
  - “an individual or group's desire to locate and obtain information to satisfy a conscious or unconscious need” - wiki
  - An IR system aims to satisfy users’ information need
Query
  - A designed representation of users’ information need
  - In natural language, or some managed form

Core IR concepts
Document
  - A representation of information that potentially satisfies users’ information need
  - Text, image, video, audio, etc.
Relevance
  - Relatedness between documents and users’ information need
  - Multiple perspectives: topical, semantic, temporal, spatial, etc.
One sentence about IR: “rank documents by their relevance to the user’s information need”

Key components in a search engine
Web crawler
  - An automatic program that systematically browses the web for the purpose of Web content indexing and updating
Document analyzer & indexer
  - Manage the crawled web content and provide efficient access to web documents
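A minimal sketch of the crawler and indexer roles, assuming a hypothetical in-memory "web" (`TOY_WEB`) in place of real HTTP fetching. The slide does not name a data structure for efficient access; an inverted index is used here as one common choice, and all names are illustrative.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example": ("search engine architecture", ["b.example", "c.example"]),
    "b.example": ("web crawler and indexer", ["a.example"]),
    "c.example": ("ranking model for web search", ["b.example", "d.example"]),
}

def crawl(seed, web):
    """Breadth-first crawl from a seed URL, visiting each reachable page once."""
    seen, queue, pages = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        if url not in web:  # dead link: nothing to fetch
            continue
        text, links = web[url]
        pages[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_inverted_index(pages):
    """Map each term to the sorted list of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    return {term: sorted(urls) for term, urls in index.items()}

pages = crawl("a.example", TOY_WEB)
index = build_inverted_index(pages)
```

Looking up `index["web"]` then returns the pages containing that term without scanning every document, which is the kind of efficient access the indexer is meant to provide.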

Key components in a search engine
Query parser
  - Compile user-input keyword queries into a managed system representation
Ranking model
  - Sort candidate documents according to their relevance to the given query
Result display
  - Present the retrieved results to users to satisfy their information need
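The query parser and ranking model can be illustrated with a small sketch. The slides do not specify a scoring function; the TF-IDF sum below, the stopword list, and the names `parse_query` and `rank` are assumptions chosen for illustration only.

```python
import math
import re
from collections import Counter

STOPWORDS = {"a", "an", "the", "of", "to", "in", "for"}  # tiny illustrative list

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def parse_query(raw):
    """Turn raw user input into a list of normalized query terms."""
    return [t for t in tokenize(raw) if t not in STOPWORDS]

def rank(query_terms, docs):
    """Order document ids by a simple TF-IDF sum over the query terms."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            # Document frequency: number of documents containing the term.
            df = sum(1 for toks in tokenized.values() if term in toks)
            if df:
                score += tf[term] * math.log((n_docs + 1) / df)
        scores[doc_id] = score
    # Keep only matching documents; best score first, ties broken by id.
    return sorted((d for d in scores if scores[d] > 0), key=lambda d: (-scores[d], d))

docs = {
    "d1": "The crawler fetches web pages",
    "d2": "The ranking model sorts web documents by relevance",
    "d3": "Relevance feedback refines the ranking of documents",
}
terms = parse_query("The ranking of Web documents")
ranking = rank(terms, docs)
```

Here the parser reduces the input to `ranking`, `web`, `documents`, and the document matching all three terms is ranked first.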

Key components in a search engine
Retrieval evaluation
  - Assess the quality of the returned results
Relevance feedback
  - Propagate the quality judgment back to the system for search result refinement
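As one way to make "assess the quality of the returned results" concrete, the sketch below computes precision and recall at a rank cutoff. These two metrics are standard in IR, but the slide does not say which measures the course uses; the function names and the toy judgments are assumptions.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)

ranked = ["d2", "d3", "d1", "d4"]   # system output, best first
relevant = {"d2", "d4", "d5"}       # hypothetical human judgments

p_at_2 = precision_at_k(ranked, relevant, 2)
r_at_4 = recall_at_k(ranked, relevant, 4)
```

In this toy case one of the top two results is relevant, and two of the three relevant documents are found within the top four. Judgments like these are also the signal that relevance feedback would send back to the system.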

Key components in a search engine
Search query logs
  - Record users’ interaction history with the search engine
User modeling
  - Understand users’ longitudinal information need
  - Assess users’ satisfaction towards search engine output
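A rough sketch of how a query log might feed user modeling. The log format, the function names, and the use of click-through rate as a satisfaction proxy are all assumptions made for illustration; the slides do not prescribe any of them.

```python
from collections import defaultdict

# Hypothetical log records: (user, query, clicked document or None).
log = [
    ("u1", "web crawler", "d1"),
    ("u1", "ranking model", None),
    ("u2", "web crawler", "d1"),
    ("u2", "web crawler", None),
    ("u1", "ranking model", "d2"),
]

def click_through_rate(records):
    """Per query, the share of issued queries that led to a click."""
    issued = defaultdict(int)
    clicked = defaultdict(int)
    for _, query, doc in records:
        issued[query] += 1
        if doc is not None:
            clicked[query] += 1
    return {query: clicked[query] / issued[query] for query in issued}

def user_history(records, user):
    """Chronological list of queries issued by one user."""
    return [query for u, query, _ in records if u == user]

ctr = click_through_rate(log)
history = user_history(log, "u1")
```

A per-user history of this kind is the raw material for modeling a longitudinal information need, and a low click-through rate for a query is one weak hint that users were not satisfied with the results.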

Recap: abstraction of search engine architecture
[Figure: block diagram. Components: User, Query, Query Rep, Ranker, results, Crawler, Doc Analyzer, Doc Representation, Indexer, Index, Evaluation, Feedback. Region labels: Indexed corpus, Ranking procedure, Research attention]

Discussion: Browsing vs. Querying
Browsing: what Yahoo did before
  - The system organizes information with structures, and a user navigates into relevant information by following a path enabled by the structures
  - Works well when the user wants to explore information or doesn’t know what keywords to use, or can’t conveniently enter a query (e.g., with a smartphone)
Querying: what Google does
  - A user enters a (keyword) query, and the system returns a set of relevant documents
  - Works well when the user knows exactly what query to use for expressing her information need

Pull vs. Push in Information Retrieval
Pull mode: with query
  - Users take initiative and “pull” relevant information out from a retrieval system
  - Works well when a user has an ad hoc information need
Push mode: without query
  - Systems take initiative and “push” relevant information to users
  - Works well when a user has a stable information need or the system has good knowledge about a user’s need
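The two modes can be contrasted in a small sketch. The class `ToyRetrievalSystem`, its method names, and the all-terms-must-match rule are hypothetical and exist only to show who takes the initiative in each mode.

```python
class ToyRetrievalSystem:
    """Hypothetical system supporting both pull (search) and push (subscribe)."""

    def __init__(self):
        self.docs = []      # every document published so far
        self.profiles = {}  # user -> set of standing interest terms
        self.inbox = {}     # user -> documents pushed to that user

    @staticmethod
    def _matches(terms, text):
        """True when every term occurs as a word in the text."""
        words = set(text.lower().split())
        return all(term in words for term in terms)

    def search(self, query):
        """Pull mode: the user takes the initiative with an ad hoc query."""
        terms = query.lower().split()
        return [doc for doc in self.docs if self._matches(terms, doc)]

    def subscribe(self, user, interests):
        """Register a stable information need for later push delivery."""
        self.profiles[user] = set(interests.lower().split())
        self.inbox.setdefault(user, [])

    def publish(self, doc):
        """Push mode: the system routes each new document to matching users."""
        self.docs.append(doc)
        for user, terms in self.profiles.items():
            if self._matches(terms, doc):
                self.inbox[user].append(doc)

system = ToyRetrievalSystem()
system.subscribe("alice", "retrieval")
system.publish("information retrieval lecture")
system.publish("database systems lecture")
pulled = system.search("lecture")
pushed = system.inbox["alice"]
```

In pull mode nothing happens until `search` is called, whereas in push mode the stored profile acts as a standing query that is evaluated each time a document arrives.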

Discussion: is Yelp a search engine?

What you should know
  - Basic workflow and components in an IR system
  - Core concepts in IR
  - Browsing vs. querying
  - Pull vs. push of information

Today’s reading
Introduction to Information Retrieval
  - Chapter 19: Web search basics