Search Engine Architecture

Slides:

Advertisements

Similar presentations

Chapter 5: Introduction to Information Retrieval

Advertisements

UCLA : GSE&IS : Department of Information StudiesJF : 276lec1.ppt : 5/2/2015 : 1 I N F S I N F O R M A T I O N R E T R I E V A L S Y S T E M S Week.

Web Search - Summer Term 2006 III. Web Search - Introduction (Cont.) (c) Wolfgang Hürst, Albert-Ludwigs-University.

Web- and Multimedia-based Information Systems. Assessment Presentation Programming Assignment.

Information Retrieval in Practice

Search Engines and Information Retrieval

T.Sharon - A.Frank 1 Internet Resources Discovery (IRD) Classic Information Retrieval (IR)

Web Search – Summer Term 2006 III. Web Search - Introduction (Cont.) - Jeff Dean, Google's Systems Lab:

Information Retrieval Concerned with the: Representation of Storage of Organization of, and Access to Information items.

Information Retrieval in Practice

FACT: A Learning Based Web Query Processing System Hongjun Lu, Yanlei Diao Hong Kong U. of Science & Technology Songting Chen, Zengping Tian Fudan University.

Searching The Web Search Engines are computer programs (variously called robots, crawlers, spiders, worms) that automatically visit Web sites and, starting.

1 Information Retrieval and Web Search Introduction.

Basic IR Concepts & Techniques ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.

Exercise 1: Bayes Theorem (a). Exercise 1: Bayes Theorem (b) P (b 1 | c plain ) = P (c plain ) P (c plain | b 1 ) * P (b 1 )

Overview of Search Engines

CS598CXZ Course Summary ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.

Introduction to Information Retrieval Hongning Wang

Search Engines and Information Retrieval Chapter 1.

Aardvark Anatomy of a Large-Scale Social Search Engine.

Thanks to Bill Arms, Marti Hearst Documents. Last time Size of information –Continues to grow IR an old field, goes back to the ‘40s IR iterative process.

1 Information Retrieval Acknowledgements: Dr Mounia Lalmas (QMW) Dr Joemon Jose (Glasgow)

Implicit User Feedback Hongning Wang Explicit relevance feedback 2 Updated query Feedback Judgments: d 1 + d 2 - d 3 + … d k -... Query User judgment.

Search Result Interface Hongning Wang Abstraction of search engine architecture User Ranker Indexer Doc Analyzer Index results Crawler Doc Representation.

Introduction to Digital Libraries hussein suleman uct cs honours 2003.

Relevance Feedback Hongning Wang What we have learned so far Information Retrieval User results Query Rep Doc Rep (Index) Ranker.

Search Engine Architecture

Searching the web Enormous amount of information –In 1994, 100 thousand pages indexed –In 1997, 100 million pages indexed –In June, 2000, 500 million pages.

IT-522: Web Databases And Information Retrieval By Dr. Syed Noman Hasany.

Next Generation Search Engines Ehsun Daroodi 1 Feb, 2003.

Introduction to Information Retrieval Aj. Khuanlux MitsophonsiriCS.426 INFORMATION RETRIEVAL.

Introduction to Information Retrieval Example of information need in the context of the world wide web: “Find all documents containing information on computer.

Search Result Interface Hongning Wang Abstraction of search engine architecture User Ranker Indexer Doc Analyzer Index results Crawler Doc Representation.

1 Information Retrieval LECTURE 1 : Introduction.

Information Retrieval CSE 8337 Spring 2007 Introduction/Overview Some Material for these slides obtained from: Modern Information Retrieval by Ricardo.

Recuperação de Informação Cap. 01: Introdução 21 de Fevereiro de 1999 Berthier Ribeiro-Neto.

Information Retrieval

Information Retrieval Transfer Cycle Dania Bilal IS 530 Fall 2007.

Toward Semantic Search: RDFa based facet browser Jin Guang Zheng Tetherless World Constellation.

Augmenting (personal) IR Readings Review Evaluation Papers returned & discussed Papers and Projects checkin time.

Chapter. 3: Retrieval Evaluation 1/2/2016Dr. Almetwally Mostafa 1.

Relevance Feedback Hongning Wang

Presented By: Carlton Northern and Jeffrey Shipman The Anatomy of a Large-Scale Hyper-Textural Web Search Engine By Lawrence Page and Sergey Brin (1998)

Information Retrieval in Practice

Information Retrieval in Practice

Information Storage and Retrieval Fall Lecture 1: Introduction and History.

Visual Information Retrieval

Search Engine Architecture

Information Retrieval (in Practice)

Introduction to Information Retrieval

Chapter Five Web Search Engines

Information Retrieval and Web Search

Search Engine Architecture

Information Retrieval and Web Search

Information Retrieval and Web Search

Relevance Feedback Hongning Wang

Information Retrieval

Submitted By: Usha MIT-876-2K11 M.Tech(3rd Sem) Information Technology

WIRED Week 2 Syllabus Update Readings Overview.

Thanks to Bill Arms, Marti Hearst

IR Theory: Evaluation Methods

CSE 635 Multimedia Information Retrieval

Introduction to Information Retrieval

Introduction to Information Retrieval

Information Retrieval and Web Design

Information Retrieval and Web Design

Recuperação de Informação

Information Retrieval and Web Search

Introduction to Search Engines

Presentation transcript:

Search Engine Architecture Hongning Wang CS@UVa

Recap: why information retrieval Information overload “It refers to the difficulty a person can have understanding an issue and making decisions that can be caused by the presence of too much information.” - wiki CS@UVa CS4501: Information Retrieval

CS4501: Information Retrieval Recap: IR v.s. DBs Information Retrieval: Unstructured data Semantics of objects are subjective Simple keyword queries Relevance-drive retrieval Effectiveness is primary issue, though efficiency is also important Database Systems: Structured data Semantics of each object are well defined Structured query languages (e.g., SQL) Exact retrieval Emphasis on efficiency CS@UVa CS4501: Information Retrieval

Classical search engine architecture “The Anatomy of a Large-Scale Hypertextual Web Search Engine” - Sergey Brin and Lawrence Page, Computer networks and ISDN systems 30.1 (1998): 107-117. Crawler and indexer Citation count: 17,463 (as of September 6, 2017) Query parser Document Analyzer Ranking model CS@UVa CS4501: Information Retrieval

CS4501: Information Retrieval User input Result display Result post-processing Query parser Ranking model Domain specific databases Crawler & Indexer Document analyzer & auxiliary database CS@UVa CS4501: Information Retrieval

Abstraction of search engine architecture Indexed corpus Crawler Research attention Ranking procedure Evaluation Feedback Doc Representation Doc Analyzer Query Rep (Query) User Ranker Index results Indexer CS@UVa CS4501: Information Retrieval

CS4501: Information Retrieval Core IR concepts Information need “an individual or group's desire to locate and obtain information to satisfy a conscious or unconscious need” – wiki An IR system is to satisfy users’ information need Query A designed representation of users’ information need In natural language, or some managed form CS@UVa CS4501: Information Retrieval

CS4501: Information Retrieval Core IR concepts Document A representation of information that potentially satisfies users’ information need Text, image, video, audio, and etc. Relevance Relatedness between documents and users’ information need Multiple perspectives: topical, semantic, temporal, spatial, and etc. One sentence about IR - “rank documents by their relevance to the user’s information need” CS@UVa CS4501: Information Retrieval

Key components in a search engine Web crawler An automatic program that systematically browses the web for the purpose of Web content indexing and updating Document analyzer & indexer Manage the crawled web content and provide efficient access of web documents CS@UVa CS4501: Information Retrieval

Key components in a search engine Query parser Compile user-input keyword queries into managed system representation Ranking model Sort candidate documents according to it relevance to the given query Result display Present the retrieved results to users for satisfying their information need CS@UVa CS4501: Information Retrieval

Key components in a search engine Retrieval evaluation Assess the quality of the returned results Relevance feedback Propagate the quality judgment back to the system for search result refinement CS@UVa CS4501: Information Retrieval

Key components in a search engine Search query logs Record users’ interaction history with search engine User modeling Understand users’ longitudinal information need Assess users’ satisfaction towards search engine output CS@UVa CS4501: Information Retrieval

Recap: abstraction of search engine architecture Indexed corpus Crawler Research attention Ranking procedure Evaluation Feedback Doc Representation Doc Analyzer Query Rep (Query) User Ranker Index results Indexer CS@UVa CS4501: Information Retrieval

Discussion: Browsing v.s. Querying Browsing – what Yahoo did before The system organizes information with structures, and a user navigates into relevant information by following a path enabled by the structures Works well when the user wants to explore information or doesn’t know what keywords to use, or can’t conveniently enter a query (e.g., with a smartphone) Querying – what Google does A user enters a (keyword) query, and the system returns a set of relevant documents Works well when the user knows exactly what query to use for expressing her information need CS@UVa CS4501: Information Retrieval

Pull vs. Push in Information Retrieval Pull mode – with query Users take initiative and “pull” relevant information out from a retrieval system Works well when a user has an ad hoc information need Push mode – without query Systems take initiative and “push” relevant information to users Works well when a user has a stable information need or the system has good knowledge about a user’s need CS@UVa CS4501: Information Retrieval

Discussion: is Yelp a search engine? CS@UVa CS4501: Information Retrieval

CS4501: Information Retrieval What you should know Basic workflow and components in an IR system Core concepts in IR Browsing v.s. querying Pull v.s. push of information CS@UVa CS4501: Information Retrieval

CS4501: Information Retrieval Today’s reading Introduction to Information Retrieval Chapter 19: Web search basics CS@UVa CS4501: Information Retrieval