Web Search – Summer Term 2006 III. Web Search - Introduction (Cont.) - Jeff Dean, Google's Systems Lab:

Web Search – Summer Term 2006 III. Web Search - Introduction (Cont.) - Jeff Dean, Google's Systems Lab: (c) Wolfgang Hürst, Albert-Ludwigs-University

IR vs. Web Search
The initial problem is similar to traditional IR (information need, query, data / documents, information) ... but basic conditions and characteristics differ significantly:
- The web is huge. Very huge.
- Big variety in data
- Doc. authors don't cooperate (spam, ...)
- The no. of users is huge. Very huge.
- Big variety in users
- Users don't cooperate (short queries, ...)

Classic IR vs. Web Search: Documents
Huge amount of data, continuous growth, high rate of change
Huge variability and heterogeneity
- Quality, credibility and reputation of the source
- Static vs. dynamic docs
- Different media types (text, pics, audio, video)
- Different formats (HTML, Flash, PDF, ...)
- Miscellaneous topics
- Continuous text vs. note form / keywords
- Different languages, encodings
Spam and advertisements
Web-specific characteristics
- Hypertext, linking
- Broken links
- Unstructured, not always conforming to standards
Redundancy (syntactic and semantic)
Distributed (need to collect them automatically)
Different popularity and access frequency

Classic IR vs. Web Search: Users
Different needs and aims, e.g. users might want
- to learn something ("informational")
- to go to a particular site ("navigational")
- to do something, e.g. shopping, download, ... ("transactional")
- to do other, miscellaneous things, e.g. finding hubs, "exploratory search", ...
Different premises, qualifications, languages, ...
Different network connections / bandwidths
Imprecise, unspecific queries
- Short, ambiguous, inexact, incorrect, no usage of operators or special syntax

Classic IR vs. Web Search: Bottom line
Different characteristics that cause lots of problems
But there's also good news: We can take advantage of some of these characteristics (e.g. links, statistics, ...)

References
[1] A. Arasu, J. Cho, H. Garcia-Molina, A. Paepcke, S. Raghavan: "Searching the Web", ACM Transactions on Internet Technology, Vol. 1/1, Aug. Chapter 1 (Introduction, general architecture)
[2] S. Brin, L. Page: "The Anatomy of a Large-Scale Hypertextual Web Search Engine", WWW 1998. Chapter 1 (Introduction), Chapter 4.1 (Google Architecture Overview)

General Web Search Engine Architecture (cf. [1], Fig. 1)
[Figure: architecture diagram. Components: Client, Query Engine, Ranking, Crawl Control, Crawler(s), WWW, Page Repository, Collection Analysis Module, Indexer Module, Indexes (Structure, Utility, Text). Flows: Queries, Results, Usage Feedback.]
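
To make the data flow between these components concrete, here is a minimal sketch in Python. It is not the architecture from [1] itself: the in-memory `TOY_WEB` dictionary, the `.example` URLs, and all function names are assumptions made for illustration, and ranking, usage feedback, and the utility index are left out.

```python
from collections import defaultdict, deque

# Hypothetical in-memory stand-in for the WWW: URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example": ("web search engines crawl pages", ["b.example", "c.example"]),
    "b.example": ("crawlers collect pages for the repository", ["c.example"]),
    "c.example": ("the indexer builds text and structure indexes", ["a.example"]),
}

def crawl(seed_urls, web):
    """Crawler with trivial crawl control: breadth-first fetch of reachable pages."""
    repository = {}                      # page repository: URL -> (text, links)
    frontier = deque(seed_urls)
    while frontier:
        url = frontier.popleft()
        if url in repository or url not in web:
            continue                     # already stored, or a broken link
        repository[url] = web[url]
        frontier.extend(web[url][1])
    return repository

def build_indexes(repository):
    """Indexer module: text index (term -> URLs), structure index (URL -> in-links)."""
    text_index = defaultdict(set)
    structure_index = defaultdict(set)
    for url, (text, links) in repository.items():
        for term in text.lower().split():
            text_index[term].add(url)
        for target in links:
            structure_index[target].add(url)
    return text_index, structure_index

def query(text_index, terms):
    """Query engine: URLs that contain all query terms (unranked)."""
    postings = [text_index.get(t.lower(), set()) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []

repository = crawl(["a.example"], TOY_WEB)
text_index, structure_index = build_indexes(repository)
print(query(text_index, ["pages"]))
print(sorted(structure_index["c.example"]))
```

The structure index is kept separate from the text index because link information is what later ranking methods such as PageRank and HITS build on.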

Recap: IR System & Tasks Involved
[Figure: IR system diagram. User side: Information Need, User Interface, Query, Query Processing (parsing & term processing), Logical View of the Information Need. Document side: Documents, Select Data for Indexing, Parsing & Term Processing, Index. Retrieval: Searching, Ranking, Results, Docs., Result Representation, Performance Evaluation.]
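
The tasks in this recap can be sketched as a few Python functions. This is only an illustration under simple assumptions: the stopword list, the sample documents, and the scoring rule (sum of term frequencies) are invented for the example and are not taken from the lecture.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "of", "in", "and"}   # assumed, illustrative list

def process_terms(text):
    """Parsing & term processing: lowercase, tokenize, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]

def build_index(docs):
    """Index: term -> {doc_id: term frequency}."""
    index = {}
    for doc_id, text in docs.items():
        for term, tf in Counter(process_terms(text)).items():
            index.setdefault(term, {})[doc_id] = tf
    return index

def search(index, query):
    """Searching & ranking: score = sum of term frequencies of the query terms."""
    scores = Counter()
    for term in process_terms(query):
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

docs = {
    "d1": "The anatomy of a search engine",
    "d2": "Search, search, and ranking in the web",
    "d3": "Crawling the web",
}
index = build_index(docs)
results = search(index, "Web search")
print(results)
```

Note that the query passes through the same `process_terms` step as the documents, so both sides are compared in the same logical view.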

The Google Search Engine
Founded 1998 (1996) by two Stanford students
Originally an academic / research project that later became a commercial tool
Distinguishing features (then!?):
- Special (and better) ranking
- Speed
- Size

Architecture of the 1st Google Search Engine (cf. [2], Fig. 1)
[Figure: architecture diagram. Components: URL Server, Crawlers, Store Server, Repository, Indexer, Anchors, URL Resolver, Links, Doc Index, Barrels, Lexicon, Dump Lexicon, Sorters, PageRank, Searcher.]
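
As a rough illustration of how indexer, lexicon, barrels, and sorters fit together, the following Python sketch loosely follows the description in [2]: the indexer emits forward entries ordered by docID, and the sorter re-sorts them by wordID to obtain an inverted index. All names and the tiny repository are assumptions; the real system uses compressed hit lists and many barrels rather than plain Python lists.

```python
def index_documents(repository):
    """Indexer: assign wordIDs via the lexicon and emit forward entries
    (docID, wordID, position), ordered by docID."""
    lexicon = {}      # word -> wordID
    forward = []      # a single forward "barrel"
    for doc_id in sorted(repository):
        for position, word in enumerate(repository[doc_id].lower().split()):
            word_id = lexicon.setdefault(word, len(lexicon))
            forward.append((doc_id, word_id, position))
    return lexicon, forward

def sort_barrel(forward):
    """Sorter: re-sort the forward entries by wordID, giving an inverted
    index wordID -> list of (docID, position)."""
    inverted = {}
    for doc_id, word_id, position in sorted(forward, key=lambda e: (e[1], e[0], e[2])):
        inverted.setdefault(word_id, []).append((doc_id, position))
    return inverted

def search(lexicon, inverted, word):
    """Searcher: look up the wordID in the lexicon, then read its postings."""
    word_id = lexicon.get(word.lower())
    if word_id is None:
        return []
    return sorted({doc_id for doc_id, _ in inverted[word_id]})

repository = {1: "web search engine", 2: "search the web", 3: "page rank"}
lexicon, forward = index_documents(repository)
inverted = sort_barrel(forward)
print(search(lexicon, inverted, "web"))
```

Storing integer wordIDs instead of strings keeps the entries small, which is one reason for having a separate lexicon.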

Schedule
Web Search:
- Introduction
- Crawling
- Page Repository
- Indexing
- Ranking (PageRank, HITS)
- Exercises for web search basics
- Advanced / additional web search topics
In parallel:
- Programming project (Lucene)
