Discussion Class 9: Google

Discussion Classes Format
Questions: Ask a member of the class to answer. Provide an opportunity for others to comment.
When answering: Stand up. Give your name and make sure that the TA hears it. Speak clearly so that the whole class can hear.
Suggestions: Do not be shy about presenting partial answers. Differing viewpoints are welcome.

Question 1: Indexing the Web
(a) Who are the authors of this paper?
(b) The authors criticize conventional ranking methods based on vector similarity. What are their criticisms? Do you agree with them?
(c) Why not use standard full-text indexing with tf.idf weighting?
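For reference when discussing (b) and (c), the "conventional" approach scores each document by the cosine similarity between its tf.idf vector and the query's. This is a minimal sketch, assuming whitespace tokenization, raw term frequency, and idf = log(N/df); the function names and sample documents are hypothetical. It also illustrates one weakness often raised against pure text similarity on the web: a page that simply repeats the query terms scores as a perfect match.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one tf.idf vector per document: weight = tf * log(N / df)."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(tokens).items()} for tokens in tokenized]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query, docs):
    """Return (score, doc_index) pairs, best match first."""
    vectors, idf = tfidf_vectors(docs)
    q_tf = Counter(query.lower().split())
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in q_tf.items()}  # unseen terms get weight 0
    return sorted(((cosine(q_vec, vec), i) for i, vec in enumerate(vectors)), reverse=True)


docs = [
    "cheap flights cheap flights cheap flights",       # keyword-stuffed page
    "airline guide to cheap flights and hotel deals",  # more informative page
    "university homepage with course pages",           # unrelated page
]
ranking = rank("cheap flights", docs)
```

With the query "cheap flights", the stuffed first document points in exactly the same direction as the query, so its cosine is 1.0 (up to rounding) and it outranks the more useful second document.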

Question 2: Ranking
The authors of the paper state that their objective is to maximize precision.
(a) What do they mean by "precision"?
(b) What assumptions does this imply about users and their wishes? How does their view of relevance differ from the conventional view? How well would you expect Google to perform in the TREC ad hoc track?
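As background for (a): in standard IR terms, precision is the fraction of retrieved documents that are relevant. A web engine's emphasis on precision is often read as precision among the first few results, so the sketch also computes precision at a cutoff k. The document IDs and relevance judgments here are hypothetical.

```python
def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    relevant = set(relevant)
    return sum(1 for doc in retrieved if doc in relevant) / len(retrieved)


def precision_at_k(ranking, relevant, k):
    """Precision computed over only the top k ranked results."""
    return precision(ranking[:k], relevant)


# Hypothetical ranked result list and relevance judgments.
ranking = ["d3", "d7", "d1", "d9", "d2", "d5", "d8", "d4", "d6", "d10"]
relevant = {"d3", "d1", "d2", "d4"}
```

Here overall precision is 0.4, but precision at 3 is 2/3 and precision at 1 is 1.0: a user who looks only at the top of the list sees a much better system than the overall figure suggests.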

Question 3: PageRank Algorithm
Traditional text search engines rank hits by the similarity of each document to a query. How does PageRank rank the hits returned by a query? What is the concept behind PageRank? What other ranking methods does Google use?
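The concept can be made concrete with the commonly cited form from the Brin and Page paper, PR(A) = (1 - d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1..Tn link to A, C(T) is the number of links out of T, and d is a damping factor usually set to 0.85. The sketch below simply iterates that equation; the fixed iteration count and the lack of special handling for pages with no out-links are simplifying assumptions, and the three-page graph is hypothetical.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over the pages T linking to A.

    links maps each page to the list of pages it links to (its out-links).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    out_degree = {p: len(links.get(p, [])) for p in pages}
    # Precompute in-links so each update is a simple sum.
    inbound = {p: [] for p in pages}
    for src, targets in links.items():
        for target in targets:
            inbound[target].append(src)
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # The comprehension reads only the previous iteration's values.
        pr = {p: (1 - d) + d * sum(pr[t] / out_degree[t] for t in inbound[p]) for p in pages}
    return pr


# Hypothetical web of three pages.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
scores = pagerank(links)
```

C ends up with the highest score because it is linked from both A and B, and A is next because its single in-link comes from the highly ranked C. Note that the score depends only on the link graph, not on the query; a search engine would combine it with query-dependent text matching.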

Question 4: Anchor Text
What is anchor text? How does Google use anchor text to index a web page? What are the computational challenges in this approach?
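The central idea is that the words of a link are credited to the page the link points to, not to the page that contains the link. This minimal sketch assumes a tiny in-memory Boolean index; the URLs, function names, and data layout are hypothetical. Because anchor words are indexed under the target, a target that was never crawled can still be retrieved.

```python
from collections import defaultdict


def build_index(pages, links):
    """Map each term to the set of URLs it can retrieve.

    pages: url -> body text for pages that were actually crawled.
    links: (source_url, target_url, anchor_text) triples found while crawling.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    for _source, target, anchor in links:
        # Anchor words describe the *target* page, so index them under it.
        for term in anchor.lower().split():
            index[term].add(target)
    return index


def search(index, query):
    """Return the URLs that match every query term (Boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


pages = {"http://a.example/home": "welcome to our lab homepage"}
links = [("http://a.example/home", "http://b.example/report.pdf", "annual budget report")]
index = build_index(pages, links)
```

Searching for "budget report" returns the PDF even though its contents were never indexed. The sketch keeps everything in one dictionary; at web scale, anchor records must instead be collected from every crawled page and regrouped by target URL, which is where the computational cost lies.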

Question 5: Scaling
Much of the article is about scalability.
(a) How many pages were they indexing when they wrote the article? How many today? How many queries does the system handle every day?
(b) What is their strategy for scalability? Where do you think the limitations lie?
(c) How did they manage to implement such a large-scale (and ever-changing) system with a small technical staff?
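One ingredient of the strategy in (b) is partitioning: the paper's forward index is split into "barrels", each holding a range of wordIDs, and each barrel is then re-sorted by wordID to produce its part of the inverted index, so barrels can be processed independently. This minimal sketch assumes equal-width wordID ranges and in-memory lists of (docID, wordID, hits) records; the function names and sample records are hypothetical.

```python
def barrel_for(word_id, num_barrels, vocab_size):
    """Return which barrel holds word_id, using equal-width wordID ranges."""
    width = -(-vocab_size // num_barrels)  # ceiling division
    return word_id // width


def build_barrels(forward_index, num_barrels, vocab_size):
    """Distribute (doc_id, word_id, hits) records into barrels by wordID range."""
    barrels = [[] for _ in range(num_barrels)]
    for doc_id, word_id, hits in forward_index:
        barrels[barrel_for(word_id, num_barrels, vocab_size)].append((doc_id, word_id, hits))
    return barrels


def invert_barrel(barrel):
    """Sort one barrel by wordID, then docID, giving its slice of the inverted index."""
    inverted = {}
    for doc_id, word_id, hits in sorted(barrel, key=lambda record: (record[1], record[0])):
        inverted.setdefault(word_id, []).append((doc_id, hits))
    return inverted


# Hypothetical forward-index records: (doc_id, word_id, hit positions).
forward_index = [(1, 7, [3]), (0, 2, [0, 5]), (0, 7, [1]), (2, 2, [4])]
barrels = build_barrels(forward_index, num_barrels=4, vocab_size=10)
```

Since no barrel depends on any other, each `invert_barrel` call could run on a separate machine; the design trades a second sorting pass for that independence.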

Question 6: Spamming
"There are even numerous companies which specialize in manipulating search engines for profit."
(a) Explain this statement.
(b) How does Google overcome this problem?
(c) Why are the authors unenthusiastic about using metadata for indexing the web?

Question 7: Implementation
(a) What is the function of the Google lexicon? How is it stored?
(b) What is the function of the hit list? How is it stored?
(c) How can a Google search find a web page that has never been indexed?
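To ground (a) and (b): a lexicon maps each word to a compact integer wordID, and a hit list records where and how each word occurs in a document. The bit layout below follows the two-byte plain-hit format described in the paper (1 capitalization bit, 3 font-size bits, 12 position bits, with font size 7 reserved for fancy hits). The Python dict standing in for the lexicon, the fixed font size of 0, the clamping of long positions, and all function names are assumptions of this sketch.

```python
def build_lexicon(words):
    """Assign each distinct word a small integer wordID in first-seen order."""
    lexicon = {}
    for word in words:
        if word not in lexicon:
            lexicon[word] = len(lexicon)
    return lexicon


def pack_plain_hit(capitalized, font_size, position):
    """Pack one plain hit into 16 bits: 1 capitalization bit, 3 font-size bits, 12 position bits."""
    if not 0 <= font_size <= 6:
        # Font size 7 is reserved to mark "fancy" hits (URL, title, anchor text, meta tag).
        raise ValueError("plain-hit font size must be in 0..6")
    position = min(position, 0xFFF)  # assumption: clamp positions that overflow 12 bits
    return (int(capitalized) << 15) | (font_size << 12) | position


def unpack_plain_hit(hit):
    """Inverse of pack_plain_hit: returns (capitalized, font_size, position)."""
    return bool(hit >> 15), (hit >> 12) & 0b111, hit & 0xFFF


def hit_list(text, lexicon):
    """Map wordID -> packed hits for one document (hypothetical: every word uses font size 0)."""
    hits = {}
    for position, raw in enumerate(text.split()):
        word_id = lexicon[raw.lower()]
        hits.setdefault(word_id, []).append(pack_plain_hit(raw[:1].isupper(), 0, position))
    return hits


text = "Google indexes the Web and the Web links"
lexicon = build_lexicon(text.lower().split())
hits = hit_list(text, lexicon)
```

Packing each occurrence into two bytes, instead of storing the word itself, is what keeps the hit lists small enough to store and sort for a large collection.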