Combining Systems and Databases: A Search Engine Retrospective. Reviewed by: Rooma Rathore, Rohini Prinja. Author: Eric A. Brewer.

Problem Statement:
- How search engines (SEs) should have been designed.
- How to apply database principles when designing data-intensive (DI) applications, without necessarily adopting the same semantics.

Importance of the paper in the current context:
- Search engines have become an important part of life for billions of people. It is intriguing how SEs manage enormous amounts of data, on the order of 3 billion documents and growing every second.
- Behind-the-scenes challenges of designing SEs:
  - Ranking documents
  - Ranking query results
  - Availability
  - Freshness of data
- Discusses data-intensive applications in light of SEs.
- Finally, the paper raises the question of how scalable these models can be, since the data on the internet that SEs must handle grows every second.

Contributions:
- Gives numbers for various search engine parameters, such as the number of documents, the amount of data stored, and the number of queries.
- Discusses the challenges of designing an SE.
- Identifies which DBMS principles can (and should) be applied when designing DI applications such as search engines:
  - Top-down design
  - Data independence
  - Declarative query language

Contributions (contd.):
- Why SEs did not use a DBMS in the first place.
- Why SEs cannot be implemented as a DBMS in the true sense:
  - Speed: a DBMS is too slow for this workload
  - Cost: a DBMS is not cost-effective given the magnitude of the data
  - High availability vs. consistency: a DBMS favors consistency, whereas an SE favors high availability
  - Update: the model for updating data in an SE is entirely different from that of a database

Contributions (contd.): New Design
- Uses static databases, with a large amount of offline work to build and rebuild them.
- Overview of SE design: crawl, index, serve queries (read-only)
- Scoring of documents and words
- Making a query plan
- Query implementation:
  - Access methods and physical operators
  - Optimizing queries to maximize system throughput
  - Providing redundancy through clustering
  - Compression and other optimizations
  - Updating data
  - Fault tolerance
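
The offline-build, read-only-serve split and the idea of a query plan can be illustrated with a small sketch. This is not the design from the paper; it assumes a toy in-memory inverted index, and the names `build_index`, `intersect`, and `search` are hypothetical. The "plan" here is only the common heuristic of intersecting the shortest posting lists first.

```python
from collections import defaultdict

def build_index(docs):
    """Offline step: map each term to a sorted list of document IDs."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    # Freeze into sorted lists; the serving side never mutates them.
    return {term: sorted(ids) for term, ids in postings.items()}

def intersect(a, b):
    """Merge-join two sorted posting lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def search(index, query):
    """Read-only AND query over the static index."""
    lists = [index.get(term, []) for term in query.lower().split()]
    if not lists:
        return []
    # Toy query plan (assumption): process the shortest posting list first
    # so intermediate results stay small.
    lists.sort(key=len)
    result = lists[0]
    for plist in lists[1:]:
        result = intersect(result, plist)
        if not result:
            break
    return result

docs = {
    1: "search engines use inverted index",
    2: "databases use declarative query",
    3: "search engines and databases",
}
index = build_index(docs)
print(search(index, "search engines"))
print(search(index, "search databases"))
```

Because `search` never writes to the index, a rebuilt index can be swapped in wholesale, which is one way to read the "static databases" point above.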

Contributions (contd.):
- SE challenges that differ from those of a traditional DBMS:
  - Personalization
  - Logging
  - Query rewriting
  - Phrase queries
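
To make the phrase-query item concrete, here is a minimal sketch assuming a positional index, a common technique that is swapped in for illustration and is not necessarily the approach the paper describes. The function names `build_positional_index` and `phrase_search` are hypothetical.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map term -> doc ID -> list of word positions within that document."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index

def phrase_search(index, phrase):
    """Return IDs of documents that contain the terms as a contiguous phrase."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return []
    # Candidate documents must contain every term somewhere.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    hits = []
    for doc_id in sorted(candidates):
        starts = index[terms[0]][doc_id]
        # The k-th term of the phrase must appear k positions after the start.
        if any(all(s + k in index[t][doc_id] for k, t in enumerate(terms))
               for s in starts):
            hits.append(doc_id)
    return hits

docs = {
    1: "web search engine design",
    2: "engine for web search",
    3: "search the web engine",
}
pindex = build_positional_index(docs)
print(phrase_search(pindex, "web search"))
print(phrase_search(pindex, "search engine"))
```

Document 3 contains both "web" and "search" but not adjacently, so a plain AND query would match it while the phrase query does not.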

Validations:
- The author draws on his experience developing the Inktomi search engine to propose an improved search engine design.
- The author also studied how other search engines, such as Google, AltaVista, and Infoseek, work.

Assumptions:
- The author makes the following assumptions in the paper:
  - DI applications are essentially like SEs and hence should be no different when it comes to using database principles.
  - When scoring a document, the shorter the document, the higher the score it should be assigned.
  - Updates to the system can always happen offline.
  - Documents from one site are evenly distributed across the cluster nodes for load balancing.
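
Two of these assumptions lend themselves to a short sketch. The scoring function below is a simple length-normalized term frequency, chosen only to show how a shorter document can score higher for the same match; it is not the paper's scoring formula. The placement function hashes the full URL so that pages from one site spread across nodes. The names `score` and `node_for`, and the `example.com` URLs, are hypothetical.

```python
import hashlib
from collections import Counter

def score(term, doc_terms):
    """Length-normalized term frequency (illustrative, not the paper's formula)."""
    if not doc_terms:
        return 0.0
    return doc_terms.count(term) / len(doc_terms)

def node_for(url, num_nodes):
    """Pick a node from a stable hash of the full URL, not of the site name."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes

short_doc = ["search", "engine"]
long_doc = ["search"] + ["filler"] * 9
print(score("search", short_doc), score("search", long_doc))

# 1000 pages from a single (made-up) site, spread over 4 nodes.
urls = [f"http://example.com/page{i}" for i in range(1000)]
load = Counter(node_for(u, 4) for u in urls)
print(sorted(load.items()))
```

Hashing on the site name instead would place an entire site on one node, which is exactly the imbalance the last assumption rules out.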

Conclusion of the paper:
- Data-intensive systems should employ the principles of databases.
- Many systems are a good fit for DBMS principles (though they may not use the same artifacts):
  - Logging systems
  - Google File System
  - Batch-aware distributed file system

Additional information that could be rewritten or added if the paper were written today:
- More emphasis and detail on logging: companies like Google earn their revenue from advertising (on the order of billions of dollars).
- How the following factors should affect the design of an SE:
  - Probability of click attacks
  - Privacy and copyright concerns while crawling the web
  - Generic search vs. search within a particular domain, such as law or image search
- A comparison of the proposed design with a currently popular search engine.

References:
- Sergey Brin and Lawrence Page, “The Anatomy of a Large-Scale Hypertextual Web Search Engine” (1998)

Thanks!!