
Combining Systems and Databases: A Search Engine Retrospective Reviewed By: Rooma Rathore Rohini Prinja Author: Eric A. Brewer.


1 Combining Systems and Databases: A Search Engine Retrospective. Reviewed by: Rooma Rathore, Rohini Prinja. Author: Eric A. Brewer

2 Overview:
- Problem
  - Problem statement
  - Why is the problem important?
  - Why is the problem hard?
- Approaches
  - Contributions of the paper
  - Assumptions
- Validations
- Rewrite

3 Problem Statement:
- Given: current search engines and DBMSs
- Find:
  - An efficient search engine design, including a schema to store data, a query language, and an implementation of the query mechanism
  - Ways to leverage database principles in designing data-intensive (DI) applications without necessarily using the same semantics
- Objectives:
  - Make use of database principles
  - Efficient (fast)
  - Cost-effective
  - Scalable
- Constraints: highly available

4 Why is the problem important?
- Search engines (SEs) are in widespread use.
- The amount of data to be searched is increasing every second, which requires a scalable design.
- A glimpse of real data (2005):
  - Number of documents: 3 billion
  - Data: 10 TB
  - Queries per day: 150 million
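To put the 2005 figures on this slide in perspective, the short sketch below converts them into average rates. It assumes 1 TB = 10^12 bytes and a uniform query load over the day; real traffic peaks well above the average, so these are lower bounds rather than design targets. The function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_qps(queries_per_day: float) -> float:
    """Average queries per second, assuming load is spread evenly over the day."""
    return queries_per_day / SECONDS_PER_DAY


def average_bytes_per_document(total_bytes: float, num_documents: float) -> float:
    """Average stored size of one document."""
    return total_bytes / num_documents


# Figures from the slide: 150 million queries/day, 10 TB, 3 billion documents.
qps = average_qps(150e6)
doc_bytes = average_bytes_per_document(10e12, 3e9)

print(f"average load: about {qps:.0f} queries/second")
print(f"average document size: about {doc_bytes / 1000:.1f} KB")
```

Under these assumptions the average load is roughly 1,700 queries per second, and the average document is a little over 3 KB.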

5 Why is the problem hard?
- As mentioned in [2], very little research has been done in the area of search engines.
- The documents/items to be searched number in the billions.
- "Search items" have changed over time, from plain text to multimedia.

6 Contributions:
- Discusses the challenges of designing an SE:
  - Ranking documents
  - Ranking query results
  - Availability
  - Freshness of data
- Identifies which DBMS principles can (and should) be applied when designing DI applications like search engines:
  - Top-down design
  - Data independence
  - Declarative query language

7 Contributions (contd.):
- Why SEs cannot be implemented as a DBMS in the true sense:

Metric                            | Databases                | Search Engines
Semantics                         | ACID                     | ACID doesn't hold here
Speed                             | Slow                     | Needs to be fast
Cost                              | Not cost-effective       | Amount of data handled is huge
High availability vs. consistency | Consistency is preferred | High availability is preferred
Updates                           | Regular updates          | At-will batch updates

8 Key Concepts:
- Ranking and scoring of documents
  - Word vs. property matching
  - Query: $Q = \{w_1, w_2, w_3, \ldots, w_k\}$
  - $\mathrm{Score}(Q, d) \equiv \mathrm{Quality}(d) + \sum_{i=1}^{k} \mathrm{Score}(w_i, d)$
  - where Quality(d) is the quality of the document, independent of the query words
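The scoring rule on this slide (a query-independent quality term plus one score per query word) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dictionaries `quality` and `word_scores`, the document IDs, and all numeric values are made up for the example, and a word missing from a document is assumed to contribute zero.

```python
from typing import Dict, Iterable, Tuple


def score(
    query_words: Iterable[str],
    doc_id: str,
    quality: Dict[str, float],
    word_scores: Dict[Tuple[str, str], float],
) -> float:
    """Score(Q, d) = Quality(d) + sum over query words w_i of Score(w_i, d)."""
    # Assumption: a (word, document) pair with no entry contributes 0.
    per_word = sum(word_scores.get((w, doc_id), 0.0) for w in query_words)
    return quality.get(doc_id, 0.0) + per_word


# Hypothetical data: per-document quality and per-(word, document) scores.
quality = {"d1": 0.5, "d2": 0.25}
word_scores = {
    ("search", "d1"): 1.0,
    ("engine", "d1"): 0.5,
    ("search", "d2"): 2.0,
}

query = ["search", "engine"]
ranked = sorted(quality, key=lambda d: score(query, d, quality, word_scores), reverse=True)
print(ranked)
```

Because Quality(d) does not depend on the query, it can be computed once at indexing time and reused for every query that touches the document.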

9 Key Concepts: Proposed Design for SEs
- Overview of SE design: crawl, index, serve queries (read-only)
  - Scoring of documents and words
  - Making a query plan
  - Query implementation
    - Access methods and physical operators
    - Query optimizer: map the logical query, exploit caching, minimize the number of joins
    - Query execution (on clusters)
    - Compression and other optimizations
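The index and serve stages on this slide can be illustrated with a toy inverted index, where a multi-word query becomes a join (here, a set intersection) on document ID. This is a sketch under simplifying assumptions: whitespace tokenization, in-memory sets, and AND semantics for all query words. Starting with the smallest posting set is one simple optimization a query optimizer might choose; it is not claimed to be the strategy the paper describes. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_index(documents: Dict[str, str]) -> Dict[str, Set[str]]:
    """Index stage: map each word to the set of document IDs containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return dict(index)


def serve_query(index: Dict[str, Set[str]], query_words: Iterable[str]) -> Set[str]:
    """Serve stage (read-only): return the documents containing every query word."""
    # Intersect the smallest posting sets first to keep intermediate results small.
    postings = sorted((index.get(w.lower(), set()) for w in query_words), key=len)
    if not postings:
        return set()
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
        if not result:  # no document can match; stop early
            break
    return result


# Hypothetical crawled documents.
docs = {
    "d1": "search engine design",
    "d2": "database engine",
    "d3": "search database design",
}
index = build_index(docs)
print(sorted(serve_query(index, ["search", "design"])))
```

Since serving never modifies the index, many queries can read it concurrently without locking.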

10 Key Concepts: Proposed Design for SEs (contd.)
- Updating data
  - Nodes are independent; only whole tables are updated; updates are atomic with respect to queries
  - Updating via crawling and indexing; atomic updates
  - Real-time deletions and updates
  - System-wide updates
- Fault tolerance
  - Goal is high availability
  - Disk faults, follower faults, master faults
  - Graceful degradation and disaster recovery
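One way to picture whole-table updates that stay atomic with respect to queries is to build the replacement table offline and then swap a single reference. The sketch below assumes a single node holding an in-memory table; the class `IndexNode` and its methods are hypothetical names for illustration, not taken from the paper.

```python
import threading
from typing import Dict, Set


class IndexNode:
    """A node that serves read-only queries from one immutable table at a time."""

    def __init__(self, table: Dict[str, Set[str]]) -> None:
        self._table = table
        self._swap_lock = threading.Lock()  # serializes writers only

    def query(self, word: str) -> Set[str]:
        # Take one reference to the current table; the whole query then runs
        # against that snapshot, even if a swap happens in the meantime.
        table = self._table
        return set(table.get(word, set()))

    def replace_table(self, new_table: Dict[str, Set[str]]) -> None:
        # The new table is built elsewhere (for example, by crawl and index);
        # the update becomes visible through a single reference assignment.
        with self._swap_lock:
            self._table = new_table


node = IndexNode({"search": {"d1"}})
before = node.query("search")
node.replace_table({"search": {"d1", "d2"}})
after = node.query("search")
print(sorted(before), sorted(after))
```

A query sees either the old table or the new one in full, never a mixture, which is the property this slide asks for without requiring full ACID transactions.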

11 Key Concepts (contd.):
- Other topics in SEs that differ from DBMSs:
  - Personalization: cookies or database
  - Logging
  - Query rewriting
  - Phrase queries

12 Test the concept:
Q: How does the query optimizer for search engines compare with that of a traditional DBMS?
A:
- Both use an abstract logical query plan.
- SEs use a top-down query optimizer, whereas databases use a bottom-up one.

13 Assumptions:
- The proximity of words is not considered in the overall score for a document. We do not agree with this.
- When scoring a document, the author assumes that the shorter the document, the higher the score it should be assigned. This is not always true.
- Search queries are read-only, which is a valid assumption.
- DI applications are essentially like SEs and hence should be no different when it comes to applying database principles. This might not always be true.

14 Validations:
- The author's experience using the Informix database to build an SE
- The author's experience developing the Inktomi search engine, which informed the improved search engine design
- The workings of various modern search engines such as Google, AltaVista, and Infoseek

15 Conclusion of the paper:
- Data-intensive systems should employ the principles of databases.
- Many systems are a good fit for DBMS principles (though they may not use the same artifacts):
  - Logging systems
  - Google File System
  - Batch-aware distributed file system

16 Revisions if rewritten today:
- More emphasis and detail on logging: companies like Google earn their money from advertising (on the order of billions of dollars)
- How the following factors affect the design of an SE:
  - Click attacks
  - Privacy/copyright concerns while crawling the web
  - Generic search vs. search within a particular domain such as law, image search, or multimedia search
- A comparison of the proposed design with a currently popular search engine

17 References:
- [1] E. A. Brewer, "Combining Systems and Databases: A Search Engine Retrospective," in Readings in Database Systems, J. M. Hellerstein and M. Stonebraker, eds. (2005).
- [2] Sergey Brin and Lawrence Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine" (1998).
- [3] Daniela Florescu, Alon Levy, and Alberto Mendelzon, "Database Techniques for the World-Wide Web: A Survey," SIGMOD Record (1998).
- [4] Charles Frankel, Michael J. Swain, and Vassilis Athitsos, "WebSeer: An Image Search Engine for the World Wide Web," ACM (1997).
- http://en.wikipedia.org/wiki/Searchengin e
- http://searchenginewatch.com/

18 Q’n’A and Thanks!!

