Architecture of the 1st Google Search Engine SEARCHER URL SERVER CRAWLERS STORE SERVER REPOSITORY INDEXER D UMP L EXICON SORTERS ANCHORS URL RESOLVER (CF. [2], FIG. 1) DOC INDEXLEXICON P AGE R ANK LINKS BARRELS
General Web Search Engine Architecture CLIENT QUERY ENGINE RANKING CRAWL CONTROL CRAWLER(S) USAGE FEEDBACK RESULTS QUERIES WWW COLLECTION ANALYSIS MOD. INDEXER MODULE PAGE REPOSITORY INDEXES STRUCTUREUTILITYTEXT (CF. [1] FIG. 1)
References [1] A. ARASU, J. CHO, H. GARCIA-MOLINA, A. PAEPCKE, S. RAGHAVAN: "SEARCHING THE WEB", ACM TRANSACTIONS ON INTERNET TECHNOLOGY, VOL 1/1, AUG Chapter 1 (Introduction, general architecture) [2] S. BRIN, L. PAGE: "THE ANATOMY OF A LARGE-SCALE HYPERTEXTUAL WEB SEARCH ENGINE", WWW 1998 Chapter 1 (Introduction), Chapter 4.1 (Google architecture overview)
Schedule Web Search : - Introduction - Crawling - Page Repository - Indexing - Ranking (PageRank, HITS) - Exercises for web search basics - Advanced / additional web search topics In parallel : - Programming project (Lucene)