Instructor: Engineer Hosseinpour. Presenter: Ehsan Javanmard. Google Architecture.



How Google Works
Google runs on a distributed network of thousands of low-cost computers, which lets it process data in parallel. Parallel processing is a method of computation in which many calculations are performed simultaneously, significantly speeding up data processing.
Google search has three distinct parts:
1. Googlebot, a web crawler that finds and fetches web pages.
2. The indexer, which sorts every word on every page and stores the resulting index of words in a huge database.
3. The query processor, which compares your search query to the index and recommends the documents that it considers most relevant.

1. Googlebot, Google’s Web Crawler
Googlebot is Google’s web-crawling robot. It finds and retrieves pages on the web and hands them off to the Google indexer. It works much like a web browser: it sends a request to a web server for a page, downloads the entire page, and passes it on to the indexer.
Googlebot finds pages in two ways:
1. through the Add URL form
2. by following links it finds while crawling the web
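The link-finding step can be illustrated with a short sketch. It assumes the page has already been downloaded as an HTML string; the class name LinkExtractor, the helper extract_links, and the example URLs are hypothetical and only show one way a crawler might pull links out of a page using Python's standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the fetched page.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return the links found in html, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page content, standing in for a downloaded page.
page = '<p><a href="/about">About</a> <a href="http://example.org/x">X</a></p>'
print(extract_links("http://example.com/index.html", page))
```

A real crawler would also respect robots.txt, handle redirects, and normalize URLs; those concerns are left out here.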

2. Google’s Indexer
Googlebot gives the indexer the full text of the pages it finds. These pages are stored in Google’s index database. The index is sorted alphabetically by search term, and each entry stores a list of the documents in which the term appears and the location within the text where it occurs. This data structure allows rapid access to documents that contain the user’s query terms.
Stop words: Google ignores (does not index) common words called stop words (such as the, is, on, or, of, how, why, as well as certain single digits and single letters), along with some punctuation and multiple spaces.
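The data structure described here is commonly called an inverted index. The following is a minimal sketch, not Google's implementation: the stop-word set is only the sample list from the slide, and the names tokenize, build_index, and the two toy documents are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Sample stop words taken from the slide; a real list would be longer.
STOP_WORDS = {"the", "is", "on", "or", "of", "how", "why"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to {doc_id: [positions]}, skipping stop words.

    Positions count every token, including skipped stop words, so they
    still reflect where the term sits in the original text.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, word in enumerate(tokenize(text)):
            if word in STOP_WORDS:
                continue
            index[word][doc_id].append(position)
    return index


# Hypothetical two-document collection.
docs = {
    "d1": "The crawler fetches the page",
    "d2": "The indexer sorts every word on the page",
}
index = build_index(docs)

# Print the index alphabetically by term, as the slide describes.
for term in sorted(index):
    print(term, dict(index[term]))
```

Looking up a query term is then a single dictionary access, which is what makes retrieval fast compared with scanning every document.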

3. Google’s Query Processor
The query processor has several parts:
1. the user interface (search box)
2. the “engine” that evaluates queries and matches them to relevant documents
3. the results formatter
PageRank is Google’s system for ranking web pages. A page with a higher PageRank is deemed more important and is more likely to be listed above a page with a lower PageRank. Google considers over a hundred factors when ranking pages and determining which documents are most relevant to a query, including:
- the popularity of the page
- the position and size of the search terms within the page
- the proximity of the search terms to one another on the page
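To give a feel for how link popularity can be turned into a score, here is a sketch of the simplified textbook form of PageRank from Brin and Page's published description, computed by repeated iteration. It is not Google's production ranking, which combines many other factors. The function name pagerank, the damping value 0.85, the iteration count, and the four-page link graph are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to. Every link
    target must also appear as a key in links.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: page "c" is linked to by a, b, and d.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, the page with the most incoming links ends up with the highest score, and the page nobody links to ends up with the lowest.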

Googlebot Techniques
Deep crawl: When Googlebot fetches a page, it culls all the links appearing on the page and adds them to a queue for subsequent crawling. Because of their massive scale, deep crawls can reach almost every page on the web. Because the web is vast, this takes time, so some pages may be crawled only once a month.
Fresh crawl: To keep the index current, Google continuously recrawls popular, frequently changing web pages at a rate roughly proportional to how often the pages change.
Combining the two types of crawl lets Google use its resources efficiently while keeping its index reasonably current.
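The queue-based deep crawl can be sketched as a breadth-first traversal. This is an illustrative sketch only: deep_crawl, the fetch_links callback, the max_pages limit, and the in-memory TOY_WEB graph are all hypothetical, and the toy graph stands in for real network fetches.

```python
from collections import deque


def deep_crawl(seed, fetch_links, max_pages=100):
    """Visit pages breadth-first, queueing each newly discovered link once."""
    queue = deque([seed])
    seen = {seed}
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        # Cull the links on the fetched page and queue the unseen ones.
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# Hypothetical four-page web: each key lists the pages it links to.
TOY_WEB = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": ["/"],
    "/c": [],
}

crawl_order = deep_crawl("/", lambda url: TOY_WEB.get(url, []))
print(crawl_order)
```

The seen set is what keeps the crawler from looping forever on pages that link back to each other. A fresh crawl would add a second mechanism on top of this, such as revisiting known URLs on a schedule tied to how often each one changes.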

Deceptive Tactics
Google rejects URLs submitted through its Add URL form if it suspects they are trying to deceive users with tactics such as:
- including hidden text or links on a page
- stuffing a page with irrelevant words (keyword stuffing)
- meta tag stuffing
- cloaking
- using sneaky redirects
- creating doorways, domains, or sub-domains with substantially similar content
- sending automated queries to Google
- linking to bad neighbors
Cloaking: any of several means of serving the search-engine spider a page that differs from the one human users see.
Code swapping: optimizing a page for a top ranking and then swapping another page in its place once that ranking is achieved.

Gateway or Doorway Pages
Doorway pages are web pages designed and built specifically to draw search engine visitors to a website. They are standalone pages designed only to act as doorways to the site.

[Figure: Google’s query diagram]

Results Page
For the sake of efficiency, Google searches only the first 101 kilobytes (approximately 17,000 words) of a web page and the first 120 kilobytes of a PDF file.
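As a small illustration of such a size cap, the sketch below truncates a document to the limits quoted on the slide before it is searched. It assumes one kilobyte means 1,024 bytes; the constant names and the function indexable_prefix are hypothetical.

```python
# Limits quoted on the slide, assuming 1 kilobyte = 1024 bytes.
MAX_HTML_BYTES = 101 * 1024
MAX_PDF_BYTES = 120 * 1024


def indexable_prefix(content: bytes, is_pdf: bool = False) -> bytes:
    """Return only the leading part of a document that would be searched."""
    limit = MAX_PDF_BYTES if is_pdf else MAX_HTML_BYTES
    return content[:limit]


# Hypothetical oversized document of 200,000 bytes.
big_page = b"x" * 200_000
print(len(indexable_prefix(big_page)))
print(len(indexable_prefix(big_page, is_pdf=True)))
```

Text beyond the limit is simply not seen, so under such a cap the words near the top of a long page are the ones that can match a query.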

Cached Pages
Google takes a snapshot of each page it examines and caches (stores) that version as a backup. The cached version is what Google uses to judge whether a page is a good match for your query. The cached copy is useful if the original page is unavailable because of:
- Internet congestion
- a down, overloaded, or just slow website
- the owner having recently removed the page from the web
Note: Since Google’s servers are typically faster than many web servers, you can often access a page’s cached version faster than the page itself.

Cached Pages
Note: Google indexes a page (adds it to its index and caches it) more frequently if the page is popular (has a high PageRank) and is updated regularly. The new cached version replaces any previous cached version of the page.

News Headlines
When Google finds current news relating to your query, it includes up to three headlines, linking to news stories, above your search results.

Thank you for your attention. Faculty of Engineering, University of Birjand, Winter 1387 (Solar Hijri calendar).