Search Engine and SEO Presented by Yanni Li
Various Components of a Search Engine
History
- Meta tag: an HTML element that describes the properties of a web page or website.
- However, it was soon found that search result rankings carry large commercial value. Some webmasters abused meta tags by including irrelevant keywords to artificially increase page impressions for their websites and increase their ad revenue.
What is SEO?
- Search engine optimization (SEO) is the process of improving the volume or quality of traffic to a website from search engines via "natural" (unpaid) search results.
- SEO has developed into a profession.
- Before starting, the first thing to understand is how search engines (SEs) rank websites.
SE Ranks Documents by Scores
- Generally, SEs rank documents by their estimate of how useful a document is for a user query. Most SE systems assign a numeric score to every document and rank documents by this score.
- Different SEs use different scoring mechanisms.
- Google makes heavy use of the structure present in hypertext.
Google (1)
- The simplest case is a single-word query. To rank a document for a single-word query, Google looks at that document's hit list for that word.
- Google considers each hit to be one of several types (title, anchor, URL, plain text large font, plain text small font, ...), each of which has its own type-weight.
Google (2)
- The type-weights make up a vector indexed by type.
- Google counts the number of hits of each type in the hit list, then converts every count into a count-weight.
- Count-weights increase linearly with counts at first but quickly taper off, so that more than a certain count will not help.
- Google takes the dot product of the vector of count-weights with the vector of type-weights to compute an IR score for the document.
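The single-word scoring described above can be sketched in a few lines of Python. This is only an illustration: the type-weight values, the cap of 4, and the names `TYPE_WEIGHTS`, `COUNT_CAP`, `count_weight` and `ir_score` are assumptions made for this example, and the hard cap is a simplification of the "linear at first, then taper off" behaviour. Google's actual values are not given in the slides.

```python
from collections import Counter

# Hypothetical type-weights, one per hit type (illustrative values only).
TYPE_WEIGHTS = {
    "title": 10.0,
    "anchor": 8.0,
    "url": 6.0,
    "large_font": 3.0,
    "small_font": 1.0,
}

# Assumed saturation point: hits beyond this count add nothing.
COUNT_CAP = 4


def count_weight(count, cap=COUNT_CAP):
    """Grow linearly with the count, then stay flat once the cap is reached."""
    return float(min(count, cap))


def ir_score(hit_list):
    """Dot product of the count-weight vector with the type-weight vector.

    hit_list is a list of hit types for one word in one document;
    hit types not listed in TYPE_WEIGHTS are ignored.
    """
    counts = Counter(hit_list)
    return sum(TYPE_WEIGHTS[t] * count_weight(counts[t]) for t in TYPE_WEIGHTS)


# One title hit and two small-font hits: 10*1 + 1*2
print(ir_score(["title", "small_font", "small_font"]))  # 12.0

# Repeating the word 100 times in small text stops helping after the cap: 1*4
print(ir_score(["small_font"] * 100))  # 4.0
```

The cap is what makes simple repetition of a word ineffective: past a certain count, extra hits of the same type no longer raise the score.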
Two Kinds of SEO
- White Hat SEO
-- conforms to the search engines' guidelines and involves no deception
-- creates content for users and search engines
- Black Hat SEO
-- tends to deceive search engines
-- the content a search engine indexes and ranks isn't the same as the content a user will see
Some White Hat SEO Techniques
- Domain selection
-- choose a domain that contains keywords
- Design friendly web pages
-- avoid too much Flash, JavaScript, ...
-- make the site easy and fast to crawl
- Write articles of a suitable length
-- too short: the article won't rank highly
-- too long: the article loses keyword density and ranks low, and users tend to close it at first glance
- Keep each article to one compact theme
-- a long article covering several topics that are only loosely related won't rank very well in search engines
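To make the point about article length concrete, here is a minimal sketch of keyword density, taken here to mean the share of words in a text that are the keyword. The function name `keyword_density`, the tokenization rule, and the sample texts are assumptions for illustration; the slides do not define an exact formula.

```python
import re


def keyword_density(text, keyword):
    """Fraction of the words in text that equal keyword (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


# Made-up sample texts: the long one is the short one plus unrelated padding.
short_article = "seo tips: good seo starts with useful content"
long_article = short_article + " " + "unrelated filler words " * 20

print(keyword_density(short_article, "seo"))  # 0.25
print(keyword_density(long_article, "seo") < keyword_density(short_article, "seo"))  # True
```

Padding an article with unrelated text leaves the number of keyword occurrences unchanged while the word count grows, so the density falls.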
Some Black Hat SEO Techniques
- Doorway pages
-- automatically generate a large number of keyword pages
-- visitors are automatically redirected from these pages to the home page
- Cloaked pages
- Keyword stuffing
- Link spam
-- set up multiple web pages pointing to a target web page to boost the latter's total in-links
-- new web pages are easy to build, so this kind of spam is growing rapidly
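The effect of link spam on a naive in-link count can be sketched as follows. The edge-list representation, the function name `in_link_counts`, and the example hostnames are assumptions for illustration; real link analysis is considerably more involved than counting distinct linking pages.

```python
from collections import defaultdict


def in_link_counts(links):
    """Count distinct pages linking to each target, ignoring self-links.

    links is an iterable of (source_page, target_page) pairs.
    """
    sources = defaultdict(set)
    for source, target in links:
        if source != target:
            sources[target].add(source)
    return {page: len(linking) for page, linking in sources.items()}


# A small hypothetical web: one ordinary link to the target page.
organic = [("a.example", "target.example"), ("b.example", "other.example")]

# A spammer sets up 50 throwaway pages that all point at the target.
spam = [(f"spam{i}.example", "target.example") for i in range(50)]

before = in_link_counts(organic)
after = in_link_counts(organic + spam)
print(before["target.example"], after["target.example"])  # 1 51
```

Because creating the extra pages costs almost nothing, a score based purely on the number of in-links is easy to inflate this way.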
Battle between SE and Spammer
Search Engine      Spammer
Meta Tag           Irrelevant Keywords
Term Frequency     Keyword Stuffing
Link Analysis      Link Spam
...                ...
References
[1] Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, Cambridge.
[2] Sergey Brin, Lawrence Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine.
Thank You!