Courtney Forsmann, IT Help Desk Manager, Lewis-Clark State College
October 1, 2014
Types of Search Engines
- Crawler-Based Engines: Google
- Human-Powered Directories: Open Directory
- Hybrid Search Engines or Mixed Results: MSN/LookSmart – AOL/Google
- Meta Search Engines: Dogpile
Components of a Crawler-Based Search Engine
- Spider Software
- Index Software
- Query Software
Spiders
- Special software robots
- Build lists of the words found on Web sites
- Also known as web crawlers
- Google's web crawler is known as Googlebot
Web Crawling
- Starts by looking at heavily used servers and popular pages
- Indexes the words on those pages and follows every link found within the site
- Every search engine has a different method of indexing
- Some spiders look at an HTML page and note two things:
  - The words within the page
  - Where the words were found
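
The crawl described above can be sketched in a few lines. This is only an illustration, not how any real spider is implemented: the `PAGES` dictionary stands in for the web, and the URLs, the `crawl` function, and the breadth-first visiting order are all assumptions made for the example.

```python
from collections import deque
import re

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "a.example/home": ("search engines crawl the web",
                       ["a.example/about", "b.example/news"]),
    "a.example/about": ("spiders index words", ["a.example/home"]),
    "b.example/news": ("web crawlers follow links", []),
}


def crawl(seeds, pages):
    """Visit each reachable page once, noting every word and where it was found."""
    seen = set(seeds)
    queue = deque(seeds)
    found = {}  # url -> list of (word, position on the page)
    while queue:
        url = queue.popleft()
        if url not in pages:
            continue  # link points outside our toy web
        text, links = pages[url]
        words = re.findall(r"[a-z0-9]+", text.lower())
        found[url] = [(word, pos) for pos, word in enumerate(words)]
        for link in links:
            if link not in seen:  # follow every link, but only once
                seen.add(link)
                queue.append(link)
    return found


if __name__ == "__main__":
    for url, words in crawl(["a.example/home"], PAGES).items():
        print(url, words)
```

Starting from a single popular page is enough to reach the other two pages here, because each one is linked from a page the spider has already visited.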
Meta Tags
- Allow the page's creator to specify the keywords and concepts under which the page will be indexed
- Helpful because they clarify the meaning of words with double meanings
- The danger is that a malicious owner might add meta tags that match very popular topics but have nothing to do with the actual content of the page
- Spiders verify that the meta tags and page content correlate
- Note: Google does not index the keywords meta tag
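
One way a spider might check that meta keywords and page content correlate is sketched below. The rule used here (every word of a keyword must appear somewhere in the page text) is an assumption chosen for simplicity, and `KeywordAndTextParser`, `unsupported_keywords`, and `SAMPLE` are hypothetical names; real search engines do not publish their checks.

```python
from html.parser import HTMLParser
import re


class KeywordAndTextParser(HTMLParser):
    """Collects the keywords meta tag and the set of words in the page text."""

    def __init__(self):
        super().__init__()
        self.keywords = []
        self.words = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "keywords":
            content = attr.get("content") or ""
            self.keywords = [k.strip().lower()
                             for k in content.split(",") if k.strip()]

    def handle_data(self, data):
        # Simplification: all text nodes count as page content.
        self.words.update(re.findall(r"[a-z0-9]+", data.lower()))


def unsupported_keywords(html):
    """Return the meta keywords whose words never appear in the page text."""
    parser = KeywordAndTextParser()
    parser.feed(html)
    parser.close()
    return [k for k in parser.keywords
            if not all(word in parser.words for word in k.split())]


SAMPLE = """<html><head>
<meta name="keywords" content="search engines, celebrity gossip, spiders">
</head><body><h1>How search engines work</h1>
<p>Spiders crawl pages and build an index.</p></body></html>"""

if __name__ == "__main__":
    print(unsupported_keywords(SAMPLE))
```

On the sample page, "celebrity gossip" is the only keyword with no support in the text, which is the kind of mismatch a malicious owner would introduce.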
Index
- Catches everything the spider can throw at it
- Constantly updated as the spider collects more information
- Assigns a weight to each word and increases it if the word is found in a heading, in a meta tag, or numerous times
- Uses an algorithm to rank the page
- Each search engine uses a different algorithm, which is why you get different results on Bing, Google, Yahoo, etc.
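
The weighting idea can be illustrated with a small inverted index. The weight values, the field names, and the `build_index` function below are assumptions made for the sketch; each search engine uses its own, unpublished scheme.

```python
from collections import defaultdict
import re

# Assumed weights: a word counts for more in a heading or meta tag than in body text.
HEADING_WEIGHT = 3.0
META_WEIGHT = 2.0
BODY_WEIGHT = 1.0


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to {url: weight}; repeated words accumulate weight."""
    index = defaultdict(lambda: defaultdict(float))
    fields = (("heading", HEADING_WEIGHT),
              ("meta", META_WEIGHT),
              ("body", BODY_WEIGHT))
    for url, doc in docs.items():
        for field, weight in fields:
            for word in tokenize(doc.get(field, "")):
                index[word][url] += weight
    return index


# Hypothetical pages, already split into fields by the spider.
DOCS = {
    "a.example": {"heading": "Search engines", "meta": "search",
                  "body": "a search engine uses spiders"},
    "b.example": {"heading": "Spiders", "meta": "",
                  "body": "spiders crawl and search the web"},
}

if __name__ == "__main__":
    index = build_index(DOCS)
    print(dict(index["search"]))
    print(dict(index["spiders"]))
```

Here "search" weighs more for a.example because it appears in that page's heading, meta tag, and body, while "spiders" weighs more for b.example.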
PageRank
- Formula created by Google's founders, Larry Page and Sergey Brin
- Rates a web page's importance by looking at how many outside links point to it and how important the linking pages are
- Combines multiple factors to produce each page's overall score
- Find your website's PageRank by downloading Google's toolbar and visiting your website
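
The idea that a page matters more when many important pages link to it can be sketched as a simplified PageRank iteration. This is a textbook-style approximation, not Google's production formula: the damping factor of 0.85, the fixed iteration count, and the `LINKS` graph are assumptions for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over a dict of page -> list of pages it links to.

    Assumes every link target also appears as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its importance evenly among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no links spreads its importance over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: A, B, and D all link to C; C links back to A.
LINKS = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}

if __name__ == "__main__":
    for page, score in sorted(pagerank(LINKS).items()):
        print(page, round(score, 3))
```

C ranks highest because three pages link to it. A ranks above B and D even though it has only one inbound link, because that link comes from the important page C.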
Query
- This is what you see when you go to a search engine
- It doesn't search the web; it checks the records created by its own index software
- Those records are made possible by the raw data the spiders collect
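
The point that a query reads the index rather than the live web can be shown with a lookup against prebuilt records. The `INDEX` contents, the `search` function, and the scoring rule (sum the stored weights of each query word) are assumptions for this sketch.

```python
# Hypothetical records produced earlier by the index software: word -> {url: weight}.
INDEX = {
    "search": {"a.example": 6.0, "b.example": 1.0},
    "spiders": {"a.example": 1.0, "b.example": 4.0},
    "web": {"b.example": 1.0},
}


def search(query, index):
    """Rank pages using only the stored index; no page is fetched at query time."""
    scores = {}
    for word in query.lower().split():
        for url, weight in index.get(word, {}).items():
            scores[url] = scores.get(url, 0.0) + weight
    # Highest score first; ties broken alphabetically by URL.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    print(search("search spiders", INDEX))
    print(search("web spiders", INDEX))
```

A word that the spiders never recorded simply returns no results, which is why a page cannot be found until it has been crawled and indexed.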
How Google Processes a Query
How to get your page recognized
- Google provides free Webmaster Tools
- You must have a Google account to use them
- See how Google crawls and indexes your site
- Find out how Google search queries drive traffic to your site
How to get your page recognized
- Most search engines have a page for submitting websites
- Free, but no guarantee it will be added to Google
- Open Directory: submit your website
- Link to other sites and have them link back to your site
- Facebook and Twitter
How to get your page recognized
- Pay per Click
  - You're charged only if someone clicks your ad, not when your ad is displayed
  - Your ad is displayed on someone else's website, and the owner is paid when someone clicks it
  - Google AdWords
  - Microsoft AdCenter
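
The pay-per-click billing rule comes down to simple arithmetic, sketched below. The `ppc_cost` function and the figures used are hypothetical; actual prices are set by the ad network.

```python
def ppc_cost(impressions, clicks, cents_per_click):
    """Pay per click: displaying the ad is free; only clicks are billed."""
    if clicks > impressions:
        raise ValueError("an ad cannot be clicked more often than it is shown")
    return clicks * cents_per_click


if __name__ == "__main__":
    # Hypothetical campaign: shown 10,000 times, clicked 120 times, 50 cents per click.
    total_cents = ppc_cost(impressions=10_000, clicks=120, cents_per_click=50)
    print(f"${total_cents / 100:.2f}")
```

Doubling the number of times the ad is displayed changes nothing on the bill unless the extra displays also produce clicks.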
Stomp on those Spiders
- Robots meta tag: tells spiders not to index a page
- Sites that redirect before showing content: not only will the page not be indexed, but it could get your site banned
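
A well-behaved spider reads the robots meta tag before indexing a page. The sketch below shows one way that check could look; `RobotsMetaParser`, `may_index`, and the two sample pages are hypothetical, while `noindex`, `nofollow`, and `none` are standard values of the tag's `content` attribute.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from <meta name="robots" content="...">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip())


def may_index(html):
    """Return False if the page asks spiders not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    # "none" is shorthand for "noindex, nofollow".
    return not ({"noindex", "none"} & parser.directives)


BLOCKED = ('<html><head><meta name="robots" content="noindex, nofollow">'
           '</head><body>Private</body></html>')
OPEN = '<html><head><title>Public</title></head><body>Hello</body></html>'

if __name__ == "__main__":
    print(may_index(BLOCKED), may_index(OPEN))
```

The tag is only a request: it keeps cooperative spiders out, but it does not hide the page from anyone who has the URL.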
Questions?