SEO 101 presented by Chris Silver Smith, Lead Strategist, Netconcepts.

© 2008 Chris Silver Smith, Netconcepts

Today’s Agenda
• What is SEO?
• The Seven Steps to Higher Rankings
  – Get Your Site Fully Indexed
  – Get Your Pages Visible
  – Build Links & PageRank
  – Leverage Your PageRank
  – Encourage Clickthrough
  – Track the Right Metrics
  – Avoid Worst Practices

Part 1: What Is SEO?

Everything Revolves Around Search

Search Engine Optimization  6 times more effective than a banner ad  Delivers qualified leads  80% of Internet user sessions begin at the search engines (Source: Internetstats.com)  55% of online purchases are made on sites found through search engine listings (Source: Internetstats.com)

SEO is NOT Paid Advertising
• SEO (“Search Engine Optimization”) seeks to influence rankings in the “natural” (a.k.a. “organic”, a.k.a. “algorithmic”) search results
• PPC: paid search advertising on a pay-per-click basis. The more you pay, the higher your placement. Stop paying = stop receiving traffic.
• SEM encompasses both SEO and PPC

Natural Paid

Google Listings – Your Virtual Sales Force  Savvy retailers making 6-7 figures a month from natural listings  Savvy MFA (Made for AdSense) site owners making 5-6 figures per month  Most sites are not SE-friendly  Google friendliness = friendly to other engines  First calculate your missed opportunities

Not doing SEO? You’re Leaving Money on the Table
• Calculate the missed opportunity cost of not ranking well for the products and services that you offer:
  (# of people searching for your keywords) x (engine share, e.g. Google = 60%) x (expected click-through rate) x (average conversion rate) x (average transaction amount)
• E.g. 10,000/day x 60% x 10% x 5% x $100 = $3,000/day
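
The missed-opportunity formula above can be sketched in a few lines of Python. This is only an illustration using the slide's example figures; the function and parameter names are hypothetical.

```python
def missed_opportunity(searches_per_day, engine_share, click_through_rate,
                       conversion_rate, avg_transaction):
    """Estimated daily revenue lost by not ranking well for a keyword set.

    All rates are fractions (0.60 means 60%). The names are illustrative.
    """
    return (searches_per_day * engine_share * click_through_rate
            * conversion_rate * avg_transaction)


# Figures from the slide: 10,000 searches/day, 60% engine share,
# 10% click-through, 5% conversion, $100 average transaction.
daily = missed_opportunity(10_000, 0.60, 0.10, 0.05, 100.0)
print(f"${daily:,.0f}/day")
```

Plugging in your own search volume and conversion figures gives a rough upper bound to compare against the cost of an SEO project.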

Most Important Search Engines  Google (also powers AOL Search, Netscape.com, iWon, etc.) – 60.29% market share (Source: Hitwise)  Yahoo! (also powers Alltheweb, AltaVista, Hotbot, Lycos, CNN, A9) – 22.58%  Live Search (formerly MSN Search) – 11.56%  Ask – 3.63%

People Are Googling… Are You There?

What Are Searchers Looking For?  Keyword Research –“Target the wrong keywords and all your efforts will be in vain.”  The “right” keywords are… –relevant to your business –popular with searchers

Keyword Research  Tools to check popularity of keyword searches –WordTracker.com –Trellian’s KeywordDiscovery.com –Google’s Keyword Tool –Google Trends –Google Suggest

WordTracker.com  Pros –Based on last 60 days worth of searches –Singular vs plural, misspellings, verb tenses all separated out –Advanced functionality: keyword “projects”, import data into Excel, synonyms, …  Cons –Requires subscription fee ($260/year) –Data is from a small sample of Internet searches (from the minor search engines Dogpile and MetaCrawler) –Contains bogus data from automated searches –No historical archives

Keyword Popularity – According to WordTracker

Trellian’s KeywordDiscovery.com  Pros –Full year of historical archives –Data is from a large sample of Internet searches (9 billion searches compiled from 37 engines) –Singular vs plural, misspellings, verb tenses all separated out –Can segment by country –Advanced functionality: keyword “projects”, import data into Excel, synonyms, …  Cons –Access to the historical data requires subscription fee (~$30/month) –Contains bogus data from automated searches

Keyword Popularity – According to KeywordDiscovery

Google AdWords Keyword Tool  Pros –Free! (Must have an AdWords account though) –Data is from a large sample of Internet searches (from Google) –Singular vs plural, misspellings, verb tenses all separated out –Can segment by country –Synonyms  Cons –No hard numbers  Augment this tool with other free Google tools: –Google Suggest (labs.google.com/suggest) –Google Trends (

Keyword Popularity – According to Google AdWords Keyword Tool

Keyword Popularity – According to Google Trends

Keyword Popularity – According to Google Suggest

Keyword Research
• Competition for that keyword should also be considered
  – Calculate the KEI (Keyword Effectiveness Indicator) score = ratio of the number of searches to the number of pages in the search results
  – The higher the KEI score, the more attractive the keyword is to target (assuming it’s relevant to your business)
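
A minimal sketch of the KEI ratio as defined on this slide (searches divided by pages in the search results). The keyword figures below are made-up sample data, and keyword tools may compute their own KEI variants differently.

```python
def kei(searches, competing_pages):
    """KEI as defined on the slide: searches / pages in the search results."""
    if competing_pages <= 0:
        raise ValueError("competing_pages must be positive")
    return searches / competing_pages


# Hypothetical sample data: keyword -> (monthly searches, result pages).
keywords = {
    "website hosting": (12_000, 4_000_000),
    "cheap website hosting": (900, 150_000),
    "web host reviews": (2_500, 900_000),
}

# Rank keywords from most to least attractive by KEI.
ranked = sorted(keywords, key=lambda k: kei(*keywords[k]), reverse=True)
for keyword in ranked:
    print(f"{keyword}: KEI = {kei(*keywords[keyword]):.5f}")
```

In this sample the less popular phrase scores best because it has far fewer competing pages, which is the trade-off the KEI score is meant to surface.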

SEO. A Moving Target.  A lot is changing… –Personalization & customization –Vertical search services (Images, Video, News, Maps, etc.) –“Blended Search” – aka “Universal Search”  Fortunately, the tried-and-true tactics still work… –Topically relevant links from important sites –Anchor text –Keyword-rich title tags –Keyword-rich content –Internal hierarchical linking structure –The whole is greater than the sum of the parts

The Search Engines Are Your Friend  Sitemaps.org  Webmaster tools (e.g. Google Webmaster Central, Live Search Webmaster Center, Yahoo Site Explorer)  Rel=“nofollow” tag  Meta NOODP tag  Speaking at search engine conferences  Publishing blogs dedicated to helping webmasters  Participating on SEO forums  “Weather reports”

SEO Can Dramatically Improve Traffic/Sales: Non-optimal site with few pages indexed = few visits/sales per day. Optimized site with many pages indexed = many visits/sales per day!

Part 2: Seven Steps to High Rankings

Begin The 7 Steps
1) Get Your Site Fully Indexed
2) Get Your Pages Visible
3) Build Links & PageRank
4) Leverage Your PageRank
5) Encourage Clickthrough
6) Track the Right Metrics
7) Avoid Worst Practices

1) Get Your Site Fully Indexed  Search engines are wary of “dynamic” pages - they fear “spider traps”  Avoid stop characters (?, &, =) ‘cgi-bin’, session IDs and unnecessary variables in your URLs; frames; redirects; pop-ups; navigation in Flash/Java/Javascript/pulldown boxes –If not feasible due to platform constraints, can be easily handled through proxy technology (e.g. GravityStream)  The better your PageRank, the deeper and more often your site will be spidered

Common Barriers to Spidering:
• Lack of links down into all content pages
• Non-textlink sitewide navigation
• Search-form-only
• Javascript/Java-only (i.e. dynamic menus)
• Flash apps / Splash pages
• Non-optimal URL formats of pages
• URL Framed Content

1) Get Your Site Fully Indexed (cont’d)
• Page count estimates are wildly inaccurate, and include non-indexed pages (e.g. ones with no title or snippet)
• Misconfigurations (in robots.txt, in the type of redirects used, requiring cookies, etc.) can kill indexation
• Keep your error pages out of the index by returning a 404 status code
• Keep duplicate pages out of the index by standardizing your URLs, eliminating unnecessary variables, and using 301 redirects when needed
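
One way to picture "standardizing your URLs" is a small function that maps every variant of a page address to a single canonical form, which the server would then 301-redirect to. The sketch below is only an illustration: the host name and the list of throwaway parameters are assumptions, not a definitive list.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical session/tracking parameters that do not change page content.
NOISE_PARAMS = {"sid", "sessionid", "phpsessid",
                "utm_source", "utm_medium", "utm_campaign"}


def canonicalize(url, canonical_host="www.example.com"):
    """Return one standard form of a URL so duplicates collapse together."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    bare = canonical_host[4:] if canonical_host.startswith("www.") else canonical_host
    if host in (bare, "www." + bare):
        host = canonical_host  # pick one of domain.com / www.domain.com
    # Drop noise parameters and sort the rest for a stable order.
    query = sorted((k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
                   if k.lower() not in NOISE_PARAMS)
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(query), ""))


print(canonicalize("http://Example.com/shoes?sid=abc123&color=red"))
print(canonicalize("http://www.example.com/shoes?color=red&utm_source=news"))
```

Both example URLs reduce to the same address, so a request for either variant can be answered with a 301 redirect to that one form.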

Results in “Supplemental Hell”

Example of Non-Optimal Site Design: PR 8 site has META refresh redirect on homepage – spiders not redirected to destination page: Also, indexed “shell” page doesn’t contain any text content!

Not Spider-Friendly  GET --> 302 Moved Temporarily  GET --> 302 Moved Temporarily  GET =http%3A%2F%2Fwww.bananarepublic.com%2Fbrowse%2 Fhome.do&CookieSet=Set --> 302 Moved Temporarily  GET --> 200 OK

2) Get Your Pages Visible  100+ “signals” that influence ranking  “Title tag” is the most important copy on the page  Home page is the most important page of a site  Every page of your site has a “song” (keyword theme)  Incorporate keywords into title tags, hyperlink text, headings (H1 & H2 tags), alt tags, and high up in the page (where they’re given more “weight”)  Eliminate extraneous HTML code  “Meta tags” are not a magic bullet  Have text for navigation, not graphics

Pretty good title

Not so good title – where’s the phrase “credit card”?

Dynamic Site Leaves Session IDs in URLs: Titles not specific to page (“jewelry”) Main text on page just nav labels – Little text/keyword content

Good link text and body copy

No link text or body copy

Take a peek under the hood

The “meta tags”

Unnecessarily bloated HTML

3) Build Links and PageRank
• “Link popularity” affects search engine rankings
• PageRank™: links from “important” sites have more impact on your Google rankings (weighted link popularity)
• Google offers a window into your PageRank
  – PageRank meter in the Google Toolbar (toolbar.google.com)
  – Google Directory (directory.google.com) category pages
  – 3rd party tools like SEOChat.com’s “PageRank Lookup” & “PageRank Search”
• Scores range from 0-10 on a logarithmic scale
• Live Search and Yahoo have similar measures to PageRank™
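
To make "weighted link popularity" concrete, here is a toy version of the textbook PageRank iteration on a four-page site. It is a simplified sketch, not Google's production ranking and not the 0-10 toolbar score; the damping factor of 0.85 and the page names are assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by repeated redistribution of rank along links.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its rank everywhere.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


# Hypothetical site: every page links back to the home page.
site = {
    "home": ["products", "about"],
    "products": ["home"],
    "about": ["home"],
    "blog": ["home"],
}
ranks = pagerank(site)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

The home page ends up with the highest score because every other page links to it, while the blog page, which nothing links to, keeps only the baseline share.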

Google’s Toolbar – with handy PageRank Meter

Google Directory – listings are organized by PageRank

Conduct any Google query and get results organized by PageRank

4) Leverage Your PageRank  Your home page’s PageRank gets distributed to your deep pages by virtue of your hierarchical internal linking structure (e.g. breadcrumb nav)  Pay attention to the text used within the hyperlink (“Google bombing”)  Don’t hoard your PageRank  Don’t link to “bad neighborhoods”

Ideal internal site linking hierarchies: Homepages often will be highest-ranking site pages since they typically have most inbound links. Good link trees inform search engines about which site pages are most important. *Sitemaps can also be used to tell SEs about pages, & to define relative priority.

4) Leverage Your PageRank  Avoid PageRank dilution –Canonicalization ( vs. domain.com) –Duplicate pages: (session IDs, tracking codes, superfluous parameters) –In general, search engines are cautious of dynamic URLs (with ?, &, and = characters) because of “spider traps” Rewrite your URLs (using a server module/plug-in) or use a hosted proxy service (e.g. GravityStream) See

Duplicate pages

Googlebot got caught in a “spider trap”

Search engine spiders turn their noses up at such URLs

Thus, important content doesn’t make it into the search engine indices

5) Encourage Clickthrough
• Zipf’s Law applies: you need to be at the top of page 1 of the search results. It’s an implied endorsement.
• Synergistic effect of being at the top of the natural results & paid results
• Entice the user with a compelling call-to-action and value proposition in your descriptions
• Your title tag is critical
• Snippet gets built automatically, but you CAN influence what’s displayed here

Where do searchers look? (Enquiro, Did-it, Eyetools Study)

Search listings – 1 good, 1 lousy

6) Track the Right Metrics  Indexation: # of pages indexed, % of site indexed, % of product inventory indexed, # of “fresh pages”  Link popularity: # of links, PageRank score (0 - 10)  Rankings: by keyword, “filtered” (penalized) rankings  Keyword popularity: # of searches, competition, KEI (Keyword Effectiveness Indicator) scores  Cost/ROI: sales by keyword & by engine, cost per lead

Indexation tool –

Link popularity tool –

7) Avoid Worst Practices
• Target relevant keywords
• Don’t stuff keywords or replicate pages
• Create deep, useful content
• Don't conceal, manipulate, or over-optimize content
• Links should be relevant (no scheming!)
• Observe copyright/trademark law & Google’s guidelines

Spamming in Its Many Forms…  Hidden or small text  Keyword stuffing  Targeted to obviously irrelevant keywords  Automated submitting, resubmitting, deep submitting  Competitor names in meta tags  Duplicate pages with minimal or no changes  Spamglish  Machine generated content

Spamming in Its Many Forms…  Pagejacking  Doorway pages  Cloaking  Submitting to FFA (“Free For All”) sites & link farms  Buying up expired domains with high PageRanks  Scraping  Splogging (spam blogging)

BMW.de hosted many “doorway pages” like this one

“Sneaky redirect” sent searchers to this page

Not Spam, But Bad for Rankings  Splash pages, content-less home page, Flash intros  Title tags the same across the site  Error pages in the search results (eg “Session expired”)  "Click here" links  Superfluous text like “Welcome to” at beginning of titles  Spreading site across multiple domains (usually for load balancing)  Content too many levels deep

What Next? Conduct an SEO Audit!  Is your site fully indexed?  Are your pages fully optimized?  Could you be acquiring more PageRank?  Are you spending your PageRank wisely?  Are you maximizing your clickthrough rates?  Are you measuring the right things?  Are you applying “best practices” in SEO and avoiding all the “worst practices”?

Review Errors/Messages in Webmaster Tools

Content Opt Opportunities via Webmaster Tools

Check Robots.txt Exclusions in Webmaster Tools

Case Study: Homestead.com  What worked –Comprehensive SEO & usability audit –Intensive on-site training sessions with their IT and marketing teams –6 months of support  What didn’t work –No significant changes to the look of the home page were allowed for political reasons, significantly reducing the options available

Case Study: Homestead.com  Results –Within 8 weeks of launch of some preliminary optimization work, on page 1 for “website hosting” in Google –With our audit as a blueprint, later that year launched an internally built site redesign which landed them the #1 Google position for “website hosting” –Consistently held #1 position for 2 years

In Summary  Focus on the right keywords  Have great keyword-rich content  Build links, and thus your PageRank™  Spend that PageRank™ wisely within your site  Measure the right things  Continually monitor and benchmark

Further Reading
• blogs.cnet.com/seosearchlight
• google.com/support/webmasters/bin/answer.py?answer=35769
• googlewebmastercentral.blogspot.com
• blog.outer-court.com

Q&A! Special White Papers Available: Image Search Optimization Local Search Optimization Tactics New Link Building Paradigms Online Marketing Tips for Universities Tips & Tricks for Local Search Ads

Search Engine Optimization (SEO)

Agenda What is a Search Engine? Examples of popular Search Engines Search Engines statistics Why is Search Engine marketing important? What is a SEO Algorithm? Steps to developing a good SEO strategy Ranking factors Basic tips for optimization

Examples of popular Search Engines

How Do Search Engines Work? Mechanics of a typical search

Results & ads returned ranked

Category of first result

Result for phrase query

How Do Search Engines Work?
• A spider “crawls” the web to find new documents (web pages, other documents), typically by following hyperlinks from websites already in its database
• The search engine indexes the content (text, code) of these documents by adding it to its database, and periodically updates this content
• When a user enters a search, the search engine searches its own database to find related documents (it does not search web pages in real time)
• The search engine ranks the resulting documents using an algorithm (mathematical formula) that assigns various weights to ranking factors
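
The index-then-search steps above can be illustrated with a tiny inverted index. This is a deliberately naive sketch: the three documents are invented, and ranking by raw term counts is an assumption standing in for the far richer ranking formulas real engines use.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to {doc_id: number of occurrences} (an inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index, query):
    """Return ids of documents containing every query term, best match first."""
    terms = query.lower().split()
    if not terms:
        return []
    # Only the index is consulted; the documents themselves are not re-read.
    candidates = set(index.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(index.get(term, {}))
    score = lambda doc_id: sum(index[t][doc_id] for t in terms)
    return sorted(candidates, key=lambda d: (-score(d), d))


# Hypothetical crawled documents.
docs = {
    "a": "computer chess programs and computer chess history",
    "b": "chess openings for beginners",
    "c": "computer hardware reviews",
}
index = build_index(docs)
print(search(index, "computer chess"))
print(search(index, "chess"))
```

The point of the structure is that a query only touches the entries for its own terms, which is why results come back quickly without visiting any web page at query time.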

Search on the Web
• Corpus: the publicly accessible Web: static + dynamic
• Goal: retrieve high quality results relevant to the user’s need (not docs!)
• Need
  – Informational: want to learn about something (e.g. “Low hemoglobin”)
  – Navigational: want to go to that page (e.g. “United Airlines”)
  – Transactional: want to do something (web-mediated)
      Access a service (e.g. “Tampere weather”)
      Downloads (e.g. “Mars surface images”)
      Shop (e.g. “Nikon CoolPix”)
  – Gray areas
      Find a good hub (e.g. “Car rental Finland”)
      Exploratory search, “see what’s there” (e.g. “Abortion morality”)

Search Engines as Info Gatekeepers  Search engines are becoming the primary entry point for discovering web pages.  Ranking of web pages influences which pages users will view.  Exclusion of a site from search engines will cut off the site from its intended audience.  The privacy policy of a search engine is important.

100+ Billion Searches / Month

Search Engine Wars
• The battle for domination of the web search space is heating up!
• The competition is good news for users!
• Crucial: advertising is combined with search results!
• What if one of the search engines manages to dominate the space?

Yahoo!  Synonymous with the dot-com boom, probably the best known brand on the web.  Started off as a web directory service in 1994, acquired leading search engine technology in  Has very strong advertising and e-commerce partners

Lycos!  One of the pioneers of the field  Introduced innovations that inspired the creation of Google

Google  Verb “google” has become synonymous with searching for information on the web.  Has raised the bar on search quality  Has been the most popular search engine in the last few years.  Had a very successful IPO in August  Is innovative and dynamic. Google.com is registered as a domain on September 15, The name—a play on the word "googol," a mathematical term for the number represented by the numeral 1 followed by 100 zeros—reflects Larry and Sergey's mission to organize a seemingly infinite amount of information on the web.

Live Search ( was: MSN Search)  Synonymous with PC software.  Remember its victory in the browser wars with Netscape.  Developed its own search engine technology only recently, officially launched in Feb  May link web search into its next version of Windows.

Important?
• 80% of consumers find your website by first writing a query into a box on a search engine (Google, Yahoo, Bing)
• 90% choose a site listed on the first page
• 85% of all traffic on the internet is referred by search engines
• The top three organic positions receive 59% of user clicks
• Cost-effective advertising
• Clear and measurable ROI
• Operates under this assumption: More (relevant) traffic + Good Conversion Rate = More Sales/Leads

Experiment with query syntax
• Default is AND, e.g. “computer chess” is normally interpreted as “computer AND chess”, i.e. both keywords must be present in all hits.
• “+chess” in a query means the user insists that “chess” be present in all hits.
• “computer OR chess” means that at least one of the keywords must be present in each hit.
• “"computer chess"” (in double quotes) means that the phrase “computer chess” must be present in all hits.
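
The four query forms above can be mimicked with a small matcher. This is a simplified sketch of the behaviour described on the slide, not any particular engine's parser; the sample page texts are invented.

```python
def matches(query, text):
    """Check one page's text against a query: AND, OR, +term, or "phrase"."""
    words = text.lower().split()
    q = query.lower().strip()

    # "computer chess" in double quotes: the words must appear side by side.
    if len(q) >= 2 and q.startswith('"') and q.endswith('"'):
        phrase = q[1:-1].split()
        n = len(phrase)
        return n > 0 and any(words[i:i + n] == phrase
                             for i in range(len(words) - n + 1))

    terms = q.split()
    required = [t[1:] for t in terms if t.startswith("+") and len(t) > 1]
    plain = [t for t in terms if not t.startswith("+") and t != "or"]
    if not all(t in words for t in required):
        return False  # a +term is missing
    if "or" in terms:
        return any(t in words for t in plain) if plain else True
    return all(t in words for t in plain)  # default: AND


page1 = "computer chess is fun"
page2 = "chess on a wooden board"
page3 = "the chess computer won"

print(matches("computer chess", page2))    # AND: page2 lacks "computer"
print(matches("computer OR chess", page2))
print(matches('"computer chess"', page3))  # words present, but not as a phrase
```

Trying the same three queries on a real engine and comparing the hit counts is a quick way to see which of these rules it actually applies.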

The most popular search keywords AltaVista (1998)AlltheWeb (2002)Excite (2001) sexfree appletsex pornodownloadpictures mp3softwarenew chatuknude

Free Keyword Research Tools – &__o=te&ideaRequestType=KEYWORD_IDEAS#search.none – Keyword Tool and Traffic Estimator to identify competitive phrases and search frequencies – – Compare search patterns across specific regions, categories, time frames and properties

Web search Users  Ill-defined queries –Short length –Imprecise terms –Sub-optimal syntax (80% queries without operator) –Low effort in defining queries  Wide variance in –Needs –Expectations –Knowledge –Bandwidth  Specific behavior –85% look over one result screen only –mostly above the fold –78% of queries are not modified 1 query/session –Follow links – “the scent of information”...

How far do people look for results?

Architecture of a Search Engine (diagram: The Web, Web spider, Indexer, Indexes, Ad indexes, Search, User)

Web Crawling

Q: How does a search engine know that all these pages contain the query terms? A: Because all of those pages have been crawled

Crawling picture (diagram: the Web, seed pages, URLs crawled and parsed, URLs frontier, unseen Web)

Motivation for crawlers
• Support universal search engines (Google, Yahoo, MSN/Windows Live, Ask, etc.)
• Vertical (specialized) search engines, e.g. news, shopping, papers, recipes, reviews, etc.
• Business intelligence: keep track of potential competitors, partners
• Monitor Web sites of interest
• Evil: harvest emails for spamming, phishing…
• … Can you think of some others?…

A crawler within a search engine (diagram: Web, googlebot, page repository, text & link analysis, text index, PageRank, ranker, query, hits)

One taxonomy of crawlers
• Many other criteria could be used:
  – Incremental, Interactive, Concurrent, Etc.

Basic crawlers  This is a sequential crawler  Seeds can be any list of starting URLs  Order of page visits is determined by frontier data structure  Stop criterion can be anything

Graph traversal (BFS or DFS?)
• Breadth First Search
  – Implemented with QUEUE (FIFO)
  – Finds pages along shortest paths
  – If we start with “good” pages, this keeps us close; maybe other good stuff…
• Depth First Search
  – Implemented with STACK (LIFO)
  – Wander away (“lost in cyberspace”)
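
A sequential crawler of the kind described here fits in a few lines. The sketch below crawls a made-up in-memory "web" instead of fetching real pages, so the page names and the `get_links` helper are assumptions; the only thing that changes between BFS and DFS is which end of the frontier the next URL is taken from.

```python
from collections import deque


def crawl(seeds, get_links, max_pages=100, strategy="bfs"):
    """Visit pages starting from `seeds`; return them in visiting order."""
    frontier = deque(seeds)
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < max_pages:  # stop criterion
        # BFS takes the oldest URL (queue, FIFO); DFS the newest (stack, LIFO).
        url = frontier.popleft() if strategy == "bfs" else frontier.pop()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                frontier.append(link)
    return visited


# Hypothetical toy web: page -> pages it links to.
TOY_WEB = {
    "home": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
}


def get_links(url):
    return TOY_WEB.get(url, [])


bfs_order = crawl(["home"], get_links, strategy="bfs")
dfs_order = crawl(["home"], get_links, strategy="dfs")
print(bfs_order)  # → ['home', 'a', 'b', 'a1', 'a2', 'b1']
print(dfs_order)  # → ['home', 'b', 'b1', 'a', 'a2', 'a1']
```

BFS finishes everything one click from the seed before going deeper, which is why it tends to stay near "good" starting pages; DFS dives down one branch first.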

Universal crawlers
• Support universal search engines
• Large-scale
• Huge cost (network bandwidth) of crawl is amortized over many queries from users
• Incremental updates to existing index and other data repositories

Large-scale universal crawlers
• Two major issues:
  1. Performance: need to scale up to billions of pages
  2. Policy: need to trade off coverage, freshness, and bias (e.g. toward “important” pages)

Large-scale crawlers: scalability
• Need to minimize overhead of DNS lookups
• Need to optimize utilization of network bandwidth and disk throughput (I/O is bottleneck)
• Use asynchronous sockets
  – Multi-processing or multi-threading do not scale up to billions of pages
  – Non-blocking: hundreds of network connections open simultaneously
  – Polling socket to monitor completion of network transfers
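
The non-blocking idea can be sketched with Python's asyncio event loop, which keeps many transfers in flight on a single thread. To stay self-contained the example does not touch the network: `fetch` is a stand-in that just sleeps, and the URLs and the limit of 100 connections are assumptions.

```python
import asyncio


async def fetch(url, delay=0.01):
    """Stand-in for a non-blocking download (no real network access)."""
    await asyncio.sleep(delay)  # while waiting, other fetches make progress
    return url, f"<html>content of {url}</html>"


async def fetch_all(urls, max_connections=100):
    """Download many pages concurrently, capped at `max_connections`."""
    limit = asyncio.Semaphore(max_connections)

    async def bounded(url):
        async with limit:  # at most max_connections fetches open at once
            return await fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))


urls = [f"http://example.com/page{i}" for i in range(500)]
pages = asyncio.run(fetch_all(urls))
print(len(pages), "pages fetched")
```

All 500 simulated downloads are handled by one thread; a real crawler would replace `fetch` with an asynchronous HTTP client and add per-host politeness limits.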

Universal crawlers: Policy
• Coverage
  – New pages get added all the time
  – Can the crawler find every page?
• Freshness
  – Pages change over time, get removed, etc.
  – How frequently can a crawler revisit?
• Trade-off!
  – Focus on most “important” pages (crawler bias)?
  – “Importance” is subjective

Web coverage by search engine crawlers: This assumes we know the size of the entire Web. Do we? Can you define “the size of the Web”?

Maintaining a “fresh” collection
• Universal crawlers are never “done”
• High variance in rate and amount of page changes
• HTTP headers are notoriously unreliable
  – Last-modified
  – Expires
• Solution
  – Estimate the probability that a previously visited page has changed in the meanwhile
  – Prioritize by this probability estimate
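
One possible way to turn "estimate the probability that a page has changed" into numbers is to assume changes arrive at a steady average rate (a Poisson model). That model is an assumption made for this sketch, as are the sample pages and their change histories; the slide does not prescribe a specific estimator.

```python
import math


def change_probability(changes_seen, days_observed, days_since_visit):
    """P(page changed since last visit), assuming Poisson-distributed changes."""
    rate = changes_seen / days_observed  # average changes per day
    return 1.0 - math.exp(-rate * days_since_visit)


# Hypothetical history: page -> (changes seen, days observed, days since visit)
history = {
    "news-front-page": (30, 30, 1),
    "about-us": (1, 365, 30),
    "product-list": (10, 100, 7),
}

# Revisit the pages most likely to have changed first.
revisit_order = sorted(history,
                       key=lambda page: change_probability(*history[page]),
                       reverse=True)
for page in revisit_order:
    print(f"{page}: {change_probability(*history[page]):.2f}")
```

The frequently changing news page is revisited first even though it was crawled only a day ago, while the rarely edited page can wait.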

Do we need to crawl the entire Web?
• If we cover too much, it will get stale
• There is an abundance of pages in the Web
• For PageRank, pages with very low prestige are largely useless
• What is the goal?
  – General search engines: pages with high prestige
  – News portals: pages that change often
  – Vertical portals: pages on some topic
• What are appropriate priority measures in these cases? Approximations?

Complications
• Web crawling isn’t feasible with one machine
  – All of the above steps distributed
• Malicious pages
  – Spam pages
  – Spider traps, incl. dynamically generated
• Even non-malicious pages pose challenges
  – Latency/bandwidth to remote servers vary
  – Webmasters’ stipulations
      How “deep” should you crawl a site’s URL hierarchy?
  – Site mirrors and duplicate pages
• Politeness: don’t hit a server too often

ROBOTS.TXT: your guide for the search engines

What is robots.txt? It’s a file in the root of your website that can either allow or restrict search engine robots from crawling pages on your website.

How does it work? Before a search engine robot crawls your website, it will first look for your robots.txt file to find out where you want it to go. There are 3 things you should keep in mind:
• Robots can ignore your robots.txt. Malware robots scanning the web for security vulnerabilities, or email address harvesters used by spammers, will not care about your instructions.
• The robots.txt file is public. Anyone can see what areas of your website you don’t want robots to see.
• Search engines can still index (but not crawl) a page you’ve disallowed if it’s linked to from another website. In the search results it will then show only the URL, usually with no title or information snippet. Instead, make use of the robots meta tag for that page.

What to put in your robots.txt file
• User-agent: This is the line where you define which robot you’re talking to. It’s like saying hello to the robot:
    User-agent: * (all robots; or name one, e.g. Googlebot for Google, Slurp for Yahoo)
• Disallow: This tells the robots what you don’t want them to crawl on your site:
    Disallow: / (do not crawl anything on my site)
    Disallow: /images/
• Allow: This tells the robots what you want them to crawl on your site:
    Allow: /

What to put in your robots.txt file
• * (Asterisk / wildcard) With the * symbol, you tell the robots to match any number of any characters. Very useful, for example, when you don’t want your internal search result pages to be indexed.
    Disallow: *contact* (do not crawl any URLs containing the word contact)
• $ (Dollar sign / ends with) The dollar sign tells the robots that it is the end of the URL.
    Disallow: *.pdf$
• # (Hash / comment) You can add comments after the “#” symbol, either at the start of a line or after a directive.
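
How a robot might interpret the * and $ symbols can be sketched by translating each Disallow pattern into a regular expression. This is an illustrative matcher written for this example, with hypothetical helper names; individual crawlers implement their own matching and may differ in details.

```python
import re


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern with * and $ into a regex."""
    anchored = pattern.endswith("$")  # "$" means the URL must end here
    if anchored:
        pattern = pattern[:-1]
    # "*" matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path, disallow_patterns):
    """True if any Disallow pattern matches the start of the URL path."""
    return any(rule_to_regex(p).match(path) for p in disallow_patterns)


rules = ["*contact*", "*.pdf$", "/images/"]
print(is_blocked("/about/contact-us", rules))
print(is_blocked("/docs/guide.pdf", rules))
print(is_blocked("/docs/guide.pdf?download=1", rules))
print(is_blocked("/products.html", rules))
```

Note how the `$` makes the PDF rule stricter: the same file name followed by a query string no longer matches `*.pdf$`.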

What to put in your robots.txt file
• Crawl-delay: This directive asks the robot to wait a certain number of seconds after each page it crawls on your website.
    Crawl-delay: 5
• Request-rate: Here you tell the robot how many pages you want it to crawl within a certain number of seconds. The first number is pages, and the second number is seconds.
    Request-rate: 1/5 # load 1 page per 5 seconds
• Visit-time: It’s like opening hours, i.e. when you want the robots to visit your website. This can be useful if you don’t want the robots to visit your website during busy hours (when you have lots of human visitors).
    Visit-time: # only visit between 21:00 (9PM) and 05:00 (5AM) UTC (GMT)
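
From the crawler's side, these directives can be read with Python's standard `urllib.robotparser` module. The robots.txt content and the crawler name "MyCrawler" below are made up for the example. The standard-library parser does plain prefix matching, so the sketch sticks to simple paths and leaves out Visit-time.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, parsed from a string instead of fetched from a site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Allow: /
Crawl-delay: 5
Request-rate: 1/5   # load 1 page per 5 seconds
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A polite crawler asks before every request.
blocked = parser.can_fetch("MyCrawler", "http://www.example.com/images/logo.png")
allowed = parser.can_fetch("MyCrawler", "http://www.example.com/products.html")
delay = parser.crawl_delay("MyCrawler")
rate = parser.request_rate("MyCrawler")

print(blocked)   # → False
print(allowed)   # → True
print(delay)     # → 5
print(rate.requests, rate.seconds)  # → 1 5
```

In a live crawler you would call `set_url()` and `read()` on the parser to download the site's real robots.txt instead of parsing a string.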

Test your page

SEO: Search engine optimization

What is SEO?
• SEO = Search Engine Optimization
  – Refers to the process of “optimizing” both the on-page and off-page ranking factors in order to achieve high search engine rankings for targeted search terms.
  – Refers to the “industry” that has been created around using keyword searching as a means of increasing relevant traffic to a website

What is a SEO Algorithm?
• Top Secret! Only select employees of a search engine company know for certain
• Reverse engineering, research and experiments give SEOs (search engine optimization professionals) a “pretty good” idea of the major factors and approximate weight assignments
• The SEO algorithm is constantly changed, tweaked & updated
• Websites and documents being searched are also constantly changing
• Varies by search engine: some give more weight to on-page factors, some to link popularity

A good SEO strategy:
• Research desirable keywords and search phrases (WordTracker, Overture, Google AdWords)
• Identify search phrases to target (should be relevant to business/market, obtainable and profitable)
• “Clean” and optimize a website’s HTML code for appropriate keyword density, title tag optimization, internal linking structure, headings and subheadings, etc.
• Help in writing copy to appeal to both search engines and actual website visitors
• Study competitors (competing websites) and search engines
• Implement a quality link building campaign
• Add quality content
• Constant monitoring of rankings for targeted search terms

Ranking factors
• On-Page Factors (Code & Content)
  #1 - Content, Content, Content (Body text)
  #2 - Keyword frequency & density
  #3 - Title tags
  #4 - ALT image tags
  #5 - Header tags
  #6 - Hyperlink text
• Off-Page Factors
  #1 - Anchor text
  #2 - Link Popularity (“votes” for your site): adds credibility

What a Search Engine Sees  View > Source (HTML code)

Pay Per Click
• PPC ads appear as “sponsored listings”
• Companies bid on the price they are willing to pay “per click”
• Typically have very good tracking tools and statistics
• Ability to control ad text
• Can set budgets and spending limits
• Google AdWords and Overture are the two leaders

PPC vs. “Organic” SEO
• Pay-Per-Click
  – results in 1-2 days
  – easier for a novice or one with little knowledge of SEO
  – ability to turn on and off at any moment
  – generally more costly per visitor and per conversion
  – fewer impressions and exposure
  – easier to compete in highly competitive market space (but it will cost you)
  – ability to generate exposure on related sites (AdSense)
  – ability to target “local” markets
  – better for short-term and high-margin campaigns
• “Organic” SEO
  – results take 2 weeks to 4 months
  – requires ongoing learning and experience to achieve results
  – very difficult to control flow of traffic
  – generally more cost-effective, does not penalize for more traffic
  – SERPs are more popular than sponsored ads
  – very difficult to compete in highly competitive market space
  – ability to generate exposure on related websites and directories
  – more difficult to target local markets
  – better for long-term and lower margin campaigns

Keys to Successful SEO Strategy 1. Do not underestimate the importance of keyword research 2. Be sure to include the proper tags in your page coding 3. You must have optimized content! (3-5 uses of keyword per 250 words) 4. Use content marketing
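
The "3-5 uses of keyword per 250 words" guideline is easy to check with a short script. This is a rough sketch with a simple word tokenizer; the function name and the sample copy are assumptions for illustration.

```python
import re


def keyword_uses_per_250_words(text, keyword):
    """Occurrences of `keyword` (one or more words) per 250 words of copy."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = keyword.lower().split()
    n = len(target)
    if not words or n == 0:
        return 0.0
    hits = sum(1 for i in range(len(words) - n + 1)
               if words[i:i + n] == target)
    return hits * 250 / len(words)


# Hypothetical page copy: 250 words containing "credit card" four times.
sample_copy = "credit card " * 4 + "filler " * 242
density = keyword_uses_per_250_words(sample_copy, "credit card")
print(density)  # → 4.0
print(3 <= density <= 5)  # within the suggested range
```

Running it over each page's body copy gives a quick signal of whether the target phrase is missing or overused.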

Keyword Selection
• Marketing/Brand Relevance: How closely does the keyword match your product/service offering, messaging, goals and objectives?
• Search Frequency: How many people are searching on the particular keyword?
• Competition: How much competition (large, authority sites) is there for the particular keyword?
• Optimization Opportunity: Is there already a logical place on the site to optimize for the particular keyword?