Duplicate Content & The Canonical Tag
By Stephan Spencer, President & CEO, Netconcepts
© 2009 Stephan M Spencer Netconcepts

The Canonical Tag
- Influences your sitelinks in Google

Duplicate Content Mitigation
- Is not just about removing competing duplicate pages
- It's about recovering the leaked PageRank too

PageRank Leakage
- Noindexed or disallowed pages (via robots.txt directives or robots meta tags) still accumulate PageRank
- If the page is allowed (via robots.txt) but meta robots noindexed, it also passes PageRank
- Thankfully, when obeyed, the canonical tag aggregates PageRank

Tools for Collapsing Duplicates
- The Canonical Tag
  - Great new addition to the SEO's arsenal, but not your best weapon
  - Works best when used in concert with other signals
- 301 Redirect
  - A much more absolute, automatically obeyed signal
  - Use instead of (or in addition to) the canonical tag

Tools for Collapsing Duplicates
- XML Sitemaps
  - Include only your canonical versions in your feed
  - Used as a canonicalization signal by Google
- Rel=Nofollow
  - On links pointing to the noncanonical versions
  - Nofollowed links aren't even used for discovery by Google
- Meta Robots Nofollow
  - Blocks the flow of PageRank
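To make the XML Sitemaps advice concrete, here is a minimal Python sketch that writes a sitemap containing only canonical URLs, using the sitemaps.org 0.9 namespace. It assumes you have already decided which URL is canonical for each page; `build_sitemap` and the example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(canonical_urls):
    """Return sitemap XML listing only the given canonical URLs, once each."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for url in canonical_urls:
        if url in seen:  # a canonical URL should appear only once
            continue
        seen.add(url)
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes "&" for us
    return ET.tostring(urlset, encoding="unicode")
```

For example, `build_sitemap(["http://www.example.com/gloves/"])` yields a single `<url><loc>` entry; sorted, paginated or click-tracked variants simply never get passed in.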

PageRank Leakage Scenarios
- Robots.txt disallow the dup page = PageRank is leaked to the duplicate, and it can show up in the SERPs
- Meta robots noindex (or robots.txt noindex) the dup = PageRank is leaked, but the dup won't show up in the SERPs
- Rel=nofollow on links to the dup = PageRank can still accumulate through other links, and the dup can still be indexed
- Meta robots nofollow the dup = PageRank that accumulates on the dup cannot be passed on

PageRank Leakage Scenarios
- XML Sitemaps file only includes the canonical version = only used as a hint; dups may still be indexed
- Canonical tag pointing to the canonical version on all dups = only used as a hint; dups may still be indexed
- 301 all dups to the canonical version = removes dups, but may have unintended side effects (e.g. breaking your site's sorting capability)
- Conditional 301 = removes dups, high risk

Canonical Tag Has Serious Limitations
- It doesn't work cross-domain
  - Only within the domain; cross-subdomain is supported, though
  - This is by design, to thwart the element's use by spammers
  - Thus you can't use it to consolidate dup content from typo domains that you own
- It's only a hint, not an absolute directive
  - Google sometimes chooses not to follow it even when it clearly should
  - So it's not nearly as strong a signal as a 301

Canonical Tag Misfires
- NorthernSafety.com
- Wikipedia

An Example in the Wild
- Many thousands of non-canonical northernsafety.com URLs are indexed, despite use of the canonical tag
- For example, click on the listings on m/products/+inurl:protective-clothing and compare those URLs to the canonical URL listed in the link tag in the HTML source of these pages
- Canonical tags have been in place for several months
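The check described above (compare each indexed URL against the canonical link tag in its HTML) can be scripted. This is a sketch using Python's standard-library `html.parser`; it assumes you have already fetched each page's HTML and collected the indexed URLs yourself, and `CanonicalFinder` and `audit_canonical` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonicals.append(attr["href"])


def audit_canonical(page_url, html):
    """Return (canonical_url, is_self_canonical) for one fetched page.

    canonical_url is None when the page carries no canonical tag.
    """
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    if not finder.canonicals:
        return None, False
    canonical = urljoin(page_url, finder.canonicals[0])  # resolve relative hrefs
    return canonical, canonical == page_url
```

Any indexed URL for which `audit_canonical` returns `False` is a non-canonical version that the engine kept despite the tag.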

What To Do?
- So if the Canonical Tag can't (yet) be trusted to work, what should you do in addition or instead?
- Some scenarios to consider:
  - Pagination
  - Faceted navigation
  - Affiliate or click-tracked URLs
  - Near-duplicates
  - Country-specific versions on the same domain
  - Manufacturer-supplied product copy

Pagination
- Excessive pagination dilutes "crawl equity", causing many pages of product listings not to get crawled. Reduce the number of pages in the pagination system to improve crawlability and indexation
- Next/Previous vs. page number list vs. Show All
- Consider disallowing "View All" links and forcing spiders through subcategory pages (the keyword-rich path). Display as many products per page as possible (max 120) within a 150K file size
- Fewer products per subcategory = fewer pagination pages to crawl at the subcategory level, for maximum product indexation
- 1-3 pages of pagination = useful for sending different keyword signals?
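The pagination trade-off is simple arithmetic: a subcategory with N products at P per page has ceil(N / P) listing pages to crawl. A Python sketch, treating "150K" as 150,000 bytes and assuming you measure the per-product and base-template byte costs on your own pages (both parameters are hypothetical):

```python
import math


def pagination_pages(product_count, per_page):
    """Listing pages a spider must crawl to reach every product in a subcategory."""
    if product_count <= 0:
        return 0
    return math.ceil(product_count / per_page)


def max_products_per_page(bytes_per_product, base_page_bytes,
                          size_budget=150_000, hard_cap=120):
    """Largest per-page product count that fits the size budget, capped at 120.

    bytes_per_product and base_page_bytes are measurements you would take
    from your own templates; the defaults encode the slide's 150K / 120 limits.
    """
    fits = (size_budget - base_page_bytes) // bytes_per_product
    return max(1, min(hard_cap, fits))
```

With 500 products, 20 per page means 25 pages to crawl; 120 per page means 5.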

Faceted Navigation
- Faceted navigation, a.k.a. guided navigation, provides clickable breakdowns of the product inventory by brand, color, price range, etc. In doing so it creates a huge number of permutations for the spiders to follow
- The problem is exacerbated by clickable, re-sortable column headings
- Nofollow all links leading to low (SEO) value facets, e.g. facets that do price range breakdown, re-sorting and re-pagination
- Or collapse near-dup facets (canonical tags or revised link URLs)
- Optimize URLs, title tags, etc. of high-value facets in an automated, scalable fashion (e.g. using GravityStream)
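A rough Python sketch of two ideas above: counting how fast facet combinations multiply, and rendering facet links so that low-value facets get `rel="nofollow"`. The facet parameter names (`price`, `sort`, `page`) are hypothetical, and sorting the query parameters is just one way of revising link URLs so each facet combination has a single URL.

```python
import html
from urllib.parse import urlencode

# Hypothetical facet parameter names: price ranges, re-sorting, re-pagination.
LOW_VALUE_FACETS = {"price", "sort", "page"}


def facet_permutations(options_per_facet):
    """Upper bound on URLs for one category: each facet is unset or one option."""
    total = 1
    for n in options_per_facet:
        total *= n + 1
    return total


def facet_link(base_path, params, label):
    """Render an <a> for a facet URL, nofollowing low-SEO-value facets."""
    # Sorted params give one URL per combination, whatever order clicks came in.
    query = urlencode(sorted(params.items()))
    href = f"{base_path}?{query}" if query else base_path
    nofollow = any(name in LOW_VALUE_FACETS for name in params)
    rel = ' rel="nofollow"' if nofollow else ""
    return f'<a href="{html.escape(href)}"{rel}>{html.escape(label)}</a>'
```

Ten brands, eight colors, five price ranges and four sort orders already allow (10+1)(8+1)(5+1)(4+1) = 2,970 URLs for one category page, before pagination.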

Affiliate URLs
- Rarely do they help your SEO, because they use 302 rather than 301 redirects
- Run your affiliate program in-house; use 301s and/or canonical tags. Don't 301 conditionally. The canonical tag isn't necessary if you're doing a 301
- Third-party affiliate solutions (like Commission Junction) have a vested interest in not "playing ball"
  - The canonical tag won't help; PageRank is lost at the 302
- Examples of affiliate networks that pass PageRank to the merchant: LinkConnector, DirectTrack

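Click-Tracked URLs
- Here's how to 301 static URLs with a tracking param appended to their canonical equivalent (minus the param):
    RewriteCond %{QUERY_STRING} ^source=[a-z0-9]*$
    RewriteRule ^(.*)$ /$1? [L,R=301]
- And for dynamic URLs (where source follows at least one other param):
    RewriteCond %{QUERY_STRING} ^(.+)&source=[a-z0-9]+(&?.*)$
    RewriteRule ^(.*)$ /$1?%1%2 [L,R=301]
- The "?" in each substitution replaces the original query string: the first rule leaves it empty, and the second rebuilds it from the params captured before (%1) and after (%2) the source param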
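The same canonicalization can be expressed in application code, e.g. to compute a redirect target or a canonical tag. A Python sketch, assuming the tracking parameter is named `source` as in the rules above; unlike the dynamic-URL rule, it also handles `source` appearing first among several params, and `urlencode` may re-encode the remaining values slightly differently from the original request. `strip_tracking` is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

TRACKING_PARAM = "source"  # the click-tracking param used in the rules above


def strip_tracking(url):
    """Return (canonical_url, needs_301) with the tracking param removed."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if k != TRACKING_PARAM]
    if len(kept) == len(pairs):
        return url, False  # no tracking param: already canonical
    canonical = urlunsplit((parts.scheme, parts.netloc, parts.path,
                            urlencode(kept), parts.fragment))
    return canonical, True
```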

Click-Tracked URLs
- Need to do some fancy stuff with cookies before 301ing? Invoke a script that cookies the user, then 301s them to the canonical URL:
    RewriteCond %{QUERY_STRING} ^source=([a-z0-9]*)$
    RewriteRule ^(.*)$ /cookiefirst.php?source=%1&dest=$1 [L]
- Note the lack of an R=301 flag above. That's on purpose: there's no need to expose this script to the user. Use an internal rewrite and let the script send the 301 after it's done its work.
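Here is what the script side of that hand-off might look like, as a Python WSGI app standing in for the slide's `cookiefirst.php` (a hypothetical port, not the original script). It reads `source` and `dest` from the rewritten query string, sets a cookie, and answers with a 301 to the canonical path; the cookie name and 30-day lifetime are assumptions.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs


def cookie_first_app(environ, start_response):
    """Hypothetical WSGI port of cookiefirst.php: cookie the visitor, then 301.

    The rewrite rule passes ?source=...&dest=..., where dest is the requested
    path without its leading slash and without the original query string.
    """
    query = parse_qs(environ.get("QUERY_STRING", ""))
    source = query.get("source", [""])[0]
    dest = query.get("dest", [""])[0].lstrip("/")  # never emit a "//host" target
    if "\r" in dest or "\n" in dest:
        dest = ""  # refuse anything that could split the response headers

    cookie = SimpleCookie()
    cookie["source"] = source
    cookie["source"]["path"] = "/"
    cookie["source"]["max-age"] = 30 * 24 * 3600  # 30 days: an arbitrary choice

    start_response("301 Moved Permanently", [
        ("Location", "/" + dest),
        ("Set-Cookie", cookie["source"].OutputString()),
        ("Content-Type", "text/plain"),
    ])
    return [b""]
```

Because the redirect target is built from `dest`, the sketch strips leading slashes and rejects line breaks so it can only send visitors to a path on the same site.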

Legacy URLs
- Got legacy dynamic URLs you're trying to phase out after switching to static URLs? 301 them:
    RewriteCond %{QUERY_STRING} id=([0-9]+)
    RewriteRule ^get_product\.php$ /products/%1.html? [L,R=301]
- Switching to keyword URLs, and the script can't do anything with the keywords if they're passed as params? Use RewriteMap with a lookup table stored as a text file:
    RewriteMap prodmap txt:/home/someusername/prodmap.txt
    RewriteRule ^/product/([0-9]+)$ ${prodmap:$1} [L,R=301]
- RewriteMap must be declared in the server or virtual-host config, not in .htaccess; in that context rule patterns see the leading slash, as here
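A Python sketch of the two legacy mappings, useful for checking the expected redirect targets before deploying the rules. The `keyword_map` dictionary stands in for the RewriteMap lookup table, and `legacy_target` is a hypothetical helper; its `id` match is slightly stricter than the RewriteCond, so `pid=7` is not treated as a product ID.

```python
import re

# "id" must be a whole param name: matches "id=42" and "cat=3&id=42", not "pid=42".
LEGACY_ID = re.compile(r"(?:^|&)id=([0-9]+)")
NUMERIC_PRODUCT = re.compile(r"^/product/([0-9]+)$")


def legacy_target(path, query, keyword_map):
    """Return the 301 target for a legacy URL, or None to leave it alone."""
    if path == "/get_product.php":
        m = LEGACY_ID.search(query)
        return f"/products/{m.group(1)}.html" if m else None
    m = NUMERIC_PRODUCT.match(path)
    if m:
        return keyword_map.get(m.group(1))  # None when the ID is not in the table
    return None
```

Returning `None` for an unknown product ID lets the caller serve a 404 instead of redirecting to an empty target.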

Legacy URLs
- What would the lookup table for the above rule look like?
    1001 /products/canon-g10-digital-camera
    1002 /products/128-gig-ipod-classic
- DBM files are supported too, and are faster than a text file
- You could also use a script that takes the requested input and delivers back its corresponding output:
    RewriteMap prodmap prg:/home/someusername/mapscript.pl
    RewriteRule ^/product/([0-9]+)$ ${prodmap:$1} [L,R=301]
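For the `prg:` option, Apache starts the map program once and keeps it running, writing one lookup key per line to its stdin and reading one answer per line from its stdout; the program should reply `NULL` when there is no match and must not buffer its output. The slide's `mapscript.pl` isn't shown, so this is a hypothetical Python equivalent with the table hard-coded from the example above.

```python
import sys

# Hypothetical port of mapscript.pl, using the lookup table shown above.
TABLE = {
    "1001": "/products/canon-g10-digital-camera",
    "1002": "/products/128-gig-ipod-classic",
}


def lookup(key):
    """Answer for one key; Apache's prg: protocol uses NULL for 'no match'."""
    return TABLE.get(key.strip(), "NULL")


def serve(stdin=sys.stdin, stdout=sys.stdout):
    """Answer one lookup per input line until Apache closes the pipe."""
    while True:
        line = stdin.readline()
        if not line:  # EOF: Apache has shut the map program down
            break
        stdout.write(lookup(line) + "\n")
        stdout.flush()  # reply immediately; Apache blocks waiting for each answer


# In the real map script, end with a bare call: serve()
```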

Other Common Issues
- Non-www and typo domains
  - (The example mentioned earlier...)
      RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
      RewriteRule ^(.*)$ [L,R=301]
- HTTPS
  - (If you have a separate secure server, you can skip the first line)
      RewriteCond %{HTTPS} on
      RewriteRule ^catalog/(.*) [L,R=301]

Other Common Issues
- If the trailing slash is missing, add it:
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule ^(.*[^/])$ /$1/ [L,R=301]
  - The RewriteCond skips real files (images, stylesheets) that shouldn't get a slash appended
  - WordPress handles this by default. Yay WordPress!
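The host and trailing-slash fixes from these slides can also be checked in application code. This Python sketch assumes `www.example.com` is the canonical host (the placeholder from the non-www rule) and treats any last path segment containing a dot as a file that should not get a slash, a hypothetical heuristic; the HTTPS case is left out. It combines both fixes into a single 301 rather than two chained redirects.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"  # placeholder host, as in the non-www rule


def canonical_redirect(url):
    """Return the 301 target for url, or None if it is already canonical."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit lowercases the hostname
    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    # Heuristic: a dot in the last segment (style.css, logo.gif) marks a file,
    # which must not get a trailing slash.
    if not path.endswith("/") and "." not in last_segment:
        path += "/"
    if host == CANONICAL_HOST and path == (parts.path or "/"):
        return None
    # One hop that fixes host and slash together; fragments never reach the server.
    return urlunsplit((parts.scheme, CANONICAL_HOST, path, parts.query, ""))
```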

Conditional 301s?
- Risky territory! Read "Redirects: Good, Bad & Conditional"
- To selectively redirect bots that request URLs with session IDs to the URL sans session ID:
    RewriteCond %{QUERY_STRING} PHPSESSID
    RewriteCond %{HTTP_USER_AGENT} Googlebot.* [OR]
    RewriteCond %{HTTP_USER_AGENT} ^msnbot.* [OR]
    RewriteCond %{HTTP_USER_AGENT} Slurp [OR]
    RewriteCond %{HTTP_USER_AGENT} Ask\ Jeeves
    RewriteRule ^(.*)$ /$1? [R=301,L]
- The trailing "?" drops the query string, including PHPSESSID; without it the redirect would carry the session ID along and loop
- browscap.ini provides spiders' user agents

Conditional 301s?
- Not necessary. There's almost always another way (without using user agent or IP)
- In the above example, simply 301 everybody (bots and humans alike) and stop appending PHPSESSID
  - See for more on this.
  - If you have to keep session IDs for functionality reasons, you could use a script to detect whether the session has expired, and 301 the URL to the canonical equivalent if it has.
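A sketch of the expired-session approach in Python, assuming a session store that records each session ID's last-seen time; the `sessions` dictionary, the 30-minute timeout and `session_redirect` itself are hypothetical. Live sessions keep their ID; unknown or expired ones get a 301 to the URL without `PHPSESSID`, for bots and humans alike.

```python
import time
from urllib.parse import urlencode

SESSION_TTL = 30 * 60  # hypothetical: sessions expire after 30 idle minutes


def session_redirect(path, params, sessions, now=None):
    """Return a 301 target if the URL carries a dead PHPSESSID, else None.

    `sessions` maps session ID -> last-seen Unix time and stands in for the
    site's real session store (an assumption).
    """
    sid = params.get("PHPSESSID")
    if sid is None:
        return None  # no session ID in the URL: nothing to canonicalize
    now = time.time() if now is None else now
    last_seen = sessions.get(sid)
    if last_seen is not None and now - last_seen < SESSION_TTL:
        return None  # live session: keep the ID so the site still works
    rest = {k: v for k, v in params.items() if k != "PHPSESSID"}
    query = urlencode(rest)
    return f"{path}?{query}" if query else path
```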

Near Duplicates, But Not Quite?
- What if you can only optimize one version but not all versions? For example:
  - Let's say you have implemented a new URL structure and moved content over to the new URLs. The old URLs still pull up the content too, but the templates are different. The new version has better SEO (title tags are more keyword-rich, there are H1 headings, a couple of sentences of intro copy, etc.), but it's the same product information. According to Matt Cutts, using the canonical tag to canonicalize the non-optimized version to the optimized version is high risk.

Country-specific Versions
- Country-specific versions on the same domain? Create separate "sites" within Google Webmaster Central for each country-specific directory. Then set the Geographical Targeting within each one.
- Google doesn't view country-specific versions as duplicate content; Google's smarter than that.

Manufacturer-Supplied Product Copy
- Distance yourself from the "thin affiliates". Augment with a substantial amount of unique, valuable content
  - Customer reviews: trapped/hidden within JavaScript in third-party review services like BazaarVoice and PowerReviews
  - Not "mashups" with Wikipedia, Twitter, and the usual suspects
- "Uniquify" content. It's not sufficient to shuffle the page's content around! Think about overlapping "shingles"
  - Scaling? Mechanical Turk, yes. Markov chains, no.
- A nail in the coffin: same titles and meta descriptions
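The "shingles" point refers to shingling: breaking text into overlapping runs of k consecutive words and comparing the resulting sets, a standard near-duplicate detection technique. The shingle sizes and thresholds search engines actually use aren't public, so `k=4` here is an arbitrary choice, and the product copy is made up. This Python sketch shows why shuffling sentences isn't enough:

```python
def shingles(text, k=4):
    """Set of overlapping k-word shingles (word n-grams) in text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def resemblance(a, b, k=4):
    """Jaccard similarity of the two shingle sets (1.0 = identical)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


copy = ("The grip glove has a textured nitrile palm. "
        "It is rated for light chemical handling. "
        "Sizes run from small to extra large.")
shuffled = ("Sizes run from small to extra large. "
            "The grip glove has a textured nitrile palm. "
            "It is rated for light chemical handling.")
print(round(resemblance(copy, shuffled), 2))  # 0.73
```

Moving the last sentence to the front only changes the shingles that span sentence boundaries, so 16 of the 22 distinct shingles are still shared.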

Supplemental Hell?
- The Supplemental Index still very much exists, and these dups are probably in there.
- Does Google leave clues about what it considers to be non-canonical / not favored?
  - If only the Supplemental Result label were still supported! *sigh*
  - How about spidering activity? PageRank score? Omitted results? Cached date? Cached link missing?

Related Resources
- navigating-mess
- maximum-seo-impact-12982

Thanks!
- For a free faceted navigation audit, drop me your business card or your request to
- To contact me: