keyword research – corporate training – private coaching Argh! We’ve Been Duped! Dan Thies, SEO Research Labs

A (little) about me
years of SEO…
Once held the #1 ranking on Infoseek for “sex” – for 18 minutes
  – Make up your own joke
Published “SEO Fast Start” in 2001
Started SEO Research Labs in Jan
Author, SitePoint Search Engine Marketing Kit

Topics For Today
Getting Duped vs. Duping Yourself
Impacts on Traffic
Reverse Cloaking & Spider Validation
Changing & Rotating Content
DMCA & Dupes
Challenges to search engines

Defining The Problem
Duplicate Content
  – The same content, presented on more than one URL
  – Most web sites do this to an extent vs. vs.
Near-Duplicate
  – “Nearly the same…”
  – Search engines look for uniqueness
Filtered from index vs. filtered from SERPs

Getting Duped vs. Duping Yourself
Duping Yourself – See Other Sessions
  – Duplicate URLs
  – Shopping sites w/ duplicate product descriptions
  – Near-empty pages
Getting Duped – You Are Here
  – Screen scrapers & “borrowing”
  – RSS Feeds (or did you do it to yourself?)
  – Proxy URLs

Impacts on Traffic
Specific site: (omitted…)
Duped: 10-15% of traffic is organic search
De-Duped: 20-25% of traffic is organic search
Revenue drop… “feelable.”
This client is very good at PPC and other marketing; many sites would suffer far worse from a 50% drop in SEO referrals.
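
The link between the two traffic shares and the "50% drop" figure can be checked with a little arithmetic. This is only a sketch: it assumes non-search traffic stayed constant between the two periods and uses the midpoints of the quoted ranges, neither of which is stated on the slide. The function name is illustrative.

```python
def implied_organic_drop(share_before: float, share_after: float) -> float:
    """Fractional drop in organic visits implied by two organic traffic shares.

    Assumption (not from the slide): non-organic traffic is the same in both
    periods, so organic visits per non-organic visit is share / (1 - share).
    """
    before = share_before / (1.0 - share_before)
    after = share_after / (1.0 - share_after)
    return 1.0 - after / before


# Midpoints of the quoted ranges: 20-25% de-duped, 10-15% duped.
midpoint_drop = implied_organic_drop(0.225, 0.125)
print(f"Implied drop in organic visits: {midpoint_drop:.0%}")
```

Under those assumptions the ends of the ranges give roughly 47% to 56%, which is in line with the 50% figure quoted above.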

Reverse Cloaking vs. Scrapers
Simple user agent detection: if the user-agent is NOT a major SE spider, insert:
  – Screen scrapers that steal an entire page’s HTML get a page that will not be indexed.
  – Easily thwarted by someone who cares to, but substantially reduces duplication by scraping
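
A minimal sketch of the user agent check described above. The inserted markup did not survive on the slide, so the robots noindex meta tag used here is an assumption based on the "will not be indexed" bullet. The spider token list and function names are likewise illustrative.

```python
# Assumed list of substrings that identify the major spiders' user agents.
SPIDER_TOKENS = ("googlebot", "msnbot", "slurp", "ask jeeves")

# Assumed tag: the slide's original snippet is missing.
NOINDEX_TAG = '<meta name="robots" content="noindex">'


def is_spider_user_agent(user_agent: str) -> bool:
    """True if the user agent string claims to be a major search engine spider."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in SPIDER_TOKENS)


def render_page(html: str, user_agent: str) -> str:
    """Return the page untouched for spiders, with a noindex tag for everyone else."""
    if is_spider_user_agent(user_agent):
        return html
    position = html.lower().find("<head>")
    if position == -1:
        return html  # no plain <head> tag found; leave the page unchanged
    insert_at = position + len("<head>")
    return html[:insert_at] + NOINDEX_TAG + html[insert_at:]
```

Human visitors never notice the tag, but a scraper that copies the full HTML republishes a page that asks not to be indexed. As the slide says, anyone who cares to can strip the tag or fake the user agent.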

Links By Proxy – An Old Trick
Fun With Spam: hack someone else’s site to create a link or redirect to one of your sites (either create a page or craft a URL using an XSS attack), then link to it using a proxy URL. Woo-hoo!

Public Proxies

Proxy URLs As Duplicates
Thousands of public anonymous proxy servers
Every URL on the web can be duplicated by them
Proxy-based duplicates, when linked to, can affect duplicate content filtering
  – Search engine spiders access proxy URLs too!
Public proxies pass along the user-agent
  – IE version of site vs. Mozilla vs. Opera etc.
  – Googlebot, MSNBot, Slurp, Ask…
But proxies use their own IP address
  – Check logs – do any “Googlebot” IPs resolve to proxies (e.g. webwarper.net)?
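
The log check in the last bullet could be sketched roughly as follows. It assumes combined-format access logs (client IP first, quoted user agent last) and treats any "Googlebot" hit whose reverse DNS name does not end in googlebot.com or google.com as suspicious. The regular expression, suffix list, and function names are assumptions for illustration.

```python
import re
import socket
from typing import Callable, Iterable, List, Optional, Tuple

# Leading client IP and trailing quoted user agent of a combined-format log line.
LOG_LINE = re.compile(r'^(?P<ip>\S+) .*"(?P<agent>[^"]*)"\s*$')

# Assumed hostname suffixes for genuine Googlebot addresses.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_dns(ip: str) -> Optional[str]:
    """Reverse-resolve an IP address, returning None when the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def suspicious_googlebot_hits(
    lines: Iterable[str],
    resolve: Callable[[str], Optional[str]] = reverse_dns,
) -> List[Tuple[str, Optional[str]]]:
    """List (ip, hostname) pairs that claim to be Googlebot but resolve elsewhere."""
    resolved = {}
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or "googlebot" not in match.group("agent").lower():
            continue
        ip = match.group("ip")
        if ip not in resolved:  # resolve each address only once
            resolved[ip] = resolve(ip)
    return [
        (ip, host)
        for ip, host in resolved.items()
        if host is None or not host.lower().endswith(TRUSTED_SUFFIXES)
    ]
```

Passing the resolver in as a parameter makes it easy to swap in a different lookup, and to run the check against saved logs without touching the network.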

Spider Validation vs. Proxies
When you get a request from a “search engine spider” user agent, check the requesting IP:
  – If the IP address is “owned” by the search engine, deliver the page
  – If the IP address is not owned by the search engine, deliver a different page, empty page, or 403 Forbidden
  – NSLookup is less reliable than checking ARIN’s WHOIS database
  – Store lists of good vs. bad IPs, to speed processing
Yes, it’s really the SE’s bot, but it is coming to a proxy URL
  – So, you MUST block the request to avoid duplication
  – Warning: Danger – Danger – Danger! Use With Caution!
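
The validation steps above might look something like this sketch. It assumes you have already collected the search engines' address ranges (for example from WHOIS lookups) as CIDR blocks. The class name and the ranges in the usage example are placeholders from the documentation address space, not real search engine networks.

```python
import ipaddress
from typing import Dict, Iterable


class SpiderValidator:
    """Decide whether a request claiming to be a spider comes from an owned IP."""

    def __init__(self, spider_tokens: Iterable[str], owned_networks: Iterable[str]):
        self.spider_tokens = tuple(token.lower() for token in spider_tokens)
        self.networks = [ipaddress.ip_network(net) for net in owned_networks]
        self.cache: Dict[str, bool] = {}  # stored good (True) vs. bad (False) IPs

    def claims_spider(self, user_agent: str) -> bool:
        ua = (user_agent or "").lower()
        return any(token in ua for token in self.spider_tokens)

    def ip_is_owned(self, ip: str) -> bool:
        if ip not in self.cache:
            address = ipaddress.ip_address(ip)
            self.cache[ip] = any(address in net for net in self.networks)
        return self.cache[ip]

    def status_for(self, user_agent: str, ip: str) -> int:
        """HTTP status to send: 200 to deliver the page, 403 to block."""
        if not self.claims_spider(user_agent):
            return 200  # ordinary visitor, no validation needed
        return 200 if self.ip_is_owned(ip) else 403


# Usage with placeholder ranges (hypothetical, not real spider networks).
validator = SpiderValidator(["googlebot"], ["192.0.2.0/24"])
print(validator.status_for("Mozilla/5.0 (compatible; Googlebot/2.1)", "198.51.100.9"))
```

An incomplete or stale range list will send 403 to the real spider, which is why the slide's warning applies: test carefully before deploying anything like this.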

But What If They Get Through?
Changing & Rotating Content
  – Testimonials
  – News & Headlines
  – Brute Force
The most important page on your site is probably the home page, yet it’s probably the least often changed.
How much is unique? How often to change?
If the page changes every 24 hours, a proxy can only duplicate you for 24 hours + indexing lead time
Our client is changing one paragraph of copy every 4 hours – 42 variations per week.
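
The 4-hour rotation works out to 7 × 24 / 4 = 42 slots per week. One way to pick the current variation is to derive a slot number from the clock, as in this sketch. The function names are illustrative, and the client's actual implementation is not described on the slide.

```python
from datetime import datetime, timezone
from typing import Sequence


def variation_index(now: datetime, hours_per_slot: int = 4, variations: int = 42) -> int:
    """Index of the variation to show at `now` (must be a timezone-aware datetime)."""
    slot = int(now.timestamp()) // (hours_per_slot * 3600)
    return slot % variations


def pick_paragraph(paragraphs: Sequence[str], now: datetime, hours_per_slot: int = 4) -> str:
    """Choose the paragraph for the current slot, cycling through the whole list."""
    return paragraphs[variation_index(now, hours_per_slot, len(paragraphs))]


# Example: the index that would be served right now with 42 variations.
current = variation_index(datetime.now(timezone.utc))
```

Because the index depends only on the time, every server in a cluster shows the same paragraph without any shared state.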

Monitoring Dupes
Set up monitoring for a “signature SERP”
  – Text that is unique to your page or pages
  – Home page duplication is the #1 issue
  – Use a second signature for internal pages
Google Alerts –
Roll your own with the Google API – –or
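
Whatever tool produces the results for your signature search, the follow-up step is the same: flag every result URL that is not on one of your own hosts. A minimal sketch, assuming you already have the result URLs as a list (how you fetch them is left open here). The function name and example hosts are hypothetical.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def find_dupes(result_urls: Iterable[str], own_hosts: Iterable[str]) -> List[str]:
    """Return the result URLs whose host is not one of your own hostnames."""
    own = {host.lower() for host in own_hosts}
    dupes = []
    for url in result_urls:
        host = (urlsplit(url).hostname or "").lower()
        if host and host not in own:
            dupes.append(url)
    return dupes


# Example with made-up URLs: the second result is a proxy copy of the home page.
results = [
    "http://www.example.com/",
    "http://proxy.example.net/browse?u=http://www.example.com/",
]
print(find_dupes(results, ["www.example.com", "example.com"]))
```

Running one check with the home page signature and another with the internal page signature keeps the two kinds of duplication separate.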

Killing Dupes w/ DMCA
DMCA: Digital Millennium Copyright Act
I am NOT an attorney, lawyer, barrister, solicitor, etc. and this is NOT legal advice
Ian McAnerin’s templates: – –Or Google McAnerin DMCA
To Hosting Provider (ISP) to remove sites/pages
To search engines to remove from index

Challenging The Search Engines
Duplication by proxy, by theft, etc. is a major issue for webmasters – a drain on resources, and a pain in the…
Like search engine spam, much of it is paid for by search engines through contextual ad networks & PPC
Identify the originals – is the page in DMOZ? Is it in the Y! Directory? It just might be the original!
How many DMCA notices can a search engine afford to process?
Why are any URLs from known proxies still indexed after all these years?

Contact Information
Dan Thies,
Free Training Videos:
Free Tools: