Competitive Counter-Intelligence: Stop Snooping Competitors
Techniques for protecting your SEO investment from prying competitive eyes
Bill Atchison, Chief Web Arachnologist, CrawlWall.com
“The ‘Bot Stops Here!”
© 2008 CrawlWall.com

Intelligence Gathering Motives
Why let competitors use your hard work and paid research as low-hanging fruit to launch their business?
–Competitors want the path of least resistance to encroach on your online business
–SEOs want to make easy money helping competitors rank by leveraging your information
–SEO tool vendors want to sell your data for a profit and help those competing against you

Who else gathers intelligence?
Competitive Shopping Sites
Intelligence gathering Spybots
–Copyright Compliance
–Branding Compliance
–Corporate Security Monitoring
–Media Monitoring (mp3, mpeg, etc.)
–Myriad of Safe-Site Monitoring solutions
Data Aggregators
Lawyers!
And many more…

How is Information Collected?
Various places used to gather data:
–Google, Yahoo! and MSN’s cache pages
–The Internet Archive
–Whois, Domaintools, etc.
–SEO tools that directly crawl your site

NOARCHIVE Bans Cache
Eliminate the search engine cache to stop covert researchers from gathering data on your meta tags, internal anchor text and outbound links
Insert this meta tag in all your web pages:
    <meta name="robots" content="noarchive">
Visit http://www.noarchive.net for more information

Ban the Internet Archive
The Internet Archive (Archive.org) is used to covertly gather both historical and recent site data
Block Archive.org in robots.txt:
    User-agent: ia_archiver
    Disallow: /
Also block it in .htaccess, as it still tends to crawl and the results show up in Alexa cache pages:
    # needed once if mod_rewrite is not already switched on in this file
    RewriteEngine On
    # refuse (403) any user agent that starts with ia_archiver
    RewriteCond %{HTTP_USER_AGENT} ^ia_archiver
    RewriteRule ^.* - [F]
Lawyers love Archive.org!

Private Whois Registration
Remove clues about you, your administrative and technical contacts, and how many domains you have.
WHOIS data is easily hidden using proxy domain registrations.
Sample WHOIS using DomainsByProxy:
    MYDOMAINNAME.COM
    Domains by Proxy, Inc.
    DomainsByProxy.com
    xxxx N. Rd., Ste xxx, PMB xxx
    Scottsdale, Arizona xxxxx
    United States
Another upside: no public WHOIS email addresses for spammers to harvest.

Whitelist Robots.txt
Use robots.txt to tell well-behaved ‘bots whether they’re allowed to crawl or not
Sample robots.txt file:
    # allow bots we like
    User-agent: Googlebot
    User-agent: Slurp
    User-agent: Msnbot
    Disallow:

    # all other bots banned
    User-agent: *
    Disallow: /
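One way to sanity-check a whitelist robots.txt before deploying it is to run it through a parser and ask which crawlers it admits. The sketch below uses Python's standard `urllib.robotparser` on the sample file from this slide; the helper name `is_allowed` and the `www.example.com` URL are illustrative assumptions, not part of the original deck. It only models what a well-behaved crawler would conclude, since robots.txt cannot enforce anything.

```python
from urllib.robotparser import RobotFileParser

# The sample whitelist from the slide, one directive per line.
ROBOTS_TXT = """\
# allow bots we like
User-agent: Googlebot
User-agent: Slurp
User-agent: Msnbot
Disallow:

# all other bots banned
User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str = "http://www.example.com/page.html") -> bool:
    """Return True if a crawler honoring ROBOTS_TXT may fetch `url`.

    `user_agent` is the crawler's short robots.txt token (e.g. "Googlebot"),
    not a full browser-style User-Agent header.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for bot in ("Googlebot", "Slurp", "msnbot", "ia_archiver", "SomeRandomBot"):
        print(bot, "allowed" if is_allowed(bot) else "banned")
```

The three whitelisted tokens fall under the empty `Disallow:` group, and everything else falls through to the `User-agent: *` group and is banned.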

Whitelist .htaccess
Badly behaved crawlers that won’t honor robots.txt get stopped at the server with .htaccess.
Sample .htaccess code:
    #allow just search engines we like, we're OPT-IN only
    BrowserMatchNoCase Google good_pass
    BrowserMatchNoCase Slurp good_pass
    BrowserMatchNoCase msnbot good_pass
    BrowserMatchNoCase Teoma good_pass
    BrowserMatchNoCase Jeeves good_pass
    #allow Firefox, MSIE, Opera etc.
    BrowserMatchNoCase ^Mozilla good_pass
    BrowserMatchNoCase ^Opera good_pass
    order deny,allow
    deny from all
    allow from env=good_pass
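Before turning on an opt-in whitelist like this, it can help to dry-run the patterns against User-Agent strings taken from your own logs. The following is a minimal Python sketch of the same matching logic (case-insensitive regular expressions, pass if any one matches); it is an approximation for testing, not Apache itself, and the function name `good_pass` is simply borrowed from the environment variable in the slide.

```python
import re

# Same patterns as the BrowserMatchNoCase lines in the slide.
ALLOWED_PATTERNS = [
    r"Google", r"Slurp", r"msnbot", r"Teoma", r"Jeeves",  # search engines we like
    r"^Mozilla", r"^Opera",                                # Firefox, MSIE, Opera etc.
]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in ALLOWED_PATTERNS]


def good_pass(user_agent: str) -> bool:
    """Return True if the User-Agent would be let through by the whitelist."""
    return any(p.search(user_agent) for p in _COMPILED)


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Opera/9.50 (Windows NT 5.1; U; en)",
        "ia_archiver",
        "libwww-perl/5.805",
        "",
    ]
    for ua in samples:
        print(repr(ua), "->", "pass" if good_pass(ua) else "blocked")
```

Note that any crawler whose User-Agent merely starts with `Mozilla` passes this check, which is why a whitelist alone does not stop crawlers that pose as browsers.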

Verify Spider Identity
Make sure the search engines are who they claim to be by using full-trip reverse DNS checking, which defeats user agent spoofing:
    IP 66.249.73.58 -> crawl-66-249-73-58.googlebot.com -> IP 66.249.73.58
Sample .htaccess code:
    # flag every request, then clear the flag for anything claiming to be a search engine
    # (SetEnvIf patterns cannot be negated with "!", so the flag is removed instead)
    SetEnvIf User-Agent "^" notRDNSbot
    SetEnvIfNoCase User-Agent "(Googlebot|msnbot|Teoma)" !notRDNSbot
    Order Deny,Allow
    Deny from all
    Allow from env=notRDNSbot
    Allow from googlebot.com
    Allow from search.live.com
    Allow from ask.com
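The full-trip check can also be done in application code, for example inside a log analyzer or a ‘bot blocker script. Below is a minimal Python sketch using only the standard `socket` module: look up the host name for the IP, confirm it ends in a trusted crawler domain, then resolve that name forward and confirm it maps back to the same IP. The function name `verify_spider`, the injectable `reverse`/`forward` parameters, and the suffix list (taken from the domains in the slide) are assumptions made for illustration, and the sketch handles IPv4 only.

```python
import socket
from typing import Callable, List

# Crawler domains from the slide; the leading dot prevents look-alike
# hosts such as "evilgooglebot.com" from matching.
TRUSTED_SUFFIXES = (".googlebot.com", ".search.live.com", ".ask.com")


def _reverse(ip: str) -> str:
    """Reverse DNS: IP address -> primary host name."""
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> List[str]:
    """Forward DNS: host name -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def verify_spider(
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], List[str]] = _forward,
) -> bool:
    """Return True only if `ip` passes the full-trip reverse DNS check."""
    try:
        host = reverse(ip).rstrip(".").lower()
        if not host.endswith(TRUSTED_SUFFIXES):
            return False  # reverse name is not in a trusted crawler domain
        return ip in forward(host)  # the name must resolve back to the same IP
    except OSError:
        # socket.herror and socket.gaierror are OSError subclasses:
        # treat any lookup failure as "not verified"
        return False
```

Because the resolvers are passed in as parameters, the logic can be exercised with canned lookups and no network access; in production the defaults perform live DNS queries, so results are worth caching per IP.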

‘Bot Blockers for Everything Else
Some crawlers gathering competitive information don’t want to get caught, so they pretend to be human browsers
Tools such as robots.txt and .htaccess can’t stop those that don’t want to be stopped
Complete your arsenal with a ‘bot blocker script specifically designed to stop the unstoppable crawlers

Summary
Remove Competitive Vulnerabilities:
–Eliminate Search Engine Cache Pages
–OPT-OUT of Archive.org
–OPT-IN only allowed spiders
–‘Bot blocker scripts to catch hidden threats
Get Better Results:
–Tighter controls on copyrighted content
–Improved search engine ranking after thwarting unwanted competition
–Better server performance for visitors and legit search engine crawls

Resources
Visit the following sites and forums for more details:
–Robots.txt Forum: http://www.webmasterworld.com/robots_txt/
–Apache Web Server Forum: http://www.webmasterworld.com/apache/
–Search Engine Spider Identification Forum: http://www.webmasterworld.com/search_engine_spiders/
–The NoArchive Initiative: http://www.noarchive.net/
–The Web Robots Pages: http://www.robotstxt.org/
