Cloak and Dagger: Dynamics of Web Search Cloaking David Y. Wang, Stefan Savage, and Geoffrey M. Voelker University of California, San Diego 左昌國 Seminar.


Cloak and Dagger: Dynamics of Web Search Cloaking
David Y. Wang, Stefan Savage, and Geoffrey M. Voelker
University of California, San Diego
左昌國, ADLab, NCU-CSIE
18th ACM Conference on Computer and Communications Security (CCS 2011)

Outline
- Introduction
- Methodology
- Results
- Related Work
- Conclusion

Introduction
- Search Engine Optimization (SEO)
  - “Search engine optimization (SEO) is the process of improving the visibility of a website or a web page in search engines via the "natural" or un-paid ("organic" or "algorithmic") search results.” (Wikipedia)
  - SEO can also be applied benignly
- Cloaking
  - Dates back to at least 1999
  - One of the most notorious black-hat SEO techniques
  - Delivers different content to different user segments, i.e., search engine crawlers and normal users

Introduction (figure: Normal User vs. Search Engine Crawler)

Introduction: Types of Cloaking
- Repeat cloaking: tracks visitors with cookies or IP addresses
- User-Agent cloaking: keys on the User-Agent field in the HTTP request header
- Referrer cloaking: keys on the Referer field in the HTTP request header
- IP cloaking: keys on the client's IP address
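To make the four cloaking types concrete, here is a toy sketch of the decision a cloaking server might make for each request. Everything in it is an illustrative assumption, not taken from the paper: the crawler User-Agent tokens, the search-engine referrer hosts, the single example Googlebot address range, the repeat-visit policy, and the two page labels ("seo" for the keyword-rich page meant for crawlers, "scam" for the page meant for victims). A toy like this is mainly useful as a test target when building a detector.

```python
import ipaddress
from typing import Mapping

# Illustrative signal lists (assumptions, not from the paper).
CRAWLER_UA_TOKENS = ("googlebot", "bingbot", "slurp")
SEARCH_REFERRER_HOSTS = ("google.", "bing.", "yahoo.")
CRAWLER_NETWORKS = (ipaddress.ip_network("66.249.64.0/19"),)  # example crawler range


def choose_view(headers: Mapping[str, str], client_ip: str, seen_before: bool) -> str:
    """Return "seo" (page for crawlers/others) or "scam" (page for search-referred victims)."""
    ua = headers.get("User-Agent", "").lower()
    referer = headers.get("Referer", "").lower()
    addr = ipaddress.ip_address(client_ip)

    # IP cloaking: requests from known crawler address space get the SEO page.
    if any(addr in net for net in CRAWLER_NETWORKS):
        return "seo"
    # User-Agent cloaking: self-identified crawlers get the SEO page.
    if any(token in ua for token in CRAWLER_UA_TOKENS):
        return "seo"
    # Referrer cloaking: only visitors arriving from a search results page see the scam.
    if not any(host in referer for host in SEARCH_REFERRER_HOSTS):
        return "seo"
    # Repeat cloaking (one possible policy): a tracked repeat visitor is not shown the scam again.
    if seen_before:
        return "seo"
    return "scam"
```

Each branch corresponds to one cloaking type, which is why a detector has to vary the User-Agent, the Referer, and the source address across its crawls to expose all of them.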

Introduction: This Paper…
- Designs a system, Dagger, to identify cloaking in near real time
- Uses this system to:
  - Provide a picture of cloaking activity as seen through three search engines (Google, Bing, and Yahoo)
  - Characterize the differences in cloaking behavior between undifferentiated “trending” keywords and targeted keywords
  - Characterize the dynamic behavior of cloaking activity

Methodology
Dagger consists of five functional components:
1. Collecting search terms
2. Fetching search results from search engines
3. Crawling the pages linked from the search results
4. Analyzing the crawled pages
5. Repeating measurements over time
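The five components can be wired together roughly as in the following skeleton. It is a hypothetical sketch: the class name, the field names, and the idea of injecting each stage as a callable are illustrative choices, not Dagger's actual code, which the slides describe only at this level.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MeasurementPipeline:
    """Hypothetical skeleton of Dagger's five components."""
    collect_terms: Callable[[], List[str]]        # 1. collect search terms
    fetch_results: Callable[[str], List[str]]     # 2. search-result URLs for one term
    crawl: Callable[[str], Dict[str, str]]        # 3. fetch the views of one URL
    is_cloaked: Callable[[Dict[str, str]], bool]  # 4. analyze the crawled views
    history: List[Dict[str, bool]] = field(default_factory=list)

    def run_period(self) -> Dict[str, bool]:
        verdicts: Dict[str, bool] = {}
        for term in self.collect_terms():
            for url in self.fetch_results(term):
                if url not in verdicts:  # analyze each URL once per period
                    verdicts[url] = self.is_cloaked(self.crawl(url))
        # 5. An external scheduler calls run_period() repeatedly; results accumulate here.
        self.history.append(verdicts)
        return verdicts
```

Keeping the stages as plain callables makes each one swappable and easy to test in isolation.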

Methodology: Collecting Search Terms
- Collect popular search terms from:
  - Google Hot Searches
  - Alexa
  - Twitter
- Construct another source of search terms from the keyword suggestions of “Google Suggest”
  - Example: the user enters "viagra 50mg"; suggestions include "viagra 50mg cost", "viagra 50mg canada", …
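A minimal sketch of the suggestion-based expansion, assuming the `suggest` callable stands in for Google Suggest (no particular API is implied; the function name and the stub data in any example are hypothetical). Taking up to 3 new suggestions per seed matches the 80 × 3 = 240 figure given for Google Hot Searches on the next slide.

```python
from typing import Callable, List


def expand_terms(seeds: List[str], suggest: Callable[[str], List[str]],
                 per_seed: int = 3) -> List[str]:
    """Return up to `per_seed` new suggestion terms for each seed, without duplicates."""
    seen = set(seeds)
    expanded: List[str] = []
    for seed in seeds:
        added = 0
        for candidate in suggest(seed):
            if added == per_seed:
                break
            if candidate not in seen:  # skip the seed itself and repeats
                seen.add(candidate)
                expanded.append(candidate)
                added += 1
    return expanded
```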

Methodology: Querying Search Results
- Submit the search terms to the search engines (Google, Yahoo, and Bing)
  - Google Hot Searches and Alexa each supply 80 terms per 4-hour period
  - Twitter supplies 40
  - Together with 240 additional suggestions based on Google Hot Searches (80 × 3) → 440 terms in total
- Extract the top 100 search results for each search term (440 × 100 = 44,000)
- Remove whitelisted URLs
- Group similar entries (same URL, source, and search term) → roughly 15,000 unique URLs on average in each measurement period
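The counts on this slide, plus the whitelist and grouping steps, can be sketched as follows. The whitelist contents, the domain-suffix matching rule, and the record layout (`engine`, `url`, `source`, `term`) are assumptions for illustration; leaving the search engine out of the grouping key is one reading of "same URL, source, and search term".

```python
from typing import Dict, Iterable, List
from urllib.parse import urlparse

# Terms per 4-hour measurement period, as listed on the slide.
TERMS_PER_PERIOD = {"google_hot": 80, "alexa": 80, "twitter": 40, "google_suggest": 80 * 3}
TOTAL_TERMS = sum(TERMS_PER_PERIOD.values())  # 440
RESULTS_PER_ENGINE = TOTAL_TERMS * 100        # top 100 results per term


def is_whitelisted(url: str, whitelist: Iterable[str]) -> bool:
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in whitelist)


def unique_entries(results: List[Dict[str, str]], whitelist: Iterable[str]) -> List[Dict[str, str]]:
    """Drop whitelisted URLs and keep one entry per (URL, source, term)."""
    whitelist = list(whitelist)
    seen = set()
    kept = []
    for r in results:
        if is_whitelisted(r["url"], whitelist):
            continue
        key = (r["url"], r["source"], r["term"])  # the engine is not part of the key
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept
```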

Methodology: Crawling Search Results
- Web crawler: a Java web crawler built on the Apache HttpClient 3.x package
- Crawls each URL 3 times:
  1. Disguised as a normal user running Internet Explorer, clicking through the search result
  2. Disguised as the Googlebot Web crawler
  3. Disguised as a normal user again, NOT clicking through the search result
- Dealing with IP cloaking?
  - A fourth crawl goes through Google Translate, so the request comes from Google's own IP addresses
  - More than half of cloaked results use IP cloaking
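The three per-URL crawls differ only in their request headers, which the sketch below builds with Python's `urllib.request` instead of the Java HttpClient 3.x crawler the slide describes. The exact User-Agent strings and the Google results-page URL used as the Referer are illustrative assumptions, and the fourth (Google Translate) crawl is omitted because the slides do not say how it was issued.

```python
from typing import Dict
from urllib.parse import quote_plus
from urllib.request import Request

# Illustrative User-Agent strings (assumptions; the slides do not list exact values).
IE_UA = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)"
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def crawl_requests(url: str, query: str) -> Dict[str, Request]:
    """Build (but do not send) the three per-URL requests described on the slide."""
    results_page = "https://www.google.com/search?q=" + quote_plus(query)
    return {
        # 1. Normal user who clicked through from the search results page.
        "user_click": Request(url, headers={"User-Agent": IE_UA, "Referer": results_page}),
        # 2. Search engine crawler.
        "googlebot": Request(url, headers={"User-Agent": GOOGLEBOT_UA}),
        # 3. Normal user arriving directly, with no search-engine Referer.
        "user_direct": Request(url, headers={"User-Agent": IE_UA}),
    }
```

Each request can then be sent with `urllib.request.urlopen(req)`. Note that `Request` stores header names via `str.capitalize()`, so the User-Agent is looked up as `req.get_header("User-agent")`.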

Methodology: Detecting Cloaking
- Remove HTTP error responses (4% of URLs on average)
- Use text shingling to filter out pages whose user and crawler views are nearly identical
  - 90% of URLs are near duplicates ("near duplicate" means the two sets of signatures differ by 10% or less)
- Measure the similarity between the search result snippet and the user view of the page
  - Remove noise from both the snippet and the body of the user view
  - Search the user view for each substring of the snippet
  - Score = number of words in unmatched substrings divided by the total number of words in all substrings
  - 1.0 means no match; 0.0 means a full match
  - Threshold: 0.33 → filters out 56% of the remaining URLs
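Both filters on this slide can be sketched in a few lines. This is a simplified version under stated assumptions: the shingle size (k = 4 words), comparing full shingle sets with Jaccard similarity rather than hashed signatures, treating "noise removal" as lowercasing and dropping punctuation, and splitting snippets on "..." are all illustrative choices, not the paper's exact procedure.

```python
import re
from typing import List, Set, Tuple

WORD = re.compile(r"\w+")


def words(text: str) -> List[str]:
    # "Noise removal" here is only lowercasing and dropping punctuation (assumption).
    return WORD.findall(text.lower())


def shingles(text: str, k: int = 4) -> Set[Tuple[str, ...]]:
    w = words(text)
    if len(w) < k:
        return {tuple(w)} if w else set()
    return {tuple(w[i:i + k]) for i in range(len(w) - k + 1)}


def shingle_difference(a: str, b: str, k: int = 4) -> float:
    """1 - Jaccard similarity of the two pages' k-word shingle sets."""
    sa, sb = shingles(a, k), shingles(b, k)
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def near_duplicate(user_view: str, crawler_view: str, max_diff: float = 0.10) -> bool:
    return shingle_difference(user_view, crawler_view) <= max_diff


def snippet_score(snippet: str, user_view: str) -> float:
    """Fraction of snippet words in substrings not found in the user view (1.0 = no match)."""
    body = " " + " ".join(words(user_view)) + " "
    parts = [words(p) for p in re.split(r"\.\.\.|\u2026", snippet)]
    parts = [p for p in parts if p]
    total = sum(len(p) for p in parts)
    if total == 0:
        return 0.0  # an empty snippet gives no evidence of a mismatch
    # Pad with spaces so a substring only matches on whole-word boundaries.
    unmatched = sum(len(p) for p in parts if " " + " ".join(p) + " " not in body)
    return unmatched / total
```

Production systems usually hash shingles (for example with MinHash) to compare compact signatures; exact sets keep the idea visible here.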

Methodology: Detecting Cloaking (cont.)
- False positives may still exist
- Examine the DOMs as the final test
  - Score = overall comparison + hierarchical comparison
  - Overall comparison: unmatched tags from the entire page divided by the total number of tags
  - Hierarchical comparison: the sum of the unmatched tags from each level of the DOM hierarchy divided by the total number of tags
  - 2.0 means no match; 0.0 means a full match
  - Threshold:
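One way to compute this DOM score with Python's standard `html.parser` is sketched below. The interpretation is an assumption: "unmatched tags" is taken as the multiset symmetric difference of tag names between the two views, and "total number of tags" as the tag count of both views combined, so each term lies in [0, 1] and the sum in [0, 2] as the slide states. The threshold value is missing from the transcript, so none is hard-coded.

```python
from collections import Counter
from html.parser import HTMLParser
from typing import Dict

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagCollector(HTMLParser):
    """Counts tag names for the whole page and per DOM depth."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0
        self.tags: Counter = Counter()
        self.by_level: Dict[int, Counter] = {}

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1
        self.by_level.setdefault(self.depth, Counter())[tag] += 1
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth = max(0, self.depth - 1)


def collect(html: str) -> TagCollector:
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return parser


def unmatched(a: Counter, b: Counter) -> int:
    # Multiset symmetric difference: tags present in one view but not the other.
    return sum(((a - b) + (b - a)).values())


def dom_score(html_a: str, html_b: str) -> float:
    """Overall + hierarchical mismatch; 0.0 = same tag structure, 2.0 = nothing matches."""
    a, b = collect(html_a), collect(html_b)
    total = sum(a.tags.values()) + sum(b.tags.values())
    if total == 0:
        return 0.0
    overall = unmatched(a.tags, b.tags) / total
    levels = set(a.by_level) | set(b.by_level)
    hierarchical = sum(
        unmatched(a.by_level.get(d, Counter()), b.by_level.get(d, Counter()))
        for d in levels
    ) / total
    return overall + hierarchical
```

Text content is ignored entirely, so two views that differ only in wording score 0.0; that is what lets this test catch false positives left over from the text-based filters.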

Methodology: Detecting Cloaking (cont.)
- Manual inspection
  - False positives: 9.1% (29 of 317) in Google search and 12% (9 of 75) in Yahoo (benign websites that nevertheless deliver different content to search engines)
- Advanced browser detection
- Temporal remeasurement
  - Dagger remeasures every 4 hours for up to 7 days
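The figures on this slide can be checked directly. The remeasurement count assumes the first remeasurement happens 4 hours after the initial crawl and that a URL is tracked for the full 7 days.

$$
\begin{aligned}
\text{FP}_{\text{Google}} &= \tfrac{29}{317} \approx 0.091 = 9.1\% \\
\text{FP}_{\text{Yahoo}} &= \tfrac{9}{75} = 0.12 = 12\% \\
N_{\text{remeasurements}} &= \frac{7 \times 24\ \text{h}}{4\ \text{h}} = 42
\end{aligned}
$$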

Results: Cloaking Over Time

Results: Sources of Search Terms

Results: Search Engine Response

Results: Cloaking Duration

Results: Cloaked Content

Results: Domain Infrastructure

Results: SEO

Conclusion
- Cloaking is a standard technique in constructing scam pages
- This paper examined the current state of search engine cloaking as used to support Web spam
- Introduced new techniques for identifying cloaking (via search engine snippets, which identify the keyword-related content found at crawl time)
- Explored the dynamics of cloaked search results and sites over time