SURF: Detecting and Measuring Search Poisoning. Long Lu, Roberto Perdisci, and Wenke Lee, Georgia Tech and University of Georgia.

Presentation transcript:

Search engines
SURF: Detecting and Measuring Search Poisoning. 18th ACM Conference on Computer and Communications Security.

SEO
Optimizing website presentation to search crawlers
– Emphasizing keyword relevance
– Demonstrating popularity
Black-hat SEO
– Artificially inflating relevance
– Dishonest but typically non-malicious

Search poisoning

Search poisoning
Aggressively abusing SEO
– Forging relevance
– Employing link farms
– Redirecting visitors
Inadequate countermeasures
– IR quality assurance
– Designed for less adversarial scenarios
– Robust solutions needed

Malicious search user redirection
– Preserving poisoning infrastructure
– Filtering out detection traffic
– Enabling affiliate networks

Observations
Analyzed 1,048 search poisoning cases
– Ubiquitous cross-site redirections
– Poisoning as a service
– Variety in malicious applications
– Persistence under transient appearances

Goals
SURF (Search User Redirection Finder)
– Generality: not specific to malicious content hosted on the terminal page
– Robustness: cannot be trivially evaded by attackers
– Wide deployability: not dependent on proprietary data or a special environment

SURF overview
[Diagram] Components: Instrumented Browser, Feature Extractor, SURF Classifier
Feature sources: browser events, network info, search result

SURF prototype
Instrumented browser
– Stripped IE with customizations (~1k SLOC in C#)
– Listening and responding to rendering events
Feature extractor
– Offline execution to facilitate experiments
SURF Classifier
– Weka's J48
– Simple, efficient, and easily interpreted

Detection features
Redirection composition
– Total redirection hops
– Cross-site redirection hops
– Redirection consistency
Chained webpages
– Landing-to-terminal distance
– Page rendering errors
– IP-to-name ratio
Poisoning resistance
– Keyword poisoning resistance
– Search rank
– Good rank confidence

Detection features (1/3)
Redirection composition
– Total redirection hops
– Cross-site redirection hops
– Redirection consistency
Regular vs. malicious search redirection
Covering all types of redirections
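The redirection composition features can be illustrated with a minimal Python sketch. It is not the SURF prototype (which the slides describe as an instrumented IE in C#). The function names, the example URLs, and two definitions are assumptions made here for illustration: a hop is treated as cross-site when the hostname changes, and consistency is treated as two observed chains passing through the same hosts.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercased hostname of a URL ('' if it has none)."""
    return (urlsplit(url).hostname or "").lower()


def total_redirection_hops(chain):
    """Number of hops in a redirection chain given as an ordered list of URLs."""
    return max(len(chain) - 1, 0)


def cross_site_redirection_hops(chain):
    """Hops whose source and destination hostnames differ (assumed definition)."""
    hosts = [host_of(u) for u in chain]
    return sum(1 for a, b in zip(hosts, hosts[1:]) if a != b)


def redirection_consistent(chain_a, chain_b):
    """Assumed notion of consistency: two visits traverse the same hosts in order."""
    return [host_of(u) for u in chain_a] == [host_of(u) for u in chain_b]


# Hypothetical chain from a landing page to a terminal page.
chain = [
    "http://landing.example/page?q=term",
    "http://landing.example/go",
    "http://hop.example/r",
    "http://terminal.example/offer",
]
print(total_redirection_hops(chain))       # 3
print(cross_site_redirection_hops(chain))  # 2
```

Comparing hostnames is the simplest choice; a fuller version might compare registered domains so that subdomains of one site do not count as cross-site.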

Detection features (2/3)
Chained webpages: webpages involved in redirections
– Landing-to-terminal distance: Distance = min {geo_dist, org_dist}
– Page rendering errors: premature termination on errors
– IP-to-name ratio: unnamed malicious hosts
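Two of the chained-webpage features lend themselves to a short sketch. The slide gives the distance formula, Distance = min {geo_dist, org_dist}, but does not say how geo_dist and org_dist are measured, so they are plain numeric inputs here. The IP-to-name ratio below uses an assumed definition (pages addressed by a raw IP divided by pages addressed by a name); the function names are hypothetical.

```python
import ipaddress
import math
from urllib.parse import urlsplit


def is_ip_host(url):
    """True when the URL's host is a literal IPv4/IPv6 address, not a name."""
    host = urlsplit(url).hostname or ""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def ip_to_name_ratio(chain):
    """Assumed definition: IP-addressed pages divided by name-addressed pages."""
    ip_count = sum(1 for u in chain if is_ip_host(u))
    name_count = len(chain) - ip_count
    if name_count == 0:
        return math.inf if ip_count else 0.0
    return ip_count / name_count


def landing_to_terminal_distance(geo_dist, org_dist):
    """Distance = min {geo_dist, org_dist}, as stated on the slide."""
    return min(geo_dist, org_dist)


# Hypothetical chain: one unnamed host and two named hosts.
pages = ["http://203.0.113.7/x", "http://a.example/", "http://b.example/"]
print(ip_to_name_ratio(pages))                 # 0.5
print(landing_to_terminal_distance(3.0, 1.0))  # 1.0
```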

Detection features (3/3)
Poisoning resistance: derived from search keyword and result
– Keyword poison resistance
– Search rank
– Good rank confidence
Poison resistance
– Difficulty of poisoning a keyword
– Avg {PageRank of top 10 results}
Good rank confidence
– Poison resistance / search rank
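The two formulas on this slide translate directly into code. The sketch below assumes the PageRank values of the top 10 results are already available as numbers and that search rank counts from 1; the function names and sample values are made up for illustration.

```python
def poison_resistance(top10_pageranks):
    """Avg {PageRank of top 10 results} for a keyword."""
    if not top10_pageranks:
        raise ValueError("need at least one PageRank value")
    return sum(top10_pageranks) / len(top10_pageranks)


def good_rank_confidence(resistance, search_rank):
    """Poison resistance divided by search rank (rank 1 is the top result)."""
    if search_rank < 1:
        raise ValueError("search rank starts at 1")
    return resistance / search_rank


# Made-up PageRank values for the top 10 results of one keyword.
pageranks = [6, 5, 5, 4, 4, 3, 3, 2, 2, 1]
resistance = poison_resistance(pageranks)
print(resistance)                           # 3.5
print(good_rank_confidence(resistance, 7))  # 0.5
```

Under these formulas, a result ranked low for a keyword that is hard to poison gets the same confidence as a result ranked high for an easy keyword.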

Evaluation
Semi-manually labeled datasets
– 2,344 samples collected in Oct 2010
– Labeling methods do not overlap with detection features

Evaluation
Accuracy
– 10-fold cross validation
– On average, 99.1% TP, 0.9% FP
Generality
– Cross-category validation
– Oblivious to on-page malicious content
Robustness
– Simulating compromised features
– Evaluating accuracy degradation
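The accuracy procedure, k-fold cross validation with averaged TP and FP rates, can be sketched as follows. This is not the authors' setup: the slides state that Weka's J48 was used, while the threshold learner here is a stand-in chosen only to keep the example self-contained, and the data is synthetic. All names are hypothetical.

```python
def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k folds over n samples."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


def tp_fp_rates(labels, predictions):
    """Return (TP rate, FP rate) for boolean labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)
    fn = sum(1 for y, p in pairs if y and not p)
    fp = sum(1 for y, p in pairs if not y and p)
    tn = sum(1 for y, p in pairs if not y and not p)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


def train_threshold(samples, labels):
    """Stand-in learner (not J48): threshold midway between the class means."""
    pos = [s for s, y in zip(samples, labels) if y]
    neg = [s for s, y in zip(samples, labels) if not y]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda s: s > cut


def cross_validate(samples, labels, train_fn, k=10):
    """Average the per-fold TP and FP rates over k folds."""
    rates = []
    for train, test in k_fold_indices(len(samples), k):
        model = train_fn([samples[i] for i in train], [labels[i] for i in train])
        preds = [model(samples[i]) for i in test]
        rates.append(tp_fp_rates([labels[i] for i in test], preds))
    return (sum(r[0] for r in rates) / len(rates),
            sum(r[1] for r in rates) / len(rates))


# Synthetic single-feature data: 10 benign samples, then 10 poisoned ones.
samples = [0, 1] * 5 + [4, 5] * 5
labels = [False] * 10 + [True] * 10
avg_tpr, avg_fpr = cross_validate(samples, labels, train_threshold)
```

The synthetic classes are cleanly separable, so the resulting rates say nothing about SURF's accuracy; the figures reported on the slide come from the authors' labeled dataset.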

Discussion
Unselected features
– Evadable or dependent on search-internal data
– Domain reputation
Deployment scenarios
– Regular users, search engines, security vendors
– Enabling community efforts

Empirical measurements
7-month measurement study ( ~ )
12 million search results analyzed
On a daily basis:
– Retrieve trendy keywords
– Dispatch search jobs to SURF bots
– Each bot visits each search result and produces logs
– Feature extraction and classification
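The daily steps above can be sketched as one loop. Every callable passed in is a hypothetical placeholder for a component the slides only name (keyword retrieval, search, the bot's page visit, feature extraction, classification); the stubs exist only so the sketch runs and do not reflect how SURF implements them.

```python
def run_daily_measurement(fetch_keywords, search, visit, extract_features, classify):
    """Run one day's measurement and return one record per search result."""
    records = []
    for keyword in fetch_keywords():                 # retrieve trendy keywords
        results = search(keyword)                    # dispatch a search job
        for rank, url in enumerate(results, start=1):
            log = visit(url)                         # bot visits result, produces log
            features = extract_features(keyword, rank, log)
            records.append({
                "keyword": keyword,
                "rank": rank,
                "url": url,
                "poisoned": classify(features),      # classification
            })
    return records


# Toy stubs, purely illustrative.
records = run_daily_measurement(
    fetch_keywords=lambda: ["kw"],
    search=lambda kw: ["http://a.example/", "http://b.example/"],
    visit=lambda url: {"chain": [url]},
    extract_features=lambda kw, rank, log: {"rank": rank,
                                            "hops": len(log["chain"]) - 1},
    classify=lambda features: features["rank"] > 1,
)
print(len(records))  # 2
```

Passing the components in as functions keeps the loop independent of any particular search engine or browser instrumentation.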

Empirical measurements
7-day window
– Poisoning lag and poisoned volume
– Avg. landing page lifetime: 1.7 days

Empirical measurements
7-month window
– More than 50% of trendy keywords poisoned

Empirical measurements
7-month window
– Unique landing domains observed per week

Empirical measurements
7-month window
– Terminal page variety survey

Conclusion
– In-depth study of search poisoning
– Design and evaluation of SURF
– Long-term measurement of search poisoning