Measuring and Fingerprinting Click-Spam in Ad-Networks

Slides:



Advertisements
Similar presentations
Improvements and extras Paul Thomas CSIRO. Overview of the lectures 1.Introduction to information retrieval (IR) 2.Ranked retrieval 3.Probabilistic retrieval.
Advertisements

Ankit Mehta 1. Brief history of online advertising May 1978: First Marketing Spam form DEC Marketing representative. 1994: First banner ad by AT&T on.
A Search-based Method for Forecasting Ad Impression in Contextual Advertising Defense.
Authors: Mona Gandhi, Markus Jakobsson, Jacob Ratkiewicz (Indiana University at Bloomington) Presented By: Lakshmy Mohanan.
CLICK FRAUD Alexander Tuzhilin By Vinny Rey. Why was the study done? Google was getting sued by advertisers because of click fraud. Google agreed to have.
HTTP: cookies and advertising Concepts to cover:  web page content (including ads) from multiple site: composition at client  cookies  third-party cookies:
USING HADOOP & HBASE TO BUILD CONTENT RELEVANCE & PERSONALIZATION Tools to build your big data application Ameya Kanitkar.
Network and Systems Security By, Vigya Sharma (2011MCS2564) FaisalAlam(2011MCS2608) DETECTING SPAMMERS ON SOCIAL NETWORKS.
Badvertisements: Stealthy Click-Fraud with Unwitting Accessories Mona Gandhi Markus Jakobsson Jacob Ratkiewicz Indiana University at Bloomington Presented.
Click Fraud Forensics Dean Qudah Pace University DPS 2010.
Supplied on \web site. on January 10 th, 2008 Reducing Risk Through Incremental Malware Detection January 2008.
Lalit Sharma, JIM E-commerce Marketing Communications.
Cloak and Dagger: Dynamics of Web Search Cloaking David Y. Wang, Stefan Savage, and Geoffrey M. Voelker University of California, San Diego 左昌國 Seminar.
Improving Cloaking Detection Using Search Query Popularity and Monetizability Kumar Chellapilla and David M Chickering Live Labs, Microsoft.
Data Mining By Dave Maung.
Spamscatter: Characterizing Internet Scam Hosting Infrastructure By D. Anderson, C. Fleizach, S. Savage, and G. Voelker Presented by Mishari Almishari.
Leveraging Asset Reputation Systems to Detect and Prevent Fraud and Abuse at LinkedIn Jenelle Bray Staff Data Scientist Strata + Hadoop World New York,
By Gianluca Stringhini, Christopher Kruegel and Giovanni Vigna Presented By Awrad Mohammed Ali 1.
AJ Guardado Comp. Sci. 49S.  In order to correctly manage programs (AdSense, AdWords), properly charge for the PPC revenue model, and detect invalid.
Chapter Twelve Digital Interactive Media Arens|Schaefer|Weigold Copyright © 2015 McGraw-Hill Education. All rights reserved. No reproduction or distribution.
CATCHING CLICK-SPAM IN SEARCH AD NETWORKS Vacha Dave +, Saikat Guha ★ and Yin Zhang * + University of California, San Diego ★ Microsoft Research India.
Empirical Analysis of Search Advertising Strategies BHANU C. VATTIKONDA, VACHA DAVE, SAIKAT GUHA, ALEX C. SNOEREN.
Week 2- Overview of the internet The construction of a webpage Four Key Elements – how the internet works Elements and Design concepts Introduction to.
Bloom Cookies: Web Search Personalization without User Tracking Authors: Nitesh Mor, Oriana Riva, Suman Nath, and John Kubiatowicz Presented by Ben Summers.
Crowd Fraud Detection in Internet Advertising Tian Tian 1 Jun Zhu 1 Fen Xia 2 Xin Zhuang 2 Tong Zhang 2 Tsinghua University 1 Baidu Inc. 2 1.
KAIST TS & IS Lab. CS710 Know your Neighbors: Web Spam Detection using the Web Topology SIGIR 2007, Carlos Castillo et al., Yahoo! 이 승 민.
Heat-seeking Honeypots: Design and Experience John P. John, Fang Yu, Yinglian Xie, Arvind Krishnamurthy and Martin Abadi WWW 2011 Presented by Elias P.
CHAPTER 16 SEARCH ENGINE OPTIMIZATION. LEARNING OBJECTIVES How to monitor your site’s traffic What are the pros and cons of keyword advertising within.
Dec 14, 2014, Harvard University
SEARCH ENGINE OPTIMIZATION.
EMarketing: The Essential Guide to Marketing in a Digital World Online Advertising What you’ll learn The various business objectives you can meet with.
Introduction Adult website business is very big and it has loads of cash. You cannot imagine how much a single famous porn site makes a day. There are.
Chapter 10: Web Basics.
Google Is The Search King
Information Systems for Managers Assignment FACEBOOK
Search Engine Optimization(S.E.O)
PIWIK JUNIOR TIDAL ASSOCIATE PROF., WEB SERVICES & MULTIMEDIA LIBRARIAN NEW YORK CITY COLLEGE OF TECHNOLOGY, CUNY.
Native Ads by YeahMobi.
Technologies and Applications
Advertising Agencies and Interactive Media
SNS College of Engineering
Expand your customer audience through Bing Ads
Discover How Your Business Can Benefit from a Facebook Fanpage
Discover How Your Business Can Benefit from a Facebook Fanpage
Web Mining Ref:
Introduction to Web Mining
Sport Clips Google Analytics for Franchisee - June 2017
INTRODUCTION DIGITAL MARKETINGDIGITAL MARKETING IS A WAY OF MARKETING THE BRANDS OR THINGS BY USING DIGITAL MEDIA SUCH AS MOBILE PHONES, INTERNET, COMPUTERS.
1 SEO is short for search engine optimization. Search engine optimization is a methodology of strategies, techniques and tactics used to increase the amount.
Google Analytics Workshop ICEF Toronto May 12th 2016
Importance of search Google, Bing and Yahoo! dominate search, over 95% combined share. Your company’s appearance at the top of the Search Engine Result.
Google Is The Search King
Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall
Interactive media.
What is a Search Engine EIT, Author Gay Robertson, 2017.
Learn to use: Salesfloor Reporting.
DIGITAL MARKETING COMPANY Digital marketing is a broad term that encapsulates all marketing channels and methods you can use to promote.
Web Mining Department of Computer Science and Engg.
Agenda What is SEO ? How Do Search Engines Work? Measuring SEO success ? On Page SEO – Basic Practices? Technical SEO - Source Code. Off Page SEO – Social.
Searching for Truth: Locating Information on the WWW
DIGITAL MARKETING AGENCY Digital Marketing.
Searching for Truth: Locating Information on the WWW
CS 345A Data Mining Lecture 1
CS 345A Data Mining Lecture 1
Marketing Strategy.
Introduction to Web Mining
PPC Management Company | PPC Services eNest Services.
Presented By:- Abhinav Shashtri. Index SR.NOTitleSlide No 1Introduction: Build Awareness: Buildup Brand Image: Content Improves Website.
CS 345A Data Mining Lecture 1
Unconstrained Endpoint Profiling (Googling the Internet)‏
Presentation transcript:

Measuring and Fingerprinting Click-Spam in Ad-Networks Vacha Dave*, Saikat Guha, Yin Zhang Presented by Akhil Reddy Katpally

Contents Introduction What is Click Spam? Estimating Click Spam Measuring Click Spam Fingerprinting Click Spam Summary

Online Advertisement Online Advertising is a 43 billion dollar industry and it is growing at around 18% per annum. Main Players in Online Advertising are Ad-Networks Publishers Advertisers Pay-Per Click Advertising Advertiser pays to the Ad-Network for every click on the Ad. Ad-Network shares the revenue with publisher in 30-70 percentage. Publishers gets the major chunk of the revenue and there is a greater possibility of publishers indulging in fraud.

Pay-per Click Google is Publisher and Ad networks All ads displayed here are Advertiser ads

Time-line for serving Ads

Click Spam in Online Advertising What is Click Spam? Click Spam, i.e., fraudulent or invalid click. A person, automated script or computer program imitates a legitimate user of web browser clicking on a ad. Sole purpose is generating charge per click without having actual interest. Who is involved in Click Spam Publishers or Syndicated Search engine Advertisers To deplete the competitors budget. Click Spam Costs Advertisers on the order of hundreds of millions of dollar every year.

Challenges in Identifying Click Spam No ground Truth: A Click is Click-Spam if the user did not intend to Click. It is really difficult to identify which Clicks are Click-Spam. Granularity: Some areas of advertising have less Click-Spam and other areas have more (mortgage ads). It is not possible to aggregate them. No global view: Ad network cannot track user engagement on advertiser site, and the advertiser has no knowledge on user’s engagement with other advertiser.

Do Ad networks identify Click Spam? Every Ad networks have in-house heuristics to identify Click Spam. They provide discounts to the advertisers based on their in-house heuristics. Ad networks do not indicate which clicks were charged and which are not charged in the billing report. There is no transparency in this models and the efficiency of this models are questionable.

Contributions of the paper First methodology to Advertisers to independently measure and compare Click Spam across each ad networks for a specific keywords. Discover sophistication of ongoing click spam attacks and present strategies to ad networks to mitigate them. Conduct a large-scale in-depth measurement study of click-spam today across ten ad networks. Search, contextual, social and mobile ad networks. Show that Click Spam is a problem for Mobile and Social ad networks.

Estimating Click-Spam We estimate fraction of click-spam in total clicks. Designed Bayesian estimation framework. It can provide the fraction of the click-spam. Bayesian framework uses only quantities which are measurable by the advertiser. It will cancel out quantities which are not measurable by the advertiser.

Estimating click spam P(G) is computed as the ratio of the number of gold-standard clicks (g; known) to the number of clicks (n; known). P(Ii) is the ratio of the number of non-click-spam clicks (ii; unknown) to the number of clicks (ni; known) arriving via the interstitial page False Positive: Legitimate users may not make extra effort to reach target ad. False Negative: Uninterested users may still make extra effort to reach target ad.

Estimating Click Spam Limitations of our approach: A key limitation of our approach is that the advertiser must actively measure click spam. Both interstitial ad and control ad harm the user experience.

Measuring Click-Spam in 10 ad networks Signup as 3 advertiser with 10 ad networks. 10 ad networks are Search ad networks: Google search and Bing search Contextual ad networks: Google AdSense and Bing Contextual. Mobile ad networks: Google Mobile, Bing Mobile, Ad Mobile and InMobi. Social Ad networks: Facebook We use 3 keywords for each advertiser High popularity keyword: celebrity Medium popularity keyword: yoga Low-popularity keyword: lawnmower

Continued… We create 4 Ads for each target landing page First Ad directly takes to landing page Other 3 Ads takes to interstitial pages before to the landing page. We create 3 interstitial pages: 5 sec wait, before landing on the page. Ask user to click on the link to continue to the landing page Ask the user to solve CAPTCHA. We create 4 additional Control Ads for each landing page, but with junk ad text.

Continued…. Bluff Add All Ads were shown 26M times. Total of 85k clicks are resulted and 17k is charged for that. Landing pages were fetched by total 33k unique IP addresses located in 190 countries.

Validating the data Since there is no ground truth, we compare the results with the search ads of Google and Bing We Assume that reputed search ad networks are mature enough that their in-house algorithms are able to detect and discount most of the click- spam on their networks. All the web requests made to the Apache server are logged.

Validating using Search Ads A,B and C are different ad networks ran with different environments

Click Spam in Mobile ads A,B,C and D are different mobile ad networks

Estimating contextual spam A,B and C are different Contextual and social ad networks

Fingerprinting Click Spam Manually investigated all the clicks that were received on the control ads. Discovered 7 ongoing click spam attacks. Investigation covers 26% of the traffic our control ads attracted. As expected these ads are fraudulent in nature. Normally Ad networks discount substantially less amount(6-20%).

Graph Clustering We induce a graph that spans all publisher domains we see. For any pair of publishers, we compute a similarity score. We construct a feature vector that consists of various network-level attributes. By assigning weight to each attribute we compute a cosine similarity b/w the Feature vectors of the pairs of the domain.

Malware Driven Click Fraud Responsible malware: TDL Validation: Run malware in Virtual Machine. Can intercept and redirect all requests from popular browsers. Browser specific filtering doesn’t work. Only 1 click per ip address per day. Threshold based filtering doesn’t work. Mimics real user behavior Timing analysis doesn’t work.

Click Fraud using parked domains A parked domain is a domain that is registered but it is not in use. IP holds 500,000 such domains. A user may reach parked domain through many ways like he may have mistyped the website name. Advertiser must pay for that. Auto redirects based on the geo-location.

Icici bank parked domain Finally user reaches to www.icicibank.com

Mobile Click Spam Gamming apps Ant-smasher Place ads close to where users are expected to click. Games like Ant-smasher, milk-the-cow, and others. Ant-smasher Ads are placed where user is expected to click Click Spam is rampant in mobile ads.

Summary Click Spam remains a problem, and probably lot bigger in mobile ads. Proposed an independent mechanism to estimate click spam by the advertiser. Done extensive validation with different set of keywords in different ad networks. Learnt about sophisticated Click Spam attacks like Malware driven attacks Parked domain Click Spam Mobile Click Spam

Any Questions?