Good Word Attacks on Statistical Spam Filters Daniel Lowd University of Washington (Joint work with Christopher Meek, Microsoft Research)

Similar presentations
Text Categorization.
Albert Gatt Corpora and Statistical Methods Lecture 13.
Large-Scale Entity-Based Online Social Network Profile Linkage.
WWW 2014 Seoul, April 8 th SNOW 2014 Data Challenge Two-level message clustering for topic detection in Twitter Georgios Petkos, Symeon Papadopoulos, Yiannis.
Text Categorization Karl Rees Ling 580 April 2, 2001.
Implicit Queries for Email Vitor R. Carvalho (Joint work with Joshua Goodman, at Microsoft Research)
Foundations of Adversarial Learning Daniel Lowd, University of Washington Christopher Meek, Microsoft Research Pedro Domingos, University of Washington.
Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks Yehonatan Cohen Daniel Gordon Danny Hendler Ben-Gurion University Yehonatan.
Partitioned Logistic Regression for Spam Filtering Ming-wei Chang University of Illinois at Urbana-Champaign Wen-tau Yih and Christopher Meek Microsoft.
On the Hardness of Evading Combinations of Linear Classifiers Daniel Lowd University of Oregon Joint work with David Stevens.
Presented by: Alex Misstear Spam Filtering An Artificial Intelligence Showcase.
INHA UNIVERSITY INCHEON, KOREA ALPACAS: A Large-scale Privacy-aware Collaborative Anti-spam System Z. Zhong, L. Ramaswamy and.
Confidence Estimation for Machine Translation J. Blatz et al., Coling 04 SSLI MTRG 11/17/2004 Takahiro Shinozaki.
Logistic Regression Rong Jin. Logistic Regression Model  In Gaussian generative model:  Generalize the ratio to a linear model Parameters: w and c.
Taking the Kitchen Sink Seriously: An Ensemble Approach to Word Sense Disambiguation from Christopher Manning et al.
LM Approaches to Filtering Richard Schwartz, BBN LM/IR ARDA 2002 September 11-12, 2002 UMASS.
Email Spam May CS239. Taxonomy (UBE): Advertisement, Phishing Webpage, Content, Links. From: Thrifty Health-Insurance Mailed-By: noticeoption.com Reply-To:
Adversarial Learning: Practice and Theory Daniel Lowd University of Washington July 14th, 2006 Joint work with Chris Meek, Microsoft Research “If you know.
Online Learning for Web Query Generation: Finding Documents Matching a Minority Concept on the Web Rayid Ghani Accenture Technology Labs, USA Rosie Jones.
Spam Detection Jingrui He 10/08/2007. Spam Types: Email Spam (unsolicited commercial email); Blog Spam (unwanted comments in blogs); Splogs (fake blogs).
The Web as a Parallel Corpus  Parallel corpora are useful  Training data for statistical MT  Lexical correspondences for cross-lingual IR  Early.
Detection of Internet Scam Using Logistic Regression
Learning at Low False Positive Rate Scott Wen-tau Yih Joshua Goodman Learning for Messaging and Adversarial Problems Microsoft Research Geoff Hulten Microsoft.
Personalized Spam Filtering for Gray Mail Ming-wei Chang University of Illinois at Urbana-Champaign Wen-tau Yih and Robert McCann Microsoft Corporation.
Countering Spam Using Classification Techniques Steve Webb Data Mining Guest Lecture February 21, 2008.
Jay Stokes, Microsoft Research John Platt, Microsoft Research Joseph Kravis, Microsoft Network Security Michael Shilman, ChatterPop, Inc. ALADIN: Active.
Distributed Phishing Attacks Markus Jakobsson Joint work with Adam Young, LECG.
Naïve Bayes Models for Probability Estimation Daniel Lowd University of Washington (Joint work with Pedro Domingos)
Bins and Text Categorization Carl Sable (Columbia University) Kenneth W. Church (AT&T)
Understanding the Network-Level Behavior of Spammers Best Student Paper, ACM Sigcomm 2006 Anirudh Ramachandran and Nick Feamster Ye Wang (sando)
CANTINA: A Content-Based Approach to Detecting Phishing Web Sites Reporter: Gia-Nan Gao Advisor: Chin-Laung Lei 2010/6/7.
Detecting Semantic Cloaking on the Web Baoning Wu and Brian D. Davison Lehigh University, USA WWW 2006.
Mining the Web to Create Minority Language Corpora Rayid Ghani Accenture Technology Labs - Research Rosie Jones Carnegie Mellon University Dunja Mladenic.
Enron Corpus: A New Dataset for Email Classification By Bryan Klimt and Yiming Yang CEAS 2004 Presented by Will Lee.
Partially Supervised Classification of Text Documents by Bing Liu, Philip Yu, and Xiaoli Li Presented by: Rick Knowles 7 April 2005.
BOTNET JUDO Fighting Spam with Itself By: Pitsillidis, Levchenko, Kreibich, Kanich, Voelker, Paxson, Weaver, and Savage Presentation by: Heath Carroll.
An Effective Word Sense Disambiguation Model Using Automatic Sense Tagging Based on Dictionary Information Yong-Gu Lee
Domain-Specific Iterative Readability Computation Jin Zhao 13/05/2011.
Blocking Blog Spam with Language Model Disagreement Gilad Mishne (Amsterdam) David Carmel (IBM Israel) AIRWeb 2005.
Why Not Grab a Free Lunch? Mining Large Corpora for Parallel Sentences to Improve Translation Modeling Ferhan Ture and Jimmy Lin University of Maryland,
Spam Detection Ethan Grefe December 13, 2013.
SpamIQ Free spam analysis and data mining tool. Objective: Provide ISPs and network operators good analysis tools to analyze and understand spam traffic.
A Content-Based Approach to Collaborative Filtering Brandon Douthit-Wood CS 470 – Final Presentation.
Text Analytics Teradata & Sabanci University April, 2015.
Leveraging Email Delivery for Spam Mitigation.
CS 391L: Machine Learning Text Categorization Raymond J. Mooney University of Texas at Austin.
An Unsupervised Approach for the Detection of Outliers in Corpora David Guthrie, Louise Guthrie, Yorick Wilks The University of Sheffield.
Mining Dependency Relations for Query Expansion in Passage Retrieval Renxu Sun, Chai-Huat Ong, Tat-Seng Chua National University of Singapore SIGIR2006.
Improved Video Categorization from Text Metadata and User Comments ACM SIGIR 2011:Research and development in Information Retrieval - Katja Filippova -
Fast Query-Optimized Kernel Machine Classification Via Incremental Approximate Nearest Support Vectors by Dennis DeCoste and Dominic Mazzoni International.
LIST – DTSI – Interfaces, Cognitics and Virtual Reality Unit The INFILE project: a crosslingual filtering systems evaluation campaign Romaric.
DISTRIBUTED INFORMATION RETRIEVAL Lee Won Hee.
A False Positive Safe Neural Network for Spam Detection Alexandru Catalin Cosoi
ICASSP Paper Survey Presenter: Chen Yi-Ting. Improved Spoken Document Retrieval With Dynamic Key Term Lexicon and Probabilistic Latent Semantic Analysis.
GDEX: Automatically finding good dictionary examples in a corpus Auckland 2012 Kilgarriff: GDEX.
Spamming Botnets: Signatures and Characteristics Yinglian Xie, Fang Yu, Kannan Achan, Rina Panigrahy, Microsoft Research, Silicon Valley Geoff Hulten,
Linear Models (II) Rong Jin. Recap  Classification problems Inputs x  output y y is from a discrete set Example: height 1.8m  male/female?  Statistical.
Review: Translating without in-domain corpus: Machine translation post-editing with online learning techniques Antonio L. Lagarda, Daniel Ortiz-Martínez,
Spam By Dan Sterrett. Overview ► What is spam? ► Why it’s a problem ► The source of spam ► How spammers get your address ► Preventing Spam ► Possible.
Twitter as a Corpus for Sentiment Analysis and Opinion Mining
Less is More: Active Learning with Support Vector Machines Paper by Greg Schohn and David Cohn of Just Research Presentation by Gregor Richards.
Item-Based Collaborative Filtering Recommendation Algorithms.
GDEX: Automatically finding good dictionary examples in a corpus Kivik 2013 Kilgarriff: GDEX.
Genre-based decomposition of class noise
Exploiting Machine Learning to Subvert Your Spam Filter
KDD 2004: Adversarial Classification
Asymmetric Gradient Boosting with Application to Spam Filtering
Spam Detection Algorithm Analysis
Presentation transcript:

Good Word Attacks on Statistical Spam Filters Daniel Lowd University of Washington (Joint work with Christopher Meek, Microsoft Research)

Content-based Spam Filtering
From: [address elided] "Cheap mortgage now!!!"
Feature weights: cheap = 1.0, mortgage = 1.5
Total score = 2.5 > 1.0 (threshold), so the message is classified as Spam.

Good Word Attacks
From: [address elided] "Cheap mortgage now!!! Stanford CEAS"
Feature weights: cheap = 1.0, mortgage = 1.5, Stanford = -1.0, CEAS = -1.0
Total score = 0.5 < 1.0 (threshold), so the message is classified as OK.
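A minimal sketch of the linear scoring in the two slides above, using the toy weights and the 1.0 threshold from the slides (not the parameters of any real filter):

```python
# Toy linear spam filter from the slides: score a message by summing the
# weights of known words, then compare against a fixed threshold.
THRESHOLD = 1.0
WEIGHTS = {"cheap": 1.0, "mortgage": 1.5, "stanford": -1.0, "ceas": -1.0}

def score(message: str) -> float:
    """Sum the weights of the known feature words present in the message."""
    words = set(message.lower().split())
    return sum(w for feat, w in WEIGHTS.items() if feat in words)

def classify(message: str) -> str:
    return "Spam" if score(message) > THRESHOLD else "OK"

spam = "Cheap mortgage now!!!"
attacked = spam + " Stanford CEAS"  # good word attack: append negative-weight words
print(classify(spam))      # Spam (score 2.5 > 1.0)
print(classify(attacked))  # OK   (score 0.5 < 1.0)
```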

Playing the Adversary
Can we efficiently find a list of "good words"?
Types of attacks:
- Passive attacks -- no filter access
- Active attacks -- test emails allowed
Metrics:
- Expected number of words required to get the median (blocked) spam past the filter
- Number of query messages sent

Filter Configuration
Models used:
- Naïve Bayes: generative
- Maximum Entropy (Maxent): discriminative
Training:
- 500,000 messages from the Hotmail feedback loop
- 276,000 features
- Maxent let 30% less spam through
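A hedged sketch of this setup with scikit-learn stand-ins: MultinomialNB for the generative naive Bayes filter and LogisticRegression for the discriminative maxent filter. The Hotmail feedback-loop corpus is not public, so the tiny corpus below is a placeholder for any labeled spam/ham collection:

```python
# Stand-ins for the two filter configurations: a generative naive Bayes
# model and a discriminative maximum-entropy (logistic regression) model,
# both trained on binary word-presence features.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

texts = ["cheap mortgage now", "meeting at noon", "cheap pills here", "lunch tomorrow"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = legitimate (placeholder data)

vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(texts)

naive_bayes = MultinomialNB().fit(X, labels)   # generative
maxent = LogisticRegression().fit(X, labels)   # discriminative

test = vectorizer.transform(["cheap mortgage meeting"])
print(naive_bayes.predict_proba(test), maxent.predict_proba(test))
```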

Comparison of Filter Weights
[Chart comparing the two filters' word weights, ranging from "good" (negative) to "spammy" (positive).]

Passive Attacks
Heuristics:
- Select random dictionary words (Dictionary)
- Select the most frequent English words (Freq. Word)
- Select words with the highest ratio of English frequency to spam frequency (Freq. Ratio; sketched below)
Spam corpus: spamarchive.org
English corpora: Reuters news articles, written English, spoken English, 1992 USENET
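A sketch of the Freq. Ratio heuristic. The word-count inputs are assumptions standing in for counts gathered from an English corpus (e.g., Reuters) and a spam corpus (e.g., spamarchive.org):

```python
# Rank candidate good words by how common they are in ordinary English
# relative to spam; words frequent in English but rare in spam are the
# most promising additions.
from collections import Counter

def freq_ratio_words(english_counts: Counter, spam_counts: Counter, n: int):
    """Return the n words with the highest English-to-spam frequency ratio."""
    eng_total = sum(english_counts.values())
    spam_total = sum(spam_counts.values())
    def ratio(word: str) -> float:
        eng = english_counts[word] / eng_total
        spam = (spam_counts[word] + 1) / spam_total  # add-one: handles words unseen in spam
        return eng / spam
    return sorted(english_counts, key=ratio, reverse=True)[:n]

english = Counter({"meeting": 50, "tomorrow": 40, "cheap": 5})
spam = Counter({"cheap": 80, "mortgage": 60})
print(freq_ratio_words(english, spam, 2))  # ['meeting', 'tomorrow']
```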

Passive Attack Results

Active Attacks
Learn which words are best by sending test messages (queries) through the filter.
- First-N: Find n good words using as few queries as possible
- Best-N: Find the best n words

First-N Attack
Step 1: Find a "barely spam" message.
[Diagram: a score axis running from Legitimate to Spam, divided by the threshold. The original legitimate message ("Hi, mom!") and the original spam ("Cheap mortgage now!!!") sit far from the threshold; stripping the spam down (e.g., "mortgage now!!!") yields a "barely legit." and a "barely spam" message just on either side of it.]

First-N Attack
Step 2: Test each word.
[Diagram: appending a candidate word to the "barely spam" message shifts its score; good words push it across the threshold into Legitimate territory, while less good words leave it on the Spam side.]
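A sketch of this second step against a black-box filter. Each call to is_spam models one query (sending a test message and observing the verdict); the toy weight table exists only to make the example runnable, and the attacker never sees it:

```python
# First-N: append one candidate word at a time to a "barely spam" message
# and keep the first n words that flip the filter's verdict to legitimate.
def first_n_good_words(is_spam, barely_spam: str, candidates, n: int):
    good = []
    for word in candidates:
        if not is_spam(barely_spam + " " + word):  # one query per candidate
            good.append(word)
            if len(good) == n:
                break
    return good

# Toy black-box filter for illustration only (hidden from the attacker).
WEIGHTS = {"cheap": 1.0, "mortgage": 1.5, "stanford": -1.0, "ceas": -1.0}
is_spam = lambda msg: sum(WEIGHTS.get(w, 0.0) for w in msg.lower().split()) > 1.0

print(first_n_good_words(is_spam, "mortgage now!!!", ["hello", "stanford", "ceas"], 2))
# ['stanford', 'ceas']: "hello" carries no weight and does not flip the verdict
```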

Best-N Attack
Key idea: use spammy words to sort the good words.
[Diagram: score axis from Legitimate to Spam with the threshold in between; better good words move the message further toward Legitimate, worse ones less far.]
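A heavily hedged sketch of the key idea, not the paper's exact procedure: known spammy words serve as a measuring stick, and a good word ranks higher the more spammy weight it can offset while keeping a barely legitimate message below the threshold:

```python
# Best-N idea: rank good words by how many known spammy words each one can
# absorb before the message tips back over the spam threshold.
def rank_good_words(is_spam, barely_legit: str, good_words, spammy_words):
    strength = {}
    for good in good_words:
        msg = barely_legit + " " + good
        offset = 0
        for spammy in spammy_words:
            msg = msg + " " + spammy
            if is_spam(msg):  # one query per test
                break
            offset += 1
        strength[good] = offset
    return sorted(good_words, key=lambda g: strength[g], reverse=True)

# Toy black-box filter for illustration only (hidden from the attacker).
WEIGHTS = {"mortgage": 0.9, "cheap": 0.5, "viagra": 0.5, "stanford": -0.4, "ceas": -1.0}
is_spam = lambda msg: sum(WEIGHTS.get(w, 0.0) for w in msg.lower().split()) > 1.0

print(rank_good_words(is_spam, "mortgage now", ["stanford", "ceas"], ["cheap", "viagra"]))
# ['ceas', 'stanford']: "ceas" offsets more spammy weight than "stanford"
```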

Active Attack Results (n = 100)
- Best-N twice as effective as First-N
- Maxent more vulnerable to active attacks
- Active attacks much more effective than passive attacks

Defenses
- Add noise or vary the threshold: intentionally reduces accuracy; easily defeated by sampling techniques (sketched below)
- Language model: easily defeated by selecting passages or by similar language models
- Frequent retraining with case amplification: completely negates attack effectiveness, with no accuracy loss on original spam
See paper for more details.
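A sketch of why threshold noise is a weak defense. The noisy filter below is an illustrative assumption; the point is that an attacker can repeat each query and take a majority vote, recovering the underlying deterministic verdict:

```python
# If the filter randomizes its threshold, repeated queries plus a majority
# vote smooth the noise away and expose the true decision boundary.
import random

def noisy_is_spam(score: float, threshold: float = 1.0, noise: float = 0.3) -> bool:
    """Filter defense: compare against a randomly perturbed threshold."""
    return score > threshold + random.uniform(-noise, noise)

def sampled_is_spam(score: float, trials: int = 99) -> bool:
    """Attacker countermeasure: query many times and take a majority vote."""
    votes = sum(noisy_is_spam(score) for _ in range(trials))
    return votes > trials / 2

print(sampled_is_spam(1.1))  # almost always True  (true score above threshold)
print(sampled_is_spam(0.9))  # almost always False (true score below threshold)
```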

Conclusion
- Effective attacks do not require filter access.
- Given filter access, even more effective attacks are possible.
- Frequent retraining is a promising defense.
See also: Lowd & Meek, "Adversarial Learning," KDD 2005.