BOTNET JUDO: Fighting Spam with Itself
By: Pitsillidis, Levchenko, Kreibich, Kanich, Voelker, Paxson, Weaver, and Savage
Presentation by: Heath Carroll
The Origins of Spam
Presentation Overview
–Abstract: What was the intent of the paper?
–Introduction: Current problems faced and methods used to combat them
–Background: Definitions of botnet, regular expression, and template-based spam
–Approach: How the authors dealt with this problem
Abstract
Botnet Judo: Fighting Spam with Itself, or ‘Botnet Host Quarantine: What’d We Learn?’
–Examination of a controlled, isolated botnet host
–Quick generation of precise, accurate spam filters with close to zero false positives
Introduction: Botnets
Definition: Botnet: a collection of software agents, or robots, that run autonomously and automatically. The term is most commonly associated with malicious software, but it can also refer to a network of computers using distributed computing software. (en.wikipedia.org/wiki/Botnet)
Example: DDoS attack against Blue Security, May 2, 2006
Botnets (cont’d)
Common uses of botnets:
–Denial-of-service attacks
–Adware
–Spyware
–Spam (template-based, image, etc.)
–Click fraud
–Access number replacement
–Fast flux (rapidly switching the IP addresses behind a DNS name)
SPAM!!
–Template-based spam
  Botnet fills in a template, essentially a regular expression (RE), to produce massive amounts of highly varied spam
  Harder to filter on content at first, because the message makeup varies
  –Requires defenders to collect ‘suspect’ spam in order to build an effective content-based filter
  Harder to filter on sender, because the list of sending hosts is massive
  –Requires defenders to rely on alternative methods to combat the botnet
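To make template-based spam concrete, here is a minimal Python sketch of how a bot might expand a template. The `{macro}` placeholder syntax, the `TEMPLATE` text, the `MACROS` dictionaries, and the `render` helper are all hypothetical; real botnet template languages use their own syntax.

```python
import random

# Hypothetical template: fixed literal text plus {macro} slots.
# Real botnet template languages differ; this syntax is illustrative only.
TEMPLATE = "Subject: {greeting}, {name}!\n\nVisit http://{domain}/ for {product} at {price}% off."

# Hypothetical macro dictionaries a controller might ship with the template.
MACROS = {
    "greeting": ["Hi", "Hello", "Hey"],
    "name": ["friend", "buddy", "customer"],
    "domain": ["cheap-meds.example", "pills-now.example"],
    "product": ["watches", "pills", "software"],
    "price": [str(p) for p in range(50, 91, 10)],
}


def render(template: str, macros: dict[str, list[str]], rng: random.Random) -> str:
    """Fill every {macro} slot with a randomly chosen dictionary entry."""
    choices = {slot: rng.choice(values) for slot, values in macros.items()}
    return template.format(**choices)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the output is repeatable
    for _ in range(3):
        print(render(TEMPLATE, MACROS, rng))
        print("---")
```

Every rendered message shares the literal text ("Visit http://", "% off.") while the macro slots vary; that shared skeleton is exactly what a template-based filter later exploits.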
SPAM!!
Preventive measures:
–Anti-virus software
–Passive OS fingerprinting
–Network-based approaches (null routing)
–Spam filtering
–Directed study
The last two are covered by this paper
Anti-spam!!
Basically two different approaches:
–Content-based: filtering based on established heuristics and learning algorithms that target specific message features
  Can be highly effective (especially against targeted botnets)
  Labor-intensive to maintain, since the basic technique can be countered by chaff and poisoning attacks
  Hard to keep the filter’s false positive rate low
  Blacklisting URLs can also be effective, but needs large, up-to-date whitelists to avoid poisoning
  –Does nothing if the spam doesn’t contain URLs
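The URL-blacklisting flavor of content-based filtering can be sketched in a few lines of Python. This is an illustrative sketch, not a production filter: `BLACKLIST`, `WHITELIST`, `hosts_in`, and `is_spam` are hypothetical names, and the URL regex is deliberately simple. The whitelist check shows why poisoning matters: if an attacker gets a legitimate domain reported, the whitelist keeps mail that links to it from being blocked.

```python
import re
from urllib.parse import urlparse

# Deliberately simple URL matcher: http(s) up to whitespace, a quote, or a bracket.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)

# Hypothetical lists. "example.com" in the blacklist simulates a poisoned report.
BLACKLIST = {"cheap-meds.example", "pills-now.example", "example.com"}
WHITELIST = {"example.com"}  # known-good domains that must never be blocked


def hosts_in(message: str) -> set[str]:
    """Return the (lowercased) hostnames of all http(s) URLs in the message."""
    hosts = set()
    for url in URL_RE.findall(message):
        host = urlparse(url).hostname  # None if the URL has no host part
        if host:
            hosts.add(host)
    return hosts


def is_spam(message: str) -> bool:
    """Flag the message if it links to a blacklisted host that is not whitelisted."""
    return any(h in BLACKLIST and h not in WHITELIST for h in hosts_in(message))


if __name__ == "__main__":
    print(is_spam("Buy now at http://cheap-meds.example/deal"))  # True
    print(is_spam("Docs: https://example.com/page"))             # False: whitelisted
    print(is_spam("Call 555-0100 to order"))                     # False: no URL at all
```

The last call illustrates the limitation on the slide: spam without URLs never triggers this kind of filter.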
Anti-Spam!! (cont’d)
–Sender-based: focuses on the spam delivery system
  Assumes a spam sender is likely to keep sending spam and unlikely to send legitimate messages
  Basically works by blacklisting offending senders after the fact
  Doesn’t work against the newest spam
  Botnets are an effective workaround, since the controller distributes the spam over a large number of hosts
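A toy Python simulation shows why after-the-fact sender blacklisting struggles against botnets. The `SenderBlacklist` class, the `botnet_campaign` helper, the private IP addresses, and the rule that every delivered message is reported immediately are all simplifying assumptions.

```python
class SenderBlacklist:
    """Toy sender-based filter: an IP is blocked once reported `threshold` times."""

    def __init__(self, threshold: int = 1) -> None:
        self.threshold = threshold
        self.reports: dict[str, int] = {}

    def report(self, ip: str) -> None:
        self.reports[ip] = self.reports.get(ip, 0) + 1

    def blocks(self, ip: str) -> bool:
        return self.reports.get(ip, 0) >= self.threshold


def botnet_campaign(num_bots: int, msgs_per_bot: int, blacklist: SenderBlacklist) -> int:
    """Return how many messages are delivered when each one is reported after delivery."""
    delivered = 0
    for bot in range(num_bots):
        ip = f"10.0.{bot // 256}.{bot % 256}"  # hypothetical private addresses
        for _ in range(msgs_per_bot):
            if blacklist.blocks(ip):
                continue
            delivered += 1
            blacklist.report(ip)  # the filter learns only after the fact
    return delivered


if __name__ == "__main__":
    print(botnet_campaign(1, 1000, SenderBlacklist()))  # one host, 1,000 messages
    print(botnet_campaign(1000, 1, SenderBlacklist()))  # 1,000 bots, one message each
```

With the default threshold, a single host sending 1,000 messages gets only one through before it is blocked, while 1,000 bots sending one message each get all 1,000 through: the same volume, spread over enough hosts, never trips the blacklist.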
Anti-Spam!! (cont’d)
Template-based spam filtering:
–Suspected botnet-generated spam is examined and deconstructed into a regular expression (RE)
–Works very well against static botnets, but requires many instances of suspected spam to deconstruct
–Useless once the controller changes the RE used by the bots
Regular Expressions
Regular Expressions (cont’d) Review:
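As a quick review, the following Python snippet uses the standard `re` module to hand-write a signature for template-like messages: literal text must match exactly, an alternation such as `(Hi|Hello|Hey)` covers a small set of choices, and character classes with quantifiers such as `[0-9]{2}` cover variable fields. The pattern and sample messages are illustrative only.

```python
import re

# Literal text is matched as-is ("\." escapes the dot); variable parts become
# alternations or character classes with quantifiers.
signature = re.compile(
    r"(Hi|Hello|Hey), [a-z]+!\n\n"
    r"Visit http://[a-z.-]+/ for [a-z]+ at [0-9]{2}% off\."
)

samples = [
    "Hi, friend!\n\nVisit http://cheap-meds.example/ for pills at 70% off.",
    "Hey, buddy!\n\nVisit http://pills-now.example/ for watches at 50% off.",
    "Hello, Bob!\n\nSee you at lunch tomorrow.",
]

for s in samples:
    print(bool(signature.fullmatch(s)))  # prints True, True, False
```

`fullmatch` requires the whole message to fit the pattern, which keeps a signature from firing on a legitimate message that merely contains a similar fragment.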
JUDO!!
Generates regular expression signatures to thwart spam
Operates by examining the output of quarantined bots
Uses a template inference algorithm to generate a set of signatures matching all previously seen messages
JUDO!! (cont’d)
1. Header filtering
2. Anchor identification
3. Macro classification
   –Dictionary
   –Micro-anchor
   –Noise
4. Special tokens
5. Signature update
   –Second chance
   –Pre-clustering
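Anchor identification and macro classification can be illustrated with a heavily simplified Python sketch. It is not the authors’ algorithm: it assumes every training message splits into the same number of whitespace-separated tokens (real Judo finds anchors as common substrings and has no such restriction), skips header filtering, micro-anchors, and special tokens, and uses a hypothetical `MAX_DICT_RATIO` threshold to choose between a dictionary macro and a noise macro. Token positions identical in every message become anchors; positions with few distinct values become an alternation (dictionary); the rest become a character class bounded by the observed lengths (noise).

```python
import re
import string

# Hypothetical threshold: a non-anchor position whose distinct values number at
# most this fraction of the training messages is treated as a dictionary macro.
MAX_DICT_RATIO = 0.5


def char_class(values: list[str]) -> str:
    """Build a character class covering every character seen in the values."""
    chars = set("".join(values))
    parts = []
    for alphabet, span in ((string.ascii_lowercase, "a-z"),
                           (string.ascii_uppercase, "A-Z"),
                           (string.digits, "0-9")):
        if chars & set(alphabet):
            parts.append(span)
            chars -= set(alphabet)
    parts.extend(re.escape(c) for c in sorted(chars))  # leftover punctuation etc.
    return "[" + "".join(parts) + "]"


def infer_signature(messages: list[str]) -> re.Pattern:
    """Toy template inference over equal-length token sequences."""
    token_lists = [m.split() for m in messages]
    if len({len(tokens) for tokens in token_lists}) != 1:
        raise ValueError("this sketch needs non-empty input with equal token counts")
    pieces = []
    for column in zip(*token_lists):
        distinct = set(column)
        if len(distinct) == 1:
            # Anchor: identical in every message, matched literally.
            pieces.append(re.escape(column[0]))
        elif len(distinct) <= MAX_DICT_RATIO * len(column):
            # Dictionary macro: a few repeating values, matched as an alternation.
            pieces.append("(?:" + "|".join(re.escape(v) for v in sorted(distinct)) + ")")
        else:
            # Noise macro: mostly unique values, matched by class and length range.
            lengths = [len(v) for v in column]
            pieces.append(f"{char_class(list(column))}{{{min(lengths)},{max(lengths)}}}")
    return re.compile(r"\s+".join(pieces))


if __name__ == "__main__":
    train = [
        f"{greeting} friend visit {host}.example now"
        for greeting, host in [("Hi", "qzx"), ("Hello", "abcd"), ("Hi", "mmmm"),
                               ("Hello", "pq"), ("Hi", "zzzzz"), ("Hello", "rst")]
    ]
    sig = infer_signature(train)
    print(sig.pattern)
    print(bool(sig.fullmatch("Hi friend visit xyzw.example now")))   # True
    print(bool(sig.fullmatch("Hey friend visit abcd.example now")))  # False
```

On this sample the inferred pattern is `(?:Hello|Hi)\s+friend\s+visit\s+[a-z\.]{10,13}\s+now`: the greeting repeats, so it becomes a dictionary, while the host names are all different, so they become bounded noise.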
Judo - Second Chance Mechanism
Used to mitigate the effects of a small training buffer
If a new message fails to match an existing signature:
–It is re-checked against that signature’s anchors only
–If it matches, the signature is updated to cover the message
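A minimal Python sketch of the second chance idea follows. The `Signature` class and `classify` function are hypothetical stand-ins: the strict form of a signature bounds the text between anchors by the lengths seen in training, and the lax form only requires the anchors to appear in order. A message that fails the strict check but passes the lax one is added to the training set, which widens the strict signature. Real Judo signatures carry much more structure (dictionaries, character classes).

```python
import re
from dataclasses import dataclass, field


@dataclass
class Signature:
    """Hypothetical signature: ordered literal anchors plus its training messages."""

    anchors: list[str]
    training: list[str] = field(default_factory=list)

    def anchor_regex(self) -> re.Pattern:
        # Lax form: anchors in order, anything (including nothing) in between.
        gap = "(.*?)"
        return re.compile(gap + gap.join(map(re.escape, self.anchors)) + gap, re.DOTALL)

    def full_regex(self) -> re.Pattern:
        # Strict form: each gap bounded by the shortest and longest gap seen in
        # training. Assumes every training message contains the anchors in order.
        gaps = [self.anchor_regex().fullmatch(m).groups() for m in self.training]
        bounds = [f".{{{min(map(len, col))},{max(map(len, col))}}}" for col in zip(*gaps)]
        pattern = bounds[0] + "".join(re.escape(a) + b for a, b in zip(self.anchors, bounds[1:]))
        return re.compile(pattern, re.DOTALL)


def classify(sig: Signature, message: str) -> str:
    """Try the strict signature; on failure, give the message a second chance."""
    if sig.full_regex().fullmatch(message):
        return "match"
    if sig.anchor_regex().fullmatch(message):
        sig.training.append(message)  # update: the widened signature now covers it
        return "second-chance"
    return "no-match"


if __name__ == "__main__":
    sig = Signature(anchors=["Visit http://", ".example/ now"],
                    training=["Hi! Visit http://ab.example/ now",
                              "Hey! Visit http://abc.example/ now"])
    msg = "Hello! Visit http://abcdef.example/ now"
    print(classify(sig, msg))  # second-chance: gaps too long, but anchors match
    print(classify(sig, msg))  # match: the updated signature now accepts it
```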
Judo - Pre-clustering
Used to mitigate the effects of overly large training buffers (which may mix messages from several templates)
–Skeleton signatures are used to sort incoming messages before running Judo on them
–Similar to the second chance mechanism, but with a larger allowable anchor size
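Pre-clustering can be illustrated with a toy greedy clustering in Python. This is an assumption-laden stand-in, not Judo’s procedure: a "skeleton" here is simply the set of tokens every member of a cluster shares, and a message joins the first cluster whose skeleton it mostly contains, using a hypothetical `THRESHOLD` of 0.6. Each resulting cluster would then be handed to signature generation separately, so one template’s messages do not contaminate another’s signature.

```python
# Hypothetical: share of a skeleton's tokens a message must contain to join it.
THRESHOLD = 0.6


def pre_cluster(messages: list[str]) -> list[list[str]]:
    """Greedily sort messages into clusters keyed by shrinking token skeletons."""
    skeletons: list[set[str]] = []
    clusters: list[list[str]] = []
    for msg in messages:
        tokens = set(msg.split())
        for skeleton, cluster in zip(skeletons, clusters):
            if skeleton and len(skeleton & tokens) / len(skeleton) >= THRESHOLD:
                cluster.append(msg)
                skeleton &= tokens  # in place: keep only tokens every member shares
                break
        else:
            # No skeleton fits: this message seeds a new cluster.
            skeletons.append(tokens)
            clusters.append([msg])
    return clusters


if __name__ == "__main__":
    mixed = [
        "Cheap blue pills shipped overnight to your door today",
        "Your account at BankOne needs verification click here now",
        "Cheap red pills shipped overnight to your door today",
        "Your account at BankTwo needs verification click here now",
    ]
    for cluster in pre_cluster(mixed):
        print(len(cluster), cluster[0])
```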
Experimental Results
Requirements of a good spam filter:
–Safe: does not classify legitimate mail as spam
  Low false positive rate
–Effective: correctly identifies the targeted class of spam
  Low false negative rate
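The two requirements map onto two standard rates. A small Python helper (the name `rates` is hypothetical) makes the definitions explicit:

```python
def rates(predictions: list[bool], is_spam: list[bool]) -> tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate).

    False positive rate: share of legitimate messages flagged as spam (safety).
    False negative rate: share of spam messages that were not flagged (effectiveness).
    """
    fp = sum(p and not s for p, s in zip(predictions, is_spam))
    fn = sum(s and not p for p, s in zip(predictions, is_spam))
    legit = sum(not s for s in is_spam)
    spam = sum(is_spam)
    return (fp / legit if legit else 0.0, fn / spam if spam else 0.0)


if __name__ == "__main__":
    print(rates([True, True, False, False], [True, False, True, False]))  # (0.5, 0.5)
```

For scale, 1 false positive per 12,500 legitimate messages is a rate of 1/12,500 = 0.00008, or 0.008%.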
Experimental Results (cont’d)
Testing: 4 tiers
–Signature safety
  Signatures from the 3 other tiers were run against legitimate mail corpora to assess the false positive rate
  To prevent age bias, the signatures were tested only on the subject and body of the corpus messages
Experimental Results (cont’d)
–Controlled single-template inference
  Generated 5,000 spam instances per template from a ‘Storm’ bot, using templates obtained through reverse engineering
  –1,000 for signature generation
  –4,000 for testing the false negative rate
  –Done for each of 10,676 templates (10,676 × 5,000 = 53,380,000 messages)
  Results:
  Also, with k = 1,000 training messages, the false positive rate was 0% for all signatures
Experimental Results (cont’d)
–Controlled multi-template inference
  Spam used for testing was generated during the Botlab project at the University of Washington
  4 bots used: 1 each from the Mega-D, Pushdo, Rustock, and Srizbi botnets
  The first million messages from each were split into training and testing sets, then Judo was run chronologically on each test message
  –A message counts as a true match if it is matched by a signature generated from earlier test messages
  –Otherwise it is counted as a false negative
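The chronological evaluation can be sketched as a replay loop in Python. Only the loop structure follows the slide; the rest is assumption: `common_token_signature` is a deliberately crude, hypothetical stand-in for Judo (a message matches if it contains every token shared by the training buffer), and a new signature is trained whenever `k` unmatched messages have accumulated.

```python
from typing import Callable, Iterable

Matcher = Callable[[str], bool]


def common_token_signature(buffer: list[str]) -> Matcher:
    """Crude stand-in for Judo: match messages containing every token the buffer shares."""
    shared = set.intersection(*(set(m.split()) for m in buffer))
    # An empty token set would match everything, so treat it as "matches nothing".
    return lambda msg: bool(shared) and shared <= set(msg.split())


def evaluate(stream: Iterable[str], k: int,
             make_signature: Callable[[list[str]], Matcher] = common_token_signature
             ) -> tuple[int, int]:
    """Replay messages in arrival order and return (true_matches, false_negatives)."""
    signatures: list[Matcher] = []
    buffer: list[str] = []
    hits = misses = 0
    for msg in stream:
        if any(sig(msg) for sig in signatures):
            hits += 1      # caught by a signature learned from earlier messages only
            continue
        misses += 1        # nothing learned so far catches it: a false negative
        buffer.append(msg)
        if len(buffer) == k:
            signatures.append(make_signature(buffer))
            buffer = []    # start collecting for the next signature
    return hits, misses


if __name__ == "__main__":
    stream = [f"Buy item{i} now at shop" for i in range(10)]
    print(evaluate(stream, k=3))  # the first k messages are unavoidable misses
```

Because every message is scored only against signatures built from earlier messages, the first `k` messages of each new template are necessarily counted as false negatives, which is why training size matters in the results.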
Experimental Results (cont’d)
Results: the only false positives came from the Rustock bot tests
Experimental Results (cont’d)
–Real-world deployment: 2 × Xarvester + 2 × Mega-D + 4 × Rustock + 6 × Gheg = 14 bots
  Messages generated:
  Ran the test as in the multi-template runs
Experimental Results (cont’d)
Results:
–Worst case: Rustock was again the only source of false positives, at 1 in 12,500 messages
–All other bots produced zero false positives against the corpora
Experimental Results (cont’d)
Efficiency: since the goal of the project was an accurate RE generator, efficiency wasn’t a priority
–Initial RE generation with a buffer of 50 messages, each about 6,000 characters long, takes about 2 s on an average 2009-era desktop
–Signature updates take on the order of milliseconds
Response Time
Depends on the rate at which the bot(s) emit the spam
May be complicated by the presence of multiple bots or templates
Bots used in this experiment generated > 100 spam messages per minute
–Since acceptable results are obtained with k ≥ 500, generating a working signature should take only a few minutes
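The response-time estimate is simple arithmetic. The Python sketch below (hypothetical helper `minutes_to_signature`) assumes all bots run the same campaign at the same rate, that messages are spread evenly over templates, and that only messages from one template count toward that template’s k training messages.

```python
def minutes_to_signature(k: int, msgs_per_minute: float,
                         bots: int = 1, templates: int = 1) -> float:
    """Minutes needed to collect k training messages for a single template."""
    per_template_rate = msgs_per_minute * bots / templates  # messages/min per template
    return k / per_template_rate


if __name__ == "__main__":
    print(minutes_to_signature(500, 100))               # one bot, one template
    print(minutes_to_signature(500, 100, bots=4))       # more bots: faster
    print(minutes_to_signature(500, 100, templates=2))  # more templates: slower
```

At k = 500 and 100 messages per minute from one bot, that is 500 / 100 = 5 minutes; four such bots would cut it to 1.25 minutes, while two evenly interleaved templates would double it to 10 minutes.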
Overview
‘Judo’ is basically a learning spam filter
–Content-based
–Requires training to produce effective signatures
–Safe and effective (both greater than 99.75%)
Controlled tests show exceptional results
Simulated real-world tests show promise, but Judo could be evaded by bots that generate new templates at random
Any Questions?