BOTNET JUDO : Fighting Spam with Itself

Andreas Pitsillidis, Kirill Levchenko, Christian Kreibich, Chris Kanich, Geoffrey M. Voelker, Vern Paxson, Nicholas Weaver, Stefan Savage
Presented by: Mohan Krishna Karanam

Agenda
Introduction: What is spam?
Background and Related Work
Current Antispam Approaches
Spamming Botnets
Template-based Spam
Signature Generator
Evaluation
Conclusion

What is spam? Terminology: Unsolicited Commercial Email (UCE), Unsolicited Bulk Email (UBE)
Examples


Introduction: How do spammers operate?
They exploit small time advantages to deliver large overall gains. How does the receiver block spam? By installing filters that block spam based on pattern detection (signatures). How do the filters work? The key questions are how much you know about your opponent's next move and how quickly you can act on it. It is all about actions and counter-reactions.

Introduction: What is a botnet? Components of a botnet:
Command and control, fast-flux DNS, zombie computers. Botnet: a number of Internet-connected computers that communicate with other similar machines, coordinating their actions through command and control or by passing messages to one another. Botnets have often been used to send spam or to participate in distributed denial-of-service attacks. The word botnet is a combination of the words robot and network. The term usually carries a negative or malicious connotation, although botnets can be legal or illegal (en.wikipedia.org/wiki/Botnet).

Introduction: Types of attacks using a botnet:
Distributed denial-of-service attacks, spyware, spam, click fraud, Bitcoin mining

E-mail Spam: Target: mail servers. Bots generate mail in bulk.
Messages contain attractive content that misleads the user. Spam is generated from a template, and templates are designed so that the messages derived from them are not generic, allowing the spam to evade spam filters. Large numbers of varied templates are generated, and detection becomes difficult because it requires manual effort. A content filter cannot determine whether a message is spam because the content is not generic, so both sender-based and content-based filters have difficulty detecting it. Alternative methods are required to combat botnets.

Antispam Approaches: content-based, sender-based, template-based
Popular preventive measures include antivirus software, passive OS fingerprinting, network-based approaches (null routing), and spam filtering. A template-based filter is derived from the templates that spammers follow and can potentially identify spam. However, it can require significant manual effort to reverse engineer each unique protocol. This motivates the black-box approach.

Content-based filtering: the oldest and best-known filtering technique.
Filters are based on the textual content of a message, which may include the message body, anomalous headers, and undesirable messages. Such filters have transitioned from manual configuration to systems based on supervised learning.
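To illustrate what a supervised content-based filter does, here is a minimal multinomial naive Bayes classifier in Python. It is a generic textbook technique, not part of Judo; the class and method names (`NaiveBayesFilter`, `train`, `is_spam`) and the whitespace tokenizer are hypothetical choices for this sketch.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive lowercase whitespace tokenizer (an assumption of this sketch)."""
    return text.lower().split()


class NaiveBayesFilter:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    Train on at least one message of each label before classifying.
    """

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        """Record one labeled example; label is "spam" or "ham"."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, tokens, label):
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        label_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)  # log prior
        for tok in tokens:
            # Smoothed per-word likelihood, summed in log space to avoid underflow.
            count = self.word_counts[label][tok]
            score += math.log((count + 1) / (label_words + vocab_size))
        return score

    def is_spam(self, text):
        tokens = tokenize(text)
        return self._log_score(tokens, "spam") > self._log_score(tokens, "ham")
```

The learned word statistics replace hand-written rules, which is the shift from manual configuration to supervised learning; template-generated spam with non-generic content is exactly what makes such statistics harder to learn.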

Sender-based filtering: focuses on the means by which spam is delivered.
Any IP address delivering spam is highly likely to repeat this act and unlikely to send any legitimate communication. Blacklists store the IP addresses of Internet hosts that send spam. Disadvantages: scalability, and the growing number of sending hosts.
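A common way to consult a sender blacklist is a DNS-based blacklist (DNSBL), where the client reverses the IPv4 octets and appends the list's zone before issuing a DNS query. The sketch below only builds that query name and checks a local in-memory list; `dnsbl.example.org` and the helper names are placeholders, and no real DNS lookup is performed.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the DNS name used to look up an IPv4 address in a DNSBL.

    By DNSBL convention the octets are reversed and the list's zone is
    appended; an A-record answer for that name means "listed".
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


class LocalBlacklist:
    """In-memory stand-in for a blacklist, keyed by IP network."""

    def __init__(self, networks):
        self.networks = [ipaddress.ip_network(n) for n in networks]

    def is_listed(self, ip):
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in self.networks)
```

For example, a host at 192.0.2.99 would be looked up as `99.2.0.192.dnsbl.example.org`. Every newly infected host needs its own entry before it is blocked, which is where the scalability problem comes from.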

BOTNET Judo: Black-Box Approach
Individual botnet instances are executed in a controlled environment, and the spam messages they attempt to send are analyzed online. Goals: produce a comprehensive regular expression that captures all messages generated from a template while avoiding extraneous matches; produce zero false positives against non-spam mail; match all spam based on the same template.
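The two requirements (match every message from the template, match no legitimate mail) can be measured directly. This sketch computes false negative and false positive rates for one candidate regex; requiring a full-message match is an assumption of the sketch, and the function name is hypothetical.

```python
import re


def evaluate_signature(pattern, spam_msgs, ham_msgs):
    """Return (false_negative_rate, false_positive_rate) for one signature.

    In this sketch a message counts as matched only if the whole message
    matches, so a signature cannot fire on a fragment of unrelated mail.
    """
    sig = re.compile(pattern, re.DOTALL)  # let "." span line breaks in bodies
    missed = sum(1 for m in spam_msgs if not sig.fullmatch(m))  # false negatives
    flagged = sum(1 for m in ham_msgs if sig.fullmatch(m))      # false positives
    fn_rate = missed / len(spam_msgs) if spam_msgs else 0.0
    fp_rate = flagged / len(ham_msgs) if ham_msgs else 0.0
    return fn_rate, fp_rate
```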

Background work: a general approach for inferring generation templates (e.g., mail header anomalies, subject lines, URLs); a concrete algorithm that generates an initial high-quality regular expression and can update it in a fraction of a second if needed; an empirical test against live botnet spam that demonstrates its effectiveness and identifies the requirements for practical use.

Template-based Spam: uses macros to personalize each message
and to avoid spam filtering based on message content. The figure shows a regex signature generated from 1,000 message instances.

Template Elements: Macros are of two types:
Noise macros: used to randomize strings for which there are few semantic rules. Dictionary macros: used for content that must be semantically appropriate in context.
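To make the two macro types concrete, the following Python sketch expands a spam template. The `{noise:N}` and `{dict:NAME}` syntax, the dictionaries, and the function names are invented for illustration and do not reproduce any real botnet's template language.

```python
import random
import re
import string

# Hypothetical macro syntax for illustration only (not any real botnet's):
#   {noise:N}   -> N random lowercase letters/digits (noise macro)
#   {dict:NAME} -> one entry from the dictionary NAME (dictionary macro)
MACRO = re.compile(r"\{(noise|dict):(\w+)\}")

DICTIONARIES = {
    "greeting": ["Hi", "Hello", "Dear friend"],
    "product": ["watches", "pills", "software"],
}


def instantiate(template, dictionaries, rng):
    """Expand every macro in `template` to produce one spam instance."""
    def expand(match):
        kind, arg = match.group(1), match.group(2)
        if kind == "noise":
            alphabet = string.ascii_lowercase + string.digits
            return "".join(rng.choice(alphabet) for _ in range(int(arg)))
        return rng.choice(dictionaries[arg])  # semantically valid choice
    return MACRO.sub(expand, template)


TEMPLATE = "{dict:greeting}, cheap {dict:product} at http://{noise:6}.example.com/"
rng = random.Random(1)  # seeded so runs are reproducible
for _ in range(3):
    print(instantiate(TEMPLATE, DICTIONARIES, rng))
```

Each printed line differs in wording and URL, yet all come from one template, which is why a filter keyed to a single message's exact content misses its siblings.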

Real-Time Filtering: Macros can be converted to regular expressions.
Noise macros become repeated character classes, and dictionary macros become a disjunction of the dictionary elements. The resulting regexes match all spam generated from the template. However, obtaining templates is highly time-consuming because it involves reverse engineering the botnet's command-and-control panel. This is overcome by using the template inference algorithm.
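The conversion can be sketched as follows, using a hypothetical `{noise:N}` / `{dict:NAME}` macro syntax: literal text is escaped, noise macros become a character class with a fixed repeat count, and dictionary macros become an alternation. The fixed `[a-z0-9]` class is an assumption; a real noise macro may draw from a different alphabet.

```python
import re

# Hypothetical macro syntax: {noise:N} and {dict:NAME}.
MACRO = re.compile(r"\{(noise|dict):(\w+)\}")


def template_to_regex(template, dictionaries, noise_class="[a-z0-9]"):
    """Translate a template into a regex matching every instance of it."""
    parts, pos = [], 0
    for m in MACRO.finditer(template):
        parts.append(re.escape(template[pos:m.start()]))  # literal text
        kind, arg = m.group(1), m.group(2)
        if kind == "noise":
            # Noise macro -> repeated character class, e.g. [a-z0-9]{6}
            parts.append(f"{noise_class}{{{int(arg)}}}")
        else:
            # Dictionary macro -> disjunction of its entries (longest first)
            words = sorted(dictionaries[arg], key=len, reverse=True)
            parts.append("(?:" + "|".join(re.escape(w) for w in words) + ")")
        pos = m.end()
    parts.append(re.escape(template[pos:]))
    return "".join(parts)
```

Given the template, this direct translation is exact; the hard part, addressed by template inference, is that the filter never sees the template itself.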

Architecture of the Judo System
The Judo filtering system has three components: a bot farm, a signature generator, and a spam filter.

System Assumptions: Our proposed spam filtering system operates on a number of assumptions. First, we assume that bots compose spam using a template system as described. Second, we rely on spam campaigns employing a small number of templates at any point in time. Finally, the system assumes that the first few messages of a new template do not appear intermixed with messages from other new templates.

Assumptions for the Judo System
Spam is composed by bots using templates. The experiment relies on spam campaigns employing a small number of templates at any point in time. The Judo system assumes that the first few messages of a new template do not appear intermixed with messages from other new templates.

The Signature Generator
Template inference: anchors; macros (dictionary macros, micro-anchors, noise macros). Leveraging domain knowledge: header filtering, special tokens. Signature update: second-chance mechanism, pre-clustering. Execution time.
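As a rough illustration of anchors, dictionary macros, and noise macros, the sketch below infers a regex from a handful of messages. It is a deliberate simplification, not Judo's inference algorithm: it aligns space-separated tokens by position (so all messages must have the same token count), treats tokens identical across messages as anchors, turns small sets of repeated values into dictionaries, and generalizes everything else to a character class with observed length bounds. The `max_dict_size` threshold and helper names are hypothetical.

```python
import re


def char_class(values):
    """Pick a narrow character class covering every observed value."""
    chars = set("".join(values))
    if chars <= set("0123456789"):
        return "[0-9]"
    if chars <= set("abcdefghijklmnopqrstuvwxyz"):
        return "[a-z]"
    if chars <= set("abcdefghijklmnopqrstuvwxyz0123456789"):
        return "[a-z0-9]"
    return r"\S"  # tokens never contain the separating space


def infer_signature(messages, max_dict_size=5):
    """Infer a regex from messages assumed to share one template."""
    rows = [m.split(" ") for m in messages]
    if len({len(r) for r in rows}) != 1:
        raise ValueError("this sketch requires equal token counts")
    parts = []
    for column in zip(*rows):
        distinct = set(column)
        if len(distinct) == 1:
            # Anchor: identical in every message.
            parts.append(re.escape(column[0]))
        elif len(distinct) <= max_dict_size and len(distinct) < len(column):
            # Dictionary macro: few values, and values repeat.
            parts.append("(?:" + "|".join(re.escape(v) for v in sorted(distinct)) + ")")
        else:
            # Noise macro: character class bounded by observed lengths.
            lengths = [len(v) for v in column]
            parts.append(f"{char_class(column)}{{{min(lengths)},{max(lengths)}}}")
    return " ".join(parts)


sample = [
    "Hi bob buy pills now code 4821",
    "Hello amy buy watches now code 9377",
    "Hi tom buy pills now code 1052",
    "Hello kim buy watches now code 6640",
    "Hi joe buy pills now code 3318",
]
print(infer_signature(sample))
# prints (?:Hello|Hi) [a-z]{3,3} buy (?:pills|watches) now code [0-9]{4,4}
```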

Signature Update: A set of signatures, rather than a single signature, is needed because several templates may be in use at the same time. Training buffer: there is a trade-off between signature selectivity and training time. Two additional mechanisms handle extreme cases: the second-chance mechanism and pre-clustering.
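A minimal sketch of keeping a set of signatures alongside a training buffer of size k. The class is hypothetical, and template inference is passed in as a function so that any generator can be plugged in; the toy generator below, which escapes the messages' common prefix and appends `.*`, exists only to make the example run and is far too permissive for real use.

```python
import os
import re


class SignatureSet:
    """Hypothetical signature store with a training buffer of size k.

    `generate` is any template-inference function that takes a list of
    messages and returns a regex string.
    """

    def __init__(self, k, generate):
        self.k = k
        self.generate = generate
        self.signatures = []  # several may be active at once
        self.buffer = []      # bot messages not yet covered by a signature

    def matches(self, message):
        """Filter-side check used on incoming mail."""
        return any(sig.fullmatch(message) for sig in self.signatures)

    def observe(self, message):
        """Process one message from a bot; return True if already covered."""
        if self.matches(message):
            return True
        self.buffer.append(message)
        if len(self.buffer) >= self.k:
            self.signatures.append(re.compile(self.generate(self.buffer)))
            self.buffer = []
        return False


def toy_generate(messages):
    """Demo-only generator: common prefix, then anything."""
    return re.escape(os.path.commonprefix(messages)) + ".*"


sigs = SignatureSet(k=3, generate=toy_generate)
for msg in ["Buy pills 1", "Buy pills 22", "Buy pills 333"]:
    sigs.observe(msg)
print(sigs.matches("Buy pills 4444"))  # True
```

A smaller k deploys a signature sooner but from fewer examples, which is the selectivity versus training-time trade-off.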

Second-Chance Mechanism
Used to mitigate the effects of a small training buffer. It was developed to combine the advantage of fast signature deployment with the eventual benefits of dictionaries. If a message fails to match an existing signature, it is re-checked against the corresponding signatures consisting only of anchors. If it matches, the signature is updated. The update is performed incrementally, without rerunning the inference algorithm. Pre-Clustering: a large training buffer may intermix messages from different templates, resulting in an amalgamated signature. Unclassified messages are therefore clustered using skeleton signatures. A skeleton signature is akin to an anchor signature but is built with a larger minimum anchor size, which makes it more permissive. Pre-clustering mitigates the effects of overly large training buffers (potentially mixed REs): skeleton signatures are used to sort incoming messages before Judo is run on them. This is similar to the second-chance mechanism, but with a larger minimum anchor size.
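The second-chance check can be sketched by modeling a signature as a sequence of anchors and dictionary macros. When the full regex fails, the anchor-only regex (each dictionary relaxed to a capture group) is tried; on a match, the captured words are added to the dictionaries in place, so no new inference run is needed. The `TemplateSignature` class and its part representation are assumptions of this sketch, and noise macros are omitted for brevity.

```python
import re


class TemplateSignature:
    """Signature made of anchors and dictionary macros (simplified model).

    `parts` is a list of ("anchor", text) or ("dict", set_of_words) tuples.
    """

    def __init__(self, parts):
        self.parts = parts

    def full_regex(self):
        out = []
        for kind, value in self.parts:
            if kind == "anchor":
                out.append(re.escape(value))
            else:
                out.append("(?:" + "|".join(re.escape(w) for w in sorted(value)) + ")")
        return "".join(out)

    def anchor_regex(self):
        # Anchors only: each dictionary macro is relaxed to a capture group.
        return "".join(
            re.escape(value) if kind == "anchor" else r"(.+?)"
            for kind, value in self.parts
        )

    def match(self, message):
        """Full match, or a second chance via anchors that updates dictionaries."""
        if re.fullmatch(self.full_regex(), message):
            return True
        m = re.fullmatch(self.anchor_regex(), message)
        if not m:
            return False
        # Second chance succeeded: add the newly seen words incrementally.
        groups = iter(m.groups())
        for kind, value in self.parts:
            if kind == "dict":
                value.add(next(groups))
        return True


sig = TemplateSignature([("anchor", "Buy "), ("dict", {"pills", "watches"}), ("anchor", " now!")])
print(sig.match("Buy software now!"))  # True (via second chance)
print(sorted(sig.parts[1][1]))         # ['pills', 'software', 'watches']
```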

Pre-Clustering: Large training buffers can intermix messages from different templates, resulting in an amalgamated signature. Pre-clustering mitigates the effects of a large training buffer by means of skeleton signatures. If a message fails to match a full signature, an attempt is made to assign it to a training buffer using a skeleton signature. Once the buffer is full, a skeleton regex is generated. Several skeleton regexes are then combined to form a full signature ready for use.

Evaluation and Methodology
Signature safety, testing methodology, single-template inference, multiple-template inference, real-world deployment, false positives, response time, other content-based approaches

Single Template Inference Results
The template inference algorithm is the heart of the Judo system. We generated 5,000 spam instances from a Storm bot using templates obtained through reverse engineering. A 0% false positive rate was achieved at k = 1000, where k is the size of the training buffer.

Multiple Template Inference Results
The false negative rate decreases as the classification delay d increases. The false negative rate also increases as k increases, which may seem counterintuitive, since in the previous experiment increasing k decreased the false negative rate.

DYNAMIC BEHAVIOUR

Real-World Deployment Results
Worst case: Rustock is the only source of false positives, at 1 in 12,500 messages (0.008%). All others produced zero false positives in the corpora.

Response Time: Running the algorithm takes under 10 seconds in almost all cases; the remaining time is spent building up the required training sets. The deployed Rustock bots sent 932,474 messages in 20 hours, a spamming rate of 194 messages per minute per bot.
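As a quick consistency check, assuming the 932,474 messages were sent over the full 20 hours and every bot spammed at the same rate, the stated figures imply roughly four concurrently active bots:

```latex
\frac{932{,}474 \text{ messages}}{20 \text{ h} \times 60 \text{ min/h}} \approx 777 \text{ messages/min},
\qquad
\frac{777 \text{ messages/min}}{194 \text{ messages/min per bot}} \approx 4 \text{ bots}
```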

Conclusion: Judo is an attractive system that generates highly specialized regexes. Judo also demonstrates that it is practical to generate high-quality spam content signatures by observing the output of bot instances.

Queries and Discussion

THANK YOU
