Published by Darren Hood. Modified over 9 years ago.
1
Spamming Botnets: Signatures and Characteristics
Yinglian Xie, Fang Yu, Kannan Achan, Rina Panigrahy (Microsoft Research, Silicon Valley)
Geoff Hulten, Ivan Osipkov (Microsoft Corporation)
ACM SIGCOMM 2008
2
OUTLINE
1. INTRODUCTION
2. AUTORE
3. AUTOMATIC URL REGULAR EXPRESSION GENERATION
4. DATASETS AND RESULTS
5. BOTNET VALIDATION
6. UNDERSTANDING SPAMMING BOTNET CHARACTERISTICS
7. DISCUSSION
8. CONCLUSION
3
1. INTRODUCTION
Botnets have been widely used for sending spam emails.
AutoRE: a framework that identifies botnet hosts by generating botnet spam signatures from emails.
Deriving URL signatures automatically is challenging.
4
1. INTRODUCTION (cont.)
Signature types: complete URL based signatures, conjunction based signatures, and regular expression signatures.
Desirable features of AutoRE.
Spamming botnet characteristics and their activity trends.
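To make the three signature types concrete, the sketch below matches hypothetical URLs against each kind. The URLs, tokens, and pattern are invented for illustration, and the conjunction check here is unordered token containment; the slides do not specify AutoRE's exact matching semantics.

```python
import re

# Hypothetical URLs, not taken from the paper's data.
urls = [
    "http://example.com/abc123/index.html",
    "http://example.com/xyz789/index.html",
    "http://example.com/promo/page/index.html",
]

def match_complete_url(signature: str, url: str) -> bool:
    """Complete URL based signature: the whole URL must be identical."""
    return url == signature

def match_conjunction(tokens: list[str], url: str) -> bool:
    """Conjunction based signature: every token must occur somewhere in the URL."""
    return all(tok in url for tok in tokens)

def match_regex(pattern: str, url: str) -> bool:
    """Regular expression signature: the entire URL must match the pattern."""
    return re.fullmatch(pattern, url) is not None

cu_hits = [match_complete_url(urls[0], u) for u in urls]
conj_hits = [match_conjunction(["example.com", "index.html"], u) for u in urls]
re_hits = [match_regex(r"http://example\.com/[a-z0-9]{6}/index\.html", u) for u in urls]
```

The complete URL signature catches only one URL, the conjunction catches all three (including the unrelated "promo/page" URL), and the regular expression catches the two polymorphic URLs while rejecting the third, which illustrates why a regex can be both more general than a complete URL and more specific than a token conjunction.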
5
2. AUTORE
Key properties of botnet spam: bursty and distributed.
AutoRE consists of three modules:
- URL Pre-Processing
- URL Group Selection
- Signature Generation and Botnet Identification
6
2. AUTORE (cont.)
7
URL Pre-Processing
Extract information, such as the embedded URLs, from each email.
Discard all forwarded emails.
Group URLs from the same domain together.
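A minimal sketch of the pre-processing step, assuming each email is a dict with "subject" and "body" fields. Both the URL-extraction regex and the forwarded-mail test (an "Fw:"/"Fwd:" subject prefix) are simplifying assumptions, not AutoRE's actual rules.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Simplistic URL extractor (assumption): http(s) scheme up to whitespace/quotes/brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")

def is_forwarded(email: dict) -> bool:
    # Hypothetical heuristic: treat "Fw:"/"Fwd:" subject prefixes as forwarded mail.
    return email["subject"].lower().startswith(("fw:", "fwd:"))

def preprocess(emails: list[dict]) -> dict[str, list[tuple[int, str]]]:
    """Return {domain: [(email_index, url), ...]} for all non-forwarded emails."""
    groups: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for idx, email in enumerate(emails):
        if is_forwarded(email):
            continue  # forwarded emails are discarded
        for url in URL_RE.findall(email["body"]):
            domain = urlsplit(url).hostname
            if domain:
                groups[domain].append((idx, url))  # group URLs by domain
    return dict(groups)

emails = [
    {"subject": "Cheap meds", "body": "Visit http://pills.example/a1 now"},
    {"subject": "FW: Cheap meds", "body": "http://pills.example/b2"},
    {"subject": "Hi", "body": "See http://pills.example/c3 and http://news.example/x"},
]
groups = preprocess(emails)
```

The third email contains two domains, so it lands in two groups; this is the multi-group situation the next module has to resolve.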
8
URL Group Selection
Each email may be associated with multiple groups (one per URL domain it contains).
Groups are selected based on the bursty property.
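One way to resolve multi-group membership is sketched below. The burstiness measure (a group's send-time span, where a smaller span is burstier), the minimum group size, and the greedy "burstiest group wins" assignment are all hypothetical choices for illustration, not the paper's exact procedure.

```python
def select_groups(groups: dict[str, set[int]], send_times: list[float],
                  min_size: int = 2) -> dict[int, str]:
    """Assign each email index to at most one URL group (domain).

    Hypothetical burstiness measure: latest minus earliest send time (hours)
    of the group's emails; a smaller span means a burstier group.
    """
    def span(indices: set[int]) -> float:
        times = [send_times[i] for i in indices]
        return max(times) - min(times)

    # Only groups with enough emails are considered (min_size is an assumption).
    candidates = [d for d, idx in groups.items() if len(idx) >= min_size]
    # Burstiest first; ties broken by domain name for determinism.
    candidates.sort(key=lambda d: (span(groups[d]), d))
    assigned: dict[int, str] = {}
    for domain in candidates:
        for i in sorted(groups[domain]):
            assigned.setdefault(i, domain)  # keep the first (burstiest) group
    return assigned

send_times = [0.0, 0.5, 1.0, 40.0, 0.2]
groups = {"a.example": {0, 1, 4}, "b.example": {1, 2, 3}, "c.example": {2}}
assignment = select_groups(groups, send_times)
```

Email 1 belongs to both "a.example" (span 0.5 h) and "b.example" (span 39.5 h), so it is assigned to the burstier "a.example" group.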
9
Signature Generation and Botnet Identification
Two signature types: complete URL based signatures and regular expression signatures.
A signature must be distributed, bursty, and specific.
Such a signature characterizes the set of matching emails as botnet-based spam and the originating mail servers as botnet hosts.
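The three criteria can be read as a simple filter over the emails matching a candidate signature. In this sketch "distributed" is counted as distinct ASes, "bursty" as the send-time span, and "specific" as an entropy-reduction score; the `Match` record and all three threshold values are hypothetical placeholders, not values from the slides.

```python
from dataclasses import dataclass

@dataclass
class Match:
    sender_ip: str          # IP of the mail server that sent the matching email
    asn: int                # autonomous system of that IP
    send_time_hours: float  # sending time, in hours from some reference point

def classify(matches: list[Match], entropy_reduction: float,
             min_ases: int = 20, max_span_hours: float = 120.0,
             min_entropy_bits: float = 40.0) -> set[str]:
    """Return the botnet host IPs if the signature passes all three tests, else an empty set.

    All thresholds are illustrative assumptions.
    """
    distributed = len({m.asn for m in matches}) >= min_ases
    times = [m.send_time_hours for m in matches]
    bursty = bool(times) and max(times) - min(times) <= max_span_hours
    specific = entropy_reduction >= min_entropy_bits
    if distributed and bursty and specific:
        return {m.sender_ip for m in matches}  # originating servers flagged as botnet hosts
    return set()

matches = [Match(f"192.0.2.{i + 1}", 64512 + i, i * 0.5) for i in range(25)]
```

For example, `classify(matches, 60.0)` flags all 25 senders, while the same matches fail with too few ASes (`matches[:5]`) or too low an entropy reduction.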
10
3. AUTOMATIC URL REGULAR EXPRESSION GENERATION
Generation module:
- Signature Tree Construction
- Regular Expression Generation
- Signature Quality Evaluation
- Polymorphic URLs
11
Signature Tree Construction
Select as a keyword the most frequent substring that is both bursty and distributed.
The selected keywords form a keyword-based signature tree.
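A simplified sketch of building a keyword-based signature tree. Each record is a hypothetical (url, asn) pair; only the "distributed" constraint is checked (at least `min_ases` distinct ASes), burstiness is omitted for brevity, and masking chosen keywords with a placeholder character is an implementation assumption. Frequency ties are broken by preferring longer keywords.

```python
from collections import Counter

MASK = "\n"  # placeholder for already-chosen keywords (assumption: never occurs in URLs)

def substrings(s: str, min_len: int) -> set[str]:
    """All substrings of s with length >= min_len that do not cross a masked keyword."""
    return {s[i:j] for i in range(len(s)) for j in range(i + min_len, len(s) + 1)
            if MASK not in s[i:j]}

def best_keyword(records, min_len, min_ases):
    """Most frequent substring (by number of URLs) seen from >= min_ases distinct ASes."""
    count: Counter = Counter()
    ases: dict[str, set[int]] = {}
    for url, asn in records:
        for sub in substrings(url, min_len):
            count[sub] += 1
            ases.setdefault(sub, set()).add(asn)
    qualifying = [s for s in count if len(ases[s]) >= min_ases]
    if not qualifying:
        return None
    # Prefer higher frequency, then longer keywords; break remaining ties deterministically.
    return max(qualifying, key=lambda s: (count[s], len(s), s))

def build_tree(records, min_len=3, min_ases=2, depth=3):
    """Return a nested {keyword: subtree} dict."""
    tree = {}
    remaining = list(records)
    while remaining and depth > 0:
        kw = best_keyword(remaining, min_len, min_ases)
        if kw is None:
            break
        # URLs containing kw form a child node; kw is masked so children pick new keywords.
        covered = [(url.replace(kw, MASK), asn) for url, asn in remaining if kw in url]
        tree[kw] = build_tree(covered, min_len, min_ases, depth - 1)
        remaining = [(url, asn) for url, asn in remaining if kw not in url]
    return tree

records = [("s.biz/p/aaa11", 1), ("s.biz/p/aaa22", 2), ("s.biz/p/bbb33", 3), ("s.biz/p/bbb44", 4)]
tree = build_tree(records)
```

Here the shared prefix "s.biz/p/" becomes the root keyword, and its children split the URLs by the next distributed keyword ("aaa" or "bbb").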
12
Regular Expression Generation
Two operations: detailing and generalization.
Generalized segments take the form C{l1,l2}, where C is the character set, and l1 and l2 are the minimum and maximum substring lengths.
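The C{l1,l2} form can be illustrated by generalizing the varying part between two fixed keywords. In this sketch, the widening rule (any digit widens C to 0-9, any lowercase letter to a-z, and so on) and the example URLs are assumptions; the slides do not give AutoRE's exact character-class rules.

```python
import re
import string

def char_class(samples: list[str]) -> str:
    """Generalize the characters seen in samples into a character set C (assumed widening rule)."""
    chars = set("".join(samples))
    parts = []
    if chars & set(string.digits):
        parts.append("0-9")
    if chars & set(string.ascii_lowercase):
        parts.append("a-z")
    if chars & set(string.ascii_uppercase):
        parts.append("A-Z")
    # Any other characters are added individually, escaped for use inside [...].
    others = sorted(chars - set(string.ascii_letters + string.digits))
    parts.extend(re.escape(c) for c in others)
    return "[" + "".join(parts) + "]"

def generalize(samples: list[str]) -> str:
    """Turn the substrings observed at one position into a literal or C{l1,l2}."""
    if len(set(samples)) == 1:
        return re.escape(samples[0])  # invariant segment: keep it literal
    lengths = [len(s) for s in samples]
    return f"{char_class(samples)}{{{min(lengths)},{max(lengths)}}}"

# Hypothetical URLs sharing a prefix and suffix, with a varying middle part.
urls = ["http://x.example/ab12.html", "http://x.example/q9.html", "http://x.example/zz7k3.html"]
prefix, suffix = "http://x.example/", ".html"
middles = [u[len(prefix):-len(suffix)] for u in urls]
pattern = re.escape(prefix) + generalize(middles) + re.escape(suffix)
```

The middles "ab12", "q9", and "zz7k3" generalize to `[0-9a-z]{2,5}`, so the resulting pattern matches all three URLs but rejects an uppercase variant.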
13
Signature Quality Evaluation
Quality is measured by entropy reduction, which reflects the probability of a random string matching a signature: the larger the reduction, the less likely a random match.
AutoRE discards all signatures whose entropy reductions are smaller than a preset threshold.
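A rough way to estimate entropy reduction, assumed here rather than taken from the slides: treat URL characters as uniform over an alphabet of size A, so each literal character contributes log2(A) bits, and a C{l1,l2} segment contributes l1 * log2(A/|C|) bits minus log2(l2 - l1 + 1) for the length uncertainty. The match probability is then about 2^(-d). The alphabet size and the 20-bit threshold are hypothetical.

```python
import math
from dataclasses import dataclass

ALPHABET_SIZE = 64  # assumption: size of the URL character alphabet

@dataclass
class Literal:
    text: str

@dataclass
class CharRange:  # a C{l1,l2} segment
    class_size: int  # |C|
    l1: int
    l2: int

def entropy_reduction(segments, alphabet_size: int = ALPHABET_SIZE) -> float:
    """Approximate bits of entropy removed by the signature (simplified model)."""
    bits = 0.0
    for seg in segments:
        if isinstance(seg, Literal):
            bits += len(seg.text) * math.log2(alphabet_size)
        else:
            bits += seg.l1 * math.log2(alphabet_size / seg.class_size)
            bits -= math.log2(seg.l2 - seg.l1 + 1)  # penalty for variable length
    return bits

def match_probability(segments) -> float:
    """Approximate chance that a random string matches; capped at 1."""
    return min(1.0, 2 ** -entropy_reduction(segments))

def keep(segments, threshold_bits: float = 20.0) -> bool:
    """Discard signatures whose entropy reduction is below the (hypothetical) threshold."""
    return entropy_reduction(segments) >= threshold_bits

specific = [Literal("buy"), CharRange(16, 4, 6)]
loose = [CharRange(36, 1, 40)]
```

Under these assumptions the specific signature removes 18 + 8 - log2(3) ≈ 24.4 bits and is kept, while the loose `[0-9a-z]{1,40}`-style signature has a negative score and is discarded.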
14
4. DATASETS AND RESULTS
The dataset was collected in Nov 2006, Jun 2007, and Jul 2007, with a total of 5,382,460 sampled emails (sampling rate 1:25000).
Emails originating from blacklisted IPs, such as those published by Spamhaus, were excluded.
The emails' classification labels were ignored.
15
DATASETS AND RESULTS (cont.)
AutoRE identified 7,721 botnet-based spam campaigns, comprising 580,466 spam messages sent from 340,050 distinct botnet host IP addresses spanning 5,916 ASes.
16
DATASETS AND RESULTS (cont.)
Complete URL (CU) category: 70.3-79.6%
Regular expression (RE) category: 20.4-29.7%
17
DATASETS AND RESULTS (cont.)
Cumulative distribution of botnet size in terms of the number of distinct IPs involved.
18
DATASETS AND RESULTS (cont.)
Number of regular expression patterns before and after generalization.
19
DATASETS AND RESULTS (cont.)
Percentage of spam captured by AutoRE signatures.
20
5. BOTNET VALIDATION
21
Evaluation of Botnet URL Signatures
False positive rate
22
Evaluation of Botnet URL Signatures
Ability to detect future spam: the signatures derived in Nov 2006 and Jun 2007 were applied to the (sampled) emails collected in Jul 2007.
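The future-spam test amounts to compiling signatures derived from earlier data and measuring what fraction of later URLs they match. The signature and URLs below are hypothetical, and full-URL matching is an assumption about how signatures are applied.

```python
import re

def coverage(signatures: list[str], later_urls: list[str]) -> float:
    """Fraction of later URLs fully matched by at least one earlier signature."""
    compiled = [re.compile(s) for s in signatures]
    hits = sum(1 for u in later_urls if any(p.fullmatch(u) for p in compiled))
    return hits / len(later_urls) if later_urls else 0.0

# Hypothetical signature derived from an earlier month.
earlier_signatures = [r"http://x\.example/[0-9a-z]{2,5}\.html"]
later_urls = [
    "http://x.example/k2.html",
    "http://x.example/abc99.html",
    "http://x.example/zz.html",
    "http://y.example/k2.html",
]
rate = coverage(earlier_signatures, later_urls)
```

Three of the four later URLs match, so the earlier signature covers 75% of this toy sample.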
23
Evaluation of Botnet URL Signatures
Regular expressions vs. keyword conjunctions (e.g., token1.*token2.*token3).
24
Evaluation of Botnet URL Signatures
Domain-specific vs. domain-agnostic signatures.
25
Evaluation of Botnet IP Addresses
(a) The false positive rate over the total identified IPs (sessions).
(b) Total botnet-based spam volume.
26
Is Each Campaign a Group?
27
6. UNDERSTANDING SPAMMING BOTNET CHARACTERISTICS
- All Botnet Hosts: A General Perspective
- Per Campaign: An Individual Perspective
- Comparison of Different Campaigns
- Correlation with Scanning Traffic
28
All Botnet Hosts: A General Perspective
Distribution of botnet IP addresses.
29
All Botnet Hosts: A General Perspective
The number of ASes vs. the number of IPs for each spam campaign.
30
All Botnet Hosts: A General Perspective
More than 80% of campaigns have at least half of their hosts in dynamic IP ranges.
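The statistic above can be computed per campaign as the fraction of host IPs falling in dynamic ranges, then counting campaigns at or above one half. The dynamic ranges here are hypothetical documentation prefixes; the slides do not say which dynamic-IP list was used.

```python
import ipaddress

# Hypothetical dynamic IP ranges; a real analysis would use an actual dynamic-IP list.
DYNAMIC_RANGES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]

def dynamic_fraction(ips: list[str]) -> float:
    """Fraction of a campaign's host IPs that fall inside a dynamic range."""
    if not ips:
        return 0.0
    dynamic = sum(1 for ip in ips
                  if any(ipaddress.ip_address(ip) in net for net in DYNAMIC_RANGES))
    return dynamic / len(ips)

def share_mostly_dynamic(campaigns: dict[str, list[str]]) -> float:
    """Fraction of campaigns with at least half of their hosts in dynamic ranges."""
    if not campaigns:
        return 0.0
    mostly = sum(1 for ips in campaigns.values() if dynamic_fraction(ips) >= 0.5)
    return mostly / len(campaigns)

campaigns = {
    "c1": ["198.51.100.7", "203.0.113.9", "192.0.2.1"],  # 2 of 3 dynamic
    "c2": ["192.0.2.5", "192.0.2.6"],                     # 0 of 2 dynamic
    "c3": ["203.0.113.1", "192.0.2.8"],                   # 1 of 2 dynamic
}
share = share_mostly_dynamic(campaigns)
```

In this toy data, two of the three campaigns meet the "at least half dynamic" bar.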
31
All Botnet Hosts: A General Perspective
Spam sending patterns:
- Number of recipients per email
- Connections per second
- Frequency of nonexistent recipients
32
Per Campaign: An Individual Perspective
- Similarity of email properties
- Similarity of sending time
- Similarity of email sending behavior
33
Similarity of Email Properties
The email contents within a campaign are quite different, even though their target web pages are similar.
34
Similarity of Sending Time
50% of campaigns have a standard deviation (std) of sending time below 1.81 hours.
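The "50% of campaigns" figure is a median of per-campaign standard deviations. This sketch uses the population standard deviation of each campaign's sending times (an assumption; the slides do not say population vs. sample) on hypothetical data, and also reports the fraction of campaigns below the 1.81-hour mark.

```python
import statistics

def send_time_std(times_hours: list[float]) -> float:
    """Population standard deviation of one campaign's sending times (hours)."""
    return statistics.pstdev(times_hours)

# Hypothetical campaigns: sending times in hours.
campaigns = {"c1": [0.0, 1.0, 2.0], "c2": [10.0, 10.0, 10.0], "c3": [0.0, 6.0]}
stds = {name: send_time_std(t) for name, t in campaigns.items()}

# Median std across campaigns: half of the campaigns lie at or below this value.
median_std = statistics.median(list(stds.values()))
# Fraction of campaigns whose std is below 1.81 hours.
below = sum(1 for s in stds.values() if s < 1.81) / len(stds)
```

Here the per-campaign stds are about 0.82, 0, and 3 hours, so the median is about 0.82 hours and two of three campaigns fall below 1.81 hours.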
35
Similarity of Email Sending Behavior
36
Comparison of Different Campaigns
Botnets sharing a domain-agnostic signature barely overlap with each other.
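One way to quantify "barely overlap" is the Jaccard similarity of the botnets' host IP sets. The slides do not name the overlap metric, so Jaccard is an assumption here, and the IP sets are hypothetical.

```python
def ip_overlap(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: shared IPs divided by all distinct IPs of the two botnets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pairwise_overlaps(botnets: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Overlap for every unordered pair of botnets."""
    names = sorted(botnets)
    return {(x, y): ip_overlap(botnets[x], botnets[y])
            for i, x in enumerate(names) for y in names[i + 1:]}

botnets = {
    "b1": {"192.0.2.1", "192.0.2.2", "192.0.2.3"},
    "b2": {"192.0.2.3", "192.0.2.4", "192.0.2.5", "192.0.2.6"},
    "b3": {"192.0.2.7"},
}
overlaps = pairwise_overlaps(botnets)
```

Only b1 and b2 share a single host, giving an overlap of 1/6; the other pairs do not overlap at all.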
37
Correlation with Scanning Traffic
Botnet attacks have different phases.
38
7. DISCUSSION
AutoRE has the potential to work in real-time mode.
Spammers may attempt to craft emails to evade the AutoRE URL selection process.
Spammers may try to evade detection by using URLs with no patterns.
39
7. DISCUSSION (cont.)
AutoRE leverages the bursty and distributed features of botnet attacks for detection.
URL redirection techniques.
40
8. CONCLUSION
AutoRE generates regular expression signatures, which previously could only be written by human experts.
Low false positive rate.
Several important findings about botnets.
Botnets are evolving and becoming increasingly sophisticated.