
1 Spamming Botnets: Signatures and Characteristics.  Yinglian Xie, Fang Yu, Kannan Achan, Rina Panigrahy (Microsoft Research, Silicon Valley); Geoff Hulten, Ivan Osipkov (Microsoft Corporation).  ACM SIGCOMM 2008.


3 1. INTRODUCTION  Botnets are widely used for sending spam emails.  AutoRE identifies botnet hosts by generating botnet spam signatures from emails.  Deriving URL signatures is challenging.

4 1. INTRODUCTION (cont.)  Complete-URL-based signatures, conjunction-based signatures, and regular expression signatures.  Desirable features of AutoRE.  Spamming botnet characteristics and their activity trends.

5 2. AUTORE  Botnet spam is bursty and distributed.  AutoRE comprises three modules: URL Pre-Processing, URL Group Selection, and Signature Generation and Botnet Identification.

6 2. AUTORE (cont.)

7 URL Pre-Processing  Extract information from each email.  Discard all forwarded emails.  Group URLs from the same domain together.
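
The domain-grouping step can be illustrated with a minimal Python sketch. The function name `group_urls_by_domain` and the sample URLs are hypothetical, and the slides do not say how AutoRE normalizes domains, so grouping by the lowercased host name is an assumption.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_urls_by_domain(urls):
    """Map each host name to the list of URLs that point to it.

    Assumption: "same domain" means the same lowercased host name.
    """
    groups = defaultdict(list)
    for url in urls:
        host = urlsplit(url).hostname  # None when the URL has no host part
        if host:
            groups[host.lower()].append(url)
    return dict(groups)


# Hypothetical sample input
urls = [
    "http://shop.example.com/p/abc?id=1",
    "http://SHOP.example.com/p/xyz?id=2",
    "http://other.example.net/index.html",
]
groups = group_urls_by_domain(urls)
```

Each resulting group is a candidate for the later selection and signature-generation steps.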

8 URL Group Selection  Each email might be associated with multiple groups.  Bursty property.

9 Signature Generation and Botnet Identification  Complete-URL-based signatures and regular expression signatures.  Each signature must be distributed, bursty, and specific.  A qualifying signature characterizes the set of matching emails as botnet-based spam and the originating mail servers as botnet hosts.
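
A rough Python sketch of the "distributed" and "bursty" checks follows. The slide gives no thresholds, so `min_ases` and `max_days` are placeholder parameters whose defaults are illustrative only, and the function name is hypothetical. The "specific" criterion is not covered here.

```python
from datetime import datetime, timedelta


def is_distributed_and_bursty(emails, min_ases=20, max_days=5):
    """Check one candidate signature's matching emails.

    emails: iterable of (as_number, sent_at) pairs.
    min_ases and max_days are assumed, illustrative thresholds.
    """
    emails = list(emails)
    if not emails:
        return False
    distinct_ases = {asn for asn, _ in emails}
    times = [sent_at for _, sent_at in emails]
    active_span = max(times) - min(times)
    distributed = len(distinct_ases) >= min_ases
    bursty = active_span <= timedelta(days=max_days)
    return distributed and bursty


# Hypothetical example: 25 senders in 25 different ASes within one day
base = datetime(2007, 7, 1)
sample = [(asn, base + timedelta(hours=asn)) for asn in range(25)]
result = is_distributed_and_bursty(sample)
```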

10 3. AUTOMATIC URL REGULAR EXPRESSION GENERATION  Generation module: Signature Tree Construction, Regular Expression Generation, Signature Quality Evaluation.  Polymorphic URLs.

11 Signature Tree Construction  Start from the most frequent substring that is both bursty and distributed.  Build a keyword-based signature tree.
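
To make "most frequent substring" concrete, here is a simplified Python sketch that counts fixed-length substrings across a URL group. This is a brute-force stand-in, not AutoRE's actual procedure: the function name, the fixed `length` parameter, and the tie-breaking rule are all assumptions, and the bursty and distributed checks are omitted.

```python
from collections import Counter


def most_frequent_substring(urls, length):
    """Return (substring, url_count) for the substring of the given
    length that occurs in the largest number of URLs.

    Each URL contributes at most once per substring. Ties are broken
    lexicographically (an arbitrary choice for this sketch).
    """
    counts = Counter()
    for url in urls:
        seen = {url[i:i + length] for i in range(len(url) - length + 1)}
        counts.update(seen)
    if not counts:
        return None, 0
    return max(counts.items(), key=lambda kv: (kv[1], kv[0]))


# Hypothetical URL group
urls = [
    "http://a.example/deal?x=1",
    "http://a.example/deal?y=22",
    "http://a.example/news",
]
substring, url_count = most_frequent_substring(urls, 5)
```

A signature tree could then be grown by repeating the search on the URLs that contain the chosen keyword.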

12 Regular Expression Generation  Two steps: detailing and generalization.  Patterns take the form C{l1,l2}, where C is the character set and l1 and l2 are the minimum and maximum substring lengths.
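
The C{l1,l2} notation can be illustrated with a short Python sketch that summarizes a set of varying substrings as a character class plus length bounds. The candidate classes in `CLASSES` and the function name `generalize` are assumptions for illustration. The slides do not list the character sets AutoRE uses.

```python
import re

# Candidate character classes, from most to least specific (assumed set)
CLASSES = ["[0-9]", "[a-z]", "[a-zA-Z]", "[a-zA-Z0-9]"]


def generalize(substrings):
    """Summarize a non-empty list of substrings as C{l1,l2}."""
    lengths = [len(s) for s in substrings]
    l1, l2 = min(lengths), max(lengths)
    for cls in CLASSES:
        if all(re.fullmatch(cls + "*", s) for s in substrings):
            break
    else:
        cls = "."  # fall back to "any character"
    bounds = "{%d}" % l1 if l1 == l2 else "{%d,%d}" % (l1, l2)
    return cls + bounds


# Hypothetical variable parts observed at the same URL position
pattern = generalize(["abc", "defgh", "xy"])
```

The returned string is a valid Python regular expression fragment, so it can be concatenated with the escaped fixed keywords to form a full signature.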

13 Signature Quality Evaluation  Entropy reduction reflects the probability of a random string matching a signature.  AutoRE discards all signatures whose entropy reductions are smaller than a preset threshold.
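
The slide does not show the formula linking entropy reduction to matching probability. A common information-theoretic reading, assumed here, is that a reduction of d bits corresponds to a random string matching with probability 2^-d. The sketch below applies that reading and a threshold filter. The function names, signature names, and the threshold value are all hypothetical.

```python
def match_probability(entropy_reduction_bits):
    """Probability that a random string matches, assuming p = 2^-d."""
    return 2.0 ** -entropy_reduction_bits


def filter_signatures(signatures, threshold_bits):
    """Keep signatures whose entropy reduction meets the threshold.

    signatures: dict mapping a signature to its entropy reduction in bits.
    """
    return {
        sig: bits
        for sig, bits in signatures.items()
        if bits >= threshold_bits
    }


# Hypothetical entropy reductions for two candidate signatures
candidates = {"sig_specific": 120.0, "sig_generic": 12.0}
kept = filter_signatures(candidates, threshold_bits=50.0)
```

Under this reading, a larger entropy reduction means a more specific signature and a lower chance of matching legitimate URLs by accident.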

14 4. DATASETS AND RESULTS  The dataset was collected in Nov 2006, Jun 2007, and Jul 2007, with a total of 5,382,460 sampled emails (sampling rate 1:25,000).  Emails that originated from blacklisted IPs, such as those published by Spamhaus, were excluded.  The classification labels were ignored.

15 DATASETS AND RESULTS (cont.)  AutoRE identified 7,721 botnet-based spam campaigns, comprising 580,466 spam messages sent from 340,050 distinct botnet host IP addresses spanning 5,916 ASes.

16 DATASETS AND RESULTS (cont.)  CU (complete URL) category: 70.3-79.6%.  RE (regular expression) category: 20.4-29.7%.

17 DATASETS AND RESULTS (cont.)  Cumulative distribution of botnet size in terms of number of distinct IPs involved.

18 DATASETS AND RESULTS (cont.)  Number of regular expression patterns before and after generalization.

19 DATASETS AND RESULTS (cont.)  Percentage of spam captured by AutoRE signatures.


21 Evaluation of Botnet URL Signatures  False Positive Rate.

22 Evaluation of Botnet URL Signatures  Ability to Detect Future Spam.  The signatures derived in Nov 2006 and Jun 2007 were applied to the (sampled) emails collected in Jul 2007.

23 Evaluation of Botnet URL Signatures  Regular Expressions vs. Keyword Conjunctions (e.g., token1.*token2.*token3).
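
The difference between the two signature types can be sketched in Python. A keyword conjunction only requires the tokens to appear in order, while a regular expression also constrains what sits between them. The tokens, the regular expression, and both URLs below are made up for illustration and do not come from the paper's data.

```python
import re


def conjunction_pattern(tokens):
    """Build token1.*token2.*token3 from literal tokens."""
    return re.compile(".*".join(re.escape(t) for t in tokens))


# Hypothetical signatures for the same campaign
conjunction = conjunction_pattern(["/deal/", "?id=", "&ref="])
regex = re.compile(r"/deal/[a-z]{3,5}\?id=[0-9]{4}&ref=[a-z]{2}$")

# Hypothetical URLs
spam_url = "http://shop.example.com/deal/abcd?id=1234&ref=xy"
other_url = "http://shop.example.com/deal/archive/2007/summer?id=7&ref=newsletter"

conjunction_hits = [bool(conjunction.search(u)) for u in (spam_url, other_url)]
regex_hits = [bool(regex.search(u)) for u in (spam_url, other_url)]
```

In this made-up case the conjunction matches both URLs, while the regular expression matches only the first, which illustrates why the looser form is more prone to false positives.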

24 Evaluation of Botnet URL Signatures  Domain-Specific vs. Domain-Agnostic Signatures.

25 Evaluation of Botnet IP Addresses  (a) The false positive rate over the total identified IPs (sessions).  (b) Total botnet-based spam volume.

26 Is Each Campaign a Group?

27 6. UNDERSTANDING SPAMMING BOTNET CHARACTERISTICS  All Botnet Hosts: A General Perspective.  Per Campaign: An Individual Perspective.  Comparison of Different Campaigns.  Correlation with Scanning Traffic.

28 All Botnet Hosts: A General Perspective  Distribution of Botnet IP Addresses.

29 All Botnet Hosts: A General Perspective  The number of ASes vs. the number of IPs for each spam campaign.

30 All Botnet Hosts: A General Perspective  More than 80% of campaigns have at least half of their hosts in the dynamic IP ranges.

31 All Botnet Hosts: A General Perspective  Spam Sending Patterns: number of recipients per email, connections per second, nonexisting recipient frequency.

32 Per Campaign: An Individual Perspective  Similarity of Email Properties.  Similarity of Sending Time.  Similarity of Email Sending Behavior.

33 Similarity of Email Properties  The contents are quite different even though their target web pages are similar.

34 Similarity of Sending Time  50% of campaigns have a standard deviation (std) of sending time below 1.81 hours.
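
The per-campaign statistic can be computed as in the following Python sketch. The slide does not say whether the population or sample standard deviation was used, nor how timestamps were normalized, so the use of `statistics.pstdev` on raw timestamps is an assumption. The function name and sample data are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import pstdev


def sending_time_std_hours(timestamps):
    """Population standard deviation of sending times, in hours."""
    start = min(timestamps)
    offsets = [(t - start).total_seconds() / 3600 for t in timestamps]
    return pstdev(offsets)


# Hypothetical campaign: three emails sent two hours apart
base = datetime(2007, 7, 1, 9, 0)
campaign = [base, base + timedelta(hours=2), base + timedelta(hours=4)]
std_hours = sending_time_std_hours(campaign)
```

A small value indicates that the hosts in a campaign sent their emails at nearly the same time.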

35 Similarity of Email Sending Behavior

36 Comparison of Different Campaigns  Botnets sharing a domain-agnostic signature barely overlap with each other.

37 Correlation with Scanning Traffic  Botnet attacks have different phases.

38 7. DISCUSSION  AutoRE has the potential to work in real-time mode.  Spammers may attempt to craft emails to evade the AutoRE URL selection process.  Spammers may wish to evade detection by having no patterns in their URLs.

39 7. DISCUSSION (cont.)  AutoRE leverages the bursty and distributed features of botnet attacks for detection.  URL redirection techniques.

40 8. CONCLUSION  AutoRE generates regular expression signatures, which were previously written by human experts only.  Low false positive rate.  Several important findings about botnets.  Botnets are evolving and getting increasingly sophisticated.
