Corrado Leita (Symantec Research Labs), Ulrich Bayer (Technical University Vienna), Engin Kirda (Institute Eurecom) @ iSecLab

1 Corrado Leita (Symantec Research Labs), Ulrich Bayer (Technical University Vienna), Engin Kirda (Institute Eurecom) @ iSecLab

2 Outline (ADLab Meeting, 2010/7/20)
  - Introduction
  - Related Work
  - SGNET and EPM Clustering
  - Results
  - Conclusion

4 Introduction

8 Related Work
  - Gheorghescu, 2005
    - Disassembles the binaries
    - Compares their basic blocks
  - Kolter and Maloof, 2006
    - Compares hex dumps of their code segments
  - Wicherski, 2009, peHash
    - Polymorphic binaries receive the same hash value
    - The hash is based on the portions of the PE header that are not mutated
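
The idea behind the last item, hashing only the header portions that a polymorphic engine leaves untouched, can be illustrated with a minimal Python sketch. The field names and the selection of fields are assumptions made for illustration; this is not the actual peHash algorithm.

```python
import hashlib

# Hypothetical header fields assumed to survive polymorphic mutation.
STABLE_FIELDS = ("machine", "subsystem", "num_sections", "section_names")

def structural_hash(header: dict) -> str:
    """Hash only the stable header fields, ignoring everything else."""
    parts = [f"{name}={header[name]}" for name in STABLE_FIELDS]
    return hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()

# Toy headers (invented values): b differs from a only in its content hash,
# c differs in structure.
a = {"machine": 332, "subsystem": 2, "num_sections": 3,
     "section_names": (".text", ".data", ".rsrc"), "md5": "aaa"}
b = dict(a, md5="bbb")
c = dict(a, num_sections=4)

same_family = structural_hash(a) == structural_hash(b)   # structure unchanged
different = structural_hash(a) != structural_hash(c)     # structure changed
```

Two samples whose content differs but whose selected header fields agree end up with the same hash, which is the property the slide attributes to peHash.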

9 Related Work
  - Lee and Mody, 2006
    - Based on system call traces
    - Among the first attempts to cluster malware according to its behavior
  - Bailey et al., 2007
    - The first to build a clustering system that describes a sample's behavior in more abstract terms
    - O(n^2) complexity

10 Related Work
  - Anubis
    - Data tainting
    - Tracking of sensitive compare operations
    - Dynamic analysis system for capturing a sample's behavior

12 SGNET and EPM Clustering

13 SGNET and EPM Clustering
  - SGNET
    - ScriptGen
      - Learning 0-day behavior
    - Argos
      - Program flow hijack detection
    - Nepenthes
      - Shellcode emulation
      - Malware download

14 SGNET and EPM Clustering
  - Sensor: ScriptGen FSM
  - Sample Factory: Argos
  - Shellcode handlers: Nepenthes

16 EPM Clustering

17 EPM Clustering
  - Phase 1: feature definition

18 EPM Clustering
  - Pi
    - PUSH-based interaction
    - PULL-based interaction
    - Central repository
  - Mu
    - PE header characteristics seem to be more difficult to mutate
    - A change in their value is likely to be associated with a modification or recompilation of an existing codebase

19 EPM Clustering
  - All of the features taken into account for the classification could easily be randomized by the malware writer
  - More complex (costly) polymorphic approaches might appear in the future

20 EPM Clustering
  - Phase 2: invariant discovery
    - An invariant value is a value that is not specific to a certain:
      - Attack instance
      - Attacker
      - Destination
    - Threshold-based: the value must be observed in
      - At least 10 different attack instances
      - At least 3 different attackers
      - At least 3 honeypot IPs
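
The threshold rule for invariant discovery can be sketched in a few lines of Python. The event layout (dictionaries with `attacker` and `sensor` keys) and the function name `find_invariants` are assumptions for illustration; only the three thresholds come from the slide.

```python
from collections import defaultdict

def find_invariants(events, feature, min_instances=10,
                    min_attackers=3, min_sensors=3):
    """Return the values of `feature` that pass all three thresholds."""
    seen = defaultdict(lambda: {"instances": 0,
                                "attackers": set(),
                                "sensors": set()})
    for ev in events:  # each event is treated as one attack instance
        stats = seen[ev[feature]]
        stats["instances"] += 1
        stats["attackers"].add(ev["attacker"])
        stats["sensors"].add(ev["sensor"])
    return {
        value for value, s in seen.items()
        if s["instances"] >= min_instances
        and len(s["attackers"]) >= min_attackers
        and len(s["sensors"]) >= min_sensors
    }

# Toy data (invented): port 9988 is seen from 4 attackers on 3 honeypots,
# port 1234 only ever from a single attacker on a single honeypot.
events = [{"port": 9988, "attacker": f"a{i % 4}", "sensor": f"s{i % 3}"}
          for i in range(12)]
events += [{"port": 1234, "attacker": "a0", "sensor": "s0"}
           for _ in range(12)]

invariant_ports = find_invariants(events, "port")
```

In this toy data set only port 9988 qualifies: port 1234 has enough attack instances but is tied to one attacker and one destination.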

21 EPM Clustering
  - Phase 3: pattern discovery
    - $T = v_1, v_2, v_3, \ldots, v_n$
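
One plausible reading of the tuple notation is that each position of a pattern holds either an invariant value of the corresponding feature or a wildcard. The sketch below follows that reading; the wildcard marker, the feature names, and the helper names are assumptions, not details taken from the slides.

```python
WILDCARD = "*"  # hypothetical marker for "no invariant value at this position"

def to_pattern(instance, features, invariants):
    """Map one attack instance to a tuple of invariant values and wildcards."""
    return tuple(
        instance[f] if instance[f] in invariants.get(f, set()) else WILDCARD
        for f in features
    )

def discover_patterns(instances, features, invariants):
    """Collect the distinct patterns observed across all instances."""
    return {to_pattern(inst, features, invariants) for inst in instances}

# Toy data (invented): file names are never invariant here.
features = ("protocol", "port", "filename")
invariants = {"protocol": {"tftp", "http"}, "port": {69}}
instances = [
    {"protocol": "tftp", "port": 69, "filename": "x1.exe"},
    {"protocol": "tftp", "port": 69, "filename": "x2.exe"},
    {"protocol": "http", "port": 8080, "filename": "y.exe"},
]

patterns = discover_patterns(instances, features, invariants)
```

The two TFTP instances collapse into a single pattern because their only difference lies in a feature without invariant values.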

22 EPM Clustering
  - Phase 4: pattern-based classification
    - Clustering
      - Multiple patterns could match the same instance
      - Each instance is always associated with the most specific pattern matching its feature values
      - All the instances associated with the same pattern are said to belong to the same EPM cluster
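
The "most specific pattern" rule can be sketched as follows, assuming patterns are tuples that use a wildcard for unconstrained positions and that specificity is the number of non-wildcard positions. Both assumptions, and all names in the code, are illustrative.

```python
from collections import defaultdict

WILDCARD = "*"  # hypothetical marker for an unconstrained position

def matches(pattern, values):
    """A pattern matches when every non-wildcard position agrees."""
    return all(p == WILDCARD or p == v for p, v in zip(pattern, values))

def specificity(pattern):
    """Assumed measure: count of non-wildcard positions."""
    return sum(p != WILDCARD for p in pattern)

def classify(values, patterns):
    """Return the most specific matching pattern, or None if none matches."""
    candidates = [p for p in patterns if matches(p, values)]
    return max(candidates, key=specificity) if candidates else None

def cluster(instances, patterns):
    """Group instances by the pattern they are assigned to."""
    clusters = defaultdict(list)
    for values in instances:
        clusters[classify(values, patterns)].append(values)
    return dict(clusters)

# Toy patterns and instances (invented values).
patterns = [("tftp", "*", "*"), ("tftp", 69, "*"), ("http", "*", "*")]
first = classify(("tftp", 69, "a.exe"), patterns)
second = classify(("tftp", 70, "a.exe"), patterns)
```

Here the first instance matches two patterns and is assigned to the more constrained one. Ties between equally specific patterns are broken by list order in this sketch, which the slides do not specify.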

23 EPM Clustering
  - E-clusters: Exploit
  - P-clusters: Payload
  - M-clusters: Malware

24 EPM Clustering

26 Results
  - Data: January 2008 to May 2009, collected by the SGNET deployment
  - 6353 malware samples
    - Only 5165 could be correctly executed in Anubis
    - Some malware samples could not be downloaded correctly by Nepenthes

27 Results
  - 39 E-clusters
  - 27 P-clusters
  - 260 M-clusters
  - 972 B-clusters

28 Results

29 Results
  - The number of exploit/payload combinations is low
    - Most malware variants seem to share few distinct exploitation routines for propagation
  - The number of B-clusters is lower than the number of M-clusters
    - Some M-clusters are likely to correspond to variations of the same codebase

30 Results

31 Results

32 Results
  - P-pattern 45:
    - PUSH-based download
    - TCP port 9988

33 Results
  - M-cluster 13:

34 Results
  - M-cluster 13 is a polymorphic malware associated with several different B-clusters
  - MD5 is not an invariant
    - Allaple mutates its content at each attack instance
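
Cross-referencing static (M) and behavioral (B) clusters, as done for M-cluster 13, amounts to grouping samples by one label and collecting the other. The following Python sketch assumes each sample record carries both cluster labels; the record layout, the function names, and the toy data are invented for illustration.

```python
from collections import defaultdict

def b_clusters_per_m_cluster(samples):
    """Map each M-cluster to the set of B-clusters its samples fall into."""
    mapping = defaultdict(set)
    for s in samples:
        mapping[s["m_cluster"]].add(s["b_cluster"])
    return dict(mapping)

def spread_m_clusters(samples, min_b_clusters=2):
    """List the M-clusters whose samples span several B-clusters."""
    mapping = b_clusters_per_m_cluster(samples)
    return sorted(m for m, bs in mapping.items() if len(bs) >= min_b_clusters)

# Toy records (invented values): every sample has a distinct MD5, so the
# content hash alone would not group any of them.
samples = [
    {"md5": "h1", "m_cluster": 13, "b_cluster": 1},
    {"md5": "h2", "m_cluster": 13, "b_cluster": 2},
    {"md5": "h3", "m_cluster": 13, "b_cluster": 1},
    {"md5": "h4", "m_cluster": 7, "b_cluster": 5},
]

mapping = b_clusters_per_m_cluster(samples)
spread = spread_m_clusters(samples)
```

An M-cluster that spreads over several B-clusters is a candidate for closer inspection, since the samples share static features yet behave differently when executed.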

35 Results
  - Each behavioral profile corresponds to an execution time of 4 minutes
  - Bot? Honeypots may help!

36 Results

37 Results
  - Allaple
    - Worm exploiting MS04-007
    - DoS attacks

38 Results
  - IRC servers

40 Conclusion
  - Combine different clustering techniques
  - Improve effectiveness in building intelligence on the threat economy