MutantX-S: Scalable Malware Clustering Based on Static Features Xin Hu, IBM T.J. Watson Research Center; Sandeep Bhatkar and Kent Griffin, Symantec Research.


1 MutantX-S: Scalable Malware Clustering Based on Static Features
Xin Hu, IBM T.J. Watson Research Center; Sandeep Bhatkar and Kent Griffin, Symantec Research Labs; Kang G. Shin, University of Michigan
2015. 04. 21
Park Jong-hwa (박종화), akdwhd0921@gmail.com
Computer Security and Operating Systems Lab

2 Computer Security & OS Lab. Index
• Motivation
• Architecture
• Generic Unpacking Algorithm
• Feature Extraction
• Prototype-Based Clustering
• Evaluation

3 Motivation
• Why cluster malware?
• There is currently no automatic way to label the large number of malware samples.

4 Motivation
• How can this huge influx of new samples be processed efficiently and labeled accurately?
[Figure: samples grouped into Family 1, Family 2, Family 3, Family 4]
• One possible solution is to cluster malware samples automatically:
- Prioritize limited resources
- Avoid analyzing samples that have already been analyzed
- Label new incoming samples by association
- Generalize previous detection and mitigation strategies to new variants

5 Architecture
• MutantX-S is a framework developed to automatically cluster malware.
• It does so by analyzing a program's static features (assembly code).
• Process:
1. Preprocessing
2. Feature extraction
3. Clustering

6 Generic Unpacking Algorithm
• Exploits an inherent property of the unpacking process:
- A packed binary has to write the unpacked code into some memory space and transfer control to the modified memory locations to continue execution.
• Tracks memory accesses via the non-execution (NX) support in modern x86 CPUs and OSes.
[Figure: packed malware]

7 Generic Unpacking Algorithm (continued)
[Figure: the packed malware loads into process memory; its memory pages (packed data, unpacker code) are all marked W = 0, X = 1, i.e. executable but non-writable]

8 Generic Unpacking Algorithm (continued)
[Figure: all memory pages are W = 0, X = 1 (executable but non-writable); a memory write raises a W exception]

9 Generic Unpacking Algorithm (continued)
[Figure: the written page is switched to W = 1, X = 0 and marked as dirty; the remaining pages stay W = 0, X = 1]

10 Generic Unpacking Algorithm (continued)
[Figure: the pages are W = 1, X = 0; when unpacking finishes, executing a written page raises an X exception; the process memory image is dumped and the unpacked malware is passed to the disassembler]
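The write-then-execute tracking on slides 6 to 10 can be sketched as a small simulation. This is only an illustration of the idea, not the MutantX-S implementation: the names `Page`, `UnpackTracker`, `write`, and `execute` are hypothetical, the page size is assumed to be 4 KiB, and real tracking relies on page-protection faults raised by the CPU and OS rather than on Python method calls.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size for this sketch


@dataclass
class Page:
    writable: bool = False    # W bit: pages start non-writable
    executable: bool = True   # X bit: pages start executable
    dirty: bool = False       # set once the page has been written


@dataclass
class UnpackTracker:
    """Hypothetical simulation of W/X page tracking (not a real API)."""
    pages: dict = field(default_factory=dict)
    events: list = field(default_factory=list)

    def _page(self, addr):
        return self.pages.setdefault(addr // PAGE_SIZE, Page())

    def write(self, addr):
        page = self._page(addr)
        if not page.writable:
            # W exception: allow the write, revoke execute, mark dirty.
            self.events.append(("W exception", addr // PAGE_SIZE))
            page.writable, page.executable, page.dirty = True, False, True

    def execute(self, addr):
        """Return True when control reaches a page written earlier."""
        page = self._page(addr)
        if not page.executable:
            # X exception on a dirty page: unpacking has finished here,
            # so this is the point to dump the process memory image.
            self.events.append(("X exception", addr // PAGE_SIZE))
            return page.dirty
        return False


tracker = UnpackTracker()
ran_unpacker = tracker.execute(0x1000)   # unpacker code runs normally
tracker.write(0x5000)                    # unpacker writes unpacked code
tracker.write(0x5004)                    # same page, no second exception
reached_unpacked = tracker.execute(0x5000)  # jump into the written page
```

In this sketch `reached_unpacked` marks the moment at which a memory dump would be taken and handed to the disassembler.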

11 Feature Extraction
• MutantX-S uses IDA Pro to disassemble a malware program into a sequence of machine instructions, which are then used for feature extraction.
• Malware samples are compared for similarity based on their disassembled instruction sequences.
• MutantX-S uses opcodes as features:
- Opcodes generalize well to represent variants of a malware family.
- An opcode sequence offers a better representation of instruction semantics.

12 Feature Extraction
• N-gram analysis is used to embed features into feature vectors:
- The number of dimensions D determines the complexity.
- D increases exponentially with N in the N-gram: D = |O|^N, where |O| is the number of different opcodes.
• Hashing kernel:
- Reduces the dimensionality of the feature vector
- Saves both storage and computation overhead
- Incurs only a small penalty on the feature vector distance
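A minimal sketch of opcode N-grams combined with the hashing trick is given here. It is not the authors' code: the function names, the use of MD5 as the hash, N = 2, and the sample opcode list are all assumptions made for illustration. The 12-bit hash size is the one the later "Impact of Hash Size" slide reports as a good compromise.

```python
import hashlib


def opcode_ngrams(opcodes, n):
    """Return all contiguous opcode N-grams as tuples."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def hashed_feature_vector(opcodes, n=2, hash_bits=12):
    """Count N-grams into a vector of 2**hash_bits buckets.

    Instead of one dimension per possible N-gram (|O|**n of them),
    each N-gram is hashed to a bucket index, so the vector length is
    fixed no matter how many distinct opcodes exist.
    """
    dim = 1 << hash_bits
    vec = [0] * dim
    for gram in opcode_ngrams(opcodes, n):
        # MD5 is used only as a stable, well-mixed hash (an assumption).
        digest = hashlib.md5(" ".join(gram).encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "little") % dim
        vec[index] += 1
    return vec


# Hypothetical opcode sequence, as a disassembler might produce.
sample = ["push", "mov", "sub", "push", "mov", "call"]
vec = hashed_feature_vector(sample, n=2, hash_bits=12)
```

Different N-grams may collide in the same bucket; that is the source of the small distance penalty mentioned on the slide.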

13 Prototype-Based Clustering
• The process repeats until the distance from every data point to its nearest prototype is smaller than a predefined threshold P_max.
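The stopping rule above can be illustrated with a greedy farthest-point selection. The slide does not show how prototypes are chosen, so the selection step here is an assumption (a common choice in prototype-based clustering), as are the function names, the Euclidean distance, and the toy points.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_prototypes(points, p_max, dist=euclidean):
    """Greedy prototype selection (assumed scheme, see lead-in).

    Repeatedly promotes the point farthest from its nearest prototype
    until every point lies within p_max of some prototype.
    Returns (prototype indices, distance of each point to its nearest prototype).
    """
    if not points:
        return [], []
    if p_max <= 0:
        raise ValueError("p_max must be positive")
    prototypes = [0]  # assumption: start from the first point
    nearest = [dist(p, points[0]) for p in points]
    while True:
        far = max(range(len(points)), key=nearest.__getitem__)
        if nearest[far] < p_max:
            break  # all points are within the threshold
        prototypes.append(far)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], dist(p, points[far]))
    return prototypes, nearest


# Toy 2-D points forming three well-separated groups.
points = [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (10, 0)]
prototypes, nearest = select_prototypes(points, p_max=1.0)
```

Each remaining point can then be assigned to its nearest prototype, which keeps the number of pairwise distance computations far below the all-pairs comparison.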

14 Evaluation
• Data sets:
- Reference data set: 4,821 samples
- Large data set: 132,234 samples
• System configuration:
- Core i7 3.0 GHz CPU
- 12 GB memory

15 Clustering Accuracy and Running Time
• Comparison with existing clustering methods:
- MutantX-S: less than 30 s
- Hierarchical: 51.3 s (precision 0.82)
- k-means: 32.3 s (precision 0.75)

16 Impact of Hash Size
• In practice, a 12-bit hash function is found to be a good compromise, reducing the time and memory requirements by over 80% while still keeping good accuracy.


18 Thank You!

