MutantX-S: Scalable Malware Clustering Based on Static Features
Xin Hu, IBM T.J. Watson Research Center; Sandeep Bhatkar and Kent Griffin, Symantec Research Labs; Kang G. Shin, University of Michigan
2015. 04. 21
박종화 (akdwhd0921@gmail.com)
Computer Security & OS Lab.

Index
- Motivation
- Architecture
- Generic Unpacking Algorithm
- Feature Extraction
- Prototype-Based Clustering
- Evaluation

Motivation
- Why cluster malware?
- There is currently no automatic way to group and label the large number of incoming malware samples.

Motivation
- How can this huge influx of new samples be processed efficiently and labeled accurately?
- [Figure: samples grouped into Family 1, Family 2, Family 3, and Family 4]
- One possible solution is to cluster malware samples automatically:
  - Prioritize limited resources
  - Avoid analyzing samples that have already been analyzed
  - Label new incoming samples by association
  - Generalize previous detection and mitigation strategies to new variants

Architecture
- MutantX-S is a framework developed to cluster malware automatically.
- It does so by analyzing a program's static features (assembly code).
- Process:
  1. Preprocessing
  2. Feature extraction
  3. Clustering

Generic Unpacking Algorithm
- Exploits an inherent property of the unpacking process:
  - A packed binary has to write the unpacked code into some memory space and then transfer control to the modified memory locations to continue execution.
- Tracks memory accesses via the no-execute (NX) support in modern x86 CPUs and operating systems.
- [Figure: packed malware]

Generic Unpacking Algorithm (continued)
- [Figure, step 1: the packed malware (packed data and unpacker code) loads into process memory; every memory page is marked W = 0, X = 1, i.e. executable but non-writable.]

Generic Unpacking Algorithm (continued)
- [Figure, step 2: with all pages still W = 0, X = 1 (executable but non-writable), a memory write raises a write (W) exception.]

Generic Unpacking Algorithm (continued)
- [Figure, step 3: the written page is switched to W = 1, X = 0 and marked dirty; the other pages remain W = 0, X = 1.]

Generic Unpacking Algorithm (continued)
- [Figure, step 4: all pages are now W = 1, X = 0. When unpacking finishes, executing a written page raises an execute (X) exception; the process memory image is dumped and the unpacked malware is passed to the disassembler.]
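The page-permission steps on the unpacking slides can be illustrated with a small simulation. This is only a sketch of the idea, not the MutantX-S implementation: the `Page` and `UnpackTracker` classes and the page names are hypothetical, and real tracking happens through CPU page faults rather than Python method calls.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    # Pages start executable but non-writable (W = 0, X = 1).
    writable: bool = False
    executable: bool = True
    dirty: bool = False


@dataclass
class UnpackTracker:
    pages: dict = field(default_factory=dict)
    dumped: bool = False

    def load(self, page_ids):
        """Map the packed binary: every page is W = 0, X = 1."""
        for pid in page_ids:
            self.pages[pid] = Page()

    def write(self, pid):
        """Simulate a memory write; the first write to a page faults."""
        page = self.pages[pid]
        if not page.writable:
            # Write exception: flip to W = 1, X = 0 and mark the page dirty.
            page.writable, page.executable, page.dirty = True, False, True

    def execute(self, pid):
        """Simulate a control transfer into a page."""
        page = self.pages[pid]
        if not page.executable:
            # Execute exception on a previously written page:
            # unpacking is assumed finished, so dump the memory image.
            self.dumped = True
            return "dump"
        return "run"


tracker = UnpackTracker()
tracker.load(["unpacker", "data"])
first = tracker.execute("unpacker")   # unpacker code runs normally
tracker.write("data")                 # unpacker writes the unpacked code
second = tracker.execute("data")      # jump into written page triggers the dump
```

In this toy model a page is non-executable only after it has been written, so any execute fault marks the point where control reaches unpacked code.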

Feature Extraction
- MutantX-S uses IDA Pro to disassemble a malware program into a sequence of machine instructions, which are then used for feature extraction.
- Similarity between malware samples is compared on the basis of the disassembled instruction sequences.
- MutantX-S uses the opcodes:
  - Opcodes generalize well to represent variants of a malware family.
  - An opcode sequence offers a better representation of instruction semantics.
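A minimal sketch of why opcodes generalize across variants, assuming the disassembly is available as plain text lines in an Intel-like syntax with `;` comments. The function name `opcode_sequence` and the sample listings are made up for illustration; this does not use IDA Pro's API and ignores instruction prefixes such as `rep`.

```python
def opcode_sequence(disassembly):
    """Keep only the mnemonic of each instruction, dropping operands."""
    opcodes = []
    for line in disassembly:
        text = line.split(";", 1)[0].strip()  # strip trailing comments
        if not text:
            continue
        opcodes.append(text.split()[0].lower())
    return opcodes


# Two hypothetical variants that differ only in registers and addresses.
variant_a = ["push ebp", "mov ebp, esp", "xor eax, eax ; clear", "call 0x401000"]
variant_b = ["push ebp", "mov ebp, esp", "xor ecx, ecx", "call 0x402f10"]

seq_a = opcode_sequence(variant_a)
seq_b = opcode_sequence(variant_b)
```

Both listings reduce to the same opcode sequence (`push`, `mov`, `xor`, `call`), even though their operands differ.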

Feature Extraction
- N-gram analysis embeds the features into feature vectors:
  - The number of dimensions D determines the complexity.
  - D increases exponentially with N in the N-gram: D = |O|^N, where |O| is the number of different opcodes.
- Hashing kernel:
  - Reduces the dimensionality of the feature vector
  - Saves both storage and computation overhead
  - Incurs only a small penalty on the feature vector distance
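The combination of opcode N-grams and a hashing kernel might look like the sketch below. It is an assumption-laden illustration: `zlib.crc32` stands in for whatever hash function MutantX-S actually uses, and the defaults `n=2` and `bits=12` are chosen only for the example.

```python
import zlib


def ngrams(opcodes, n):
    """Return all contiguous opcode n-grams as tuples."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def hashed_vector(opcodes, n=2, bits=12):
    """Count n-grams in a fixed-size vector of 2**bits buckets.

    Without hashing, the vector would need |O|**n dimensions; here every
    n-gram is mapped to one of 2**bits buckets instead (collisions allowed).
    """
    size = 1 << bits
    vec = [0] * size
    for gram in ngrams(opcodes, n):
        # crc32 is a stand-in hash, not necessarily the one used in the paper.
        index = zlib.crc32(" ".join(gram).encode()) % size
        vec[index] += 1
    return vec


ops = ["push", "mov", "xor", "call", "mov"]
vec = hashed_vector(ops, n=2, bits=12)
```

The vector length depends only on the hash size, never on the number of distinct opcodes, which is where the storage and computation savings come from; collisions between different N-grams are the source of the small distance penalty.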

Prototype-Based Clustering
- The process repeats until the distance from every data point to its nearest prototype is smaller than a predefined threshold P_max.
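The slide states only the stopping rule, so the following is a sketch under assumptions: prototypes are chosen greedily by always promoting the point farthest from its nearest prototype, the first point seeds the set, and Euclidean distance is used. The names `select_prototypes` and `assign` are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_prototypes(points, p_max, dist=euclidean):
    """Greedily pick prototype indices until every point lies within
    p_max of its nearest prototype (the stopping rule on the slide)."""
    if not points or p_max <= 0:
        raise ValueError("need at least one point and p_max > 0")
    prototypes = [0]  # assumption: seed with the first point
    nearest = [dist(p, points[0]) for p in points]
    while True:
        far = max(range(len(points)), key=nearest.__getitem__)
        if nearest[far] < p_max:
            break  # all points are close enough to some prototype
        prototypes.append(far)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], dist(p, points[far]))
    return prototypes


def assign(points, prototypes, dist=euclidean):
    """Map each point to the index of its nearest prototype."""
    return [min(prototypes, key=lambda j: dist(p, points[j])) for p in points]


points = [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (10, 0)]
protos = select_prototypes(points, p_max=1.0)
labels = assign(points, protos)
```

Because later stages only need to compare prototypes rather than all pairs of samples, a smaller P_max trades more prototypes (and more work) for tighter groups.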

Evaluation
- Data sets:
  - Reference data set: 4,821 samples
  - Large data set: 132,234 samples
- System configuration:
  - Core i7 3.0 GHz CPU
  - 12 GB memory

Clustering Accuracy and Running Time
- Comparison with existing clustering methods:
  - MutantX-S: less than 30 s
  - Hierarchical clustering: 51.3 s (precision 0.82)
  - k-means: 32.3 s (precision 0.75)

Impact of Hash Size
- In practice, a 12-bit hash function is a good compromise: it reduces the time and memory requirements by over 80% while still keeping good accuracy.


