MutantX-S: Scalable Malware Clustering Based on Static Features
Xin Hu, IBM T.J. Watson Research Center; Sandeep Bhatkar and Kent Griffin, Symantec Research Labs; Kang G. Shin, University of Michigan
Presented 2015.04.21 by 박종화 (akdwhd0921@gmail.com), Computer Security & OS Lab.
Index
- Motivation
- Architecture
- Generic Unpacking Algorithm
- Feature Extraction
- Prototype-Based Clustering
- Evaluation
Motivation
Why cluster malware? There is currently no automatic way to analyze and label the large number of incoming malware samples.
Motivation (continued)
How can we efficiently process this huge influx of new samples and label them accurately?
[Figure: malware samples grouped into Family 1, Family 2, Family 3, and Family 4]
One possible solution is to cluster malware samples automatically, which makes it possible to:
- Prioritize limited resources
- Avoid analyzing samples that have already been analyzed
- Label new incoming samples by association
- Generalize previous detection and mitigation strategies to new variants
Architecture
MutantX-S is a framework developed to automatically cluster malware. It does so by analyzing a program's static features (its assembly code).
Process:
1. Preprocessing
2. Feature extraction
3. Clustering
Generic Unpacking Algorithm
The algorithm exploits an inherent property of the unpacking process: a packed binary has to write the unpacked code into some memory space and then transfer control to the modified memory locations to continue execution.
It tracks memory accesses using the no-execute (NX) support in modern x86 CPUs and operating systems.
Unpacking a packed malware sample proceeds in three steps:
1. When the packed binary is loaded, its memory pages (packed data and unpacker code) are marked executable but non-writable (W = 0, X = 1).
2. When the unpacker writes to one of these pages, the memory write raises a write (W) exception. The page is then marked dirty and switched to writable but non-executable (W = 1, X = 0).
3. When unpacking finishes and control transfers to a dirty page, the attempt to execute it raises an execute (X) exception. At this point the process memory image is dumped, and the unpacked malware is passed to the disassembler.
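To make the page-permission bookkeeping concrete, here is a minimal Python sketch that simulates the three steps in user space. The names `Page`, `UnpackTracker`, `write`, and `execute` are hypothetical. A real implementation would set the NX and write-protection bits through the OS and handle the resulting page faults; it would also have to cope with an unpacker that writes to its own code pages, which this toy model ignores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    writable: bool = False   # W bit
    executable: bool = True  # X bit
    dirty: bool = False      # set once the page has been written


class UnpackTracker:
    """Toy model of W/X page tracking for generic unpacking (hypothetical names)."""

    def __init__(self, num_pages: int) -> None:
        # Step 1: every page starts executable but non-writable (W = 0, X = 1).
        self.pages = [Page() for _ in range(num_pages)]
        self.dumped_at: Optional[int] = None  # page whose execution triggered the dump

    def write(self, idx: int) -> None:
        page = self.pages[idx]
        if not page.writable:
            # Step 2: write exception -> mark dirty and flip to W = 1, X = 0.
            page.dirty = True
            page.writable, page.executable = True, False
        # On a real system the faulting write would now be retried and succeed.

    def execute(self, idx: int) -> bool:
        """Return True if executing this page ends unpacking (image dumped)."""
        page = self.pages[idx]
        if page.executable:
            return False  # X = 1: runs normally, no exception
        # Step 3: execute exception. In this model X = 0 only happens after a
        # write, so the page holds freshly written (unpacked) code.
        self.dumped_at = idx  # the process memory image would be dumped here
        return True
```

For example, with `t = UnpackTracker(3)`, calling `t.write(1)` and then `t.execute(1)` returns `True` and records page 1 as the point where the image would be dumped, while `t.execute(0)` on an untouched page returns `False`.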
Feature Extraction
MutantX-S uses IDA Pro to disassemble a malware program into a sequence of machine instructions, which are then used for feature extraction.
Malware samples are compared for similarity based on their disassembled instruction sequences.
MutantX-S uses the instruction opcodes as features, because:
- Opcodes generalize well to represent variants of a malware family.
- An opcode sequence offers a better representation of instruction semantics.
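A minimal sketch of the opcode step, assuming the disassembler output is available as a plain-text listing with one instruction per line, optional labels ending in `:`, and comments after `;`. This format is a hypothetical one loosely modeled on IDA Pro listings, not an IDA API. The sketch keeps the instruction mnemonic as a stand-in for the opcode; the exact opcode encoding used by MutantX-S may differ.

```python
def extract_opcodes(listing: str) -> list[str]:
    """Keep only the opcode (mnemonic) of each instruction line."""
    opcodes = []
    for line in listing.splitlines():
        line = line.split(";", 1)[0].strip()  # drop trailing ';' comments
        if not line or line.endswith(":"):
            continue  # skip blank lines and labels
        opcodes.append(line.split()[0].lower())  # first token = mnemonic
    return opcodes


listing = """
start:
    push ebp
    mov ebp, esp        ; set up frame
    mov eax, [ebp+8]
    call sub_401000
    pop ebp
    retn
"""
print(extract_opcodes(listing))
# ['push', 'mov', 'mov', 'call', 'pop', 'retn']
```

Instruction prefixes such as `rep` would be kept as the opcode here; a fuller version would need to handle them explicitly.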
Feature Extraction (continued)
N-gram analysis is used to embed the opcode features into feature vectors.
- The number of dimensions D determines the complexity.
- D = |O|^N grows exponentially with N, the n-gram length, where |O| is the number of distinct opcodes.
A hashing kernel reduces the dimensionality of the feature vector, saving both storage and computation overhead while incurring only a small penalty on the distances between feature vectors.
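To illustrate why hashing matters: with a hypothetical |O| = 200 distinct opcodes and N = 4, the explicit space has D = 200^4 = 1.6 × 10^9 dimensions, whereas a 12-bit hash maps every n-gram into one of 2^12 = 4,096 buckets. The sketch below shows one way to build such a hashed n-gram vector in Python. The BLAKE2b-based bucket function, the L2 normalization, and the default `n` are illustrative assumptions, not details taken from MutantX-S.

```python
import hashlib
import math


def ngrams(opcodes, n):
    """All contiguous length-n windows of the opcode sequence."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def bucket(gram, bits):
    """Stable hash of an n-gram into [0, 2**bits).

    Python's built-in hash() is salted per process, so a keyed-free
    cryptographic digest is used instead to get reproducible buckets.
    """
    digest = hashlib.blake2b(" ".join(gram).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % (1 << bits)


def hashed_feature_vector(opcodes, n=4, bits=12):
    """Dense vector of length 2**bits holding L2-normalized n-gram counts."""
    vec = [0.0] * (1 << bits)
    for gram in ngrams(opcodes, n):
        vec[bucket(gram, bits)] += 1.0  # colliding n-grams share a bucket
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def distance(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Two samples can then be compared with `distance(u, v)`. Distinct n-grams that collide in the same bucket simply add their counts, which is the source of the small distance penalty.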
Prototype-Based Clustering
Prototypes are selected iteratively from the data points. The process repeats until the distance from every data point to its nearest prototype is smaller than a predefined threshold P_max.
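The slide gives the stopping rule but not how each new prototype is chosen. A common way to realize this rule is farthest-first selection: start from one point and repeatedly promote the point that is farthest from its nearest existing prototype. The Python sketch below assumes exactly that, plus Euclidean distance and starting from the first point; it omits any later step that merges prototypes into clusters. Function names are hypothetical.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def extract_prototypes(points, p_max):
    """Farthest-first prototype selection (assumed strategy).

    Stops once every point is closer than p_max to its nearest prototype.
    Returns the indices of the chosen prototypes.
    """
    if p_max <= 0:
        raise ValueError("p_max must be positive")
    if not points:
        return []
    prototypes = [0]  # assumption: start from the first point
    nearest = [euclidean(p, points[0]) for p in points]
    while True:
        far_idx = max(range(len(points)), key=nearest.__getitem__)
        if nearest[far_idx] < p_max:
            return prototypes  # every point is within p_max of a prototype
        prototypes.append(far_idx)
        new = points[far_idx]
        # Update each point's distance to its nearest prototype.
        nearest = [min(d, euclidean(p, new)) for d, p in zip(nearest, points)]


def assign(points, prototypes):
    """Map each point to the index of its nearest prototype."""
    return [min(prototypes, key=lambda j: euclidean(p, points[j])) for p in points]


points = [(0, 0), (0.1, 0), (10, 10), (10.1, 10), (0, 0.2)]
protos = extract_prototypes(points, p_max=1.0)
print(protos)                  # [0, 3]
print(assign(points, protos))  # [0, 0, 3, 3, 0]
```

Because every newly promoted point is at least `p_max` away from all existing prototypes, the loop adds each point at most once and terminates after at most `len(points)` iterations.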
Evaluation
Data sets:
- Reference data set: 4,821 samples
- Large data set: 132,234 samples
System configuration: Intel Core i7 3.0 GHz CPU, 12 GB memory
Clustering Accuracy and Running Time
Comparison with existing clustering methods:
- MutantX-S: less than 30 s
- Hierarchical clustering: 51.3 s (precision 0.82)
- k-means: 32.3 s (precision 0.75)
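The slide reports precision without defining it. A common definition in malware clustering evaluations, assumed here rather than confirmed by the slide, is the fraction of samples whose reference label matches the majority label of their cluster. A minimal Python sketch of that definition:

```python
from collections import Counter, defaultdict


def cluster_precision(cluster_ids, reference_labels):
    """Fraction of samples that carry the majority reference label of their cluster."""
    if len(cluster_ids) != len(reference_labels):
        raise ValueError("inputs must have the same length")
    if not cluster_ids:
        raise ValueError("inputs must not be empty")
    members = defaultdict(list)
    for cid, label in zip(cluster_ids, reference_labels):
        members[cid].append(label)
    # For each cluster, count how many samples share its most common label.
    majority = sum(Counter(labels).most_common(1)[0][1] for labels in members.values())
    return majority / len(cluster_ids)


clusters = [0, 0, 0, 1, 1]
labels = ["A", "A", "B", "C", "C"]
print(cluster_precision(clusters, labels))  # 0.8
```

Swapping the two arguments gives the matching recall measure under the same convention.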
Impact of Hash Size
In practice, a 12-bit hash function (2^12 = 4,096 feature dimensions) was found to be a good compromise: it reduces the time and memory requirements by over 80% while still maintaining good accuracy.
Thank you!