By: Georg Wicherski Presenting: Rasika Bindoo. Introduction Data collection not a problem anymore because of honeypots. Honeypots suffer from a drawback.


1 By: Georg Wicherski Presenting: Rasika Bindoo

2 Introduction Data collection is no longer a problem thanks to honeypots. However, honeypots have the drawback of polluting malware databases. Anti-virus scanners are slow. peHash was therefore developed to cluster instances of the same polymorphic specimen.

3 Other Attempts at Hashing Spamsum, mrshash; n-grams; signatures; Vx-Class

4 peHash Function Design The function should have the following design characteristics: it should not need to look into the contents of the sections, and it should have low computational complexity. Scaling the bzip2 compression ratio to [0…7] ⊂ ℕ leads to the best matches.
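As a rough illustration of the last point, the sketch below computes a bzip2 compression ratio for a byte string and maps it onto an integer in [0…7]. The function name, the clamping at 1.0, and the linear scaling are assumptions; the slides do not say how the scaling is done.

```python
import bz2


def scaled_compression_ratio(data: bytes, levels: int = 8) -> int:
    """Map compressed_size / raw_size onto an integer in [0, levels - 1].

    Hypothetical helper: the linear scaling is an assumption, not taken
    from the slides.
    """
    if not data:
        return 0  # nothing to compress; treat as lowest complexity
    ratio = len(bz2.compress(data)) / len(data)
    # bzip2 adds framing overhead, so tiny or random inputs can exceed 1.0.
    ratio = min(ratio, 1.0)
    return min(int(ratio * levels), levels - 1)
```

Highly repetitive data lands in the lowest bucket, while data that bzip2 cannot shrink lands in the highest one.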

5 Structural properties Polymorphic malware instances share the same structural Portable Executable properties. The following properties are therefore used to distinguish binaries: image characteristics, subsystem, stack commit size, and heap commit size.
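One way to obtain these four values, assuming a 32-bit (PE32) image and the standard header offsets of the PE/COFF format, is sketched below. `read_header_fields` is a hypothetical helper for illustration, not part of peHash itself.

```python
import struct


def read_header_fields(image: bytes) -> dict:
    """Extract the four header values named on the slide from a PE32 image."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe_offset + 4                      # 20-byte COFF file header
    (characteristics,) = struct.unpack_from("<H", image, coff + 18)
    opt = coff + 20                           # optional header follows
    (magic,) = struct.unpack_from("<H", image, opt)
    if magic != 0x10B:
        raise ValueError("this sketch handles PE32 images only")
    (subsystem,) = struct.unpack_from("<H", image, opt + 68)
    (stack_commit,) = struct.unpack_from("<I", image, opt + 76)
    (heap_commit,) = struct.unpack_from("<I", image, opt + 84)
    return {
        "characteristics": characteristics,
        "subsystem": subsystem,
        "stack_commit": stack_commit,
        "heap_commit": heap_commit,
    }
```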

6 Structural properties For each section in the Portable Executable, the following structural information is used: virtual address, raw size, and section characteristics.
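These three values sit in each 40-byte section header. A minimal sketch of reading them, assuming the standard PE/COFF section header layout, follows; the `Section` type and `parse_section_header` name are illustrative only.

```python
import struct
from typing import NamedTuple

# Name, VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# PointerToRelocations, PointerToLinenumbers, NumberOfRelocations,
# NumberOfLinenumbers, Characteristics
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")


class Section(NamedTuple):
    name: str
    virtual_address: int
    raw_size: int
    characteristics: int


def parse_section_header(raw: bytes) -> Section:
    """Pull the fields used for hashing out of one 40-byte section header."""
    (name, _vsize, vaddr, raw_size, _praw,
     _prel, _plin, _nrel, _nlin, chars) = SECTION_HEADER.unpack(raw[:40])
    return Section(name.rstrip(b"\0").decode("ascii", "replace"),
                   vaddr, raw_size, chars)
```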

7 Generation of hash values
hash[0] := characteristics[0…7] V characteristics[8…15]
hash[1] := subsystem[0…7] V subsystem[8…15]
hash[2] := stackcommit[0…7] V stackcommit[8…15] V stackcommit[24…31]
hash[3] := heapcommit[0…7] V heapcommit[8…15] V heapcommit[24…31]
'V' symbolizes the XOR operation.
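The XOR folding on this slide can be sketched as follows. The bit ranges are copied from the slide as written; treating bit 0 as the least significant bit is an assumption, and `bits` and `header_bytes` are hypothetical names.

```python
def bits(value: int, lo: int, hi: int) -> int:
    """Return bits lo..hi (inclusive) of value, bit 0 = least significant."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def header_bytes(characteristics: int, subsystem: int,
                 stack_commit: int, heap_commit: int) -> bytes:
    """Fold each header field into one byte with XOR, as on the slide."""
    return bytes([
        bits(characteristics, 0, 7) ^ bits(characteristics, 8, 15),
        bits(subsystem, 0, 7) ^ bits(subsystem, 8, 15),
        bits(stack_commit, 0, 7) ^ bits(stack_commit, 8, 15)
        ^ bits(stack_commit, 24, 31),
        bits(heap_commit, 0, 7) ^ bits(heap_commit, 8, 15)
        ^ bits(heap_commit, 24, 31),
    ])
```

For example, `header_bytes(0x0102, 0x0002, 0x01001000, 0x00001000)` folds the four fields into the bytes `03 02 11 10`.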

8 Generation of hash values Sub-hash
shash[0] := virtaddress[9…31]
shash[2] := rawsize[8…31]
shash[4] := characteristics[16…23] V characteristics[24…31]
shash[5] := kolmogorov[0…7] ⊂ ℕ
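A per-section sub-hash along these lines might look like the sketch below. It returns the four values as a tuple rather than packing them into bytes, because the exact byte layout is not clear from the slide. The `kolmogorov` term is approximated here by a linearly scaled bzip2 compression ratio, which is an assumption, as are all names used.

```python
import bz2
from typing import NamedTuple


class SectionFields(NamedTuple):
    virt_address: int      # bits 9..31 of the virtual address
    raw_size: int          # bits 8..31 of the raw size
    characteristics: int   # bits 16..23 XOR bits 24..31
    complexity: int        # scaled compression ratio in [0, 7]


def bits(value: int, lo: int, hi: int) -> int:
    """Return bits lo..hi (inclusive) of value, bit 0 = least significant."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def section_fields(virt_address: int, raw_size: int,
                   characteristics: int, data: bytes) -> SectionFields:
    """Compute the per-section values listed on the slide."""
    if data:
        ratio = min(len(bz2.compress(data)) / len(data), 1.0)
        complexity = min(int(ratio * 8), 7)  # assumed linear scaling
    else:
        complexity = 0
    return SectionFields(
        bits(virt_address, 9, 31),
        bits(raw_size, 8, 31),
        bits(characteristics, 16, 23) ^ bits(characteristics, 24, 31),
        complexity,
    )
```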

9 Advantages of this hash function Complexity is O(1). The final hash value is the SHA1 of the hash buffer, which makes collisions difficult to create. Hashes have a constant length despite the variable number of sections in the executables.
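The constant-length property can be seen in a small sketch: whatever the number of per-section sub-hashes appended to the buffer, the SHA1 digest stays 40 hex characters long. The buffer layout (header bytes followed by section bytes) and the `final_hash` name are assumptions.

```python
import hashlib
from typing import Iterable


def final_hash(header: bytes, section_subhashes: Iterable[bytes]) -> str:
    """SHA1 over the header bytes followed by each section's sub-hash bytes."""
    digest = hashlib.sha1()
    digest.update(header)
    for sub in section_subhashes:
        digest.update(sub)
    return digest.hexdigest()
```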

10 Entry Points and Imports The entry point value can easily be changed for each instance of a polymorphic specimen. Most packers specify misleading Import Address Tables, and the import information can likewise be changed without any noteworthy effort. Thus neither the entry point nor the imports are included in the hash function.

11 Evaluation
Cluster Size, Mwcollect Alliance, Arbor Networks
1: 710916543
2-9: 31654104
10-99: 549611
100-499: 7071
500-999: 194
1000-4999: 188
5000+: 72
peHash helps in clustering of polymorphic malware and also helps in detecting broken copies of already known threats.
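A tally over the same cluster-size ranges can be produced from a list of per-sample hashes with a few lines such as the following. The function names and the input format are illustrative assumptions.

```python
from collections import Counter

# Cluster-size ranges as they appear in the evaluation table.
BUCKETS = [
    (1, 1, "1"), (2, 9, "2-9"), (10, 99, "10-99"),
    (100, 499, "100-499"), (500, 999, "500-999"),
    (1000, 4999, "1000-4999"),
]


def bucket_label(size: int) -> str:
    """Return the table row that a cluster of the given size falls into."""
    for lo, hi, label in BUCKETS:
        if lo <= size <= hi:
            return label
    return "5000+"


def cluster_size_histogram(hashes) -> Counter:
    """Count how many clusters fall into each size range."""
    sizes = Counter(hashes)  # hash value -> number of samples
    return Counter(bucket_label(n) for n in sizes.values())
```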

12 Evaluation
File          MD5                   Size
diantz.exe    48734e9b45dca36e8a…   85504
makecab.exe   2740dc2fbefaddb891f…  85504
find.exe      09b4e22c86f7e9f1e5…   9216
print.exe     76b96ed5304319f208…   9216
subst.exe     77847ef3cec784b137…   9216
bootvrfy.exe  c2ab77d9dc66447dc1…   5120
comrereg.exe  908f0eda6a49625f98…   5120
dcomcnfg.exe  1178cd20b90936837d…   5120
Files in a broken cluster share the same size. They can be told apart only by looking at the actual code or imports, which peHash does not do.

13 Performance Analysis needs to be carried out for only one sample per peHash cluster. Performance is not related to binary size or section count.
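The idea of analyzing one sample per cluster can be sketched as a simple grouping step. The input mapping (sample name to peHash value) and both function names are hypothetical.

```python
from collections import defaultdict


def cluster_by_hash(samples: dict) -> dict:
    """Group sample names by their peHash value."""
    clusters = defaultdict(list)
    for name, pehash in samples.items():
        clusters[pehash].append(name)
    return dict(clusters)


def representatives(clusters: dict) -> dict:
    """Pick the first sample of every cluster for in-depth analysis."""
    return {pehash: names[0] for pehash, names in clusters.items()}
```

With three samples of which two share a hash, only two analyses are needed instead of three.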

14 Conclusion peHash provides a performant solution to the problem of seemingly new malware samples. peHash can accomplish correct clustering for large sets by using basic information from Portable Executables. peHash cannot be used to cluster variants of malware families for which code structure has to be analyzed.

15 Thank You

