Detecting software clones in binaries
Zaharije Radivojević, Saša Stojanović, Miloš Cvetanović, School of Electrical Engineering, Belgrade University

1 Detecting software clones in binaries
Zaharije Radivojević, Saša Stojanović, Miloš Cvetanović
School of Electrical Engineering, Belgrade University
14th Workshop “Software Engineering Education and Reverse Engineering”, Sinaia, Romania, 24-30 August 2014

2 Agenda
- Clone detection
- Binary code clones
- Metrics approach
- Conclusions

3 Motivation (1)
A motivating scenario is detecting the reuse of a software library in source code without appropriate permission from the library's owner.

4 Code clones
- Type-1: Identical code fragments (ignoring formatting)
- Type-2: Syntactically identical fragments (ignoring naming and formatting)
- Type-3: Copied fragments with further modifications (ignoring some statements, naming and formatting)
- Type-4: Two or more code fragments that perform the same computation
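The Type-1 and Type-2 definitions can be made concrete with a minimal, hypothetical Python sketch for C-like source code. Comments and whitespace are discarded before comparing token streams (Type-1), and identifiers and numeric literals are then replaced with placeholders (Type-2). The tokenizer, the small keyword set, and all function names are illustrative assumptions, not taken from the presentation; Type-3 and Type-4 clones would need similarity thresholds or semantic analysis and are not attempted here.

```python
import re

# Rough tokenizer for C-like code: identifiers/keywords, integers, single punctuation chars.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
# Line and block comments are removed before tokenizing (formatting is ignored).
COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)
# Illustrative keyword subset (assumption: not a complete C grammar).
C_KEYWORDS = {"int", "char", "void", "return", "if", "else", "for", "while"}


def tokens(code: str) -> list[str]:
    """Token stream with comments and whitespace dropped."""
    return TOKEN_RE.findall(COMMENT_RE.sub(" ", code))


def normalize_names(toks: list[str]) -> list[str]:
    """Replace identifiers with ID and integer literals with LIT (Type-2 view)."""
    out = []
    for t in toks:
        if t in C_KEYWORDS:
            out.append(t)
        elif re.fullmatch(r"[A-Za-z_]\w*", t):
            out.append("ID")
        elif t.isdigit():
            out.append("LIT")
        else:
            out.append(t)
    return out


def clone_type(a: str, b: str) -> str:
    """Classify a fragment pair as Type-1, Type-2, or 'other'."""
    ta, tb = tokens(a), tokens(b)
    if ta == tb:
        return "Type-1"
    if normalize_names(ta) == normalize_names(tb):
        return "Type-2"
    return "other"
```

For example, `clone_type("int add(int a, int b) { return a + b; }", "int sum(int x, int y) { return x + y; }")` returns `"Type-2"`, because the two fragments differ only in identifier names.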

5 Existing tools
Tools compared: SimCad, CCFinder, Deckard, ACD, Moss
- Supported languages: SimCad: C, C#, Java, Py; CCFinder: C/C++, C#, Cobol, Java, VB, Text; Deckard: C, Java, Php; ACD: C/C++; Moss: C/C++, C#, Cobol, Java, VB, MIPS, Text…
- Language in experiment: C (SimCad, CCFinder, Deckard, ACD); ASM (Moss)
- Comparison level: block, procedure; file
- Clone detection technique: text based, token based, AST based, text based (ASM generated from C), text based
- Types of detected clones: 1, 2, and 3
- Source code required (not available for a commercial product)

6 Motivation (2)
A motivating scenario is detecting the reuse of a software library in a commercial product's binary without appropriate permission from the library's owner.
- The source code has been transformed by a compiler (which compiler?)
- ARM architecture

7 Approach

8 Approach

9 Metrics
Metric’s description | Acronym | Value type | Measure type | Flow type
Number of all instructions | AIN | S | A | -
Number of all branches | ABN | S | A | C
Number of all calls | ACN | S | A | C
Number of all loops | APN | S | A | C
Number of all arithmetic instructions | AAN | S | A | D
Number of all logic instructions | ALN | S | A | D
Number of all data transfer instructions | ATN | S | A | D
Frequency of all branches | ABF | S | N | C
Frequency of all calls | ACF | S | N | C
Frequency of all loops | APF | S | N | C
Frequency of all arithmetic instructions | AAF | S | N | D
Frequency of all logic instructions | ALF | S | N | D
Frequency of all data transfer instructions | ATF | S | N | D
Number of occurrences for each instruction | EIN | V | A | -
Frequency of occurrences for each instruction | EIF | V | N | -
Number of occurrences for each target address in branches | EBN | V | A | C
Frequency of occurrences for each target address in branches | EBF | V | N | C
Number of occurrences for each target address in calls | ECN | V | A | C
Frequency of occurrences for each target address in calls | ECF | V | N | C
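The metrics in the table can be sketched in Python for a single disassembled function. This is a hypothetical illustration, not the authors' implementation: the input format (address, mnemonic, target), the ARM mnemonic-to-class mapping, approximating a loop by a backward branch, and normalizing every frequency by AIN are all assumptions, because the slide does not define them. The table codes are read here as S = scalar, V = vector, A = absolute count, N = normalized frequency, C = control flow, D = data flow; this legend is likewise an inference from the table.

```python
from collections import Counter

# Hypothetical ARM mnemonic classes (assumption: the slides do not give the mapping).
ARITHMETIC = {"add", "sub", "rsb", "mul", "adc", "sbc"}
LOGIC = {"and", "orr", "eor", "bic", "mvn"}
DATA_TRANSFER = {"ldr", "str", "ldm", "stm", "mov", "push", "pop"}
BRANCHES = {"b", "beq", "bne", "blt", "ble", "bgt", "bge"}
CALLS = {"bl", "blx"}


def compute_metrics(instrs):
    """Compute scalar (S) and vector (V) metrics for one function.

    instrs: list of (address, mnemonic, target) tuples; target is None
    when the instruction has no direct branch or call target.
    """
    ain = len(instrs)
    counts = Counter(m for _, m, _ in instrs)
    branches = [(a, t) for a, m, t in instrs if m in BRANCHES]
    calls = [t for _, m, t in instrs if m in CALLS]
    # Assumption: a loop is approximated by a backward branch (target <= address).
    loops = [a for a, t in branches if t is not None and t <= a]

    scalars = {
        "AIN": ain,
        "ABN": len(branches),
        "ACN": len(calls),
        "APN": len(loops),
        "AAN": sum(counts[m] for m in ARITHMETIC),
        "ALN": sum(counts[m] for m in LOGIC),
        "ATN": sum(counts[m] for m in DATA_TRANSFER),
    }
    # Assumption: each frequency metric is the matching count divided by AIN.
    for n_key in ("ABN", "ACN", "APN", "AAN", "ALN", "ATN"):
        scalars[n_key[:-1] + "F"] = scalars[n_key] / ain if ain else 0.0

    ebn = Counter(t for _, t in branches if t is not None)
    ecn = Counter(t for t in calls if t is not None)
    vectors = {
        "EIN": dict(counts),
        "EIF": {m: c / ain for m, c in counts.items()},
        "EBN": dict(ebn),
        "EBF": {t: c / ain for t, c in ebn.items()},
        "ECN": dict(ecn),
        "ECF": {t: c / ain for t, c in ecn.items()},
    }
    return scalars, vectors
```

Scalar metrics summarize a whole function in one number each, while the vector metrics (per-instruction and per-target histograms) keep finer detail that a later comparison step can use.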

10 Filters/Formulas
Filters:
- No filtering
- Adaptive filtering (based on previous knowledge)
- Interval filtering
Formulas:
- Arithmetic mean
- Geometric mean
- Harmonic mean
- Weighted functions (based on previous knowledge)
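The formula options can be illustrated with a minimal Python sketch that combines per-metric similarities between two functions into a single score. The per-metric similarity function, the reading of interval filtering as "keep only similarities inside [low, high]", and all function names are assumptions for illustration; the slides do not specify them. As the slide states, the weights of a weighted function would come from previous knowledge. Adaptive filtering is not shown.

```python
import math


def scalar_similarity(a: float, b: float) -> float:
    """Hypothetical per-metric similarity in [0, 1] for non-negative metric values."""
    if a == b:
        return 1.0
    return 1.0 - abs(a - b) / max(a, b)


def interval_filter(sims, low, high):
    """Assumed interpretation: keep only similarities inside [low, high]."""
    return [s for s in sims if low <= s <= high]


def arithmetic_mean(sims):
    return sum(sims) / len(sims)


def geometric_mean(sims):
    # A single zero similarity makes the product, and so the mean, zero.
    return math.prod(sims) ** (1.0 / len(sims))


def harmonic_mean(sims):
    # Any zero similarity drives the harmonic mean to zero.
    if any(s == 0 for s in sims):
        return 0.0
    return len(sims) / sum(1.0 / s for s in sims)


def weighted_mean(sims, weights):
    """Weights are assumed to come from previous knowledge (e.g. a training set)."""
    return sum(w * s for w, s in zip(weights, sims)) / sum(weights)
```

For per-metric similarities `[1.0, 0.5, 0.25]`, the arithmetic, geometric and harmonic means are about 0.583, 0.5 and 0.429. The harmonic mean is the most sensitive to a single poorly matching metric, which is one reason to compare several formulas.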

11 Results (STAMP + BusyBox)

12 Results (STAMP + BusyBox)
Support Vector Machines and k-nearest neighbors achieved much lower results.

13 Results (STAMP + BusyBox)

14 Conclusion
- Configurations with the newly introduced metrics achieve up to 1.44 times better recall than configurations that use only metrics from high-level languages.
- A comparison of the proposed approach with some existing clone detection tools shows that it achieves higher recall at an acceptable level of precision.
- Considering only the first position, on the real-world example (BusyBox) the proposed approach achieves a recall of 43% and a precision of 43%.
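The first-position figures can be read through a small Python sketch of precision and recall at rank k. It assumes, without confirmation from the slides, that the approach returns a ranked list of candidate functions for each query function and that scores are micro-averaged over all queries; the function and data names are hypothetical, and the toy data below are not the authors' results.

```python
def precision_recall_at_k(ranked, truth, k=1):
    """Micro-averaged precision and recall of the top-k candidates.

    ranked: query -> list of candidate functions, best match first.
    truth:  query -> set of candidates that are true clones of the query.
    """
    hits = retrieved = relevant = 0
    for query, true_set in truth.items():
        top = ranked.get(query, [])[:k]
        hits += sum(1 for c in top if c in true_set)
        retrieved += len(top)
        relevant += len(true_set)
    precision = hits / retrieved if retrieved else 0.0
    recall = hits / relevant if relevant else 0.0
    return precision, recall


# Toy data: three query functions, each with exactly one true clone.
ranked = {"f1": ["g1", "g2"], "f2": ["g3", "g2"], "f3": ["g1", "g3"]}
truth = {"f1": {"g1"}, "f2": {"g2"}, "f3": {"g3"}}
```

With this toy data, k = 1 gives precision and recall of 1/3 each, while k = 2 gives precision 0.5 and recall 1.0. When every query has exactly one true clone and k = 1, the number of retrieved candidates equals the number of relevant ones, so precision and recall coincide.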

15 Motivation (3) - final
A motivating scenario is detecting the use of a patent in a commercial product's binary without appropriate permission from the patent's owner.

16 Thank you! Radivojevic Zaharije