Embedded Lab. Park Yeongseong
◦ Introduction
◦ State of the art
◦ Core values
◦ Design
◦ Experiment
◦ Discussion
◦ Conclusion
◦ Q&A
Identifying the same or similar code is very important
Previous works
◦ Static source code comparison – C1
◦ Static executable code comparison – C2
◦ Dynamic control flow based methods – C3
◦ Dynamic API based methods – C4
Three highly desired requirements
◦ R1 – Resiliency
◦ R2 – Ability to work directly on binary executables
◦ R3 – Platform independence
However, no previous approach satisfies all three. Requirements each approach does not satisfy:
◦ Static source code comparison – C1: R1, R2
◦ Static executable code comparison – C2: R1
◦ Dynamic control flow based methods – C3: R1, R3
◦ Dynamic API based methods – C4: R3
Introduce new approach
◦ Core-values
5 optimization options (-O0 to -O3, -Os)
3 compilers (GCC, TCC, WCC)
KlassMaster, Thicket, Loco/Diablo obfuscators
Code Obfuscation Techniques
◦ Data obfuscation, control obfuscation, layout obfuscation, and preventive transformations
◦ Indirect branches, control-flow flattening, function-pointer aliasing
Static Analysis Based Plagiarism Detection
◦ String-based
◦ AST-based
◦ Token-based
◦ PDG-based
◦ Birthmark-based
Dynamic Analysis Based Plagiarism Detection
◦ Whole program path based (WPP)
◦ Sequence of API function calls birthmark (EXESEQ)
◦ Frequency of API function calls birthmark (EXEFREQ)
◦ System call based birthmark
Not all values associated with the execution of a program are core-values
◦ Value-updating instructions
◦ Related to the program’s semantics
To refine value sequences
◦ Sequential refinement – reduction rate 16% to 34%
◦ Optimization-based refinement – 5 optimization options
◦ Address removal – exclude pointer values
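The refinement steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not define the exact rules, so here sequential refinement is assumed to mean collapsing runs of identical consecutive values, and address removal is assumed to mean dropping values that fall inside known memory ranges. The function names and the `address_ranges` parameter are hypothetical.

```python
def remove_addresses(values, address_ranges):
    """Drop values that look like pointers.

    Assumption: a value is treated as an address when it falls inside one
    of the given half-open (low, high) ranges. The ranges are hypothetical.
    """
    return [
        v for v in values
        if not any(low <= v < high for low, high in address_ranges)
    ]


def drop_consecutive_repeats(values):
    """Collapse runs of identical consecutive values into a single value.

    Assumption: this stands in for sequential refinement, whose exact
    rule is not given on the slide.
    """
    refined = []
    for v in values:
        if not refined or refined[-1] != v:
            refined.append(v)
    return refined


def refine(values, address_ranges):
    """Apply address removal first, then collapse consecutive repeats."""
    return drop_consecutive_repeats(remove_addresses(values, address_ranges))


# Hypothetical trace: small integers are data, large ones are addresses.
trace = [5, 5, 0x7FFF0010, 5, 9, 9, 0x08049000, 3]
ranges = [(0x08048000, 0x0804A000), (0x7FFF0000, 0x80000000)]
refined_trace = refine(trace, ranges)
```

Removing addresses before collapsing repeats matters in this sketch: a pointer value sitting between two equal data values would otherwise hide the repetition.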
Intel Quad-Core 2.00 GHz CPU
4 GB RAM
Linux machine
QEMU
Questions
1. Resilient
2. False accusation
3. Credible
Obfuscation techniques
◦ SandMark, KlassMaster: Java bytecode obfuscators
Test application: JLex
◦ Lexical analyzer generator
Test application
◦ 5 individual XML parsers: expat, libxml2, Parsifal, rxp, xercesc
Test application
◦ bzip2, gzip, oggenc, 9 of 11 programs
Result
◦ Similarity scores between 0 and 0.27
◦ zip and gzip similarity scores are 1.0 (same compression algorithm: deflate)
◦ zip and bzip2 similarity scores are 0.01 to 0.03 (different compression algorithm: block sorting)
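The slide reports similarity scores between 0 and 1 but not the formula behind them. One plausible reading, sketched below, assumes the score is the length of the longest common subsequence (LCS) of two refined value sequences divided by the length of the plaintiff's sequence. Both the formula and the names `lcs_length` and `similarity` are assumptions made for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b.

    Standard dynamic programming that keeps only the previous row.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(plaintiff_values, suspect_values):
    """Assumed score: fraction of the plaintiff's values found, in order,
    in the suspect's value sequence."""
    if not plaintiff_values:
        return 0.0
    return lcs_length(plaintiff_values, suspect_values) / len(plaintiff_values)


# Hypothetical refined value sequences.
identical = similarity([1, 2, 3, 4], [1, 2, 3, 4])
partial = similarity([1, 2, 3, 4], [9, 1, 8, 3, 7])
unrelated = similarity([1, 2], [3, 4])
```

Under this reading, a score of 1.0 means the plaintiff's entire value sequence appears in order in the suspect's sequence, and a score near 0 means almost nothing is shared.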
Introduce a novel approach to dynamic characterization of executable programs.
The value-based method successfully discriminates 34 plagiarisms produced by SandMark, KlassMaster, and Thicket.