Detecting software clones in binaries


Detecting software clones in binaries Zaharije Radivojević, Saša Stojanović, Miloš Cvetanović School of Electrical Engineering, Belgrade University 14th Workshop “Software Engineering Education and Reverse Engineering” Sinaia, Romania August 2014

Agenda
- Clone detection
- Binary code clones
- Metrics approach
- Conclusions

Motivation (1)
A motivating scenario is detecting that a software library has been reused in source code without appropriate permission from the library's owner.

Code clones
- Type-1: Identical code (ignoring formatting)
- Type-2: Syntactically identical fragments (ignoring naming and formatting)
- Type-3: Copied fragments with further modifications (ignoring some statements, naming, and formatting)
- Type-4: Two or more code fragments that perform the same computation
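To make the first two clone types concrete, here is a minimal Python sketch. It is only an illustration and not how any of the tools in the talk work: the tokenizer, the tiny keyword set, and the three C-like fragments are all assumptions made up for this example. Every identifier is mapped to the same placeholder, which is a loose normalization that does not check whether the renaming is consistent.

```python
import re

# Small, deliberately incomplete keyword set for the C-like fragments below (assumption).
KEYWORDS = {"int", "for", "if", "else", "while", "return"}
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")
IDENT = re.compile(r"[A-Za-z_]\w*")

def tokens(code):
    """Split code into tokens; whitespace and line breaks are dropped."""
    return TOKEN.findall(code)

def normalized(code):
    """Like tokens(), but every non-keyword identifier becomes the placeholder 'ID'."""
    return ["ID" if IDENT.fullmatch(t) and t not in KEYWORDS else t
            for t in tokens(code)]

# Hypothetical fragments: same loop, reformatted, then renamed.
original = "int sum = 0; for (int i = 0; i < n; i++) sum += a[i];"
reformatted = "int sum=0;\nfor (int i=0; i<n; i++)\n    sum += a[i];"
renamed = "int total = 0; for (int k = 0; k < len; k++) total += v[k];"

# Type-1: the token streams match once formatting is ignored.
print(tokens(original) == tokens(reformatted))
# Renaming changes the raw token stream...
print(tokens(original) == tokens(renamed))
# ...but the fragments match again after identifier normalization (Type-2).
print(normalized(original) == normalized(renamed))
```

Type-3 and Type-4 clones cannot be caught by an exact comparison like this, since statements are added or removed (Type-3) or the code is written differently altogether (Type-4).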

Existing tools
Property | SimCad | CCFinder | Deckard | ACD | Moss
Supported languages | C, C#, Java, Py | C/C++, C#, Cobol, Java, VB, Text | C, Java, Php | C/C++ | C/C++, C#, Cobol, Java, VB, MIPS, Text…
Language in experiment | C | C | C | C | ASM
Clone detection technique | text based | token based | AST based | text based (ASM generated from C) | text based
- Comparison level: block, procedure, file
- Types of detected clones: 1, 2, and 3
- Source code: required, but not available for a commercial product

Motivation (2)
A motivating scenario is detecting that a software library has been reused in a commercial product's binary without appropriate permission from the library's owner.
- Source code is transformed by a compiler (which compiler?)
- ARM architecture

Approach
[Two slides of diagrams; the figures are not reproduced in the transcript.]

Metrics
Metric description | Acronym | Value type | Measure type | Flow type
Number of all instructions | AIN | S | A | -
Number of all branches | ABN | S | A | C
Number of all calls | ACN | S | A | C
Number of all loops | APN | S | A | C
Number of all arithmetic instructions | AAN | S | A | D
Number of all logic instructions | ALN | S | A | D
Number of all data transfer instructions | ATN | S | A | D
Frequency of all branches | ABF | S | N | C
Frequency of all calls | ACF | S | N | C
Frequency of all loops | APF | S | N | C
Frequency of all arithmetic instructions | AAF | S | N | D
Frequency of all logic instructions | ALF | S | N | D
Frequency of all data transfer instructions | ATF | S | N | D
Number of occurrences for each instruction | EIN | V | A | -
Frequency of occurrences for each instruction | EIF | V | N | -
Number of occurrences for each target address in branches | EBN | V | A | C
Frequency of occurrences for each target address in branches | EBF | V | N | C
Number of occurrences for each target address in calls | ECN | V | A | C
Frequency of occurrences for each target address in calls | ECF | V | N | C
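As a rough illustration of how some of these metrics could be computed for one procedure, the Python sketch below counts instruction classes in a list of mnemonics. The slide does not say how the authors classify instructions or define "frequency", so two things here are assumptions: the small ARM mnemonic groups are hypothetical and far from complete, and frequency is taken to be the count divided by the total number of instructions. Loop metrics (APN, APF) and target-address metrics (EBN, EBF, ECN, ECF) are left out because they need control-flow and operand information.

```python
from collections import Counter

# Hypothetical, incomplete ARM mnemonic groups (assumption, not from the talk).
BRANCHES = {"b", "beq", "bne", "blt", "bgt"}
CALLS = {"bl", "blx"}
ARITHMETIC = {"add", "sub", "mul", "rsb"}
LOGIC = {"and", "orr", "eor", "bic"}
TRANSFER = {"mov", "ldr", "str", "push", "pop"}

GROUPS = {"ABN": BRANCHES, "ACN": CALLS, "AAN": ARITHMETIC,
          "ALN": LOGIC, "ATN": TRANSFER}

def procedure_metrics(mnemonics):
    """Compute a subset of the slide's metrics for one procedure."""
    total = len(mnemonics)
    counts = Counter(mnemonics)
    metrics = {"AIN": total}
    for name, group in GROUPS.items():
        n = sum(c for m, c in counts.items() if m in group)
        metrics[name] = n
        # Assumed definition: frequency = count / number of all instructions.
        metrics[name[:2] + "F"] = n / total if total else 0.0
    metrics["EIN"] = dict(counts)  # one count per distinct instruction
    metrics["EIF"] = {m: c / total for m, c in counts.items()}
    return metrics

# Made-up instruction sequence for demonstration only.
example = ["push", "mov", "ldr", "add", "cmp", "bne", "bl", "pop"]
m = procedure_metrics(example)
print(m["AIN"], m["ABN"], m["ABF"], m["ATN"], m["ATF"])
```

Note that `cmp` falls into none of the groups above; a real classifier would need a complete mapping for the target instruction set.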

Filters/Formulas
Filters:
- No filtering
- Adaptive filtering (based on previous knowledge)
- Interval filtering
Formulas:
- Arithmetic mean
- Geometric mean
- Harmonic mean
- Weighted functions (based on previous knowledge)
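The formulas listed above are standard means. The transcript does not state what exactly is averaged, so the Python sketch below assumes, purely for illustration, that each metric yields a similarity score in (0, 1] and that the formula combines these per-metric scores into a single value. The scores and weights are made up.

```python
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # Computed via logarithms; requires strictly positive values.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def harmonic_mean(xs):
    # Requires strictly positive values.
    return len(xs) / sum(1.0 / x for x in xs)

def weighted_mean(xs, ws):
    # One possible "weighted function"; the weights are hypothetical.
    return sum(w * x for x, w in zip(xs, ws)) / sum(ws)

# Made-up per-metric similarity scores for one pair of procedures.
scores = [0.9, 0.6, 0.3]
weights = [3.0, 2.0, 1.0]

print(arithmetic_mean(scores))
print(geometric_mean(scores))
print(harmonic_mean(scores))
print(weighted_mean(scores, weights))
```

For positive inputs the harmonic mean is never larger than the geometric mean, which is never larger than the arithmetic mean, so the choice of formula controls how strongly a single low score pulls the combined value down.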

Results (STAMP + BusyBox)
[Three slides of result charts; the figures are not reproduced in the transcript.]
Support Vector Machines and k-nearest neighbors gave much lower results.

Conclusion
- Configurations with the newly introduced metrics achieve up to 1.44 times better recall than configurations that use only metrics from high-level languages.
- Comparison of the proposed approach with some clone detection tools shows that it achieves higher recall at an acceptable level of precision.
- Observing only the first position, for the real-world example (BusyBox), the proposed approach achieves a recall of 43% and a precision of 43%.
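The phrase "observing only the first position" suggests an evaluation over ranked candidate lists. The transcript does not describe the evaluation protocol, so the Python sketch below is only one plausible reading: for each query procedure the tool returns a ranked list of candidates, and only the top candidate is counted as reported. The function name, procedure names, and rankings are all made up.

```python
def precision_recall_at_1(rankings, truth):
    """rankings: query -> ranked candidate list; truth: query -> set of true clones."""
    reported = correct = relevant = 0
    for query, true_clones in truth.items():
        relevant += len(true_clones)
        ranked = rankings.get(query, [])
        if ranked:
            reported += 1  # only the first position is observed
            if ranked[0] in true_clones:
                correct += 1
    precision = correct / reported if reported else 0.0
    recall = correct / relevant if relevant else 0.0
    return precision, recall

# Hypothetical data: three queries, each with exactly one true clone.
truth = {"f1": {"lib_a"}, "f2": {"lib_b"}, "f3": {"lib_c"}}
rankings = {
    "f1": ["lib_a", "lib_x"],  # correct at the first position
    "f2": ["lib_y", "lib_b"],  # true clone only at the second position
    "f3": ["lib_c"],           # correct at the first position
}

precision, recall = precision_recall_at_1(rankings, truth)
print(precision, recall)
```

Under this reading, when every query reports exactly one candidate and has exactly one true clone, precision and recall coincide. That would be consistent with the equal 43% figures, but it is an inference rather than something the slides state.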

Motivation (3) - final
A motivating scenario is detecting that a patent has been used in a commercial product's binary without appropriate permission from the patent's owner.

Thank you! Radivojevic Zaharije