Framework for Safe Reuse Of Software Binaries Ramakrishnan Venkitaraman Advisor: Gopal Gupta The University of Texas at Dallas 11/15/2004.

Presentation transcript:

Framework for Safe Reuse Of Software Binaries Ramakrishnan Venkitaraman Advisor: Gopal Gupta The University of Texas at Dallas 11/15/2004

Software Reuse & System Integration But, the Integrated System does not work Cost of Project Companies

Outline Need for Reusable Software Binaries Our Framework for Reuse of Software Binaries XDAIS Standard for Software Standardization Automated tool to enforce standard compliance

Need for reusable software binaries Incompatibilities make integration difficult Complexity in software reuse COTS Marketplace Time to Market

Scope of the Framework Gives the sufficient conditions for software binary code reusability Usability vs. Reusability Usability is a precondition for reusability E.g. Array index out of bound reference

Conditions to ensure reusability
C1: The binary code should not change during execution in a way that makes link-time symbol resolution invalid.
C2: The binary code should not be written so that it must be located starting from some fixed location in virtual memory.

Broadening the Conditions C1 and C2 are hard to characterize and even harder to detect So, broaden the conditions C1 and C2 to get conditions C3 and C4

Framework to ensure reusability C3: The binary code is re-entrant No self-modifying code Should not make symbol resolution invalid C4: The binary code does not contain any hard-wired memory addresses Binaries should not be assumed to be located at a fixed virtual memory location

TI XDAIS Standard Contains 35 rules and 15 guidelines SIX General Programming Rules No tool currently exists to check for compliance We want to build a tool to ENFORCE software compliance for these rules

XDAIS – General Programming Rules
1) All programs should follow the runtime conventions of TI’s C programming language
2) Algorithms must be re-entrant
3) No hard coded data memory locations
4) No hard coded program memory locations
5) Algorithms must characterize their ROM-ability
6) No peripheral device accesses

Advantages Of Compliant Code Allows system integrators to easily migrate between TI DSP chips Subsystems from multiple software vendors can be integrated into a single system Programs are framework-agnostic: the same program can be efficiently used in virtually any application

XDAIS vs. Our Framework
Rule 1 is not really a programming rule, since it requires compliance with TI's definition of the C language.
Rules 2 through 6 are manifestations of conditions C3 and C4 above.
Rules 2 and 5 correspond to condition C3.
Rules 3, 4, and 6 correspond to condition C4.

Problem and Solution Problem: Detection of hard coded addresses in programs without accessing source code. Solution: “Static Program Analysis of Assembly Code”

Some examples showing hardcoding

Example 1: Directly Hardcoded
void main() {
    int *p = 0x8800;
    // Some code
    *p = …;
}

Example 2: Indirectly Hardcoded
void main() {
    int *p = 0x80;
    int *q = p;
    // Some code
    *q = …;
}

Example 3: Conditional Hardcoding
void main() {
    int *p, val;
    p = ….;
    val = …;
    if (val)
        p = 0x900;
    else
        p = malloc(…);
    *p;
}

NOTE: We don’t care if a pointer is hard coded and is never dereferenced.

Interest in Static Analysis “We actually went out and bought for 30 million dollars, a company that was in the business of building static analysis tools and now we want to focus on applying these tools to large-scale software systems ” Remarks by Bill Gates, 17th Annual ACM Conference on Object-Oriented Programming, Systems, Languages and Application, November 2002.

Static Analysis Defined as any analysis of a program carried out without completely executing the program. Undecidability: it is impossible to build a tool that will precisely detect hard coding.

Hard Coded Addresses Bad Programming Practice. Results in non relocatable code. Results in non reusable code.

Overview Of Our Approach Input: Object Code of the Software Output: Compliant or Not Compliant status Activity Diagram for our Static Analyzer Disassemble Object Code Split Into Functions Obtain Basic Blocks Obtain Flow Graph Static Analysis Output the Result
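The "Obtain Basic Blocks" step can be illustrated with the textbook leader-marking rule. This is a minimal sketch, not the tool's implementation: the `Insn` record and `mark_leaders` are hypothetical names, branch targets are assumed to be already resolved to instruction indices, and branch delay slots are ignored.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction record (an assumption for this sketch). */
typedef struct {
    bool is_branch;   /* true if the instruction transfers control */
    int  target;      /* index of the branch target, or -1 if none/unknown */
} Insn;

/* Sets leader[i] = true for every instruction that starts a basic block
   and returns the number of basic blocks found. */
size_t mark_leaders(const Insn *code, size_t n, bool *leader)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        leader[i] = false;
    if (n > 0)
        leader[0] = true;                         /* function entry */
    for (size_t i = 0; i < n; i++) {
        if (!code[i].is_branch)
            continue;
        if (code[i].target >= 0 && (size_t)code[i].target < n)
            leader[code[i].target] = true;        /* branch target */
        if (i + 1 < n)
            leader[i + 1] = true;                 /* instruction after a branch */
    }
    for (size_t i = 0; i < n; i++)
        if (leader[i])
            count++;
    return count;
}
```

Each basic block then runs from one leader up to the instruction before the next leader, and the flow graph connects a block to its branch target and fall-through successor.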

Basic Aim Of Analysis Find a path to trace pointer origin. Problem: Exponential Complexity Static Analysis approximation makes it linear

Analyzing Source Code – Easy { { q } } { { p } } P IS HARD CODED So, the program is not compliant with the standard

Analyzing Assembly Code is Hard Problem No type information is available Instruction level pipeline and parallelism Solution Backward analysis Use Abstract Interpretation

Analyzing Assembly – Hard

A0 main:
A0 07BD09C2 SUB.D2 SP,0x8,SP
A4 020FA02A MVK.S2 0x1f40,B
A8 023C22F6 STW.D2T2 B4,*+SP[0x1]
AC NOP
B0 023C42F6 STW.D2T2 B4,*+SP[0x2]
B NOP
B8 0280A042 MVK.D2 5,B
BC F6 STW.D2T2 B5,*+B4[0x0]
C NOP
C4 008C8362 BNOP.S2 B3,
C8 07BD0942 ADD.D2 SP,0x8,SP
CC NOP
D NOP

{{ }} { { B4 } }
B4 = 0x1f40. So, B4 is HARD CODED. Code is NOT Compliant.

Abstract Interpretation Based Analysis Domains from which variables draw their values are approximated by abstract domains The original domains are called concrete domains

Lattice Abstraction Lattice-based abstraction is used to determine pointer hard-codedness.

Contexts Contexts to Abstract Contexts Abstract Context to Context

Phases In Analysis Phase 1: Find the set of dereferenced pointers Phase 2: Check the safety of dereferenced pointers

Building Unsafe Sets (Phase 1)
The first element is added to the unsafe set during pointer dereferencing. E.g., if “*Reg” appears in the disassembled code, the unsafe set is initialized to {Reg}.
‘N’ pointers dereferenced ⇒ ‘N’ unsafe sets, maintained as SOUS (Set Of Unsafe Sets).
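A minimal sketch of Phase 1, assuming each unsafe set is stored as a bitmask over register numbers 0..31. The `Insn` record, the `Sous` structure and `build_sous` are hypothetical names used only for illustration.

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_SETS 32

/* Hypothetical record: deref_reg is the register used as a pointer by this
   instruction (e.g. 4 for "*B4"), or -1 if nothing is dereferenced. */
typedef struct { int deref_reg; } Insn;

/* Set Of Unsafe Sets: each entry is one unsafe set, kept as a register bitmask. */
typedef struct {
    uint32_t sets[MAX_SETS];
    size_t   count;
} Sous;

/* Phase 1: create one singleton unsafe set {Reg} per dereference "*Reg". */
void build_sous(const Insn *code, size_t n, Sous *out)
{
    out->count = 0;
    for (size_t i = 0; i < n && out->count < MAX_SETS; i++) {
        int r = code[i].deref_reg;
        if (r >= 0 && r < 32)
            out->sets[out->count++] = UINT32_C(1) << r;
    }
}
```

With two dereferences in the scanned code, `out->count` ends up as 2, matching the "N pointers, N unsafe sets" rule.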

Populating Unsafe Sets (Phase 2) For example, if Reg = reg1 + reg2, the element “Reg” is deleted from the unsafe set, and the elements “reg1” and “reg2” are inserted into the unsafe set. The contents of the unsafe set now become {reg1, reg2}.
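The update rule can be sketched as a single backward step over one instruction. This is an illustration under assumptions rather than the tool's code: `OpKind`, `Insn` and `step_backward` are hypothetical, the unsafe set is a bitmask over registers 0..31, and loading a constant into a tracked register is taken as the point where hard coding is reported.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { OP_CONST, OP_COPY, OP_ADD, OP_OTHER } OpKind;

/* Hypothetical abstract instruction: dst = f(src1, src2). */
typedef struct {
    OpKind kind;
    int dst;         /* destination register, 0..31 */
    int src1, src2;  /* source registers; unused ones are -1 */
} Insn;

static uint32_t bit(int r)
{
    return (r >= 0 && r < 32) ? UINT32_C(1) << r : 0;
}

/* One backward step: returns the updated unsafe set. *hard_coded is set to
   true when a register in the unsafe set is defined by a constant. */
uint32_t step_backward(uint32_t unsafe, Insn in, bool *hard_coded)
{
    if (in.kind == OP_OTHER || !(unsafe & bit(in.dst)))
        return unsafe;              /* no tracked register is defined here */
    unsafe &= ~bit(in.dst);         /* delete the defined register */
    switch (in.kind) {
    case OP_CONST: *hard_coded = true;                     break;
    case OP_COPY:  unsafe |= bit(in.src1);                 break;
    case OP_ADD:   unsafe |= bit(in.src1) | bit(in.src2);  break;
    default:                                               break;
    }
    return unsafe;                  /* sources are now the tracked origins */
}
```

Applied to the earlier assembly listing, starting from { B4 } at the store through B4 and walking backward, the constant move of 0x1f40 into B4 is the step that would set `*hard_coded`.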

Pointer Arithmetic All pointer operations are abstracted during analysis

Handling Loops Complex: # iterations of loop may not be known until runtime. Cycle the loop until the unsafe set reaches a “fixed point”. No new information is added to the unsafe set during successive iterations.
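A minimal sketch of the fixed-point iteration, assuming a loop body made only of register copies and an unsafe set stored as a bitmask over registers 0..31. `Copy`, `body_backward` and `loop_fixed_point` are hypothetical names.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical register copy "dst = src" (register numbers 0..31). */
typedef struct { int dst, src; } Copy;

/* Backward transfer over one pass of the loop body, last instruction first. */
static uint32_t body_backward(uint32_t set, const Copy *body, size_t n)
{
    for (size_t i = n; i-- > 0; ) {
        uint32_t d = UINT32_C(1) << body[i].dst;
        if (set & d)
            set = (set & ~d) | (UINT32_C(1) << body[i].src);
    }
    return set;
}

/* Unsafe set at the loop head, given the unsafe set at the loop exit. */
uint32_t loop_fixed_point(uint32_t at_exit, const Copy *body, size_t n)
{
    uint32_t set = at_exit;
    for (;;) {
        /* Merge "leave the loop now" with "run the body once more". */
        uint32_t next = set | body_backward(set, body, n);
        if (next == set)
            return set;   /* fixed point: no new information was added */
        set = next;
    }
}
```

Because the set can only grow and there are finitely many registers, the iteration terminates without knowing the runtime trip count.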

Merging Information
If no merging, then exponential complexity. Merging is mandatory when loops are present. Merging causes information loss.
[Figure: flow graph with Block A, If (Cond) Then Block B Else Block C, Block D, Block E]
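One way to picture merging, assuming unsafe sets are register bitmasks and the merge operator is set union (an assumption; the slides do not state the operator). In a backward analysis the sets computed for the successor blocks are combined into one set at the branch point. `merge_successors` is a hypothetical name.

```c
#include <stdint.h>
#include <stddef.h>

/* Union of the unsafe sets flowing back from each successor block. */
uint32_t merge_successors(const uint32_t *succ_sets, size_t n)
{
    uint32_t merged = 0;
    for (size_t i = 0; i < n; i++)
        merged |= succ_sets[i];
    return merged;
}
```

A single merged set replaces one set per path, which keeps the work proportional to the number of blocks instead of the number of paths. The cost is that the merged set no longer records which path contributed which register.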

Extensive Compliance Checking Handle all cases occurring in programs Single pointer, double pointer, triple pointer… Global pointer variables Static and Dynamic arrays

Extensive Compliance Checking Loops – all forms (e.g. for, while…) Function calls Pipelining and Parallelism Merging information from multiple paths

Proof – Analysis is Sound Consistency of the α and γ functions is established by showing the existence of a Galois connection. That is, x = α(γ(x)) and y ∈ γ(α(y)).
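The two conditions can be written out as follows, assuming x ranges over abstract contexts, y over concrete contexts, α is the abstraction function and γ maps an abstract context to the set of concrete contexts it represents (this reading of the symbols is an assumption based on the Contexts slide):

```latex
\alpha(\gamma(x)) = x
\qquad\text{and}\qquad
y \in \gamma(\alpha(y))
```

The first condition says that concretizing and then abstracting loses nothing. The second says that abstracting a concrete context never excludes that context, which is what makes the approximation safe.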

Analysis Results
Table columns: Program, # Lines, # * Ptrs, # Hard Coded, Chain Length, Running Time (ms)
Programs analyzed: t_read, timer, mcbsp, figtest, m_hdrv, dat, gui_codec, codec, stress, demo

Sample Code

Fig. Flow Graph

Related Work UNO Project – Bell Labs Analyze at source level TI XDAIS Standard Contains 35 rules and 15 guidelines. SIX General Programming Rules. No tool currently exists to check for compliance.

Current Status and Future Work Prototype Implementation done But, context insensitive, intra-procedural Extend to context sensitive, inter-procedural. Extend compliance check for other rules.

So… Hard Coding is a bad programming practice Non relocatable/reusable code Automatically checking code for compatibility at assembly level is possible A Static Analysis based technique is useful and practical

Software Reuse & System Integration WOW!!!! It works… Select ONLY Compliant Software

More Information
1. R. Venkitaraman and G. Gupta, Static Program Analysis of Embedded Executable Assembly Code. Compilers, Architecture, and Synthesis for Embedded Systems (ACM CASES), September
2. R. Venkitaraman and G. Gupta, Framework for Safe Reuse of Software Binaries. ICDCIT, December
3. Masters Thesis Report – R. Venkitaraman, A Framework for Automatic Reusability Analysis Of Software Binaries, The University of Texas at Dallas

Questions…