Static Code Analysis
What it is and does
Copyright © 2016 Curt Hill



Introduction
A static code analyzer is a program that examines source code or the resulting binary
It attempts to find potential problems in the code
Static means the analyzer does its checking without running the program
  – The alternative is a dynamic analyzer
This is a handy tool because of two factors:
  – Programmers
  – Compilers

Programmers
Writing a bug-free program is nearly impossible
The complexity is too great
Thus programmers introduce errors:
  – In the initial writing
  – In a subsequent revision

Compilers
Compilers are mostly interested in parsing and generating code
  – They do extensive static analysis
  – The errors they detect are mostly a consequence of their goal of generating object code
There are types of errors that they cannot detect, or cannot detect perfectly
  – Generally, if they cannot always find an error, they never look for it

Compiler Static Analysis
One common form of static analysis that a compiler does is identifying blocks of code in order to optimize register usage
Suppose that this code is seen:
  x = x * 2;
On a general register machine the following actions need to be generated:
  – x is loaded into a register
  – The register is multiplied by 2
  – The result is saved in x
  – That is 3 machine language operations

Example
Consider the following code:
  if(a>3)
    a = a*2+b;
  b = b / 5;
Since a is likely loaded into a register by the if, there is no need for a further load for the assignment
  – No code between the test and the assignment could change a or reuse the register
However, no assumption can be made about b being in a register at the last statement, since the assignment that loads b may have been skipped

In Contrast
Suppose this code:
  int a, b, c;
  cin >> b;
  c = 2*a / b;
Generally the compiler has no reason to complain that a is used before it is initialized
  – Detecting this does not help with its code generation process
  – Some compilers actually do complain, but not all
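
The check that the compiler may skip here can be sketched in a few lines. This is only a minimal illustration under strong assumptions, not how any real analyzer works: the program is straight-line code (no branches or loops), and each statement is described by the variable it writes and the variables it reads. The names Stmt, findUninitializedUses and slideExample are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// One straight-line statement: the variable it assigns (empty if none)
// and the variables it reads. Hypothetical model for illustration only.
struct Stmt {
    std::string writes;
    std::vector<std::string> reads;
};

// Returns the names of variables that are read before any statement
// has written them, in order of first offending use.
std::vector<std::string> findUninitializedUses(const std::vector<Stmt>& program) {
    std::set<std::string> initialized;
    std::set<std::string> reported;
    std::vector<std::string> problems;
    for (const Stmt& s : program) {
        // Within one statement the reads happen before the write.
        for (const std::string& name : s.reads) {
            if (initialized.count(name) == 0 && reported.insert(name).second) {
                problems.push_back(name);
            }
        }
        if (!s.writes.empty()) {
            initialized.insert(s.writes);
        }
    }
    return problems;
}

// The code from the slide: int a, b, c; cin >> b; c = 2*a / b;
std::vector<Stmt> slideExample() {
    return {
        {"b", {}},          // cin >> b;     writes b
        {"c", {"a", "b"}},  // c = 2*a / b;  reads a and b, writes c
    };
}
```

Calling findUninitializedUses(slideExample()) reports only a, because b is written by the input statement before it is read. Real code has branches and loops, which is where such a check stops being exact.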

Language Levels
FORTRAN did not require variable declarations
When a variable name was used, the compiler determined the type from the first letter of the name
The problem is that a misspelled variable name was not detected at compile time, only at run time
C requires declaration
  – This turns a run-time error into a compile-time error
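
The first-letter rule can be written down directly. The sketch below assumes the classic default FORTRAN convention, in which undeclared names beginning with I through N are INTEGER and all others are REAL. The function name implicitFortranType is hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Implicit type of an undeclared FORTRAN name under the classic default
// rule: first letter I through N means INTEGER, anything else means REAL.
std::string implicitFortranType(const std::string& name) {
    assert(!name.empty());
    char first = static_cast<char>(
        std::toupper(static_cast<unsigned char>(name[0])));
    return (first >= 'I' && first <= 'N') ? "INTEGER" : "REAL";
}
```

Under this rule, misspelling TOTAL as TOTL does not produce an error. The compiler quietly creates a second REAL variable named TOTL, and the mistake shows up only as a wrong answer at run time.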

Other Errors
Unfortunately, not all run-time errors can be eliminated by good language design
  – The uninitialized variable is an example
Some errors cannot be found until run time
  – There is no way static analysis can find them
The most famous is the halting problem
  – It has an elegant proof as well

The Halting Problem
All we want to know is whether a particular program will terminate
  – We do not care if it does what it should
  – We only want to know if there is an infinite loop
Moreover, we want to know this without actually running the program
  – A run may be extremely long even if the program does halt

Setup
What we would like is a function
  – Boolean Halt(program)
The function takes a program as its parameter
  – Usually source code, but it could be object code
It produces a Boolean as a result
  – True means the program stops
  – False means it loops forever
Alas, it is provably impossible to write such a function

Proof By Contradiction
Assume that Halt does exist
I now write a program that looks like this:
  if(Halt(x))
    while (true) cout << "You lose";
  else
    cout << "You still lose";
Finally I feed this program into Halt as x
  – If Halt says it will stop, it runs an infinite loop
  – If Halt says it will loop forever, it prints once and stops
  – Either way Halt is wrong, so Halt cannot exist
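
The case analysis can be checked mechanically. The sketch below is not a proof and does not implement Halt, which cannot exist. It only models what the contrary program does for each answer a supposed Halt could give about it. Both function names are hypothetical.

```cpp
#include <cassert>

// What the contrary program from the slide actually does, given the
// answer the supposed Halt function returns about that same program.
// Returns true if the contrary program halts, false if it loops forever.
bool contraryProgramHalts(bool haltSaysItStops) {
    if (haltSaysItStops) {
        return false;  // it enters while(true) and never stops
    }
    return true;       // it prints once and stops
}

// True if the supposed Halt answer matches the program's real behavior.
bool oracleIsCorrect(bool haltSaysItStops) {
    return haltSaysItStops == contraryProgramHalts(haltSaysItStops);
}
```

oracleIsCorrect returns false for both true and false, so no answer Halt could give about this program is right.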

Results
The generalization of this result is that there are many things that you cannot tell about a program without running it
  – If we could tell what a program did without running it, why would we run it?
Most often, if a compiler cannot detect an error all the time, it does not detect the error at all

In Contrast
So is it hopeless?
  – Not at all
Clearly if we see code like:
  for (int j=0; j>0; j--) …
we could complain that the loop body will never run
Similarly:
  while(k<m) m *= 2;
is also a problem: once k<m holds for a non-negative m, doubling m can never make it false
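
A check for the first kind of loop can be sketched for a deliberately narrow case. The assumptions are that the loop has the form for (int j = init; j < bound or j > bound; j += step), that init, bound and step are known constants, and that integer overflow is ignored. It does not model the while loop above. All type and function names are hypothetical.

```cpp
#include <cassert>

enum class Cmp { Less, Greater };
enum class LoopVerdict { NeverExecutes, MayNotTerminate, LooksFine };

// Hypothetical description of: for (int j = init; j <cmp> bound; j += step)
struct CountedLoop {
    long init;
    Cmp cmp;
    long bound;
    long step;
};

LoopVerdict checkLoop(const CountedLoop& loop) {
    // Does the condition hold on entry?
    bool holdsAtStart = (loop.cmp == Cmp::Less) ? loop.init < loop.bound
                                                : loop.init > loop.bound;
    if (!holdsAtStart) {
        return LoopVerdict::NeverExecutes;
    }
    // The condition holds, so the step must move j toward the bound.
    bool approaches = (loop.cmp == Cmp::Less) ? loop.step > 0
                                              : loop.step < 0;
    return approaches ? LoopVerdict::LooksFine
                      : LoopVerdict::MayNotTerminate;
}
```

The slide's loop is CountedLoop{0, Cmp::Greater, 0, -1}, which checkLoop classifies as NeverExecutes. A loop that counts up from 0 with step -1 is classified as MayNotTerminate.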

Static Code Analyzer
Static code analyzers are not perfect
  – Nobody said they were
They may miss certain errors
  – Even of a type they are looking for
They may flag things as errors that are not
  – These are the false positives
Still, if they find any bugs then they are worth using
  – What they find, we do not have to find

PFORT
The earliest of these seems to be the Portable FORTRAN (PFORT) verifier
Published in 1974
It checked parameter passing and COMMON block usage in FORTRAN programs
It was intended to make it easier to move a FORTRAN program from one machine to another

LINT
Another early one is LINT
  – Not an acronym
It finds problems in C files:
  – Uninitialized variables
  – Division by zero
  – Constant conditions
  – Calculations whose result is likely to be outside the range of values representable in the type used
First released outside of Bell Labs in about 1979

Languages
Static analyzers must be specific to the language they analyze
  – They are typically built around a parser for that language
Most production languages have these
Consider some examples

Examples

Language      Applications
Ada           CodePeer, Fluctuat
C, C++        BLAST, Cppcheck, cpplint, Clang
Java          Checkstyle, JArchitect, SourceMeter
JavaScript    ESLint, JSCS
PHP           RIPS
PowerBuilder  PB Code Analyzer
Python        Pylint

Overlap
A static analyzer is not a glorified compiler
  – They do have much in common
Both must have a scanner and parser
They are typically looking for different things
A static analyzer may indicate that a reference parameter is never changed
  – A compiler does not care
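
The reference parameter case looks like this in C++. The two functions below are hypothetical examples. Whether a particular analyzer reports the first one depends on the tool and its configuration.

```cpp
#include <cassert>
#include <vector>

// An analyzer might flag this: 'values' is passed by non-const reference
// but the function never modifies it.
int sumFlagged(std::vector<int>& values) {
    int total = 0;
    for (int v : values) {
        total += v;
    }
    return total;
}

// Same behavior, with the read-only intent stated in the signature.
int sumPreferred(const std::vector<int>& values) {
    int total = 0;
    for (int v : values) {
        total += v;
    }
    return total;
}
```

Both versions compile and return the same result, which is why a compiler has nothing to say. The const version documents that the caller's data is safe and also accepts temporaries and const objects.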

What can be checked?
Memory leaks
  – new without delete
  – Other resource leaks as well, such as Windows handles
Uninitialized variables or memory
  – Using something before initialization
Using NULL pointers
  – Similar to an uninitialized variable
Range issues
  – Assigning an int constant to a short
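
Two of these checks have simple run-time counterparts, which show what the analyzer is trying to establish ahead of time. The sketch below covers only the range and NULL pointer items. The helper names fitsInShort and valueOrDefault are hypothetical.

```cpp
#include <cassert>
#include <limits>

// Range check: would this int value survive assignment to a short?
bool fitsInShort(int value) {
    return value >= std::numeric_limits<short>::min() &&
           value <= std::numeric_limits<short>::max();
}

// Null check: dereference only after confirming the pointer is not null.
int valueOrDefault(const int* p, int fallback) {
    return (p != nullptr) ? *p : fallback;
}
```

For a constant such as 100000 on a platform with a 16-bit short, an analyzer can evaluate the equivalent of fitsInShort without running the program and warn at the assignment.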

Finally
We typically use these to check whole projects
They can help us find problems that could also be found by debugging
  – Debugging is much more expensive
Neither technique will find all the bugs