Static Code Analysis What it is and does. Copyright © 2016 Curt Hill.


Debugging
Something you already know something about.
The tools in our arsenal include:
  Compilers
  Run-time debuggers
  Insertion of monitoring code
  Static code analyzers
  Profilers
  Among others
The good developer uses all of them.

Static Code Analyzers
A static code analyzer is a program that examines source code or the resulting binary.
It attempts to find potential problems in the code.
Static means the checking is done without running the program.
The alternative is a dynamic analyzer.
This is a handy tool because of two factors:
  Programmers
  Compilers

Programmers
Writing a bug-free program is nearly impossible.
The complexity is too great.
Thus programmers introduce errors:
  In the initial writing
  In a subsequent revision

Compilers
Compilers are mostly interested in parsing and generating code.
They do extensive static analysis.
The errors they detect are mostly a consequence of their goal of generating object code.
There are types of errors that they cannot detect, or cannot detect perfectly.
Generally, if they cannot always find an error, they never look for it.

Compiler Static Analysis
One common form of static analysis that a compiler does is identifying blocks in order to optimize register usage.
Suppose that this code is seen:
    x = x * 2;
On a general register machine the following actions need to be generated:
  x is loaded into a register
  Multiplied by 2
  Result saved in x
That is three machine language operations.

Example
Consider the following code:
    if(a>3)
      a = a*2+b;
    b = b / 5;
Since a is likely loaded into a register by the if, there is no need for a further load for the assignment.
There is no intervening code that could have overwritten the register.
However, no assumptions can be made about b being in a register.

In Contrast
Suppose this code:
    int a, b, c;
    cin >> b;
    c = 2*a / b;
Generally the compiler has no reason to complain that a is used before it is initialized.
Detecting this does not help with its code generation process.
Some compilers actually do complain, but not all.

Language Levels
FORTRAN did not require variable declarations.
When a variable name was used, the compiler determined the type from the first letter of the name.
The problem is that a misspelled variable name was not detected at compile time but only at run-time.
C requires declaration.
This turns a run-time error into a compile-time error.

Other Errors
Unfortunately, not all run-time errors can be eliminated by good language design.
The uninitialized variable is an example.
Some errors cannot be found until run-time.
There is no way static analysis can find them.
The most famous example is the halting problem.
The proof that it cannot be solved is elegant as well.

The Halting Problem
All we want to know is whether a particular program will terminate.
We do not care if it does what it should.
We only want to know if there is an infinite loop.
Moreover, we want to know this without actually running the program.
A run may take extremely long even if the program does halt.

Setup
What we would like is a function:
    Boolean Halt(program)
The function takes a program as a parameter.
  Usually source code, but it could be object code
It produces a Boolean as a result.
  True means the program stops
  False means it is an infinite loop
Alas, it is provably impossible to write such a function.

Proof By Contradiction
Assume that Halt does exist.
I now write a program that looks like this:
    if(Halt(x))
      while (true) cout << "You lose";
    else
      cout << "You still lose";
Finally I feed this program into Halt as x.
If Halt says the program will stop, it goes into an infinite loop.
If Halt says it will loop forever, it prints one message and stops.
Either way Halt gave the wrong answer, so Halt cannot exist.
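
The contradiction can be checked mechanically if the program's behaviour is modelled as a value instead of being run. The sketch below does that in C++; the names Behavior, contraryProgram and haltWasRight are illustrative assumptions and do not come from the slides.

```cpp
// Behaviour is modelled as a value so that nothing here actually loops forever.
enum class Behavior { Halts, LoopsForever };

// What the program from the slide does, given the answer that the assumed
// Halt function returns when it is asked about that same program.
constexpr Behavior contraryProgram(bool haltSaysItStops) {
    // if (Halt(x)) loop forever; else print one message and stop
    return haltSaysItStops ? Behavior::LoopsForever : Behavior::Halts;
}

// True when Halt's answer matches what the program really does.
constexpr bool haltWasRight(bool haltSaysItStops) {
    return haltSaysItStops ==
           (contraryProgram(haltSaysItStops) == Behavior::Halts);
}

// Halt has only two possible answers, and both turn out to be wrong.
static_assert(!haltWasRight(true), "answering 'it stops' is wrong");
static_assert(!haltWasRight(false), "answering 'it loops' is wrong");
```

Because the two static_assert lines cover every answer Halt could give, the file compiling at all is the whole argument in miniature.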

Results
The generalization of this result is that there are lots of things that you cannot tell about a program without running it.
If we could tell what a program did without running it, why would we run it?
Most often, if a compiler cannot detect an error all the time, it does not detect the error at all.

In Contrast
So is it hopeless?
Not at all.
Clearly if we see code like:
    for (int j=0; j>0;j--)…
we could complain that the loop body will never run.
Similarly:
    while(k<m) m *= 2;
is also a problem: once the loop is entered with a positive m, doubling m only moves it further from k.
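
A check for the first pattern needs nothing more than the loop's starting value and its condition. The following is a minimal sketch under the assumption that a parser has already reduced the loop to the hypothetical CountingLoop record shown here; it is not how any particular analyzer is implemented.

```cpp
// Hypothetical model of: for (int j = init; j OP bound; j += step)
enum class Compare { Less, Greater };

struct CountingLoop {
    int init;    // starting value of the counter
    Compare op;  // comparison used in the loop condition
    int bound;   // value the counter is compared against
    int step;    // amount added to the counter after each pass
};

constexpr bool conditionHolds(Compare op, int value, int bound) {
    return op == Compare::Less ? value < bound : value > bound;
}

// The body never runs when the condition is already false at the start.
constexpr bool bodyNeverRuns(const CountingLoop& loop) {
    return !conditionHolds(loop.op, loop.init, loop.bound);
}

// Suspicious: the loop is entered, but the step is zero or moves the
// counter away from the bound.
constexpr bool movesAwayFromBound(const CountingLoop& loop) {
    return conditionHolds(loop.op, loop.init, loop.bound) &&
           (loop.op == Compare::Less ? loop.step <= 0 : loop.step >= 0);
}

// for (int j = 0; j > 0; j--) from the slide
static_assert(bodyNeverRuns(CountingLoop{0, Compare::Greater, 0, -1}),
              "the slide's loop body never runs");
```

The second check is only a heuristic, which is the point of the surrounding slides: a report from it is a reason to look at the loop, not proof of a bug.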

Static Code Analyzer
Static code analyzers are not perfect.
  Nobody said they were
They may miss certain errors.
  Even errors of a type they are looking for
They may flag things as errors that are not.
  These are the false positives
Still, if they find any bugs then they are worth using.
What they find, we do not have to find.

PFORT
The earliest of these seems to be the Portable FORTRAN verifier.
Published in 1974.
It checked parameter passing and COMMON block usage in FORTRAN programs.
It was intended to make it easier to convert a FORTRAN program from one machine to another.

LINT
Another early one is LINT.
  Not an acronym
It finds problems in C files:
  Uninitialized variables
  Division by zero
  Constant conditions
  Calculations whose result is likely to be outside the range of values representable in the type used
First released outside of Bell Labs in about 1979.

Languages
Static analyzers must be specific to the language they analyze.
They are typically built around a parser for that language.
Most production languages have these.
Consider some examples.

Examples
Language      Applications
Ada           CodePeer, Fluctuat
C, C++        BLAST, Cppcheck, cpplint, Clang
Java          Checkstyle, JArchitect, SourceMeter
JavaScript    ESLint, JSCS
PHP           RIPS
PowerBuilder  PB Code Analyzer
Python        Pylint

Overlap
A static analyzer is not a glorified compiler.
They do have much in common.
  Both must have a scanner and parser
They are typically looking for different things.
A static analyzer may indicate that a reference parameter is never changed.
  A compiler does not care

What can be checked?
Memory leaks
  new without delete
  Other resource leaks as well, such as Windows handles
Uninitialized variables or memory
  Using something before initialization
Using NULL pointers
  Similar to an uninitialized variable
Range issues
  Assigning an int constant to a short
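
The range issue is the easiest of these to picture as a check: compare the constant against the limits of the destination type. The helper name fitsInShort below is a hypothetical illustration, not part of any analyzer mentioned in the slides.

```cpp
#include <limits>

// Hypothetical helper: would this integer constant survive being stored
// in a short on the current platform?
constexpr bool fitsInShort(long long value) {
    return value >= std::numeric_limits<short>::min() &&
           value <= std::numeric_limits<short>::max();
}

static_assert(fitsInShort(1000), "1000 fits in any conforming short");
static_assert(!fitsInShort(std::numeric_limits<short>::max() + 1LL),
              "one past the maximum does not fit");
```

The limits come from std::numeric_limits rather than a hard-coded 32767 because the width of short is left to the implementation.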

Practical Experience
I have used cppcheck on numerous projects with mixed results.
All my projects are very good, so I expected few problems.
In my case I got as many false positives as real problems.
It takes time to sort out whether each report is a real problem or not.
Any real problem found is an advantage.

False Positives
There was a header:
    void fun(char str[40], …
followed by:
    str[40]='\0';
The 40 in the parameter was not needed, and all calls used a string of length 45, so the assignment was within bounds.
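
The report is understandable once you recall that an array size in a parameter declaration is ignored by C++: the parameter is really a char*. The sketch below reconstructs the situation with a hypothetical one-parameter function, terminateAt40, since the rest of the original parameter list is not shown on the slide.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the flagged routine. The 40 is decoration:
// the compiler treats str as a plain char*.
void terminateAt40(char str[40]) {
    str[40] = '\0';  // the kind of line an analyzer may flag as out of bounds
}

// A caller that, like the callers in the project, passes 45 characters.
bool demoFalsePositive() {
    char buffer[45];
    std::memset(buffer, 'x', sizeof buffer);
    terminateAt40(buffer);
    assert(buffer[39] == 'x');   // the neighbouring byte is untouched
    return buffer[40] == '\0';   // the write landed inside the buffer
}
```

Dropping the misleading 40 from the declaration would remove the reason for the report, although the function would still be relying on every caller to pass a large enough buffer.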

Real errors
Not all were false positives.
Memory leak:
    char * ln = new char [MAX];
    if(!SaveDialog1->Execute()){
      Application->MessageBox("…
      return;
    }
  The end of the routine did delete ln, but this return had a real memory leak.
Mismatched allocation:
  Vector form of new but scalar form of delete
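
One common way to remove both problems at once is to let a smart pointer own the buffer. This is a sketch of that idea, not the fix actually applied in the project: saveWithDialog and its userAccepted flag are hypothetical stand-ins for the dialog call, and the value of MAX is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

constexpr std::size_t MAX = 256;  // assumed value; the slide does not give it

// Hypothetical stand-in for the routine: userAccepted plays the role of
// the dialog's result.
bool saveWithDialog(bool userAccepted) {
    // unique_ptr<char[]> runs delete[] on every exit path, so the early
    // return cannot leak and new[] cannot be paired with scalar delete.
    std::unique_ptr<char[]> ln(new char[MAX]);
    if (!userAccepted) {
        return false;  // ln is released here automatically
    }
    ln[0] = '\0';      // ... use the buffer ...
    return true;       // and released here as well
}

// Exercises both the early-return path and the normal path.
void demoBothPaths() {
    assert(!saveWithDialog(false));
    assert(saveWithDialog(true));
}
```

A std::vector<char> or std::string would work equally well here; the point is only that ownership is tied to scope rather than to a delete at the end of the routine.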

Finally
We typically use these tools to check whole projects.
They can help us find problems that could also be found by debugging.
  Debugging is much more expensive
Neither technique will find all the bugs.
An analyzer only needs to find one to make it worth the effort.