Static Code Analysis
  What it is and does
Copyright © 2016 Curt Hill
Debugging
  Something you already know something about
  The tools in our arsenal include:
    Compilers
    Run-time debuggers
    Insertion of monitoring code
    Static code analyzers
    Profilers
    Among others
  The good developer uses all of them
Static Code Analyzers
  A static code analyzer is a program that examines source code or the resulting binary
  It attempts to find potential problems in the code
  Static means it does its checking without running the program
    The alternative is a dynamic analyzer
  This is a handy tool because of two factors:
    Programmers
    Compilers
Programmers
  Writing a bug-free program is nearly impossible
    The complexity is too great
  Thus programmers introduce errors:
    In the initial writing
    In a subsequent revision
Compilers
  Compilers are mostly interested in parsing and generating code
  They do extensive static analysis
    The errors they detect mostly follow from their goal of generating object code
  There are types of errors that they cannot detect, or cannot detect perfectly
  Generally, if they cannot always find an error, they never look for it
Compiler Static Analysis
  One common form of static analysis that a compiler does is identifying blocks to optimize register usage
  Suppose that this code is seen:
    x = x * 2;
  On a general register machine the following actions need to be generated:
    x is loaded into a register
    The register is multiplied by 2
    The result is saved in x
  3 machine language operations
Example
  Consider the following code:
    if(a>3)
      a = a*2+b;
    b = b / 5;
  Since a is likely loaded into a register by the if, there is no need for a further load for the assignment
    There is no code in between that could reload the register
  However, no assumptions can be made about b being in a register
In Contrast
  Suppose this code:
    int a, b, c;
    cin >> b;
    c = 2*a / b;
  Generally the compiler has no reason to complain that a is used before it is initialized
    Detecting this does not help with its code generation process
  Some compilers actually do complain, but not all
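One way to rule out the uninitialized read is to make every input a parameter, so the caller has to supply a value. The following is only a sketch of that idea: the function name scaled_quotient, the bool return, and the zero-divisor guard are illustrative assumptions, not part of the original fragment.

```cpp
// Hypothetical helper, not from the slides: computes 2*a / b with every
// input supplied by the caller, so nothing can be read before it is set.
// Returns false (leaving result untouched) when b is zero.
bool scaled_quotient(int a, int b, int& result) {
    if (b == 0) {
        return false;  // dividing by zero would be undefined behavior
    }
    result = 2 * a / b;
    return true;
}
```

A caller would read b from input, pass a known a, and check the returned flag before using the result.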
Language Levels
  FORTRAN did not require variable declarations
    When a variable name was used, the compiler determined the type from the first letter of the name
  The problem is that a misspelled variable name was not detected at compile time, only at run time
  C requires declarations
    This turns a run-time error into a compile-time error
Other Errors
  Unfortunately, not all run-time errors can be eliminated by good language design
    The uninitialized variable is an example
  Some errors cannot be found until run time
    There is no way static analysis can find them
  The most famous example is the halting problem
    It has an elegant proof as well
The Halting Problem
  All we want to know is whether a particular program will terminate
    We do not care if it does what it should
    We only want to know if there is an infinite loop
  Moreover, we want to know this without actually running the program
    The run may be extremely long even if it does halt
Setup
  What we would like is a function: Boolean Halt(program)
  The function takes a program as a parameter
    Usually source code, but it could be object code
  It produces a Boolean as a result
    True means the program stops
    False means it is an infinite loop
  Alas, it is provably impossible to write such a function
Proof By Contradiction
  Assume that Halt does exist
  I now write a program that looks like this:
    if(Halt(x))
      while (true) cout << "You lose";
    else
      cout << "You still lose";
  Finally I feed this program into Halt as x
  If Halt says it will stop, it does an infinite loop
  If Halt says it will loop forever, it stops
  Either way Halt is wrong, so it cannot exist
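The argument can be modeled in a few lines. This is only a sketch: a real Halt cannot be written, so the oracle here is any callable that returns a verdict, and the contrary program reports which branch it would take instead of actually looping. The names Behavior, contrary, and oracle_is_right are assumptions made for the illustration.

```cpp
#include <functional>

// What the contrary program does after asking the oracle about itself.
enum class Behavior { Halts, LoopsForever };

// Models the program from the proof. Instead of really looping, it reports
// the branch it would take, always the opposite of the oracle's verdict.
Behavior contrary(const std::function<bool()>& oracle_says_halts) {
    return oracle_says_halts() ? Behavior::LoopsForever : Behavior::Halts;
}

// True when the oracle's verdict matches what the program actually does.
bool oracle_is_right(const std::function<bool()>& oracle_says_halts) {
    const bool predicted_halt = oracle_says_halts();
    const bool actually_halts =
        contrary(oracle_says_halts) == Behavior::Halts;
    return predicted_halt == actually_halts;
}
```

Whichever answer the oracle gives, oracle_is_right returns false, which is the contradiction the proof relies on.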
Results
  The generalization of this result is that there are lots of things that you cannot tell about a program without running it
    If we could tell what a program did without running it, why would we run it?
  Most often, if a compiler cannot detect an error all the time, it does not detect the error at all
In Contrast
  So is it hopeless?
    Not at all
  Clearly if we see code like:
    for (int j=0; j>0; j--) …
  we could complain that the loop body will never run
  Similarly:
    while(k<m) m *= 2;
  is also a problem
    When m is positive, doubling it only moves it further from k, so the condition never becomes false
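A check of this kind is easy to sketch for simple counted loops. The code below is an illustration only, not how any particular analyzer works: it assumes the loop has already been parsed into a start value, a comparison, a bound, and a constant step, and it ignores integer overflow. The names Cmp, Verdict, CountedLoop, and classify are all hypothetical.

```cpp
enum class Cmp { Less, Greater };
enum class Verdict { NeverRuns, NeverEnds, Terminates };

// A loop of the form: for (int j = init; j <cmp> bound; j += step)
struct CountedLoop {
    int init;
    Cmp cmp;
    int bound;
    int step;
};

Verdict classify(const CountedLoop& loop) {
    // Is the condition true on entry?
    const bool entered = loop.cmp == Cmp::Less ? loop.init < loop.bound
                                               : loop.init > loop.bound;
    if (!entered) {
        return Verdict::NeverRuns;
    }
    // The counter must move toward the bound for the test to fail eventually.
    const bool approaches =
        loop.cmp == Cmp::Less ? loop.step > 0 : loop.step < 0;
    return approaches ? Verdict::Terminates : Verdict::NeverEnds;
}
```

The for loop shown above corresponds to CountedLoop{0, Cmp::Greater, 0, -1}, which this sketch classifies as NeverRuns.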
Static Code Analyzer
  Static code analyzers are not perfect
    Nobody said they were
  They may miss certain errors
    Even of a type they are looking for
  They may flag things as errors that are not
    These are the false positives
  Still, if they find any bugs then they are worth using
    What they find, we do not have to find
PFORT
  The earliest of these seems to be PFORT, the Portable FORTRAN verifier
    Published in 1974
  It checked parameter passing and COMMON block usage in FORTRAN programs
  It was intended to make it easier to move a FORTRAN program from one machine to another
LINT
  Another early one is LINT
    Not an acronym
  It finds problems in C files:
    Uninitialized variables
    Division by zero
    Constant conditions
    Calculations whose result is likely to be outside the range of values representable in the type used
  First released outside of Bell Labs in about 1979
Languages
  Static analyzers must be specific to the language they analyze
    They are typically built around a parser for that language
  Most production languages have these
  Consider some examples
Examples
  Language      Applications
  Ada           CodePeer, Fluctuat
  C, C++        BLAST, Cppcheck, cpplint, Clang
  Java          Checkstyle, JArchitect, SourceMeter
  JavaScript    ESLint, JSCS
  PHP           RIPS
  PowerBuilder  PB Code Analyzer
  Python        Pylint
Overlap
  A static analyzer is not a glorified compiler
  They do have much in common
    Both must have a scanner and parser
  They are typically looking for different things
    A static analyzer may indicate that a reference parameter is never changed
    A compiler does not care
What can be checked?
  Memory leaks
    new without delete
    Other resource leaks as well, such as Windows handles
  Uninitialized variables or memory
    Using something before initialization
  Using NULL pointers
    Similar to an uninitialized variable
  Range issues
    Assigning an int constant to a short
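Two of these categories have simple defensive counterparts in code. The sketch below covers only the NULL pointer and range cases. The helper names fits_in_short and value_or are hypothetical, and the range limits depend on how wide short is on the platform.

```cpp
#include <limits>

// Range issue: would this int value survive assignment to a short?
bool fits_in_short(int value) {
    return value >= std::numeric_limits<short>::min() &&
           value <= std::numeric_limits<short>::max();
}

// NULL pointer use: check before dereferencing, fall back otherwise.
int value_or(const int* p, int fallback) {
    return p != nullptr ? *p : fallback;
}
```

An analyzer aims to report the missing check at build time. These helpers only show what the guarded version of the code looks like.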
Practical Experience
  I have used cppcheck on numerous projects with mixed results
    All my projects are very good so I expected few problems
  In my case I got as many false positives as real problems
  It takes time to sort out whether each report is a real problem or not
  Any real problem found is an advantage
False Positives
  There was a header:
    void fun(char str[40], …
  followed by:
    str[40]='\0';
  The 40 in the declaration was not needed, and all calls passed a string of length 45
    So index 40 was in bounds and the report was a false positive
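The reason the 40 does not matter is that an array parameter in C++ is adjusted to a pointer, so the declared size places no limit on what the caller passes. The sketch below assumes a one-parameter version of fun, since the rest of the original signature is not shown, and call_with_45 is a hypothetical caller that mirrors the 45-character strings described above.

```cpp
#include <type_traits>

// Hypothetical one-parameter version of the routine; the original's
// remaining parameters are not shown in the slide.
void fun(char str[40]) {
    str[40] = '\0';  // in bounds only if the caller's buffer has more than 40 chars
}

// The 40 is ignored by the language: the parameter is really a char*.
static_assert(std::is_same<decltype(&fun), void (*)(char*)>::value,
              "array parameters are adjusted to pointers");

// Mirrors the calls described above: a 45-character buffer.
bool call_with_45() {
    char buffer[45];
    for (char& c : buffer) {
        c = 'x';
    }
    fun(buffer);
    return buffer[40] == '\0' && buffer[39] == 'x' && buffer[44] == 'x';
}
```

An analyzer that looks only at the declaration sees a write one past a 40-element array. Deciding that the write is safe requires knowing the size of every caller's buffer.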
Real errors
  Not all were false positives
  Memory leak
    char * ln = new char [MAX];
    if(!SaveDialog1->Execute()){
      Application->MessageBox("…
      return;
    }
    The end of the routine did delete ln, but this return was a real memory leak
  Mismatched allocation
    Vector form of new (new[]) but scalar form of delete
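Both errors go away if the buffer is owned by a smart pointer, which frees it on every exit path and uses the array form of delete. This is a sketch of that approach, not the fix applied in the original project: user_confirmed stands in for the dialog result, the value of MAX is arbitrary, and the Tracked type exists only so the effect can be observed.

```cpp
#include <cstddef>
#include <memory>

// Counts live objects so the effect of each exit path can be observed.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

const std::size_t MAX = 8;  // illustrative size, not from the original

// user_confirmed is a hypothetical stand-in for the dialog's result.
bool save(bool user_confirmed) {
    // unique_ptr<T[]> releases the array with delete[], matching new[].
    std::unique_ptr<Tracked[]> ln(new Tracked[MAX]);
    if (!user_confirmed) {
        return false;  // early return: the buffer is still freed
    }
    // ... work with ln.get() here ...
    return true;
}
```

After either call, save(false) or save(true), Tracked::live is back to zero, so neither path leaks.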
Finally
  We typically use these to do checking of projects
  They can help us find problems that could also be found by debugging
    Debugging is much more expensive
  Neither technique will find all the bugs
  An analyzer only needs to find one to make it worth the effort