Static Code Analysis What it is and does. Copyright © 2016 Curt Hill.

Debugging
Something you already know something about
The tools in our arsenal include:
  Compilers
  Run-time debuggers
  Insertion of monitoring code
  Static code analyzers
  Profilers
  Among others
The good developer uses all of them

Static Code Analyzers
A static code analyzer is a program that examines source code or the resulting binary
It attempts to find potential problems in the code
Static means the checking is done without running the program
The alternative is a dynamic analyzer
This is a handy tool because of two factors:
  Programmers
  Compilers

Programmers
Writing a bug-free program is nearly impossible
The complexity is too great
Thus programmers introduce errors:
  In the initial writing
  In a subsequent revision

Compilers
Compilers are mostly interested in parsing and generating code
They do extensive static analysis
The errors they detect mostly follow from their goal of generating object code
There are types of errors that they cannot detect or cannot detect perfectly
Generally, if they cannot always find an error, they never look for it

Compiler Static Analysis
One common form of static analysis that a compiler does is identifying blocks to optimize register usage
Suppose that this code is seen:
    x = x * 2;
On a general register machine the following actions need to be generated:
  x is loaded into a register
  Multiplied by 2
  Result saved in x
3 machine language operations

Example
Consider the following code:
    if (a > 3)
        a = a*2 + b;
    b = b / 5;
Since a is likely loaded into a register by the if, there is no need for a further load for the assignment
No code between the test and the assignment could change that register
However, no assumption can be made about b being in a register at the last statement, since the assignment that loads it may have been skipped

In Contrast
Suppose this code:
    int a, b, c;
    cin >> b;
    c = 2*a / b;
Generally the compiler has no reason to complain that a is used before it is initialized
Detecting this does not help with its code generation process
Some compilers actually do complain, but not all
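
One way to remove the problem in the fragment above at the source is sketched here. This is only an illustration, not code from the slides: the function name scaled_ratio is hypothetical, and std::optional assumes C++17. Both values arrive as parameters, so neither can be read before it is set, and the divisor is checked before use.

```cpp
#include <cassert>
#include <optional>

// Hypothetical helper, not from the slides: computes 2*a / b.
// Both inputs are parameters, so neither can be read before it has a value.
std::optional<int> scaled_ratio(int a, int b) {
    if (b == 0) {
        return std::nullopt;  // division by zero is reported, not performed
    }
    return 2 * a / b;
}

// Small usage example for the sketch.
void demo_scaled_ratio() {
    assert(scaled_ratio(6, 3).value() == 4);   // 2*6 / 3
    assert(!scaled_ratio(1, 0).has_value());   // zero divisor is rejected
}
```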

Language Levels
FORTRAN did not require variable declarations
When a variable name was used, the compiler determined the type from the first letter of the name
The problem is that a misspelled variable name was not detected at compile time but only at run time
C requires declaration
This turns a run-time error into a compile-time error

Other Errors
Unfortunately, not all run-time errors can be eliminated by good language design
The uninitialized variable is an example
Some errors cannot be found until run time
There is no way static analysis can find them
The most famous is the halting problem
It has an elegant proof as well

The Halting Problem
All we want to know is if a particular program will terminate
We do not care if it does what it should
We only want to know if there is an infinite loop
Moreover, we want to know this without actually running the program
A run may take extremely long even if it does halt

Setup
What we would like is a function
    Boolean Halt(program)
The function takes as a parameter a program
  Usually source code, but could be object
It produces a Boolean as a result
  True means it stops
  False means it is an infinite loop
Alas, it is provably impossible to write such a function

Proof By Contradiction
Assume that Halt does exist
I now write a program that looks like this:
    if(Halt(x))
        while (true) cout << "You lose";
    else
        cout << "You still lose";
Finally I feed this program into Halt as x
If Halt says it will stop, it does an infinite loop
If Halt says it is an infinite loop, it stops
Either way Halt is wrong, which contradicts the assumption
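
The case analysis in the proof can be written down as a small model. This is a sketch only: it does not implement Halt (which cannot exist) and runs no infinite loop. The two outcomes are represented as enum values, and all names here are hypothetical.

```cpp
#include <cassert>

// The two verdicts Halt could return, and the two ways a program can behave.
enum class Behavior { Halts, LoopsForever };

// Models the program from the proof: it asks Halt about itself
// and then does the opposite of whatever Halt answered.
Behavior contrary_program(Behavior verdict_from_halt) {
    return verdict_from_halt == Behavior::Halts ? Behavior::LoopsForever
                                                : Behavior::Halts;
}

// True only if Halt's verdict matches what the program really does.
bool verdict_is_correct(Behavior verdict_from_halt) {
    return contrary_program(verdict_from_halt) == verdict_from_halt;
}

// Whatever Halt answers about this program, the answer is wrong.
void demo_contradiction() {
    assert(!verdict_is_correct(Behavior::Halts));
    assert(!verdict_is_correct(Behavior::LoopsForever));
}
```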

Results
The generalization of this result is that there are lots of things that you cannot tell about a program without running it
If we could tell what a program did without running it, why would we run it?
Most often if a compiler cannot detect an error all the time, it does not detect the error at all

In Contrast
So is it hopeless?
Not at all
Clearly if we see code like:
    for (int j=0; j>0; j--)…
we could complain that the loop body will never run
Similarly:
    while(k<m) m *= 2;
is also a problem: doubling a positive m only moves it further above k
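
A check for the first kind of loop can be sketched in a few lines. This is a toy model under stated assumptions, not how any real analyzer works: CountedLoop, Relation, Finding, and check_loop are hypothetical names, and the model only covers loops of the form for (int v = init; v < bound or v > bound; v += step) with constant values. The while loop above changes its own bound and would need a separate rule.

```cpp
#include <cassert>

enum class Relation { Less, Greater };  // loop test: v < bound or v > bound
enum class Finding { NeverRuns, NoProgressTowardExit, LooksFine };

// Hypothetical description of: for (int v = init; v REL bound; v += step)
struct CountedLoop {
    int init;
    Relation rel;
    int bound;
    int step;
};

bool test_holds(Relation rel, int value, int bound) {
    return rel == Relation::Less ? value < bound : value > bound;
}

Finding check_loop(const CountedLoop& loop) {
    // The test is false on entry, so the body can never execute.
    if (!test_holds(loop.rel, loop.init, loop.bound)) {
        return Finding::NeverRuns;
    }
    // For v < bound the step must be positive; for v > bound it must be negative.
    const bool approaches_exit =
        loop.rel == Relation::Less ? loop.step > 0 : loop.step < 0;
    return approaches_exit ? Finding::LooksFine : Finding::NoProgressTowardExit;
}

void demo_check_loop() {
    // for (int j=0; j>0; j--) from the slide
    assert(check_loop(CountedLoop{0, Relation::Greater, 0, -1}) == Finding::NeverRuns);
    // for (int i=0; i<10; i++)
    assert(check_loop(CountedLoop{0, Relation::Less, 10, 1}) == Finding::LooksFine);
    // for (int i=0; i<10; i--)
    assert(check_loop(CountedLoop{0, Relation::Less, 10, -1}) == Finding::NoProgressTowardExit);
}
```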

Static Code Analyzer
Static code analyzers are not perfect
  Nobody said they were
They may miss certain errors
  Even of a type they are looking for
They may flag things as errors that are not
  The false positives
Still, if they find any bugs then they are worth using
  What they find, we do not have to find

PFORT
The earliest of these seems to be the Portable FORTRAN (PFORT) verifier
Published in 1974
It checked parameter passing and COMMON block usage in FORTRAN programs
It was intended to make it easier to convert a FORTRAN program from one machine to another

LINT
Another early one is LINT
  Not an acronym
It finds problems in C files:
  Uninitialized variables
  Division by zero
  Constant conditions
  Calculations whose result is likely to be outside the range of values representable in the type used
First released outside of Bell Labs in about 1979

Languages
Static analyzers must be specific to the language they analyze
They are typically built around a parser for that language
Most production languages have these
Consider some examples

Examples
Language      Applications
Ada           CodePeer, Fluctuat
C, C++        BLAST, Cppcheck, cpplint, Clang
Java          Checkstyle, JArchitect, SourceMeter
JavaScript    ESLint, JSCS
PHP           RIPS
PowerBuilder  PB Code Analyzer
Python        Pylint

Overlap
A static analyzer is not a glorified compiler
They do have much in common
  Both must have a scanner and parser
They are typically looking for different things
A static analyzer may indicate a reference parameter is never changed
  A compiler does not care
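
As an illustration of the reference-parameter case, here is a hypothetical function (count_spaces is not from the slides). Declared with a plain std::string& parameter it would compile without complaint, yet the parameter is only read, so an analyzer could suggest the const form shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical example. Written as count_spaces(std::string& text), this is the
// kind of function such a check could report: text is read but never changed.
std::size_t count_spaces(const std::string& text) {
    std::size_t count = 0;
    for (char c : text) {
        if (c == ' ') {
            ++count;
        }
    }
    return count;
}

void demo_count_spaces() {
    // The const reference also lets callers pass temporaries and literals.
    assert(count_spaces("a b c") == 2);
    assert(count_spaces("") == 0);
}
```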

What can be checked?
Memory leaks
  new without delete
  Other resource leaks as well, such as Windows handles
Uninitialized variables or memory
  Using something before initialization
Using NULL pointers
  Similar to an uninitialized variable
Range issues
  Assigning an int constant to a short
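
The range issue in the last item can be made concrete with a small check. This is a sketch with a hypothetical helper name, fits_in_short; it only shows the comparison an analyzer would have to make for a constant and makes no claim about how a particular tool does it.

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper: true if value can be stored in a short without loss.
bool fits_in_short(int value) {
    return value >= std::numeric_limits<short>::min() &&
           value <= std::numeric_limits<short>::max();
}

void demo_fits_in_short() {
    assert(fits_in_short(100));
    assert(fits_in_short(std::numeric_limits<short>::max()));
    // Assumes int is wider than short, as on common platforms.
    assert(!fits_in_short(std::numeric_limits<int>::max()));
}
```

An assignment such as short s = 70000; is the kind of statement this check describes on a platform where short is 16 bits.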

Practical Experience
I have used cppcheck on numerous projects with mixed results
All my projects are very good so I expected few problems
In my case I got as many false positives as real problems
It takes time to sort out whether each report is a real problem or not
Any real problem found is an advantage

Real errors
Not all were false positives
Memory Leak
    char * ln = new char [MAX];
    if(!SaveDialog1->Execute()){
        Application->MessageBox("…
        return;
    }
  The end of the routine did delete ln, but this early return was a real memory leak
Mismatched allocation
  Vector form of new but scalar form of delete
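
One common way to avoid both errors is to let a smart pointer own the buffer. The sketch below is an assumption-laden rewrite, not the author's fix: save_line is a hypothetical name, the bool parameter dialog_accepted stands in for the result of SaveDialog1->Execute(), and the value of MAX is invented because the slide does not give one.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

constexpr std::size_t MAX = 256;  // assumed size; the slide does not give one

// dialog_accepted stands in for the result of SaveDialog1->Execute().
bool save_line(bool dialog_accepted) {
    // unique_ptr<char[]> releases the buffer with delete[] on every exit path,
    // so there is neither a leak nor a new[]/delete mismatch.
    std::unique_ptr<char[]> ln = std::make_unique<char[]>(MAX);
    if (!dialog_accepted) {
        return false;  // early return: the buffer is freed automatically
    }
    ln[0] = '\0';      // placeholder for the real work on the buffer
    return true;       // normal return: the buffer is freed here as well
}

void demo_save_line() {
    assert(save_line(true));
    assert(!save_line(false));
}
```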

Finally
We typically use these to do checking of projects
They can help us to find problems that could also be found by debugging
  Debugging is much more expensive
Neither technique will find all the bugs
An analyzer only needs to find one to make it worth the effort
