Improving Network Applications Security: a New Heuristic to Generate Stress Testing Data
Presented by Conrad Pack
Del Grosso et al.
Overview
Buffer Overflow problem
–Network security
–Critical systems
Testing to identify/remove vulnerabilities
–Combined static and dynamic approach
–Static slicing
–Genetic algorithms (GAs) in dynamic search
New heuristic
Buffer Overflow
Incorrect handling of input
Data overwritten
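The overwrite described on this slide can be imitated in a few lines. Python itself bounds-checks its buffers, so this is only a simulation: a flat `bytearray` stands in for process memory, and the layout (an 8-byte buffer followed directly by a 4-byte flag) and the names `unchecked_copy` and `checked_copy` are assumptions made for illustration.

```python
BUF_SIZE = 8  # hypothetical size of the input buffer


def unchecked_copy(memory: bytearray, offset: int, data: bytes) -> None:
    """Copy data byte by byte without checking the buffer size (strcpy-like)."""
    for i, byte in enumerate(data):
        memory[offset + i] = byte


def checked_copy(memory: bytearray, offset: int, capacity: int, data: bytes) -> None:
    """Refuse input that is longer than the destination buffer."""
    if len(data) > capacity:
        raise ValueError("input longer than buffer")
    memory[offset:offset + len(data)] = data


# Simulated memory: the buffer occupies bytes 0..7, an adjacent flag bytes 8..11.
memory = bytearray(BUF_SIZE + 4)
memory[BUF_SIZE:] = b"SAFE"

# 12 bytes of input for an 8-byte buffer: the last 4 bytes land on the flag.
unchecked_copy(memory, 0, b"AAAAAAAAEVIL")
flag_after = bytes(memory[BUF_SIZE:])
print(flag_after)
```

The checked variant rejects the same input instead of letting it spill into the neighbouring data.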
Impact of Buffer Overflow
Scope
–Language variations (C++ vs. Java)
–Prevalence of unaudited code
Over 50% of vulnerabilities (CERT)
Potential harm
–Unauthorized access in network/security applications
–Serious accidents in critical embedded systems
Overview of Approach
Static Analysis
Tools
–RatScan (front end to RATS)
–Splint
Extracted Information
–Potentially vulnerable source statements
–Calls to potentially unsafe functions/libraries
–Estimated buffer sizes
Static Slicing
Software maintenance technique
–“all program code that can in any way affect the value of a given variable”
Inputs and source code relationship
–Data dependency
–Some inputs not tied to vulnerable statements
Tool: CodeSurfer (GrammaTech)
Purpose: Search space reduction
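The slicing idea can be sketched as a backward walk over a dependency graph. This is a minimal illustration, not how CodeSurfer works internally; the five-statement program, the `deps` map and the function name `backward_slice` are all hypothetical.

```python
from collections import deque
from typing import Dict, Set


def backward_slice(deps: Dict[int, Set[int]], criterion: int) -> Set[int]:
    """Return every statement that can affect `criterion`.

    `deps` maps a statement id to the statements it directly depends on.
    """
    seen = {criterion}
    queue = deque([criterion])
    while queue:
        stmt = queue.popleft()
        for pred in deps.get(stmt, set()):
            if pred not in seen:
                seen.add(pred)
                queue.append(pred)
    return seen


# Hypothetical program:
#   1: a = read_input()
#   2: b = read_input()
#   3: n = len(a)
#   4: log(b)
#   5: strcpy(buf, a)      <- potentially vulnerable statement
deps = {3: {1}, 4: {2}, 5: {1}}
input_statements = {1, 2}

slice_of_5 = backward_slice(deps, 5)
relevant_inputs = slice_of_5 & input_statements
print(sorted(slice_of_5), sorted(relevant_inputs))
```

Only input 1 reaches the vulnerable statement, so a search over test data can leave input 2 alone, which is the search space reduction the slide refers to.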
Test Case Generation Using GA
GA aspects
–Chromosome (2-dimensional array)
–Crossover/mutation operators (whole/creep)
–Fitness function (to follow)
–Parameters: number of generations (500), population size (70), propagation rules (2 best), probabilities ($p_{cross} = 0.7$, $p_{mut} = 0.01$)
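A minimal sketch of a GA loop using the parameters listed on this slide. Several details are assumptions, since the slide does not specify them: "whole" is read as whole arithmetic crossover, "creep" as a small random step on a gene, "2 best" as copying the two fittest individuals unchanged, genes are taken to be byte values, and tournament selection is swapped in because no selection scheme is given. The fitness function is a parameter.

```python
import random

GENERATIONS = 500   # from the slide
POP_SIZE = 70       # from the slide
ELITE = 2           # "2 best" read as elitism (assumption)
P_CROSS = 0.7       # from the slide
P_MUT = 0.01        # from the slide


def random_chromosome(rng, rows, cols, low=0, high=255):
    """A chromosome is a 2-dimensional array; byte-valued genes are assumed."""
    return [[rng.randint(low, high) for _ in range(cols)] for _ in range(rows)]


def whole_crossover(rng, a, b):
    """Whole arithmetic crossover: blend every gene of both parents."""
    alpha = rng.random()
    return [[round(alpha * x + (1 - alpha) * y) for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def creep_mutation(rng, chrom, step=4, low=0, high=255):
    """Creep mutation: nudge a gene by a small step with probability P_MUT."""
    return [[min(high, max(low, g + rng.randint(-step, step)))
             if rng.random() < P_MUT else g
             for g in row] for row in chrom]


def tournament(rng, scored, k=3):
    """Pick the fittest of k random individuals (selection scheme assumed)."""
    return max(rng.sample(scored, k), key=lambda pair: pair[0])[1]


def run_ga(fitness, rows, cols, generations=GENERATIONS,
           pop_size=POP_SIZE, seed=0):
    rng = random.Random(seed)
    population = [random_chromosome(rng, rows, cols) for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        scored = sorted(((fitness(c), c) for c in population),
                        key=lambda pair: pair[0], reverse=True)
        history.append(scored[0][0])
        next_pop = [c for _, c in scored[:ELITE]]  # keep the 2 best unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(rng, scored), tournament(rng, scored)
            if rng.random() < P_CROSS:
                child = whole_crossover(rng, p1, p2)
            else:
                child = [row[:] for row in p1]
            next_pop.append(creep_mutation(rng, child))
        population = next_pop
    return max(population, key=fitness), history
```

In the approach presented here, `fitness` would run the program under test on the input encoded by the chromosome; for a quick check, any deterministic function such as `lambda c: sum(map(sum, c))` can stand in.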
Fitness
A GA solves an optimization problem
Three Approaches
–Vulnerable coverage fitness
–Nesting fitness
–Buffer boundary fitness
Correlation to crashes alone not enough
–Flat landscape
–GA behaves like random search
Vulnerable Coverage Fitness
Statement coverage (scov)
Vulnerable statement coverage (vcov)
Number of vulnerable statement executions (k)
Function
$F(g) = w_1\,\mathit{scov} + w_2 \log(k)\,\mathit{vcov} + w_3\,\mathit{crash}$
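A worked instance of the function above, as a sketch under stated assumptions: coverage values are computed as fractions, the logarithm is natural, `log(k)` is treated as 0 when no vulnerable statement ran, `crash` is 1 or 0, and the weights are placeholders. None of these choices are given on the slide.

```python
import math
from collections import Counter


def vulnerable_coverage_fitness(trace, all_stmts, vulnerable, crashed,
                                w1=1.0, w2=1.0, w3=10.0):
    """F(g) = w1*scov + w2*log(k)*vcov + w3*crash for one executed test case.

    trace      -- list of statement ids in execution order
    all_stmts  -- set of all statement ids in the program
    vulnerable -- set of statement ids flagged by static analysis
    crashed    -- True if the run crashed
    Weights are hypothetical placeholders.
    """
    counts = Counter(trace)
    executed = set(trace)
    scov = len(executed & all_stmts) / len(all_stmts)
    vcov = len(executed & vulnerable) / len(vulnerable)
    k = sum(counts[s] for s in vulnerable)
    log_k = math.log(k) if k > 0 else 0.0  # assumption: no executions, no reward
    crash = 1.0 if crashed else 0.0
    return w1 * scov + w2 * log_k * vcov + w3 * crash


all_stmts = set(range(1, 11))   # hypothetical 10-statement program
vulnerable = {4, 7}             # hypothetical vulnerable statements
trace = [1, 2, 4, 4, 4, 5]      # statement 4 runs three times

score = vulnerable_coverage_fitness(trace, all_stmts, vulnerable, crashed=False)
print(round(score, 4))
```

Here scov is 4/10, vcov is 1/2 and k is 3, so the score is 0.4 + 0.5 · ln 3. Two runs that do not crash can still be ranked against each other, which is what keeps the landscape from being flat.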
Nesting Fitness
Unconstrained nodes (graph theory)
–Control flow graphs
–Do not dominate any node
–Do not postdominate any node
Often correspond with maximum nesting
Function
$F(g) = w_1\,\mathit{scov} + w_2 \log(k)\,\mathit{vcov} + w_3\,\mathit{nesting}$
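The definition of unconstrained nodes can be checked on a small control flow graph. The sketch below uses the textbook iterative dominator computation and takes "any node" to mean any other node, since every node trivially dominates itself. The example graph and function names are hypothetical, and how the `nesting` term is derived from these nodes is not shown because the slide does not say.

```python
def dominators(succ, entry):
    """Iterative dominator sets; `succ` must list every node as a key."""
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            pred[t].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            pred_doms = [dom[p] for p in pred[n]]
            new = {n} | (set.intersection(*pred_doms) if pred_doms else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def reverse(succ):
    """Flip every edge so postdominators become dominators from the exit."""
    rev = {n: set() for n in succ}
    for n, targets in succ.items():
        for t in targets:
            rev[t].add(n)
    return rev


def unconstrained_nodes(succ, entry, exit_node):
    """Nodes that neither dominate nor postdominate any other node."""
    dom = dominators(succ, entry)
    pdom = dominators(reverse(succ), exit_node)
    dominating = {d for n, ds in dom.items() for d in ds if d != n}
    postdominating = {d for n, ds in pdom.items() for d in ds if d != n}
    return set(succ) - dominating - postdominating


# Hypothetical CFG: an outer `if` at A, an inner `if` at B with body C.
cfg = {
    "entry": {"A"},
    "A": {"B", "E"},     # outer if: then-branch B, otherwise skip to E
    "B": {"C", "D"},     # inner if: body C, otherwise skip to D
    "C": {"D"},
    "D": {"E"},
    "E": {"exit"},
    "exit": set(),
}
result = unconstrained_nodes(cfg, "entry", "exit")
print(result)
```

In this graph the only unconstrained node is C, the body of the innermost `if`, which matches the slide's remark that such nodes often correspond with maximum nesting.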
Buffer Boundary Fitness
Buffer boundaries in fitness calculation
–Often difficult to precisely determine
–Intended for future implementation
Distance from boundary by size estimate
–Compile time (can’t always be determined)
Function
$F(g) = w_1\,\mathit{scov} + w_2 \log(k)\,\mathit{vcov} + w_3\,\mathit{nesting} + w_4 \max_i \{\min_j (L_{i,j} - SB_i)\}$
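The fourth term can be evaluated mechanically once its inputs are available. The slide does not define the symbols, so the reading used here is an assumption: $SB_i$ is the statically estimated size of buffer $i$ and $L_{i,j}$ is a length observed for buffer $i$ at its $j$-th use. The function name and the sample numbers are hypothetical.

```python
def boundary_term(lengths, sizes):
    """Compute max over i of min over j of (L[i][j] - SB[i]).

    lengths -- lengths[i] is the list of observed lengths for buffer i (assumed meaning)
    sizes   -- sizes[i] is the estimated size SB_i of buffer i (assumed meaning)
    Buffers with no observations are skipped; returns None if none remain.
    """
    per_buffer = [min(length - size for length in observed)
                  for observed, size in zip(lengths, sizes) if observed]
    return max(per_buffer) if per_buffer else None


sizes = [16, 32]                    # hypothetical size estimates
lengths = [[10, 20], [30, 31]]      # hypothetical observed lengths

term = boundary_term(lengths, sizes)
print(term)
```

For buffer 0 the inner minimum is min(10 − 16, 20 − 16) = −6, for buffer 1 it is min(30 − 32, 31 − 32) = −2, so the term evaluates to −2.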
Empirical Results
Two test programs
–White noise generator (scientific application)
–FTP client (network application)
Random search as a control
–Pure random search
–GA search with no fitness
White noise: fixed initial population
FTP: random initial populations
White Noise Generator Results
FTP Client Results
Personal Conclusions
Use of Genetic Algorithms in testing is compelling
Fitness heuristic using source code is a valuable concept
Useful in large projects
Buffer overflow will likely have less importance over time
GA assumptions