
Improving Static Analysis Results Accuracy
Chris Wysopal, CTO & Co-founder, Veracode
SATE Summit, October 1, 2010

Abstract
A major challenge for static analysis is finding all weaknesses of a specific weakness category across a wide range of input software. It is trivial to build a static analyzer with a high false positive rate and a low true positive rate, and very difficult to build one with a very low false positive rate and a very high true positive rate. Building an accurate analyzer requires a broad, real-world data set to test against. I will discuss how we use the SATE test data, in addition to other test data, to improve the accuracy of our analyzer.

Veracode History of Participation
• 3 organizations participated in all three SATE exercises: Checkmarx, Grammatech, and Veracode. Perfect Veracode attendance!
• Test cases across the three exercises (C and Java): lighttpd, nagios, naim, dspace, mvnforum, opennms, irssi, pvm, dmdirc, roller, dovecot, chrome, wireshark, pebble, tomcat.
• Test cases analyzed per tool: Armorize 4, Aspect 3, Checkmarx 8, Coverity 5, cppcheck 3, DevInspect (HP) 1, FindBugs 3, Flawfinder 3, Fortify 6, Grammatech 6, Klocwork 2, LDRA 3, marfcat 5, Redlizards 3, SofCheck 6, Sparrow 2, Veracode 15.

CVE Matching Exercise
• Only done for Chrome, Wireshark, and Tomcat.
• sate2010_cve/README.txt: "We did not find any matching tool warnings for CVEs in Wireshark and Chrome."
• Five CVE-matched warnings were found for Tomcat. Veracode found one; Armorize's Code Secure found the other four.
• We submitted 2 CVEs for Tomcat. We should have been credited for 3.
• We also reported a CVE in Chrome; this was not credited either.

What level of noise is acceptable?
• For each tool and test case, compare the CVEs found, the total flaws reported, and the share of useful results (Useful / Total * 100).
• On Tomcat, Veracode was credited with 1 CVE-matched warning and Armorize with 4.
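As an illustration of the Useful / Total * 100 metric, with invented numbers rather than the actual SATE figures: a tool that reports 2 useful findings among 200 total warnings scores 2 / 200 * 100 = 1, while one that reports the same 2 useful findings among only 20 warnings scores 10.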

Lab Measurement Capabilities
• Synthetic test suites
– Publicly available test suites, such as NIST SAMATE.
– A complex synthetic test suite developed by taking the simple SAMATE cases and adding control flow and data flow complexity (see the sketch after this list).
– 23,000 test programs for Java, BlackBerry, .NET, and C/C++ on Windows, Windows Mobile, Solaris, and Linux.
– Defects in these test applications are annotated in our test database.
– Each defect is labeled with a CWE ID and a VCID, our own internal sub-categorization of CWEs.
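Below is a minimal sketch of what adding control flow and data flow complexity to a simple test case can look like. The class, method names, and the specific weakness (CWE-78, OS command injection) are chosen for illustration and are not taken from the actual Veracode suite.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class Cwe78SyntheticVariant {

    // Baseline case: tainted input flows directly into the command sink (CWE-78).
    static void baseline(String tainted) throws IOException {
        Runtime.getRuntime().exec("ping " + tainted);   // expected finding
    }

    // Complexified variant: the same flaw, but the tainted value passes
    // through a loop, a conditional, and a helper before reaching the sink.
    static void variant(String tainted) throws IOException {
        String value = "localhost";
        for (int i = 0; i < 3; i++) {
            if (i == 2 && tainted != null) {
                value = indirection(tainted);
            }
        }
        Runtime.getRuntime().exec("ping " + value);      // expected finding
    }

    private static String indirection(String s) {
        return s.trim();
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String tainted = in.readLine();
        baseline(tainted);
        variant(tainted);
    }
}

An analyzer that reports the flaw in baseline() but misses it in variant() shows up as a false negative once the suite includes the complexified case.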

Lab Measurement Capabilities
• Real world test suite
– Baselined real-world applications by performing a manual code review of the source code and annotating the security defects with CWE and VCID.
– Micro-baselined real-world applications by using publicly disclosed vulnerability information, identifying the location of each security defect in the code, and annotating it with CWE and VCID.
– 300 test programs for Java, BlackBerry, .NET, and C/C++ on Windows, Windows Mobile, Solaris, and Linux.
– Annotations are entered into our test database (a sketch of one annotation record follows this list).
– SATE gave us 135 flaws comprising 63 CVEs to compare with what we found in our assessment, with a file name and line number for each flaw. These are being added to the real-world suite for future testing.
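A sketch of what one annotation record in such a test database might look like; the real Veracode schema is not public, so the field names here are assumptions.

// Hypothetical shape of one annotation row in the test database.
public record DefectAnnotation(
        String application,   // e.g. "tomcat"
        String file,          // file name from manual review or SATE data
        int line,             // line number of the flaw
        int cweId,            // CWE category, e.g. 89 for SQL injection
        String vcId,          // Veracode-internal sub-categorization of the CWE
        String origin         // "manual-baseline", "cve-microbaseline", or "sate"
) {}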

Lab Measurement Capabilities
• Test Execution
– We run the full test suite across dozens of test machines; a complete run on 20+ machines takes 48 hours. The results of our automated analysis are stored in our test database.
• Reporting
– Reports compare the annotated "known" defects to our analysis results. An undetected defect is counted as a false negative; a spurious detection is counted as a false positive.
– We measure false positive and false negative rates for each CWE category and down to internal Veracode sub-categories. These results are trended over time, and a developer can drill into the details of a test case down to the source line (a minimal scoring sketch follows this list).
– Diffs from our last release can be shown to isolate false positive and false negative regressions.
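The following is a minimal sketch of the scoring step described above, assuming findings and annotations are matched by file, line, and CWE; real matching is typically more tolerant, and the flaw data here is invented for illustration.

import java.util.*;

public class AccuracyReport {

    record Flaw(String file, int line, int cweId) {}

    public static void main(String[] args) {
        // Annotated "known" defects from the test database.
        Set<Flaw> known = Set.of(
                new Flaw("Login.java", 42, 89),
                new Flaw("Upload.java", 10, 22));
        // Findings produced by the analysis engine.
        Set<Flaw> reported = Set.of(
                new Flaw("Login.java", 42, 89),   // true positive
                new Flaw("Util.java", 7, 89));    // false positive

        Map<Integer, int[]> perCwe = new TreeMap<>(); // CWE -> {TP, FP, FN}
        for (Flaw f : reported) {
            int[] c = perCwe.computeIfAbsent(f.cweId(), k -> new int[3]);
            if (known.contains(f)) c[0]++; else c[1]++;
        }
        for (Flaw f : known) {
            if (!reported.contains(f)) {
                perCwe.computeIfAbsent(f.cweId(), k -> new int[3])[2]++;
            }
        }
        perCwe.forEach((cwe, c) ->
                System.out.printf("CWE-%d: TP=%d FP=%d FN=%d%n", cwe, c[0], c[1], c[2]));
    }
}

Trending these per-CWE counts from run to run is what makes false positive and false negative regressions visible between engine releases.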

Production Measurement Capabilities
• As a SaaS provider, we can measure how well we perform on every customer application for FALSE POSITIVES.
• Quality analysts inspect the output of each job to make sure the application was modeled correctly and the result set is within normal bounds.
• If necessary, they inspect down to the flaw level and mark some results as false positives. This false positive data is tracked by CWE category.
• In this way, any deviation from Veracode's false positive SLO of 15% can be eliminated immediately, and test cases can be developed that engineering uses to create a permanent quality improvement (a sketch of this SLO check follows this list).
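A sketch of the SLO check this implies. The 15% target comes from the slide; the review counts and everything else are assumptions for illustration.

import java.util.Map;

public class FalsePositiveSlo {
    static final double SLO = 0.15;

    public static void main(String[] args) {
        // CWE -> {flaws reported, flaws marked false positive by QA}
        Map<Integer, int[]> review = Map.of(
                89, new int[]{120, 6},
                79, new int[]{300, 60});

        review.forEach((cwe, counts) -> {
            double fpRate = (double) counts[1] / counts[0];
            if (fpRate > SLO) {
                System.out.printf("CWE-%d FP rate %.1f%% exceeds SLO -> create lab test case%n",
                        cwe, fpRate * 100);
            }
        });
    }
}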

Continuous Improvement Process
• Veracode's analysis accuracy improves with every release of our analysis engine.
• We release on a six-week schedule.
• Each release contains improvements gathered from lab and production test results.
• Our policy is not to trade off false negatives to meet false positive accuracy targets or to support new features. Our extensive test suite allows us to enforce a no-false-negative-regression rule from release to release (sketched below).
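A sketch of the release gate implied by the no-false-negative-regression rule: any known defect detected by the previous engine release must still be detected by the candidate release. The flaw keys and data structures are assumptions for illustration.

import java.util.HashSet;
import java.util.Set;

public class RegressionGate {
    public static void main(String[] args) {
        // Known defects detected by the previous and candidate engine releases.
        Set<String> detectedByPrevious = Set.of(
                "tomcat:Login.java:42:CWE-89",
                "dspace:Upload.java:10:CWE-22");
        Set<String> detectedByCandidate = Set.of(
                "tomcat:Login.java:42:CWE-89");

        // Anything the previous release caught but the candidate misses is a regression.
        Set<String> regressions = new HashSet<>(detectedByPrevious);
        regressions.removeAll(detectedByCandidate);

        if (regressions.isEmpty()) {
            System.out.println("No false negative regressions; release may proceed.");
        } else {
            System.out.println("False negative regressions found: " + regressions);
        }
    }
}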
