Improving Software Reliability via Static and Dynamic Analysis Tao Xie, Automated Software Engineering Group Department of Computer Science North Carolina.


Improving Software Reliability via Static and Dynamic Analysis Tao Xie, Automated Software Engineering Group Department of Computer Science North Carolina State University

Group Overview
Inputs:
Current funding support
–NSF CyberTrust (3 yrs), NSF SoD (3 yrs), ARO (3 yrs), NIST supplement, IBM Faculty Award, Microsoft Research, ABB Research
Collaboration with agencies and industry
–NIST, NASA, DOE Lab, Army division, Microsoft Research, IBM Rational, ABB Research
Current student team
–6 Ph.D. students, 1 M.S. student, 5 probation-staged grad students

Group Overview cont.
Outputs:
Research around two major themes
–Automated Software Testing; Mining Software Engineering Data
Industry impact
–We found that Parasoft Jtest 4.5 generated 90% redundant tests [ASE 04]
–Agitar AgitarOne used a technique similar to our Jov [ASE 03]
–MSR and NASA adopted the Symstra technique [TACAS 05]
–MSR Pex adopted our recent techniques
Research publications
–2008: TOSEM, ICSE, 3*ASE, SIGMETRICS, ISSRE, ICSM, SRDS, ACSAC, …
–2007: ICSE, FSE, 4*ASE, WWW, ICSM, …
–…

Major Research Collaboration Areas: mining textual SE data; mining program code data; automated testing

Mining Textual SE Data
Bug reports [ICSE 08]
–Detecting duplicate bug reports
–Classifying bug reports
API documentation
Project documentation

Two duplicate bug reports in Firefox: using only natural language information may fail
Bug : After closing Firefox, the process is still running. Cannot reopen Firefox after that, unless the previous process is killed manually
Bug : (Ghostproc) – [Meta] firefox.exe doesn't always exit after closing all windows; session-specific data retained

Two non-duplicate bug reports in Firefox: using only execution information may fail
Bug : "Document contains no data" message on continuation page of NY Times article
Bug : random "The Document contains no data." Alerts
Proposed solution [ICSE 08]: mine both the textual information of bug reports and the execution information of their failing tests
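
The idea of combining both kinds of evidence can be sketched as follows. This is only an illustration, not the [ICSE 08] technique itself: the Jaccard measure, the equal weighting, the report dictionaries, and the trace entries are all assumptions made for the example.

```python
import re

def tokens(text):
    """Lowercase word tokens of a bug report summary."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def duplicate_score(report_a, report_b, text_weight=0.5):
    """Weighted mix of textual and execution similarity (the weight is an assumption)."""
    text_sim = jaccard(tokens(report_a["text"]), tokens(report_b["text"]))
    exec_sim = jaccard(set(report_a["trace"]), set(report_b["trace"]))
    return text_weight * text_sim + (1 - text_weight) * exec_sim

# Hypothetical reports: identical wording, but the failing runs share no steps.
REPORT_A = {"text": "Document contains no data", "trace": ["load_page", "parse_html"]}
REPORT_B = {"text": "document contains no data", "trace": ["show_alert", "open_socket"]}

print(duplicate_score(REPORT_A, REPORT_B))  # textual match alone is not enough
```

With a threshold above 0.5, the two hypothetical reports would not be flagged as duplicates even though their text matches exactly.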

Classification of Bug Reports
Example categories
–Bugs related to security issues
–Bugs related to design problems
–Bugs related to insufficient unit testing
–…
Manually label a subset of bug reports with their categories
Apply classification algorithms on unlabeled bug reports to predict their categories
Benefit: reduce manual labeling efforts
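
The label-then-predict workflow above might look like the following minimal sketch. The slide does not name a classification algorithm; a naive Bayes classifier with add-one smoothing is assumed here, and the labeled reports are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict

def words(text):
    """Lowercase alphabetic tokens of a report."""
    return re.findall(r"[a-z]+", text.lower())

def train(labeled_reports):
    """Count words per category from (text, category) pairs."""
    word_counts = defaultdict(Counter)
    category_counts = Counter()
    for text, category in labeled_reports:
        category_counts[category] += 1
        word_counts[category].update(words(text))
    return word_counts, category_counts

def predict(model, text):
    """Return the category with the highest smoothed log-probability."""
    word_counts, category_counts = model
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_reports = sum(category_counts.values())
    best_category, best_score = None, -math.inf
    for category, n_reports in category_counts.items():
        score = math.log(n_reports / total_reports)
        total_words = sum(word_counts[category].values())
        for w in words(text):
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((word_counts[category][w] + 1)
                              / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category

# Hypothetical manually labeled subset.
LABELED = [
    ("buffer overflow allows remote code execution", "security"),
    ("password stored in plain text", "security"),
    ("module layering makes the parser hard to extend", "design"),
    ("circular dependency between ui and storage classes", "design"),
]
MODEL = train(LABELED)
print(predict(MODEL, "remote attacker can read the password"))
```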

Example API Docs
javax.resource.cci.Connection
createInteraction(): “Creates an interaction associated with this connection” → action-resource pair: create-connection
getMetaData(): “Gets the information on the underlying EIS instance represented through an active connection” → action-resource pair: get-connection
close(): “Initiates close of the connection handle at the application level” → action-resource pair: close-connection
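
As a rough illustration of what an action-resource pair is, the three pairs above can be reproduced from identifiers alone. This is a crude heuristic assumed for the example (leading verb of the method name, simple class name as the resource); the talk mines the pairs from the documentation text, and that technique is not shown here.

```python
import re

def action_resource_pair(qualified_class, method_name):
    """Guess an action-resource pair from identifiers alone (heuristic assumption)."""
    # Action: the leading lowercase run of a camelCase method name.
    action = re.match(r"[a-z]+", method_name).group(0)
    # Resource: the simple class name, lowercased.
    resource = qualified_class.rsplit(".", 1)[-1].lower()
    return f"{action}-{resource}"

for method in ("createInteraction", "getMetaData", "close"):
    print(method, "->", action_resource_pair("javax.resource.cci.Connection", method))
```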

Mining Properties from API Docs

Potential Collaboration Ideas on Text Mining
Documents submitted by device manufacturers are written in natural language and are too numerous or too long for manual inspection
Classification problem
–Train learning tools with some labeled documents
Clustering problem
–Without labeling, group documents based on similarity
Selection problem
–Similar to duplicate bug report detection

Potential Collaboration Ideas on Text Mining – Possible Examples
Extract safety-related requirements from documents → manually extract some, and then tools recommend more based on the manually extracted ones
Classify incident reports (e.g., with ontology) → manually classify some, and then tools recommend categories for the rest
Detect correlations among incident reports → similar to duplicate bug report detection
Other pre-market textual documents
Other post-market textual documents
…

Major Research Collaboration Areas: mining textual SE data; mining program code data; automated testing

Motivation

Problem
Software system verification: given properties, verification tools can detect whether the system violates them
–Example: malloc return check
However, these properties often do not exist
–Who writes these properties?
–How often are these properties written?
–How often are these properties known?
Objective: mine API properties for static verification from the API client code in existing system code bases

Artifacts in Code Mining
Data: usage info from various code locations that use APIs such as malloc, seteuid, and execl
Patterns: sequencing constraints among collected API invocation sequences and condition checks
Anomalies: violations of these patterns as potential defects

Approach Overview
Inputs: the input system (internal APIs and external APIs) and existing system code bases 1, 2, …, N
1. Extract: extract external APIs from the input system
2. Trace/Search (MOPS): for each external API, trace/search source files that use it in existing code
3. Analyze: analyze collected traces/files to extract usage info around APIs
4. Mine: mine frequent usage patterns around APIs as API properties
5. Verify: verify the input system against these properties to detect bugs (detected violations are reported as bugs)
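
The mine and verify steps can be sketched on a toy data set. Everything here is an assumption for illustration: the usage records, the file locations, and the support and confidence thresholds are invented, and the real approach works on traces extracted from code rather than on pre-labeled tuples.

```python
from collections import defaultdict

def mine_check_rules(usages, min_support=3, min_confidence=0.75):
    """Return APIs whose return value is checked often enough to count as a rule."""
    total = defaultdict(int)
    checked = defaultdict(int)
    for api, _location, is_checked in usages:
        total[api] += 1
        checked[api] += is_checked
    return {
        api for api in total
        if total[api] >= min_support and checked[api] / total[api] >= min_confidence
    }

def find_violations(usages, rules):
    """Locations that call a rule-covered API without checking its result."""
    return [loc for api, loc, is_checked in usages if api in rules and not is_checked]

# Hypothetical usage info: (API, location, return value checked?).
USAGES = [
    ("fopen", "a.c:10", True), ("fopen", "b.c:22", True),
    ("fopen", "c.c:5", True), ("fopen", "d.c:40", True),
    ("fopen", "e.c:7", False),
    ("printf", "a.c:12", False), ("printf", "b.c:30", False),
    ("printf", "c.c:9", False),
]
RULES = mine_check_rules(USAGES)
print(RULES, find_violations(USAGES, RULES))
```

Because most call sites check the result of fopen, the unchecked call stands out as an anomaly, while printf, whose result is rarely checked, yields no rule and no report.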

Example Target Defect Types
Neglected-condition defects
Error-handling defects
Exception-handling defects
These defect types can result in
–Critical security, robustness, reliability issues
–Performance degradation
Example: failure to release a resource may decrease performance

Mined Neglected Condition
From the GRASS open source GIS project
Developer confirmed: “I believe this issue has uncovered a bug: the pointer returned by the fopen() call isn't checked at all. The code responsible for this particular issue is surprisingly short, to make it a good example on how not to write the code”
Excerpt from main.c:
fp = fopen("dumpfile", "w");
BM_file_write(fp, map);
fclose(fp);
...

Mined Patterns of Error Handling
From Red Hat 9.0 routed
Error-check specifications
Multiple-API specifications
–close() should be called after socket()
–If violated, defects are detected
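
Checking a multiple-API rule such as "close() should be called after socket()" against one call sequence might look like the sketch below. The trace format, a list of (call name, descriptor) pairs, is an assumption for the example, not the representation used by the actual tools.

```python
def unclosed_sockets(call_trace):
    """Return descriptors opened by socket() that no later close() releases."""
    open_fds = []
    for call, fd in call_trace:
        if call == "socket":
            open_fds.append(fd)
        elif call == "close" and fd in open_fds:
            open_fds.remove(fd)
    return open_fds

# Hypothetical traces along two paths of a program.
GOOD_TRACE = [("socket", 3), ("bind", 3), ("close", 3)]
BAD_TRACE = [("socket", 3), ("socket", 4), ("close", 4)]

print(unclosed_sockets(GOOD_TRACE))  # no violation on this path
print(unclosed_sockets(BAD_TRACE))   # descriptor 3 is never closed
```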

Mined Patterns of Exception Handling
Resource creation → resource manipulation → resource cleanup
If resource cleanup is missing, defects are detected

Potential Collaboration Ideas on Code Mining
Address problems similar to ones targeted by FDA’s previous work on “Static Analysis of Medical Device Software using CodeSonar” by Jetley, Jones, and Anderson
Benefits of our new techniques
–Don’t require the code to be compilable (using partial program analysis)
–Don’t require properties to be manually written down
–Can accumulate knowledge (API usages) within or across devices or manufacturers (or even the open source world)
–May ask manufacturers to submit API usages (if not the code itself?)

Potential Collaboration Ideas on Code Mining cont.
Our tool development status
–Neglected-condition bugs: tools for Java and C are ready; tool for C# is being developed
–Error-handling bugs: tool for C is ready
–Exception-handling bugs: tool for Java is ready; tool for C# is being developed
–Working on tools for framework reuse bugs

Major Research Collaboration Areas: mining textual SE data; mining program code data; automated testing

Dynamic Symbolic Execution
Dynamic symbolic execution combines static and dynamic analysis:
Execute the program multiple times with different inputs
–build an abstract representation of the execution path on the side
–plug in concrete results of operations that cannot be reasoned about symbolically
Use a constraint solver to obtain new inputs
–solve the constraint system that represents an execution path not seen before

[Figure: dynamic symbolic execution loop (whole-program, white-box code analysis): Test Inputs → Run Test and Monitor → Execution Path → Record Path Condition → Known Paths → Choose an Uncovered Path → Constraint System → Solve → Test Inputs]

Step 1: Initially, choose arbitrary test inputs: a[0] = 0; a[1] = 0; a[2] = 0; a[3] = 0; …

Step 2: Run the test, monitor it, and record the path condition: … ⋀ magicNum != 0x

Step 3: Choose an uncovered path: the known path is … ⋀ magicNum != 0x; the uncovered path is … ⋀ magicNum == 0x

Step 4: Solve the constraint system to obtain new test inputs: a[0] = 206; a[1] = 202; a[2] = 239; a[3] = 190;

Step 5: Run the new test and repeat the loop
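
The loop can be sketched end to end on a toy program. This is a minimal illustration under heavy assumptions: the program under test is hypothetical (its constants are borrowed from the solved inputs shown on the slides), branches are recorded by hand rather than by instrumentation, and `solve` is a trivial stand-in for a real constraint solver that only handles conjunctions of a[i] == c and a[i] != c with each index constrained once.

```python
def program(a, path):
    """Hypothetical program under test; records each branch as (index, constant, taken)."""
    for index, constant in enumerate([206, 202, 239, 190]):
        taken = a[index] == constant
        path.append((index, constant, taken))
        if not taken:
            return "rejected"
    return "accepted"

def solve(constraints, size=4):
    """Toy solver: satisfy a[i] == c first, then repair any violated a[i] != c."""
    inputs = [0] * size
    for index, constant, must_equal in constraints:
        if must_equal:
            inputs[index] = constant
    for index, constant, must_equal in constraints:
        if not must_equal and inputs[index] == constant:
            inputs[index] = (constant + 1) % 256
    return inputs

def explore(size=4):
    """Run, record the path condition, flip one branch, solve, repeat."""
    worklist = [[0] * size]          # initially, choose arbitrary inputs
    seen_paths = set()
    results = []
    while worklist:
        inputs = worklist.pop()
        path = []
        outcome = program(inputs, path)
        if tuple(path) in seen_paths:
            continue                 # this execution path is already known
        seen_paths.add(tuple(path))
        results.append((inputs, outcome))
        # Negate the last branch of every prefix to target uncovered paths.
        for i, (index, constant, taken) in enumerate(path):
            flipped = path[:i] + [(index, constant, not taken)]
            worklist.append(solve(flipped, size))
    return results

for test_inputs, outcome in explore():
    print(test_inputs, outcome)
```

Starting from all zeros, each round fixes one more input element, so the deepest path is reached after a handful of runs instead of the astronomically many that blind random testing would need.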

Potential Collaboration Ideas on Automated Testing
Address problems similar to ones targeted by FDA’s previous work on “Static Analysis of Medical Device Software using CodeSonar” by Jetley, Jones, and Anderson
Benefits of our new techniques (also in contrast to existing testing techniques)
–No false positives. Each reported issue is a REAL one
–Much more powerful than existing commercial tools (Parasoft C#Test, Parasoft Jtest, Agitar AgitarOne, …)

Potential Collaboration Ideas on Automated Testing cont.
Our tool development status
–Most mature/powerful for C# testing (built around MSR Pex by collaborating with MSR researchers)
–Java testing tools based on NASA Java Pathfinder and jCUTE
–C testing tools based on Crest and Splat

Potential Collaboration Ideas on Automated Testing cont.
Regression test generation/differential testing: given two versions, try to find test inputs that show different behavior
–Possible idea 1: given a buggy version and a claimed fixed version submitted by manufacturers, generate test inputs to show different behaviors
–Possible idea 2: change impact analysis on models or code submitted by manufacturers
Use code mining to find targets to violate by testing
–Address false positive issues
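
Differential testing in its simplest form runs both versions on the same inputs and reports the first disagreement. The two versions and the candidate inputs below are hypothetical; in practice the candidates would come from a test generator rather than a fixed range.

```python
def find_behavioral_difference(old_version, new_version, candidate_inputs):
    """Return the first input on which the two versions disagree, or None."""
    for value in candidate_inputs:
        if old_version(value) != new_version(value):
            return value
    return None

def is_adult_buggy(age):
    """Hypothetical buggy version: off by one at the boundary."""
    return age > 18

def is_adult_fixed(age):
    """Hypothetical claimed fix."""
    return age >= 18

print(find_behavioral_difference(is_adult_buggy, is_adult_fixed, range(0, 41)))
```

An input that separates the two versions is direct evidence that the claimed fix changed behavior; a result of None only means that no difference was found among the candidates tried.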

Other Research Areas
Mining program execution to aid program understanding, debugging, …
Mining version histories
Security policy testing
Attack generation
Design testing
Web app/service testing
DB app testing
Performance testing
…

Major Research Collaboration Areas: mining textual SE data; mining program code data; automated testing