Human-Tool, Tool-Tool, and Human-Human Cooperations to Get the Job Done
Tao Xie, North Carolina State University, Raleigh, NC, USA

Presentation transcript:

IBM's Deep Blue defeated chess champion Garry Kasparov in 1997.
IBM Watson defeated top human Jeopardy! players in 2011.

"Completely Automated Public Turing test to tell Computers and Humans Apart"

Movie: Minority Report CNN News iPad

…

2010 Dagstuhl Seminar "Practical Software Testing: Tool Automation and Human Factors"

Recent advanced technique: Dynamic Symbolic Execution/Concolic Testing
• Instrument code to explore feasible paths
• Example tool: Pex from Microsoft Research (for .NET programs)
Patrice Godefroid, Nils Klarlund, and Koushik Sen. DART: Directed Automated Random Testing. In Proc. PLDI 2005.
Koushik Sen, Darko Marinov, and Gul Agha. CUTE: A Concolic Unit Testing Engine for C. In Proc. ESEC/FSE 2005.
Nikolai Tillmann and Jonathan de Halleux. Pex - White Box Test Generation for .NET. In Proc. TAP 2008.

Code to generate inputs for:
    void CoverMe(int[] a) {
      if (a == null) return;
      if (a.Length > 0)
        if (a[0] == 123…)
          throw new Exception("bug");
    }
Loop: Execute & Monitor → Solve → Choose next path
    Constraints to solve                     Data      Observed constraints
    (none)                                   null      a==null
    a!=null                                  {}        a!=null && !(a.Length>0)
    a!=null && a.Length>0                    {0}       a!=null && a.Length>0 && a[0]!=123…
    a!=null && a.Length>0 && a[0]==123…      {123…}    a!=null && a.Length>0 && a[0]==123…
(The last conjunct of each constraint set to solve is the negated condition of a previously observed path.)
Done: There is no path left.
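The execute/monitor, solve, and choose-next-path loop on this slide can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `MAGIC` is a hypothetical stand-in for the constant the slide elides, `cover_me`, `solve`, and `explore` are names invented here, and the "solver" is a brute-force search over a tiny candidate pool rather than the constraint solver a tool such as Pex would use.

```python
from typing import Callable, List, Optional, Tuple

MAGIC = 123  # hypothetical stand-in; the slide elides the real constant

Input = Optional[List[int]]
# One recorded branch: (label, predicate over the input, observed outcome)
Cond = Tuple[str, Callable[[Input], bool], bool]


def cover_me(a: Input, trace: List[Cond]) -> None:
    """Instrumented CoverMe: every branch condition it evaluates is recorded."""
    def branch(label: str, pred: Callable[[Input], bool]) -> bool:
        outcome = pred(a)
        trace.append((label, pred, outcome))
        return outcome

    if branch("a==null", lambda x: x is None):
        return
    if branch("a.Length>0", lambda x: len(x) > 0):
        if branch("a[0]==MAGIC", lambda x: x[0] == MAGIC):
            raise RuntimeError("bug")


# Toy "solver": try a small, fixed pool of inputs (an assumption of this sketch).
CANDIDATES: List[Input] = [None, [], [0], [MAGIC]]


def solve(constraints: List[Tuple[Callable[[Input], bool], bool]]) -> Input:
    """Return a candidate satisfying all (predicate, wanted outcome) pairs, if any."""
    for cand in CANDIDATES:
        try:
            if all(pred(cand) == want for pred, want in constraints):
                return cand
        except (TypeError, IndexError):
            continue  # predicate not applicable to this candidate
    return None


def explore(start: Input) -> List[Tuple[Input, str]]:
    """Run the program, then negate one observed condition at a time."""
    worklist, seen_paths, tests = [start], set(), []
    while worklist:
        a = worklist.pop(0)
        trace: List[Cond] = []
        try:
            cover_me(a, trace)
            result = "ok"
        except RuntimeError:
            result = "bug"
        path = tuple((label, outcome) for label, _, outcome in trace)
        if path in seen_paths:
            continue  # this path was already explored
        seen_paths.add(path)
        tests.append((a, result))
        for i, (_, pred, outcome) in enumerate(trace):
            # Keep the prefix as observed, flip the i-th condition.
            prefix = [(p, o) for _, p, o in trace[:i]]
            new_input = solve(prefix + [(pred, not outcome)])
            if new_input is not None:
                worklist.append(new_input)
    return tests


tests = explore(None)
```

Starting from `None`, the sketch visits the same four inputs as the slide's Data column (null, empty array, `{0}`, and the array holding the magic constant) and stops once no unexplored path is left.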

• Method sequences
  ▪ MSeqGen/Seeker [Thummalapenta et al. OOPSLA 11, ESEC/FSE 09], Covana [Xiao et al. ICSE 2011], OCAT [Jaygarl et al. ISSTA 10], Evacon [Inkumsah et al. ASE 08], Symclat [d'Amorim et al. ASE 06]
• Environments, e.g., db, file systems, network, …
  ▪ DBApp Testing [Taneja et al. ESEC/FSE 11], [Pan et al. ASE 11]
  ▪ CloudApp Testing [Zhang et al. IEEE Soft 12]
• Loops
  ▪ Fitnex [Xie et al. DSN 09]
• Code evolution
  ▪ eXpress [Taneja et al. ISSTA ASE

Download counts (20 months) (Feb Oct ) Academic: 17,366 Devlabs: 13,022 Total: 30,388

Publications:

Running Symbolic PathFinder...
…
====================================================== results
no errors detected
====================================================== statistics
elapsed time: 0:00:02
states: new=4, visited=0, backtracked=4, end=2
search: maxDepth=3, constraints=0
choice generators: thread=1, data=2
heap: gc=3, new=271, free=22
instructions: 2875
max memory: 81MB
loaded code: classes=71, methods=884
…

• object-creation problems (OCP): 65%
• external-method call problems (EMCP): 27%
Total block coverage achieved is 50%, lowest coverage 16%.
• Example: Dynamic Symbolic Execution/Concolic Testing
  ▪ Instrument code to explore feasible paths
  ▪ Challenge: path explosion

• A graph example from the QuickGraph library
• Includes two classes: Graph and DFSAlgorithm
• Graph
  ▪ AddVertex
  ▪ AddEdge: requires both vertices to be in the graph
    00: class Graph : IVEListGraph { …
    03:   public void AddVertex (IVertex v) {
    04:     vertices.Add(v); // B1
          }
    06:   public Edge AddEdge (IVertex v1, IVertex v2) {
    07:     if (!vertices.Contains(v1))
    08:       throw new VNotFoundException("");
    09:     // B2
    10:     if (!vertices.Contains(v2))
    11:       throw new VNotFoundException("");
    12:     // B3
    14:     Edge e = new Edge(v1, v2);
    15:     edges.Add(e);
          }
        }
        //DFS:DepthFirstSearch
    18: class DFSAlgorithm { …
    23:   public void Compute (IVertex s) { ...
    24:     if (graph.GetEdges().Size() > 0) { // B4
    25:       isComputed = true;
    26:       foreach (Edge e in graph.GetEdges()) {
    27:         ... // B5
    28:       }
    29:     }
          }
        }
[Thummalapenta et al. OOPSLA 11]

• Test target: cover the true branch (B4) of Line 24 in the Graph/DFSAlgorithm listing above
• Desired object state: graph should include at least one edge
• Target sequence:
    Graph ag = new Graph();
    Vertex v1 = new Vertex(0);
    Vertex v2 = new Vertex(1);
    ag.AddVertex(v1);
    ag.AddVertex(v2);
    ag.AddEdge(v1, v2);
    DFSAlgorithm algo = new DFSAlgorithm(ag);
    algo.Compute(v1);
[Thummalapenta et al. OOPSLA 11]
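To make the object-state requirement concrete, here is a minimal Python analogue of the Graph/DFSAlgorithm example. It is a sketch, not the QuickGraph code: the class and method names mirror the slide, vertices are plain integers, and everything elided on the slide (for example the body of the loop at B5) is left out.

```python
class VertexNotFound(Exception):
    """Raised when an edge refers to a vertex that is not in the graph."""


class Graph:
    def __init__(self):
        self.vertices = []
        self.edges = []

    def add_vertex(self, v):
        self.vertices.append(v)  # B1

    def add_edge(self, v1, v2):
        if v1 not in self.vertices:
            raise VertexNotFound(v1)
        if v2 not in self.vertices:
            raise VertexNotFound(v2)
        edge = (v1, v2)
        self.edges.append(edge)
        return edge


class DFSAlgorithm:
    def __init__(self, graph):
        self.graph = graph
        self.is_computed = False

    def compute(self, start):
        if len(self.graph.edges) > 0:  # B4: needs at least one edge
            self.is_computed = True


# A sequence that skips AddVertex cannot even create the edge.
try:
    Graph().add_edge(0, 1)
    edge_without_vertices = True
except VertexNotFound:
    edge_without_vertices = False

# A too-short sequence leaves B4 uncovered.
short = DFSAlgorithm(Graph())
short.compute(0)

# The target sequence from the slide reaches B4.
ag = Graph()
ag.add_vertex(0)
ag.add_vertex(1)
ag.add_edge(0, 1)
algo = DFSAlgorithm(ag)
algo.compute(0)
```

Only the full sequence (two `add_vertex` calls, then `add_edge`, then `compute`) puts the graph into the state that the true branch at B4 requires, which is why choosing primitive argument values alone is not enough here.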

• object-creation problems (OCP): 65%
• external-method call problems (EMCP): 27%
Total block coverage achieved is 50%, lowest coverage 16%.
• Example: Dynamic Symbolic Execution/Concolic Testing (Pex)
  ▪ Instrument code to explore feasible paths
  ▪ Challenge: path explosion

• Example 1:
  ▪ File.Exists has data dependencies on program input
  ▪ The subsequent branch at Line 1 uses the return value of File.Exists
• Example 2:
  ▪ Path.GetFullPath has data dependencies on program input
  ▪ Path.GetFullPath throws exceptions
• Example 3: String.Format does not cause any problem

Tackle object-creation problems with Factory Methods

Tackle external-method call problems with Mock Methods or Method Instrumentation (example: mocking System.IO.File.ReadAllText)
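As a rough Python analogue of mocking System.IO.File.ReadAllText, the sketch below replaces the file-reading call with a mock so that a test can choose the "file contents" directly. It uses the standard library's `unittest.mock` and is not the mechanism Pex itself uses; `count_entries` and `run_with_mocked_file` are hypothetical names introduced for illustration.

```python
from unittest import mock


def count_entries(path: str) -> int:
    """Code under test: its branches depend on what the file system returns."""
    with open(path) as handle:
        text = handle.read()
    if text == "":
        return 0  # branch reachable only with an empty file
    return len(text.splitlines())


def run_with_mocked_file(contents: str) -> int:
    """Run count_entries with open() replaced by a mock that yields `contents`."""
    with mock.patch("builtins.open", mock.mock_open(read_data=contents)):
        return count_entries("placeholder/path.txt")  # the path is never touched


empty_result = run_with_mocked_file("")
two_line_result = run_with_mocked_file("alpha\nbeta\n")
```

With the external call under the test's control, both branches of `count_entries` can be exercised without creating any real files.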

Tool output such as the Symbolic PathFinder report shown earlier lists only results and statistics. Tools typically do not communicate the challenges they face, which is what would enable cooperation between tools and users.

• Machine is better at task set A
  ▪ Mechanical, tedious, repetitive tasks, …
  ▪ Ex. solving constraints along a long path
• Human is better at task set B
  ▪ Intelligence, human intent, abstraction, domain knowledge, …
  ▪ Ex. local reasoning after a loop, recognizing naming semantics
= A ∪ B

• Human-Assisted Computing
  ▪ Driver: tool
  ▪ Helper: human
  ▪ Ex. Covana [Xiao et al. ICSE 2011]
• Human-Centric Computing
  ▪ Driver: human
  ▪ Helper: tool
  ▪ Ex. Coding for Fun
Interfaces are important. Contents are important too!

• Motivation
  ▪ Tools are often not powerful enough
  ▪ Humans are good at some aspects that tools are not
• What difficulties does the tool face?
• How to communicate info to the user to get help?
• How does the user help the tool based on the info?
Iterations to form a feedback loop

• external-method call problems (EMCP)
• object-creation problems (OCP)

• Existing solution
  ▪ identify all executed external-method calls
  ▪ report all object types of program inputs and fields
• Limitations
  ▪ the number of reported problems is often high
  ▪ some identified problems are irrelevant for achieving higher structural coverage

Reported EMCPs: 44, reported OCPs: 18
vs.
Real EMCPs: 0, real OCPs: 5

• Goal: Precisely identify problems faced by tools when achieving structural coverage
• Insight: Partially-covered statements have data dependencies on real problem candidates
[Xiao et al. ICSE 11] Xusheng Xiao, Tao Xie, Nikolai Tillmann, and Jonathan de Halleux. Precise Identification of Problems for Structural Test Generation. In Proc. ICSE 2011

[Figure: overview diagram with the labels Program, Generated Test Inputs, Forward Symbolic Execution, Runtime Events, Runtime Information, Coverage, Problem Candidate Identification, Problem Candidates, Data Dependence Analysis, Identified Problems]

Data Dependencies
• External-method calls whose arguments have data dependencies on program inputs

• Partially-covered branch statements have data dependencies on EMCP candidates for return values
  ▪ Symbolic expression: return(File.Exists) == true
  ▪ Element of EMCP candidate: return(File.Exists)
  ▪ The branch statement at Line 1 has a data dependency on File.Exists at Line 1
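The filtering idea on this slide (keep only those candidates that a partially-covered branch actually depends on) can be sketched as follows. This is a simplification for illustration, not Covana's implementation: the `Branch` record, its fields, and the coverage data are all hypothetical, and only the return-value rule from this slide is modeled.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Branch:
    location: str
    depends_on: FrozenSet[str]  # external calls whose return values feed the condition
    true_covered: bool
    false_covered: bool

    @property
    def partially_covered(self) -> bool:
        # Exactly one of the two outcomes was exercised by the generated tests.
        return self.true_covered != self.false_covered


def identify_problems(candidates: List[str], branches: List[Branch]) -> List[str]:
    """Keep candidates that some partially-covered branch has a data dependency on."""
    relevant = set()
    for branch in branches:
        if branch.partially_covered:
            relevant |= branch.depends_on
    return [c for c in candidates if c in relevant]


# Hypothetical coverage data in the spirit of the slide's File.Exists example.
candidates = ["File.Exists", "String.Format"]
branches = [
    Branch("Line 1", frozenset({"File.Exists"}), true_covered=False, false_covered=True),
    Branch("Line 7", frozenset({"String.Format"}), true_covered=True, false_covered=True),
]
problems = identify_problems(candidates, branches)
```

In this toy data only `File.Exists` is reported, because the branch that depends on `String.Format` is already fully covered.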

• Subjects:
  ▪ xUnit: unit testing framework for .NET (223 classes and interfaces with 11.4 KLOC)
  ▪ QuickGraph: C# graph library (165 classes and interfaces with 8.3 KLOC)
• Evaluation setup:
  ▪ Apply Pex to generate tests for the program under test
  ▪ Feed the program and generated tests to Covana
  ▪ Compare the existing solution and Covana

• RQ1: How effective is Covana in identifying the two main types of problems, EMCPs and OCPs?
• RQ2: How effective is Covana in pruning irrelevant problem candidates of EMCPs and OCPs?

Covana identifies
• 43 EMCPs with only 1 false positive and 2 false negatives
• 155 OCPs with 20 false positives and 30 false negatives

Covana prunes
• 97% (1567 of 1610) of EMCP candidates with 1 false positive and 2 false negatives
• 66% (296 of 451) of OCP candidates with 20 false positives and 30 false negatives
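The percentages follow directly from the reported counts, and the candidates left after pruning match the 43 EMCPs and 155 OCPs that Covana identifies. A quick check (the helper name `pruned_percent` is introduced here only for illustration):

```python
def pruned_percent(pruned: int, total: int) -> int:
    """Share of candidates pruned, rounded to a whole percent."""
    return round(100 * pruned / total)


emcp_percent = pruned_percent(1567, 1610)
ocp_percent = pruned_percent(296, 451)

# Candidates that survive pruning are the ones reported as identified problems.
emcp_remaining = 1610 - 1567
ocp_remaining = 451 - 296
```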

• Human-Assisted Computing
  ▪ Driver: tool
  ▪ Helper: human
  ▪ Ex. Covana [Xiao et al. ICSE 2011]
• Human-Centric Computing
  ▪ Driver: human
  ▪ Helper: tool
  ▪ Ex. Coding for Fun
Interfaces are important. Contents are important too!

1,126,136 clicked 'Ask Pex!'
The contributed concept of Coding Duel games as major game type of Pex for Fun since Summer
N. Tillmann, J. de Halleux, T. Xie, S. Gulwani, and J. Bishop. Teaching and Learning Programming and Software Engineering via Interactive Gaming. In Proc. ICSE 2013, Software Engineering Education (SEE), 2013.

Secret Implementation
    class Secret {
      public static int Puzzle(int x) {
        if (x <= 0) return 1;
        return x * Puzzle(x-1);
      }
    }
Player Implementation
    class Player {
      public static int Puzzle(int x) {
        return x;
      }
    }
Test driver
    class Test {
      public static void Driver(int x) {
        if (Secret.Puzzle(x) != Player.Puzzle(x))
          throw new Exception("Mismatch");
      }
    }
Win criterion: behavior of Secret Impl == behavior of Player Impl
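A coding duel boils down to searching for inputs on which the player's implementation disagrees with the secret one. The Python sketch below mirrors the slide's Secret and Player code and looks for mismatches by plain enumeration over a small range; that enumeration is an assumption of this sketch, whereas Pex generates the distinguishing inputs with dynamic symbolic execution. The names `find_mismatches` and `improved_player` are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def secret_puzzle(x: int) -> int:
    """Secret implementation from the slide (factorial, 1 for x <= 0)."""
    if x <= 0:
        return 1
    return x * secret_puzzle(x - 1)


def player_puzzle(x: int) -> int:
    """The player's first attempt from the slide."""
    return x


def improved_player(x: int) -> int:
    """A later attempt that matches the secret behavior."""
    result = 1
    for factor in range(2, x + 1):
        result *= factor
    return result


def find_mismatches(
    secret: Callable[[int], int],
    player: Callable[[int], int],
    inputs: Iterable[int],
) -> List[Tuple[int, int, int]]:
    """Return (input, secret result, player result) for every disagreement."""
    return [(x, secret(x), player(x)) for x in inputs if secret(x) != player(x)]


first_feedback = find_mismatches(secret_puzzle, player_puzzle, range(-2, 6))
final_feedback = find_mismatches(secret_puzzle, improved_player, range(-2, 6))
```

The first attempt happens to agree with the secret on 1 and 2, so those inputs alone would not reveal the difference; the mismatching inputs are the feedback the player uses to generalize. The duel is won when no mismatch remains.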

• Coding duels at
• Brain exercising/learning while having fun
• Fun: iterative, adaptive/personalized, w/ win criterion
• Abstraction/generalization, debugging, problem solving

Especially valuable in Massive Open Online Courses (MOOC)

• Everyone can contribute (over the Internet)
  ▪ Coding duels
  ▪ Duel solutions
    class Secret {
      public static int Puzzle(int x) {
        if (x <= 0) return 1;
        return x * Puzzle(x-1);
      }
    }

• An access control policy (ACP) includes rules to control which principals have access to which resources
• A policy rule includes four elements
  ▪ subject: HCP
  ▪ action: edit
  ▪ resource: patient's account
  ▪ effect: deny
Ex. “The Health Care Personnel (HCP) does not have the ability to edit the patient's account.”

• How to ensure correct specification of ACPs?
  ▪ ACPs may be complex/error-prone to specify
  ▪ ACPs are often written in natural language (NL)
• How to ensure correct enforcement of ACPs?
  ▪ Gap between ACPs (domain concepts) and system implementation (programming concepts)
  ▪ Functional requirements bridge the gap but are often written in NL
[Figure: NL ACPs, NL Functional Requirements, and System Implementation, linked by conformance]

• Model Construction
  ▪ specify and combine access control (AC) models (e.g., Multi-Level, RBAC)
• Model Verification
  ▪ verify AC models against given properties
• Implementation Testing
  ▪ test AC implementation with NIST ACTS
• XACML Synthesis
~130 organizations/users: DISA, DOE Fermi Lab, SAIC, NOAA, Rosssampson Corporation, Johns Hopkins U, Inventure Enterprises, …

• In practice, ACPs are often written in natural language (NL), especially in legacy systems
  ▪ Supposed to be written in non-functional requirements (e.g., security requirements)
  ▪ But often buried inside functional requirements
Ex. (UC1 of iTrust use cases) “…… Patient MID should be the number assigned when the patient is added to the system and cannot be edited. The HCP does not have the ability to edit the patient's security question and password. ……”

ACP Extraction
“The Health Care Personnel (HCP) does not have the ability to edit the patient's account.”
Access Control Policy:
    Subject   Action   Resource          Effect
    HCP       edit     patient.account   deny
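Once a rule has been extracted into its four elements, it can be stored and checked mechanically. The sketch below is one possible representation, not the format used by the tools in this talk: `PolicyRule`, `decide`, and the "not applicable" default are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PolicyRule:
    subject: str
    action: str
    resource: str
    effect: str  # "permit" or "deny"


def decide(rules: List[PolicyRule], subject: str, action: str, resource: str) -> str:
    """Return the effect of the first matching rule, or 'not applicable'."""
    for rule in rules:
        if (rule.subject, rule.action, rule.resource) == (subject, action, resource):
            return rule.effect
    return "not applicable"


# The rule extracted from the example sentence on the slide.
rules = [PolicyRule(subject="HCP", action="edit", resource="patient.account", effect="deny")]

hcp_edit = decide(rules, "HCP", "edit", "patient.account")
hcp_view = decide(rules, "HCP", "view", "patient.account")
```

A request that matches the extracted rule is denied, while a request no rule covers falls through to the default.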

• Scenario-based functional requirements:
  ▪ use case: a sequence of action steps describing how principals access different resources to achieve some functionality
• Resource access information:
  ▪ subject: patient
  ▪ action: view
  ▪ resource: access log
Ex. “The patient views access log.”

• Validate to detect inconsistencies of action steps
  ▪ with formalized/extracted ACPs
  ▪ in terms of inconsistent names used for referring to the same entity (e.g., user) across different use cases
Ex. The enterer/editor used in UC 4 of the iTrust use cases actually refers to admin and LHCP users:
“An admin creates a LHCP, an ER, a Laboratory Technician (LT), or a public health agent (PHA) [S1]. A LHCP creates [S2] UAPs. Once entered, the enterer/editor is presented a screen of the input to approve [E2].”

• TC1: Semantic Structure Variance
  ▪ different ways to specify the same rule
• TC2: Negative Meaning Implicitness
  ▪ verb could have negative meaning
ACP 1: An HCP cannot change patient's account.
ACP 2: An HCP is disallowed to change patient's account.
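To see why both challenges matter, consider a deliberately naive extractor that maps the two example sentences to the same rule. It is a toy regular-expression matcher written for this illustration and is not the syntactic and semantic analysis used in the work presented here; the pattern, the cue lists, and `extract_rule` are all assumptions.

```python
import re
from typing import Optional, Tuple

# Phrases that carry a negative (deny) or positive (permit) meaning in this toy.
NEGATIVE_CUES = ("cannot", "is disallowed to")
POSITIVE_CUES = ("can", "is allowed to")

PATTERN = re.compile(
    r"^An? (?P<subject>\w+) "
    r"(?P<cue>cannot|can|is disallowed to|is allowed to) "
    r"(?P<action>\w+) (?P<resource>.+?)\.?$"
)


def extract_rule(sentence: str) -> Optional[Tuple[str, str, str, str]]:
    """Return (subject, action, resource, effect), or None if the pattern fails."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    effect = "deny" if match.group("cue") in NEGATIVE_CUES else "permit"
    return (match.group("subject"), match.group("action"), match.group("resource"), effect)


acp1 = extract_rule("An HCP cannot change patient's account.")
acp2 = extract_rule("An HCP is disallowed to change patient's account.")
unmatched = extract_rule("Patient's account must never be changed by an HCP.")
```

Both ACP sentences yield the same deny rule, but the third sentence, which states the same policy in the passive voice, is not recognized at all. Hand-written patterns do not scale to this kind of variance.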

• TC3: Anaphora
• TC4: Transitive Subject
• TC5: Perspective Variance
These challenges apply when extracting ACPs from functional requirements.
Step 1: An HCP creates an account.
Step 2: He edits the account.
Step 3: The system updates the account.
Step 4: The system displays the updated account.
Slide annotations: HCP; “HCP views the updated account.”

• Ensure correct specification
  ▪ automatically extract ACPs from NL documents
• Ensure correct enforcement
  ▪ automatically extract action steps from NL use cases
• New Natural Language Processing (NLP) techniques
  ▪ syntactic analysis: extract syntactic structure (noun group, verb group)
  ▪ semantic analysis: extract semantic meaning of elements (e.g., subject, action, resource, and effect)
[FSE 2012]

• Human-Assisted Computing
  ▪ Covana
• Human-Centric Computing
  ▪ Pex for Fun
• Security Policy
  ▪ NCSU/NIST ACPT
  ▪ Text2Policy

Questions?

StackMine [Han et al. ICSE 12]
[Figure: StackMine workflow with the labels Internet, Trace collection, Trace Storage, Trace analysis, Pattern Matching, Problematic Pattern Repository, Bug filing, Bug Database, Bug update]
Shi Han, Yingnong Dang, Song Ge, Dongmei Zhang, and Tao Xie. Performance Debugging in the Large via Mining Millions of Stack Traces. In Proc. ICSE 2012

“We believe that the MSRA tool is highly valuable and much more efficient for mass trace (100+ traces) analysis. For 1000 traces, we believe the tool saves us 4-6 weeks of time to create new signatures, which is quite a significant productivity boost.” - from Development Manager in Windows
• Highly effective new issue discovery on Windows mini-hang
• Continuous impact on future Windows versions
[Han et al. ICSE 12]

• Static analysis + dynamic analysis
  ▪ Static checking + test generation
  ▪ …
• Dynamic analysis + static analysis
  ▪ Fix generation + fix validation
  ▪ …
• Static analysis + static analysis
  ▪ …
• Dynamic analysis + dynamic analysis
  ▪ …
Example: Xiaoyin Wang, Lu Zhang, Tao Xie, Yingfei Xiong, and Hong Mei. Automating Presentation Changes in Dynamic Web Applications via Collaborative Hybrid Analysis. In Proc. FSE 2012