Tao Xie, University of Illinois at Urbana-Champaign, USA. SBQS 2013.

IBM's Deep Blue defeated chess champion Garry Kasparov in 1997
IBM Watson defeated top human Jeopardy! players in 2011

Google’s driverless car; Microsoft's instant voice translation tool; IBM Watson as Jeopardy! player

"Completely Automated Public Turing test to tell Computers and Humans Apart"

Movie: Minority Report; CNN News; iPad

• Machine is better at task set A
  ◦ Mechanical, tedious, repetitive tasks, …
  ◦ Ex. solving constraints along a long path
• Human is better at task set B
  ◦ Intelligence, human intent, abstraction, domain knowledge, …
  ◦ Ex. local reasoning after a loop, recognizing naming semantics
= A ∪ B

Malaysia Airlines Flight
“As the plane passed feet, the stall and overspeed warning indicators came on simultaneously—something that’s supposed to be impossible, and a situation the crew is not trained to handle.” IEEE Spectrum 2009

Ironies of Automation
Lisanne Bainbridge, "Ironies of Automation", Automatica
“Even highly automated systems, such as electric power networks, need human beings... one can draw the paradoxical conclusion that automated systems still are man-machine systems, for which both technical and human factors are important.”

“The increased interest in human factors among engineers reflects the irony that the more advanced a control system is, so the more crucial may be the contribution of the human operator.”

• Don’t forget human factors
  ◦ Using your tools as end-to-end solutions
  ◦ Helping your tools
• Don’t forget cooperation of human and tool; human and human
  ◦ Humans can help your tools too
  ◦ Humans could work together to help your tools, e.g., crowdsourcing

“During the past 21 years, over 75 papers and 9 Ph.D. theses have been published on pointer analysis. Given the tomes of work on this topic one may wonder, “Haven't we solved this problem yet?” With input from many researchers in the field, this paper describes issues related to pointer analysis and remaining open problems.”
Michael Hind. Pointer analysis: haven't we solved this problem yet? In Proc. ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (PASTE 2001)

Section 4.3: Designing an Analysis for a Client’s Needs
Barbara Ryder expands on this topic: “… We can all write an unbounded number of papers that compare different pointer analysis approximations in the abstract. However, this does not accomplish the key goal, which is to design and engineer pointer analyses that are useful for solving real software problems for realistic programs.”

Zhenmin Li, Shan Lu, Suvda Myagmar, and Yuanyuan Zhou. CP-Miner: a tool for finding copy-paste and related bugs in operating system code. In Proc. OSDI

MSRA XIAO: Yingnong Dang, Dongmei Zhang, Song Ge, Chengyun Chu, Yingjun Qiu, and Tao Xie. XIAO: Tuning code clones at hands of engineers in practice. In Proc. ACSAC 2012

MSR 2011 Keynote by YY Zhou: Connecting Technology with Real-world Problems – From Copy-paste Detection to Detecting Known Bugs

Human to Determine What are Serious (Known) Bugs

XIAO
• Available in Visual Studio 2012
• Searching similar snippets for fixing bug once
• Finding refactoring opportunity

Yingnong Dang, Dongmei Zhang, Song Ge, Chengyun Chu, Yingjun Qiu, and Tao Xie. XIAO: Tuning code clones at hands of engineers in practice. In Proc. Annual Computer Security Applications Conference (ACSAC 2012)

XIAO Code Clone Search service integrated into workflow of Microsoft Security Response Center (MSRC)

Microsoft Technet Blog about XIAO: We wanted to be sure to address the vulnerable code wherever it appeared across the Microsoft code base. To that end, we have been working with Microsoft Research to develop a “Cloned Code Detection” system that we can run for every MSRC case to find any instance of the vulnerable code in any shipping product. This system is the one that found several of the copies of CVE that we are now addressing with MS

XIAO enables code clone analysis with
• High scalability
• High compatibility
• High tunability: what you tune is what you get
• High explorability
  ◦ How to navigate through the large number of detected clones?
  ◦ How to quickly review a pair of clones?

• 50 years of automated debugging research
• N papers, only 5 evaluated with actual programmers

Chris Parnin and Alessandro Orso. Are automated debugging techniques actually helping programmers? In Proc. ISSTA 2011

• Academia
  ◦ Tend to leave human out of loop (involving human makes evaluations difficult to conduct or write)
  ◦ Tend not to spend effort on improving tool usability
    ▪ tool usability would be valued more in HCI than in SE
    ▪ too much to include both the approach/tool itself and usability/its evaluation in a single paper
• Real-world
  ◦ Often has human in the loop (familiar IDE integration, social effect, lack of expertise/willingness to write specs, …)
• Examples
  ◦ Agitar [ISSTA 2006] vs. Daikon [TSE 2001]
  ◦ Test generation in Pex based on constraint solving

• Goal: to identify the future directions in research in formal methods and its transition to industrial practice.
• The workshop will bring together researchers and identify primary challenges in the field, both foundational, infrastructural, and in transitioning ideas from research labs to developer tools.

• “Lack of education amongst practitioners”
• “Education of students in logic and design for verification”
• “Expertise required to create and use a verification tool. E.g., both Astrée for Airbus and SDV for Windows drivers were closely shepherded by verification experts.”
• “Tools require lots of up-front effort (e.g., to write specifications)”
• “User effort required to guide verification tools, such as assertions or specifications”

• “Not integrated with standard development flows (testing)”
• “Too many false positives and no ranking of errors”
• “General usability of tools, in terms of false alarms and error messages. The Coverity CACM paper pointed out that they had developed features that they do not deploy because they baffle users. Many tools choose unsoundness over soundness to avoid false alarms.”

• “The necessity of detailed specifications and complex interaction with tools, which is very costly and discouraging for industrial, who lack high-level specialists.”
• “Feedback to users. It’s difficult to explain to users why automated verification tools are failing. Counterexamples to properties can be very difficult for users to understand, especially when they are abstract, or based on incomplete environment models or constraints.”

2010 Dagstuhl Seminar: Practical Software Testing: Tool Automation and Human Factors

Andy Ko and Brad Myers. Debugging Reinvented: Asking and Answering Why and Why Not Questions about Program Behavior. In Proc. ICSE 2008

• Don’t forget human factors
  ◦ Using your tools as end-to-end solutions
  ◦ Helping your tools
• Don’t forget cooperation of human and tool intelligence; human and human intelligence
  ◦ Humans can help your tools too
  ◦ Humans could work together to help your tools, e.g., crowdsourcing

• Motivation
  ◦ Architecture recovery is challenging (abstraction gap)
  ◦ Human typically has high-level view in mind
• Repeat
  ◦ Human: define/update high-level model of interest
  ◦ Tool: extract a source model
  ◦ Human: define/update declarative mapping between high-level model and source model
  ◦ Tool: compute a software reflexion model
  ◦ Human: interpret the software reflexion model
  Until happy

Gail C. Murphy, David Notkin. Reengineering with Reflexion Models: A Case Study. IEEE Computer 1997
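
The "Tool: compute a software reflexion model" step can be sketched as follows. This is a minimal illustration, not Murphy and Notkin's implementation: the class name `Reflexion`, the tuple-based edge representation, and the choice to ignore calls that stay inside one high-level entity are all assumptions made for the example. It lifts extracted source edges through the human's mapping and sorts them into convergences (expected and found), divergences (found but not expected), and absences (expected but not found).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sketch only; names and data shapes are assumptions.
public static class Reflexion
{
    // highLevel: edges the human expects between high-level entities.
    // source:    edges the tool extracted between source entities.
    // map:       the human's mapping from source entity to high-level entity.
    public static (HashSet<(string, string)> Convergences,
                   HashSet<(string, string)> Divergences,
                   HashSet<(string, string)> Absences)
        Compute(
            IEnumerable<(string From, string To)> highLevel,
            IEnumerable<(string From, string To)> source,
            IReadOnlyDictionary<string, string> map)
    {
        var expected = new HashSet<(string, string)>(highLevel);

        // Lift each source edge to the high-level model. Unmapped entities and
        // edges inside one high-level entity are skipped (a simplification).
        var lifted = new HashSet<(string, string)>();
        foreach (var (src, dst) in source)
        {
            if (map.TryGetValue(src, out var hs) &&
                map.TryGetValue(dst, out var hd) && hs != hd)
            {
                lifted.Add((hs, hd));
            }
        }

        var convergences = new HashSet<(string, string)>(expected.Where(e => lifted.Contains(e)));
        var divergences = new HashSet<(string, string)>(lifted.Where(e => !expected.Contains(e)));
        var absences = new HashSet<(string, string)>(expected.Where(e => !lifted.Contains(e)));
        return (convergences, divergences, absences);
    }
}
```

The human then inspects the divergences and absences and either revises the high-level model or the mapping, which is the "Repeat ... Until happy" loop on the slide.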

Running Symbolic PathFinder...
…
====================================================== results
no errors detected

====================================================== statistics
elapsed time:       0:00:02
states:             new=4, visited=0, backtracked=4, end=2
search:             maxDepth=3, constraints=0
choice generators:  thread=1, data=2
heap:               gc=3, new=271, free=22
instructions:       2875
max memory:         81MB
loaded code:        classes=71, methods=884
…

• Recent advanced technique: Dynamic Symbolic Execution/Concolic Testing
  ◦ Instrument code to explore feasible paths
• Example tool: Pex from Microsoft Research (for .NET programs)

P. Godefroid, N. Klarlund, and K. Sen. DART: directed automated random testing. PLDI 2005
K. Sen, D. Marinov, and G. Agha. CUTE: a concolic unit testing engine for C. ESEC/FSE 2005
N. Tillmann and J. de Halleux. Pex - White Box Test Generation for .NET. TAP 2008
L. A. Clarke. A system to generate test data and symbolically execute programs. TSE
J. C. King. Symbolic execution and program testing. CACM 1976.

Code to generate inputs for:

void CoverMe(int[] a)
{
    if (a == null) return;
    if (a.Length > 0)
        if (a[0] == 123…)
            throw new Exception("bug");
}

Constraints to solve                    Data      Observed constraints
(first run)                             null      a==null
a!=null                                 {}        a!=null && !(a.Length>0)
a!=null && a.Length>0                   {0}       a!=null && a.Length>0 && a[0]!=123…
a!=null && a.Length>0 && a[0]==123…     {123…}    a!=null && a.Length>0 && a[0]==123…

Branch tree: a==null (T/F), then a.Length>0 (T/F), then a[0]==123… (T/F)

Loop: Execute & Monitor, Solve (negated condition), Choose next path. Done: there is no path left.
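
The execute, solve, choose-next-path loop can be traced with a small hand-rolled sketch. It is not how Pex is implemented: the "solver" below is a hard-coded lookup for this one method, `Run` is an instrumented copy of `CoverMe` that records conditions instead of throwing, and `Magic = 123` is a hypothetical stand-in because the constant on the slide is truncated.

```csharp
using System;
using System.Collections.Generic;

public static class CoverMeExploration
{
    // Hypothetical stand-in value: the slide's constant is truncated ("123…").
    public const int Magic = 123;

    // Instrumented copy of CoverMe: returns the branch conditions observed on
    // this run instead of throwing at the "bug" statement.
    public static List<string> Run(int[] a)
    {
        var path = new List<string>();
        if (a == null) { path.Add("a==null"); return path; }
        path.Add("a!=null");
        if (!(a.Length > 0)) { path.Add("!(a.Length>0)"); return path; }
        path.Add("a.Length>0");
        path.Add(a[0] == Magic ? "a[0]==Magic" : "a[0]!=Magic");
        return path;
    }

    // Hand-written stand-in for a constraint solver: given the length of the
    // last observed path, return an input that keeps the earlier conditions
    // and flips the last one.
    static int[] SolveAfterNegatingLast(int pathLength)
    {
        switch (pathLength)
        {
            case 1: return new int[0];        // a!=null
            case 2: return new[] { 0 };       // a!=null && a.Length>0
            default: return new[] { Magic };  // ... && a[0]==Magic
        }
    }

    // Runs the loop and returns the observed path constraint of every run.
    public static List<string> Explore()
    {
        var observed = new List<string>();
        int[] input = null;                   // first run uses the default input
        while (true)
        {
            List<string> path = Run(input);   // execute and monitor
            observed.Add(string.Join(" && ", path));
            if (path[path.Count - 1] == "a[0]==Magic") break;  // no path left
            input = SolveAfterNegatingLast(path.Count);        // solve, choose next path
        }
        return observed;
    }
}
```

Calling `CoverMeExploration.Explore()` yields four path constraints in the same order as the "Observed constraints" column, one per generated input.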

Download counts (initial 20 months of release)
• Academic: 17,366
• Industrial: 13,022
• Total: 30,388

“It has saved me two major bugs (not caught by normal unit tests) that would have taken at least a week to track down and fix normally plus a few smaller issues so I'm a big proponent of Pex.”

Pex detected various bugs (including a serious bug) in a core .NET component (already extensively tested over 5 years by 40 testers), used by thousands of developers and millions of end users.

Released since

• Method sequences
  ◦ MSeqGen/Seeker [Thummalapenta et al. OOPSLA 11, ESEC/FSE 09], Covana [Xiao et al. ICSE 2011], OCAT [Jaygarl et al. ISSTA 10], Evacon [Inkumsah et al. ASE 08], Symclat [d'Amorim et al. ASE 06]
• Environments, e.g., db, file systems, network, …
  ◦ DBApp Testing [Taneja et al. ESEC/FSE 11], [Pan et al. ASE 11]
  ◦ CloudApp Testing [Zhang et al. IEEE Soft 12]
• Loops
  ◦ Fitnex [Xie et al. DSN 09]

Class Under Test

00: class Graph { …
03:   public void AddVertex (Vertex v) {
04:     vertices.Add(v);
05:   }
06:   public Edge AddEdge (Vertex v1, Vertex v2) { …
15:   }
16: }

Generated Unit Tests

void test1() {
  Graph ag = new Graph();
  Vertex v1 = new Vertex(0);
  ag.AddVertex(v1);
}

void test2() {
  Graph ag = new Graph();
  Vertex v1 = new Vertex(0);
  ag.AddEdge(v1, v1);
}
…

Manual Test Generation: Tedious, Missing Special/Corner Cases, …

• Ex: Dynamic Symbolic Execution (DSE) / Concolic Testing
  ◦ Instrument code to explore feasible paths
  ◦ Challenge: path explosion

Total block coverage achieved is 50%, lowest coverage 16%.
• object-creation problems (OCP) - 65%: when desirable receiver or argument objects are not generated
• external-method call problems (EMCP) – 27%

• A graph example from QuickGraph library
• Includes two classes
  ◦ Graph
  ◦ DFSAlgorithm
• Graph
  ◦ AddVertex
  ◦ AddEdge: requires both vertices to be in graph

00: class Graph { …
03:   public void AddVertex (Vertex v) {
04:     vertices.Add(v); // B1
      }
06:   public Edge AddEdge (Vertex v1, Vertex v2) {
07:     if (!vertices.Contains(v1))
08:       throw new VNotFoundException("");
09:     // B2
10:     if (!vertices.Contains(v2))
11:       throw new VNotFoundException("");
12:     // B3
14:     Edge e = new Edge(v1, v2);
15:     edges.Add(e);
      }
    }
    //DFS:DepthFirstSearch
18: class DFSAlgorithm { …
23:   public void Compute (Vertex s) { ...
24:     if (graph.GetEdges().Size() > 0) { // B4
25:       isComputed = true;
26:       foreach (Edge e in graph.GetEdges()) {
27:         ... // B5
28:       }
29:     }
      }
    }

[OOPSLA 11]

• Test target: Cover true branch (B4) of Line 24
• Desired object state: graph should include at least one edge
• Target sequence:

Graph ag = new Graph();
Vertex v1 = new Vertex(0);
Vertex v2 = new Vertex(1);
ag.AddVertex(v1);
ag.AddVertex(v2);
ag.AddEdge(v1, v2);
DFSAlgorithm algo = new DFSAlgorithm(ag);
algo.Compute(v1);

(Line numbers and block labels refer to the Graph and DFSAlgorithm listing on the preceding slide.)

[OOPSLA 11]
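
To see why the target sequence is needed, here is a self-contained sketch that fills the elided parts of the listing with minimal assumed bodies. `Vertex`, `Edge`, the constructors, the `IsComputed` property, and the use of `List<T>.Count` in place of the slide's `Size()` are assumptions, not QuickGraph's actual code. It runs the target sequence and checks that branch B4 is taken, and it also shows that calling `AddEdge` without first adding the vertices only raises the exception.

```csharp
using System;
using System.Collections.Generic;

// Minimal assumed stand-ins for the types elided on the slide.
public class Vertex { public int Id; public Vertex(int id) { Id = id; } }
public class Edge
{
    public Vertex Source, Target;
    public Edge(Vertex s, Vertex t) { Source = s; Target = t; }
}
public class VNotFoundException : Exception
{
    public VNotFoundException(string message) : base(message) { }
}

public class Graph
{
    private readonly List<Vertex> vertices = new List<Vertex>();
    private readonly List<Edge> edges = new List<Edge>();

    public void AddVertex(Vertex v) { vertices.Add(v); }            // B1

    public Edge AddEdge(Vertex v1, Vertex v2)
    {
        if (!vertices.Contains(v1)) throw new VNotFoundException("");
        if (!vertices.Contains(v2)) throw new VNotFoundException("");
        Edge e = new Edge(v1, v2);                                   // B3
        edges.Add(e);
        return e;
    }

    public List<Edge> GetEdges() { return edges; }
}

public class DFSAlgorithm
{
    private readonly Graph graph;
    public bool IsComputed { get; private set; }
    public DFSAlgorithm(Graph g) { graph = g; }

    public void Compute(Vertex s)
    {
        if (graph.GetEdges().Count > 0)                              // B4
        {
            IsComputed = true;
        }
    }
}

public static class TargetSequence
{
    // The slide's target sequence; returns true when B4 was covered.
    public static bool ReachesB4()
    {
        Graph ag = new Graph();
        Vertex v1 = new Vertex(0);
        Vertex v2 = new Vertex(1);
        ag.AddVertex(v1);
        ag.AddVertex(v2);
        ag.AddEdge(v1, v2);
        DFSAlgorithm algo = new DFSAlgorithm(ag);
        algo.Compute(v1);
        return algo.IsComputed;
    }

    // A shorter sequence that skips AddVertex never builds an edge.
    public static bool EdgeWithoutVerticesThrows()
    {
        try { new Graph().AddEdge(new Vertex(0), new Vertex(0)); return false; }
        catch (VNotFoundException) { return true; }
    }
}
```

A tool that only varies primitive inputs cannot discover this ordering of `AddVertex` and `AddEdge` calls by itself, which is the object-creation problem the slides describe.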

• Ex: Dynamic Symbolic Execution (DSE) / Concolic Testing
  ◦ Instrument code to explore feasible paths
  ◦ Challenge: path explosion

Total block coverage achieved is 50%, lowest coverage 16%.
• object-creation problems (OCP) - 65%
• external-method call problems (EMCP) – 27%
  ◦ Typically DSE instruments or explores only project under test
  ◦ Third-party API external methods (network, I/O, ...): too many paths, uninstrumentable

Xusheng Xiao, Tao Xie, Nikolai Tillmann, and Jonathan de Halleux. Precise Identification of Problems for Structural Test Generation. In Proc. ICSE 2011

2010 Dagstuhl Seminar Practical Software Testing: Tool Automation and Human Factors

• Tackling object-creation problems
  ◦ Seeker [OOPSLA 11], MSeqGen [ESEC/FSE 09], Covana [ICSE 11], OCAT [ISSTA 10], Evacon [ASE 08], Symclat [ASE 06]
  ◦ Still not good enough (at least for now)!
    ▪ Seeker (52%) > Pex/DSE (41%) > Randoop/random (26%)
• Tackling external-method call problems
  ◦ DBApp Testing [ESEC/FSE 11], [ASE 11]
  ◦ CloudApp Testing [IEEE Soft 12]
  ◦ Deal with only common environment ASE

Tackle object-creation problems with Factory Methods
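
A factory method gives the tool a way to build non-primitive objects from primitive parameters that it already knows how to solve for. The sketch below is an assumption-laden illustration: the `Graph` here is a simplified stand-in with integer vertices, `GraphFactory.Create` is a hypothetical name, and any tool-specific attribute for registering the factory is left out so the code compiles without Pex.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for the class under test (integer vertices only).
public class Graph
{
    private readonly List<int> vertices = new List<int>();
    private readonly List<(int, int)> edges = new List<(int, int)>();

    public int VertexCount { get { return vertices.Count; } }
    public int EdgeCount { get { return edges.Count; } }

    public void AddVertex(int v) { vertices.Add(v); }

    public void AddEdge(int v1, int v2)
    {
        if (!vertices.Contains(v1) || !vertices.Contains(v2))
            throw new InvalidOperationException("vertex not in graph");
        edges.Add((v1, v2));
    }
}

public static class GraphFactory
{
    // Hypothetical factory: a human encodes the valid call order once, and the
    // tool only has to choose two integers to reach interesting graph states.
    public static Graph Create(int vertexCount, int edgeCount)
    {
        var g = new Graph();
        for (int i = 0; i < vertexCount; i++)
            g.AddVertex(i);

        // Chain edges i -> i+1; never more than the vertices allow.
        int possible = Math.Min(edgeCount, vertexCount - 1);
        for (int i = 0; i < possible; i++)
            g.AddEdge(i, i + 1);
        return g;
    }
}
```

With such a factory, covering a branch that needs "at least one edge" reduces to solving `vertexCount >= 2 && edgeCount >= 1`.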

Tackle external-method call problems with Mock Methods or Method Instrumentation
Ex. Mocking System.IO.File.ReadAllText
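
One way to picture mocking `System.IO.File.ReadAllText` is a plain delegate seam, sketched below. This is a generic technique chosen for illustration, not the mocking or instrumentation mechanism the talk's tools use; `ConfigReader`, `IsDebugEnabled`, and the `debug=true` convention are hypothetical names invented for the example.

```csharp
using System;
using System.IO;

// Hypothetical class under test that depends on the file system.
public class ConfigReader
{
    private readonly Func<string, string> readAllText;

    // Production code passes nothing and gets File.ReadAllText;
    // a test (or a test-generation tool) passes a replacement.
    public ConfigReader(Func<string, string> readAllText = null)
    {
        if (readAllText == null) readAllText = File.ReadAllText;
        this.readAllText = readAllText;
    }

    public bool IsDebugEnabled(string path)
    {
        string text = readAllText(path);
        return text.Contains("debug=true");
    }
}
```

With the seam in place, both outcomes of the conditional are reachable without touching the disk, for example `new ConfigReader(p => "debug=true").IsDebugEnabled("any.cfg")` for the true side and `new ConfigReader(p => "").IsDebugEnabled("any.cfg")` for the false side.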

• Human-Assisted Computing
  ◦ Driver: tool
  ◦ Helper: human
  ◦ Ex. Covana [ICSE 2011]
• Human-Centric Computing
  ◦ Driver: human
  ◦ Helper: tool
  ◦ Ex. Pex for Fun [ICSE 2013 SEE]

Interfaces are important. Contents are important too!

Symptoms                                  (Likely) Causes
external-method call problems (EMCP)      all executed external-method calls
object-creation problems (OCP)            all non-primitive program inputs/fields

• Causal analysis: tracing between symptoms and (likely) causes
  ◦ Reduce cost of human consumption
    ▪ reduction of #(likely) causes
    ▪ diagnosis of each cause
• Solution construction: fixing suspected causes
  ◦ Reduce cost of human contribution
    ▪ measurement of solution goodness
    ▪ Inner iteration of human-tool cooperation!

Symptoms: external-method call problems (EMCP), object-creation problems (OCP)
(Likely) Causes:

Given symptom s
foreach (c in LikelyCauses) {
  Fix(c);
  if (!IsObserved(s))
    RelevantCauses.add(c)
}
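
The loop can be made concrete as below, under one stated reading: a likely cause counts as relevant when fixing it alone makes the symptom disappear. `CausalAnalysis`, `RelevantCauses`, and the `isSymptomObserved` callback are hypothetical names; in practice "fix and re-observe" would mean re-running the test-generation tool, which the callback only stands in for.

```csharp
using System;
using System.Collections.Generic;

public static class CausalAnalysis
{
    // likelyCauses:      candidate causes, e.g. names of external-method calls.
    // isSymptomObserved: re-checks the symptom given the set of fixed causes
    //                    (an assumed stand-in for re-running the tool).
    public static List<string> RelevantCauses(
        IEnumerable<string> likelyCauses,
        Func<ISet<string>, bool> isSymptomObserved)
    {
        var relevant = new List<string>();
        foreach (string c in likelyCauses)
        {
            // Fix(c): try each cause on its own.
            var fixedCauses = new HashSet<string> { c };

            // The cause is relevant if the symptom is gone after the fix.
            if (!isSymptomObserved(fixedCauses))
                relevant.Add(c);
        }
        return relevant;
    }
}
```

Trying every candidate this way costs one fix and one re-run per cause, which is why reducing the number of likely causes before involving the human matters.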

• Goal: Precisely identify the problems (causes) that prevent a tool from covering a statement (symptom)
• Insight: Partially-covered conditional has data dependency on a real problem
[ICSE 11] (code example from xUnit)

Data Dependencies
• Consider only EMCPs whose arguments have data dependencies on program inputs
  ▪ Fixing such problem candidates facilitates test-generation tools
(code example from xUnit)

• Partially-covered conditionals have data dependencies on EMCP candidates
  ◦ Symptom Expression: return(File.Exists) == true
  ◦ Element of EMCP Candidate: return(File.Exists)
  ◦ Conditional in Line 1 has data dependency on File.Exists
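
The idea that a partially-covered conditional points back to the external call it depends on can be illustrated with manual tagging. This is a toy sketch, not Covana's analysis: real data-dependence tracking happens during symbolic execution, whereas here each condition value is tagged by hand with its source. `Tracked`, `BranchMonitor`, and the branch identifiers are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// A boolean condition tagged with where its value came from.
public sealed class Tracked
{
    public bool Value { get; }
    public string Source { get; }   // e.g. "return(File.Exists)"
    public Tracked(bool value, string source) { Value = value; Source = source; }
}

public sealed class BranchMonitor
{
    private readonly Dictionary<string, HashSet<bool>> outcomes =
        new Dictionary<string, HashSet<bool>>();
    private readonly Dictionary<string, string> sources =
        new Dictionary<string, string>();

    // Call at each conditional: records the outcome and what it depends on.
    public bool Observe(string branchId, Tracked condition)
    {
        if (!outcomes.TryGetValue(branchId, out var seen))
        {
            seen = new HashSet<bool>();
            outcomes[branchId] = seen;
        }
        seen.Add(condition.Value);
        sources[branchId] = condition.Source;
        return condition.Value;
    }

    // Branches where only one outcome was ever seen, mapped to the value
    // they depend on: these dependencies are the reported problem candidates.
    public Dictionary<string, string> PartiallyCovered()
    {
        var result = new Dictionary<string, string>();
        foreach (var entry in outcomes)
        {
            if (entry.Value.Count == 1)
                result[entry.Key] = sources[entry.Key];
        }
        return result;
    }
}
```

If every generated test sees `File.Exists` return false, the conditional on line 1 stays partially covered and the monitor reports `return(File.Exists)` as the dependency, while fully covered branches on ordinary inputs are pruned.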

[Figure: Covana overview. Labels: Program, Generated Test Inputs, Forward Symbolic Execution, Runtime Events, Runtime Information, Coverage, Problem Candidate Identification, Problem Candidates, Data Dependence Analysis ([Inputs → EMCP], [EMCP → Symptom]), Identified Problems]

• Subjects:
  ◦ xUnit: unit testing framework for .NET
    ▪ 223 classes and interfaces with 11.4 KLOC
  ◦ QuickGraph: C# graph library
    ▪ 165 classes and interfaces with 8.3 KLOC
• Evaluation setup:
  ◦ Apply Pex to generate tests for program under test
  ◦ Feed the program and generated tests to Covana
  ◦ Compare baseline solution and Covana

• RQ1: How effective is Covana in identifying the two main types of problems, EMCPs and OCPs?
• RQ2: How effective is Covana in pruning irrelevant problem candidates of EMCPs and OCPs?

Covana identifies
• 43 EMCPs with only 1 false positive and 2 false negatives
• 155 OCPs with 20 false positives and 30 false negatives

Covana prunes
• 97% (1567 in 1610) EMCP candidates with 1 false positive and 2 false negatives
• 66% (296 in 451) OCP candidates with 20 false positives and 30 false negatives
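
The pruning percentages and the identified counts on these two slides are consistent with each other, which a few lines of arithmetic confirm. The helper names below are made up for the check; the numbers are the ones on the slides.

```csharp
using System;

public static class PruningStats
{
    // Share of candidates that were pruned, in percent.
    public static double PrunedPercent(int pruned, int candidates)
    {
        return 100.0 * pruned / candidates;
    }

    // Candidates left after pruning, i.e. the problems reported to the user.
    public static int Remaining(int pruned, int candidates)
    {
        return candidates - pruned;
    }
}
```

For EMCPs, `PrunedPercent(1567, 1610)` is about 97.3 and `Remaining(1567, 1610)` is 43; for OCPs, `PrunedPercent(296, 451)` is about 65.6 and `Remaining(296, 451)` is 155. The remainders match the 43 EMCPs and 155 OCPs reported as identified.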

• Motivation
  ◦ Tools are often not powerful enough
  ◦ Human is good at some aspects that tools are not
• Iterations to form feedback loop
  ◦ What difficulties does the tool face?
  ◦ How to communicate info to the user to get help?
  ◦ How does the user help the tool based on the info?

• Human-Assisted Computing
  ◦ Driver: tool
  ◦ Helper: human
  ◦ Ex. Covana [ICSE 2011]
• Human-Centric Computing
  ◦ Driver: human
  ◦ Helper: tool
  ◦ Ex. Pex for Fun [ICSE 2013 SEE]

Interfaces are important. Contents are important too!

1,270,159 clicked 'Ask Pex!'

Nikolai Tillmann, Jonathan De Halleux, Tao Xie, Sumit Gulwani and Judith Bishop. Teaching and Learning Programming and Software Engineering via Interactive Gaming. In Proc. ICSE 2013 SEE.

Secret Implementation

class Secret {
  public static int Puzzle(int x) {
    if (x <= 0) return 1;
    return x * Puzzle(x-1);
  }
}

Player Implementation

class Player {
  public static int Puzzle(int x) {
    return x;
  }
}

class Test {
  public static void Driver(int x) {
    if (Secret.Puzzle(x) != Player.Puzzle(x))
      throw new Exception("Mismatch");
  }
}

behavior: Secret Impl == Player Impl
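
The duel can be run by hand with the sketch below. The `Secret` and `Player` bodies are the ones on the slide; `Duel.FindMismatch` is a hypothetical brute-force search over a small range that stands in for the test generator, which in the real game finds distinguishing inputs by solving constraints rather than enumerating.

```csharp
using System;

public class Secret
{
    public static int Puzzle(int x)
    {
        if (x <= 0) return 1;
        return x * Puzzle(x - 1);
    }
}

public class Player
{
    // The player's current guess at the secret behavior.
    public static int Puzzle(int x)
    {
        return x;
    }
}

public static class Duel
{
    // Brute-force stand-in for the tool: returns the first input in
    // [from, to] on which the two implementations disagree, or null.
    // Keep the range small; Secret.Puzzle grows factorially.
    public static int? FindMismatch(int from, int to)
    {
        for (int x = from; x <= to; x++)
        {
            if (Secret.Puzzle(x) != Player.Puzzle(x))
                return x;
        }
        return null;
    }
}
```

The player's guess `return x` agrees with the secret on 1 and 2, so a search over 1 to 10 reports 3 (secret 6, player 3), and a search starting at 0 reports 0 (secret 1, player 0). Each reported input is the feedback the player uses to refine the next guess.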

• Coding duels at
• Brain exercising/learning while having fun
  ◦ Fun: iterative, adaptive/personalized, w/ win criterion
  ◦ Abstraction/generalization, debugging, problem solving

Observed Benefits
• Automatic Grading
• Real-time Feedback (for Both Students and Teachers)
• Fun Learning Experiences

“It really got me *excited*. The part that got me most is about spreading interest in teaching CS: I do think that it’s REALLY great for teaching | learning!”

“I used to love the first person shooters and the satisfaction of blowing away a whole team of Noobies playing Rainbow Six, but this is far more fun.”

“I’m afraid I’ll have to constrain myself to spend just an hour or so a day on this really exciting stuff, as I’m really stuffed with work.”

Internet

class Secret {
  public static int Puzzle(int x) {
    if (x <= 0) return 1;
    return x * Puzzle(x-1);
  }
}

• Everyone can contribute
  ◦ Coding duels
  ◦ Duel solutions

Internet: Puzzle Games Made from Difficult Constraints or Object-Creation Problems
Supported by MSR SEIF Award

Ning Chen and Sunghun Kim. Puzzle-based Automatic Testing: bringing humans into the loop by solving puzzles. In Proc. ASE 2012

StackMine [Han et al. ICSE 12]
[Figure: StackMine workflow. Labels: Internet, Trace collection, Trace Storage, Trace analysis, Pattern Matching, Problematic Pattern Repository, Bug filing, Bug update, Bug Database]

Shi Han, Yingnong Dang, Song Ge, Dongmei Zhang, and Tao Xie. Performance Debugging in the Large via Mining Millions of Stack Traces. In Proc. ICSE 2012

“We believe that the MSRA tool is highly valuable and much more efficient for mass trace (100+ traces) analysis. For 1000 traces, we believe the tool saves us 4-6 weeks of time to create new signatures, which is quite a significant productivity boost.” - from Development Manager in Windows

• Highly effective new issue discovery on Windows mini-hang
• Continuous impact on future Windows versions

• Don’t forget human factors
  ◦ Using your tools as end-to-end solutions
  ◦ Helping your tools
• Don’t forget cooperation of human and tool intelligence; human and human intelligence
  ◦ Humans can help your tools too
  ◦ Humans could work together to help your tools, e.g., crowdsourcing

• Human-Assisted Computing
• Human-Centric Computing
• Human-Human Cooperation

• Wonderful current/former
• Collaborators, especially those from Microsoft Research Redmond/Asia, Peking University
• Colleagues who gave feedback and inspired me

NSF grants CCF , CCF , CNS , ARO grant W911NF , an NSA Science of Security Lablet grant, a NIST grant, a 2011 Microsoft Research SEIF Award

Questions?