Automated Developer Testing: Achievements and Challenges Tao Xie North Carolina State University contact:

Presentation transcript:

Automation in Developer Testing
Background on developer testing
– Kent Beck’s 2004 talk on “Future of Developer Testing”
This talk focuses on developer testing
– Not system testing etc. conducted by testers
“Unit test automation” commonly refers to writing unit test cases manually and executing them automatically
Automation here is broader, including automatic test generation

Software Testing Setup
Program + Test inputs → Outputs; Outputs =? Expected Outputs (Test Oracles)

Software Testing Problems
– Faster: How can tools help developers create and run tests faster?
– Better Test Inputs: How can tools help generate new, better test inputs?
– Better Test Oracles: How can tools help generate better test oracles?

Example Unit Test Case
Test Case = Test Input + Test Oracle
void addTest() {
    ArrayList a = new ArrayList(1);
    Object o = new Object();
    a.add(o);
    assertTrue(a.get(0) == o);
}
– Appropriate method sequence
– Appropriate primitive argument values
– Appropriate assertions

Levels of Test Oracles
Expected output for an individual test input
– In the form of assertions in test code
Properties applicable for multiple test inputs
– Crash (uncaught exceptions) or not, related to robustness issues; supported by most tools
– Properties in production code: Design by Contract (precondition, postcondition, class invariants), supported by Parasoft Jtest, Google CodePro AnalytiX
– Properties in test code: Parameterized unit tests, supported by MSR Pex, AgitarOne
X. Xiao, S. Thummalapenta, and T. Xie. Advances on Improving Automation in Developer Testing. In Advances in Computers. http://people.engr.ncsu.edu/txie/publications.htm#ac12-devtest

Economics of Test Oracles
Expected output for an individual test input
– Easy to manually verify for one test input
– Expensive/infeasible to verify for many test inputs
– Limited benefits: only for one test input
Properties applicable for multiple test inputs
– Not easy to write (need abstraction skills)
– But once written, broad benefits for multiple test inputs

Assert behavior of multiple test inputs: Design by Contract
Example tools: Parasoft Jtest, Google CodePro AnalytiX, MSR Code Contracts, MSR Pex
– Class invariant: properties satisfied by an object whenever it is in a consistent state [AgitarOne allows a class invariant helper method to be used as a test oracle]
– Precondition: conditions to be satisfied (on receiver object and arguments) before a method can be invoked
– Postcondition: properties satisfied (on receiver object and return value) after the method has returned
Other types of specs also exist

Microsoft Research Code Contracts
Features
– Language expression syntax
– Type checking / IDE
– Declarative
– Special Encodings: Result and Old
[ContractInvariantMethod]
void ObjectInvariant() {
    Contract.Invariant( items != null );
}
public virtual int Add(object value) {
    Contract.Requires( value != null );
    Contract.Ensures( Count == Contract.OldValue(Count) + 1 );
    Contract.Ensures( Contract.Result<int>() == Contract.OldValue(Count) );
    if (count == items.Length) EnsureCapacity(count + 1);
    items[count] = value;
    return count++;
}
- Slide adapted from MSR RiSE
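The invariant, precondition, and postconditions in the Code Contracts example can be approximated in plain Java with explicit runtime checks. The following is a minimal sketch under stated assumptions: `CheckedList`, `checkInvariant`, and `size` are hypothetical names chosen for illustration, and the checks are hand-written rather than provided by Code Contracts or any Java contract library.

```java
import java.util.Arrays;

// Hypothetical class for illustration; plain runtime checks stand in for a contract library.
final class CheckedList {
    private Object[] items = new Object[2];
    private int count = 0;

    // Class invariant: must hold whenever the object is in a consistent state.
    private void checkInvariant() {
        if (items == null || count < 0 || count > items.length) {
            throw new IllegalStateException("invariant violated");
        }
    }

    int size() {
        return count;
    }

    int add(Object value) {
        // Precondition on the argument.
        if (value == null) {
            throw new IllegalArgumentException("precondition: value != null");
        }
        int oldCount = count; // plays the role of OldValue(Count)
        if (count == items.length) {
            items = Arrays.copyOf(items, count * 2);
        }
        items[count] = value;
        int result = count++;
        // Postconditions on the receiver state and on the return value.
        if (count != oldCount + 1) {
            throw new IllegalStateException("postcondition: size grows by one");
        }
        if (result != oldCount) {
            throw new IllegalStateException("postcondition: returns the old size");
        }
        checkInvariant();
        return result;
    }
}
```

Because the checks apply to every call of `add`, any test input that reaches the method is checked against the same properties, which is the sense in which a contract asserts behavior for multiple test inputs.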

Parameterized Unit Testing
void TestAdd(List<int> list, int item) {
    Assume.IsTrue(list != null);
    var count = list.Count;
    list.Add(item);
    Assert.AreEqual(count + 1, list.Count);
}
Parameterized Unit Test = Unit Test with Parameters
Separation of concerns
– Data is generated by a tool
– Developer can focus on functional specification
[Tillmann&Schulte ESEC/FSE 05]
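The separation of concerns described for parameterized unit tests can be sketched in Java. This is an illustration under stated assumptions, not how Pex or AgitarOne work: `AddPut`, `testAdd`, and `runAll` are hypothetical names, and a small hand-picked set of inputs stands in for the tool that would normally generate the data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AddPut {
    // Parameterized unit test: the property is stated once for any list and item.
    // Returns false when the assumption filters the input out.
    static boolean testAdd(List<Integer> list, int item) {
        if (list == null) {
            return false; // assumption: list != null
        }
        int count = list.size();
        list.add(item);
        if (list.size() != count + 1) {
            throw new AssertionError("size should grow by one");
        }
        return true;
    }

    // Stand-in for a test generator: hand-picked inputs instead of tool-generated ones.
    // Returns how many inputs passed the assumption and were checked.
    static int runAll() {
        List<List<Integer>> lists = new ArrayList<List<Integer>>();
        lists.add(null);
        lists.add(new ArrayList<Integer>());
        lists.add(new ArrayList<Integer>(Arrays.asList(1, 2, 3)));
        int checked = 0;
        for (List<Integer> list : lists) {
            for (int item : new int[] {Integer.MIN_VALUE, 0, 7}) {
                if (testAdd(list, item)) {
                    checked++;
                }
            }
        }
        return checked;
    }
}
```

The developer writes only `testAdd`; everything inside `runAll` is the part a generation tool would take over.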

Parameterized Unit Tests are Formal Specifications
Algebraic Specifications
A Parameterized Unit Test can be read as a universally quantified, conditional axiom.
void TestReadWrite(Res r, string name, string data) {
    Assume.IsTrue(r != null && name != null && data != null);
    r.WriteResource(name, data);
    Assert.AreEqual(r.ReadResource(name), data);
}
∀ string name, string data, Res r:
    r ≠ null ⋀ name ≠ null ⋀ data ≠ null ⇒
    equals( ReadResource(WriteResource(r, name, data).state, name), data)

Parameterized Unit Tests in Pex

Parameterized Unit Testing Getting Popular
Parameterized Unit Tests (PUTs) commonly supported by various test frameworks
– .NET: Supported by .NET test frameworks
– Java: Supported by JUnit 4.X
Generating test inputs for PUTs supported by tools
– .NET: Supported by Microsoft Research Pex
– Java: Supported by Agitar AgitarOne

Parameterized Test-Driven Development
– Write/refine Contract as PUT
– Write/refine Code of Implementation
– Run Pex
– Failures (Bug in PUT or Bug in Code): Fix-it (with Pex), Debug with generated tests
– No failures: Use Generated Tests for Regression

Assert behavior of multiple test inputs: Software Agitation in AgitarOne
Steps in the cycle: Code, Compile, Review, Agitate
– Software Agitation takes Code and produces Observations on code behavior, plus Test Coverage data
– If an Observation reveals a bug, fix it
– If it describes desired behavior, click to create a Test Assertion
- Slide adapted from Agitar Software Inc.

Software Agitation in AgitarOne (screenshot)

Automated Test Generation
– Recent advanced technique: Dynamic Symbolic Execution/Concolic Testing
– Instrument code to explore feasible paths
– Example tool: Pex from Microsoft Research (for .NET programs)
P. Godefroid, N. Klarlund, and K. Sen. DART: directed automated random testing. In Proc. PLDI 2005
K. Sen, D. Marinov, and G. Agha. CUTE: a concolic unit testing engine for C. In Proc. ESEC/FSE 2005
N. Tillmann and J. de Halleux. Pex - White Box Test Generation for .NET. In Proc. TAP 2008

Dynamic Symbolic Execution in Pex
void CoverMe(int[] a) {
    if (a == null) return;
    if (a.Length > 0)
        if (a[0] == 123…)
            throw new Exception("bug");
}
Loop: Choose next path, Solve, Execute&Monitor
Constraints to solve | Input | Observed constraints
(none) | null | a==null
a!=null | {} | a!=null && !(a.Length>0)
a!=null && a.Length>0 | {0} | a!=null && a.Length>0 && a[0]!=123…
a!=null && a.Length>0 && a[0]==123… | {123…} | a!=null && a.Length>0 && a[0]==123…
Done: There is no path left.
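The "Execute&Monitor" step of the loop above can be illustrated with a small Java sketch that records which branch outcomes a concrete input takes. It is not a dynamic symbolic execution engine: there is no constraint solver, so the "Solve" step is done by hand by picking the inputs from the Input column. `CoverMeTrace`, `observe`, and `MAGIC` are hypothetical names, `MAGIC` is an arbitrary stand-in because the constant on the slide is truncated, and the sketch records the bug branch instead of throwing.

```java
import java.util.ArrayList;
import java.util.List;

final class CoverMeTrace {
    // Hypothetical stand-in value; the constant on the slide is truncated.
    static final int MAGIC = 42;

    // Executes the method under test on a concrete input while recording
    // each branch outcome, i.e. the observed path constraints.
    static String observe(int[] a) {
        List<String> path = new ArrayList<String>();
        if (a == null) {
            path.add("a==null");
            return String.join(" && ", path);
        }
        path.add("a!=null");
        if (a.length > 0) {
            path.add("a.length>0");
            if (a[0] == MAGIC) {
                path.add("a[0]==MAGIC"); // the branch that throws in the original
            } else {
                path.add("a[0]!=MAGIC");
            }
        } else {
            path.add("!(a.length>0)");
        }
        return String.join(" && ", path);
    }
}
```

Calling `observe` on `null`, `new int[0]`, `new int[] {0}`, and `new int[] {CoverMeTrace.MAGIC}` yields four different paths, one per row of the table. In a real tool, each next input would come from negating the last observed condition and asking a solver for a satisfying assignment.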

Automating Test Generation
Method sequences
– MSeqGen/Seeker [Thummalapenta et al. OOPSLA 11, ESEC/FSE 09], Covana [Xiao et al. ICSE 2011], OCAT [Jaygarl et al. ISSTA 10], Evacon [Inkumsah et al. ASE 08], Symclat [d'Amorim et al. ASE 06]
Environments, e.g., db, file systems, network, …
– DBApp Testing [Taneja et al. ESEC/FSE 11], [Pan et al. ASE 11]
– CloudApp Testing [Zhang et al. IEEE Soft 12]
Loops
– Fitnex [Xie et al. DSN ASE

Pex on MSDN DevLabs: Incubation Project for Visual Studio
Download counts (20 months) (Feb Oct )
– Academic: 17,366
– Devlabs: 13,022
– Total: 30,388

Open Source Pex extensions Publications:

Writing Test Oracles → Learning Formal Methods!?
Parameterized Unit Test = Unit Test with Parameters
Separation of concerns
– Data is generated by a tool
– Developer can focus on functional specification
void TestAdd(List<int> list, int item) {
    Assume.IsTrue(list != null);
    var count = list.Count;
    list.Add(item);
    Assert.AreEqual(count + 1, list.Count);
}

Automatic Test Generation → Human Assistance to Test Generation?!
Running Symbolic PathFinder...
…
====================================================== results
no errors detected
====================================================== statistics
elapsed time: 0:00:02
states: new=4, visited=0, backtracked=4, end=2
search: maxDepth=3, constraints=0
choice generators: thread=1, data=2
heap: gc=3, new=271, free=22
instructions: 2875
max memory: 81MB
loaded code: classes=71, methods=884
…

Challenges Faced by Test Generation Tools
– Example: Dynamic Symbolic Execution/Concolic Testing
– Instrument code to explore feasible paths
– Challenge: path explosion
Total block coverage achieved is 50%, lowest coverage 16%.
– object-creation problems (OCP): 65%
– external-method call problems (EMCP): 27%

A graph example from QuickGraph library
– Includes two classes: Graph, DFSAlgorithm
– Graph methods: AddVertex; AddEdge (requires both vertices to be in graph)
00: class Graph : IVEListGraph { …
03:   public void AddVertex (IVertex v) {
04:     vertices.Add(v); // B1
      }
06:   public Edge AddEdge (IVertex v1, IVertex v2) {
07:     if (!vertices.Contains(v1))
08:       throw new VNotFoundException("");
09:     // B2
10:     if (!vertices.Contains(v2))
11:       throw new VNotFoundException("");
12:     // B3
14:     Edge e = new Edge(v1, v2);
15:     edges.Add(e);
      }
    }
    //DFS:DepthFirstSearch
18: class DFSAlgorithm { …
23:   public void Compute (IVertex s) { ...
24:     if (graph.GetEdges().Size() > 0) { // B4
25:       isComputed = true;
26:       foreach (Edge e in graph.GetEdges()) {
27:         ... // B5
28:       }
29:     }
      }
    }
[Thummalapenta et al. OOPSLA 11]

– Test target: Cover true branch (B4) of Line 24
– Desired object state: graph should include at least one edge
– Target sequence:
Graph ag = new Graph();
Vertex v1 = new Vertex(0);
Vertex v2 = new Vertex(1);
ag.AddVertex(v1);
ag.AddVertex(v2);
ag.AddEdge(v1, v2);
DFSAlgorithm algo = new DFSAlgorithm(ag);
algo.Compute(v1);
[Thummalapenta et al. OOPSLA 11]
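Why the target sequence matters can be shown with a simplified Java analogue of the graph example. The classes below are hypothetical stand-ins, not the QuickGraph API: vertices are plain integers, `SketchGraph`, `SketchDfs`, and `SequenceDemo` are names invented for this sketch, and `compute` keeps only the branch that corresponds to B4.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the Graph class on the slide.
final class SketchGraph {
    private final List<Integer> vertices = new ArrayList<Integer>();
    private final List<int[]> edges = new ArrayList<int[]>();

    void addVertex(int v) {
        vertices.add(v);
    }

    // Requires both vertices to be in the graph already.
    void addEdge(int v1, int v2) {
        if (!vertices.contains(v1)) {
            throw new IllegalArgumentException("v1 not in graph");
        }
        if (!vertices.contains(v2)) {
            throw new IllegalArgumentException("v2 not in graph");
        }
        edges.add(new int[] {v1, v2});
    }

    int edgeCount() {
        return edges.size();
    }
}

// Hypothetical, simplified stand-in for DFSAlgorithm.
final class SketchDfs {
    private final SketchGraph graph;
    boolean isComputed = false;

    SketchDfs(SketchGraph graph) {
        this.graph = graph;
    }

    void compute(int start) {
        if (graph.edgeCount() > 0) { // corresponds to branch B4
            isComputed = true;
        }
    }
}

final class SequenceDemo {
    // Returns whether the B4 branch was taken for the chosen method sequence.
    static boolean coversB4(boolean addEdge) {
        SketchGraph g = new SketchGraph();
        g.addVertex(0);
        g.addVertex(1);
        if (addEdge) {
            g.addEdge(0, 1);
        }
        SketchDfs algo = new SketchDfs(g);
        algo.compute(0);
        return algo.isComputed;
    }
}
```

Only the sequence that adds both vertices and then the edge reaches the branch; calling `addEdge` before the vertices exist throws, and skipping it leaves the branch uncovered. A generation tool therefore has to find both the right calls and the right order.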

Challenges Faced by Test Generation Tools
– Example: Dynamic Symbolic Execution/Concolic (Pex)
– Instrument code to explore feasible paths
– Challenge: path explosion
Total block coverage achieved is 50%, lowest coverage 16%.
– object-creation problems (OCP): 65%
– external-method call problems (EMCP): 27%

Example External-Method Call Problems (EMCP)
– Example 1: File.Exists has data dependencies on program input; a subsequent branch at Line 1 uses the return value of File.Exists.
– Example 2: Path.GetFullPath has data dependencies on program input; Path.GetFullPath throws exceptions.
– Example 3: String.Format does not cause any problem

Human Can Help! Object Creation Problems (OCP)
Tackle object-creation problems with Factory Methods

Human Can Help! External-Method Call Problems (EMCP)
Tackle external-method call problems with Mock Methods or Method Instrumentation
– Example: mocking System.IO.File.ReadAllText
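The idea of replacing an external method with a mock can be sketched in Java. This is an illustration only, not the Moles or Fakes mechanism: `TextReader`, `ConfigLoader`, and `MockTextReader` are hypothetical names, and the code under test receives its file-reading dependency through an interface so that a test can substitute canned content for real file system access.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical seam: the code under test reads text through this interface
// instead of calling the file system directly.
interface TextReader {
    String readAllText(String path);
}

// Hypothetical code under test; its branch depends on external file content.
final class ConfigLoader {
    private final TextReader reader;

    ConfigLoader(TextReader reader) {
        this.reader = reader;
    }

    boolean isDebugEnabled(String path) {
        String text = reader.readAllText(path);
        return text != null && text.contains("debug=true");
    }
}

// Mock: returns canned content, so a test (or a generation tool) fully
// controls what the "file" contains. Unknown paths yield null.
final class MockTextReader implements TextReader {
    private final Map<String, String> files = new HashMap<String, String>();

    MockTextReader with(String path, String content) {
        files.put(path, content);
        return this;
    }

    @Override
    public String readAllText(String path) {
        return files.get(path);
    }
}
```

With the mock in place, both outcomes of the branch in `isDebugEnabled` become reachable by choosing the returned string, without creating any files.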

State-of-the-Art/Practice Testing Tools
– Example output (Symbolic PathFinder): reports “no errors detected” plus run statistics (elapsed time, states, search, choice generators, heap, instructions, max memory, loaded code)
– Tools typically don’t communicate the challenges they face, which would enable cooperation between tools and users.
– We typically don’t teach people how to cooperate with tools.
X. Xiao, T. Xie, N. Tillmann, and J. de Halleux. Precise Identification of Problems for Structural Test Generation. In Proc. ICSE

Coding Duels 1,206,095 clicked 'Ask Pex!'

Coding Duels
– Pex computes “semantic diff” in cloud: code written in browser vs. secret reference implementation
– You win when Pex finds no differences

Behind the Scene of Pex for Fun
Secret Implementation
class Secret {
    public static int Puzzle(int x) {
        if (x <= 0) return 1;
        return x * Puzzle(x-1);
    }
}
Player Implementation
class Player {
    public static int Puzzle(int x) {
        return x;
    }
}
class Test {
    public static void Driver(int x) {
        if (Secret.Puzzle(x) != Player.Puzzle(x))
            throw new Exception("Mismatch");
    }
}
behavior: Secret Impl == Player Impl
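The driver construction above can be mirrored in Java to show what a "semantic diff" looks for. In this sketch a brute-force scan over a small input range stands in for Pex, which would instead explore paths of the driver symbolically. `DuelSecret`, `DuelPlayer`, `DuelDriver`, and `firstMismatch` are hypothetical names.

```java
// Secret reference implementation (same logic as on the slide).
final class DuelSecret {
    static int puzzle(int x) {
        if (x <= 0) {
            return 1;
        }
        return x * puzzle(x - 1);
    }
}

// Player's current attempt (same logic as on the slide).
final class DuelPlayer {
    static int puzzle(int x) {
        return x;
    }
}

final class DuelDriver {
    // Returns the first input in [lo, hi] on which the two implementations
    // differ, or null when they agree on the whole range.
    static Integer firstMismatch(int lo, int hi) {
        for (int x = lo; x <= hi; x++) {
            if (DuelSecret.puzzle(x) != DuelPlayer.puzzle(x)) {
                return x;
            }
        }
        return null;
    }
}
```

A returned value is the kind of counterexample shown to the player as a hint; the player wins once no mismatch can be found. Agreement on a scanned range is weaker evidence than the path-based exploration the slide describes, which is why this is only a stand-in.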

Coding Duels: Fun and Engaging
– Iterative gameplay
– Adaptive
– Personalized
– No cheating
– Clear winning criterion

Example User Feedback
“It really got me *excited*. The part that got me most is about spreading interest in teaching CS: I do think that it’s REALLY great for teaching | learning!”
“I used to love the first person shooters and the satisfaction of blowing away a whole team of Noobies playing Rainbow Six, but this is far more fun.”
“I’m afraid I’ll have to constrain myself to spend just an hour or so a day on this really exciting stuff, as I’m really stuffed with work.”
Released since 2010

Coding Duel

Teaching and Learning

Coding Duels for Automatic Software Engineering Course

Coding Duels for Training Testing
public static string Puzzle(int[] elems, int capacity, int elem) {
    if ((maxsize (capacity + 1))) return "Assumption Violation!";
    Stack s = new Stack(capacity);
    for (int i = 0; i < elems.Length; i++) s.Push(elems[i]);
    int origSize = s.GetNumOfElements();
    //Please fill in below test scenario on the s stack
    //The lines below include assertions to assert the program behavior
    PexAssert.IsTrue(s.GetNumOfElements() == origSize + 1);
    PexAssert.IsTrue(s.Top() == elem);
    PexAssert.IsTrue(!s.IsEmpty());
    PexAssert.IsTrue(s.IsMember(elem));
    return s.GetNumOfElements().ToString() + "; " + s.Top().ToString() + "; " + s.IsMember(elem).ToString() + "; " + s.IsEmpty();
}
Callouts: Set up a stack with some elements; Cache values used in assertions

Usage Scenarios of Pex4Fun
Massive Open Online Courses (MOOC): Challenges
– Grading, addressed by Pex4Fun
– Cheating [Open Challenge]
Course assignments (students/professionals)
– E.g., intro programming, software engineering
Student/professional competitions
– E.g., coding-duel competition at ICSE 2011
Assessment of testing/programming/problem solving skills for job applicants
– Not just final results of problem solving but also process!

More Reading
Nikolai Tillmann, Jonathan De Halleux, Tao Xie, Sumit Gulwani and Judith Bishop. Teaching and Learning Programming and Software Engineering via Interactive Gaming. In Proceedings of the 35th International Conference on Software Engineering (ICSE 2013), Software Engineering Education (SEE), San Francisco, CA, May se13see-pex4fun.pdf

Conclusion
Software testing is important and yet costly; needs automation
Better Test Inputs: help generate new, better test inputs
– Generate method arguments
– Generate method sequences
Better Test Oracles: help generate better test oracles
– Assert behavior of individual test inputs
– Assert behavior of multiple test inputs
Software Testing → Educational Gaming

Example Industrial Developer Testing Tools
– Agitar AgitarOne
– Parasoft Jtest
– Google CodePro AnalytiX dev-tools/codepro/doc/
– SilverMark Test Mentor
– Microsoft Research Pex (for .NET)
– Microsoft Research Spec Explorer (for .NET)

Trends in Practice
Regression Test Selection/Prioritization
Cloud Computing for Test Execution, e.g.,
Crowdsourcing for Testing, e.g.,
Mocking Environments
– Google: EasyMock
– Microsoft VS: Fakes/Moles
Automatic Test Generation
– Microsoft: Pex, SAGE us/um/people/pg/

Q & A Thank you! contact: Acknowledgments: NSF grants CCF , CCF , CNS , CNS , a Microsoft Research SEIF Award, and a Microsoft Research Award.

Automated Combinatorial Testing
Goals
– reduce testing cost, improve cost-benefit ratio
Accomplishments
– huge increase in performance, scalability, 200+ users, most major IT firms and others
Also non-testing applications
– modelling and simulation, genome

Failure-triggering Interactions Additional studies consistent > 4,000 failure reports analyzed Conclusion: failures triggered by few variables

NIST ACTS Tool
– Covering array generator
– Coverage analysis: what is the combinatorial coverage of existing test set?
– .NET configuration file generator
– Fault characterization: ongoing
Current users: approximately 200 users as of July 2009, in IT, defense, finance, telecom, and many other industries

Defining a New System

Variable Interaction Strength

Constraints

Covering Array Output