Mining Specifications Glenn Ammons, Dept. Computer Science University of Wisconsin Rastislav Bodik, Computer Science Division University of California,


Mining Specifications
Glenn Ammons, Dept. Computer Science, University of Wisconsin
Rastislav Bodik, Computer Science Division, University of California, Berkeley
James R. Larus, Microsoft Research
POPL 2002

Motivation
Formal verification is a promising alternative to software testing. But verifiers will be of little use without enough correctness specifications to verify.

The Assumption
Common behavior is (often) correct behavior. If we can identify common behavior, we can produce correct specifications, even from programs that contain errors.

A Program Using the socket API
    int s = socket(AF_INET, SOCK_STREAM, 0);
    …
    bind(s, &serv_addr, sizeof(serv_addr));
    …
    listen(s, 5);
    …
    while (1) {
        int ns = accept(s, &addr, &len);
        if (ns < 0) break;
        do {
            read(ns, buffer, 255);
            …
            write(ns, buffer, size);
            if (cond1) return;
        } while (cond2);
        close(ns);
    }
    close(s);

An Example Trace
    1 socket(domain = 2, type = 1, proto = 0, return = 7)
    2 bind(so = 7, addr = 0x400120, addr_len = 6, return = 0)
    3 listen(so = 7, backlog = 5, return = 0)
    4 accept(so = 7, addr = 0x400200, addr_len = 0x400240, return = 8)
    5 read(fd = 8, buf = 0x400320, len = 255, return = 12)
    6 write(fd = 8, buf = 0x400320, len = 12, return = 12)
    7 read(fd = 8, buf = 0x400320, len = 255, return = 7)
    8 write(fd = 8, buf = 0x400320, len = 7, return = 7)
    9 close(fd = 8, return = 0)
    10 accept(so = 7, addr = 0x400200, addr_len = 0x400240, return = 10)
    11 read(fd = 10, buf = 0x400320, len = 255, return = 13)
    12 write(fd = 10, buf = 0x400320, len = 13, return = 13)
    13 close(fd = 10, return = 0)
    14 close(fd = 7, return = 0)

Design Decisions
1. Learn from traces, not from source: traces contain fewer bugs.
2. Take a “vote” on what the common program behavior is: the high-probability core encodes the frequently followed protocol.

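Mining System
Pipeline (from the slide's diagram):
    Program -> Tracer -> Instrumented program
    Instrumented program + Test inputs -> Run -> Traces
    Traces -> Flow dependence annotator -> Annotated traces
    Annotated traces + Scenario seed -> Scenario extractor -> Abstract scenario strings
    Abstract scenario strings -> Automaton learner -> Specifications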

The (unsolvable) Problem
$I$: the set of all traces of interaction with an API or ADT.
$C \subseteq I$: the set of all correct traces of interaction.
$T$: an unlabelled training set of interaction traces.
Find an automaton $A$ that generates exactly the traces in $C$.

Restriction 1
$C$ must be a regular language.
– Model checkers require finite-state specifications.
– Algorithms for learning finite-state automata are relatively well developed.

Interaction Scenarios
Trace of LinkedList(n), which calls malloc and free once per object:
    malloc(return = O_1)
    malloc(return = O_2)
    …
    malloc(return = O_n)
    free(p = O_n)
    …
    free(p = O_2)
    free(p = O_1)
Split by object, the trace yields one scenario per object:
    O_1: malloc(return = O_1) free(p = O_1)
    O_2: malloc(return = O_2) free(p = O_2)
    …
    O_n: malloc(return = O_n) free(p = O_n)
With object names standardized, every scenario becomes the same string:
    O_1: malloc(return = O_std) free(p = O_std)
    O_2: malloc(return = O_std) free(p = O_std)
    …
    O_n: malloc(return = O_std) free(p = O_std)

The Problem – Take 2
$I_S$: the set of all interaction scenarios with an API or ADT that manipulate no more than $k$ data objects.
$C_S \subseteq I_S$: the regular set of all correct scenarios.
$T_S$: an unlabelled training set of interaction scenarios from $I_S$.
Find a finite-state automaton $A_S$ that generates exactly the scenarios in $C_S$.

Restriction 2 - Linking $T_S$ and $C_S$
Let $T_S = c_0, c_1, \ldots$ be an infinite sequence of elements from $C_S$ in which each element of $C_S$ occurs at least once.
For each $n > 0$, the learner maps the prefix $c_0, c_1, \ldots, c_n$ to an automaton $A_S^n$.
If, for some $N \ge 0$, $A_S^N$ generates exactly the scenarios in $C_S$ and $A_S^n = A_S^N$ for all $n \ge N$, then $A_S^0, A_S^1, \ldots$ identifies $C_S$ in the limit.

The Probabilistic Approach
$I_S$: as before.
$M$: a target probabilistic finite-state automaton (PFSA), and $P_M$ the distribution over $I_S$ that $M$ generates.
“Efficiently” find a PFSA $M'$ such that its distribution $P_{M'}$ is an $\varepsilon$-good approximation of $P_M$.
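To make the idea of "the distribution a PFSA generates" concrete, here is a minimal sketch that computes the probability a small deterministic PFSA assigns to one scenario string. The dictionary layout, the state names, and the toy malloc/free automaton are assumptions made for illustration; the slides do not specify a representation.

```python
def string_probability(pfsa, start, symbols):
    """Probability that a deterministic PFSA emits exactly `symbols`, then stops.

    Assumed layout: pfsa[state] = {"stop": p_stop, "edges": {symbol: (next_state, p)}}
    """
    prob, state = 1.0, start
    for sym in symbols:
        edge = pfsa[state]["edges"].get(sym)
        if edge is None:
            return 0.0  # the automaton cannot emit this symbol here
        state, p = edge
        prob *= p
    return prob * pfsa[state]["stop"]


# Hypothetical PFSA: most scenarios free what they malloc, a few never do.
toy = {
    "q0": {"stop": 0.0, "edges": {"malloc": ("q1", 1.0)}},
    "q1": {"stop": 0.25, "edges": {"free": ("q2", 0.75)}},
    "q2": {"stop": 1.0, "edges": {}},
}

p_good = string_probability(toy, "q0", ["malloc", "free"])
p_leak = string_probability(toy, "q0", ["malloc"])
p_bad = string_probability(toy, "q0", ["free"])
```

In this toy automaton the two strings that can occur have probabilities summing to one, and a string that starts with free has probability zero.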

Tracer
Tracing techniques:
1. C stdio replacement (requires recompilation)
2. Executable editing
Example trace:
    1 socket(domain = 2, type = 1, proto = 0, return = 7)
    2 bind(so = 7, addr = 0x400120, addr_len = 6, return = 0)
    3 listen(so = 7, backlog = 5, return = 0)
    4 accept(so = 7, addr = 0x400200, addr_len = 0x400240, return = 8)
Skeleton of a trace line: interaction(attribute_0, …, attribute_n)
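The skeleton interaction(attribute_0, …, attribute_n) can be read mechanically. The following sketch parses one trace line into a name and an attribute dictionary. It assumes the textual form shown in the example trace (attr = value pairs with integer values, and no leading event number); the function name parse_interaction is hypothetical and not part of the tracer described in the talk.

```python
import re

# Assumed textual form, taken from the example trace:
#   name(attr = value, ..., return = value)
_EVENT = re.compile(r"^\s*(\w+)\((.*)\)\s*$")


def parse_interaction(line):
    """Parse one trace line into (name, {attribute: integer value})."""
    match = _EVENT.match(line)
    if match is None:
        raise ValueError(f"not an interaction: {line!r}")
    name, body = match.groups()
    attrs = {}
    for field in body.split(","):
        if not field.strip():
            continue  # tolerate an empty attribute list
        key, _, value = field.partition("=")
        # Base 0 accepts both decimal (7) and hexadecimal (0x400120) literals.
        attrs[key.strip()] = int(value.strip(), 0)
    return name, attrs


event = parse_interaction("bind(so = 7, addr = 0x400120, addr_len = 6, return = 0)")
```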

Flow Dependence
Pipeline (from the slide's diagram):
    Traces -> Dependence analysis -> Untyped trace with dependencies -> Type inference -> Annotated traces

Dependence Analysis
Takes a (manually created) list of attributes that define or use objects. Creates a flow dependence between users and definers.
Definers: socket.return, bind.so, listen.so, accept.return, close.fd
Users: bind.so, listen.so, accept.so, read.fd, write.fd, close.fd
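One plausible reading of this step is sketched below: each user attribute is linked to the most recent definer of the same object value. The "most recent definer" rule, the function name flow_dependences, and the (name, attributes) event layout are assumptions; the slide only states that dependences connect users and definers.

```python
# Attribute lists copied from the slide.
DEFINERS = {"socket.return", "bind.so", "listen.so", "accept.return", "close.fd"}
USERS = {"bind.so", "listen.so", "accept.so", "read.fd", "write.fd", "close.fd"}


def flow_dependences(trace):
    """Return edges ((i, attr), (j, attr)): event j uses the object last defined by event i.

    trace is a list of (name, {attribute: value}) pairs.
    """
    last_def = {}  # object value -> (event index, qualified attribute)
    edges = []
    for j, (name, attrs) in enumerate(trace):
        pending = {}
        for attr, value in attrs.items():
            qualified = f"{name}.{attr}"
            if qualified in USERS and value in last_def:
                edges.append((last_def[value], (j, qualified)))
            if qualified in DEFINERS:
                pending[value] = (j, qualified)
        last_def.update(pending)  # definitions take effect after this event's uses
    return edges


trace = [
    ("socket", {"return": 7}),
    ("bind", {"so": 7, "return": 0}),
    ("accept", {"so": 7, "return": 8}),
    ("read", {"fd": 8, "return": 12}),
]
edges = flow_dependences(trace)
```

Restricting the analysis to listed attributes matters: a read that happens to return 7 bytes is not mistaken for a use of socket 7, because read.return appears in neither list.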

Type Inference
If there exists a flow dependency between two attributes, then typing gives these attributes the same type.
    Type(socket.return) = T0
    Type(bind.so) = T0
    Type(listen.so) = T0
    Type(accept.so) = T0
    Type(accept.return) = T0
    Type(read.fd) = T0
    Type(write.fd) = T0
    Type(close.fd) = T0
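The typing rule amounts to grouping attributes into equivalence classes, which a union-find structure handles directly. This is a sketch under that assumption; the talk does not say how the typing is implemented, and infer_types and the dependence pairs below are illustrative.

```python
def infer_types(dependences):
    """Give the same type name to attributes linked by a flow dependence.

    dependences is an iterable of (definer_attribute, user_attribute) pairs.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in dependences:
        parent[find(a)] = find(b)  # merge the two classes

    names, types = {}, {}
    for attr in sorted(parent):
        root = find(attr)
        names.setdefault(root, f"T{len(names)}")
        types[attr] = names[root]
    return types


# Hypothetical dependence pairs consistent with the socket example.
deps = [
    ("socket.return", "bind.so"),
    ("bind.so", "listen.so"),
    ("listen.so", "accept.so"),
    ("accept.return", "read.fd"),
    ("accept.return", "write.fd"),
    ("accept.return", "close.fd"),
    ("socket.return", "close.fd"),
]
types = infer_types(deps)
```

Because close.fd is used on both listening sockets and accepted connections, the two groups merge and all eight attributes receive the single type T0, as on the slide.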

Scenario Extraction
Pipeline (from the slide's diagram):
    Annotated traces + Scenario seeds -> Extraction -> Scenarios -> Simplification -> Simplified scenarios -> Standardization -> Abstract scenario strings

Extraction
A scenario is a set of interactions related by flow dependences.
    1 socket(domain = 2, type = 1, proto = 0, return = 7)
    2 bind(so = 7, addr = 0x400120, addr_len = 6, return = 0)
    3 listen(so = 7, backlog = 5, return = 0)
    4 accept(so = 7, addr = 0x400200, addr_len = 0x400240, return = 8)
    5 read(fd = 8, buf = 0x400320, len = 255, return = 12)
    6 write(fd = 8, buf = 0x400320, len = 12, return = 12)
    7 read(fd = 8, buf = 0x400320, len = 255, return = 7)
    8 write(fd = 8, buf = 0x400320, len = 7, return = 7)
    9 close(fd = 8, return = 0)
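A simple way to realize this definition is to collect the seed event together with everything it transitively depends on and everything that transitively depends on it. The sketch below does that over a dependence graph. The ancestors-plus-descendants rule, the function name, and the hand-written edge list (0-indexed, for the 14-event example trace) are assumptions, not the extractor from the talk.

```python
def extract_scenario(num_events, edges, seed):
    """Return sorted indices of the seed, its ancestors, and its descendants.

    edges is a list of (definer_index, user_index) flow dependences.
    """
    forward = {i: set() for i in range(num_events)}
    backward = {i: set() for i in range(num_events)}
    for src, dst in edges:
        forward[src].add(dst)
        backward[dst].add(src)

    def reach(start, graph):
        # Depth-first search collecting every node reachable from start.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    return sorted({seed} | reach(seed, backward) | reach(seed, forward))


# Assumed dependences: socket -> bind -> listen -> both accepts and the final close;
# each accept -> the reads, writes, and close on the connection it returned.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (3, 5), (3, 6), (3, 7), (3, 8),
         (2, 9), (9, 10), (9, 11), (9, 12), (2, 13)]
scenario = extract_scenario(14, edges, seed=3)  # seed: the first accept
```

With the first accept as seed, the second connection and the final close of the listening socket are left out, which matches the nine events listed on the slide.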

Simplification
Eliminate all interaction attributes that do not carry a flow dependence.
    1 socket(return = 7)
    2 bind(so = 7)
    3 listen(so = 7)
    4 accept(so = 7, return = 8) [seed]
    5 read(fd = 8)
    6 write(fd = 8)
    7 read(fd = 8)
    8 write(fd = 8)
    9 close(fd = 8)
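Simplification is a filter over attributes. A minimal sketch, assuming events are (name, attributes) pairs and that the set of dependence-carrying attributes is already known; the function name simplify is illustrative.

```python
def simplify(scenario, dependent_attrs):
    """Drop every attribute whose qualified name is not in dependent_attrs.

    scenario: list of (name, {attribute: value}).
    dependent_attrs: set of qualified names such as "accept.so".
    """
    return [
        (name, {a: v for a, v in attrs.items() if f"{name}.{a}" in dependent_attrs})
        for name, attrs in scenario
    ]


scenario = [
    ("accept", {"so": 7, "addr": 0x400200, "addr_len": 0x400240, "return": 8}),
    ("read", {"fd": 8, "buf": 0x400320, "len": 255, "return": 12}),
]
dependent = {"accept.so", "accept.return", "read.fd"}
simplified = simplify(scenario, dependent)
```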

Standardization
1. Naming: replaces attribute values with symbolic variables.
2. Reordering
    1 socket(return = x0:T0)
    2 bind(so = x0:T0)
    3 listen(so = x0:T0)
    4 accept(so = x0:T0, return = x1:T0) [seed]
    5 read(fd = x1:T0)
    7 read(fd = x1:T0)
    6 write(fd = x1:T0)
    8 write(fd = x1:T0)
    9 close(fd = x1:T0)
Labels from the slide's figure: (A) (B) (C) (D) (E) (F) (G)
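The naming step can be sketched as a renaming of concrete values in order of first appearance, so that scenarios differing only in descriptor numbers become identical. This covers naming only; reordering and type annotations are left out, and standardize_names is a hypothetical helper.

```python
def standardize_names(scenario):
    """Replace concrete values with x0, x1, ... in order of first appearance."""
    names = {}
    result = []
    for name, attrs in scenario:
        renamed = {}
        for attr, value in attrs.items():
            names.setdefault(value, f"x{len(names)}")
            renamed[attr] = names[value]
        result.append((name, renamed))
    return result


# Two simplified scenarios that differ only in the descriptor values used.
first = [("socket", {"return": 7}), ("bind", {"so": 7}),
         ("accept", {"so": 7, "return": 8}), ("read", {"fd": 8})]
second = [("socket", {"return": 3}), ("bind", {"so": 3}),
          ("accept", {"so": 3, "return": 10}), ("read", {"fd": 10})]

std_first = standardize_names(first)
std_second = standardize_names(second)
```

After renaming, both scenarios map to the same abstract string, which is what lets the learner count them as repetitions of one behavior.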

Automaton Learning
1. An off-the-shelf (OTS) learner learns a PFSA.
2. A corer removes infrequently traversed edges and converts the PFSA into an NFA.
(Figure: automaton with start and final states.)
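As a rough illustration of coring, the sketch below drops edges whose traversal count falls under a fixed threshold and discards the counts, leaving plain NFA transitions. A fixed count threshold is an assumption standing in for whatever criterion the actual corer uses, and the states, counts, and function names are invented for the example.

```python
def core(edge_counts, threshold):
    """Drop edges traversed fewer than `threshold` times; return NFA transitions.

    edge_counts: {(state, symbol, next_state): traversal count}
    Result: {(state, symbol): set of next states}, with counts discarded.
    """
    nfa = {}
    for (src, sym, dst), count in edge_counts.items():
        if count >= threshold:
            nfa.setdefault((src, sym), set()).add(dst)
    return nfa


def accepts(nfa, start, finals, symbols):
    """Standard NFA simulation over a set of current states."""
    current = {start}
    for sym in symbols:
        current = set().union(*(nfa.get((s, sym), set()) for s in current))
        if not current:
            return False
    return bool(current & finals)


# Hypothetical counts: 3 of 100 training scenarios skip bind (a bug).
counts = {
    (0, "socket", 1): 100,
    (1, "bind", 2): 97,
    (1, "listen", 3): 3,
    (2, "listen", 3): 97,
}
spec = core(counts, threshold=10)
ok = accepts(spec, 0, {3}, ["socket", "bind", "listen"])
buggy = accepts(spec, 0, {3}, ["socket", "listen"])
```

The rare bind-skipping path is present in the training data but is removed from the specification, which is the "vote" on common behavior in miniature.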

Specification Automaton for the Socket Protocol
Edge labels in the mined automaton (from the slide's figure):
    socket(return = x)
    bind(so = x)
    listen(so = x)
    accept(so = x, return = y)
    read(fd = y)
    write(fd = y)
    close(fd = x)
    close(fd = y)

Experimental Results
Analyzed traces from programs that use the Xlib and X Toolkit Intrinsics libraries for the X11 windowing system. Traces were generated manually. Compared the mined specification to the Inter-Client Communication Conventions Manual (ICCCM) rules.

Experimental Results
A small and buggy training set prevented the miner from discovering the rule. Solution: an expert chooses correct traces as the training set.

Benefits
Exploits the massive programmer effort that is reflected in the code (and nowhere else). Offers convenience and insights. It is easier to approve a mined formal specification than to write one.

Conclusion
Introduced a (semi-)automatic machine-learning approach for discovering formal specifications. Reduced the problem to learning regular languages. Initial experience is promising.