1 David Lo 1,2 Siau-Cheng Khoo 2 Chao Liu 3 1 Singapore Management University 2 National University of Singapore 3 Microsoft Research, Redmond Mining Past-Time.

Slides:



Advertisements
Similar presentations
1 Verification by Model Checking. 2 Part 1 : Motivation.
Advertisements

Design by Contract.
Software Analysis at Philips Healthcare MSc Project Matthijs Wessels 01/09/2009 – 01/05/2010.
A Survey of Runtime Verification Jonathan Amir 2004.
Introducing Formal Methods, Module 1, Version 1.1, Oct., Formal Specification and Analytical Verification L 5.
IPOG: A General Strategy for T-Way Software Testing
1 Perracotta: Mining Temporal API Rules from Imperfect Traces Jinlin Yang David Evans Deepali Bhardwaj Thirumalesh Bhat Manuvir Das.
ISBN Chapter 3 Describing Syntax and Semantics.
Automated creation of verification models for C-programs Yury Yusupov Saint-Petersburg State Polytechnic University The Second Spring Young Researchers.
1 Chapter 4 Dynamic Modeling and Analysis (Part I) Object-Oriented Technology From Diagram to Code with Visual Paradigm for UML Curtis H.K. Tsang, Clarence.
Formal Methods in Software Engineering Credit Hours: 3+0 By: Qaisar Javaid Assistant Professor Formal Methods in Software Engineering1.
Sequence Diagrams. Introduction A Sequence diagram depicts the sequence of actions that occur in a system. The invocation of methods in each object, and.
CS3773 Software Engineering Lecture 03 UML Use Cases.
1 Chapter 4 Dynamic Modeling and Analysis (Part I) Object-Oriented Technology From Diagram to Code with Visual Paradigm for UML Curtis H.K. Tsang, Clarence.
Chapter 12 ATM Case Study, Part 1: Object-Oriented Design with the UML
Integrating Bayesian Networks and Simpson’s Paradox in Data Mining Alex Freitas University of Kent Ken McGarry University of Sunderland.
Software Testing and Quality Assurance
Data Mining Association Analysis: Basic Concepts and Algorithms
1 ICS 214B: Transaction Processing and Distributed Data Management Lecture 15: Data Replication with Network Partitions & Transaction Processing Systems.
A Type System for Expressive Security Policies David Walker Cornell University.
David Lo Siau-Cheng Khoo Chao Liu DASFAA 2008 Efficient Mining of Recurrent Rules from a Sequence Database 1.
Hierarchical GUI Test Case Generation Using Automated Planning Atif M. Memon, Student Member, IEEE, Martha E. Pollack, and Mary Lou Soffa, Member, IEEE.
Describing Syntax and Semantics
© 2006 Pearson Addison-Wesley. All rights reserved2-1 Chapter 2 Principles of Programming & Software Engineering.
End-to-End Design of Embedded Real-Time Systems Kang G. Shin Real-Time Computing Laboratory EECS Department The University of Michigan Ann Arbor, MI
1 Introduction Introduction to database systems Database Management Systems (DBMS) Type of Databases Database Design Database Design Considerations.
Dr. Kalpakis CMSC 461, Database Management Systems Introduction.
Romaric GUILLERM Hamid DEMMOU LAAS-CNRS Nabil SADOU SUPELEC/IETR ESM'2009, October 26-28, 2009, Holiday Inn Leicester, Leicester, United Kingdom.
LÊ QU Ố C HUY ID: QLU OUTLINE  What is data mining ?  Major issues in data mining 2.
Detection and Resolution of Anomalies in Firewall Policy Rules
Java Programming, 2E Introductory Concepts and Techniques Chapter 1 An Introduction to Java and Program Design.
These slides are designed to accompany Software Engineering: A Practitioner’s Approach, 7/e (McGraw-Hill 2009). Slides copyright 2009 by Roger Pressman.1.
©Ian Sommerville 2000 Software Engineering, 6th edition. Chapter 6 Slide 1 Chapter 6 Requirements Engineering Process.
Mining Windows Kernel API Rules Jinlin Yang 09/28/2005CS696.
1 † Prog. Lang. & Sys. Lab Dept of Comp. Science National Uni. of Singapore Current: (Sch. of Info. Systems, Singapore Management Uni.) Efficient Mining.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining By Tan, Steinbach, Kumar Lecture.
Modul 7: Association Analysis. 2 Association Rule Mining  Given a set of transactions, find rules that will predict the occurrence of an item based on.
Benjamin Gamble. What is Time?  Can mean many different things to a computer Dynamic Equation Variable System State 2.
黃福銘 (Angus F.M. Huang) ANTS Lab, IIS, Academia Sinica TrajPattern: Mining Sequential Patterns from Imprecise Trajectories.
Software Engineering Research paper presentation Ali Ahmad Formal Approaches to Software Testing Hierarchal GUI Test Case Generation Using Automated Planning.
Yang Liu, Jun Sun and Jin Song Dong School of Computing National University of Singapore.
Software Architecture and Design Dr. Aldo Dagnino ABB, Inc. US Corporate Research Center October 23 rd, 2003.
Survey on Trace Analyzer (2) Hong, Shin /34Survey on Trace Analyzer (2) KAIST.
Faculty of Computer & Information
CEMiner – An Efficient Algorithm for Mining Closed Patterns from Time Interval-based Data Yi-Cheng Chen, Wen-Chih Peng and Suh-Yin Lee ICDM 2011.
1 Discovering Robust Knowledge from Databases that Change Chun-Nan HsuCraig A. Knoblock Arizona State UniversityUniversity of Southern California Journal.
1 Qualitative Reasoning of Distributed Object Design Nima Kaveh & Wolfgang Emmerich Software Systems Engineering Dept. Computer Science University College.
Formal Specification of Intrusion Signatures and Detection Rules By Jean-Philippe Pouzol and Mireille Ducassé 15 th IEEE Computer Security Foundations.
Jinlin Yang and David Evans [jinlin, Department of Computer Science University of Virginia PASTE 2004 June 7 th 2004
1 Test Selection for Result Inspection via Mining Predicate Rules Wujie Zheng
1 Efficient Mining of Iterative Patterns for Software Specification Discovery David Lo † Joint work with: Siau-Cheng Khoo † and Chao Liu ‡ † Prog. Lang.
1 Graph Coverage (6). Reading Assignment P. Ammann and J. Offutt “Introduction to Software Testing” ◦ Section
CS212: Object Oriented Analysis and Design Lecture 32: Use case and Class diagrams.
Week 3: Requirement Analysis & specification
1 Lecture 15: Chapter 19 Testing Object-Oriented Applications Slide Set to accompany Software Engineering: A Practitioner’s Approach, 7/e by Roger S. Pressman.
Software Quality Assurance and Testing Fazal Rehman Shamil.
UML - Development Process 1 Software Development Process Using UML.
Formal Methods in Software Engineering1 Today’s Agenda  Mailing list  Syllabus  Introduction.
1 CEN 4020 Software Engineering PPT4: Requirement analysis.
Lecture Outline Monday 23 rd February (Week 4) 3 – 3:10pm Review of Requirements Eng. and use cases 3:10 – 3:40pm Exercise on Use Case 3:40-4:40 Class.
PowerPoint Presentation for Dennis, Wixom, & Tegarden Systems Analysis and Design with UML, 5th Edition Copyright © 2015 John Wiley & Sons, Inc. All rights.
Requirement Elicitation Review – Class 8 Functional Requirements Nonfunctional Requirements Software Requirements document Requirements Validation and.
1 Visual Computing Institute | Prof. Dr. Torsten W. Kuhlen Virtual Reality & Immersive Visualization Till Petersen-Krauß | GUI Testing | GUI.
10/23/ /23/2017 Presented at KDD’09 Classification of Software Behaviors for Failure Detection: A Discriminative Pattern Mining Approach David Lo1,
Understanding Asynchronous Interactions In Full-Stack JavaScript
Association Rule Mining
Chapter 28 Formal Modeling and Verification
Test Case Test case Describes an input Description and an expected output Description. Test case ID Section 1: Before execution Section 2: After execution.
MAPO: Mining and Recommending API Usage Patterns
Presentation transcript:

1 David Lo 1,2 Siau-Cheng Khoo 2 Chao Liu 3 1 Singapore Management University 2 National University of Singapore 3 Microsoft Research, Redmond Mining Past-Time Temporal Rules From Execution Traces

2 Issue on Software Specifications oDocumented specifications are often lacking, poor, outdated and incomplete Hard deadlines & `short-time-to-market’ Productivity == LOC or completed project High turn-over rate of IT professionals Difficulties & programmer’s reluctance in writing formal specs (Ammons et al., POPL’02, Yang et al., ICSE’06)

3 The Specification Problem o Contributes to high software costs Program comprehension = 50% of maintenance cost High maintenance cost = 90% total cost (Erlikh, 2000; Cimitile & Canfora, 2001) US GDP software component = billion USD. o Causes challenges in ensuring correctness of systems Difficulty in verifying correctness of systems US National Institute of Standards and Technology 59.5 Billions annual lost due to bugs

4 Specification Mining (SM) A process to discover protocols that a code exhibit, often through an analysis of its execution traces (ABL02 [POPL]) Benefits: Aid Program Comprehension and Maintenance Aid Program Verification RR01 [ICSE], CW98 [TOSEM] ABL02 [POPL], AMBL03 [PLDI], WML02 [ISSTA], AXPX07 [FSE] MP05 [ICEECS], LK06 [FSE] Automaton-based SM 01 Rule-based SM -> YEBBD06 [ICSE] LKL08 [DASFAA,JSME] Only future-time temporal rules are mined

5 Past-Time Temporal Rules Whenever a series of events pre occurs, previously, another series of events post happened before, denoted as: pre -> P post Among most-widely used temporal logic expressions (Dwyer,ICSE’99) - Past-time temporal exp. -> complex future-time temporal exp. (Laroussinie et al., TCS’95, LICS’02) - Not minable by existing algorithms mining future time rules (Yang et al. ICSE’06, Lo et al. JSME’08] - Many interesting properties are more intuitively expressed in past-time - Many interesting properties are non-symmetric Why Important ?

6 Sample Past-Time Rules o Whenever a file is used (read or written), it needs to be opened before. file_used -> P file_open o Whenever SSL_read is performed, SSL_init needs to be invoked before. ssl_read -> P ssl_init o Whenever a valid client request a non-sharable resource and the resource is not granted, previously the resource had been allocated to another client that requested it. request, not_granted -> P request, grant o Whenever money is dispensed from an ATM, previously, card was inserted, pin was entered, user was authenticated and account balance suffices. dispense -> P card, pin, authenticate, balance_suffice

7 Outline oMotivation and Introduction oConcepts −Past-Time LTL, Statistical Significance −Soundness and Completeness oMining Past-Time Rules −Mining Strategy, Pruning Properties −Removal of Redundant Rules −Mining Framework oPreliminary Experiments oDiscussion oRelated Work oConclusion & Future Work

8 Concepts

9 Past-Time Linear Temporal Logics (PLTL) o Linear Temporal Logic (LTL) − Logic that works on program paths − A path corresponds to an execution trace o Past-Time Linear Temporal Logic (PLTL) − Add LTL with past time operators − More succinct than LTL o Temporal operators under consideration − `G’ – Globally − `F’ – Once in the future − `X’ – Next (immediate) − `F -1 ‘ – Once in the past − `X -1 ’ – Previous (immediate)

10 PLTL- Examples o X -1 F -1 (file_open) Meaning: At a time in the past file is opened o G(file_read -> X -1 F -1 (file_open)) Meaning: Globally whenever file is read, at a time in the past file is opened o G((account_deducted ^ XF (money_dispensed)) -> (X -1 F -1 (balance_suffice ^ (X -1 F -1 (cash_requested ^ (X -1 F 1 (correct_pin^(X -1 F -1 (insert_debit_card))))))))) Meaning: Globally whenever one’s bank account is deducted and money is dispensed (from an ATM), previously user inserted debit card, entered correct pin, requested for cash and account balance suffices.

11 Notations and Scope of Mined Rules o Denote a past-time rule as pre -> P post o Sample mappings btw. rule representations and PLTL expressions o Scope of minable temporal expressions

12 Statistical Significance Metrics o Distinguish Significant Rules via Statistical Notions - Support: The number of traces supporting the premise pre - Confidence: The likelihood of the premise pre being preceded by the consequent post Rule: -> P Support: 2 Corres. to S1 and S2 Confidence: 100% All occurences of is preceded by Rule: -> P Support: 2 Confidence: 50% Sample Traces

13 Soundness and Completeness o Ensure Soundness and Completeness - With respect to input traces and specified thresholds o Sound All mined rules are statistically significant o Complete All statistically significant rules are mined or represented o Commonly used in data/pattern mining

14 Mining Past Time Temporal Rules

15 High-Level Mining Strategy o Mining Option 1: Check for all 2-event rules (n x n of them) for statistical significance. − Not scalable for rules of arbitrary lengths. o Our Mining Strategy: Consider mining as a search-space traversal for significant rules − Explore the search space depth-first − Identify significant rules o Employ pruning strategies to throw away search space containing insignificant rules o Detect search spaces containing redundant rules early during the mining process

16 Anti-Monotone Pruning Strategies Rx: a -> z ; sup(Rx) < min_sup a,b -> z a,b,c -> z a,c -> z a,b,d -> z …. Non- significant Rx: a -> z ; conf(Rx) < min_conf a -> z,b a -> z,b,c a -> z,c a -> z,b,d …. Non- significant Ry s PP P P P P P P P P P P P P

17 Detecting Redundant Rules Redundant rules are identified and removed early during mining process. a -> b a -> c a -> b,c a -> b,d …. Redundant iff sup and conf are the same Rx: a -> b,c,d Ry s P P P P P P P

18 Algorithm Steps o Step 1: Generate a pruned set of significant pre- conditions satisfying the minimum support threshold. o Step 2: For each pre-condition, find occurrences of pre in the trace database. o Step 3: For each pre-condition, generate a pruned set of significant post-conditions satisfying the minimum confidence threshold. o Step 4: Remove remaining rules that are redundant. Note that many/most redundant rules have been removed at step 1 and 3.

19 Mining Framework PART 1 PART 2 PART 3 PART 4 ProcessUser Input Intermediate Result Inst. Code Start End Instrumentation Code Trace Generation Test Suite Thresholds Trace Abstraction Mining Algorithm Display & User Selection Abst. Traces Mined Rules Selected Rules Verification Model Legend

20 Preliminary Experiments

21 Experiment Setups – JBoss Application Server o JBoss Application Server (JBoss AS) − One of the most widely used J2EE application server − Analyze the transaction and security component o Program Instrumentation & Trace Generation − Instrument the application using JBoss-AOP − Run regression tests from JBoss AS distribution o Transaction component − 2551 events, 64 unique events − min_sup: 25, min_conf: 90% − Mining time: 30 seconds, Mined non-redundant rules: 36 o Security component − 4115 events, 60 unique events − min_sup: 15, min_conf: 90% − Mining time: 2.5 seconds, Mined non-redundant rules: 4

22 A Rule from JBoss Transaction PremiseConsequent TransactionImpl.isDone()TxManagerLocator.getInstance() TxManagerLocator.locate() TxManagerLocator.tryJNDI() TxManagerLocator.usePrivateAPI() TxManager.getInstance() TxManager.begin() XidFactory.newXid() XidFactory.getNextId() XidImpl.getTrulyGlobalId() TransImpl.assocCurrentThread() … 5 events … TxManager.getTransaction() Whenever a transaction is checked for completion (premise), previously transaction manager is located (ev 1-4 consequent), transaction manager & impl are initialized (ev 5-6,10-12), ids are acquired (ev 7-9,13-15) and transaction object is obtained from the manager (ev 16). P

23 A Rule from JBoss Security PremiseConsequent SimplePrincipal.toString() SecAssoc.getPrincipal() SecAssoc.getCredential() SecAssoc.getPrincipal() SecAssoc.getCredential() XLoginConfImpl.getConfEntry() PolicyConfig.get() XLoginConfImpl$1.run() AuthenInfo.copyAppConfEntry() AuthenInfo.getName() ClientLoginModule.initialize() ClientLoginModule.login() ClientLoginModule.commit() SecAssocActs.setPrincipalInfo() SetPrincipalInfoAction.run() SecAssocActs.pushSubjectContext() SubjectThreadLocalStack.push() Whenever principal and credential info is required (the premise), previously config. info is checked to determine the auth. service availability (ev 1-5), actual authentication events are invoked (ev 6-8) and principal info is bound to the subject (ev 9-12) P

24 Discussions o Setting min-sup/conf threshold − Appropriate values depend on application − Mining as an iterative process o Sound and Complete − With respect to trace and specified thresholds − If trace is not complete or buggy so does the results − Confidence provide a measure of tolerance to buggy traces o Scalability − Algorithm works better with many shorter traces than one very long trace − It’s better to split a trace to sub-traces − Focus on immediate inter-component interaction (Mariani et al., ICSE’08) − Trace abstraction (Ammons et al., POPL’02)

25 Related Work o Daikon − Complement Daikon by mining temporal constraints o Mining Automata − Many work: ABL02, RR01, MP05, AXPX07, LK06, … − Diff: Focus on statistically significant property rather than overall behavior o Mining Future-Time Temporal Rules − Many work: YEBBD06, LKL08, … − Diff: Mining past-time temporal rules o Mining Sequence Diagram: BLL06, LMK07, LM08,... o Mining from Code: RGJ07, WN05, … o Data Mining: S99, AS94, YHA03, WH04, LKL07, …

26 Conclusion o Propose a new approach to mine past-time temporal rules using dynamic analysis, not minable by existing tools: o Address the problem of runtime costs by employing smart pruning strategies. − Throw away insignificant rules en-masse − Throw away redundant rules en-masse o Preliminary experiments on traces of JBoss AS show utility of the technique to discover program behavioral rules/specifications Whenever a series of events pre occurs, previously, another series of events post happened before, denoted as: pre -> P post

27 Future Work o User Guided Mining − Let user provide more information to the mining process aside from the significance thresholds − Mining Scenario-Based Triggers and Effects (- ASE’08 – to-appear) – Mining Sequence Diagram o Mining more complex LTL expressions − Incorporating both future and past-time temporal rules o Improving the scalability of the technique − Abstraction technique − Pruning strategies o Experimenting with more case studies

28 Comments ? Questions ? Advices ? Thank you