Maikel Leemans, Wil M.P. van der Aalst
Process Mining in Software Systems
System under Study (SUS). Functional perspective. Focus: user requests.
Process Mining in Software Systems – Overview
Pipeline: System under Study (SUS) → Instrument Aspects → Instrumented SUS → Stream of Event Data → Event Log → Process Mining → Structure of SUS.
Business Transactions: traces in the event log correspond to user requests.
Steps: Instrumentation, Collect Data, Discover Business Transactions.
Outline: Related Work and Assumptions; Instrumentation; Collect Data; Discover Business Transactions; Evaluation.
Process Mining in Software Systems – Related Work and Assumptions
Related work – Current trends (majority of papers)
Literature survey: comparison of various reverse engineering techniques, investigating the current trend plus advantages and disadvantages.
Target Language and Assumptions: Java [1,4,5,6,8,10]
Information Retrieval: Instrumentation (e.g., AspectJ) [1,2,3,6,8]
Distributed and Event Correlation: No support [1,2,3,4,5,8,9]
Output Model and Granularity: UML Sequence Diagram [1,2,3,4,5,6,7,10]
Related work – Dynamic Analysis Techniques
[6] Control-flow sequence diagram. Output: UML Sequence Diagrams.
[7] Network sequence diagram. Output: UML Sequence Diagrams. Open questions: Performance? Right amount of detail?
[9] Performance (based on # calls). Output: performance statistics. Open questions: Context? Bottlenecks?
Assumptions – compared to related work
Target Language and Assumptions: most related work targets Java [1,4,5,6,8,10]; here, any instrumentable language.
Information Retrieval: most related work uses instrumentation (e.g., AspectJ) [1,2,3,6,8]; here, instrumentation (Joinpoint-Pointcuts).
Distributed and Event Correlation: most related work has no support [1,2,3,4,5,8,9]; here, supported (communication-based).
Output Model and Granularity: most related work produces UML Sequence Diagrams [1,2,3,4,5,6,7,10]; here, event logs (and process models).
Assumptions – other considerations
Assume a global clock (e.g., NTP).
Communication: across processes, across threads.
Focus on user requests.
Output: Event Log → Process Models (context, performance).
Process Mining in Software Systems – Instrumentation
Instrumenting the System under Study
The System under Study (SUS) is a distributed system: nodes A and B connected by a communication channel.
Instrument Aspects add tracing to the SUS, yielding the Instrumented SUS.
Aspects: The Joinpoint-Pointcut Model

    function H() {... }
    function K() {... }
    function G(int x) {... }
    function F(int x) { G(x); H(); K(); }

Joinpoint: a point in the code where instrumentation can be inserted (here, a function).
Pointcut: a pattern that selects joinpoints. The pointcut function *(int); selects G and F.
Aspect: a pointcut together with the code to insert at the selected joinpoints:

    Pointcut: function *(int);
    Insert Before: logEvent("start");
    Insert After: logEvent("complete");

Common pointcut patterns: specific interfaces and methods; (low-level) communication.
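The joinpoint-pointcut idea above can be illustrated with a minimal Python sketch. This is not the authors' tooling and not AspectJ syntax; the names `weave`, `advise`, `takes_single_int`, `log_event` and the class `SUS` are hypothetical, and the pointcut `function *(int)` is approximated by checking type annotations.

```python
import inspect
from functools import wraps

LOG = []  # stands in for the event stream; holds (joinpoint, lifecycle) pairs


def log_event(joinpoint, lifecycle):
    LOG.append((joinpoint, lifecycle))


def takes_single_int(func):
    """Pointcut analogous to `function *(int)`: one int parameter besides self."""
    params = list(inspect.signature(func).parameters.values())[1:]  # drop self
    return len(params) == 1 and params[0].annotation is int


def advise(func):
    """Aspect body: insert logging before and after the original call."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        log_event(func.__name__, "start")
        try:
            return func(*args, **kwargs)
        finally:
            log_event(func.__name__, "complete")
    return wrapper


def weave(cls, pointcut):
    """Replace every method of cls selected by the pointcut with its advised version."""
    for name, member in list(vars(cls).items()):
        if inspect.isfunction(member) and pointcut(member):
            setattr(cls, name, advise(member))


class SUS:
    """Toy system under study mirroring F, G, H, K from the slide."""
    def H(self): pass
    def K(self): pass
    def G(self, x: int): pass
    def F(self, x: int):
        self.G(x); self.H(); self.K()


weave(SUS, takes_single_int)
SUS().F(1)
```

After the call, `LOG` holds start and complete events for F and G only, because H and K do not match the pointcut.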
Process Mining in Software Systems – Collect Data
Collecting Data from Software Systems
Distributed system: nodes A and B, with tracing, connected by a communication channel.
Tracing sends an event stream to a logging server, which produces the Event Log.
Scenario triggering (real-life behavior): a user issues user requests to the system.
Generated Stream of Event Data
Example diagram: Node A, Node B; func. F, func. G, func. M, func. N.
Event Data:
Timestamp: ms (global clock)
Joinpoint: name of point in code
Lifecycle: start, complete (call, return)
Node: id (≈ location)
Node Instance: id (≈ execution thread)
Resource: communication data
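As a rough illustration of the event attributes listed above, the following sketch models one event as a record and merges two per-node streams by timestamp. The field names, the sample values and the identifiers such as `"A#1"` are assumptions made for this example, not the authors' format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    timestamp_ms: int                # global clock, in milliseconds
    joinpoint: str                   # name of the point in the code
    lifecycle: str                   # "start" or "complete"
    node: str                        # node id (location)
    node_instance: str               # node instance id (execution thread)
    resource: Optional[str] = None   # communication data, if any


# Hypothetical streams from two nodes.
stream_a = [
    Event(0, "F", "start", "A", "A#1"),
    Event(9, "F", "complete", "A", "A#1"),
]
stream_b = [
    Event(3, "M", "start", "B", "B#1", resource="R'"),
    Event(6, "M", "complete", "B", "B#1", resource="R'"),
]

# With a global clock, collecting the streams reduces to ordering by timestamp.
merged = sorted(stream_a + stream_b, key=lambda e: e.timestamp_ms)
```

Ordering by timestamp is only meaningful under the global-clock assumption stated earlier.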
Process Mining in Software Systems – Discover Business Transactions
Collection of events from multiple streams (Node A, Node B)
Create event intervals (start, end): pair each start event with its matching complete event.
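One way to build such intervals is to keep a stack of open calls per node instance and close the top entry when the matching complete event arrives. The sketch below assumes properly nested calls within one node instance; `Event`, `Interval` and `to_intervals` are hypothetical names.

```python
from collections import namedtuple

Event = namedtuple("Event", "timestamp joinpoint lifecycle node instance")
Interval = namedtuple("Interval", "joinpoint node instance start end")


def to_intervals(events):
    """Pair start/complete events into intervals, per (node, instance)."""
    open_calls = {}  # (node, instance) -> stack of pending start events
    intervals = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        stack = open_calls.setdefault((ev.node, ev.instance), [])
        if ev.lifecycle == "start":
            stack.append(ev)
        elif stack and stack[-1].joinpoint == ev.joinpoint:
            begin = stack.pop()
            intervals.append(
                Interval(ev.joinpoint, ev.node, ev.instance,
                         begin.timestamp, ev.timestamp))
    return intervals


events = [
    Event(0, "F", "start", "A", "t1"),
    Event(1, "G", "start", "A", "t1"),
    Event(3, "G", "complete", "A", "t1"),
    Event(5, "F", "complete", "A", "t1"),
]
intervals = to_intervals(events)
```

Unmatched complete events are silently dropped here; a real collector would probably want to report them.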
Cluster events (single node): on the same node, events are clustered by interval containment.
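Interval containment can be sketched as follows: an interval belongs under the tightest interval on the same node and node instance that fully encloses it. Restricting the check to the same node instance is an assumption of this sketch, and `contains` and `parent_of` are hypothetical helper names.

```python
from collections import namedtuple

Interval = namedtuple("Interval", "joinpoint node instance start end")


def contains(outer, inner):
    """True if outer encloses inner on the same node and node instance."""
    return (outer is not inner
            and (outer.node, outer.instance) == (inner.node, inner.instance)
            and outer.start <= inner.start
            and inner.end <= outer.end)


def parent_of(interval, intervals):
    """Return the tightest enclosing interval, or None for a top-level one."""
    candidates = [o for o in intervals if contains(o, interval)]
    return min(candidates, key=lambda o: o.end - o.start, default=None)


f = Interval("F", "A", "t1", 0, 10)
g = Interval("G", "A", "t1", 1, 3)
m = Interval("M", "B", "t1", 2, 8)   # other node: never contained in F
all_intervals = [f, g, m]
```

Two intervals with identical start and end would contain each other under this definition, so a tie-breaking rule would be needed in practice.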
Cluster events (across nodes): related resources (res. R, res. R’) indicate a communication channel. Related resources acquired at the same time (intersection) link events across nodes.
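A possible reading of this step, sketched below, treats two resources as related when they are the two endpoints of the same connection, and links them when their usage intervals intersect. The endpoint-based notion of "related" and all names (`ResourceUse`, `related`, `overlaps`, `communicating_pairs`) are assumptions of this sketch, not taken from the slides.

```python
from collections import namedtuple

# local/remote are hypothetical endpoint labels, e.g. "A:5000".
ResourceUse = namedtuple("ResourceUse", "node local remote start end")


def related(r1, r2):
    """Two uses are related if they are mirrored endpoints on different nodes."""
    return (r1.node != r2.node
            and r1.local == r2.remote
            and r1.remote == r2.local)


def overlaps(r1, r2):
    """True if the two usage intervals intersect."""
    return r1.start <= r2.end and r2.start <= r1.end


def communicating_pairs(uses):
    """All pairs of related resource uses that were held at the same time."""
    return [(a, b)
            for i, a in enumerate(uses)
            for b in uses[i + 1:]
            if related(a, b) and overlaps(a, b)]


r = ResourceUse("A", "A:5000", "B:80", 2, 6)
r_prime = ResourceUse("B", "B:80", "A:5000", 3, 7)
other = ResourceUse("B", "B:80", "C:1234", 3, 7)   # talks to a third party
pairs = communicating_pairs([r, r_prime, other])
```

The pairwise loop is quadratic; it is meant to show the matching rule, not to scale.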
Event traces: the clustered events form a single trace.
Business Transactions: a maximal trace corresponds to a user request.
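If the containment and communication links are viewed as edges between events, a maximal trace can be read as a connected component of that graph. The sketch below takes this interpretation, which is an assumption, and computes the components with a small union-find; `maximal_traces` is a hypothetical name.

```python
def maximal_traces(items, links):
    """Group items into connected components induced by the given links."""
    parent = {x: x for x in items}

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    groups = {}
    for x in items:
        groups.setdefault(find(x), []).append(x)
    return list(groups.values())


# F contains G (same node), G communicates with M (across nodes), M contains N.
items = ["F", "G", "M", "N", "X"]
links = [("F", "G"), ("G", "M"), ("M", "N")]
traces = maximal_traces(items, links)
```

Here F, G, M and N end up in one trace, while the unrelated event X forms a trace of its own.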
Concurrency – Multiple node instances
Resulting Event Log: Event Log → Process Mining → Structure of SUS.
Process Mining in Software Systems – Evaluation
Case study – Pet catalog
Components: user (browser), Pet Catalog webserver (Glassfish), database (MySQL).
Connections: webpage request, TCP/IP.
Case study – Approach
Analysis questions → Instrumentation decisions → Process Mining
Analysis questions:
1) High-level end-to-end process?
2) Main bottlenecks?
Case study – Specifying input pointcuts
Defined pointcuts targeting: network communication; database interface; webserver interface (i.e., servlets).
Pointcut predicates: HasInterface javax.persistence.EntityManager; Communication java.net.Socket; javax.servlet.*; javax.faces.*
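To give a feel for how wildcard predicates such as `javax.servlet.*` select instrumentation targets, the sketch below matches fully qualified class names against the patterns from the slide using shell-style wildcards. This is a simplification: it only matches names, whereas a predicate like HasInterface would select classes that implement the interface. `PATTERNS` and `is_targeted` are hypothetical names.

```python
from fnmatch import fnmatchcase

# Patterns taken from the slide; name matching only (an assumption of this sketch).
PATTERNS = [
    "javax.persistence.EntityManager",
    "java.net.Socket",
    "javax.servlet.*",
    "javax.faces.*",
]


def is_targeted(qualified_name, patterns=PATTERNS):
    """True if the class name matches at least one pointcut pattern."""
    return any(fnmatchcase(qualified_name, p) for p in patterns)


selected = [name for name in (
    "javax.servlet.http.HttpServlet",
    "java.net.Socket",
    "java.util.ArrayList",
) if is_targeted(name)]
```

Note that `*` in `fnmatchcase` also matches dots, so `javax.servlet.*` covers subpackages as well.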
Case study – Process Discovery
Main question: What sequence of operations is needed to complete a user request?
Event Log → Process Mining
Case study – Inductive visual miner (process tree)
Case study – Conversion between formal models: from process tree to Petri net
Case study – Performance analysis in model context
Align the event log and the Petri net; analyze throughput and synchronization time.
The “top-level” function represents total time.
Bottleneck: communication with the database (read).
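The bottleneck reasoning can be approximated without a model by averaging interval durations per joinpoint and ignoring the top-level function, since it represents total time. The sketch below does only that; it does not perform alignment against a Petri net, and the joinpoint names and durations are invented for illustration, not results from the case study.

```python
from collections import defaultdict
from statistics import mean


def mean_durations(intervals):
    """Average duration per joinpoint from (joinpoint, start, end) triples."""
    durations = defaultdict(list)
    for joinpoint, start, end in intervals:
        durations[joinpoint].append(end - start)
    return {jp: mean(ds) for jp, ds in durations.items()}


def bottleneck(intervals, top_level):
    """Joinpoint with the largest mean duration, excluding the top-level one."""
    stats = mean_durations(intervals)
    stats.pop(top_level, None)
    return max(stats, key=stats.get)


# Invented sample data: (joinpoint, start ms, end ms).
sample = [
    ("handleRequest", 0, 20),
    ("readDatabase", 2, 12),
    ("renderPage", 13, 18),
    ("handleRequest", 30, 52),
    ("readDatabase", 31, 43),
    ("renderPage", 44, 50),
]
stats = mean_durations(sample)
slowest = bottleneck(sample, top_level="handleRequest")
```

Unlike the model-based analysis, this ignores control-flow context, so it cannot report synchronization time.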
Case study – Conclusion
1) High-level end-to-end process? Answered by the discovered high-level process.
2) Main bottlenecks? Main bottleneck: communication with the database (read).
Case study – Hadoop MapReduce
Components: User, Client, Hadoop YARN Resource Manager, Node Manager, Container; communication via RPC.
Image source: ibm.com
Case study – Hadoop MapReduce
Align the event log and the Petri net; analyze throughput and synchronization time.
Visible in the model: the “Map” in “MapReduce” and the “Reduce” in “MapReduce”.
Process Mining in Software Systems
1) Instrumentation: System under Study (SUS) → Instrument Aspects → Instrumented SUS
2) Collect Data: Stream of Event Data
3) Discover Business Transactions: Event Log → Process Mining → Structure of SUS
Future work
Evaluation: investigate more complex, distributed systems (Hadoop); investigate the impact of instrumentation.
Process Discovery: leverage “nested” lifecycle information; make location data more explicit in models.