Han-na Yang Rediscovering Workflow Models from Event-Based Data using Little Thumb.

Slides:



Advertisements
Similar presentations
From Local Patterns to Global Models: Towards Domain Driven Educational Process Mining Nikola Trčka Mykola Pechenizkiy.
Advertisements

Scaling. Scaling seeks to discover how varying the physical parameters of the stimulus affects the psychological parameters. In general, scaling is concerned.
Chapter 11 – Virtual Memory Management
CIS 581 Design and Verification of Information Systems (DVIS) lectures 3-4 b Two problems with current WFMS b Five perspectives on WFMS b Reference nets.
Jorge Muñoz-Gama Josep Carmona
1 Analysis of workflows : Verification, validation, and performance analysis. Wil van der Aalst Eindhoven University of Technology Faculty of Technology.
Process Models In this section, we focus on the control-flow perspective of processes. We assume that there is a set of activity labels.
Han-na Yang Trace Clustering in Process Mining M. Song, C.W. Gunther, and W.M.P. van der Aalst.
Sequential Patterns & Process Mining Current State of Research Edgar de Graaf LIACS.
Data Mining Methodology 1. Why have a Methodology  Don’t want to learn things that aren’t true May not represent any underlying reality ○ Spurious correlation.
Chapter 4 Quality Assurance in Context
/faculteit technologie management 1 Process Mining: Organizational and Conformance Mining Algorithms Ana Karla Alves de Medeiros Ana Karla Alves de Medeiros.
Workflow Management Kap. 4. Analyzing Workflows Wil van der Aalst has copyrights to almost all figures in the following slideshow made by Lars Frank.
Goal and Scenario Validation: a Fluent Combination Chin-Yi Tsai.
/faculteit technologie management 1 Process Mining: Control-Flow Mining Algorithms Ana Karla Alves de Medeiros Ana Karla Alves de Medeiros Eindhoven University.
Statistical Issues in Research Planning and Evaluation
1 Perracotta: Mining Temporal API Rules from Imperfect Traces Jinlin Yang David Evans Deepali Bhardwaj Thirumalesh Bhat Manuvir Das.
A university for the world real R © 2009, Chapter 15 The Business Process Execution Language Chun Ouyang Marlon Dumas Petia Wohed.
Planning under Uncertainty
Joost Westra, Frank Dignum,Virginia Dignum Scalable Adaptive Serious Games using Agent Organizations.
Integrating Bayesian Networks and Simpson’s Paradox in Data Mining Alex Freitas University of Kent Ken McGarry University of Sunderland.
/faculteit technologie management Genetic Process Mining Ana Karla Medeiros Ton Weijters Wil van der Aalst Eindhoven University of Technology Department.
1 Workflow Management Systems : Functions, architecture, and products. Wil van der Aalst Eindhoven University of Technology Faculty of Technology Management.
1 Analysis of workflows a-priori and a-posteriori analysis Wil van der Aalst Eindhoven University of Technology Faculty of Technology Management Department.
1 Static vs dynamic SAGAs Ivan Lanese Computer Science Department University of Bologna/INRIA Italy.
Creating Architectural Descriptions. Outline Standardizing architectural descriptions: The IEEE has published, “Recommended Practice for Architectural.
/faculteit technologie management Process Mining and Security: Detecting Anomalous Process Executions and Checking Process Conformance Wil van der Aalst.
Discovering Coordination Patterns using Process Mining Prof.dr.ir. Wil van der Aalst Eindhoven University of Technology Department of Information and Technology.
/faculteit technologie management 1 Process Mining: Extension Mining Algorithms Ana Karla Alves de Medeiros Ana Karla Alves de Medeiros Eindhoven University.
Reliability, Validity, & Scaling
Data Mining CMPT 455/826 - Week 10, Day 2 Jan-Apr 2009 – w10d21.
Introduction to Statistical Inferences
Topic #10: Optimization EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.
S/W Project Management Software Process Models. Objectives To understand  Software process and process models, including the main characteristics of.
Model Transformations for Business Process Analysis and Execution Marlon Dumas University of Tartu.
Chapter 7 Structuring System Process Requirements
C ONFORMANCE C HECKING OF P ROCESSES B ASED ON M ONITORING R EAL B EHAVIOR Jason Ree 4/18/11 UNIST School of Technology Management.
Capacity analysis of complex materials handling systems.
Jorge Muñoz-Gama Universitat Politècnica de Catalunya (Barcelona, Spain) Algorithms for Process Conformance and Process Refinement.
Process Mining Control flow process discovery
Copyright © 2010, 2007, 2004 Pearson Education, Inc. Review and Preview This chapter combines the methods of descriptive statistics presented in.
مهندسی مجدد فرآیندهای تجاری
Chapter 7 Probability and Samples: The Distribution of Sample Means.
Inga ZILINSKIENE a, and Saulius PREIDYS a a Institute of Mathematics and Informatics, Vilnius University.
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha.
Privacy-preserving rule mining. Outline  A brief introduction to association rule mining  Privacy preserving rule mining Single party  Perturbation.
Petri nets refresher Prof.dr.ir. Wil van der Aalst
Process-oriented System Analysis Process Mining. BPM Lifecycle.
Chapter 10 Verification and Validation of Simulation Models
Chapter 7 The Practices: dX. 2 Outline Iterative Development Iterative Development Planning Planning Organizing the Iterations into Management Phases.
/faculteit technologie management PN-1 مهندسی مجدد فرآیندهای تجاری بخش دوم: مدلسازی فرآیندها به کمک Petri nets.
Decision Mining in Prom A. Rozinat and W.M.P. van der Aalst Joosung, Ko.
/faculteit technologie management Workflow Mining: Current Status and Future Directions Ana Karla A. de Medeiros, W.M.P van der Aalst and A.J.M.M. Weijters.
Requirements Engineering Processes. Syllabus l Definition of Requirement engineering process (REP) l Phases of Requirements Engineering Process: Requirements.
Maikel Leemans Wil M.P. van der Aalst. Process Mining in Software Systems 2 System under Study (SUS) Functional perspective Focus: User requests Functional.
Diagnostic Information for Control-Flow Analysis of Workflow Graphs (aka Free-Choice Workflow Nets) Cédric Favre(1,2), Hagen Völzer(1), Peter Müller(2)
Technology of information systems Lecture 5 Process management.
Process Mining – Concepts and Algorithms Review of literature on process mining techniques for event log data.
Chapter Nine Hypothesis Testing.
Process discovery Sander Leemans.
Selecting the Best Measure for Your Study
David Redlich, Thomas Molka, Wasif Gilani, Awais Rashid, Gordon Blair
Activity and State Transition Diagram
Patterns extraction from process executions
Concurrent Systems Modeling using Petri Nets – Part II
Graph Coverage for Specifications CS 4501 / 6501 Software Testing
Multi-phase process mining
3 mei 2019 Process Mining and Security: Detecting Anomalous Process Executions and Checking Process Conformance Wil van der Aalst Ana Karla A. de Medeiros.
5 juli 2019 Process Mining and Security: Detecting Anomalous Process Executions and Checking Process Conformance Wil van der Aalst Ana Karla A. de Medeiros.
Communication Driven Remapping of Processing Element (PE) in Fault-tolerant NoC-based MPSoCs Chia-Ling Chen, Yen-Hao Chen and TingTing Hwang Department.
Presentation transcript:

Han-na Yang Rediscovering Workflow Models from Event-Based Data using Little Thumb

Introduction  Work flow management systems  Offer generic modeling and enactment capabilities for structured business processes  Too restrictive  Have problems dealing with change (=not flexible)  ex) Staffware, IBM MQSeries, COSA, etc.  Many problems are resulting from discrepancy between workflow design and workflow enactment

“Reverse the process”: Process Mining  We start by gathering information about the workflow processes as they take place. Do not start with a workflow design.  Any information system using transactional system or workflow management system will offer this information in some form  Assumption: it is possible to construct workflow logs with event data  Process mining  The method of distilling a structured process description from a set of real executions.  We focus on workflow processes with concurrent behavior  Detecting concurrency is one of our prime concerns  Distinguishing AND/OR splits/joins explicitly

Workflow Process Model  Workflows  ‘Case-based’: every piece of work is executed for a specific case  Case: handled by executing tasks in a specific order (ex. Insurance claim, mortgage)  Workflow process model  Specifies which tasks need to be executed and in what order  Routing elements: describe sequential, conditional, parallel and iterative routing

Petri nets OR-join OR-split AND-split AND-join  Tasks are modeled by transitions  Places and arcs model causal dependencies  Split & Join

WorkFlow net (WF-net)  A Petri net that models the control-flow dimension of a workflow  Focuses on the process perspective and abstract from the functional, organization, information and operation perspectives  Sound WF-nets  Termination is guaranteed  No dangling tokens are left behind  No dead task  Workflow log is sequence of events

A Heuristic Process Mining Technique  Four ordering relation in the α-algorithm (Let A, B be events, W a workflow log)  A>B: if and only if there is trace line in W in which event A is directly followed by B  A→B: dependency relation, B depends on A  A#B: non-parallel relation, no dependency between A and B  A ∥ B: parallel relation, used to detect the kinds of splits and joins  Heuristic mining technique  Less sensitive for noise and the incompleteness of logs than α-algorithm  Three mining steps ① The construction of a dependency/frequency table(D/F-table) ② The induction of a D/F-graph out of a D/F-table ③ The reconstruction of the WF-net out of the D/F-table and the D/F-graph

Step 1: Construction of the dependency/frequency table  For each task A these information is abstracted out of the workflow log ① The overall frequency of task (#A) ② The frequency of task A directly preceded by another task B (#B>A) ③ The frequency of A directly followed by another task B (#A>B) ④ A local metric that indicates the strength of the dependency relation between task A and another task B ($A→ L B) ⑤ A more global metric that indicates the strength of the dependency relation ($A→B)  $A→B-dependency counter  It is incremented with a factor (δ) n  Dependency fall factor(δ: delta) is [0.0 … 1.0]  n is the number of intermediary events between them  Therefore, if task B appears directly after task A then (δ) n =1(n=0).  $A→B-dependency counter decreases if the distance between tasks increases.

Step 1: Construction of the dependency/frequency table  Example  T6 is never directly preceded by T10 (#B>A=0)  T6 is often directly followed by T10 (#A>B=581) >

Step 2: Induction of dependency/frequency graphs  Heuristic rules  Four conditions demand that specific values of the D/F graph (#A>B, #B<A, $A→ L B, $A→B) are higher or lower than a certain threshold value(σ, N 1, N 2 )  Only task-pattern occurrences above a threshold frequency are reliable enough for our induction process  Formulating a rule that for each pair of events A and B takes the decision if they are in the dependency relation or not is not really necessary.  First (temporally) version of mining rule 1. given a task A:  A→ X if and only if X is the event for which DS(A,X) is maximal  Y→A if and only if Y is the event for which DS(Y,A) is maximal  Dependency score: DS(X,Y) = (($X→ L Y ) 2 + ($X → Y ) 2 ) / 2  New rule does not contain any parameter >

Step 2: Induction of dependency/frequency graphs  For each arc the dependency score(DS) is given and for each task the number of event occurrences in the log

Step 2: Induction of dependency/frequency graphs  The first version of mining rule 1 is updated  Mining rule 1 (definite version). Given a task A  Suppose X is the event for which DS(A,X)=M is maximal. Then A→Y if and only if DS(A,Y)<0.95*M  Suppose X is the event for which DS(X,A)=M is maximal. Then Y→A if and only if DS(Y,A)<0.95*M  Threshold value is(0.95) only one parameter and the parameter seems robust for noise and concurrent processes.

Step 3: Generating WF-nets from D/F-graphs  The types of the splits and joins are not represented in the D/F-graph  Useful information to indicate the type of a join and split  Information in the D/F-table: determine split or join  A to B AND C (AND-split): pattern B,C and pattern C,B can both appear  A to B OR C (OR-split): pattern B,C and pattern C,B will not appear  The frequency of the nodes in the D/F-graph  Used for the validation of the induced workflow model  After observations, apply the α-algorithm to translate this information into a WF-net

Little Thumb  A tool that attempts to induce a workflow model from a workflow log  The workflow log may contain errors(noise) and can be incomplete  Steps to analyze the loaded workflow log ① Already executed (D/F-table in the Figure 4) ② Induction of a D/F-graph out of D/F-table (Figure 5) ③ Use the information in the extended D/F-table to indicate join and split (Figure 6)  Check WF-net tab: possibility to validate the WF-net  First check: checks if the trace can be parsed by the WF-net  Second check: test out the frequency information of the events is in accordance with the structure of the minded WF-net

Little Thumb  Generate WF-log tab  It is possible to load a WF-net and to generate workflow logs, with or without noise  Select-events tab  We can concentrate our mining process on the most frequent events and neglect low frequent events

Little Thumb  The types of the splits and joins are not represented in the D/F-graph

Little Thumb

First Experiments  Six different free-choice workflow models  All models contain concurrent processes and loops  For each model we generate three random workflow logs with 1000event sequences  A workflow log without noise  One with 5% noise  A log with 10% noise  Four different types of noise  Delete the head of a event sequence  Delete the tail of a event sequence  Delete a part of the body  Interchange two random chosen events

First Experiments  Six noise free workflow logs results in six perfect D/F-graphs  If we add 5% or 10% noise to the work flow logs, the resulting D/F-graphs and WF-nets are still perfect >

Second experiments  Four elements strongly influence the behavior of a WF-net and/or the workflow mining process  The number of event types in the WF-net  12, 22, 32, 42 event types  The amount of material in the workflow log  100, 200, 600, 1000, 1400, 2000 trace lines  The amount of noise  5%, 10%, 20%, 50% noise  The unbalance res to the probability that enabled event will fire  Generate 480 different workflow logs by varying each of the above enumerated elements

Second experiments  Conclusion  Under all circumstances most dependency relations, the type of split and the type of joins are correctly found  Mining technique appears especially robust for the number of trace lines and the amount of unbalance  50% noise cause serious problems  Most errors have to do with short loops  An improvement of the heuristic rules for short loops seems necessary

ProM Tool (1)

ProM Tool (2)

ProM Tool (3)

ProM Tool (4)

Questions ?