Presentation is loading. Please wait.

Presentation is loading. Please wait.

Process-oriented System Analysis Process Mining. BPM Lifecycle.

Similar presentations


Presentation on theme: "Process-oriented System Analysis Process Mining. BPM Lifecycle."— Presentation transcript:

1 Process-oriented System Analysis Process Mining

2 BPM Lifecycle

3 Motivation Up until now: Designed or pre-defined models Assumption that they are appropriate Process Mining Consideration of information from the execution of proceses This is covered in log data Logs Sequence of log entries, which capture events in a company that relate to processes

4 Log entries Examples of log entries Check Invoice for Invoice No. 4567 completed on 12.11.2010 at 9:19:57 Function StoreCustomerData(„Müller“, c1987, „Bad Bentheim“) completed on 12.11.2010 at 9:22:24 Send Invoice for Invoice No. 4567 completed on 12.11.2010 at 9:23:18 Function ContactCustomer(c1987, PromoMailing) completed on 12.11.2010 at 9:24:10 Function StoreCustomerData(„Miller“, c1988, „Osnabrück“) completed on 12.11.2010 at 9:26:08 Check Invoice for Invoice No. 4568 completed on 12.11.2010 at 9:26:38 Function ContactCustomer(c1988, PromoMailing) completed on 12.11.2010 at Send 9:27:32

5 Logs bear valuable information Logs bear valuable information to answer questions like When and how many process instances have been executed? Are there recurring patterns in the execution of activities? Can process models be derived from the data? Which paths of execution are used how often in the process models? Are there paths which are never taken?

6 Process Discovery Process Discovery is a technique for deriving a process model from log data Input: execution logs as ordered lists of activities with time stamp and case id Output: process model which could have generated the execution logs The case id is often not directly covered in the data, and needs to be generated in pre-processing

7 Process Conformance Process Conformance is a technique to analyze the relationship between log data and process models Input: Logs and process model Output: information on the relationship, e.g. fitness

8 Overview

9 Execution Logs Assumption Execution log defines complete order of events, which can all be related to process activities All events in the execution log relate to process instances of the considered process Hint Often log entries refer to different process models This warrants filtering activities Abstraction Techniques often work on abstraction of logs Focus on case id and activities

10 Execution Log Format Log format (caseID, activity) Example Check Invoice for Invoice No. 4567 completed on 12.11.2010 at 9:19:57 Function StoreCustomerData(„Müller“, c1987, „Bad Bentheim“) completed on 12.11.2010 at 9:22:24 Send Invoice for Invoice No. 4567 completed on 12.11.2010 at 9:23:18 Resulting Log (4567, Check Invoice), (c1987, StoreCustomerData), (4567, Send Invoice), etc.

11 Execution Log Further abstraction A‘s and B‘s (case id, task id) Additional information Event type, time, resource, data Not considered here Assumption Activity execution captured by one event No intermediate activities case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D

12 The Alpha Algorithm

13 Process Discovery Algorithms Simplest Algorithm: The α – Algorithm Relatively simple, some properties can be proofed Affected by Noise, therefore not first choice in practice Noise refers to incomplete or erroneous logs Furthermore, the α+(+) – Algorithms α+ and α++ are extensions to the α – Algorithm for recognizing more fine-granular structure in the process model Also affected by Noise Finally, techniques for dealing with Noise

14 Definitions Let T be a set of activities (Tasks) and T * the set of all sequences of arbitrary length over T, then we have: σ  T * is called execution sequence, if all activities in σ belong to the same process instance W  T * is called execution log (workflow log) Assumptions In each process model, each activity appears at most once Each direct neighbor relation between activities is represented at least once

15 Execution Logs case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D

16 Execution Logs case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D Execution sequences: Case 1: ABCD Case 2: ACBD Case 3: ABCD Case 4: ACBD Case 5: EF Resulting workflow log: W = {ABCD, ACBD, EF}

17 Order relations Log based order relations for pairs of activities a, b  T in a workflow log W: Direct successor a > w b i.e. in an execution sequence b directly follows a Causality a  w b i.e. a > w b and not b > w a Concurrency a ║ w b i.e. a > w b and b > w a Exclusiveness a  w b i.e. not a > w b and not b > w a Activity pairs which never succeed each other

18 case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D W = {ABCD, ACBD, EF} Direct successor Causality Concurrency Execution log analysis

19 case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D A>B A>C B>C B>D C>B C>D E>F ABACBDCDEFABACBDCDEF B||C C||B 1)2)3) W = {ABCD, ACBD, EF} Direct successor Causality Concurrency Execution log analysis

20 α-Algorithm The idea is to utilize order relations for deriving a workflow net that is compliant with these relations Precisely, each order relation results in a petri net fragment, which imposes the respective relationship

21 α-Algorithm Idea (a) a  ba  b 

22 α-Algorithm Idea (b) a  b, a  c and b # c 

23 α-Algorithm Idea (c) b  d, c  d and b # c 

24 α-Algorithm Idea (d) a  b, a  c and b || c 

25 α-Algorithm Idea (e) b  d, c  d and b || c 

26 The Alpha-Algorithm (simplified) 1. Identify the set of all tasks in the log as T L. 2. Identify the set of all tasks that have been observed as the first task in some case as T I. 3. Identify the set of all tasks that have been observed as the last task in some case as T O. 4. Identify the set of all connections to be potentially represented in the process model as a set X L. Add the following elements to X L : a. Pattern (a): all pairs for which hold a→b. b. Pattern (b): all triples for which hold a→(b#c). c. Pattern (c): all triples for which hold (b#c)→d. Note that triples for which Pattern (d) a→(b||c) or Pattern (e) (b||c)→d hold are not included in X L.

27 The Alpha-Algorithm (cont.) 5. Construct the set Y L as a subset of X L by: a. Eliminating a→b and a→c if there exists some a→(b#c). b. Eliminating b→c and b→d if there exists some (b#c)→d. 6. Connect start and end events in the following way: a. If there are multiple tasks in the set T I of first tasks, then draw a start event leading to an XOR-split, which connects to every task in T I. Otherwise, directly connect the start event with the only first task. b. For each task in the set T O of last tasks, add an end event and draw an arc from the task to the end event.

28 The Alpha-Algorithm (cont.) 7. Construct the flow arcs in the following way: a. Pattern (a): For each a→b in Y L, draw an arc a to b. b. Pattern (b): For each a→(b#c) in Y L, draw an arc from a to an XOR- split, and from there to b and c. c. Pattern (c): For each (b#c)→d in Y L, draw an arc from b and c to an XOR-join, and from there to d. d. Pattern (d) and (e): If a task in the so constructed process model has multiple incoming or multiple outgoing arcs, bundle these arcs with an AND-split or AND-join, respectively. 8. Return the newly constructed process model.

29 α-Algorithm Example case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D

30 α-Algorithm Example case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D  (W): α-Algorithm

31 Log Completeness Level of completeness required for a log Assume for the execution sequence EF, there is a log missing Then, the correct process model cannot be derived Basic assumption: each execution sequence must be part of the log Consequence: the complete behaviour is visible Problem: amount of required instances grows dramatically Example: 10 activities are executed in parallel Amount of potential execution sequences: 10! = 3.628.800

32 Log Completeness Result For the α-Algorithm it is sufficient to have completeness in terms of the successor relationship ( > w ) Reason All other relations are derived from direct successorship Interpretation Each time two activities may succeed each other, this must be visible in at least one execution sequence Hint In case of highly concurrent process models, this reduces the amount of required execution sequences dramatically

33 Summary Execution Logs Process Mining using the Alpha-Algorithm


Download ppt "Process-oriented System Analysis Process Mining. BPM Lifecycle."

Similar presentations


Ads by Google