Patterns extraction from process executions 19 Feb 2015 Laura Genga
Outline: Introduction; Approach (Building Instance Graphs, Patterns Extraction); Experiments (BPI2013 challenge, CoseLog); Conclusion and future works
Introduction: many real-world domains are characterized by processes with little structure. Typical process discovery approaches have problems when dealing with such processes, producing «spaghetti» models.
Spaghetti processes analysis: schema simplification; trace clustering; patterns discovery.
Patterns discovery: existing approaches mine patterns on traces, through patterns abstraction or episodes discovery.
[Figure: two panels, Episodes Discovery and Patterns Abstraction, with example patterns P1: <Start,b,c,d,g>, P2: <e,f,h>, …]
Proposed Approach: Mining on Graphs. Event Log → Instance Graphs Set → Patterns Set.
Case Id | Trace
1 | <Start,b,c,d,g,End>
2 | <Start,a,b,d,c,g,i,End>
3 | <Start,a,e,f,h,i,End>
4 | <Start,b,c,d,g,e,f,h,End>
[Figure: instance graphs of cases 1 to 4 and the extracted patterns P1, P2]
Building IGs set: the parallelism is hidden in the trace (example trace: <a,b,c,d,g,i>). We need to know the causal relations between events; these can be obtained by using process discovery approaches.
Deriving causal relations from process discovery outcome: the CR (causal relations) set can be derived by means of some process discovery approach. The mining technique must be chosen carefully.
[Table: CR set with Source and Target columns, e.g. A→B, A→E, …]
Instance Graphs building: for each pair of events $e_i$, $e_j$ for which $e_i \to e_j$ holds, add an edge from $e_i$ to $e_j$ if, among the events occurring in the trace between $e_i$ and $e_j$, there are (1) no successors of $e_i$ OR (2) no predecessors of $e_j$.
[Table: CR set with Source and Target columns over the events A, B, I, C, D, G, K]
Example, T1: <a,b,i,c,d,g,k>
$a \to b$: (1) ok
$a \to i$: (1) no, (2) ok
$b \to d$: (1) no, (2) ok
$b \to c$: (1) ok
[Figure: resulting instance graph over the nodes a, b, i, c, d, g, k]
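The edge rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used for the experiments: the function name `build_instance_graph` and the CR set below are made up for the example (the slides do not list the full CR table), and each activity is assumed to occur at most once in the trace.

```python
def build_instance_graph(trace, cr):
    """Return the instance graph edges of `trace` as (source, target) pairs.

    `cr` is a set of (source, target) causal relations.
    Assumption: each activity occurs at most once in the trace.
    """
    edges = set()
    for i, src in enumerate(trace):
        for j in range(i + 1, len(trace)):
            dst = trace[j]
            if (src, dst) not in cr:
                continue  # no causal relation between the two events
            between = trace[i + 1:j]
            # Condition (1): no successor of src occurs in between.
            no_successor = all((src, e) not in cr for e in between)
            # Condition (2): no predecessor of dst occurs in between.
            no_predecessor = all((e, dst) not in cr for e in between)
            if no_successor or no_predecessor:
                edges.add((src, dst))
    return edges


# Hypothetical CR set, invented for this example (not taken from the slides).
cr = {("a", "b"), ("a", "i"), ("a", "c"), ("b", "c"), ("b", "d"),
      ("c", "g"), ("d", "g"), ("g", "k"), ("i", "k")}
trace = ["a", "b", "i", "c", "d", "g", "k"]  # T1 from the slide

edges = build_instance_graph(trace, cr)
print(sorted(edges))
```

With this assumed CR set, the four checks listed on the slide all produce an edge, while the made-up relation a→c is discarded: b occurs between a and c and is both a successor of a and a predecessor of c, so neither condition holds.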
Flower models problem: representing all possible behaviors can generate a flower model, and using a flower model we obtain only sequence graphs. We therefore look only for the most frequent relations; as a consequence, some traces will turn out to be “anomalous”, e.g. t1: <Start,a,e,f,h,End>.
Graphs with anomalies. Example traces: t1: <b,a,c,d,g>, t2: <a,f,e,h,i>.
[Table: CR set with Source and Target columns over the events A, B, E, C, D, G, K, F, H, I]
[Figure: instance graphs of t1 and t2]
Use of conformance checking techniques: conformance checking techniques provide precise information about the occurrence of an anomaly in a trace. The corresponding graph explicitly represents the anomaly occurrence (insertion, deletion).
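Updated graphs with anomalies. Example traces: t1: <b,a,c,d,g>, t2: <a,f,e,h,i>.
[Table: CR set with Source and Target columns over the events A, B, E, C, D, G, K, F, H, I]
[Figure: updated instance graphs of t1 and t2]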
Patterns extraction: frequent subgraph mining techniques extract the subgraphs whose “support” is above a threshold. Support of a subgraph: transaction-based (TB), the number of graphs involving the subgraph; occurrence-based (OB), the number of occurrences of the subgraph.
[Figure: example with TB supp: 2, OB supp: 3]
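The difference between the two support notions can be illustrated with a small brute-force sketch. It is not how a frequent subgraph miner works internally: the functions `occurrences` and `supports`, the graph encoding (a dictionary of node labels plus a set of edges) and the three toy graphs are all assumptions made up for this example, chosen so that the result matches the TB and OB values of the figure.

```python
from itertools import permutations


def occurrences(pattern, graph):
    """Count the distinct occurrences of `pattern` in `graph` by brute force.

    Both arguments are (labels, edges) pairs: `labels` maps a node id to its
    activity label, `edges` is a set of (source id, target id) pairs.
    """
    p_labels, p_edges = pattern
    g_labels, g_edges = graph
    p_nodes = list(p_labels)
    found = set()
    # Try every injective assignment of pattern nodes to graph nodes.
    for image in permutations(g_labels, len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        if any(p_labels[n] != g_labels[mapping[n]] for n in p_nodes):
            continue  # labels do not match
        mapped = frozenset((mapping[u], mapping[v]) for u, v in p_edges)
        if mapped <= g_edges:
            # Identify an occurrence by the nodes and edges it covers.
            found.add((frozenset(image), mapped))
    return len(found)


def supports(pattern, graphs):
    """Return (transaction-based support, occurrence-based support)."""
    counts = [occurrences(pattern, g) for g in graphs]
    tb = sum(1 for c in counts if c > 0)  # graphs containing the pattern
    ob = sum(counts)                      # total number of occurrences
    return tb, ob


# Toy data, invented for this example.
pattern = ({"x": "b", "y": "c"}, {("x", "y")})            # b -> c
g1 = ({0: "a", 1: "b", 2: "c", 3: "b", 4: "c"},
      {(0, 1), (1, 2), (0, 3), (3, 4)})                   # contains b -> c twice
g2 = ({0: "a", 1: "b", 2: "c"}, {(0, 1), (1, 2)})         # contains b -> c once
g3 = ({0: "a", 1: "e"}, {(0, 1)})                         # does not contain it

tb, ob = supports(pattern, [g1, g2, g3])
print("TB supp:", tb, "OB supp:", ob)
```

The pattern appears in two of the three graphs but three times overall, which is why the two measures can rank the same subgraph differently.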
SUBDUE Algorithm: support is computed by using frequency and size. Discovered patterns are arranged into a hierarchy.
Experiments: two experiments, on the log of BPI2013 (Incident Management) and on Wabo4 (CoseLog project). The CR set is derived by the Inductive Miner algorithm. Patterns evaluation: support (transaction-based) and domain knowledge (DK).
BPI2013 Model mined by IMi
SUB1 Supp: 47% DK: the event “queued + awaiting assignment” is undesired
SUB7 Supp: 19% DK: high rate of incident management delegation
SUB12 Supp: 8% DK: this should be the “ideal” activities order
Wabo4: Process Model Mined by IMi
SUB1 Supp: 41% DK: Starting activities of an application management
SUB12 Supp: 16% DK: Final part of an application management. Unexpected parallelism
SUB3 Supp: 21% DK: can be considered a meaningful sub-process
Summing up: the proposed method was able to detect interesting patterns, providing an alternative way to analyze complex, spaghetti processes. The method is flexible: it can be used with any process discovery / frequent subgraph mining technique. Limits: the reliability of the results depends on the process discovery approach adopted; the pattern interpretation support can be improved.
Future works:
Improving the pattern interpretation support: providing, for each pattern, also its context.
Adding a performance evaluation based on patterns: single pattern evaluation (average costs, throughput time…) and analysis of the pattern impact on the overall process performance.
Thank you for your attention!