Published by Helen McDaniel. Modified over 9 years ago.
1
Process Mining: Discovering processes from event logs
"All truths are easy to understand once they are discovered; the point is to discover them." Galileo Galilei (1564 - 1642)
Prof.dr.ir. Wil van der Aalst
Eindhoven University of Technology
Department of Information and Technology
P.O. Box 513, 5600 MB Eindhoven, The Netherlands
w.m.p.v.d.aalst@tm.tue.nl
2
Outline
Process Mining
– overview
– alpha algorithm
– genetic mining
ProM
– Architecture
– Convertors (e-mail, Staffware, InConcert, SAP, etc.)
– Process mining plug-ins
    Alpha-algorithm
    Multi-phase mining
    Genetic mining
– Analysis plug-ins
– Conformance testing plug-in
– LTL checker plug-in
– Social network plug-in
Conclusion
3
Process Mining
4
Motivation: Reversing the process
Process mining can be used for:
– Process discovery (What is the process?)
– Delta analysis (Are we doing what was specified?)
– Performance analysis (How can we improve?)
[Figure label: process mining]
5
Overview
1) basic performance metrics
2) process model
3) organizational model
4) social network
5) performance characteristics: If … then …
6) auditing/security
www.processmining.org
6
Let us focus on mining process models …
1) basic performance metrics
2) process model
3) organizational model
4) social network
5) performance characteristics: If … then …
6) auditing/security
... and a very simple approach: The alpha algorithm
7
Alpha algorithm α
8
Process log
Minimal information in log: case IDs and task IDs.
Additional information: event type, time, resources, and data.
In this log there are three possible sequences:
– ABCD
– ACBD
– EF
Log (in recorded order):
case 1 : task A
case 2 : task A
case 3 : task A
case 3 : task B
case 1 : task B
case 1 : task C
case 2 : task C
case 4 : task A
case 2 : task B
case 2 : task D
case 5 : task E
case 4 : task C
case 1 : task D
case 3 : task C
case 3 : task D
case 4 : task B
case 5 : task F
case 4 : task D
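The three sequences can be read off the log by grouping events per case. The following Python sketch illustrates this; the names `LOG` and `traces_by_case` are illustrative only and are not part of ProM or of the original slides.

```python
from collections import defaultdict

# The (case id, task id) pairs of the slide, in recorded order.
LOG = [
    (1, "A"), (2, "A"), (3, "A"), (3, "B"), (1, "B"), (1, "C"),
    (2, "C"), (4, "A"), (2, "B"), (2, "D"), (5, "E"), (4, "C"),
    (1, "D"), (3, "C"), (3, "D"), (4, "B"), (5, "F"), (4, "D"),
]


def traces_by_case(log):
    """Group task ids per case, preserving the order of the log."""
    traces = defaultdict(list)
    for case_id, task in log:
        traces[case_id].append(task)
    return {case_id: "".join(tasks) for case_id, tasks in traces.items()}


TRACES = traces_by_case(LOG)
DISTINCT = sorted(set(TRACES.values()))
print(TRACES)
print(DISTINCT)
```

Cases 1 and 3 follow ABCD, cases 2 and 4 follow ACBD, and case 5 follows EF.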
9
>, →, ||, # relations
Direct succession: x>y iff for some case x is directly followed by y.
Causality: x→y iff x>y and not y>x.
Parallel: x||y iff x>y and y>x.
Choice: x#y iff not x>y and not y>x.
For the process log on slide 8:
Direct succession: A>B, A>C, B>C, B>D, C>B, C>D, E>F
Causality: A→B, A→C, B→D, C→D, E→F
Parallel: B||C, C||B
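As a minimal sketch, the four relations can be computed directly from their definitions. The function name `relations` and the trace encoding (one string per distinct sequence) are assumptions made for illustration.

```python
from itertools import product

# The three distinct sequences of the example log.
TRACES = ["ABCD", "ACBD", "EF"]


def relations(traces):
    """Return the >, ->, || and # relations as sets of (x, y) pairs."""
    tasks = sorted({t for trace in traces for t in trace})
    # x > y: x is directly followed by y in at least one trace.
    succ = {(x, y) for trace in traces for x, y in zip(trace, trace[1:])}
    # x -> y: x > y and not y > x.
    causal = {(x, y) for (x, y) in succ if (y, x) not in succ}
    # x || y: x > y and y > x.
    parallel = {(x, y) for (x, y) in succ if (y, x) in succ}
    # x # y: neither x > y nor y > x.
    choice = {(x, y) for x, y in product(tasks, repeat=2)
              if (x, y) not in succ and (y, x) not in succ}
    return succ, causal, parallel, choice


SUCC, CAUSAL, PARALLEL, CHOICE = relations(TRACES)
print(sorted(CAUSAL))
print(sorted(PARALLEL))
```

Note that # also holds between a task and itself whenever the task never directly follows itself, which matters for the alpha algorithm.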
10
Basic idea (1)
x→y
11
Basic idea (2)
x→y, x→z, and y||z (AND-split after x)
12
Basic idea (3)
x→y, x→z, and y#z (XOR-split after x)
13
Basic idea (4)
x→z, y→z, and x||y (AND-join before z)
14
Basic idea (5)
x→z, y→z, and x#y (XOR-join before z)
15
It is not that simple: Basic alpha algorithm
Let W be a workflow log over T. α(W) is defined as follows.
1. T_W = { t ∈ T | ∃σ∈W t ∈ σ },
2. T_I = { t ∈ T | ∃σ∈W t = first(σ) },
3. T_O = { t ∈ T | ∃σ∈W t = last(σ) },
4. X_W = { (A,B) | A ⊆ T_W ∧ B ⊆ T_W ∧ ∀a∈A ∀b∈B a →_W b ∧ ∀a1,a2∈A a1 #_W a2 ∧ ∀b1,b2∈B b1 #_W b2 },
5. Y_W = { (A,B) ∈ X_W | ∀(A′,B′)∈X_W A ⊆ A′ ∧ B ⊆ B′ ⇒ (A,B) = (A′,B′) },
6. P_W = { p_(A,B) | (A,B) ∈ Y_W } ∪ { i_W, o_W },
7. F_W = { (a, p_(A,B)) | (A,B) ∈ Y_W ∧ a ∈ A } ∪ { (p_(A,B), b) | (A,B) ∈ Y_W ∧ b ∈ B } ∪ { (i_W, t) | t ∈ T_I } ∪ { (t, o_W) | t ∈ T_O }, and
α(W) = (P_W, T_W, F_W).
The alpha algorithm has been proven to be correct for a large class of free-choice nets.
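The seven steps translate almost literally into code. The sketch below is a naive illustration, not the ProM plug-in: it enumerates all task subsets (exponential, so only suitable for tiny logs), assumes non-empty traces, and restricts A and B to non-empty sets to avoid degenerate places. The place encoding `("p", A, B)` and the names `"i"` and `"o"` are illustrative choices.

```python
from itertools import combinations


def alpha(traces):
    """Return (places, transitions, flow) for a list of non-empty traces."""
    tasks = sorted({t for trace in traces for t in trace})          # 1. T_W
    first = {trace[0] for trace in traces}                          # 2. T_I
    last = {trace[-1] for trace in traces}                          # 3. T_O
    succ = {(x, y) for trace in traces for x, y in zip(trace, trace[1:])}

    def causal(x, y):
        return (x, y) in succ and (y, x) not in succ

    def choice(x, y):
        return (x, y) not in succ and (y, x) not in succ

    # Non-empty task sets whose members are pairwise (and self-) unrelated.
    groups = [frozenset(g)
              for n in range(1, len(tasks) + 1)
              for g in combinations(tasks, n)
              if all(choice(x, y) for x in g for y in g)]
    # 4. X_W: every a in A causally precedes every b in B.
    x_w = [(a, b) for a in groups for b in groups
           if all(causal(x, y) for x in a for y in b)]
    # 5. Y_W: keep only the maximal pairs.
    y_w = [(a, b) for (a, b) in x_w
           if not any((a, b) != (c, d) and a <= c and b <= d
                      for (c, d) in x_w)]
    # 6. P_W: one place per maximal pair, plus source i and sink o.
    places = [("p", a, b) for (a, b) in y_w] + ["i", "o"]
    # 7. F_W: arcs into and out of each place, plus source and sink arcs.
    flow = ([(x, ("p", a, b)) for (a, b) in y_w for x in a]
            + [(("p", a, b), y) for (a, b) in y_w for y in b]
            + [("i", t) for t in sorted(first)]
            + [(t, "o") for t in sorted(last)])
    return places, tasks, flow


PLACES, TRANSITIONS, FLOW = alpha(["ABCD", "ACBD", "EF"])
print(len(PLACES), "places,", len(TRANSITIONS), "transitions,", len(FLOW), "arcs")
```

For the example log this yields five internal places, one for each of A→B, A→C, B→D, C→D and E→F, plus the source and sink place.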
16
Example
Applying the alpha algorithm to the process log W of slide 8 yields the net α(W).
[Figure: log W and the mined Petri net α(W)]
17
DEMO: Alpha algorithm
48 cases, 16 performers
18
Challenges
Refining the existing algorithm (control-flow/process perspective):
– Hidden tasks
– Duplicate tasks
– Non-free-choice constructs
– Loops
– Detecting concurrency (implicit or explicit)
– Mining and exploiting time
– Dealing with noise
– Dealing with incompleteness
Mining other perspectives (data, resources, roles, …)
Gathering data from heterogeneous sources
Visualization of results
Delta analysis
19
Genetic mining
20
Approach
21
Genetic mining: The two main questions
How to represent an individual? (Petri net?)
How to define the genetic operators? (e.g., crossover)
22
How to represent an individual?
Problems with Petri nets:
– Places do not exist in the log
– Difficulties defining mutation and crossover
– Problems describing subtle rules without adding transitions
23
Representation of the goal process
INPUT  true   A      A      A      D      D      E∧F    B∨C∨G
→      A      B      C      D      E      F      G      H      OUTPUT
A      0      1      1      1      0      0      0      0      B∨C∨D
B      0      0      0      0      0      0      0      1      H
C      0      0      0      0      0      0      0      1      H
D      0      0      0      0      1      1      0      0      E∧F
E      0      0      0      0      0      0      1      0      G
F      0      0      0      0      0      0      1      0      G
G      0      0      0      0      0      0      0      1      H
H      0      0      0      0      0      0      0      0      true
24
A more compact representation
ACTIVITY  INPUT        OUTPUT
A         {}           {{B,C,D}}
B         {{A}}        {{H}}
C         {{A}}        {{H}}
D         {{A}}        {{E},{F}}
E         {{D}}        {{G}}
F         {{D}}        {{G}}
G         {{E},{F}}    {{H}}
H         {{B,C,G}}    {}
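The compact representation maps naturally onto a dictionary of input and output subsets. The sketch below is an illustration under stated assumptions: the dictionary layout and the helper names (`members`, `to_rows`, `condition`, `is_consistent`) are hypothetical and not taken from ProM. It reads activities inside one subset as alternatives (∨) and combines different subsets with ∧, which is how the two representations of the goal process on these slides correspond.

```python
# Goal process in the compact form: activity -> input and output subsets.
CAUSAL_MATRIX = {
    "A": {"in": [], "out": [{"B", "C", "D"}]},
    "B": {"in": [{"A"}], "out": [{"H"}]},
    "C": {"in": [{"A"}], "out": [{"H"}]},
    "D": {"in": [{"A"}], "out": [{"E"}, {"F"}]},
    "E": {"in": [{"D"}], "out": [{"G"}]},
    "F": {"in": [{"D"}], "out": [{"G"}]},
    "G": {"in": [{"E"}, {"F"}], "out": [{"H"}]},
    "H": {"in": [{"B", "C", "G"}], "out": []},
}


def members(subsets):
    """All activities mentioned in a list of subsets."""
    return set().union(*subsets)


def to_rows(cm):
    """Rebuild the 0/1 rows of the full matrix, columns in sorted order."""
    acts = sorted(cm)
    return {a: "".join("1" if b in members(cm[a]["out"]) else "0" for b in acts)
            for a in acts}


def condition(subsets):
    """Render subsets as a formula: OR inside a subset, AND between subsets."""
    if not subsets:
        return "true"
    parts = ["∨".join(sorted(s)) for s in subsets]
    if len(parts) > 1:
        parts = [f"({p})" if "∨" in p else p for p in parts]
    return "∧".join(parts)


def is_consistent(cm):
    """b is an output of a exactly when a is an input of b."""
    return all((b in members(cm[a]["out"])) == (a in members(cm[b]["in"]))
               for a in cm for b in cm)


ROWS = to_rows(CAUSAL_MATRIX)
for activity, row in ROWS.items():
    print(activity, row, condition(CAUSAL_MATRIX[activity]["out"]))
```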
25
Any Petri net can be mapped onto a causal matrix:
ACTIVITY  INPUT          OUTPUT
A         {...}          {{C,D},...}
B         {...}          {{C,D},...}
C         {{A,B},...}    {...}
D         {{A,B},...}    {...}
but...
26
Mapping a causal matrix onto a Petri net?
ACTIVITY  INPUT                            OUTPUT
A         {{i11,i12,i13},{i21,i22,i23}}    {{o11,o12,o13},{o21,o22,o23}}
27
Wiring based on input and output sets
Using place fusion or silent transitions.
28
Example
ACTIVITY  INPUT      OUTPUT
A         {}         {{C,D}}
B         {}         {{D}}
C         {{A}}      {}
D         {{A,B}}    {}
29
Wiring using silent transitions
Always a solution?
30
Problem (the need to be lazy...)
ACTIVITY  INPUT        OUTPUT
A         {}           {{B},{C,D}}
B         {{A}}        {{E,F}}
C         {{A}}        {{E}}
D         {{A}}        {{F}}
E         {{B},{C}}    {{G}}
F         {{B},{D}}    {{G}}
G         {{E},{F}}    {}
31
However,...
ACTIVITY  INPUT        OUTPUT
A         {}           {{B},{C,D}}
B         {{A}}        {{E,F}}
C         {{A}}        {{E}}
D         {{A}}        {{F}}
E         {{B},{C}}    {{G}}
F         {{B},{D}}    {{G}}
G         {{E},{F}}    {}
32
Two approaches: CM2PN
1. A naive mapping using silent transitions.
   – Always works
   – Larger net
   – Requires "lazy transition" semantics
   – Implemented in ProM
2. A more sophisticated mapping based on "smart place fusion".
   – Only for a subset of CMs
   – Subclass can be characterized
   – Not (yet) implemented
33
Example: Event log
case id  activity id  originator  timestamp
case 1   activity A   John        9-3-2004:15.01
case 2   activity A   John        9-3-2004:15.12
case 3   activity A   Sue         9-3-2004:16.03
case 3   activity D   Carol       9-3-2004:16.07
case 1   activity B   Mike        9-3-2004:18.25
case 1   activity H   John        10-3-2004:9.23
case 2   activity C   Mike        10-3-2004:10.34
case 4   activity A   Sue         10-3-2004:10.35
case 2   activity H   John        10-3-2004:12.34
case 3   activity E   Pete        10-3-2004:12.50
case 3   activity F   Carol       11-3-2004:10.12
case 4   activity D   Pete        11-3-2004:10.14
case 3   activity G   Sue         11-3-2004:10.44
case 3   activity H   Pete        11-3-2004:11.03
case 4   activity F   Sue         11-3-2004:11.18
case 4   activity E   Clare       11-3-2004:12.22
case 4   activity G   Mike        11-3-2004:14.34
case 4   activity H   Clare       11-3-2004:14.38
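A small Python sketch of how such a log can be turned into one trace per case by ordering events on their timestamps. The timestamp layout day-month-year:hour.minute is an assumption read off the values in the table, and `EVENTS`, `parse_timestamp` and `traces` are illustrative names.

```python
from datetime import datetime

# (case id, activity id, originator, timestamp) rows of the example log.
EVENTS = [
    (1, "A", "John", "9-3-2004:15.01"), (2, "A", "John", "9-3-2004:15.12"),
    (3, "A", "Sue", "9-3-2004:16.03"), (3, "D", "Carol", "9-3-2004:16.07"),
    (1, "B", "Mike", "9-3-2004:18.25"), (1, "H", "John", "10-3-2004:9.23"),
    (2, "C", "Mike", "10-3-2004:10.34"), (4, "A", "Sue", "10-3-2004:10.35"),
    (2, "H", "John", "10-3-2004:12.34"), (3, "E", "Pete", "10-3-2004:12.50"),
    (3, "F", "Carol", "11-3-2004:10.12"), (4, "D", "Pete", "11-3-2004:10.14"),
    (3, "G", "Sue", "11-3-2004:10.44"), (3, "H", "Pete", "11-3-2004:11.03"),
    (4, "F", "Sue", "11-3-2004:11.18"), (4, "E", "Clare", "11-3-2004:12.22"),
    (4, "G", "Mike", "11-3-2004:14.34"), (4, "H", "Clare", "11-3-2004:14.38"),
]


def parse_timestamp(text):
    """Parse the assumed layout day-month-year:hour.minute."""
    return datetime.strptime(text, "%d-%m-%Y:%H.%M")


def traces(events):
    """Order events by time and concatenate the activities of each case."""
    ordered = sorted(events, key=lambda event: parse_timestamp(event[3]))
    result = {}
    for case_id, activity, _originator, _timestamp in ordered:
        result[case_id] = result.get(case_id, "") + activity
    return result


TRACES = traces(EVENTS)
print(TRACES)
```

Because the events are sorted before grouping, the result does not depend on the order in which the rows were recorded.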
34
Goal
35
Example: Starting point
case id  activity id
case 1   activity A
case 2   activity A
case 3   activity A
case 3   activity D
case 1   activity B
case 1   activity H
case 2   activity C
case 4   activity A
case 2   activity H
case 3   activity E
case 3   activity F
case 4   activity D
case 3   activity G
case 3   activity H
case 4   activity F
case 4   activity E
case 4   activity G
case 4   activity H
+ 500 randomly generated initial individuals
36
Two individuals
ACTIVITY  INPUT        OUTPUT
A         {}           {{B,C,D}}
B         {{A}}        {{H}}
C         {{A}}        {{H}}
D         {{A}}        {{E}}
E         {{D}}        {{G}}
F         {}           {{G}}
G         {{E},{F}}    {{H}}
H         {{C,B,G}}    {}

ACTIVITY  INPUT            OUTPUT
A         {}               {{B,C,D}}
B         {{A}}            {{H}}
C         {{A}}            {{H}}
D         {{A}}            {{E,F}}
E         {{D}}            {{G}}
F         {{D}}            {{G}}
G         {{E},{F}}        {{H}}
H         {{C},{B},{G}}    {}
37
Crossover
ACTIVITY  INPUT        OUTPUT
A         {}           {{B,C,D}}
B         {{A}}        {{H}}
C         {{A}}        {{H}}
D         {{A}}        {{E,F}}
E         {{D}}        {{G}}
F         {{D}}        {{G}}
G         {{E},{F}}    {{H}}
H         {{C,B,G}}    {}

ACTIVITY  INPUT            OUTPUT
A         {}               {{B,C,D}}
B         {{A}}            {{H}}
C         {{A}}            {{H}}
D         {{A}}            {{E}}
E         {{D}}            {{G}}
F         {}               {{G}}
G         {{E},{F}}        {{H}}
H         {{C},{B},{G}}    {}
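Comparing the two individuals before and after crossover, the offspring differ from their parents only in the OUTPUT of D and the INPUT of F, which have been exchanged. The sketch below reproduces exactly that exchange. It is a deliberately simplified illustration: the actual crossover operator of the genetic miner (how the crossover point is chosen and how related entries are kept consistent) is not described on the slides, so the explicit `swaps` argument is an assumption, as are the names `individual` and `crossover`.

```python
import copy


def individual(d_out, f_in, h_in):
    """Build a causal matrix; only the three entries that vary are parameters."""
    return {
        "A": {"in": [], "out": [{"B", "C", "D"}]},
        "B": {"in": [{"A"}], "out": [{"H"}]},
        "C": {"in": [{"A"}], "out": [{"H"}]},
        "D": {"in": [{"A"}], "out": d_out},
        "E": {"in": [{"D"}], "out": [{"G"}]},
        "F": {"in": f_in, "out": [{"G"}]},
        "G": {"in": [{"E"}, {"F"}], "out": [{"H"}]},
        "H": {"in": h_in, "out": []},
    }


def crossover(parent1, parent2, swaps):
    """Exchange the listed (activity, side) entries between two copies."""
    child1, child2 = copy.deepcopy(parent1), copy.deepcopy(parent2)
    for activity, side in swaps:
        child1[activity][side], child2[activity][side] = (
            child2[activity][side], child1[activity][side])
    return child1, child2


# The two parents shown on the "Two individuals" slide.
PARENT1 = individual([{"E"}], [], [{"B", "C", "G"}])
PARENT2 = individual([{"E", "F"}], [{"D"}], [{"C"}, {"B"}, {"G"}])

# Assumed crossover: swap D's output subsets and F's input subsets.
CHILD1, CHILD2 = crossover(PARENT1, PARENT2, [("D", "out"), ("F", "in")])
print(CHILD1["D"]["out"], CHILD1["F"]["in"])
print(CHILD2["D"]["out"], CHILD2["F"]["in"])
```

Swapping both entries together keeps each offspring consistent: F lists D as input exactly when D lists F as output.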
38
Resulting CM with fitness 1.0
INPUT  true   A      A      A      D      D      E∧F    B∨C∨G
→      A      B      C      D      E      F      G      H      OUTPUT
A      0      1      1      1      0      0      0      0      B∨C∨D
B      0      0      0      0      0      0      0      1      H
C      0      0      0      0      0      0      0      1      H
D      0      0      0      0      1      1      0      0      E∧F
E      0      0      0      0      0      0      1      0      G
F      0      0      0      0      0      0      1      0      G
G      0      0      0      0      0      0      0      1      H
H      0      0      0      0      0      0      0      0      true
39
Mapping
40
ProM framework
41
ProM
42
Converter plug-in: EMailAnalyzer
43
XML format
44
ProM architecture
45
Mining plug-in: Alpha algorithm
46
Mining plug-in: Genetic Miner
47
Mining plug-in: Multi-phase mining
48
Step 1: Get instances
49
Step 2: Project
50
Step 3: Aggregate
51
Step 4: Map onto EPC
52
Step 5: Map onto Petri net (or other language)
53
Mining plug-in: Social network miner
55
Cliques
56
SN based on hand-over of work metric
Density of network is 0.225
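To illustrate the idea behind the metric, the sketch below counts hand-overs of work in the small example event log used in the genetic mining part of these slides: a hand-over from p to q is counted whenever, within one case, an event by p is directly followed by an event by q. Density is taken here as the number of distinct directed edges divided by n(n-1) for n performers, which is an assumed definition. The value 0.225 on the slide belongs to the demo network, not to this small log, so the number computed below is different.

```python
from collections import Counter, defaultdict

# (case id, originator) per event of the example log, in time order.
EVENTS = [
    (1, "John"), (2, "John"), (3, "Sue"), (3, "Carol"), (1, "Mike"),
    (1, "John"), (2, "Mike"), (4, "Sue"), (2, "John"), (3, "Pete"),
    (3, "Carol"), (4, "Pete"), (3, "Sue"), (3, "Pete"), (4, "Sue"),
    (4, "Clare"), (4, "Mike"), (4, "Clare"),
]


def handovers(events):
    """Count direct hand-overs of work (p, q) within each case."""
    per_case = defaultdict(list)
    for case_id, originator in events:
        per_case[case_id].append(originator)
    counts = Counter()
    for originators in per_case.values():
        for p, q in zip(originators, originators[1:]):
            if p != q:  # ignore a performer handing work to themselves
                counts[(p, q)] += 1
    return counts


def density(counts, performers):
    """Distinct directed edges divided by the n * (n - 1) possible ones."""
    n = len(performers)
    return len(counts) / (n * (n - 1))


HANDOVERS = handovers(EVENTS)
PERFORMERS = {originator for _case_id, originator in EVENTS}
DENSITY = density(HANDOVERS, PERFORMERS)
print(HANDOVERS.most_common(3))
print(round(DENSITY, 3))
```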
57
SN based on working together (and ego network)
58
Analysis plug-in: LTL checker
60
Analysis plug-in: Conformance checker
Do they agree?
62
Fitness is not enough
63
Screenshot (Also runs on Mac.)
64
Other analysis plug-ins
65
More demos?
66
Conclusion
Process mining provides many interesting challenges for scientists, customers, users, managers, consultants, and tool developers.
It involves multiple perspectives (process, data, resources, etc.).
Get ProM-ed! You can contribute by applying ProM and developing plug-ins.
67
More information
http://www.workflowcourse.com
http://www.workflowpatterns.com
http://www.processmining.org