Process Mining: Discovering processes from event logs All truths are easy to understand once they are discovered; the point is to discover them. Galileo Galilei ( ) Prof.dr.ir. Wil van der Aalst Eindhoven University of Technology Department of Information and Technology P.O. Box 513, 5600 MB Eindhoven The Netherlands
Outline Process Mining –overview –alpha algorithm –genetic mining ProM –Architecture –Convertors ( , Staffware, InConcert, SAP, etc.) –Process mining plug-ins Alpha-algorithm Multi-phase mining Genetic mining –Analysis plug-ins –Conformance testing plug-in –LTL checker plug-in –Social network plug-in Conclusion
Process Mining
Motivation: Reversing the process Process mining can be used for: –Process discovery (What is the process?) –Delta analysis (Are we doing what was specified?) –Performance analysis (How can we improve?) process mining
Overview 1) basic performance metrics 2) process model3) organizational model4) social network 5) performance characteristics If …then … 6) auditing/security
Let us focus on mining process models … 1) basic performance metrics 2) process model3) organizational model4) social network 5) performance characteristics If …then … 6) auditing/security... and a very simple approach: The alpha algorithm
Alpha algorithm α
Process log Minimal information in log: case id’s and task id’s. Additional information: event type, time, resources, and data. In this log there are three possible sequences: –ABCD –ACBD –EF case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D
>, ,||,# relations Direct succession: x>y iff for some case x is directly followed by y. Causality: x y iff x>y and not y>x. Parallel: x||y iff x>y and y>x Choice: x#y iff not x>y and not y>x. case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D A>B A>C B>C B>D C>B C>D E>F ABACBDCDEFABACBDCDEF B||C C||B
Basic idea (1) xyxy
Basic idea (2) x y, x z, and y||z
Basic idea (3) x y, x z, and y#z
Basic idea (4) x z, y z, and x||y
Basic idea (5) x z, y z, and x#y
It is not that simple: Basic alpha algorithm Let W be a workflow log over T. (W) is defined as follows. 1.T W = { t T W t }, 2.T I = { t T W t = first( ) }, 3.T O = { t T W t = last( ) }, 4.X W = { (A,B) A T W B T W a A b B a W b a1,a2 A a 1 # W a 2 b1,b2 B b 1 # W b 2 }, 5.Y W = { (A,B) X (A,B) X A A B B (A,B) = (A,B) }, 6.P W = { p (A,B) (A,B) Y W } {i W,o W }, 7.F W = { (a,p (A,B) ) (A,B) Y W a A } { (p (A,B),b) (A,B) Y W b B } { (i W,t) t T I } { (t,o W ) t T O }, and (W) = (P W,T W,F W ). The alpha algorithm has been proven to be correct for a large class of free-choice nets.
Example case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D (W) W
DEMO Alpha algorithm 48 cases 16 performers
Challenges Refining existing algorithm for (control-flow/process perspective) –Hidden tasks –Duplicate tasks –Non-free-choice constructs –Loops –Detecting concurrency (implicit or explicit) –Mining and exploiting time –Dealing with noise –Dealing with incompleteness Mining other perspectives (data, resources, roles, …) Gathering data from heterogeneous sources Visualization of results Delta analysis
Genetic mining
Approach
Genetic mining: The two main questions How to represent an individual? (Petri net?) How to define the genetic operators? (e.g., crossover)
How to represent an individual? Problems with Petri nets: –Places do not exist in log –difficulties defining mutation and crossover –problems describing subtle rules without adding transitions
Representation of the goal process trueAAADDE^FE^FBvCvG →ABCDEFGH A BvCvD B H C H D E^F E G F G G H H true
A more compact representation ACTIVITYINPUTOUTPUT A{}{{B,C,D}} B{{A}}{{H}} C{{A}}{{H}} D{{A}}{{E},{F}} E{{D}}{{G}} F{{D}}{{G}} G{{E},{F}}{{H}} H{{B,C,G}}{}
Any Petri net can be mapped onto a causal matrix: ACTIVITYINPUTOUTPUT A{...}{{C,D},...} B{...}{{C,D},...} C{{A,B},...}{...} D{{A,B},...}{...} but...
Mapping a causal matrix onto a Petri net? ACTIVITYINPUTOUTPUT A{{i 11,i 12,i 13 },{i 21,i 22,i 23 }}{{o 11,o 12,o 13 },{o 21,o 22,o 23 }}
Wiring based on input and output sets Using place fusion or silent transitions.
Example ACTIVITYINPUTOUTPUT A{}{{C,D}} B{}{{D}} C{{A}}{} D{{A,B}}{}
Wiring using silent transitions Always a solution?
Problem (the need to be lazy...) ACTIVITYINPUTOUTPUT A{}{{B},{C,D}} B{{A}}{{E,F}} C{{A}}{{E}} D{{A}}{{F}} E{{B},{C}}{{G}} F{{B},{D}}{{G}} G{{E},{F}}{}
However,... ACTIVITYINPUTOUTPUT A{}{{B},{C,D}} B{{A}}{{E,F}} C{{A}}{{E}} D{{A}}{{F}} E{{B},{C}}{{G}} F{{B},{D}}{{G}} G{{E},{F}}{}
Two approaches: CM2PN 1.A naive mapping using silent transitions. Always works Larger net Requires "lazy transition" semantics Implemented in ProM 2.A more sophisticated mapping based on "smart place fusion" Only for a subset of CMs Subclass can be characterized Not (yet) implemented
Example: Event log case idactivity idoriginatortimestamp case 1activity AJohn :15.01 case 2activity AJohn :15.12 case 3activity ASue :16.03 case 3activity DCarol :16.07 case 1activity BMike :18.25 case 1activity HJohn :9.23 case 2activity CMike :10.34 case 4activity ASue :10.35 case 2activity HJohn :12.34 case 3activity EPete :12.50 case 3activity FCarol :10.12 case 4activity DPete :10.14 case 3activity GSue :10.44 case 3activity HPete :11.03 case 4activity FSue :11.18 case 4activity EClare :12.22 case 4activity GMike :14.34 case 4activity HClare :14.38
Goal
Example: Starting point case idactivity id case 1activity A case 2activity A case 3activity A case 3activity D case 1activity B case 1activity H case 2activity C case 4activity A case 2activity H case 3activity E case 3activity F case 4activity D case 3activity G case 3activity H case 4activity F case 4activity E case 4activity G case 4activity H randomly generated initial individuals
Two individuals ACTIVITYINPUTOUTPUT A{}{{B,C,D}} B{{A}}{{H}} C{{A}}{{H}} D{{A}}{{E}} E{{D}}{{G}} F{}{{G}} G{{E},{F}}{{H}} H{{C,B,G}}{} ACTIVITYINPUTOUTPUT A{}{{B,C,D}} B{{A}}{{H}} C{{A}}{{H}} D{{A}}{{E,F}} E{{D}}{{G}} F{{D}}{{G}} G{{E},{F}}{{H}} H{{C},{B},{G}}{}
Crossover ACTIVITYINPUTOUTPUT A{}{{B,C,D}} B{{A}}{{H}} C{{A}}{{H}} D{{A}}{{E, F}} E{{D}}{{G}} F{{D}}{{G}} G{{E},{F}}{{H}} H{{C,B,G}}{} ACTIVITYINPUTOUTPUT A{}{{B,C,D}} B{{A}}{{H}} C{{A}}{{H}} D{{A}}{{E}} E{{D}}{{G}} F{}{{G}} G{{E},{F}}{{H}} H{{C},{B},{G}}{}
Resulting CM with fitness 1.0 trueAAADDE^FE^FBvCvG →ABCDEFGH A BvCvD B H C H D E^F E G F G G H H true
Mapping
ProM framework
ProM
Converter plug-in: Analyzer
XML format
ProM architecture
Mining plug-in: Alpha algorithm
Mining plug-in: Genetic Miner
Mining plug-in: Multi-phase mining
Step 1: Get instances
Step 2: Project
Step 3: Aggregate
Step 4: Map onto EPC
Step 5: Map onto Petri net (or other language)
Mining plug-in: Social network miner
Cliques
SN based on hand-over of work metric density of network is 0.225
SN based on working together (and ego network)
Analysis plug-in: LTL checker
Analysis plug-in: Conformance checker Do they agree?
Fitness is not enough
Screenshot (Also runs on Mac.)
Other analysis plug-ins
More demos?
Conclusion Process mining provides many interesting challenges for scientists, customers, users, managers, consultants, and tool developers. Involves multiple perspectives (process, data, resources, etc.) Get ProM-ed! You can contribute by applying ProM and developing plug-ins
More information