Process Mining: Discovering processes from event logs All truths are easy to understand once they are discovered; the point is to discover them. Galileo Galilei ( ) Prof.dr.ir. Wil van der Aalst Eindhoven University of Technology Department of Information and Technology P.O. Box 513, 5600 MB Eindhoven The Netherlands
Outline BPM research in Eindhoven Process Mining ProM –Architecture –Process mining plug-ins Alpha-algorithm Multi-phase mining Genetic mining –Analysis plug-ins –Conformance testing plug-in –LTL checker plug-in –Social network plug-in –Convertors ( , Staffware, InConcert, SAP, etc.) Conclusion
BPM research in Eindhoven
Computer Science/information Systems in Eindhoven TU/e TM (Technology Management) MCS (Math. and Comp. Science) IS (Information Systems) CS (Computer Science) BPM (Business Process Management) Math. ICTA SE IS (Information Systems) BETA
Research within the BPM group Workflow management systems Process modeling (design and redesign) Process analysis (validation, verification, and performance analysis) Process mining Often based on Petri nets
Analysis and improvement of resource-constrained workflow processes Mariska Netjes
Product-driven process design Irene Vanderfeesten
Improving work distribution mechanisms in WFM systems Maja Pesic
Patterns in process-aware information systems Nataliya Mulyar
Genetic process mining Ana Karla Alves de Medeiros
Process mining in case handling systems Christian Günther
Process mining: a formal approach Boudwijn van Dongen
Verification of workflow processes Eric Verbeek
YAWL: Yet Another Workflow Language
Arthur ter Hofstede (patterns, YAWL def.) Marlon Dumas (BABEL, SOC) Lachlan Aldred (YAWL engine, comm. pats) Lindsay Bradford (YAWL designer) Tore Fjellheim (YAWL logging, timers) Guy Redding (YAWL forms) Nick Russell (work distribution/transactions) Moe Wynn (verification, optimization) Michael Adams (flexibility)
Process Mining
Motivation: Reversing the process Process mining can be used for: –Process discovery (What is the process?) –Delta analysis (Are we doing what was specified?) –Performance analysis (How can we improve?) process mining
Overview 1) basic performance metrics 2) process model3) organizational model4) social network 5) performance characteristics If …then … 6) auditing/security
Let us focus on mining process models … 1) basic performance metrics 2) process model3) organizational model4) social network 5) performance characteristics If …then … 6) auditing/security... and a very simple approach: The alpha algorithm
Process log Minimal information in log: case id’s and task id’s. Additional information: event type, time, resources, and data. In this log there are three possible sequences: –ABCD –ACBD –EF case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D
>, ,||,# relations Direct succession: x>y iff for some case x is directly followed by y. Causality: x y iff x>y and not y>x. Parallel: x||y iff x>y and y>x Choice: x#y iff not x>y and not y>x. case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D A>B A>C B>C B>D C>B C>D E>F ABACBDCDEFABACBDCDEF B||C C||B
Basic idea (1) xyxy
Basic idea (2) x y, x z, and y||z
Basic idea (3) x y, x z, and y#z
Basic idea (4) x z, y z, and x||y
Basic idea (5) x z, y z, and x#y
It is not that simple: Basic alpha algorithm Let W be a workflow log over T. (W) is defined as follows. 1.T W = { t T W t }, 2.T I = { t T W t = first( ) }, 3.T O = { t T W t = last( ) }, 4.X W = { (A,B) A T W B T W a A b B a W b a1,a2 A a 1 # W a 2 b1,b2 B b 1 # W b 2 }, 5.Y W = { (A,B) X (A,B) X A A B B (A,B) = (A,B) }, 6.P W = { p (A,B) (A,B) Y W } {i W,o W }, 7.F W = { (a,p (A,B) ) (A,B) Y W a A } { (p (A,B),b) (A,B) Y W b B } { (i W,t) t T I } { (t,o W ) t T O }, and (W) = (P W,T W,F W ). The alpha algorithm has been proven to be correct for a large class of free-choice nets.
Example case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D (W) W
Challenges Refining existing algorithm for (control-flow/process perspective) –Hidden tasks –Duplicate tasks –Non-free-choice constructs –Loops –Detecting concurrency (implicit or explicit) –Mining and exploiting time –Dealing with noise –Dealing with incompleteness Mining other perspectives (data, resources, roles, …) Gathering data from heterogeneous sources Visualization of results Delta analysis
DEMO Alpha algorithm 48 cases 16 performers
ProM framework
ProM
Converter plug-in: Analyzer
XML format
ProM architecture
Mining plug-in: Multi-phase mining
Step 1: Get instances
Step 2: Project
Step 3: Aggregate
Step 4: Map onto EPC
Step 5: Map onto Petri net (or other language)
Mining plug-in: Genetic Miner
Overview of approach
Analysis plug-in: Social network miner
Cliques
SN based on hand-over of work metric density of network is 0.225
SN based on working together (and ego network)
Analysis plug-in: LTL checker
Analysis plug-in: Conformance checker Do they agree?
Fitness is not enough
Screenshot (Also runs on Mac.)
Other analysis plug-ins
More demos?
Conclusion Process mining provides many interesting challenges for scientists, customers, users, managers, consultants, and tool developers. Involves multiple perspectives (process, data, resources, etc.) Get ProM-ed! You can contribute by applying ProM and developing plug-ins
More information