Formal Methods in Business Process Management Karsten Wolf
Introduction
What is a Business Process? Examples: Process an insurance claim Process an application for a loan Call for bids for a public construction project Customer complaint management Maintenance of a technical device in a power station Software development in an IT company Just-in-time logistics of an automotive supplier ...
What is a business process? A business process is a sequence of activities that Are logically correlated, Are self-contained, Use resources and incoming information, Are executed by humans or machines, Target a business goal
Traditional Company Business segment Procurement Production Billing Marketing
Modern Company Production specialist Billing specialist Marketing specialist Business segment Procurement specialist
Properties of a business process Is a phenomenon of real business life Can transcend functional units, hierarchies, or location A business process is characterized by A defined start and end, Required inputs (e.g. customer desires), Produced results (e.g. customer satisfaction)
Levels of business processes Primary: contribute directly to added value, core processes Planning, production, marketing, ... Secondary: support for primary processes Compliance with laws, utilisation of side products, human resources management, ... Tertiary: no contribution to added value Security, cleaning, ...
Why business processes? Exist independently of our recognition Conscious recognition (i.e. modelling) permits Analysis, optimisation, refactoring, automation Reaction to new market situations, new legal requirements, etc. Increased quality (ISO 9000) Better use of resources More complex product portfolio
Semi-formal Modeling
Why model? Processes exist. Conscious recognition permits Analysis (bottlenecks, inconsistencies) Reorganisation (e.g. after merger or take-over) Evaluation and certification (e.g. ISO 9000) Better cooperation (interplay between units) Better communication (Management - Development - Customer - Test - Marketing) More effective use of resources (e.g. Just-In-Time) Record keeping (e.g. for legal requirements such as Basel II, Sarbanes-Oxley)
Why model graphically? A picture tells more than 1000 words Complex relations Online: Show/hide details Intuitive recognition of icons
Elements of a process model Activities, e.g. Create offer Process invoice Accept offer Events, e.g. Received offer Credit-worthiness confirmed Causal dependencies, e.g. - before - after - concurrently - alternatively
Properties of a process model Event-driven, as opposed to clocked or time-triggered Discrete, as opposed to continuous or hybrid Resource-oriented (produce, consume), as opposed to value-oriented (read, write)
Modelling in BPMN Business Process Model and Notation Published 2004 Standard of OMG since 2006 (V 2.0 2011) Used in popular tools (e.g. SAP R/3, ARIS) Freely accessible in web browser: Signavio Academic Initiative Semi-formal (no mathematically precise semantics)
Core element: Activity
... Logically connected
... with complex activities
...with start and end
Alternative threads (data driven)
Different icons for start event general By incoming message At certain point in time Upon error ...
Different icons for end events general By escalation abort By sent message Through throwing an error
Joining alternative threads
Several actors
Concurrent activities
Data flow
Control flow
Repetition Loop (↺) Multiple times in parallel (III) Multiple times sequentially Ad hoc (~)
Events Roles: trigger, react Types: general, escalation, message, signal, link, business condition, time, error
Example
Transaction / Compensation
The OR-Join Problem Proceed or wait?
Formal Modeling
Petri net
Places: represent state, event, resource, …
Tokens: current state, activated event, present resource
Transitions: represent activities
Occurrence = token game
Definitions
Petri net $N = [S, T, F, W, m_0]$:
$S$: finite set of places (German: "Stellen")
$T$: finite set of transitions, $S \cap T = \emptyset$
$F$: set of arcs, $F \subseteq (S \times T) \cup (T \times S)$
$W$: arc weights, $W: F \to \mathbb{N} \setminus \{0\}$
$m_0$: initial marking
Elements of $S \cup T$ are called nodes.
Marking: distribution of tokens on places, $m: S \to \mathbb{N} \cup \{0\}$
Definitions
Pre-set of node $x$: ${}^\bullet x = \{\, y \mid [y,x] \in F \,\}$
Post-set of node $x$: $x^\bullet = \{\, y \mid [x,y] \in F \,\}$
Transition $t$ is enabled (has concession, is fireable) in marking $m$: for all $s \in {}^\bullet t$: $W([s,t]) \le m(s)$
Transition $t$ fires in $m$ and yields $m'$: $t$ is enabled in $m$ and, for all $s$: $m'(s) = m(s) - W([s,t]) + W([t,s])$ (assumption: $W([x,y]) = 0$ for $[x,y] \notin F$)
Notation: $m \, [t\rangle \, m'$ or $m \xrightarrow{t} m'$
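The enabling and firing rule can be sketched in a few lines of Python. This is only an illustration under assumed names: the dictionary encoding of W, the helper names and the tiny example net are hypothetical and not part of the slides.

```python
from typing import Dict, List, Tuple

Marking = Dict[str, int]
Weights = Dict[Tuple[str, str], int]  # arcs of F with their weights


def weight(W: Weights, x: str, y: str) -> int:
    """W([x,y]), taken as 0 when [x,y] is not an arc."""
    return W.get((x, y), 0)


def enabled(t: str, m: Marking, W: Weights, places: List[str]) -> bool:
    """t is enabled in m iff every place s holds at least W([s,t]) tokens."""
    return all(weight(W, s, t) <= m[s] for s in places)


def fire(t: str, m: Marking, W: Weights, places: List[str]) -> Marking:
    """Return m' with m'(s) = m(s) - W([s,t]) + W([t,s])."""
    if not enabled(t, m, W, places):
        raise ValueError(f"transition {t} is not enabled")
    return {s: m[s] - weight(W, s, t) + weight(W, t, s) for s in places}


# Hypothetical net: t consumes 2 tokens from s1 and produces 1 token on s2.
places = ["s1", "s2"]
W = {("s1", "t"): 2, ("t", "s2"): 1}
m0 = {"s1": 3, "s2": 0}
m1 = fire("t", m0, W, places)
```

Checking all places instead of only the pre-set gives the same result, because missing arcs are treated as weight 0.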
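Definitions
Reachability
... with a transition sequence $w$:
$m \, [\varepsilon\rangle \, m$ for the empty sequence $\varepsilon$
If $m \, [w\rangle \, m_1$ and $m_1 \, [t\rangle \, m'$, then $m \, [wt\rangle \, m'$
... with an arbitrary sequence: $m \, [*\rangle \, m'$ if there exists $w$ such that $m \, [w\rangle \, m'$
Set of markings reachable from $m$: $R_N(m) = \{\, m' \mid m \, [*\rangle \, m' \,\}$
Reachability graph of net $N = [S,T,F,W,m_0]$: directed, annotated graph $[V,E]$ with $V = R_N(m_0)$ and $[m,t,m'] \in E$ iff $m \, [t\rangle \, m'$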
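A reachability graph can be built by a breadth-first search over markings. The following is a minimal sketch, assuming the net is bounded; the function name, the tuple encoding of markings, the max_states guard and the example net are illustrative assumptions.

```python
from collections import deque


def reachability_graph(places, transitions, W, m0, max_states=10_000):
    """Return (V, E) with V the reachable markings and E the edges [m, t, m']."""
    def w(x, y):
        return W.get((x, y), 0)  # weight 0 for pairs that are not arcs

    start = tuple(m0[s] for s in places)
    V, E, queue = {start}, [], deque([start])
    while queue:
        m = queue.popleft()
        for t in transitions:
            if all(w(s, t) <= m[k] for k, s in enumerate(places)):
                m2 = tuple(m[k] - w(s, t) + w(t, s) for k, s in enumerate(places))
                E.append((m, t, m2))
                if m2 not in V:
                    if len(V) >= max_states:  # guard against unbounded nets
                        raise RuntimeError("state limit exceeded")
                    V.add(m2)
                    queue.append(m2)
    return V, E


# Hypothetical sequential net: i -> a -> p -> b -> o
places = ["i", "p", "o"]
transitions = ["a", "b"]
W = {("i", "a"): 1, ("a", "p"): 1, ("p", "b"): 1, ("b", "o"): 1}
V, E = reachability_graph(places, transitions, W, {"i": 1, "p": 0, "o": 0})
```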
Workflow nets
A workflow net
Is a Petri net $N = (S, T, F)$,
Has a distinguished start place with empty pre-set ($i \in S$, ${}^\bullet i = \emptyset$),
Has a distinguished end place with empty post-set ($o \in S$, $o^\bullet = \emptyset$), and
Is strongly connected when complemented with a transition $t^*$ that leads from $o$ back to $i$.
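The three structural conditions can be checked directly on the arc set. The sketch below assumes the places are literally named "i" and "o" and represents the added transition by the string "t*"; all names and the two example nets are hypothetical.

```python
def reach(start, arcs):
    """All nodes reachable from start along the given arcs."""
    succ = {}
    for x, y in arcs:
        succ.setdefault(x, []).append(y)
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in succ.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def is_workflow_net(S, T, F, i="i", o="o"):
    # The start place must have an empty pre-set, the end place an empty post-set.
    if any(y == i for _, y in F) or any(x == o for x, _ in F):
        return False
    # Complement the net with t* leading from o back to i.
    arcs = set(F) | {(o, "t*"), ("t*", i)}
    nodes = set(S) | set(T) | {"t*"}
    # Strongly connected iff every node is reachable from i and i from every node.
    forward = reach(i, arcs)
    backward = reach(i, {(y, x) for x, y in arcs})
    return forward == nodes and backward == nodes


seq_arcs = [("i", "a"), ("a", "p"), ("p", "b"), ("b", "o")]
ok = is_workflow_net(["i", "p", "o"], ["a", "b"], seq_arcs)
# Place q is filled by a but never emptied, so it cannot lead back to i.
dangling = is_workflow_net(["i", "p", "q", "o"], ["a", "b"], seq_arcs + [("a", "q")])
```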
Important property of workflow nets Soundness Rationale: Every started process instance can terminate At termination, everything is cleaned Every activity is possible
Soundness formally:
For all $m$ reachable from $[i]$: $[o]$ is reachable from $m$
For all $m$ such that $m(o) > 0$: $m(p) = 0$ for all $p \ne o$
For all $t$ there is an $m$, reachable from $[i]$, where $t$ is enabled
Link to standard properties of Petri nets: $N$ is sound iff the underlying Petri net ($N$ complemented with $t^*$, with initial marking $[i]$) is live and bounded.
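For a bounded workflow net, the three conditions can be checked on the explicit reachability graph. The sketch below is a naive illustration, not the method used by the tools named in these slides; it assumes a bounded net, places named "i" and "o", and a hypothetical dictionary encoding of W.

```python
from collections import deque


def explore(places, transitions, W, start):
    """Reachability graph from start; assumes the net is bounded."""
    def w(x, y):
        return W.get((x, y), 0)

    seen, edges, queue = {start}, [], deque([start])
    while queue:
        m = queue.popleft()
        for t in transitions:
            if all(w(s, t) <= m[k] for k, s in enumerate(places)):
                m2 = tuple(m[k] - w(s, t) + w(t, s) for k, s in enumerate(places))
                edges.append((m, t, m2))
                if m2 not in seen:
                    seen.add(m2)
                    queue.append(m2)
    return seen, edges


def is_sound(places, transitions, W, i="i", o="o"):
    start = tuple(1 if s == i else 0 for s in places)  # marking [i]
    final = tuple(1 if s == o else 0 for s in places)  # marking [o]
    V, E = explore(places, transitions, W, start)

    # 1. [o] is reachable from every reachable marking (backward search from [o]).
    preds = {}
    for m, _, m2 in E:
        preds.setdefault(m2, []).append(m)
    can_finish = {final} & V
    stack = list(can_finish)
    while stack:
        m = stack.pop()
        for p in preds.get(m, []):
            if p not in can_finish:
                can_finish.add(p)
                stack.append(p)
    option_to_complete = can_finish == V

    # 2. Whenever o is marked, all other places are empty.
    k_o = places.index(o)
    proper_completion = all(
        all(n == 0 for k, n in enumerate(m) if k != k_o)
        for m in V if m[k_o] > 0
    )

    # 3. Every transition is enabled in some reachable marking.
    no_dead_transitions = {t for _, t, _ in E} == set(transitions)
    return option_to_complete and proper_completion and no_dead_transitions


# Hypothetical nets: a plain sequence, and an AND-split closed by two competing joins.
seq = {("i", "a"): 1, ("a", "p"): 1, ("p", "b"): 1, ("b", "o"): 1}
bad = {("i", "t1"): 1, ("t1", "p1"): 1, ("t1", "p2"): 1,
       ("p1", "t2"): 1, ("t2", "o"): 1, ("p2", "t3"): 1, ("t3", "o"): 1}
sound_seq = is_sound(["i", "p", "o"], ["a", "b"], seq)
sound_bad = is_sound(["i", "p1", "p2", "o"], ["t1", "t2", "t3"], bad)
```

In the second net, firing t2 marks o while p2 still holds a token, so both the first and the second condition fail.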
Soundness as family of properties Soundness (as seen) Lazy soundness (is there a terminating execution?) Weak soundness (Every execution terminates, no garbage left) Relaxed soundness (Is every activity included in a terminating execution) K-Soundness (start with k initial tokens) Generalized soundness (k-sound for all k) Use general PN verification tools
Verification of liveness
Will use: strongly connected components (SCC)
Let $[V,E]$ be a directed graph. Vertices $v, v'$ are strongly connected ($v \sim v'$) if $v \to^* v'$ and $v' \to^* v$.
$\sim$ is an equivalence relation. Its classes are called SCCs.
An SCC is terminal (TSCC) if no other SCC is reachable from it.
Verification of Liveness
Let $N$ be a Petri net and $[R_N(m_0), E]$ its reachability graph.
$t$ is live iff $t$ appears in every TSCC.
[Figure: reachability graph with initial marking $m_0$ and edges labelled $t$]
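The TSCC criterion can be tried out on a small labelled graph. The sketch below finds terminal SCCs naively through pairwise reachability (an efficient implementation would use an algorithm such as Tarjan's); the function names and the example graph with string vertices are assumptions for illustration.

```python
def reach(v, succ):
    """All vertices reachable from v, including v itself."""
    seen, stack = {v}, [v]
    while stack:
        u = stack.pop()
        for w in succ.get(u, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def terminal_sccs(V, E):
    succ = {}
    for m, _, m2 in E:
        succ.setdefault(m, []).append(m2)
    R = {v: reach(v, succ) for v in V}
    # v lies in a terminal SCC iff everything reachable from v can reach v again;
    # in that case the terminal SCC is exactly R[v].
    return {frozenset(R[v]) for v in V if all(v in R[u] for u in R[v])}


def live_transitions(V, E, transitions):
    live = set(transitions)
    for C in terminal_sccs(V, E):
        # Edges that start in a terminal SCC also end in it.
        live &= {t for m, t, _ in E if m in C}
    return live


# Hypothetical reachability graph: A -t1-> B -t2-> C -t3-> B
V = {"A", "B", "C"}
E = [("A", "t1", "B"), ("B", "t2", "C"), ("C", "t3", "B")]
live = live_transitions(V, E, ["t1", "t2", "t3"])
```

Here t1 can never occur again once the terminal component {B, C} is entered, so it is not live.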
Boundedness
If $m_0 \, [*\rangle \, m \, [*\rangle \, m'$ and $m' > m$, then $N$ is unbounded.
The converse holds, too: if $N$ is unbounded, then there exist $m$ and $m'$ with $m_0 \, [*\rangle \, m \, [*\rangle \, m'$ and $m' > m$.
$m' > m$ means: $m'(s) \ge m(s)$ for all $s$, and $m'(s) > m(s)$ for at least one $s$.
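The criterion translates into a depth-first search that compares each new marking with its ancestors on the current path. This is a minimal recursive sketch under assumed names (it does not build the coverability graph with ω entries, and deep state spaces would need an iterative version).

```python
def is_bounded(places, transitions, W, m0):
    def w(x, y):
        return W.get((x, y), 0)

    def successors(m):
        for t in transitions:
            if all(w(s, t) <= m[k] for k, s in enumerate(places)):
                yield tuple(m[k] - w(s, t) + w(t, s) for k, s in enumerate(places))

    def strictly_covers(m2, m):
        return m2 != m and all(a >= b for a, b in zip(m2, m))

    visited = set()

    def dfs(m, path):
        visited.add(m)
        path.append(m)
        for m2 in successors(m):
            # Witness m0 [*> anc [*> m2 with m2 > anc: the net is unbounded.
            if any(strictly_covers(m2, anc) for anc in path):
                return False
            if m2 not in visited and not dfs(m2, path):
                return False
        path.pop()
        return True

    return dfs(tuple(m0[s] for s in places), [])


# Hypothetical nets: a bounded sequence, and a producer t that keeps filling q.
seq = {("i", "a"): 1, ("a", "p"): 1, ("p", "b"): 1, ("b", "o"): 1}
producer = {("p", "t"): 1, ("t", "p"): 1, ("t", "q"): 1}
bounded_seq = is_bounded(["i", "p", "o"], ["a", "b"], seq, {"i": 1, "p": 0, "o": 0})
bounded_producer = is_bounded(["p", "q"], ["t"], producer, {"p": 1, "q": 0})
```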
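Verification of Boundedness
[Figure: coverability graph with nodes (1,0,0,0), (0,1,0,0), (0,0,0,0), (1,0,ω,0), (0,1,ω,0), (0,0,ω,0), (0,1,ω,ω), (0,0,ω,ω) and edges labelled t1 to t4]
Simultaneously unbounded: $t_3^{2k} \, t_1 \, t_4^{k}$
Another example
[Figure: Petri net with places s1, s2, s3, transitions t1 to t4 and two arcs of weight 2; coverability graph with nodes (1,0,0), (0,1,0), (0,0,1), (0,ω,0), (0,0,ω)]
Both unbounded, but not simultaneously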
Results Benchmark: 800 process models from IBM Checking soundness Average time: 200 ms / model Longest time: 900 ms Used tools: ...., LoLA, .... Fast enough to be used interactively State space reduction methods Structural methods
State space reduction I
[Figure: full transition system with 28 states: every three-digit combination of 1, 2, 3 (111 to 333) plus 444]
Idea: delay concurrent transitions
[Figure: full transition system with states 111 to 333 plus 444]
Reduced transition system
[Figure: remaining states 111, 121, 122, 222, 223, 323, 333, 444]
Reduction by symmetry Idea: symmetrically structured systems show symmetric behaviour If the behaviour at m is known and m' is symmetric to m, we do not need to explore the behaviour at m' Formally: equivalence relation; quotient transition system
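A quotient transition system can be explored by mapping every state to a canonical representative of its equivalence class. The sketch below assumes a hypothetical system of three identical, independent components that each step through local states 1, 2, 3; sorting the tuple is a valid canonical form only because any permutation of identical components is a symmetry.

```python
from collections import deque


def explore(start, successors, canon=lambda s: s):
    """Breadth-first exploration that stores only canonical representatives."""
    start = canon(start)
    seen, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        for s2 in successors(s):
            c = canon(s2)
            if c not in seen:
                seen.add(c)
                queue.append(c)
    return seen


def successors(state):
    """Each component may independently advance from local state 1 to 2 to 3."""
    for k, local in enumerate(state):
        if local < 3:
            yield state[:k] + (local + 1,) + state[k + 1:]


full = explore((1, 1, 1), successors)
quotient = explore((1, 1, 1), successors, canon=lambda s: tuple(sorted(s)))
```

The full system has 27 states, the quotient only 10, one per multiset of local states.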
Example 1
Structural analysis: Place invariants
[Figure: net with Process 1 and Process 2, places s1 to s5, and transitions "enter cs" and "leave cs" for each process]
(0, 1, 1, 1, 0) is a place invariant = weights for places such that the weighted token sum is invariant
Can be easily computed by solving a system of equations
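Boundedness
If there is a place invariant $i$ such that $i(s) > 0$ and $i(s') \ge 0$ for all other $s'$, then $s$ is bounded.
Proof: $m$ reachable $\Rightarrow$ $i \cdot m = i \cdot m_0$ $\Rightarrow$ $i(s) \cdot m(s) \le i \cdot m = i \cdot m_0$ $\Rightarrow$ $m(s) \le i \cdot m_0 / i(s)$
[Figure: net with Process 1 and Process 2, places s1 to s5, transitions "Enter cs" and "Leave cs", and place invariant (1, 2, 1, 2, 1)]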
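A weight vector is a place invariant when it is orthogonal to the effect of every transition. The sketch below checks this and derives the bound from the proof. The arc structure is an assumption reconstructed from the slide labels (s2 and s4 as critical sections, s3 as shared lock, s1 and s5 as idle places); the transition names and the initial marking are hypothetical.

```python
def effect(t, places, W):
    """Token change caused by firing t: W([t,s]) - W([s,t]) for each place s."""
    return [W.get((t, s), 0) - W.get((s, t), 0) for s in places]


def is_place_invariant(i, places, transitions, W):
    """i is a place invariant iff the weighted token sum is unchanged by every t."""
    return all(
        sum(a * b for a, b in zip(i, effect(t, places, W))) == 0
        for t in transitions
    )


def token_bound(i, k, m0):
    """Upper bound for m(places[k]); m(s) <= i*m0 / i(s), rounded down."""
    if min(i) < 0 or i[k] <= 0:
        raise ValueError("need i >= 0 and i(s) > 0")
    return sum(a * b for a, b in zip(i, m0)) // i[k]


places = ["s1", "s2", "s3", "s4", "s5"]
transitions = ["enter1", "leave1", "enter2", "leave2"]
W = {
    ("s1", "enter1"): 1, ("s3", "enter1"): 1, ("enter1", "s2"): 1,
    ("s2", "leave1"): 1, ("leave1", "s1"): 1, ("leave1", "s3"): 1,
    ("s5", "enter2"): 1, ("s3", "enter2"): 1, ("enter2", "s4"): 1,
    ("s4", "leave2"): 1, ("leave2", "s5"): 1, ("leave2", "s3"): 1,
}
m0 = (1, 0, 1, 0, 1)  # assumed: both processes idle, lock available

inv_a = is_place_invariant((0, 1, 1, 1, 0), places, transitions, W)
inv_b = is_place_invariant((1, 2, 1, 2, 1), places, transitions, W)
bound_s2 = token_bound((1, 2, 1, 2, 1), 1, m0)
```

Under these assumptions both vectors from the slides are invariants, and the second one bounds every place because all its entries are positive.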
The verification tool LoLA Broadest set of state-of-the-art methods State space, structural and combined Has won several categories in the model checking contest Solves about 90% of the queries
Analysing quantitative aspects: Stochastic Petri nets Occurrence of a transition is interpreted as an event that obeys the laws of probability Probability of being in a marking Average throughput of transition Average token count on a place …
Stochastic process
Stochastic process = family $\{\, x(t) \mid t \in T \,\}$ of random variables
$T = \mathbb{N}$: discrete time
$T = [0, \infty)$: dense time
Domains of random variables countable: "chain"
Here: domain = reachable markings
Homogeneous Markov process = memoryless stochastic process:
$P(x(t) = k \mid x(t_1) = k_1, \dots, x(t_n) = k_n) = P(x(t) = k \mid x(t_n) = k_n)$ for $t_1 < \dots < t_n < t$
Transition probability
Discrete Markov chain: $q_{ij}(s) = P(x(n+s) = j \mid x(n) = i)$, matrix $Q(s)$
Thm: $q_{ij}(s) = \sum_k q_{ik}(m) \, q_{kj}(s-m)$ for $m < s$
Also: $Q(s) = Q(m) \, Q(s-m)$, hence $Q(s) = Q(1)^s$
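The identity Q(s) = Q(m) Q(s-m) can be checked numerically. The sketch below uses plain Python lists and a made-up two-state chain; the matrix entries and helper names are purely illustrative.

```python
import math


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def step_matrix(Q1, s):
    """Q(s) = Q(1)^s for s >= 1, by repeated multiplication."""
    Q = Q1
    for _ in range(s - 1):
        Q = matmul(Q, Q1)
    return Q


def close(A, B):
    return all(
        math.isclose(a, b, abs_tol=1e-12)
        for row_a, row_b in zip(A, B)
        for a, b in zip(row_a, row_b)
    )


Q1 = [[0.9, 0.1], [0.5, 0.5]]  # hypothetical one-step probabilities
Q2 = step_matrix(Q1, 2)
Q3 = step_matrix(Q1, 3)
Q5 = step_matrix(Q1, 5)
```

With these values, Q(5) agrees with Q(2) Q(3) up to rounding, and every row of Q(s) still sums to 1.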
Stochastic Petri nets
$[S, T, F, W, m_0, \lambda]$ with $\lambda: T \to \mathbb{R}$
$\lambda(t)$ = firing rate of $t$ = parameter of the negative exponential distribution that describes the firing delay of $t$
Motivation for the negative exponential distribution:
Memoryless
Markov theory applicable
Many other distributions can be approximated
Example
[Figure: stochastic Petri net with places s1 to s5 and transitions t1 to t5, annotated with the firing rates .75, 1, .4, 1.2, 4.3, 2.2; reachability graph with markings (1,0,0,0,0), (0,1,1,0,0), (0,0,1,1,0), (0,1,0,0,1), (0,0,0,1,1)]
Process Mining
Objective Models are built independently of the actual process Often, we have event logs that represent traces of actually running processes Idea: construct the process model automatically from the logs Goals More realistic processes Comparison to models that are built offline
Logs
Assumption: no noise
Log contains at least: case id and task ids
Additionally: type of event, time stamp, resource, data
case 1 : task A
case 2 : task A
case 3 : task A
case 3 : task B
case 1 : task B
case 1 : task C
case 2 : task C
case 4 : task A
case 2 : task B
case 2 : task D
case 5 : task E
case 4 : task C
case 1 : task D
case 3 : task C
case 3 : task D
case 4 : task B
case 5 : task F
case 4 : task D
In the example: 5 cases with 3 distinct traces: ABCD, ACBD, EF
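Turning the interleaved log into one trace per case is a simple grouping step. The sketch below encodes the example log as (case id, task) pairs; the function name is an assumption, and joining tasks into a string only works because task names are single letters here.

```python
from collections import defaultdict

log = [
    (1, "A"), (2, "A"), (3, "A"), (3, "B"), (1, "B"), (1, "C"),
    (2, "C"), (4, "A"), (2, "B"), (2, "D"), (5, "E"), (4, "C"),
    (1, "D"), (3, "C"), (3, "D"), (4, "B"), (5, "F"), (4, "D"),
]


def traces_by_case(log):
    """Group events by case id, keeping the order in which they were logged."""
    traces = defaultdict(list)
    for case_id, task in log:
        traces[case_id].append(task)
    return {case_id: "".join(tasks) for case_id, tasks in traces.items()}


traces = traces_by_case(log)
distinct = sorted(set(traces.values()))
```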
Oldest algorithm: α
Relations >, →, ||, #
Immediate sequence: x>y iff x is directly followed by y in at least one case
Causality: x→y iff x>y and not y>x
Parallel: x||y iff x>y and y>x
Unrelated: x#y iff neither x>y nor y>x
Example log: case 1 : task A, case 2 : task A, case 3 : task A, case 3 : task B, case 1 : task B, case 1 : task C, case 2 : task C, case 4 : task A, case 2 : task B, ...
Traces: ABCD, ACBD, EF
A>B A>C B>C B>D C>B C>D E>F
A→B A→C B→D C→D E→F
B||C C||B
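The four relations follow mechanically from the traces. This sketch covers only the relation-building step, not the construction of the Petri net; the function name and the representation of traces as strings are assumptions.

```python
from itertools import product


def relations(traces):
    """Compute the relations >, ->, || and # from a collection of traces."""
    tasks = sorted({x for trace in traces for x in trace})
    # x > y: x is directly followed by y in at least one trace
    direct = {(tr[k], tr[k + 1]) for tr in traces for k in range(len(tr) - 1)}
    # x -> y: x > y and not y > x
    causal = {(x, y) for (x, y) in direct if (y, x) not in direct}
    # x || y: x > y and y > x
    parallel = {(x, y) for (x, y) in direct if (y, x) in direct}
    # x # y: neither x > y nor y > x
    unrelated = {
        (x, y) for x, y in product(tasks, repeat=2)
        if (x, y) not in direct and (y, x) not in direct
    }
    return direct, causal, parallel, unrelated


direct, causal, parallel, unrelated = relations(["ABCD", "ACBD", "EF"])
```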
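Idea (1) x→y [Figure: x and y connected through a place]
Idea (2) x→y, x→z, and y||z [Figure: x starts y and z in parallel (AND-split)]
Idea (3) x→y, x→z, and y#z [Figure: x is followed by either y or z (XOR-split)]
Idea (4) x→z, y→z, and x||y [Figure: z waits for both x and y (AND-join)]
Idea (5) x→z, y→z, and x#y [Figure: z follows either x or y (XOR-join)]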
Example
Traces: ABCD, ACBD, EF
A→B A→C B→D C→D E→F
B||C C||B
Limitations
Loops of length 1: B→B would require B>B and not B>B (impossible!)
Loops of length 2: A>B and B>A implies A||B and B||A instead of A→B and B→A
Meanwhile Dozens of algorithms Take care of noise, resources, roles, social network, ... General problems Too detailed (fine granularity) versus too general (coarse granularity) Overfitting (only logged event sequences represented) versus underfitting (too many additional sequences)
Conformance checking Given model + event logs Question: To which degree can the model explain the event sequences? Approach: Match sequence to model Penalty for missing activity Penalty for surplus activity Minimize penalty „best alignment“
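The penalty idea can be illustrated with a strongly simplified setting in which the model is given as a finite set of allowed traces. That representation, the unit penalties and all names below are assumptions for this sketch; actual conformance checkers align the log against the behaviour of the model itself.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def alignment_cost(trace, model_trace, missing=1, surplus=1):
    """Penalty for activities missing from the log plus surplus logged activities."""
    matched = lcs_length(trace, model_trace)
    return missing * (len(model_trace) - matched) + surplus * (len(trace) - matched)


def best_alignment(trace, model_traces):
    """Cheapest (penalty, model trace) pair; ties are broken alphabetically."""
    return min((alignment_cost(trace, mt), mt) for mt in model_traces)


model_traces = ["ABCD", "ACBD", "EF"]  # hypothetical model behaviour
fits = best_alignment("ABCD", model_traces)
skipped = best_alignment("ABD", model_traces)
extra = best_alignment("ABXCD", model_traces)
```

A logged trace ABD is explained by ABCD with one missing activity, and ABXCD with one surplus activity, so both get penalty 1.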
Conclusion
[Figure: overview relating Business Process, Model, Workflow Management System and Event logs through Soundness Checking, Process Mining and Conformance Checking]