Published by Grant Sullivan. Modified over 9 years ago.
1
J. Carmona, R. Gavaldà
UPC (Barcelona, Spain)
2
Outline
- The Advent of Process Mining (PM)
- The Challenge of Concept Drift (CD)
- Key ingredients
- Online strategy for CD in PM
- Experiments
- Work in progress
3
The Advent of Process Mining
- Process mining: BIG DATA in Information Systems
- Focus: formal analysis of the processes
- Software Engineering challenges:
  - Process model alignment with reality
  - Automation!
  - Formal methods
4
[Figure. Source: www.processmining.org]
5
Example: control flow discovery
[Figure: an Information System produces the Event Log below, from which a Petri Net (PN) is discovered]

Event Log
Case  Event         Timestamp
1     reservation   21-02-2009 12:20h
1     arrival       22-02-2009 21:05h
2     reservation   23-02-2009 14:00h
1     payment       23-02-2009 14:50h
2     cancellation  23-02-2009 16:00h
6
Control Flow Discovery
[Figure: Event Log (EL) → Petri Net (PN) over the events r, p, ac, rj, ap, rs, c, sb, em, s]
7
The Challenge of Concept Drift
[Figure: timeline with a drift point. One model over the events r, p, ac, rj, ap, rs, c, sb, em, s holds for time ≤ t; a different model over the same events holds for time ≥ t + 1.]
8
The Challenge of Concept Drift [Bose-Aalst 11]
- Problem #1: Change Detection. “There is a drift in the previous log between traces 7 and 8”
- Problem #2: Change Localization and Characterization. “The activities involved in the drift are em and s, for which the causality has changed”
- Problem #3: Unravel Process Evolution. “In the new process, everything is the same but em and s, with em now preceding s”
DISCLAIMER: We focus on ABRUPT changes.
9
Outline
- The Advent of Process Mining (PM)
- Key ingredients:
  - Numerical Abstract Domains
  - Concept Drift estimation and change detection
- Online strategy for CD in PM
- Experiments
- Work in progress
10
From log traces to points in R^n
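A minimal sketch of this mapping, assuming the points are the Parikh vectors named on the Learning Stage slide and that every prefix of a trace contributes one point (the prefix choice, the function name `parikh_points`, and the three-event alphabet are illustrative assumptions, not taken from the slides):

```python
from collections import Counter

def parikh_points(trace, alphabet):
    """Return one point per non-empty prefix of `trace`.

    Each point counts how often every event of `alphabet` occurs in the
    prefix, so a log over n distinct events yields points in R^n.
    Assumption: all prefixes are used; the slides do not say so explicitly.
    """
    counts = Counter()
    points = []
    for event in trace:
        counts[event] += 1
        points.append(tuple(counts[a] for a in alphabet))
    return points

# Hypothetical example built from the event names of the sample log.
alphabet = ["reservation", "arrival", "payment"]
trace = ["reservation", "arrival", "payment"]
points = parikh_points(trace, alphabet)
print(points)
```

The order of `alphabet` fixes which coordinate belongs to which event, so it has to stay the same for the whole log.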
11
From points to convex polyhedra (Points2CP)
- Q = convex hull of the set of points
- mass(Q) = probability that a point in the log lies inside Q
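The two definitions can be illustrated in two dimensions. The sketch below is not the Points2CP procedure itself: it assumes planar points, uses Andrew's monotone chain algorithm for the hull, and estimates mass(Q) as the fraction of log points inside the hull. All function names are hypothetical.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def inside(hull, p):
    """True if p lies in the hull or on its boundary."""
    if len(hull) < 3:
        raise ValueError("sketch assumes at least three non-collinear points")
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))

def mass(hull, log_points):
    """Empirical probability that a log point falls inside the hull."""
    return sum(inside(hull, p) for p in log_points) / len(log_points)

sample = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
Q = convex_hull(sample)
log_points = [(1, 1), (0, 0), (3, 1), (1, 2)]
print(Q, mass(Q, log_points))
```

For points in R^n with n > 2 a general-purpose hull routine or a cheaper abstract domain would be needed; the planar case is used here only to keep the example self-contained.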
12
Outline
- The Advent of Process Mining (PM)
- Key ingredients:
  - Numerical Abstract Domains
  - Concept Drift estimation and change detection
- Online strategy for CD in PM
- Experiments
- Work in progress
13
Setting
- Data stream x_1, x_2, …, x_t, …
- x_t is drawn from distribution D_t, independently
- We model change as changes in the D_t’s
- Two basic problems:
  - Detect change (in the D_t)
  - Estimate some statistic (of the D_t), e.g., if x_t is a real number, estimate E[x_t]
- Estimation is only possible if the D_t do not vary too wildly
14
Windows & change detection
- Reference window + sliding window
- Min-error window + growing windows
- Sliding window: keep consistent, no explicit change detection
15
Windows & change detection
Problem: what size should the windows be?
- Large windows: slow reaction to fast changes
- Small windows: inaccurate estimates, noise sensitive, can’t detect small changes
- Optimal size depends on the unknown rate of change
- User needs to guess
- Or else: detect the rate from the stream?
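The slow reaction of large windows can be seen with a fixed-size sliding mean. This is a small illustration under assumed values (window sizes 10 and 80, a synthetic stream that jumps from 0 to 1 at item 100); the class name `SlidingMean` is hypothetical.

```python
from collections import deque

class SlidingMean:
    """Mean of the last `size` items of a stream."""

    def __init__(self, size):
        self.window = deque(maxlen=size)  # old items fall out automatically

    def add(self, x):
        self.window.append(x)
        return sum(self.window) / len(self.window)

# Synthetic stream with an abrupt change at position 100.
stream = [0.0] * 100 + [1.0] * 100

small, large = SlidingMean(10), SlidingMean(80)
for x in stream[:120]:  # stop 20 items after the change
    est_small = small.add(x)
    est_large = large.add(x)

print(est_small, est_large)
```

Twenty items after the change the small window already reports the new mean, while the large one still averages in sixty stale items. On a noisy stream the ranking reverses, which is the guessing problem the slide describes.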
16
ADWIN: Adaptive Window
- Time-scale independent, data-adaptive
- User does not need to guess the window size
- Behaves as if the “best fixed-window size” were known
- Keeps the largest window consistent with the statistical hypothesis “no change”
- Keeps a window of size N using O(log N) memory
- O(1) amortized time per item, O(log N) worst case
- C++/Java implementation by A. Bifet available [Bifet-G 07]
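The idea of keeping the largest window consistent with “no change” can be sketched as follows. This is a didactic simplification, not the available C++/Java implementation: it stores the window explicitly (O(N) memory instead of O(log N)), tests every split point, and uses the Hoeffding-style threshold as recalled from [Bifet-G 07]. The class name `SimpleADWIN` and the default `delta` are assumptions.

```python
import math

class SimpleADWIN:
    """ADWIN-style adaptive window over values in [0, 1] (naive version)."""

    def __init__(self, delta=0.01):
        self.delta = delta   # confidence parameter
        self.window = []     # oldest item first

    def add(self, x):
        """Append x; return True if old items were dropped (change)."""
        self.window.append(x)
        changed = False
        while self._some_split_differs():
            self.window.pop(0)  # drop the oldest item and re-test
            changed = True
        return changed

    def _some_split_differs(self):
        n = len(self.window)
        if n < 2:
            return False
        total = sum(self.window)
        log_term = math.log(4.0 * n / self.delta)  # ln(4 / (delta / n))
        head = 0.0
        for n0 in range(1, n):
            head += self.window[n0 - 1]
            n1 = n - n0
            m = n0 * n1 / n  # equals 1 / (1/n0 + 1/n1)
            eps_cut = math.sqrt(log_term / (2.0 * m))
            if abs(head / n0 - (total - head) / n1) > eps_cut:
                return True
        return False

    def estimate(self):
        return sum(self.window) / len(self.window) if self.window else 0.0

adwin = SimpleADWIN(delta=0.01)
bits = [1] * 200 + [0] * 200  # abrupt change at position 200
detections = [t for t, b in enumerate(bits) if adwin.add(b)]
print(detections[0], adwin.estimate())
```

No window size is chosen by the user: the window grows while the stream is stationary and shrinks once the newest items disagree with the older ones.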
17
Outline
- The Advent of Process Mining (PM)
- Key ingredients
- Online strategy for CD in PM
  - Strategy for change detection
  - Experiments
- Work in progress
18
Online Strategy for CD in PM
[Figure: the LOG is read as a sequence of points P1, P2, ..., P14, ... and split into three consecutive stages: Learning, Estimation, Monitoring.]
ONLINE CONCEPT DRIFT DETECTION by Sequential Sampling
19
Learning Stage
LOG → log Parikh vectors P1 ... PN → Points2CP → convex polyhedron Q
20
Estimation Stage
LOG → log Parikh vectors P(N+1) ... P(N+K) → inside Q? (Yes = 1, No = 0) → ADWIN → estimate of mass(Q)
21
Monitoring Stage
LOG → log Parikh vectors P(N+K+1) ... → inside Q? (Yes/No) → ADWIN → DRIFT!
22
Algorithm
Input: P1, P2, ... sequence of log points
 1. Select appropriate training size n
 2. S = “Collect a random sample of m points out of the first n”
 3. Q = Points2CP(S)
 4. W = InitADWIN
 5. i = m + 1
 6. repeat
 7.   if “Pi included in Q” then W = W ∪ {1}
 8.   else W = W ∪ {0}
 9.   i = i + 1
10. until “Convergence criteria on W estimation”
11. while true do
12.   update(Pi,Q,W)
13.   i = i + 1
14.   if “Drift detected on W” then “Emit Drift” and Jump to line 2
15. endwhile
Stages: Learning (lines 1-3), Estimating (lines 4-10), Monitoring (lines 11-15). update(Pi,Q,W) abbreviates lines 7-8.
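The three stages can be put together in a short runnable sketch. Several parts are stand-ins and should be read as assumptions: an axis-aligned bounding box replaces Points2CP, a naive ADWIN-style cut replaces the real ADWIN, the convergence criterion is a fixed number `k` of estimation points, estimation starts right after the `n` training points (as on the stage slides), and after a drift the model is relearned from the next `n` points. All function names are hypothetical.

```python
import math
import random

def learn_box(sample):
    """Stand-in for Points2CP: bounding box of the sample."""
    dims = range(len(sample[0]))
    return ([min(p[d] for p in sample) for d in dims],
            [max(p[d] for p in sample) for d in dims])

def included(p, box):
    lo, hi = box
    return all(l <= x <= h for x, l, h in zip(p, lo, hi))

def has_cut(window, delta):
    """True if some split of the window has significantly different means."""
    n, total, head = len(window), sum(window), 0
    log_term = math.log(4.0 * n / delta)
    for n0 in range(1, n):
        head += window[n0 - 1]
        n1 = n - n0
        eps = math.sqrt(log_term / (2.0 * n0 * n1 / n))
        if abs(head / n0 - (total - head) / n1) > eps:
            return True
    return False

def shrink_on_change(window, delta):
    changed = False
    while len(window) >= 2 and has_cut(window, delta):
        window.pop(0)
        changed = True
    return changed

def detect_drifts(points, n=50, m=30, k=50, delta=0.01, seed=0):
    rng = random.Random(seed)
    drifts, start = [], 0
    while start + n + k <= len(points):
        # Learning: sample m of the next n points and build Q.
        box = learn_box(rng.sample(points[start:start + n], m))
        window, i = [], start + n
        # Estimation: feed k inclusion bits (assumed convergence criterion).
        for _ in range(k):
            window.append(1 if included(points[i], box) else 0)
            i += 1
        # Monitoring: keep feeding bits until the detector reports a change.
        drift_at = None
        while i < len(points) and drift_at is None:
            window.append(1 if included(points[i], box) else 0)
            if shrink_on_change(window, delta):
                drift_at = i
            i += 1
        if drift_at is None:
            break
        drifts.append(drift_at)  # "Emit Drift", then relearn
        start = i
    return drifts

# Synthetic log: points cycle over a 2x2 grid, shifted by (5, 5) at index 300.
before = [(i % 2, (i // 2) % 2) for i in range(300)]
after = [(5 + i % 2, 5 + (i // 2) % 2) for i in range(300, 600)]
drifts = detect_drifts(before + after)
print(drifts)
```

In this synthetic log every pre-drift point falls inside the learned box and every post-drift point outside it, so the detector sees a stream of ones followed by zeros and reports the drift a few points after index 300.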
23
Experiments: setting
- Various models have been used to generate logs L = {L1, L2}, with L2 being the drifting part
- Drifts were created by perturbing the models:
  - Flip: the ordering between two events is reversed
  - Rem: one event is removed
  - Conc: two ordered events become concurrent
  - Conf: two ordered/concurrent events become conflicting
24
Experiments
Columns: bench, events, |L1|, FLIP, REM, CONC, CONF (digits after the events column run together in the source)
bench         events  remaining columns
ShRes(6)      24      4000 1155418337
ShRes(8)      32      4000 1657338183
PC(8)         41      4000 337550262266
PC(9)         46      4000 256136323489
WMG(9)        9       4000 101167516
WMG(10)       10      4000 147285318
Cycles(4,2)   14      4000 5632366422
Cycles(5,2)   20      4000 5542284521
A12F0N00      12      620837611715
A22F0N00      22      21323405699198
A32F0N00      32      24836779258162
A42F0N00      42      33081784118537
T32F0N00      33      37661432839436
25
Outline
- The Advent of Process Mining (PM)
- Key ingredients
- Online strategy for CD in PM
- Experiments
- Work in progress
  - Tackling other problems
26
Problem #2: Change Localization
In general: [Carmona-Cortadella 10]
27
Problem #2: Change Localization
[Figure: example over the events a, b, c]
28
Producer-Consumer example
EL → points in R^8
29
Producer-Consumer example
a + b ≤ e + 1
d ≤ b
c ≤ a
c ≤ a
e ≤ c + d
y ≤ x
y ≤ c + d
z ≤ y
x ≤ z + 1
30
Problem #2: Change Localization
ADWIN 1 ... ADWIN 8 monitor the inequalities, each through the stages Learning, Estimation, Monitoring:
a + b ≤ e + 1
d ≤ b
c ≤ a
c ≤ a
e ≤ c + d
y ≤ x
y ≤ c + d
z ≤ y
x ≤ z + 1
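A sketch of how per-inequality monitoring could localize a change: each inequality gets its own window of satisfaction bits, and the inequalities whose detectors fire name the events involved. It assumes a naive ADWIN-style cut in place of the real ADWIN, points given as dictionaries of event counts, and only three of the inequalities above. The names `localize` and `constraints` are hypothetical.

```python
import math

def has_cut(window, delta):
    """True if some split of the window has significantly different means."""
    n, total, head = len(window), sum(window), 0
    log_term = math.log(4.0 * n / delta)
    for n0 in range(1, n):
        head += window[n0 - 1]
        n1 = n - n0
        eps = math.sqrt(log_term / (2.0 * n0 * n1 / n))
        if abs(head / n0 - (total - head) / n1) > eps:
            return True
    return False

def shrink_on_change(window, delta):
    changed = False
    while len(window) >= 2 and has_cut(window, delta):
        window.pop(0)
        changed = True
    return changed

# Three inequalities of the producer-consumer example, as predicates.
constraints = {
    "d <= b": lambda p: p["d"] <= p["b"],
    "c <= a": lambda p: p["c"] <= p["a"],
    "z <= y": lambda p: p["z"] <= p["y"],
}

def localize(points, constraints, delta=0.01):
    """Map each inequality whose detector fired to the first firing index."""
    windows = {name: [] for name in constraints}
    fired = {}
    for i, p in enumerate(points):
        for name, holds in constraints.items():
            windows[name].append(1 if holds(p) else 0)
            if shrink_on_change(windows[name], delta) and name not in fired:
                fired[name] = i
    return fired

# Synthetic stream: from index 100 on, z exceeds y; everything else is unchanged.
ok = dict(a=1, b=1, c=1, d=1, y=1, z=1)
drifted = dict(ok, z=2)
fired = localize([ok] * 100 + [drifted] * 100, constraints)
print(fired)
```

Only the detector attached to `z <= y` fires in this run, which points at the events y and z as the ones whose relation changed.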
31
Problem #3: Unravel process evolution
Learning → Estimation → Monitoring
a + b ≤ e + 1, c ≤ a, c ≤ a, e ≤ c + d, y ≤ x, ...
DRIFT!
32
Problem #3: Unravel process evolution
Learning → Estimation → Monitoring
a + b ≤ e + 1, c ≤ a, c ≤ a, e ≤ c + d, y ≤ x, ...
New model: x + b ≤ y + 1, y ≤ z
33
Conclusions & Future Work
- First online algorithm for CD in PM
- Several uses: segmenting the log for later process discovery, drift detection, …
- Able to find the majority of drifts in practice
- Ideas to tackle gradual drift
- Promising results: fast detection of concept drifts, even with simple abstract numerical domains (octagons)
34
Thanks!
35
Backup slides
36
The Advent of Process Mining
Disciplines involved:
- Formal Methods and Models
- Algorithmics
- AI (e.g., Data Mining/Machine Learning)
- Information Systems
- Software Engineering
- Databases
- Business
- ...
37
Online Strategy for CD in PM
- Change Detection:
  - Visual description of the algorithm (1-2 slides)
  - Example (1-2 slides, with animation)
  - Formal description of the algorithm (1 slide)
  - Theorem enumeration on guarantees (1 slide)
  - Experiments (3-4 slides)
- More elaborate strategies (1 slide)
- Tackling the two other problems:
  - Change localization (1-2 slides)
  - Unraveling process evolution (1-2 slides)
38
Outline
- The Advent of Process Mining (PM)
- The Challenge of Concept Drift (CD)
- Key ingredients:
  - Process Discovery via Numerical Abstract Domains
  - Concept Drift estimation and change detection
- Online strategy for CD in PM
  - Strategy for change detection
  - Experiments
- Work in progress
  - More elaborate strategies
  - Tackling other problems
39
Process Discovery via Numerical Abstract Domains [Carmona & Cortadella, ECML/PKDD’2010]
- From log traces to points in R^n
- From points in R^n to convex polyhedra (Parikh2CP, used in this work)
- From convex polyhedra to inequalities
- From inequalities to Petri nets
40
From points to convex polyhedra
- Q = convex hull of the set of points
- mass(Q) = probability that a point in the log lies inside Q