Published by Clare Dover. Modified over 9 years ago.
Trace Clustering in Process Mining
M. Song, C.W. Gunther, and W.M.P. van der Aalst
Han-na Yang
Introduction
□ The major application of process mining
  - Discovery ⇒ extraction of abstract process knowledge from event logs
□ Real-life business processes are flexible
  - Mining them tends to produce a "spaghetti" model
  - Single cases differ significantly from one another = "diversity"
  - Discovering the actual process that is being executed is valuable
□ Solution for the diversity of cases: trace clustering
  - Measure the similarity of cases and use that information to divide the set of cases into more homogeneous subsets
Running Example
□ Repair process of products within an electronics company that makes navigation systems and mobile phones
  - Case: a specific row of the log table
  - Trace: the sequence of events within a case
  - Event: represented by the case identifier, activity identifier, and originator
[Figure: example event log, labeled with case identifier, activity identifier, and originator]
Running Example
[Figure: process variants for navigation system, mobile phone, and canceled repair]
□ Trace clustering can support the identification of process variants corresponding to homogeneous subsets of cases
Trace profiles
□ In the trace clustering approach, each case is characterized by a defined set of items, i.e., specific features that can be extracted from the corresponding trace.
□ Items for comparing traces are organized in trace profiles, each addressing a specific perspective of the log.
Trace profiles
□ Information in the event log
  - WorkflowLog: groups any number of Process elements
  - ProcessInstance: a case
  - AuditTrailEntry: an event
  - WorkflowModelElement: name of the event (mandatory event attribute)
  - EventType: identifies lifecycle transitions (mandatory event attribute)
  - Timestamp, Originator: optional data fields
Trace profiles
□ Profile: a set of related items that describe the trace from a specific perspective
□ Every item is a metric ⇒ a profile with $n$ items can be considered a function that assigns to a trace a vector $(i_1, i_2, \ldots, i_n)$
□ Profiling a log can be described as measuring a set of traces with a number of profiles, resulting in an aggregate vector
  - The resulting vectors can subsequently be used to calculate the distance between any two traces, using a distance metric
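To make the idea of a profile as a function from a trace to a vector concrete, here is a minimal sketch. It assumes a log stored as a dictionary from case identifier to a list of event dictionaries; the attribute names `activity` and `originator`, the helper names, and the sample data are illustrative assumptions, not part of the original slides.

```python
from collections import Counter

def profile(trace, attribute, items):
    """Count how often each item of one perspective occurs in a trace."""
    counts = Counter(event[attribute] for event in trace)
    return [counts[item] for item in items]

def profile_log(log, attributes):
    """Measure every trace with several profiles and concatenate the results."""
    # The items of each profile are the distinct attribute values in the log.
    items = {
        attr: sorted({event[attr] for trace in log.values() for event in trace})
        for attr in attributes
    }
    return {
        case: [v for attr in attributes for v in profile(trace, attr, items[attr])]
        for case, trace in log.items()
    }

# Hypothetical two-case log, for illustration only.
log = {
    "case1": [
        {"activity": "A", "originator": "John"},
        {"activity": "B", "originator": "Mike"},
        {"activity": "A", "originator": "John"},
    ],
    "case2": [
        {"activity": "A", "originator": "John"},
        {"activity": "C", "originator": "Sue"},
    ],
}
vectors = profile_log(log, ["activity", "originator"])
```

Each resulting vector first lists the activity counts (A, B, C) and then the originator counts (John, Mike, Sue), so two cases can be compared position by position.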
Trace profiles
Clustering Methods - Distance Measures
□ Distance measures: used to calculate the similarity between cases
□ Three distance measures, with the following notation:
  - $n$: the number of items extracted from the process log
  - case $c_j$: corresponds to the vector $(i_{j1}, i_{j2}, \ldots, i_{jn})$
  - $i_{jk}$: the number of appearances of item $k$ in case $j$
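The formulas for the three measures are not part of this transcript. The sketch below assumes they are the Euclidean, Hamming, and Jaccard distances over the profile vectors; treat both that choice and the exact definitions (in particular the presence-based Hamming variant and the generalized Jaccard form) as assumptions rather than a reproduction of the slide.

```python
from math import sqrt

def euclidean(a, b):
    """Straight-line distance between two profile vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming(a, b):
    """Fraction of items that occur in exactly one of the two cases.

    Assumed variant: only presence or absence of an item matters,
    not how often it appears.
    """
    return sum((x > 0) != (y > 0) for x, y in zip(a, b)) / len(a)

def jaccard(a, b):
    """Generalized Jaccard distance: 1 - a.b / (|a|^2 + |b|^2 - a.b)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # Two all-zero vectors are treated as identical.
    return 1 - dot / denom if denom else 0.0
```

All three functions take two vectors of equal length $n$, such as the vectors of cases $c_j$ and $c_k$.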
Clustering Methods - Clustering Algorithm
□ K-means clustering
  - A method of cluster analysis that aims to partition n observations into k clusters, in which each observation belongs to the cluster with the nearest mean
□ QT (quality threshold) clustering
  - A method of partitioning data, originally invented for gene clustering
  - Requires more computing power than k-means, but does not require specifying the number of clusters a priori
  - Predictable: always returns the same result when run several times
  - Guided by a quality threshold, which determines the maximum diameter of clusters
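The QT procedure can be sketched as follows. This is a simplified, unoptimized illustration that assumes Euclidean distance between points; the function names and the tie-breaking by index are choices made here, not details taken from the slides.

```python
from math import dist

def diameter(points):
    """Largest pairwise distance within a group of points."""
    return max((dist(p, q) for p in points for q in points), default=0.0)

def qt_cluster(points, max_diameter):
    """Return clusters as lists of point indices, largest candidate first."""
    remaining = set(range(len(points)))
    clusters = []
    while remaining:
        best = []
        # Grow one candidate cluster around every remaining point.
        for seed in sorted(remaining):
            candidate = [seed]
            others = remaining - {seed}
            while others:
                # Add the point that keeps the candidate's diameter smallest.
                nxt = min(
                    sorted(others),
                    key=lambda i: diameter([points[j] for j in candidate + [i]]),
                )
                if diameter([points[j] for j in candidate + [nxt]]) > max_diameter:
                    break
                candidate.append(nxt)
                others.remove(nxt)
            if len(candidate) > len(best):
                best = candidate
        # Keep the largest candidate, remove its points, and repeat.
        clusters.append(sorted(best))
        remaining -= set(best)
    return clusters
```

No random initialization is involved, which is why repeated runs give the same clusters, and the number of clusters follows from `max_diameter` rather than being chosen up front.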
Clustering Methods - Clustering Algorithm
□ Agglomerative hierarchical clustering
  - Gradually generates clusters by merging the nearest traces
  - Smaller clusters are merged into larger ones
  - Example: we have six elements {a}, {b}, {c}, {d}, {e}, and {f}. The first step is to determine which elements to merge into a cluster. Usually, we want to take the two closest elements, according to the chosen distance.
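The six-element example can be sketched in code. The sketch assumes single linkage (the distance between two clusters is the distance between their closest members) and Euclidean distance; the slide fixes neither choice, and the coordinates below are made up for illustration.

```python
from math import dist

def single_linkage(points, a, b):
    """Distance between two clusters: their closest pair of members."""
    return min(dist(points[i], points[j]) for i in a for j in b)

def agglomerate(points, target_clusters):
    """Merge the two nearest clusters until target_clusters remain."""
    clusters = [[label] for label in points]  # every element starts alone
    merges = []
    while len(clusters) > max(1, target_clusters):
        d, x, y = min(
            (single_linkage(points, clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        merges.append((clusters[x], clusters[y], d))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return clusters, merges

# Hypothetical one-dimensional positions for the six elements.
points = {"a": (0.0,), "b": (1.0,), "c": (1.5,), "d": (5.0,), "e": (5.2,), "f": (9.0,)}
clusters, merges = agglomerate(points, 3)
```

With these positions the first merge joins d and e, the two closest elements, and stopping at three clusters leaves {a, b, c}, {d, e}, and {f}.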
Clustering Methods - Clustering Algorithm
□ The Self-Organizing Map (SOM)
  - Used to map high-dimensional data onto low-dimensional spaces
  - Groups similar cases close together in certain areas of the value range
  - Can be used to portray complex correlations in statistical data
□ Example: World Bank statistics of countries in 1992
  - 39 indicators describing various quality-of-life factors were used
  - Countries that had similar values of the indicators are placed near each other on the map
  - Different clusters were automatically encoded with different bright colors
  - Each country was assigned a color describing its poverty type in relation to other countries
[Figure: the poverty structures of the world; each country on the geographic map has been colored according to its poverty type]
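A minimal sketch of how a SOM places similar vectors in nearby cells is given below. It assumes a rectangular grid, a Gaussian neighborhood, and linearly decaying learning rate and radius; all parameter names and default values are illustrative choices, not settings taken from the slides or from ProM.

```python
import math
import random

def best_matching_unit(weights, x):
    """Grid cell whose weight vector is closest to the input vector x."""
    return min(weights, key=lambda cell: math.dist(weights[cell], x))

def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a rows x cols map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays
        sigma = max(sigma0 * (1 - frac), 0.3)    # neighborhood shrinks
        for x in data:
            winner = best_matching_unit(weights, x)
            for cell, w in weights.items():
                grid_d2 = (cell[0] - winner[0]) ** 2 + (cell[1] - winner[1]) ** 2
                h = math.exp(-grid_d2 / (2 * sigma ** 2))
                # Pull the winner and, more weakly, its grid neighbors toward x.
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights
```

After training, `best_matching_unit(weights, vector)` gives the cell a case falls into, and the cases that share a cell form one cluster.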
Case study
□ ProM
  - Supports various process mining algorithms
  - The trace clustering plug-in was implemented in ProM
□ Process log from the AMC hospital in Amsterdam, Netherlands
  - 619 gynecological oncology patients (treated in 2005 and 2006) = 619 cases
  - 52 diagnostic activities
  - 3,574 events
  - 34 departments are involved
Case study □ Process model for all cases obtained using the Heuristic Miner
Case study
□ The result obtained by applying the trace clustering plug-in in ProM
□ Cases in the same cell belong to the same cluster
  - Cluster (1,2): 352 cases
  - Cluster (3,1): 113 cases
Case study
□ Result for cluster (1,2)
  - 352 cases (more than half of all 619 cases)
  - Only 11 activities ⇒ simple
  - Patients who were diagnosed by another hospital and referred to the AMC hospital for treatment
Case study
□ Result for cluster (3,1)
  - 113 cases
  - As complex as the original process model
  - Patients who have not yet been diagnosed and need more complex and detailed diagnostic activities
Conclusion
□ Process mining techniques can deliver valuable, factual insights into how processes are being executed in real life
  - Important for analyzing flexible environments
□ Trace clustering operates on the event log level
  - It can improve the results of any process mining algorithm
□ Both the approach and the implementation are straightforward to extend
  - Example: by adding domain-specific profiles or further clustering algorithms