Han-na Yang. Trace Clustering in Process Mining. M. Song, C.W. Gunther, and W.M.P. van der Aalst.


Introduction □ The major application of process mining  Discovery ⇒ extraction of abstract process knowledge from event logs □ Real-life business processes are flexible  Spaghetti model  Single cases differ significantly from one another = ‘Diversity’  Discovering the actual process that is being executed is valuable. □ Solution for diversity of cases  Measure the similarity of cases and use this information to divide the set of cases into more homogeneous subsets.  Trace clustering

Running Example □ Repair process for products at an electronics company that makes navigation systems and mobile phones  Case: a specific row  Trace: the sequence of events within a case  Events: represented by the case identifier, activity identifier, and originator
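The case, trace, and event terms above can be illustrated with a minimal Python sketch. It assumes the log is available as a flat list of (case identifier, activity identifier, originator) tuples in execution order; the activity and originator names and the helper `build_traces` are hypothetical and not taken from the slides.

```python
from collections import defaultdict

# Hypothetical event log: (case identifier, activity identifier, originator).
# These values are invented for illustration only.
events = [
    ("case1", "Register", "Anna"),
    ("case2", "Register", "Anna"),
    ("case1", "Analyze Defect", "Ben"),
    ("case1", "Repair", "Carl"),
    ("case2", "Cancel Repair", "Ben"),
]


def build_traces(rows):
    """Group events by case identifier, keeping their order in the log."""
    traces = defaultdict(list)
    for case_id, activity, originator in rows:
        traces[case_id].append((activity, originator))
    return dict(traces)


traces = build_traces(events)
for case_id, trace in traces.items():
    print(case_id, trace)
```

Each dictionary value is one trace, i.e. the ordered sequence of events recorded for that case.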

Running Example □ [Figure: process variants labeled Navigation system, Mobile Phone, and Repair is Canceled] □ Trace clustering can support the identification of process variants corresponding to homogeneous subsets of cases

Trace profiles □ In the trace clustering approach, each case is characterized by a defined set of items, i.e., specific features which can be extracted from the corresponding trace. □ Items for comparing traces are organized in trace profiles, each addressing a specific perspective of the log

Trace profiles □ Information in Event log  WorkflowLog  group any number of process elements  ProcessInstance  a case  AuditTrailEntry  events  WorkflowModelElement  name of event  mandatory event attribute  EventType  identify lifecycle transitions  mandatory event attribute  Timestamp, Originator  optional data fields

Trace profiles □ Profile  A set of related items which describe the trace from a specific perspective □ Every item is a metric ⇒ we can consider a profile with n items to be a function, which assigns to a trace a vector (i_1, i_2, …, i_n) □ Profiling a log can be described as measuring a set of traces with a number of profiles, resulting in an aggregate vector  Resulting vectors can subsequently be used to calculate the distance between any two traces, using a distance metric
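As a rough illustration of profiling, the sketch below treats each profile as a function from a trace to item counts and concatenates the results into one aggregate vector per case. The two profiles (activity and originator counts), the sample traces, and the helper `profile_log` are assumptions chosen for the example, not the exact profiles defined by the authors.

```python
from collections import Counter

# Hypothetical traces: each event is an (activity, originator) pair.
traces = {
    "case1": [("Register", "Anna"), ("Repair", "Carl"), ("Repair", "Carl")],
    "case2": [("Register", "Anna"), ("Cancel Repair", "Ben")],
}


def activity_profile(trace):
    """One item per activity; the value is how often it occurs in the trace."""
    return Counter(activity for activity, _ in trace)


def originator_profile(trace):
    """One item per originator; the value is how often they appear in the trace."""
    return Counter(originator for _, originator in trace)


def profile_log(traces, profiles):
    """Measure every trace with every profile and build aggregate vectors."""
    per_case = {
        case: {p.__name__: p(trace) for p in profiles}
        for case, trace in traces.items()
    }
    # Fix one global item order so all vectors have the same length n.
    items = [
        (p.__name__, name)
        for p in profiles
        for name in sorted({n for c in per_case.values() for n in c[p.__name__]})
    ]
    vectors = {
        case: [counts[profile][name] for profile, name in items]
        for case, counts in per_case.items()
    }
    return items, vectors


items, vectors = profile_log(traces, [activity_profile, originator_profile])
print(items)
print(vectors)
```

Items that never occur in a trace get the value 0, so every case is described by a vector over the same n items.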

Trace profiles

Clustering Methods - Distance Measures □ Distance Measures  Used to calculate the similarity between cases □ Three distance measures  n: the number of items extracted from the process log  case c_j: corresponds to the vector (i_{j1}, i_{j2}, …, i_{jn})  i_{jk}: the number of appearances of item k in case j
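The transcript does not preserve the names or formulas of the three distance measures. The sketch below assumes they are the Euclidean, Hamming, and Jaccard distances, and uses common textbook forms (Hamming on item presence, Jaccard in its generalized Tanimoto form); the exact definitions in the original slides may differ.

```python
import math


def euclidean_distance(a, b):
    """Square root of the summed squared differences between item counts."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming_distance(a, b):
    """Assumed form: fraction of items present in exactly one of the two cases."""
    differing = sum(1 for x, y in zip(a, b) if (x > 0) != (y > 0))
    return differing / len(a)


def jaccard_distance(a, b):
    """Assumed form: one minus the generalized (Tanimoto) Jaccard similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return 0.0 if denom == 0 else 1 - dot / denom


# Hypothetical profile vectors for two cases c_j and c_k (n = 6 items).
c_j = [0, 1, 2, 1, 0, 2]
c_k = [1, 1, 0, 1, 1, 0]

print(euclidean_distance(c_j, c_k))
print(hamming_distance(c_j, c_k))
print(jaccard_distance(c_j, c_k))
```

All three functions take two vectors of equal length n, so they can be swapped freely in any of the clustering algorithms.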

Clustering Methods – Clustering Algorithm □ K-means clustering  A method of cluster analysis  Aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean. □ QT (quality threshold) clustering  A method of partitioning data  Invented for gene clustering  Requires more computing power than k-means  But does not require specifying the number of clusters a priori  Predictable: always returns the same result when run several times.  Guided by a quality threshold (determines the maximum diameter of clusters)
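To make the QT properties concrete (no preset number of clusters, deterministic output, a maximum cluster diameter), here is a minimal sketch of the usual greedy QT procedure. The function name `qt_cluster`, the sample points, and the use of Euclidean distance are assumptions for illustration; this is not the ProM implementation.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def qt_cluster(points, max_diameter, dist=euclidean):
    """Greedy quality-threshold clustering; returns lists of point indices."""
    remaining = list(range(len(points)))
    clusters = []
    while remaining:
        best = []
        for seed in remaining:
            # Grow a candidate cluster around each remaining point.
            candidate = [seed]
            others = [i for i in remaining if i != seed]
            while others:
                def grown(i):
                    # Largest distance from point i to the current candidate.
                    return max(dist(points[i], points[j]) for j in candidate)

                nxt = min(others, key=grown)
                if grown(nxt) > max_diameter:
                    break  # adding any point would exceed the threshold
                candidate.append(nxt)
                others.remove(nxt)
            if len(candidate) > len(best):
                best = candidate
        # Keep the largest candidate, remove its points, and repeat.
        clusters.append(sorted(best))
        remaining = [i for i in remaining if i not in best]
    return clusters


# Hypothetical 2-D points standing in for profile vectors.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (10, 10)]
print(qt_cluster(points, max_diameter=2.0))
```

Because every remaining point is tried as a seed in each round and there is no random initialization, repeated runs on the same input give the same clusters; the repeated candidate growth is also why the method costs more than k-means.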

Clustering Methods – Clustering Algorithm □ Agglomerative hierarchical clustering  Gradually generates clusters by merging the nearest traces  Smaller clusters are merged into larger ones  Example: we have six elements {a}, {b}, {c}, {d}, {e}, and {f}. The first step is to determine which elements to merge into a cluster. Usually, we want to take the two closest elements, according to the chosen distance.
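The six-element example can be traced with a small sketch. It assumes hypothetical one-dimensional positions for a to f and single linkage (the distance between two clusters is the distance between their closest members); the slide names neither the positions nor the linkage.

```python
def agglomerative(points, linkage=min):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [frozenset([name]) for name in sorted(points)]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(
                    abs(points[a] - points[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical positions for the six elements.
points = {"a": 1.0, "b": 2.0, "c": 6.0, "d": 7.5, "e": 8.0, "f": 14.0}
merges = agglomerative(points)
for left, right, distance in merges:
    print(left, "+", right, "at distance", distance)
```

With these positions the first merge joins d and e, the two closest elements, and the last merge attaches f to everything else. Passing `linkage=max` would switch the sketch to complete linkage.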

Clustering Methods – Clustering Algorithm □ The Self-Organizing Map (SOM)  Used to map high-dimensional data onto low-dimensional spaces  Groups similar cases close together in certain areas of the value range  Can be used to portray complex correlations in statistical data.  Example: World Bank statistics of countries in  39 indicators describing various quality-of-life factors were used  Countries that had similar values of the indicators are placed near each other on the map  Different clusters were automatically encoded with different bright colors  Each country was assigned a color describing its poverty type in relation to other countries  The poverty structures of the world: each country on the geographic map has been colored according to its poverty type.
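A very small SOM can be sketched in plain Python to show how vectors end up in grid cells. The grid size, learning-rate schedule, neighbourhood function, and sample vectors below are all assumptions picked for brevity; they are not the settings of any particular tool.

```python
import math
import random


def best_matching_unit(weights, vec):
    """Grid cell whose weight vector is closest to the input vector."""
    return min(weights, key=lambda cell: math.dist(weights[cell], vec))


def train_som(data, rows, cols, epochs=50, seed=0):
    """Train a rows x cols map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    dim = len(data[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    radius0 = max(rows, cols) / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        rate = 0.5 * (1 - frac)                    # learning rate decays to 0
        radius = max(radius0 * (1 - frac), 0.5)    # neighbourhood shrinks
        for vec in data:
            bmu = best_matching_unit(weights, vec)
            for cell, w in weights.items():
                grid_dist = math.dist(cell, bmu)
                influence = math.exp(-(grid_dist ** 2) / (2 * radius ** 2))
                for k in range(dim):
                    # Pull the cell's weights toward the input vector.
                    w[k] += rate * influence * (vec[k] - w[k])
    return weights


# Hypothetical normalized profile vectors for four cases.
data = [[0.0, 0.1, 0.9], [0.1, 0.0, 1.0], [0.9, 1.0, 0.0], [1.0, 0.9, 0.1]]
weights = train_som(data, rows=3, cols=3)
for vec in data:
    print(vec, "->", best_matching_unit(weights, vec))
```

Each case is assigned to the cell of its best matching unit, so cases that share a cell form one cluster and neighbouring cells hold similar cases.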

Case study □ ProM  Supports various process mining algorithms  The trace clustering plug-in was implemented in ProM □ Process log from AMC hospital in Amsterdam, Netherlands  619 gynecological oncology patients (treated in 2005, 2006) = 619 cases  52 diagnostic activities  3,574 events, 34 departments are involved

Case study □ Process model for all cases obtained using the Heuristics Miner

Case study □ The result obtained by applying the trace clustering plug-in in ProM □ Cases in the same cell belong to the same cluster  Cluster (1,2): 352 cases  Cluster (3,1): 113 cases

Case study □ Result for cluster (1,2)  352 cases (more than half of all cases)  Only 11 activities ⇒ Simple  Patients who were diagnosed by another hospital and referred to the AMC hospital for treatment

Case study □ Result for cluster (3,1)  113 cases  As complex as the original process model  Patients who have not yet been diagnosed and need more complex and detailed diagnostic activities

Conclusion □ Process mining techniques can deliver valuable, factual insights into how processes are being executed in real life  Important for analyzing flexible environments □ Trace clustering operates on the event log level  Can improve the results of any process mining algorithm □ Both the approach and implementation are straightforward to extend  Ex: By adding domain-specific profiles or further clustering algorithms