Discovering Social Networks from Enterprise Data Laks V.S. Lakshmanan Based on: Wil M.P. van der Aalst, Hajo A. Reijers, Minseok Song. Discovering Social.

Presentation transcript:

Discovering Social Networks from Enterprise Data Laks V.S. Lakshmanan Based on: Wil M.P. van der Aalst, Hajo A. Reijers, Minseok Song. Discovering Social Networks from Event Logs. Full version of paper in Business Process Management (BPM) 2004.

General Remarks
Discovering/mining an SN from (some) data vs. mining a given SN to extract some value (we're talking about the former here).
What kind of data:
◦ Can be
◦ Event log from a business process (this paper)
◦ Video capturing interactions
Event log: also called audit trail, history, transaction file.
Project opportunity here.

Model
Events = {(case, activity, person), ... ordered by time}.
Case (process instance) = “thing” being handled: e.g., customer order, job application, building permit, license application, loan application, insurance claim, etc.
Activity (task, operation, action, work-item) = some operation performed on the case by a person: e.g., contact customer, check credit rating, contact references, visit site, etc.
What do you look for?
◦ Is there a handover? (e.g., (c,a1,p1) → (c,a2,p2)).
◦ Does it happen often enough?
◦ What are the org roles of the persons involved?

Process Mining – A Related Area
Given an event log, mine a process model, which can be:
◦ A Petri net.
◦ A model with org/temporal/info/social aspects (most relevant to us).
Here is an example log and an example social graph we can extract right away, based on handover, i.e., “immediately followed by” on a case.

An example event log

case    activity    performer
case 1  activity A  John
case 2  activity A  John
case 3  activity A  Sue
case 3  activity B  Carol
case 1  activity B  Mike
case 1  activity C  John
case 2  activity C  Mike
case 4  activity A  Sue
case 2  activity B  John
case 2  activity D  Pete
case 5  activity A  Sue
case 4  activity C  Carol
case 1  activity D  Pete
case 3  activity C  Sue
case 3  activity D  Pete
case 4  activity B  Sue
case 5  activity E  Clare
case 5  activity D  Clare
case 4  activity D  Pete
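The handover ("immediately followed by") relation can be read off this log mechanically. The following is a minimal sketch, not the paper's implementation: it groups the rows by case and counts each pair of consecutive performers. The names `ROWS`, `traces_by_case` and `handover_counts` are illustrative, and keeping the self-loop Clare -> Clare is a modelling choice made here.

```python
from collections import Counter, defaultdict

# Rows of the example log, in log (time) order: (case, activity, performer).
ROWS = [
    (1, "A", "John"), (2, "A", "John"), (3, "A", "Sue"), (3, "B", "Carol"),
    (1, "B", "Mike"), (1, "C", "John"), (2, "C", "Mike"), (4, "A", "Sue"),
    (2, "B", "John"), (2, "D", "Pete"), (5, "A", "Sue"), (4, "C", "Carol"),
    (1, "D", "Pete"), (3, "C", "Sue"), (3, "D", "Pete"), (4, "B", "Sue"),
    (5, "E", "Clare"), (5, "D", "Clare"), (4, "D", "Pete"),
]

def traces_by_case(rows):
    """Group rows by case, preserving the log order within each case."""
    traces = defaultdict(list)
    for case, activity, performer in rows:
        traces[case].append((activity, performer))
    return dict(traces)

def handover_counts(traces):
    """Count how often performer p is immediately followed by q on a case."""
    counts = Counter()
    for events in traces.values():
        for (_, p), (_, q) in zip(events, events[1:]):
            counts[(p, q)] += 1
    return counts

traces = traces_by_case(ROWS)
edges = handover_counts(traces)
for (p, q), n in sorted(edges.items()):
    print(f"{p} -> {q}: {n}")
```

Each key of `edges` is one arc of a candidate social graph, and the count can serve as an arc weight or as input to a frequency threshold.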

Social Graph Mined
[Figure: mined social graph over the nodes John, Sue, Mike, Carol, Pete, Clare]
The mined graph could be enhanced with (relative) frequencies and handoff delays [not mentioned in paper].
SNA can be done on it: who are the “power centers”? Which are the cliques? Can we (the org) enable other interactions to improve efficiency? ...
This graph need not be viewed purely conjunctively (my thoughts). E.g., “Jack always hands over to Jane or Peter, depending on activity type or case type” (assuming meta-data on both). Most interesting case: when meta-data + timing info is available.

Some challenges
(In)completeness: the log may not exhibit all possible orders (when there is concurrency in the underlying model); rare occurrences and exceptions (both + and –) should be handled with care.
Noise: data could be missing and/or erroneous.
Legal issues: these affect the quality/utility/granularity of the data available, if any is available at all.

Social Network Analysis
Here are some interesting measurements one can make on the mined SN.
Convention: distance for us = length of a geodesic (shortest path), unless otherwise stated; d_uv = distance between u and v.
What is the density of the whole graph (sociocentric) or of a person's neighborhood network (egocentric)? density = #edges / #possible edges.
What is the diameter?
What is the average distance of v to other nodes?
What proportion of the geodesics between other node pairs passes through v?
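As a rough illustration of these measurements, the sketch below computes density, diameter and average distance with breadth-first search. It uses the handover arcs of the example log, treated as undirected so that every pair of nodes is reachable; that simplification, and all function names, are assumptions of this sketch rather than anything stated on the slides.

```python
from collections import deque

# Handover arcs from the example log, treated as undirected edges here
# (an assumption of this sketch; the self-loop on Clare is dropped).
EDGES = [("John", "Mike"), ("John", "Pete"), ("Sue", "Carol"),
         ("Sue", "Pete"), ("Sue", "Clare")]

def adjacency(edges):
    """Build an undirected adjacency map from a list of edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def distances_from(adj, source):
    """Geodesic distance from source to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def density(adj):
    """#edges divided by the number of possible undirected edges."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return m / (n * (n - 1) / 2)

def diameter(adj):
    """Longest geodesic in the graph (assumes the graph is connected)."""
    return max(max(distances_from(adj, u).values()) for u in adj)

def average_distance(adj, v):
    """Mean geodesic distance from v to all other nodes (connected graph)."""
    dist = distances_from(adj, v)
    return sum(d for u, d in dist.items() if u != v) / (len(adj) - 1)

adj = adjacency(EDGES)
print(density(adj), diameter(adj), average_distance(adj, "Pete"))
```

For the egocentric variant, the same `density` function could be applied to the subgraph induced by one person and their neighbors.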

SNA (contd.)
Bavelas-Leavitt index of centrality of node u: BL(u) = ∑_{v,w} d_vw / ∑_{v,w} (d_vu + d_uw).
◦ Captures how much a shortest route through u stretches an “average” geodesic.
The paper doesn't say this, but it makes more sense with “v ≠ u ≠ w, v ≠ w”. Will assume this below.
Example (figure not preserved; node 1 of the example graph): BL(1) = ( )/(6×2) = 9/12.
Closeness(u) = 1 / ∑_v d_vu.
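The two indices can be checked numerically. The slide's example figure is not preserved, so the sketch below uses a hypothetical 4-node directed graph, chosen only because it reproduces the 9/12 quoted on the slide: node 1 is linked both ways to nodes 2, 3 and 4, which in turn form a one-way cycle. The sums run over ordered pairs with v ≠ u, w ≠ u, v ≠ w, as assumed above.

```python
from collections import deque
from fractions import Fraction
from itertools import permutations

# Hypothetical directed graph (NOT the slide's figure, which is lost):
# 1 <-> 2, 1 <-> 3, 1 <-> 4, plus the one-way cycle 2 -> 3 -> 4 -> 2.
ARCS = [(1, 2), (2, 1), (1, 3), (3, 1), (1, 4), (4, 1),
        (2, 3), (3, 4), (4, 2)]

def all_distances(arcs):
    """dist[s][t] = geodesic distance from s to t, via BFS from each node."""
    succ = {}
    for u, v in arcs:
        succ.setdefault(u, set()).add(v)
        succ.setdefault(v, set())
    dist = {}
    for s in succ:
        d = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in succ[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist[s] = d
    return dist

def bavelas_leavitt(dist, u):
    """Sum of d_vw divided by sum of (d_vu + d_uw), over v != u != w, v != w.

    Assumes the graph is strongly connected, so every distance exists.
    """
    others = [v for v in dist if v != u]
    pairs = list(permutations(others, 2))
    direct = sum(dist[v][w] for v, w in pairs)
    via_u = sum(dist[v][u] + dist[u][w] for v, w in pairs)
    return Fraction(direct, via_u)

def closeness(dist, u):
    """1 over the sum of distances from every other node to u."""
    return Fraction(1, sum(dist[v][u] for v in dist if v != u))

dist = all_distances(ARCS)
print(bavelas_leavitt(dist, 1))  # 3/4, i.e. 9/12
print(closeness(dist, 1))        # 1/3
```

Here the six ordered pairs among nodes 2, 3, 4 have distances 1, 1, 1, 2, 2, 2 (sum 9), while every route through node 1 has length 2 (sum 6×2 = 12).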

Digression into process mining
Efficient algorithms exist for mining a process model from an event log.
Such a model can reveal whether true causality exists between activities. E.g., it can say: process = A, followed by either B and C in any order or E, then followed by D.
Note: will use this later for discriminating between causal and non-causal transfers of work.
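The slides do not name a particular mining algorithm. As one possible illustration, the sketch below derives α-algorithm-style ordering relations from the activity sequences of the example log: a directly-follows relation, a causal relation (a is followed by b but never the reverse) and a parallel relation (both orders occur). Treating these relations as the notion of "causality" is an assumption of this sketch.

```python
# Activity sequences of the five cases in the example log (cases 1 to 5).
TRACES = ["ABCD", "ACBD", "ABCD", "ACBD", "AED"]

def directly_follows(traces):
    """All pairs (a, b) such that b immediately follows a in some trace."""
    return {(t[i], t[i + 1]) for t in traces for i in range(len(t) - 1)}

def causal(traces):
    """Pairs (a, b) where a is directly followed by b, but never b by a."""
    df = directly_follows(traces)
    return {(a, b) for (a, b) in df if (b, a) not in df}

def parallel(traces):
    """Pairs seen in both orders, i.e. candidates for concurrency."""
    df = directly_follows(traces)
    return {(a, b) for (a, b) in df if (b, a) in df}

print(sorted(causal(TRACES)))
print(sorted(parallel(TRACES)))
```

On this log, B and C come out as parallel and A, E, D as causally ordered around them, which is consistent with the process described on the slide.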

A slightly more general def. of event log
Let A be a set of activities and P a set of performers.
E = A × P is the set of (possible) events, i.e., combinations of an activity and a performer ((a, p) denotes the execution of activity a by performer p).
C = E* is the set of possible event sequences (traces describing a case).
L ∈ B(C) is an event log, where B(C) is the set of all bags (multi-sets) over C.
How does this def. abstract actual event logs?
Notation: π_a(e) = a and π_p(e) = p for event e = (a, p).
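One way to make this definition concrete is to represent a trace as a tuple of (activity, performer) pairs and the log as a `collections.Counter`, which behaves like a bag. The sketch below does this for the example log; the names `CASES`, `pi_a` and `pi_p` are illustrative. It also shows one visible effect of the abstraction: case identifiers and timestamps disappear, and identical traces merge into one entry with a multiplicity.

```python
from collections import Counter

# Traces of the example log, keyed by case id (ids are dropped in the bag).
CASES = {
    1: (("A", "John"), ("B", "Mike"), ("C", "John"), ("D", "Pete")),
    2: (("A", "John"), ("C", "Mike"), ("B", "John"), ("D", "Pete")),
    3: (("A", "Sue"), ("B", "Carol"), ("C", "Sue"), ("D", "Pete")),
    4: (("A", "Sue"), ("C", "Carol"), ("B", "Sue"), ("D", "Pete")),
    5: (("A", "Sue"), ("E", "Clare"), ("D", "Clare")),
}

def pi_a(event):
    """Projection of an event (a, p) onto its activity a."""
    return event[0]

def pi_p(event):
    """Projection of an event (a, p) onto its performer p."""
    return event[1]

# The event log L as a bag (multi-set) of traces.
log = Counter(CASES.values())

# Projecting every event onto its activity merges traces that differ
# only in who performed the work.
activity_bag = Counter(tuple(pi_a(e) for e in trace) for trace in log.elements())

print(len(log), sum(log.values()))
print(activity_bag)
```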

Mining SN from an event log
Can use a mining algorithm analogous to frequent itemset mining or, more specifically, episode mining (to be overviewed soon).
The key is choosing the right metric for filtering arcs.
Some metrics look at just the transfer of work; some insist on causality (which needs knowledge of the process).

Metrics based on (possible) causality
Direct and indirect succession (direct is a special case): e.g., John-1->Mike, John-3->Pete.
With or without checking causality: e.g., Mike=1=>John is false and Mike=2=>Pete is true (taking causality into account).
Boolean vs. count version: |John-1->Mike| = 2; |Mike=2=>Pete| = 2. (Verify using the log table.)
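The counts on this slide can be verified against the log table with a few lines of code. The sketch below counts successions at distance n, optionally requiring that the two activities form a causal pair. The set `CAUSAL` is an assumed input describing the example process (B and C are concurrent, so neither order is listed), and checking causality by simple membership in that set is a simplification that may differ from the paper's exact definition.

```python
# Example log: case id -> ordered list of (activity, performer).
LOG = {
    1: [("A", "John"), ("B", "Mike"), ("C", "John"), ("D", "Pete")],
    2: [("A", "John"), ("C", "Mike"), ("B", "John"), ("D", "Pete")],
    3: [("A", "Sue"), ("B", "Carol"), ("C", "Sue"), ("D", "Pete")],
    4: [("A", "Sue"), ("C", "Carol"), ("B", "Sue"), ("D", "Pete")],
    5: [("A", "Sue"), ("E", "Clare"), ("D", "Clare")],
}

# Assumed causal activity pairs for the example process.
CAUSAL = {("A", "B"), ("A", "C"), ("A", "E"),
          ("B", "D"), ("C", "D"), ("E", "D")}

def succession_count(log, p, q, n=1, causal=None):
    """Count positions i where p acts at i and q acts at i+n on one case.

    If `causal` is given, the activities at i and i+n must also be a
    causal pair; otherwise any transfer of work is counted.
    """
    total = 0
    for events in log.values():
        for (a1, p1), (a2, p2) in zip(events, events[n:]):
            if p1 == p and p2 == q and (causal is None or (a1, a2) in causal):
                total += 1
    return total

print(succession_count(LOG, "John", "Mike", n=1))                  # 2
print(succession_count(LOG, "Mike", "Pete", n=2, causal=CAUSAL))   # 2
print(succession_count(LOG, "Mike", "John", n=1, causal=CAUSAL))   # 0
```

The Boolean version of each metric is simply whether the count is greater than zero.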

Metrics based on work transfer
p-X->q = #times p transferred work to q / total #possible such times: e.g., John-X->Mike = 2/( ).
p-.X->q = #cases in which p transferred work to q at least once / length of log (number of cases).
p-βX->q = same as p-X->q, except that longer successions (length n) are penalized by β^(n-1), where 0 < β < 1.
p-β.X->q = same as p-βX->q, except that only distinct successions within each case are counted.
β is the “causality fall factor”.
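A minimal sketch of the first three metrics on the example log follows. The denominator of the John-X->Mike example is not preserved on the slide; this sketch assumes that "possible" means every pair of consecutive events on a case, which gives 14 for this log. The normalisation used in the β-weighted version (weighting every possible pair at distance n by β^(n-1)) is likewise an assumption of the sketch, and the "distinct successions" variant is omitted.

```python
from fractions import Fraction

# Example log: case id -> ordered list of (activity, performer).
LOG = {
    1: [("A", "John"), ("B", "Mike"), ("C", "John"), ("D", "Pete")],
    2: [("A", "John"), ("C", "Mike"), ("B", "John"), ("D", "Pete")],
    3: [("A", "Sue"), ("B", "Carol"), ("C", "Sue"), ("D", "Pete")],
    4: [("A", "Sue"), ("C", "Carol"), ("B", "Sue"), ("D", "Pete")],
    5: [("A", "Sue"), ("E", "Clare"), ("D", "Clare")],
}

def performers(events):
    return [who for _, who in events]

def hits(seq, p, q, n):
    """Number of positions i with seq[i] == p and seq[i+n] == q."""
    return sum(1 for a, b in zip(seq, seq[n:]) if (a, b) == (p, q))

def handover_ratio(log, p, q):
    """Direct handovers p -> q over all consecutive pairs in the log."""
    num = sum(hits(performers(ev), p, q, 1) for ev in log.values())
    den = sum(len(ev) - 1 for ev in log.values())
    return Fraction(num, den)

def case_ratio(log, p, q):
    """Share of cases with at least one direct handover p -> q."""
    num = sum(1 for ev in log.values() if hits(performers(ev), p, q, 1) > 0)
    return Fraction(num, len(log))

def beta_ratio(log, p, q, beta):
    """Successions at every distance n, each weighted by beta ** (n - 1)."""
    num = den = 0.0
    for ev in log.values():
        seq = performers(ev)
        for n in range(1, len(seq)):
            weight = beta ** (n - 1)
            num += weight * hits(seq, p, q, n)
            den += weight * (len(seq) - n)
    return num / den

print(handover_ratio(LOG, "John", "Mike"))   # 1/7, i.e. 2/14
print(case_ratio(LOG, "John", "Mike"))       # 2/5
print(beta_ratio(LOG, "John", "Mike", 0.5))
```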

“In between” metrics
p-◊n->q = p did some action at i and some other at i+n, and q did some action at j, with i < j < i+n.
||p-◊2->q|| = total #times a “◊2-in-between” occurred between p and q / total #possible such occurrences.
We can inject causality into this.
We can introduce the causality fall factor β here too.
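The numerator of this metric can be sketched as follows, reading the definition literally: p acts at positions i and i+n of a case, and q acts somewhere strictly in between. Only the raw count is computed; the slide does not spell out what "total #possible such occurrences" is, so the normalisation is left out rather than guessed. The function name is illustrative.

```python
# Example log: case id -> ordered list of (activity, performer).
LOG = {
    1: [("A", "John"), ("B", "Mike"), ("C", "John"), ("D", "Pete")],
    2: [("A", "John"), ("C", "Mike"), ("B", "John"), ("D", "Pete")],
    3: [("A", "Sue"), ("B", "Carol"), ("C", "Sue"), ("D", "Pete")],
    4: [("A", "Sue"), ("C", "Carol"), ("B", "Sue"), ("D", "Pete")],
    5: [("A", "Sue"), ("E", "Clare"), ("D", "Clare")],
}

def in_between_count(log, p, q, n=2):
    """Count positions i where p acts at i and at i+n, with q in between."""
    total = 0
    for events in log.values():
        who = [performer for _, performer in events]
        for i in range(len(who) - n):
            if who[i] == p and who[i + n] == p and q in who[i + 1:i + n]:
                total += 1
    return total

print(in_between_count(LOG, "John", "Mike"))  # 2
print(in_between_count(LOG, "Sue", "Carol"))  # 2
print(in_between_count(LOG, "Mike", "John"))  # 0
```

In the example log, Mike sits between two of John's actions on cases 1 and 2, and Carol between two of Sue's on cases 3 and 4.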

Working together metric
p c q = p and q each do some action (not necessarily the same) for case c.
Then p L q = #cases on which they worked together / #cases on which p worked (does that remind you of some familiar measure?). E.g., John L Pete = 2/2 whereas Pete L John = 2/4.
Can compute a users × actions matrix M with M[u,a] = #times u did a (e.g.). Then use the row vectors (users) to define similarity (similar to what we will do in RecSys!).
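Both ideas on this slide are easy to try on the example log. The sketch below computes the asymmetric working-together ratio (which has the same shape as the confidence of an association rule) and, as one possible similarity over the rows of M, the cosine between two users' activity-count vectors. Cosine is a choice made for this sketch; the slide does not fix a similarity measure.

```python
import math
from collections import Counter
from fractions import Fraction

# Example log: case id -> ordered list of (activity, performer).
LOG = {
    1: [("A", "John"), ("B", "Mike"), ("C", "John"), ("D", "Pete")],
    2: [("A", "John"), ("C", "Mike"), ("B", "John"), ("D", "Pete")],
    3: [("A", "Sue"), ("B", "Carol"), ("C", "Sue"), ("D", "Pete")],
    4: [("A", "Sue"), ("C", "Carol"), ("B", "Sue"), ("D", "Pete")],
    5: [("A", "Sue"), ("E", "Clare"), ("D", "Clare")],
}

def cases_of(log, person):
    """Set of cases on which the person performed at least one activity."""
    return {case for case, events in log.items()
            if any(who == person for _, who in events)}

def working_together(log, p, q):
    """#cases p and q both worked on, divided by #cases p worked on."""
    p_cases = cases_of(log, p)
    return Fraction(len(p_cases & cases_of(log, q)), len(p_cases))

def activity_profile(log, person):
    """Row M[person, .]: how many times the person did each activity."""
    return Counter(act for events in log.values()
                   for act, who in events if who == person)

def cosine(u, v):
    """Cosine similarity between two activity-count vectors."""
    dot = sum(u[a] * v[a] for a in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

print(working_together(LOG, "John", "Pete"))  # 1, i.e. 2/2
print(working_together(LOG, "Pete", "John"))  # 1/2, i.e. 2/4
print(cosine(activity_profile(LOG, "Mike"), activity_profile(LOG, "Carol")))
```

Mike and Carol never share a case, yet their activity profiles are identical (one B and one C each), so the two metrics capture different kinds of relatedness.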

Patterns Found
Case study conducted on the Dutch national public works dept., which is responsible for road and water infrastructure.
◦ 17 activities, 4,988 cases, 33,603 lines of log, and 43 employees (users).

SN based on handover metrics
43 nodes, 406 edges, density= . Can conduct SNA on this graph.
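The density value itself is missing from this slide, and the sketch below does not restore it. It only shows what the earlier formula (density = #edges / #possible edges) gives for 43 nodes and 406 edges under two different assumptions about which directed edges count as possible; which of the two the paper uses is not stated here.

```python
def directed_density(nodes, edges, self_loops=False):
    """#edges over #possible directed edges, with or without self-loops."""
    possible = nodes * nodes if self_loops else nodes * (nodes - 1)
    return edges / possible

# Assumption 1: directed graph, no self-loops -> 43 * 42 possible edges.
print(round(directed_density(43, 406), 3))                   # 0.225
# Assumption 2: self-loops allowed -> 43 * 43 possible edges.
print(round(directed_density(43, 406, self_loops=True), 3))  # 0.22
```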

Concluding Remarks
See the paper for other SNs mined using different metrics.
Challenges:
◦ Scalability of actually mining SNs using different metrics.
◦ Scalability of conducting the requisite SNA on the mined networks.

Other Questions
Can you think of other things worth measuring in event logs?
The key is that the measured patterns/quantities should be actionable and should yield value for the business.

Other Social Network Discovery Papers (for your talks)
Ting Yu, S.-N. Lim, K. Patwardhan, N. Krahnstoever. "Monitoring, recognizing and discovering social networks." Computer Vision and Pattern Recognition (CVPR), IEEE Conference on, pp. 1462-1469, June
Sinisa Pajevic and Dietmar Plenz. "Efficient Network Reconstruction from Dynamical Cascades Identifies Small-World Topology of Neuronal Avalanches." PLoS Comput Biol. 5(1),