Xiaochen Zhu¹, Shaoxu Song¹, Xiang Lian², Jianmin Wang¹, Lei Zou³
¹ Tsinghua University, China; ² University of Texas - Pan American, USA; ³ Peking University, China
SIGMOD 2014
Outline: Motivation; Event Matching Similarity (Structural Similarity Function, Iterative Computation, Estimation); Matching Composite Events; Experiments; Conclusion
Information systems, such as Enterprise Resource Planning (ERP) and Office Automation (OA) systems, play an important role in large enterprises. These systems record the business history in their event logs.

Trace ID | Trace
1 | ACDEF
2 | BCDFE
3 | ACDFE
4 | ACDFE
5 | ACDEF
6 | BCDEF
7 | BCDFE
8 | BCDEF
9 | BCDFE
10 | BCDFE

Event ID | Trace ID | Event Name | Timestamp
1 | 1 | Pay by Cash (A) | :33:34
2 | 1 | Check Inventory (C) | :18:11
3 | 1 | Validate (D) | :31:50
4 | 1 | Ship Goods (E) | :14:26
5 | 1 | Customer (F) | :17:18
Exploring the correspondence among events. [Figure: event logs from the information systems of the Beijing, Shanghai, and Hong Kong subsidiaries feed a business data warehouse used for complex event processing, provenance analysis, and decision support.]
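Different events may represent the same activity.

ID | Trace (first event log)
t1 | Pay by Cash (A), Check Inventory (C), Validate (D), Ship Goods (E), Customer (F)
t2 | Pay by Credit Card (B), Check Inventory (C), Validate (D), Customer (F), Ship Goods (E)
……

ID | Trace (second event log)
s1 | Order Accepted (1), Pay by Cash (2), Inventory Checking & Validation (4), ????????? (5), Send Notification (6)
s2 | Order Accepted (1), Pay by Credit Card (3), Inventory Checking & Validation (4), Send Notification (6), ???????? (5)
……

Matching challenges:
- Linguistic matching: same names, e.g., Pay by Cash (A) and Pay by Cash (2).
- Dislocated matching: corresponding events at different positions, e.g., Pay by Cash (A) starts t1, whereas Pay by Cash (2) follows Order Accepted (1) in s1.
- Semantic matching: different names with the same meaning, e.g., Customer (F) and Send Notification (6).
- Opaque matching: unreadable names, e.g., Ship Goods (E) and ????????? (5).
- Composite events matching: Check Inventory (C) and Validate (D) together correspond to Inventory Checking & Validation (4).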
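Text similarity fails (e.g., on opaque names such as ?????????), so the statistics and structural information of the logs are used instead: each event log is summarized as an event dependency graph G = (V, E, f), where f(v) is the frequency of appearance of event v and f(u, v) is the frequency of the consecutive events u, v. For the ten-trace event log above, V = {A, B, C, D, E, F}; for example, f(A) = 0.4 because A appears in 4 of the 10 traces, and f(B, C) is the frequency with which B is directly followed by C.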
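A minimal Python sketch of building the dependency graph from the ten traces above. It assumes, consistent with f(A) = 0.4, that both f(v) and f(u, v) are normalized by the number of traces, and that f(u, v) counts the traces in which v directly follows u; the function name `dependency_graph` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

def dependency_graph(traces: List[str]) -> Tuple[Dict[str, float], Dict[Tuple[str, str], float]]:
    """Build (f_node, f_edge) for an event dependency graph G = (V, E, f).

    Assumption: f_node[v] is the fraction of traces containing v, and
    f_edge[(u, v)] is the fraction of traces in which v directly follows u.
    """
    n = len(traces)
    node_count: Counter = Counter()
    edge_count: Counter = Counter()
    for trace in traces:
        node_count.update(set(trace))                  # each event counted once per trace
        edge_count.update(set(zip(trace, trace[1:])))  # consecutive pairs (u, v)
    f_node = {v: c / n for v, c in node_count.items()}
    f_edge = {e: c / n for e, c in edge_count.items()}
    return f_node, f_edge

log1 = ["ACDEF", "BCDFE", "ACDFE", "ACDFE", "ACDEF",
        "BCDEF", "BCDFE", "BCDEF", "BCDFE", "BCDFE"]
f_node, f_edge = dependency_graph(log1)
print(f_node["A"])         # 0.4
print(f_edge[("B", "C")])  # 0.6
```

Under this normalization f(B, C) = 0.6, since C directly follows B in the six traces that start with B.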
Matching challenges vs. existing approaches: the challenges are linguistic matching, semantic matching, opaque matching, dislocated matching, and composite events; the approaches compared are Graph Edit Distance [1], Opaque Schema Matching [2], Behavioral Matching [3], and our Event Matching Similarity.

[1] R. M. Dijkman, M. Dumas, and L. García-Bañuelos. Graph matching algorithms for business process model similarity search. In BPM, pages 48–63.
[2] J. Kang and J. F. Naughton. On schema matching with opaque column names and data values. In SIGMOD Conference, pages 205–216.
[3] S. Nejati, M. Sabetzadeh, M. Chechik, S. M. Easterbrook, and P. Zave. Matching and merging of statecharts specifications. In ICSE, pages 54–64, 2007.
Framework: event logs → dependency graphs → event matching similarities → correspondences → composite event matching.
Example: first log (trace 1: ACDEF, ……) and second log (trace 1: 12456, ……).
Correspondences from the event matching similarities: A→2, B→3, C→4, D→1, E→5, F→6.
After composite event matching: A→2, B→3, {C,D}→4, E→5, F→6.
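The correspondences step turns pairwise similarities into a matching. The slides do not say how this is done; one simple, assumed option is a greedy one-to-one selection in descending order of similarity. `greedy_correspondences` and its `threshold` parameter are hypothetical, and an optimal assignment (e.g., the Hungarian algorithm) would be a natural alternative.

```python
from typing import Dict, Tuple

def greedy_correspondences(sim: Dict[Tuple[str, str], float],
                           threshold: float = 0.0) -> Dict[str, str]:
    """Greedy one-to-one matching (assumed strategy, not the paper's).

    Repeatedly takes the most similar pair whose events are both still
    unmatched, ignoring pairs with similarity <= threshold.
    """
    matched_left, matched_right = set(), set()
    result: Dict[str, str] = {}
    for (a, b), s in sorted(sim.items(), key=lambda kv: kv[1], reverse=True):
        if s <= threshold:
            break  # all remaining pairs are at most as similar as the threshold
        if a not in matched_left and b not in matched_right:
            result[a] = b
            matched_left.add(a)
            matched_right.add(b)
    return result
```

Because the result is strictly one-to-one, it cannot express a correspondence like {C,D}→4, which is why composite events need a separate step.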
Event Matching Similarity: Intuition, Iterative Computation, Estimation
Intuition for evaluating the similarity of two events v₁ and v₂ (cf. SimRank*):
1. S(v₁, v₂) = 1 if neither v₁ nor v₂ has an input neighbor;
2. v₁ is similar to v₂ if they frequently share similar input neighbors.
Problem: this cannot deal with dislocated matching.
* G. Jeh and J. Widom. SimRank: a measure of structural-context similarity. In KDD, pages 538–543.
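A minimal Python sketch of the two rules, written as a SimRank-style iteration across two dependency graphs. It is not the paper's structural similarity function: the decay factor `c`, giving 0 to pairs where only one event has input neighbors, weighting input-neighbor pairs by the product of edge frequencies, and all function names are assumptions. As the second log it uses only the two traces s1 = 12456 and s2 = 13465 shown earlier.

```python
from collections import Counter
from typing import Dict, List, Tuple

Edges = Dict[Tuple[str, str], float]

def dependency_graph(traces: List[str]) -> Tuple[Dict[str, float], Edges]:
    """Event and consecutive-pair frequencies, normalized by #traces (assumption)."""
    n = len(traces)
    nodes, edges = Counter(), Counter()
    for t in traces:
        nodes.update(set(t))
        edges.update(set(zip(t, t[1:])))
    return {v: c / n for v, c in nodes.items()}, {e: c / n for e, c in edges.items()}

def in_neighbors(edges: Edges) -> Dict[str, Dict[str, float]]:
    """Map each event to {input neighbor: edge frequency}."""
    ins: Dict[str, Dict[str, float]] = {}
    for (u, v), w in edges.items():
        ins.setdefault(v, {})[u] = w
    return ins

def simrank_style(g1, g2, c: float = 0.8, iterations: int = 10) -> Dict[Tuple[str, str], float]:
    """Hypothetical SimRank-style similarity between events of two graphs."""
    (nodes1, edges1), (nodes2, edges2) = g1, g2
    in1, in2 = in_neighbors(edges1), in_neighbors(edges2)
    pairs = [(a, b) for a in nodes1 for b in nodes2]
    # I = 0: only pairs of source events (no input neighbor) are similar.
    sim = {(a, b): 1.0 if not in1.get(a) and not in2.get(b) else 0.0 for a, b in pairs}
    for _ in range(iterations):
        new = {}
        for a, b in pairs:
            p, q = in1.get(a, {}), in2.get(b, {})
            if not p and not q:
                new[(a, b)] = 1.0  # rule 1: both have no input neighbor
            elif not p or not q:
                new[(a, b)] = 0.0  # assumption: only one side has input neighbors
            else:
                # rule 2: frequency-weighted average similarity of input neighbors
                total = sum(wp * wq for wp in p.values() for wq in q.values())
                acc = sum(wp * wq * sim[(x, y)]
                          for x, wp in p.items() for y, wq in q.items())
                new[(a, b)] = c * acc / total
        sim = new
    return sim

log1 = ["ACDEF", "BCDFE", "ACDFE", "ACDFE", "ACDEF",
        "BCDEF", "BCDFE", "BCDEF", "BCDFE", "BCDFE"]
log2 = ["12456", "13465"]
sim = simrank_style(dependency_graph(log1), dependency_graph(log2))
print(sim[("A", "1")], sim[("A", "2")])  # 1.0 0.0
```

The run shows the stated problem: A has no input neighbor while Pay by Cash (2) follows Order Accepted (1), so S(A, 2) = 0 and S(A, 1) = 1. In the same run S(C, 2) = 0.8, so Check Inventory (C) looks like Pay by Cash (2): the structure is shifted by one position.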
Iterative computation. [Figure: pairs of event graphs over A–F at iterations I = 0, 1, 2, …]
Estimation: a trade-off between accuracy and efficiency.
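One generic way to make the accuracy/efficiency trade-off concrete, assuming a SimRank-style update with decay factor c < 1 whose rule 2 is a weighted average (this is not the paper's estimative function): each iteration is then a contraction with factor c, so after k iterations every similarity in [0, 1] lies within c^k of its converged value. The hypothetical helper below returns the smallest k that meets a tolerance eps.

```python
import math

def iterations_for_tolerance(c: float, eps: float) -> int:
    """Smallest k >= 0 with c**k <= eps, for 0 < c < 1 and 0 < eps < 1.

    Assumes a contraction with factor c on scores in [0, 1], so that
    k iterations keep every score within c**k of its fixed point.
    """
    if not (0 < c < 1 and 0 < eps < 1):
        raise ValueError("need 0 < c < 1 and 0 < eps < 1")
    k = math.ceil(math.log(eps) / math.log(c))
    # Guard against floating-point rounding in the logarithm ratio.
    while c ** k > eps:
        k += 1
    while k > 0 and c ** (k - 1) <= eps:
        k -= 1
    return k

print(iterations_for_tolerance(0.8, 0.01))  # 21
print(iterations_for_tolerance(0.6, 0.01))  # 10
```

A looser tolerance means fewer iterations and less work, at the price of scores that are further from convergence.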
Matching Composite Events
Candidates of composite events (e.g., {C, D}, {E, F}, …) are either pre-defined or discovered automatically. Heuristic: choose the candidate that improves the average similarity. [Figure: the graph over A B C D E F and two merged variants, A B {C,D} E F and A B C D {E,F}]
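A minimal sketch of the greedy heuristic. It assumes that a composite event is formed by collapsing each run of consecutive member events in a trace into one composite event, and that `avg_similarity` is a caller-supplied function scoring a (modified) log, e.g., the average similarity of its matched events. `merge_events`, `choose_composites`, and the scoring function are hypothetical names, not the paper's interface.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple

Log = List[List[str]]

def merge_events(trace: Sequence[str], group: FrozenSet[str], label: str) -> List[str]:
    """Collapse every run of consecutive events from `group` into one `label`."""
    out: List[str] = []
    prev_in_group = False
    for event in trace:
        if event in group:
            if not prev_in_group:
                out.append(label)
            prev_in_group = True
        else:
            out.append(event)
            prev_in_group = False
    return out

def choose_composites(log: Log, candidates: List[FrozenSet[str]],
                      avg_similarity: Callable[[Log], float]) -> Tuple[Log, List[FrozenSet[str]]]:
    """Greedily apply the candidate that most improves avg_similarity; stop when none does."""
    chosen: List[FrozenSet[str]] = []
    remaining = list(candidates)
    best_score = avg_similarity(log)
    while remaining:
        best: Optional[Tuple[float, FrozenSet[str], Log]] = None
        for group in remaining:
            label = "{" + ",".join(sorted(group)) + "}"
            merged = [merge_events(t, group, label) for t in log]
            score = avg_similarity(merged)
            if score > best_score and (best is None or score > best[0]):
                best = (score, group, merged)
        if best is None:
            break  # no remaining candidate improves the average similarity
        best_score, group, log = best
        chosen.append(group)
        remaining.remove(group)
    return log, chosen

print(merge_events("ACDEF", frozenset("CD"), "{C,D}"))  # ['A', '{C,D}', 'E', 'F']
```

Evaluating every remaining candidate in each round keeps the sketch simple; its cost is dominated by the calls to `avg_similarity`.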
Experiments
Real-life data set: obtained from a real bus manufacturer; the true event matching was generated manually by domain experts.

No. of event logs | 149
No. of traces | 6000
Min event size | 2
Max event size | 11

Criterion: the accuracy of event matching is evaluated by the F-measure of precision and recall.
Baselines: Graph Edit Distance [1], Opaque Matching [2], Behavioral Matching [3].
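A minimal sketch of the evaluation criterion. It assumes a matching is represented as a set of (event, event) pairs, with a composite correspondence such as {C,D}→4 expanded into (C, 4) and (D, 4); this pairwise encoding and the function name are assumptions. For illustration only, the composite-event correspondences of the running example serve as the reference and the singular-event correspondences as the output.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]

def precision_recall_f(found: Set[Pair], truth: Set[Pair]) -> Tuple[float, float, float]:
    """Precision, recall, and F-measure (harmonic mean) of a found matching."""
    correct = len(found & truth)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

truth = {("A", "2"), ("B", "3"), ("C", "4"), ("D", "4"), ("E", "5"), ("F", "6")}
found = {("A", "2"), ("B", "3"), ("C", "4"), ("D", "1"), ("E", "5"), ("F", "6")}
p, r, f = precision_recall_f(found, truth)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.833 0.833 0.833
```

The single wrong pair (D, 1) costs one sixth of both precision and recall.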
[Figure: experimental results; legend entry "Our Approach"]
Conclusion: an event matching framework that
- works well with dislocated matching;
- works well with opaque event names;
- provides an estimative function for the accuracy/efficiency trade-off;
- uses heuristics for matching composite events.
Thanks!