Jianmin Wang 1, Shaoxu Song 1, Xiaochen Zhu 1, Xuemin Lin 2 1 Tsinghua University, China 2 University of New South Wales, Australia 1/23 VLDB 2013.

Slides:



Advertisements
Similar presentations
From Local Patterns to Global Models: Towards Domain Driven Educational Process Mining Nikola Trčka Mykola Pechenizkiy.
Advertisements

Extending Q-Grams to Estimate Selectivity of String Matching with Low Edit Distance [1] Pirooz Chubak May 22, 2008.
Problems and Their Classes
A university for the world real R © 2009, Chapter 3 Advanced Synchronization Moe Wynn Wil van der Aalst Arthur ter Hofstede.
Optimizing Join Enumeration in Transformation-based Query Optimizers ANIL SHANBHAG, S. SUDARSHAN IIT BOMBAY VLDB 2014
Probabilistic Skyline Operator over Sliding Windows Wenjie Zhang University of New South Wales & NICTA, Australia Joint work: Xuemin Lin, Ying Zhang, Wei.
Sequential Patterns & Process Mining Current State of Research Edgar de Graaf LIACS.
Parsimony based phylogenetic trees Sushmita Roy BMI/CS 576 Sep 30 th, 2014.
Process discovery: Inductive Miner
Aligning Event Logs And Declare Models for Conformance Checking Massimiliano de Leoni, Fabrizio Maggi Wil van der Aalst.
Aligning Event Logs and Process Models for Multi- perspective Conformance Checking: An Approach Based on ILP Massimiliano de Leoni Wil M. P. van der Aalst.
1 Efficient Subgraph Search over Large Uncertain Graphs Ye Yuan 1, Guoren Wang 1, Haixun Wang 2, Lei Chen 3 1. Northeastern University, China 2. Microsoft.
Probabilistic Threshold Range Aggregate Query Processing over Uncertain Data Wenjie Zhang University of New South Wales & NICTA, Australia Joint work:
Discovering Affine Equalities Using Random Interpretation Sumit Gulwani George Necula EECS Department University of California, Berkeley.
Introduction Combining two frameworks
Quantile-Based KNN over Multi- Valued Objects Wenjie Zhang Xuemin Lin, Muhammad Aamir Cheema, Ying Zhang, Wei Wang The University of New South Wales, Australia.
Placement of Integration Points in Multi-hop Community Networks Ranveer Chandra (Cornell University) Lili Qiu, Kamal Jain and Mohammad Mahdian (Microsoft.
MAE 552 – Heuristic Optimization Lecture 27 April 3, 2002
Reza Sherkat ICDE061 Reza Sherkat and Davood Rafiei Department of Computing Science University of Alberta Canada Efficiently Evaluating Order Preserving.
*Department of Computing Science University of Newcastle upon Tyne **Institut für Informatik, Universität Augsburg Canonical Prefixes of Petri Net Unfoldings.
History-Dependent Petri Nets Kees van Hee, Alexander Serebrenik, Natalia Sidorova, Wil van der Aalst ?
Asynchronous Pattern Matching - Address Level Errors Amihood Amir Bar Ilan University 2010.
Flow Models and Optimal Routing. How can we evaluate the performance of a routing algorithm –quantify how well they do –use arrival rates at nodes and.
Rule Generation [Chapter ]
Process Mining Control flow process discovery Fabrizio Maria Maggi (based on Process Mining book – Springer copyright 2011 and lecture material by Marlon.
1 A Petri Net Siphon Based Solution to Protocol-level Service Composition Mismatches Pengcheng Xiong 1, Mengchu Zhou 2 and Calton Pu 1 1 College of Computing,
Computer Science and Engineering Loyalty-based Selection: Retrieving Objects That Persistently Satisfy Criteria Presented By: Zhitao Shen Joint work with.
Xiaochen Zhu 1, Shaoxu Song 1, Jianmin Wang 1, Philip S. Yu 2, Jiaguang Sun 1 1 Tsinghua University, China 2University of Illinois at Chicago, USA 1/29.
Jorge Muñoz-Gama Universitat Politècnica de Catalunya (Barcelona, Spain) Algorithms for Process Conformance and Process Refinement.
Process Mining Control flow process discovery
Optimal Scheduling of File Transfers with Divisible Sizes on Multiple Disjoint Paths Mugurel Ionut Andreica Polytechnic University of Bucharest Computer.
Access Path Selection in a Relational Database Management System Selinger et al.
Ranking Queries on Uncertain Data: A Probabilistic Threshold Approach Wenjie Zhang, Xuemin Lin The University of New South Wales & NICTA Ming Hua,
CSCE350 Algorithms and Data Structure Lecture 17 Jianjun Hu Department of Computer Science and Engineering University of South Carolina
Parallel #2 Paper – Phylogeny and Branch and Bound Algorithms George McGinn
Jianmin Wang 1, Shaoxu Song 1, Xuemin Lin 2, Xiaochen Zhu 1, Jian Pei 3 1 Tsinghua University, China 2 University of New South Wales, Australia 3 Simon.
Reporter :PCLee The decisions on when to acquire debug data during post-silicon validation are determined by trigger events that are programmed.
Xiaochen Zhu 1, Shaoxu Song 1, Xiang Lian 2, Jianmin Wang 1, Lei Zou 3 1 Tsinghua University, China 2 University of Texas - Pan American, USA 3 Peking.
MILP algorithms: branch-and-bound and branch-and-cut
Games. Adversaries Consider the process of reasoning when an adversary is trying to defeat our efforts In game playing situations one searches down the.
Optimization of Wavelength Assignment for QoS Multicast in WDM Networks Xiao-Hua Jia, Ding-Zhu Du, Xiao-Dong Hu, Man-Kei Lee, and Jun Gu, IEEE TRANSACTIONS.
Expert Systems with Applications 34 (2008) 459–468 Multi-level fuzzy mining with multiple minimum supports Yeong-Chyi Lee, Tzung-Pei Hong, Tien-Chin Wang.
Mehdi Kargar Aijun An York University, Toronto, Canada Keyword Search in Graphs: Finding r-cliques.
Han-na Yang Rediscovering Workflow Models from Event-Based Data using Little Thumb.
Process-oriented System Analysis Process Mining. BPM Lifecycle.
Early Profile Pruning on XML-aware Publish- Subscribe Systems Mirella M. Moro, Petko Bakalov, Vassilis J. Tsotras University of California VLDB 2007 Presented.
O PTIMAL SERVICE TASK PARTITION AND DISTRIBUTION IN GRID SYSTEM WITH STAR TOPOLOGY G REGORY L EVITIN, Y UAN -S HUN D AI Adviser: Frank, Yeong-Sung Lin.
Decision Mining in Prom A. Rozinat and W.M.P. van der Aalst Joosung, Ko.
Materialized View Selection and Maintenance using Multi-Query Optimization Hoshi Mistry Prasan Roy S. Sudarshan Krithi Ramamritham.
Computer Science and Engineering TreeSpan Efficiently Computing Similarity All-Matching Gaoping Zhu #, Xuemin Lin #, Ke Zhu #, Wenjie Zhang #, Jeffrey.
Decomposing Data-aware Conformance Checking Massimiliano de Leoni, Jorge Munoz-Gama, Josep Carmona, Wil van der Aalst PAGE 0.
Lagrangean Relaxation
CompSci On the Limits of Computing  Reasons for Failure 1. Runs too long o Real time requirements o Predicting yesterday's weather 2. Non-computable.
Machine Learning in Practice Lecture 10 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute.
Shaoxu Song 1, Aoqian Zhang 1, Lei Chen 2, Jianmin Wang 1 1 Tsinghua University, China 2Hong Kong University of Science & Technology, China 1/19 VLDB 2015.
RS – Reed Solomon Error correcting code. Error-correcting codes are clever ways of representing data so that one can recover the original information.
Computer Science and Engineering Jianye Yang 1, Ying Zhang 2, Wenjie Zhang 1, Xuemin Lin 1 Influence based Cost Optimization on User Preference 1 The University.
Experience Report: System Log Analysis for Anomaly Detection
David Redlich, Thomas Molka, Wasif Gilani, Awais Rashid, Gordon Blair
RE-Tree: An Efficient Index Structure for Regular Expressions
On Efficient Graph Substructure Selection
Automatic Physical Design Tuning: Workload as a Sequence
Efficient Subgraph Similarity All-Matching
Normalization cs3431.
Multi-phase process mining
3 mei 2019 Process Mining and Security: Detecting Anomalous Process Executions and Checking Process Conformance Wil van der Aalst Ana Karla A. de Medeiros.
Search.
Search.
5 juli 2019 Process Mining and Security: Detecting Anomalous Process Executions and Checking Process Conformance Wil van der Aalst Ana Karla A. de Medeiros.
Wei Wang University of New South Wales, Australia
Presentation transcript:

Jianmin Wang 1, Shaoxu Song 1, Xiaochen Zhu 1, Xuemin Lin 2 1 Tsinghua University, China 2 University of New South Wales, Australia 1/23 VLDB 2013

 Motivation  Recovery Algorithms  Filling gaps in causal net  Branching framework  Dealing with loops  Experiments  Conclusion  Ongoing Work 2/23 VLDB 2013

 Information systems record the business history in their event logs. 3/23 VLDB 2013 Huge Amount of Event Data: Corporation Products No. of Event Sequences 1,230,000 Power Generatior 3,260,000 Machinery 2,600,000 Train E IDT IDEvent NameTimestamp 11Order :33:34 21Pay by Cash :10:17 31Check Inventory :18:11 41Validate :31:50 51Delivery :14:26 Event Sequence: Order  Pay by cash  Check inventory  Validate  Delivery

 Business events often follow certain business rules or constraints 4/23 VLDB Order  Pay by cash  Check inventory  Validate  Delivery Order  Check inventory  Pay by credit card  Check credit history  Validate  Delivery Order  Check inventory  Pay by cash  Re-check  Check inventory  Validate  Delivery Process specification Event sequences follow Constraints by Petri net: Sequence Parallel Choice A B CD E H FG Order Pay by credit card Check credit history Check inventory Pay by cash Re-check Validate Delivery XOR split XOR join AND split AND join Order  Pay by cash Pay by cash  Check inventory Check inventory  Pay by cash Pay by cash Pay by credit card

5/23 VLDB 2013  The Causes of Missing Events: Man-made errors (typo); System failures (power down); Lost during system integration (mismatching).  Survey in one division of CNR: 47% events are missed. Sequence of events A B CD E H FG Order Pay by credit card Check credit history Check inventory Pay by cash Re-check Validate Delivery complete incomplete B F G A E A C E FG D

 Why Need Recovery? Incomplete provenance answer Inaccurate query answer 6/23 VLDB Order  Check inventory  Re-check  Check inventory  Validate  Delivery SEQ(Pay by cash, Validate), which means “Pay by cash” followed by “Validate”. What are the prerequisites of “Validate” in sequence 1? Event sequence 1 is lost. “Pay by cash” is lost. Answer: “Order”, “Check Inventory” and “Re-check”. Answer: sequence 2. Pay by cash  2 Order  Pay by cash  Check inventory  Validate  Delivery

 Recovery Insert the missing prerequisites into the gap; Based on the specification (follows constraints). 7/23 VLDB 2013 Event sequence has missing event between ACE and F, called a GAP (ACE, F). A possible recovery is A B CD E H FG Order Pay by credit card Check credit history Check inventory Pay by cash Re-check Validate Delivery A E C F D

 Multiple Possible Recoveries 8/23 Loop structures: To recover (AB,F), we have ABEF, ABEHEF, ABEHEHEF... VLDB 2013 Choice structures: To recover (AE,F), we have AEBF, AECDF What is the most likely recovery? A B CD E H FG Order Pay by credit card Check credit history Check inventory Pay by cash Re-check Validate Delivery A E F B CD A B F E H

 Intuition: Generally, people try to make the minimum mistakes.  Principle: Follow the minimum change law in improving data quality.  Problem: The cost of a recovery, which is the number of inserted events should be minimized. 9/23 VLDB 2013 To recover : is a minimum recovery, while is not. A B CD E H FG Order Pay by credit card Check credit history Check inventory Pay by cash Re-check Validate Delivery

 Hardness:  Owing to choices and parallelization of flows, there exist vast alternatives to enumerate in the recovery;  We prove the NP-hardness of this problem.  Existing Approach: The Alignment algorithm*  Enumerates a factorial number of redundant recoveries.  Our Observation:  All the sequences in S are equivalent w.r.t. the specification;  Any of S is a minimum recovery;  Not necessary to enumerate. *M. de Leoni, F. M. Maggi, and W. M. P. van der Aalst. Aligning event logs and declarative process models for conformance checking. In BPM, pages 82–97, /23 VLDB 2013 To recover (A,E), it generates S: {ABCDE, ABDCE, ACBDE, ACDBE, ADBCE, ADCBE} A B C D E A E B C D

 Motivation  Recovery Algorithms 1.Filling gaps upon causal net 2.Branching framework 3.Dealing with loops  Experiments  Conclusion  Ongoing Work 11/23 VLDB Net with Choice Structures B 3. Net with Loop Structures H 1. Causal Net A C D E F G

 Causal Net: net without XOR structures.  It can be translated into a equivalent DAG.  Based on parallel constraints & sequence constraints:  Each complete sequence must be one of the topological sorts on the DAG.  A recovery of incomplete sequence is a subsequence of a complete sequence.  is a subsequence of  Hints: fills the missing prerequisites by backtracking. 12/23 VLDB 2013 A B C D E A B C D E

 Problem: topological sort cannot be applied on choice structure.  Hints:  Enumerate all the possible execution of the choice structures  Each execution is a causal net, topological sort can be applied. 13/23 VLDB 2013 To recover (A, E) (A, BE) (A, CDE) A B C E D A B C E D E Recovery with minimum cost

1.Branching Index: prune the branches not including the right-side event of gap. 2.Path Reachability Pruning: prune the branches that the left-side of gap are not reachable to the right-side of gap. 3.Branch and Bound: prune the branches that may lead to possible recoveries but cannot be the minimum one. 4.Local Optimality: identify groups of transitions that always share the same branching, thus only one of them needs to be computed. 14/23 VLDB 2013 A B C E D E Gap (A,E) Gap (A,D) Gap (AC,E) A E C LB = 4 LB = 3 E E A E E A D

 Problem: when to terminate?  Hints: considers loop at most once (minimum cost). 15/23 To recover (ABC, B) VLDB 2013 A E BDC A E BDC A BC B A A BC B E (ABC, AB) Not valid (ABC, EB) A BDC E A BC (ABC, EBCEB) Not minimum E B C E B

 Motivation  Recovery Algorithms  Filling gaps in causal net  Branching framework  Dealing with loops  Experiments  Conclusion  Ongoing Work 16/23 VLDB 2013

 Real Life Data Set: employed from CNR*  Setting  we randomly remove 10%-90% events from the complete sequences;  apply the recovery methods to recover the removed events.  Criteria: to evaluate the accuracy of recovery,  F-measure of precision and recall.  Baseline: the Alignment approach. * 17/23 VLDB 2013 No. of Process Specifications 149 Max Specification Size 63 transitions 79 places No. of Event Sequences 4470 Max Sequence Size 1000 events Causal net 25 Net with choices (no loop) 57 Net with loops 67

 The accuracy is no worse than Alignment algorithm.  Our algorithm shows great improvement in time cost. 18/23 VLDB 2013

 The pruning methods are effective. 19/23 VLDB 2013 Best approach

 Local+Reach+Bound can always achieve the lowest time cost. 20/23 VLDB 2013 Best approach

 We define the specification-based minimum recovery problem for missing events.  We prove the NP-hardness of this problem.  To efficiently find the optimal recovery:  We propose a backtracking idea to reduce the redundant sequences with respect to parallel events.  We construct a branching index, and develop reachability checking and lower bounds of recovery distances to further accelerate the computation.  The local optimal method can reduce the number of intermediate results. 21/23 VLDB 2013

 In this paper of missing events, we assume that the logged event data are clean.  However, the logged event data may also be dirty. Encoding example: UTF-8: Re-check  GB2312: □ e □ ? □  Without addressing erroneous events, Inaccurate provenance answer Inaccurate query answer  By the insertion technique proposed in this paper, we cannot elimate the erroneous events.  Instead, we need to modify the mistakely recorded events Non-trivial, also NP-hard Heuristic approach may apply 22/23 VLDB 2013 A Order Pay by credit card Check credit history Check inventory Validate Delivery Check inventory □e□?□□e□?□ C D E E F G

Thanks ! 23/23 VLDB 2013