Xiaochen Zhu 1, Shaoxu Song 1, Xiang Lian 2, Jianmin Wang 1, Lei Zou 3 1 Tsinghua University, China 2 University of Texas - Pan American, USA 3 Peking.

Presentation transcript:

Xiaochen Zhu 1, Shaoxu Song 1, Xiang Lian 2, Jianmin Wang 1, Lei Zou 3 1 Tsinghua University, China 2 University of Texas - Pan American, USA 3 Peking University, China 1/21 SIGMOD 2014

Outline:
- Motivation
- Event Matching Similarity
  - Structural Similarity Function
  - Iterative Computation
  - Estimation
- Matching Composite Events
- Experiments
- Conclusion

- Information systems play an important role in large enterprises:
  - Enterprise Resource Planning (ERP)
  - Office Automation (OA)
- These systems record the business history in their event logs.

Trace ID  Trace      Trace ID  Trace
1         ACDEF      6         BCDEF
2         BCDFE      7         BCDFE
3         ACDFE      8         BCDEF
4         ACDFE      9         BCDFE
5         ACDEF      10        BCDFE

Event ID  Trace ID  Event Name           Timestamp
1         1         Pay by Cash (A)      :33:34
2         1         Check Inventory (C)  :18:11
3         1         Validate (D)         :31:50
4         1         Ship Goods (E)       :14:26
5         1         Customer (F)         :17:18

Exploring the correspondence among events:
- Complex event processing
- Provenance analysis
- Decision support

Figure: information systems of the Beijing, Shanghai, and Hong Kong subsidiaries, each with its own event logs, and a business data warehouse.

- Different events may represent the same activity

Event log 1:
ID  Trace
t1  Pay by Cash (A) → Check Inventory (C) → Validate (D) → Ship Goods (E) → Customer (F)
t2  Pay by Credit Card (B) → Check Inventory (C) → Validate (D) → Customer (F) → Ship Goods (E)
……

Event log 2:
ID  Trace
s1  Order Accepted (1) → Pay by Cash (2) → Inventory Checking & Validation (4) → ????????? (5) → Send Notification (6)
s2  Order Accepted (1) → Pay by Credit Card (3) → Inventory Checking & Validation (4) → Send Notification (6) → ???????? (5)
……

Matching challenges illustrated:
- Linguistic Matching
- Dislocated Matching
- Semantic Matching
- Opaque Matching
- Composite Events Matching

- Text similarity fails
- Use statistics and structural information instead
- Event Log
- Event Dependency Graph (V, E, f)

Trace ID  Trace
1         ACDEF
2         BCDFE
3         ACDFE
4         ACDFE
5         ACDEF
6         BCDEF
7         BCDFE
8         BCDEF
9         BCDFE
10        BCDFE

Figure: event dependency graph over the events A, B, C, D, E, F, annotated with:
- f(A) = 0.4: frequency of appearance
- f(B,C) = : frequency of consecutive events
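The construction of the dependency graph from the log can be sketched as follows. This is a minimal sketch, not the authors' code: the function name `dependency_graph` is hypothetical, and dividing counts by the number of traces is an assumption, chosen because it reproduces f(A) = 0.4 from the slide.

```python
from collections import Counter

# Event log from the slide: one string per trace, one letter per event.
TRACES = ["ACDEF", "BCDFE", "ACDFE", "ACDFE", "ACDEF",
          "BCDEF", "BCDFE", "BCDEF", "BCDFE", "BCDFE"]


def dependency_graph(traces):
    """Return (node_freq, edge_freq), both normalized by the number of traces.

    node_freq[v]      : fraction of traces in which event v appears
    edge_freq[(u, v)] : fraction of traces in which v directly follows u
    Normalizing by the trace count is an assumption (it matches f(A) = 0.4).
    """
    n = len(traces)
    node_count, edge_count = Counter(), Counter()
    for trace in traces:
        # Count each event and each consecutive pair once per trace.
        node_count.update(set(trace))
        edge_count.update(set(zip(trace, trace[1:])))
    node_freq = {v: c / n for v, c in node_count.items()}
    edge_freq = {e: c / n for e, c in edge_count.items()}
    return node_freq, edge_freq


node_freq, edge_freq = dependency_graph(TRACES)
print(node_freq["A"])         # 0.4
print(edge_freq[("B", "C")])  # 0.6
```

Under this normalization, an edge exists in E exactly when its frequency is non-zero, so the graph and its labels come out of a single pass over the log.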

Matching challenges and approaches:
- Challenges: Linguistic Matching, Semantic Matching, Opaque Matching, Dislocated Matching, Composite Events
- Approaches: Graph Edit Distance [1], Opaque Schema Matching [2], Behavioral Matching [3], Event Matching Similarity

References:
[1] R. M. Dijkman, M. Dumas, and L. García-Bañuelos. Graph matching algorithms for business process model similarity search. In BPM, pages 48–63.
[2] J. Kang and J. F. Naughton. On schema matching with opaque column names and data values. In SIGMOD Conference, pages 205–216.
[3] S. Nejati, M. Sabetzadeh, M. Chechik, S. M. Easterbrook, and P. Zave. Matching and merging of statecharts specifications. In ICSE, pages 54–64, 2007.

Framework overview: Event Logs → Dependency Graphs → Event Matching Similarities → Correspondences → Composite Event Matching

Example:
- Log 1, trace 1: ACDEF
- Log 2, trace 1: 12456
- Correspondences: A → 2, B → 3, C → 4, D → 1, E → 5, F → 6
- Correspondences with composite event matching: A → 2, B → 3, {C,D} → 4, E → 5, F → 6


- Intuition for evaluating the similarity of two events v1 and v2:
  1. S(v1, v2) = 1, if both v1 and v2 have no input neighbor;
  2. v1 is similar to v2, if they frequently share similar input neighbors.
- Problem: cannot deal with dislocated matching

* G. Jeh and J. Widom. SimRank: a measure of structural-context similarity. In KDD, pages 538–543.
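The two rules above can be sketched as a SimRank-style iteration across two dependency graphs. This is an illustration of the intuition only, not the similarity function from the talk: it ignores the frequency labels, and the names `structural_similarity`, `decay`, and `iterations`, as well as the choice of similarity 0 when only one event has input neighbors, are assumptions. The edge lists are read off the example traces of the two logs.

```python
from itertools import product


def in_neighbors(edges, nodes):
    """Map every node to the set of nodes that have an edge into it."""
    preds = {v: set() for v in nodes}
    for u, v in edges:
        preds[v].add(u)
    return preds


def structural_similarity(nodes1, edges1, nodes2, edges2, decay=0.8, iterations=5):
    """SimRank-style similarity between the nodes of two directed graphs."""
    pre1, pre2 = in_neighbors(edges1, nodes1), in_neighbors(edges2, nodes2)

    def both_sources(a, b):
        return not pre1[a] and not pre2[b]

    # Rule 1: two events without input neighbors have similarity 1.
    sim = {(a, b): 1.0 if both_sources(a, b) else 0.0
           for a, b in product(nodes1, nodes2)}
    for _ in range(iterations):
        nxt = {}
        for a, b in sim:
            if both_sources(a, b):
                nxt[(a, b)] = 1.0
            elif not pre1[a] or not pre2[b]:
                nxt[(a, b)] = 0.0  # assumption: only one side has input neighbors
            else:
                # Rule 2: decayed average similarity of the input neighbors.
                total = sum(sim[(x, y)] for x in pre1[a] for y in pre2[b])
                nxt[(a, b)] = decay * total / (len(pre1[a]) * len(pre2[b]))
        sim = nxt
    return sim


# Consecutive-event edges taken from the example traces of the two logs.
EDGES1 = [("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"),
          ("D", "F"), ("E", "F"), ("F", "E")]
EDGES2 = [("1", "2"), ("1", "3"), ("2", "4"), ("3", "4"),
          ("4", "5"), ("4", "6"), ("5", "6"), ("6", "5")]

sim = structural_similarity(list("ABCDEF"), EDGES1, list("123456"), EDGES2)
print(sim[("A", "1")])  # 1.0
print(sim[("A", "2")])  # 0.0
```

In this sketch, Pay by Cash (A) scores 1 against Order Accepted (1) and 0 against Pay by Cash (2), because the second log starts with an extra event. That looks like the kind of dislocated matching the slide flags as a problem for the plain intuition.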


Figure: iterative computation on the dependency graphs over events A, B, C, D, E, F, with panels for I = 0, I = 1, I = 2, ...

Trade-off between accuracy and efficiency.


- Candidates of composite events:
  - C and D, E and F, …
  - Pre-defined or discovered automatically
- Heuristic:
  - Choose the candidates that improve the average similarity

Figure: the dependency graph over A, B, C, D, E, F, and the graphs obtained by merging C,D or E,F into one composite event.
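The heuristic above can be sketched as a greedy loop over candidate pairs. This is a sketch under stated assumptions: `merge_pair` and `greedy_composites` are hypothetical names, the score is supplied by the caller, and `toy_score` below is only a stand-in so that the example runs. It is not the average similarity used in the talk, so the candidate it accepts says nothing about what the real method would pick.

```python
def merge_pair(trace, pair):
    """Merge adjacent occurrences of the two events in `pair` (either order)
    into one composite event. A trace is a list of event names."""
    a, b = pair
    merged, i = [], 0
    while i < len(trace):
        if i + 1 < len(trace) and {trace[i], trace[i + 1]} == {a, b}:
            merged.append(f"{{{a},{b}}}")  # e.g. "{C,D}"
            i += 2
        else:
            merged.append(trace[i])
            i += 1
    return merged


def greedy_composites(log, candidates, score):
    """Try each candidate in turn and keep it only if `score` increases."""
    best = score(log)
    accepted = []
    for pair in candidates:
        trial = [merge_pair(trace, pair) for trace in log]
        trial_score = score(trial)
        if trial_score > best:
            log, best = trial, trial_score
            accepted.append(pair)
    return log, accepted


def toy_score(log):
    """Stand-in score (assumption): fewer distinct trace variants is better."""
    return -len({tuple(trace) for trace in log})


TRACES = ["ACDEF", "BCDFE", "ACDFE", "ACDFE", "ACDEF",
          "BCDEF", "BCDFE", "BCDEF", "BCDFE", "BCDFE"]
log = [list(trace) for trace in TRACES]

print(merge_pair(log[0], ("C", "D")))  # ['A', '{C,D}', 'E', 'F']
merged_log, accepted = greedy_composites(log, [("C", "D"), ("E", "F")], toy_score)
```

To follow the slide more closely, `score` would be replaced by a function that rebuilds the dependency graph from the merged log and returns the average similarity of the resulting matching.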


- Real-life data set: collected from a real bus manufacturer
  - The true event matching was generated manually by domain experts.
- Criteria: to evaluate the accuracy of event matching,
  - F-measure of precision and recall.
- Baselines: Graph Edit Distance [1], Opaque Matching [2], Behavioral Matching [3]

No. of Event Logs  149     Min Event Size  2
No. of Traces      6000    Max Event Size  11
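The accuracy criterion can be made concrete with a short sketch. The function name `f_measure` is hypothetical, and a matching is represented as a set of (event in log 1, event in log 2) pairs. The `found` pairs are the correspondences from the framework overview; the `truth` pairs are an illustrative assumption read off the example traces, not the experts' ground truth for the real data set.

```python
def f_measure(found, truth):
    """Return (precision, recall, F-measure) for two sets of event pairs."""
    found, truth = set(found), set(truth)
    correct = len(found & truth)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure: harmonic mean of precision and recall.
    return precision, recall, 2 * precision * recall / (precision + recall)


# Correspondences found without composite event matching.
found = {("A", "2"), ("B", "3"), ("C", "4"), ("D", "1"), ("E", "5"), ("F", "6")}
# Assumed true pairs for the toy example (C and D both map to event 4).
truth = {("A", "2"), ("B", "3"), ("C", "4"), ("D", "4"), ("E", "5"), ("F", "6")}

precision, recall, f1 = f_measure(found, truth)
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.833 0.833 0.833
```

Five of the six found pairs are correct and five of the six true pairs are found, so precision, recall, and F-measure all equal 5/6 in this toy case.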

Figure: experimental results, with the proposed method labeled "Our Approach".


- Event matching framework:
  - Works well with dislocated matching.
  - Works well with opaque event names.
  - An estimation function for the trade-off between accuracy and efficiency.
  - Heuristics for matching composite events.

Thanks!