Xiaochen Zhu¹, Shaoxu Song¹, Jianmin Wang¹, Philip S. Yu², Jiaguang Sun¹
¹Tsinghua University, China; ²University of Illinois at Chicago, USA

ICDE 2014

• Motivation
• Event Matching Framework
• A* Search Algorithm
• Computing the Normal Distance G
• Simple Upper Bound of H
• Advanced Bounding Function
• Pay-As-You-Go Matching
• Experiments
• Conclusion

• Information systems play an important role in large enterprises:
  ◦ Enterprise Resource Planning (ERP)
  ◦ Office Automation (OA)
• These systems record the business history in their event logs.

Trace ID  Trace     Trace ID  Trace
1         ABCDEF    6         ACBDEF
2         ACBDEF    7         ACBDFE
3         ACBDFE    8         ACBDFE
4         ABCDFE    9         ACBDFE
5         ACBDEF    10        ACBDFE

Events of trace 1 (ABCDEF):
Event ID  Trace ID  Event Name              Timestamp
1         1         Order Received (A)      :33:34
2         1         Payment (B)             :10:17
3         1         Check Inventory (C)     :18:11
4         1         Ship Goods (D)          :31:50
5         1         Record Order (E)        :14:26
6         1         Send Notification (F)   :17:18

• Complex event processing
• Provenance analysis
• Decision support
→ Exploring the correspondence among events
[Figure: information systems and event logs of the Beijing, Shanghai, and Guangzhou subsidiaries, and a business data warehouse.]

• Different events may represent the same activity.

English names:
Event Name              Timestamp
Order Received (A)      :33:34
Payment (B)             :10:17
Check Inventory (C)     :18:11
Ship Goods (D)          :31:50
Record Order (E)        :14:26
Send Notification (F)   :17:18

Abbreviations of Chinese phonetic representations:
Event Name   Timestamp
JD (1)       :12:07
YD (2)       :27:14
TJD (3)      :30:18
CK (5)       :35:32
ZF (4)       :50:12
FH (6)       :30:47
DL (7)       :31:12
FT (8)       :40:40

• Text similarity fails → use statistical and structural information
• Event log → event dependency graph (V, E, f)
[Figure: the ten traces listed above and the dependency graph they induce over events A to F.]
• f(A,A) = 1.0: frequency of appearance (A occurs in all 10 traces)
• f(A,C) = 0.8: frequency of consecutive events (A is immediately followed by C in 8 of the 10 traces)
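As a minimal sketch of turning an event log into the dependency graph (V, E, f), the Python below assumes that the frequency of a vertex (the slide's f(A,A)) is the fraction of traces containing that event, and that the frequency of an edge (u, v) is the fraction of traces in which u is immediately followed by v. These normalizations are assumptions chosen for illustration, not necessarily the paper's exact definitions; on the ten traces above they give f(A,A) = 1.0 and f(A,C) = 0.8.

```python
from collections import Counter


def dependency_graph(traces):
    """Return (vertex frequencies, edge frequencies) of an event dependency graph.

    Assumption: each frequency is the fraction of traces in which the event
    (or the consecutive event pair) occurs at least once.
    """
    n = len(traces)
    vertex_counts = Counter()
    edge_counts = Counter()
    for trace in traces:
        vertex_counts.update(set(trace))                         # once per trace
        edge_counts.update({(a, b) for a, b in zip(trace, trace[1:])})
    f_vertex = {v: c / n for v, c in vertex_counts.items()}
    f_edge = {e: c / n for e, c in edge_counts.items()}
    return f_vertex, f_edge


traces = ["ABCDEF", "ACBDEF", "ACBDFE", "ABCDFE", "ACBDEF",
          "ACBDEF", "ACBDFE", "ACBDFE", "ACBDFE", "ACBDFE"]
f_vertex, f_edge = dependency_graph(traces)
print(f_vertex["A"])        # 1.0
print(f_edge[("A", "C")])   # 0.8
print(sorted(f_edge))       # the ten consecutive-event edges
```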

[Figure: dependency graphs G1 (from event log 1) and G2 (from event log 2) over events A, B, C, shown with several candidate mappings.]
How should we evaluate which mapping is best?

[Figure: graphs G1 and G2.]
Mapping: A → 1, B → 2, C → 3
Vertices: A, B, C
Edges: (A,B), (A,C), (C,B)
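The slides do not spell out the per-pattern similarity used to score a mapping. The sketch below assumes a normalized difference, sim(x, y) = 1 - |x - y| / (x + y), and scores a mapping such as A → 1, B → 2, C → 3 by summing it over G1's vertices and edges, each compared with its image in G2. Both the similarity and the frequencies in the example are hypothetical.

```python
def similarity(x, y):
    """Assumed per-pattern similarity in [0, 1]; equals 1 when frequencies agree."""
    if x + y == 0:
        return 1.0
    return 1.0 - abs(x - y) / (x + y)


def score(mapping, f1, f2):
    """Sum similarities of G1's patterns (vertex or edge tuples) and their images in G2."""
    total = 0.0
    for pattern, freq in f1.items():
        image = tuple(mapping[e] for e in pattern)
        total += similarity(freq, f2.get(image, 0.0))  # missing image: frequency 0
    return total


# Hypothetical frequencies for G1 (events A, B, C) and G2 (events 1, 2, 3).
f1 = {("A",): 1.0, ("B",): 1.0, ("C",): 1.0,
      ("A", "B"): 0.5, ("A", "C"): 0.5, ("C", "B"): 0.5}
f2 = {("1",): 1.0, ("2",): 1.0, ("3",): 1.0,
      ("1", "2"): 0.5, ("1", "3"): 0.5, ("3", "2"): 0.5}
good = {"A": "1", "B": "2", "C": "3"}
swapped = {"A": "1", "B": "3", "C": "2"}
print(score(good, f1, f2))     # 6.0
print(score(swapped, f1, f2))  # 5.0: edge (C,B) maps to (2,3), absent in G2
```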

[Slide content not captured.]
* J. Kang and J. F. Naughton. On schema matching with opaque column names and data values. In SIGMOD Conference, pages 205–216.

[Figure: dependency graphs G1 and G2 over events A to F, with candidate mappings.]
Vertex + edge information is not discriminative enough: the matching fails!

• Event pattern: a particular order of event occurrences
[Figure: the ten traces listed above, each marked "match" or "not match" against a pattern.]
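A minimal sketch of measuring how often an event pattern occurs in a log. It assumes single-letter event names, that SEQ requires its parts to occur consecutively (as the edge patterns do), that AND accepts its parts in any order, and that a pattern's frequency is the fraction of traces containing at least one occurrence. These semantics are assumptions for illustration, not the paper's formal definitions.

```python
from itertools import permutations, product


def expand(pattern):
    """Return every concrete event string the pattern can match (assumed semantics)."""
    if isinstance(pattern, str):          # a single event
        return {pattern}
    op, *children = pattern
    options = [expand(c) for c in children]
    if op == "SEQ":                       # children in the given order
        return {"".join(parts) for parts in product(*options)}
    if op == "AND":                       # children in any order
        result = set()
        for order in permutations(options):
            result |= {"".join(parts) for parts in product(*order)}
        return result
    raise ValueError(f"unknown operator {op!r}")


def frequency(pattern, traces):
    """Fraction of traces containing a contiguous occurrence of the pattern."""
    candidates = expand(pattern)
    hits = sum(any(c in trace for c in candidates) for trace in traces)
    return hits / len(traces)


traces = ["ABCDEF", "ACBDEF", "ACBDFE", "ABCDFE", "ACBDEF",
          "ACBDEF", "ACBDFE", "ACBDFE", "ACBDFE", "ACBDFE"]
print(frequency(("SEQ", "A", ("AND", "B", "C"), "D"), traces))  # 1.0
print(frequency(("SEQ", "D", "E"), traces))                     # 0.4
```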

[Figure: dependency graphs G1 and G2 over events A to F.]
Patterns:
• Vertex patterns: A, B, C, D, E, F
• Edge patterns: SEQ(A,B), SEQ(A,C), SEQ(B,C), SEQ(C,B), SEQ(B,D), SEQ(C,D), SEQ(D,E), SEQ(D,F), SEQ(E,F), SEQ(F,E)
• Complex pattern: SEQ(A, AND(B, C), D)
Matching the complex pattern: SEQ(A, AND(B, C), D) → SEQ(3, AND(4, 5), 6)
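The correspondence SEQ(A, AND(B, C), D) → SEQ(3, AND(4, 5), 6) amounts to renaming every event of a pattern through the event mapping. A small sketch, using nested tuples as a hypothetical pattern representation and the mapping A → 3, B → 4, C → 5, D → 6 read off the example:

```python
def map_pattern(pattern, mapping):
    """Rename every event in a nested SEQ/AND pattern through the mapping."""
    if isinstance(pattern, str):
        return mapping[pattern]
    op, *children = pattern
    return (op, *(map_pattern(c, mapping) for c in children))


def show(pattern):
    """Render a nested pattern in the slides' SEQ(...)/AND(...) notation."""
    if isinstance(pattern, str):
        return pattern
    op, *children = pattern
    return f"{op}({', '.join(show(c) for c in children)})"


mapping = {"A": "3", "B": "4", "C": "5", "D": "6"}  # taken from the slide's example
p = ("SEQ", "A", ("AND", "B", "C"), "D")
print(show(p))                        # SEQ(A, AND(B, C), D)
print(show(map_pattern(p, mapping)))  # SEQ(3, AND(4, 5), 6)
```

The similarity of the complex pattern would then compare its frequency in log 1 with the frequency of the mapped pattern in log 2.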

The key issue is efficiency.
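To see why efficiency matters, assume mappings are one-to-one: the number of ways to map m events into n events is n!/(n - m)!, which grows factorially. For two logs of 11 events (the maximum event size in the data set described later) that is already 11! candidate mappings. Python's `math.perm` computes this count directly:

```python
from math import perm


def count_mappings(m, n):
    """Number of one-to-one mappings from m events into n events (m <= n)."""
    return perm(n, m)


print(count_mappings(3, 3))    # 6
print(count_mappings(11, 11))  # 39916800
```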

[Figure: A* search tree over partial mappings. The root maps A to one of 1, 2, 3, 4 (nodes 1 to 4); node 2 is expanded by mapping C to one of 1, 3, 4 (nodes 5 to 7).]

Node      g     h     g+h
node 1    0.8   3.0   3.8
node 2    1.0   3.0   4.0
node 3    0.7   3.0   3.7
node 4    0.5   3.0   3.5
node 5    1.8   2.0   3.8
node 6    2.0   2.0   4.0
node 7    1.2   2.0   3.2
node 10   4.0   0.0   4.0

Terminate when U1 or U2 is empty.
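The search above can be sketched as a best-first A* that always expands the node with the largest g + h. This is a minimal sketch under assumptions not stated on the slides: events of G1 are mapped in a fixed order, the patterns are only vertices and edges, g sums an assumed similarity 1 - |x - y| / (x + y) over patterns whose events are all mapped, and h is the simple upper bound of one unit per pattern not yet fully mapped (every similarity lies in [0, 1]). Because h never underestimates what remains, the first complete mapping popped from the queue is optimal.

```python
import heapq
from collections import Counter
from itertools import count


def frequencies(traces):
    """Vertex patterns (v,) and edge patterns (u, v) with per-trace frequencies (assumed)."""
    n = len(traces)
    counts = Counter()
    for trace in traces:
        counts.update({(v,) for v in trace})
        counts.update({(a, b) for a, b in zip(trace, trace[1:])})
    return {p: c / n for p, c in counts.items()}


def similarity(x, y):
    """Assumed per-pattern similarity in [0, 1]."""
    return 1.0 - abs(x - y) / (x + y) if x + y else 1.0


def astar_match(f1, f2, events1, events2):
    """Best-first search over partial mappings, maximizing g + h."""
    patterns = list(f1)                       # G1's vertex and edge patterns
    size = min(len(events1), len(events2))    # complete when U1 or U2 is empty
    tie = count()                             # tie-breaker: heap never compares dicts
    heap = [(-float(len(patterns)), next(tie), 0.0, {})]  # root: g = 0, h = |patterns|
    while heap:
        _, _, g, mapping = heapq.heappop(heap)
        if len(mapping) == size:              # U1 or U2 empty: optimal answer
            return mapping, g
        event = next(e for e in events1 if e not in mapping)
        used = set(mapping.values())
        for target in events2:
            if target in used:
                continue
            child = {**mapping, event: target}
            g_child, remaining = g, 0
            for p in patterns:
                if not set(p).issubset(child):
                    remaining += 1            # still has an unmapped event
                elif event in p:              # completed by this expansion
                    image = tuple(child[e] for e in p)
                    g_child += similarity(f1[p], f2.get(image, 0.0))
            h_child = 0 if len(child) == size else remaining
            heapq.heappush(heap, (-(g_child + h_child), next(tie), g_child, child))
    return {}, 0.0


f1 = frequencies(["ABC", "ACB", "ABC"])
f2 = frequencies(["123", "132", "123"])       # the same log with events renamed
mapping, g = astar_match(f1, f2, "ABC", "123")
print(mapping, g)  # {'A': '1', 'B': '2', 'C': '3'} 7.0
```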

[Figure: dependency graphs G1 (events A, B, C, D) and G2, with a parent node and a child node of the search tree.]
Patterns: A, B, C, D, SEQ(A,B), SEQ(B,C), SEQ(C,B), SEQ(C,D), SEQ(A,B,C), SEQ(B,C,D)
Expanding the parent node into a child node:
1. Newly introduced patterns: C, SEQ(B,C), SEQ(C,B), SEQ(A,B,C)
2. Prune unmapped patterns
3. Compute similarities: 3, SEQ(2,3), SEQ(1,2,3), SEQ(C,B)
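When a child node maps one more event, only the patterns completed by that step need new similarity computations. The sketch below reproduces the slide's list of newly introduced patterns; representing each pattern by the set of events it mentions is an assumption that is sufficient for this step.

```python
# Patterns of G1 from the slide, each represented (hypothetically) by the
# set of events it mentions.
PATTERNS = {
    "A": {"A"}, "B": {"B"}, "C": {"C"}, "D": {"D"},
    "SEQ(A,B)": {"A", "B"},
    "SEQ(B,C)": {"B", "C"},
    "SEQ(C,B)": {"B", "C"},
    "SEQ(C,D)": {"C", "D"},
    "SEQ(A,B,C)": {"A", "B", "C"},
    "SEQ(B,C,D)": {"B", "C", "D"},
}


def newly_introduced(patterns, mapped_before, new_event):
    """Patterns that become fully mapped once new_event is mapped.

    Under an additive score (assumed), g(child) = g(parent) plus the
    similarities of exactly these patterns.
    """
    mapped_after = set(mapped_before) | {new_event}
    return [name for name, events in patterns.items()
            if new_event in events and events <= mapped_after]


# The parent node has mapped A and B; the child additionally maps C.
print(newly_introduced(PATTERNS, {"A", "B"}, "C"))
# ['C', 'SEQ(B,C)', 'SEQ(C,B)', 'SEQ(A,B,C)']
```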

[Figure: dependency graphs G1 and G2.]
Patterns: A, B, C, D, SEQ(A,B), SEQ(B,C), SEQ(C,B), SEQ(C,D), SEQ(A,B,C), SEQ(B,C,D)
Remaining patterns: D, SEQ(C,D), SEQ(B,C,D)
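The remaining patterns are those that still mention an unmapped event. Assuming every pattern similarity lies in [0, 1], each remaining pattern can add at most 1 to the final score, so their count is a simple upper bound on h. A sketch with the slide's patterns (the set-of-events representation is again an assumption):

```python
PATTERNS = {
    "A": {"A"}, "B": {"B"}, "C": {"C"}, "D": {"D"},
    "SEQ(A,B)": {"A", "B"},
    "SEQ(B,C)": {"B", "C"},
    "SEQ(C,B)": {"B", "C"},
    "SEQ(C,D)": {"C", "D"},
    "SEQ(A,B,C)": {"A", "B", "C"},
    "SEQ(B,C,D)": {"B", "C", "D"},
}


def remaining(patterns, mapped):
    """Patterns that mention at least one event not yet mapped."""
    return [name for name, events in patterns.items() if not events <= mapped]


def simple_upper_bound(patterns, mapped):
    """h <= number of remaining patterns, since each similarity is at most 1."""
    return float(len(remaining(patterns, mapped)))


mapped = {"A", "B", "C"}
print(remaining(PATTERNS, mapped))           # ['D', 'SEQ(C,D)', 'SEQ(B,C,D)']
print(simple_upper_bound(PATTERNS, mapped))  # 3.0
```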

Upper bound for a general pattern and for a complex pattern. [Formulas not captured.]

• Motivation:
  ◦ Interesting event patterns are identified gradually.
  ◦ The best matching may change as a result.
• Two heuristic strategies:
  ◦ Continue: materialize the leaf nodes
  ◦ Restart: materialize the previous answer for pruning
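One possible reading of "materialize the previous answer for pruning": when new patterns are identified, the previous best mapping is still a complete mapping, so its score under the enlarged pattern set is a lower bound on the new optimum, and any search node whose upper bound g + h falls below it can be discarded. This is a hedged sketch; the `Node` type, the similarity, and the numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A search node: a partial mapping with its g and h values."""
    mapping: dict
    g: float
    h: float


def similarity(x, y):
    """Assumed per-pattern similarity in [0, 1]."""
    return 1.0 - abs(x - y) / (x + y) if x + y else 1.0


def previous_answer_bound(previous_mapping, f1, f2):
    """Score of the previous complete answer under the enlarged pattern set f1."""
    total = 0.0
    for pattern, freq in f1.items():
        if all(e in previous_mapping for e in pattern):
            image = tuple(previous_mapping[e] for e in pattern)
            total += similarity(freq, f2.get(image, 0.0))
    return total


def prune(frontier, bound):
    """Keep only nodes whose upper bound g + h can still reach the bound."""
    return [node for node in frontier if node.g + node.h >= bound]


# Hypothetical frequencies after a new pattern (A, B) has been identified.
f1 = {("A",): 1.0, ("B",): 1.0, ("A", "B"): 0.5}
f2 = {("1",): 1.0, ("2",): 1.0, ("1", "2"): 0.5}
bound = previous_answer_bound({"A": "1", "B": "2"}, f1, f2)
frontier = [Node({"A": "2"}, 1.0, 1.5), Node({"A": "1"}, 1.0, 2.0)]
print(bound)                                         # 3.0
print([n.mapping for n in prune(frontier, bound)])   # [{'A': '1'}]
```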

• Real-life data set: collected from a bus manufacturer
• True mapping: generated manually by domain experts
• Criteria: accuracy of event matching
  ◦ F-measure of precision and recall
• Baselines: Opaque matching [1], Iterative matching [2]

No. of event logs   38     Min. event size   2
No. of traces       3000   Max. event size   11

[1] J. Kang and J. F. Naughton. On schema matching with opaque column names and data values. In SIGMOD Conference, pages 205–216.
[2] S. Nejati, M. Sabetzadeh, M. Chechik, S. M. Easterbrook, and P. Zave. Matching and merging of statecharts specifications. In ICSE, pages 54–64.
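The accuracy criterion can be computed by comparing the event pairs of a found mapping with the expert-provided true mapping. A sketch assuming precision and recall are counted over mapped event pairs and the F-measure is their harmonic mean (F1); the mappings in the example are hypothetical.

```python
def evaluate(found, truth):
    """Precision, recall, and F-measure of a found event mapping against the truth."""
    found_pairs = set(found.items())
    true_pairs = set(truth.items())
    correct = len(found_pairs & true_pairs)
    precision = correct / len(found_pairs) if found_pairs else 0.0
    recall = correct / len(true_pairs) if true_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


truth = {"A": "1", "B": "2", "C": "3", "D": "4"}
found = {"A": "1", "B": "2", "C": "4"}   # two of three pairs are correct
p, r, f = evaluate(found, truth)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.5 0.571
```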

[Figure: experimental results; legend entry: Our Approach.]

• More patterns yield higher accuracy.
• Pay-as-you-go strategies accelerate re-computing the event matching when new patterns are identified.

• Pattern-based generic framework
  ◦ Vertex + edge + complex patterns
  ◦ Compatible with existing methods
• An advanced bounding function
• Supports matching in a pay-as-you-go style

Thanks!