Efficient Mining of Recurrent Rules from a Sequence Database

Presentation transcript:

1 Efficient Mining of Recurrent Rules from a Sequence Database
David Lo †*, joint work with Siau-Cheng Khoo † and Chao Liu ‡
† Prog. Lang. & Sys. Lab, Dept. of Comp. Science, National Uni. of Singapore (current: Sch. of Info. Systems, Singapore Management Uni.)
‡ Data Mining Group, Department of Computer Science, Uni. of Illinois at Urbana-Champaign (current: Microsoft Research, Redmond)

2 Motivation
o Huge amounts of data exist, and we want to mine knowledge from them.
o Recurrent rules: “Whenever a series of precedent events (pre) occurs, eventually another series of consequent events (post) occurs.” Denoted as: pre -> post
o We want to mine recurrent rules from a sequence database.

3 Recurrent Rules – Intuitive Examples
o Locking protocol: “Whenever a lock is acquired, eventually it is released.”
o Internet banking: “Whenever a connection to a bank server is made and authentication is completed, and a money transfer command is issued and verified, eventually money is transferred and a notification is displayed.”

4 Soft. Specifications & Recurrent Rules
o Recurrent rules
– Correspond to a family of program properties useful for software verification
– Are formalized in Linear Temporal Logic
o Software specs are often incomplete or outdated, which motivates mining for them [ABL02,DSB04,LKL07]
o Mining specifications helps in:
– Understanding existing/legacy systems
– Helping verification tools ensure correctness of systems and detect bugs

5 Problem Statements
Problem 1: “Given a set of sequences, find rules that recur (are satisfied) a significant number of times within a sequence and across multiple sequences. A rule is significant if it satisfies minimum thresholds of supports and confidence.”
Problem 2: “Mine a set of non-redundant significant recurrent rules.”

6 Extending Sequential Rules [S99]
o Sequential rule pre -> post:
– Rules formed by composing sequential patterns [AS95,YHA03,WH04]: series of events supported by (i.e., a sub-sequence of) a significant number of sequences.
– Whenever a sequence is a super-sequence of pre, it will also be a super-sequence of pre++post.
o Recurrent rule:
– Multiple occurrences of the rule’s premise and consequent, both within a sequence and across multiple sequences, are considered.

7 Extending Episode Rules [MTV97]
o Episode rule pre -> post:
– Episode: series of events occurring close together (e.g., in a window).
– Whenever a window is a super-sequence of pre, it will also be a super-sequence of pre++post.
o Recurrent rule:
– Handles multiple sequences
– Breaks the window barrier: it is hard to tell the right window size, and a lock may be separated from its unlock by an arbitrary number of events
– We mine a non-redundant set of rules

8 Preliminaries

9 Linear Temporal Logic (LTL)
o Formalism to precisely specify temporal requirements.
o It works on paths [HR03].
o There are a number of operators:
– G p: Globally, at every point in time, p holds
– F p: At that point in time or eventually (Finally), p holds
– X p: p holds at the neXt point in time
Rule | LTL
<a> -> <b> | G(a -> XF(b))
<a,b> -> <c,d> | G(a -> XG(b -> XF(c ^ XF(d))))
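The two rule/LTL pairs on this slide suggest a regular translation pattern: each premise event opens a `G(e -> X...)` scope, and the consequent becomes a nested chain of `F(e ^ X...)`. The following Python sketch spells that pattern out. The function names `to_ltl` and `_eventually` are hypothetical, and the generalization beyond the two rules shown is an assumption rather than something the slides state.

```python
def _eventually(post):
    """Build F(e1 ^ XF(e2 ^ ...)): consequent events occur in order, eventually."""
    head, rest = post[0], post[1:]
    if not rest:
        return f"F({head})"
    return f"F({head} ^ X{_eventually(rest)})"


def to_ltl(pre, post):
    """Translate a rule pre -> post into an LTL string (assumed pattern).

    pre and post are non-empty lists of event names.
    """
    head, rest = pre[0], pre[1:]
    # After the last premise event, switch to the "eventually" chain.
    body = to_ltl(rest, post) if rest else _eventually(post)
    return f"G({head} -> X{body})"


print(to_ltl(["a"], ["b"]))
print(to_ltl(["a", "b"], ["c", "d"]))
```

Both calls reproduce the formulas listed on the slide, which is the only evidence offered here that the pattern is the intended one.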

10 Checking or Verifying Temporal Logics
o Program: main(x){ if (lock=0) lock;use;unlock;lock; else for i: 1 to 10 lock;use;unlock }
o The program is transformed into an automata model (states: main, lock, use, unlock, end), which is checked for violations of the LTL property.
o Possible traces or sequences:
– main lock use unlock lock end
– main lock use unlock lock use unlock end
– main lock use unlock end
– …

11 Concepts, Definitions And Rules Semantics

12 Temporal Points
“Whenever a series of precedent events occurs at a point in time or temporal point, eventually another series of consequent events occurs.”
– Peek at interesting temporal points and see what series of events are likely to happen next.
– Temporal points in a sequence S: the indices in S, starting from 1. For example, a sequence of six events has 6 temporal points.
– For a temporal point j in S = <a1, a2, ..., an>, the prefix <a1, ..., aj> of S is called the j-prefix of S.

13 Occurrences & Instances
o Consider a pattern P and a sequence S.
o The set of all occurrences of P in S, Occ(P,S), is the set: {j | P is a sub-sequence of the j-prefix of S && last(P) = S[j]}
o The set of all instances of P in S, Inst(P,S), is the set: {j-prefix of S | j is in Occ(P,S)}
o Example: if the set of occurrences of a pattern in a sequence is {2,4,6}, its instances are the 2-prefix, 4-prefix and 6-prefix of that sequence.
– These correspond to the temporal points to be checked for rules with the pattern as premise.
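The two set definitions can be computed directly. The sketch below is a minimal Python rendering under the assumption that sequences and patterns are plain lists of event names; the function names and the example sequence are hypothetical, since the sequence used on the slide is not available here.

```python
def is_subsequence(pattern, seq):
    """True if pattern's events appear in seq in order (gaps allowed)."""
    it = iter(seq)
    return all(event in it for event in pattern)


def occurrences(pattern, seq):
    """Occ(P,S): 1-based temporal points j such that P is a sub-sequence
    of the j-prefix of S and last(P) == S[j]."""
    return [j for j in range(1, len(seq) + 1)
            if seq[j - 1] == pattern[-1] and is_subsequence(pattern, seq[:j])]


def instances(pattern, seq):
    """Inst(P,S): the j-prefixes of S for every j in Occ(P,S)."""
    return [seq[:j] for j in occurrences(pattern, seq)]


# Hypothetical example data, chosen so that Occ comes out as {2, 4, 6}.
S = ["a", "b", "a", "b", "a", "b"]
P = ["a", "b"]
print(occurrences(P, S))  # [2, 4, 6]
print(instances(P, S))
```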

14 Projected and Projected-all DB
o A database SeqDB projected on pattern P is defined as: $SeqDB_P$ = {(j,sx) | s = SeqDB[j], s = px++sx, where px is the minimal prefix of s containing P}
[Tables: SeqDB with sequences S1 and S2, and its projection on P with entries S1 and S2]

15 Projected and Projected-all DB
o A database SeqDB projected-all on pattern P is defined as: $SeqDB^{all}_P$ = {(j,sx) | s = SeqDB[j], s = px++sx, where px is an instance of P}
o Returns the temporal points to check
[Tables: SeqDB with sequences S1 and S2, and $SeqDB^{all}$ with entries S1 i, S1 ii, S2 i and S2 ii]
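The difference between the two projections is easiest to see side by side: projection keeps one suffix per sequence (after the first, minimal occurrence), while projected-all keeps one suffix per instance. The Python sketch below assumes list-based sequences; the function names and the small lock/unlock database are hypothetical illustrations, not the data from the slides.

```python
def is_subsequence(pattern, seq):
    it = iter(seq)
    return all(event in it for event in pattern)


def occurrences(pattern, seq):
    """1-based temporal points where an instance of pattern ends."""
    return [j for j in range(1, len(seq) + 1)
            if seq[j - 1] == pattern[-1] and is_subsequence(pattern, seq[:j])]


def project(db, pattern):
    """SeqDB_P: (sequence id, suffix after the minimal prefix containing P)."""
    out = []
    for sid, seq in enumerate(db, start=1):
        occ = occurrences(pattern, seq)
        if occ:  # the first occurrence ends the minimal prefix
            out.append((sid, seq[occ[0]:]))
    return out


def project_all(db, pattern):
    """SeqDB^all_P: one (sequence id, suffix) entry per instance of P."""
    return [(sid, seq[j:])
            for sid, seq in enumerate(db, start=1)
            for j in occurrences(pattern, seq)]


# Hypothetical database of two sequences.
DB = [["lock", "unlock", "lock", "unlock"],
      ["lock", "use", "unlock", "lock"]]
print(project(DB, ["lock"]))
print(project_all(DB, ["lock"]))
```

In this example the projected-all database has four entries, one for each temporal point at which `lock` occurs, including an empty suffix for the final `lock` of the second sequence.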

16 Counting Supports and Confidence
o Consider the rule pre -> post.
o Sequence support (s-sup): the number of sequences in which the premise pre appears.
o Instance support (i-sup): the number of instances of pre++post.
o Confidence (conf): the likelihood that post appears after pre. This can be found by computing the ratio:
$$conf(pre \to post) = \frac{|\text{instances of } pre \text{ where } post \text{ eventually occurs afterwards}|}{|\text{instances of } pre|} = \frac{|(SeqDB^{all}_{pre})_{post}|}{|SeqDB^{all}_{pre}|}$$

17 Counting Supports and Confidence s-sup ( -> ) = 2 i-sup ( -> ) = 3 conf( -> ) = 1.0 conf( -> ) = 0.5 Seq ID.Sequence S1 S2 X X
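The three measures can be computed directly from their definitions. The following Python sketch does so by brute force; the function names and the two-sequence lock/unlock database are hypothetical and are not the example from the slide, whose rules and sequences are not available here.

```python
def is_subsequence(pattern, seq):
    it = iter(seq)
    return all(event in it for event in pattern)


def occurrences(pattern, seq):
    """1-based temporal points where an instance of pattern ends."""
    return [j for j in range(1, len(seq) + 1)
            if seq[j - 1] == pattern[-1] and is_subsequence(pattern, seq[:j])]


def s_sup(pre, db):
    """Sequence support: number of sequences in which pre appears."""
    return sum(1 for seq in db if is_subsequence(pre, seq))


def i_sup(pre, post, db):
    """Instance support: number of instances of pre++post."""
    return sum(len(occurrences(pre + post, seq)) for seq in db)


def conf(pre, post, db):
    """Fraction of instances of pre that are eventually followed by post."""
    suffixes = [seq[j:] for seq in db for j in occurrences(pre, seq)]
    if not suffixes:
        return 0.0
    return sum(is_subsequence(post, sx) for sx in suffixes) / len(suffixes)


# Hypothetical database: the last lock in the second sequence is never released.
DB = [["lock", "unlock", "lock", "unlock"],
      ["lock", "use", "unlock", "lock"]]
pre, post = ["lock"], ["unlock"]
print(s_sup(pre, DB), i_sup(pre, post, DB), conf(pre, post, DB))  # 2 3 0.75
```

Here `lock` has four instances and three of them are followed by `unlock`, so the confidence is 3/4.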

18 Properties, Theorems, and Algorithms

19 Apriori Properties – Support & Conf.
o Theorem 1. Consider two rules Rx = p -> c and Ry = q -> c. If p $\sqsubseteq$ q (p is a sub-sequence of q) and s-sup(Rx) < min-s-sup, then s-sup(Ry) < min-s-sup.
– Example: Rx: a -> z with s-sup(Rx) < min-s-sup. Non-significant Ry's: a,b -> z; a,b,c -> z; a,c -> z; a,b,d -> z; …
o Theorem 2. Consider two rules Rx = p -> c and Ry = p -> d. If c $\sqsubseteq$ d and conf(Rx) < min-conf, then conf(Ry) < min-conf.
– Example: Rx: a -> z with conf(Rx) < min-conf. Non-significant Ry's: a -> b,z; a -> b,c,z; a -> c,z; a -> b,d,z; …

20 Rule Redundancy
o Consider two rules Rx = p -> c and Ry = q -> d. Rx is redundant if the following conditions hold:
1. Rx is a sub-sequence of Ry (i.e., p++c $\sqsubseteq$ q++d)
2. Rx and Ry have the same support and confidence values.
o Redundant rules are identified and removed early during the mining process.
o Example: the rules a -> b; a -> c; a -> b,c; a -> b,d; … are redundant with respect to a -> b,c,d iff their support and confidence values are the same.
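The two redundancy conditions translate into a short predicate. The Python sketch below assumes each rule is carried around as a tuple `(pre, post, sup, conf)` whose measures have already been computed; that representation, the function names, and the numeric values in the example are all hypothetical.

```python
def is_subsequence(pattern, seq):
    it = iter(seq)
    return all(event in it for event in pattern)


def is_redundant(rx, ry):
    """True if rule rx is redundant with respect to rule ry.

    Each rule is (pre, post, sup, conf), with pre and post as tuples of
    events and sup as an (s_sup, i_sup) pair.
    """
    p, c, sup_x, conf_x = rx
    q, d, sup_y, conf_y = ry
    return (rx != ry
            and is_subsequence(p + c, q + d)      # condition 1
            and sup_x == sup_y and conf_x == conf_y)  # condition 2


# Hypothetical rules with made-up measures.
short = (("a",), ("b",), (2, 3), 0.75)
long_same = (("a",), ("b", "c"), (2, 3), 0.75)
long_diff = (("a",), ("b", "c"), (2, 2), 0.5)
print(is_redundant(short, long_same))  # True
print(is_redundant(short, long_diff))  # False
```

One case this sketch does not settle is two distinct rules with the same concatenation pre++post and the same measures: each is then redundant with respect to the other, so some tie-breaking choice is needed, and the slides do not say which.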

21 Theorem 3. Given two pre-conditions PX and PY where PX $\sqsubseteq$ PY, if $SeqDB_{PX} = SeqDB_{PY}$, then for all sequences of events post, the rule PX -> post is rendered redundant by PY -> post.
Theorem 4. Given two rules RX (pre -> CX) and RY (pre -> CY), if CX $\sqsubseteq$ CY and $(SeqDB^{all}_{pre})_{CX} = (SeqDB^{all}_{pre})_{CY}$, then RX is rendered redundant by RY and can be pruned.

22 Algorithm
o Step 1: Mine a pruned set of pre-conditions
– Satisfy the min-s-sup threshold
– Use Theorems 1 & 3
o Step 2: For each pre-condition pre, create $SeqDB^{all}_{pre}$.
o Step 3: Mine a pruned set of post-conditions
– Corresponding rules satisfy min-conf
– Use Theorems 2 & 4
o Step 4: Remove rules that don’t satisfy min-i-sup.
o Step 5: Filter any remaining redundant rules.
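To make the five steps concrete, here is a deliberately naive Python sketch that follows the same order of checks but enumerates every candidate pattern up to a length bound instead of using the pruning from Theorems 1 to 4. It is not the algorithm from the talk. The name `mine_naive`, the `max_len` bound, the example database and the thresholds are all hypothetical, and the redundancy filter only compares a rule against strictly longer rules (a simplification that sidesteps ties).

```python
from itertools import product


def is_subsequence(pattern, seq):
    it = iter(seq)
    return all(event in it for event in pattern)


def occurrences(pattern, seq):
    return [j for j in range(1, len(seq) + 1)
            if seq[j - 1] == pattern[-1] and is_subsequence(pattern, seq[:j])]


def mine_naive(db, min_s_sup, min_i_sup, min_conf, max_len=2):
    events = sorted({e for seq in db for e in seq})
    patterns = [p for n in range(1, max_len + 1)
                for p in product(events, repeat=n)]
    rules = {}
    for pre in patterns:
        # Step 1: keep pre-conditions meeting min-s-sup (no pruning here).
        s_sup = sum(1 for seq in db if is_subsequence(pre, seq))
        if s_sup < min_s_sup:
            continue
        # Step 2: projected-all database of pre (suffix after each instance).
        proj_all = [seq[j:] for seq in db for j in occurrences(pre, seq)]
        if not proj_all:
            continue
        for post in patterns:
            # Step 3: keep post-conditions whose rule meets min-conf.
            conf = sum(is_subsequence(post, sx) for sx in proj_all) / len(proj_all)
            if conf < min_conf:
                continue
            # Step 4: drop rules below min-i-sup.
            i_sup = sum(len(occurrences(pre + post, seq)) for seq in db)
            if i_sup >= min_i_sup:
                rules[(pre, post)] = (s_sup, i_sup, conf)

    # Step 5: drop a rule if a strictly longer rule with identical measures
    # contains it as a sub-sequence.
    def redundant(rx):
        x = rx[0] + rx[1]
        return any(len(x) < len(ry[0] + ry[1]) and rules[ry] == rules[rx]
                   and is_subsequence(x, ry[0] + ry[1]) for ry in rules)

    return {r: m for r, m in rules.items() if not redundant(r)}


DB = [["lock", "unlock", "lock", "unlock"],
      ["lock", "use", "unlock", "lock"]]
result = mine_naive(DB, min_s_sup=2, min_i_sup=2, min_conf=0.7)
print(result[(("lock",), ("unlock",))])  # (2, 3, 0.75)
```

Exhaustive enumeration grows exponentially with `max_len`, which is the cost the apriori properties and redundancy theorems are meant to avoid.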

23 Equiv. Proj DB & LS-Set Patterns
o From Theorem 3 (& 4), a pre- (post-) condition is not pruned iff there does not exist any super-sequence pattern having the same projected database.
o Such patterns are also referred to as projected-database closed or LS-Set (Yan and Han, 2003).
o We generate this set by modifying BIDE (Wang and Han, 2004):
– Keep the search space pruning strategy
– Remove the closure checks
– Proof of completeness in technical report

24 Algorithm stages: Mine Pruned Pre-Conds; Mine Pruned Post-Conds; Check Instance Support & Remove Remaining Red. Rules

25 Performance & Case Study

26 Synthetic Dataset D5C20N10S20: 147x faster, 8500x more compact

27 Gazelle Dataset (KDD Cup 2000): the full set of significant rules is not minable

28 JBoss Security
Premise | Consequent (events in the order they appear on the slide):
– XLoginConfImpl.getCfgEntry()
– AuthenticationInfo.getName()
– ClientLoginModule.initialize()
– ClientLoginModule.login()
– ClientLoginModule.commit()
– SecAssocActs.setPrincipalInfo()
– SetPrincipalInfoAction.run()
– SecAssocActs.pushSubjectCtx()
– SubjectThdLocalStack.push()
– SimplePrincipal.toString()
– SecAssoc.getPrincipal()
– SecAssoc.getCredential()
– SecAssoc.getPrincipal()
– SecAssoc.getCredential()
Whenever login configuration information is checked, eventually invocations of authentication events, binding of principal to subject, and utilization of subject & principal information occur.

29 Conclusion
o We propose a novel framework to mine a non-redundant set of significant recurrent rules: “Whenever a series of precedent events occurs, eventually a series of consequent events occurs.”
o We employ 2 apriori properties and 2 redundancy theorems.
o Major speedup and reduction in the number of rules through the non-redundant rule mining strategy.
o We show its utility in mining the behavior of JBoss Security.
Future Work
o Improve mining speed
o More case studies and applications to DM/SE problems