Zhou Zhao, Da Yan and Wilfred Ng

Presentation transcript:

Mining Probabilistically Frequent Sequential Patterns in Uncertain Databases Zhou Zhao, Da Yan and Wilfred Ng The Hong Kong University of Science and Technology

Outline
  Background
  Problem Definition
  Sequence-Level U-PrefixSpan
  Element-Level U-PrefixSpan
  Experiments
  Conclusion

Background
Uncertain data are inherent in many real-world applications, such as sensor networks and RFID tracking.
Sensor network example (readings A, B, C):
  Sensor 2: AB   Prob. = 0.9
  Sensor 1: BC   Prob. = 0.1
RFID tracking example (Reader A, Reader B, Reader C):
  t1: (A, 0.95)
  t2: (B, 0.95), (C, 0.05)


Problem Definition  

Pruning rules for p-FSP  

Early Validating
Suppose that pattern α is p-frequent on D' ⊆ D; then α is also p-frequent on D.
If α is a p-FSP in D11, then α is a p-FSP in D.
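The early-validating rule can be illustrated with a small sketch. It assumes the usual definition from this line of work, which the transcript does not preserve: α is p-frequent when Pr(sup(α) ≥ τ_sup) ≥ τ_prob, with sequences independent of each other. The function name, the thresholds and the per-sequence probabilities below are all made up for illustration.

```python
def prob_support_at_least(probs, min_sup):
    """Pr(sup >= min_sup) when sequence i contains the pattern independently
    with probability probs[i] (tail of a Poisson-binomial distribution)."""
    # dist[k] = Pr(exactly k of the sequences seen so far contain the pattern)
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # this sequence does not contain it
            nxt[k + 1] += mass * p       # this sequence contains it
        dist = nxt
    return sum(dist[min_sup:])

# Hypothetical containment probabilities (not from the slides).
subset = [0.9, 0.8, 0.7]        # sequences in D'
full = subset + [0.5]           # sequences in D, a superset of D'
tau_sup, tau_prob = 2, 0.8      # assumed thresholds

on_subset = prob_support_at_least(subset, tau_sup)
on_full = prob_support_at_least(full, tau_sup)
print(round(on_subset, 4), round(on_full, 4))
print(on_subset >= tau_prob, on_full >= tau_prob)
```

Adding sequences can only increase the support, so the tail probability on D is never smaller than on D'. Once the check passes on D', the remaining sequences need not be examined.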


Sequence-level probabilistic model
DB:
  Sequence ID   Instances   Probability
  s1            s11 = ABC   1
  s2            s21 = AB    0.9
                s22 = BC    0.05
Possible world space:
  Possible World      Probability
  pw1 = {s11, s21}    0.9
  pw2 = {s11, s22}    0.05
  pw3 = {s11}         0.05
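A minimal sketch of how the possible world space follows from the DB table. It assumes that sequences are independent and that any probability mass an instance list leaves unassigned means the sequence is absent. The function and variable names are illustrative.

```python
from itertools import product

# Sequence-level uncertain database from the slide: each sequence lists
# mutually exclusive instances with their probabilities.
db = {
    "s1": [("ABC", 1.0)],
    "s2": [("AB", 0.9), ("BC", 0.05)],
}

def possible_worlds(db):
    """Yield (world, probability) pairs, assuming independent sequences."""
    choices = []
    for sid, instances in db.items():
        opts = [(sid, inst, p) for inst, p in instances]
        absent = 1.0 - sum(p for _, p in instances)
        if absent > 1e-12:               # leftover mass: sequence is absent
            opts.append((sid, None, absent))
        choices.append(opts)
    for combo in product(*choices):
        world = {sid: inst for sid, inst, _ in combo if inst is not None}
        prob = 1.0
        for _, _, p in combo:
            prob *= p
        yield world, prob

worlds = list(possible_worlds(db))
for world, prob in worlds:
    print(world, round(prob, 4))
```

The three worlds produced correspond to pw1, pw2 and pw3, and their probabilities sum to 1.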

Prefix-projection of PrefixSpan
D:
  SID   Sequence
  s1    ABCBC
  s2    BABC
  s3    AB
  s4    BC
D|A (D projected on A):
  SID   Sequence
  s1    _BCBC
  s2    _BC
  s3    _B
D|AB (D|A projected on B):
  SID   Sequence
  s1    _CBC
  s2    _C
  s3    _
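The projection step can be sketched in a few lines, assuming single-item elements written as characters of a string, as in the slide's example. The name `project` is illustrative, and the slide's leading `_` marker is left out of the stored suffixes.

```python
def project(db, item):
    """Return the projected database: for each sequence containing `item`,
    keep the suffix that follows its first occurrence."""
    projected = {}
    for sid, seq in db.items():
        pos = seq.find(item)
        if pos != -1:                    # sequences without `item` drop out
            projected[sid] = seq[pos + 1:]
    return projected

D = {"s1": "ABCBC", "s2": "BABC", "s3": "AB", "s4": "BC"}
D_A = project(D, "A")      # D|A
D_AB = project(D_A, "B")   # D|AB, obtained by projecting D|A on B
print(D_A)
print(D_AB)
```

Projecting the already projected database, rather than rescanning D, is what lets pattern growth work on ever smaller suffixes.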

P-FSP anti-monotonicity.  

SeqU-PrefixSpan Algorithm
SeqU-PrefixSpan recursively performs pattern growth from the previous pattern α to the current pattern β = αe by appending a p-frequent element e ∈ D|α.
We can stop growing a pattern α as soon as we find that α is p-infrequent.

Sequence Projection
si --A--> si|A --B--> si|B
si:
  Seq-Instances   Prob.
  si1 = ABCBC     0.3
  si2 = BABC      0.2
  si3 = AB        0.4
  si4 = BC        0.1
si|A:
  Seq-Instances   Prob.
  si1 = _BCBC     0.3
  si2 = _BC       0.2
  si3 = _B        0.4
si|B (si|A projected on B):
  Seq-Instances   Prob.
  si1 = _CBC      0.3
  si2 = _C        0.2
  si3 = _         0.4
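A short sketch of projection with probabilities attached. It assumes, as in the sequence-level model, that the instances of one sequence are mutually exclusive, so the probability that si contains the pattern is the sum over the instances that survive the projection. The function name is illustrative.

```python
def project_instances(instances, item):
    """Project every (instance, probability) pair on `item`, dropping
    instances that do not contain it."""
    projected = []
    for seq, prob in instances:
        pos = seq.find(item)
        if pos != -1:
            projected.append((seq[pos + 1:], prob))
    return projected

si = [("ABCBC", 0.3), ("BABC", 0.2), ("AB", 0.4), ("BC", 0.1)]

si_A = project_instances(si, "A")            # the si|A table
pr_contains_A = sum(p for _, p in si_A)      # Pr(A occurs in si)
print(si_A)
print(round(pr_contains_A, 4))
```

Only si4 = BC lacks an A, so the containment probability is 1 minus its probability of 0.1.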


Element-level probabilistic model
DB:
  Sequence ID   Probabilistic Elements
  s1            s1[1] = {(A,0.95)}, s1[2] = {(B,0.95),(C,0.05)}
  s2            s2[1] = {(A,1)}, s2[2] = {(B,1)}
Possible world space:
  Possible World     Probability
  pw1 = {B, AB}      0.0475
  pw2 = {C, AB}      0.0025
  pw3 = {AB, AB}     0.9025
  pw4 = {AC, AB}     0.0475
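A minimal sketch of how the element-level possible worlds can be enumerated. It assumes that elements are independent and that an element whose probabilities sum to less than 1 may be absent, which is what a world such as {B, AB} implies for s1[1]. The names are illustrative.

```python
from itertools import product

# Element-level uncertain database from the slide.
db = {
    "s1": [[("A", 0.95)], [("B", 0.95), ("C", 0.05)]],
    "s2": [[("A", 1.0)], [("B", 1.0)]],
}

def sequence_instances(elements):
    """Yield (instance, probability) for one uncertain sequence."""
    options = []
    for elem in elements:
        opts = list(elem)
        absent = 1.0 - sum(p for _, p in elem)
        if absent > 1e-12:               # leftover mass: element is absent
            opts.append(("", absent))
        options.append(opts)
    for combo in product(*options):
        prob = 1.0
        for _, p in combo:
            prob *= p
        yield "".join(item for item, _ in combo), prob

def possible_worlds(db):
    """Yield (world, probability), one instance per sequence."""
    per_sequence = [list(sequence_instances(e)) for e in db.values()]
    for combo in product(*per_sequence):
        prob = 1.0
        for _, p in combo:
            prob *= p
        yield tuple(inst for inst, _ in combo), prob

worlds = dict(possible_worlds(db))
for world, prob in worlds.items():
    print(world, round(prob, 4))
```

The four worlds match pw1 to pw4, with s1 taking the instance B, C, AB or AC and s2 always being AB.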

Possible world explosion
Probabilistic Elements:
  si[1] = {(A,0.7), (B,0.3)}
  si[2] = {(B,0.2), (C,0.8)}
  si[3] = {(C,0.4), (A,0.6)}
  si[4] = {(B,0.1), (A,0.9)}
The number of possible instances is exponential in the sequence length.
  Seq-Instance       Prob.      Seq-Instance       Prob.
  pw1(si) = ABCB     0.0056     pw9(si) = BBCB     0.0024
  pw2(si) = ABCA     0.0504     pw10(si) = BBCA    0.0216
  pw3(si) = ABAB     0.0084     pw11(si) = BBAB    0.0036
  pw4(si) = ABAA     0.0756     pw12(si) = BBAA    0.0324
  pw5(si) = ACCB     0.0224     pw13(si) = BCCB    0.0096
  pw6(si) = ACCA     0.2016     pw14(si) = BCCA    0.0864
  pw7(si) = ACAB     0.0336     pw15(si) = BCAB    0.0144
  pw8(si) = ACAA     0.3024     pw16(si) = BCAA    0.1296
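The table can be reproduced by brute force, which also shows why full expansion does not scale: every element with two alternatives doubles the number of instances. This is a sketch assuming independent elements, with illustrative variable names.

```python
from itertools import product
from math import prod

si = [
    [("A", 0.7), ("B", 0.3)],
    [("B", 0.2), ("C", 0.8)],
    [("C", 0.4), ("A", 0.6)],
    [("B", 0.1), ("A", 0.9)],
]

# One instance per combination of element choices; its probability is the
# product of the chosen items' probabilities.
instances = {
    "".join(item for item, _ in combo): prod(p for _, p in combo)
    for combo in product(*si)
}

print(len(instances))                    # 2 ** 4 instances
print(round(instances["ABCB"], 4))
print(round(instances["ACAA"], 4))
```

With n elements of two alternatives each, the dictionary holds 2^n entries, so a sequence of length 40 would already need about 10^12 of them.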

ElemU-PrefixSpan Algorithm  

Probabilistic Elements Sequence Projection
Probabilistic Elements:
  si[1] = {(A,0.7), (B,0.3)}
  si[2] = {(B,0.2), (C,0.8)}
  si[3] = {(C,0.4), (A,0.6)}
  si[4] = {(B,0.1), (A,0.9)}
Unprojected sequence:
  pos   suffix                    Pr.
        _si[1]si[2]si[3]si[4]     1
After projecting on B:
  pos   suffix                    Pr.
  1     _si[2]si[3]si[4]
  2     _si[3]si[4]
  4     _
After further projecting on A:
  pos   suffix                    Pr.
  3     _si[4]
  4     _                         0.1584
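One reading of the Pr. column that is consistent with the 0.1584 entry is: the probability that the first match of the pattern ends at position pos. The sketch below checks that reading by brute-force enumeration of all instances. It is not the ElemU-PrefixSpan procedure itself, which is designed to avoid this enumeration, and the function names are illustrative.

```python
from collections import defaultdict
from itertools import product
from math import prod

si = [
    [("A", 0.7), ("B", 0.3)],
    [("B", 0.2), ("C", 0.8)],
    [("C", 0.4), ("A", 0.6)],
    [("B", 0.1), ("A", 0.9)],
]

def first_end_position(instance, pattern):
    """1-based position where the leftmost match of `pattern` as a
    subsequence of `instance` ends, or None if there is no match."""
    matched = 0
    for pos, item in enumerate(instance, start=1):
        if item == pattern[matched]:
            matched += 1
            if matched == len(pattern):
                return pos
    return None

def end_position_probs(elements, pattern):
    """Sum instance probabilities, grouped by first end position."""
    probs = defaultdict(float)
    for combo in product(*elements):
        instance = "".join(item for item, _ in combo)
        pos = first_end_position(instance, pattern)
        if pos is not None:
            probs[pos] += prod(p for _, p in combo)
    return dict(probs)

for pattern in ("B", "BA"):
    result = end_position_probs(si, pattern)
    print(pattern, {pos: round(p, 4) for pos, p in sorted(result.items())})
```

Under this reading the match positions are 1, 2 and 4 for B and 3 and 4 for BA, the same positions the slide lists, and position 4 for BA carries probability 0.1584.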

   


Efficiency of SeqU-PrefixSpan
Running time as a function of: size of database; number of seq-instances; length of sequence.

Efficiency of ElemU-PrefixSpan
Running time as a function of: size of database; number of element-instances; length of sequence.

ElemU-PrefixSpan vs. Full Expansion
Running time as a function of: size of database; number of element-instances; length of sequence.


Conclusion
We formulate the problem of mining p-FSPs in uncertain databases.
We propose two new U-PrefixSpan algorithms to mine p-FSPs from data that conform to our probabilistic models.
Experiments show that our algorithms effectively avoid the problem of "possible world explosion".

Thank you!