Sequential PAttern Mining using A Bitmap Representation

Sequential PAttern Mining using A Bitmap Representation
Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick, Dept. of Computer Science, Cornell University (SIGKDD 2002)
Presenters: 李佩書 P76034525, 楊璨瑜 P76034672, 陳奕廷 P78031125, 李昕純 Q56034035
2014/11/20

Outline
Introduction
The SPAM algorithm
Data representation
Experimental results
Conclusion & Discussion

Introduction

Sequential Patterns
Introduced by R. Agrawal and R. Srikant (ICDE 1995).
Algorithms: AprioriAll, AprioriSome, PrefixSpan…

Problem
Mining sequential patterns: given a minimum support minSup, find all frequent sequential patterns Sa, i.e. all Sa with supD(Sa) ≥ minSup.
Here supD(Sa) is the number of customers in database D whose transaction sequence contains Sa.
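The support definition above can be sketched in a few lines of Python. This is only an illustration: the toy database `DB`, the helper names `contains` and `support`, and the representation of a sequence as a tuple of frozensets are assumptions made for this sketch, not part of the slides.

```python
from typing import FrozenSet, List, Tuple

Seq = Tuple[FrozenSet[str], ...]  # a sequence is an ordered tuple of itemsets


def contains(customer: Seq, pattern: Seq) -> bool:
    """True if every itemset of `pattern` is a subset of some transaction
    of `customer`, with the matched transactions in increasing order."""
    pos = 0
    for itemset in pattern:
        # Greedily take the earliest transaction that contains this itemset.
        while pos < len(customer) and not itemset <= customer[pos]:
            pos += 1
        if pos == len(customer):
            return False
        pos += 1
    return True


def support(db: List[Seq], pattern: Seq) -> int:
    """Number of customers whose sequence contains `pattern`."""
    return sum(1 for customer in db if contains(customer, pattern))


def seq(*itemsets) -> Seq:
    return tuple(frozenset(s) for s in itemsets)


# Hypothetical toy database: one transaction sequence per customer.
DB = [
    seq({"a", "b", "d"}, {"b", "c", "d"}, {"b", "c", "d"}),
    seq({"b"}, {"a", "b", "c"}),
    seq({"a", "b"}, {"b", "c", "d"}),
]

MIN_SUP = 2
IS_FREQUENT = support(DB, seq({"a"}, {"b"})) >= MIN_SUP
```

With this toy data, the pattern ({a},{b}) is contained in the first and third customers only, so it is frequent for minSup = 2.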

SPAM Algorithm
SPAM: Sequential PAttern Mining algorithm.
The first depth-first search (DFS) strategy for mining sequential patterns.
Vertical bitmap representation for simple, efficient counting.

The SPAM Algorithm

Lexicographic Tree
Sequence-extended sequence (S-step): generated by adding a new transaction consisting of a single item to the end of the sequence.
Ex: ({a, b, c}, {a, b}) → ({a, b, c}, {a, b}, {a})
Itemset-extended sequence (I-step): generated by adding an item to the last itemset in the sequence.
Ex 1: ({a, b, c}, {a, b}) → ({a, b, c}, {a, b, d})
Ex 2: ({a, b, c}, {a, b, d}) → ({a, b, c}, {a, b, d, c})
Each node n of the tree is associated with two sets:
Sn: the set of candidate items for S-step extensions
In: the set of candidate items for I-step extensions
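The two extension steps can be written as small functions. A minimal sketch, assuming a sequence is a tuple of frozensets; the names `s_extend` and `i_extend` are hypothetical.

```python
from typing import FrozenSet, Tuple

Seq = Tuple[FrozenSet[str], ...]


def s_extend(sequence: Seq, item: str) -> Seq:
    """S-step: append a new single-item transaction to the end."""
    return sequence + (frozenset({item}),)


def i_extend(sequence: Seq, item: str) -> Seq:
    """I-step: add an item to the last itemset of the sequence."""
    return sequence[:-1] + (sequence[-1] | {item},)


# The sequence used in the slide examples: ({a, b, c}, {a, b}).
BASE = (frozenset("abc"), frozenset("ab"))
S_CHILD = s_extend(BASE, "a")  # ({a, b, c}, {a, b}, {a})
I_CHILD = i_extend(BASE, "d")  # ({a, b, c}, {a, b, d})
```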

Lexicographic tree example for I={a,b} (figure)

Pruning
Apriori-based.
Minimizes the size of Sn and In.
Candidates are pruned during the DFS.
S-step pruning
I-step pruning

S-step Pruning
S({a}) = {a, b, c, d}, I({a}) = {b, c, d}
S({a}, {a}) = S({a}, {b}) = {a, b, c, d}
I({a}, {a}) = {b, c, d}
I({a}, {b}) = {c, d}

I-step Pruning
S({a, b}) = S({a, d}) = {a, b}
I({a}, {b}) = {c, d}
I({a}, {d}) = {}
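How the candidate sets Sn and In shrink during the depth-first traversal can be sketched as follows. This is a simplified illustration, not the authors' implementation: support is counted by scanning the sequences rather than with bitmaps, and the names `dfs`, `mine`, and the toy database `DB` are assumptions for this sketch.

```python
from typing import FrozenSet, List, Tuple

Seq = Tuple[FrozenSet[str], ...]


def contains(customer: Seq, pattern: Seq) -> bool:
    """Greedy in-order matching of pattern itemsets against transactions."""
    pos = 0
    for itemset in pattern:
        while pos < len(customer) and not itemset <= customer[pos]:
            pos += 1
        if pos == len(customer):
            return False
        pos += 1
    return True


def support(db: List[Seq], pattern: Seq) -> int:
    return sum(1 for customer in db if contains(customer, pattern))


def dfs(db, node, s_cands, i_cands, min_sup, out):
    """Visit `node`, then recurse only into frequent S- and I-extensions."""
    out.append(node)
    # S-step pruning: items whose S-extension is infrequent are dropped
    # from the candidate lists of every descendant of this node.
    s_temp = [i for i in s_cands
              if support(db, node + (frozenset({i}),)) >= min_sup]
    # I-step pruning: the same idea for itemset extensions.
    i_temp = [i for i in i_cands
              if support(db, node[:-1] + (node[-1] | {i},)) >= min_sup]
    for i in s_temp:
        dfs(db, node + (frozenset({i}),), s_temp,
            [j for j in s_temp if j > i], min_sup, out)
    for i in i_temp:
        dfs(db, node[:-1] + (node[-1] | {i},), s_temp,
            [j for j in i_temp if j > i], min_sup, out)


def mine(db: List[Seq], min_sup: int) -> List[Seq]:
    items = sorted({x for customer in db for t in customer for x in t})
    frequent = [i for i in items
                if support(db, (frozenset({i}),)) >= min_sup]
    out: List[Seq] = []
    for i in frequent:
        dfs(db, (frozenset({i}),), frequent,
            [j for j in frequent if j > i], min_sup, out)
    return out


# Hypothetical toy database: one transaction sequence per customer.
DB = [
    (frozenset("abd"), frozenset("bcd"), frozenset("bcd")),
    (frozenset("b"), frozenset("abc")),
    (frozenset("ab"), frozenset("bcd")),
]
PATTERNS = mine(DB, 2)
```

In this sketch the I-step candidates of a child are restricted to items greater than the item just added, so each pattern is generated exactly once.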

Data Representation

We store each candidate sequence as a vertical bitmap.
Each customer is assigned a fixed slice of each bitmap for all of its transactions.
If the size of a sequence is between 2^k + 1 and 2^(k+1), it is treated as a 2^(k+1)-bit sequence.
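The slice-size rule amounts to rounding the number of transactions of a customer up to a power of two. A small sketch; the function name `slice_bits` is hypothetical, and the minimum slice of 4 bits (k ≥ 1) is an assumption of this sketch rather than something stated on the slide.

```python
def slice_bits(n_transactions: int, min_bits: int = 4) -> int:
    """Smallest power of two that is >= n_transactions and >= min_bits.

    A sequence whose size lies between 2^k + 1 and 2^(k+1) is stored in a
    2^(k+1)-bit slice. `min_bits` is an assumed lower bound on slice size.
    """
    bits = min_bits
    while bits < n_transactions:
        bits *= 2
    return bits


# Sizes 5..8 fall between 2^2 + 1 and 2^3, so they all get an 8-bit slice.
SIZES = {n: slice_bits(n) for n in range(1, 10)}
```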

Bitmap of itemset
The bitmap of {a,b} is the bitwise AND of the bitmaps of {a} and {b}: B({a}) & B({b}) = B({a,b}).
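A vertical bitmap per item and the AND that yields an itemset bitmap might look like this in Python. The toy database, the use of one integer per customer slice (bit j stands for the customer's j-th transaction), and the names `item_bitmaps`, `and_bitmaps`, and `bitmap_support` are assumptions of this sketch.

```python
from typing import Dict, List, Set


def item_bitmaps(db: List[List[Set[str]]]) -> Dict[str, List[int]]:
    """One bitmap per item: a list with one integer slice per customer.
    Bit j of a slice is 1 if the item occurs in that customer's j-th
    transaction (j = 0 is the earliest transaction)."""
    bitmaps: Dict[str, List[int]] = {}
    for c, transactions in enumerate(db):
        for j, itemset in enumerate(transactions):
            for item in itemset:
                slices = bitmaps.setdefault(item, [0] * len(db))
                slices[c] |= 1 << j
    return bitmaps


def and_bitmaps(x: List[int], y: List[int]) -> List[int]:
    """Slice-wise bitwise AND of two bitmaps."""
    return [a & b for a, b in zip(x, y)]


def bitmap_support(bitmap: List[int]) -> int:
    """A customer supports the pattern if its slice has any bit set."""
    return sum(1 for s in bitmap if s != 0)


# Hypothetical toy database: transactions listed in time order per customer.
DB = [
    [{"a", "b", "d"}, {"b", "c", "d"}, {"b", "c", "d"}],
    [{"b"}, {"a", "b", "c"}],
    [{"a", "b"}, {"b", "c", "d"}],
]
B = item_bitmaps(DB)
B_AB = and_bitmaps(B["a"], B["b"])  # bitmap of the itemset {a, b}
B_AD = and_bitmaps(B["a"], B["d"])  # bitmap of the itemset {a, d}
```

Counting support then reduces to checking which customer slices are non-zero, with no rescan of the transactions.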

Bitmap of sequence
Define B(s) as the bitmap for sequence s.
Bit j of B(s) is set to 1 if the last itemset of s is contained in transaction j and the other itemsets of s are contained, in order, in transactions before j; otherwise it is set to 0.
Example 1: sequence ({b},{c}), transactions of one customer
Transaction ID   Itemset   B(({b},{c}))
1                {b}       0
2                {d}       0
3                {e}       0
4                {c}       1

Example 2: sequence ({a},{b,d})
Customer ID Transaction ID Itemset 1 {a,b,d} 3 {b,c,d} 6 -- ({a},{b,d}) 1

S-step Process
Step 1: transform the bitmap B({a}) into the transformed bitmap B({a})s.
Step 2: AND the transformed bitmap B({a})s with B({b}) to obtain the bitmap of ({a},{b}).
Support=2
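The two steps can be sketched with integer slices. In the SPAM paper the transformation works per customer: find the first set bit at position k, clear every bit up to and including k, and set every bit after k. The bitmaps below are hypothetical toy data with 4-bit slices (bit 0 is the earliest transaction), and the function names are assumptions of this sketch.

```python
from typing import List

WIDTH = 4  # assumed slice size in bits for every customer


def s_transform(bitmap: List[int], width: int = WIDTH) -> List[int]:
    """Step 1: per customer, set all bits strictly after the first 1."""
    mask = (1 << width) - 1
    out = []
    for s in bitmap:
        if s == 0:
            out.append(0)  # the customer never matched the sequence
        else:
            k = (s & -s).bit_length() - 1  # index of the lowest set bit
            out.append(mask & ~((1 << (k + 1)) - 1))
    return out


def s_step(seq_bitmap: List[int], item_bitmap: List[int]) -> List[int]:
    """Step 2: AND the transformed sequence bitmap with the item bitmap."""
    return [a & b for a, b in zip(s_transform(seq_bitmap), item_bitmap)]


def bitmap_support(bitmap: List[int]) -> int:
    return sum(1 for s in bitmap if s != 0)


# Hypothetical bitmaps for three customers.
B_A = [0b0001, 0b0010, 0b0001]  # transactions that contain a
B_B = [0b0111, 0b0011, 0b0011]  # transactions that contain b

B_A_S = s_transform(B_A)     # positions strictly after the first a
B_A_THEN_B = s_step(B_A, B_B)  # bitmap of the sequence ({a},{b})
```

With this toy data two of the three customer slices remain non-zero after the AND, so the support of ({a},{b}) is 2.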

I-step Process
Support=2
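For the I-step no transformation is needed: the bitmap of the current sequence is ANDed directly with the bitmap of the item being added to the last itemset. A minimal sketch with hypothetical 4-bit slices for three customers (bit 0 is the earliest transaction); the names and data are assumptions for illustration.

```python
from typing import List


def i_step(seq_bitmap: List[int], item_bitmap: List[int]) -> List[int]:
    """AND the sequence bitmap with the item bitmap, slice by slice."""
    return [a & b for a, b in zip(seq_bitmap, item_bitmap)]


def bitmap_support(bitmap: List[int]) -> int:
    return sum(1 for s in bitmap if s != 0)


# Hypothetical bitmaps: B(({a},{b})) marks transactions that contain b
# after an earlier a; B({d}) marks transactions that contain d.
B_A_THEN_B = [0b0110, 0b0000, 0b0010]
B_D = [0b0111, 0b0000, 0b0010]

B_A_THEN_BD = i_step(B_A_THEN_B, B_D)  # bitmap of ({a},{b,d})
```

A bit survives only where b and d occur in the same transaction and a occurred earlier, which is the containment condition for ({a},{b,d}).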

Experimental Results

Comparison With SPADE and PrefixSpan
Method 1: compare for various minimum support values on
Small datasets
Medium datasets
Large datasets
Method 2: compare across several parameters of the dataset
Number of customers
Number of transactions per customer
Number of items per transaction
Average length of the maximal sequences

Conclusion & Discussion

CONCLUSION
ALGORITHM
Outperforms SPADE and PrefixSpan on large datasets
Faster than SPADE and PrefixSpan
DATA REPRESENTATION
Bitmap representation
S-step/I-step traversal
S-step/I-step pruning
Especially efficient when the sequential patterns are very long

Implement SPAM algorithm
SPMF is an open-source data mining framework written in Java.
http://www.philippe-fournier-viger.com/spmf/index.php
Philippe Fournier-Viger, Antonio Gomariz, Ted Gueniche, Azadeh Soltani, Cheng-Wei Wu and Vincent S. Tseng, "SPMF: a Java Open-Source Pattern Mining Library," accepted and to appear in Journal of Machine Learning Research.

DISCUSSION
SPAM assumes that the entire database fits completely into main memory. What is the solution when it does not?
Why do they set the size of a sequence between 2^k + 1 and 2^(k+1)?