Mining Sequential Patterns Authors: Rakesh Agrawal and Ramakrishnan Srikant. Presenter: Jeremy Dalmer.



Introduction. What is a “sequential pattern”? Answers to final exam questions.

What is a “sequential pattern”? It requires a set of attributes that decides each tuple’s class; call this the class set. Example: class set = {customer-id}. Tuples are sorted into classes.

It also requires a set of attributes used for ordering tuples; call this the order set. Example: order set = {transaction-time}. Tuples within each class are sorted according to an order defined over the order set’s codomain.
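The grouping-and-ordering step can be sketched as follows. This is a minimal sketch, not the paper’s code; the row format and the sample transactions are hypothetical, chosen only to illustrate class set = {customer-id} and order set = {transaction-time}:

```python
from collections import defaultdict

def to_sequences(rows):
    """Group rows by the class set (customer-id), then sort each class
    by the order set (transaction-time), keeping only the itemsets."""
    groups = defaultdict(list)
    for customer, time, items in rows:
        groups[customer].append((time, items))
    return {c: [items for _, items in sorted(txs, key=lambda t: t[0])]
            for c, txs in groups.items()}

# Hypothetical transactions: (customer-id, transaction-time, itemset)
rows = [("Joe", 2, {"band-aids"}), ("Sarah", 1, {"knife", "beer"}),
        ("Joe", 1, {"knife", "beer"}), ("Sarah", 2, {"band-aids"})]
seqs = to_sequences(rows)
print(seqs["Joe"])  # Joe's itemsets in transaction-time order
```

Each customer ends up with one ordered sequence of itemsets, which is the unit the rest of the algorithm works on.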

Specifying a value for each attribute in (class set ∪ order set) must specify at most one tuple; that is, class set ∪ order set forms a primary key. Support and confidence now measure classes, not tuples.
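Because support now measures classes, a sequence’s support is the fraction of customer sequences that contain it. A minimal sketch, with illustrative data and helper names (not from the paper):

```python
def contains(customer_seq, pattern):
    """True if pattern is a subsequence of customer_seq: each itemset
    in pattern must be a subset of some later transaction, in order."""
    i = 0
    for basket in customer_seq:
        if i < len(pattern) and pattern[i] <= basket:
            i += 1
    return i == len(pattern)

def support(pattern, sequences):
    """Fraction of classes (customers) whose sequence contains pattern."""
    return sum(contains(s, pattern) for s in sequences) / len(sequences)

# Hypothetical customer sequences: one ordered list of itemsets per class
sequences = [
    [{"knife", "beer"}, {"knife", "band-aids"}],
    [{"knife", "beer"}, {"band-aids"}],
]
print(support(({"knife"}, {"band-aids"}), sequences))  # 1.0
```

Note that a pattern counts at most once per customer, no matter how often it occurs inside that customer’s sequence.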

Example: order set = {transaction-time}, class set = {customer-id}.

Classes: {Joe, Sarah}

Ordering within classes according to order set:

A large sequence (support = 100%) is <{knife, beer}, {Band-Aids}> (intuitive). <{knife}>, <{beer}>, <{knife, beer}>, <{Band-Aids}>, <{knife}, {Band-Aids}>, and <{beer}, {Band-Aids}> are also large sequences.

Example: order set = {year, month}, class set = {}.

Classes: a single class; the class set is empty, so every tuple falls into one class.

Ordering within classes (class, singular!) according to the order set:

Intuition suggests <{goldfish}, {lobster}, {monkey}> is a large sequence, but it is not considered any “larger” than subsequences such as <{goldfish}> and <{lobster}>, because there is only one class.

One more point about the previous example. Having recorded <{goldfish}, {lobster}, {monkey}> as a large sequence, why record subsequences? Subsequences such as <{goldfish}> and <{goldfish}, {lobster}>, though large sequences, are not informative. Hence the notion of a “maximal sequence”.
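The maximality test can be sketched as follows: a large sequence is kept only if it is not contained in another large sequence. Containment lets each itemset of the pattern match a superset, in order. The helper names and data are illustrative:

```python
def is_subseq(a, b):
    """True if sequence a is contained in sequence b: the itemsets of a
    map, in order, to supersets among the itemsets of b."""
    i = 0
    for itemset in b:
        if i < len(a) and a[i] <= itemset:
            i += 1
    return i == len(a)

def maximal(seqs):
    """Keep only sequences not contained in any other sequence."""
    return [s for s in seqs
            if not any(s != t and is_subseq(s, t) for t in seqs)]

seqs = [({"goldfish"},), ({"goldfish"}, {"lobster"}),
        ({"goldfish"}, {"lobster"}, {"monkey"})]
print(maximal(seqs))  # only the full three-itemset sequence survives
```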

Final exam questions. Root of each algorithm: (1) Group tuples into classes and order them. (2) Find all large itemsets. (3) For each tuple, drop everything except a record of the large itemsets contained in that tuple. (4) Find all large sequences (of large itemsets). (5) Discard large sequences that are not maximal.

Consider a previous example.

Large itemsets (min-sup = 100%): {knife}, {beer}, {knife, beer}, {Band-Aids}. Map {knife} to 1, {beer} to 2, {knife, beer} to 3, and {Band-Aids} to 4. The tuples transform to ((1 2 3) (1 4)) and ((1 2 3) (4)).

Large sequences (each actually with 100% support) are ((1)), ((2)), ((3)), ((4)), ((1) (4)), ((2) (4)), and ((3) (4)). But since ((3) (4)) implies all the others, only ((3) (4)) is a maximal large sequence.

Potentially large vs. definitely large (candidate sequences vs. large sequences). Potentially large: no counting needed, but there are many. Definitely large: counting needed, but there are few. The algorithms are similar to Apriori, but work with sequences of large itemsets instead of large sets of items.
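The Apriori-style candidate generation behind both algorithms can be sketched as a join of the large (k-1)-sequences with themselves, followed by a prune. This is a simplified sketch over integer-coded sequences, not the paper’s exact procedure:

```python
def candidate_gen(large_prev):
    """Join pairs of large (k-1)-sequences that agree on their first
    k-2 elements, then prune candidates having a (k-1)-subsequence
    (obtained by dropping one element) that is not itself large."""
    prev = set(large_prev)
    cands = {a + (b[-1],) for a in large_prev for b in large_prev
             if a[:-1] == b[:-1]}
    return sorted(c for c in cands
                  if all(c[:i] + c[i + 1:] in prev for i in range(len(c))))

# From large 1-sequences (1) and (4), the join yields all ordered pairs
print(candidate_gen([(1,), (4,)]))  # [(1, 1), (1, 4), (4, 1), (4, 4)]
```

Unlike the itemset case, order matters here, so both ((1) (4)) and ((4) (1)) are generated, and repeats like ((1) (1)) are legitimate candidates (a customer can buy the same itemset twice).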

AprioriAll counts every large sequence, including those that are not maximal. AprioriSome generates every candidate sequence but skips counting some large sequences (the forward phase); it then discards candidates that are not maximal and counts the remaining sequences (the backward phase).

AprioriAll scans the database more often, taking more time. AprioriSome keeps more potentially large sequences in memory, degenerating to AprioriAll when requests for memory fail.

“There were two types of algorithms presented to find sequential patterns, CountSome and CountAll. What was the main difference between the two algorithms?”

CountAll (AprioriAll) is careful with respect to minimum support, careless with respect to maximality. CountSome (AprioriSome) is careful with respect to maximality, careless with respect to minimum support.

“What was the greatest hardware concern regarding the algorithms contained in the paper?”

Main memory capacity. When there is little main memory, or many potentially large sequences, the benefits of AprioriSome vanish.

“How did the two best sequence mining algorithms (AprioriAll and AprioriSome) perform compared with each other? Take into consideration memory, speed, and usefulness of the data.”

Memory: In terms of main memory usage, AprioriAll is better. In terms of secondary storage access, AprioriSome is better.

Speed: with sufficient memory, the gap between AprioriAll and AprioriSome widens as minimum support decreases, in AprioriSome’s favor: more non-maximal large sequences are generated, and AprioriSome avoids counting them.

Usefulness of the data: for the problem of finding maximal large sequences, the answer is “precisely the same.” However, AprioriAll finds all large sequences, while AprioriSome discards some large sequences that are not maximal; AprioriAll therefore generates more “useful” data. “The user may want to know the ratio of the number of people who bought the first k + 1 items in a sequence to the number of people who bought the first k items.”