
Data Mining Techniques Sequential Patterns

Sequential Pattern Mining
Progress in bar-code technology has made it possible for retail organizations to collect and store massive amounts of sales data, referred to as basket data.
A record in such data typically consists of the transaction date and the items bought in the transaction.
Very often, data records also contain a customer-id, particularly when the purchase has been made using a credit card or a frequent-buyer card.
Catalog companies also collect such data using the orders they receive.

Sequential Pattern Mining
An example of such a pattern is that customers typically rent “Star Wars”, then “The Empire Strikes Back”, and then “Return of the Jedi”.
These rentals need not be consecutive.
– Customers who rent some other videos in between also support this sequential pattern.
Elements of a sequential pattern need not be simple items.
– “Computer Science and Programming Language”, followed by “Data Structure”, followed by “System Programs and Operating Systems” is an example of a sequential pattern in which the elements are sets of items.
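A customer's history supports a pattern when each element of the pattern is a subset of some transaction, with those transactions in the same order but not necessarily adjacent. The Python sketch below illustrates that check; the names `seq` and `contains` are hypothetical, and the extra rentals “Alien” and “Titanic” and the extra item “Calculus” are made up for illustration.

```python
from typing import FrozenSet, List, Set


def seq(*elements: Set[str]) -> List[FrozenSet[str]]:
    """Build a sequence: an ordered list of itemsets (one itemset per transaction)."""
    return [frozenset(e) for e in elements]


def contains(history: List[FrozenSet[str]], pattern: List[FrozenSet[str]]) -> bool:
    """True if every pattern element is a subset of a distinct, later transaction.

    Transactions between the matched ones are skipped, so the matched
    transactions need not be consecutive.
    """
    pos = 0
    for element in pattern:
        # Greedily take the earliest remaining transaction that covers this element.
        while pos < len(history) and not element <= history[pos]:
            pos += 1
        if pos == len(history):
            return False
        pos += 1  # the next element must come from a strictly later transaction
    return True


trilogy = seq({"Star Wars"}, {"The Empire Strikes Back"}, {"Return of the Jedi"})
rentals = seq({"Star Wars"}, {"Alien"}, {"The Empire Strikes Back"},
              {"Titanic"}, {"Return of the Jedi"})
print(contains(rentals, trilogy))  # True: the rentals in between do not matter

books = seq({"Computer Science", "Programming Language"}, {"Data Structure"})
print(contains(seq({"Computer Science", "Programming Language", "Calculus"},
                   {"Data Structure"}), books))  # True
print(contains(seq({"Computer Science"}, {"Programming Language"},
                   {"Data Structure"}), books))  # False: both books needed in one transaction
```

Taking the earliest matching transaction is safe: it leaves as many later transactions as possible for the remaining elements.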

Sequential Pattern Mining
Given: a database with the columns Transaction Time, Customer Id and Items Bought
[Figure: Original Database and Answer Set]

Definition
The length of a sequence is the number of itemsets in the sequence. A sequence of length k is called a k-sequence.
The support for a sequence is the fraction of customers whose customer sequences contain it.
The support for an itemset i is defined as the fraction of customers who bought the items in i in a single transaction. The itemset i and the 1-sequence ⟨i⟩ have the same support.
An itemset with minimum support is called a large (frequent) itemset, or litemset.
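The definitions above can be made concrete with a short Python sketch. It assumes a small hypothetical five-customer database in which each customer sequence is a list of frozensets of item numbers; the function names are illustrative only. Note that support is counted per customer, not per transaction.

```python
from typing import FrozenSet, List

CustomerSequence = List[FrozenSet[int]]

# Hypothetical five-customer database: each customer's transactions in time order.
customers: List[CustomerSequence] = [
    [frozenset({30}), frozenset({90})],
    [frozenset({10, 20}), frozenset({30}), frozenset({40, 60, 70})],
    [frozenset({30, 50, 70})],
    [frozenset({30}), frozenset({40, 70}), frozenset({90})],
    [frozenset({90})],
]


def sequence_length(sequence: CustomerSequence) -> int:
    """Length = number of itemsets in the sequence, not number of items."""
    return len(sequence)


def itemset_support(itemset: FrozenSet[int], db: List[CustomerSequence]) -> float:
    """Fraction of customers who bought all items of `itemset` in a single transaction.

    Each customer counts at most once, however many transactions contain the itemset.
    """
    supporting = sum(1 for cust in db if any(itemset <= t for t in cust))
    return supporting / len(db)


def is_litemset(itemset: FrozenSet[int], db: List[CustomerSequence], min_support: float) -> bool:
    return itemset_support(itemset, db) >= min_support


print(sequence_length(customers[1]))                      # 3: a 3-sequence holding 6 items
print(itemset_support(frozenset({40, 70}), customers))    # 0.4 (customers 2 and 4)
print(itemset_support(frozenset({30, 70}), customers))    # 0.2 (only customer 3 buys both together)
print(is_litemset(frozenset({40, 70}), customers, 0.25))  # True
```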

AprioriAll Algorithm
Each itemset in a large sequence must have minimum support; hence, any large sequence must be a list of litemsets.
Finding all sequential patterns takes five phases:
–Sort Phase
–Litemset Phase
–Transformation Phase
–Sequence Phase
–Maximal Phase

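AprioriAll Algorithm: Sort Phase
The database is sorted with customer id as the major key and transaction time as the minor key, which turns it into a database of customer sequences.
[Figure: Customer-Sequence Version of the Database]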
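The sort phase can be sketched in Python as a sort on the key pair (customer id, transaction time) followed by grouping. The record layout `(customer id, date, items)`, the function name `sort_phase` and the sample dates and items are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, FrozenSet, List, Tuple

# (customer id, transaction time, items bought)
Transaction = Tuple[int, date, FrozenSet[int]]


def sort_phase(transactions: List[Transaction]) -> Dict[int, List[FrozenSet[int]]]:
    """Sort by customer id (major key) and transaction time (minor key),
    then group each customer's transactions into a customer sequence."""
    ordered = sorted(transactions, key=lambda t: (t[0], t[1]))
    sequences: Dict[int, List[FrozenSet[int]]] = defaultdict(list)
    for customer_id, _time, items in ordered:
        sequences[customer_id].append(items)
    return dict(sequences)


# Hypothetical transactions, deliberately out of order.
raw = [
    (2, date(1993, 6, 20), frozenset({40, 60, 70})),
    (1, date(1993, 6, 30), frozenset({90})),
    (2, date(1993, 6, 10), frozenset({10, 20})),
    (1, date(1993, 6, 25), frozenset({30})),
    (2, date(1993, 6, 15), frozenset({30})),
]
for cid, sequence in sort_phase(raw).items():
    print(cid, [sorted(t) for t in sequence])
# 1 [[30], [90]]
# 2 [[10, 20], [30], [40, 60, 70]]
```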

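AprioriAll Algorithm: Litemset Phase
The set of all litemsets is found with a frequent-itemset algorithm such as Apriori, DHP or FP-Growth, here with min_sup_count = 2. Support is counted per customer: an itemset bought by the same customer in several transactions counts only once.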
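A minimal Python sketch of the litemset phase, assuming plain level-wise Apriori (not DHP or FP-Growth) whose only change is that support is the number of customers rather than transactions. For brevity the join takes the union of any two large (k-1)-itemsets that yields k items, a simplification of Apriori's prefix join; the prune step keeps the result correct. The database and function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

CustomerSequence = List[FrozenSet[int]]


def customer_support(itemset: FrozenSet[int], db: List[CustomerSequence]) -> int:
    """Number of customers with at least one transaction containing the whole itemset."""
    return sum(1 for cust in db if any(itemset <= t for t in cust))


def find_litemsets(db: List[CustomerSequence], min_sup_count: int) -> Dict[FrozenSet[int], int]:
    """Level-wise search for all itemsets with customer support >= min_sup_count."""
    items = {i for cust in db for t in cust for i in t}
    level = {}
    for i in items:
        s = customer_support(frozenset({i}), db)
        if s >= min_sup_count:
            level[frozenset({i})] = s
    result = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join: unions of two large (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be large.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in prev for sub in combinations(c, k - 1))}
        level = {}
        for c in candidates:
            s = customer_support(c, db)
            if s >= min_sup_count:
                level[c] = s
        result.update(level)
        k += 1
    return result


customers = [
    [frozenset({30}), frozenset({90})],
    [frozenset({10, 20}), frozenset({30}), frozenset({40, 60, 70})],
    [frozenset({30, 50, 70})],
    [frozenset({30}), frozenset({40, 70}), frozenset({90})],
    [frozenset({90})],
]
for itemset, count in sorted(find_litemsets(customers, 2).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
# [30] 4
# [40] 2
# [70] 3
# [90] 3
# [40, 70] 2
```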

AprioriAll Algorithm: Transformation Phase
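In the transformation phase each transaction is replaced by the set of litemsets it contains, with each litemset mapped to an integer id. A transaction containing no litemset is not retained, and a customer sequence with no litemset at all is dropped, although that customer still counts toward the total number of customers. The Python sketch below assumes a hypothetical id mapping and reuses a small hypothetical database.

```python
from typing import Dict, FrozenSet, List

CustomerSequence = List[FrozenSet[int]]


def transform(db: List[CustomerSequence],
              litemset_ids: Dict[FrozenSet[int], int]) -> List[CustomerSequence]:
    """Replace each transaction by the set of ids of the litemsets it contains.

    Support fractions should still divide by len(db), the original customer count.
    """
    transformed = []
    for cust in db:
        new_sequence = []
        for t in cust:
            ids = frozenset(lid for ls, lid in litemset_ids.items() if ls <= t)
            if ids:  # a transaction containing no litemset is not retained
                new_sequence.append(ids)
        if new_sequence:  # a customer with no litemset at all is dropped
            transformed.append(new_sequence)
    return transformed


customers = [
    [frozenset({30}), frozenset({90})],
    [frozenset({10, 20}), frozenset({30}), frozenset({40, 60, 70})],
    [frozenset({30, 50, 70})],
    [frozenset({30}), frozenset({40, 70}), frozenset({90})],
    [frozenset({90})],
]
# One possible (hypothetical) integer mapping of the litemsets.
litemset_ids = {frozenset({30}): 1, frozenset({40}): 2, frozenset({70}): 3,
                frozenset({40, 70}): 4, frozenset({90}): 5}
for sequence in transform(customers, litemset_ids):
    print([sorted(t) for t in sequence])
# [[1], [5]]
# [[1], [2, 3, 4]]
# [[1, 3]]
# [[1], [2, 3, 4], [5]]
# [[5]]
```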

AprioriAll Algorithm: Sequence Phase
[Figure: Customer Sequences; Large 1-Sequences; Large 2-Sequences; Large 3-Sequences; Large 4-Sequences; Maximal Large Sequences]
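Each pass of the sequence phase counts the candidate k-sequences against the transformed database and keeps those with minimum support. A Python sketch of that counting step follows, assuming candidates are tuples of litemset ids and each transformed customer sequence is a list of sets of ids; the names and the transformed database are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Candidate = Tuple[int, ...]               # a sequence of litemset ids
TransformedSequence = List[FrozenSet[int]]


def supports(customer: TransformedSequence, candidate: Candidate) -> bool:
    """True if the candidate's ids occur in strictly later transactions, in order."""
    pos = 0
    for lid in candidate:
        while pos < len(customer) and lid not in customer[pos]:
            pos += 1
        if pos == len(customer):
            return False
        pos += 1
    return True


def count_large(db: List[TransformedSequence], candidates: List[Candidate],
                min_sup_count: int) -> Dict[Candidate, int]:
    """One pass over the database: count each candidate once per customer, keep the large ones."""
    counts = {c: 0 for c in candidates}
    for customer in db:
        for c in candidates:
            if supports(customer, c):
                counts[c] += 1
    return {c: n for c, n in counts.items() if n >= min_sup_count}


# Hypothetical transformed database (litemset ids per transaction).
db = [
    [frozenset({1, 5}), frozenset({2}), frozenset({3}), frozenset({4})],
    [frozenset({1}), frozenset({3}), frozenset({4}), frozenset({3, 5})],
    [frozenset({1}), frozenset({2}), frozenset({3}), frozenset({4})],
    [frozenset({1}), frozenset({3}), frozenset({5})],
    [frozenset({4}), frozenset({5})],
]
print(count_large(db, [(1, 2, 3, 4), (1, 3, 5), (4, 5), (2, 5)], min_sup_count=2))
# {(1, 2, 3, 4): 2, (1, 3, 5): 2, (4, 5): 2}
```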

Sequence Phase: Candidate Generation
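Candidate generation can be sketched in the apriori-generate style: join the large (k-1)-sequences with themselves when they share their first k-2 ids, then prune every candidate that has a (k-1)-subsequence which is not large. Joining a sequence with itself is an assumption in this Python sketch; it is included so that candidates with a repeated litemset, such as (3, 3), are not missed. The input sets in the usage lines are hypothetical.

```python
from typing import Set, Tuple

Seq = Tuple[int, ...]  # a sequence of litemset ids


def generate_candidates(large_prev: Set[Seq]) -> Set[Seq]:
    """Build candidate k-sequences from the large (k-1)-sequences."""
    # Join: p and q agree on their first k-2 ids; append q's last id to p.
    # Every ordered pair is joined, including p with itself.
    joined = {p + (q[-1],) for p in large_prev for q in large_prev if p[:-1] == q[:-1]}
    # Prune: drop c if any (k-1)-subsequence (c with one id deleted) is not large.
    return {c for c in joined
            if all(c[:i] + c[i + 1:] in large_prev for i in range(len(c)))}


L3 = {(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)}
print(sorted(generate_candidates(L3)))            # [(1, 2, 3, 4)]
print(sorted(generate_candidates({(1,), (2,)})))  # [(1, 1), (1, 2), (2, 1), (2, 2)]
```

Here the join produces nine 4-sequences, but only (1, 2, 3, 4) survives the prune, because each of the others has a 3-subsequence such as (2, 4, 3) or (3, 4, 5) that is not in L3.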

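AprioriAll Algorithm: Maximal Phase
A sequence is contained in another sequence if each of its itemsets is a subset of a distinct itemset of the other sequence, in the same order.
The sequence ⟨(3) (4 5) (8)⟩ is contained in ⟨(7) (3 8) (9) (4 5 6) (8)⟩, since (3) ⊆ (3 8), (4 5) ⊆ (4 5 6) and (8) ⊆ (8).
The sequence ⟨(3) (5)⟩ is not contained in ⟨(3 5)⟩ (and vice versa)
–The former represents items 3 and 5 being bought one after the other
–The latter represents items 3 and 5 being bought together
In a set of sequences, a sequence s is maximal if s is not contained in any other sequence.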
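The maximal phase can be sketched in Python as a containment test plus a filter that keeps only the sequences not contained in any other. The helper `s`, the other function names, the extra elements (7) and (9) in the longer containment example, and the list of large sequences are hypothetical, chosen for illustration.

```python
from typing import FrozenSet, List, Set, Tuple

ItemsetSequence = Tuple[FrozenSet[int], ...]


def s(*elements: Set[int]) -> ItemsetSequence:
    return tuple(frozenset(e) for e in elements)


def is_contained(a: ItemsetSequence, b: ItemsetSequence) -> bool:
    """True if each itemset of a is a subset of a distinct, later itemset of b."""
    pos = 0
    for element in a:
        while pos < len(b) and not element <= b[pos]:
            pos += 1
        if pos == len(b):
            return False
        pos += 1
    return True


def maximal(sequences: List[ItemsetSequence]) -> List[ItemsetSequence]:
    """Keep the sequences that are not contained in any other sequence of the set."""
    return [x for x in sequences
            if not any(x != y and is_contained(x, y) for y in sequences)]


print(is_contained(s({3}, {4, 5}, {8}), s({7}, {3, 8}, {9}, {4, 5, 6}, {8})))      # True
print(is_contained(s({3}, {5}), s({3, 5})), is_contained(s({3, 5}), s({3}, {5})))  # False False

large = [s({30}), s({40}), s({70}), s({40, 70}), s({90}),
         s({30}, {40}), s({30}, {70}), s({30}, {40, 70}), s({30}, {90})]
for sequence in maximal(large):
    print([sorted(e) for e in sequence])
# [[30], [40, 70]]
# [[30], [90]]
```

The pairwise filter is quadratic in the number of large sequences, which is acceptable for a sketch.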

AprioriAll Algorithm
With minimum support set to 25%, i.e., a minimum support of 2 customers:
–⟨(30) (90)⟩ and ⟨(30) (40 70)⟩ are maximal
–⟨(10 20) (30)⟩, which is only supported by customer 2, does not have minimum support
–⟨(30)⟩, ⟨(40)⟩, ⟨(70)⟩, ⟨(90)⟩, ⟨(30) (40)⟩, ⟨(30) (70)⟩ and ⟨(40 70)⟩, though having minimum support, are not in the answer because they are not maximal
[Figure: Answer Set]

Summary

Discussions
–The AprioriAll algorithm generates a huge set of candidate sequences: if there are 1000 frequent sequences of length 1, it generates 1000 × 1000 + (1000 × 999) / 2 = 1,499,500 candidate sequences of length 2
–It needs many scans of the database during mining
–It has difficulty mining long sequential patterns
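The 1,499,500 figure counts two kinds of length-2 candidates built from n frequent items: n × n sequences in which one item is bought after another (including the same item twice), plus n(n − 1)/2 elements in which two distinct items are bought together. The small Python check below assumes exactly that breakdown; it enumerates the candidates for a small n and evaluates the count for n = 1000. The function names are hypothetical.

```python
from itertools import combinations, product
from typing import List, Tuple


def length2_candidates(items: List[int]) -> List[Tuple[Tuple[int, ...], ...]]:
    # <(x) (y)>: x bought before y; every ordered pair, including x == y
    one_after_other = [((x,), (y,)) for x, y in product(items, repeat=2)]
    # <(x y)>: x and y bought together; unordered pairs with x != y
    together = [((x, y),) for x, y in combinations(items, 2)]
    return one_after_other + together


def candidate_count(n: int) -> int:
    return n * n + n * (n - 1) // 2


assert len(length2_candidates(list(range(10)))) == candidate_count(10)  # 100 + 45
print(candidate_count(1000))  # 1499500
```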

Research Topics
–Time-Interval Sequential Patterns
–Time-Gap Sequential Patterns
–Non-redundant Sequential Patterns
–Constrained Sequential Pattern Mining
–Multi-dimensional Sequential Patterns
–Generalized Sequential Patterns
–Incremental Mining of Sequential Patterns
–Data Stream Sequential Pattern Mining
–Interactive Mining of Sequential Patterns

Exercise 6
A sequence database (min-sup = 50%)
[Table: SID, Customer sequence]