A Short Introduction to Sequential Data Mining

Koji IWANUMA, Hidetomo NABESHIMA
University of Yamanashi
The First Franco-Japanese Symposium on Knowledge Discovery in System Biology, September 17, Aix-en-Provence

Two Main Frameworks of Sequential Mining
1. Sequential pattern mining for multiple data sequences, e.g. a purchase database:
Sequence ID | Purchase data record
1 | <bread, cheese>
2 | <(wheat, milk), bread, (berry, sausage)>
3 | <(bread, pumpkin, sausage)>
4 | <bread, cheese, sausage>
5 | <cheese>
2. Sequential pattern mining for a single data sequence:
Data sequence <S1 S2 S3 S4 S5 S6 S7 … … Sn>

What Is Sequential Pattern Mining? (J. Han and M. Kamber. Data Mining: Concepts and Techniques, www.cs.uiuc.edu/~hanji)
Given a set of sequences, find the complete set of frequent subsequences.
A sequence: < (ef) (ab) (df) c b >
An element may contain a set of items. Items within an element are unordered, and we list them alphabetically.
A sequence database:
SID | sequence
10 | <a(abc)(ac)d(cf)>
20 | <(ad)c(bc)(ae)>
30 | <(ef)(ab)(df)cb>
40 | <eg(af)cbc>
<a(bc)dc> is a subsequence of <a(abc)(ac)d(cf)>.
The support of a pattern is the number of sequences in the database that contain it. Given the support threshold min_sup = 2, <(ab)c> is a sequential pattern: it is contained in sequences 10 and 30, so its support is 2.
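The subsequence and support definitions above can be checked mechanically. Below is a minimal Python sketch, assuming each element is stored as a frozenset of single-character items. The helper names (parse, is_subsequence, support) are hypothetical and only illustrate the definitions. The database is the four-sequence example from the slide.

```python
def parse(text):
    """Parse slide notation like 'a(abc)(ac)d(cf)' into a list of frozensets (hypothetical helper)."""
    elements, i = [], 0
    while i < len(text):
        if text[i] == "(":
            j = text.index(")", i)
            elements.append(frozenset(text[i + 1:j]))  # multi-item element
            i = j + 1
        else:
            elements.append(frozenset(text[i]))  # single-item element
            i += 1
    return elements

def is_subsequence(pattern, sequence):
    """True if each pattern element is a subset of a distinct, later element of the sequence."""
    pos = 0
    for element in pattern:
        # Greedily match the earliest sequence element that contains this pattern element.
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True

def support(pattern, database):
    """Number of sequences in the database that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database.values())

db = {
    10: parse("a(abc)(ac)d(cf)"),
    20: parse("(ad)c(bc)(ae)"),
    30: parse("(ef)(ab)(df)cb"),
    40: parse("eg(af)cbc"),
}

print(is_subsequence(parse("a(bc)dc"), db[10]))  # True
print(support(parse("(ab)c"), db))               # 2
```

The greedy earliest match is safe here because matching a pattern element as early as possible never removes options for the elements that follow it.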

Challenges in Sequential Pattern Mining
A huge number of possible sequential patterns are hidden in databases. A mining algorithm should:
- find the complete set of patterns, when possible, that satisfy the minimum support (frequency) threshold;
- be highly efficient and scalable, involving only a small number of database scans;
- be able to incorporate various kinds of user-specific constraints.

Sequential Pattern Mining Algorithms for Multiple Data Sequences
- Apriori-based method: GSP (Generalized Sequential Patterns: Srikant & Agrawal @ EDBT’96)
- Pattern-growth methods: FreeSpan & PrefixSpan (Han et al. @ KDD’00; Pei et al. @ ICDE’01)
- Vertical-format-based mining: SPADE (Zaki @ Machine Learning’00)
- Constraint-based sequential pattern mining (SPIRIT: Garofalakis, Rastogi, Shim @ VLDB’99; Pei, Han, Wang @ CIKM’02)
- Mining closed sequential patterns: CloSpan (Yan, Han & Afshar @ SDM’03)
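To make the pattern-growth idea concrete, here is a simplified sketch in the spirit of PrefixSpan. It is not the published algorithm. It assumes every element holds a single item, so itemset extensions such as (ab) are not handled, and it copies projected suffixes instead of using pseudo-projection. The function name and the toy database are hypothetical.

```python
from collections import defaultdict

def prefixspan(database, min_sup):
    """Simplified PrefixSpan: each element holds a single item (no itemsets)."""
    patterns = {}

    def mine(prefix, projected):
        # Count each item once per projected suffix: support counts sequences, not occurrences.
        counts = defaultdict(int)
        for suffix in projected:
            for item in set(suffix):
                counts[item] += 1
        for item, sup in sorted(counts.items()):
            if sup < min_sup:
                continue
            new_prefix = prefix + (item,)
            patterns[new_prefix] = sup
            # Project: keep what follows the first occurrence of the item in each suffix.
            new_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(new_prefix, new_projected)

    mine((), [tuple(seq) for seq in database])
    return patterns

print(prefixspan(["abc", "acb", "abd"], min_sup=2))
# {('a',): 3, ('a', 'b'): 3, ('a', 'c'): 2, ('b',): 3, ('c',): 2}
```

Each frequent prefix is grown only inside its own projected database, so no candidate is generated that does not actually occur. A full implementation also needs the second kind of extension, which adds an item to the last element of the prefix.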

Mining Sequential Patterns from a Very Long Single Sequence
Example: a series of daily newspaper articles < … >, in which "typhoon" is followed by "(flood, landslide)" more than once, yields the sequential pattern <typhoon (flood, landslide)>.

Sequential Pattern Mining Algorithms for a Single Data Sequence
- Discovery of frequent episodes in event sequences, based on a sliding-window system [Mannila 1998]: the frequency measure becomes anti-monotonic, but it has a problem: a single occurrence can be counted more than once (duplicate counting).
- Asynchronous periodic pattern mining [Yang et al. 2000, Huang 2004]: no anti-monotonic frequency measure is investigated.
- On-line approximation algorithm for mining frequent items, not frequent subsequences: the Lossy Counting algorithm [Manku and Motwani, VLDB’02].
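Both properties of sliding-window frequency, anti-monotonicity and duplicate counting, show up even in a tiny example. This is a simplified sketch in the spirit of the window-based measure, not the exact definition from Mannila et al. It assumes one event per position and serial (ordered) episodes, and it counts only windows that lie fully inside the sequence. The function name window_frequency is hypothetical.

```python
def window_frequency(sequence, episode, width):
    """Count windows of `width` consecutive positions that contain `episode` in order.

    Simplified sketch: only windows lying fully inside the sequence are counted.
    """
    count = 0
    for start in range(len(sequence) - width + 1):
        window = iter(sequence[start:start + width])
        # `event in window` advances the iterator, so the events must appear in order.
        if all(event in window for event in episode):
            count += 1
    return count

events = list("xabx")
print(window_frequency(events, ("a", "b"), width=3))  # 2: one occurrence of a-then-b, counted twice
print(window_frequency(events, ("a",), width=3))      # 2: a sub-episode never counts less
```

Any window that contains an episode also contains each of its sub-episodes, which is why the measure is anti-monotonic. However, the single occurrence of a followed by b lies inside both windows, so it is counted twice.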

Research in Our Laboratory
Sequential data mining from a very large single data sequence.
- Main target: sequential textual data, especially newspaper-article corpora.
- Objective: to generate a robust and useful large-scale event-sequence corpus.
- Application 1: topic detection/tracking in information retrieval.
- Application 2: automated content tracking on the Web.
- Application 3: semi-automatic scenario/story creation.
- Ordinary temporal data analysis: various log data in computer systems, genetic information, etc.

Technical Topics (1/2)
A new framework for extracting frequent subsequences from a single long data sequence, presented at the IEEE International Conference on Data Mining 2005 (ICDM 2005):
- a new rational frequency measure that satisfies the Apriori (anti-monotonic) property and involves no duplicate counting;
- a fast on-line algorithm for some limited cases.
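The Apriori (anti-monotonic) property matters because it makes level-wise pruning possible. If a frequency measure is anti-monotonic, a candidate of length k+1 can be frequent only if every length-k subsequence is frequent. Below is a minimal sketch of GSP-style candidate generation and pruning. It assumes single-item elements, and the function names are hypothetical. It illustrates the property in general, not the specific measure or algorithm of the ICDM 2005 work.

```python
def generate_candidates(frequent_k):
    """GSP-style join: extend p by the last item of q when p minus its first item equals q minus its last."""
    candidates = set()
    for p in frequent_k:
        for q in frequent_k:
            if p[1:] == q[:-1]:
                candidates.add(p + q[-1:])
    return candidates

def apriori_prune(candidates, frequent_k):
    """Drop any candidate that has an infrequent length-k subsequence (one item deleted)."""
    return {
        c for c in candidates
        if all(c[:i] + c[i + 1:] in frequent_k for i in range(len(c)))
    }

frequent_2 = {("a", "b"), ("b", "c")}
cands = generate_candidates(frequent_2)
print(cands)                             # {('a', 'b', 'c')}
print(apriori_prune(cands, frequent_2))  # set(): ('a', 'c') is not frequent
```

The pruning step is only sound for anti-monotonic measures. With a measure that lacks the property, a super-pattern could be frequent while one of its sub-patterns is not, and this filter would discard valid results.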

Technical Topics (2/2)
Ongoing and future work:
- On-line rational filters, based on confidence criteria and/or information gain, for eliminating redundant, uninformative sequences from the system output.
- Methods for finding meta-structures embedded in the huge number of frequent sequences generated by the system.
- A method using compression based on context-free grammar inference/learning.
- A faster extraction algorithm based on a method for simultaneously searching for multiple strings over compressed data.

References: Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques (Chapter 8). www.cs.uiuc.edu/~hanj

Thanks for your attention!!