Efficient Data Mining for Calling Path Patterns in GSM Networks (Information Systems, accepted 5 December 2002). Speaker: Yao-Te Wang (王耀德)


2 Outline: Introduction; Frequent calling path graph; Graph construction; Mining calling path patterns; Experimental results; Conclusions

3 Mining association rules: Agrawal, Imielinski, and Swami first addressed the problem of mining association rules in 1993.

4 Mining association rules: Let $I = \{i_1, i_2, \ldots, i_n\}$ be the set of all distinct items, labeled in lexicographic order. An association rule is written $A \Rightarrow B$, where $A$ and $B$ are disjoint subsets of $I$. The rule states that if the items in $A$ appear in a transaction, the items in $B$ are likely to appear in the same transaction. Examples: $\{\text{Bread}\} \Rightarrow \{\text{Milk}\}$ and $\{\text{Beer}\} \Rightarrow \{\text{Diaper}\}$.

5 Mining association rules: A transaction $T$ in a database supports an itemset $S$ if $S \subseteq T$. All itemsets whose fractional transaction support (the fraction of transactions that support them) is above a user-specified threshold, called the minimum support, are termed large itemsets.
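A minimal Python sketch of the support definition, assuming each transaction is a frozenset of item names. The basket data and the helper names `support` and `is_large` are hypothetical, and counting a support exactly equal to the threshold as large is an assumption (the slide says "above").

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions T that support the itemset (itemset is a subset of T)."""
    s = frozenset(itemset)
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if s <= t) / len(transactions)


def is_large(itemset: Iterable[str], transactions: List[FrozenSet[str]],
             min_support: float) -> bool:
    # Assumption: support equal to the threshold also counts as large.
    return support(itemset, transactions) >= min_support


# Hypothetical toy basket data.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diaper", "beer", "eggs"}),
    frozenset({"milk", "diaper", "beer", "cola"}),
    frozenset({"bread", "milk", "diaper", "beer"}),
    frozenset({"bread", "milk", "diaper", "cola"}),
]
print(support({"milk", "diaper"}, transactions))  # 0.6
```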

6 Mining association rules: The problem of association rule mining can be decomposed into two sub-problems. (1) Find all large itemsets. (2) Generate rules from the large itemsets: for every large itemset $L$, find all non-empty proper subsets of $L$; for every such subset $A$, output the rule $A \Rightarrow (L - A)$ if $\text{support}(L) / \text{support}(A) \ge \text{minconf}$.
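Sub-problem (2) can be sketched as below, assuming the large itemsets and their supports are already available in a dictionary. `generate_rules` and the support values are hypothetical. By the Apriori property every subset $A$ of a large itemset is itself large, so its support can be looked up in the same dictionary.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset, float]  # (antecedent A, consequent L - A, confidence)


def generate_rules(large: Dict[Itemset, float], minconf: float) -> List[Rule]:
    """Emit A => (L - A) for every large L and non-empty proper subset A
    whenever support(L) / support(A) >= minconf."""
    rules: List[Rule] = []
    for itemset, sup_l in large.items():
        for r in range(1, len(itemset)):  # proper subsets only
            for subset in combinations(sorted(itemset), r):
                antecedent = frozenset(subset)
                conf = sup_l / large[antecedent]
                if conf >= minconf:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules


# Hypothetical supports of some large itemsets.
large = {
    frozenset({"beer"}): 0.6,
    frozenset({"diaper"}): 0.8,
    frozenset({"beer", "diaper"}): 0.6,
}
for a, b, c in generate_rules(large, minconf=0.8):
    print(sorted(a), "=>", sorted(b), c)  # ['beer'] => ['diaper'] 1.0
```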

7 Mining association rules: Apriori algorithm. The first pass determines the large 1-itemsets. Each subsequent pass $k$ consists of two phases. First, the large itemsets $L_{k-1}$ found in pass $k-1$ are used to generate the candidate itemsets $C_k$. Next, the database is scanned and the support of each candidate in $C_k$ is counted. Apriori property: every subset of a large itemset must also be large, so any candidate with a subset that is not large can be discarded without counting.
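The level-wise structure of Apriori can be sketched in Python as below. This is a simplified illustration, not the original implementation: candidates are formed by joining any two large $(k-1)$-itemsets whose union has $k$ items and are then pruned with the Apriori property; the lexicographic prefix join of the original algorithm is an optimization of this step. The function name and toy data are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Itemset = FrozenSet[str]


def apriori(transactions: List[Itemset], min_support: float) -> Dict[Itemset, float]:
    """Return every large itemset with its support (level-wise sketch)."""
    n = len(transactions)
    # Pass 1: count single items to obtain the large 1-itemsets.
    counts: Dict[Itemset, int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current: Set[Itemset] = {s for s, c in counts.items() if c / n >= min_support}
    large: Dict[Itemset, float] = {s: counts[s] / n for s in current}
    k = 2
    while current:
        # Phase 1: build C_k from L_{k-1}, pruning candidates that have a
        # (k-1)-subset which is not large (Apriori property).
        candidates: Set[Itemset] = set()
        for a in current:
            for b in current:
                union = a | b
                if len(union) == k and all(
                    frozenset(sub) in current for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Phase 2: scan the database once and count each candidate's support.
        cand_counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    cand_counts[c] += 1
        current = {c for c, cnt in cand_counts.items() if cnt / n >= min_support}
        large.update({c: cand_counts[c] / n for c in current})
        k += 1
    return large


# Hypothetical toy basket data.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diaper", "beer", "eggs"}),
    frozenset({"milk", "diaper", "beer", "cola"}),
    frozenset({"bread", "milk", "diaper", "beer"}),
    frozenset({"bread", "milk", "diaper", "cola"}),
]
print(len(apriori(transactions, 0.6)))  # 8
```

The database is scanned once per level, which is the repeated-scan cost that the calling path method in these slides aims to avoid.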

8 Association rules (example from J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2001). Min_sup = 20%.

9 Motivation: Traditional methods for mining sequential patterns, such as Apriori-like algorithms, may suffer from two problems: (1) a large number of candidate sequences and (2) repeated database scans.

10 Motivation: Traditional mining methods cannot extract PMFCPs from the database directly. (Figure: a cell structure with cells a through l, a calling path database with transactions T100 to T600, and the resulting PMFCP; the minimum support is 50%.)

11 Our solutions: A novel graph data structure is proposed to store the information of calling paths. The database is scanned only once. An efficient mining algorithm based on this graph structure is devised to mine the PMFCPs in GSM networks.

12 Introduction: the cell structure of a GSM network (figure).

13 Introduction: the cell structure of a real network (figure from GSM: Switching, Services and Protocols, John Wiley & Sons Ltd., Chichester, England, 1999).

14 Introduction: A mobile phone user may start a phone call in one cell and then move to other cells during the call. The sequence of cells visited during the call is termed a calling path.

15 Introduction: The support of a calling path $P$ is the fraction of transactions in the calling path database that contain $P$. A calling path whose support is not less than the user-specified minimum support is termed a frequent calling path. A frequent calling path is maximal if it is not contained in any other frequent calling path.
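A brute-force Python sketch of these definitions, for illustration only; avoiding this kind of enumeration is the point of the graph method in these slides. It assumes a transaction contains $P$ when $P$ occurs in it as a run of consecutive cells, and it only considers paths of at least two cells. The database and function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Path = Tuple[str, ...]


def contains(transaction: Sequence[str], path: Sequence[str]) -> bool:
    """Assumption: a transaction contains P when P appears in it as a
    run of consecutive cells."""
    t, p = tuple(transaction), tuple(path)
    m = len(p)
    return any(t[i:i + m] == p for i in range(len(t) - m + 1))


def path_support(path: Sequence[str], db: List[Path]) -> float:
    """Fraction of transactions in the calling path database that contain the path."""
    return sum(contains(t, path) for t in db) / len(db)


def frequent_paths(db: List[Path], min_support: float) -> List[Path]:
    """Brute force: test every sub-path of >= 2 cells occurring in some transaction."""
    candidates = {t[i:j] for t in db
                  for i in range(len(t)) for j in range(i + 2, len(t) + 1)}
    return sorted(p for p in candidates if path_support(p, db) >= min_support)


def maximal(paths: List[Path]) -> List[Path]:
    """Keep the paths not contained in any other path of the list."""
    return sorted(p for p in paths
                  if not any(q != p and contains(q, p) for q in paths))


# Hypothetical calling path database (one tuple of cell labels per call).
db = [("a", "b", "c"), ("a", "b", "c", "d"), ("b", "c", "d"), ("x", "y")]
print(maximal(frequent_paths(db, 0.5)))  # [('a', 'b', 'c'), ('b', 'c', 'd')]
```

In this toy database the path ⟨a, b, c, d⟩ is not frequent (support 25%), yet the two maximal frequent calling paths overlap in the cells b, c, which is the situation the merge operation on the next slide addresses.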

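16 Problem definition: Let $P_1 = \langle a_1, a_2, \ldots, a_n \rangle$ and $P_2 = \langle b_1, b_2, \ldots, b_m \rangle$ be calling paths. If $\langle a_{i+1}, \ldots, a_{i+h} \rangle = \langle b_{j+1}, \ldots, b_{j+h} \rangle$, where $h \ge 2$, $0 \le i \le n-h$, and $0 \le j \le m-h$, we define the merge operation $\oplus$ as

$$P_1 \oplus P_2 = \begin{cases} \langle a_1, \ldots, a_n, b_{h+1}, \ldots, b_m \rangle & \text{if } i = n-h \text{ and } j = 0, \\ \langle b_1, \ldots, b_m, a_{h+1}, \ldots, a_n \rangle & \text{if } i = 0 \text{ and } j = m-h, \\ \{P_1, P_2\} & \text{otherwise.} \end{cases}$$

The slide illustrates each of the three cases with an example; the example paths were not preserved.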
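The merge operation can be sketched in Python as follows, with calling paths as tuples of cell labels. The function name is hypothetical, and two choices are assumptions not stated on the slide: when several overlap lengths $h$ qualify, the longest one is used, and `None` is returned when the paths share no run of two or more cells (the merge is then undefined).

```python
from typing import Optional, Set, Tuple, Union

Path = Tuple[str, ...]


def merge(p1: Path, p2: Path) -> Optional[Union[Path, Set[Path]]]:
    """Sketch of the merge operation for overlaps of h >= 2 cells.

    Assumption: when several overlap lengths h qualify, the longest is used."""
    n, m = len(p1), len(p2)
    # Case i = n-h, j = 0: the last h cells of P1 are the first h cells of P2.
    for h in range(min(n, m), 1, -1):
        if p1[n - h:] == p2[:h]:
            return p1 + p2[h:]
    # Case i = 0, j = m-h: the first h cells of P1 are the last h cells of P2.
    for h in range(min(n, m), 1, -1):
        if p1[:h] == p2[m - h:]:
            return p2 + p1[h:]
    # Otherwise: a shared run of >= 2 cells exists elsewhere, so the result is
    # {P1, P2}. Any shared run of >= 2 cells contains a shared consecutive pair.
    pairs1 = {p1[i:i + 2] for i in range(n - 1)}
    if any(p2[j:j + 2] in pairs1 for j in range(m - 1)):
        return {p1, p2}
    return None  # no shared run of >= 2 cells: the merge is not defined


print(merge(("a", "b", "c"), ("b", "c", "d")))  # ('a', 'b', 'c', 'd')
```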

17 Problem definition: The potential maximal frequent calling paths (PMFCPs) in the database $D$ are defined as $\{P \mid P \in FP^{+} \text{ and } P \text{ is maximal in } FP^{+}\}$, where $FP = \{P \mid P \text{ is a maximal frequent calling path in } D\}$ and $FP^{+}$ is the closure of $FP$ under the merge operation $\oplus$.
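A sketch of computing PMFCPs from a set $FP$ of maximal frequent calling paths: repeatedly merge pairs until no new path appears (the closure $FP^{+}$), then keep the maximal paths. Only the first two merge cases can create a new path, since the "otherwise" case returns $\{P_1, P_2\}$. Because merging paths that revisit cells can otherwise grow without bound, this sketch assumes merged paths that revisit a cell are discarded; the slides do not say how such paths are handled, and all names here are hypothetical.

```python
from typing import Optional, Set, Tuple

Path = Tuple[str, ...]


def merge_path(p1: Path, p2: Path) -> Optional[Path]:
    """Path-valued merge cases only (suffix/prefix overlap of h >= 2 cells)."""
    n, m = len(p1), len(p2)
    for h in range(min(n, m), 1, -1):
        if p1[n - h:] == p2[:h]:
            return p1 + p2[h:]
    for h in range(min(n, m), 1, -1):
        if p1[:h] == p2[m - h:]:
            return p2 + p1[h:]
    return None  # the "otherwise" case adds no new path


def contains(q: Path, p: Path) -> bool:
    """True if p occurs in q as a run of consecutive cells."""
    m = len(p)
    return any(q[i:i + m] == p for i in range(len(q) - m + 1))


def pmfcps(fp: Set[Path]) -> Set[Path]:
    closure = set(fp)
    changed = True
    while changed:  # iterate until FP+ stops growing
        changed = False
        for p1 in list(closure):
            for p2 in list(closure):
                r = merge_path(p1, p2)
                # Assumption: discard merged paths that revisit a cell.
                if r is not None and len(set(r)) == len(r) and r not in closure:
                    closure.add(r)
                    changed = True
    # Keep only the paths that are maximal in FP+.
    return {p for p in closure if not any(q != p and contains(q, p) for q in closure)}


print(pmfcps({("a", "b", "c"), ("b", "c", "d")}))  # {('a', 'b', 'c', 'd')}
```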

18 Frequent calling path graph: A calling path graph is a directed graph containing the information needed to mine PMFCPs from a calling path database. It consists of vertices, out-edges, and in-out paths. A vertex in the calling path graph represents a cell in the GSM network. An out-edge of vertex v is an edge starting at v, and an in-edge of v is an edge ending at v. An in-out path of v is a calling path formed by one in-edge of v followed by one out-edge of v.

19 Frequent calling path graph: In a frequent calling path graph G, all out-edges and in-out paths are frequent. A calling path can be decomposed into a single out-edge, or into an out-edge plus several in-out paths, and the calling path graph is constructed from these pieces. Conversely, the decomposed out-edge and in-out paths can be merged to regenerate the original calling path.

20 Frequent calling path graph: For example, a five-cell calling path can be decomposed into an out-edge plus three in-out paths. Conversely, that out-edge and those three in-out paths can be merged back into the original calling path.
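The decomposition and merge described above can be sketched as follows, assuming the out-edge is the first edge of the calling path and each in-out path is a triple of consecutive cells (the in-edge and out-edge of its middle cell). Function names and the five-cell path are hypothetical.

```python
from typing import List, Tuple

Path = Tuple[str, ...]


def decompose(path: Path) -> Tuple[Path, List[Path]]:
    """Split a path of k >= 2 cells into one out-edge and k-2 in-out paths.

    Assumption: the out-edge is the path's first edge; each in-out path is
    a triple of consecutive cells centered on one intermediate cell."""
    if len(path) < 2:
        raise ValueError("a calling path needs at least two cells")
    out_edge = path[:2]
    in_out_paths = [path[i - 1:i + 2] for i in range(1, len(path) - 1)]
    return out_edge, in_out_paths


def recompose(out_edge: Path, in_out_paths: List[Path]) -> Path:
    """Merge the pieces back; each in-out path overlaps the path so far in two cells."""
    path = out_edge
    for piece in in_out_paths:
        if path[-2:] != piece[:2]:
            raise ValueError(f"{piece} does not continue {path}")
        path = path + piece[2:]
    return path


p = ("a", "b", "c", "d", "e")  # hypothetical five-cell calling path
edge, triples = decompose(p)
print(edge, triples)  # ('a', 'b') [('a', 'b', 'c'), ('b', 'c', 'd'), ('c', 'd', 'e')]
```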

21 Graph construction: If the calling path graph is too large for main memory, the cell structure of the GSM network can be divided into several partitions so that the calling path sub-graph of each partition fits in main memory. The mining algorithm is then applied to each sub-graph.

22 Graph construction: Example of graph partition. (Figure: regions Q1 to Q9 of the cell structure separated by partition lines 1 to 4.)

23 Graph construction: The graph construction algorithm first checks whether the cell structure of the GSM network is partitioned. Then the calling paths are retrieved from the database and decomposed into out-edges and in-out paths, from which the graph is constructed.
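A single-scan sketch of graph construction without partitioning, assuming a path decomposes into its first edge (the out-edge) and its consecutive cell triples (the in-out paths), and counting each piece at most once per transaction so that counts match the support definition. The nested-dictionary graph layout and all names are hypothetical, not the paper's data structure.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Path = Tuple[str, ...]
Graph = Dict[str, Dict[str, Dict[Path, int]]]


def build_frequent_graph(db: List[Path], min_support: float) -> Graph:
    """Scan the database once, count pieces, keep the frequent ones per vertex."""
    out_count: Counter = Counter()
    inout_count: Counter = Counter()
    for path in db:
        if len(path) < 2:
            continue  # a one-cell call has no out-edge
        out_count[path[:2]] += 1  # assumption: the first edge is the out-edge
        # In-out paths are consecutive cell triples; a set counts each once per call.
        for triple in {path[i - 1:i + 2] for i in range(1, len(path) - 1)}:
            inout_count[triple] += 1
    n = len(db)
    graph: Graph = defaultdict(lambda: {"out": {}, "in_out": {}})
    for edge, c in out_count.items():
        if c / n >= min_support:
            graph[edge[0]]["out"][edge] = c  # out-edge of its start vertex
    for triple, c in inout_count.items():
        if c / n >= min_support:
            graph[triple[1]]["in_out"][triple] = c  # in-out path of its middle vertex
    return dict(graph)


# Hypothetical calling path database.
db = [("a", "b", "c"), ("a", "b", "c", "d"), ("b", "c", "d"), ("x", "y")]
print(build_frequent_graph(db, 0.5))
```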

24 Graph construction: Example 1. (Figure: a cell structure with cells a through n, and a calling path database with transactions T001 to T020.) Let the minimum support be 10%.

25 Graph construction: (Table: the out-edges and in-out paths of each vertex, with their counts.) The frequent calling path graph for Example 1. (Figure: graph over vertices a, b, d, g, f, j, k, l, m, n.)

26 Mining PMFCPs: The algorithm for mining PMFCPs is based on depth-first search, one of the natural ways to visit the vertices of a graph systematically. First, all local PMFCPs are found in each sub-graph; then the local PMFCPs extracted from the sub-graphs are merged into global PMFCPs.
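The slides do not spell out the search procedure, so the following is only a hypothetical sketch of a depth-first search over a frequent calling path graph for a single (unpartitioned) graph. Starting from each frequent out-edge, it appends a cell whenever the last two cells of the current path and that cell form a frequent in-out path, records paths that cannot be extended, and finally keeps the maximal ones. It assumes a cell is never revisited, so the search terminates; all names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Path = Tuple[str, ...]


def mine_pmfcps(out_edges: Set[Path], in_out_paths: Set[Path]) -> Set[Path]:
    # Index each frequent in-out path by its in-edge (its first two cells).
    by_in_edge: Dict[Path, List[Path]] = {}
    for t in in_out_paths:
        by_in_edge.setdefault(t[:2], []).append(t)
    found: Set[Path] = set()

    def dfs(path: Path) -> None:
        extended = False
        for t in by_in_edge.get(path[-2:], []):
            nxt = t[2]
            if nxt not in path:  # assumption: never revisit a cell
                extended = True
                dfs(path + (nxt,))
        if not extended:
            found.add(path)  # cannot be extended further

    for edge in out_edges:
        dfs(edge)

    def contains(q: Path, p: Path) -> bool:
        m = len(p)
        return any(q[i:i + m] == p for i in range(len(q) - m + 1))

    # Keep only candidates not contained in another candidate.
    return {p for p in found if not any(q != p and contains(q, p) for q in found)}


print(mine_pmfcps({("a", "b")}, {("a", "b", "c"), ("b", "c", "d")}))  # {('a', 'b', 'c', 'd')}
```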

27 Mining PMFCPs: Example 1 (cont.). (Figure: depth-first search over the frequent calling path graph with vertices a, b, d, g, f, j, k, l, m, n, producing Path(1) to Path(4), with steps labeled Copied or Appended.)

28 Mining PMFCPs: If the cell structure of the GSM network is divided into two partitions, local PMFCPs are mined in each partition and then merged into global PMFCPs. (Figure: (a) the original cell structure with cells a through n and a partition line; (b) PartitionU; (c) PartitionL; together with the local PMFCPs in PartitionU, the local PMFCPs in PartitionL, and the global PMFCPs.)

29 Experimental results (PMFCPs): Two synthetic datasets were simulated. For a GSM network with N cells, the cells are arranged in a semi-square shape in which consecutive levels alternately contain a fixed number of cells and one cell more. For example, the slide shows the cell structure of a GSM network with 22 cells (figure not preserved).

30 Experimental results (PMFCPs): If the GSM network contains N cells, the cells are labeled sequentially from 0 to N-1. The starting cell of each phone call is drawn from a uniform distribution between the smallest and the largest cell ID. Each next cell of a calling path is chosen uniformly from the six neighboring cells. The length of a calling path is drawn from an exponential distribution with a specified mean.
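A sketch of the synthetic calling path generator described above, under stated assumptions: the neighbor map is supplied by the caller (the semi-square layout itself is not reproduced), cell IDs are contiguous from the smallest to the largest ID, border cells choose uniformly among the neighbors they actually have, and the exponential length is rounded up to at least one cell. The function name and the 6-cell ring in the usage lines are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence


def generate_calling_paths(neighbors: Dict[int, Sequence[int]], num_calls: int,
                           mean_length: float, seed: int = 0) -> List[List[int]]:
    rng = random.Random(seed)
    lo, hi = min(neighbors), max(neighbors)  # assumes every ID in lo..hi is present
    paths: List[List[int]] = []
    for _ in range(num_calls):
        cell = rng.randint(lo, hi)  # starting cell ~ U(smallest ID, largest ID)
        # Length ~ exponential with the given mean, rounded up (assumption).
        length = max(1, math.ceil(rng.expovariate(1.0 / mean_length)))
        path = [cell]
        while len(path) < length:
            cell = rng.choice(list(neighbors[cell]))  # uniform over available neighbors
            path.append(cell)
        paths.append(path)
    return paths


# Hypothetical 6-cell ring, used only to exercise the generator.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(generate_calling_paths(ring, num_calls=5, mean_length=3.0, seed=42))
```

Passing the neighbor map in keeps the sampling logic independent of the cell layout, so the same function could be reused with an adjacency built from the semi-square structure.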

31-34 Experimental results (PMFCPs): charts not preserved.

35 Conclusions: This work addresses the problem of mining calling path patterns in GSM networks. A new kind of interesting and effective pattern, the potential maximal frequent calling path (PMFCP), is derived from calling path patterns. PMFCPs can be mined efficiently using the proposed graph structure.