Efficient Data Mining for Calling Path Patterns in GSM Networks Information Systems, accepted 5 December 2002 SPEAKER: YAO-TE WANG ( 王耀德 )

Presentation transcript:

1 Efficient Data Mining for Calling Path Patterns in GSM Networks Information Systems, accepted 5 December 2002 SPEAKER: YAO-TE WANG ( 王耀德 )

2 Outline: Introduction; Frequent calling path graph; Graph construction; Mining calling path patterns; Experimental results; Conclusions

3 Mining association rules: Agrawal, Imielinski, and Swami first addressed the problem of mining association rules in 1993.

4 Mining association rules: Let I = {i1, i2, ..., in} be the set of all distinct items, labeled in lexicographic order. An association rule is written “A ⇒ B”, where A and B are subsets of I. The rule states that if itemset A appears in a transaction, itemset B is likely to occur in the same transaction. For example, “Bread ⇒ Milk” and “Beer ⇒ Diaper”.

5 Mining association rules: A transaction T in a database supports an itemset S if S is contained in T. Itemsets whose fractional transaction support is above a given threshold, called the minimum support, are termed large itemsets.

6 Mining association rules: The problem of association rule mining can be decomposed into two sub-problems. (1) Find all large itemsets. (2) For each large itemset, generate all rules: for every large itemset L, find all non-empty proper subsets of L, and for every such subset A, output a rule of the form “A ⇒ (L − A)” if the ratio of support(L) to support(A) is at least minconf.
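
The rule-generation step above can be sketched in Python. This is a minimal illustration, not code from the presentation: the function name `generate_rules`, the `support` dictionary, and the toy support values are all assumptions made for the example.

```python
from itertools import combinations

def generate_rules(large_itemset, support, minconf):
    """Return (A, L - A, confidence) for every rule that meets minconf.

    `support` is a hypothetical lookup: frozenset -> fractional support.
    """
    L = frozenset(large_itemset)
    rules = []
    for size in range(1, len(L)):            # non-empty proper subsets of L
        for subset in combinations(sorted(L), size):
            A = frozenset(subset)
            confidence = support[L] / support[A]
            if confidence >= minconf:
                rules.append((A, L - A, confidence))
    return rules

# Toy support values, invented for illustration only.
support = {
    frozenset({"bread"}): 0.6,
    frozenset({"milk"}): 0.5,
    frozenset({"bread", "milk"}): 0.4,
}
rules = generate_rules({"bread", "milk"}, support, minconf=0.7)
```

With these numbers only the rule from {milk} to {bread} passes, because 0.4 / 0.5 = 0.8 while 0.4 / 0.6 is below 0.7.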

7 Mining association rules: Apriori algorithm. The first pass determines the large 1-itemsets. Each subsequent pass k consists of two phases. First, the large itemsets L(k-1) are used to generate the candidate itemsets C(k). Next, the database is scanned and the support of each candidate in C(k) is counted. Apriori property: any subset of a large itemset must be large.
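
The two-phase pass structure can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the original Apriori implementation: the function name `apriori`, the use of `>=` for the support threshold, and the toy database are choices made for the example.

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Return {itemset: support} for all large itemsets (minsup is a fraction)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def count(candidates):
        # One database scan: count each candidate, keep the large ones.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: k / n for c, k in counts.items() if k / n >= minsup}

    # Pass 1: large 1-itemsets.
    large = count({frozenset([i]) for t in transactions for i in t})
    result = dict(large)
    k = 2
    while large:
        prev = set(large)
        # Phase 1: join L(k-1) with itself, then prune with the Apriori property.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Phase 2: scan the database and count the surviving candidates.
        large = count(candidates)
        result.update(large)
        k += 1
    return result

# Toy database, invented for illustration only.
db = [{"bread", "milk"}, {"bread", "milk", "beer"},
      {"beer", "diaper"}, {"bread", "beer"}]
large_itemsets = apriori(db, minsup=0.5)
```

Note that every pass k rescans the whole database, which is the cost the Motivation slide refers to.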

8 Association rules: [Example figure not preserved] Min_sup = 20%. Source: J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers (2001).

9 Motivation: Traditional methods for mining sequential patterns, such as Apriori-like algorithms, may suffer from two problems: a large number of candidate sequences, and repeated database scans.

10 Motivation: Traditional mining methods cannot extract potential maximal frequent calling paths (PMFCPs) from the database directly. Let the minimum support be 50%. [Figure: cells a to l; table of calling paths for transactions T100 to T600 and the resulting PMFCP. The path listings were not preserved.]

11 Our solutions: A novel graph data structure is proposed to hold the information of calling paths. The database is scanned only once. An efficient mining algorithm based on the proposed graph structure is devised to mine the PMFCPs in GSM networks.

12 Introduction: The cell structure of a GSM network. [Figure not preserved]

13 Introduction: The cell structure of a real network. [Figure not preserved] Source: GSM: Switching, Services and Protocols, John Wiley & Sons Ltd., Chichester, England (1999).

14 Introduction: A mobile phone user may start a call in one cell and then move to other cells during the call. The sequence of cells visited during the call is termed a calling path.

15 Introduction: The support of a calling path P is the fraction of transactions in the calling path database that contain P. A calling path whose support is not less than the user-specified minimum support is termed a frequent calling path. A frequent calling path is maximal if it is not contained in any other frequent calling path.
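
These three definitions can be made concrete with a short Python sketch. It assumes that "contains" means the path occurs as a contiguous run of cells, and it enumerates candidates by brute force purely for illustration; the function names and the toy database are hypothetical.

```python
def contains(path, sub):
    """True if `sub` occurs in `path` as a contiguous run of cells (assumption)."""
    m = len(sub)
    return any(path[i:i + m] == sub for i in range(len(path) - m + 1))

def support(db, path):
    """Fraction of transactions (calling paths) in `db` that contain `path`."""
    return sum(contains(t, path) for t in db) / len(db)

def frequent_paths(db, minsup):
    """All sub-paths of at least two cells whose support is >= minsup."""
    candidates = {t[i:j] for t in db
                  for i in range(len(t)) for j in range(i + 2, len(t) + 1)}
    return {p for p in candidates if support(db, p) >= minsup}

def maximal(paths):
    """Paths that are not contained in any other path of the collection."""
    return sorted(p for p in paths
                  if not any(p != q and contains(q, p) for q in paths))

# Toy calling path database, invented for illustration only.
db = [tuple("abcd"), tuple("abc"), tuple("bcd"), tuple("xy")]
frequent = frequent_paths(db, minsup=0.5)
maximal_frequent = maximal(frequent)
```

With these toy paths, the maximal frequent calling paths are a-b-c and b-c-d; the full path a-b-c-d has a support of only 25% and is therefore not frequent.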

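16 Problem definition: Let P1 and P2 be calling paths (their cell listings, indexed up to n and m respectively, were not preserved). Suppose a sub-path of P1 starting at position i equals a sub-path of P2 starting at position j, where h ≥ 2, 0 ≤ i ≤ n-h, and 0 ≤ j ≤ m-h. The merge operation on P1 and P2 is defined by three cases. If i = n-h and j = 0, the common sub-path ends P1 and begins P2, and the result is a single merged path. If i = 0 and j = m-h, the common sub-path begins P1 and ends P2, and the result is again a single merged path. Otherwise, the result is the set {P1, P2}. The slide gives one worked example per case; the path listings in those examples were not preserved.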
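
A rough Python sketch of such a merge is given below. Because the exact path notation did not survive, this rests on assumptions: paths are tuples of cell labels, the overlap is measured in cells, at least `min_overlap` = 2 shared cells are required, and the longest suffix-to-prefix overlap is used. The function name `merge` and its return convention (a list of one or two paths) are hypothetical.

```python
def merge(p1, p2, min_overlap=2):
    """Merge two calling paths that overlap end-to-start; otherwise return both.

    Assumption: overlap is counted in cells and must be at least `min_overlap`.
    """
    def join(x, y):
        # Longest suffix of x that is also a prefix of y.
        for h in range(min(len(x), len(y)), min_overlap - 1, -1):
            if x[-h:] == y[:h]:
                return x + y[h:]
        return None

    # Try "end of p1 meets start of p2", then the reverse orientation.
    merged = join(p1, p2) or join(p2, p1)
    return [merged] if merged else [p1, p2]

merged_forward = merge(tuple("abc"), tuple("bcd"))
merged_reverse = merge(tuple("bcd"), tuple("abc"))
not_merged = merge(tuple("abc"), tuple("cde"))
```

In this sketch the first two calls both yield the single path a-b-c-d, while the third returns the two inputs unchanged because they share only one cell.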

17 Problem definition: The potential maximal frequent calling paths (PMFCPs) in the database D are defined as {P | P ∈ FP+ and P is maximal in FP+}, where FP+ is the closure of FP under the merge operation and FP = {P | P is a maximal frequent calling path in D}.

18 Frequent calling path graph: A calling path graph is a directed graph containing the information needed to mine PMFCPs in a calling path database. It consists of vertices, out-edges, and in-out paths. A vertex represents a cell in the GSM network. An out-edge of vertex v is an edge starting at v. An in-edge of vertex v is an edge ending at v. An in-out path of vertex v is a calling path formed by one in-edge of v and one out-edge of v.

19 Frequent calling path graph: In a frequent calling path graph G, all out-edges and in-out paths are frequent. A calling path can be decomposed into an out-edge, or an out-edge plus several in-out paths, from which the corresponding calling path graph can be constructed. The decomposed out-edge and in-out paths can be merged to regenerate the original calling path.

20 Frequent calling path graph: For example, a calling path that visits five cells can be decomposed into an out-edge plus three in-out paths. Conversely, the decomposed out-edge and in-out paths can be merged back into the original calling path. [The path listings on this slide were not preserved.]
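
The decomposition and its inverse can be sketched in Python. This assumes the out-edge is the first edge of the path and that each interior cell contributes one in-out path (its in-edge followed by its out-edge); the function names `decompose` and `recompose` and the cell labels are hypothetical.

```python
def decompose(path):
    """Split a calling path into its out-edge and its in-out paths."""
    out_edge = tuple(path[:2])
    # One in-out path per interior cell: (previous, cell, next).
    in_out_paths = [tuple(path[i - 1:i + 2]) for i in range(1, len(path) - 1)]
    return out_edge, in_out_paths

def recompose(out_edge, in_out_paths):
    """Merge an out-edge and consecutive in-out paths back into one path."""
    path = list(out_edge)
    for p in in_out_paths:
        if tuple(path[-2:]) != p[:2]:
            raise ValueError("in-out path does not continue the current path")
        path.append(p[2])
    return tuple(path)

# Hypothetical five-cell calling path.
original = ("a", "b", "c", "d", "e")
out_edge, in_out_paths = decompose(original)
rebuilt = recompose(out_edge, in_out_paths)
```

For this five-cell path the sketch produces the out-edge a-b and the three in-out paths a-b-c, b-c-d, and c-d-e, and recomposing them returns the original path.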

21 Graph construction: The cell structure of the GSM network may need to be divided into several partitions so that the calling path sub-graph of each partition fits in main memory. The mining algorithm is then applied to each sub-graph.

22 Graph construction: Example of graph partition. [Figure: partition lines 1 to 4 divide the cell structure into regions Q1 to Q9.]

23 Graph construction: The graph construction algorithm first examines whether the cell structure of the GSM network is partitioned. Then the calling paths are retrieved from the database and decomposed into out-edges and in-out paths. The graph is constructed from these out-edges and in-out paths.
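
A single-scan construction along these lines might look like the Python sketch below. It ignores partitioning, and it assumes that out-edges are counted only at the starting cell of each path and that every occurrence is counted; the function name `build_frequent_graph`, the dictionary layout, and the toy database are hypothetical and not taken from the presentation.

```python
from collections import Counter, defaultdict

def build_frequent_graph(db, minsup):
    """Scan `db` once and keep only frequent out-edges and in-out paths."""
    out_edges = defaultdict(Counter)     # start cell -> counts of out-edges
    in_out_paths = defaultdict(Counter)  # middle cell -> counts of in-out paths
    for path in db:                      # the only pass over the database
        if len(path) < 2:
            continue
        out_edges[path[0]][tuple(path[:2])] += 1
        for i in range(1, len(path) - 1):
            in_out_paths[path[i]][tuple(path[i - 1:i + 2])] += 1

    threshold = minsup * len(db)

    def keep_frequent(table):
        kept = {}
        for vertex, counts in table.items():
            frequent = {p: c for p, c in counts.items() if c >= threshold}
            if frequent:
                kept[vertex] = frequent
        return kept

    return keep_frequent(out_edges), keep_frequent(in_out_paths)

# Toy database, invented for illustration only.
db = [tuple("abc"), tuple("abc"), tuple("abd"), tuple("bc")]
graph_out_edges, graph_in_out_paths = build_frequent_graph(db, minsup=0.5)
```

With a minimum support of 50% over four paths, an item must occur at least twice to be kept, so only the out-edge a-b and the in-out path a-b-c survive.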

24 Graph construction: Example 1. Let the minimum support be 10%. [Figure: cell structure with cells a to n; table of calling paths for transactions T001 to T020. The path listings were not preserved.]

25 Graph construction: The frequent calling path graph. [Figure: graph over vertices a, b, d, g, f, j, k, l, m, n; table listing, for each vertex, its out-edges and in-out paths with their counts, alongside transactions T001 to T020. The path listings were not preserved.]

26 Mining PMFCPs: The algorithm for mining PMFCPs is based on depth-first search, one of the natural ways to visit the vertices of a graph systematically. It first finds all local PMFCPs in each sub-graph and then merges the local PMFCPs into global PMFCPs.
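
The depth-first idea can be illustrated with a simplified Python sketch. It is not the presented algorithm, whose details are not given in the transcript: it assumes a path ending in cells x, y can be extended to z whenever x-y-z is a frequent in-out path, it stops a branch when no extension exists, and it refuses to reuse the same in-out path within one branch to avoid looping. The function name `mine_paths` and the toy inputs are hypothetical.

```python
def mine_paths(out_edges, in_out_paths):
    """Depth-first extension of frequent out-edges by frequent in-out paths.

    out_edges: set of 2-tuples; in_out_paths: set of 3-tuples (assumed frequent).
    """
    successors = {}
    for x, y, z in in_out_paths:
        successors.setdefault((x, y), []).append(z)
    results = []

    def dfs(path, used):
        last_edge = (path[-2], path[-1])
        extended = False
        for z in sorted(successors.get(last_edge, [])):
            step = (path[-2], path[-1], z)
            if step in used:            # guard against cycling forever
                continue
            extended = True
            dfs(path + (z,), used | {step})
        if not extended:                # dead end: record the finished path
            results.append(path)

    for edge in sorted(out_edges):
        dfs(edge, frozenset())
    return results

# Toy frequent graph, invented for illustration only.
paths = mine_paths(
    out_edges={("a", "b")},
    in_out_paths={("a", "b", "c"), ("b", "c", "d"), ("a", "b", "e")},
)
```

A complete miner would additionally discard paths contained in other results and merge the local results of different partitions, which this sketch leaves out.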

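27 Mining PMFCPs: Example 1 (cont.). [Figure: depth-first traversal of the frequent calling path graph over vertices a, b, d, g, f, j, k, l, m, n, building Path(1) to Path(4) through copy and append steps. The path listings were not preserved.]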

28 Mining PMFCPs: If the cell structure of the GSM network is divided into two partitions, the local PMFCPs in PartitionU and the local PMFCPs in PartitionL are merged into the global PMFCPs. [Figure: (a) original cell structure with cells a to n; (b) PartitionU with cells a to i; (c) PartitionL with cells c to n. The path listings were not preserved.]

29 Experimental results (PMFCPs): Two synthetic datasets were simulated. For a GSM network with N cells, the cell structure is arranged in a semi-square shape whose consecutive levels alternate between two sizes that differ by one cell (the exact expression was not preserved). For example, the slide shows the cell structure of a GSM network with 22 cells. [Figure not preserved]

30 Experimental results (PMFCPs): The cells are labeled sequentially from 0 to N-1 for a GSM network with N cells. The starting cell of each mobile phone call is drawn from a uniform distribution over the range from the smallest cell ID to the largest cell ID. Each next cell of a calling path is chosen uniformly from the six neighboring cells. The length of a calling path is drawn from an exponential distribution with a specified mean.
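
The generation procedure can be sketched in Python as follows. The neighbor layout of the real cell structure is not recoverable from the transcript, so the sketch takes a caller-supplied `neighbors` mapping and demonstrates it on a toy ring of six cells with two neighbors each rather than six. Rounding the exponential draw and enforcing a minimum length of two cells are also assumptions, as is the function name `generate_calling_path`.

```python
import random

def generate_calling_path(neighbors, mean_length, rng):
    """Generate one synthetic calling path.

    neighbors: dict mapping cell ID -> list of adjacent cell IDs (assumed input).
    """
    cells = sorted(neighbors)
    current = rng.choice(cells)                      # uniform starting cell
    # expovariate takes the rate, i.e. 1 / mean; lengths below 2 are raised to 2.
    length = max(2, round(rng.expovariate(1.0 / mean_length)))
    path = [current]
    while len(path) < length:
        current = rng.choice(neighbors[current])     # uniform over neighbors
        path.append(current)
    return tuple(path)

# Toy layout: a ring of six cells, each with two neighbors (not a hexagonal grid).
toy_neighbors = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
rng = random.Random(0)
synthetic_db = [generate_calling_path(toy_neighbors, 4.0, rng) for _ in range(100)]
```

Passing a seeded `random.Random` instance keeps the generated dataset reproducible across runs.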

31 Experimental results (PMFCPs) [Figure not preserved]

32 Experimental results (PMFCPs) [Figure not preserved]

33 Experimental results (PMFCPs) [Figure not preserved]

34 Experimental results (PMFCPs) [Figure not preserved]

35 Conclusions: The problem of mining calling path patterns in GSM networks is addressed. A new class of interesting and effective patterns, the PMFCPs, is derived from the calling path patterns. The PMFCPs can be mined efficiently using the proposed graph structure.

