Published by Rodney Bryant. Modified over 8 years ago.
1
A New Method to Forecast Enrollments Using Fuzzy Time Series and Clustering Techniques
Kurniawan Tanuwijaya (1) and Shyi-Ming Chen (1, 2)
(1) Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, R.O.C.
(2) Department of Computer Science and Information Engineering, Jinwen University of Science and Technology, Taipei County, Taiwan, R.O.C.
2
Outline
1. Introduction
2. A Review of Fuzzy Time Series
3. The Proposed Clustering Algorithm
4. A New Method for Forecasting Enrollments Based on the Fuzzy Time Series and the Proposed Clustering Algorithm
5. Experimental Results
6. Conclusions
3
1. Introduction
[1993] Song and Chissom proposed the concepts of fuzzy time series and used two fuzzy time series models (i.e., time-variant and time-invariant fuzzy time series) to forecast the enrollments of the University of Alabama.
[1994] Sullivan and Woodall compared Song and Chissom's methods with a time-invariant Markov model using linguistic labels.
[1996] Chen proposed a method that uses simple arithmetic operations.
[2001] Huarng presented a heuristic model by integrating Chen's model.
4
1. Introduction (cont.)
[2002] Chen presented high-order fuzzy time series.
[2006] Hwang et al. presented time-variant fuzzy time series.
In this paper, we present a new method to forecast the enrollments of the University of Alabama based on fuzzy time series and clustering techniques.
5
2. A Review of Fuzzy Time Series
A fuzzy set A of the universe of discourse U, U = {u_1, u_2, …, u_n}, is defined as follows:
$$A = f_A(u_1)/u_1 + f_A(u_2)/u_2 + \cdots + f_A(u_n)/u_n,$$
where f_A is the membership function of the fuzzy set A, f_A(u_i) denotes the grade of membership of u_i in the fuzzy set A, and 1 ≤ i ≤ n.
Let Y(t) (t = …, 0, 1, 2, …) be the universe of discourse on which the fuzzy sets f_i(t) (i = 1, 2, …) are defined. Let F(t) be a collection of f_i(t) (i = 1, 2, …). Then F(t) is called a fuzzy time series on Y(t) (t = …, 0, 1, 2, …).
6
2. A Review of Fuzzy Time Series (cont.)
Assume that there is a fuzzy relationship R(t, t-1) between F(t-1) and F(t) such that
$$F(t) = F(t-1) \circ R(t, t-1),$$
where "∘" is the Max-Min composition operator. Then F(t) is said to be caused by F(t-1), which is denoted by a fuzzy logical relationship, shown as follows:
$$F(t-1) \rightarrow F(t),$$
where both F(t-1) and F(t) are fuzzy sets, and "F(t-1)" and "F(t)" are called the current state and the next state, respectively.
7
2. A Review of Fuzzy Time Series (cont.)
Let F(t) be a fuzzy time series. If F(t) is caused by F(t-1), F(t-2), …, and F(t-n), then the fuzzy logical relationship between them can be represented by a high-order fuzzy logical relationship, shown as follows:
$$F(t-n), \ldots, F(t-2), F(t-1) \rightarrow F(t),$$
where F(t-n), …, F(t-2), and F(t-1) are fuzzy sets, and "F(t-n), …, F(t-2), F(t-1)" and "F(t)" are called the current state and the next state of the high-order fuzzy logical relationship, respectively.
The fuzzy logical relationships having the same current state are grouped into a fuzzy logical relationship group.
8
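3. The Proposed Clustering Algorithm
The proposed clustering algorithm is used to partition the universe of discourse into intervals of different lengths.
Step 1: Sort the numerical data in ascending sequence, shown as follows:
$$d_1, d_2, \ldots, d_n.$$
Calculate the threshold value for the stopping condition of the proposed clustering algorithm, shown as follows:
$$\mathit{threshold} = \frac{\sum_{i=1}^{n-1} (d_{i+1} - d_i)}{n-1}. \tag{1}$$
For the enrollment data of Section 5, this average difference is (19337 - 13055)/21 ≈ 299, the threshold value used there.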
9
3. The Proposed Clustering Algorithm (cont.)
Step 2: Put each datum into its own cluster, shown as follows:
$$\{d_1\}, \{d_2\}, \ldots, \{d_n\},$$
where the symbol "{}" denotes a cluster.
Step 3: Assume that there are p clusters. Calculate the cluster center cluster_center_k of each cluster Cluster_k as follows:
$$\mathit{cluster\_center}_k = \frac{\sum_{j=1}^{r} d_j}{r}, \tag{2}$$
where the d_j are the data in Cluster_k, r is the number of data in Cluster_k, and 1 ≤ k ≤ p.
10
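3. The Proposed Clustering Algorithm (cont.)
Calculate the distance distance_{m,m+1} between any two adjacent cluster centers cluster_center_m and cluster_center_{m+1}, shown as follows:
$$\mathit{distance}_{m,m+1} = \mathit{cluster\_center}_{m+1} - \mathit{cluster\_center}_m, \tag{3}$$
where m = 1, 2, …, p-1.
Step 4: Find the smallest distance smallest_distance:
$$\mathit{smallest\_distance} = \min_{1 \le m \le p-1} \mathit{distance}_{m,m+1}. \tag{4}$$
Step 5: If smallest_distance is less than the threshold value calculated in Step 1, then combine the two clusters having the smallest distance between them into one cluster and go to Step 3. Otherwise, go to Step 6.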
11
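3. The Proposed Clustering Algorithm (cont.)
Step 6: Calculate the upper bound cluster_uBound_m of Cluster_m and the lower bound cluster_lBound_{m+1} of Cluster_{m+1}:
$$\mathit{cluster\_uBound}_m = \frac{\mathit{cluster\_center}_m + \mathit{cluster\_center}_{m+1}}{2}, \tag{5}$$
$$\mathit{cluster\_lBound}_{m+1} = \mathit{cluster\_uBound}_m, \tag{6}$$
where m = 1, 2, …, p-1. Because there is no previous cluster before the first cluster and no next cluster after the last cluster, the lower bound cluster_lBound_1 of the first cluster and the upper bound cluster_uBound_p of the last cluster are calculated as follows:
$$\mathit{cluster\_lBound}_1 = \mathit{cluster\_center}_1 - (\mathit{cluster\_uBound}_1 - \mathit{cluster\_center}_1), \tag{7}$$
$$\mathit{cluster\_uBound}_p = \mathit{cluster\_center}_p + (\mathit{cluster\_center}_p - \mathit{cluster\_lBound}_p). \tag{8}$$
These forms reproduce the bounds listed in Table 2 (for example, 13055 - (13309 - 13055) = 12801 for the first cluster).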
12
3. The Proposed Clustering Algorithm (cont.)
Step 7: Let each cluster Cluster_k form an interval interval_k, which means that the upper bound cluster_uBound_k and the lower bound cluster_lBound_k of the cluster Cluster_k are also the upper bound interval_uBound_k and the lower bound interval_lBound_k of the interval interval_k, respectively. Calculate the middle value mid_value_k of the interval interval_k as follows:
$$\mathit{mid\_value}_k = \frac{\mathit{interval\_lBound}_k + \mathit{interval\_uBound}_k}{2}, \tag{9}$$
where 1 ≤ k ≤ p.
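Steps 1 to 5 of the clustering algorithm can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `cluster_data` is hypothetical, and the threshold is assumed to be the mean difference between consecutive sorted data, an assumption that reproduces the threshold 299 and the 11 final clusters reported in Section 5.

```python
def cluster_data(data):
    """Merge adjacent clusters until the smallest center gap reaches the threshold."""
    values = sorted(data)                                  # Step 1: ascending order
    if len(values) < 2:
        return [[v] for v in values], 0.0
    # Assumed form of Eq. (1): mean difference between consecutive sorted data.
    threshold = (values[-1] - values[0]) / (len(values) - 1)
    clusters = [[v] for v in values]                       # Step 2: one datum per cluster
    while len(clusters) > 1:
        centers = [sum(c) / len(c) for c in clusters]      # Step 3, Eq. (2)
        gaps = [b - a for a, b in zip(centers, centers[1:])]  # Eq. (3)
        smallest = min(gaps)                               # Step 4, Eq. (4)
        if smallest >= threshold:                          # Step 5: stopping condition
            break
        m = gaps.index(smallest)
        clusters[m:m + 2] = [clusters[m] + clusters[m + 1]]  # merge the closest pair
    return clusters, threshold


# Enrollments of the University of Alabama, 1971-1992 (Table 1).
ENROLLMENTS = [13055, 13563, 13867, 14696, 15460, 15311, 15603, 15861,
               16807, 16919, 16388, 15433, 15497, 15145, 15163, 15984,
               16859, 18150, 18970, 19328, 19337, 18876]

clusters, threshold = cluster_data(ENROLLMENTS)
print(round(threshold), len(clusters))  # prints: 299 11
```

Because the data are sorted and only adjacent clusters are merged, every cluster stays a contiguous run of the sorted data, so the cluster centers remain in ascending order throughout.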
13
4. A New Method for Forecasting Enrollments Based on the Fuzzy Time Series and the Proposed Clustering Algorithm
Step 1: Apply the proposed clustering algorithm to partition the universe of discourse into intervals.
Step 2: Assume that n intervals u_1, u_2, …, u_n are obtained in Step 1. Define the linguistic terms A_1, A_2, …, A_n, each represented by a fuzzy set, such that the maximum membership value of A_i occurs at interval u_i.
Step 3: Fuzzify each historical datum into a fuzzy set. If the datum belongs to u_i, then the datum is fuzzified into A_i, where 1 ≤ i ≤ n.
14
4. A New Method for Forecasting Enrollments Based on the Fuzzy Time Series and the Proposed Clustering Algorithm (cont.)
Step 4: Construct the fuzzy logical relationships based on the fuzzified data obtained in Step 3.
Note: If the first order fuzzy time series is used and the fuzzified values at times t-1 and t are A_j and A_k, respectively, then construct the fuzzy logical relationship "A_j → A_k", where "A_j" and "A_k" are called the current state and the next state of the fuzzy logical relationship. If the n-th order fuzzy time series is used and the fuzzified values at times t-n, …, t-2, t-1 and t are A_{j,n}, …, A_{j,2}, A_{j,1} and A_k, respectively, then construct the fuzzy logical relationship "A_{j,n}, …, A_{j,2}, A_{j,1} → A_k", where "A_{j,n}, …, A_{j,2}, A_{j,1}" and "A_k" are called the current state and the next state of the n-th order fuzzy logical relationship.
Group the fuzzy logical relationships having the same current state into a fuzzy logical relationship group.
15
4. A New Method for Forecasting Enrollments Based on the Fuzzy Time Series and the Proposed Clustering Algorithm (cont.)
Step 5: Calculate the forecasted output at time t by using the following principles:
Principle 1: If the fuzzified values at times t-n, …, t-2, and t-1 are A_{j,n}, …, A_{j,2}, and A_{j,1}, respectively, and there is only one fuzzy logical relationship in the fuzzy logical relationship group, shown as follows:
$$A_{j,n}, \ldots, A_{j,2}, A_{j,1} \rightarrow A_k,$$
then the forecasted value at time t is m_k, where m_k is the middle value of the interval u_k and the maximum membership value of A_k occurs at interval u_k.
16
4. A New Method for Forecasting Enrollments Based on the Fuzzy Time Series and the Proposed Clustering Algorithm (cont.)
Principle 2: If the fuzzified values at times t-n, …, t-2, and t-1 are A_{j,n}, …, A_{j,2}, and A_{j,1}, respectively, and the fuzzy logical relationship group contains several fuzzy logical relationships, shown as follows:
$$A_{j,n}, \ldots, A_{j,2}, A_{j,1} \rightarrow A_{k1}(x_1), A_{k2}(x_2), \ldots, A_{kp}(x_p),$$
then the forecasted value at time t is calculated as follows:
$$\frac{x_1 \times m_{k1} + x_2 \times m_{k2} + \cdots + x_p \times m_{kp}}{x_1 + x_2 + \cdots + x_p},$$
where x_i denotes the number of fuzzy logical relationships "A_{j,n}, …, A_{j,2}, A_{j,1} → A_{ki}" in the fuzzy logical relationship group, 1 ≤ i ≤ p; m_{k1}, m_{k2}, …, and m_{kp} are the middle values of the intervals u_{k1}, u_{k2}, …, and u_{kp}, respectively; and the maximum membership values of A_{k1}, A_{k2}, …, and A_{kp} occur at intervals u_{k1}, u_{k2}, …, and u_{kp}, respectively.
17
4. A New Method for Forecasting Enrollments Based on the Fuzzy Time Series and The Proposed Clustering Algorithm (cont.) Principle 3: If the fuzzified values at time t-n, …, t-2, and t-1 are A j,n, …, A j,2, and A j,1, respectively, and there is only one fuzzy logical relationship in the fuzzy logical relationship groups, shown as follows: then the forecasted value of time t is calculated as follows: where m j,n, …, m j,2 and m j,1 are the middle values of the intervals u j,n, …, u j,2 and u j,1, respectively, and the maximum membership values of A j,n, …, A j,2 and A j,1 occur at intervals u j,n, …, u j,2 and u j,1, respectively.
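Steps 4 and 5 can be sketched in Python as follows, assuming Principle 2 is the count-weighted average of the middle values of the next states (Principle 1 is then the special case of a single next state). The names `build_flrgs` and `forecast` are hypothetical, Principle 3 is not covered, and the label sequence and middle values are taken from the worked example in Section 5 (label i stands for A_i).

```python
from collections import Counter, defaultdict


def build_flrgs(labels, order=1):
    """Step 4: group fuzzy logical relationships by current state, counting next states."""
    groups = defaultdict(Counter)
    for t in range(order, len(labels)):
        current_state = tuple(labels[t - order:t])
        groups[current_state][labels[t]] += 1
    return groups


def forecast(groups, current_state, mid_values):
    """Step 5, Principles 1 and 2: count-weighted average of next-state middle values."""
    next_states = groups.get(tuple(current_state))
    if not next_states:
        return None  # no known next state; Principle 3 is not sketched here
    total = sum(next_states.values())
    return sum(n * mid_values[label] for label, n in next_states.items()) / total


# Fuzzified enrollments 1971-1992 (Table 3) and rounded middle values (Table 2).
LABELS = [1, 2, 3, 4, 5, 5, 5, 6, 8, 8, 7, 5, 5, 5, 5, 6, 8, 9, 10, 11, 11, 10]
MID_VALUES = {1: 13055, 2: 13512, 3: 13998, 4: 14658, 5: 15341, 6: 15902,
              7: 16390, 8: 17065, 9: 18021, 10: 18832, 11: 19333}

GROUPS_1 = build_flrgs(LABELS, order=1)
# The 1977 enrollment is fuzzified into A_5, so the current state for 1978 is (5,).
print(round(forecast(GROUPS_1, (5,), MID_VALUES)))  # prints: 15501
```

Passing `order=2` to `build_flrgs` and a two-label current state such as `(6, 8)` gives the second order variant used in part B of Section 5.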
18
5. Experimental Results
A. The Proposed Method Using the First Order Fuzzy Time Series
[Step 1] Apply the proposed clustering algorithm to partition the universe of discourse (UoD) into intervals of different lengths:
[Sub-Step 1] Sort the numerical data of Table 1 in ascending order: 13055, 13563, 13867, 14696, 15145, 15163, 15311, 15433, 15460, 15497, 15603, 15861, 15984, 16388, 16807, 16859, 16919, 18150, 18876, 18970, 19328, 19337. Calculate the threshold for the stopping condition of the proposed clustering algorithm, which gives 299.

Table 1. Historical Enrollments of the University of Alabama
Year  Actual Enrollments
1971  13055
1972  13563
1973  13867
1974  14696
1975  15460
1976  15311
1977  15603
1978  15861
1979  16807
1980  16919
1981  16388
1982  15433
1983  15497
1984  15145
1985  15163
1986  15984
1987  16859
1988  18150
1989  18970
1990  19328
1991  19337
1992  18876
19
5. Experimental Results (cont.)
[Sub-Step 2] Put each datum in a cluster, shown as follows: {13055}, {13563}, {13867}, {14696}, {15145}, {15163}, {15311}, {15433}, {15460}, {15497}, {15603}, {15861}, {15984}, {16388}, {16807}, {16859}, {16919}, {18150}, {18876}, {18970}, {19328}, {19337}.
[Sub-Step 3] Based on Eq. (2), calculate each cluster center cluster_center_k, 1 ≤ k ≤ 22, shown as follows:
cluster_center_1 = 13055    cluster_center_9  = 15460    cluster_center_17 = 16919
cluster_center_2 = 13563    cluster_center_10 = 15497    cluster_center_18 = 18150
cluster_center_3 = 13867    cluster_center_11 = 15603    cluster_center_19 = 18876
cluster_center_4 = 14696    cluster_center_12 = 15861    cluster_center_20 = 18970
cluster_center_5 = 15145    cluster_center_13 = 15984    cluster_center_21 = 19328
cluster_center_6 = 15163    cluster_center_14 = 16388    cluster_center_22 = 19337
cluster_center_7 = 15311    cluster_center_15 = 16807
cluster_center_8 = 15433    cluster_center_16 = 16859
20
5. Experimental Results (cont.)
Based on Eq. (3), calculate the distance distance_{m,m+1}, 1 ≤ m ≤ 21, shown as follows:
distance_{1,2} = 508    distance_{8,9}   = 27     distance_{15,16} = 52
distance_{2,3} = 304    distance_{9,10}  = 37     distance_{16,17} = 60
distance_{3,4} = 829    distance_{10,11} = 106    distance_{17,18} = 1231
distance_{4,5} = 449    distance_{11,12} = 258    distance_{18,19} = 726
distance_{5,6} = 18     distance_{12,13} = 123    distance_{19,20} = 94
distance_{6,7} = 148    distance_{13,14} = 404    distance_{20,21} = 358
distance_{7,8} = 122    distance_{14,15} = 419    distance_{21,22} = 9
[Sub-Step 4] Find the smallest distance smallest_distance, i.e., 9 (the distance distance_{21,22} between cluster_center_21 and cluster_center_22).
[Sub-Step 5] Because smallest_distance is less than the threshold, i.e., 9 < 299 is true, cluster 21 (i.e., {19328}) and cluster 22 (i.e., {19337}) are combined into one cluster (i.e., {19328, 19337}), and the algorithm goes back to Sub-Step 3.
21
5. Experimental Results (cont.)
Sub-Step 3 to Sub-Step 5 are repeated until the condition "smallest_distance is less than the threshold" is false. The final clustering results are shown as follows: {13055}, {13563}, {13867}, {14696}, {15145, 15163, 15311, 15433, 15460, 15497, 15603}, {15861, 15984}, {16388}, {16807, 16859, 16919}, {18150}, {18876, 18970}, {19328, 19337}.
[Sub-Step 6] Based on Eqs. (5) and (6), calculate the upper bound and the lower bound of each Cluster_k, 1 ≤ k ≤ 11. For example, the upper bound of Cluster 1 and the lower bound of Cluster 2 are both 13309, the midpoint of the cluster centers 13055 and 13563 (see Table 2).
22
5. Experimental Results (cont.)
Because there is no previous cluster before Cluster 1, the lower bound cluster_lBound_1 of Cluster 1 is calculated using Eq. (7). Because there is no next cluster after the last cluster, i.e., Cluster 11, the upper bound cluster_uBound_11 is calculated using Eq. (8).
23
5. Experimental Results (cont.)
[Sub-Step 7] Let each Cluster_k form an interval interval_k and calculate the middle value using Eq. (9).

Table 2. The Interval Generations from the Clusters
Cluster     Data                                                 Cluster Center  Lower Bound  Upper Bound  Middle Value
Cluster 1   {13055}                                              13055           12801        13309        13055
Cluster 2   {13563}                                              13563           13309        13715        13512
Cluster 3   {13867}                                              13867           13715        14281.5      13998
Cluster 4   {14696}                                              14696           14281.5      15034.6      14658
Cluster 5   {15145, 15163, 15311, 15433, 15460, 15497, 15603}    15373.1         15034.6      15647.8      15341
Cluster 6   {15861, 15984}                                       15922.5         15647.8      16155.3      15902
Cluster 7   {16388}                                              16388           16155.3      16624.8      16390
Cluster 8   {16807, 16859, 16919}                                16861.7         16624.8      17505.8      17065
Cluster 9   {18150}                                              18150           17505.8      18536.5      18021
Cluster 10  {18876, 18970}                                       18923           18536.5      19127.8      18832
Cluster 11  {19328, 19337}                                       19332.5         19127.8      19537.3      19333
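The bounds and middle values in Table 2 can be reproduced with the short Python sketch below. It assumes that the bound shared by two adjacent clusters is the midpoint of their centers and that the two outer bounds mirror the nearest inner bound about the first and last cluster centers; these assumptions agree with every row of Table 2. The function name `intervals_from_clusters` is hypothetical.

```python
def intervals_from_clusters(clusters):
    """Return (lower bound, upper bound, middle value) per cluster; needs >= 2 clusters."""
    centers = [sum(c) / len(c) for c in clusters]
    # Shared bound between adjacent clusters: midpoint of the two centers.
    inner = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
    # Outer bounds: mirror the nearest inner bound about the first / last center.
    lower_first = centers[0] - (inner[0] - centers[0])
    upper_last = centers[-1] + (centers[-1] - inner[-1])
    bounds = [lower_first] + inner + [upper_last]
    return [(lo, hi, (lo + hi) / 2) for lo, hi in zip(bounds, bounds[1:])]


# Final clusters of Sub-Step 5.
FINAL_CLUSTERS = [[13055], [13563], [13867], [14696],
                  [15145, 15163, 15311, 15433, 15460, 15497, 15603],
                  [15861, 15984], [16388], [16807, 16859, 16919],
                  [18150], [18876, 18970], [19328, 19337]]

for k, (lo, hi, mid) in enumerate(intervals_from_clusters(FINAL_CLUSTERS), start=1):
    print(f"u_{k}: [{lo:.1f}, {hi:.1f})  middle value {mid:.1f}")
```

The printed values are unrounded apart from the one-decimal formatting, so they may differ from the rounded integers of Table 2 in the last digit.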
24
5. Experimental Results (cont.)
For simplicity, the real values in Table 2 are rounded to integers, which gives the following intervals:
u_1 = [12801, 13309)    u_5 = [15035, 15648)    u_9  = [17506, 18537)
u_2 = [13309, 13715)    u_6 = [15648, 16155)    u_10 = [18537, 19128)
u_3 = [13715, 14282)    u_7 = [16155, 16625)    u_11 = [19128, 19537)
u_4 = [14282, 15035)    u_8 = [16625, 17506)
[Step 2] Define the linguistic terms A_1, A_2, …, and A_11 on these intervals.
25
5. Experimental Results (cont.)
[Step 3] Fuzzify each datum that belongs to u_i into A_i, where 1 ≤ i ≤ 11.
[Step 4] Obtain the fuzzy logical relationships (FLRs) of the first order fuzzy time series. Group the FLRs having the same current state into an FLR group (FLRG).

Table 3. Fuzzified Enrollments of the University of Alabama
Year  Actual Enrollments  Fuzzified Enrollments
1971  13055               A_1
1972  13563               A_2
1973  13867               A_3
1974  14696               A_4
1975  15460               A_5
1976  15311               A_5
1977  15603               A_5
1978  15861               A_6
1979  16807               A_8
1980  16919               A_8
1981  16388               A_7
1982  15433               A_5
1983  15497               A_5
1984  15145               A_5
1985  15163               A_5
1986  15984               A_6
1987  16859               A_8
1988  18150               A_9
1989  18970               A_10
1990  19328               A_11
1991  19337               A_11
1992  18876               A_10

Table 4. FLRG of the First Order of Fuzzy Time Series (a number in parentheses is the number of times the relationship occurs)
Group 1:  A_1 → A_2
Group 2:  A_2 → A_3
Group 3:  A_3 → A_4
Group 4:  A_4 → A_5
Group 5:  A_5 → A_5 (5), A_6 (2)
Group 6:  A_6 → A_8 (2)
Group 7:  A_7 → A_5
Group 8:  A_8 → A_7, A_8, A_9
Group 9:  A_9 → A_10
Group 10: A_10 → A_11
Group 11: A_11 → A_10, A_11
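The fuzzification of Step 3 amounts to finding the interval that contains each datum, which can be sketched with a binary search. The bounds below are the values of Table 2 rounded to integers, and the function name `fuzzify` is hypothetical; the result is the index i of the linguistic term A_i.

```python
from bisect import bisect_right

# Interval bounds from Table 2, rounded to integers: u_i = [BOUNDS[i-1], BOUNDS[i]).
BOUNDS = [12801, 13309, 13715, 14282, 15035, 15648,
          16155, 16625, 17506, 18537, 19128, 19537]


def fuzzify(datum, bounds=BOUNDS):
    """Return i such that datum lies in the half-open interval u_i."""
    if not bounds[0] <= datum < bounds[-1]:
        raise ValueError(f"{datum} lies outside the universe of discourse")
    return bisect_right(bounds, datum)


# Enrollments 1971-1992 in chronological order (Table 1).
ENROLLMENTS_BY_YEAR = [13055, 13563, 13867, 14696, 15460, 15311, 15603, 15861,
                       16807, 16919, 16388, 15433, 15497, 15145, 15163, 15984,
                       16859, 18150, 18970, 19328, 19337, 18876]

FUZZIFIED = [fuzzify(x) for x in ENROLLMENTS_BY_YEAR]
print(FUZZIFIED)
```

The resulting indices match the fuzzified enrollments of Table 3.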
26
5. Experimental Results (cont.)
[Step 5] Calculate the forecasted value. For example, the forecasted enrollment of the year 1978 is calculated as follows: From Table 3, we can see that the fuzzified enrollment of year 1977 is A_5. From Table 4, there is an FLR "A_5 → A_5 (5), A_6 (2)" in Group 5. Therefore, the forecasted enrollment of year 1978 is calculated as follows:
$$\frac{5 \times 15341 + 2 \times 15902}{5 + 2} \approx 15501,$$
where 15341 and 15902 are the middle values of the intervals u_5 and u_6, respectively.
27
5. Experimental Results (cont.)

Table 6. A MSE Comparison of the Proposed Method Using the First Order Fuzzy Time Series With the Existing Methods
Year  Actual Enrollments  Song and Chissom's method  Sullivan and Woodall's method  Chen's method  Huarng's method  The proposed method
1971  13055
1972  13563               14000                      13500                          14000                           13512
1973  13867               14000                      14500                          14000                           13998
1974  14696               14000                      14500                          14000                           14658
1975  15460               15500                      15231                          15500                           15341
1976  15311               16000                      15563                          16000          15500            15501
1977  15603               16000                      15563                          16000                           15501
1978  15861               16000                      15500                          16000                           15501
1979  16807               16000                      15500                          16000                           17065
1980  16919               16813                      16684                          16833          17500            17159
1981  16388               16813                      16684                          16833          16000            17159
1982  15433               16789                      15500                          16833          16000            15341
1983  15497               16000                      15563                          16000                           15501
1984  15145               16000                      15563                          16000          15500            15501
1985  15163               16000                      15563                          16000                           15501
1986  15984               16000                      15563                          16000                           15501
1987  16859               16000                      15500                          16000                           17065
1988  18150               16813                      16577                          16833          17500            17159
1989  18970               19000                      19500                          19000                           18832
1990  19328               19000                      19500                          19000                           19333
1991  19337               19000                      19500                          19000          19500            19083
1992  18876               Not forecasted                                            19000                           19083
MSE                       423027                     386055                         407507         226611           122085
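The MSE reported for the proposed method can be checked with a few lines of Python, assuming the MSE is the mean of the squared differences between actual and forecasted enrollments over the 21 forecasted years (1972 to 1992). The function name `mean_square_error` is hypothetical; the data are copied from Table 6.

```python
def mean_square_error(actual, forecasted):
    """Mean of squared differences between paired actual and forecasted values."""
    if len(actual) != len(forecasted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - f) ** 2 for a, f in zip(actual, forecasted)) / len(actual)


# Actual enrollments and forecasts of the proposed method, 1972-1992 (Table 6).
ACTUAL = [13563, 13867, 14696, 15460, 15311, 15603, 15861, 16807, 16919, 16388,
          15433, 15497, 15145, 15163, 15984, 16859, 18150, 18970, 19328, 19337,
          18876]
PROPOSED = [13512, 13998, 14658, 15341, 15501, 15501, 15501, 17065, 17159, 17159,
            15341, 15501, 15501, 15501, 15501, 17065, 17159, 18832, 19333, 19083,
            19083]

print(round(mean_square_error(ACTUAL, PROPOSED)))  # prints: 122085
```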
28
5. Experimental Results (cont.)
B. The Proposed Method Using the Second Order Fuzzy Time Series
The results of Steps 1-3 of the proposed method using the second order fuzzy time series are the same as those of Steps 1-3 using the first order fuzzy time series. In the following, we illustrate the results of Step 4 and Step 5 of the proposed method using the second order fuzzy time series.
29
5. Experimental Results (cont.)
[Step 4] Based on Table 3, we can construct the FLRs of the second order fuzzy time series. Group the FLRs having the same current state into an FLR group (FLRG).

Table 5. FLRG of the Second Order of Fuzzy Time Series
Group 1:  A_1, A_2 → A_3
Group 2:  A_2, A_3 → A_4
Group 3:  A_3, A_4 → A_5
Group 4:  A_4, A_5 → A_5
Group 5:  A_5, A_5 → A_5 (3), A_6 (2)
Group 6:  A_5, A_6 → A_8 (2)
Group 7:  A_6, A_8 → A_8, A_9
Group 8:  A_8, A_8 → A_7
Group 9:  A_8, A_7 → A_5
Group 10: A_7, A_5 → A_5
Group 11: A_8, A_9 → A_10
Group 12: A_9, A_10 → A_11
Group 13: A_10, A_11 → A_11
Group 14: A_11, A_11 → A_10
30
5. Experimental Results (cont.)
[Step 5] Calculate the forecasted value. For example, the forecasted enrollment of the year 1988 is calculated as follows: From Table 3, we can see that the fuzzified enrollments of years 1986 and 1987 are A_6 and A_8, respectively. From Table 5, there is an FLR "A_6, A_8 → A_8, A_9" in Group 7. Therefore, the forecasted enrollment of year 1988 is calculated as follows:
$$\frac{1 \times 17065 + 1 \times 18021}{1 + 1} = 17543,$$
where 17065 and 18021 are the middle values of the intervals u_8 and u_9, respectively. In the same way, the forecasted enrollments of the University of Alabama for the other years using the second order fuzzy time series can be obtained.
31
5. Experimental Results (cont.)

Table 7. A MSE Comparison of the Proposed Method Using High-Order Fuzzy Time Series With the Existing Methods
Order  Hwang, Chen, and Lee's method  Chen's method  The proposed method
2      333171                         89093          77847
3      299634                         86694          75926
4      315489                         89376          60159
5      278919                         94539          62865
6      296950                         98215          21746
7      316720                         104056         18619
8      301228                         102179         19829
9      306485                         102789         16234
32
6. Conclusions
In this paper, we have presented a new method to forecast the enrollments of the University of Alabama using the first order fuzzy time series and the high-order fuzzy time series, respectively. The proposed method uses the proposed clustering algorithm to partition the universe of discourse into intervals of different lengths. The proposed method achieves higher average forecasting accuracy than the existing methods compared in Tables 6 and 7, as shown by its smaller mean square errors (MSEs).