1
Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods
Julie Sungsoon Hwang, Department of Geography, University of Washington
Jean-Claude Thill, Department of Geography, State University of New York at Buffalo
November 10, 2005
North American Meetings of the Regional Science Association International
2
Outline
Research objectives
Methodology: specification
Methodology: illustration
Evaluating the performance of fuzzy clustering
Conclusions
3
Research objectives
Demonstrate the use of the fuzzy c-means (FCM) algorithm for delineating housing submarkets
– Comparison to K-means
Discuss the empirical characteristics of FCM in this application, in particular the choice of parameters
– Cluster validity index
4
Challenges
Are the boundaries of clusters crisp?
[Figure: clusters A, B, and C in the (X1, X2) feature space for the housing markets of two metropolitan areas, p and q]
5
Methodology: specification
6
Our task is to group census tracts into homogeneous housing submarkets within a metropolitan area
Using the fuzzy c-means algorithm, in order to examine whether fuzzy set-based clustering does a better job than crisp clustering
Implemented in 85 metropolitan areas
Most of the data sets are public (e.g., 2000 Census)
The whole procedure is automated in GIS
7
Methodology: flow chart
[Figure: For each metropolitan area, candidate variables x1 … xm (from national, regional, and local sources) are attached to the census tract layer (tracts 1 … n). Stepwise regression selects k ≤ m significant variables y1 … yk. Cluster analysis then produces either a crisp membership matrix with K-means (each tract has membership 1 in exactly one cluster, e.g. tract 1: 1, 0, …, 0) or a fuzzy membership matrix with fuzzy c-means (e.g. tract 1: 0.85, 0.05, …, 0.10; tract n: 0.40, 0.03, …, 0.50). These are mapped as a metro hard cluster layer and a fuzzy cluster layer with submarkets 1 … c (c ≤ n). k: number of selected variables; c: number of submarkets; U_j: membership to cluster j.]
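The flow chart names stepwise regression as the filter that reduces the m candidate variables to k significant ones, but does not state the entry or exit criteria. The sketch below is a simplified, forward-only variant written with NumPy: at each step it adds the candidate whose coefficient has the largest |t| when added, and stops when no candidate reaches `t_crit`. The functions `ols_t_stats` and `forward_select` and the cutoff `t_crit = 2.0` (roughly a 5% two-sided test in large samples) are hypothetical illustrations, not the authors' procedure; full stepwise regression would also remove variables that lose significance.

```python
import numpy as np

def ols_t_stats(X, y):
    """OLS with an intercept; returns the t-statistics of the slope coefficients."""
    A = np.column_stack([np.ones(len(y)), X])        # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (len(y) - A.shape[1])   # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    return (beta / se)[1:]                            # drop the intercept

def forward_select(X, y, names, t_crit=2.0):
    """Hypothetical forward selection: greedily add the candidate with the largest
    |t| when added; stop when no remaining candidate reaches t_crit."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        # |t| of candidate j when it is the newest (last) column in the model
        scores = {j: abs(ols_t_stats(X[:, selected + [j]], y)[-1]) for j in remaining}
        best = max(scores, key=scores.get)
        if scores[best] < t_crit:
            break
        selected.append(best)
        remaining.remove(best)
    return [names[j] for j in selected]

# Synthetic example: y depends on x0 and x2 only
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.1, size=200)
selected = forward_select(X, y, ["x0", "x1", "x2"])
```

With this synthetic data, x0 and x2 enter first; whether the noise variable x1 also passes the cutoff depends on the random draw, which is one reason the cutoff matters in practice.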
8
Explanatory variables for house price

| Variable | Definition | Data source | Year | Spatial unit |
|---|---|---|---|---|
| Socioeconomic/demographic characteristics of residents | | | | |
| pcincome | Per capita income | Census | 2000 | Census tract |
| college | % college degree | Census | 2000 | Census tract |
| managep | % management workers | Census | 2000 | Census tract |
| prodp | % production workers | Census | 2000 | Census tract |
| famcpchl | % families with children | Census | 2000 | Census tract |
| nfmalone | % nonfamily households living alone | Census | 2000 | Census tract |
| black_p | % black | Census | 2000 | Census tract |
| nhwht_p | % non-Hispanic white | Census | 2000 | Census tract |
| nativebr | % native born | Census | 2000 | Census tract |
| Structural characteristics of housing units | | | | |
| medroom | Median number of rooms | Census | 2000 | Census tract |
| hudetp | % detached housing units | Census | 2000 | Census tract |
| yrhublt | Median year structure built | Census | 2000 | Census tract |
| Locational characteristics (amenities) of neighborhoods | | | | |
| ptratio | Pupil-to-teacher ratio | NCES* | 2002 | School district |
| schexp | School expenditure per student | NCES | 2002 | School district |
| vrlcrime | Violent crime rate | FBI** | 2003 | Designated place |
| prpcrime | Property crime rate | FBI | 2003 | Designated place |
| jobacm | Job accessibility (Hansen 1959) | CTPP*** | 2000 | Census tract |

*National Center for Education Statistics; **FBI annual report "Crime in the U.S. 2003"; ***CTPP: Census Transportation Planning Package
Dependent variable: median home value of owner-occupied housing units
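The job accessibility variable `jobacm` cites Hansen (1959), a gravity-type measure: the accessibility of tract i sums the opportunities (here, jobs) in every zone j, discounted by the travel impedance between i and j. A minimal sketch, assuming a negative exponential impedance exp(−β·t_ij) and a hypothetical decay parameter `beta`; the slide does not say which impedance function, parameter value, or travel-time matrix (e.g. from CTPP) was actually used.

```python
import numpy as np

def hansen_accessibility(jobs, travel_time, beta=0.1):
    """Gravity-type accessibility A_i = sum_j E_j * exp(-beta * t_ij).

    jobs:        length-J array of employment E_j in each destination zone
    travel_time: (I, J) matrix of travel times t_ij from origin i to zone j
    beta:        hypothetical distance-decay parameter (assumption)
    """
    jobs = np.asarray(jobs, dtype=float)
    t = np.asarray(travel_time, dtype=float)
    # Discount each zone's jobs by the impedance from every origin, then sum over destinations
    return (np.exp(-beta * t) * jobs[np.newaxis, :]).sum(axis=1)

# Two zones, 10 minutes apart
A = hansen_accessibility([100, 50], [[0, 10], [10, 0]], beta=0.1)
```

Zone 0 scores higher than zone 1 because it contains the larger job center; a power-function impedance t_ij^(−β) is another common choice and would change the values but not the structure of the computation.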
9
Study set: 85 metropolitan areas
10
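What is fuzzy c-means (FCM)?
A clustering method that minimizes the following objective function, subject to $\sum_{i=1}^{c} u_{ik} = 1$ for each data point k:
$J_m(U, V) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^m \lVert x_k - v_i \rVert^2$
It alternately updates the cluster means $v_i$ and the membership degrees $u_{ik}$ until the algorithm converges:
$v_i = \dfrac{\sum_{k=1}^{n} (u_{ik})^m x_k}{\sum_{k=1}^{n} (u_{ik})^m}$ (III-3a)
$u_{ik} = \left[ \sum_{j=1}^{c} \left( \dfrac{\lVert x_k - v_i \rVert}{\lVert x_k - v_j \rVert} \right)^{2/(m-1)} \right]^{-1}$ (III-3b)
where
$x_k$: vector of data point k, 1 ≤ k ≤ n
$v_i$: center of cluster i, 1 ≤ i ≤ c
$u_{ik}$: membership degree of data point k in cluster i, $u_{ik} \in [0, 1]$
m: fuzziness exponent weighting the membership degrees, 1 ≤ m < ∞ ((III-3b) requires m > 1; as m approaches 1 the partition becomes crisp, as in K-means)
Source: Bezdek 1981
[Figure: fuzzy clusters in the (x1, x2) feature space]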
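A minimal NumPy sketch of the alternating updates: `fcm_step` computes the centers with (III-3a) and then the memberships with (III-3b), and `fcm` repeats it from a random fuzzy partition until the largest change in U falls below a threshold L. The function names, the max-abs matrix norm, the random initialization, and the small distance floor (which avoids division by zero when a point coincides with a center) are assumptions of this sketch; it requires m > 1.

```python
import numpy as np

def fcm_step(X, U, m):
    """One FCM iteration. X: (n, p) data; U: (c, n) memberships whose columns sum to 1."""
    W = U ** m                                        # weights u_ik^m
    V = (W @ X) / W.sum(axis=1, keepdims=True)        # (III-3a): (c, p) centers
    # Squared distances d2[i, k] = ||x_k - v_i||^2, floored to avoid division by zero
    d2 = ((X[np.newaxis, :, :] - V[:, np.newaxis, :]) ** 2).sum(axis=2)
    d2 = np.fmax(d2, 1e-12)
    # (III-3b) with squared distances: (d_ik / d_jk)^(2/(m-1)) = (d2_ik / d2_jk)^(1/(m-1))
    ratio = (d2[:, np.newaxis, :] / d2[np.newaxis, :, :]) ** (1.0 / (m - 1))
    return V, 1.0 / ratio.sum(axis=1)

def fcm(X, c, m, L=1e-6, max_iter=500, seed=0):
    """Iterate fcm_step until max |U(l+1) - U(l)| <= L."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)                 # columns sum to 1: a fuzzy partition
    for _ in range(max_iter):
        V, U_new = fcm_step(X, U, m)
        if np.abs(U_new - U).max() <= L:
            return V, U_new
        U = U_new
    return V, U

# Two well-separated groups of three points
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
V, U = fcm(X, c=2, m=2.0)
```

Each column of `U` is one point's membership vector; here the centers settle near (1/3, 1/3) and (31/3, 31/3), and each point's membership in the far cluster is close to, but not exactly, zero.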
11
FCM: missing elements
FCM requires two parameters to be chosen in advance:
The optimal number of clusters c*
The optimal fuzziness amount m*
[Figure: c and m as the two inputs to FCM]
12
Extended fuzzy c-means algorithm
Step 1: Initialize the parameters related to fuzzy partitioning: c = 2 (2 ≤ c ≤ cmax) and m = 1 (1 ≤ m ≤ mmax), where c is an integer and m is a real number (at m = 1 the partition is crisp, i.e. K-means). Fix minc, the increment of m (0 < minc ≤ 0.1); fix the cut-off threshold L; choose the validity index v.
Step 2: Given c and m, initialize U(0) so that it is a fuzzy partition matrix. Then, at iteration l = 0, 1, 2, …:
Step 3: Calculate the c fuzzy cluster centers {vi(l)} with (III-3a) and U(l).
Step 4: Update U(l+1) using (III-3b) and {vi(l)}.
Step 5: Compare U(l) to U(l+1) in a convenient matrix norm; if ||U(l+1) − U(l)|| ≤ L, go to Step 6; otherwise return to Step 3.
Step 6: Compute the validity index for the given c and m.
Step 7: If c < cmax, set c ← c + 1 and go to Step 2; otherwise go to Step 8.
Step 8: If m < mmax, set m ← m + minc, reset c = 2, and go to Step 2; otherwise go to Step 9.
Step 9: Select the optimal value of the validity index over all (c, m) pairs; the corresponding c and m are the optimal number of clusters c* and the optimal fuzziness exponent m*. The optimal fuzzy partition U is obtained given c* and m*.
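The nested search in Steps 1 to 9 can be sketched as a loop over m and c that reruns FCM for every pair and keeps the pair with the smallest Xie-Beni index (smaller is better). Assumptions of this sketch: the search starts at m = 1.1 rather than m = 1 because (III-3b) is undefined at m = 1, the Xie-Beni numerator uses the exponent m, and the names `fcm`, `xie_beni`, and `extended_fcm` are hypothetical.

```python
import numpy as np

def fcm(X, c, m, L=1e-6, max_iter=500, seed=0):
    """Fuzzy c-means: alternate (III-3a) and (III-3b) until max|U(l+1) - U(l)| <= L."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)                                    # Step 2: random fuzzy partition
    for _ in range(max_iter):
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)        # Step 3: centers (III-3a)
        d2 = np.fmax(((X[None] - V[:, None]) ** 2).sum(axis=2), 1e-12)
        U_new = 1.0 / ((d2[:, None] / d2[None]) ** (1.0 / (m - 1))).sum(axis=1)  # Step 4 (III-3b)
        if np.abs(U_new - U).max() <= L:                  # Step 5: convergence test
            return V, U_new
        U = U_new
    return V, U

def xie_beni(X, V, U, m):
    """Compactness over separation; smaller is better."""
    d2 = ((X[None] - V[:, None]) ** 2).sum(axis=2)
    sep = min(((V[i] - V[j]) ** 2).sum() for i in range(len(V)) for j in range(len(V)) if i != j)
    return ((U ** m) * d2).sum() / (len(X) * sep)

def extended_fcm(X, c_max=10, m_start=1.1, m_max=1.9, m_inc=0.1):
    """Grid search over (c, m); returns (best index, c*, m*, U at the optimum)."""
    best = None
    for m in np.arange(m_start, m_max + m_inc / 2, m_inc):  # Step 8: loop over m
        for c in range(2, c_max + 1):                       # Step 7: loop over c
            V, U = fcm(X, c, m)
            xb = xie_beni(X, V, U, m)                       # Step 6: validity index
            if best is None or xb < best[0]:
                best = (xb, c, round(float(m), 2), U)
    return best                                              # Step 9: optimum

# Three compact groups of four points each
X = np.array([[x + dx, y + dy] for (x, y) in [(0, 0), (10, 0), (0, 10)]
              for (dx, dy) in [(0, 0), (1, 0), (0, 1), (1, 1)]], dtype=float)
xb, c_star, m_star, U_star = extended_fcm(X, c_max=5)
```

Because every (c, m) pair is a full FCM run, the cost grows with cmax × (mmax − 1)/minc; a fixed random seed per run keeps the comparison across pairs reproducible.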
13
Cluster validity indices
Partition coefficient
Partition entropy
Xie-Beni index
SVi index, where the parameter w is set to 2 in this study
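The formulas for these indices are not reproduced here. Assuming the standard textbook definitions (Bezdek's partition coefficient and partition entropy, and the Xie-Beni ratio of compactness to separation, written with the exponent m as in many FCM implementations; some formulations fix that exponent at 2), they can be computed from the membership matrix U (c × n), the data X (n × p), and the centers V (c × p) as below. The SVi index and its parameter w are not sketched because their definition is not given. PC ranges from 1/c (maximally fuzzy) to 1 (crisp) and is maximized; PE and XB are minimized.

```python
import numpy as np

def partition_coefficient(U):
    """PC = (1/n) * sum_i sum_k u_ik^2; 1 for a crisp partition, 1/c when all memberships equal."""
    return (U ** 2).sum() / U.shape[1]

def partition_entropy(U):
    """PE = -(1/n) * sum_i sum_k u_ik * ln(u_ik); 0 for a crisp partition."""
    logs = np.log(np.where(U > 0, U, 1.0))   # treat 0 * log(0) as 0
    return -(U * logs).sum() / U.shape[1]

def xie_beni(X, V, U, m=2.0):
    """XB = sum_i sum_k u_ik^m ||x_k - v_i||^2 / (n * min_{i != j} ||v_i - v_j||^2)."""
    d2 = ((X[np.newaxis, :, :] - V[:, np.newaxis, :]) ** 2).sum(axis=2)  # (c, n)
    c = V.shape[0]
    sep = min(((V[i] - V[j]) ** 2).sum() for i in range(c) for j in range(c) if i != j)
    return ((U ** m) * d2).sum() / (X.shape[0] * sep)

# Four 1-D points in two clusters, with a crisp and a maximally fuzzy partition
X = np.array([[0.0], [1.0], [10.0], [11.0]])
V = np.array([[0.5], [10.5]])
U_crisp = np.array([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])
U_flat = np.full((2, 4), 0.5)
```

For the crisp partition, PC = 1, PE = 0, and XB = (4 × 0.25) / (4 × 100) = 0.0025; the flat partition gives PC = 0.5 and PE = ln 2.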
14
Determining c* and m*
The selected validity indices are calibrated over the study set
The Xie-Beni index is recommended as the validity index
The average m* is 1.38
15
Histogram of m* for FCM
16
Methodology: illustration
17
Median home value of Buffalo, NY
18
Dimensionality of Buffalo housing market
Hedonic regression equation of median home value in Buffalo, NY

| Predictor | Coefficient | Standard error | t-statistic | p-value |
|---|---|---|---|---|
| Constant | -1455768 | 164417 | -8.85 | 0.000 |
| Per capita income | 2.3667 | 0.2791 | 8.48 | 0.000 |
| % college degree | 88221 | 11346 | 7.78 | 0.000 |
| % family: couple with children | 65735 | 18775 | 3.50 | 0.001 |
| % detached housing units | -31260 | 5527 | -5.66 | 0.000 |
| Housing age (median year built) | 692.88 | 80.26 | 8.63 | 0.000 |
| % non-Hispanic white | 11186 | 3914 | 2.86 | 0.005 |
| % native born | 130039 | 31111 | 4.18 | 0.000 |
| Job accessibility | -0.05266 | 0.02227 | -2.36 | 0.019 |

Adjusted R² = 84.3%
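As a consistency check on the reconstructed table, each t-statistic should equal the coefficient divided by its standard error. The short script below recomputes them from the reported values and lists any predictor whose recomputed t differs from the reported one by more than the two-decimal rounding tolerance; the dictionary layout is only for illustration.

```python
# (coefficient, standard error, reported t-statistic) from the Buffalo hedonic regression
table = {
    "Constant": (-1455768, 164417, -8.85),
    "Per capita income": (2.3667, 0.2791, 8.48),
    "% college degree": (88221, 11346, 7.78),
    "% family: couple with children": (65735, 18775, 3.50),
    "% detached housing units": (-31260, 5527, -5.66),
    "Housing age (median year built)": (692.88, 80.26, 8.63),
    "% non-Hispanic white": (11186, 3914, 2.86),
    "% native born": (130039, 31111, 4.18),
    "Job accessibility": (-0.05266, 0.02227, -2.36),
}

# t = coefficient / standard error, compared within half of the last reported digit
mismatches = [name for name, (coef, se, t_reported) in table.items()
              if abs(coef / se - t_reported) > 0.005]
print(mismatches)   # []
```

An empty list confirms that the split of the flattened digit strings into coefficient and standard-error columns is internally consistent.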
19
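Optimal number of housing submarkets c* and optimal fuzziness amount m*, Buffalo, NY
Values in the cells are the Xie-Beni index for a given c (rows) and m (columns).

| c \ m | 1.0 | 1.1 | 1.2 | 1.3 | 1.4 | 1.5 | 1.6 | 1.7 | 1.8 | 1.9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 0.4735 | 0.4570 | 0.4380 | 8.0983 | 10.4115 | 12.5478 | 14.4334 | 16.0634 | 17.4645 | 18.6721 |
| 3 | 0.4136 | 0.3889 | 0.3460 | 0.3385 | 10.7864 | 12.9137 | 14.7939 | 16.4217 | 17.8290 | 19.0553 |
| 4 | 0.7802 | 0.7116 | 0.6080 | 0.5241 | 1.3154 | 6.8837 | 7.4807 | 8.0441 | 8.5632 | 9.0391 |
| 5 | 0.5560 | 0.5622 | 0.5940 | 0.6121 | 0.4683 | 0.3404 | 0.6489 | 0.6850 | 0.7206 | 0.7555 |
| 6 | 0.6223 | 0.7578 | 1.0187 | 0.8173 | 0.6907 | 1.3393 | 1.4074 | 1.4819 | 1.5595 | 1.6382 |
| 7 | 0.8836 | 0.6903 | 0.6881 | 0.6016 | 0.6148 | 0.9515 | 2.4397 | 2.6306 | 2.8317 | 3.0383 |
| 8 | 0.5981 | 0.5888 | 0.5703 | 0.5232 | 0.3992 | 0.7381 | 0.8910 | 1.2388 | 1.2926 | 1.3538 |
| 9 | 0.9645 | 0.6160 | 0.4836 | 0.4866 | 0.8449 | 1.4020 | 1.4198 | 1.8317 | 1.8639 | 1.9161 |
| 10 | 0.7053 | 0.6004 | 0.6619 | 0.5873 | 0.5868 | 1.3465 | 1.5081 | 1.6875 | 1.8215 | 1.8591 |
| c* | 3 | 3 | 3 | 3 | 8 | 5 | 5 | 5 | 5 | 5 |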
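Selecting c* and m* from this grid is a column-wise and then a global minimization, since a smaller Xie-Beni index indicates a better partition. The script below uses the values transcribed from the table and reproduces both the c* row and the overall optimum.

```python
# Xie-Beni index for Buffalo, NY: rows c = 2..10, columns m = 1.0..1.9
ms = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9]
xb = {
    2: [0.4735, 0.4570, 0.4380, 8.0983, 10.4115, 12.5478, 14.4334, 16.0634, 17.4645, 18.6721],
    3: [0.4136, 0.3889, 0.3460, 0.3385, 10.7864, 12.9137, 14.7939, 16.4217, 17.8290, 19.0553],
    4: [0.7802, 0.7116, 0.6080, 0.5241, 1.3154, 6.8837, 7.4807, 8.0441, 8.5632, 9.0391],
    5: [0.5560, 0.5622, 0.5940, 0.6121, 0.4683, 0.3404, 0.6489, 0.6850, 0.7206, 0.7555],
    6: [0.6223, 0.7578, 1.0187, 0.8173, 0.6907, 1.3393, 1.4074, 1.4819, 1.5595, 1.6382],
    7: [0.8836, 0.6903, 0.6881, 0.6016, 0.6148, 0.9515, 2.4397, 2.6306, 2.8317, 3.0383],
    8: [0.5981, 0.5888, 0.5703, 0.5232, 0.3992, 0.7381, 0.8910, 1.2388, 1.2926, 1.3538],
    9: [0.9645, 0.6160, 0.4836, 0.4866, 0.8449, 1.4020, 1.4198, 1.8317, 1.8639, 1.9161],
    10: [0.7053, 0.6004, 0.6619, 0.5873, 0.5868, 1.3465, 1.5081, 1.6875, 1.8215, 1.8591],
}

# c* for each m: the row with the smallest index in that column
c_star_by_m = [min(xb, key=lambda c: xb[c][j]) for j in range(len(ms))]

# Global optimum over all (c, m) pairs
best_xb, c_star, m_star = min((xb[c][j], c, ms[j]) for c in xb for j in range(len(ms)))

print(c_star_by_m)              # [3, 3, 3, 3, 8, 5, 5, 5, 5, 5]
print(c_star, m_star, best_xb)  # 3 1.3 0.3385
```

Note that c* is not stable across m: it jumps from 3 to 8 and then to 5 as m grows, which is why m has to be searched jointly with c rather than fixed beforehand.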
20
c* = 3; m* = 1.3
[Figure: Buffalo housing submarkets; map panels: membership to Cluster 1, membership to Cluster 2, membership to Cluster 3, defuzzified clusters]
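The defuzzified map assigns each tract to a single submarket. The rule is not stated; a common choice, assumed here, is to take the cluster with the largest membership degree. The sketch also flags tracts whose largest membership falls below a hypothetical threshold of 0.6 as transitional, boundary information that a crisp K-means partition cannot express. The membership values are hypothetical, for four tracts and three clusters, loosely modeled on the example rows of the flow chart.

```python
import numpy as np

def defuzzify(U, threshold=0.6):
    """Assign each tract (column of U) to its maximum-membership cluster.

    Returns 1-based cluster labels, each tract's maximum membership, and a
    boolean flag for 'transitional' tracts whose maximum membership is below
    the hypothetical `threshold`.
    """
    labels = U.argmax(axis=0) + 1
    peak = U.max(axis=0)
    return labels, peak, peak < threshold

# Hypothetical memberships: rows are clusters 1-3, columns are tracts 1-4
U = np.array([[0.85, 0.12, 0.02, 0.40],
              [0.05, 0.80, 0.74, 0.03],
              [0.10, 0.08, 0.24, 0.57]])
labels, peak, transitional = defuzzify(U)
```

Tract 4 is assigned to cluster 3 but with membership 0.57, and it keeps substantial membership (0.40) in cluster 1; on a map it would sit near the boundary between those two submarkets.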
21
Evaluating the performance of fuzzy clustering
22
Compare FCM with K-means (KM)
Compare the sum of squared errors (SSE) derived from KM (m = 1) and FCM (m = m*), given c*
Fuzzy clustering outperforms crisp clustering on this criterion
23
Conclusions
Fuzzy set theory provides a mechanism for handling the uncertainty involved in classification tasks
The fuzzy c-means algorithm is of practical use in delineating housing submarkets
Fuzzy set theory deserves further attention in the social sciences
More work on the choice of parameters is needed