

Delineating Metropolitan Housing Submarkets with Fuzzy Clustering Methods Julie Sungsoon Hwang Department of Geography, University of Washington Jean-Claude Thill Department of Geography, State University of New York at Buffalo November 10, 2005 North American Meetings of Regional Science Association International

Outline
- Research objectives
- Methodology: specification
- Methodology: illustration
- Evaluating the performance of fuzzy clustering
- Conclusions

Research objectives
- Demonstrate the use of the fuzzy c-means (FCM) algorithm for delineating housing submarkets
  – Comparison to K-means
- Discuss the empirical characteristics of FCM in this application, in particular the choice of parameters
  – Cluster validity index

Challenges
Are the boundaries of clusters crisp?
[Figure: two panels on axes X1 and X2, "Housing market in metropolitan area p" and "Housing market in metropolitan area q", each showing Cluster A, Cluster B, and Cluster C]

Methodology: specification

Our task is to group census tracts into homogeneous housing submarkets within a metropolitan area
- Using the fuzzy c-means algorithm
- In order to examine whether fuzzy set-based clustering can do a better job than crisp clustering
- Implemented in 85 metropolitan areas
- Most of the data sets are public (e.g., Census)
- The whole procedure is automated in GIS

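Methodology: flow chart
[Flow chart, applied for each metropolitan area]
- Spatial scales: National, Regional, Local, ...
- Input: Census Tract Layer, n tracts (rows) by m candidate variables x1, x2, x3, ..., xm
- Stepwise regression (k ≤ m) reduces the candidate variables to k significant variables y1, y2, ..., yk
- Cluster analysis on the significant variables (c ≤ n), producing two Metro layers:
  – K-means gives the Hard Cluster Layer: each tract has membership 0 or 1 in U1, U2, ..., Uc
  – Fuzzy c-means gives the Fuzzy Cluster Layer: each tract has a fractional membership (e.g., 0.12, 0.50) in U1, U2, ..., Uc
- k: number of selected variables; c: number of submarkets; Uj: membership to cluster j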

Explanatory variables for house price

Var_Name  Variable Definition              Data     Year  Spatial Unit

Socioeconomic/demographic Characteristics of Residents
pcincome  per capita income                Census   2000  Census Tract
college   % college degree                 Census   2000  Census Tract
managep   % management workers             Census   2000  Census Tract
prodp     % production workers             Census   2000  Census Tract
famcpchl  % family with children           Census   2000  Census Tract
nfmalone  % nonfamily living alone         Census   2000  Census Tract
black_p   % black                          Census   2000  Census Tract
nhwht_p   % non-hispanic white             Census   2000  Census Tract
nativebr  % native born                    Census   2000  Census Tract

Structural Characteristics of Housing Units
medroom   median number of rooms           Census   2000  Census Tract
hudetp    % detached housing unit          Census   2000  Census Tract
yrhublt   median year structure built      Census   2000  Census Tract

Locational Characteristics (Amenities) of Neighborhoods
ptratio   pupil to teacher ratio           NCES*    2002  School District
schexp    school expenditure per student   NCES     2002  School District
vrlcrime  violent crime rate               FBI**    2003  Designated Place
prpcrime  property crime rate              FBI      2003  Designated Place
jobacm    job accessibility (Hansen 1959)  CTPP***  2000  Census Tract

*NCES: National Center for Education Statistics; **FBI annual report "Crime in the U.S. 2003"; ***CTPP: Census Transportation Planning Package
Dependent variable: median home value of owner-occupied housing units

Study set: 85 metropolitan areas

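What is fuzzy c-means (FCM)?
Clustering method that minimizes the following objective function:
$$J_m(U, V) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^m \, \lVert x_k - v_i \rVert^2$$
- $x_k$: vector of data point k, 1 ≤ k ≤ n
- $v_i$: center of cluster i, 1 ≤ i ≤ c
- $u_{ik}$: membership degree of data point k with cluster i, in [0, 1]
- $m$: fuzziness amount associated with assigning data point k to cluster i, 1 ≤ m < ∞
Updates cluster means $v_i$ and membership degrees $u_{ik}$ until the algorithm converges (standard form, Bezdek 1981):
$$v_i = \frac{\sum_{k=1}^{n} (u_{ik})^m \, x_k}{\sum_{k=1}^{n} (u_{ik})^m} \qquad \text{(III-3a)}$$
$$u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{\lVert x_k - v_i \rVert}{\lVert x_k - v_j \rVert} \right)^{2/(m-1)} \right]^{-1} \qquad \text{(III-3b)}$$
Source: Bezdek 1981
[Figure: plot on axes x1 and x2]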
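One alternating update of FCM can be sketched in a few lines of NumPy. This is a minimal illustration, assuming that (III-3a) and (III-3b) are the standard Bezdek center and membership updates and that distances are Euclidean; the function names, the toy data, and the initial memberships are made up for the example. The value m = 1.38 is borrowed from the average m* reported later in the deck.

```python
import numpy as np

def update_centers(X, U, m):
    """Center update, assumed to correspond to (III-3a).
    X is (n, d) data, U is (c, n) memberships; weights are u_ik^m."""
    W = U ** m
    return (W @ X) / W.sum(axis=1, keepdims=True)

def update_memberships(X, V, m, eps=1e-12):
    """Membership update, assumed to correspond to (III-3b). Requires m > 1."""
    # d[i, k] = Euclidean distance from center i to data point k
    d = np.linalg.norm(V[:, None, :] - X[None, :, :], axis=2)
    d = np.maximum(d, eps)  # guard against a point sitting exactly on a center
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)  # each column sums to 1

def objective(X, U, V, m):
    """Sum over clusters and points of u_ik^m times squared distance."""
    d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return float(((U ** m) * d2).sum())

# Toy data: two tight pairs of points (illustrative only)
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
U0 = np.array([[0.6, 0.7, 0.4, 0.3],
               [0.4, 0.3, 0.6, 0.7]])
m = 1.38

V0 = update_centers(X, U0, m)       # centers from the initial memberships
U1 = update_memberships(X, V0, m)   # memberships from those centers
V1 = update_centers(X, U1, m)
print(objective(X, U0, V0, m), objective(X, U1, V1, m))
```

Each alternating step cannot increase the objective, which is why repeating the two updates converges to a local minimum.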

FCM: missing elements
- Optimal number of clusters c*
- Optimal fuzziness amount m*
[Figure: diagram relating m and c to FCM]

Extended fuzzy c-means algorithm
Step 1: Initialize the parameters related to fuzzy partitioning: c = 2 (2 ≤ c ≤ cmax), m = 1 (1 ≤ m ≤ mmax), where c is an integer and m is a real number; fix minc, the increment of m (0 < minc ≤ 0.1); fix the cut-off threshold εL; choose the validity index v.
Step 2: Given c and m, initialize U(0) so that it is a fuzzy partition matrix. Then at step l, l = 0, 1, 2, ...:
Step 3: Calculate the c fuzzy cluster centers {vi(l)} with (III-3a) and U(l).
Step 4: Update U(l+1) using (III-3b) and {vi(l)}.
Step 5: Compare U(l) to U(l+1) in a convenient matrix norm; if ||U(l+1) – U(l)|| ≤ εL, go to Step 6; otherwise return to Step 3.
Step 6: Compute the validity index for the given c and m.
Step 7: If c < cmax, then increase c ← c + 1 and go to Step 3; otherwise go to Step 8.
Step 8: If m < mmax, then increase m ← m + minc and go to Step 3; otherwise go to Step 9.
Step 9: Obtain the optimal validity index and, from it, the optimal number of clusters c* and the optimal fuzziness exponent m*; the optimal fuzzy partition U is obtained given c* and m*.
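The search over c and m described in these steps can be sketched as a nested loop around a plain FCM run. This is a rough sketch under several assumptions that are not in the slides: the Xie-Beni index is used in its original form with squared memberships, the matrix norm is the largest absolute entry, U is re-initialized at random for every (c, m) pair, and the m grid starts at 1 + minc because the membership update is undefined at m = 1. All function and parameter names are hypothetical.

```python
import numpy as np

def fcm(X, c, m, tol=1e-5, max_iter=300, seed=0):
    """Plain fuzzy c-means (Steps 2-5). Returns memberships U (c, n) and centers V (c, d)."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)                   # Step 2: columns sum to 1
    for _ in range(max_iter):
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)      # Step 3: cluster centers
        d = np.linalg.norm(V[:, None, :] - X[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)
        # Step 4: scale by the nearest distance first so the power stays in (0, 1]
        ratio = d.min(axis=0, keepdims=True) / d
        inv = ratio ** (2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(U_new - U).max() <= tol      # Step 5: max-norm test (assumed)
        U = U_new
        if converged:
            break
    return U, V

def xie_beni(X, U, V):
    """Xie-Beni index (assumed original form, squared memberships); lower is better."""
    d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    compactness = ((U ** 2) * d2).sum()
    sep = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(sep, np.inf)                       # ignore a center's distance to itself
    return compactness / (X.shape[0] * sep.min())

def extended_fcm(X, c_max=6, m_max=2.0, m_inc=0.1, tol=1e-5):
    """Steps 6-9: score every (c, m) pair and keep the one with the lowest index."""
    best = None
    n_steps = int(round((m_max - 1.0) / m_inc))
    for s in range(1, n_steps + 1):
        m = 1.0 + s * m_inc                             # start above 1 (assumption)
        for c in range(2, c_max + 1):
            U, V = fcm(X, c, m, tol=tol)
            score = xie_beni(X, U, V)
            if best is None or score < best[0]:
                best = (score, c, m, U, V)
    return best

# Synthetic stand-in for the tract-by-variable matrix: three well-separated groups
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc, 0.3, size=(30, 2)) for loc in ([0, 0], [10, 0], [0, 10])])
score, c_star, m_star, U, V = extended_fcm(X, c_max=5)
print(c_star, round(m_star, 2))
```

Re-initializing U for each (c, m) pair is a design choice made here because the shape of U changes with c; the slides do not say how the authors handled this.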

Cluster validity indices
- Partition coefficient
- Partition entropy
- Xie-Beni index
- SVi index, where w is set to 2 in this study
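The first two indices depend only on the membership matrix, so they are easy to illustrate. The sketch below assumes the usual textbook definitions (partition coefficient as the mean of squared memberships, partition entropy as the mean of -u ln u); the slide's own formulas are not preserved in the transcript, and the function names are hypothetical.

```python
import numpy as np

def partition_coefficient(U):
    """Mean over data points of the sum of squared memberships.
    Ranges from 1/c (completely fuzzy) to 1 (crisp). U is (c, n)."""
    return float((U ** 2).sum() / U.shape[1])

def partition_entropy(U, eps=1e-12):
    """Mean over data points of -sum(u * ln u).
    Ranges from 0 (crisp) to ln(c) (completely fuzzy)."""
    logs = np.log(np.clip(U, eps, None))   # clip so that 0 * ln(0) is treated as 0
    return float(-(U * logs).sum() / U.shape[1])

crisp = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])
fuzzy = np.full((2, 3), 0.5)

print(partition_coefficient(crisp), partition_entropy(crisp))
print(partition_coefficient(fuzzy), partition_entropy(fuzzy))
```

Both indices reward crispness alone and ignore the geometry of the data, which may be one reason an index that also uses the cluster centers is preferred in this study.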

Determining c* and m*
- Selected validity indices are calibrated over the study set
- The Xie-Beni index is recommended as the validity index
- Average m* is 1.38

Histogram of m* for FCM

Methodology: illustration

Median home value of Buffalo, NY

Dimensionality of Buffalo housing market
Table: Hedonic regression equation of median home value in Buffalo, NY (Adjusted R sq = 84.3%)
Columns: Predictor, Coefficient, Standard Error, t-statistic, p-value
Predictors:
- Constant
- Per capita income
- % college degree
- % family: couple with children
- % detached housing unit
- Housing age (year)
- % non-hispanic white
- % native born status
- Job accessibility

Optimal number of housing submarkets c*, optimal fuzziness amount m*, Buffalo, NY
[Table: grid over c and m with c* marked; values in each cell represent the Xie-Beni index given c and m]

Buffalo housing submarkets
c* = 3; m* = 1.3
[Maps: Membership to Cluster 1; Membership to Cluster 2; Membership to Cluster 3; Defuzzified Clusters]
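A defuzzified map can be derived from the membership maps by assigning each tract to a single cluster. The slides do not say which rule was used; the sketch below assumes the common maximum-membership rule, and the membership values are invented for illustration.

```python
import numpy as np

def defuzzify(U):
    """Assign each data point to the cluster with the highest membership.
    U is (c, n); returns 1-based cluster labels of length n (assumed rule)."""
    return np.argmax(U, axis=0) + 1

# Hypothetical memberships of three tracts in c* = 3 clusters (columns sum to 1)
U = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])
labels = defuzzify(U)
print(labels)
```

Tracts whose largest membership is only slightly above the others lose that information in the defuzzified map, which is why the three membership maps are shown alongside it.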

Evaluating the performance of fuzzy clustering

Compare FCM with K-means (KM)
- Compare the sum of squared error derived from KM (m = 1) and FCM (m = m*) given c*
- Fuzzy clustering outperforms crisp clustering

Conclusions
- Fuzzy set theory provides a mechanism for handling the uncertainty involved in classification tasks
- The fuzzy c-means algorithm is of practical use in delineating housing submarkets
- Fuzzy set theory needs further attention in social science fields
- More work on the choice of parameters is needed