DATA MINING RELATIONSHIPS AMONG URBAN SOCIOECONOMIC, LAND COVER, AND REMOTELY SENSED ECOLOGICAL DATA
Jeremy Mennis*, Carol Wessman, and Nancy Golubiewski**

*Department of Geography and **Department of Ecology and Evolutionary Biology, University of Colorado

Contact: Jeremy Mennis, Department of Geography, UCB 260, University of Colorado, Boulder, CO 80309, Phone: (303) 492-4794, Fax: (303) 492-7501, Email: jeremy@colorado.edu

[Figure: tract-level maps of each variable in four classes, plus a land use map, with Denver and Boulder labeled. Legend classes follow.]
Residential Density (mean density of residential land in residential land, cells): 66 to 1510, 1511 to 1969, 1973 to 2315, 2318 to 2927
Housing Year (median year structures built): 1939 to 1957, 1958 to 1971, 1972 to 1979, 1980 to 1997
Population Density (people/m^2 in residential land): 322 to 2669, 2673 to 3348, 3389 to 4525, 4527 to 19810
Elevation (mean elev. in residential land, m): 1506 to 1610, 1611 to 1637, 1638 to 1668, 1669 to 1817
Distance to Highway (mean distance to limited access highways in residential land, m): 270 to 1551, 1556 to 3002, 3035 to 5716, 5732 to 26237
Commercial Density (mean density of commercial land in residential land, cells): 10 to 167, 168 to 308, 309 to 520, 523 to 2191
NDVI, tract level (mean NDVI in residential land): -0.12 to 0.15, 0.16 to 0.19, 0.20 to 0.22, 0.23 to 0.33
NDVI, image: -0.55 to 0.00, 0.01 to 0.20, 0.21 to 0.40, 0.40 to 0.78
Education (% with a high school diploma): 37 to 77, 78 to 89, 90 to 95, 96 to 99
Land Use: natural veg., commercial, residential, agriculture, water, other

Data: Sources and Preprocessing
Vegetation: NDVI from July 27, 1999 Landsat 7 ETM+ image
Land use: USGS (from aerial photography)
Socioeconomic Status: 2000 U.S. Census
Residential and Commercial Density: calculated by generating grids of the number of residential and commercial grid cells within 1 km of each cell, then calculating the tract mean
Elevation: USGS
Highways: ESRI

Note that although colors are mapped to entire tracts, the data represent only the residential land within each tract.

Methods
Spatial data mining techniques are exploratory methods for detecting patterns in very large spatial databases.
We use spatial association rule mining and spatial on-line analytical processing (OLAP), as well as mapping and statistics.

Spatial Association Rule Mining seeks to discover associations among transactions encoded in a spatial database. An association rule takes the form A → B, where A and B are sets of predicates and either A or B contains a spatial relationship. Interesting rules are found by using metrics such as lift, which indicates how much more often than expected B occurs when paired with A.

Spatial On-Line Analytical Processing is an extension to the SQL GROUP BY operation that exhaustively summarizes the value of a measurement variable contained in the fact table by all unique combinations of a set of categorical dimension variables contained in dimension tables. Here, we summarize NDVI by categorizations of the other variables and export the results to GIS for mapping.

[Figure: software used (Magnum Opus association rule mining software; Microsoft SQL Server) and the relational star schema. The fact table lists each Tract_ID with a variable's raw value and its class code, e.g. tracts 1, 2, 3 with Education 73, 58, 82 and Education_D 2, 1, 3. Each dimension table (Education_D, PopDen_D, Minority_D, NDVI_D) pairs class codes with a Level_2 code; the remaining cell values are not recoverable.]

Results
Statistics. Correlations indicate that the variables that have the strongest relationships with NDVI are Population Density (negative relationship), Commercial Density (negative relationship), and Residential Density (positive relationship). In a multivariate context, Housing Year exerts the most influence when the influence of the other explanatory variables is accounted for, although its zero-order correlation is much lower than those of all of the other explanatory variables.

Spatial Association Rule Mining. Results suggest that residential NDVI is lowest in older, socioeconomically disadvantaged neighborhoods near commercial centers. Residential NDVI is highest in older neighborhoods with higher socioeconomic status.
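The lift metric defined under Methods, and reported with each mined rule, can be made concrete with a small sketch. The tract records below are invented for illustration and are not the study data; the function name `lift` and the item labels are likewise hypothetical. Lift is computed here in its standard form, the observed support of A and B together divided by the support expected if A and B were independent.

```python
def lift(transactions, antecedent, consequent):
    """Lift of the rule antecedent -> consequent.

    lift = P(A and B) / (P(A) * P(B)); values above 1 mean B occurs
    with A more often than expected under independence.
    """
    n = len(transactions)
    count_a = sum(1 for t in transactions if antecedent <= t)
    count_b = sum(1 for t in transactions if consequent <= t)
    count_ab = sum(1 for t in transactions if (antecedent | consequent) <= t)
    if count_a == 0 or count_b == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    return (count_ab / n) / ((count_a / n) * (count_b / n))


# Hypothetical tracts, each encoded as a set of discretized predicates.
tracts = [
    frozenset({"Education=low", "DistCommercial=low", "NDVI=low"}),
    frozenset({"Education=low", "DistCommercial=low", "NDVI=low"}),
    frozenset({"Education=low", "DistCommercial=high", "NDVI=high"}),
    frozenset({"Education=high", "DistCommercial=low", "NDVI=high"}),
    frozenset({"Education=high", "DistCommercial=high", "NDVI=high"}),
    frozenset({"Education=high", "DistCommercial=high", "NDVI=low"}),
    frozenset({"Education=low", "DistCommercial=high", "NDVI=high"}),
    frozenset({"Education=high", "DistCommercial=low", "NDVI=high"}),
]

# Rule: if Education is low and Distance to Commercial is low, then NDVI is low.
rule_lift = lift(
    tracts,
    antecedent=frozenset({"Education=low", "DistCommercial=low"}),
    consequent=frozenset({"NDVI=low"}),
)
print(f"lift = {rule_lift:.2f}")
```

In this toy set, low NDVI occurs in 3 of 8 tracts overall but in both tracts that match the antecedent, so the lift is 8/3, or about 2.67.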
Residential NDVI is also highest in areas of residential concentration but sparse population, i.e., planned developments with large lots. Note the role of low Housing Year in predicting both low and high residential NDVI, which helps explain why its zero-order correlation with NDVI is weak even though it is the most influential variable in the multivariate analysis.

Spatial On-Line Analytical Processing. The maps at right show one OLAP result where mean NDVI is calculated for dimensions of Residential Density and Housing Year. Each tract is categorized as belonging to a unique combination of the dimensions (e.g. low Residential Density and high Housing Year). The mean for all tracts within each category is then calculated. Maps use the HSV color model to display the multidimensional data. Hue is mapped to Housing Year, where yellow, orange, red, and purple map from lowest (oldest) to highest (most recent). Saturation is mapped to Residential Density, where low (high) saturation represents low (high) Residential Density. Value maps to the NDVI value using a linear stretch between values of 105 and 255. The map on the left shows the NDVI data mapped to tracts categorized by Residential Density and Housing Year. The map on the right maps the color value to the NDVI mean for the entire data set. Areas that are darker (lighter) in the map on the left have a relatively high (low) NDVI. Older, densely residential areas have high NDVI. Comparison of the color cubes shows that Residential Density distinguishes between high and low NDVI, but only between the areas of lowest Residential Density and the other classes. Likewise, Housing Year is important only in distinguishing the most recent residential development from other areas.
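The OLAP summary and HSV display described above can be sketched in a few lines. This is a minimal illustration, not the SQL Server and GIS workflow used in the study: the tract records are invented, and the hue angles, saturation levels, the NDVI range used for the stretch, and the direction of the stretch (higher NDVI drawn darker, following the remark that darker areas have relatively high NDVI) are all assumptions.

```python
import colorsys
from collections import defaultdict

# Hypothetical tracts: (Residential Density class, Housing Year class, mean NDVI),
# with classes coded 0 (lowest) to 3 (highest).
records = [
    (0, 0, 0.14), (0, 0, 0.16), (3, 0, 0.30), (3, 0, 0.26),
    (1, 3, 0.18), (1, 3, 0.20), (3, 3, 0.21),
]

# Assumed hue angles for yellow, orange, red, purple (oldest to most recent).
HUE_DEGREES = {0: 60, 1: 30, 2: 0, 3: 300}
# Assumed saturation levels for low to high Residential Density.
SATURATION = {0: 0.25, 1: 0.50, 2: 0.75, 3: 1.00}


def cube_means(rows):
    """Mean NDVI for every observed (density class, year class) combination."""
    totals = defaultdict(lambda: [0.0, 0])
    for res_d, year_d, ndvi in rows:
        cell = totals[(res_d, year_d)]
        cell[0] += ndvi
        cell[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def cell_color(res_d, year_d, mean_ndvi, ndvi_min, ndvi_max):
    """RGB color: hue = Housing Year, saturation = Residential Density, value = NDVI."""
    span = ndvi_max - ndvi_min
    frac = (mean_ndvi - ndvi_min) / span if span else 0.0
    # Linear stretch onto 105..255; assumed direction: high NDVI is darker.
    value = 255 - frac * (255 - 105)
    rgb = colorsys.hsv_to_rgb(HUE_DEGREES[year_d] / 360, SATURATION[res_d], value / 255)
    return tuple(round(channel * 255) for channel in rgb)


means = cube_means(records)
low, high = min(means.values()), max(means.values())
colors = {key: cell_color(key[0], key[1], m, low, high) for key, m in means.items()}

for key in sorted(means):
    print(key, round(means[key], 3), colors[key])
```

Each key in `means` corresponds to one cell of the color cube; joining `colors` back to tracts by their class pair would give the map symbology.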
[Figure: color cubes with hue = Housing Year (Hous. Yr.), saturation = Residential Density (Res. Den.), and value = NDVI, shown with and without value mapped to the NDVI data; the NDVI legend runs from low to high.]

Sample of the Mined Rule Set
If Housing Year is low and Residential Density is low, then NDVI is low (Lift = 4.8)
If % Minority is high and Residential Density is low, then NDVI is low (Lift = 4.4)
If Elevation is low and Income is low, then NDVI is low (Lift = 4.1)
If Education is low and Distance to Commercial is low, then NDVI is low (Lift = 3.3)
If Housing Year is low and % Minority is low, then NDVI is high (Lift = 5.4)
If Housing Year is low and Distance to Highway is high, then NDVI is high (Lift = 5.0)
If Population Den. is low and Residential Density is high, then NDVI is high (Lift = 4.8)
If Housing Value is high and Distance to Commercial is low, then NDVI is high (Lift = 3.9)

[Table: multivariate statistical results with rows (Constant), Education, Housing Year, Population Den., Elevation, Dist. to Highway, Commercial Den., Residential Den.; values not recoverable.]

Conclusions
This research demonstrates that vegetation greenness in residential areas is a function of the age and type of development as well as socioeconomic status. Vegetation tends to be concentrated in older, densely residential developments that are far from commercial centers and highways and that contain primarily non-minority households with high educational attainment and income. Spatial data mining and visualization, in combination with multivariate statistics, have proven to be useful tools in identifying land cover, socioeconomic, and ecological relationships that are complex and non-linear. GIS serves a key function as data pre-processor and map display device. Future research will address using more sophisticated metrics of ecological character and the application of similar techniques to identify patterns and relationships in time series data.
Objective and Motivation
Analyzing socioeconomic-vegetation relations in the context of urban growth contributes to an understanding of the role of urban regions in carbon cycling and global environmental change. This project investigates the relationships among socioeconomic character, land use, and vegetation in residential land in the Front Range of Colorado, a rapidly urbanizing region.
