The Use of Census Data and Spatial Statistical Tools in GIS to Identify Economically Distressed Areas
Presented by: Barbara Gibson & Ty Simmons
SCAUG User Group Meeting, Broken Arrow, March 2nd, 2010
Introduction
  What we were asked to do
  Creating the Economic Conditions Index
  Using Spatial Statistical Tools
  Mapping the Results
What we were asked to do
Transportation planning staff approached us to create an economic conditions index (similar to one developed in a Florida study) as part of a TIGER grant application.
The index categorizes block groups by their level of distress, as measured by 3 factors:
  Unemployment
  Families in poverty
  Substandard housing
The index is based solely on 2000 census data by block group.
TIGER stands for Transportation Investment Generating Economic Recovery, and we were awarded the grant: the only one in the state! The project was a multimodal bridge on I-244 over the Arkansas River. The bridge would be the first of its kind in Tulsa, built to accommodate highway traffic, high-speed intercity and commuter rail, and pedestrian and bicycle traffic. The high-speed passenger rail component gives the project national significance.
The index score for each census block group was based on comparing each of its indicator percentages to the corresponding percentage for the county in which the block group is located, in our case Tulsa County.
Unemployment – Tulsa County
P43. SEX BY EMPLOYMENT STATUS FOR THE POPULATION 16 YEARS AND OVER [15]
Universe: Population 16 years and over
  P043001  Total:                     432,088
  P043002    Male:                    206,309
  P043003      In labor force:        156,810
  P043004        In Armed Forces          294
  P043005        Civilian:            156,516
  P043006          Employed           149,173
  P043007          Unemployed           7,343
  P043008      Not in labor force      49,499
  P043009    Female:                  225,779
  P043010      In labor force:        133,228
  P043011        In Armed Forces           47
  P043012        Civilian:            133,181
  P043013          Employed           126,683
  P043014          Unemployed           6,498
  P043015      Not in labor force      92,551
Percent unemployment = (7,343 + 6,498) / 432,088 = 13,841 / 432,088 = 3.20%
We start with unemployment. P43 is the census table reference number for the data.
Unemployment – Tract 7 BG 2
Census Tract 7 Block Group 2
P43. SEX BY EMPLOYMENT STATUS FOR THE POPULATION 16 YEARS AND OVER [15]
Universe: Population 16 years and over
  P043001  Total:                         691
  P043002    Male:                        289
  P043003      In labor force:            180
  P043004        In Armed Forces            0
  P043005        Civilian:                180
  P043006          Employed               152
  P043007          Unemployed              28
  P043008      Not in labor force         109
  P043009    Female:                      402
  P043010      In labor force:            186
  P043011        In Armed Forces            0
  P043012        Civilian:                186
  P043013          Employed               172
  P043014          Unemployed              14
  P043015      Not in labor force         216
Percent unemployment = (28 + 14) / 691 = 42 / 691 = 6.08%
Unemployment Index Value
Unemployment Index Value for Census Tract 7 Block Group 2 = block group percent / Tulsa County percent = 6.08 / 3.20 = 1.9
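The arithmetic on the last three slides can be sketched in a few lines of Python. This is a minimal illustration, assuming the indicator score is simply the block-group percentage divided by the county percentage, which is what 6.08 / 3.20 = 1.9 implies. Like the slides, it divides the unemployed (P043007 + P043014) by the total population 16 and over (P043001) rather than by the civilian labor force. The function names are hypothetical, not part of any census or GIS tool.

```python
def percent_unemployed(p43: dict) -> float:
    """Percent unemployed: male (P043007) plus female (P043014) unemployed,
    divided by the total population 16 years and over (P043001)."""
    unemployed = p43["P043007"] + p43["P043014"]
    return 100.0 * unemployed / p43["P043001"]


def indicator_score(block_group_pct: float, county_pct: float) -> float:
    """Block-group rate relative to the county rate (1.0 = same as the county)."""
    return block_group_pct / county_pct


# Cells copied from the P43 tables above.
county = {"P043001": 432_088, "P043007": 7_343, "P043014": 6_498}
tract7_bg2 = {"P043001": 691, "P043007": 28, "P043014": 14}

# Round to 2 decimals at each step, as the slides do.
county_pct = round(percent_unemployed(county), 2)
bg_pct = round(percent_unemployed(tract7_bg2), 2)
score = round(indicator_score(bg_pct, county_pct), 2)
print(county_pct, bg_pct, score)
```

The same `indicator_score` ratio applies unchanged to the poverty, plumbing, home value, and year built indicators that follow.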
Families in Poverty – Tulsa County
P90. POVERTY STATUS IN 1999 OF FAMILIES BY FAMILY TYPE BY PRESENCE OF RELATED CHILDREN UNDER 18 YEARS BY AGE OF RELATED CHILDREN [41]
Universe: Families
  P090001  Total:                                  148,189
  P090002    Income in 1999 below poverty level:    12,962
Percent families in poverty = 12,962 / 148,189 = 8.75%
P90 is the census table reference number.
Families in Poverty – Tract 7 BG 2
Census Tract 7 Block Group 2
P90. POVERTY STATUS IN 1999 OF FAMILIES BY FAMILY TYPE BY PRESENCE OF RELATED CHILDREN UNDER 18 YEARS BY AGE OF RELATED CHILDREN [41]
Universe: Families
  P090001  Total:                                      204
  P090002    Income in 1999 below poverty level:        37
Percent families in poverty = 37 / 204 = 18.14%
Family Poverty Index Value
Family Poverty Index Value for Census Tract 7 Block Group 2 = 18.14 / 8.75 = 2.07
Substandard Housing Index
The housing index is based on 3 variables:
  Housing units lacking complete plumbing facilities
  Home value for all owner-occupied housing units
  Year structure built for all housing units
To create the substandard housing index for each block group, we computed an index for each variable and then averaged the 3 to get a composite index score.
Plumbing Facilities – Tulsa County
H47. PLUMBING FACILITIES [3]
Universe: Housing units
  H047001  Total:                                    243,953
  H047002    Complete plumbing facilities            242,404
  H047003    Lacking complete plumbing facilities      1,549
Percent housing units lacking complete plumbing = 1,549 / 243,953 = 0.63%
Plumbing Facilities – Tract 7 BG 2
Census Tract 7 Block Group 2
H47. PLUMBING FACILITIES [3]
Universe: Housing units
  H047001  Total:                                        367
  H047002    Complete plumbing facilities                345
  H047003    Lacking complete plumbing facilities         22
Percent housing units lacking complete plumbing = 22 / 367 = 5.99%
Lacking Complete Plumbing Index Value
Lacking Complete Plumbing Index Value for Census Tract 7 Block Group 2 = 5.99 / 0.63 = 9.51
Home Value – Tulsa County
H84. VALUE FOR ALL OWNER-OCCUPIED HOUSING UNITS [25]
Universe: Owner-occupied housing units
  H084001  Total:                    140,131
  H084002    Less than $10,000         1,264
  H084003    $10,000 to $14,999        1,279
  H084004    $15,000 to $19,999        1,345
  H084005    $20,000 to $24,999        2,065
  H084006    $25,000 to $29,999        2,751
  H084007    $30,000 to $34,999        3,659
  H084008    $35,000 to $39,999        4,605
  H084009    $40,000 to $49,999        9,057
  H084010    $50,000 to $59,999       11,084
  H084011    $60,000 to $69,999       13,140
  H084012    $70,000 to $79,999       13,268
  H084013    $80,000 to $89,999       13,183
  H084014    $90,000 to $99,999       11,290
  H084015    $100,000 to $124,999     16,583
  H084016    $125,000 to $149,999     11,885
  H084017    $150,000 to $174,999      6,700
  H084018    $175,000 to $199,999      4,562
  H084019    $200,000 to $249,999      4,378
  H084020    $250,000 to $299,999      2,837
  H084021    $300,000 to $399,999      2,598
  H084022    $400,000 to $499,999      1,142
  H084023    $500,000 to $749,999        874
  H084024    $750,000 to $999,999        255
  H084025    $1,000,000 or more          327
Tulsa County median value for all owner-occupied housing units = $85,000
Total number of owner-occupied housing units with a value < $90,000 = 76,700
Percent of owner-occupied housing units with a value < $90,000 = 76,700 / 140,131 = 54.7% (rounded to 55%)
For home value and year structure built, we first have to look at the county median. In our case, Tulsa County's median home value was $85,000. Since we are using grouped values, you have to make a choice; we went with all owner-occupied housing units with a value less than $90,000.
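Finding which bracket holds the median, and counting the units up to a cutoff, can be sketched in Python. This assumes the usual linear interpolation within the bracket that contains the median (the Census Bureau's published median may be computed slightly differently), and the names `H84_TULSA` and `median_bracket` are hypothetical. With the Tulsa County counts above, the interpolated median is about $84,967, inside the $80,000 to $89,999 bracket, and taking that bracket's upper bound as the cutoff reproduces the 76,700 units on the slide.

```python
# (lower bound, upper bound, owner-occupied units) for Tulsa County, table H84.
# Upper bounds are exclusive; the open-ended top bracket has no upper bound.
H84_TULSA = [
    (0, 10_000, 1_264), (10_000, 15_000, 1_279), (15_000, 20_000, 1_345),
    (20_000, 25_000, 2_065), (25_000, 30_000, 2_751), (30_000, 35_000, 3_659),
    (35_000, 40_000, 4_605), (40_000, 50_000, 9_057), (50_000, 60_000, 11_084),
    (60_000, 70_000, 13_140), (70_000, 80_000, 13_268), (80_000, 90_000, 13_183),
    (90_000, 100_000, 11_290), (100_000, 125_000, 16_583), (125_000, 150_000, 11_885),
    (150_000, 175_000, 6_700), (175_000, 200_000, 4_562), (200_000, 250_000, 4_378),
    (250_000, 300_000, 2_837), (300_000, 400_000, 2_598), (400_000, 500_000, 1_142),
    (500_000, 750_000, 874), (750_000, 1_000_000, 255), (1_000_000, None, 327),
]


def median_bracket(brackets):
    """Return (index of the bracket holding the middle unit, interpolated median)."""
    half = sum(count for _, _, count in brackets) / 2
    cumulative = 0
    for i, (lower, upper, count) in enumerate(brackets):
        if cumulative + count >= half:
            if upper is None:  # open-ended bracket: cannot interpolate
                return i, float(lower)
            # Assume units are spread evenly across the bracket.
            return i, lower + (half - cumulative) / count * (upper - lower)
        cumulative += count
    raise ValueError("empty distribution")


i, median = median_bracket(H84_TULSA)
cutoff = H84_TULSA[i][1]  # upper bound of the median bracket ($90,000)
below = sum(count for _, _, count in H84_TULSA[: i + 1])
total = sum(count for _, _, count in H84_TULSA)
print(round(median), cutoff, below, round(100 * below / total, 2))
```

Which bracket boundary to use is a judgment call, as the slide says. For year built, the county median (1972) falls in the 1970 to 1979 bracket and the cutoff chosen was that bracket's lower bound (built before 1970).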
Home Value – Tract 7 BG 2
Census Tract 7 Block Group 2
H84. VALUE FOR ALL OWNER-OCCUPIED HOUSING UNITS [25]
Universe: Owner-occupied housing units
  H084001  Total:                        192
  H084002    Less than $10,000            25
  H084003    $10,000 to $14,999           16
  H084004    $15,000 to $19,999           24
  H084005    $20,000 to $24,999           24
  H084006    $25,000 to $29,999            0
  H084007    $30,000 to $34,999           13
  H084008    $35,000 to $39,999            0
  H084009    $40,000 to $49,999           21
  H084010    $50,000 to $59,999            0
  H084011    $60,000 to $69,999           10
  H084012    $70,000 to $79,999           20
  H084013    $80,000 to $89,999           19
  H084014    $90,000 to $99,999            0
  H084015    $100,000 to $124,999         13
  H084016    $125,000 to $149,999          7
  H084017    $150,000 to $174,999          0
  H084018    $175,000 to $199,999          0
  H084019    $200,000 to $249,999          0
  H084020    $250,000 to $299,999          0
  H084021    $300,000 to $399,999          0
  H084022    $400,000 to $499,999          0
  H084023    $500,000 to $749,999          0
  H084024    $750,000 to $999,999          0
  H084025    $1,000,000 or more            0
Total number of owner-occupied housing units with a value < $90,000 = 172
Percent of owner-occupied housing units with a value < $90,000 = 172 / 192 = 89.58%
Home Value Index
Home Value Index for Census Tract 7 Block Group 2 = 89.58 / 55 = 1.63
Year Built – Tulsa County
H34. YEAR STRUCTURE BUILT [10]
Universe: Housing units
  H034001  Total:                       243,953
  H034002    Built 1999 to March 2000     5,196
  H034003    Built 1995 to 1998          14,270
  H034004    Built 1990 to 1994          13,202
  H034005    Built 1980 to 1989          44,570
  H034006    Built 1970 to 1979          54,908
  H034007    Built 1960 to 1969          37,062
  H034008    Built 1950 to 1959          37,160
  H034009    Built 1940 to 1949          17,598
  H034010    Built 1939 or earlier       19,987
Tulsa County median year structure built for all housing units = 1972
Total number of housing units built before 1970 = 111,807
Percent of housing units built before 1970 = 111,807 / 243,953 = 45.8% (rounded to 46%)
For year built we followed the same principle. For Tulsa County the median year built was 1972, so we went with all housing units built before 1970.
Year Built – Tract 7 BG 2
Census Tract 7 Block Group 2
H34. YEAR STRUCTURE BUILT [10]
Universe: Housing units
  H034001  Total:                           367
  H034002    Built 1999 to March 2000         9
  H034003    Built 1995 to 1998               6
  H034004    Built 1990 to 1994               0
  H034005    Built 1980 to 1989              19
  H034006    Built 1970 to 1979              31
  H034007    Built 1960 to 1969              27
  H034008    Built 1950 to 1959             106
  H034009    Built 1940 to 1949             146
  H034010    Built 1939 or earlier           23
Total number of housing units built before 1970 = 302
Percent of housing units built before 1970 = 302 / 367 = 82.29%
Year Built Index
Year Structure Built Index for Census Tract 7 Block Group 2 = 82.29 / 46 = 1.79
Substandard Housing Index
Indicator scores for each block group were summed and averaged to provide an overall substandard housing index score.
  Lacking complete plumbing index score = 9.51
  Home value index score = 1.63
  Year structure built index score = 1.79
Substandard Housing Index Score for Census Tract 7 Block Group 2 = (9.51 + 1.63 + 1.79) / 3 = 4.31
Now that we have the indicator scores for the 3 housing variables, we can develop our composite index. Scores shown are for our sample block group, Census Tract 7 Block Group 2.
Economic Conditions Index
Indicator scores for each census block group are then summed and averaged to provide an overall economic conditions index score.
  Unemployment Index = 1.9
  Family Poverty Index = 2.07
  Substandard Housing Index = 4.31
Index score for Census Tract 7 Block Group 2 = (1.9 + 2.07 + 4.31) / 3 = 2.76
Now that we have our composite index score for substandard housing, we use the same formula for the overall economic conditions score. An index score was created for each of the 410 block groups within Tulsa County.
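Both averaging steps can be reproduced in Python. This is a sketch, assuming each indicator score is the block-group percentage divided by the county percentage, rounded to two decimals at each step as the slides appear to do; the indicator labels are hypothetical. The county percentages for home value and year built are the rounded 55% and 46% used on the slides.

```python
from statistics import mean

# Percentages as shown on the slides for Census Tract 7 Block Group 2:
# (block-group percent, Tulsa County percent). Labels are hypothetical.
INDICATORS = {
    "unemployment": (6.08, 3.20),
    "family_poverty": (18.14, 8.75),
    "lacking_plumbing": (5.99, 0.63),
    "home_value_under_90k": (89.58, 55.0),
    "built_before_1970": (82.29, 46.0),
}


def score(name: str) -> float:
    """Indicator score: block-group percent divided by county percent, to 2 decimals."""
    bg_pct, county_pct = INDICATORS[name]
    return round(bg_pct / county_pct, 2)


# Substandard housing index: average of the three housing indicator scores.
housing = round(mean([score("lacking_plumbing"),
                      score("home_value_under_90k"),
                      score("built_before_1970")]), 2)

# Economic conditions index: average of unemployment, poverty, and housing scores.
econ = round(mean([score("unemployment"), score("family_poverty"), housing]), 2)
print(housing, econ)
```

Rounding matters here. Computing every ratio from the raw census counts without intermediate rounding gives about 9.44 instead of 9.51 for plumbing (the county's 0.63% is itself rounded) and an overall score near 2.75 rather than 2.76, so a production version would want a consistent rounding rule across all 410 block groups.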
Index Scores Mapped
The results were mapped using the natural breaks classification method with five classes. We knew from the beginning that north Tulsa was an economically distressed area, and our analysis illustrates that. We wanted a way to emphasize the clustering of high index scores around the I-244 bridge relative to the rest of Tulsa County, so we began exploring the spatial statistics tools in ArcGIS.
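ArcGIS's natural breaks classification is based on Jenks' idea of minimizing variation within classes. As a sketch of that criterion, and not a reproduction of ArcGIS's own implementation (whose class boundaries may differ), the Python function below finds the breaks that minimize the total within-class sum of squared deviations exactly, by dynamic programming. The name `jenks_breaks` and the toy values are hypothetical; the 410 actual index scores are not listed in these slides.

```python
def jenks_breaks(values, n_classes):
    """Return the upper bound of each class for the 1-D partition that minimizes
    the total within-class sum of squared deviations (the natural breaks criterion)."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("need 1 <= n_classes <= number of values")

    # Prefix sums give the squared-deviation cost of any run data[i:j] in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(data):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def ssd(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best cost of splitting data[:j] into k classes.
    # split[k][j]: index where the last of those k classes starts.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j], split[k][j] = c, i

    # Walk back through the split points to recover each class's upper bound.
    bounds, j = [], n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = split[k][j]
    return bounds[::-1]


print(jenks_breaks([1, 2, 3, 10, 11, 12, 30, 31], 3))
```

The triple loop is O(k·n²), which is fast enough for five classes over a few hundred block groups.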
Using Spatial Statistical Tools
The tools are found in ArcToolbox. Toolsets include:
  Analyzing Patterns
  Mapping Clusters
  Measuring Geographic Distributions
  Modeling Spatial Relationships
Cluster and Outlier Analysis Tool
http://webhelp.esri.com/arcgisdesktop/9.3
The spatial statistics tools come standard with ArcGIS; they are not an extension. The Modeling Spatial Relationships toolset is not available with ArcView.
Using Spatial Statistical Tools
Spatial Statistics: the love child of Geography and Statistics
  Developed specifically for use with geographic data
  Incorporate space (proximity, area, and connectivity) into the statistical process
  Allow you to analyze spatial distributions, patterns, processes, and relationships
  Differ from traditional statistics in that you are not making inferences from a sample; rather, you are typically dealing with all the available data in your study area
Traditional statistics typically works with a random sample, and you try to determine whether your sample data is a good representation of the population at large. For example: what are the chances that the results of my exit poll will reflect the final election results? With spatial statistics, on the other hand, when you compute a statistic for the entire population you do not have an estimate but a fact, since you are dealing with all the possible data.
Using Spatial Statistical Tools
The Statistics behind it all
The randomization null hypothesis is used by many of the tools in the spatial statistics toolbox for statistical significance testing. It postulates that there is no spatial pattern among the features, or among the values associated with those features, in the study area.
Most statistical tests begin by identifying a null hypothesis, which is a statement of no effect or no difference. The Cluster and Outlier Analysis tool, like many of the spatial statistics tools, uses the randomization null hypothesis. If I were to pick up the index score values and randomly toss them back into the block groups, I would get one possible spatial arrangement. If I did this an infinite number of times, I would get a huge number of different arrangements, most of them looking different from the observed pattern (what our actual index score map looks like). The randomization null hypothesis states that our observed pattern is just one of these many, many possible versions of complete spatial randomness: the data values are fixed; only their spatial arrangement could vary.
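The "pick up the values and toss them back in" thought experiment is exactly what a permutation test simulates. The Python sketch below uses global Moran's I with simple adjacency weights as the test statistic, a stand-in chosen for illustration (the tool in these slides computes a local statistic and reports a Z score). The values are toy data, and `morans_i` and `permutation_test` are hypothetical names.

```python
import random


def morans_i(values, neighbors):
    """Global Moran's I with binary weights; neighbors[i] lists the features adjacent to i."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    w_sum = sum(len(nbrs) for nbrs in neighbors)
    cross = sum(z[i] * z[j] for i in range(n) for j in neighbors[i])
    return (n / w_sum) * cross / sum(zi * zi for zi in z)


def permutation_test(values, neighbors, permutations=999, seed=0):
    """One-sided pseudo p-value for clustering under the randomization null
    hypothesis: the values stay fixed and only their arrangement is shuffled."""
    rng = random.Random(seed)
    observed = morans_i(values, neighbors)
    shuffled = list(values)
    as_extreme = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # one random spatial arrangement of the same values
        if morans_i(shuffled, neighbors) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (permutations + 1)


# Toy example: eight features in a row, each adjacent to the one on either side.
values = [1, 2, 2, 3, 8, 9, 9, 10]
neighbors = [[j for j in (i - 1, i + 1) if 0 <= j < len(values)] for i in range(len(values))]
observed, p = permutation_test(values, neighbors)
print(round(observed, 3), p)
```

Because low values sit next to low values and high next to high, very few shuffled arrangements match the observed clustering, so the pseudo p-value comes out small and the null hypothesis would be rejected.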
Using Spatial Statistical Tools
The Statistics behind it all
Z score – a test of statistical significance that helps you decide whether or not to reject the null hypothesis. It tells you how many standard deviations the observed statistic (for the cluster analysis, each block group's Local Moran's I value) is from the value expected under the null hypothesis, and in which direction.
P-value – the probability that the observed spatial pattern could have been produced by the random process described by the null hypothesis. The smaller the p-value, the stronger the evidence against the null hypothesis.
To decide whether or not to reject the null hypothesis, you derive a Z score and a p-value. Together they tell us whether the clustering we see on our map is actually statistically significant.
Using Spatial Statistical Tools
The Statistics behind it all
Both the Z score and the p-value are associated with the standard normal distribution, which relates standard deviations to probabilities and allows significance and confidence to be attached to Z scores and p-values.
Very high or very low (negative) Z scores with very small p-values are found in the tails of the normal distribution.
At a 95% confidence level, the critical Z scores are -1.96 and +1.96 and the corresponding p-value is 0.05: a Z score below -1.96 or above +1.96 (p < 0.05) means you can reject the null hypothesis.
[Figure: standard normal curve, with the middle 95% between Z = -1.96 and +1.96 and 2.5% in each tail]
When you perform a feature pattern analysis such as cluster/outlier analysis and it yields a small p-value with either a very high or a very low (negative) Z score, it is very unlikely that the observed pattern is some version of the theoretical random spatial pattern represented by your null hypothesis, so you can reject the null hypothesis.
Using Spatial Statistical Tools
The Statistics behind it all
A Z score between -1.96 and +1.96 means the p-value will be larger than 0.05, so the null hypothesis cannot be rejected.
[Figure: standard normal curve, with the middle 95% between Z = -1.96 and +1.96 and 2.5% in each tail]
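The link between Z scores and p-values can be checked with Python's standard library alone: for the standard normal distribution, the two-sided p-value is erfc(|z| / √2). This sketch assumes a two-sided test at the 0.05 level, matching the ±1.96 thresholds above; the function names are hypothetical.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a Z score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


def reject_null(z: float, alpha: float = 0.05) -> bool:
    """True when the Z score falls far enough into either tail to reject the null."""
    return two_sided_p(z) < alpha


for z in (1.0, 1.96, 2.58, 19.4146):
    print(z, round(two_sided_p(z), 4), reject_null(z))
```

A Z score as large as the 19.4146 reported for Census Tract 7 Block Group 2 gives a p-value far smaller than 1e-80, which displays as 0 once rounded.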
Using Spatial Statistical Tools
Cluster and Outlier Analysis
Identifies clusters of features with values of similar magnitude, as well as spatial outliers. It does this by calculating, for each feature:
  Local Moran's I value
  Z score
  P-value
  COType field
Interpretation:
  A positive I value indicates a cluster (the feature's neighbors have similar values)
  A negative I value indicates an outlier (the feature's neighbors have dissimilar values)
  The COType field distinguishes between a statistically significant:
    Cluster of high values (HH)
    Cluster of low values (LL)
    Outlier with a high value surrounded primarily by low values (HL)
    Outlier with a low value surrounded primarily by high values (LH)
It is important to note that the Cluster and Outlier Analysis tool requires projected data to accurately measure distance.
The Local Moran's I index evaluates, for each feature, whether it is part of a cluster, is an outlier, or shows no significant pattern. It can only be interpreted within the context of the computed Z score or p-value. The COType field gives you an alpha code (HH, LL, HL, or LH) for statistically significant features.
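To make the Local Moran's I and COType logic concrete, here is a minimal Python sketch. It uses a common textbook form of Local Moran's I with binary rook-contiguity weights on a hypothetical 6 x 6 grid, not ArcGIS's distance-based weights, and it judges significance by conditional permutation, a swapped-in technique: the ArcGIS tool reports its own Z score, whose exact formula may differ. The HH/LL/HL/LH rule (sign of the feature's own deviation from the mean versus the sign of its neighbors' summed deviations) is our reading of the definitions above, and all names are hypothetical.

```python
import math
import random


def local_morans_i(values, neighbors):
    """Local Moran's I for each feature with binary (0/1) neighbor weights:
    I_i = (z_i / m2) * (sum of z_j over neighbors j), z = value - mean, m2 = sum(z^2) / n."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(zi * zi for zi in z) / n
    lags = [sum(z[j] for j in neighbors[i]) for i in range(n)]
    return [z[i] / m2 * lags[i] for i in range(n)], z, lags, m2


def cluster_outlier(values, neighbors, permutations=999, alpha=0.05, seed=0):
    """Return (I_i, simulated z, pseudo p, COType) for each feature. Feature i keeps
    its own value while its neighbors' values are drawn at random from the others."""
    rng = random.Random(seed)
    n = len(values)
    local_i, z, lags, m2 = local_morans_i(values, neighbors)
    results = []
    for i in range(n):
        others = [z[j] for j in range(n) if j != i]
        k = len(neighbors[i])
        sims = [z[i] / m2 * sum(rng.sample(others, k)) for _ in range(permutations)]
        mean_sim = sum(sims) / permutations
        sd_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in sims) / (permutations - 1))
        z_sim = (local_i[i] - mean_sim) / sd_sim if sd_sim > 0 else 0.0
        # Two-sided pseudo p-value: share of simulations at least as far from their mean.
        gap = abs(local_i[i] - mean_sim)
        extreme = sum(1 for s in sims if abs(s - mean_sim) >= gap)
        p_sim = (extreme + 1) / (permutations + 1)
        cotype = ""  # left blank when the feature is not significant
        if p_sim < alpha:
            if z[i] > 0:
                cotype = "HH" if lags[i] > 0 else "HL"
            else:
                cotype = "LL" if lags[i] < 0 else "LH"
        results.append((local_i[i], z_sim, p_sim, cotype))
    return results


# Toy 6 x 6 grid of hypothetical scores: a 3 x 3 block of high values in one corner.
side = 6
values = [10 if r < 3 and c < 3 else 1 for r in range(side) for c in range(side)]
neighbors = [
    [rr * side + cc
     for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
     if 0 <= rr < side and 0 <= cc < side]
    for r in range(side)
    for c in range(side)
]
results = cluster_outlier(values, neighbors)
print(results[1 * side + 1])  # center of the high block
```

The center of the high block gets a positive I value, a small pseudo p-value, and an HH label, while a low-valued corner far from the block is not significant and gets no COType code.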
Using Spatial Statistical Tools Cluster and Outlier Analysis
Mapping the Results
Mapping the Results
Census Tract 7, Block Group 2
  Local Moran’s I Value = 0.019055
  Z score = 19.4146
  P-value = 0
  COType field = HH
A positive Local Moran’s I value means that this block group is part of a cluster. The large Z score means the result for this block group is statistically significant. The very small p-value (so small that it is reported as 0) means we can safely reject our null hypothesis. And the COType field reveals that this block group is part of a cluster of high economic distress values.
Other Applications
  Brownfield identification funding
  Kendall-Whittier
  Tulsa Community Foundation
  We anticipate using it for environmental justice maps for the 2035 regional transportation plan
Questions? Contact Information: Ty Simmons – tsimmons@incog.org Barbara Gibson – bgibson@incog.org Phone: 584-7526