The Use of Census Data and Spatial Statistical Tools in GIS to Identify Economically Distressed Areas Presented by: Barbara Gibson & Ty Simmons SCAUG User.

The Use of Census Data and Spatial Statistical Tools in GIS to Identify Economically Distressed Areas
Presented by: Barbara Gibson & Ty Simmons
SCAUG User Group Meeting, Broken Arrow, March 2nd, 2010

Introduction
What we were asked to do
Creating the Economic Conditions Index
Using Spatial Statistical Tools
Mapping the Results

What we were asked to do
We were approached by transportation planning staff to create an economic conditions index (similar to one used in a Florida study) as part of a TIGER (Transportation Investment Generating Economic Recovery) grant application. We were awarded the grant, the only one in the state!
The index categorizes block groups by their level of distress, as measured by three factors: unemployment, families in poverty, and substandard housing. The index is based solely on 2000 census data by block group.
The project was a multimodal bridge on I-244 over the Arkansas River. It would be the first bridge of its kind in Tulsa built to accommodate highway traffic, high-speed intercity and commuter rail, and pedestrian and bicycle traffic. The high-speed passenger rail component gives the project national significance.
The index score for each census block group was based on comparing its indicator percentages with the corresponding percentages for the county in which the block group is located, in our case Tulsa County.

Unemployment – Tulsa County
We start with unemployment. P43 is the census table reference number for the data.
P43. SEX BY EMPLOYMENT STATUS FOR THE POPULATION 16 YEARS AND OVER [15]
Universe: Population 16 years and over
P043001 Total: 432,088
P043002 Male: 206,309
P043003 In labor force: 156,810
P043004 In Armed Forces: 294
P043005 Civilian: 156,516
P043006 Employed: 149,173
P043007 Unemployed: 7,343
P043008 Not in labor force: 49,499
P043009 Female: 225,779
P043010 In labor force: 133,228
P043011 In Armed Forces: 47
P043012 Civilian: 133,181
P043013 Employed: 126,683
P043014 Unemployed: 6,498
P043015 Not in labor force: 92,551
Percent unemployment = (7,343 + 6,498) / 432,088 = 13,841 / 432,088 = 3.20% (unemployed persons as a share of the entire population 16 years and over)

Unemployment – Tract 7 BG 2
Census Tract 7 Block Group 2
P43. SEX BY EMPLOYMENT STATUS FOR THE POPULATION 16 YEARS AND OVER [15]
Universe: Population 16 years and over
P043001 Total: 691
P043002 Male: 289
P043003 In labor force: 180
P043004 In Armed Forces: 0
P043005 Civilian: 180
P043006 Employed: 152
P043007 Unemployed: 28
P043008 Not in labor force: 109
P043009 Female: 402
P043010 In labor force: 186
P043011 In Armed Forces: 0
P043012 Civilian: 186
P043013 Employed: 172
P043014 Unemployed: 14
P043015 Not in labor force: 216
Percent unemployment = (28 + 14) / 691 = 42 / 691 = 6.08%
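The two percentages above can be reproduced in a few lines. This is a minimal sketch that assumes the table cells are available as a dictionary keyed by the SF3 cell IDs shown on the slides; that dictionary layout is an assumption for illustration. As on the slides, the denominator is the entire population 16 years and over (P043001). It is not the civilian labor force that the official unemployment rate uses.

```python
def percent_unemployed(p43: dict) -> float:
    """Unemployed men (P043007) plus unemployed women (P043014), as a
    percentage of the population 16 years and over (P043001)."""
    unemployed = p43["P043007"] + p43["P043014"]
    return 100.0 * unemployed / p43["P043001"]


# Only the cells the calculation needs, copied from the slides.
tulsa_county = {"P043001": 432_088, "P043007": 7_343, "P043014": 6_498}
tract7_bg2 = {"P043001": 691, "P043007": 28, "P043014": 14}

print(f"{percent_unemployed(tulsa_county):.2f}%")  # 3.20%
print(f"{percent_unemployed(tract7_bg2):.2f}%")    # 6.08%
```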

Unemployment Index Value
Unemployment Index Value for Census Tract 7 Block Group 2 = 6.08 / 3.20 = 1.9
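Each indicator's index value is the block group's percentage divided by the county's percentage. A value of 1.0 therefore means "same as the county," and values above 1.0 mean more distress than the county as a whole. The sketch below uses a hypothetical function name and rounds to two decimals to match the slides.

```python
def index_value(block_group_pct: float, county_pct: float) -> float:
    """Ratio of a block group's percentage to the county-wide percentage,
    rounded to two decimals as on the slides."""
    if county_pct <= 0:
        raise ValueError("county percentage must be positive")
    return round(block_group_pct / county_pct, 2)


print(index_value(6.08, 3.20))  # 1.9
```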

Families in Poverty – Tulsa County
P90 is the census table reference number for the data.
P90. POVERTY STATUS IN 1999 OF FAMILIES BY FAMILY TYPE BY PRESENCE OF RELATED CHILDREN UNDER 18 YEARS BY AGE OF RELATED CHILDREN [41]
Universe: Families
P090001 Total: 148,189
P090002 Income in 1999 below poverty level: 12,962
Percent families in poverty = 12,962 / 148,189 = 8.75%

Families in Poverty – Tract 7 BG 2
Census Tract 7 Block Group 2
P90. POVERTY STATUS IN 1999 OF FAMILIES BY FAMILY TYPE BY PRESENCE OF RELATED CHILDREN UNDER 18 YEARS BY AGE OF RELATED CHILDREN [41]
Universe: Families
P090001 Total: 204
P090002 Income in 1999 below poverty level: 37
Percent families in poverty = 37 / 204 = 18.14%

Family Poverty Index Value
Family Poverty Index Value for Census Tract 7 Block Group 2 = 18.14 / 8.75 = 2.07
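The same two steps apply to table P90. The denominator is the block group's P090001 total of 204 families. The helper name is hypothetical, and the rounding order mirrors the slides: each percentage is rounded to two decimals before dividing.

```python
def percent(part: int, whole: int) -> float:
    """part as a percentage of whole."""
    return 100.0 * part / whole


county_pct = percent(12_962, 148_189)  # P090002 / P090001, Tulsa County
bg_pct = percent(37, 204)              # P090002 / P090001, Tract 7 BG 2

# Round each percentage to two decimals first, then take the ratio.
family_poverty_index = round(round(bg_pct, 2) / round(county_pct, 2), 2)
print(f"{county_pct:.2f}% {bg_pct:.2f}% -> {family_poverty_index}")
# 8.75% 18.14% -> 2.07
```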

Substandard Housing Index
The housing index is based on three variables: housing units lacking complete plumbing facilities, home value for all owner-occupied housing units, and year structure built for all housing units.
To create the substandard housing index by block group, we calculated an index for each variable and then took the average of the three to get a composite index score.

Plumbing Facilities – Tulsa County
H47. PLUMBING FACILITIES [3]
Universe: Housing units
H047001 Total: 243,953
H047002 Complete plumbing facilities: 242,404
H047003 Lacking complete plumbing facilities: 1,549
Percent housing units lacking complete plumbing = 1,549 / 243,953 = 0.63%

Plumbing Facilities – Tract 7 BG 2
Census Tract 7 Block Group 2
H47. PLUMBING FACILITIES [3]
Universe: Housing units
H047001 Total: 367
H047002 Complete plumbing facilities: 345
H047003 Lacking complete plumbing facilities: 22
Percent housing units lacking complete plumbing = 22 / 367 = 5.99%

Lacking Complete Plumbing Index Value
Lacking Complete Plumbing Index Value for Census Tract 7 Block Group 2 = 5.99 / 0.63 = 9.51
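Only about 0.63% of county housing units lack complete plumbing, so this index is sensitive to how the small county percentage is rounded. The sketch below compares two approaches:

- The slides' approach: round each percentage to two decimals, then divide.
- Dividing the unrounded percentages.

```python
county_pct = 100.0 * 1_549 / 243_953  # H047003 / H047001, Tulsa County
bg_pct = 100.0 * 22 / 367             # H047003 / H047001, Tract 7 BG 2

# As on the slides: 5.99 / 0.63
from_rounded = round(round(bg_pct, 2) / round(county_pct, 2), 2)
# Same ratio without intermediate rounding
from_unrounded = round(bg_pct / county_pct, 2)
print(from_rounded, from_unrounded)  # 9.51 9.44
```

Either convention is defensible. What matters for comparing block groups is applying the same one to all of them.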

Home Value – Tulsa County
H84. VALUE FOR ALL OWNER-OCCUPIED HOUSING UNITS [25]
Universe: Owner-occupied housing units
H084001 Total: 140,131
H084002 Less than $10,000: 1,264
H084003 $10,000 to $14,999: 1,279
H084004 $15,000 to $19,999: 1,345
H084005 $20,000 to $24,999: 2,065
H084006 $25,000 to $29,999: 2,751
H084007 $30,000 to $34,999: 3,659
H084008 $35,000 to $39,999: 4,605
H084009 $40,000 to $49,999: 9,057
H084010 $50,000 to $59,999: 11,084
H084011 $60,000 to $69,999: 13,140
H084012 $70,000 to $79,999: 13,268
H084013 $80,000 to $89,999: 13,183
H084014 $90,000 to $99,999: 11,290
H084015 $100,000 to $124,999: 16,583
H084016 $125,000 to $149,999: 11,885
H084017 $150,000 to $174,999: 6,700
H084018 $175,000 to $199,999: 4,562
H084019 $200,000 to $249,999: 4,378
H084020 $250,000 to $299,999: 2,837
H084021 $300,000 to $399,999: 2,598
H084022 $400,000 to $499,999: 1,142
H084023 $500,000 to $749,999: 874
H084024 $750,000 to $999,999: 255
H084025 $1,000,000 or more: 327
Total number of owner-occupied housing units with a value < $90,000 = 76,700
Percent of owner-occupied housing units with a value < $90,000 = 76,700 / 140,131 = 54.73%, rounded to 55%
Tulsa County median value for all owner-occupied housing units = $85,000
For home value and year structure built, we first had to look at the median for the county. In our case, Tulsa County's median home value was $85,000. Since the data are grouped into value ranges, you have to make a choice. We went with all owner-occupied housing units with a value less than $90,000.
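With grouped data, the county median can be estimated by linear interpolation inside the bracket that holds the middle unit. That estimate also shows why the cutoff lands on the $90,000 bracket boundary. This is a minimal sketch under two assumptions:

- Each bracket is stored as (lower bound, count), in ascending order.
- The interpolation is the standard grouped-median estimate, not necessarily the Census Bureau's exact procedure.

The function names are hypothetical.

```python
# (lower bound in dollars, owner-occupied units) for table H84, Tulsa County.
H84_TULSA = [
    (0, 1_264), (10_000, 1_279), (15_000, 1_345), (20_000, 2_065),
    (25_000, 2_751), (30_000, 3_659), (35_000, 4_605), (40_000, 9_057),
    (50_000, 11_084), (60_000, 13_140), (70_000, 13_268), (80_000, 13_183),
    (90_000, 11_290), (100_000, 16_583), (125_000, 11_885), (150_000, 6_700),
    (175_000, 4_562), (200_000, 4_378), (250_000, 2_837), (300_000, 2_598),
    (400_000, 1_142), (500_000, 874), (750_000, 255), (1_000_000, 327),
]


def grouped_median(brackets):
    """Interpolate linearly inside the bracket holding the middle unit.
    Assumes the median is not in the open-ended top bracket."""
    total = sum(n for _, n in brackets)
    if total == 0:
        raise ValueError("no units in brackets")
    half = total / 2
    cumulative = 0
    for i, (lower, n) in enumerate(brackets):
        if n and cumulative + n >= half:
            upper = brackets[i + 1][0]
            return lower + (half - cumulative) / n * (upper - lower)
        cumulative += n
    raise ValueError("median not found")


def share_below(brackets, cutoff):
    """Units in brackets starting below cutoff, and their percentage.
    Exact only when cutoff falls on a bracket boundary."""
    total = sum(n for _, n in brackets)
    below = sum(n for lower, n in brackets if lower < cutoff)
    return below, 100.0 * below / total


print(round(grouped_median(H84_TULSA)))  # 84967
below, pct = share_below(H84_TULSA, 90_000)
print(below, f"{pct:.2f}%")  # 76700 54.73%
```

The interpolated median of about $84,967 falls in the $80,000 to $89,999 bracket. The nearest bracket boundary above it is $90,000, which is consistent with the cutoff chosen on the slide.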

Home Value – Tract 7 BG 2
Census Tract 7 Block Group 2
H84. VALUE FOR ALL OWNER-OCCUPIED HOUSING UNITS [25]
Universe: Owner-occupied housing units
H084001 Total: 192
H084002 Less than $10,000: 25
H084003 $10,000 to $14,999: 16
H084004 $15,000 to $19,999: 24
H084005 $20,000 to $24,999: 24
H084006 $25,000 to $29,999: 0
H084007 $30,000 to $34,999: 13
H084008 $35,000 to $39,999: 0
H084009 $40,000 to $49,999: 21
H084010 $50,000 to $59,999: 0
H084011 $60,000 to $69,999: 10
H084012 $70,000 to $79,999: 20
H084013 $80,000 to $89,999: 19
H084014 $90,000 to $99,999: 0
H084015 $100,000 to $124,999: 13
H084016 $125,000 to $149,999: 7
H084017 $150,000 to $174,999: 0
H084018 $175,000 to $199,999: 0
H084019 $200,000 to $249,999: 0
H084020 $250,000 to $299,999: 0
H084021 $300,000 to $399,999: 0
H084022 $400,000 to $499,999: 0
H084023 $500,000 to $749,999: 0
H084024 $750,000 to $999,999: 0
H084025 $1,000,000 or more: 0
Total number of owner-occupied housing units with a value < $90,000 = 172
Percent of owner-occupied housing units with a value < $90,000 = 172 / 192 = 89.58%

Home Value Index
Home Value Index for Census Tract 7 Block Group 2 = 89.58 / 55 = 1.63
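This sketch computes the block-group side of the home value indicator. It assumes brackets are stored as (lower bound, count). Brackets from $150,000 up are omitted because they are all zero for this block group.

The county share is 76,700 / 140,131 = 54.73%, shown as 55% on the slides. That rounding moves the index by one hundredth, as the last line shows.

```python
# (lower bound in dollars, owner-occupied units), table H84, Tract 7 BG 2.
# All brackets from $150,000 up are zero and are left out.
H84_TRACT7_BG2 = [
    (0, 25), (10_000, 16), (15_000, 24), (20_000, 24), (25_000, 0),
    (30_000, 13), (35_000, 0), (40_000, 21), (50_000, 0), (60_000, 10),
    (70_000, 20), (80_000, 19), (90_000, 0), (100_000, 13), (125_000, 7),
]

CUTOFF = 90_000  # first bracket boundary above the county median of $85,000

total = sum(n for _, n in H84_TRACT7_BG2)
below = sum(n for lower, n in H84_TRACT7_BG2 if lower < CUTOFF)
bg_pct = 100.0 * below / total

print(below, total, f"{bg_pct:.2f}%")      # 172 192 89.58%
print(round(round(bg_pct, 2) / 55, 2))     # 1.63 (county share shown as 55%)
print(round(round(bg_pct, 2) / 54.73, 2))  # 1.64 (county share 54.73%)
```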

Year Built – Tulsa County
H34. YEAR STRUCTURE BUILT [10]
Universe: Housing units
H034001 Total: 243,953
H034002 Built 1999 to March 2000: 5,196
H034003 Built 1995 to 1998: 14,270
H034004 Built 1990 to 1994: 13,202
H034005 Built 1980 to 1989: 44,570
H034006 Built 1970 to 1979: 54,908
H034007 Built 1960 to 1969: 37,062
H034008 Built 1950 to 1959: 37,160
H034009 Built 1940 to 1949: 17,598
H034010 Built 1939 or earlier: 19,987
Tulsa County median year structure built for all housing units = 1972
Total number of housing units built before 1970 = 111,807
Percent of housing units built before 1970 = 111,807 / 243,953 = 45.83%, rounded to 46%
For year built we followed the same principle. For Tulsa County the median year built was 1972, so we went with all housing units built before 1970.

Year Built – Tract 7 BG 2
Census Tract 7 Block Group 2
H34. YEAR STRUCTURE BUILT [10]
Universe: Housing units
H034001 Total: 367
H034002 Built 1999 to March 2000: 9
H034003 Built 1995 to 1998: 6
H034004 Built 1990 to 1994: 0
H034005 Built 1980 to 1989: 19
H034006 Built 1970 to 1979: 31
H034007 Built 1960 to 1969: 27
H034008 Built 1950 to 1959: 106
H034009 Built 1940 to 1949: 146
H034010 Built 1939 or earlier: 23
Total number of housing units built before 1970 = 302
Percent of housing units built before 1970 = 302 / 367 = 82.29%

Year Built Index
Year Structure Built Index for Census Tract 7 Block Group 2 = 82.29 / 46 = 1.79
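The year built indicator follows the same pattern as home value. This sketch makes three assumptions:

- Each H34 range is stored as (first year, count).
- "1939 or earlier" is stored with first year 0.
- "Built before 1970" is counted as every range that starts before 1970.

```python
# (first year of the range, housing units) for table H34.
# "1939 or earlier" is stored with first year 0; 1999 runs to March 2000.
H34_TULSA = [
    (1999, 5_196), (1995, 14_270), (1990, 13_202), (1980, 44_570),
    (1970, 54_908), (1960, 37_062), (1950, 37_160), (1940, 17_598), (0, 19_987),
]
H34_TRACT7_BG2 = [
    (1999, 9), (1995, 6), (1990, 0), (1980, 19),
    (1970, 31), (1960, 27), (1950, 106), (1940, 146), (0, 23),
]


def percent_built_before(ranges, year):
    """Percent of housing units in ranges that start before `year`.
    Exact only when `year` is the first year of a range, such as 1970."""
    total = sum(n for _, n in ranges)
    before = sum(n for first, n in ranges if first < year)
    return 100.0 * before / total


county_pct = percent_built_before(H34_TULSA, 1970)
bg_pct = percent_built_before(H34_TRACT7_BG2, 1970)
print(f"{county_pct:.2f}% {bg_pct:.2f}%")  # 45.83% 82.29%
print(round(round(bg_pct, 2) / 46, 2))     # 1.79 (county share shown as 46%)
```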

Substandard Housing Index
Indicator scores for each block group were summed and averaged to provide an overall substandard housing index score.
Lacking complete plumbing index score = 9.51
Home value index score = 1.63
Year structure built index score = 1.79
Substandard Housing Index score for Census Tract 7 Block Group 2 = (9.51 + 1.63 + 1.79) / 3 = 4.31
Now that we have the indicator scores for the three housing variables, we can develop our composite index. Scores shown are for our sample block group, Census Tract 7 Block Group 2.
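The housing composite is an unweighted mean, which this short sketch makes explicit (the function name is hypothetical). One consequence of an unweighted mean is that a single extreme indicator can dominate. Here the plumbing index alone contributes 9.51 / 3 = 3.17 of the 4.31.

```python
from statistics import fmean


def substandard_housing_index(plumbing: float, home_value: float,
                              year_built: float) -> float:
    """Unweighted mean of the three housing indicator indices, 2 decimals."""
    return round(fmean([plumbing, home_value, year_built]), 2)


print(substandard_housing_index(9.51, 1.63, 1.79))  # 4.31
```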

Economic Conditions Index
Indicator scores for each census block group are then summed and averaged to provide an overall economic conditions index score.
Unemployment Index = 1.9
Family Poverty Index = 2.07
Substandard Housing Index = 4.31
Index score for Census Tract 7 Block Group 2 = (1.9 + 2.07 + 4.31) / 3 = 2.76
Now that we have our composite index score for substandard housing, we use the same formula for our overall economic conditions score. An index score was created for each of the 410 block groups within Tulsa County.
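Nesting the housing average inside the overall average gives the five underlying indicators unequal effective weights. Unemployment and family poverty each count for 1/3 of the final score, while each of the three housing variables counts for 1/9. This minimal sketch (hypothetical function name) reproduces the slide's arithmetic and rounds each composite to two decimals as the slides do.

```python
from statistics import fmean


def economic_conditions_index(unemployment, family_poverty,
                              plumbing, home_value, year_built):
    """Mean of unemployment, family poverty and the substandard housing
    index, where the housing index is itself a mean of three indices."""
    housing = round(fmean([plumbing, home_value, year_built]), 2)
    return round(fmean([unemployment, family_poverty, housing]), 2)


print(economic_conditions_index(1.9, 2.07, 9.51, 1.63, 1.79))  # 2.76
# A housing variable alone moves the score by 1/9 of its value:
print(economic_conditions_index(0, 0, 9, 0, 0))  # 1.0
```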

Index Scores Mapped
The results were mapped using the natural breaks (Jenks) classification method with five classes. We knew from the beginning that north Tulsa was an economically distressed area, and our analysis illustrates that. We wanted to emphasize the clustering of high index score values around the I-244 bridge relative to the rest of Tulsa County, so we began exploring the spatial statistical tools in ArcGIS.
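Natural breaks (Jenks) picks class boundaries that minimize the variance within classes. Below is a minimal exact version using dynamic programming, often called Fisher-Jenks. It runs in O(k·n²), which is fine for 410 block groups and k = 5. ArcGIS's own implementation may produce somewhat different breaks, and the sample values below are made up for illustration.

```python
def jenks_breaks(values, k):
    """Upper bound of each of k classes chosen to minimize the total
    within-class sum of squared deviations (exact, O(k * n**2))."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of values")

    # Prefix sums give the squared-deviation cost of any run xs[i:j] in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for idx, x in enumerate(xs):
        s1[idx + 1] = s1[idx] + x
        s2[idx + 1] = s2[idx] + x * x

    def cost_of(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[c][j]: lowest cost of splitting xs[:j] into c non-empty classes;
    # start[c][j]: index where the last of those c classes begins.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    start = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = best[c - 1][i] + cost_of(i, j)
                if candidate < best[c][j]:
                    best[c][j] = candidate
                    start[c][j] = i

    # Walk back from the full data set to recover each class's upper bound.
    bounds, j = [], n
    for c in range(k, 0, -1):
        bounds.append(xs[j - 1])
        j = start[c][j]
    return bounds[::-1]


print(jenks_breaks([1, 2, 3, 10, 11, 12, 20, 21, 22], 3))  # [3, 12, 22]
```

For the map, the index scores of all 410 block groups would be passed in with k = 5.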

Using Spatial Statistical Tools
The spatial statistics tools are found in ArcToolbox. The toolsets include Analyzing Patterns, Mapping Clusters, Measuring Geographic Distributions, and Modeling Spatial Relationships. We used the Cluster and Outlier Analysis tool, which is in the Mapping Clusters toolset (http://webhelp.esri.com/arcgisdesktop/9.3).
The Spatial Statistics tools come standard with ArcGIS; they are not an extension. The Modeling Spatial Relationships toolset is not available with ArcView.

Using Spatial Statistical Tools
Spatial statistics: the love child of geography and statistics. These methods were developed specifically for use with geographic data. They incorporate space (proximity, area, and connectivity) into the statistical process and allow you to analyze spatial distributions, patterns, processes, and relationships.
Spatial statistics differ from traditional statistics in that you are usually not making inferences from a sample to a larger population; rather, you are typically dealing with all the available data in your study area. Traditional statistics typically works with a random sample, and you try to determine whether your sample is a good representation of the population at large. For example: what are the chances that the results from my exit poll will reflect the final election results? On the other hand, when you compute a statistic for the entire population with spatial statistics, you do not have an estimate but rather a fact, since you are dealing with all the possible data.

Using Spatial Statistical Tools
The Statistics behind it all
Most statistical tests begin by identifying a null hypothesis, which is a statement of no effect or no difference. Many of the tools in the Spatial Statistics toolbox, including the Cluster and Outlier Analysis tool, use the randomization null hypothesis for significance testing. It postulates that there is no spatial pattern among the features, or among the values associated with those features, in the study area.
If I were to pick up the index score values and randomly throw them back into the block groups, I would get one possible spatial arrangement. Doing this over and over would give many different arrangements. The randomization null hypothesis states that our observed map (what our actual index score map looks like) is just one of these many possible versions of complete spatial randomness. The data values are fixed; only their spatial arrangement could vary. If the observed pattern looks very different from almost all of the random arrangements, that is evidence against the null hypothesis.
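The randomization idea can be made concrete with a permutation test. The test works in four steps:

1. Compute the statistic on the observed map.
2. Keep the values fixed and shuffle which location gets which value.
3. Recompute the statistic on each shuffled map.
4. Count how often a random map looks at least as clustered as the real one.

This minimal sketch uses global Moran's I on a hypothetical row of six areas, each touching the next. It is a stand-in to illustrate the null hypothesis, not how the ArcGIS tool computes its z scores.

```python
import random


def morans_i(values, weights):
    """Global Moran's I for a dense n x n spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denominator = sum(d * d for d in dev)
    if denominator == 0:
        raise ValueError("all values are identical")
    s0 = sum(map(sum, weights))
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / denominator


def randomization_test(values, weights, permutations=999, seed=1):
    """Shuffle the fixed values among the fixed locations and count how often
    a random arrangement is at least as clustered as the observed one."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    arrangement = list(values)
    as_extreme = 0
    for _ in range(permutations):
        rng.shuffle(arrangement)  # same values, new spatial arrangement
        if morans_i(arrangement, weights) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (permutations + 1)


def chain_weights(n):
    """Hypothetical neighbors: n areas in a row, each touching the next."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


w = chain_weights(6)
observed, pseudo_p = randomization_test([1, 2, 3, 4, 5, 6], w)
print(f"Moran's I = {observed:.2f}, pseudo p-value = {pseudo_p:.3f}")
```

The smallest pseudo p-value this can report is 1 / (permutations + 1), so the number of shuffles limits how strong the evidence can look.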

Using Spatial Statistical Tools
The Statistics behind it all
Z score: the test statistic that helps you decide whether or not to reject the null hypothesis. It tells us how many standard deviations the observed statistic (here, a block group's Local Moran's I) is from the value expected under the null hypothesis, and in what direction.
P-value: the probability, assuming the null hypothesis is true, of getting a result at least as extreme as the one observed. The smaller the p-value, the stronger the evidence against the null hypothesis.
To decide whether or not to reject the null hypothesis, you have to derive a Z score and a p-value. Together they tell us whether the clustering we see on our map is actually statistically significant.

Using Spatial Statistical Tools
The Statistics behind it all
Both the Z score and the p-value are associated with the standard normal distribution. This distribution relates standard deviations to probabilities and allows significance and confidence to be attached to Z scores and p-values. Very high or very low (negative) Z scores with very small p-values are found in the tails of the normal distribution.
At the 95% confidence level, the critical Z scores are -1.96 and +1.96 and the corresponding p-value is 0.05. A Z score below -1.96 or above +1.96 (p < 0.05) means you can reject the null hypothesis.
[Figure: standard normal curve, with 95% of the area between -1.96 and +1.96 and 2.5% in each tail]
Suppose a feature pattern analysis, such as cluster and outlier analysis, yields a small p-value and either a very high or a very low (negative) Z score. Then it is very unlikely that the observed pattern is some version of the theoretical spatially random pattern represented by your null hypothesis, so you can reject the null hypothesis.

Using Spatial Statistical Tools
The Statistics behind it all
On the other hand, a Z score between -1.96 and +1.96 means the p-value will be larger than 0.05, so the null hypothesis cannot be rejected at the 95% confidence level.
[Figure: standard normal curve, with 95% of the area between -1.96 and +1.96 and 2.5% in each tail]
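The link between a Z score and its two-sided p-value under the standard normal distribution takes one line, using the complementary error function from Python's standard library. The function names are hypothetical.

```python
import math


def two_sided_p(z: float) -> float:
    """P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))


def reject_null(z: float, alpha: float = 0.05) -> bool:
    """Reject the null hypothesis when the two-sided p-value is below alpha."""
    return two_sided_p(z) < alpha


for z in (1.0, 1.96, 2.58, 19.4146):
    print(f"z = {z:>8}: p = {two_sided_p(z):.3g}, reject at 0.05: {reject_null(z)}")
```

A Z score of 19.4146, as reported for Census Tract 7 Block Group 2, gives a p-value smaller than 10^-80. A display rounded to a few decimals shows that as 0.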

Using Spatial Statistical Tools
Cluster and Outlier Analysis
This tool identifies clusters of features with similar magnitudes, as well as spatial outliers. It does this by calculating, for each feature, a Local Moran's I value, a Z score, a p-value, and a COType field.
Interpretation: a positive I value indicates a cluster, meaning the feature's neighbors have similarly high or low values. A negative I value indicates an outlier, meaning its neighbors have dissimilar values. The Local Moran's I value can only be interpreted within the context of its computed Z score or p-value.
The COType field gives an alpha code for statistically significant features:
HH: cluster of high values
LL: cluster of low values
HL: outlier with a high value surrounded primarily by low values
LH: outlier with a low value surrounded primarily by high values
It is important to note that the Cluster and Outlier Analysis tool requires projected data to accurately measure distance.
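This is a minimal sketch of the Local Moran's I statistic and the cluster/outlier labels, under the following assumptions:

- Neighbors are binary lists. The ArcGIS tool offers other conceptualizations of spatial relationships, such as inverse distance.
- I uses the common Anselin scaling m2 = Σ(x − mean)² / n. ArcGIS documents a slightly different scaling term, which changes the size of I but not its sign.
- Significance is passed in as a flag, standing in for the z score test the tool performs.
- The labeling rule (sign of I plus whether the feature is above or below the mean) follows the HH/LL/HL/LH descriptions on the slide.

Function names and data are hypothetical.

```python
def local_morans_i(values, neighbors):
    """Local Moran's I for every feature, using binary neighbor lists:
    I_i = (x_i - mean) / m2 * sum of (x_j - mean) over i's neighbors j,
    with m2 = sum((x - mean) ** 2) / n. Returns (I values, deviations)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    local_i = [dev[i] / m2 * sum(dev[j] for j in neighbors[i]) for i in range(n)]
    return local_i, dev


def cotype(i_value, deviation, significant):
    """HH or LL when I > 0 (similar neighbors), HL or LH when I < 0
    (dissimilar neighbors); '' when not significant."""
    if not significant or i_value == 0:
        return ""
    own = "H" if deviation > 0 else "L"
    if i_value > 0:
        return own + own
    return own + ("L" if own == "H" else "H")


# Toy example: five areas in a row, each touching its immediate neighbors.
values = [9, 9, 1, 1, 9]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3]]
local_i, dev = local_morans_i(values, neighbors)
# Pretend every feature passed the significance test, to show the labels.
labels = [cotype(i, d, significant=True) for i, d in zip(local_i, dev)]
print(labels)  # ['HH', 'HL', 'LL', 'LL', 'HL']
```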

Using Spatial Statistical Tools Cluster and Outlier Analysis

Mapping the Results

Mapping the Results
Census Tract 7, Block Group 2: Local Moran's I value = 0.019055; Z score = 19.4146; p-value = 0; COType field = HH
The positive Local Moran's I value means that this block group is part of a cluster. The large Z score means that this result is statistically significant. The p-value, reported as 0 because it is vanishingly small, means we can safely reject our null hypothesis. And the COType field reveals that this block group is part of a cluster of high economic distress values.

Other Applications
Brownfield identification funding; Kendall-Whittier; Tulsa Community Foundation
We anticipate using this approach for environmental justice maps for the 2035 regional transportation plan.

Questions? Contact Information: Ty Simmons – tsimmons@incog.org Barbara Gibson – bgibson@incog.org Phone: 584-7526