Final Project : 460 VALLEY CRIMES. Chontanat Suwan Geography 460 : Spatial Analysis Prof. Steven Graves, Ph.D.

Slides:



Advertisements
Similar presentations
Inference for Regression
Advertisements

Correlation and regression
Spatial statistics Lecture 3.
Objectives (BPS chapter 24)
Zakaria A. Khamis GE 2110 GEOGRAPHICAL STATISTICS GE 2110.
Statistical Tests Karen H. Hagglund, M.S.
Spatial Statistics II RESM 575 Spring 2010 Lecture 8.
QUANTITATIVE DATA ANALYSIS
The Simple Regression Model
Topic 2: Statistical Concepts and Market Returns
PPA 415 – Research Methods in Public Administration Lecture 5 – Normal Curve, Sampling, and Estimation.
Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression.
Introduction to Educational Statistics
Introduction to Probability and Statistics Linear Regression and Correlation.
Statistics for CS 312. Descriptive vs. inferential statistics Descriptive – used to describe an existing population Inferential – used to draw conclusions.
Density vs Hot Spot Analysis. Density Density analysis takes known quantities of some phenomenon and spreads them across the landscape based on the quantity.
INFERENTIAL STATISTICS – Samples are only estimates of the population – Sample statistics will be slightly off from the true values of its population’s.
Inference for regression - Simple linear regression
Hypothesis Testing:.
McGraw-Hill/IrwinCopyright © 2009 by The McGraw-Hill Companies, Inc. All Rights Reserved. Chapter 9 Hypothesis Testing.
Overview of Statistical Hypothesis Testing: The z-Test
CENTRE FOR INNOVATION, RESEARCH AND COMPETENCE IN THE LEARNING ECONOMY Session 2: Basic techniques for innovation data analysis. Part I: Statistical inferences.
© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Chapter 9 Statistical Data Analysis
Copyright © Cengage Learning. All rights reserved. 13 Linear Correlation and Regression Analysis.
Significance Tests …and their significance. Significance Tests Remember how a sampling distribution of means is created? Take a sample of size 500 from.
APPENDIX B Data Preparation and Univariate Statistics How are computer used in data collection and analysis? How are collected data prepared for statistical.
6.1 What is Statistics? Definition: Statistics – science of collecting, analyzing, and interpreting data in such a way that the conclusions can be objectively.
© Copyright McGraw-Hill CHAPTER 3 Data Description.
Statistics and Quantitative Analysis U4320
Chapter 15 Data Analysis: Testing for Significant Differences.
● Final exam Wednesday, 6/10, 11:30-2:30. ● Bring your own blue books ● Closed book. Calculators and 2-page cheat sheet allowed. No cell phone/computer.
TYPES OF STATISTICAL METHODS USED IN PSYCHOLOGY Statistics.
QUANTITATIVE RESEARCH AND BASIC STATISTICS. TODAYS AGENDA Progress, challenges and support needed Response to TAP Check-in, Warm-up responses and TAP.
Chapter 11: Inference for Distributions of Categorical Data Section 11.1 Chi-Square Goodness-of-Fit Tests.
PCB 3043L - General Ecology Data Analysis. OUTLINE Organizing an ecological study Basic sampling terminology Statistical analysis of data –Why use statistics?
Statistics - methodology for collecting, analyzing, interpreting and drawing conclusions from collected data Anastasia Kadina GM presentation 6/15/2015.
Dr. Serhat Eren 1 CHAPTER 6 NUMERICAL DESCRIPTORS OF DATA.
Introduction to Inferential Statistics Statistical analyses are initially divided into: Descriptive Statistics or Inferential Statistics. Descriptive Statistics.
Chapter 7 Sampling Distributions Statistics for Business (Env) 1.
Analysis of Variance 1 Dr. Mohammed Alahmed Ph.D. in BioStatistics (011)
McGraw-Hill/Irwin Copyright © 2007 by The McGraw-Hill Companies, Inc. All rights reserved. Chapter 8 Hypothesis Testing.
BPS - 3rd Ed. Chapter 161 Inference about a Population Mean.
Review. Statistics Types Descriptive – describe the data, create a picture of the data Mean – average of all scores Mode – score that appears the most.
Central Tendency & Dispersion
Descriptive & Inferential Statistics Adopted from ;Merryellen Towey Schulz, Ph.D. College of Saint Mary EDU 496.
Chapter 10: Introduction to Statistical Inference.
Hypothesis Testing An understanding of the method of hypothesis testing is essential for understanding how both the natural and social sciences advance.
What’s the Point? Working with 0-D Spatial Data in ArcGIS
Agresti/Franklin Statistics, 1 of 88 Chapter 11 Analyzing Association Between Quantitative Variables: Regression Analysis Learn…. To use regression analysis.
Introduction to Basic Statistical Tools for Research OCED 5443 Interpreting Research in OCED Dr. Ausburn OCED 5443 Interpreting Research in OCED Dr. Ausburn.
Chapter 10 The t Test for Two Independent Samples
Chapter Eight: Using Statistics to Answer Questions.
Data Analysis.
1 URBDP 591 A Lecture 12: Statistical Inference Objectives Sampling Distribution Principles of Hypothesis Testing Statistical Significance.
PCB 3043L - General Ecology Data Analysis.
Hypothesis Testing Introduction to Statistics Chapter 8 Feb 24-26, 2009 Classes #12-13.
LIS 570 Summarising and presenting data - Univariate analysis.
Organizing and Analyzing Data. Types of statistical analysis DESCRIPTIVE STATISTICS: Organizes data measures of central tendency mean, median, mode measures.
Data Analysis. Qualitative vs. Quantitative Data collection methods can be roughly divided into two groups. It is essential to understand the difference.
NURS 306, Nursing Research Lisa Broughton, MSN, RN, CCRN RESEARCH STATISTICS.
Central Bank of Egypt Basic statistics. Central Bank of Egypt 2 Index I.Measures of Central Tendency II.Measures of variability of distribution III.Covariance.
Chapter 13 Linear Regression and Correlation. Our Objectives  Draw a scatter diagram.  Understand and interpret the terms dependent and independent.
Outline Sampling Measurement Descriptive Statistics:
Task 2. Average Nearest Neighborhood
PCB 3043L - General Ecology Data Analysis.
Statistics.
Chapter 9 Hypothesis Testing.
Analysis and Interpretation of Experimental Findings
Chapter Nine: Using Statistics to Answer Questions
Presentation transcript:

Final Project : 460 VALLEY CRIMES

Chontanat Suwan Geography 460 : Spatial Analysis Prof. Steven Graves, Ph.D.

Objectives  Analyzing a series of hypotheses of construction and interpret  Analyzing the pattern of crimes occurring in the San Fernando Valley by using statistical measures Data Preparation  Clipped out all outlying violent crimes in the San Fernando Valley  Exported clipped data and other data to a local workspace.  Spatially join violent crimes and liquor stores to census tracts  Symbolize entire meaning symbols

Statistical Measurements  Descriptive Statistics – Mean, Median, Standard Deviation  Descriptive Spatial Statistics – Mean Center, Median Center  Sampling - random, stratified  Inferential Statistics – Z-test, ANOVA  Goodness of Fit Testing – Komologrov-Smirnov  Inferential Spatial Statistics – Nearest Neighbor Analysis, Moran’s I  Predictive Statistics – Correlation, Regression

Collect Events  Collect Event Tool combines coincident points and have the exact same X and Y centroid coordinates. It creates a new Output Feature Class containing all of the unique locations found in the Input Feature Class. It then adds a field named ICOUNT to hold the sum of all incidents at each unique location.  The result shows the coincident points occur repeatedly on the east of San Fernando Valley. This can be concluded that crimes are clustered in these areas

Collect Events

Inferential Statistics NEAREST NEIGHBOR ANALYSIS, MORAN'S I

Average Nearest Neighbor  Calculates a nearest neighbor index based on the average distance from each feature to its nearest neighboring feature.  The z-score and p-value results are measures of statistical significance which tell if to reject the null hypothesis. For the Average Nearest Neighbor statistic.  If the index is lower than 1, the pattern exhibits clustering, or if the index is higher than 1, the trend is toward dispersion.  Hypothesis  If crimes in San Fernando Valley are clustered, then P- value would be close to zero, and Z-value will be less than (2-tailed)

Procedure : Average Nearest Neighbor In order to answer if are crimes clustered in the San Fernando Valley. Average Nearest Neighbor is used Running the tool and interpret the results The result indicates the crimes pattern is strongly clustered, since the Z-value is negatively huge value (below -1.65) and P-value is obviously close to zero.

Average Nearest Neighbor

Moran’s I  Moran’s I calculates a z-score and p-value to indicate whether or not you can reject the null hypothesis, measures spatial autocorrelation based on feature locations and attribute values using the Global Moran's I statistic.  If Z-score is greater than (right tail), then P-value is small, or less than 1 percent chance of making type l error.  Moran’s I : Black Neighborhood and Hispanic Neighborhood.

Moran’s I : Black Neighborhood  Statistical results shows Moran index, Z-score and p-value are 0.38, and very close to zero respectively. They indicate that the area pattern is much clustered. Moran’s index is close to 1 indicates clustering, Z- score is over the critical value at 1.96 (right tail) and falls within rejection area (difference). P-value is very small, or less than 1 percent of variable 1 equals to variable 2, or 1 percent chance of making type l error. LINKLINK  In conclusion, comparing crimes (Collected Counts) with black neighborhood can tell that crimes seem clustered and occur in the area where the a number of black people live, especially in the east of the valley.

Moran’s I : Black Neighborhood

Moran’s I : Hispanic Neighborhood  Moran’s Index value is 0.65 indicates clustering and P-value and very close to zero which highly significant tendency toward clustering between above average (red) and below average (blue) areas. On the right, statistical results shows Moran index, Z-score and p- value are 0.65, and very close to zero respectively. They indicate that the area pattern is totally clustered. Moran’s index is close to 1 indicates clustering, Z-score is over the critical value at 1.96 (right tail) and falls within rejection area (difference LINKLINK  Thus, comparing crimes (Collected Counts) with Hispanic neighborhood illustrates the crimes are clustered and occur in the areas the high density of Hispanic people are settled.

Moran’s I : Hispanic Neighborhood

Descriptive Spatial Statistics MEAN CENTER, MEDIAN CENTER

Mean Center  The mean center is a point constructed from the average x and y values for the input feature centroids and calculated based on either Euclidean or Manhattan distance require projected data to accurately measure distances.  The result shows the mean center located almost in the middle of the San Fernando Valley. It caused by the number of crimes on the east, otherwise outliers; implies where the concentrated area is.

Mean Center

Median Center  The Mean Center and Median Center are measures of central tendency and Identifies the location that minimizes overall Euclidean distance to the features in a dataset.  The result shows

Median Center

Standard Deviational Ellipse  Measures whether a distribution of features exhibits a directional trend within 1-standard deviation (whether features are farther from a specified point in one direction than in another direction). Feature geometry is projected to the output coordinate system prior to analysis.  Elliptical polygons indicates crimes are concentrated in east to west direction caused by outliers in Canoga Park area and others in Sunland area. Thus, the trend of crimes could have more chance to elongate along west direction, since there is a natural barriers, hills, on the east.

Standard Deviational Ellipse

Standard Distance  Measures the degree to which features are concentrated or dispersed around the geometric mean center. Provides a single summary measure of feature distribution around their center, each circle polygon is drawn with a radius equal to the standard distance.  The radius is 9.5 km. from the center and 384 out of 683 or 68% crimes fall within this circle. In this instance, the crimes are concentrated and over a half of crimes occurred within 9-10 km.

Standard Distance  These tables on the right represent collected and counted data in the violent crimes dataset; filtered by Standard Distance to analyze when the crimes most occur by analyzing the greatest number in months, weekdays and hours. According to the tables, the crimes most occur in October, on Monday and around midnight(the blue boxes). These could be used to narrow down areas and time that crimes would occur. Notice that the highest number of crime is not in December.

Standard Distance

Sampling RANDOM, STRATIFIED

Random  Sampling is used to compare a mean from a random sample to the mean of a population.  Critical Z score values when using a 95% confidence level are and standard deviations.  Hypothesis : If Z score is between +/-1.96, then the means are not different and the null hypothesis would not be rejected.

 Crime Rates : Z score of the sample mean of crime rates is 0.40 which falls within the critical area. P-value is 0.65 or 65% of confidence of the null hypothesis. It can be inferred the sample and population means of random crime rates are not different, and the null hypothesis should not be rejected.  Liquor Stores : Z score is 0.72 and P-value is 0.76 can be interpreted that Z score is in the critical area, +/-1.96, and P-value is 76% sure (Type I Error) that rejecting the null hypothesis is incorrect; would not reject the null hypothesis. Random

Stratified  the process of dividing members of the population into homogeneous subgroups before sampling.  Results : The Z-value of crimes is 1.17 and P-value is 0.87 represent the population mean and the sample mean came from the same population, since Z-value falls within +/-1.96 as well as P-value shows 87% trust in the null hypothesis. In this instance, should not reject the null. On the other hand, Z-value and P-value of liquor stores indicates differently, merely 33% of confidence in the null

Inferential Statistics Z-TEST, ANOVA

Goodness of Fit Testing KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST

Kolmogorov-Smirnov  K-S can be used to compare a sample with a reference probability distribution (one-sample K–S test), quantifies a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution. P-value is close to 1 indicates “Very similar”, close to 0 indicates “Very different”.  if P is small, close to ZERO, then Reject the null hypothesis.  Results : The table shows K-S test of normality. The significant values which illustrate 0.20 meet normality. Other values which lower than 0.05 are not normal. For example, the Q-Q plot of POP 2007 shows a normality between expected and observed values, meanwhile Crime Rate does not.

Predictive Statistics REGRESSION

Conclusion

A new crime station possibility  According to question three, opening a new rapid- response crime station which focus on Assault with Deadly Weapons. After running Mean Center Tool, Collect Event Tool and Standard Deviational Ellipse Tool. The findings illustrate that the mean is in the exact location when other crimes combined.  The location that would be proper to settle a new rapid- response crime station might be located in 64 square miles within the circle, especially on the east of the circle where the number of crime occurred repeatedly at the exact same locations.

Deathly Weapons Crimes

References   atial_statistics_tools/directional_distribution_standard_deviational_elli pse_spatial_statistics_.htm  atial_statistics_tools/standard_distance_spatial_statistics_.htm