Spatial Data Analysis, Areas I: Rate Smoothing and the MAUP. Gilberto Câmara, INPE, Brazil. ifgi, Muenster, Fall School 2005.

Areal data: The study region is partitioned into disjoint areas, and the region is the union of these areas. Each map has one or more associated measures, treated as random variables. Example: a map of Germany divided into municipalities, where for each area we measure the unemployment rate and the literacy rate. Is unemployment correlated with years of schooling? What about Brazil?

Violence in Minas Gerais

Attributes in areal data: As a general rule, each measure is a sum, count, or similar aggregate function over the whole area. Each value is associated with the entire corresponding area; if we need to choose a single location, we usually take the polygon centroid. There are no intermediate values within an area.
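When a single representative location is needed, the polygon centroid can be computed with the shoelace formula. A minimal sketch, assuming each area is a simple (non-self-intersecting) polygon given as an ordered list of (x, y) vertices; the function name `polygon_centroid` is hypothetical. For strongly concave areas the centroid may fall outside the polygon itself.

```python
def polygon_centroid(vertices):
    """Centroid of a simple polygon via the shoelace formula.

    vertices: ordered (x, y) tuples, clockwise or counter-clockwise,
    without repeating the first vertex at the end.
    """
    n = len(vertices)
    if n < 3:
        raise ValueError("a polygon needs at least three vertices")
    area2 = 0.0  # twice the signed area
    cx = 0.0
    cy = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # centroid = sum(...) / (6 * A), and 6 * A = 3 * area2
    return cx / (3.0 * area2), cy / (3.0 * area2)


print(polygon_centroid([(0, 0), (2, 0), (2, 2), (0, 2)]))  # (1.0, 1.0)
```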

What is mapped in areal data? Typical values are rates or proportions: numerator = events, denominator = population at risk. Should we use log maps?
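A minimal sketch of the raw rate and its logarithm, assuming rates are expressed per 100,000 residents (as in the maps that follow) and that the natural logarithm is used; the slides do not state the base. Function names and the example counts are hypothetical. Areas with zero events have no defined log rate, which is one practical problem with log maps.

```python
import math

def rate_per(events, population, base=100_000):
    """Raw rate: events (numerator) per `base` persons at risk (denominator)."""
    if population <= 0:
        raise ValueError("population at risk must be positive")
    return base * events / population

def log_rate(events, population, base=100_000):
    """Natural log of the rate; returns None when there are no events (log(0) is undefined)."""
    r = rate_per(events, population, base)
    return math.log(r) if r > 0 else None


print(rate_per(30, 150_000))   # 20.0 events per 100,000 residents
print(log_rate(30, 150_000))   # natural log of 20
print(log_rate(0, 2_000))      # None: no events in this area
```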

Log rate of motor vehicle accident deaths per 100,000 residents, 1990-92

Log rate of homicide deaths of males aged 15-49 per 100,000 residents of the same age group, 1990-92

Models of Discrete Spatial Variation: a random variable is associated with each area i, e.g. the number of ill people, the number of newborn babies, or per capita income. Source: Renato Assunção (UFMG/Brasil)

Dealing with rates and proportions: "When the study variable is a rate or a proportion, mapping those rates is the first obvious step in any analysis. However, the use of raw observed rates might be misleading, since the variability of those rates will be a function of the population counts, which differ widely between the areas." (Bailey, 1995)
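To see why raw rates mislead, the simulation below gives every area the same true rate and varies only the population at risk. It is a sketch with hypothetical parameters (a 2% true rate, populations of 100 and 5,000); counts are drawn as binomial sums of Bernoulli trials, which for rare events behaves much like a Poisson count. The spread of the raw rates should shrink roughly as sqrt(p(1 - p)/n), i.e. about 0.014 for n = 100 and 0.002 for n = 5,000.

```python
import random
import statistics

def simulate_raw_rates(true_rate, population, n_areas, seed=0):
    """Raw rates for n_areas areas that all share the same true rate.

    Each count is Binomial(population, true_rate), drawn as a sum of
    Bernoulli trials using only the standard library.
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(n_areas):
        count = sum(1 for _ in range(population) if rng.random() < true_rate)
        rates.append(count / population)
    return rates


# Hypothetical setting: same true rate, small vs large populations at risk.
small = simulate_raw_rates(0.02, population=100, n_areas=200, seed=1)
large = simulate_raw_rates(0.02, population=5_000, n_areas=200, seed=2)
print("sd of raw rates, n=100: ", statistics.stdev(small))
print("sd of raw rates, n=5000:", statistics.stdev(large))
```

Small areas produce the extreme high and low rates on a raw map purely by chance.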

Source: Fred Ramos (CEDEST/Brasil)

Model-Driven Approaches: model of discrete spatial variation. Each subregion i is described by a statistical distribution Z_i; e.g., the number of homicides follows a Poisson distribution. The main objective of the analysis is to estimate the joint distribution of the random variables Z = {Z_1, …, Z_n}. We use a model-driven approach to correct for missing data; it is called the "Empirical Bayes" method. We could also use the "Full Bayes" method (but that is another story...).
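One standard way to write such a model makes the earlier point about rate variability explicit. This is a sketch under the assumption that the count in area i is Poisson with mean equal to the population at risk n_i times an unknown rate θ_i (the exact parameters on the original slide were not preserved):

```latex
Z_i \sim \operatorname{Poisson}(n_i \theta_i), \qquad
r_i = \frac{Z_i}{n_i}, \qquad
\mathbb{E}[r_i] = \theta_i, \qquad
\operatorname{Var}(r_i) = \frac{n_i \theta_i}{n_i^{2}} = \frac{\theta_i}{n_i}
```

The raw rate r_i is unbiased, but its variance is inversely proportional to the population at risk.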

Given the measured rate in area i, in Bayesian statistics the best estimate of the true and unknown rate is …, where …

Empirical Bayes: simplifying assumptions for estimating the means and variances of the random variables of all areas (Marshall, 1991). Source: Fred Ramos (CEDEST/Brasil)
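A sketch of the global Empirical Bayes estimator in the moment form commonly attributed to Marshall (1991): each raw rate r_i is shrunk towards the pooled rate m with weight w_i = s² / (s² + m / n_i), so areas with small populations are pulled strongly towards m while large areas keep most of their own rate. That the slides used exactly this variant is an assumption; the function name and the example data are hypothetical.

```python
def empirical_bayes_rates(events, populations):
    """Global Empirical Bayes smoothing of rates (moment estimators, Marshall 1991).

    events[i]: observed count in area i; populations[i]: population at risk.
    Returns smoothed rates in the same units as events / population.
    """
    k = len(events)
    if k == 0 or k != len(populations):
        raise ValueError("events and populations must be non-empty and of equal length")
    total_pop = sum(populations)
    m = sum(events) / total_pop          # prior mean: pooled rate of all areas
    n_bar = total_pop / k                # average population per area
    raw = [e / n for e, n in zip(events, populations)]
    # Population-weighted variance of raw rates minus the expected Poisson noise.
    s2 = sum(n * (r - m) ** 2 for r, n in zip(raw, populations)) / total_pop - m / n_bar
    s2 = max(s2, 0.0)                    # negative estimate: no extra-Poisson variation
    smoothed = []
    for r, n in zip(raw, populations):
        denom = s2 + m / n
        w = s2 / denom if denom > 0 else 0.0   # shrinkage weight in [0, 1]
        smoothed.append(w * r + (1.0 - w) * m)
    return smoothed


events = [2, 50, 0, 30]                  # hypothetical counts
populations = [100, 10_000, 50, 5_000]   # hypothetical populations at risk
print(empirical_bayes_rates(events, populations))
```

The first area (2 events in 100 people, raw rate 2%) is pulled almost all the way to the pooled rate, while the area with 10,000 people moves much less.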

Infant Mortality Rate – São Paulo (Raw) Source: Fred Ramos (CEDEST/Brasil)

Infant Mortality Rate – São Paulo (Corrected) Source: Fred Ramos (CEDEST/Brasil)

Some Important Questions: How does scale matter? How do the spatial partitions matter? How does proximity matter? What can we learn by studying how multiple data vary in space? How many prior assumptions can we impose on our spatial data?

The Modifiable Areal Unit Problem (MAUP): A Question of Scale. A basic problem with areal data is that the spatial definition of the area boundaries affects the results: different results can be obtained just by changing the boundaries of the zones. This problem is known as the "modifiable areal unit problem".

Scale Effects: per capita income, jobs/population, illiterates/population. Source: Fred Ramos (CEDEST/Brasil)

Scale Effects (continued): per capita income, jobs/population, illiterates/population. Source: Fred Ramos (CEDEST/Brasil)

Scale Effects: Fighting the MAUP. 270 zones (OD97): population over 60 years, illiterates, per capita income. Source: Fred Ramos (CEDEST/Brasil)

Scale Effects: Fighting the MAUP. 96 districts of São Paulo: population over 60 years, illiterates, per capita income. Source: Fred Ramos (CEDEST/Brasil)

Scale Effects: Fighting the MAUP. 96 income-homogeneous zones in São Paulo: population over 60 years, illiterates, per capita income. Source: Fred Ramos (CEDEST/Brasil)

Correlation matrices for three zonings (270 OD97 zones, 96 districts, 96 income-aggregated zones) of the variables: A) percentage of population aged 60 or older; B) percentage of illiterate population; C) per capita individual income. Source: Fred Ramos (CEDEST/Brasil)
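The effect behind these correlation matrices can be reproduced with a toy example: the same eight hypothetical units are grouped under two different zonings into four zones each, and the correlation between two variables is recomputed at each level. Zone values are plain means of unit values, which assumes equal population per unit; for real proportions, population-weighted means would be needed. All data and zonings here are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def aggregate(values, zoning):
    """Zone value = mean of its units (assumes equal population per unit)."""
    return [sum(values[i] for i in zone) / len(zone) for zone in zoning]


# Hypothetical values of two variables for eight small units.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]

# Two alternative groupings (unit indices) of the same units into four zones.
zoning_a = [(0, 1), (2, 3), (4, 5), (6, 7)]
zoning_b = [(0, 2), (1, 3), (4, 6), (5, 7)]

r_units = pearson(x, y)
r_a = pearson(aggregate(x, zoning_a), aggregate(y, zoning_a))
r_b = pearson(aggregate(x, zoning_b), aggregate(y, zoning_b))
print(f"units: {r_units:.3f}  zoning A: {r_a:.3f}  zoning B: {r_b:.3f}")
```

With these numbers the correlation is 38/42 (about 0.905) for the units, exactly 1.0 under zoning A and 15/17 (about 0.882) under zoning B, although the underlying data never change.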

The Question of Scale: get census data; identify inter-tract variation; adaptation (minimize the outlier effect, reduce data variability).

Regionalization: reaggregate N small areas (the finest scale available) into M larger regions to reduce scale effects. A possible solution: constrained clustering.

Regionalization: Maps as graphs
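Treating a map as a graph means each area becomes a node and areas sharing a border are linked. A minimal sketch of building such a contiguity graph, assuming a topologically clean map in which neighbours use identical vertex coordinates along shared borders ("rook" contiguity, so corner-only contact does not count). It does not handle a border split by extra vertices on one side only. The function name and the grid example are hypothetical.

```python
from collections import defaultdict

def contiguity_graph(polygons):
    """Adjacency graph: nodes are area ids, edges link areas sharing a border segment.

    polygons: dict area_id -> ordered list of (x, y) vertices (ring not closed).
    """
    owners = defaultdict(set)  # undirected segment -> areas whose boundary contains it
    for area_id, ring in polygons.items():
        for i in range(len(ring)):
            a, b = ring[i], ring[(i + 1) % len(ring)]
            owners[frozenset((a, b))].add(area_id)
    graph = {area_id: set() for area_id in polygons}
    for areas in owners.values():
        for u in areas:
            graph[u].update(areas - {u})
    return graph


# Hypothetical 2 x 2 grid of unit squares; A and D touch only at a corner.
grid = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "C": [(0, 1), (1, 1), (1, 2), (0, 2)],
    "D": [(1, 1), (2, 1), (2, 2), (1, 2)],
}
print(contiguity_graph(grid))
```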

Simple aggregation vs. population-constrained aggregation
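Population-constrained aggregation can be illustrated with a simple greedy region-growing heuristic on the contiguity graph: start from the smallest unassigned area and absorb unassigned neighbours until the region reaches a minimum population. This is only an illustrative sketch of constrained clustering, not the method used in the slides and not an optimal regionalization; the function name, threshold, and data are hypothetical.

```python
def grow_regions(graph, population, min_pop):
    """Greedy contiguity-constrained aggregation (illustrative heuristic).

    graph: dict area -> set of neighbouring areas; population: dict area -> count.
    Seeds regions in increasing order of population and repeatedly absorbs the
    smallest unassigned neighbour until the region reaches min_pop or cannot grow.
    Returns dict area -> region index.
    """
    region_of = {}
    next_region = 0
    for seed in sorted(graph, key=lambda a: (population[a], str(a))):
        if seed in region_of:
            continue
        members = {seed}
        region_of[seed] = next_region
        total = population[seed]
        while total < min_pop:
            frontier = {nb for m in members for nb in graph[m] if nb not in region_of}
            if not frontier:
                break  # cannot grow further; region stays below the threshold
            best = min(frontier, key=lambda a: (population[a], str(a)))
            members.add(best)
            region_of[best] = next_region
            total += population[best]
        next_region += 1
    return region_of


# Hypothetical 2 x 2 grid: A-B, A-C, B-D, C-D are neighbours.
graph = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D"}, "D": {"B", "C"}}
population = {"A": 100, "B": 300, "C": 200, "D": 400}
regions = grow_regions(graph, population, min_pop=300)
print(regions)
```

Here the two smallest areas, A and C, are merged so that every region reaches at least 300 people, while B and D already meet the threshold on their own. Regions that cannot reach the threshold would need a post-processing step, such as merging them into a neighbouring region.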
