Spatial Data Analysis Areas I: Rate Smoothing and the MAUP Gilberto Câmara INPE, Brazil Ifgi, Muenster, Fall School 2005.

Presentation transcript:


Areal data. The study region is partitioned into disjoint areas, and the region is the union of those areas. Each map has one or more associated measures, which are treated as random variables. Example: a map of Germany divided into municipalities, where for each area we measure the unemployment rate and the literacy rate. Is unemployment correlated with years of schooling? What about Brazil?

Violence in Minas Gerais

Attributes in areal data. As a general rule, each measure is a sum, a count, or a similar aggregate function over the whole area. Each value is associated with the entire corresponding area. If we need to choose a single location, we usually take the polygon centroid. There are no intermediate values.
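As an illustration of taking the polygon centroid as the single representative location, here is a minimal Python sketch. It assumes a simple (non-self-intersecting) polygon given as a list of (x, y) vertices in planar coordinates; the function name `polygon_centroid` is hypothetical and not from the slides.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...].

    Assumes planar coordinates and a non-self-intersecting ring; the
    first vertex does not need to be repeated at the end.
    """
    area2 = 0.0          # twice the signed area (shoelace formula)
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]   # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * area) = sum / (3 * area2)
    return cx / (3.0 * area2), cy / (3.0 * area2)
```

For a square with corners (0, 0), (2, 0), (2, 2), (0, 2) this returns (1.0, 1.0). For strongly concave areas the centroid can fall outside the polygon, which is one reason it is only a convention for locating an areal value.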

What is mapped in areal data? Typical values are rates or proportions: numerator = events, denominator = population at risk. Log maps?

Log rate of motor vehicle accident death per residents,

Log ratio of homicide deaths of males per residents of the same age group,

Models of Discrete Spatial Variation. Random variable in area i, for example: number of ill people, number of newborn babies, per capita income. Source: Renato Assunção (UFMG/Brasil)

Dealing with rates and proportions. When the study variable is a rate or a proportion, mapping those rates is the first obvious step in any analysis. However, the use of raw observed rates might be misleading, since the variability of those rates will be a function of the population counts, which differ widely between the areas (Bailey, 1995).
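To make the dependence on population counts concrete, the sketch below assumes that the event count in an area is Poisson with mean equal to population times the true rate. Under that assumption the raw rate (events / population) has variance rate / population. The function name `raw_rate_std_error` and the example numbers are hypothetical.

```python
import math

def raw_rate_std_error(rate, population):
    """Standard error of a raw rate under a Poisson model for the counts.

    If events ~ Poisson(population * rate), then
    Var(events / population) = rate / population.
    """
    return math.sqrt(rate / population)

# Same underlying rate (1 event per 100 residents), very different populations.
for pop in (100, 10_000, 1_000_000):
    se = raw_rate_std_error(0.01, pop)
    print(f"population={pop:>9,d}  standard error={se:.5f}")
```

With a true rate of 0.01, an area of 100 residents has a standard error as large as the rate itself, while an area of one million residents has a standard error one hundred times smaller. On a map of raw rates, the extreme values therefore tend to come from the least populated areas.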

Source: Fred Ramos (CEDEST/Brasil)

Model-Driven Approaches. Model of discrete spatial variation: each subregion is described by a statistical distribution Z_i; e.g., homicide counts are Poisson-distributed. The main objective of the analysis is to estimate the joint distribution of the random variables Z = {Z_1, …, Z_n}. We use a model-driven approach to correct the missing data. It is called the “Empirical Bayes” method. We could also use the “Full Bayes” method (but that is another story...).

Measured rate in area i. In Bayesian statistics, the best estimate of the true and unknown rate is [equation not preserved in the transcript], where [definitions not preserved in the transcript]. Source: Fred Ramos (CEDEST/Brasil)

Empirical Bayes. Simplifying assumptions are used to estimate the means and variances of the random variables for all areas (Marshall, 1991). Source: Fred Ramos (CEDEST/Brasil)
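The following is a minimal sketch of one common form of empirical Bayes rate smoothing, the global method-of-moments estimator usually attributed to Marshall (1991). It is an assumption that this matches the formula shown on the slides, which the transcript does not preserve. Each raw rate is shrunk toward the overall rate, with a weight that grows with the area's population. The function name `empirical_bayes_rates` and the variable names are hypothetical.

```python
import numpy as np

def empirical_bayes_rates(events, population):
    """Global empirical Bayes smoothing of area rates (method of moments).

    events, population: one entry per area, populations strictly positive.
    Returns the smoothed rate for each area.
    """
    events = np.asarray(events, dtype=float)
    population = np.asarray(population, dtype=float)

    raw = events / population                    # observed rate per area
    m = events.sum() / population.sum()          # overall rate (prior mean)

    # Population-weighted variance of the raw rates around the overall rate.
    s2 = np.sum(population * (raw - m) ** 2) / population.sum()

    # Estimated prior variance: observed variance minus the part explained
    # by sampling noise, truncated at zero.
    a = max(s2 - m / population.mean(), 0.0)
    if a == 0.0:
        # No evidence of real variation between areas: return the overall rate.
        return np.full_like(raw, m)

    # Shrinkage weight: close to 1 for large populations, close to 0 for small.
    w = a / (a + m / population)
    return m + w * (raw - m)
```

For example, `empirical_bayes_rates([3, 1, 60, 240], [100, 500, 10000, 20000])` pulls the rate of the 100-resident area most of the way toward the overall rate, while the 20,000-resident area stays close to its raw rate. A local variant would use the rate of neighbouring areas, rather than the overall rate, as the prior mean.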

Infant Mortality Rate – São Paulo (Raw) Source: Fred Ramos (CEDEST/Brasil)

Infant Mortality Rate – São Paulo (Corrected) Source: Fred Ramos (CEDEST/Brasil)

Some Important Questions. How does scale matter? How do the spatial partitions matter? How does proximity matter? What can we learn by studying how multiple data vary in space? How many prior assumptions can we impose on our spatial data?

The Modifiable Areal Unit Problem (MAUP): A Question of Scale. A basic problem with areal data: the spatial definition of the boundaries of the areas affects the results. Different results can be obtained just by changing the boundaries of these zones. This problem is known as the “modifiable areal unit problem”.

Scale Effects. Map panels: per capita income; jobs / population; illiterate / population. Source: Fred Ramos (CEDEST/Brasil)

Scale Effects. Map panels: per capita income; jobs / population; illiterate / population. Source: Fred Ramos (CEDEST/Brasil)

Scale Effects: Fighting the MAUP. 270 ZONES OD97. Map panels: population > 60 years; illiterates; per capita income. Source: Fred Ramos (CEDEST/Brasil)

Scale Effects: Fighting the MAUP. 96 DISTRICTS OF SÃO PAULO. Map panels: population > 60 years; illiterates; per capita income. Source: Fred Ramos (CEDEST/Brasil)

Scale Effects: Fighting the MAUP. 96 INCOME-HOMOGENEOUS ZONES IN SÃO PAULO. Map panels: population > 60 years; illiterates; per capita income. Source: Fred Ramos (CEDEST/Brasil)

Correlation matrices. Zonings compared: 270 zones (OD97); 96 districts; 96 income-aggregated zones. Variables: (A) percentage of population aged 60 or older; (B) percentage of illiterate population; (C) per capita individual income. Source: Fred Ramos (CEDEST/Brasil)
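The effect of the zoning on correlations can be reproduced with a toy example. The sketch below uses six made-up units and two made-up variables, not the São Paulo data, and aggregates them by unweighted zone means under different zonings. Real rates would normally be aggregated with population weights, so this is a simplification. The names `zone_means` and `zonal_correlation` are hypothetical.

```python
import numpy as np

def zone_means(values, labels):
    """Unweighted mean of `values` within each zone, in sorted label order."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    return np.array([values[labels == z].mean() for z in np.unique(labels)])

def zonal_correlation(x, y, labels):
    """Pearson correlation between the zone means of x and y."""
    return np.corrcoef(zone_means(x, labels), zone_means(y, labels))[0, 1]

# Six small units along a line, two made-up variables.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]

fine = [0, 1, 2, 3, 4, 5]        # every unit is its own zone
zoning_a = [0, 0, 1, 1, 2, 2]    # three zones of two units each
zoning_b = [0, 1, 1, 2, 2, 2]    # three zones of sizes 1, 2 and 3

for name, labels in [("fine", fine), ("A", zoning_a), ("B", zoning_b)]:
    print(name, round(zonal_correlation(x, y, labels), 3))
```

The same underlying data give a correlation of roughly 0.83 at the finest level, essentially 1.0 under zoning A, and roughly 0.98 under zoning B. Only the boundaries change between the three results.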

A Question of Scale. Get census data; identify inter-tract variation; adaptation; minimize the outlier effect; reduce data variability.

Regionalization. Reaggregate N small areas (at the finest scale available) into M larger regions to reduce scale effects. A possible solution: constrained clustering.

Regionalization: Maps as graphs

Map panels: simple aggregation; population-constrained aggregation.
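One way to picture population-constrained aggregation on a map represented as a graph is the greedy heuristic sketched below. It is not the method used in the slides, which the transcript does not specify. Areas are nodes, shared borders are edges, and any region whose population is below a threshold is merged with the adjacent region whose population-weighted attribute mean is closest. The function name `aggregate_regions` and all parameter names are hypothetical. Populations are assumed to be strictly positive and area identifiers mutually comparable (all integers or all strings).

```python
def aggregate_regions(population, value, adjacency, min_pop):
    """Greedy contiguity-constrained merging of small areas.

    population, value: dict area_id -> number (population > 0)
    adjacency: dict area_id -> set of neighbouring area_ids (symmetric)
    min_pop: regions below this population keep merging with a neighbour
    Returns dict area_id -> region label (the id of one member area).
    """
    region = {a: a for a in population}                 # area -> region label
    pop = dict(population)                              # region -> population
    wsum = {a: population[a] * value[a] for a in population}  # weighted sums
    neigh = {a: set(adjacency[a]) for a in population}  # region-level graph

    while True:
        small = [r for r in pop if pop[r] < min_pop and neigh[r]]
        if not small:
            break
        # Treat the least populated region first (ties broken by label).
        r = min(small, key=lambda k: (pop[k], k))
        mean_r = wsum[r] / pop[r]
        # Merge into the adjacent region with the most similar mean value.
        target = min(neigh[r],
                     key=lambda k: (abs(wsum[k] / pop[k] - mean_r), k))
        pop[target] += pop.pop(r)
        wsum[target] += wsum.pop(r)
        for n in neigh.pop(r):
            neigh[n].discard(r)
            if n != target:
                neigh[n].add(target)
                neigh[target].add(n)
        for a, label in region.items():
            if label == r:
                region[a] = target
    return region
```

For four areas in a row (0-1-2-3) with populations 10, 10, 100, 100 and a threshold of 20, the two small areas are merged with each other and the two large ones are left alone. Dropping the population test and merging purely on attribute similarity would correspond to simple aggregation.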