Lecture 22 Spatial Modelling 1 : Incorporating spatial modelling in a random effects structure.

Lecture Contents: Introduction to spatial modelling; Nested random effect levels; House price dataset; Including distance as a fixed effect; Direction effects; Focused clustering (Falkirk dataset).

Spatial statistical modelling Here we require a statistical approach that accounts for the spatial location at which a response is collected. This means that the model fitted to the data needs to account for spatial effects. The aim may be to account for any effects due to location in the model, or to predict values of the response at other locations via some form of interpolation that uses other predictor variables and/or the spatial location.

Types of spatial data There are many forms of spatial data but we can broadly divide these into three types (Cressie 1993): 1. Geostatistical data – here measurements are taken at a fixed number of chosen locations in a geographical area. 2. Lattice data – here a measurement is collected at each point on a regular lattice. 3. Point process data – here each observation is an event whose location (co-ordinates) is recorded.

Geostatistical data Such data are collected in various fields, particularly mining and earth sciences. A measurement, e.g. percentage coal ash, is taken at each of a number of locations. Methods such as variograms and spatial Kriging are used to analyse such data. Other application areas include weather maps and agricultural field trials. Note that such data are not ideally suited to standard random effect modelling.

Disease mapping One particular type of spatial modelling that is often linked with random effect modelling is disease mapping. Here cases of a disease (either human or animal) are observed over a chosen region e.g. a country. We then wish to infer the relative risk of the disease for a particular individual at a particular location based on the data collected. Both our practicals this afternoon will consider disease mapping datasets. The other two types of spatial data relate to disease data.

Lattice Data Such data is common in many fields, for example image analysis where the pixels in an image are found on a regular rectangular lattice. More importantly we will consider disease count data where counts of a disease are recorded for contiguous regions on a map. Although a map is not regular we can construct a lattice from a map by identifying neighbouring regions and linking neighbouring regions to form a lattice.

Example Here we see a map of 5 regions in the left hand picture, and on the right it has been converted to a lattice with connections between regions that share boundaries.
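
The conversion from a map to a lattice can be sketched in Python as a neighbour list. The five region labels and their shared boundaries below are hypothetical, since the figure is not reproduced here; only the idea of linking regions that share a boundary comes from the slide.

```python
# Hypothetical shared boundaries between five regions (not those in the figure).
shared_boundaries = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]


def build_lattice(edges):
    """Return a dict mapping each region to the sorted list of its neighbours."""
    neighbours = {}
    for r, s in edges:
        # A shared boundary links both regions to each other.
        neighbours.setdefault(r, set()).add(s)
        neighbours.setdefault(s, set()).add(r)
    return {region: sorted(adj) for region, adj in sorted(neighbours.items())}


lattice = build_lattice(shared_boundaries)
for region, adj in lattice.items():
    print(region, "->", adj)
```

A neighbour list of this kind is the usual input to lattice-based spatial models, because it records which regions are adjacent without needing the map geometry itself.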

Point process data This data is also commonly found in disease mapping, although it may be used in many applications where cases of an event are seen at particular locations. Each item of data consists of the location of an event, the response (type of event) and potentially predictor variables for the event. Note Rasmus has worked more extensively in this area and will be happy to answer questions here.

Disease point process modelling In disease mapping our data is typically binary, i.e. people are infected with (or die from) a disease or are not. The data occur in point process form but there are 2 problems with analysing them as a point process: 1. All our responses are 1, as we only observe the infected/dead people! 2. Due to confidentiality and the sensitive nature of medical data, the data often cannot be released as individual records. To counter point 1 we could sample control cases at random from the population; however, point 2 means that we typically total up cases for fixed areas and use a Poisson model on the lattice data that this creates.

Why might there be spatial effects? This depends on the response variable and application area. It is possible that geography is itself a predictor for our response or is a surrogate for other factors. Many factors that might influence the response can be linked to location, e.g. weather, deprivation, altitude, pollution and wealth. So if our response is influenced by any of these factors then accounting for spatial effects may improve our model.

Nested random effects/ levels of geography The simplest link to random effect models is to consider nested random effects. We have considered pupils nested in schools and cows nested in herds. In some sense the schools and herds are spatial units in that schools generally take children from their locality and a herd is based on a particular farm. However we could also fit where the pupils live as another classification of the data which is more spatial. On the next slide we consider a dataset with more levels of geography.

UK house prices dataset An MMath student of mine (David Goodacre) studied a dataset of house prices in the UK. The data supplied by the Nationwide building society consists of average house prices in areas of the UK over a 12 year period ( ). The data is for 753 towns in the UK and there are 3 levels of geography (towns nested in counties nested in regions.) Note that if we had individual house sale information then we could have considered point process approaches but here we consider random effect modelling.

A 4-level VC model for the house price dataset The following model was fitted to the data, where i indexes year, j indexes town, k indexes county and l indexes region. The response, y, is the log of the average price. This model can be fitted using both frequentist and likelihood methods in packages that allow four levels in the model.
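
The model equation is not reproduced in this transcript. A form consistent with the indices above and with the parameters reported in the estimates slide (β0, β1, β2 and four variances) is sketched below. The assignment of f, v and u to region, county and town respectively is an assumption, as is the use of year and its square as the only fixed predictors.

```latex
y_{ijkl} = \beta_0 + \beta_1\,\mathrm{year}_{ijkl} + \beta_2\,\mathrm{year}_{ijkl}^{2}
           + f_{l} + v_{kl} + u_{jkl} + e_{ijkl},
\qquad
f_{l} \sim N(0, \sigma^2_f),\;
v_{kl} \sim N(0, \sigma^2_v),\;
u_{jkl} \sim N(0, \sigma^2_u),\;
e_{ijkl} \sim N(0, \sigma^2_e).
```

Under this form every town follows the same quadratic curve in year, shifted up or down by the sum of its region, county and town effects.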

Links with other topics It is worth noting that this house price dataset is a repeated measures dataset of the kind you considered yesterday. It also contains missing data, as any year with fewer than 50 sales in a postal town leads to a missing observation. However, here we assume the data are missing at random (MAR) conditional on the model we are fitting.

Estimates for house price dataset Below are given IGLS estimates for the model:
Parameter   Estimate (SE)
β0          (0.067)
β1          (0.002)
β2          (0.0001)
σ²_f        (0.021)
σ²_v        (0.004)
σ²_u        (0.003)
σ²_e        (0.0002)
Here we see that the model consists of parallel curves with both year and year² very significant. The variance is greatest between regions and between postal towns.

Region Level Effects Here we see that the south east of the UK and London are the most expensive whilst Scotland, the North and Wales are the cheapest.

County level effects After accounting for regions the pattern of county effects is more sporadic. We can however pick out 2 counties, Cheshire in the North West and Surrey in the South East, that are more expensive than their neighbours.

Region level predictions Here we see a graph of region level predictions:

Further Modelling In his project Dave looked at random slopes models at the various levels of the model, so that we could pick out whether the increase in prices was different in different regions. He also looked at fitting models of a more spatial nature! See next lecture.

Why are spatial effects different? The main difference with spatial effects is that we have additional information about each (spatial) unit. For example if we observe the average house price of a town in Grampian, a town in Surrey and 2 towns in Berkshire then we know something of the spatial relation of these towns. We might expect the prices in the 2 towns in Berkshire to be similar and to be more similar to Surrey which is also in the South East than Grampian that is in Scotland. In our current models we will fit an effect for Berkshire which will capture some of the relationship between its 2 towns and a South East effect that will capture the link with the Surrey town.

Problems with the nested classification approach As we have seen, the nested classification approach can capture much of the spatial variability; however, we have to decide on the geographic definitions of areas. We generally use easily available definitions, e.g. county and region, but there is no guarantee that these are the best classifications. We also have the problem of border effects: for example, two towns on either side of a region border will not share either region or county effects but may have very similar prices. We will look at another approach here before studying more complex spatial approaches in the next lecture.

Including location in fixed effects It may be the case that there is a trend, e.g. house prices in the UK generally fall as we move North and West. We could therefore add in two (fixed effect) predictors giving the N/S and E/W co-ordinates of each point. If the unit of observation is an area, e.g. a postal town, we would generally use the co-ordinates of the centroid of the unit. If a linear relationship is not sensible then we could consider polynomial terms in each direction. For example (excluding random effects)
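
The idea of polynomial terms in the two co-ordinates can be sketched in Python. This is a minimal illustration, not the model from the slide: the function name `trend_design`, the choice of a full quadratic surface, the ordinary least squares fit and the simulated centroids are all assumptions made for the example.

```python
import numpy as np


def trend_design(east, north, degree=2):
    """Design matrix with an intercept and all terms east^p * north^q, 1 <= p+q <= degree."""
    east = np.asarray(east, dtype=float)
    north = np.asarray(north, dtype=float)
    cols = [np.ones_like(east)]
    names = ["1"]
    for total in range(1, degree + 1):
        for p in range(total, -1, -1):
            q = total - p
            cols.append(east**p * north**q)
            names.append(f"E^{p}*N^{q}")
    return np.column_stack(cols), names


# Hypothetical centroids (arbitrary units) and a made-up log price surface.
rng = np.random.default_rng(0)
east = rng.uniform(0, 1, 50)
north = rng.uniform(0, 1, 50)
log_price = 11.0 + 0.5 * east - 0.8 * north  # exactly linear, no noise

X, names = trend_design(east, north, degree=2)
beta, *_ = np.linalg.lstsq(X, log_price, rcond=None)
for name, b in zip(names, beta):
    print(f"{name:8s} {b: .3f}")
```

Because the simulated surface is exactly linear, the fitted quadratic and cross-product coefficients come out at zero; with real data they would show whether a linear trend is adequate.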

Distance effects Another possibility in terms of UK house prices is to consider the distance from London. This distance can be constructed from the co-ordinates of each point. The graph to the left gives the combined region and county effects and suggests a distance from London effect might be appropriate.

Distance and direction effects In some scenarios the direction as well as the distance from a particular point is important. This is not the case with house prices; however, in pollution data direction can be very important, since a dominant wind direction suggests that particular directions away from the source will experience more pollution than others. We will next look at a dataset from Falkirk in Scotland that is analysed in Lawson, Browne & Vidal Rodeiro (2003).

Focused Clustering One research area in public health looks at the impact of sources of pollution on the health status of communities. The detection of patterns of health events associated with pollution sources is known as focused clustering. The statistical modelling involved usually relates to the point process nature of such data. Lawson, Browne & Vidal Rodeiro (2003) devote a whole chapter to Focused clustering and include some fairly complex models that can be considered in WinBUGS. Here we will look at some simpler models that can be fitted in MLwiN to a dataset from Falkirk in Scotland.

Respiratory cancer in Falkirk The figure to the right shows the census geographies of 26 regions found around a foundry (marked by *) in Falkirk, Scotland. It is thought conceivable that the foundry was an air pollution hazard in the early 1970s prior to the study. This could have an impact on the respiratory cancer experience of those living in the areas close to the foundry.

Falkirk dataset The data consists of observed and expected counts of respiratory cancer cases over the study period. We first compare the standardized mortality ratios (SMRs) = observed/expected against the locations of the centroids of the 26 areas in Falkirk (relative to the foundry) to look for patterns.
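
The SMR calculation is simple enough to sketch in Python. The counts below are hypothetical and are not the Falkirk data; the function name `smr` is also just a label for this example.

```python
import numpy as np


def smr(observed, expected):
    """Standardized mortality ratio: observed count divided by expected count."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    if np.any(expected <= 0):
        raise ValueError("expected counts must be positive")
    return observed / expected


# Hypothetical counts for four areas (not the Falkirk data).
observed = [3, 0, 5, 2]
expected = [2.0, 1.5, 2.5, 4.0]
ratios = smr(observed, expected)
print(ratios)
```

An SMR above 1 means more cases were observed than expected for that area, and an SMR below 1 means fewer. With small expected counts the ratio is very unstable, which is one reason model-based estimates are preferred.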

Position of the sites Note in the graphs to the right that the 3 highest SMRs are close to the source both in the N/S and E/W directions. We can convert these locations to distance and direction measures.
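
The conversion to distance and direction can be sketched in Python as a change to polar co-ordinates about the source. The centroid positions below are hypothetical, and the use of cosine and sine terms to carry direction into a regression is one common choice assumed here, not something stated in the slides.

```python
import numpy as np


def to_distance_direction(x, y, source=(0.0, 0.0)):
    """Polar form of centroid positions relative to a source.

    Returns the distance and the angle in radians, measured anticlockwise
    from due east, for each (x, y) position.
    """
    dx = np.asarray(x, dtype=float) - source[0]
    dy = np.asarray(y, dtype=float) - source[1]
    return np.hypot(dx, dy), np.arctan2(dy, dx)


# Hypothetical centroids relative to a source at the origin (E/W, N/S).
x = [3.0, 0.0, -1.0]
y = [4.0, 2.0, 0.0]
dist, angle = to_distance_direction(x, y)

# Direction is circular, so it is entered here as a cosine and sine pair
# rather than as a raw angle.
direction_terms = np.column_stack([np.cos(angle), np.sin(angle)])
print(dist)
print(np.degrees(angle))
```

Using the angle directly as a predictor would treat 359 degrees and 1 degree as far apart, which is why the sketch uses the trigonometric pair instead.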

Distance and direction Here we see that there appears to be a negative relationship between distance and SMR but no obvious relationship between direction and SMR.

(Extra) Poisson modelling We have modelled the effects of deprivation, distance and direction in a Poisson model. Note that we have used 1st order MQL in MLwiN and allowed extra-Poisson variation. The fit shows less variation than a Poisson distribution would imply (under-dispersion), so we will also try fitting SMR as a Normally distributed response.
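
The fitted equation is not reproduced in this transcript. A generic form for a Poisson model of observed counts with the expected counts as an offset and a scale parameter for extra-Poisson variation is sketched below. The predictor names and the single scale parameter φ are assumptions, and how direction entered the model is not stated in the source.

```latex
\mathrm{E}(O_i) = \mu_i, \qquad \mathrm{Var}(O_i) = \phi\,\mu_i,
\qquad
\log \mu_i = \log E_i + \beta_0 + \beta_1\,\mathrm{deprivation}_i
             + \beta_2\,\mathrm{distance}_i + \beta_3\,\mathrm{direction}_i .
```

Here O_i and E_i are the observed and expected counts in area i. A pure Poisson model corresponds to φ = 1, so an estimate of φ below 1 indicates less variation than the Poisson distribution implies.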

Normal response model for SMR Here we see that none of the predictors has a significant effect, which is probably because the dataset is so small. We do see however that the risk reduces as distance from the foundry increases and for areas with larger deprivation scores (suggesting higher rates in less deprived areas, but not significantly).

Information for the practical In the practical we will return to using nested random effects to account for spatial effects. Our data is from the European Community and consists of male deaths from malignant melanoma in 9 countries in the EU. The practical is a (modified) chapter from Browne (2003) and looks at MCMC methods for this dataset. It is also analysed using quasi-likelihood methods in the MLwiN User's Guide and you are welcome to also try these methods.