Introduction to Geospatial Analysis in R SURF – 24 April 2012 Daniel Marlay.

Synopsis This month's talk is going to look at the geo-spatial capabilities of R. We'll look at how to import common geographical data formats into R and some of the free geographic data sources and map layers available. We'll then look at how to create maps in R using this data, and some of the ways to style it to display our data. We'll look at how R stores geographic data and how we can perform queries against that - for example identifying which points fall into a particular region. Finally, we'll take a brief look at modeling geospatial data and some of the issues to be aware of.

Introduction
- There are extensive geospatial capabilities in R
  - I’ve just started to scratch the surface
- This presentation will give a little bit of theory
  - Most of the content is a walk through of doing geospatial analysis in R
- I’ve picked data sets that are freely available
  - Trying this yourself is the best way to learn
- And maybe we’ll learn something about the way Australians vote…

R Geospatial Packages
- sp: provides a generic set of functions, classes and methods for handling spatial data
- rgdal: provides an R interface to the Geospatial Data Abstraction Library (GDAL), which is used to read and write geospatial data from R

Types of Geospatial Data
- Vector data
  - Points
  - Lines
  - Areas
- Bitmap
  - Often used for image data (e.g. aerial photos)
  - Needs to be registered to a coordinate system
- “Labelled” data
  - Has geographic information, but needs to be matched before it can be used

Setting up the R Environment
## Set working directory to where the data is. Update as required if running this yourself
setwd("C:\\Documents and Settings\\marlada\\My Documents\\AQUA Internal\\Thought Leadership\\SURF Geospatial Analysis Presentation");
## Load the relevant libraries
library(sp);    # Basic R classes for handling geographic data
library(rgdal); # Library for using the Geospatial Data Abstraction Library (GDAL)
library(nlme);  # Library that gives us generalised least squares

Obtain Census Data (1/6)

Obtain Census Data (2/6)

Obtain Census Data (3/6)

Obtain Census Data (4/6)

Obtain Census Data (5/6)

Obtain Census Data (6/6)

Read In Census Data (1/3)
## Read in and clean the census data (Note: a lot of this cleaning could be done more easily in Excel)
EducationLevel <- read.csv("EducationData.csv",skip=6,na.strings="");
EducationLevel <- EducationLevel[c(-1,-2),c(-1,-27)]; # Remove leading and trailing blank columns and blank second row
EducationLevel <- EducationLevel[-(97:100),]; # Remove trailing blank lines
#### Create some useable column names
EduDataCols <- paste(c(rep("Male",8),rep("Female",8),rep("Total",8)),
                     rep(c("NotStated","InadDescr","Postgrad","GradDipCert","Bachelor","Diploma","Certificate","NA"),3),
                     sep=".");
colnames(EducationLevel) <- c("SED",EduDataCols);
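The effect of skip and na.strings in read.csv() can be tried without the census file. The following is a minimal sketch in base R; the banner lines, district rows and figures are made up for illustration and are not taken from the real EducationData.csv.

```r
# Hypothetical export: two banner lines, a header row, two data rows
# and a trailing row that contains only separators.
raw <- c(
  "Census export (illustrative only)",
  "Counting: persons",
  "SED,Postgrad,Bachelor",
  "Albury,120,950",
  "Auburn,85,1300",
  ",,"
)

# skip = 2 drops the banner lines; na.strings = "" turns empty fields into NA
demo <- read.csv(text = paste(raw, collapse = "\n"), skip = 2, na.strings = "")

# Drop rows that have no district name, such as the trailing separator row
demo <- demo[!is.na(demo$SED), ]
print(demo)
```

Filtering on missing names, rather than on fixed row numbers, is one way to keep the cleaning step working if the export gains or loses a few trailing lines.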

Read In Census Data (2/3)
#### Recode the data into character and numeric data to avoid weird errors from factors
EducationLevel[,1] <- as.character(EducationLevel[,1]);
for (col in EduDataCols) {
  EducationLevel[,col] <- as.numeric(as.character(EducationLevel[,col]));
}
#### Eyeball the data to make sure it is ok.
summary(EducationLevel);
head(EducationLevel,10);
tail(EducationLevel,10);
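The reason for going through as.character() before as.numeric() is easy to see on a toy factor. This is a small base R sketch with made-up values. Note that since R 4.0 read.csv() no longer creates factors by default, so on a current installation the recode may be unnecessary but is still harmless.

```r
# A factor whose labels look like numbers (values are illustrative)
f <- factor(c("120", "85", "1300"))

# Converting the factor directly returns the internal level codes
codes <- as.numeric(f)

# Converting via character returns the numbers that the labels spell out
vals <- as.numeric(as.character(f))

print(codes)
print(vals)
```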

Read In Census Data (3/3)

Obtain Electoral Data (1/4)

Obtain Electoral Data (2/4)

Obtain Electoral Data (3/4)

Obtain Electoral Data (4/4)

Read In Electoral Data (1/2)
## Read in the electoral data
ElectionResults <- read.csv("2011NSWElectionResults.csv");
#### Eyeball data to make sure it is ok
summary(ElectionResults);
head(ElectionResults);
tail(ElectionResults);

Read In Electoral Data (2/2)

Obtain Geography (1/4)

Obtain Geography (2/4)

Obtain Geography (3/4)

Obtain Geography (4/4)

Read In SED Geography (1/3)
## Read in the state electoral division boundaries (geography) and explore the SpatialPolygonsDataFrame class
SED <- readOGR("C:\\Documents and Settings\\marlada\\My Documents\\AQUA Internal\\Thought Leadership\\SURF Geospatial Analysis Presentation\\Geographies","SED06aAUST_region");
#### Have an initial look at the SED data set that we've just read in
summary(SED);
plot(SED);

Read In SED Geography (2/3)

Read In SED Geography (3/3)

Examining the SpatialPolygonsDataFrame (1/2)
#### SED is a SpatialPolygonsDataFrame, an S4 object. We can have a look at how it is constructed
mode(SED);
slotNames(SED);
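For readers who have not met S4 objects before, the same inspection calls can be tried on a toy class. This is only a sketch: the class name ToyLayer and its two slots are invented for illustration and are much simpler than the real class defined in sp.

```r
library(methods)

# Hypothetical S4 class with two slots, loosely echoing a spatial layer
setClass("ToyLayer", representation(data = "data.frame", bbox = "matrix"))

toy <- new("ToyLayer",
           data = data.frame(name = c("a", "b")),
           bbox = matrix(c(150, -34, 151, -33), nrow = 2))

isS4(toy)          # TRUE for any S4 object
slotNames(toy)     # names of the slots defined on the class
toy@data           # slots are read with @ ...
slot(toy, "bbox")  # ... or with slot()
```

The @ operator plays the role for S4 slots that $ plays for list elements.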

Examining the SpatialPolygonsDataFrame (2/2)

Simple Mapping of SpatialPolygonsDataFrames (1/2)
#### Let's now look at some more mapping, we've seen that we can plot all of Australia
plot(SED[SED$STATE_2006 == "1",]); # Plot NSW
plot(SED[SED$STATE_2006 == "1",],xlim=c(150.6,151.4),ylim=c(-34.3,-33.4)); # Plot Sydney - xlim and ylim from google maps ;-)
plot(SED[SED$STATE_2006 == "1",],xlim=c(150.6,151.4),ylim=c(-34.3,-33.4)); # Plot Sydney and put on some electoral district names
text(coordinates(SED[SED$STATE_2006 == "1",]),labels=(SED[SED$STATE_2006 == "1",])$NAME_2006,cex=0.5);

Simple Mapping of SpatialPolygonsDataFrames (2/2)

Thematic Mapping (1/8)
## Thematic mapping
SED.NSW <- SED[SED$STATE_2006 == "1",]; # subset of SED for convenience
#### Create a ThemeData data set with a summary of the data we are interested in - proportion of people with a tertiary education
ThemeData <- data.frame(SED = as.character(EducationLevel$SED),
                        PropTertiaryEd = (EducationLevel$Total.Postgrad + EducationLevel$Total.GradDipCert + EducationLevel$Total.Bachelor + EducationLevel$Total.Diploma + EducationLevel$Total.Certificate) /
                                         (EducationLevel$Total.Postgrad + EducationLevel$Total.GradDipCert + EducationLevel$Total.Bachelor + EducationLevel$Total.Diploma + EducationLevel$Total.Certificate + EducationLevel$Total.NA),
                        stringsAsFactors=FALSE);
hist(ThemeData$PropTertiaryEd); # Histogram of the proportions to work out the appropriate cut points
ThemeData$PropTertiaryEdFact <- cut(ThemeData$PropTertiaryEd,c(0,0.25,0.3,0.35,0.4,0.5,1.0)); # Create a factor for the proportion variable
levels(ThemeData$PropTertiaryEdFact) <- c("25% or Less","25% to 30%","30% to 35%","35% to 40%","40% to 50%","More than 50%");
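How cut() assigns values to bands is worth checking on a few numbers, because by default its intervals are open on the left and closed on the right. The sketch below uses the same break points as the slide with made-up proportions.

```r
# Illustrative proportions, not census values
p <- c(0.22, 0.25, 0.31, 0.47, 0.62)
breaks <- c(0, 0.25, 0.3, 0.35, 0.4, 0.5, 1.0)

# Default intervals are (a, b], so 0.25 lands in the first band
band <- cut(p, breaks)
print(table(band))

# A value equal to the lowest break falls outside every interval ...
zero_default <- cut(0, breaks)
# ... unless include.lowest = TRUE closes the first interval on the left
zero_included <- cut(0, breaks, include.lowest = TRUE)
```

A district with a proportion of exactly zero would therefore get an NA band, and so no fill colour, unless include.lowest = TRUE is used.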

Thematic Mapping (2/8)

Thematic Mapping (3/8)
#### Display a thematic map for all of NSW
bands <- length(levels(ThemeData$PropTertiaryEdFact));
pal <- heat.colors(bands);
plot(SED.NSW,col=pal[ThemeData$PropTertiaryEdFact[match(SED.NSW$NAME_2006,ThemeData$SED)]]); # Note the use of match() to get the right rows
legend("bottomright", legend=levels(ThemeData$PropTertiaryEdFact), fill=pal, title="Prop. with Tertiary Ed.",inset=0.01);
#### Display a thematic map for Sydney
plot(SED.NSW,col=pal[ThemeData$PropTertiaryEdFact[match(SED.NSW$NAME_2006,ThemeData$SED)]],xlim=c(150.6,151.4),ylim=c(-34.3,-33.4));
legend("bottomright", legend=levels(ThemeData$PropTertiaryEdFact), fill=pal, title="Prop. with Tertiary Ed.",inset=0.01);
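The match() call is what lines the table rows up with the polygon order. A small base R sketch with invented district names and band numbers shows the mechanics, including what happens to a polygon that has no row in the table.

```r
# Polygon order on the map (illustrative names)
map_names <- c("Albury", "Auburn", "Ballina")

# Summary table in a different order and with one district missing
tab <- data.frame(SED = c("Ballina", "Albury"), band = c(3, 1))

# For each polygon, the row of tab that holds its data (NA if absent)
idx <- match(map_names, tab$SED)

# Look up one colour per polygon, in polygon order
pal <- heat.colors(3)
cols <- pal[tab$band[idx]]
print(data.frame(map_names, idx, cols))
```

Indexing the table by idx, rather than passing the table's colours straight to plot(), is what keeps each colour attached to the right district. Unmatched districts end up with an NA colour.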

Thematic Mapping (4/8)

Thematic Mapping (5/8)
#### Now we'll add the election results to our ThemeData data set
rownames(ElectionResults) <- as.character(ElectionResults$District); # Adding rownames allows us to index by them when matching
ThemeData$PropGreenVote <- ElectionResults[ThemeData$SED,"GRN"] / ElectionResults[ThemeData$SED,"Total"]; # Create a green vote proportion variable
hist(ThemeData$PropGreenVote,breaks=20); # Have a look at the distribution
ThemeData$PropGreenVoteFact <- cut(ThemeData$PropGreenVote,c(0,0.05,0.06,0.08,0.1,0.15,1.0)); # Create a factor
levels(ThemeData$PropGreenVoteFact) <- c("Less than 5%","5% to 6%","6% to 8%","8% to 10%","10% to 15%","More than 15%");

Thematic Mapping (6/8)

Thematic Mapping (7/8)
#### And do some thematic maps of the election results
bands <- length(levels(ThemeData$PropGreenVoteFact));
pal <- heat.colors(bands);
plot(SED.NSW,col=pal[ThemeData$PropGreenVoteFact[match(SED.NSW$NAME_2006,ThemeData$SED)]])
legend("bottomright", legend=levels(ThemeData$PropGreenVoteFact), fill=pal, title="Prop. Voted Green",inset=0.01)
plot(SED.NSW,col=pal[ThemeData$PropGreenVoteFact[match(SED.NSW$NAME_2006,ThemeData$SED)]],xlim=c(150.6,151.4),ylim=c(-34.3,-33.4))
legend("bottomright", legend=levels(ThemeData$PropGreenVoteFact), fill=pal, title="Prop. Voted Green",inset=0.01)

Thematic Mapping (8/8)

Obtain Topographic Map Data (1/9)

Obtain Topographic Map Data (2/9)

Obtain Topographic Map Data (3/9)

Obtain Topographic Map Data (4/9)

Obtain Topographic Map Data (5/9)

Obtain Topographic Map Data (6/9)

Obtain Topographic Map Data (7/9)

Obtain Topographic Map Data (8/9)

Obtain Topographic Map Data (9/9)

Geographic Querying (1/4)
## Demonstration of geographic querying
#### Read in the Localities layer from the TOPO 2.5M data set
Locs <- readOGR("C:\\Documents and Settings\\marlada\\My Documents\\AQUA Internal\\Thought Leadership\\SURF Geospatial Analysis Presentation\\Geographies\\localities","aus25lgd_p");
Mtns <- Locs[Locs$LOCALITY == "6",]; # Select only mountains
plot(Mtns)
#### Use the over function to find a list of mountains in SEDs with more than 10% green votes
over(SED.NSW[!is.na(ThemeData$PropGreenVote[match(SED.NSW$NAME_2006,ThemeData$SED)]) & ThemeData$PropGreenVote[match(SED.NSW$NAME_2006,ThemeData$SED)] > 0.10,], Mtns); # Only gets one mountain per SED
over(SED.NSW[!is.na(ThemeData$PropGreenVote[match(SED.NSW$NAME_2006,ThemeData$SED)]) & ThemeData$PropGreenVote[match(SED.NSW$NAME_2006,ThemeData$SED)] > 0.10,], Mtns,returnList=TRUE); # Gets all mountains, but in a less useful format
do.call("rbind",over(SED.NSW[!is.na(ThemeData$PropGreenVote[match(SED.NSW$NAME_2006,ThemeData$SED)]) & ThemeData$PropGreenVote[match(SED.NSW$NAME_2006,ThemeData$SED)] > 0.10,], Mtns,returnList=TRUE)); # Gives us something a bit more useable

Geographic Querying (2/4)

Geographic Querying (3/4)

Geographic Querying (4/4)

Geospatial Modelling (1/6)
## Spatial GLS relating proportion who vote green to proportion with a higher education
#### Add some spatial data to the ThemeData data set - using equidistant conic coordinates - lat-long give greater distance distortion
SED.NSW.coords.eqdc <- coordinates(spTransform(SED.NSW,CRS("+proj=eqdc +lat_1=-34 +lat_2=-33 +lat_0= lon_0=151 +x_0=0 +y_0=0")));
rownames(SED.NSW.coords.eqdc) <- as.character(SED.NSW$NAME_2006);
colnames(SED.NSW.coords.eqdc) <- c("x","y");
plot(spTransform(SED.NSW,CRS("+proj=eqdc +lat_1=-34 +lat_2=-33 +lat_0= lon_0=151 +x_0=0 +y_0=0"))); # shows how the conic projection looks
lines(spTransform(gridlines(SED.NSW,easts=seq(140,160,by=2.5),norths=seq(-37.5,-27.5,by=2.5)),CRS("+proj=eqdc +lat_1=-34 +lat_2=-33 +lat_0= lon_0=151 +x_0=0 +y_0=0")));
tail(ThemeData);
ThemeData2 <- ThemeData[-(94:96),]; # Remove the last few rows of ThemeData - they don't have geographic locations
ThemeData2 <- cbind(ThemeData2,SED.NSW.coords.eqdc[ThemeData2$SED,]);
head(ThemeData2);
summary(ThemeData2);
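Why raw latitude and longitude distort distances can be checked with a little arithmetic. The sketch below is an illustration only: it uses a spherical-earth haversine formula with an assumed radius of 6371 km, and the helper name haversine_km is made up here. It compares the ground length of one degree of longitude with one degree of latitude near Sydney.

```r
# Great-circle distance in km on a sphere (radius is an assumed mean value)
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(a))
}

one_deg_lon <- haversine_km(151, -34, 152, -34)  # move one degree east
one_deg_lat <- haversine_km(151, -34, 151, -33)  # move one degree north
ratio <- one_deg_lon / one_deg_lat
print(c(lon_km = one_deg_lon, lat_km = one_deg_lat, ratio = ratio))
```

Because a degree of longitude is shorter on the ground than a degree of latitude at this latitude, distances computed directly on degree coordinates understate east-west separation relative to north-south. A projected coordinate system avoids feeding that distortion into the spatial correlation structure.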

Geospatial Modelling (2/6)

Geospatial Modelling (3/6)
#### Start with a basic linear model
model1 <- gls(PropGreenVote ~ PropTertiaryEd,data=ThemeData2,na.action=na.omit);
summary(model1);
plot(model1);
plot(Variogram(model1, form=~x+y)); # Note the correlation structure

Geospatial Modelling (4/6)

Geospatial Modelling (5/6)
#### Now try some gls models with spatial correlation structures
model2 <- gls(PropGreenVote ~ PropTertiaryEd,data=ThemeData2,corr=corExp(form=~x+y),na.action=na.omit);
summary(model2);
plot(model2);
plot(Variogram(model2, form=~x+y));
model3 <- gls(PropGreenVote ~ PropTertiaryEd,data=ThemeData2,corr=corGaus(form=~x+y),na.action=na.omit);
summary(model3);
plot(model3);
plot(Variogram(model3, form=~x+y));
model4 <- gls(PropGreenVote ~ PropTertiaryEd,data=ThemeData2,corr=corSpher(form=~x+y),na.action=na.omit);
summary(model4);
plot(model4);
plot(Variogram(model4, form=~x+y));
#### Compare the models using AIC
AIC(model1,model2,model3,model4); # Looks like adding the correlation structure gave no benefit
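The same comparison can be rehearsed without the census and election files. The following is a sketch on simulated data: the coordinates, the predictor ed, the response green and all parameter values are invented, and the errors are deliberately generated with an exponential spatial correlation so that the corExp model has something to find. It is not a reproduction of the talk's results.

```r
library(nlme)

set.seed(42)
n <- 60

# Hypothetical locations and predictor
sim <- data.frame(x = runif(n, 0, 10),
                  y = runif(n, 0, 10),
                  ed = runif(n, 0.2, 0.6))

# Errors with exponential spatial correlation (assumed range 2, sd 0.02)
d <- as.matrix(dist(sim[, c("x", "y")]))
S <- 0.02^2 * exp(-d / 2)
err <- as.vector(t(chol(S)) %*% rnorm(n))
sim$green <- 0.02 + 0.2 * sim$ed + err

# Independent-error model versus exponential spatial correlation
m_ind <- gls(green ~ ed, data = sim)
m_exp <- gls(green ~ ed, data = sim, correlation = corExp(form = ~ x + y))

# Lower AIC indicates the better trade-off between fit and complexity
aic_tab <- AIC(m_ind, m_exp)
print(aic_tab)
```

Both models are fitted by REML (the gls default) with the same fixed effects, which is the setting in which comparing correlation structures by AIC is usually considered reasonable.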

Geospatial Modelling (6/6)

Nice Looking Map (1/2)
## Finally, let's put together a good looking map.
Roads <- readOGR("C:\\Documents and Settings\\marlada\\My Documents\\AQUA Internal\\Thought Leadership\\SURF Geospatial Analysis Presentation\\Geographies\\roads","aus25vgd_l");
SED.NSW.coords <- coordinates(SED.NSW);
sydrows <- (SED.NSW.coords[,1] > 150.5) & (SED.NSW.coords[,1] < …) & (SED.NSW.coords[,2] > -34.3) & (SED.NSW.coords[,2] < -33.4);
SED.SYD <- SED.NSW[sydrows,];
sydgrid <- gridlines(SED.SYD,easts=seq(150.4,151.6,by=0.1),norths=seq(-34.3,-33.4,by=0.1));
sydgridat <- gridat(SED.SYD,easts=seq(150.4,151.6,by=0.1),norths=seq(-34.3,-33.4,by=0.1));
pdf("FinalMap.pdf");
bands <- length(levels(ThemeData$PropTertiaryEdFact));
pal <- heat.colors(bands);
plot(SED.NSW,col=pal[ThemeData$PropTertiaryEdFact[match(SED.NSW$NAME_2006,ThemeData$SED)]],xlim=c(150.6,151.4),ylim=c(-34.5,-33.4))
lines(Roads,col="black",xlim=c(150.6,151.4),ylim=c(-34.5,-33.4));
legend("bottomright", legend=levels(ThemeData$PropTertiaryEdFact), fill=pal, title="Prop. with Tertiary Ed.",inset=0.01,bty="n",bg="white")
title(c("Proportion of People with Tertiary Education","by Sydney State Electoral Divisions"),sub="Data from 2006 Census")
dev.off();

Nice Looking Map (2/2)

Example Data Sources
- Census geographies: me/Geography?opendocument#from-banner=LN
- Census results (CDATA Online)
- NSW State Electoral Results
- Geoscience Australia, Topographic Maps

QUESTIONS?