I’ve found the data; it’s free and open access. Now what? Gilberto Câmara, National Institute for Space Research (INPE), Brazil
Geospatial data catalogue Source: [Bai and Di, 2011]
The hard-wired map metaphor Cantino planisphere (1502)
Map metaphors live in GIS: geospatial database, desktop GIS, web service
Birds do it… bees do it… even educated fleas do it… Let’s do it…
Ecological niche modelling: environmental data (temperature, precipitation), algorithm, distribution model, distribution map
openModeller: species info (specimens); environmental data (precipitation, soil, temperature); modelling algorithms (Bioclim, Neural Networks, GARP)
Natural disasters
Risk Analyses Analysis
On-line data feed
DCP: rain total; fixed time and irregular – alert; point data; one file per DCP
Satellite/Radar: grid 4 km; total rain 1h; total rain 24h; current (mm/h); binary file
Models: ETA 40, 20, 5 km; ensemble 40 km; total rain 72h; 72 files; ASCII grid file
Natural Disasters Monitoring and Alert System
Deforestation in Amazonia. Map legend: up to 10%, 10 – 20%, 20 – 30%, 30 – 40%, 40 – 50%, 50 – 60%, 60 – 70%, 70 – 80%, 80 – 90%, 90 – 100%. Amazonia ( km2 = size of Europe)
Real-time Deforestation Monitoring: daily warnings of newly deforested large areas
How much does it take to survey Amazonia? Tb of data, lines of code, 150 man-years of software development, 200 man-years of interpreters
Data Access Hitting a Wall. Current science practice is based on data download. How do you download a petabyte? You don’t! Move the software to the archive.
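A back-of-the-envelope calculation makes the "you don't" concrete. The sketch below assumes a decimal petabyte and a link that sustains its full nominal speed; the three link speeds are hypothetical values chosen only for illustration, not figures from the talk.

```python
def transfer_days(size_bytes: float, link_bits_per_second: float) -> float:
    """Days needed to move size_bytes over a link running at full nominal speed."""
    seconds = size_bytes * 8 / link_bits_per_second
    return seconds / 86_400  # seconds per day


PETABYTE = 1e15  # decimal petabyte, in bytes

# Hypothetical link speeds (assumptions, not taken from the slides).
LINKS = [("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9), ("10 Gbit/s", 10e9)]

for label, speed in LINKS:
    print(f"{label}: {transfer_days(PETABYTE, speed):.1f} days")
```

Under these assumptions a sustained 1 Gbit/s link needs roughly three months for a single petabyte, before any protocol overhead or contention.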
Virtual Observatory: “If data is online, the internet is the world’s best telescope” (Jim Gray)
How many clouds do we need?
What happened here in the last 10 years? (image label: sugarcane; source: INPE)
Are biofuels replacing food production in Brazil?
3 Tb of data behind this!
How much processing should be in the cloud? Standard API? WPS?
Could this analysis be done in the cloud? (image label: sugarcane; source: INPE)
Data chain in Earth System Science (source: NASA)
Getting to the data requires solving the spatial semantics problem. Tentative solutions: catalogues, metadata, SDIs, ontologies, web services, semantic reference systems, linked open data, ... (source: USGS)
Communicating location is easy: deforestation hotspots in Amazonia
Communicating about data is feasible. Weather: 11,000 land stations (3,000 automated), 900 radiosondes, 3,000 aircraft, 6,000 ships, 1,300 buoys, 5 polar and 6 geostationary satellites (source: WMO)
Communicating concepts is hard: vulnerability? climate change? poverty? (image source: WMO)
Communicating concepts is hard. We’re bad at representing meaning: deforestation? degradation? disturbance?
When did the Aral Sea reach the tipping point? Communicating change is very hard
Objects exist, events occur (Mount Etna 2002 eruption)
Observations allow us to get the measure of external reality
WMO’s global observing system
WMO GRIB: simple and clean
Code  Parameter                 Units
052   Relative humidity         %
053   Humidity mixing ratio     kg/kg
054   Precipitable water        kg/m2
055   Vapour pressure           Pa
056   Saturation deficit        Pa
057   Evaporation               kg/m2
058   Cloud Ice                 kg/m2
059   Precipitation rate        kg/m2/s
060   Thunderstorm probability  %
061   Total precipitation       kg/m2
076   Cloud water               kg/m2
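Part of what makes such a table "simple and clean" is that it maps directly onto a lookup structure. The sketch below copies only the eleven entries listed on the slide; the names `GRIB_PARAMETERS` and `describe` are hypothetical and do not belong to any GRIB library.

```python
from typing import Dict, Tuple

# Entries copied from the slide: code -> (parameter, units).
GRIB_PARAMETERS: Dict[int, Tuple[str, str]] = {
    52: ("Relative humidity", "%"),
    53: ("Humidity mixing ratio", "kg/kg"),
    54: ("Precipitable water", "kg/m2"),
    55: ("Vapour pressure", "Pa"),
    56: ("Saturation deficit", "Pa"),
    57: ("Evaporation", "kg/m2"),
    58: ("Cloud Ice", "kg/m2"),
    59: ("Precipitation rate", "kg/m2/s"),
    60: ("Thunderstorm probability", "%"),
    61: ("Total precipitation", "kg/m2"),
    76: ("Cloud water", "kg/m2"),
}


def describe(code: int) -> str:
    """Human-readable label for a parameter code; raises KeyError if unknown."""
    name, units = GRIB_PARAMETERS[code]
    return f"{code:03d} {name} [{units}]"


print(describe(61))
```

An observation tagged with a code carries its meaning and units with it, which is why communicating about data of this kind is feasible.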
When did the large flood occur in Angra?
When did the large flood occur in Angra? When precipitation was > 10 mm/hour for 5 hours. Coverage set (hourly precipitation grid); cover change set (precipitation > 10 mm/hour)
When did the large flood occur in Angra?
CoverageSet p1 (“Precipitation”)
CoverChangeSet s1 = extract (p1 > 10, time1, time2)
TimeSeries t1 = intersect (s1, geom (“Angra”))
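The flood query can be mimicked in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the engine the talk proposes: a coverage set is modelled as a dictionary from hour to a small grid, the Angra geometry as a set of grid cells, and the function names `exceeds`, `intersect` and `first_run` as well as the rainfall values are invented for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

Grid = List[List[float]]  # one precipitation grid, in mm/hour
Cell = Tuple[int, int]    # (row, column)


def exceeds(coverage: Dict[int, Grid], threshold: float) -> Dict[int, Set[Cell]]:
    """Cover change set: for each hour, the cells whose value is above threshold."""
    return {
        hour: {(r, c)
               for r, row in enumerate(grid)
               for c, value in enumerate(row)
               if value > threshold}
        for hour, grid in coverage.items()
    }


def intersect(change: Dict[int, Set[Cell]], region: Set[Cell]) -> Dict[int, bool]:
    """Time series: True for hours in which some cell of region is above threshold."""
    return {hour: bool(cells & region) for hour, cells in change.items()}


def first_run(series: Dict[int, bool], length: int) -> Optional[int]:
    """Start hour of the first run of `length` consecutive True hours, or None."""
    run_start, run_len, previous = None, 0, None
    for hour in sorted(series):
        if series[hour] and run_len > 0 and hour == previous + 1:
            run_len += 1                    # the current run continues
        elif series[hour]:
            run_start, run_len = hour, 1    # a new run starts here
        else:
            run_len = 0                     # the run is broken
        if run_len >= length:
            return run_start
        previous = hour
    return None


# Made-up hourly rainfall (mm/hour) for the single cell standing in for Angra.
rain = [2.0, 12.0, 15.0, 11.0, 13.0, 14.0, 3.0, 1.0]
p1 = {hour: [[value, 0.0], [0.0, 0.0]] for hour, value in enumerate(rain)}
angra = {(0, 0)}

s1 = exceeds(p1, 10.0)
t1 = intersect(s1, angra)
print(first_run(t1, 5))
```

The point of the design is that the flood itself is never stored: only observations are, and the event is derived from them on demand.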
How many walruses reached Baffin Island?
How many walruses reached Baffin Island? Those whose trajectories touched Baffin Island. Moving objects; trajectories
How many walruses reached Baffin Island?
MovingObjectSet m1 (“walruses”)
Trajectories t1 = extract (m1, time1, time2)
Trajectories t2 = reach (t1, geom (“Baffin”))
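The walrus query has the same shape and can be sketched the same way. Everything below is an assumption made for illustration: the `Fix` and `Box` types, the rectangle standing in for the Baffin geometry, and the three invented tracks. "Touching" is approximated by having at least one observed position inside the rectangle.

```python
from typing import Dict, List, NamedTuple


class Fix(NamedTuple):
    """One observed position of a tagged animal."""
    time: int
    lon: float
    lat: float


class Box(NamedTuple):
    """Axis-aligned region in lon/lat degrees (a stand-in for a real geometry)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, fix: Fix) -> bool:
        return (self.min_lon <= fix.lon <= self.max_lon
                and self.min_lat <= fix.lat <= self.max_lat)


Trajectories = Dict[str, List[Fix]]


def extract(moving_objects: Trajectories, time1: int, time2: int) -> Trajectories:
    """Keep, per animal, only the fixes observed in [time1, time2]."""
    clipped = {
        name: [f for f in fixes if time1 <= f.time <= time2]
        for name, fixes in moving_objects.items()
    }
    return {name: fixes for name, fixes in clipped.items() if fixes}


def reach(trajectories: Trajectories, region: Box) -> Trajectories:
    """Trajectories with at least one fix inside region."""
    return {name: fixes for name, fixes in trajectories.items()
            if any(region.contains(f) for f in fixes)}


# Invented tracks and a made-up rectangle; neither is real data.
walruses: Trajectories = {
    "w1": [Fix(1, -55.0, 60.0), Fix(2, -65.0, 65.0)],
    "w2": [Fix(1, -50.0, 58.0), Fix(2, -52.0, 59.0)],
    "w3": [Fix(1, -40.0, 60.0), Fix(5, -70.0, 70.0)],
}
region = Box(-80.0, 62.0, -61.0, 74.0)

t1 = extract(walruses, 1, 3)
t2 = reach(t1, region)
print(sorted(t2))
```

Changing the time window changes the answer without touching the stored trajectories: `w3` enters the rectangle only at time 5, so it counts over the whole record but not for the window 1 to 3.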
When was this area converted from food to biofuel production? When the vegetation index peaked once a year. Coverage set (remote sensing images) Time Series (vegetation index)
When was this area converted from food to biofuel production?
CoverageSet c1 (“Cerrado”)
TimeSeries ts1 = extract (c1, “VegIndex”)
for year = y1, yn do
  time1 = year*
  time2 = time
  TimeSeries t2 = onepeak (ts1, time1, time2)
Time t1 = first (t2)
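The "peaked once a year" criterion can also be sketched directly. The code below is one possible reading of the `onepeak` idea, not its definition: it counts strict local maxima above a height threshold in each year's vegetation-index series and reports the first year with exactly one. The threshold, the function names and the three invented yearly series are all assumptions.

```python
from typing import Dict, List, Optional


def count_peaks(values: List[float], min_height: float) -> int:
    """Number of strict local maxima whose value is at least min_height."""
    return sum(
        1
        for i in range(1, len(values) - 1)
        if values[i] >= min_height and values[i - 1] < values[i] > values[i + 1]
    )


def first_single_peak_year(series_by_year: Dict[int, List[float]],
                           min_height: float = 0.6) -> Optional[int]:
    """First year whose series has exactly one qualifying peak, or None."""
    for year in sorted(series_by_year):
        if count_peaks(series_by_year[year], min_height) == 1:
            return year
    return None


# Invented monthly vegetation-index values for three consecutive years.
veg_index = {
    1: [0.2, 0.5, 0.8, 0.4, 0.3, 0.6, 0.9, 0.5, 0.3, 0.2, 0.2, 0.2],  # two peaks
    2: [0.2, 0.4, 0.7, 0.4, 0.3, 0.5, 0.8, 0.4, 0.3, 0.2, 0.2, 0.2],  # two peaks
    3: [0.2, 0.3, 0.4, 0.5, 0.7, 0.8, 0.9, 0.8, 0.6, 0.4, 0.3, 0.2],  # one peak
}

print(first_single_peak_year(veg_index))
```

Real vegetation-index series are noisy, so a practical version would smooth the series before counting peaks; that step is left out here to keep the sketch short.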
A new kind of geospatial analysis engine?
TerraLib: spatio-temporal database as a basis for innovation. Spatio-temporal database (TerraLib); visualization (TerraView); modelling (TerraME); data mining (GeoDMA); statistics (aRT)
Conclusions: We need a new generation of GI appliances. Connect data brokering, sources, analysis. We need many clouds with remote processing. Describe observations, not events. Allow users to process the data.