
Modeling change from large-scale high-dimensional spatio-temporal array data

Meng Lu and Edzer Pebesma
Institute for Geoinformatics, University of Muenster, Germany
Contact: Meng Lu, meng.lu@uni-muenster.de

Introduction
The massive volume of data coming from earth observation satellites and other sensors provides significant information for modeling global change. At the same time, the high dimensionality and large size of these data bring challenges in data acquisition, management, effective querying and processing. In addition, the output of earth system models tends to be data-intensive and needs methods for storage, validation, analysis and visualization (e.g. as maps). A large proportion of earth system observations and simulated data can be represented as multi-dimensional arrays, which have received increasing attention in big data management and spatio-temporal analysis. Array-based data management and analysis therefore bring opportunities for modeling change with large-scale, high-dimensional data.

Examples of Array Data
1-D: time series
2-D: satellite images
3-D: image time series
3-D: sediment/nutrient in flow
4-D: hyperspectral remote sensing time series
(x, y denote spatial coordinates, t denotes time and z denotes height.)

Time series analysis
Goal: spatio-temporal change detection and quantification from multi-dimensional array data.
Time-domain and frequency-domain approaches, applied along t of a multi-dimensional array, address trend analysis, pattern identification, seasonal variation, periodic patterns, other cyclic variation and forecasting.
[Figure: change detection in an NDVI time series. Source: Verbesselt et al. (2009)]
[Figure: analysis in the frequency domain: spectral density estimation]
[Figure: spatio-temporal analysis]

Research Questions
How to model spatio-temporal change?
• How to reduce dimensions spatially, temporally or thematically?
How to analyze array data?
• How to extend existing GIS functions to work on multidimensional arrays?
• How to combine data sets of different dimensionality or different resolutions?
• Can map algebra be extended to an intelligible array algebra?
• In what sense are space and time special as dimensions, compared to other properties?

Dimension Reduction: Practical PCA (Principal Component Analysis) Work
Study area: a rainfall gage network in a semi-arid watershed of ca. 150 km² (Walnut Gulch Experimental Watershed, Arizona, US).
Data: a 2-dimensional array of summer average rainfall amounts for 83 selected gages, from 1967 to 1999:

Gage ID | 1967 | 1968 | 1969 | …
1       | 245  | 286  | 256  | …
2       | 223  | 271  | 268  | …
3       | 234  | 264  | 253  | …
…

[Figure: map of the rainfall gage network, Walnut Gulch Experimental Watershed]

Spatial rainfall variability
1) What is the spatial variability of the long-term rainfall?
PCA with gages as variables and time (years) as records.
[Figure: spatial distribution of PC1 loadings (top), PC2 loadings (middle) and PC3 loadings (bottom)]
[Figure: variance explained by each PC]

PC                     | 1st | 2nd | 3rd | 4th  | Others
Proportion of variance | 80% | 6%  | 3%  | 2.3% | 8.7%

Interpretation: each circle represents the PC loading of one gage. The size of the circle indicates the magnitude of the loading (how much that gage contributes to the variance); blue and black circles indicate negative and positive loadings, respectively. The first PC explains more than 80% of the total variance. The rather uniform distribution of PC1 loadings suggests that the spatial variability of climate across the watershed is small. PC2 accounts for only 6% of the total variance but is more interesting: its loadings form a pattern that varies from west to east, which could be interpreted as weather variability. PC3 also shows a spatial weather pattern, likely reflecting variability from south to north. Since PC3 explains less variance than PC2, the west-east signal dominates the weather-related spatial rainfall variability.

Temporal rainfall variability
2) What is the temporal variability of rainfall within the whole watershed?
PCA with time (years) as variables and gages as records.
[Figure: loadings of PC1 to PC4 per year (top) and predictions (scores) per gage using the loadings of PC1 to PC4 (bottom)]

Interpretation: the loadings of PC1 to PC4 represent how much each year contributes to each PC. Note that when gages are used as variables (question 1 above), the first PC represents the mean rainfall of the period. Here the data have been centered, which removes the long-term climate information, so the weather variability at each gage can be observed in each of the PCs. The bottom plot shows the predictions (scores) for each record, from which the variability of the rainfall received by each gage can be observed.

Tools: Database Management Systems

Feature | R | SciDB | rasdaman
General features | Programming language for statistics and graphics | Array-based database management and analytics system | Array-based database management system
License | Open source | Open source | Partially open source
Scalability | Difficult to scale | Scalable | Scalable (commercial version)
Query language | (not listed) | AQL (Array Query Language) and AFL (Array Functional Language), a mixture of SQL syntax and trees of algebraic operators | RaSQL (Raster SQL), based on SQL-92 and implementing an array algebra
Storage | Main memory | Chunks map onto disk blocks | Tiles stored as BLOBs (binary large objects) in an RDBMS
Interaction with R/Python | (not listed) | R and Python interfaces | No R or Python interface

Database management system + data analytics system: the way to go?
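The two PCA set-ups in the Walnut Gulch example differ only in which axis of the gage-by-year array is treated as "variable". The sketch below illustrates this with NumPy, assuming a synthetic array (randomly generated, not the poster's measurements) and hypothetical names `pca`, `rain` and `west_east`. The poster does not state which software or centering/scaling options were used, so this sketch simply centers each variable and computes the components with an SVD.

```python
import numpy as np

def pca(records_by_variables):
    """PCA via SVD. Rows are records, columns are variables.
    Returns (loadings, scores, explained_variance_ratio)."""
    X = np.asarray(records_by_variables, dtype=float)
    Xc = X - X.mean(axis=0)                 # center each variable (column)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt.T                         # column k holds the weights of PC k+1
    scores = Xc @ loadings                  # projection of each record onto the PCs
    var = s ** 2
    return loadings, scores, var / var.sum()

# Hypothetical stand-in for 83 gages x 33 summers (1967-1999); NOT real data.
rng = np.random.default_rng(0)
n_gages, n_years = 83, 33
common = rng.normal(250, 30, size=n_years)       # watershed-wide year-to-year signal
west_east = np.linspace(-1, 1, n_gages)          # hypothetical gage position, west to east
gradient = rng.normal(0, 8, size=n_years)        # weaker west-east weather signal
rain = (common[None, :]
        + west_east[:, None] * gradient[None, :]
        + rng.normal(0, 5, size=(n_gages, n_years)))

# Set-up 1: gages as variables, years as records (rows = years)
load_s, score_s, evr_s = pca(rain.T)
# Set-up 2: years as variables, gages as records (rows = gages)
load_t, score_t, evr_t = pca(rain)

print("gages as variables, PC1-PC4:", np.round(evr_s[:4], 3))
print("years as variables, PC1-PC4:", np.round(evr_t[:4], 3))
```

In set-up 2 every year is centered across gages, so the synthetic watershed-wide signal `common` drops out and mainly the west-east gradient and noise remain.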
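For the frequency-domain branch of the time series analysis (spectral density estimation), the raw periodogram is the simplest estimator. The sketch below assumes NumPy's FFT, evenly spaced observations and a synthetic monthly NDVI-like series; the function name `periodogram` and all series parameters are hypothetical. Tapered or smoothed estimators are common refinements that are not shown here.

```python
import numpy as np

def periodogram(x, dt=1.0):
    """Raw periodogram |X(f)|^2 * dt / n of an evenly spaced 1-D series
    (no tapering, smoothing or one-sided doubling)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                        # remove the mean (zero-frequency term)
    n = x.size
    spec = np.abs(np.fft.rfft(x)) ** 2 * dt / n
    freqs = np.fft.rfftfreq(n, d=dt)        # cycles per time unit
    return freqs, spec

# Hypothetical monthly series: annual cycle + small linear trend + noise
rng = np.random.default_rng(1)
t = np.arange(240)                          # 20 years of monthly values
x = 0.5 + 0.2 * np.sin(2 * np.pi * t / 12) + 0.0005 * t + rng.normal(0, 0.03, t.size)

freqs, spec = periodogram(x)
peak = np.argmax(spec[1:]) + 1              # skip frequency 0
dominant_period = 1 / freqs[peak]           # in months
print(f"dominant period: {dominant_period:.1f} months")
```

Because 240 months contain exactly 20 annual cycles, the seasonal signal falls on a single frequency bin, which makes the peak easy to read off.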
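One possible reading of "extending map algebra to an array algebra" is to apply a local operation along a chosen axis of an (x, y, t) array for every cell at once. As a hedged illustration of per-cell trend analysis on an image time series, the sketch computes an ordinary least-squares slope along t with NumPy broadcasting. The function `per_cell_trend` and the toy 4 x 5 x 10 cube are hypothetical and are not part of the poster, SciDB or rasdaman.

```python
import numpy as np

def per_cell_trend(cube, t):
    """Least-squares slope along the last (time) axis for every (x, y) cell."""
    cube = np.asarray(cube, dtype=float)
    t = np.asarray(t, dtype=float)
    tc = t - t.mean()                                   # centered time axis
    yc = cube - cube.mean(axis=-1, keepdims=True)       # centered values per cell
    return (yc * tc).sum(axis=-1) / (tc ** 2).sum()     # cov(y, t) / var(t)

# Hypothetical noise-free cube: each cell has its own linear trend over 10 steps
t = np.arange(10)
true_slope = np.arange(20, dtype=float).reshape(4, 5) * 0.01
cube = 0.3 + true_slope[..., None] * t                  # shape (4, 5, 10): x, y, t
slopes = per_cell_trend(cube, t)
print(slopes.shape)
```

The result is again a 2-D (x, y) array, i.e. the time dimension is reduced to one number per cell, much as a local map-algebra operation turns input grids into an output grid.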