1 Peter Fox GIS for Science ERTH 4750 (98271) Week 9, Tuesday, March 27, 2012 Using uncertainties, analysis and use of discrete entities.

Contents
– Using uncertainties
  – Regression
– Projects

Using uncertainties in regressions
The regressions we've done so far using Excel did not take into account the uncertainties in the measurements. Uncertainties in the data affect both the regression parameters and their uncertainties. First, we discuss the general method of curve fitting, then how we can include the data uncertainties, and finally how it is done in Excel.

Define…
– N: the number of observed values
– K: the number of unknowns (coefficients to be estimated)
– $O$: the matrix containing the known y's (aka the observations); 1 column by N rows
– $M$: the matrix containing the known x's; K columns by N rows
– $M^T$: the transpose of $M$ (rows and columns swapped); N columns by K rows
– $P$: the matrix containing the unknowns; 1 column by K rows

Equations
The linear equations can be written as $O = M P$. The goal is to estimate the unknowns in matrix $P$. If you have a set of equations where N = K (number of equations = number of unknowns) and $M$ is non-singular, the solution is $P = M^{-1} O$, where the superscript $-1$ indicates the matrix inverse.
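As a minimal sketch of the N = K case in Python (the two data points are made up for illustration), two points determine a straight line y = a + b x exactly, so with K = 2 unknowns the solution $P = M^{-1} O$ can be computed with the explicit 2 × 2 inverse. The helper names `inverse_2x2` and `mat_vec` are hypothetical.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix [[p, q], [r, s]]; raises if it is singular."""
    (p, q), (r, s) = m
    det = p * s - q * r
    if det == 0:
        raise ValueError("matrix is singular; no unique solution")
    return [[s / det, -q / det], [-r / det, p / det]]


def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


# Hypothetical observations: the points (1, 3) and (2, 5).
xs = [1.0, 2.0]
O = [3.0, 5.0]                   # known y's, N = 2
M = [[1.0, x] for x in xs]       # columns: intercept term and x, K = 2
P = mat_vec(inverse_2x2(M), O)   # P = M^-1 O, i.e. [a, b]
print(P)                         # [1.0, 2.0]
```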

N > K
When we have more data than unknowns (N > K) and the data are subject to errors, we use the least-squares procedure. It has the solution
$P = (M^T M)^{-1} M^T O$
The uncertainties in the estimated unknowns are contained in the matrix $(M^T M)^{-1}$; multiplied by the variance $\sigma^2$ of the observations, it is the parameter covariance matrix.
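For a straight line y = a + b x, each row of $M$ is [1, x], so $M^T M$ and $M^T O$ reduce to simple sums. The sketch below (hypothetical function name and data) applies $P = (M^T M)^{-1} M^T O$ directly and, as one common assumption when the data variance is unknown, estimates $\sigma^2$ from the residuals with N − K degrees of freedom to scale $(M^T M)^{-1}$ into a covariance matrix.

```python
def fit_line(xs, ys):
    """Unweighted least-squares fit of y = a + b*x, i.e. P = (M^T M)^-1 M^T O.

    Returns (P, cov) where cov = s^2 (M^T M)^-1 and s^2 is the residual
    variance, used here as an estimate of the unknown data variance.
    """
    n = len(xs)
    sx, sxx = sum(xs), sum(x * x for x in xs)
    sy, sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx                                  # determinant of M^T M
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]     # (M^T M)^-1
    a = inv[0][0] * sy + inv[0][1] * sxy                     # P = (M^T M)^-1 (M^T O)
    b = inv[1][0] * sy + inv[1][1] * sxy
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    s2 = sum(r * r for r in resid) / (n - 2)                 # N - K degrees of freedom
    cov = [[s2 * c for c in row] for row in inv]
    return [a, b], cov


# Hypothetical measurements.
P, cov = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.1, 4.9, 7.0])
print("a, b =", P)
print("standard errors =", [cov[0][0] ** 0.5, cov[1][1] ** 0.5])
```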

In Excel…
Excel can perform all the matrix operations needed, so we can do the regression longhand. The functions are:
– TRANSPOSE(data range): transpose of a matrix
– MMULT(matrix1, matrix2): multiply two matrices together
– MINVERSE(data range): inverse of a matrix
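In a spreadsheet, with M and O as the cell ranges holding the x's and y's, the long-hand solution nests these functions as =MMULT(MINVERSE(MMULT(TRANSPOSE(M),M)),MMULT(TRANSPOSE(M),O)); older versions of Excel require this to be entered as an array formula. To make each step concrete, here is a rough Python stand-in for the three functions (hypothetical helper names; MINVERSE is mimicked with Gauss-Jordan elimination, which is not necessarily what Excel uses internally), chained the same way to fit a quadratic (K = 3) to made-up data.

```python
def transpose(a):
    """Like Excel TRANSPOSE: swap rows and columns."""
    return [list(col) for col in zip(*a)]


def mmult(a, b):
    """Like Excel MMULT: product of an n x k and a k x m matrix."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def minverse(a):
    """Like Excel MINVERSE: inverse via Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]                 # augmented [A | I]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]          # scale pivot row to 1
        for r in range(n):
            if r != col:                               # eliminate column elsewhere
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]                    # right half is A^-1


# Hypothetical data on y = 1 + 2x + 3x^2 (N = 5 points, K = 3 unknowns).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
O = [[1 + 2 * x + 3 * x * x] for x in xs]              # N x 1
M = [[1.0, x, x * x] for x in xs]                      # N x K
Mt = transpose(M)
P = mmult(minverse(mmult(Mt, M)), mmult(Mt, O))        # K x 1
print([round(p[0], 6) for p in P])                     # approximately [1.0, 2.0, 3.0]
```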

Example
We see that the calculations match the results of the Excel trend-line function. Note that this regression implicitly weights all data equally.

Including uncertainties
If we want to include uncertainties, we define two new matrices:
– $C$: the covariance matrix containing the uncertainties (variances) in the observations; N columns by N rows
– $W$: the weight matrix, which is the inverse of the covariance matrix ($W = C^{-1}$); N columns by N rows
The least-squares solution becomes
$P = (M^T W M)^{-1} M^T W O$
and the parameter covariance matrix is then $(M^T W M)^{-1}$.
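A minimal weighted version for a straight line, assuming uncorrelated observations so that $C$ is diagonal with entries $\sigma_i^2$ and $W$ has entries $1/\sigma_i^2$ (the function name and data are hypothetical). The standard errors are the square roots of the diagonal of $(M^T W M)^{-1}$.

```python
import math


def weighted_fit_line(xs, ys, sigmas):
    """Weighted least squares for y = a + b*x: P = (M^T W M)^-1 M^T W O.

    Assumes uncorrelated observations, so C = diag(sigma_i^2) and
    W = C^-1 = diag(1 / sigma_i^2). Returns (P, standard errors), where the
    standard errors are the square roots of the diagonal of (M^T W M)^-1.
    """
    w = [1.0 / (s * s) for s in sigmas]
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, xs))
    swxx = sum(wi * x * x for wi, x in zip(w, xs))
    swy = sum(wi * y for wi, y in zip(w, ys))
    swxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    det = sw * swxx - swx * swx
    cov = [[swxx / det, -swx / det], [-swx / det, sw / det]]  # (M^T W M)^-1
    a = cov[0][0] * swy + cov[0][1] * swxy
    b = cov[1][0] * swy + cov[1][1] * swxy
    return [a, b], [math.sqrt(cov[0][0]), math.sqrt(cov[1][1])]


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.2, 4.8, 7.1, 9.0]          # hypothetical measurements
P1, se1 = weighted_fit_line(xs, ys, [0.5] * 5)
P2, se2 = weighted_fit_line(xs, ys, [1.0] * 5)
print(P1, se1)
print(P2, se2)                          # same P, standard errors twice as large
```

With equal uncertainties the estimates are the same as the unweighted fit, but the standard errors scale with the assumed $\sigma$, so even equal weighting matters for the reported uncertainties.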

Adjusted example
The final data point has little influence on the solution when its uncertainty is increased by a factor of 3. Weighting, even equal weighting, also influences the standard errors in the estimated unknowns.

Unequal weighting
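The effect of inflating one uncertainty can be checked numerically. In this sketch (hypothetical data whose last point lies off an otherwise exact line y = 1 + 2x), multiplying the last point's $\sigma$ by 3 divides its weight by 9, and the fitted slope moves most of the way back toward the slope of the remaining points.

```python
def weighted_line(xs, ys, sigmas):
    """Weighted least-squares intercept and slope, with w_i = 1 / sigma_i^2."""
    w = [1.0 / (s * s) for s in sigmas]
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, xs))
    swxx = sum(wi * x * x for wi, x in zip(w, xs))
    swy = sum(wi * y for wi, y in zip(w, ys))
    swxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    det = sw * swxx - swx * swx
    a = (swxx * swy - swx * swxy) / det
    b = (sw * swxy - swx * swy) / det
    return a, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 12.0]   # hypothetical: last point lies off the trend

_, b_equal = weighted_line(xs, ys, [1.0, 1.0, 1.0, 1.0, 1.0])
_, b_inflated = weighted_line(xs, ys, [1.0, 1.0, 1.0, 1.0, 3.0])  # last sigma x 3
_, b_without = weighted_line(xs[:-1], ys[:-1], [1.0] * 4)

print(b_equal, b_inflated, b_without)  # about 2.6, 2.14, 2.0
```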

Discrete entities (GIS)
– Entity
  – Has attributes, possibly derived from other attributes
  – Has location
– Proximity / connectivity (topology)
– New attributes U = f(A1, A2, …)
The function f can be logical (Boolean algebra, True or False), arithmetic, statistical, etc.
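A small sketch of deriving a new attribute U = f(A1, A2, …) for each entity. The entity records, attribute names and the density threshold are hypothetical; one f is arithmetic (population density) and one is logical (a True/False flag built from other attributes).

```python
# Hypothetical entities, each with a location and attributes A1, A2, ...
entities = [
    {"id": 1, "x": 0.0, "y": 0.0, "population": 1200, "area_km2": 4.0, "land_use": "urban"},
    {"id": 2, "x": 5.0, "y": 1.0, "population": 300, "area_km2": 10.0, "land_use": "rural"},
    {"id": 3, "x": 2.0, "y": 7.0, "population": 2500, "area_km2": 5.0, "land_use": "urban"},
]


def derive(entities, name, f):
    """Add a new attribute U = f(entity attributes) to every entity."""
    for e in entities:
        e[name] = f(e)


# Arithmetic f: population density from two existing attributes.
derive(entities, "density", lambda e: e["population"] / e["area_km2"])
# Logical (Boolean) f: True or False for each entity (250 is an assumed cut-off).
derive(entities, "dense_urban", lambda e: e["land_use"] == "urban" and e["density"] > 250)

for e in entities:
    print(e["id"], e["density"], e["dense_urban"])
```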

Boolean operations
Think of each attribute of the entities as a set. The query ‘select all where A1 = red’ is the same as evaluating the Boolean ‘A1 = red’ and selecting the True results. Or you could select ‘not (A1 = red)’, which gets everything except red. For multiple sets (attributes) we might want their intersection or union. This process can be used to reclassify your map, i.e., to cut down on the number of distinct attribute classes.

Examples
– U = (A1 = red) AND (A2 = blue) gives the intersection of the sets; both conditions are required to be true
– U = (A1 = red) OR (A2 = blue) gives the union of the sets; either condition can be true
– U = (A1 = red) XOR (A2 = blue) gives the set where A1 = red and A2 <> blue plus the set where A1 <> red and A2 = blue
The OR statement above would return the set where A1 = red and A2 = any color plus the set where A1 = any color and A2 = blue.
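Treating each condition as a set of entity ids makes these operators map directly onto Python's set operators (`&`, `|`, `^`, `-`). The entity ids and colors below are hypothetical.

```python
# Hypothetical entities with two categorical attributes A1 and A2.
A1 = {1: "red", 2: "red", 3: "green", 4: "green", 5: "blue"}
A2 = {1: "blue", 2: "yellow", 3: "blue", 4: "yellow", 5: "blue"}

# Each condition selects a set of entity ids.
red = {i for i, c in A1.items() if c == "red"}     # A1 = red
blue = {i for i, c in A2.items() if c == "blue"}   # A2 = blue
everything = set(A1)

print(sorted(red & blue))        # AND (intersection): [1]
print(sorted(red | blue))        # OR (union): [1, 2, 3, 5]
print(sorted(red ^ blue))        # XOR (exactly one condition true): [2, 3, 5]
print(sorted(everything - red))  # NOT (A1 = red): [3, 4, 5]
```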

Soil example
Set A: soil type = ‘Oregon loam’ (OL)
Set B: pH >= 7.0
– A and B = all OL soils with pH >= 7.0
– A or B = all OL soils plus all soil types with pH >= 7.0
– A xor B = all OL soils with pH < 7.0 plus all non-OL soils with pH >= 7.0
– A not B = all OL soils with pH < 7.0
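The same set algebra applied to the soil example, using a few hypothetical soil polygons. Note in particular that XOR keeps the members of exactly one set: acidic Oregon loam plus alkaline non-OL soils.

```python
# Hypothetical soil polygons: (id, soil type, pH)
soils = [
    (1, "Oregon loam", 7.4),
    (2, "Oregon loam", 6.5),
    (3, "clay", 7.2),
    (4, "clay", 5.9),
    (5, "Oregon loam", 7.0),
]

A = {i for i, kind, ph in soils if kind == "Oregon loam"}  # Set A
B = {i for i, kind, ph in soils if ph >= 7.0}             # Set B

print("A and B:", sorted(A & B))  # OL with pH >= 7.0 -> [1, 5]
print("A or B: ", sorted(A | B))  # OL, plus any soil with pH >= 7.0 -> [1, 2, 3, 5]
print("A xor B:", sorted(A ^ B))  # OL with pH < 7.0, plus non-OL with pH >= 7.0 -> [2, 3]
print("A not B:", sorted(A - B))  # OL with pH < 7.0 -> [2]
```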

Statistical operations to determine similar regions
Given a distribution of data points, we may want to collect them into a finite number of polygons where each polygon contains values within a specified range or with a similar statistical distribution. For example, your company has hired 5 ‘bill collectors’ and their methods of collection range from thumb-breaking to persistent whining. ;-)

You’d like to assign the collectors to different sales regions based on the history of compliance, and you want to give each collector a similar area to cover. Since you don’t want your thumb-breaker (e.g., Rocky) to deal with a large number of people who normally pay their bills, you want to assign him or her to the region with a large number of deadbeats and a small number of payers. So you want the mean rate of non-compliance to be high but the variance to be low.
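One way to encode "high mean, low variance" is sketched below. The region names, rates and the variance cut-off are hypothetical, and the selection rule (highest mean among the low-variance regions) is an assumption for illustration, not a method given in the lecture.

```python
from statistics import mean, pvariance

# Hypothetical non-compliance rates (fraction of accounts not paying)
# for the sub-areas that make up each candidate region.
regions = {
    "North": [0.62, 0.58, 0.65, 0.60],
    "South": [0.30, 0.95, 0.25, 0.99],
    "East": [0.10, 0.12, 0.08, 0.11],
    "West": [0.40, 0.45, 0.38, 0.42],
}

stats = {name: (mean(r), pvariance(r)) for name, r in regions.items()}
for name, (m, v) in stats.items():
    print(f"{name}: mean={m:.3f}, variance={v:.4f}")

# Assumed rule: among regions whose variance is below a cut-off,
# pick the one with the highest mean non-compliance.
MAX_VARIANCE = 0.01
candidates = [n for n, (m, v) in stats.items() if v <= MAX_VARIANCE]
rocky_region = max(candidates, key=lambda n: stats[n][0])
print("Assign Rocky to:", rocky_region)  # North; South has a higher mean but a large variance
```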

Or perhaps you are interested in biodiversity. You want to divide your field area into a finite number of regions based on the diversity of the flora and see which regions support the largest variety. But you may want to exclude plants that have only a few representatives in a given sector.
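A minimal sketch of ranking sectors by species richness while ignoring rarely represented plants. The sector names, counts and the `MIN_COUNT` threshold are hypothetical; richness (number of species) is only one of several possible diversity measures.

```python
# Hypothetical plant counts per species in each sector of a field area.
sectors = {
    "S1": {"oak": 40, "fern": 12, "moss": 2, "birch": 9},
    "S2": {"oak": 5, "fern": 3, "sedge": 30, "rush": 1, "aster": 8},
    "S3": {"grass": 120, "clover": 4},
}

MIN_COUNT = 3  # assumed threshold: ignore species with fewer representatives


def richness(counts, min_count=MIN_COUNT):
    """Number of species with at least min_count individuals in a sector."""
    return sum(1 for n in counts.values() if n >= min_count)


ranked = sorted(sectors, key=lambda s: richness(sectors[s]), reverse=True)
for s in ranked:
    print(s, richness(sectors[s]))
```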

Or, given that you already have natural polygons (counties, for example), you may want to group the entities according to similar statistical distributions of the attribute of interest. For example, for radon levels you could group the counties by similar mean counts or by similar variance in them. There are many statistical operations you could use, depending on your particular goal. GIS for science!
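As an illustration of grouping existing polygons by a statistic, the sketch below bins counties by their mean radon reading into fixed-width classes. The county names, readings and class width are hypothetical; grouping by variance would work the same way with `statistics.pvariance` in place of `mean`.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical radon readings (pCi/L) from homes in each county.
readings = {
    "County A": [1.0, 1.6, 1.3],
    "County B": [4.1, 3.9, 4.6],
    "County C": [1.1, 1.2, 0.8],
    "County D": [4.4, 5.0, 3.8],
    "County E": [2.6, 3.1, 2.9],
}

CLASS_WIDTH = 1.5  # assumed width of each mean-radon class

groups = defaultdict(list)
for county, values in readings.items():
    groups[int(mean(values) // CLASS_WIDTH)].append(county)  # class index

for k in sorted(groups):
    lo, hi = k * CLASS_WIDTH, (k + 1) * CLASS_WIDTH
    print(f"{lo:.1f}-{hi:.1f} pCi/L: {groups[k]}")
```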

Buffering
Buffering generally involves operations that depend only on distance or proximity between entities. From distance alone we can also derive other attributes and connectivity. Such topology can be a simple or a complex function of distance (e.g., population density). An example is the errors in the red-star sites and roads from last week.
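A basic buffer query depends only on distance: select every site within a given distance of a road. This sketch assumes planar coordinates in a single unit and a road stored as a polyline; the site names, coordinates and buffer distance are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b (2-D)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def buffer_select(points, road, distance):
    """Ids of points within `distance` of any segment of a polyline road."""
    segments = list(zip(road, road[1:]))
    return sorted(pid for pid, p in points.items()
                  if any(point_segment_distance(p, a, b) <= distance for a, b in segments))


# Hypothetical sites and road (same planar units, e.g. metres).
sites = {"s1": (1.0, 0.5), "s2": (4.0, 3.0), "s3": (9.0, 1.2), "s4": (5.0, 5.0)}
road = [(0.0, 0.0), (5.0, 0.0), (10.0, 2.0)]
print(buffer_select(sites, road, distance=1.0))
```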

Connectivity
Attributes relating to connectivity are generally in the database. For example, the time it takes to drive a particular segment of road should be an attribute. From such data, the time it takes to get from point A to point B can be calculated.
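Computing the time from A to B out of per-segment travel times is a shortest-path problem. The sketch below uses Dijkstra's algorithm (a standard choice, not one named in the lecture) on a small hypothetical road network whose segments are assumed to be two-way.

```python
import heapq


def fastest_route(edges, start, goal):
    """Dijkstra's algorithm on road segments with a travel-time attribute.

    edges: list of (node_a, node_b, minutes); segments are treated as two-way.
    Returns (total_minutes, [nodes along the route]).
    """
    graph = {}
    for a, b, t in edges:
        graph.setdefault(a, []).append((b, t))
        graph.setdefault(b, []).append((a, t))
    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        time, node, path = heapq.heappop(queue)
        if node == goal:
            return time, path
        if node in settled:
            continue                      # already reached by a faster route
        settled.add(node)
        for nxt, t in graph.get(node, []):
            if nxt not in settled:
                heapq.heappush(queue, (time + t, nxt, path + [nxt]))
    return float("inf"), []               # goal not reachable


# Hypothetical segments with travel times in minutes.
roads = [("A", "B", 4), ("B", "D", 5), ("A", "C", 2), ("C", "D", 8), ("C", "B", 1)]
print(fastest_route(roads, "A", "D"))     # (8.0, ['A', 'C', 'B', 'D'])
```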

Using connectivity
We can make inferences about connectivity from data not specifically related to connectivity. For example, we can assume that travel time along a road path is given by the distance divided by some nominal speed, plus a delay for each traffic light along the way, and so on. We could also make the assumed speed depend on whether the road is rural or urban, inferred from land-use data. In addition, time of day could be factored in. Connectivity data and GIS are now frequently used to guide emergency vehicles.
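The inferred travel time described above can be written as a small function. All of the numbers (speeds per land-use class, delay per light, rush-hour factor) are placeholder assumptions, not data from the lecture.

```python
# Assumed nominal speeds (km/h) by land-use class and delay per traffic light;
# these values are placeholders, not data from the lecture.
SPEED_KMH = {"urban": 40.0, "rural": 80.0}
LIGHT_DELAY_MIN = 0.5
RUSH_HOUR_FACTOR = 1.5  # hypothetical slowdown applied during rush hour


def segment_minutes(length_km, land_use, n_lights, rush_hour=False):
    """Inferred travel time: distance / speed + a delay for each traffic light."""
    minutes = 60.0 * length_km / SPEED_KMH[land_use] + n_lights * LIGHT_DELAY_MIN
    return minutes * RUSH_HOUR_FACTOR if rush_hour else minutes


print(segment_minutes(2.0, "urban", n_lights=3))                  # 4.5 minutes
print(segment_minutes(10.0, "rural", n_lights=0))                 # 7.5 minutes
print(segment_minutes(2.0, "urban", n_lights=3, rush_hour=True))  # 6.75 minutes
```

Times computed this way could serve as the segment attribute used by a shortest-path search.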

Contouring
Contours are lines of equal value of a surface field. They are easily calculated from gridded data by finding where the contours intersect the sides of each grid element, for example by linear interpolation between the values at the two ends of each side.
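The edge-intersection step can be sketched as follows: for every horizontal and vertical side of the grid whose two end values straddle the contour level, place a crossing point by linear interpolation. Linking these points into contour lines (as marching-squares style algorithms do) is not shown, and sides with an end value exactly equal to the level are skipped in this simplified version. The grid values and contour level are hypothetical.

```python
def edge_crossings(grid, level, dx=1.0, dy=1.0):
    """Points where the contour `level` crosses the sides of grid cells.

    grid[i][j] is the value at x = j*dx, y = i*dy. On each side whose end
    values straddle `level`, the crossing is found by linear interpolation.
    """
    pts = []
    rows, cols = len(grid), len(grid[0])
    for i in range(rows):
        for j in range(cols):
            z0 = grid[i][j]
            for di, dj in ((0, 1), (1, 0)):  # side to the right, side upward
                ni, nj = i + di, j + dj
                if ni >= rows or nj >= cols:
                    continue
                z1 = grid[ni][nj]
                if (z0 - level) * (z1 - level) < 0:  # strictly straddles the level
                    t = (level - z0) / (z1 - z0)
                    pts.append(((j + t * dj) * dx, (i + t * di) * dy))
    return pts


# Hypothetical 3 x 3 elevation grid (metres), 1-unit spacing.
elev = [[100, 110, 120],
        [105, 118, 130],
        [112, 125, 140]]
print(edge_crossings(elev, level=115))
```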

Topography

More elevations

New York elevation gridded data

For contours…
– Effectiveness of color versus lines?
– Plan view versus perspective?
– Colors?
It is important to browse your thematic map options before selecting one.

Summary
Topics for GIS (for Science):
– Including uncertainties
– Working with entity types to enhance your map
For learning purposes, remember:
– Demonstrate proficiency in using geospatial applications and tools (commercial and open-source).
– Verbally present relational analysis and interpretation of a variety of spatial data on maps.
– Demonstrate skill in applying database concepts to build and manipulate a spatial database, SQL, spatial queries, and integration of graphic and tabular data.
– Demonstrate intermediate knowledge of geospatial analysis methods and their applications.

Reading for this week
None… aren’t you lucky! Watch out for next week, though!

Next classes
– Note: Friday, March 30: open lab (no assignment; work on your projects, get help from Max); attendance will be taken
– Tuesday, April 3: Graphs, grouping, pie charts
– Friday, April 6: Lab: more statistics and maps (no assignment)