The Modelling Process Dr Andy Evans

This lecture
The modelling process:
Identify interesting patterns
Build a model of the elements you think interact and the processes / decide on variables
Verify the model
Optimise/Calibrate the model
Validate the model / visualisation
Sensitivity testing
Model exploration and prediction
Prediction validation

Preparing to model Verification Calibration/Optimisation Validation Sensitivity testing and dealing with error

Preparing to model
What questions do we want answered? Do we need something more open-ended?
Literature review: what do we know about fully? What do we know about in sufficient detail? What don't we know about (and does this matter)?
What can be simplified, for example by replacing a component with a single number or an AI?
Housing model example: the detail of how mortgage rates vary with the economy, vs. a time series of rates, vs. a single rate figure. It depends on what you want from the model.

Data review Outline the key elements of the system, and compare this with the data you need. What data do you need, what can you do without, and what can't you do without?

Data review
Model initialisation: data to get the model replicating reality as it runs.
Model calibration: data to adjust variables to replicate reality.
Model validation: data to check the model matches reality.
Model prediction: more initialisation data.

Model design
If the model is possible given the data, draw it out in detail. Where do you need detail? Where might you need detail later?
Think particularly about the use of interfaces to keep the elements of the model as loosely coupled as possible.
Start general and work towards the specifics. If you get the generalities flexible and right, the model will have a solid foundation to build on later.

Model design
Example class structure (from the slide diagram): an Agent with a Step behaviour, specialised as Person (GoHome, GoElsewhere), Thug (Fight) and Vehicle (Refuel).
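The lecture doesn't tie the design to a language, but as an illustration only, here is a minimal Python sketch of that structure, reading the diagram as Person, Thug and Vehicle all specialising a common Agent type whose interface is the step() behaviour (the exact hierarchy on the slide may differ):

```python
from abc import ABC, abstractmethod

class Agent(ABC):
    """The interface the model loop depends on: every agent can step."""
    @abstractmethod
    def step(self):
        ...

class Person(Agent):
    def step(self):
        self.go_home()  # rules would choose go_home() or go_elsewhere()

    def go_home(self):
        pass

    def go_elsewhere(self):
        pass

class Thug(Agent):
    def step(self):
        self.fight()

    def fight(self):
        pass

class Vehicle(Agent):
    def step(self):
        self.refuel()

    def refuel(self):
        pass

# The scheduling code only ever sees the loosely coupled Agent interface.
agents = [Person(), Thug(), Vehicle()]
for agent in agents:
    agent.step()
```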

Preparing to model Verification Calibration/Optimisation Validation Sensitivity testing and dealing with error

Verification
Does your model represent the real system in a rigorous manner, without logical inconsistencies that aren't dealt with?
For simpler models attempts have been made to automate some of this, but social and environmental models are far too complicated.
Verification is therefore largely done by checking rulesets with experts, testing in abstract environments, and thorough validation.

Verification Test on abstract environments. Adjust variables to test model elements one at a time and in small subsets. Do the patterns look reasonable? Does causality between variables seem reasonable?

Model runs Is the system stable over time (if expected)? Do you think the model will run to an equilibrium or fluctuate? Is that equilibrium realistic or not?

Calibration Our model will contain variables (“parameters”) we can’t be sure of. Additionally, our model may not match the world perfectly. We may, therefore, need to try lots of values to see which are best: calibration. However, there may be too many to try all of them. Either way, we need to compare our model results with reality: validation.
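As a minimal illustration of calibration by brute force, the sketch below tries a small grid of candidate parameter values and keeps the pair that best reproduces an observed value; run_model() and all the numbers are hypothetical stand-ins, not part of the lecture:

```python
import itertools

# Hypothetical stand-in: a real model run would replace run_model().
def run_model(birth_rate, move_rate):
    return 100 * birth_rate + 50 * move_rate   # pretend summary output

observed = 75.0                                # real-world value to reproduce

best_error, best_params = float("inf"), None
for birth_rate, move_rate in itertools.product([0.1, 0.2, 0.3],
                                               [0.5, 1.0, 1.5]):
    error = abs(run_model(birth_rate, move_rate) - observed)
    if error < best_error:
        best_error, best_params = error, (birth_rate, move_rate)

print(best_params, best_error)                 # best of the candidates tried
```

In practice there are usually too many combinations to try them all, which is why heuristic optimisation methods are used instead of an exhaustive grid.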

Preparing to model Verification Calibration/Optimisation Validation Sensitivity testing and dealing with error

Validation
Can you quantitatively replicate known data? This is an important part of calibration and verification as well.
You need to decide what you are interested in looking at:
Visual or "face" validation, e.g. comparing two city forms.
A one-number statistic, e.g. can you replicate the average price?
A spatial, temporal, or interaction match, e.g. can you model city growth block by block?

Validation
If we can't get an exact prediction, what standard can we judge against?
Randomisation of the elements of the prediction, e.g. can we predict the geographical location of urban areas better than randomly throwing them at a map? This doesn't seem entirely fair, as the model has a head start if it is initialised with real data.
Business-as-usual: if we can't do better than predicting no change, we're not doing very well. But this baseline assumes no growth at all, an assumption the model doesn't have to make.
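A minimal sketch of judging a model against a random baseline, using hypothetical urban/non-urban grids (none of the data, thresholds or error rates come from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: boolean grids marking urban cells.
real = rng.random((50, 50)) < 0.3              # "observed" urban cells
model = real.copy()
model[rng.random((50, 50)) < 0.1] ^= True      # a model that is mostly right

# Random baseline: the same number of urban cells thrown anywhere on the map.
random_pred = np.zeros(real.size, dtype=bool)
random_pred[rng.choice(real.size, int(real.sum()), replace=False)] = True
random_pred = random_pred.reshape(real.shape)

# (A business-as-usual baseline would instead predict no change from today.)
print("model agreement with reality: ", (model == real).mean())
print("random agreement with reality:", (random_pred == real).mean())
```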

Visual comparison

Total Absolute Error
If we're just predicting values, subtract the values in one dataset from those in the other and sum the absolute differences.
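For example, in Python with NumPy (the values here are made up):

```python
import numpy as np

predicted = np.array([120.0, 135.0, 150.0, 110.0])   # made-up model outputs
observed = np.array([125.0, 130.0, 160.0, 108.0])    # made-up real values

total_absolute_error = np.abs(predicted - observed).sum()
print(total_absolute_error)                          # 5 + 5 + 10 + 2 = 22.0
```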

Comparison stats: space and class
Could compare the number of geographical predictions that are right against the number chance would get right: the Kappa statistic.
Construct a confusion matrix / contingency table: for each area, what category is it in really, and what category is it in in the prediction?
              Predicted A   Predicted B
    Real A    10 areas      5 areas
    Real B    15 areas      20 areas
Fraction of agreement = (10 + 20) / (10 + 5 + 15 + 20) = 0.6
Probability Predicted A = (10 + 15) / (10 + 5 + 15 + 20) = 0.5
Probability Real A = (10 + 5) / (10 + 5 + 15 + 20) = 0.3
Probability of random agreement on A = 0.3 * 0.5 = 0.15

Comparison stats
Equivalents for B:
Probability Predicted B = (5 + 20) / (10 + 5 + 15 + 20) = 0.5
Probability Real B = (15 + 20) / (10 + 5 + 15 + 20) = 0.7
Probability of random agreement on B = 0.5 * 0.7 = 0.35
Probability of not randomly agreeing on B = 1 - 0.35 = 0.65
Total probability of random agreement = 0.15 + 0.35 = 0.5
Total probability of not agreeing randomly = 1 - (0.15 + 0.35) = 0.5
κ = (fraction of agreement - probability of random agreement) / (probability of not agreeing randomly) = (0.6 - 0.5) / 0.5 = 0.1 / 0.5 = 0.2
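The same calculation can be scripted; this sketch reproduces the worked example above from its confusion matrix:

```python
import numpy as np

# Confusion matrix from the slide: rows are the real class (A, B),
# columns are the predicted class (A, B).
confusion = np.array([[10.0, 5.0],
                      [15.0, 20.0]])
total = confusion.sum()

observed_agreement = np.trace(confusion) / total           # 30 / 50 = 0.6
# Chance agreement: product of row and column proportions, summed over classes.
chance_agreement = ((confusion.sum(axis=1) / total) *
                    (confusion.sum(axis=0) / total)).sum()  # 0.15 + 0.35 = 0.5

kappa = (observed_agreement - chance_agreement) / (1 - chance_agreement)
print(round(kappa, 2))                                      # 0.2
```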

Comparison stats
Kappa is tricky to interpret. A common guide:
    κ              Strength of agreement
    < 0            None
    0.00 – 0.20    Slight
    0.21 – 0.40    Fair
    0.41 – 0.60    Moderate
    0.61 – 0.80    Substantial
    0.81 – 1.00    Almost perfect

Comparison stats The problem is that you are predicting in geographical space and time as well as categories. Which is a better prediction?

Comparison stats
The solution is a fuzzy category statistic and/or multiscale examination of the differences (Costanza, 1989).
Scan across the real and predicted maps with a larger and larger window, recalculating the statistics at each scale. The scale with the strongest correlation between the two is arguably the best scale at which the model predicts.
The trouble is that scaling correlation statistics up will always increase correlation coefficients.
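A rough sketch of the multiscale idea, assuming the maps are grids of urban fractions (the data here are synthetic): aggregate both maps with progressively larger windows and recompute the correlation at each scale.

```python
import numpy as np

def coarsen(grid, window):
    """Average non-overlapping window-by-window blocks of a 2D grid."""
    rows = (grid.shape[0] // window) * window
    cols = (grid.shape[1] // window) * window
    trimmed = grid[:rows, :cols]
    return trimmed.reshape(rows // window, window,
                           cols // window, window).mean(axis=(1, 3))

rng = np.random.default_rng(1)
real = rng.random((64, 64))                                  # synthetic "real" map
pred = np.clip(real + rng.normal(0, 0.3, real.shape), 0, 1)  # noisy "prediction"

for window in (1, 2, 4, 8, 16):
    r = np.corrcoef(coarsen(real, window).ravel(),
                    coarsen(pred, window).ravel())[0, 1]
    print(f"window {window:2d}: correlation {r:.2f}")
```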

Correlation and scale
Correlation coefficients tend to increase with the scale of aggregation.
Robinson (1950) compared illiteracy with membership of ethnic minorities as defined in the US census. He found a high correlation across large geographical zones, less at the state level, and none at the individual level: ethnic minorities tended to live in high-illiteracy areas, but weren't necessarily illiterate themselves.
More generally, areas of effect can overlap without the effects being connected at the individual level, e.g. road accidents and dog walkers.
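A small synthetic simulation (not Robinson's data) shows the effect: individual-level association is near zero, but the area-level correlation is strong.

```python
import numpy as np

rng = np.random.default_rng(2)

# Within each area, minority status and illiteracy are generated independently,
# but both share the same underlying area rate, so the area averages covary.
area_rates = rng.uniform(0.05, 0.5, size=100)
minority, illiterate, area_means = [], [], []
for rate in area_rates:
    m = rng.random(1000) < rate            # minority status in this area
    i = rng.random(1000) < rate            # illiteracy, independent of m
    minority.append(m)
    illiterate.append(i)
    area_means.append((m.mean(), i.mean()))

individual_r = np.corrcoef(np.concatenate(minority),
                           np.concatenate(illiterate))[0, 1]
area_minority, area_illiteracy = zip(*area_means)
area_r = np.corrcoef(area_minority, area_illiteracy)[0, 1]
print(f"individual-level r = {individual_r:.2f}")   # near 0
print(f"area-level r       = {area_r:.2f}")         # strongly positive
```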

Comparison stats So, we need to make a judgement – best possible prediction for the best possible resolution.

Comparison stats: Graph / SIM flows
Make an origin-destination matrix for the model and for reality, and compare the two using some difference statistic.
The only problem is all the zero origins/destinations, which tend to reduce the significance of the statistics, not least because any flow predicted for them represents an infinite percentage increase.
Knudsen and Fotheringham (1986) test a number of different statistics and suggest that the Standardised Root Mean Squared Error is the most robust.
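A sketch of one common form of SRMSE, the RMSE of the flows divided by the mean observed flow, applied to hypothetical origin-destination matrices (check the exact normalisation against Knudsen and Fotheringham before relying on it):

```python
import numpy as np

def srmse(observed, predicted):
    """RMSE of the flows divided by the mean observed flow."""
    rmse = np.sqrt(np.mean((observed - predicted) ** 2))
    return rmse / observed.mean()

# Hypothetical 3-zone origin-destination matrices (flows between zones).
observed = np.array([[0.0, 20.0, 5.0],
                     [15.0, 0.0, 10.0],
                     [5.0, 10.0, 0.0]])
predicted = np.array([[0.0, 18.0, 7.0],
                      [12.0, 0.0, 11.0],
                      [6.0, 12.0, 0.0]])

print(round(srmse(observed, predicted), 3))
```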

Preparing to model Verification Calibration/Optimisation Validation Sensitivity testing and dealing with error

Errors
Model errors.
Data errors: errors in the real world, errors in the model.
Ideally we need to know whether the model is a reasonable version of reality. We also need to know how it will respond to minor errors in the input data.

Sensitivity testing
Tweak key variables in a minor way and see how the model responds. The model may be ergodic, that is, insensitive to starting conditions after a long enough run.
If the model does respond strongly, is this how the real system might respond, or is it a model artefact?
If it responds strongly, what does this say about the potential errors that might creep into predictions if your initial data isn't perfectly accurate? Is error propagation a problem? Where is the homeostasis?
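A minimal one-at-a-time sensitivity test might look like the sketch below; run_model() is a hypothetical stand-in and the perturbation size is arbitrary:

```python
# Hypothetical stand-in for a full model run; not a function from the lecture.
def run_model(birth_rate=0.2, move_rate=1.0):
    return 100 * birth_rate + 50 * move_rate

baseline = run_model()
for tweak in (-0.01, 0.01):                    # small perturbations to one input
    output = run_model(birth_rate=0.2 + tweak)
    change = 100 * (output - baseline) / baseline
    print(f"birth_rate {0.2 + tweak:.2f}: output {output:.1f} ({change:+.1f}% change)")
```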

Prediction
If the model is deterministic, one run will be much like another. If the model is stochastic (i.e. includes some randomisation), you'll need to run it multiple times.
In addition, if you're not sure about the inputs/parameters, you may need to vary them to cope with the uncertainty: Monte Carlo testing runs thousands of models with a variety of potential inputs, and generates probabilistic answers.
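A minimal Monte Carlo sketch, with a hypothetical stochastic model and an assumed uniform distribution for the uncertain parameter (both are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical stochastic model: the output has a random component each run.
def run_model(birth_rate):
    return 100 * birth_rate + rng.normal(0, 5)

results = np.array([run_model(rng.uniform(0.15, 0.25))   # sample uncertain input
                    for _ in range(10_000)])             # many runs

low, high = np.percentile(results, [2.5, 97.5])
print(f"mean prediction {results.mean():.1f}, 95% interval [{low:.1f}, {high:.1f}]")
```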

Analysis
Models aren't just about prediction. They can be about experimenting with ideas. They can be about testing ideas and the logic of theories. They can be a way of holding ideas.