GEOGG121: Methods
Linear & non-linear models
Dr. Mathias (Mat) Disney
UCL Geography
Office: 113, Pearson Building
Tel: 7670 0592

Lecture outline
Linear models and inversion
– Least squares revisited, examples
– Parameter estimation, uncertainty
Non-linear models, inversion
– The non-linear problem
– Parameter estimation, uncertainty
– Approaches

Reading
Linear models and inversion
– Linear modelling notes: Lewis, 2010; ch. 2 of Press et al. (1992) Numerical Recipes in C
Non-linear models, inversion
– Gradient descent: NR in C, ch. 10.7, e.g. BFGS; XXX
– Conjugate gradient etc.: NR in C, ch. 10; XXX
– Liang, S. (2004) Quantitative Remote Sensing of the Land Surface, ch. 8 (section 8.2)

Linear Models
For some set of independent variables x = {x₀, x₁, x₂, …, xₙ}, we have a model of a dependent variable y which can be expressed as a linear combination of the independent variables.
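The equation itself did not survive transcription; the standard form, with weights a_i, is:

    y = a_0 x_0 + a_1 x_1 + \dots + a_n x_n = \sum_{i=0}^{n} a_i x_i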

Linear Models?

Linear Mixture Modelling
Spectral mixture modelling:
– Proportionate mixture of (n) end-member spectra
– First-order model: no interactions between components

Linear Mixture Modelling
r = {r₁, r₂, …, rₘ, 1.0}
– Measured reflectance spectrum (m wavelengths)
– n × (m+1) matrix:

Linear Mixture Modelling
n = (m+1): square matrix
E.g. n=2 (wavebands), m=2 (end-members)
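A reconstruction of the matrix equation shown as an image in the slides: the measured reflectance at wavelength λ is a proportion-weighted sum of end-member reflectances ρ, with the appended 1.0 in r carrying the sum-to-one constraint:

    r_\lambda = \sum_{i=1}^{n} f_i \, \rho_{i,\lambda}, \qquad \sum_{i=1}^{n} f_i = 1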

[Figure: measured spectrum r and end-member spectra ρ₁, ρ₂, ρ₃ plotted in Reflectance Band 1 vs Reflectance Band 2 space]

Linear Mixture Modelling, as described:
– is not robust to error in measurement or end-member spectra
– proportions must be constrained to lie in the interval (0,1) – effectively a convex hull constraint
– at most m+1 end-member spectra can be considered
– needs prior definition of end-member spectra
– cannot directly take into account any variation in component reflectances, e.g. due to topographic effects

Linear Mixture Modelling in the presence of Noise
Define residual vector e
Minimise the sum of the squares of the error e, i.e. the Method of Least Squares (MLS)

Error Minimisation Set (partial) derivatives to zero

Error Minimisation
Can write as the normal equations
Solve for P by matrix inversion
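A minimal numerical sketch of this step (the end-member matrix and spectrum below are invented for illustration). In practice np.linalg.lstsq is preferred over explicit inversion of the normal equations for numerical stability:

    import numpy as np

    # End-member matrix M: each column is one end-member's spectrum,
    # with a final row of ones to impose the sum-to-one constraint.
    # (Illustrative values: 2 wavebands + constraint row, 3 end-members.)
    M = np.array([[0.10, 0.30, 0.05],
                  [0.60, 0.35, 0.10],
                  [1.00, 1.00, 1.00]])

    # Measured reflectance spectrum, with 1.0 appended for the constraint.
    r = np.array([0.20, 0.45, 1.00])

    # Normal equations: (M^T M) P = M^T r
    P = np.linalg.solve(M.T @ M, M.T @ r)

    # Equivalent, more stable in practice:
    P_lstsq, residuals, rank, sv = np.linalg.lstsq(M, r, rcond=None)

    print(P, P_lstsq)   # estimated proportions of each end-member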

e.g. Linear Regression

RMSE

[Figure: y vs x, with points x₁ and x₂ marked]

Weight of Determination (1/w) Calculate uncertainty at y(x)

[Figures: RMSE surface as a function of parameters P0 and P1]

Issues
– Parameter transformation and bounding
– Weighting of the error function
– Using additional information
– Scaling

Parameter transformation and bounding
Issue of variable sensitivity
– E.g. saturation of LAI effects
– Reduce by transformation
Approximately linearise parameters
Need to consider ‘average’ effects

Weighting of the error function
Different wavelengths/angles have different sensitivity to parameters
Previously, weighted all equally
– Equivalent to assuming ‘noise’ equal for all observations

Weighting of the error function
Can ‘target’ sensitivity
– E.g. to chlorophyll concentration
– Use derivative weighting (Privette 1994)

Using additional information
Typically, for vegetation, use a canopy growth model
– See Moulin et al. (1998)
Provides expectation of (e.g.) LAI
– Need: planting date, daily mean temperature, varietal information (?)
Use in various ways
– Reduce parameter search space
– Expectations of coupling between parameters

Scaling
Many parameters scale approximately linearly
– E.g. cover, albedo, fAPAR
Many do not
– E.g. LAI
Need to (at least) understand impact of scaling

Crop Mosaic
[Figure: mosaic of patches with LAI 1, LAI 4 and LAI 0]

Crop Mosaic
20% of LAI 0, 40% LAI 4, 40% LAI 1
‘Real’ total value of LAI: 0.2×0 + 0.4×4 + 0.4×1 = 2.0
[Figure: visible and NIR reflectance of the LAI 0, LAI 1 and LAI 4 patches]

Canopy reflectance over the pixel is 0.15 (visible) and 0.60 (NIR). If we assume the model above, this equates to an LAI of 1.4; the ‘real’ answer is LAI 2.0.

Linear Kernel-driven Modelling of Canopy Reflectance
Semi-empirical models to deal with BRDF effects
– Originally due to Roujean et al. (1992)
– Also Wanner et al. (1995)
– Practical use in MODIS products
BRDF effects from wide-FOV sensors
– MODIS, AVHRR, VEGETATION, MERIS

[Figure: the same target X viewed by the satellite from different angles on Day 1 and Day 2]

AVHRR NDVI over Hapex-Sahel, 1992

Linear BRDF Model of form:
Model parameters:
– Isotropic
– Volumetric
– Geometric-Optics
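The expansion itself (an image in the original slides) is the standard kernel form used for MODIS, for view and solar angles Ω_v, Ω_s:

    \rho(\Omega_v, \Omega_s, \lambda) = f_{iso}(\lambda) + f_{vol}(\lambda)\, k_{vol}(\Omega_v, \Omega_s) + f_{geo}(\lambda)\, k_{geo}(\Omega_v, \Omega_s)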

Linear BRDF Model of form:
Model kernels:
– Volumetric
– Geometric-Optics

Volumetric Scattering
Develop from RT theory
– Spherical LAD
– Lambertian soil
– Leaf reflectance = transmittance
– First-order scattering; multiple scattering assumed isotropic

Volumetric Scattering If LAI small:

Volumetric Scattering
Write as: the RossThin kernel
Similar approach for RossThick

Geometric Optics Consider shadowing/protrusion from spheroid on stick (Li-Strahler 1985)

Geometric Optics
Assume ground and crown brightness equal
Fix ‘shape’ parameters
Linearised model
– LiSparse
– LiDense

Kernels
Retro-reflection (‘hot spot’)
Volumetric (RossThick) and geometric (LiSparse) kernels for a viewing angle of 45 degrees

Kernel Models
Consider a proportionate mixture of two scattering effects

Using Linear BRDF Models for angular normalisation
Account for BRDF variation
Absolutely vital for compositing samples over time (with different view/sun angles)
BUT BRDF is a source of information too!
MODIS NBAR (Nadir BRDF-Adjusted Reflectance, MOD43/MCD43)

MODIS NBAR (Nadir-BRDF Adjusted Reflectance MOD43, MCD43)

BRDF Normalisation
Fit observations to model
Output predicted reflectance at standardised angles
– E.g. nadir reflectance, nadir illumination: typically not stable
– E.g. nadir reflectance, SZA at local mean
And uncertainty via
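A minimal sketch of such a fit (the kernel values below are invented stand-ins; real RossThick/LiSparse values come from the expressions in Wanner et al., 1995):

    import numpy as np

    # Observations: reflectance at several view/sun geometries (illustrative).
    rho = np.array([0.21, 0.25, 0.30, 0.27, 0.23])

    # Kernel values k_vol, k_geo evaluated at each observation's geometry
    # (invented numbers standing in for RossThick/LiSparse evaluations).
    k_vol = np.array([0.02, 0.15, 0.35, 0.22, 0.08])
    k_geo = np.array([-1.10, -0.95, -0.70, -0.85, -1.00])

    # Design matrix: [1, k_vol, k_geo] per observation
    K = np.column_stack([np.ones_like(rho), k_vol, k_geo])

    # Least-squares fit for the parameters (f_iso, f_vol, f_geo)
    f, res, rank, sv = np.linalg.lstsq(K, rho, rcond=None)

    # Predict reflectance at a standardised geometry, e.g. nadir view
    k_vol_nadir, k_geo_nadir = 0.0, -1.0   # again, stand-in kernel values
    rho_nbar = f @ np.array([1.0, k_vol_nadir, k_geo_nadir])
    print(f, rho_nbar)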

Linear BRDF Models to track change
Examine change due to burn (MODIS)
[Figure: days of Terra (blue) and Aqua (red) sampling over a point in Australia, with SZA (Terra: orange; Aqua: cyan); time series of NIR samples from this sampling]

MODIS Channel 5 Observation DOY 275

MODIS Channel 5 Observation DOY 277

Detect Change Need to model BRDF effects Define measure of dis-association

MODIS Channel 5 Prediction DOY 277

MODIS Channel 5 Discrepancy DOY 277

MODIS Channel 5 Observation DOY 275

MODIS Channel 5 Prediction DOY 277

MODIS Channel 5 Observation DOY 277

So BRDF is a source of information, not JUST noise!
Use model expectation of angular reflectance behaviour to identify subtle changes
(Dr. Lisa Maria Rebelo, IWMI, CGIAR)

Detect Change
Burns are:
– negative change in Channel 5
– of ‘long’ (weeks) duration
Other changes picked up:
– e.g. clouds, cloud shadow
– shorter duration
– or positive change (in all channels)
– or negative change in all channels

Day of burn
Roy et al. (2005) Prototyping a global algorithm for systematic fire-affected area mapping using MODIS time series data, RSE 97.

Non-linear inversion
If we cannot phrase our problem in linear form, then we have a non-linear inversion problem
Key tool for many applications
– Resource allocation
– Systems control
– (Non-linear) model parameter estimation
AKA curve or function fitting: i.e. obtain parameter values that provide the “best fit” (in some sense) to a set of observations
And an estimate of parameter uncertainty, model fit

Options for Numerical Inversion
Same principle as for linear
– i.e. find the minimum of some cost function expressing the difference between model and observations
– We want some set of parameters (x*) which minimises the cost function f(x), i.e. for some (small) tolerance δ > 0, f(x*) ≤ f(x) for all x within δ of x*
– So for some region around x*, all values of f(x) are higher (a local minimum – there may be many)
– If the region encompasses the full range of x, then we have the global minimum

Numerical Inversion
Iterative numerical techniques
– Can we differentiate the model (e.g. adjoint)?
  YES: quasi-Newton (e.g. BFGS etc.)
  NO: Powell, simplex, simulated annealing etc.
Knowledge-based systems (KBS)
Artificial Neural Networks (ANNs)
Genetic Algorithms (GAs)
Look-up Tables (LUTs)
Errico (1997) What is an adjoint model?, BAMS, 78.

Local and Global Minima (general: optima) Need starting point

How to go ‘downhill’? Bracketing of a minimum

How far to go ‘downhill’? The Golden Mean fraction w
For a bracketing triplet (a, b, c) and trial point x, define the fractions:
w = (b − a)/(c − a), z = (x − b)/(c − a)
Choose z + w = 1 − w, for symmetry
Choose w = z/(1 − w), to keep proportions the same
So z = w − w² = 1 − 2w, i.e. 0 = w² − 3w + 1, giving w = (3 − √5)/2 ≈ 0.382
Slow, but sure
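A minimal sketch of golden-section search built on that fraction (the quadratic test function is illustrative, not from the lecture):

    import math

    def golden_section(f, a, c, tol=1e-6):
        """Minimise f on the bracket [a, c] by golden-section search."""
        w = (3 - math.sqrt(5)) / 2          # golden fraction ~0.382
        b, x = a + w * (c - a), c - w * (c - a)
        fb, fx = f(b), f(x)
        while abs(c - a) > tol:
            if fb < fx:                      # minimum lies in [a, x]
                c, x, fx = x, b, fb
                b = a + w * (c - a)
                fb = f(b)
            else:                            # minimum lies in [b, c]
                a, b, fb = b, x, fx
                x = c - w * (c - a)
                fx = f(x)
        return (a + c) / 2

    # Example: minimum of a simple quadratic 'cost function'
    print(golden_section(lambda p: (p - 1.3) ** 2, 0.0, 3.0))  # ~1.3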

Parabolic Interpolation Inverse parabolic interpolation More rapid
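The update step, as given in Numerical Recipes ch. 10.2 (the slide's equation did not survive transcription): for a bracketing triplet (a, b, c), the abscissa x of the minimum of the parabola through (a, f(a)), (b, f(b)), (c, f(c)) is

    x = b - \frac{1}{2}\, \frac{(b-a)^2 \,[f(b)-f(c)] - (b-c)^2 \,[f(b)-f(a)]}{(b-a)\,[f(b)-f(c)] - (b-c)\,[f(b)-f(a)]}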

Brent’s method
Require ‘fast’ but robust inversion
Golden mean search
– slow but sure; use in unfavourable areas
Parabolic method
– use when close to the minimum

Multi-dimensional minimisation
Use 1-D methods multiple times
– In which directions?
Some methods for N-D problems
– Simplex (amoeba)

Downhill Simplex
Simplex:
– simplest N-D body: N+1 vertices
Simplex operations (see the sketch below):
– a reflection away from the high point
– a reflection and expansion away from the high point
– a contraction along one dimension from the high point
– a contraction along all dimensions towards the low point
Find way to minimum
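Rather than hand-coding the four moves, a minimal sketch using SciPy's Nelder–Mead (downhill simplex) implementation on an illustrative 2-D cost function:

    import numpy as np
    from scipy.optimize import minimize

    # Illustrative 2-D cost function (Rosenbrock: a classic awkward valley)
    def cost(p):
        x, y = p
        return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

    # Downhill simplex (Nelder-Mead): derivative-free, needs only a start point
    result = minimize(cost, x0=np.array([-1.0, 2.0]), method="Nelder-Mead")
    print(result.x, result.fun)   # should approach (1, 1), cost ~0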

Simplex

Direction Set (Powell's) Method
Multiple 1-D minimisations
– Inefficient along axes

Powell

Direction Set (Powell's) Method
Use conjugate directions
– Update primary & secondary directions
Issues
– Axis covariance

Powell

Simulated Annealing
Previous methods:
– define start point
– minimise in some direction(s)
– test & proceed
Issue:
– can get trapped in local minima
Solution (?)
– need to restart from a different point

Simulated Annealing

Annealing
– thermodynamic phenomenon
– ‘slow cooling’ of metals or crystallisation of liquids
– atoms ‘line up’ & form a ‘pure crystal’ / stronger (metals)
– slow cooling allows time for atoms to redistribute as they lose energy (cool)
– low energy state
Quenching
– ‘fast cooling’
– polycrystalline state

Simulated Annealing
Simulate ‘slow cooling’
Based on the Boltzmann probability distribution: P(E) ∝ exp(−E/kT)
– k: constant relating energy to temperature
A system in thermal equilibrium at temperature T has a distribution of energy states E
All (E) states are possible, but some more likely than others
Even at low T, small probability that the system may be in a higher energy state

Simulated Annealing
Use analogy of energy to RMSE
As we decrease ‘temperature’, move to a generally lower energy state
Boltzmann gives the distribution of E states
– so some probability of a higher energy state, i.e. ‘going uphill’
– probability of ‘uphill’ decreases as T decreases

Implementation
System changes from E₁ to E₂ with probability exp[−(E₂ − E₁)/kT]
– If E₂ < E₁, P > 1 (threshold at 1): system will take this option
– If E₂ > E₁, P < 1: generate a random number; system may take this option
Probability of doing so decreases with T
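A minimal sketch of this acceptance rule inside an annealing loop (cost function, step proposal and cooling schedule are all illustrative choices, with k absorbed into T):

    import math
    import random

    def anneal(cost, x0, T0=1.0, cooling=0.95, steps_per_T=50, T_min=1e-4):
        """Simulated annealing with the Metropolis acceptance rule."""
        x, E1 = x0, cost(x0)
        T = T0
        while T > T_min:
            for _ in range(steps_per_T):
                x_new = x + random.gauss(0.0, 0.5)     # propose a nearby state
                E2 = cost(x_new)
                # Accept if downhill (P >= 1), else uphill with P = exp(-(E2-E1)/T)
                if E2 < E1 or random.random() < math.exp(-(E2 - E1) / T):
                    x, E1 = x_new, E2
            T *= cooling                               # 'slow cooling' schedule
        return x, E1

    # Example: a 1-D cost with local minima; global minimum near x = 2
    f = lambda x: (x - 2) ** 2 + 2 * math.sin(5 * x)
    print(anneal(f, x0=-3.0))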

[Figure: accept/reject at temperature T – accept (OK) when P = exp[−(E₂ − E₁)/kT] > rand(), reject (X) otherwise]

Simulated Annealing
Rate of cooling very important
Coupled with effects of k
– exp[−(E₂ − E₁)/kT]
– so 2×k equivalent to state of T/2
Used in a range of optimisation problems
Not much used in remote sensing

(Artificial) Neural networks (ANN)
Another ‘natural’ analogy
– Biological NNs good at solving complex problems
– Do so by ‘training’ the system with ‘experience’

(Artificial) Neural networks (ANN) ANN architecture

(Artificial) Neural networks (ANN)
‘Neurons’
– have 1 output but many inputs
– output is a weighted sum of inputs
– a threshold can be set, giving a non-linear response

(Artificial) Neural networks (ANN)
Training (see the sketch below):
– initialise weights for all neurons
– present input layer with e.g. spectral reflectance
– calculate outputs
– compare outputs with e.g. biophysical parameters
– update weights to attempt a match
– repeat until all examples presented
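A minimal sketch of that train-to-invert loop, using scikit-learn's MLPRegressor rather than hand-coded weight updates; the 'canopy model' generating the training pairs is a toy stand-in, not a real RT model:

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)

    # Toy 'forward model': biophysical parameter (LAI) -> 2-band reflectance.
    # A stand-in for a real canopy RT model used to generate training data.
    def forward_model(lai):
        red = 0.25 * np.exp(-0.6 * lai) + 0.03
        nir = 0.55 * (1 - np.exp(-0.5 * lai)) + 0.10
        return np.column_stack([red, nir])

    # Training set: simulate reflectances for many LAI values (plus noise)
    lai_train = rng.uniform(0.0, 6.0, 2000)
    refl_train = forward_model(lai_train) + rng.normal(0, 0.005, (2000, 2))

    # Train the inverse mapping: reflectance -> LAI
    net = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0)
    net.fit(refl_train, lai_train)

    # Apply to a 'new observation'
    obs = forward_model(np.array([2.5]))
    print(net.predict(obs))   # should be near 2.5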

(Artificial) Neural networks (ANN)
Use in this way for canopy model inversion
Train the other way around for a forward model
Also used for classification and spectral unmixing
– again, train with examples
ANN has the ability to generalise from input examples
Definition of architecture and training phases is critical
– can ‘over-train’ – too specific
– similar to fitting a polynomial with too high an order
Many ‘types’ of ANN – feedback/feedforward

(Artificial) Neural networks (ANN)
In essence, a trained ANN is just a (highly) non-linear response function
Training (definition of e.g. the inverse model) is performed as a separate stage to application of the inversion
– can use complex models for training
Many examples in remote sensing
Issue:
– how to train for an arbitrary set of viewing/illumination angles? – not a solved problem

Genetic (or evolutionary) algorithms (GAs)
Another ‘natural’ analogy
Phrase optimisation as ‘fitness for survival’
Description of state encoded through a ‘string’ (equivalent to a genetic pattern)
Apply operations to ‘genes’
– cross-over, mutation, inversion

Genetic (or evolutionary) algorithms (GAs)
E.g. of BRDF model inversion:
Encode N-D vector representing current state of biophysical parameters as a string
Apply operations:
– e.g. mutation/mating with another string
– see if the mutant is ‘fitter to survive’ (lower RMSE)
– if not, can discard (die)

Genetic (or evolutionary) algorithms (GAs)
General operation:
– populate set of chromosomes (strings)
– repeat:
  - determine fitness of each
  - choose best set
  - evolve chosen set, using crossover, mutation or inversion
– until a chromosome of suitable fitness is found

Genetic (or evolutionary) algorithms (GAs)
Differ from other optimisation methods
– work on a coding of the parameters, not the parameters themselves
– search from a population set, not single members (points)
– use ‘payoff’ information (some objective function for selection), not derivatives or other auxiliary information
– use probabilistic transition rules (as with simulated annealing), not deterministic rules

Genetic (or evolutionary) algorithms (GAs)
Example operation (see the sketch below):
1. Define genetic representation of state
2. Create initial population, set t=0
3. Compute average fitness of the set
   - assign each individual a normalised fitness value
   - assign a probability based on this
4. Using this distribution, select N parents
5. Pair parents at random
6. Apply genetic operations to parent sets
   - generate offspring
   - becomes population at t+1
7. Repeat until termination criterion satisfied
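A minimal sketch of steps 1–7 with real-valued 'genes' and fitness-proportional selection (all settings are illustrative):

    import numpy as np

    rng = np.random.default_rng(1)

    def ga_minimise(cost, n_params, pop_size=50, generations=200):
        """Tiny real-coded GA: selection, crossover, mutation."""
        pop = rng.uniform(-5, 5, (pop_size, n_params))       # step 2
        for _ in range(generations):
            fitness = 1.0 / (1.0 + np.array([cost(p) for p in pop]))  # step 3
            prob = fitness / fitness.sum()                   # normalised fitness
            parents = pop[rng.choice(pop_size, pop_size, p=prob)]  # step 4
            rng.shuffle(parents)                             # step 5: pair at random
            # step 6: uniform crossover between consecutive pairs, plus mutation
            mask = rng.random(pop.shape) < 0.5
            offspring = np.where(mask, parents, np.roll(parents, 1, axis=0))
            offspring += rng.normal(0, 0.1, pop.shape)       # mutation
            pop = offspring                                  # population at t+1
        best = min(pop, key=cost)
        return best, cost(best)

    # Example: minimise a 2-D quadratic 'RMSE-like' cost
    print(ga_minimise(lambda p: np.sum((p - np.array([1.0, -2.0])) ** 2), 2))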

Genetic (or evolutionary) algorithms (GAs)
– flexible and powerful method
– can solve problems with many small, ill-defined minima
– may take a huge number of iterations to solve
– not applied to remote sensing model inversions

Knowledge-based systems (KBS)
Seek to solve the problem by incorporating information external to the problem
Only RS inversion example: Kimes et al. (1990; 1991)
– the VEG model
Integrates spectral libraries, information from the literature, information from human experts etc.
Major problems:
– encoding and using the information
– 1980s/90s ‘fad’?

LUT Inversion
Sample the parameter space
Calculate RMSE for each sample point
Define best fit as the minimum-RMSE parameters
– or a function of the set of points fitting to a certain tolerance
Essentially a sampled ‘exhaustive search’ (see the sketch below)
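A minimal sketch of the whole idea (grid, toy forward model and observation are all illustrative):

    import numpy as np

    # Toy forward model: LAI -> (red, NIR) reflectance (illustrative only)
    def forward_model(lai):
        red = 0.25 * np.exp(-0.6 * lai) + 0.03
        nir = 0.55 * (1 - np.exp(-0.5 * lai)) + 0.10
        return np.column_stack([red, nir])

    # Pre-compute the LUT over a sampled parameter space (done once, offline)
    lai_grid = np.linspace(0.0, 6.0, 601)
    lut = forward_model(lai_grid)

    def invert(obs, tol=None):
        """Return min-RMSE LAI; optionally all LAI within a tolerance."""
        rmse = np.sqrt(np.mean((lut - obs) ** 2, axis=1))
        if tol is None:
            return lai_grid[np.argmin(rmse)]
        return lai_grid[rmse <= rmse.min() + tol]   # set of acceptable fits

    obs = np.array([0.08, 0.50])       # an 'observed' (red, NIR) pair
    print(invert(obs), invert(obs, tol=0.005))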

LUT Inversion
Issues:
– may require a large sample set
  - not so if the function is well-behaved, as for many optical EO inversions
  - in some cases, may assume the function is locally linear over a large (linearised) parameter range and use linear interpolation (being developed at UCL/Swansea for CHRIS-PROBA)
– may limit search space based on some expectation
  - e.g. some loose VI relationship, or canopy growth model, or land cover map
  - approach used for the operational MODIS LAI/fAPAR algorithm (Myneni et al.)

LUT Inversion
Issues:
– as we operate on a stored LUT, we can pre-calculate model outputs
  - don’t need to calculate the model ‘on the fly’ as in e.g. simplex methods
  - can use complex models to populate the LUT, e.g. Lewis, Saich & Disney using 3D scattering models (optical and microwave) of forest and crop
– error in inversion may be slightly higher with a (non-interpolated) sparse LUT
  - but may still lie within desirable limits
– method is simple to code and easy to understand: essentially a sort operation on a table

Summary
Range of options for non-linear inversion
‘Traditional’ NL methods: Powell, AMOEBA
– complex to code, though library functions are available
– can easily converge to local minima: need to start at several points
– calculate canopy reflectance ‘on the fly’: need fast models, involving simplifications
– not felt to be suitable for operationalisation

Summary
Simulated Annealing
– can deal with local minima
– slow
– need to define an annealing schedule
ANNs
– train ANN from model (or measurements)
– ANN generalises as a non-linear model
– issues of variable input conditions (e.g. VZA)
– can train with complex models
– applied to a variety of EO problems

Summary
GAs
– novel approach, suitable for highly complex inversion problems
– can be very slow
– not suitable for operationalisation
KBS
– use a range of information in inversion
– Kimes VEG model
– maximises use of data
– need to decide how to encode and use information

Summary
LUT
– simple method: a sort
– used more and more widely for optical model inversion
  - suitable for ‘well-behaved’ non-linear problems
– can operationalise
– can use arbitrarily complex models to populate the LUT
– issue of LUT size
  - can use additional information to limit search space
  - can use interpolation for a sparse LUT for ‘high information content’ inversion