UoL MSc Remote Sensing Non-Linear Inversion

Slides:

Advertisements

Similar presentations

Artificial Neural Networks

Advertisements

© Negnevitsky, Pearson Education, Lecture 12 Hybrid intelligent systems: Evolutionary neural networks and fuzzy evolutionary systems Introduction.

Vegetation Science Lecture 4 Non-Linear Inversion Lewis, Disney & Saich UCL.

G53MLE | Machine Learning | Dr Guoping Qiu

Artificial Intelligence 13. Multi-Layer ANNs Course V231 Department of Computing Imperial College © Simon Colton.

Simulated Annealing Methods Matthew Kelly April 12, 2011.

CS 678 –Boltzmann Machines1 Boltzmann Machine Relaxation net with visible and hidden units Learning algorithm Avoids local minima (and speeds up learning)

Face Recognition & Biometric Systems Support Vector Machines (part 2)

Classification and Prediction: Regression Via Gradient Descent Optimization Bamshad Mobasher DePaul University.

Spie98-1 Evolutionary Algorithms, Simulated Annealing, and Tabu Search: A Comparative Study H. Youssef, S. M. Sait, H. Adiche

MAE 552 – Heuristic Optimization Lecture 6 February 6, 2002.

Genetic Algorithms Nehaya Tayseer 1.Introduction What is a Genetic algorithm? A search technique used in computer science to find approximate solutions.

1 CSE 417: Algorithms and Computational Complexity Winter 2001 Lecture 25 Instructor: Paul Beame.

Neural Networks. Background - Neural Networks can be : Biological - Biological models Artificial - Artificial models - Desire to produce artificial systems.

Simulated Annealing G.Anuradha. What is it? Simulated Annealing is a stochastic optimization method that derives its name from the annealing process used.

Collaborative Filtering Matrix Factorization Approach

Optimization of thermal processes2007/2008 Optimization of thermal processes Maciej Marek Czestochowa University of Technology Institute of Thermal Machinery.

Dr. Hala Moushir Ebied Faculty of Computers & Information Sciences

GEOGG121: Methods Inversion II: non-linear methods Dr. Mathias (Mat) Disney UCL Geography Office: 113, Pearson Building Tel:

Genetic Algorithm.

Efficient Model Selection for Support Vector Machines

Artificial Neural Networks

Slides are based on Negnevitsky, Pearson Education, Lecture 12 Hybrid intelligent systems: Evolutionary neural networks and fuzzy evolutionary systems.

Soft Computing Lecture 18 Foundations of genetic algorithms (GA). Using of GA.

Genetic Algorithms Michael J. Watts

Boltzmann Machine (BM) (§6.4) Hopfield model + hidden nodes + simulated annealing BM Architecture –a set of visible nodes: nodes can be accessed from outside.

Derivative Free Optimization G.Anuradha. Contents Genetic Algorithm Simulated Annealing Random search method Downhill simplex method.

GEOGG121: Methods Linear & non-linear models Dr. Mathias (Mat) Disney UCL Geography Office: 113, Pearson Building Tel:

Chapter 9 Genetic Algorithms.  Based upon biological evolution  Generate successor hypothesis based upon repeated mutations  Acts as a randomized parallel.

Introduction to Genetic Algorithms. Genetic Algorithms We’ve covered enough material that we can write programs that use genetic algorithms! –More advanced.

Edge Assembly Crossover

Probabilistic Algorithms Evolutionary Algorithms Simulated Annealing.

Optimization Problems

GEOGG121: Methods Inversion I: linear approaches Dr. Mathias (Mat) Disney UCL Geography Office: 113, Pearson Building Tel:

Chapter 2-OPTIMIZATION G.Anuradha. Contents Derivative-based Optimization –Descent Methods –The Method of Steepest Descent –Classical Newton’s Method.

Genetic Algorithm Dr. Md. Al-amin Bhuiyan Professor, Dept. of CSE Jahangirnagar University.

Selection and Recombination Temi avanzati di Intelligenza Artificiale - Lecture 4 Prof. Vincenzo Cutello Department of Mathematics and Computer Science.

Genetic Algorithm. Outline Motivation Genetic algorithms An illustrative example Hypothesis space search.

 Negnevitsky, Pearson Education, Lecture 12 Hybrid intelligent systems: Evolutionary neural networks and fuzzy evolutionary systems n Introduction.

Chapter 12 Case Studies Part B. Control System Design.

Today’s Lecture Neural networks Training

Optimization Problems

Optimization via Search

Genetic Algorithms.

Fall 2004 Backpropagation CS478 - Machine Learning.

Artificial Neural Networks

GEOGG121: Methods Monte Carlo methods, revision

LECTURE 11: Advanced Discriminant Analysis

Learning in Neural Networks

Non-linear Minimization

Local Search Algorithms

with Daniel L. Silver, Ph.D. Christian Frey, BBA April 11-12, 2017

Artificial Intelligence (CS 370D)

Structure learning with deep autoencoders

Machine Learning Today: Reading: Maria Florina Balcan

CSC 578 Neural Networks and Deep Learning

Collaborative Filtering Matrix Factorization Approach

Optimization Problems

Neuro-Computing Lecture 4 Radial Basis Function Network

Perceptron as one Type of Linear Discriminants

GENETIC ALGORITHMS & MACHINE LEARNING

More on Search: A* and Optimization

EE368 Soft Computing Genetic Algorithms.

Boltzmann Machine (BM) (§6.4)

Artificial Intelligence 12. Two Layer ANNs

A Gentle introduction Richard P. Simpson

More on HW 2 (due Jan 26) Again, it must be in Python 2.7.

Neural Network Training

Local Search Algorithms

Simulated Annealing & Boltzmann Machines

Presentation transcript:

UoL MSc Remote Sensing Non-Linear Inversion Dr Lewis plewis@geog.ucl.ac.uk

Introduction Previously considered forward models Model reflectance/backscatter as fn of biophysical parameters Now consider model inversion Infer biophysical parameters from measurements of reflectance/backscatter

Linear Model Inversion Dealt with in previous lecture Define RMSE Minimise wrt model parameters Solve for minimum MSE is quadratic function Single (unconstrained) minimum

P1 RMSE P0

P1 RMSE P0

Issues Parameter transformation and bounding Weighting of the error function Using additional information Scaling

Parameter transformation and bounding Issue of variable sensitivity E.g. saturation of LAI effects Reduce by transformation Approximately linearise parameters Need to consider ‘average’ effects

Weighting of the error function Different wavelengths/angles have different sensitivity to parameters Previously, weighted all equally Equivalent to assuming ‘noise’ equal for all observations

Weighting of the error function Can ‘target’ sensitivity E.g. to chlorophyll concentration Use derivative weighting (Privette 1994)

Using additional information Typically, for Vegetation, use canopy growth model See Moulin et al. (1998) Provides expectation of (e.g.) LAI Need: planting date Daily mean temperature Varietal information (?) Use in various ways Reduce parameter search space Expectations of coupling between parameters

Scaling Many parameters scale approximately linearly Many do not E.g. cover, albedo, fAPAR Many do not E.g. LAI Need to (at least) understand impact of scaling

Crop Mosaic LAI 1 LAI 4 LAI 0

Crop Mosaic 20% of LAI 0, 40% LAI 4, 40% LAI 1. ‘real’ total value of LAI: 0.2x0+0.4x4+0.4x1=2.0. visible: NIR

canopy reflectance over the pixel is 0.15 and 0.60 for the NIR. If assume the model above, this equates to an LAI of 1.4. ‘real’ answer LAI 2.0

Options for Numerical Inversion Iterative numerical techniques Quasi-Newton Powell Knowledge-based systems (KBS) Artificial Neural Networks (ANNs) Genetic Algorithms (GAs) Look-up Tables (LUTs)

Local and Global Minima Need starting point

Bracketing of a minimum How to go ‘downhill’? Bracketing of a minimum

How far to go ‘downhill’? Slow, but sure How far to go ‘downhill’? w=(b-a)/(c-a) z=(x-b)/(c-a) Choose: z+w=1-w For symmetry Choose: w=z/(1-w) To keep proportions the same w 1-w z=w-w2=1-2w 0= w2 -3w+1 Golden Mean Fraction =w=0.38197

Parabolic Interpolation Inverse parabolic interpolation More rapid

Brent’s method Require ‘fast’ but robust inversion Golden mean search Slow but sure Use in unfavourable areas Use Parabolic method when get close to minimum

Multi-dimensional minimisation Use 1D methods multiple times In which directions? Some methods for N-D problems Simplex (amoeba)

Downhill Simplex Simplex: Simplex operations: Find way to minimum Simplest N-D N+1 vertices Simplex operations: a reflection away from the high point a reflection and expansion away from the high point a contraction along one dimension from the high point a contraction along all dimensions towards the low point. Find way to minimum

Simplex

Direction Set (Powell's) Method Multiple 1-D minimsations Inefficient along axes

Powell

Direction Set (Powell's) Method Use conjugate directions Update primary & secondary directions Issues Axis covariance

Powell

Simulated Annealing Previous methods: Issue: Solution (?) Define start point Minimise in some direction(s) Test & proceed Issue: Can get trapped in local minima Solution (?) Need to restart from different point

Simulated Annealing

Simulated Annealing Annealing Quenching Thermodynamic phenomenon ‘slow cooling’ of metals or crystalisation of liquids Atoms ‘line up’ & form ‘pure cystal’ / Stronger (metals) Slow cooling allows time for atoms to redistribute as they lose energy (cool) Low energy state Quenching ‘fast cooling’ Polycrystaline state

Simulated Annealing Simulate ‘slow cooling’ Based on Boltzmann probability distribution: k – constant relating energy to temperature System in thermal equilibrium at temperature T has distribution of energy states E All (E) states possible, but some more likely than others Even at low T, small probability that system may be in higher energy state

Simulated Annealing Use analogy of energy to RMSE As decrease ‘temperature’, move to generally lower energy state Boltzmann gives distribution of E states So some probability of higher energy state i.e. ‘going uphill’ Probability of ‘uphill’ decreases as T decreases

Implementation System changes from E1 to E2 with probability exp[-(E2-E1)/kT] If(E2< E1), P>1 (threshold at 1) System will take this option If(E2> E1), P<1 Generate random number System may take this option Probability of doing so decreases with T

Simulated Annealing T P= exp[-(E2-E1)/kT] – rand() - OK P= exp[-(E2-E1)/kT] – rand() - X T

Simulated Annealing Rate of cooling very important Coupled with effects of k exp[-(E2-E1)/kT] So 2xk equivalent to state of T/2 Used in a range of optimisation problems Not much used in Remote Sensing

(Artificial) Neural networks (ANN) Another ‘Natural’ analogy Biological NNs good at solving complex problems Do so by ‘training’ system with ‘experience’

(Artificial) Neural networks (ANN) ANN architecture

(Artificial) Neural networks (ANN) ‘Neurons’ have 1 output but many inputs Output is weighted sum of inputs Threshold can be set Gives non-linear response

(Artificial) Neural networks (ANN) Training Initialise weights for all neurons Present input layer with e.g. spectral reflectance Calculate outputs Compare outputs with e.g. biophysical parameters Update weights to attempt a match Repeat until all examples presented

(Artificial) Neural networks (ANN) Use in this way for canopy model inversion Train other way around for forward model Also used for classification and spectral unmixing Again – train with examples ANN has ability to generalise from input examples Definition of architecture and training phases critical Can ‘over-train’ – too specific Similar to fitting polynomial with too high an order Many ‘types’ of ANN – feedback/forward

(Artificial) Neural networks (ANN) In essence, trained ANN is just a (essentially) (highly) non-linear response function Training (definition of e.g. inverse model) is performed as separate stage to application of inversion Can use complex models for training Many examples in remote sensing Issue: How to train for arbitrary set of viewing/illumination angles? – not solved problem

Genetic (or evolutionary) algorithms (GAs) Another ‘Natural’ analogy Phrase optimisation as ‘fitness for survival’ Description of state encoded through ‘string’ (equivalent to genetic pattern) Apply operations to ‘genes’ Cross-over, mutation, inversion

Genetic (or evolutionary) algorithms (GAs) E.g. of BRDF model inversion: Encode N-D vector representing current state of biophysical parameters as string Apply operations: E.g. mutation/mating with another string See if mutant is ‘fitter to survive’ (lower RMSE) If not, can discard (die)

Genetic (or evolutionary) algorithms (GAs) General operation: Populate set of chromosomes (strings) Repeat: Determine fitness of each Choose best set Evolve chosen set Using crossover, mutation or inversion Until a chromosome found of suitable fitness

Genetic (or evolutionary) algorithms (GAs) Differ from other optimisation methods Work on coding of parameters, not parameters themselves Search from population set, not single members (points) Use ‘payoff’ information (some objective function for selection) not derivatives or other auxilliary information Use probabilistic transition rules (as with simulated annealing) not deterministic rules

Genetic (or evolutionary) algorithms (GAs) Example operation: Define genetic representation of state Create initial population, set t=0 Compute average fitness of the set Assign each individual normalised fitness value Assign probability based on this Using this distribution, select N parents Pair parents at random Apply genetic operations to parent sets generate offspring Becomes population at t+1 7. Repeat until termination criterion satisfied

Genetic (or evolutionary) algorithms (GAs) Flexible and powerful method Can solve problems with many small, ill-defined minima May take huge number of iterations to solve Not applied to remote sensing model inversions

Knowledge-based systems (KBS) Seek to solve problem by incorporation of information external to the problem Only RS inversion e.g. Kimes et al (1990;1991) VEG model Integrates spectral libraries, information from literature, information from human experts etc Major problems: Encoding and using the information 1980s/90s ‘fad’?

LUT Inversion Sample parameter space Calculate RMSE for each sample point Define best fit as minimum RMSE parameters Or function of set of points fitting to a certain tolerance Essentially a sampled ‘exhaustive search’

LUT Inversion Issues: May require large sample set Not so if function is well-behaved for many optical EO inversions In some cases, may assume function is locally linear over large (linearised) parameter range Use linear interpolation Being developed UCL/Swansea for CHRIS-PROBA May limit search space based on some expectation E.g. some loose VI relationship or canopy growth model or land cover map Approach used for operational MODIS LAI/fAPAR algorithm (Myneni et al)

LUT Inversion Issues: As operating on stored LUT, can pre-calculate model outputs Don’t need to calculate model ‘on the fly’ as in e.g. simplex methods Can use complex models to populate LUT E.g. of Lewis, Saich & Disney using 3D scattering models (optical and microwave) of forest and crop Error in inversion may be slightly higher if (non-interpolated) sparse LUT But may still lie within desirable limits Method is simple to code and easy to understand essentially a sort operation on a table

Summary Range of options for non-linear inversion ‘traditional’ NL methods: Powell, AMOEBA Complex to code though library functions available Can easily converge to local minima Need to start at several points Calculate canopy reflectance ‘on the fly’ Need fast models, involving simplifications Not felt to be suitable for operationalisation

Summary Simulated Annealing ANNs Can deal with local minima Slow Need to define annealing schedule ANNs Train ANN from model (or measurements) ANN generalises as non-linear model Issues of variable input conditions (e.g. VZA) Can train with complex models Applied to a variety of EO problems

Summary GAs Novel approach, suitable for highly complex inversion problems Can be very slow Not suitable for operationalisation KBS Use range of information in inversion Kimes VEG model Maximises use of data Need to decide how to encode and use information

Summary LUT Simple method Sort Used more and more widely for optical model inversion Suitable for ‘well-behaved’ non-linear problems Can operationalise Can use arbitrarily complex models to populate LUT Issue of LUT size Can use additional information to limit search space Can use interpolation for sparse LUT for ‘high information content’ inversion