
1 Machine learning, pattern recognition and statistical data modelling. Lecture 7: Neural networks, search and optimization. Coryn Bailer-Jones

2 Nonlinear mapping with a transfer function

3 Introducing interactions between inputs

4 Simultaneous regression: multiple outputs

5 Pictogram: multilayer perceptron (bias nodes not shown)

6 Notation

7 Training the network

8 Gradient descent

9 Back propagation: weight update equations

10 Regularization (weight decay)

11 MLPs for classification: cross-entropy error function

12 Cross-entropy for more than two classes

13 Linear outputs

14 Two hidden layers

15 Network flexibility/complexity
● Flexibility is controlled by the number of hidden nodes and the number of hidden layers
  – in theory one hidden layer is sufficient: with sufficiently large J, the network can approximate any continuous function to arbitrary accuracy (caveat: generalization ability)
  – in practice using two hidden layers means fewer weights
● Need sufficient training vectors to overdetermine the solution
  – NK error-function terms
  – (I+1)J + (J+1)K weights (see the weight-count sketch below)
  – but in practice the weights are not independent
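
A minimal numerical sketch of the weight count and of a single-hidden-layer forward pass. It is illustrative only: the sizes, the logistic hidden transfer function and the linear outputs are assumptions, not details taken from the lecture slides.

```python
import numpy as np

def n_weights(I, J, K):
    """Number of weights in a single-hidden-layer MLP with bias nodes:
    (I+1)*J input-to-hidden weights plus (J+1)*K hidden-to-output weights."""
    return (I + 1) * J + (J + 1) * K

def mlp_forward(x, W1, W2):
    """Forward pass: x is an I-vector, W1 has shape (J, I+1), W2 has shape
    (K, J+1). Logistic sigmoid hidden units, linear outputs (regression)."""
    x1 = np.append(x, 1.0)                 # append bias input
    z = 1.0 / (1.0 + np.exp(-(W1 @ x1)))   # J hidden activations
    z1 = np.append(z, 1.0)                 # append hidden-layer bias node
    return W2 @ z1                         # K linear outputs

I, J, K = 4, 8, 3
print(n_weights(I, J, K))                  # (4+1)*8 + (8+1)*3 = 67

rng = np.random.default_rng(0)
W1 = rng.normal(size=(J, I + 1))
W2 = rng.normal(size=(K, J + 1))
print(mlp_forward(rng.normal(size=I), W1, W2))
```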

16 Neural networks were originally developed for modelling neural function. They are no longer useful for this, and the biological analogy is neither useful nor relevant.

17 Optimization algorithms
● With gradient information
  – gradient descent
  – add second-derivative (Hessian) information: Newton, quasi-Newton, Levenberg-Marquardt
  – conjugate gradients (Hessian not explicitly calculated)
● Pure gradient methods get stuck in local minima; mitigations (see the random-restart sketch below):
  – random restart
  – committee/ensemble of models
  – momentum terms (non-gradient information)
● Without gradient information
  – simulated annealing
  – genetic algorithms
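
A hedged sketch of the first mitigation: plain gradient descent wrapped in random restarts, run on a toy multi-modal error surface. The error function, step size and restart count are illustrative assumptions, not parameters from the lecture.

```python
import numpy as np

def gradient_descent(grad, w0, eta=0.05, n_steps=1000):
    """Plain gradient descent driven by the gradient of the error function."""
    w = np.array(w0, dtype=float)
    for _ in range(n_steps):
        w -= eta * grad(w)
    return w

def random_restarts(f, grad, dim, n_restarts=20, seed=0):
    """Run gradient descent from several random starting points and keep the
    best minimum found -- a simple way of mitigating local minima."""
    rng = np.random.default_rng(seed)
    best_w, best_f = None, np.inf
    for _ in range(n_restarts):
        w = gradient_descent(grad, rng.uniform(-3.0, 3.0, dim))
        if f(w) < best_f:
            best_w, best_f = w, f(w)
    return best_w, best_f

# toy multi-modal error surface and its gradient
f = lambda w: np.sum(w ** 2) + 2.0 * np.sum(np.sin(3.0 * w))
grad = lambda w: 2.0 * w + 6.0 * np.cos(3.0 * w)
print(random_restarts(f, grad, dim=2))
```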

18 Local minima in optimization

19 Genetic algorithms

20 Photometric filter design problem
● design an optimal multiband filter system to estimate several stellar atmospheric parameters from spectra
  – subject to constraints
  – multiple conflicting demands on the filter system
  – manual design is complex and gives no idea of optimality
● cast as a mathematical optimization problem:
  – parametrize the filter system
  – establish a figure-of-merit of filter system performance
  – maximise this as a function of the filter system parameters
● see Bailer-Jones (2004)

21 Evolutionary algorithms
● population-based methods overcome local optima and permit a more efficient search of the parameter space
● 1 individual in the population = 1 candidate filter system
● Evolutionary Algorithms (EAs) use the principle of natural selection from biological evolution
  – Genetic Algorithms (GAs), Evolutionary Strategies (ESs), Evolutionary Programming (EP)
● genetic operators (a minimal loop is sketched below)
  – reproduction: recombination; mutation (exploration)
  – selection (exploitation)
● provides a stochastic (but not random) search
● population evolves towards the optimum (or optima)
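
A minimal selection-plus-mutation loop illustrating the EA idea. Recombination is omitted for brevity, and the toy fitness surface, population size and mutation scale are assumptions, not the HFD settings.

```python
import numpy as np

rng = np.random.default_rng(1)

def evolve(fitness, init_pop, n_generations=100, mut_sigma=0.1):
    """Minimal evolutionary-algorithm loop: fitness-proportional selection
    (with replacement) followed by Gaussian mutation of every individual."""
    pop = init_pop.copy()
    for _ in range(n_generations):
        fit = np.array([fitness(ind) for ind in pop])
        prob = fit / fit.sum()                                   # assumes fitness > 0
        idx = rng.choice(len(pop), size=len(pop), p=prob)        # selection
        pop = pop[idx] + rng.normal(0.0, mut_sigma, pop.shape)   # mutation
    best = max(pop, key=fitness)
    return best, fitness(best)

# toy problem: maximise a smooth two-parameter fitness surface
fitness = lambda x: np.exp(-np.sum((x - 1.5) ** 2))
init_pop = rng.uniform(-5.0, 5.0, size=(200, 2))
print(evolve(fitness, init_pop))
```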

22 Heuristic filter design (HFD) model
● figure-of-merit / fitness function:
  – solving a full regression model to test every filter system is too slow and unnecessary
  – instead, construct a measure of the ability of a filter system to maximally “separate” stars with a range of APs, represented by a (synthetic) grid
  – grid spans variations in T_eff, log g, [Fe/H] and A_V
  – use a simple instrument model to simulate counts and errors in candidate filter systems (a schematic version is sketched below)
● fixed instrument parameters and number of filters
● evolve the population and find the fittest filter system
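
A schematic stand-in for the “simple instrument model”: it is not the model of Bailer-Jones (2004). The integration of flux over transmission × QE and the Poisson-style error are assumptions made purely for illustration.

```python
import numpy as np

def simulate_counts(wavelength, flux, transmission, qe, t_frac, t_total=1.0, sky=0.0):
    """Schematic instrument model for one filter: photon counts are the source
    flux integrated over (filter transmission x detector QE), scaled by the
    fractional integration time t_frac of the total time t_total; the error is
    taken as Poisson noise on source-plus-sky counts. Units are illustrative."""
    dlam = np.gradient(wavelength)                 # wavelength bin widths
    counts = t_frac * t_total * np.sum(flux * transmission * qe * dlam)
    error = np.sqrt(counts + sky)
    return counts, error

lam = np.linspace(3000.0, 10000.0, 1000)                        # Angstrom
flux = np.ones_like(lam)                                        # flat test spectrum
trans = np.exp(-np.log(2.0) * ((lam - 5500.0) / 400.0) ** 4)    # one broad filter
qe = 0.8 * np.ones_like(lam)                                    # flat detector QE
print(simulate_counts(lam, flux, trans, qe, t_frac=0.2))
```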

23 Typical spectra used to calculate fitness
[Figure: synthetic spectra, flux density (area normalised) vs. wavelength / Å, illustrating variations in T_eff, log g, [Fe/H] and A_V]

24 HFD model (loop over generations)
1. initialise the population
2. simulate counts (and errors) from each star in each filter system
3. calculate the fitness of each filter system
4. select fitter filter systems (probability ∝ fitness)
5. mutate the filter system parameters and return to step 2

25 Filter system representation
Each filter system consists of I filters, each with 3 parameters:
  c  central wavelength
  w  half width at half maximum
  t  fractional integration time (of the total available for all filters)
Generalised Gaussian profile with g = 4:
  y = exp( −ln2 · [(λ − c)/w]^g )
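
The profile can be coded directly from the formula on the slide; the wavelength grid and example parameter values below are arbitrary.

```python
import numpy as np

def filter_profile(wavelength, c, w, g=4.0):
    """Generalised Gaussian transmission profile with half width at half
    maximum w: y = exp(-ln2 * |(lambda - c)/w|**g). g = 2 gives an ordinary
    Gaussian; g = 4 (used here) gives a flatter top and steeper wings."""
    return np.exp(-np.log(2.0) * np.abs((wavelength - c) / w) ** g)

lam = np.linspace(3000.0, 10000.0, 2000)               # wavelength grid in Angstrom
y = filter_profile(lam, c=5500.0, w=400.0)
print(y.max(), y[np.argmin(np.abs(lam - 5900.0))])     # ~1.0 at the centre, ~0.5 at c + w
```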

26 Fitness: SNR distance
SNR distance of star r from neighbour n, where
  p_i,n = photon counts in filter i for star n
  σ_i,n = expectation of the error in p_i,n
Photon counts (and errors) are area normalised, i.e. Σ_i p_i,n = 1 (a generalization of forming colours).
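
The slide lists the ingredients but the expression itself is not reproduced here. The sketch below assumes one plausible form, the separation of the area-normalised count vectors measured in units of the combined per-filter noise, and should not be read as the exact expression of Bailer-Jones (2004).

```python
import numpy as np

def area_normalise(p, sigma):
    """Normalise photon counts so that sum_i p_i = 1, scaling the errors
    accordingly (the generalisation of forming colours noted on the slide)."""
    s = p.sum()
    return p / s, sigma / s

def snr_distance(p_r, sigma_r, p_n, sigma_n):
    """Assumed form of the SNR distance between star r and neighbour n:
    the separation of their area-normalised count vectors measured in units
    of the combined noise in each filter."""
    return np.sqrt(np.sum((p_r - p_n) ** 2 / (sigma_r ** 2 + sigma_n ** 2)))
```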

27 Fitness: vector separation
For each source a and each AP j, find the nearest neighbours (NN) which differ only in j (“isovars”), e.g. b and c.
Calculate the angle α between the vectors d_a,b and d_a,c: nearer to 90° => better separation (less degeneracy).
Calculate the magnitude of the cross product: V_a,b,c = d_a,b d_a,c sin α
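
A direct transcription of the angle and cross-product magnitude for two nearest-neighbour difference vectors; the vector representation and names are assumptions for illustration.

```python
import numpy as np

def separation(d_ab, d_ac):
    """Angle alpha between the two nearest-neighbour difference vectors and
    the magnitude of their 'cross product' V = |d_ab| |d_ac| sin(alpha);
    alpha near 90 degrees means the two APs are well separated at source a."""
    norm_ab, norm_ac = np.linalg.norm(d_ab), np.linalg.norm(d_ac)
    cos_a = np.dot(d_ab, d_ac) / (norm_ab * norm_ac)
    alpha = np.arccos(np.clip(cos_a, -1.0, 1.0))
    V = norm_ab * norm_ac * np.sin(alpha)
    return np.degrees(alpha), V
```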

28 Fitness: final measure
Cross product: V_a,b,c = d_a,b d_a,c sin α
Introduce sensitivity to the APs: weight the APs to boost the significance of weak APs (especially [Fe/H] and log g). The per-neighbour term f_a,b,c combines V_a,b,c with the difference in AP j between a and c and the weight for AP j (the AP that differs between a and c).
Fitness = sum of f_a,b,c over all NNs for each source and over all sources in the grid.

29 Genetic operators
Selection
Individuals from the parent population (generation g) are selected (with replacement) with probability proportional to fitness. Elitism is used to guarantee selection of the best few. => intermediate population (IP)
Mutation
The parameters of each individual {c_i, w_i, t_i} in the IP are mutated with a finite probability:
  c_i(g+1) = c_i(g) + N(0, σ_c)
  w_i(g+1) = w_i(g) [1 + N(0, σ_w)]
  t_i(g+1) = t_i(g) [1 + N(0, σ_t)]
=> children (next generation). A schematic mutation step is sketched below.
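
A schematic mutation step following the update equations above: additive Gaussian perturbation of the central wavelengths, multiplicative perturbation of the widths and fractional integration times. The mutation probability and the σ values are illustrative, not those of the HFD runs; the final re-normalisation of t reflects the normalization constraint on the following slide.

```python
import numpy as np

rng = np.random.default_rng(0)

def mutate(c, w, t, sigma_c=50.0, sigma_w=0.05, sigma_t=0.05, p_mut=0.3):
    """One mutation step: each parameter of the filter system mutates with
    probability p_mut; c additively, w and t multiplicatively."""
    m = rng.random(c.shape) < p_mut
    c = np.where(m, c + rng.normal(0.0, sigma_c, c.shape), c)
    m = rng.random(w.shape) < p_mut
    w = np.where(m, w * (1.0 + rng.normal(0.0, sigma_w, w.shape)), w)
    m = rng.random(t.shape) < p_mut
    t = np.where(m, t * (1.0 + rng.normal(0.0, sigma_t, t.shape)), t)
    return c, w, t / t.sum()   # keep the total fractional integration time at 1
```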

30 HFD application
Limits of the search domain:
● λ limits from the CCD/instrument QE
● 80 Å < h < 4000 Å (h is the half width at half maximum, w)
● no limits on t (just normalization)

31 BBP-5 system: fitness evolution
[Plot: maximum, minimum, mean and median fitness as a function of generation]

32 BBP-5 system: filter system evolution
Evolution of all filter system parameters (200 × 5 values of each parameter type at each generation)

33 BBP-5 system
red = filter transmission × fractional integration time
blue = instrument × CCD QE (scaled)
● broad filters (bright limit: fitness ∝ sum of photons collected)
● overlapping filters
● 4 effective filters (1 almost “turned off”)

34 12-filter MBP: fitness evolution
[Plot: maximum, minimum, mean and median fitness as a function of generation]

35 12-filter MBP: filter system evolution
Evolution of all filter system parameters (200 × 12 values of each parameter type at each generation)

36 12-filter MBP: optimized systems
[Plot: filter transmission multiplied by fractional integration time]

37 Summary
● feedforward neural networks
  – multilayer perceptron
  – radial basis function network
  – flexible, nonlinear regression/classification models
  – error functions: RSS, cross-entropy
  – regularization (weight decay)
  – backpropagation algorithm
● optimization/search
  – gradient methods: problem of local minima
  – stochastic methods: simulated annealing, genetic algorithms

