Machine learning, pattern recognition and statistical data modelling. Lecture 7: Neural networks, search and optimization. C.A.L. Bailer-Jones.

Presentation transcript:

1. Machine learning, pattern recognition and statistical data modelling. Lecture 7: Neural networks, search and optimization. Coryn Bailer-Jones

2. Nonlinear mapping with a transfer function

3. Introducing interactions between inputs

4. Simultaneous regression: multiple outputs

5. Pictogram: multilayer perceptron (bias nodes not shown)

6. Notation

7. Training the network

8. Gradient descent
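(The update rule itself is not reproduced in this transcript; the standard gradient-descent step, which the back-propagation slide below specialises to the network weights, is w_new = w_old − η ∂E/∂w, where η is the learning rate and E the error function.)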

9. Back-propagation: weight update equations
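The weight-update equations are shown only as images on the original slide, so the following is a minimal sketch (not the lecture's own code) of the standard back-propagation updates for a single-hidden-layer MLP with sigmoid hidden units, linear outputs and an RSS error; the names train_mlp, W1, W2 and the learning rate eta are assumptions for illustration.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_mlp(X, T, J=5, eta=0.01, epochs=1000, seed=0):
    """X: (N, I) inputs, T: (N, K) targets; J hidden nodes; eta learning rate."""
    rng = np.random.default_rng(seed)
    N, I = X.shape
    K = T.shape[1]
    # weight matrices include a bias row: shapes (I+1, J) and (J+1, K)
    W1 = rng.normal(0.0, 0.1, size=(I + 1, J))
    W2 = rng.normal(0.0, 0.1, size=(J + 1, K))
    Xb = np.hstack([X, np.ones((N, 1))])              # append bias input
    for _ in range(epochs):
        # forward pass
        Z = sigmoid(Xb @ W1)                          # hidden activations (N, J)
        Zb = np.hstack([Z, np.ones((N, 1))])          # append bias unit
        Y = Zb @ W2                                   # linear outputs (N, K)
        # backward pass: gradients of the RSS error
        delta_out = Y - T                             # output errors (N, K)
        delta_hid = (delta_out @ W2[:-1].T) * Z * (1.0 - Z)   # back-propagated (N, J)
        # gradient-descent weight updates
        W2 -= eta * (Zb.T @ delta_out) / N
        W1 -= eta * (Xb.T @ delta_hid) / N
    return W1, W2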

10. Regularization (weight decay)
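(The regularized error is not written out in this transcript; the usual weight-decay form is E_reg = E + (λ/2) Σ_w w², which adds an extra −η λ w term to each weight update and so shrinks weights towards zero unless the data support them.)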

11. MLPs for classification: cross-entropy error function

12. Cross-entropy for more than two classes
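The slide's equations are not reproduced in this transcript; as a reference, here is the standard multi-class cross-entropy with softmax outputs and one-of-K coded targets, E = −Σ_n Σ_k t_nk ln y_nk (a sketch, not the lecture's own code; the function names are assumptions).

import numpy as np

def softmax(a):
    a = a - a.max(axis=1, keepdims=True)          # shift for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(A, T, eps=1e-12):
    """A: (N, K) output-layer activations, T: (N, K) one-of-K targets."""
    Y = softmax(A)                                # class probabilities
    return -np.sum(T * np.log(Y + eps))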

13. Linear outputs

14. Two hidden layers

15. Network flexibility/complexity
● Flexibility is controlled by the number of hidden nodes and the number of hidden layers
  – in theory one hidden layer is sufficient: with sufficiently large J, the network can approximate any continuous function to arbitrary accuracy (caveat: generalization ability)
  – in practice using two hidden layers means fewer weights
● Need sufficient training vectors to overdetermine the solution
  – NK error terms (N training vectors × K outputs)
  – (I+1)J + (J+1)K weights (see the helper below)
  – but in practice the weights are not independent
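To make the weight count quoted above concrete, a small illustrative helper (the function name is an assumption):

def n_weights(I, J, K):
    # one-hidden-layer network: (I+1)J input-to-hidden plus (J+1)K hidden-to-output weights
    return (I + 1) * J + (J + 1) * K

print(n_weights(8, 10, 4))   # e.g. an 8-10-4 network has 134 weights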

16. Originally developed for modelling neural function; no longer useful for this, and the biological analogy is neither useful nor relevant here.

17. Optimization algorithms
● With gradient information
  – gradient descent
  – add second derivative (Hessian): Newton, quasi-Newton, Levenberg-Marquardt
  – conjugate gradients (Hessian not explicitly calculated)
● Pure gradient methods get stuck in local minima; remedies include
  – random restart (see the sketch below)
  – committee/ensemble of models
  – momentum terms (non-gradient info.)
● Without gradient information
  – simulated annealing
  – genetic algorithms
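A minimal sketch of two of the remedies listed above, momentum and random restarts (an assumed illustration, not code from the lecture); E and grad_E stand for the error function and its gradient with respect to the weight vector.

import numpy as np

def gd_momentum(grad_E, w0, eta=0.01, mu=0.9, steps=1000):
    """Gradient descent with a momentum term: retain a fraction mu of the previous step."""
    w, v = w0.copy(), np.zeros_like(w0)
    for _ in range(steps):
        v = mu * v - eta * grad_E(w)
        w = w + v
    return w

def random_restart(E, grad_E, n_dim, n_restarts=10, seed=0):
    """Run several descents from random starting weights and keep the best minimum found."""
    rng = np.random.default_rng(seed)
    runs = [gd_momentum(grad_E, rng.normal(size=n_dim)) for _ in range(n_restarts)]
    return min(runs, key=E)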

18. Local minima in optimization

19. Genetic algorithms

Photometric filter design problem
● design an optimal multiband filter system to estimate several stellar atmospheric parameters from spectra
  – subject to constraints
  – multiple conflicting demands on the filter system
  – manual design is complex, and gives no idea of optimality
● cast as a mathematical optimization problem:
  – parametrize the filter system
  – establish a figure-of-merit of filter system performance
  – maximise this as a function of the filter system parameters
● see Bailer-Jones (2004)

Evolutionary algorithms
● population-based methods overcome local optima and permit a more efficient search of the parameter space
● 1 individual in the population = 1 candidate filter system
● Evolutionary Algorithms (EAs) use the principle of natural selection from biological evolution
  – Genetic Algorithms (GAs), Evolutionary Strategies (ESs), Evolutionary Programming (EP)
● genetic operators
  – reproduction: recombination; mutation (exploration)
  – selection (exploitation)
● provides a stochastic (but not random) search
● population evolves towards the optimum (or optima)
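A generic evolutionary-algorithm skeleton illustrating the loop described above (an assumed sketch, not the HFD code): fitness-proportional selection, recombination and mutation acting on a population of candidate solutions.

import numpy as np

def evolve(init_pop, fitness, recombine, mutate, n_generations=100, seed=0):
    """Generic EA loop. fitness must return positive values; recombine(a, b) and
    mutate(child, rng) are problem-specific operators supplied by the caller."""
    rng = np.random.default_rng(seed)
    pop = list(init_pop)
    for _ in range(n_generations):
        f = np.array([fitness(ind) for ind in pop], dtype=float)
        p = f / f.sum()                               # selection probability ~ fitness
        children = []
        while len(children) < len(pop):
            a, b = rng.choice(len(pop), size=2, p=p)  # select two parents (exploitation)
            children.append(mutate(recombine(pop[a], pop[b]), rng))  # exploration
        pop = children
    return max(pop, key=fitness)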

Heuristic filter design (HFD) model
● figure-of-merit / fitness function:
  – solving a full regression model to test every filter system is too slow and unnecessary
  – instead, construct a measure of the ability of a filter system to maximally “separate” stars with a range of APs (atmospheric parameters), represented by a (synthetic) grid
  – grid shows variation in Teff, log g, [Fe/H], A_V
  – use a simple instrument model to simulate counts and errors in candidate filter systems
● fixed instrument parameters and number of filters
● evolve the population and find the fittest filter system

Typical spectra used to calculate fitness (figure: area-normalised flux density vs. wavelength / Å, with example spectra varied in A_V, log g, Teff and [Fe/H])

HFD model (evolutionary loop):
1. initialise the population
2. simulate counts (and errors) from each star in each filter system
3. calculate the fitness of each filter system
4. select the fitter filter systems (probability ∝ fitness)
5. mutate the filter system parameters, then return to step 2

Filter system representation
Each filter system consists of I filters, each with 3 parameters:
  c = central wavelength
  w = half width at half maximum
  t = fractional integration time (of the total available for all filters)
Generalised Gaussian profile with g = 4:
  y = exp( −ln2 [(λ − c)/w]^g )
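The profile quoted above, written out as code (wavelengths in the same units as c and w; with g = 4 the profile is quasi-rectangular, and w is the half width at half maximum since y = 0.5 at |λ − c| = w):

import numpy as np

def filter_profile(wavelength, c, w, g=4):
    # generalised Gaussian transmission profile
    return np.exp(-np.log(2.0) * np.abs((wavelength - c) / w) ** g)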

Fitness: SNR distance
SNR distance of star r from neighbour n, where
  p_i,n = photon counts in filter i for star n
  σ_i,n = expectation of the error in p_i,n
Photon counts (and errors) are area normalised, i.e. Σ_i p_i,n = 1 (a generalization of forming colours).
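The distance equation itself appears only as an image on the original slide; the form below is an assumption consistent with the definitions given above: count differences divided by their combined expected errors, summed in quadrature.

import numpy as np

def snr_distance(p_r, p_n, sigma_r, sigma_n):
    """p_*: area-normalised photon counts per filter; sigma_*: expected errors (assumed form)."""
    return np.sqrt(np.sum((p_r - p_n) ** 2 / (sigma_r ** 2 + sigma_n ** 2)))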

Fitness: vector separation
For each source a and each AP j, find the nearest neighbours (NNs) which differ from a only in j (“isovars”), e.g. b and c.
Calculate the angle θ_a between the vectors d_a,b and d_a,c: the nearer to 90°, the better the separation (less degeneracy).
Calculate the magnitude of the cross product: V_a,b,c = d_a,b d_a,c sin θ_a
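A small sketch of this separation measure (an assumed illustration): given the difference vectors from source a to its isovar neighbours b and c, compute the angle between them and the cross-product magnitude V_a,b,c = d_a,b d_a,c sin θ_a.

import numpy as np

def vector_separation(d_ab, d_ac):
    """d_ab, d_ac: difference vectors from a to b and from a to c in filter space
    (in the HFD model their components would be error-scaled count differences,
    so that the vector norms equal the SNR distances)."""
    norm_ab, norm_ac = np.linalg.norm(d_ab), np.linalg.norm(d_ac)
    cos_theta = np.dot(d_ab, d_ac) / (norm_ab * norm_ac)
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    V = norm_ab * norm_ac * np.sin(theta)             # V_a,b,c = d_ab d_ac sin(theta_a)
    return theta, V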

Fitness: final measure
Cross product: V_a,b,c = d_a,b d_a,c sin θ_a
Introduce sensitivity to the APs: add a weighting of the APs to boost the significance of the weak APs (especially [Fe/H] and log g), based on the difference in AP j between a and c and a weight for AP j (the AP which differs between a and c), giving a weighted term f_a,b,c.
Fitness = sum of f_a,b,c over all NNs for a source and over all sources in the grid.

Genetic operators
Selection: individuals from the parent population (generation g) are selected (with replacement) with probability proportional to fitness. Elitism is used to guarantee selection of the best few. => intermediate population (IP)
Mutation: the parameters of each individual {c_i, w_i, t_i} in the IP are mutated with a finite probability:
  c_i(g+1) = c_i(g) + N(0, σ_c)
  w_i(g+1) = w_i(g) [1 + N(0, σ_w)]
  t_i(g+1) = t_i(g) [1 + N(0, σ_t)]
=> children (next generation)
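A sketch of the two operators as written on the slide (the sigma values, mutation probability p_mut and elite size are free parameters chosen here only for illustration): fitness-proportional selection with elitism, then additive Gaussian mutation of the central wavelengths and multiplicative Gaussian mutation of the widths and fractional integration times.

import numpy as np

def select(pop, fitness, n_elite=2, rng=None):
    """Fitness-proportional selection with elitism. pop: list of filter systems,
    each an array with one (c, w, t) row per filter; fitness: positive values."""
    if rng is None:
        rng = np.random.default_rng()
    f = np.asarray(fitness, dtype=float)
    order = np.argsort(f)[::-1]
    elite = [pop[i] for i in order[:n_elite]]               # keep the best few
    drawn = rng.choice(len(pop), size=len(pop) - n_elite,
                       replace=True, p=f / f.sum())         # probability ~ fitness
    return elite + [pop[i] for i in drawn]                  # intermediate population

def mutate(system, sigma_c=50.0, sigma_w=0.1, sigma_t=0.1, p_mut=0.1, rng=None):
    """Mutate each filter's parameters with probability p_mut, following the slide's equations."""
    if rng is None:
        rng = np.random.default_rng()
    child = system.copy()
    for i in range(child.shape[0]):                          # one (c, w, t) row per filter
        if rng.random() < p_mut:
            child[i, 0] += rng.normal(0.0, sigma_c)          # c -> c + N(0, sigma_c)
            child[i, 1] *= 1.0 + rng.normal(0.0, sigma_w)    # w -> w [1 + N(0, sigma_w)]
            child[i, 2] *= 1.0 + rng.normal(0.0, sigma_t)    # t -> t [1 + N(0, sigma_t)]
    return child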

HFD application
Limits of the search domain:
● λ limits from the CCD/instrument QE
● 80 Å < w < 4000 Å
● no limits on t (just normalization)

BBP-5 system: fitness evolution (figure: maximum, minimum, mean and median fitness as a function of generation)

BBP-5 system: filter system evolution (figure: evolution of all filter system parameters; 200 × 5 values for each parameter type at each generation)

BBP-5 system (figure: red = filter transmission × fractional integration time; blue = instrument × CCD QE, scaled)
● broad filters (bright limit: fitness ∝ sum of photons collected)
● overlapping filters
● 4 effective filters (1 almost “turned off”)

12-filter MBP: fitness evolution (figure: maximum, minimum, mean and median fitness as a function of generation)

12-filter MBP: filter system evolution (figure: evolution of all filter system parameters; 200 × 12 values for each parameter type at each generation)

12-filter MBP: optimized systems (figure: transmission multiplied by fractional integration time)

37. Summary
● feedforward neural networks
  – multilayer perceptron
  – radial basis function network
  – flexible, nonlinear regression/classification models
  – error function: RSS, cross-entropy
  – regularization (weight decay)
  – backpropagation algorithm
● optimization/search
  – gradient methods: problem of local minima
  – stochastic methods: simulated annealing, genetic algorithms