" The Maximum Likelihood Problem and Fitting the Sagittarius Dwarf Tidal Stream " Matthew Newby Astronomy Seminar RPI Oct. 22,
2 Overview: Introduction; The Sagittarius Stream; SDSS; Locating Maximum Likelihood; Methods: Differential Evolution, Monte-Carlo Markov-Chain, Gradient Descent, Genetic Search, Particle Swarm; Revisit the Sagittarius Stream; BOINC Overview; Current and Future Work
3 Introduction: Modern astronomy is no longer staring through a telescope – automated surveys produce large data sets. Measurements carry errors, so statistical methods are needed, and fast, accurate computer routines are needed to analyze all of this information! Images: NASA.gov; Wikimedia Commons.
4 The Sloan Digital Sky Survey (SDSS): 230+ million objects over 8,400 square degrees of sky, covering a large percentage of the north galactic cap; very little data in the galactic plane (too much dust); several hundred thousand stars. Image: sdss.org.
The Sagittarius Dwarf Tidal Stream 5: The Sagittarius Dwarf Galaxy is merging with the Milky Way; the dwarf is being tidally disrupted by the Milky Way, creating long tails. Mapping the Tidal Stream will: provide information on the matter distribution in the Milky Way, and provide constraints on the Galactic Halo. Image (above): Ibata et al. 1997, AJ. Image (left): David Martinez-Delgado (MPIA) & Gabriel Perez (IAC).
6 The Milky Way: schematic showing the Bulge, Thin Disk, Thick Disk, and Halo (~30 kiloparsecs, or ~100,000 light-years), with the Sun, the Sagittarius Dwarf Galaxy, its Tidal Stream, and the Data Wedge marked.
7 Data Stripe: Stripe 82 (southern galactic cap); F-turnoff stars on the H-R diagram. Image: Newberg & Yanny 2006, JoP Conference Series (modified by N. Cole).
8 Sagittarius Stream Model (Cole, N.): Assume the stream is a cylinder, with its radial drop-off given by a Gaussian distribution. Background distribution: 2 parameters, r0 and q. Each stream: 6 parameters, ε, μ, r, θ, φ, σ. At least 8 parameters in the search – an 8-dimensional solution space!
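As an illustration of the stream term only, here is a minimal Python sketch (not the thesis code) of a cylindrical density with a Gaussian radial drop-off. The function and argument names are hypothetical, and the background term is omitted since its exact form is not reproduced on this slide.

```python
import numpy as np

def stream_density(point, axis_point, axis_dir, sigma):
    """Toy stream density: Gaussian fall-off with perpendicular distance
    from the cylinder (stream) axis. Hypothetical helper, not the actual
    model code; the background component is not included."""
    axis_dir = np.asarray(axis_dir, float)
    axis_dir = axis_dir / np.linalg.norm(axis_dir)
    offset = np.asarray(point, float) - np.asarray(axis_point, float)
    # component of the offset perpendicular to the stream axis
    d_perp = np.linalg.norm(offset - np.dot(offset, axis_dir) * axis_dir)
    return np.exp(-d_perp**2 / (2.0 * sigma**2))
```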
9 Maximum Likelihood: a Bayesian method. Must assume a prior – a model explaining the data – and find the parameters that are most likely for a data set, given the prior. By the law of large numbers, large data sets can be assumed to have normally distributed data points. Find the probability that each data point lies in the given distribution; then the likelihood is $\mathcal{L}(Q \mid D) = \prod_i P(D_i \mid Q)$.
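In code, this product is usually computed as a sum of logarithms to avoid numerical underflow. A minimal sketch, where `point_probability` is a hypothetical stand-in for the per-star probability P(D_i | Q):

```python
import numpy as np

def log_likelihood(params, data, point_probability):
    """Sum of log P(D_i | Q) over all data points D_i given parameters Q.
    `point_probability` is a placeholder callable for the per-point
    probability under the assumed model (prior)."""
    probs = np.array([point_probability(d, params) for d in data])
    return np.sum(np.log(probs))
```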
10 Computational Algorithms Overview: Set up the problem – a parameter space (all allowed values of the parameters), a likelihood evaluator for a given set of parameters, an evaluation method that moves through parameter space efficiently, and end conditions (stop when the change in the best likelihood is below a limit, or after a predefined number of iterations). Problems: the likelihood calculation is usually time-consuming, and local maxima must be avoided in order to find the global maximum. What is the best method?
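All of the methods that follow share this outer structure. A schematic Python sketch (the names `evaluate` and `propose_move` are illustrative placeholders, not code from the talk):

```python
def optimize(evaluate, propose_move, start, max_iters=1000, tol=1e-6):
    """Generic search loop: repeatedly propose moves in parameter space,
    keep the best likelihood seen, and stop when the improvement in the
    best falls below `tol` or `max_iters` is reached."""
    best_params, best_like = start, evaluate(start)
    for _ in range(max_iters):
        candidate = propose_move(best_params)
        like = evaluate(candidate)
        if like > best_like:
            improvement = like - best_like
            best_params, best_like = candidate, like
            if improvement < tol:   # change in best is below the limit
                break
    return best_params, best_like
```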
11 Computational Methods: No Free Lunch (David H. Wolpert, William G. Macready). Analogy: local eateries – Burger Palace, Gourmet Salads, No Carbs at All – have the same menus but random prices, and the diners Rosencrantz, Ophelia, and Guildenstern have different needs (one only eats meat, one is vegetarian, one is on a low-carb diet, and poor students just want cheap food). Prices differ by restaurant, so not everyone can eat cheaply: one restaurant cannot be the best solution for every person (problem). Likewise, one solution method (algorithm) will not be ideal for all problems – choose the best method for the job at hand!
Conjugate Gradient Descent (CGD) 12: Calculates the gradient of the likelihood surface with respect to each parameter and moves towards the best likelihood using a line search; the conjugate-gradient variant uses the gradient of the previous step to converge faster. It requires many likelihood calculations per move and, unfortunately, may end at a local maximum, so it must be run from several different starting points to find the global best. With L the likelihood function, Q_i the ith parameter, and h_i the step size for the ith parameter, the gradient G can be estimated by finite differences: $G_i = \partial L / \partial Q_i \approx \frac{L(Q_i + h_i) - L(Q_i - h_i)}{2 h_i}$. Figure: 1-dimensional gradient descent – likelihood vs. position, showing the gradient at a location, the best solution, and a local maximum.
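A sketch of a central-difference gradient consistent with the definitions on this slide (L, Q, h_i); the conjugate-direction bookkeeping of full CGD is omitted, and the function names are illustrative:

```python
import numpy as np

def gradient(L, Q, h):
    """Central-difference estimate of dL/dQ_i for each parameter, using
    per-parameter step sizes h_i. Each component costs two likelihood
    evaluations, which is why CGD needs many calculations per move."""
    Q, h = np.asarray(Q, float), np.asarray(h, float)
    G = np.zeros_like(Q)
    for i in range(len(Q)):
        step = np.zeros_like(Q)
        step[i] = h[i]
        G[i] = (L(Q + step) - L(Q - step)) / (2.0 * h[i])
    return G
```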
13 Line Search: evaluates two points in the direction of the gradient, one a distance d away and the other 2d (d is usually related to the gradient, i.e. the slope). If the middle point does not have a better likelihood than the end points, d is doubled and the process is repeated; if the middle point is higher, it becomes the starting point for the next CGD iteration. The line search lets the algorithm reach the best likelihood efficiently. Example (left figure): the first search does not find a better likelihood at the middle point (yellow), so the distance is doubled; this time the new middle point (red) has the best likelihood, and the next iteration of CGD starts there.
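A sketch of the doubling line search described above, assuming maximization; the starting step `d` and the cap on the number of doublings are illustrative choices:

```python
def line_search(L, start, direction, d, max_doublings=50):
    """Evaluate points at distances d and 2d along `direction` from
    `start`; if the middle point is not better than both ends, double d
    and repeat. The winning middle point seeds the next CGD step."""
    for _ in range(max_doublings):
        middle = start + d * direction
        end = start + 2.0 * d * direction
        if L(middle) > L(start) and L(middle) > L(end):
            return middle      # becomes the start of the next CGD iteration
        d *= 2.0               # middle point not best: double the distance
    return start               # give up if no better middle point is found
```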
14 Monte-Carlo Markov-Chain (MCMC): a random walk method. It samples parameter space well, automatically produces an error distribution, and is easy to code, but it is sensitive to running time and step size and never truly converges. Metropolis-Hastings: take a step in each direction (parameter), with step size and direction drawn at random from a normal distribution; if the new location has a better likelihood, move to it; if it has a worse likelihood, there is still a chance of moving to it. Figure: the trajectory of a 1000-step MCMC straight-line fit (top) and the distribution in b (bottom).
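A minimal Metropolis-Hastings sketch following these steps, assuming `log_like` returns the log-likelihood of a parameter vector; the names are illustrative:

```python
import numpy as np

def mcmc(log_like, start, step_sizes, n_steps=1000, rng=None):
    """Random-walk Metropolis-Hastings: propose a normally distributed
    step in every parameter; always accept improvements, and accept a
    worse position with probability exp(change in log-likelihood)."""
    rng = np.random.default_rng() if rng is None else rng
    x, ll = np.asarray(start, float), log_like(start)
    chain = [x.copy()]
    for _ in range(n_steps):
        proposal = x + rng.normal(0.0, step_sizes, size=x.shape)
        ll_new = log_like(proposal)
        if ll_new > ll or rng.random() < np.exp(ll_new - ll):
            x, ll = proposal, ll_new
        chain.append(x.copy())
    return np.array(chain)   # the samples trace out the error distribution
```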
15 Genetic Search: inspired by natural selection. Start with multiple individuals (positions) in parameter space and evaluate the likelihood for each. Remove the individuals with the worst likelihoods and replace them with children of the remaining individuals (parents); parents can be chosen randomly or from the best likelihoods. Create children through crossover and mutation – crossover: a child inherits the parameters of multiple parents, either by averaging the parents' parameters or by inheriting select parameters from each parent; mutation: replace a parameter with a new, randomly generated one. Repeat until the end conditions are met; a sketch of one such loop follows.
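A compact sketch of one possible version of this loop (crossover by averaging two random parents, mutation by re-drawing a parameter uniformly within the search bounds); all names, rates, and population sizes are illustrative:

```python
import numpy as np

def genetic_search(L, bounds, pop_size=20, n_gens=100, mutate_p=0.1, rng=None):
    """Each generation: keep the better half of the population and refill
    it with children made by averaging two parents' parameters, with a
    chance of replacing each parameter by a random value (mutation)."""
    rng = np.random.default_rng() if rng is None else rng
    lo, hi = np.asarray(bounds, float).T
    pop = rng.uniform(lo, hi, size=(pop_size, len(lo)))
    for _ in range(n_gens):
        fitness = np.array([L(ind) for ind in pop])
        survivors = pop[np.argsort(fitness)[pop_size // 2:]]  # best half
        children = []
        while len(survivors) + len(children) < pop_size:
            pa, pb = survivors[rng.integers(len(survivors), size=2)]
            child = 0.5 * (pa + pb)                      # crossover
            mask = rng.random(len(child)) < mutate_p
            child[mask] = rng.uniform(lo, hi)[mask]      # mutation
            children.append(child)
        pop = np.vstack([survivors, children])
    fitness = np.array([L(ind) for ind in pop])
    return pop[np.argmax(fitness)]
```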
Differential Evolution 16: An individual moves according to the weighted difference between the locations of two parent individuals; if the new position has a worse likelihood, the individual does not move. Parents may be random or chosen from the population best, and multiple pairs of parents may be used (averaging over the differences). Figure: difference vector and change in position (an X marks "no change"; the center is the global best).
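A sketch of a single differential-evolution move as described above, using two randomly chosen parents; the weighting factor F is a common but illustrative choice, since the talk does not give its value:

```python
import numpy as np

def de_step(L, population, index, F=0.8, rng=None):
    """Move individual `index` of a 2-D numpy `population` array by the
    weighted difference between two randomly chosen parents; keep the
    move only if the likelihood improves."""
    rng = np.random.default_rng() if rng is None else rng
    x = population[index]
    others = [i for i in range(len(population)) if i != index]
    a, b = population[rng.choice(others, size=2, replace=False)]
    candidate = x + F * (a - b)            # weighted difference vector
    if L(candidate) > L(x):
        population[index] = candidate      # accept the better position
    # otherwise the individual does not move
    return population
```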
Particle-Swarm Optimization 17: Physically intuitive – based on animal behavior. Particles have velocities, with forces pulling them towards their personal best and the global best particle. Position (x) change at step t: $v_{t+1} = w\,v_t + c_1\,\mathrm{rand}()\,(p - x_t) + c_2\,\mathrm{rand}()\,(g - x_t)$ and $x_{t+1} = x_t + v_{t+1}$, where w, c1, c2 are weighting parameters, p is the personal best, g is the global best, and rand() is a random number. Figure: a particle in parameter space with its velocity and the pulls toward the global best and personal best.
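A sketch of one particle's update following the equation above; the values of w, c1, and c2 are illustrative, not from the talk:

```python
import numpy as np

def pso_step(x, v, p_best, g_best, w=0.7, c1=1.5, c2=1.5, rng=None):
    """One particle-swarm update: inertia (w) plus random pulls toward the
    particle's personal best p_best and the swarm's global best g_best."""
    rng = np.random.default_rng() if rng is None else rng
    v_new = (w * v
             + c1 * rng.random() * (p_best - x)
             + c2 * rng.random() * (g_best - x))
    return x + v_new, v_new   # new position and new velocity
```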
18 BOINC – Berkeley Open Infrastructure for Network Computing. Users volunteer spare processor and graphics-card time to the project; the computation is massively parallel, and graphics-processor technology has created a large increase in processing power. The project is now the #2 ranked BOINC project. You can help, too! Stats (total / active): Users 37,251 / 16,010; Hosts 79,023 / 25,101; Teams 1,…; Countries …; Total credit 9,302,434,280; Recent average credit (RAC) 52,731,529; average floating-point rate 527,315.3 GigaFLOPS (about 527 TeraFLOPS).
19 Separation: Stripe 82 – figure panels showing Sgr Stream stars and non-Sgr Stream stars.
20 Conclusions: Modern astronomy produces large data sets The Maximum Likelihood method is ideal for analyzing this data Powerful computer algorithms exist to perform MLE Mapping the Sagittarius Stream is possible by using these methods
21 Credits: The Sloan Digital Sky Survey; BOINC.com; Prof. Heidi Newberg, Rensselaer Polytechnic Institute; Nathan Cole, Maximum Likelihood Fitting of Tidal Streams with Applications to the Sagittarius Dwarf Tidal Tails (PhD thesis, Rensselaer Polytechnic Institute, 2008); Travis Desell, Aysnchronous [sic] Global Optimization for Massively Distributed Computing (PhD candidacy document, 2009); Shakespeare et al., Hamlet.
22 3-stream search: