Data Analysis Through Segmentation: Bayesian Blocks and Beyond
Space Science Division, NASA Ames Research Center
Collaborators:
Brad Jackson, Mathematics Department, San Jose State University
Jay Norris, NASA Goddard Space Flight Center
Mahmoud K. Quweider, U. Texas, Brownsville
Astrostatistics Workshop, High Energy Astrophysics Division, American Astronomical Society
September 10, 2004

● Overview of Bayesian Blocks: data modes, model fitness, model optimization
● Sample results


From Data to Astronomical Goals
Input data: measurements distributed in some space
● sequential data (time series, spectra, ...)
● images, photon maps
● redshift surveys
● higher-dimensional data
Intermediate product: estimate of signal, image, density, ...
End goal: estimates of values of scientifically relevant quantities
Not Important

From Data to Astronomical Goals
Exploratory analysis: little prior information, few assumptions
Use the simplest possible nonparametric model: piecewise constant (allows exact calculation of marginalized likelihoods)
Take account of (intrinsic):
● observational noise
● exposure variations
● arbitrary sampling, gaps
● point spread functions (1D, 2D)
Just say no to (constructs):
● pretty pictures; smoothing; continuous representations
● bins or pixels (unless the raw data are in this form)
● methods tuned for specific structures, e.g. beamlets
● resampling
● sensitivity to some global structures, e.g. periodic

Sequential Data Types
Measurements at arbitrary times (known error distribution)
● instantaneous
● averages over finite interval, no overlap
● averages over finite interval, overlap (CMB spatial power spectra)
Event (point) data
● one event per time tag (0-1 process) (BATSE time-tagged data)
● duplicate time tags OK
Counts of events in bins
● evenly spaced bins
● arbitrary bins
Time to accumulate a fixed number of events (BATSE time-to-spill)

DATA CELLS: Definition
Data space: the set of possible measurements in some experiment
Data cell: a data structure representing an individual measurement
For a segmented model, the cells must contain all the information needed to compute the model cost function. Data cells usually:
● are in one-to-one correspondence with the measurements
● partition the entire data space
● do not overlap each other
● leave no gaps between cells
● contain information on adjacency to other cells
... but exceptions to all of these are possible.
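A minimal Python sketch of a data cell for event data, assuming (hypothetically) that each cell stores an event count, a size, and the indices of its adjacent cells. The names `DataCell` and `block_statistics` are illustrative, not from the talk. The point it shows is the one in the definition: if each cell carries enough information, a block's cost inputs come from simply summing over its cells.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DataCell:
    """Hypothetical data-cell record for event data."""
    n_events: int                  # number of events in this cell
    size: float                    # cell extent, e.g. ticks or a Voronoi interval length
    neighbors: List[int] = field(default_factory=list)  # adjacency to other cells


def block_statistics(cells: List[DataCell]) -> Tuple[int, float]:
    """Sufficient statistics of a block: total events N and total size M."""
    return sum(c.n_events for c in cells), sum(c.size for c in cells)
```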

Signal Model: Piecewise Constant
Represent the signal as constant over the elements of a partition of the data space.
● Simplest possible model
● Nonparametric
● Few prior assumptions about the signal: a prior on the signal amplitudes and a prior on the number of partition elements
● No limitation on the resolution in the independent variable
● The representation, while discontinuous, is convenient for further analysis
● Local structure, not global
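To illustrate why the discontinuous representation is still easy to work with, here is a hypothetical helper (not from the talk) that evaluates a piecewise-constant model in a 1D data space from its block edges and amplitudes, using a binary search.

```python
import bisect
from typing import Sequence


def piecewise_constant(t: float, edges: Sequence[float],
                       amplitudes: Sequence[float]) -> float:
    """Evaluate a piecewise-constant model at t.

    edges: sorted block boundaries, len(amplitudes) + 1 values;
    block i covers [edges[i], edges[i+1]).
    """
    if not edges[0] <= t <= edges[-1]:
        raise ValueError("t lies outside the partitioned data space")
    i = bisect.bisect_right(edges, t) - 1
    # Clamp so the right endpoint belongs to the last block
    return amplitudes[min(i, len(amplitudes) - 1)]
```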

Blocks
Block: a set of data cells. Two cases:
● connected (cannot be broken into distinct parts)
● not constrained to be connected

DATA CELLS: Event (Point) Data
Time tags are always discrete: the quantum of time is a tick.
Likelihood of a tick with n events: $x^n e^{-x} / n!$ (x = model Poisson rate parameter)
Block likelihood: $L = \prod x^n e^{-x} / n!$ (product over the ticks in the block)
... a little algebra, and ignoring model-independent factors ...
$L = x^N e^{-Mx}$ (N = number of events in block, M = number of ticks in block)
Marginal likelihood: $\int L(x)\,P(x)\,dx$, where P(x) is the prior on x:
$P(\text{Block} \mid \text{Data}) = \Gamma(N+1)\,(M+1)^{-(N+1)}$ (improper prior; a finite, proper prior gives an incomplete gamma function)
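The "little algebra" step can be checked numerically. This sketch (function names hypothetical) compares the log of the full product over ticks with the log of the reduced form $x^N e^{-Mx}$. Their difference is $-\sum \log n!$, the same for every rate x, which is why it can be dropped when comparing models.

```python
import math
from typing import Sequence


def log_lik_product(counts: Sequence[int], x: float) -> float:
    """log of prod over ticks of x**n * exp(-x) / n!  (Poisson rate x > 0 per tick)."""
    return sum(n * math.log(x) - x - math.lgamma(n + 1) for n in counts)


def log_lik_reduced(counts: Sequence[int], x: float) -> float:
    """log of x**N * exp(-M x), with N = events in the block and M = ticks in the block."""
    return sum(counts) * math.log(x) - len(counts) * x
```

For the ticks `[0, 1, 2, 0]` the difference is $-\log 2! = -\log 2$ at every rate x.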

DATA CELLS: Event (Point) Data
Measurements: point coordinates, no duplicates
Data space: space of any dimension
Signal: point density (deterministic or probabilistic)
Data cell: Voronoi cells for the data points
Sufficient statistics: N = number of points in block, V = volume of block
Posterior: $B(N+1, V-N+1)$ [B = beta function]
Examples: any problem usually approached with histograms (1D); positions of objects from a sky survey (2D); positions of objects in a redshift survey (3D)
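For 1D event data, a common construction (assumed here; the slide does not spell it out) puts each Voronoi cell boundary halfway between neighbouring events and uses the observation start and stop times as the outer edges. The sketch also evaluates the slide's beta-function posterior in log form. It assumes V is measured in units (e.g. ticks) for which V - N + 1 > 0; the function names are hypothetical.

```python
import math
from typing import List


def voronoi_cells_1d(times: List[float], t_start: float, t_stop: float) -> List[float]:
    """Lengths of the 1D Voronoi cells of sorted, distinct event times.

    Interior boundaries are midpoints between neighbouring events; the outer
    boundaries are the (assumed) observation start and stop times.
    """
    edges = ([t_start]
             + [0.5 * (a + b) for a, b in zip(times[:-1], times[1:])]
             + [t_stop])
    return [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]


def log_beta_posterior(n_points: int, volume: float) -> float:
    """log B(N+1, V-N+1) via log-gamma; requires V - N + 1 > 0."""
    a, b = n_points + 1, volume - n_points + 1
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
```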

DATA CELLS: Binned Data
Measurements: counts of points in bins, pixels, ...
Data space: interval, area, volume, ...
Signal: point density (deterministic or probabilistic)
Data cell: bin, pixel, ...
Sufficient statistics: N = number of points in block, V = volume of block
Posterior: $\Gamma(N+1) / (W+1)^{N+1}$, where W = bin size × exposure
Example: BATSE burst data (64 ms bins)
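A minimal sketch of the binned-data posterior in log form. As an illustrative assumption, a block's W is taken to be the sum of bin size × exposure over its bins and N the sum of its counts; the function name is hypothetical.

```python
import math
from typing import Sequence


def binned_block_log_posterior(counts: Sequence[int], widths: Sequence[float],
                               exposures: Sequence[float]) -> float:
    """log[Gamma(N+1) / (W+1)**(N+1)] for one block of bins.

    N: total counts in the block; W: summed bin size x exposure (assumed additive).
    """
    n = sum(counts)
    w = sum(width * exp for width, exp in zip(widths, exposures))
    return math.lgamma(n + 1) - (n + 1) * math.log(w + 1)
```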

DATA CELLS: Serial Measurements
Measurements: values and error distribution of the dependent variable at given values of the independent variable, e.g. $X(t) \sim N(x, \sigma)$
Data space: interval, area, volume, ...
Signal: variation of the dependent variable
Data cell: measurement point
Sufficient statistics: $x_n = x_n(t_n)$, $t_n$, parameter(s) of the error distribution: $\sigma_n$
Log posterior: …
Example: time series, spectra, images, ...
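The slide leaves the log posterior unstated, so this sketch does not attempt it. Assuming independent Gaussian errors with known $\sigma_n$, it only shows how the listed per-point statistics combine within one block: the inverse-variance weighted mean and its uncertainty, the standard estimate of a constant level. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def block_weighted_mean(values: Sequence[float],
                        sigmas: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance weighted mean of one block and its 1-sigma uncertainty.

    Assumes independent Gaussian errors X_n ~ N(x, sigma_n) with known sigma_n > 0.
    """
    weights = [1.0 / s ** 2 for s in sigmas]
    total_w = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total_w
    return mean, 1.0 / math.sqrt(total_w)
```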

DATA CELLS: Distributed Measurements
Measurement: dependent variable averaged over a range of the independent variable
Data space: space of any dimension
Signal: physical variable
Data cell: measurement and its interval
Sufficient statistics: $x$, $\sigma_x$, $W(t)$ = window function
Posterior: see Bretthorst; Gram-Schmidt (G-S) orthogonalization
Example: spatial power spectra of the CMB

Piecewise Constant Model (Partition)
[Figure: partition across a "no data" gap]
Can simply ignore gaps: the model says nothing about the signal there.

Piecewise Constant Model (Partition)
[Figure: partition across a "no data" gap]
... Or can interpolate from the surrounding partition element.

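The Optimizer
% Assumes: data_cells is a num_cells-by-k array of per-cell sufficient statistics,
% and log_post (user-supplied, not shown) returns one block log posterior per row
% of its input; prior and type are passed through to log_post.
first = 0;            % Option: set > 0 to trigger on the first significant block
best = []; last = [];
for R = 1:num_cells
    % Row i of the cumsum describes the block of cells R-i+1..R; flipping it makes
    % element j describe the block j..R, matching element j of [0 best]
    fit = log_post( cumsum( data_cells(R:-1:1, :), 1 ), prior, type );
    [ best(R), last(R) ] = max( [0 best] + flip( fit(:)' ) );
    if first > 0 && last(R) > first   % real-time trigger mode
        changepoints = last(R);
        return
    end
end
% Now locate all the changepoints by backtracking from the last cell
index = last( num_cells );
changepoints = [];
while index > 1
    changepoints = [ index changepoints ];
    index = last( index - 1 );
end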
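The same dynamic program can be sketched in Python. This is a port under stated assumptions, not the original code: each cell carries an event count and a size, the block fitness is assumed to be the log of the event-data posterior $\Gamma(N+1)(M+1)^{-(N+1)}$ quoted earlier in this talk, and a constant `ncp_prior` is subtracted per block. Indices are 0-based, so each returned changepoint is the first cell of a new block.

```python
import math
from typing import List, Sequence


def block_fitness(n: float, m: float) -> float:
    """Assumed block fitness: log[Gamma(N+1) (M+1)^-(N+1)]."""
    return math.lgamma(n + 1) - (n + 1) * math.log(m + 1)


def optimal_partition(counts: Sequence[float], sizes: Sequence[float],
                      ncp_prior: float) -> List[int]:
    """Return the 0-based start index of every block after the first."""
    num_cells = len(counts)
    best: List[float] = []  # best[r]: optimal total fitness for cells 0..r
    last: List[int] = []    # last[r]: start of the final block in that optimum
    for r in range(num_cells):
        fit = [0.0] * (r + 1)
        n_acc = m_acc = 0.0
        # Accumulate statistics backwards so fit[j] describes the block j..r
        for j in range(r, -1, -1):
            n_acc += counts[j]
            m_acc += sizes[j]
            # Penalty per block: equivalent to -k * ncp_prior up to a constant
            fit[j] = block_fitness(n_acc, m_acc) - ncp_prior
        # Optimum for cells before j (0 when j == 0) plus the final block j..r
        totals = [(best[j - 1] if j > 0 else 0.0) + fit[j] for j in range(r + 1)]
        j_best = max(range(r + 1), key=totals.__getitem__)
        best.append(totals[j_best])
        last.append(j_best)
    # Backtrack from the final cell to recover all the changepoints
    changepoints: List[int] = []
    index = last[-1] if num_cells else 0
    while index > 0:
        changepoints.insert(0, index)
        index = last[index - 1]
    return changepoints
```

With `counts = [0, 0, 0, 0, 5, 5, 5, 5]`, unit sizes and `ncp_prior = 1.0`, this returns `[4]`: one block for the empty stretch and one for the bright one. The two nested loops make the O(N^2) cost explicit.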

What is Needed
Data (data cells)
Cost function, which must be additive over blocks, e.g.:
● likelihood for an individual measurement
● likelihood computed for a block
● a specified prior on the rate parameter
● an integral to marginalize the rate parameter
Prior on the number of changepoints, e.g. $P(k) \propto \gamma^{-k}$ (k = number of changepoints), a geometric prior: $\log P(k) \sim -k \log\gamma = -k \cdot$ ncp_prior
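To make the additivity requirement concrete, this sketch (hypothetical names) scores a given partition as the sum of its block log posteriors plus the geometric prior $\log P(k) = -k \log\gamma$. The block term assumes the event-data posterior $\Gamma(N+1)(M+1)^{-(N+1)}$ from the earlier slide; any cost that is a sum over blocks fits the same pattern.

```python
import math
from typing import Sequence


def partition_log_posterior(counts: Sequence[int], sizes: Sequence[float],
                            changepoints: Sequence[int], gamma: float) -> float:
    """Sum of block log posteriors plus log P(k) = -k log(gamma), gamma > 1 assumed.

    changepoints: 0-based start indices of every block after the first.
    """
    edges = [0] + list(changepoints) + [len(counts)]
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        n, m = sum(counts[lo:hi]), sum(sizes[lo:hi])
        total += math.lgamma(n + 1) - (n + 1) * math.log(m + 1)
    k = len(changepoints)
    return total - k * math.log(gamma)  # -k * ncp_prior
```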

What's New
● New algorithm: exact, global optimum guaranteed by dynamic programming
● Implicit search of an exponentially large ($2^N$) space in $O(N^2)$
● Efficient real-time mode (ideal for triggers)
● More cost functions (and more experience with old ones)
● Solved the problem of dependence on the scale of the cell size (finite prior)
● Extensions to 2D, 3D, and higher (time/space/energy triggers!)
● Multivariate Bayesian Blocks (e.g. several BATSE detectors)
● Point spread functions
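For small N the "implicit search" claim can be checked against brute force. This sketch (hypothetical names) assumes the same log event-data block posterior as the earlier slide, with a constant `ncp_prior` per block, and scores every contiguous partition of N cells: each of the N - 1 gaps between cells is either a changepoint or not, giving $2^{N-1}$ candidates, which is the exponential search the dynamic program avoids.

```python
import itertools
import math
from typing import List, Sequence, Tuple


def block_fitness(n: float, m: float) -> float:
    """Assumed block fitness: log[Gamma(N+1) (M+1)^-(N+1)]."""
    return math.lgamma(n + 1) - (n + 1) * math.log(m + 1)


def brute_force_partition(counts: Sequence[int], sizes: Sequence[float],
                          ncp_prior: float) -> Tuple[List[int], int]:
    """Score every contiguous partition; return the best changepoints and the count scored."""
    num_cells = len(counts)
    best_score, best_cps, n_scored = -math.inf, [], 0
    # One boolean per gap between neighbouring cells: True means a changepoint
    for mask in itertools.product([False, True], repeat=num_cells - 1):
        cps = [i + 1 for i, cut in enumerate(mask) if cut]
        edges = [0] + cps + [num_cells]
        score = sum(block_fitness(sum(counts[a:b]), sum(sizes[a:b])) - ncp_prior
                    for a, b in zip(edges[:-1], edges[1:]))
        n_scored += 1
        if score > best_score:
            best_score, best_cps = score, cps
    return best_cps, n_scored
```

For 8 cells it scores 128 partitions; the dynamic program reaches the same optimum while evaluating only on the order of N^2 candidate blocks.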

Optimum Partitions in Higher Dimensions
● Blocks are collections of Voronoi cells (1D, 2D, ...)
● Relax the condition that blocks be connected
● Cell location is now irrelevant
● Order the cells by volume
Theorem: the optimum partition consists of blocks that are connected in this ordering
● Now the 1D algorithm can be used, $O(N^2)$
● A postprocessing step identifies connected block fragments
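A sketch of the bookkeeping behind this reduction, with hypothetical names: sort the cells by volume, run any 1D optimizer on the sorted sequence (not shown; the changepoints in the usage below are given by hand), and map each block back to the original cell indices. Finding the spatially connected fragments of each block, the postprocessing step on the slide, is not implemented here.

```python
from typing import List, Sequence


def order_cells_by_volume(volumes: Sequence[float]) -> List[int]:
    """Indices of cells sorted by volume (smallest first, i.e. highest local density)."""
    return sorted(range(len(volumes)), key=lambda i: volumes[i])


def blocks_in_original_indices(order: Sequence[int],
                               changepoints: Sequence[int]) -> List[List[int]]:
    """Map blocks that are contiguous in the volume ordering back to original cell indices.

    changepoints: start positions (in the sorted order) of every block after the first.
    """
    edges = [0] + list(changepoints) + [len(order)]
    return [sorted(order[a:b]) for a, b in zip(edges[:-1], edges[1:])]
```

For volumes `[0.5, 0.1, 0.4, 0.05]` the ordering is `[3, 1, 2, 0]`; a single changepoint at position 2 gives the blocks `[1, 3]` and `[0, 2]`, which need not be spatially connected.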

● Studies in Astronomical Time Series Analysis. V. Bayesian Blocks, a New Method to Analyze Structure in Photon Counting Data, Jeffrey D. Scargle, Ap. J., 504 (1998)
● An Algorithm for Optimal Partitioning of Data on an Interval, B. Jackson, J. Scargle, D. Barnes, S. Arabhi, A. Alt, P. Gioumousis, E. Gwin, P. Sangtrakulcharoen, L. Tan, and T. Tsai, IEEE Signal Processing Letters, in press
● Adaptive Piecewise-constant Modeling of Signals in Multidimensional Spaces, J. Scargle, B. Jackson, J. Norris, PHYSTAT2003, SLAC, Stanford, California, September 8-11
● Two papers on edge detection in images, in preparation

● First Simultaneous NIR/X-ray Detection of a Flare from Sgr A*, Eckart, Baganoff, Morris, Bautz, Brandt, Garmire, Genzel, Ott, Ricker, Straubmeier, Viehmann, Schödel
● Chandra Deep Field-North Survey: XVII. Evolution of magnetic activity in old late-type stars, Feigelson, Hornschemeier, Micela, Bauer, Alexander, Brandt, Favata, Sciortino, and Garmire
● An XMM-Newton and Chandra Investigation of the Nuclear Accretion in the Sombrero Galaxy (NGC 4594), Pellegrini, Baldi, Fabbiano and D.-W. Kim
● Lens or Binary? Chandra Observations of the Wide-Separation Broad Absorption Line Quasar Pair UM 425, Aldcroft and Green
● Chandra X-Ray Observations of NGC 1316 (Fornax A), Kim and Fabbiano
● 1WGA J : Chandra Finds an Extremely Steep Ultraluminous X-Ray Source, Cagnoni, Turolla, Treves, Huang, Kim, Elvis, Celotti, astro-ph/
● X-Ray Stars in the Orion Nebula, Feigelson et al., ApJ, 574, 258
● Chandra Multiwavelength Project. I. First X-Ray Source Catalog, Kim et al., The Astrophysical Journal Supplement Series, 150:19–41, 2004

● A Bayesian Approach to Solar Flare Prediction, Wheatland, Ap. J., 609 (2004)
● The Origin of the Solar Flare Waiting-Time Distribution, Wheatland, Ap. J., 536: L109-L112, 2000 June 20
● The Coronal Mass Ejection Waiting-Time Distribution, Wheatland
● The Waiting-Time Distribution of Solar Flare Hard X-Ray Bursts, Wheatland and Sturrock, Ap. J. 509 (1998)
● Sympathetic Coronal Mass Ejections, Moon, Choe, Wang, and Park
● Statistical Evidence for Sympathetic Flares, Moon, Choe, Park, Wang, Gallagher, Chae, Yun and Goode, Ap. J., 574 (2002), 434
● Solar Flare Waiting Time Distribution: Varying-Rate Poisson or Lévy Function?, Lepreti, Carbone and Veltri

● NGC 4261 and NGC 4697: Rejuvenated Elliptical Galaxies, Zezas, Hernquist, Fabbiano, and Miller
● A Gamma-Ray Burst Trigger Tool Kit, David L. Band, Ap. J. 578 (2002)
● Gamma-Ray Bursts Have Millisecond Variability, Walker, Schaefer, and Fenimore
● Attributes of GRB Pulses: Bayesian Blocks Analysis of TTE Data; a Microburst in GRB , Scargle, Norris, Bonnell, 4th Huntsville Gamma-ray Burst Symposium