Presentation is loading. Please wait.

Presentation is loading. Please wait.

Data Analysis Through Segmentation: Bayesian Blocks and Beyond Space Science Division NASA Ames Research Center Collaborators:

Similar presentations


Presentation on theme: "Data Analysis Through Segmentation: Bayesian Blocks and Beyond Space Science Division NASA Ames Research Center Collaborators:"— Presentation transcript:

1 Data Analysis Through Segmentation: Bayesian Blocks and Beyond Jeffrey.D.Scargle@nasa.gov Space Science Division NASA Ames Research Center Collaborators: Brad Jackson, Mathematics Department, San Jose State University Jay Norris, NASA Goddard Spaceflight Center Mahmoud K. Quweider, U. Texas, Brownsville Astrostatistics Workshop High Energy Astrophysics Division American Astronomical Society September10, 2004

2 ● Overview of Bayesian Blocks Data Modes Model Fitness Model Optimization ● Sample Results

3

4 From Data to Astronomical Goals Input data: measurements distributed in some space Intermediate product: estimate of signal, image, density... End goal: estimates of values of scientifically relevant quantities sequential data (time series, spectra,... ) images, photon maps redshift surveys higher dimensional data

5 From Data to Astronomical Goals Input data: measurements distributed in some space Intermediate product: estimate of signal, image, density... End goal: estimates of values of scientifically relevant quantities sequential data (time series, spectra,... ) images, photon maps redshift surveys higher dimensional data Not Important

6 Exploratory analysis:little prior information – few assumptions Use simplest possible nonparametric model: piecewise constant (allows exact calculation of marginalized likelihoods) Take account of: observational noise exposure variations arbitrary sampling, gaps point spread functions (1D, 2D) Just say no to: pretty pictures; smoothing; continuous representations bins or pixels (unless raw data in this form) methods tuned for specific structures, e.g. beamlets resampling sensitivity to some global structures, e.g. periodic From Data to Astronomical Goals } intrinsic } constructs

7 Sequential Data Types Measurements at arbitrary times (known error distribution) ● instantaneous ● averages over finite interval, no overlap ● averages over finite interval, overlap (CMB spatial power spectra) Event (point) data ● one event per time tag (0-1 process) (BATSE time-tagged data) ● duplicate time tags OK Counts of events in bins ● evenly spaced bins ● arbitrary bins Time to accumulate fixed number of events (BATSE time-to-spill)

8 DATA CELLS: Definition Data space: the set of possible measurements in some experiment data cell: a data structure representing an individual measurement For a segmented model, the cells must contain all information needed to compute the model cost function. Data cells usually: ● Are in one-to-one correspondence to the measurements ● Partition the entire data space ● Do not overlap each other ● Leave no gaps between cells ● Contain information on adjacency to other cells... but exceptions to all of these are possible.

9 Signal Model: Piecewise Constant ● Simplest Possible Model ● Nonparametric ● Few prior assumptions about the signal: prior on signal amplitudes prior on number of partition elements ● No limitation on the resolution in the independent variable. ● Representation, while discontinuous, is convenient for further analysis ● Local structure, not global Represent signal as constant over elements of a partition of the data space.

10

11 Blocks Block: a set of data cells Two cases: ● connected (can't break into distinct parts) ● not constrained to be connected

12 DATA CELLS: Event (Point) Data Time tags always discrete: quantim of time = tick Likelihood of tick with n events: x n e -x / n! (x = model Poisson rate parameter) Block likelihood: L =  x n e -x / n! (product over bins in block)... a little algebra, and ignoring model-independent factors... = x N e -Mx (N = number of events in block M = number of ticks in block ) Marginal likelihood: Integrate L(x) P(x) dx, where P(x) is prior on x: P( Block | Data ) = Gamma( N + 1 ) ( M + 1 ) -(N+1) improper prior (finite, proper prior --> incomplete gamma function)

13 DATA CELLS: Event (Point) Data Measurements:Point coordinate, no duplicates Data Space:Space of any dimension Signal:Point density (deterministic or probabilistic) Data Cell:Voronoi cells for the data points Suf. StatisticsN = number of points in block V = volume of block Posterior:  (N+1, V–N+1) [B = beta function] Example: any problem usually approached with histograms (1D) positions of objects from a sky survey (2D) positions of objects in a redshift survey (3D)

14 DATA CELLS: Binned Data Measurements:Counts of points in bins, pixels,... Data Space:Interval, area, volume,... Signal:Point density (deterministic or probabilistic) Data Cell:Bin, pixel,... Suf. StatisticsN = number of points in block V = volume of block Posterior:  (N+1) / (W+1) N+1 W = bin size x exposure Example:BATSE burst data (64 ms bins)

15 DATA CELLS: Serial Measurements Measurements:Values and error distribution of dependent variable at given values of independent variable, e.g. X(t) ~ N(x,  ) Data Space:Interval, area, volume,... Signal:Variation of dependent variable Data Cell:Measurement point Suf. Statisticsx n = x n ( t n ), t n, parameter(s) of error distribution:  n  log posterior:… Example: time series, spectra, images,...

16 DATA CELLS: Distributed Measurements Measurement:Dependent variable averaged over a range of independent variable Data Space:Space of any dimension Signal:Physical variable Data Cell:Measurement and its interval Suf. Statistics:x,  x, W(t) = window function Posterior:see Bretthorst, G-S orthogonalization Example:spatial power spectra of CMB

17 no data Piecewise Constant Model (Partition) Can simply ignore gaps – model says nothing about signal there.

18 no data Piecewise Constant Model (Partition)... Or can interpolate from surrounding partition element.

19 The Optimizer Best = []; last = []; for R = 1:num_cells [ best(R), last(R) ] = max( [0 best] +... reverse( log_post( cumsum( data_cells(R:-1:1, :) ), prior, type ) ) ); if first > 0 & last(R) > first % Option: trigger on first significant block changepoints = last(R); return end % Now locate all the changepoints index = last( num_cells ); changepoints = []; while index > 1 changepoints = [ index changepoints ]; index = last( index - 1 ); end

20 What is Needed Data (data cells) Cost Function Must be additive over blocks E.g.: Likelihood for individual measurement Compute likelihood for Block Specifiy prior on rate parameter Integral to marginalize rate parameter Prior on number of changepoints E.g.,: P( k ) ~ gamma -k (k = number of changepoints) geometric prior [log P(k) ~ -k log( gamma ) = - k ncp_prior ]

21

22 What's New New Algorithm: Exact, global optimum guaranteed by dynamic programming Implicit search of exponentially large (2N) space in O(N 2 ) Efficient Real-Time mode (ideal for triggers) More Cost Functions (and more experience with old ones) Solved Problem of Dependence on Scale of Cell Size (finite prior) Extensions to 2D, 3D, and Higher (time/space/energy triggers!) Multivariate Bayesian Blocks (e.g. several BATSE detectors) Point-Spread Functions

23

24

25

26 Optimum Partitions in Higher Dimensions ● Blocks are collections of Voronoi cells (1D,2D,...) ● Relax condition that blocks be connected ● Cell location now irrelevant ● Order cells by volume Theorem: Optimum partition consists of blocks that are connected in this ordering ● Now can use the 1D algorithm, O(N 2 ) ● Postprocessing step identifies connected block fragments

27

28

29

30

31

32

33

34

35

36

37

38

39 Studies in Astronomical Time Series Analysis. V. Bayesian Blocks, a New Method to Analyze Structure in Photon Counting Data, Jeffrey D. Scargle, Ap. J., 504 (1998), 405-418 An Algorithm for Optimal Partitioning of Data on an Interval B. Jackson, J. Scargle, D. Barnes, S. Arabhi, A. Alt, P. Gioumousis, E. Gwin, P. Sangtrakulcharoen, L. Tan, and T. Tsai, IEEE Signal Processing Letters, in press http://arxiv.org/abs/math/0309285 Adaptive Piecewise-constant Modeling of Signals in Multidimensional Spaces, J. Scargle,B. Jackson, J. Norris PHYSTAT2003, SLAC, Stanford, California, September 8-11, 2003 http://www-conf.slac.stanford.edu/phystat2003/papers/TUHT003.pdf Two papers on edge detection in images, in preparation.

40 First Simultaneous NIR/X-ray Detection of a Flare from Sgr A Eckart, Baganoff, Morris, Bautz, Brandt, Garmire, Genzel, Ott, Ricker, Straubmeier, Viehmann, Schödel. http://arxiv.org/abs/astro-ph/0403577 Chandra Deep Field-North Survey: XVII. Evolution of magnetic activity in old late- type stars, Feigelson, Hornschemeier, Micela, Bauer, Alexander, Brandt, Favata, Sciortino, and Garmire An XMM-Newton and Chandra Investigation of the Nuclear Accretion in the Sombrero Galaxy (NGC 4594), Pellegrini, Baldi, Fabbiano and D.-W. Kim Lens or Binary? Chandra Observations of the Wide-Separation Broad Absorption Line Quasar Pair UM 425, Aldcroft and Green Chandra X-Ray Observations of NGC 1316 (Fornax A), Kim and Fabbiano 1WGA J1216.9+3743: Chandra Finds an Extremely Steep Ultraluminous X-Ray Source, Cagnoni, Turolla, Treves,Huang, Kim, Elvis Celotti, astro-ph/0207654 X-Ray Stars in the Orion Nebula, Feigelson et al., ApJ, 574, 258 Chandra Multiwavelength Project. I. First X-Ray Source Catalog, Kim et al., The Astrophysical Journal Supplement Series, 150:19–41, 2004

41 A Bayesian Approach to Solar Flare Prediction Wheatland, Ap. J., 609 (2004) 1134-1139. The Origin of the Solar Flare Waiting-Time Distribution Wheatland, Ap. J., 536: L109-L112, 2000 June 20 The Coronal Mass Ejection Waiting-Time Distribution Wheatland The Waiting-Time Distribution of Solar Flare Hard X-Ray Bursts, Wheatland and Sturrock, Ap. J. 509 (1998), 448-455 Sympathetic Coronal Mass Ejections, Moon, Choe, Wang, and Park Statistical Evidence for Sympathetic Flares, Ap. J., 574 (2002), 434 Moon, Choe, Park, Wang, Gallagher, Chae, Yun and Goode Solar Flare Waiting Time Distribution: Varying-Rate Poisson or Lévy Function? Lepreti, Carbone and Veltri

42 NGC 4261 and NGC 4697: Rejuvenated Elliptical Galaxies, Zezas, Hernquist, Fabbiano, and Miller A Gamma-Ray Burst Trigger Tool Kit, David L. Band, Ap. J. 578, (2002) 806- 811 Gamma-Ray Bursts Have Millisecond Variability Walker, Schaefer, and Fenimore Attributes of GRB Pulses: Bayesian Blocks Analysis of TTE Data; a Microburst in GRB 920229, Scargle, Norris, Bonnell, 4 th Huntsville Gamma-ray Burst Symposium


Download ppt "Data Analysis Through Segmentation: Bayesian Blocks and Beyond Space Science Division NASA Ames Research Center Collaborators:"

Similar presentations


Ads by Google