Published by Moris Palmer. Modified over 9 years ago.
1
Codes for astrostatistics: StatCodes & VOStat
Eric Feigelson, Penn State
2
Vast range of statistical problems in modern astronomy
Poisson processes: point processes, time series analysis
Image analysis: MLE deconvolution, adaptive smoothing, wavelet analyses
Multivariate analysis & classification (with measurement errors)
Survival analysis (censoring & truncation with measurement errors)
Parametric models: model selection, non-linear regression
Non-parametric methods
Confidence limits: bootstrap resampling
Prior knowledge: Bayesian inference (see talk at PhysStat 2003 conference)
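As an illustration of the "confidence limits: bootstrap resampling" item, the following is a minimal sketch of a percentile bootstrap for the median of a small sample. It is written in Python with the standard library only and is not taken from the talk. The function name `bootstrap_ci`, its parameters, and the flux values are hypothetical, chosen purely for illustration.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.median, n_resamples=2000,
                 alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data).

    Hypothetical helper for illustration: resample the data with
    replacement, recompute the statistic each time, and read the
    interval off the sorted bootstrap estimates.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(data)
    estimates = sorted(
        stat([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Made-up flux measurements (arbitrary units), not real data.
fluxes = [12.1, 9.8, 15.3, 11.0, 10.4, 13.7, 9.1, 14.2, 10.9, 12.6]
low, high = bootstrap_ci(fluxes)
print(f"median = {statistics.median(fluxes):.2f}, "
      f"95% interval = [{low:.2f}, {high:.2f}]")
```

The percentile method is the simplest bootstrap interval. Refinements such as bias-corrected intervals exist but are outside the scope of this sketch.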
3
The problem
Astronomers are insufficiently trained in modern applied statistics … but even if they knew what to do, they have inadequate access to computer codes.
4
Astronomers never use large commercial statistical packages like SAS, SPSS, Statistica.
Some astronomers sometimes use UNIX-based command-line systems like MATLAB or S-Plus.
Astronomers like mini-codes in Numerical Recipes & often write their own codes. Many like IDL, which has simple statistics.
NASA/NSF observatories produce huge data analysis codes (IRAF, AIPS, CIAO, …) which by policy avoid proprietary codes.
A few specialized stand-alone astrostat codes were written under NASA funding: ROSTAT, ASURV, SLOPES, StatPy.
Altogether this is a very bad situation: vast statistical needs with very inadequate codes.
5
The rise of the Virtual Observatory
Vast collections of calibrated data (images, spectra, time series), extracted catalogs (rows=sources, columns=properties), and source bibliographies emerged during the 1990s.
NASA Science Archive Centers (MAST, HEASARC, IRSA, LAMBDA), bibliographic databases (ADS, SIMBAD, NED), & more are being transformed into a federated (though still distributed & heterogeneous) system.
XML metadata (VOTable), SOAP protocols, … for data mining & extraction.
But originally there was no plan for visualization & statistical analysis of extracted datasets.
6
StatCodes: A partial solution
In the late 1990s, the Penn State group created a Web metasite with annotated links to ~200 open source packages & codes of utility to astronomers.
Quite successful: 50-100 hits/day for 7 years. Multivariate & time series methods most popular.
But the collection of on-line codes was very inhomogeneous and incomplete.
7
R
Finally a broad public-domain statistical software system emerges.
Based on the successful commercial UNIX-based S/S-Plus, R has an interactive command-line feel (like IDL), flexible data I/O, acceptable graphics, integration with C/Fortran/Python/…, and quite a lot of sophisticated statistical methods.
Core R: 2000-page manual with ~200 functionalities, some very complex & advanced.
CRAN: 300 add-on packages, dozens useful to astronomers. Some are themselves full systems.
8
VOStat: A Web service
1. Web form interface providing simple statistical R functions with VOTable inputs
2. Same R functions provided through a more sophisticated Java-based grid-computing mode.
[Diagram: the user, dispersed VO data bases, and the VOStat server. Labels: requests, answers, heavy data, heavy statistical computation.]
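To make the idea of "simple statistical functions with VOTable inputs" concrete, here is a minimal sketch that reads one column from a tiny VOTable-style document and computes summary statistics. It is not VOStat's interface: the talk describes R functions behind a Web form, while this sketch uses Python's standard library. The table contents, the column name `flux`, and the helper `read_column` are hypothetical. For brevity the sample document omits the XML namespace that real VOTable files declare, so a real file would need namespace-aware lookups or a dedicated VOTable reader.

```python
import statistics
import xml.etree.ElementTree as ET

# Made-up, namespace-free VOTable-style document for illustration only.
VOTABLE_XML = """
<VOTABLE version="1.3">
  <RESOURCE>
    <TABLE name="sources">
      <FIELD name="id" datatype="int"/>
      <FIELD name="flux" datatype="double" unit="mJy"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>1</TD><TD>12.0</TD></TR>
          <TR><TD>2</TD><TD>10.0</TD></TR>
          <TR><TD>3</TD><TD>14.0</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""


def read_column(votable_text, column):
    """Return one numeric column of the first table as a list of floats.

    Hypothetical helper: assumes a single RESOURCE/TABLE, TABLEDATA
    serialization, and no XML namespace on the elements.
    """
    root = ET.fromstring(votable_text.strip())
    table = root.find("./RESOURCE/TABLE")
    names = [field.get("name") for field in table.findall("FIELD")]
    idx = names.index(column)  # position of the requested column
    rows = table.findall("./DATA/TABLEDATA/TR")
    return [float(tr.findall("TD")[idx].text) for tr in rows]


flux = read_column(VOTABLE_XML, "flux")
summary = {
    "n": len(flux),
    "mean": statistics.mean(flux),
    "stdev": statistics.stdev(flux),  # sample standard deviation
}
print(summary)
```

Each row of the table is a source and each column a property, matching the catalog layout that VO services return.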
9
VOStat may be a big improvement but …
Generic Web-based services are inherently inflexible & limited. VOStat may serve to entice the astronomer to download R & perform the real analysis at home.
Astronomers need training in advanced methods before using them with R. Penn State has just created a Center for Astrostatistics to develop curriculum, conduct tutorials, provide template R code, etc.
R/CRAN does not serve huge VO datasets or some special astrostat needs. New methodological/code development underway (CMU, Cornell, PSU, UCIrv, …).