Overview G. Jogesh Babu

Overview of Astrostatistics
A brief description of modern astronomy & astrophysics.
Many statistical concepts have their roots in astronomy (starting with Hipparchus in the 2nd c. BC).
Relevance of statistics in astronomy today.
State of astrostatistics today.
Methodological challenges for astrostatistics in the 2000s.

Descriptive Statistics
Introduction to the R programming language, an integrated suite of software facilities for data manipulation, calculation and graphical display.
Descriptive statistics help in extracting the basic features of data and provide summaries of the sample and the measures.
Commonly used techniques, such as graphical description, tabular description, and summary statistics, are illustrated through R.
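
As a rough illustration of the summary statistics mentioned above, here is a minimal sketch. The lecture itself uses R; Python's standard library is used here purely for illustration, and the sample values are hypothetical.

```python
import statistics

# Hypothetical sample of eight measurements (not from the lecture).
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

summary = {
    "n": len(sample),
    "mean": statistics.mean(sample),
    "median": statistics.median(sample),
    "sample_sd": statistics.stdev(sample),       # divides by n - 1
    "population_sd": statistics.pstdev(sample),  # divides by n
    "min": min(sample),
    "max": max(sample),
}

for name, value in summary.items():
    print(f"{name:>14}: {value:.3f}")
```

The two standard deviations differ only in the divisor; for small samples the distinction is visible in the output.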

Exploratory Data Analysis
An approach/philosophy for data analysis that employs a variety of techniques (mostly graphical) to:
– maximize insight into a data set
– uncover underlying structure
– extract important variables
– detect outliers and anomalies
– formulate hypotheses worth testing
– develop parsimonious models
– provide a basis for further data collection through surveys or experiments

Probability theory
Conditional probability & Bayes theorem (Bayesian analysis)
Expectation, variance, standard deviation (unit-free estimates)
Density of a continuous random variable (as opposed to density as defined in physics)
Normal (Gaussian) distribution, Chi-square distribution (not the Chi-square statistic)
Probability inequalities and the central limit theorem (CLT)
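
Bayes theorem for a binary hypothesis can be sketched in a few lines. The function name `posterior` and all of the numbers below are hypothetical, chosen only to show how a small prior limits the posterior even when the data favour the hypothesis.

```python
def posterior(prior, p_data_given_h, p_data_given_not_h):
    """Return P(H | data) from Bayes theorem for a binary hypothesis H."""
    evidence = p_data_given_h * prior + p_data_given_not_h * (1.0 - prior)
    return p_data_given_h * prior / evidence


# Hypothetical numbers: 1% of sources are quasars (H); a colour cut flags
# 90% of quasars and 5% of non-quasars.
p = posterior(prior=0.01, p_data_given_h=0.90, p_data_given_not_h=0.05)
print(f"P(quasar | flagged) = {p:.3f}")
```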

Correlation & Regression
Correlation coefficient
Underlying principles of linear and multiple linear regression
Least squares estimation
Ridge regression
Principal components
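
A minimal sketch of the correlation coefficient and a least squares line for a single predictor follows. The optional `ridge` argument is an assumption of this sketch: it penalises only the slope, which is one simple form of ridge regression. The data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Sample correlation coefficient between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(x, y, ridge=0.0):
    """Slope b and intercept a minimising sum (y - a - b*x)^2 + ridge * b^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / (sxx + ridge)      # ridge > 0 shrinks the slope toward 0
    intercept = my - slope * mx      # the intercept is not penalised
    return slope, intercept


# Hypothetical data lying close to a straight line.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
slope, intercept = least_squares_line(x, y)
ridge_slope, _ = least_squares_line(x, y, ridge=10.0)
print(f"r = {r:.4f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"ridge slope = {ridge_slope:.3f}")
```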

Linear regression issues in astronomy
Compares different regression lines used in astronomy.
Illustrates them with the Faber-Jackson relation.

Statistical Inference
While descriptive statistics provides tools to describe what the data show, statistical inference helps in reaching conclusions that extend beyond the immediate data alone.
Statistical inference helps in judging whether an observed difference between groups is dependable or might have happened by chance in a study.
Topics to be covered include:
– Point estimation
– Confidence intervals for unknown parameters
– Principles of testing of hypotheses

Maximum Likelihood Estimation
Likelihood differs from probability:
– Probability refers to the occurrence of future events
– while a likelihood refers to past events with known outcomes
MLE is used for fitting a mathematical model to data.
Modeling real-world data by maximum likelihood estimation offers a way of tuning the free parameters of the model to provide a good fit.
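
To make "tuning the free parameters" concrete, here is a minimal sketch that fits a Poisson rate by maximum likelihood. The Poisson model, the counts, and the crude grid search are all assumptions made for illustration; the grid result is compared with the closed-form MLE, which for this model is the sample mean.

```python
import math


def poisson_log_likelihood(rate, counts):
    """log L(rate) = sum_i [k_i * log(rate) - rate - log(k_i!)]."""
    return sum(k * math.log(rate) - rate - math.lgamma(k + 1) for k in counts)


# Hypothetical photon counts per time bin.
counts = [3, 5, 2, 4, 6, 3, 4, 5]

# Crude maximisation over a grid of candidate rates 0.01, 0.02, ..., 10.00.
grid = [0.01 * i for i in range(1, 1001)]
rate_grid = max(grid, key=lambda r: poisson_log_likelihood(r, counts))

# Closed-form MLE for the Poisson model: the sample mean.
rate_closed = sum(counts) / len(counts)

print(f"grid MLE = {rate_grid:.2f}, closed-form MLE = {rate_closed:.2f}")
```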

MLE Contd.
Thomas Hettmansperger's lecture includes:
– Maximum likelihood method for linear regression, an alternative to the least squares method
– Cramer-Rao inequality, which sets a lower bound on the error (variance) of an unbiased estimator of a parameter. It helps in finding the "best" estimator.
Analysis of data from two or more different populations involves mixture models.
– The likelihood calculations are difficult, so an iterative device called the EM algorithm will be introduced.
Computations are illustrated in the lab.
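
The EM iteration for a mixture can be sketched as follows. This is a minimal illustration for a two-component, one-dimensional Gaussian mixture with simulated data; the function names, starting values, and fixed iteration count are assumptions of the sketch, not the lecture's lab code.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_two_gaussians(data, n_iter=200):
    """EM for the 1-D mixture w * N(mu1, s1^2) + (1 - w) * N(mu2, s2^2)."""
    # Simple starting values (an assumption of this sketch).
    mu1, mu2 = min(data), max(data)
    s1 = s2 = statistics.pstdev(data)
    w = 0.5
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each point.
        resp = []
        for x in data:
            a = w * normal_pdf(x, mu1, s1)
            b = (1.0 - w) * normal_pdf(x, mu2, s2)
            resp.append(a / (a + b))
        # M-step: weighted maximum likelihood updates.
        n1 = sum(resp)
        n2 = len(data) - n1
        mu1 = sum(r * x for r, x in zip(resp, data)) / n1
        mu2 = sum((1.0 - r) * x for r, x in zip(resp, data)) / n2
        s1 = math.sqrt(sum(r * (x - mu1) ** 2 for r, x in zip(resp, data)) / n1)
        s2 = math.sqrt(sum((1.0 - r) * (x - mu2) ** 2 for r, x in zip(resp, data)) / n2)
        w = n1 / len(data)
    return w, (mu1, s1), (mu2, s2)


# Simulated data from two populations centred at 0 and 6 (hypothetical).
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(150)] + [rng.gauss(6.0, 1.0) for _ in range(150)]

w, (mu1, s1), (mu2, s2) = em_two_gaussians(data)
print(f"w = {w:.2f}, mu1 = {mu1:.2f}, s1 = {s1:.2f}, mu2 = {mu2:.2f}, s2 = {s2:.2f}")
```

A production version would monitor the log-likelihood for convergence and guard against a component variance collapsing to zero.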

Nonparametric Statistics
These statistical procedures do not assume a particular parametric form for the probability distribution of the population.
The model structure is not specified a priori but is instead determined from data.
As nonparametric methods make fewer assumptions, their applicability is much wider.
Procedures described include:
– Sign test
– Mann-Whitney two-sample test
– Kruskal-Wallis test for comparing several samples
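
As an example of a procedure from the list above, here is a minimal sketch of the exact two-sided sign test. Only the signs of the paired differences are used, so no distributional shape is assumed. The function name and the differences are hypothetical.

```python
import math


def sign_test(differences):
    """Two-sided exact sign test of H0: the median difference is zero."""
    nonzero = [d for d in differences if d != 0]   # ties are dropped
    n = len(nonzero)
    n_pos = sum(1 for d in nonzero if d > 0)
    k = max(n_pos, n - n_pos)
    # P(X >= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test.
    tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return n_pos, n, min(1.0, 2.0 * tail)


# Hypothetical paired differences: 8 positive, 2 negative.
diffs = [1.2, 0.4, -0.3, 0.8, 1.5, 0.2, -0.1, 0.9, 0.6, 1.1]
n_pos, n, p_value = sign_test(diffs)
print(f"{n_pos} of {n} differences positive, p = {p_value:.4f}")
```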

Bayesian Inference
As evidence accumulates, the degree of belief in a hypothesis ought to change.
Bayesian inference takes prior knowledge into account.
The quality of a Bayesian analysis depends on how well one can convert the prior information into a mathematical prior probability.
Tom Loredo describes methods for parameter estimation, model assessment, etc.
Illustrates with examples from astronomy.
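
How belief changes as evidence accumulates can be sketched with a conjugate Beta prior on a detection fraction. The Beta-binomial model, the flat prior, and the batches of data are assumptions chosen for illustration, not examples from the lecture.

```python
def update_beta(a, b, successes, failures):
    """Posterior Beta parameters after binomial data under a Beta(a, b) prior."""
    return a + successes, b + failures


def beta_mean(a, b):
    return a / (a + b)


a, b = 1.0, 1.0                       # flat Beta(1, 1) prior
batches = [(3, 7), (4, 6), (2, 8)]    # hypothetical (detections, non-detections)

history = []
for successes, failures in batches:
    a, b = update_beta(a, b, successes, failures)
    history.append(beta_mean(a, b))
    print(f"posterior Beta({a:.0f}, {b:.0f}), mean = {history[-1]:.4f}")
```

Each batch's posterior serves as the prior for the next batch, which is the sense in which the prior information enters the analysis.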

Multivariate analysis
Analysis of data on two or more attributes (variables) that may depend on each other:
– Principal components analysis, to reduce the number of variables
– Canonical correlation
– Tests of hypotheses
– Confidence regions
– Multivariate regression
– Discriminant analysis (supervised learning)
Computational aspects are covered in the lab.

Bootstrap
How to get the most out of repeated use of the data.
Bootstrap is similar to the Monte Carlo method, but the "simulation" is carried out from the data itself.
It is a very general, mostly nonparametric procedure and is widely applicable.
Applications to regression, cases where the procedure fails, and cases where it outperforms traditional procedures will also be discussed.
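
The idea of simulating from the data itself can be sketched as a nonparametric bootstrap of the sample median. The function name, the number of resamples, the percentile interval, and the data are all assumptions of this sketch.

```python
import random
import statistics


def bootstrap(data, statistic, n_boot=2000, seed=0):
    """Sorted values of `statistic` over resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    return sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )


# Hypothetical measurements.
data = [12.1, 9.8, 11.4, 10.2, 13.7, 9.5, 10.9, 12.8,
        11.1, 10.4, 14.2, 9.9, 11.7, 10.6, 12.3]

reps = bootstrap(data, statistics.median)
se = statistics.stdev(reps)               # bootstrap standard error
lo = reps[int(0.025 * len(reps))]         # percentile interval, lower end
hi = reps[int(0.975 * len(reps)) - 1]     # percentile interval, upper end

print(f"median = {statistics.median(data):.2f}, SE = {se:.3f}, "
      f"95% interval = [{lo:.2f}, {hi:.2f}]")
```

The percentile interval is only one of several bootstrap intervals; it is used here because it is the shortest to write down.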

Goodness of Fit
Curve (model) fitting or goodness of fit using the bootstrap procedure.
Procedures like the Kolmogorov-Smirnov test do not work in the multidimensional case, or when the parameters of the curve are estimated.
Bootstrap comes to the rescue.
Some of these procedures are illustrated using R in a lab session on hypothesis testing and bootstrapping.
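
One way the bootstrap can stand in for the Kolmogorov-Smirnov table when parameters are estimated is sketched below. This is a parametric bootstrap for a fitted normal model in one dimension; the choice of model, the function names, and the simulated data are assumptions of the sketch and not necessarily the procedure used in the lab. The key step is refitting the parameters on every simulated sample.

```python
import math
import random
import statistics


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_fitted_normal(data):
    """KS distance between the empirical CDF and a normal CDF whose mean and
    standard deviation are estimated from the same data."""
    mu, sigma = statistics.mean(data), statistics.stdev(data)
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d, mu, sigma


def bootstrap_ks_pvalue(data, n_boot=500, seed=0):
    """Calibrate the KS distance by simulating from the fitted model."""
    rng = random.Random(seed)
    d_obs, mu, sigma = ks_fitted_normal(data)
    n = len(data)
    exceed = 0
    for _ in range(n_boot):
        sim = [rng.gauss(mu, sigma) for _ in range(n)]
        d_sim, _, _ = ks_fitted_normal(sim)   # parameters refitted each time
        if d_sim >= d_obs:
            exceed += 1
    return d_obs, (exceed + 1) / (n_boot + 1)


# Hypothetical skewed data (exponential), tested against a normal model.
demo_rng = random.Random(1)
skewed = [demo_rng.expovariate(1.0) for _ in range(100)]
d_obs, p_value = bootstrap_ks_pvalue(skewed)
print(f"KS distance = {d_obs:.3f}, bootstrap p-value = {p_value:.3f}")
```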

Model selection, evaluation, and likelihood ratio tests
The model selection procedures covered include:
– Chi-square test
– Rao's score test
– Likelihood ratio test
– Cross validation
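
A likelihood ratio test between two nested models can be sketched as follows. The example compares a single common Poisson rate against separate rates for two samples, which differ by one free parameter, so the statistic is referred to a chi-square distribution with 1 degree of freedom (an asymptotic approximation). The model and the counts are hypothetical.

```python
import math


def poisson_loglik(rate, counts):
    return sum(k * math.log(rate) - rate - math.lgamma(k + 1) for k in counts)


def lrt_common_rate(counts_a, counts_b):
    """LRT of H0: one common Poisson rate vs H1: a separate rate per sample."""
    pooled = counts_a + counts_b
    rate0 = sum(pooled) / len(pooled)            # MLE under H0
    rate_a = sum(counts_a) / len(counts_a)       # MLEs under H1
    rate_b = sum(counts_b) / len(counts_b)
    ll0 = poisson_loglik(rate0, pooled)
    ll1 = poisson_loglik(rate_a, counts_a) + poisson_loglik(rate_b, counts_b)
    stat = max(2.0 * (ll1 - ll0), 0.0)
    # Chi-square survival function for 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts from two fields.
field_a = [2, 3, 1, 2, 3, 2]
field_b = [8, 9, 7, 10, 8, 9]
stat, p_value = lrt_common_rate(field_a, field_b)
print(f"LRT statistic = {stat:.2f}, p-value = {p_value:.2e}")
```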

Time Series & Stochastic Processes
Time domain procedures
State space models
Kernel smoothing
Poisson processes
Spectral methods for inference
A brief discussion of the Kalman filter
Illustrations with examples from astronomy
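
The Kalman filter for the simplest state space model can be sketched in scalar form. The local level model (a random-walk state observed with noise), the function name, the starting values, and the simulated observations are assumptions of this sketch.

```python
import random


def kalman_local_level(observations, process_var, obs_var, x0=0.0, p0=1e3):
    """Scalar Kalman filter for x_t = x_{t-1} + w_t,  y_t = x_t + v_t."""
    x, p = x0, p0
    estimates = []
    for y in observations:
        # Predict: the state is a random walk, so only its variance grows.
        p = p + process_var
        # Update: blend prediction and observation using the Kalman gain.
        gain = p / (p + obs_var)
        x = x + gain * (y - x)
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates


# Hypothetical noisy observations of a constant level of 5.
rng = random.Random(0)
obs = [5.0 + rng.gauss(0.0, 1.0) for _ in range(50)]
est = kalman_local_level(obs, process_var=0.01, obs_var=1.0)
print(f"last observation = {obs[-1]:.2f}, filtered estimate = {est[-1]:.2f}")
```

The large initial variance `p0` makes the first update follow the first observation almost exactly, which is a common way to express ignorance about the starting state.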

Markov Chain Monte Carlo
MCMC methods are a collection of techniques that use pseudo-random (computer simulated) values to estimate solutions to mathematical problems.
MCMC for Bayesian inference
Illustration of MCMC for the evaluation of expectations with respect to a distribution
MCMC for estimation of maxima or minima of functions
MCMC procedures are successfully used in the search for extra-solar planets.
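
Evaluating expectations with respect to a distribution can be sketched with a random-walk Metropolis sampler, one of the simplest MCMC methods. The target (a standard normal, known only up to a constant), the step size, the burn-in length, and the function names are assumptions chosen for illustration.

```python
import math
import random


def metropolis(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for an unnormalised 1-D log density."""
    rng = random.Random(seed)
    x, logp = x0, log_target(x0)
    chain = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        logp_new = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(x)).
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            x, logp = proposal, logp_new
        chain.append(x)
    return chain


def log_std_normal(x):
    return -0.5 * x * x   # log density up to an additive constant


chain = metropolis(log_std_normal, x0=0.0, n_samples=20000, step=2.0)
burned = chain[2000:]                               # discard burn-in
mean_x = sum(burned) / len(burned)                  # estimates E[x] = 0
mean_x2 = sum(v * v for v in burned) / len(burned)  # estimates E[x^2] = 1
print(f"E[x] ~ {mean_x:.3f}, E[x^2] ~ {mean_x2:.3f}")
```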

Spatial Statistics
Spatial point processes
Intensity function
Homogeneous and inhomogeneous Poisson processes
Estimation of Ripley's K function (useful for point pattern analysis)
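
A naive estimate of Ripley's K function can be sketched as follows. This version applies no edge correction, so it is biased slightly low near the boundary of the window; the function name and the simulated pattern are assumptions of the sketch. For a homogeneous Poisson process in the plane, K(r) equals pi times r squared, which gives a reference value.

```python
import math
import random


def ripley_k(points, r, area):
    """Naive estimate of K(r) with no edge correction."""
    n = len(points)
    pairs = 0
    for i in range(n):
        xi, yi = points[i]
        for j in range(i + 1, n):
            xj, yj = points[j]
            if math.hypot(xi - xj, yi - yj) <= r:
                pairs += 1
    # 2 * pairs is the number of ordered pairs within distance r.
    return area * 2.0 * pairs / (n * (n - 1))


# Simulated homogeneous pattern: 400 uniform points in the unit square.
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(400)]

r = 0.05
k_hat = ripley_k(points, r, area=1.0)
k_csr = math.pi * r * r   # value under complete spatial randomness
print(f"K_hat({r}) = {k_hat:.5f}, pi r^2 = {k_csr:.5f}")
```

Values of the estimate well above the reference suggest clustering at scale r, and values well below suggest regularity.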

Cluster Analysis
Data mining techniques
Classifying data into clusters:
– k-means
– Model-based clustering
– Single linkage (friends of friends)
– Complete linkage clustering algorithm
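
Single linkage with a fixed linking length, the friends-of-friends grouping named above, can be sketched with a union-find structure. The function name, the linking length, and the points are hypothetical; the pairwise loop is quadratic and is meant only for small illustrations.

```python
import math


def friends_of_friends(points, linking_length):
    """Single-linkage groups: points within linking_length are friends, and
    friends of friends share a group. Returns one group label per point."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= linking_length:
                parent[find(i)] = find(j)   # merge the two groups

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical positions: a chain of three, a close pair, and an isolated point.
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0), (5.4, 5.0), (9.0, 0.0)]
labels = friends_of_friends(points, linking_length=0.6)
print(labels)
```

The first and third points are farther apart than the linking length but still share a group, because each is a friend of the second point.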