Statistical Tools: A Few Comments. Harrison B. Prosper, Florida State University. PhyStat Workshop 2004.

Presentation transcript:

Statistical Tools: A Few Comments
Harrison B. Prosper
Florida State University
PHYSTAT Workshop, March 2004

Outline
- Issues
- Wish List
- Example
- Summary

Statistical Tools: Issues
- Some difficulties with tools used in HEP
  - Difficult to express ideas cleanly and clearly
  - Tools scattered over different (typically, monolithic) programs
  - Interface between heterogeneous data formats and disparate tools is a headache
  - Histograms are tightly coupled to their viewers
  - Algebra of histograms relatively crude
  - Inadequate support for systematic study of ensembles

Issues – II
- In a systematic statistical study one may wish to:
  - Generate different ensembles of observations, possibly with conditioning, and study various statistical properties (bias, variance, coverage, etc.)
  - Assess robustness with respect to prior densities and likelihoods
  - Study different confidence limit procedures
  - Study different optimization criteria
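
An ensemble study of the kind listed above might look like the following minimal Python sketch. It is only an illustration under stated assumptions: the observations are Gaussian with known width, the estimator is the sample mean, and the interval is the usual central 68.3% interval. The function names (generate_ensemble, interval_68, study) are hypothetical and do not come from the talk.

```python
import random
import statistics

def generate_ensemble(true_mean, sigma, n_obs, n_experiments, seed=1):
    """Return n_experiments pseudo-experiments, each a list of n_obs Gaussian draws."""
    rng = random.Random(seed)
    return [[rng.gauss(true_mean, sigma) for _ in range(n_obs)]
            for _ in range(n_experiments)]

def interval_68(sample, sigma):
    """Central 68.3% interval for the mean, assuming sigma is known (an assumption)."""
    mean = statistics.fmean(sample)
    half_width = sigma / len(sample) ** 0.5
    return mean - half_width, mean + half_width

def study(ensemble, true_mean, sigma):
    """Query the ensemble for bias, variance of the estimator, and interval coverage."""
    estimates = [statistics.fmean(sample) for sample in ensemble]
    intervals = [interval_68(sample, sigma) for sample in ensemble]
    covered = sum(lo <= true_mean <= hi for lo, hi in intervals)
    return {
        "bias": statistics.fmean(estimates) - true_mean,
        "variance": statistics.pvariance(estimates),
        "coverage": covered / len(ensemble),
    }

# Hypothetical settings: true mean 5, width 2, 25 observations per experiment.
ensemble = generate_ensemble(true_mean=5.0, sigma=2.0, n_obs=25, n_experiments=4000)
result = study(ensemble, true_mean=5.0, sigma=2.0)
```

Swapping interval_68 for another limit procedure, or filtering the ensemble before calling study, is how one would compare procedures or look at conditioned sub-ensembles in this sketch.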

Issues – III
- One may wish to study:
  - Type I and type II error rates
  - Consistency – both convergence to, and rate of convergence to, the true answer as sample size increases
  - Probability densities p(z) given underlying distributions p(x)
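
As an illustration of the first item, the sketch below computes type I and type II error rates exactly for a simple counting test. The setup is an assumption made for the example, not something taken from the talk: the background-only hypothesis is rejected when the observed count n satisfies n >= n_cut, and counts are Poisson distributed. The names poisson_cdf and error_rates are hypothetical.

```python
import math

def poisson_cdf(k, mean):
    """P(N <= k) for N ~ Poisson(mean); returns 0.0 when k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-mean)      # P(N = 0)
    total = term
    for n in range(1, k + 1):
        term *= mean / n        # P(N = n) from P(N = n - 1)
        total += term
    return total

def error_rates(n_cut, background, signal):
    """Error rates of the test 'reject background-only if n >= n_cut'.

    alpha (type I):  P(n >= n_cut | background only)
    beta  (type II): P(n <  n_cut | signal + background)
    """
    alpha = 1.0 - poisson_cdf(n_cut - 1, background)
    beta = poisson_cdf(n_cut - 1, signal + background)
    return alpha, beta

# Hypothetical means: 3 background events and 5 signal events expected.
scan = {n_cut: error_rates(n_cut, background=3.0, signal=5.0)
        for n_cut in range(0, 15)}
```

Scanning n_cut as above shows the usual trade-off: raising the threshold lowers the type I rate and raises the type II rate.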

Wish List
- Decoupling
  - Statistical tool separate from, and independent of, the environment in which it might be used.
  - However, provide bindings for different environments/languages (R, ROOT, Python, Java, etc.)
- Modularity
  - Each statistical tool encapsulates a single coherent statistical idea. Avoid monoliths.
- Histograms
  - Histogram and histogram viewers independent of each other. (A sensible idea from Marc Paterno!)
  - Elegant algebra of histograms: h = a*h1 + b*h2/h3, etc.
  - Powerful, intuitive tools for multi-dim. data exploration
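
The histogram wishes can be made concrete with a small Python sketch. Everything here is hypothetical: the Hist class holds only bin edges and contents, the viewer is a separate function that the class knows nothing about, and operator overloading gives an algebra such as h = a*h1 + b*h2/h3. Two simplifying assumptions are made: bin-by-bin division returns 0 where the denominator bin is empty, and uncertainties are not propagated.

```python
import bisect
import operator

def _safe_div(a, b):
    """Divide, returning 0.0 for an empty denominator bin (an assumption)."""
    return a / b if b != 0 else 0.0

class Hist:
    """1-D histogram: bin edges and contents only, no display logic."""

    def __init__(self, edges, contents=None):
        self.edges = list(edges)
        n_bins = len(self.edges) - 1
        self.contents = list(contents) if contents is not None else [0.0] * n_bins
        if len(self.contents) != n_bins:
            raise ValueError("contents must have len(edges) - 1 entries")

    def fill(self, x, weight=1.0):
        """Add weight to the bin containing x; out-of-range values are ignored."""
        i = bisect.bisect_right(self.edges, x) - 1
        if 0 <= i < len(self.contents):
            self.contents[i] += weight

    def _combine(self, other, op):
        if isinstance(other, Hist):
            if other.edges != self.edges:
                raise ValueError("binning mismatch")
            values = [op(a, b) for a, b in zip(self.contents, other.contents)]
        else:  # scalar operand
            values = [op(a, other) for a in self.contents]
        return Hist(self.edges, values)

    def __add__(self, other):
        return self._combine(other, operator.add)

    def __mul__(self, other):
        return self._combine(other, operator.mul)

    __rmul__ = __mul__  # allows scalar * histogram

    def __truediv__(self, other):
        return self._combine(other, _safe_div)

def text_view(hist, width=40):
    """One possible viewer, kept outside the Hist class; returns a text rendering."""
    peak = max(hist.contents, default=0.0)
    lines = []
    for lo, hi, c in zip(hist.edges, hist.edges[1:], hist.contents):
        bar = "#" * int(round(width * c / peak)) if peak > 0 and c > 0 else ""
        lines.append(f"[{lo:g}, {hi:g}) {c:g} {bar}")
    return "\n".join(lines)

h1 = Hist([0, 1, 2], [2, 4])
h2 = Hist([0, 1, 2], [6, 8])
h3 = Hist([0, 1, 2], [3, 0])
a, b = 2, 3
h = a * h1 + b * h2 / h3   # the algebra from the wish list
```

Because text_view only reads edges and contents, a graphical viewer could replace it without touching Hist.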

Wish List – II
- Likelihoods
  - Flexible method for reporting them; maybe as swarms of points generated via MCMC?
- Frequency Methods
  - Flexible ensemble generator, which allows easily extracted sub-ensembles
  - Flexible query of ensembles (to get coverage, error rates, variances, bias, etc.)
- Bayesian Methods
  - Flexible robustness studies (prior family, likelihood family, etc.)
  - Multi-dimensional integration (adaptive and Markov chain MC)
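
To show what reporting a likelihood as a swarm of points could mean, here is a minimal random-walk Metropolis sketch in Python. It is an illustration only: the one-parameter Gaussian likelihood, the data values, the step size, and the function names are all assumptions made for the example. Averages over the returned points are Monte Carlo estimates of integrals over the (flat-prior) posterior, which is the link to Markov chain integration.

```python
import math
import random

def log_likelihood(theta, data, sigma=1.0):
    """Gaussian log-likelihood for a common mean theta, up to a constant."""
    return -0.5 * sum((x - theta) ** 2 for x in data) / sigma ** 2

def metropolis(log_f, start, step, n_samples, burn_in=500, seed=7):
    """Random-walk Metropolis: return n_samples points distributed as exp(log_f)."""
    rng = random.Random(seed)
    current, current_lp = start, log_f(start)
    chain = []
    for i in range(n_samples + burn_in):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_f(proposal)
        # Accept with probability min(1, f(proposal) / f(current)).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        if i >= burn_in:
            chain.append(current)
    return chain

# Hypothetical data; the swarm of points stands in for the likelihood itself.
data = [1.2, 0.8, 1.1, 0.9, 1.0]
swarm = metropolis(lambda t: log_likelihood(t, data), start=0.0,
                   step=1.0, n_samples=20000)
swarm_mean = sum(swarm) / len(swarm)
swarm_var = sum((t - swarm_mean) ** 2 for t in swarm) / len(swarm)
```

For these data the likelihood is Gaussian in theta with mean 1.0 and standard deviation 1/sqrt(5), so the swarm's mean and spread can be checked against the analytic values.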

Example: A Current Statistical Problem From DØ Single Top Group
- Set limit on σ(p + pbar → t + X) given a histogram for each of
  - 4 signal channels
    - tq(EC), tqb(EC), tq(CC), tqb(CC)
  - 4 background sources per signal channel
    - QCD, ttbar(l+jets), ttbar(ll), W+Jets
- Some histograms are weighted, some unweighted
- We would like to study different limit procedures, including Bayesian, and study their frequency properties. Currently using ad hoc and rather inflexible pieces of homegrown C++!
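
One of the limit procedures mentioned above, a Bayesian upper limit, can be sketched for a much simplified version of this problem. This is not the DØ analysis: each channel is reduced to a single event count instead of a histogram, the four background sources are summed into one number per channel, systematic uncertainties and event weights are ignored, the prior on the cross section is flat, and every number below is invented for illustration. The function names are hypothetical.

```python
import math

def log_poisson(n, mean):
    """log of the Poisson probability of n events given a positive mean."""
    return n * math.log(mean) - mean - math.lgamma(n + 1)

def posterior_grid(observed, background, acceptance, sigma_max, n_points=2000):
    """Posterior for the cross section on a midpoint grid, flat prior on [0, sigma_max].

    The expected count in channel i is sigma * acceptance[i] + background[i].
    """
    step = sigma_max / n_points
    sigmas = [(i + 0.5) * step for i in range(n_points)]
    logs = [sum(log_poisson(n, s * a + b)
                for n, b, a in zip(observed, background, acceptance))
            for s in sigmas]
    peak = max(logs)                      # subtract the peak to avoid underflow
    weights = [math.exp(value - peak) for value in logs]
    norm = sum(weights)
    return sigmas, [w / norm for w in weights]

def upper_limit(sigmas, probs, cl=0.95):
    """Smallest grid value whose cumulative posterior probability reaches cl."""
    total = 0.0
    for s, p in zip(sigmas, probs):
        total += p
        if total >= cl:
            return s
    return sigmas[-1]

# Invented numbers for the four channels tq(EC), tqb(EC), tq(CC), tqb(CC).
observed = [12, 9, 30, 25]
background = [11.0, 8.5, 27.0, 24.0]   # sum over the 4 background sources
acceptance = [0.8, 0.6, 2.1, 1.7]      # expected signal events per unit cross section
sigmas, probs = posterior_grid(observed, background, acceptance, sigma_max=20.0)
limit_95 = upper_limit(sigmas, probs)
```

Running the same functions over an ensemble of pseudo-experiments is one way to study the frequency properties of the resulting limits.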

Summary
- The Good
  - Lots of statistical tools already exist
  - A lot more needed – opportunity for creativity!
- The Bad
  - Use of current tools, however, often requires familiarity with several frameworks/languages
- The Ugly
  - Lack of a simple, but powerful, language for expression of statistical ideas. Rapid “what if” analyses done with C++. This is crazy! I don’t want to think about pointers and de-referencing when I’m trying to think about mathematics.