Published by Grant Wilcox. Modified over 8 years ago.
1
Statistical Tools In Dzero, PHYSTAT Workshop 2005, Harrison B. Prosper
Statistical Software In DØ: The Good, the Bad and the Non-Existent
Harrison B. Prosper, Florida State University
PHYSTAT Workshop 2005, 15 August 2005
2
Outline
  Analysis Example
  Available Software
  Wish List
  Summary
3
Example - DØ Single Top Group – I
4
Example - DØ Single Top Group – II
Search for p + pbar → t + (q) + b + X
8 signal channels
7 background sources per signal channel: QCD, ttbar(lj), ttbar(ll), Wjj, Wbb, WW, WZ
Each data bin is the sum of tb, tqb, QCD, ttbar(lj), ttbar(ll), Wjj, Wbb, WW, WZ
5
Example - DØ Single Top Group – III
Basic Statistical Quantity: Binned Likelihood
Goal: to measure s and t
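A binned likelihood of this kind is commonly written as a product of Poisson terms, one per bin, with the expected count in each bin given by a sum over sources. The sketch below assumes that standard form. The function names and the idea of passing per-source yields and unit templates are illustrative assumptions, not DØ code.

```python
import math


def expected_counts(yields, templates):
    """Expected count per bin: d_i = sum_j yields[j] * templates[j][i].

    `yields` and `templates` are hypothetical inputs: one yield and one
    binned template per source (signal or background).
    """
    n_bins = len(templates[0])
    return [sum(y * t[i] for y, t in zip(yields, templates))
            for i in range(n_bins)]


def binned_log_likelihood(observed, expected):
    """Sum over bins of log Poisson(D_i | d_i)."""
    total = 0.0
    for D, d in zip(observed, expected):
        if d <= 0.0:
            raise ValueError("expected count must be positive")
        # log of d^D * exp(-d) / D!
        total += D * math.log(d) - d - math.lgamma(D + 1)
    return total
```

Working with the logarithm avoids underflow when many bins are multiplied together.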
6
Example – Statistical Problems – IV
Background Modeling
  Model/data comparisons in multiple dimensions to determine the region with the “best” match.
  Background events are generally weighted, for example, by the probability that an event could contain a b-jet. These “tag-rate functions” are the results of fits to 2- to 3-dimensional empirical densities.
7
Example – Statistical Problems – V
Discriminant Variable Selection
  From a list of potentially useful variables, select the “best” sub-set.
Multivariate Analyses
  Random Grid Search
  Neural Networks
  Decision Trees
  Bayesian Neural Networks
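As a rough illustration of the first method in the list, a random grid search is usually described as trying many candidate sets of cuts, often taken from the signal events themselves, and keeping the set that scores best. The sketch below assumes "greater than or equal" cuts on every variable and uses s / sqrt(s + b) as the figure of merit. Both choices, and all names, are assumptions made for illustration and are not taken from RGSearch.

```python
import math


def count_passing(events, cuts):
    """Number of events whose every variable is at or above its cut."""
    return sum(1 for ev in events
               if all(x >= c for x, c in zip(ev, cuts)))


def random_grid_search(signal, background, cut_points):
    """Return (score, cuts, s, b) for the best candidate, or None.

    Each element of `cut_points` is one candidate set of thresholds.
    """
    best = None
    for cuts in cut_points:
        s = count_passing(signal, cuts)
        b = count_passing(background, cuts)
        if s + b == 0:
            continue  # nothing passes, so the score is undefined
        score = s / math.sqrt(s + b)
        if best is None or score > best[0]:
            best = (score, cuts, s, b)
    return best
```

Calling `random_grid_search(signal, background, signal)` uses each signal event as a candidate cut point, so the search concentrates where signal is actually found.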
8
Example – Statistical Problems – VI
Posterior Density Computation
  Must marginalize over hundreds of variables (acceptances and background yields) and must do so taking into account known dependencies.
Analysis Validation
  Ideally, the entire analysis is run repeatedly on fake data-sets to study its frequency behavior.
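To make the marginalization step concrete, here is a deliberately tiny sketch with one counting bin and one nuisance parameter (the background yield) rather than hundreds. It assumes a flat prior on the signal yield and approximates the integral over the background by averaging the likelihood over samples drawn from the background prior. The function names are hypothetical.

```python
import math


def poisson_pmf(n, mu):
    """Poisson probability of n counts given mean mu > 0."""
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))


def marginal_posterior(n_obs, s_grid, b_samples):
    """Posterior density of the signal yield s on an evenly spaced grid.

    Assumes a flat prior in s. The nuisance parameter b is integrated
    out by Monte Carlo: the likelihood is averaged over `b_samples`,
    which must be positive draws from the background prior.
    """
    weights = []
    for s in s_grid:
        like = sum(poisson_pmf(n_obs, s + b) for b in b_samples)
        weights.append(like / len(b_samples))
    step = s_grid[1] - s_grid[0]
    norm = sum(weights) * step
    return [w / norm for w in weights]
```

Correlated nuisance parameters can be handled the same way by drawing them jointly, which is one way to respect the known dependencies the slide mentions.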
9
Available Software – I
Fitting
  Minuit applied to Root histograms
  PoissonGammaFit (more later!)
Multivariate Methods
  RGSearch (a few incompatible versions)
  Jetnet (v3.4) (with C++ binding)
  MLPfit (several versions)
  oo_neural (OOP version of BP)
10
Available Software – II
Multivariate Methods (continued)
  Classifier (decision tree)
  C2.4 (decision tree)
  TerraFerma (misc. methods)
  BNN (Bayesian NN)
Limit Setting
  top_statistics (Bayes, CLs)
  blimit (Bayes; more robust version of DØ web-calculator)
11
Available Software – III
Adaptive Numerical Integration
  AdBayes (C++ binding of Alan Genz’s Fortran code)
Python Bindings
  RGSearch, Jetnet, AdBayes, PoissonGammaFit, CLHEP, Coin, Root, etc.
12
PoissonGammaFit
Model
  For each bin i we write the (mean) data count d_i as a linear sum of N (mean) source counts
Likelihood for observed distribution D = {D_i}
13
PoissonGammaFit – II
Bayesian Inference for Moments m_r of p
Prior (given source counts A_ji)
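One way to see where the "Gamma" in the name can come from: if each observed source count A_ji is Poisson-distributed about an unknown mean and that mean is given a flat prior, the resulting density for the mean is a gamma density with shape A_ji + 1 and unit scale. The sketch below uses that fact to propagate the finite statistics of the source histograms into the expected data counts d_i. It is an illustration under those assumptions, not the PoissonGammaFit implementation, and the symbol a_ji for the unknown mean is introduced here.

```python
import random


def sample_expected_counts(p, A, n_samples, rng):
    """Draw samples of d_i = sum_j p_j * a_ji.

    p  : one weight per source (hypothetical input)
    A  : observed source counts, A[j][i] for source j and bin i
    rng: a random.Random instance, so results are reproducible

    Each mean source count a_ji is drawn from Gamma(A_ji + 1, 1),
    which assumes a flat prior on a_ji.
    """
    n_bins = len(A[0])
    samples = []
    for _ in range(n_samples):
        d = [0.0] * n_bins
        for p_j, row in zip(p, A):
            for i, count in enumerate(row):
                d[i] += p_j * rng.gammavariate(count + 1.0, 1.0)
        samples.append(d)
    return samples
```

Averaging the samples bin by bin gives an estimate of the mean expected count, and their spread shows how much the limited source statistics matter.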
14
What’s Available?
C++ Class
  PoissonGammaFit(vvdouble& A, vdouble& D, string prior="flat", bool scale=true, int total=10000)
Methods
  m = o.mean()
  v = o.variance()
Types
  vdouble = vector<double>
  vvdouble = vector<vector<double> >
15
What’s Available? – II
Main Program
  Usage: pgammafit
    -h
    -f [hist-file-list (histfile.list)]
    -n [# of sampling points (10000)]
    -o [name of plot (pgammafit.gif)]
  Uses HistogramCache, PoissonGammaFit, Minuit
17
Bayesian Neural Networks
[Diagram: network y(x, w) with inputs x1, x2 and parameters u, a and v, b]
Weights: w = (u, a, v, b)
For binary (0, 1) classification: y(x, w) → p(1|x)
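A Bayesian neural network is usually evaluated by averaging the outputs of many networks whose weights are drawn from the posterior density, rather than by using a single trained network. The sketch below assumes a single hidden layer with tanh units and a logistic output, and reads w = (u, a, v, b) as input-to-hidden weights, hidden biases, hidden-to-output weights, and output bias. That reading of the symbols is an assumption based on the diagram, and the function names are illustrative.

```python
import math


def network(x, w):
    """Output y(x, w) of a one-hidden-layer network, in (0, 1)."""
    u, a, v, b = w
    hidden = [math.tanh(a_j + sum(u_ji * x_i for u_ji, x_i in zip(u_j, x)))
              for u_j, a_j in zip(u, a)]
    z = b + sum(v_j * h_j for v_j, h_j in zip(v, hidden))
    return 1.0 / (1.0 + math.exp(-z))


def bnn_predict(x, weight_samples):
    """Estimate p(1|x) as the average of y(x, w_n) over sampled weights."""
    return sum(network(x, w) for w in weight_samples) / len(weight_samples)
```

In practice the weight samples would come from a Markov chain Monte Carlo run; here they are simply whatever list the caller supplies.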
18
Bayesian Neural Networks – II
[Plot: variable HT_AllJets_MinusBestJets]
Dots: p(1|H_T) = H_tqb / (H_tqb + H_Wbb), where H is a 1-D histogram
Curves: individual NNs y(H_T, w_n)
Black curve
20
What’s Available?
Radford Neal’s Package
C codes compiled and linked into a set of programs:
  net-spec      Specify network
  data-spec     Specify training data
  net-gen       Initialize network
  mc-spec       Specify MCMC parameters
  net-mc        Run MCMC
  net-display   Display network parameters
  netwrite.py   Write results to a C++ function
21
The Bad
It’s a Jungle Out There!
  Difficult to express ideas clearly
  Tools typically cannot be moved easily from one framework to another
  No clear protocol for interface between heterogeneous data formats
  No algebra of histograms
  Histograms tightly coupled to their viewers: Use Root or die!
22
The Bad – II
Inadequate Support For:
  Generating ensembles of observations, possibly with conditioning, to study bias, variance, coverage, etc.
  Assessing robustness with respect to likelihoods and prior densities
  Studying different confidence limit procedures
  Studying different optimization criteria
23
The Non-Existent
DØ has no (or inadequate) tools to:
  Browse data in truly interesting ways
  Perform goodness-of-fit tests that go beyond KS and χ²
  Construct Bayesian models, systematically
  Perform sensitivity analyses, systematically
No domain-specific language
24
Wish List – I
Free At Last!
  Statistical tool separate from, and independent of, the environment in which it might be used.
  However, provide bindings for different environments/languages (R, Root, Ruby, Python, Java, etc.)
Less Is More!
  Each statistical tool should encapsulate a single coherent statistical idea.
25
Wish List – II
Histograms
  Histogram and histogram viewers should be independent of each other. (A sensible idea from Marc Paterno!)
  Elegant algebra of histograms: h = a*h1 + b*h2/h3, etc.
  Powerful, intuitive tools for multi-dim. data exploration
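As a sketch of what such an algebra could look like, the toy class below stores only bin contents, knows nothing about viewers, and overloads the arithmetic operators so that the expression on the slide can be written directly. The class name `Hist`, the decision to set a bin to zero when dividing by an empty bin, and the absence of error propagation are all simplifying assumptions made here.

```python
class Hist:
    """Minimal histogram: a list of bin contents and nothing else."""

    def __init__(self, contents):
        self.contents = [float(c) for c in contents]

    def _combine(self, other, op):
        # Bin-by-bin with another histogram, or broadcast a plain number.
        if isinstance(other, Hist):
            if len(other.contents) != len(self.contents):
                raise ValueError("bin count mismatch")
            return Hist(op(x, y)
                        for x, y in zip(self.contents, other.contents))
        return Hist(op(x, other) for x in self.contents)

    def __add__(self, other):
        return self._combine(other, lambda x, y: x + y)

    def __mul__(self, other):
        return self._combine(other, lambda x, y: x * y)

    __rmul__ = __mul__  # allows number * histogram

    def __truediv__(self, other):
        # Assumption: an empty denominator bin yields 0 rather than an error.
        return self._combine(other, lambda x, y: x / y if y != 0 else 0.0)
```

With this class, `h = 2 * h1 + 3 * h2 / h3` evaluates bin by bin with the usual operator precedence, and any viewer can be written separately against `h.contents`.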
26
Wish List – III
Likelihoods
  Flexible method for reporting them; perhaps as swarms of points generated via MCMC?
Frequency Methods
  Flexible ensemble generator, with easily extracted sub-ensembles
  Flexible query of ensembles (to get coverage, error rates, variances, bias, etc.)
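A very small version of the frequency-method wish might look like the sketch below: generate an ensemble of pseudo-experiments for a known true mean, then query it for the coverage of an interval procedure. The example uses a single Poisson count per experiment and the naive n ± sqrt(n) interval purely for illustration. All function names are hypothetical, and the Poisson generator uses Knuth's multiplication method, which is only suitable for small means.

```python
import math
import random


def poisson_draw(mu, rng):
    """One Poisson variate with mean mu (Knuth's method, small mu only)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def generate_ensemble(mu, n_experiments, rng):
    """Ensemble of pseudo-experiments, each a single observed count."""
    return [poisson_draw(mu, rng) for _ in range(n_experiments)]


def coverage(ensemble, mu, interval):
    """Fraction of experiments whose interval contains the true mu."""
    covered = 0
    for n in ensemble:
        low, high = interval(n)
        if low <= mu <= high:
            covered += 1
    return covered / len(ensemble)


def root_n_interval(n):
    """Naive interval n +/- sqrt(n)."""
    return (n - math.sqrt(n), n + math.sqrt(n))
```

A sub-ensemble is then just a filtered list, for example `[n for n in ensemble if n > 0]`, and the same `coverage` query can be run on it.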
27
Wish List – IV
Bayesian Methods
  Flexible robustness studies (prior family, likelihood family, etc.)
  Multi-dimensional integration (adaptive and Markov Chain MC)
Domain Specific Language
  No dereferencing, auto_ptr, dynamic_cast, pointers, templates etc. please,… we’re British!
28
Summary
The Good
  Many statistical tools are in use at DØ
  A lot more needed: opportunity for creativity!
The Bad
  Current tools are a reflection of non-interacting idiosyncratic minds!
The Non-Existent
  Lack of a domain-specific language for expression of statistical ideas. I don’t want to think about pointers and const-correctness when I’m trying to think about mathematics.