TerraFerMA: A Suite of Multivariate Analysis Tools


Sherry Towers, SUNY-SB (smjt@fnal.gov)

Version 1.0 has been released, usable by anyone with access to the CLHEP and ROOT libraries: www-d0.fnal.gov/~smjt/multiv.html

Today I will be talking to you about work I have done over the past couple of years in the field of multivariate analysis techniques. We will begin by briefly summarising the strengths and weaknesses of various common multivariate techniques. We will then discuss the development of GEM, a multivariate tool I developed as a graduate student. As we will see, GEM is quite similar to the PDE multivariate approach, with a few differences that will become apparent as we discuss its performance. Now, a little-discussed problem in high energy physics today is the proliferation of variables used in analyses. The last decade or so has witnessed a sharp rise in the popularity of neural networks in physics analyses. Neural nets are a very powerful and valuable tool; unfortunately, the ease of adding discriminators to a neural net has led to some analyses with literally dozens of variables. I call this phenomenon “overloading the dimensionality of an analysis”. I’ll talk about how this can be detrimental, and we will discuss a simple step-by-step method that can be used to avoid it. I will then show the result of applying this variable-reduction method to tauID and topID at D0. In these two examples, I’ll also show how the GEM algorithm performs compared to the neural networks previously used in these analyses.

TerraFerMA = Fermilab Multivariate Analysis (aka “FerMA”). TerraFerMA is, foremost, a convenient interface to various disparate multivariate analysis packages (e.g., MLPfit, Jetnet, PDE/GEM, Fisher discriminant, binned likelihood). The user first fills signal and background (and data) “Samples”, which are then used as input to TerraFerMA methods. A Sample consists of a set of variables filled for many different events.
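One way to picture a Sample is as a table with one row per event and one column per named variable. The minimal Python sketch below assumes that layout. It also adds the mean, RMS and correlation summaries that the talk lists among TerraFerMA's statistics tools. The class `Sample` and its methods are hypothetical and are not TerraFerMA's actual interface. RMS is taken here to mean the spread about the mean, i.e. the population standard deviation.

```python
import math


class Sample:
    """Hypothetical Sample: one row of variable values per event (not TerraFerMA's API)."""

    def __init__(self, variable_names):
        self.variable_names = list(variable_names)
        self.events = []  # each event is a list of floats, in variable_names order

    def fill(self, values):
        """Append one event; values must follow variable_names in length and order."""
        if len(values) != len(self.variable_names):
            raise ValueError(
                f"expected {len(self.variable_names)} values, got {len(values)}"
            )
        self.events.append([float(v) for v in values])

    def column(self, name):
        """All values of one variable across the filled events."""
        i = self.variable_names.index(name)
        return [event[i] for event in self.events]

    def mean(self, name):
        col = self.column(name)
        return sum(col) / len(col)

    def rms(self, name):
        """Spread about the mean (population standard deviation)."""
        col = self.column(name)
        m = sum(col) / len(col)
        return math.sqrt(sum((x - m) ** 2 for x in col) / len(col))

    def correlation(self, a, b):
        """Pearson correlation coefficient between variables a and b."""
        xs, ys = self.column(a), self.column(b)
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
        return cov / (self.rms(a) * self.rms(b))


# Usage: fill a toy signal Sample with two hypothetical variables.
signal = Sample(["pt", "eta"])
for pt, eta in [(40.0, 0.5), (55.0, 1.1), (47.0, -0.3)]:
    signal.fill([pt, eta])
print(signal.mean("pt"), signal.rms("pt"), signal.correlation("pt", "eta"))
```

A background or data Sample would be filled the same way, with the same variable names, so that each can be handed to any method expecting a Sample.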

Using a multivariate package chosen by the user (e.g., neural networks, PDEs, Fisher discriminants), TerraFerMA methods yield the probability that a data event is signal or background. TerraFerMA also includes useful statistics tools (means, RMS values, and correlations between the variables in a Sample) and a method to detect outliers.
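To make “the probability that a data event is signal” concrete, here is a minimal sketch of a PDE-style estimate. It assumes fixed-width product Gaussian kernels and equal prior weight for signal and background. Each class density is estimated by averaging Gaussians centred on that class's training events, and then P(signal | x) = f_s(x) / (f_s(x) + f_b(x)). This is a generic kernel density estimate and not necessarily the exact PDE or GEM algorithm in TerraFerMA. The names `kernel_density`, `PDEClassifier`, `probability` and the `widths` parameter are hypothetical.

```python
import math


def kernel_density(point, events, widths):
    """Average of product-Gaussian kernels centred on each training event."""
    norm = 1.0
    for h in widths:
        norm *= h * math.sqrt(2.0 * math.pi)
    total = 0.0
    for event in events:
        exponent = sum(((p - e) / h) ** 2 for p, e, h in zip(point, event, widths))
        total += math.exp(-0.5 * exponent)
    return total / (len(events) * norm)


class PDEClassifier:
    """Hypothetical PDE-style classifier built from two density estimates."""

    def __init__(self, signal_events, background_events, widths):
        self.signal_events = signal_events
        self.background_events = background_events
        self.widths = widths  # one fixed kernel width per variable (an assumption)

    def probability(self, point):
        """P(signal | point), assuming equal prior weight for signal and background."""
        s = kernel_density(point, self.signal_events, self.widths)
        b = kernel_density(point, self.background_events, self.widths)
        if s + b == 0.0:
            return 0.5  # far from every training event: no information either way
        return s / (s + b)


# Usage with toy two-variable training events.
sig = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
bkg = [[-1.0, -1.0], [-1.2, -0.9], [-0.8, -1.1]]
clf = PDEClassifier(sig, bkg, widths=[0.5, 0.5])
print(clf.probability([0.9, 1.0]))
```

Any other method that exposes the same hypothetical `probability(point)` call could replace `PDEClassifier` without changing the calling code. A neural network or a Fisher discriminant wrapper are two examples. This is the idea behind a shared interface across techniques.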

TerraFerMA makes it trivial to compare the performance of different multivariate techniques. For instance, it is simple to switch between a NN and a PDE, because in TerraFerMA both use the same interface. TerraFerMA also makes it easy to reduce the number of discriminators used in an analysis: optional TerraFerMA methods sort the variables to determine which have the best signal/background discrimination power. The TerraFerMA web page includes full documentation and descriptions.
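The sketch below ranks variables one at a time by a one-dimensional Fisher criterion, (μ_s − μ_b)² / (σ_s² + σ_b²). The talk does not say which criterion TerraFerMA's sorting methods use. This choice, and the function names `fisher_separation` and `rank_variables`, are assumptions made for illustration.

```python
def fisher_separation(signal_values, background_values):
    """Assumed 1-D criterion: (mean_s - mean_b)^2 / (var_s + var_b)."""
    def mean_var(values):
        m = sum(values) / len(values)
        v = sum((x - m) ** 2 for x in values) / len(values)
        return m, v

    ms, vs = mean_var(signal_values)
    mb, vb = mean_var(background_values)
    if vs + vb == 0.0:
        # Both distributions are single points: identical or perfectly separated.
        return 0.0 if ms == mb else float("inf")
    return (ms - mb) ** 2 / (vs + vb)


def rank_variables(names, signal_events, background_events):
    """Hypothetical helper: (name, separation) pairs, best discriminator first."""
    scores = []
    for i, name in enumerate(names):
        s = [event[i] for event in signal_events]
        b = [event[i] for event in background_events]
        scores.append((name, fisher_separation(s, b)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Usage with toy events; each event lists the variables in `names` order.
names = ["good", "weak", "useless"]
sig = [[2.0, 0.6, 5.0], [2.2, 0.4, 3.0], [1.8, 0.5, 4.0]]
bkg = [[0.0, 0.1, 5.0], [0.2, -0.1, 3.0], [-0.2, 0.0, 4.0]]
for name, score in rank_variables(names, sig, bkg):
    print(name, score)
```

This ranking scores each variable in isolation, so it ignores correlations. Two highly correlated variables can both score well while adding little when used together. A stepwise procedure that adds variables one at a time and re-evaluates the multivariate performance would account for that.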

Future plans: To package TerraFerMA with ROOT, its dependencies on the CLHEP matrix methods and random number generators must be excised. The process has begun, with a predicted completion date of December. In the meantime, FerMA is fully usable with, and can be interfaced to, ROOT-tuples by using it in compiled mode. See the package and users’ guide for detailed examples.

TerraFerMA documentation (TerraFerMA Version 1.0)
Documentation: www-d0.fnal.gov/~smjt/ferma.ps
Users’ guide: www-d0.fnal.gov/~smjt/guide.ps
Package: …/ferma.tar.gz (includes example programs)