Classification and Regression Trees for GLAST Analysis: to IM or not to IM?
Toby Burnett, Data Challenge Meeting, 15 June 2003

The problem

Bill is using IM (Insightful Miner) classification and regression tree analysis for:
– Calorimeter validity
– PSF tail suppression
– Background suppression

IM is proprietary and rather expensive ($5K); only UW and UCSC have academic licenses ($500 for a single license, $1K for 10).

Bill's IM worksheet (PSFAnalysis_14)

[Worksheet screenshot; labeled components: input tuple, training region, prediction tree, analyze results]

The Trees: calculate 4 values with 11 nodes
– Good calorimeter measurement [1 node]
– Vertex vs. 1 track (thin and thick) [2 nodes]
– Core vs. tail (thin/thick and vtx/1 trk) [4 nodes]
– Prediction of recon direction error [4 nodes]

Example node (GoodCAL/BadCal prediction): CalTwrEdge =26.58, CalTwrEdge 3,611.48, CalTrackDoca>3.96, CalXtalRatio 1.76
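For illustration only: a prediction node amounts to a conjunction of cuts on tuple variables. The C++ sketch below uses variable names from the slide, but the comparison directions (and the subset of cuts shown) are hypothetical guesses, not Bill's actual trained node:

    #include <map>
    #include <string>

    // Hypothetical GoodCAL/BadCal node: a conjunction of cuts on named
    // tuple variables. Cut directions here are illustrative guesses.
    bool goodCalNode(const std::map<std::string, double>& tuple)
    {
        // Each term is one cut; the event passes the node only if all hold.
        return tuple.at("CalTwrEdge")   > 26.58
            && tuple.at("CalTrackDoca") > 3.96
            && tuple.at("CalXtalRatio") < 1.76;
    }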

Bill's result*

* Flawed by G4 problems

A Solution

IM saves its results as XML files, which are easy to interpret.
A new package, "classification", defines a class classification::Tree that:
– accepts a "lookup" object to obtain a pointer to the double associated with each named quantity
– parses the XML file, creating a tree for each prediction tree found
– returns a value from each tree

Merit creates and fills the new tuple variables in a new class, ClassificationTree, which:
– duplicates the logic defining the 4 categories
– evaluates each of the 4 variables

A sketch of the lookup idea follows.
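A minimal sketch of that lookup mechanism, with hypothetical class and member names (the actual classification package interface may differ): the tree resolves each variable name to a pointer once, at parse time, so that per-event evaluation is just pointer dereferences down the tree.

    #include <map>
    #include <string>

    // Hypothetical lookup: maps a variable name found in the IM XML to a
    // pointer to the double that merit keeps up to date for each event.
    class TupleLookup {
    public:
        void add(const std::string& name, const double* ptr) { m_vars[name] = ptr; }
        const double* operator()(const std::string& name) const {
            auto it = m_vars.find(name);
            return it == m_vars.end() ? nullptr : it->second;
        }
    private:
        std::map<std::string, const double*> m_vars;
    };

    // Hypothetical tree node: either a cut on one bound variable with two
    // children, or a leaf carrying the predicted value.
    struct Node {
        const double* var = nullptr;  // bound through the lookup at parse time
        double cut = 0.0, leafValue = 0.0;
        Node* pass = nullptr;         // child taken when *var < cut
        Node* fail = nullptr;         // child taken otherwise
        double evaluate() const {
            if (!pass) return leafValue;                    // leaf node
            return (*var < cut ? pass : fail)->evaluate();  // descend
        }
    };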

Current Procedure
1. Bill releases an IM file. I strip it down, removing nodes not required for analysis (size reduced by half, to 500 KB).
2. Rename it and check it in to cvs as classification/xml/PSF_Analysis.xml.
3. Create a tuple with merit, containing the new tuple quantities.
4. Feed that tuple to this IM worksheet, which writes a new tuple with both versions.

Results: the good

The comparisons use generated 100 MeV normal-incidence events.
The vertex classification (used to choose between the vertex and 1-track direction estimates) agrees perfectly, as does the core vs. tail classification.

Results: the bad

The output of the "regression tree" that predicts the PSF error shows two populations!
The agreement is rather poor for the "thin vertex" category; otherwise it is perfect.
An explanation: Bill generated two different trees from different data sets, of 1000 and 243 events. (The latter has only two nodes and can only generate 3 values.)
– The merit evaluation uses only the first tree.
– The IM evaluation uses an average of the two trees.
– Note that there are three branches.

Results: the ugly

This is the comparison for the prediction of a good energy measurement.
Again, Bill created two trees, which are apparently being averaged.

Observations

Fixing the "disagreement":
– Bill: will train only one tree
– Me: average all the trees (a sketch of tree averaging follows this slide)

Using IM to train the classification or regression trees:
– The current procedure is exploratory.
– If we decide to use these trees in the final analysis, they must be trained systematically.
– Another possibility (idea from Tracy): use the classification/regression analysis in S-PLUS, which manages tree objects.
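If the merit side is changed to average all the trees, the fix is small; a hedged sketch (names illustrative, not the classification package's actual API):

    #include <numeric>
    #include <vector>

    // Average the per-event outputs of all prediction trees found in the
    // XML file, matching IM's behavior, instead of taking only the first.
    double averagePrediction(const std::vector<double>& treeOutputs)
    {
        if (treeOutputs.empty()) return 0.0;  // no trees: nothing to average
        double sum = std::accumulate(treeOutputs.begin(), treeOutputs.end(), 0.0);
        return sum / treeOutputs.size();
    }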

S-PLUS

– No question about academic licenses ($100 per license at UW)
– Linux version available
– Open-source alternative: R
– Scriptable, also callable from C++
– Supports the same classification and regression tree functions (we think!)

From the S-PLUS help ("Fit a Regression or Classification Tree"):
DESCRIPTION: Grows a tree object from a specified formula and data.
USAGE: tree(formula, data, weights, subset, na.action=na.fail, method="recursive.partition", control, model=NULL, x=F, y=T, ...)
REQUIRED ARGUMENTS: formula, a formula expression as for other regression models, of the form `response ~ predictors'.

Status

Work done by a summer student:
– Explore a classification tree with random x, y in [0,1], good = x<y; see validity plot at right.
– Explore a regression tree: feed it x and y = x^2, and have it create a predictor for y.

In progress: direct comparison
– Choose the GoodCAL category: ifelse((EvtMcEnergySigma > -5.), "GoodCAL", "BadCal")
– Use IM (v2) to create the classification with the independent variables used by Bill.
– Write the results to a file for S-PLUS (a sketch of generating such a test file follows).

Next steps:
– Run the same analysis in S-PLUS and compare.
– Establish procedures to construct tree predictions with R or S-PLUS.
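A sketch of generating the student's toy sample and writing it out for S-PLUS; the file name and whitespace-separated format are assumptions for illustration:

    #include <fstream>
    #include <random>

    // Toy sample: uniform x, y in [0,1], labeled good when x < y. A
    // classification tree trained on this should recover the x < y boundary.
    int main()
    {
        std::mt19937 gen(42);
        std::uniform_real_distribution<double> uniform(0.0, 1.0);

        std::ofstream out("toy_tree_sample.txt");  // hypothetical file name
        out << "x y good\n";
        for (int i = 0; i < 10000; ++i) {
            double x = uniform(gen), y = uniform(gen);
            out << x << ' ' << y << ' ' << (x < y ? 1 : 0) << '\n';
        }
        return 0;
    }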