Searching for Single Top Using Decision Trees
G. Watts (UW), for the DØ Collaboration
5/13/2005 – APSNW, Particles I
Single Top Challenges
– Overwhelming background!
– Straight cuts (counting experiments): difficult to take advantage of correlations.
– Multivariate cuts (and shape fitting): designed to take advantage of correlations and irreducible backgrounds.
Asymmetries in t-Channel Production vs. b-Pair Production
– Lots of variables each give a small separation (use matrix elements, phase space, etc.)
Combine Variables!
– Multivariate likelihood fit: 7 variables means 7 dimensions…
– Neural network: many inputs, a single output; trained on signal and background samples; well understood and mostly accepted in HEP.
– Decision tree: many inputs, a single output; trained on signal and background samples; used mostly in the life sciences & business (MiniBooNE – physics/ ).
Decision Tree
– Analysis flow: trained decision tree → binned likelihood fit → limit.
Internals of a Trained Tree
– Every event belongs to a single leaf node!
– A “rooted binary tree” – “you can see a decision tree.”
Training
– Determine a branch point: calculate the Gini improvement as a function of an interesting variable (HT in this case) and choose the cut point with the largest improvement.
– Repeat for all interesting variables: HT, jet pT, angular variables, etc.
– The best improvement over all variables becomes this node’s decision.
Gini
– The training process requires a figure of merit to optimize the separation.
– Ws: weight of signal events; Wb: weight of background events.
– Purity p = Ws / (Ws + Wb); Gini G = p(1 − p)(Ws + Wb) = WsWb / (Ws + Wb).
– G is zero for pure background or pure signal! (A small code sketch follows.)
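A minimal sketch of the Gini calculation described above, assuming the weighted form G = p(1 − p)(Ws + Wb); the function name and normalization are illustrative, not the actual DØ code.

```python
def gini(w_signal, w_background):
    """Weighted Gini index for one node.

    w_signal / w_background: summed event weights of signal and background
    in the node. Uses the weighted form G = p*(1-p)*(Ws+Wb), which
    vanishes for a pure-signal or pure-background node.
    """
    total = w_signal + w_background
    if total <= 0.0:
        return 0.0
    purity = w_signal / total
    return purity * (1.0 - purity) * total


# Example: a pure node has G = 0, a mixed node does not.
print(gini(10.0, 0.0))  # 0.0  (pure signal)
print(gini(5.0, 5.0))   # 2.5  (maximally mixed)
```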
Gini Improvement
– A cut splits the data S at a node into two subsets, S1 and S2.
– For each candidate cut, the improvement is GI = G(S) − G(S1) − G(S2).
– Repeat the process for each subdivision of the data (sketched below).
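A sketch of the cut-point scan from the two slides above: for one variable (HT here), try each candidate threshold, split the sample, and keep the cut with the largest Gini improvement. The toy event format and the brute-force scan over observed values are assumptions for illustration; the Gini helper is repeated so the example runs on its own.

```python
def gini(w_signal, w_background):
    # Same weighted Gini as above: p*(1-p)*(Ws+Wb).
    total = w_signal + w_background
    if total <= 0.0:
        return 0.0
    p = w_signal / total
    return p * (1.0 - p) * total


def best_cut(events, var):
    """Scan candidate thresholds on one variable; return (best_threshold, best_improvement).

    `events` is a list of dicts with the variable value, an 'is_signal'
    flag, and an event 'weight' (an assumed toy event format).
    """
    def node_gini(sample):
        ws = sum(e["weight"] for e in sample if e["is_signal"])
        wb = sum(e["weight"] for e in sample if not e["is_signal"])
        return gini(ws, wb)

    g_parent = node_gini(events)
    best_thresh, best_gain = None, 0.0
    for threshold in sorted({e[var] for e in events}):
        left = [e for e in events if e[var] < threshold]
        right = [e for e in events if e[var] >= threshold]
        if not left or not right:
            continue
        gain = g_parent - node_gini(left) - node_gini(right)  # GI = G(S) - G(S1) - G(S2)
        if gain > best_gain:
            best_thresh, best_gain = threshold, gain
    return best_thresh, best_gain


# Toy usage: signal tends to have larger HT than background.
toy = [
    {"HT": 250.0, "is_signal": True,  "weight": 1.0},
    {"HT": 220.0, "is_signal": True,  "weight": 1.0},
    {"HT": 120.0, "is_signal": False, "weight": 1.0},
    {"HT": 100.0, "is_signal": False, "weight": 1.0},
]
print(best_cut(toy, "HT"))  # (220.0, 1.0): the cut that fully separates this toy sample
```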
And Cut…
– Stop the splitting process and generate a leaf once a stopping criterion is met; we used the statistical sample error (number of events).
– Determine the purity of each leaf.
– Use the tree as an estimator of purity: each event belongs to a unique leaf, and that leaf’s purity is the estimator for the event (see the sketch below).
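A toy illustration of the “tree as estimator” idea: walk an event down a small hand-built tree and return the purity stored on the leaf it lands in. The node structure and variable names are assumptions for illustration only.

```python
class Node:
    """One node of a rooted binary tree.

    Internal nodes carry a (variable, threshold) decision; leaves carry
    the signal purity measured from the training sample.
    """
    def __init__(self, var=None, threshold=None, left=None, right=None, purity=None):
        self.var, self.threshold = var, threshold
        self.left, self.right = left, right
        self.purity = purity          # set only on leaves

    def is_leaf(self):
        return self.purity is not None


def evaluate(node, event):
    """Follow the cuts until a leaf is reached; the leaf purity is the
    per-event discriminant."""
    while not node.is_leaf():
        node = node.left if event[node.var] < node.threshold else node.right
    return node.purity


# Tiny hand-built tree: cut on HT, then on jet pT on the high-HT side.
tree = Node(var="HT", threshold=200.0,
            left=Node(purity=0.1),
            right=Node(var="jet_pt", threshold=40.0,
                       left=Node(purity=0.5),
                       right=Node(purity=0.9)))

print(evaluate(tree, {"HT": 250.0, "jet_pt": 60.0}))  # 0.9 -- signal-like leaf
print(evaluate(tree, {"HT": 120.0, "jet_pt": 30.0}))  # 0.1 -- background-like leaf
```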
DT in the Single Top Search
– Two DTs: one trained on signal vs. Wbb background (“DT Wbb”), one trained on signal vs. tt lepton+jets background (“DT tt l+jets”).
– The two outputs fill a 2D histogram used in a binned likelihood fit (sketched below); this part is identical to the NN-based DØ analysis.
– Separate DTs for the muon and electron channels.
– Backgrounds: W+jets, QCD (fake leptons), and top pair production.
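A hedged sketch of how the two tree outputs could be combined into the 2D histogram that feeds the binned likelihood fit; the toy numbers and the use of numpy's histogram2d are for illustration only, not a representation of the actual DØ fitting machinery.

```python
import numpy as np

# Per-event discriminants from the two trees (toy numbers):
# one trained against Wbb, one against tt lepton+jets.
dt_wbb = np.array([0.82, 0.10, 0.65, 0.30, 0.91])
dt_ttbar = np.array([0.75, 0.20, 0.40, 0.15, 0.88])
weights = np.array([1.0, 1.0, 0.8, 1.2, 1.0])   # event weights

# 2D histogram of (DT_Wbb, DT_ttbar); its bins are what a binned
# likelihood fit would compare between data and expectation.
hist, xedges, yedges = np.histogram2d(
    dt_wbb, dt_ttbar, bins=4, range=[[0.0, 1.0], [0.0, 1.0]], weights=weights
)
print(hist)
```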
Results
– Expected limits: s-channel 4.5 pb (NN: 4.5), t-channel 6.4 pb (NN: 5.8)
– Actual limits: s-channel 8.3 pb (NN: 6.4), t-channel 8.1 pb (NN: 5.0)
– Expected results are close to the NN analysis.
Future of the Analysis
– Use a single decision tree, trained against all backgrounds.
– Pruning: train until each leaf contains only a single event, then recombine leaves (pruning) using a statistical estimator.
– Boosting: combine multiple trees, each weighted; each successive tree is trained on an event sample in which the weights of misclassified events are enhanced (see the sketch below).
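A minimal sketch of the boosting idea above, written in an AdaBoost-like style (a common choice, though the slide does not name a specific algorithm): misclassified events get their weights enhanced before the next tree is trained, and the trees are combined with per-tree weights. The `train_tree` callable and toy event format are assumptions.

```python
import math

def boost(events, train_tree, n_trees=5):
    """AdaBoost-flavoured sketch.

    `events`: list of dicts with 'weight' and 'is_signal'.
    `train_tree(events)`: assumed callable returning a classifier
    f(event) -> True for signal-like, False for background-like.
    Returns a list of (tree, tree_weight) pairs.
    """
    forest = []
    for _ in range(n_trees):
        tree = train_tree(events)
        # Weighted misclassification rate of this tree.
        total = sum(e["weight"] for e in events)
        err = sum(e["weight"] for e in events
                  if tree(e) != e["is_signal"]) / total
        err = min(max(err, 1e-6), 1.0 - 1e-6)       # guard against 0 or 1
        alpha = 0.5 * math.log((1.0 - err) / err)   # tree weight
        forest.append((tree, alpha))
        # Enhance the weights of misclassified events for the next tree.
        for e in events:
            if tree(e) != e["is_signal"]:
                e["weight"] *= math.exp(alpha)
        # Renormalize so the total weight stays constant.
        norm = total / sum(e["weight"] for e in events)
        for e in events:
            e["weight"] *= norm
    return forest


def boosted_score(forest, event):
    # Weighted vote of all trees: larger means more signal-like.
    return sum(alpha * (1.0 if tree(event) else -1.0) for tree, alpha in forest)


# Toy usage with a trivial "tree" that always cuts on HT at 200.
toy = [{"HT": 250.0, "is_signal": True,  "weight": 1.0},
       {"HT": 150.0, "is_signal": False, "weight": 1.0},
       {"HT": 210.0, "is_signal": False, "weight": 1.0}]  # misclassified by the stub
stub = lambda evts: (lambda e: e["HT"] > 200.0)
forest = boost(toy, stub, n_trees=3)
print(boosted_score(forest, {"HT": 220.0}))  # positive -> signal-like
```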
References
– MiniBooNE paper: hep-ex/
– “Recent Advances in Predictive (Machine) Learning”, Jerome H. Friedman, Conf. Proceedings.
– These and other references are linked on my conference web page.
Conclusions
Decision trees are good…
– The model is transparent, in the form of a 2D binary tree (no hidden nodes to deal with).
– Not as sensitive to outliers in the input data as other methods.
– Easily accommodate integer inputs (NJets) or missing variable inputs.
– Easy to implement (several months to go from scratch to working code).
Decision trees aren’t so good…
– Well understood input variables are a must (similar for neural networks, of course).
– Minor changes in the input events can make for major changes in tree layout and results.
– The estimator is not a continuous function.
– Separate training against each background, and other issues.