e⁻/π separation with TRD
Gennady Ososkov (LIT JINR, Dubna)
Semen Lebedev (GSI, Darmstadt and LIT JINR, Dubna)
Claudia Höhne, Florian Uhlig (GSI, Darmstadt)
Pavel Nevski (BNL-CERN)
14th CBM Collaboration meeting, October , Split, Croatia

G.Ososkov e/pi separation CBM Collaboration Meeting
TR production (recall)
- dE/dx is described reasonably well.
- TR is calculated using a model: M. Castellano et al., Computer Physics Communications 61 (1990).
- The parameters of the model (number of foils, foil thickness, gas thickness) are chosen to describe the measured data.
- The parameters are adjusted only for 1.5 GeV/c.
- The trunk version of CBM ROOT from September 2009 was used.
- π contribution = dE/dx; e⁻ contribution = dE/dx + TR.
(Florian Uhlig, CBM Collaboration Meeting, March 2009)
See also the presentation by S. Lebedev et al. on energy loss simulation.

Inspired by a discussion with Pavel Nevski (BNL, CERN)
From: Dolgoshein, TRD overview, NIM A326 (1993)
…two methods of signal processing have been mainly used:
1) total energy deposition ("Q-method"). The main factor limiting the rejection of nonradiating particles in this case is the Landau "tail" of ionization loss, which simulates a large energy deposition comparable with that of a radiating particle;
2) cluster counting ("N-method"), proposed already in 1980. With the second method the avalanches produced by X-ray photoelectrons are recorded and counted when they exceed a given threshold (typically 4-5 keV). Nonradiating particles should provide fewer detectable clusters.
There are important distinctions between the above two methods (Q and N):
- the N-method needs thinner foils and detector gas layers than the Q-method;
- the readout for the two methods is also different: ADCs (or FADCs) are needed for the Q-method, while the N-method requires fast discriminators and scalers.

Default method of e⁻/π identification in CBM
30 years have passed, and new, more powerful classifiers have appeared.
The method now used in CBM as the default (V. Ivanov, S. Lebedev, T. Akishina, O. Denisova) is based on a neural network (NN).
Energy losses from all 12 TRD layers (i = 1,…,12) are normalized and ordered; the values of the Landau distribution function are then calculated and used as input to the NN.
The pion suppression result depends on the parameters of the NN class. For a fixed choice of initial weights it is 550.

Threshold methods on the CBM TRD: (1) photon cluster counting
[Figure: distributions for π and e⁻]
1) Simple algorithm for counting N:
- clear N;
- compare the energy loss in the layer with cut = 5 keV; if it exceeds 5 keV, increase N by 1;
- repeat for each of the 12 TRD layers.
After the 12 comparisons, N is the number of photon clusters, to be histogrammed separately for pions and electrons.
2) The PID algorithm is then the usual one:
- if N > threshold, an e⁻ is identified; otherwise a pion is identified.
Pion suppression with cut = 5 keV is 448. After cut optimization it is 584, although the e⁻ efficiency drops to 88.3%.
Boris Dolgoshein (1993): «In general, the cluster counting (N) method, should be the best method due to the distinction of the Poisson distributed number of the ionization clusters produced by nonradiated particles against the Landau tail of dE/dx losses in case of Q-method. But the comparison of the two methods is complicated and should be done for each individual case, because of the different optimum structures required for both methods and the problem of the cluster counting ability of TR chambers.»
The main lesson: a transformation is needed to reduce the Landau tails of the dE/dx losses.

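The counting procedure and the threshold decision described on this slide can be sketched in a few lines of Python. This is only an illustration: the function names and the energy-loss values are hypothetical, and the N threshold used in the example is not the optimized one from the study.

```python
def count_clusters(layer_losses_kev, cut_kev=5.0):
    """Count the TRD layers whose energy loss exceeds the cut (the N of the N-method)."""
    n = 0                          # clear N
    for loss in layer_losses_kev:  # one comparison per TRD layer
        if loss > cut_kev:
            n += 1
    return n


def identify(layer_losses_kev, n_threshold, cut_kev=5.0):
    """Return 'e' if N exceeds the threshold, otherwise 'pi'."""
    n = count_clusters(layer_losses_kev, cut_kev)
    return "e" if n > n_threshold else "pi"


# Hypothetical energy losses (keV) in the 12 layers, for illustration only.
track = [1.2, 6.3, 0.9, 7.1, 2.4, 5.5, 1.1, 8.0, 3.3, 0.7, 6.8, 2.0]
print(count_clusters(track))            # 5 layers are above 5 keV
print(identify(track, n_threshold=3))   # e
```

The only tunable quantities are the energy cut and the N threshold, which is what makes the method easy to study for stability.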
Threshold methods on the CBM TRD: (2) ordered statistics
Such a transformation can be provided by ordering the sample ΔE_i, i = 1,…,12. For instance, the first order statistic λ_1 = ΔE_min has the distribution function
F_1(x) = P{λ_1 < x} = 1 − P{ΔE_1 ≥ x, ΔE_2 ≥ x, …, ΔE_12 ≥ x} = 1 − [1 − F_Landau(x)]^12.
That means a substantial compression of the distribution along the horizontal axis, i.e. diminished tails.
The simple threshold algorithm with a corresponding cut on λ_1 gives pion suppression = 10.
The median, taken as the 6th order statistic λ_6, whose probability density is proportional to [F_Landau(x)]^5 [1 − F_Landau(x)]^6 f_Landau(x) (f_Landau being the Landau density), gives pion suppression = 374.
[Figures: original Landau distribution, λ_1 distribution and median distribution, each for π and e⁻]
The main conclusion: some of the ordered statistics can also be used for pion suppression. However, why not use the information from all of them as neural net input?

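The tail compression can be checked numerically. For the k-th smallest of n independent values with a common distribution function F, the distribution function is the sum over j ≥ k of C(n, j) F^j (1 − F)^(n−j); for k = 1 this reduces to 1 − (1 − F)^n. The sketch below evaluates this general formula; the function name and the chosen value p = 0.9 are illustrative assumptions, and no Landau-specific code is involved.

```python
from math import comb


def order_stat_cdf(p, k, n=12):
    """P{lambda_k < x} for the k-th smallest of n i.i.d. values, where p = F(x)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


p = 0.9  # illustrative point x where a single layer has a 10% tail above it
for k in (1, 6, 12):
    # Probability that the k-th order statistic lies above x
    print(k, 1 - order_stat_cdf(p, k))
```

The tail probability above a fixed x grows with k, so the low order statistics and the median have much lighter tails than the single-layer distribution, while the maximum has a heavier one.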
The idea: order the signals before input to the NN
[Figure: distributions of all 12 dE/dx values after ordering]
An NN with the 12 ordered ΔE values as input gives pion suppression = 685.

Idea: input probabilities of ΔE to the NN
[Figure: cumulative distributions of ΔE calculated from the previous histograms]
Note: all ΔE values must be scaled to the interval [0,1] before being input to the NN.
The excellent guess: input to the NN not the ΔE values themselves but their probabilities, calculated individually for each ΔE_i.
These probabilities can be calculated from either the pion distribution or the electron distribution.
Pion suppression for the first case =
When the e⁻ distribution is used, it is 786.

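One way to obtain such probabilities is an empirical cumulative distribution per rank, built from a reference sample of one particle species. The sketch below assumes this approach; the function names and toy numbers are hypothetical, only 3 layers are used to keep the example readable, and the actual study may have used histograms or fitted functions instead.

```python
from bisect import bisect_right


def build_rank_cdfs(reference_tracks):
    """For each rank i, store the sorted reference values of the i-th ordered dE."""
    ordered = [sorted(track) for track in reference_tracks]
    n_layers = len(ordered[0])
    return [sorted(track[i] for track in ordered) for i in range(n_layers)]


def to_probabilities(track, rank_cdfs):
    """Map each ordered dE_i to its empirical cumulative probability in [0, 1]."""
    return [bisect_right(ref, value) / len(ref)
            for value, ref in zip(sorted(track), rank_cdfs)]


# Toy reference sample (e.g. electrons), 3 layers per track, illustration only.
reference = [[1, 5, 2], [3, 1, 4], [2, 2, 6]]
cdfs = build_rank_cdfs(reference)
nn_input = to_probabilities([2.5, 0.5, 5.5], cdfs)
print(nn_input)
```

Because the outputs are cumulative probabilities, they already lie in [0,1] and need no further scaling before being passed to the network.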
A thought: let us try some other classifiers from TMVA (Toolkit for Multivariate Data Analysis with ROOT)

Decision trees in particle identification (from Ososkov's lecture at the CBM Tracking workshop, 15 June 2009)
[Figure: a data sample and a single decision tree, with root node, branch and leaf node labelled]
● Go through all PID variables, sort them, find the variable that best separates signal from background, and cut on it.
● For each of the two subsets repeat the process.
● This forking decision pattern is called a tree.
● Split points are called nodes.
● Ending nodes are called leaves.
[Figure: 1) multiple cuts on X and Y in a big tree (only growth steps 1-4 shown); all cuts for the decision tree]
However, a danger exists: demanding perfect separation on the training sample degrades the classifier performance. This is called "overtraining".

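A single node split of the kind described in the bullets above can be sketched as follows. The slide does not say which separation criterion is used, so this sketch assumes the simplest one, the number of misclassified events; the function and variable names are hypothetical, and each variable is assumed to have at least two distinct values.

```python
def best_cut(values, labels):
    """Find the cut on one variable that misclassifies the fewest events.

    labels: +1 for signal, -1 for background.
    Returns (cut, sign, n_wrong); events with sign * (value - cut) > 0 are called signal.
    """
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    # Candidate cuts: midpoints between neighbouring distinct sorted values
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:]) if a != b]
    best = None
    for cut in candidates:
        for sign in (+1, -1):
            wrong = sum(1 for v, y in pairs
                        if (1 if sign * (v - cut) > 0 else -1) != y)
            if best is None or wrong < best[2]:
                best = (cut, sign, wrong)
    return best


def best_variable(columns, labels):
    """columns: dict name -> values. Returns (name, cut, sign, n_wrong) of the best split."""
    return min(((name, *best_cut(vals, labels)) for name, vals in columns.items()),
               key=lambda result: result[3])


# Toy sample with two PID variables, illustration only.
columns = {"x": [1, 2, 3, 4], "y": [1, 3, 2, 4]}
labels = [-1, -1, +1, +1]
print(best_variable(columns, labels))   # ('x', 2.5, 1, 0)
```

Growing a tree amounts to applying `best_variable` again to each of the two subsets produced by the chosen cut, until a stopping condition is met.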
How to boost decision trees
● Given a training sample, boosting increases the weights of misclassified events (background which is classified as signal, or vice versa), so that they have a higher chance of being correctly classified in subsequent trees.
● Trees are also weighted: trees with more misclassified events get a lower weight than trees with fewer misclassified events.
● Build many trees (~1000) and take a weighted sum of the event scores from all trees (the score is 1 for a signal leaf, -1 for a background leaf).
The renormalized sum of all the scores, possibly weighted, is the final score of the event. High scores mean the event is most likely signal; low scores mean it is most likely background.
Boosted Decision Trees (BDT): the boosting algorithm has all the advantages of single decision trees and is less susceptible to overtraining.
[Figure: many weak trees (single-cut trees) combined (only 4 trees shown); the boosting algorithm combines 500 weak trees]

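The reweighting scheme described above matches the AdaBoost algorithm, so a minimal sketch can be written under that assumption, with single-cut trees on one variable as the weak learners. All names and the toy data are hypothetical, and the configuration actually used in the study is not stated on the slide.

```python
from math import exp, log


def train_stump(xs, ys, w):
    """Weighted best single cut on one variable; returns (cut, sign, weighted_error)."""
    vals = sorted(set(xs))
    best = None
    for cut in [(a + b) / 2 for a, b in zip(vals, vals[1:])]:
        for sign in (+1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w)
                      if (1 if sign * (x - cut) > 0 else -1) != y)
            if best is None or err < best[2]:
                best = (cut, sign, err)
    return best


def boost(xs, ys, n_trees):
    """AdaBoost-style loop; returns a list of (tree_weight, cut, sign)."""
    w = [1.0 / len(xs)] * len(xs)
    trees = []
    for _ in range(n_trees):
        cut, sign, err = train_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * log((1 - err) / err)       # fewer mistakes -> larger tree weight
        trees.append((alpha, cut, sign))
        # Raise the weights of misclassified events, lower the others, renormalize
        w = [wi * exp(-alpha * y * (1 if sign * (x - cut) > 0 else -1))
             for x, y, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return trees


def score(x, trees):
    """Renormalized weighted sum of the +1/-1 tree scores, in [-1, 1]."""
    norm = sum(alpha for alpha, _, _ in trees)
    return sum(alpha * (1 if sign * (x - cut) > 0 else -1)
               for alpha, cut, sign in trees) / norm


# Toy sample that no single cut separates, illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [-1, -1, +1, -1, +1, +1]
trees = boost(xs, ys, n_trees=3)
print([round(score(x, trees), 3) for x in xs])
```

In this toy sample every single cut misclassifies at least one event, while the weighted sum of three single-cut trees gives every event a score with the correct sign.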
e⁻/π separation with a boosted decision tree
[Figure: BDT output]
Result for the BDT classifier: pion suppression is 2180 at 90% electron efficiency (cut = 0.77).

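A figure of this kind can be computed from classifier outputs as sketched below. The slides do not define the pion suppression factor, so this sketch assumes the common definition: the total number of pions divided by the number of pions that pass the electron cut. The function names and score values are hypothetical.

```python
def cut_for_efficiency(electron_scores, efficiency=0.9):
    """Cut that keeps about `efficiency` of the electrons (those with score >= cut)."""
    ranked = sorted(electron_scores, reverse=True)
    n_keep = max(1, round(efficiency * len(ranked)))
    return ranked[n_keep - 1]


def pion_suppression(pion_scores, cut):
    """Number of pions divided by the number misidentified as electrons."""
    n_misid = sum(1 for s in pion_scores if s >= cut)
    return float("inf") if n_misid == 0 else len(pion_scores) / n_misid


# Hypothetical classifier outputs, illustration only.
electrons = [0.95, 0.90, 0.85, 0.80, 0.78, 0.77, 0.60, 0.99, 0.97, 0.20]
pions = [-0.9, -0.5, 0.1, 0.65, -0.2, 0.0, -0.7, 0.3]

cut = cut_for_efficiency(electrons, 0.9)
print(cut, pion_suppression(pions, cut))   # 0.6 8.0
```

Fixing the electron efficiency first and then quoting the suppression makes results from different classifiers directly comparable.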
Summary and outlook
A comparative study of e/π separation methods was carried out for:
- 1D cut methods
  - photon cluster counting
  - ordered statistics of dE/dx
- default neural net classifier
- neural net classifiers with input of
  - ordered statistics
  - probabilities of ordered statistics
- Boosted Decision Tree classifier
The BDT shows the best performance.
Outlook:
- Correct the simulation parameters in order to obtain better correspondence to experimental results
- Simplify the input for NN and BDT by using approximations of the cumulative distributions
- Study the stability and robustness of the NN and BDT classifiers
- Test other classifiers from TMVA (Toolkit for Multivariate Data Analysis with ROOT)

P. Nevski's comment on the practical aspects of π/e rejection: experimental factors
For experiments like CBM one should consider not only the rejection procedure as such, but also its robustness to experimental factors such as calibration of the measurements, pile-up of signals, etc. Since these factors differ from station to station, the measurements are taken under different conditions and are inevitably heterogeneous. This seriously impairs all neural network methods.
Cluster counting, as has been shown in practice, is quite stable with respect to its only parameter, the threshold, and is therefore very robust. However, that is a subject for more detailed study.
The final question of a mathematician who is not experienced in TRD design and electronics:
- if a pion suppression of about 500 is enough, and
- if photon cluster counting is cheaper to implement than the existing approach,
then why not consider the "N-method" as a real alternative to the "Q-methods", despite all the improvements shown above?

Thanks for your attention!