MiniBooNE Event Reconstruction and Particle Identification
Yong LIU (刘永), The University of Alabama
On behalf of the MiniBooNE Collaboration
PRC-US workshop, Beijing, June 11-18, 2006
MiniBooNE Collaboration
University of Alabama: Y.Liu, D.Perevalov, I.Stancu
Bucknell University: S.Koutsoliotas
University of Cincinnati: R.A.Johnson, J.L.Raaf
University of Colorado: T.Hart, R.H.Nelson, M.Tzanov, M.Wilking, E.D.Zimmerman
Columbia University: A.A.Aguilar-Arevalo, L.Bugel, L.Coney, J.M.Conrad, Z.Djurcic, J.M.Link, K.B.M.Mahn, J.Monroe, D.Schmitz, M.H.Shaevitz, M.Sorel, G.P.Zeller
Embry Riddle Aeronautical University: D.Smith
Fermi National Accelerator Laboratory: L.Bartoszek, C.Bhat, S.J.Brice, B.C.Brown, D.A.Finley, R.Ford, F.G.Garcia, P.Kasper, T.Kobilarcik, I.Kourbanis, A.Malensek, W.Marsh, P.Martin, F.Mills, C.Moore, E.Prebys, A.D.Russell, P.Spentzouris, R.J.Stefanski, T.Williams
Indiana University: D.C.Cox, T.Katori, H.Meyer, C.C.Polly, R.Tayloe
Los Alamos National Laboratory: G.T.Garvey, A.Green, C.Green, W.C.Louis, G.McGregor, S.McKenney, G.B.Mills, H.Ray, V.Sandberg, B.Sapp, R.Schirato, R.Van de Water, N.L.Walbridge, D.H.White
Louisiana State University: R.Imlay, W.Metcalf, S.Ouedraogo, M.O.Wascko
University of Michigan: J.Cao, Y.Liu, B.P.Roe, H.J.Yang
Princeton University: A.O.Bazarko, P.D.Meyers, R.B.Patterson, F.C.Shoemaker, H.A.Tanaka
Saint Mary's University of Minnesota: P.Nienaber
Western Illinois University: E.Hawker
Yale University: A.Curioni, B.T.Fleming
MiniBooNE Event Reconstruction and Particle Identification
The primary physics goal of MiniBooNE is to definitively confirm or rule out the oscillation signal seen by the LSND experiment.
LSND total excess = 87.9 ± 22.4 ± 6.0 events (3.8σ)
Data shown:
- Global solar data and KamLAND: S. Ahmed et al., Phys. Rev. Lett. 92, (2004)
- Super-Kamiokande and K2K data: G. Fogli et al., Phys. Rev. D 67, (2003)
- LSND: A. Aguilar et al., Phys. Rev. D 64, (2001)
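The quoted 3.8σ can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the statistical and systematic uncertainties are independent and can be added in quadrature; the variable names are illustrative only.

```python
import math

excess = 87.9     # total excess, from the slide
stat_err = 22.4   # first (statistical) uncertainty
syst_err = 6.0    # second (systematic) uncertainty

# Assumption: the two uncertainties are independent, so they add in quadrature.
total_err = math.hypot(stat_err, syst_err)
significance = excess / total_err

print(f"total error = {total_err:.1f}, significance = {significance:.1f} sigma")
```

Under that assumption the ratio comes out at about 3.8, in agreement with the number on the slide.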
MiniBooNE Event Reconstruction and Particle Identification
To achieve the MiniBooNE physics goal, the Particle Identification performance (efficiency and contamination) required in the BooNE proposal (Dec. 7, 1997) must be met. Accordingly, very good Event Reconstruction resolution is desired in:
- position
- direction
- mass / energy
Poor event reconstruction => poor Particle Identification
MiniBooNE Event Reconstruction and Particle Identification
- 12-meter diameter spherical tank
- 1280 PMTs in the inner region
- 240 PMTs in the outer veto region
- 950,000 liters of ultra-pure mineral oil
MiniBooNE Event Reconstruction - Overview
Reconstruct what?
- Position (x, y, z, t)
- Direction (ux, uy, uz)
- Energy/mass E/m
How to reconstruct?
- Light model
- Time likelihood - position
- Charge likelihood - direction
Reconstruction performance
- Position resolution
- Direction resolution
- Energy/Pi0 mass resolution
MiniBooNE Event Reconstruction - Light model
Point-like light source model, assumed for electrons: an event track with vertex (x, y, z, t) and direction (ux, uy, uz) illuminates PMT i at (x_i, y_i, z_i), which records time t_i and charge q_i at distance r_i from the source.
- Cerenkov light is directional (flux ρ, Cerenkov angle θ_c).
- Scintillation light is isotropic (flux φ).
- The predicted charge includes the PMT angular response f(cos η).
Model input parameters:
1. Cerenkov angular distribution
2. PMT angular response
3. Cerenkov attenuation length
4. Scintillation attenuation length
5. Relative quantum efficiency
Minimize with respect to the Cerenkov/scintillation flux.
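The slide lists the ingredients of the predicted charge but not the formula itself. The sketch below shows one plausible way those ingredients could combine for a point-like source. The functional form (inverse-square falloff, exponential attenuation, inward-facing PMTs), the function name `predicted_charge` and all default values are assumptions made for illustration, not the collaboration's actual model.

```python
import math

def predicted_charge(vertex, direction, pmt_pos, cer_flux, sci_flux,
                     cer_angular, pmt_response,
                     att_cer=2000.0, att_sci=2000.0, rel_qe=1.0):
    """Predicted charge at one PMT from a point-like source (assumed form).

    vertex, pmt_pos: (x, y, z) in cm; direction: unit vector (ux, uy, uz).
    cer_angular(cos_theta): Cerenkov angular distribution (input 1).
    pmt_response(cos_eta): PMT angular response f(cos eta) (input 2).
    att_cer, att_sci: attenuation lengths in cm (inputs 3 and 4, values assumed).
    rel_qe: relative quantum efficiency of this PMT (input 5).
    """
    d = [p - v for p, v in zip(pmt_pos, vertex)]
    r = math.sqrt(sum(c * c for c in d))
    # Angle between the track direction and the line from vertex to PMT.
    cos_theta = sum(dc * uc for dc, uc in zip(d, direction)) / r
    # Assumption: each PMT faces the tank centre, so its inward normal is
    # -pmt_pos/|pmt_pos| and eta is the incidence angle relative to it.
    pmt_radius = math.sqrt(sum(c * c for c in pmt_pos))
    cos_eta = sum(dc * pc for dc, pc in zip(d, pmt_pos)) / (r * pmt_radius)
    acceptance = rel_qe * pmt_response(cos_eta) / (r * r)
    cerenkov = cer_flux * cer_angular(cos_theta) * math.exp(-r / att_cer)
    scintillation = sci_flux * math.exp(-r / att_sci)
    return acceptance * (cerenkov + scintillation)
```

In a fit of this kind, `cer_flux` and `sci_flux` would be the free strengths, while the angular functions and attenuation lengths stay fixed as model inputs.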
MiniBooNE Event Reconstruction - Charge Likelihood
The charge likelihood is the probability of measuring a charge q for a predicted charge μ.
Three methods to extract the charge likelihood:
A. Fill a 2-D histogram H(q, μ), normalize the q distribution for each μ bin, and get -log versus μ for each q bin.
B. From the hit/no-hit probability minimization procedure, get H(q, μ), then proceed as in A.
C. Start from the one-PE charge response curve, generate P(q; n), assume a Poisson distribution, calculate P(q; μ), and take -log.
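Method C can be sketched as a Poisson-weighted sum over the number of photoelectrons n. The slide starts from a measured one-PE response curve; here a Gaussian with mean n and width `spe_sigma * sqrt(n)` stands in for it, which is purely an assumption, as are the function names and the cut-off `n_max`.

```python
import math

def poisson(n, mu):
    """Poisson probability of n photoelectrons for a predicted charge mu > 0."""
    return math.exp(-mu + n * math.log(mu) - math.lgamma(n + 1))

def charge_given_npe(q, n, spe_sigma=0.4):
    """P(q; n): assumed Gaussian charge response for n >= 1 photoelectrons."""
    sigma = spe_sigma * math.sqrt(n)
    return math.exp(-0.5 * ((q - n) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def neg_log_charge_likelihood(q, mu, n_max=60):
    """-log P(q; mu) for a hit PMT, with P(q; mu) = sum_n Poisson(n; mu) P(q; n)."""
    p = sum(poisson(n, mu) * charge_given_npe(q, n) for n in range(1, n_max + 1))
    return -math.log(max(p, 1e-300))  # floor avoids log(0) far from the peak

# The negative log likelihood is smallest when mu is close to the measured q.
for mu in (1.0, 5.0, 15.0):
    print(mu, round(neg_log_charge_likelihood(5.0, mu), 2))
```

Unhit PMTs (n = 0) are not covered by this sketch; they would enter through the no-hit probability exp(-μ).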
MiniBooNE Event Reconstruction - Time likelihood
1. Corrected time
2. Cerenkov light t_corr(i) distribution
3. Scintillation light t_corr(i) distribution
4. Input: Cerenkov light - t0_cer, σ_cer; scintillation light - t0_sci, σ_sci, τ_sci
5. Total negative log time likelihood
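The parameter list (t0 and σ for Cerenkov light; t0, σ and τ for scintillation light) suggests a prompt Gaussian for Cerenkov light and a Gaussian smeared by an exponential decay for scintillation light. The sketch below assumes exactly those shapes and mixes them with a hypothetical Cerenkov fraction `cer_frac`; the actual distributions used by the fitter are not given on the slide.

```python
import math

def cerenkov_pdf(t, t0, sigma):
    """Assumed prompt Gaussian in corrected time."""
    return math.exp(-0.5 * ((t - t0) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def scintillation_pdf(t, t0, sigma, tau):
    """Assumed Gaussian convolved with an exponential decay of lifetime tau."""
    arg = (sigma / tau - (t - t0) / sigma) / math.sqrt(2.0)
    return math.exp(sigma ** 2 / (2.0 * tau ** 2) - (t - t0) / tau) * math.erfc(arg) / (2.0 * tau)

def neg_log_time_likelihood(t_corr, cer_frac, t0_cer, s_cer, t0_sci, s_sci, tau_sci):
    """Total negative log time likelihood over all corrected hit times."""
    total = 0.0
    for t in t_corr:
        p = (cer_frac * cerenkov_pdf(t, t0_cer, s_cer)
             + (1.0 - cer_frac) * scintillation_pdf(t, t0_sci, s_sci, tau_sci))
        total -= math.log(max(p, 1e-300))  # floor avoids log(0) in far tails
    return total
```

Hits clustered near t0 give a small negative log likelihood, while late hits are absorbed by the exponential scintillation tail instead of being heavily penalized.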
MiniBooNE Event Reconstruction - Timing parameters
Cerenkov: look at hits in the Cerenkov cone.
Scintillation: look at hits in the backward direction.
Get t_corr = t_corr(μ, E), fit to the Cerenkov and scintillation T(t_corr), and iterate.
MiniBooNE Event Reconstruction - Process chart
(TLLK = time likelihood, QLLK = charge likelihood)
Calibrated data: x_i, y_i, z_i, t_i, q_i
Initial guess:
  x = ∑(x_i q_i) / ∑q_i
  t = ∑ q_i (t_i - |x_i - x| / c) / ∑q_i
Fast fit (TLLK): x, y, z, t
  dx = ∑ q_i (x_i - x) / |x_i - x|
  ux = dx / |dx|
  d = R - |x|, E = Q f(d), CER = c1 E, SCI = c2 E
Full fit (TLLK+QLLK): x, y, z, t, ux, uy, uz
  d = R - |x|, E = Q f(d), CER = c1 E, SCI = c2 E
Flux fit (TLLK+QLLK): Cer, Sci flux
Track fit (TLLK+QLLK): track length
Pi0 fit, step 1: x1 = x, y1 = y, z1 = z, t1 = t, ux1 = ux, uy1 = uy, uz1 = uz; ux2, uy2, uz2; Cer1, Cer2, fcer; e1, e2; s1 = s(e1), s2 = s(e2)
Pi0 fit, step 2: x1, y1, z1, t1, fcer, Θ1, φ1, s1, Θ2, φ2, s2; x, y, z, t; sci1 = Cse e1, sci2 = Cse e2; Cer1, Θ1, φ1, s1, Cer2, Θ2, φ2, s2
Pi0 fit, step 3: e1 = Cer1 / Cce, e2 = Cer2 / Cce; Cer1, Cer2, Sci1, Sci2
Output: Pi0 cosine(γ1, γ2), e1, e2, Pi0 mass
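The initial-guess formulas in the chart translate directly into code. This is a minimal sketch: the hit layout `(x, y, z, t, q)`, the function name `initial_guess` and the value used for the speed of light in oil are assumptions, and no protection is included against a hit sitting exactly at the charge centroid.

```python
import math

C_OIL = 29.98 / 1.47  # cm/ns; assumed refractive index of 1.47 for the oil

def initial_guess(hits):
    """Charge-weighted starting values; hits is a list of (x, y, z, t, q)."""
    q_sum = sum(h[4] for h in hits)
    # x = sum(x_i q_i) / sum(q_i), applied to each coordinate
    pos = [sum(h[k] * h[4] for h in hits) / q_sum for k in range(3)]

    def distance(h):
        return math.dist(h[:3], pos)

    # t = sum q_i (t_i - |x_i - x| / c) / sum(q_i)
    t0 = sum(h[4] * (h[3] - distance(h) / C_OIL) for h in hits) / q_sum
    # dx = sum q_i (x_i - x) / |x_i - x|, then normalize to a unit vector
    d = [sum(h[4] * (h[k] - pos[k]) / distance(h) for h in hits) for k in range(3)]
    norm = math.sqrt(sum(c * c for c in d))
    direction = [c / norm for c in d]
    return pos, t0, direction

hits = [(100.0, 0.0, 0.0, 10.0, 3.0), (-100.0, 0.0, 0.0, 10.0, 1.0)]
print(initial_guess(hits))
```

With more charge on the +x side, the guessed position is pulled toward +x and the guessed direction points along +x, which is the intended behaviour of a charge-weighted seed.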
MiniBooNE Event Reconstruction - Performance [figures, preliminary]
MiniBooNE ParticleID - Overview
ParticleID - do what?
- Signal events
- Background events
ParticleID - how to do it?
- Variables - construction and selection
- Algorithm - Simple cuts/ANN/Boosting
ParticleID - reliable and powerful?
- Input - variable distributions and correlations agree between data and MC
- Output - data and MC agree
- The performance
MiniBooNE ParticleID - Signal and Background
For the ν_e appearance search in MiniBooNE:
Signal = oscillation ν_e CCQE events
Background = everything else
The oscillation sensitivity study shows the most important backgrounds:
A. Intrinsic ν_e from K+, K0 and μ+ decay - indistinguishable from signal
B. NC π0: ν_μ + n/p → ν_μ + n/p + π0
C. ν_μ CCQE: ν_μ + n → μ- + p
D. Δ radiative decay: Δ → N + γ
MiniBooNE ParticleID - π0 misID cases
A π0 can be misidentified as an electron due to physics reasons and detector limitations:
- High-energy π0: the Lorentz boost brings the two gamma directions close together
- Very asymmetric π0 decay: one ring is too small
- π0 close to the tank wall: one gamma converts behind the PMTs
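The first two cases follow from two-photon kinematics, m² = 2 e1 e2 (1 - cos θ), where e1 and e2 are the photon energies and θ is their opening angle. The sketch below uses that relation to show how the opening angle shrinks with π0 energy and how an asymmetric decay leaves one photon with little energy. The function names and the example energies are illustrative choices.

```python
import math

M_PI0 = 134.98  # MeV/c^2, neutral pion mass

def pi0_mass(e1, e2, cos_open):
    """Invariant mass of two photons with energies e1, e2 (MeV)."""
    return math.sqrt(2.0 * e1 * e2 * (1.0 - cos_open))

def opening_angle_deg(e_pi0, frac):
    """Opening angle for a pi0 of total energy e_pi0 (MeV) when one photon
    carries the fraction frac of that energy."""
    cos_open = 1.0 - M_PI0 ** 2 / (2.0 * frac * (1.0 - frac) * e_pi0 ** 2)
    if cos_open < -1.0:
        raise ValueError("energy sharing not kinematically allowed")
    return math.degrees(math.acos(cos_open))

for energy in (200.0, 500.0, 1000.0):
    print(energy, "MeV, symmetric decay:", round(opening_angle_deg(energy, 0.5), 1), "deg")

# Asymmetric decay at 500 MeV: the second photon carries only 50 MeV.
print("asymmetric:", round(opening_angle_deg(500.0, 0.9), 1), "deg, e2 =", 0.1 * 500.0, "MeV")
```

The same relation, run in the other direction, gives the reconstructed π0 mass from the two fitted photon energies and the cosine of the angle between them.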
MiniBooNE ParticleID
ParticleID is basically based on event topology.
[Real data event displays: e, μ, π0]
MiniBooNE ParticleID - Space-time information
How to extract the event topology from a set of PMT hit information?
An event = {(x_k, y_k, z_k), t_k, Q_k}, k = 1, 2, ..., NTankHits
What we actually know is the space and time distribution of charge.
The event topology is characterized by the charge/hits fraction in space/time bins.
Point-like model: for an event at (x, y, z, t) with direction (ux, uy, uz), hit k lies at distance r_k and angle θ from the event direction, with corrected time dt_k = t_k - r_k / c_n - t.
[Figure labels: θ, θ_c, s]
MiniBooNE ParticleID - Construct input variables
How to construct the ParticleID variables:
- Binning in cos θ relative to the event direction: record the number of hits/PMTs, the measured/predicted charge, and the time/charge likelihood in each cos θ bin
- Binning in corrected time: record the number of hits, the measured/predicted charge, and the time/charge likelihood in each corrected time bin
- Binning in ring sharpness: record the number of hits/PMTs, the measured/predicted charge, and the time/charge likelihood in each ring sharpness bin
Take physically meaningful ratios within a bin and combinations of different bins.
Dimensionless quantities are preferred.
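As an illustration of the binning idea, the sketch below computes cos θ and the corrected time for each hit relative to a fitted vertex and direction, then forms the fraction of measured charge in equal-width cos θ bins. The hit layout `(x, y, z, t, q)`, the number of bins, the function names and the speed of light in oil are assumptions; the real variable set is far larger.

```python
import math

C_OIL = 29.98 / 1.47  # cm/ns; assumed refractive index of 1.47 for the oil

def hit_kinematics(hit, vertex, direction):
    """Return (cos_theta, corrected_time) of one hit for a fitted event.

    hit: (x, y, z, t, q); vertex: (x, y, z, t); direction: unit vector.
    """
    x, y, z, t, _q = hit
    vx, vy, vz, vt = vertex
    d = (x - vx, y - vy, z - vz)
    r = math.sqrt(sum(c * c for c in d))
    cos_theta = sum(a * b for a, b in zip(d, direction)) / r
    return cos_theta, t - r / C_OIL - vt

def charge_fraction_by_cos_theta(hits, vertex, direction, n_bins=4):
    """Fraction of the total measured charge in each equal-width cos(theta) bin."""
    sums = [0.0] * n_bins
    for hit in hits:
        cos_theta, _dt = hit_kinematics(hit, vertex, direction)
        index = min(int((cos_theta + 1.0) / 2.0 * n_bins), n_bins - 1)
        sums[index] += hit[4]
    total = sum(sums)
    return [s / total for s in sums]
```

Because each entry is a ratio of charges, the resulting variables are dimensionless, in line with the preference stated on the slide.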
MiniBooNE ParticleID - Other input variables
Other ParticleID variables:
- Reconstructed physical observables, e.g. π0 mass, energy, track length, Cerenkov/scintillation light flux, production angle, etc.
- Reconstructed geometrical quantities, e.g. radius r, u·r, distance along the track to the wall, etc.
- Difference of likelihood between fits under different hypotheses: electron/muon/pi0 fitting
These variables are very powerful!
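The geometrical quantities are simple functions of the fitted vertex and direction. The sketch below assumes a spherical wall of radius 600 cm (taken from the 12-meter tank diameter; the surface that matters in practice may differ) and interprets u·r as the dot product of the unit direction with the position vector. The function name is hypothetical.

```python
import math

TANK_RADIUS = 600.0  # cm; assumed from the 12-meter tank diameter

def geometry_variables(pos, u):
    """Return (r, u_dot_r, distance_to_wall) for vertex pos and unit direction u."""
    r = math.sqrt(sum(c * c for c in pos))
    u_dot_r = sum(a * b for a, b in zip(u, pos))
    # Solve |pos + s*u| = R for the forward distance s >= 0 along the track.
    to_wall = -u_dot_r + math.sqrt(u_dot_r ** 2 + TANK_RADIUS ** 2 - r ** 2)
    return r, u_dot_r, to_wall

print(geometry_variables((0.0, 0.0, 100.0), (0.0, 0.0, 1.0)))
print(geometry_variables((0.0, 0.0, 100.0), (0.0, 0.0, -1.0)))
```

An outward-pointing track near the wall has a short distance to the wall, which is the configuration in which part of the light pattern is likely to be lost.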
MiniBooNE ParticleID - How many inputs to use
How many variables do we need?
In the ideal case, we can focus on the track instead of the PMT hits. The least number of variables needed to describe one track is ~10:
- Radius r - from tank center to MGEP
- Angle α - between track and radial direction
- Energy E
- Light emission per unit length - parametrized by some parameters
[Figure labels: (x, y, z, t), (ux, uy, uz), α, r]
For π0 events, twice as many variables are needed.
At most, the number of variables we have is {(x_h, y_h, z_h), t_h, Q_h} × NTank PMTs = 5 × NTank PMTs, but they are highly correlated!
MiniBooNE ParticleID - How to select variables
How to select ParticleID variables: reliability and efficiency.
ParticleID algorithm training and testing have to rely on Monte Carlo.
1. Do the variable distributions agree between data and MC?
2. Do the correlations between variables agree between data and MC?
These two requirements ensure that the output agrees between data and MC, and hence the reliability of ParticleID.
3. Is the variable/combination powerful in separation?
Too many inputs may degrade the ParticleID performance.
Check with open box, cosmic ray calibration and NuMI data/MC.
The number of events in each node of the trees can test the correlation between variables, and can be used to look at the data/MC comparison naturally.
Energy/geometry variable dependence.
MiniBooNE ParticleID - Input data/MC comparison [figures]
MiniBooNE ParticleID - Algorithm
Choose which algorithm: SC = Simple Cuts, ANN = Artificial Neural Network, BDT = Boosted Decision Tree

                      SC           ANN      BDT
Variable number       Up to ~10    ~30      ~200
Parameters to fit     0            ~1000    ~10000
Control parameters    0            ~10      ~3
Performance           Not good     Good     Better

Boosting is preferred in MiniBooNE to get better sensitivity, but the Simple Cuts method and ANN can provide a cross check.
Reasonably more input variables may result in higher performance, but fewer input variables may be more reliable.
MiniBooNE ParticleID - Boosted decision trees
A. Generate tree
1. Boosting: how to split a node - choose variable and cut
   Define GiniIndex = P (1 - P) ∑w(S+B), with P = ∑w(S) / ∑w(S+B), where w is the event weight.
   For a pure background or pure signal node, GiniIndex = 0.
   For a given node, determine which variable and cut value maximizes
   G = GiniIndex(Father) - (GiniIndex(LeftSon) + GiniIndex(RightSon))
2. Boosting: how to generate a tree - choose the node to split
   Among the existing leaves, find the one which gives the biggest G and split it. Repeat this process to generate a tree of the chosen size.
[Diagram: start here with variable = i, cut = c_i; branches variable(i) < c_i and variable(i) >= c_i; next node with variable = k, cut = c_k; branches variable(k) < c_k and variable(k) >= c_k]
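The node-splitting rule can be written down in a few lines. This is a minimal sketch of the Gini criterion as defined on the slide; the event layout `(features, weight, is_signal)`, the function names and the choice of candidate cuts (midpoints between neighbouring values) are assumptions for illustration.

```python
def gini_index(events):
    """GiniIndex = P (1 - P) * sum(w), with P the weighted signal fraction.

    events: list of (features, weight, is_signal).
    """
    w_total = sum(w for _, w, _ in events)
    if w_total == 0.0:
        return 0.0
    p = sum(w for _, w, is_signal in events if is_signal) / w_total
    return p * (1.0 - p) * w_total

def best_split(events):
    """Scan every variable and candidate cut; return (G, variable index, cut)."""
    parent = gini_index(events)
    best = (0.0, None, None)
    n_vars = len(events[0][0])
    for i in range(n_vars):
        values = sorted({features[i] for features, _, _ in events})
        for low, high in zip(values, values[1:]):
            cut = 0.5 * (low + high)
            left = [e for e in events if e[0][i] < cut]
            right = [e for e in events if e[0][i] >= cut]
            gain = parent - (gini_index(left) + gini_index(right))
            if gain > best[0]:
                best = (gain, i, cut)
    return best

events = [((1.0, 0.1), 1.0, True), ((2.0, 0.2), 1.0, True),
          ((1.0, 0.8), 1.0, False), ((2.0, 0.9), 1.0, False)]
print(best_split(events))
```

In this toy sample the second variable separates signal from background completely, so the best split uses it and leaves two pure daughter nodes with GiniIndex = 0.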
MiniBooNE ParticleID - Boosted decision trees
B. Boost tree
Define the polarity of a node: polarity = +1 if signal is more than background; polarity = -1 if background is more than signal.
3. Boosting: how to boost a tree - choose an algorithm to change the event weights
   Take ALL the events in a leaf as signal events if the polarity of that leaf is positive; otherwise, take all the events as background events. Mark the events which are misidentified. Reduce the weight of the correctly identified events and increase the weight of the misidentified events. Then generate the next tree.
C. Output
4. Boosting: how to calculate the output value - sum (polarity × tree weight) over all trees
See B. Roe et al., NIM A543 (2005) 577 and references therein for details.
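The slide does not say which reweighting rule is used, so the sketch below assumes an AdaBoost-style update with a damping factor `beta`: misidentified events are scaled up by exp(α), and renormalizing the weights then lowers the correctly identified ones. The function names and the value of `beta` are illustrative assumptions.

```python
import math

def boost_weights(weights, misidentified, beta=0.5):
    """One boosting step (assumed AdaBoost-style rule).

    weights: current event weights; misidentified: matching list of booleans.
    Returns (new normalized weights, tree weight alpha).
    Requires a weighted error rate strictly between 0 and 1.
    """
    w_total = sum(weights)
    err = sum(w for w, bad in zip(weights, misidentified) if bad) / w_total
    alpha = beta * math.log((1.0 - err) / err)
    raised = [w * math.exp(alpha) if bad else w
              for w, bad in zip(weights, misidentified)]
    norm = sum(raised)
    return [w / norm for w in raised], alpha

def boosted_output(polarities, tree_weights):
    """Output value for one event: sum of (polarity x tree weight) over all trees."""
    return sum(p * a for p, a in zip(polarities, tree_weights))

new_weights, alpha = boost_weights([0.25, 0.25, 0.25, 0.25], [True, False, False, False])
print(new_weights, alpha)
```

Here `polarities` holds the +1 or -1 polarity of the leaf the event lands on in each tree, so signal-like events accumulate a positive output and background-like events a negative one.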
MiniBooNE ParticleID - Simple Cuts and Boosted Decision Tree
Simple Cuts -> (generalization) -> Decision Tree -> (improvement) -> Boosted Decision Tree
[Diagram: all events are split by variable = 1, cut = c1 into Var1 < c1 and Var1 >= c1; the next node splits by variable = 2, cut = c2 into Var2 >= c2 and Var2 < c2; the selected leaf is (Var1 >= c1 && Var2 < c2)]
Simple Cuts can be taken as one tree, few variables, few nodes.
MiniBooNE ParticleID - Conclusion on algorithm
Some conclusions based on our past experience:
- Boosting is better than Artificial Neural Network. Boosting performance is higher in the many-variable (>20) case and is relatively insensitive to detector MC in comparison to ANN.
- Cascade Boosting is better than non-Cascade Boosting. Cascade Boosting training can improve performance by 25~30% or even more relative to non-Cascade training, especially in the low background contamination region.
- Combining individual separation outputs can improve performance further, by about 10~20%.
Cascade Boosting: build a first boosting and use it as a cut to select the training events for a second boosting; then use the second boosting.
MiniBooNE ParticleID - Cascade Boosting
[Figure panels, preliminary: 1st boosting (cascade); 2nd boosting (cascade); combined individual outputs]
MiniBooNE ParticleID - Output data/MC comparison [figures]
MiniBooNE ParticleID - How to use it
- Event counting: optimize the PID cuts to maximize a figure of merit
- Energy and/or ParticleID spectrum fitting: after some precuts, do an energy spectrum fit, a PID output distribution fit, or a two-dimensional energy and PID fit to get the oscillation sensitivity
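For the event-counting approach, a cut scan might look like the sketch below. The slide does not state which quantity is maximized; S / sqrt(S + B) is used here purely as an assumed, commonly used figure of merit, and the scores and function name are made up for illustration.

```python
import math

def best_pid_cut(signal_scores, background_scores, cuts):
    """Scan cut values on the PID output and return (best cut, figure of merit).

    Events with score > cut are kept. The figure of merit S / sqrt(S + B)
    is an assumption, not taken from the slide.
    """
    best_cut, best_fom = None, -1.0
    for cut in cuts:
        s = sum(1 for x in signal_scores if x > cut)
        b = sum(1 for x in background_scores if x > cut)
        if s + b == 0:
            continue  # nothing survives this cut
        fom = s / math.sqrt(s + b)
        if fom > best_fom:
            best_cut, best_fom = cut, fom
    return best_cut, best_fom

signal = [0.9, 0.8, 0.7, 0.2]
background = [0.1, 0.3, 0.4, 0.75]
print(best_pid_cut(signal, background, [0.0, 0.5, 0.78]))
```

A spectrum fit replaces this single number with a comparison of binned distributions, so it can use the shape information that one fixed cut discards.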
MiniBooNE Event Reconstruction and Particle Identification - Conclusion
MiniBooNE Event Reconstruction provides:
- Energy resolution ~ 14%
- Position resolution ~ 23 cm
- Direction resolution ~ 6°
- Pi0 mass resolution ~ 23 MeV/c²
Based on the reconstruction information, with
- Boosted decision trees
- Cascade training
- Combining specialist algorithms
a much better ParticleID than the BooNE proposal required has been achieved:
- ~ 67% electron efficiency
- 1% Pi0 contamination
- < 0.1% muon contamination