Presentation on theme: "Selected Chapters in Particle Physics — Abner Soffer, Spring 2007, Lecture 4" — Presentation transcript:

1 Selected Chapters in Particle Physics — Abner Soffer, Spring 2007, Lecture 4

2 Simplest variable combination: diagonal cut
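A diagonal cut keeps events on one side of a straight line in the (v_1, v_2) plane, i.e. it requires a·v_1 + b·v_2 > c. A minimal numpy sketch; the sample, variable names, and cut coefficients below are illustrative, not values from the lecture:

    import numpy as np

    # Hypothetical two-variable sample (columns: v1, v2); in practice these
    # would come from data or Monte Carlo.
    events = np.random.normal(size=(1000, 2))

    # Diagonal cut: keep events with a*v1 + b*v2 > c.
    # The coefficients are placeholders, not tuned values.
    a, b, c = 1.0, 0.7, 0.5
    passed = a * events[:, 0] + b * events[:, 1] > c
    selected = events[passed]
    print(f"{passed.sum()} of {len(events)} events pass the diagonal cut")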

3 Combining variables
Many variables that weakly separate signal from background
Often correlated distributions
Complicated to deal with or to use in a fit
Easiest to combine into one simple variable
Example: the Fisher discriminant

4 Neural networks (example output distributions for signal MC, BB MC, continuum MC, and combined BB & qq background MC)

5 Input variables for the neural net (distributions shown for signal, BB background, and cc+uds continuum): Legendre Fisher, log(Δz), cos θ_T, log(K−D DOCA), and lepton tagging (BtgElectronTag & BtgMuonTag)

6 Uncorrelated, (approximately) Gaussian-distributed variables
"Gaussian-distributed" means the distribution of v is a Gaussian, G(v) ∝ exp[−(v − ⟨v⟩)² / 2σ²]
How to combine the information?
Option 1: V = v_1 + v_2
Option 2: V = v_1 − v_2
Option 3: V = α_1 v_1 + α_2 v_2
What are the best weights α_i?
How about α_i = (⟨v_i^s⟩ − ⟨v_i^b⟩) = difference between the signal & background means

7 Incorporating spreads in v_i
⟨v_1^s⟩ − ⟨v_1^b⟩ > ⟨v_2^s⟩ − ⟨v_2^b⟩, but v_2 has a smaller spread and more actual separation between S and B
α_i = (⟨v_i^s⟩ − ⟨v_i^b⟩) / ((σ_i^s)² + (σ_i^b)²), where (σ_i^s)² = ⟨(v_i^s − ⟨v_i^s⟩)²⟩ = Σ_e (v_ie^s − ⟨v_i^s⟩)² / N is the RMS spread in the v_i distribution of a pure signal sample (σ_i^b is defined similarly)
You may be familiar with the form ⟨(v − ⟨v⟩)²⟩ = ⟨v²⟩ − ⟨v⟩² = σ²
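A minimal numpy sketch of these spread-normalized weights, assuming uncorrelated variables; the toy signal/background samples and the function name spread_normalized_weights are made up for illustration:

    import numpy as np

    def spread_normalized_weights(sig, bkg):
        # alpha_i = (<v_i^s> - <v_i^b>) / ((sigma_i^s)^2 + (sigma_i^b)^2)
        # sig, bkg: arrays of shape (n_events, n_variables)
        mean_diff = sig.mean(axis=0) - bkg.mean(axis=0)
        return mean_diff / (sig.var(axis=0) + bkg.var(axis=0))

    # Toy samples: v2 separates better once its smaller spread is accounted for
    rng = np.random.default_rng(1)
    sig = rng.normal([1.0, 0.5], [1.0, 0.2], size=(5000, 2))
    bkg = rng.normal([0.0, 0.0], [1.0, 0.2], size=(5000, 2))
    alpha = spread_normalized_weights(sig, bkg)
    V_sig = sig @ alpha   # combined variable V = sum_i alpha_i v_i
    V_bkg = bkg @ alpha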

8 Linearly correlated, Gaussian-distributed variables
Linear correlation:
–⟨v_1⟩ = c_0 + c·v_2
–(σ_1)² independent of v_2
α_i = (⟨v_i^s⟩ − ⟨v_i^b⟩) / ((σ_i^s)² + (σ_i^b)²) doesn't account for the correlation
Recall (σ_i^s)² = ⟨(v_i^s − ⟨v_i^s⟩)²⟩
Replace it with the covariance matrix C_ij^s = ⟨(v_i^s − ⟨v_i^s⟩)(v_j^s − ⟨v_j^s⟩)⟩
α_i = Σ_j (⟨v_j^s⟩ − ⟨v_j^b⟩) (C^s + C^b)⁻¹_ij, using the inverse of the sum of the S+B covariance matrices
Fisher discriminant: F ∝ Σ_i α_i v_i
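A sketch of the full calculation with correlations included, again in numpy; the toy samples and the function name fisher_coefficients are illustrative:

    import numpy as np

    def fisher_coefficients(sig, bkg):
        # alpha = (C_s + C_b)^-1 (<v^s> - <v^b>), so that F ~ sum_i alpha_i v_i
        # sig, bkg: arrays of shape (n_events, n_variables)
        mean_diff = sig.mean(axis=0) - bkg.mean(axis=0)
        cov_sum = np.cov(sig, rowvar=False) + np.cov(bkg, rowvar=False)
        return np.linalg.solve(cov_sum, mean_diff)

    # Correlated toy samples
    rng = np.random.default_rng(2)
    cov = [[1.0, 0.6], [0.6, 1.0]]
    sig = rng.multivariate_normal([1.0, 0.8], cov, size=5000)
    bkg = rng.multivariate_normal([0.0, 0.0], cov, size=5000)
    alpha = fisher_coefficients(sig, bkg)
    F_sig = sig @ alpha
    F_bkg = bkg @ alpha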

9 Fisher discriminant properties
Best S-B separation for a linearly correlated set of Gaussian-distributed variables
Non-Gaussian-ness of v is usually not a problem…
There must be a mean difference: ⟨v_i^s⟩ − ⟨v_i^b⟩ ≠ 0 (take the absolute value)
Need to calculate the α_i coefficients using (correctly simulated) Monte Carlo (MC) signal and background samples
Should validate using control samples (true for any discriminant)

10 More properties
F is more Gaussian than its inputs (virtual calorimeter example)
Central limit theorem:
–If x_j (j = 1, …, n) are independent random variables with means ⟨x_j⟩ and variances σ_j², then for large n the sum Σ_j x_j is a Gaussian-distributed variable with mean Σ_j ⟨x_j⟩ and variance Σ_j σ_j²
F can usually be fit with 2 Gaussians or a bifurcated Gaussian
A cut on F corresponds to an (n−1)-dimensional plane cut through the n-dimensional variable space
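A quick numerical illustration of this central-limit behavior (a toy example, not from the slides): summing many independent, strongly non-Gaussian inputs produces a near-Gaussian variable with the predicted mean and variance.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 20                                   # number of independent inputs
    x = rng.exponential(size=(100000, n))    # non-Gaussian inputs, mean 1, variance 1
    s = x.sum(axis=1)                        # CLT: mean ~ n, variance ~ n

    # Sample moments agree with the CLT prediction, and a histogram of s is
    # close to a Gaussian even though each input is exponential.
    print(s.mean(), s.std())                 # ~ n and ~ sqrt(n)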

11 Nonlinear correlations
Linear methods (Fisher) are not optimal for such cases
May fail altogether if there is no S-B mean difference

12 Artificial neural networks
"Complex nonlinearity"
Each neuron
–takes many inputs
–outputs a response function value
The output of each neuron serves as input for the others
Neurons divided among layers for efficiency
The weight w_ij^l between neuron i in layer l and neuron j in layer l+1 is calculated using a MC "training sample"

13 Response functions
Neuron output = ρ(inputs, weights) = α(κ(inputs, weights))

14 Common usage
κ = weighted sum in hidden & output layers
α = linear in the output layer
α = tanh in the hidden layer
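A minimal sketch of one forward pass through such a network, assuming the κ/α conventions above (weighted sum, tanh in the hidden layer, linear output); the layer sizes and weights are placeholders:

    import numpy as np

    def forward(x, w_hidden, b_hidden, w_out, b_out):
        # kappa = weighted sum; alpha = tanh (hidden layer) / linear (output layer)
        hidden = np.tanh(x @ w_hidden + b_hidden)   # hidden-layer neuron outputs
        return hidden @ w_out + b_out               # single linear output neuron

    rng = np.random.default_rng(4)
    n_var, n_hidden = 5, 8
    w_hidden = rng.normal(size=(n_var, n_hidden))
    b_hidden = np.zeros(n_hidden)
    w_out = rng.normal(size=n_hidden)
    b_out = 0.0
    q = forward(rng.normal(size=n_var), w_hidden, b_hidden, w_out, b_out)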

15 Training (calculating weights)
Event a (a = 1…N) has input variable vector x_a = (x_1 … x_nvar)
For each event, calculate the deviation from the desired value (0 for background, 1 for signal)
Calculate the error function for random values w of the weights
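One standard choice for such an error function (an assumption here, not necessarily the slide's exact expression) is the summed squared deviation E(x|w) = Σ_a ½ [y_ANN(x_a; w) − ŷ_a]², where ŷ_a = 1 for signal and 0 for background events.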

16 … Training
Change the weights so as to cause the steepest decline in E
"Online learning": remove the sums
–Requires a randomized training sample
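A toy sketch of this prescription (steepest descent on the squared-error E, updating after every event of a randomized sample rather than after a full sum); the architecture, learning rate, and backpropagation details are illustrative, not the lecture's actual code:

    import numpy as np

    rng = np.random.default_rng(5)
    n_var, n_hidden, eta = 2, 5, 0.05

    # Toy training sample: signal (target 1) and background (target 0)
    sig = rng.normal([1.0, 0.8], 1.0, size=(500, 2))
    bkg = rng.normal([0.0, 0.0], 1.0, size=(500, 2))
    x_all = np.vstack([sig, bkg])
    t_all = np.concatenate([np.ones(500), np.zeros(500)])

    # Random initial weights (tanh hidden layer, linear output)
    W1 = rng.normal(scale=0.5, size=(n_var, n_hidden)); b1 = np.zeros(n_hidden)
    w2 = rng.normal(scale=0.5, size=n_hidden);          b2 = 0.0

    for epoch in range(20):
        order = rng.permutation(len(x_all))      # randomized training sample
        for a in order:                          # online learning: one event per step
            x, t = x_all[a], t_all[a]
            h = np.tanh(x @ W1 + b1)             # hidden layer
            y = h @ w2 + b2                      # network output
            dy = y - t                           # dE_a/dy for E_a = (y - t)^2 / 2
            dh_in = dy * w2 * (1.0 - h**2)       # back-propagate through tanh
            # Steepest-descent step: w -> w - eta * dE_a/dw
            w2 -= eta * dy * h;  b2 -= eta * dy
            W1 -= eta * np.outer(x, dh_in); b1 -= eta * dh_in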

17 What architecture to use?
Weierstrass theorem: for a multilayer perceptron, 1 hidden layer is sufficient to approximate a continuous correlation function to any precision, if the number of neurons in the layer is high enough
Alternatively: several hidden layers with fewer neurons may converge faster and be more stable
Instability problems:
–output distribution changes with different samples

18 What variables to use?
Improvement with added variables:
Importance of variable i:

19 More info
A cut on a NN output = a non-linear slice through the n-dimensional variable space
The NN output shape can be (approximately) Gaussianized: q → q′ = tanh⁻¹[(q − ½(q_max + q_min)) / (½(q_max − q_min))]
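A sketch of that transformation, assuming the tanh⁻¹ reading above (map q onto (−1, 1) using its observed range, then apply the inverse hyperbolic tangent); the small stretch of the range is an added practical guard, not part of the slide:

    import numpy as np

    def gaussianize(q):
        # q -> q' = atanh[(q - (q_max + q_min)/2) / ((q_max - q_min)/2)]
        q = np.asarray(q, dtype=float)
        center = 0.5 * (q.max() + q.min())
        half_range = 0.5 * (q.max() - q.min()) * (1 + 1e-6)  # avoid atanh(+-1)
        return np.arctanh((q - center) / half_range)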
