Selected Topics in Particle Physics, Avner Soffer, Spring 2007, Lecture 4

Simplest variable combination: diagonal cut

Combining variables
– Many variables that weakly separate signal from background
– Often correlated distributions
– Complicated to deal with or to use in a fit
– Easiest to combine into one simple variable, e.g. the Fisher discriminant: F = Σi αi vi (defined below)

Neural networks
[Plots: neural-network output distributions for signal MC, BB MC, and continuum (qq) background MC]

Input variables for the neural net
[Plots: input-variable distributions for signal, BB background, and cc+uds continuum]
– Legendre
– Fisher
– Log(Δz)
– cos θT
– Log(K-D DOCA)
– Lepton tagging (BtgElectronTag & BtgMuonTag)

Uncorrelated, (approximately) Gaussian-distributed variables
– "Gaussian-distributed" means the distribution of v is G(v) ∝ exp[–(v – ⟨v⟩)² / 2σ²]
– How to combine the information?
  Option 1: V = v1 + v2
  Option 2: V = v1 – v2
  Option 3: V = α1 v1 + α2 v2
– What are the best weights αi? How about αi = (⟨vi⟩S – ⟨vi⟩B) = the difference between the signal and background means
[Plots: signal and background distributions of v1 and v2]

Incorporating spreads in vi
– ⟨v1⟩S – ⟨v1⟩B > ⟨v2⟩S – ⟨v2⟩B, but v2 has a smaller spread and more actual separation between S and B
– αi = (⟨vi⟩S – ⟨vi⟩B) / (σi,S² + σi,B²), where σi,S² = ⟨(vi – ⟨vi⟩S)²⟩ = Σe (vi,e – ⟨vi⟩S)² / N is the RMS spread of the vi distribution in a pure signal sample (σi,B is defined similarly)
– You may be familiar with the form ⟨(v – ⟨v⟩)²⟩ = ⟨v² – 2v⟨v⟩ + ⟨v⟩²⟩ = ⟨v²⟩ – ⟨v⟩² = σ²
[Plots: signal and background distributions of v1 and v2]
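The two slides above define the simple per-variable weights αi = (⟨vi⟩S – ⟨vi⟩B)/(σi,S² + σi,B²). Here is a minimal NumPy sketch of that calculation on toy samples; the array shapes, toy means, and widths are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def simple_weights(sig, bkg):
    """alpha_i = (<v_i>_S - <v_i>_B) / (sigma_i,S^2 + sigma_i,B^2), per variable.

    sig, bkg: arrays of shape (n_events, n_vars) from signal / background MC.
    """
    mean_diff = sig.mean(axis=0) - bkg.mean(axis=0)
    return mean_diff / (sig.var(axis=0) + bkg.var(axis=0))

# Toy MC with two uncorrelated Gaussian variables (illustrative numbers only):
# v1 has a large mean difference but a large spread, v2 a small difference but a small spread.
rng = np.random.default_rng(0)
sig = rng.normal(loc=[1.0, 0.3], scale=[1.0, 0.1], size=(10_000, 2))
bkg = rng.normal(loc=[0.0, 0.0], scale=[1.0, 0.1], size=(10_000, 2))

alpha = simple_weights(sig, bkg)
V_sig, V_bkg = sig @ alpha, bkg @ alpha   # combined variable V = sum_i alpha_i v_i
```

Because the weights divide by the summed variances, the narrow v2 distribution receives a larger weight even though its mean difference is smaller, which is exactly the point of this slide.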

Linearly correlated, Gaussian-distributed variables
– Linear correlation: ⟨v1⟩ = c0 + c v2, with σ1² independent of v2
– αi = (⟨vi⟩S – ⟨vi⟩B) / (σi,S² + σi,B²) doesn't account for the correlation
– Recall σi,S² = ⟨(vi – ⟨vi⟩S)²⟩; replace it with the covariance matrix Cij,S = ⟨(vi – ⟨vi⟩S)(vj – ⟨vj⟩S)⟩
– αi = Σj (⟨vj⟩S – ⟨vj⟩B) [(CS + CB)⁻¹]ij, i.e. the coefficients use the inverse of the sum of the S and B covariance matrices
– Fisher discriminant: F = Σi αi vi
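As a sketch of the full Fisher construction on this slide, the coefficients can be computed directly from signal and background MC arrays with NumPy; the function and variable names are illustrative, not from the slides.

```python
import numpy as np

def fisher_coefficients(sig, bkg):
    """alpha = (C_S + C_B)^-1 (<v>_S - <v>_B), using the summed covariance matrices.

    sig, bkg: arrays of shape (n_events, n_vars).
    """
    mean_diff = sig.mean(axis=0) - bkg.mean(axis=0)
    cov_sum = np.cov(sig, rowvar=False) + np.cov(bkg, rowvar=False)
    return np.linalg.solve(cov_sum, mean_diff)   # solves the linear system instead of inverting explicitly

def fisher(events, alpha):
    """F = sum_i alpha_i v_i for every event (each row of `events`)."""
    return events @ alpha
```

With the toy samples from the previous sketch, `fisher(sig, fisher_coefficients(sig, bkg))` gives the signal distribution of F; when the variables are uncorrelated, the covariance matrices are diagonal and this reduces to the simple per-variable weights above.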

Fisher discriminant properties
– Best S-B separation for a linearly correlated set of Gaussian-distributed variables
– Non-Gaussian-ness of v is usually not a problem…
– There must be a mean difference (take the absolute value): ⟨vi⟩S – ⟨vi⟩B ≠ 0
– Need to calculate the αi coefficients using (correctly simulated) Monte Carlo (MC) signal and background samples
– Should validate using control samples (true for any discriminant)

More properties
– F is more Gaussian than its inputs (virtual calorimeter example)
– Central limit theorem: if xj (j = 1, …, n) are independent random variables with means ⟨xj⟩ and variances σj², then for large n the sum Σj xj is a Gaussian-distributed variable with mean Σj ⟨xj⟩ and variance Σj σj²
– F can usually be fit with 2 Gaussians or a bifurcated Gaussian
– A cut on F corresponds to an (n–1)-dimensional plane cut through the n-dimensional variable space
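A quick numerical illustration of the central-limit-theorem statement above (toy numbers, not from the slides): summing n independent, decidedly non-Gaussian (uniform) variables already gives an approximately Gaussian distribution with the predicted mean and variance.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_events = 12, 100_000
x = rng.uniform(0.0, 1.0, size=(n_events, n))   # each x_j has mean 1/2 and variance 1/12
s = x.sum(axis=1)

print(s.mean(), s.var())   # close to n/2 = 6 and n/12 = 1, as the theorem predicts
# A histogram of s is already very close to a Gaussian with that mean and variance.
```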

Nonlinear correlations
– Linear methods (Fisher) are not optimal for such cases
– May fail altogether if there is no S-B mean difference

Artificial neural networks
– "Complex nonlinearity"
– Each neuron takes many inputs and outputs a response-function value
– The output of each neuron serves as input for the others
– Neurons are divided among layers for efficiency
– The weight wij^l between neuron i in layer l and neuron j in layer l+1 is calculated using a MC "training sample"

Response functions
– Neuron output = ρ(inputs, weights) = α(κ(inputs, weights)): a synapse (combination) function κ followed by an activation function α

Common usage
– κ = sum in the hidden & output layers
– α = linear in the output layer
– α = tanh in the hidden layer
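A minimal sketch of this common choice for a single-hidden-layer network (weighted-sum synapse, tanh in the hidden layer, linear output); the weight shapes and argument names are assumptions for illustration.

```python
import numpy as np

def mlp_output(x, W1, b1, w2, b2):
    """Forward pass of a one-hidden-layer perceptron.

    x  : input vector, shape (n_vars,)
    W1 : input-to-hidden weights, shape (n_hidden, n_vars);  b1 : hidden biases
    w2 : hidden-to-output weights, shape (n_hidden,);        b2 : output bias (scalar)
    """
    h = np.tanh(W1 @ x + b1)   # kappa = weighted sum, alpha = tanh in the hidden layer
    return w2 @ h + b2         # alpha = linear in the output layer
```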

Training (calculating weights)
– Event a (a = 1…N) has input-variable vector xa = (x1, …, xnvar)
– For each event, calculate the deviation of the network output ya from the desired value ŷa (0 for background, 1 for signal)
– Calculate the error function, e.g. the summed squared deviation E(w) = Σa ½ (ya – ŷa)², starting from random values w of the weights

… Training
– Change the weights so as to cause the steepest decline in E, i.e. step along –∂E/∂w (gradient descent)
– "Online learning": remove the sums over events, updating the weights after every single event
– Requires a randomized training sample
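Putting the two training slides together, here is a minimal sketch of online steepest-descent training (squared-error function, one weight update per event on a shuffled sample) for the one-hidden-layer network sketched above; the learning rate, number of epochs, and weight initialization are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def train_online(sig, bkg, n_hidden=5, eta=0.01, n_epochs=10, seed=0):
    """Online gradient descent for a one-hidden-layer network, E = 1/2 (y - t)^2 per event.

    sig, bkg: arrays of shape (n_events, n_vars); targets t are 1 for signal, 0 for background.
    """
    rng = np.random.default_rng(seed)
    x_all = np.vstack([sig, bkg])
    t_all = np.concatenate([np.ones(len(sig)), np.zeros(len(bkg))])

    # small random initial weights
    W1 = rng.normal(scale=0.1, size=(n_hidden, x_all.shape[1]))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(scale=0.1, size=n_hidden)
    b2 = 0.0

    for _ in range(n_epochs):
        for i in rng.permutation(len(x_all)):      # randomized training sample
            x, t = x_all[i], t_all[i]
            h = np.tanh(W1 @ x + b1)               # hidden layer: sum + tanh
            y = w2 @ h + b2                        # output layer: sum + linear
            delta = y - t                          # dE/dy
            d_h = delta * w2 * (1.0 - h**2)        # error back-propagated to the hidden layer
            # one steepest-descent step per event ("online learning")
            W1 -= eta * np.outer(d_h, x)
            b1 -= eta * d_h
            w2 -= eta * delta * h
            b2 -= eta * delta
    return W1, b1, w2, b2
```

Updating after every event, rather than after a full pass over the sample, is why the training events must be presented in random order.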

What architecture to use?
– Weierstrass theorem: for a multilayer perceptron, 1 hidden layer is sufficient to approximate a continuous correlation function to any precision, if the number of neurons in the layer is high enough
– Alternatively, several hidden layers with fewer neurons may converge faster and be more stable
– Instability problems: the output distribution changes with different samples

What variables to use?
– Improvement with added variables:
– Importance of variable i:

More info
– A cut on a NN output = a nonlinear slice through the n-dimensional variable space
– The NN output shape can be (approximately) Gaussianized: q → q′ = tanh[(q – ½(qmax + qmin)) / ½(qmax – qmin)]
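A one-line NumPy sketch of that Gaussianization transform; the exact bracketing on the slide is partly garbled in the transcript, so the normalization below (centering q on its range and scaling by the half-range before applying tanh) is an assumption.

```python
import numpy as np

def gaussianize_nn_output(q, q_min, q_max):
    """q -> q' = tanh[(q - (q_max + q_min)/2) / ((q_max - q_min)/2)]."""
    return np.tanh((q - 0.5 * (q_max + q_min)) / (0.5 * (q_max - q_min)))
```

Here q_min and q_max would be taken from the NN output range observed in the training sample.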