Unfolding Problem: A Machine Learning Approach
Nikolai Gagunashvili, School of Computing, University of Akureyri, Iceland. 11/21/2018
Contents
- Introduction
- Basic equation
- System identification
- The unfolding procedure
- A numerical example
- Conclusions
- References
Introduction
The unfolding problem is an underspecified problem: any approach to solving it requires a priori information about the solution. Different unfolding methods differ, directly or indirectly, in how they use this a priori information.
Dimension-independent unfolding with D-optimal system identification
Basic equation
We will use a linear model for the transformation of the true distribution into the measured one,

f = P φ + ε,    (1)

where f = (f1, f2, …, fm)^T is the vector of experimentally measured histogram contents, φ = (φ1, φ2, …, φn)^T is the vector of true histogram contents, and ε = (ε1, ε2, …, εm)^T is the vector of random residuals with mean E ε = 0 and diagonal variance matrix Var ε = diag(σ1², σ2², …, σm²), where the σi are the statistical errors of the measured distribution.
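A minimal numerical sketch of this forward model: a toy 5-bin true histogram φ, a hypothetical smearing matrix P in which each true bin leaks 10% of its content into each neighbouring measured bin, and Poisson fluctuations standing in for the residuals ε. The binning and leakage fractions are illustrative assumptions, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 5
phi = np.array([100.0, 300.0, 500.0, 300.0, 100.0])  # toy true histogram

# Hypothetical response matrix: 80% of each true bin stays in place,
# 10% migrates into each neighbouring measured bin.
P = np.zeros((n, n))
for j in range(n):
    P[j, j] = 0.8
    if j > 0:
        P[j - 1, j] = 0.1
    if j < n - 1:
        P[j + 1, j] = 0.1

f_exact = P @ phi                        # expected measured content P*phi
f = rng.poisson(f_exact).astype(float)   # measured histogram with residuals
```

With this P, for example, the middle measured bin receives 0.1·300 + 0.8·500 + 0.1·300 = 460 expected events.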
where P is the m × n matrix of the transformation that maps the true distribution to the experimentally measured distribution.
Basic equation (cont.)
The Least Squares Method can give an estimator for the true distribution,

φ̂ = (P^T W P)^-1 P^T W f,    with W = (Var ε)^-1,

where φ̂, the estimator, is called the unfolded distribution. According to the Least Squares Method, the full matrix of errors of the unfolded distribution is

Var φ̂ = (P^T W P)^-1.
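The weighted least-squares estimator above can be sketched directly in a few lines of NumPy. This is a generic implementation of the formula, not the talk's own code; the toy 2×2 response matrix in the check is an assumption for illustration.

```python
import numpy as np

def unfold_wls(P, f, sigma):
    """Weighted least-squares unfolding:
    phi_hat = (P^T W P)^-1 P^T W f with W = diag(1/sigma^2).
    The inverse of P^T W P is the full error matrix of phi_hat."""
    W = np.diag(1.0 / sigma**2)
    cov = np.linalg.inv(P.T @ W @ P)   # full covariance of the estimator
    phi_hat = cov @ P.T @ W @ f
    return phi_hat, cov

# Toy check: with a noise-free f = P*phi the estimator reproduces phi.
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
phi_true = np.array([10.0, 20.0])
phi_hat, cov = unfold_wls(P, P @ phi_true, np.ones(2))
```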
There are two stages in solving the unfolding (inverse) problem:
1. Investigation and calculation of the matrix P. This is known as the system identification problem and may be defined as the process of determining a model of a dynamic system from observed input-output data.
2. Solution of equation (1), which gives the unfolded distribution together with its complete matrix of statistical errors.
System identification or calculation of matrix P
Monte-Carlo simulation of the set-up can be used to obtain input-output data (a training sample). To regularize the solution of the unfolding problem, we use a training set of distributions known a priori from theory or from other experiments.
Assume we have q input distributions in the training sample, and arrange them as a q × n matrix Φ in which each row represents one input training histogram.
For each i-th row p_i of the matrix P we can write the equation

g_i = Φ p_i + ε_i,

where Φ is the q × n matrix whose rows are the input training histograms, g_i = (g_i1, g_i2, …, g_iq)^T is the vector of i-th bin contents of the output distributions (g_ij is the i-th bin content of the output distribution for the j-th training distribution), and ε_i is a vector of random residuals with mean E ε_i = 0 and variance Var ε_i = diag(s_i1², …, s_iq²), where s_ij is the statistical error of the i-th bin of the output distribution for the j-th training distribution.
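Row-by-row identification can be sketched as a weighted least-squares fit of p_i from the training sample. The random training matrix and the known row in the check are assumptions made only to verify the fit; the slide itself does not give numbers.

```python
import numpy as np

rng = np.random.default_rng(1)

def estimate_row(Phi, g_i, s_i):
    """Least-squares estimate of the i-th row p_i of P.
    Phi: q x n matrix of training true histograms (one per row),
    g_i: i-th output-bin contents over the q trainings,
    s_i: their statistical errors."""
    W = np.diag(1.0 / s_i**2)
    return np.linalg.solve(Phi.T @ W @ Phi, Phi.T @ W @ g_i)

# Toy check: noise-free outputs from a known row are recovered exactly.
q, n = 10, 3
Phi = rng.uniform(0.0, 100.0, size=(q, n))
p_i = np.array([0.7, 0.2, 0.1])
p_hat = estimate_row(Phi, Phi @ p_i, np.ones(q))
```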
The Least Squares Method gives an estimator for p_i. The columns of the matrix Φ can be correlated with each other. This means that the transformation of the training distributions to the i-th bin of the output distribution can be parameterized by a subset of the elements of the row p_i, and there may be more than one subset that describes this transformation sufficiently well.
Thus for each i-th output bin we will have N_i candidate rows, and therefore ∏ N_i candidate matrices P over all output bins. We need to choose a matrix P that is good, or optimal, in some sense. The most convenient choice here is the D-optimality criterion, which minimizes the determinant of the full matrix of errors of the unfolded distribution.
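The D-optimality criterion can be sketched as follows: score each candidate P by det((P^T W P)^-1) and keep the minimizer. The two 2×2 candidates below are illustrative assumptions; a nearly singular response matrix loses to a well-conditioned one, as expected.

```python
import numpy as np

def d_criterion(P, sigma):
    """D-optimality score: determinant of the full error matrix
    (P^T W P)^-1 of the unfolded distribution; smaller is better."""
    W = np.diag(1.0 / sigma**2)
    return 1.0 / np.linalg.det(P.T @ W @ P)

def pick_d_optimal(candidates, sigma):
    # Choose the candidate response matrix with the smallest score.
    return min(candidates, key=lambda P: d_criterion(P, sigma))

# A well-conditioned candidate beats a nearly singular one.
candidates = [np.array([[1.0, 0.99],
                        [0.99, 1.0]]),
              np.eye(2)]
best = pick_d_optimal(candidates, np.ones(2))
```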
Main advantages of D-optimization
- It minimizes the volume of the confidence ellipsoid for the unfolded distribution.
- Many computer algorithms exist for carrying out the optimization.
The unfolding procedure
1. Initialization: define a binning for the experimental data and a binning for the unfolded distribution.
2. System identification: choose a set of training distributions and calculate the D-optimal matrix P.
3. Basic equation solution: calculate the unfolded distribution with its full matrix of errors.
4. Test of goodness of the unfolding: fit the unfolded distribution, and compare the experimental distribution with the reconstructed simulated distribution (holdout, cross-validation, or bootstrap methods).
Selection criteria for the set of training distributions
- Each training distribution has a corresponding output distribution that can be compared with the experimentally measured distribution by a χ² test.
- For identification, select the training distributions whose corresponding output distributions satisfy a χ² < a selection criterion (the parameter a defines a significance level p(a) for the comparison of the two histograms).
Applying this selection criterion:
- increases the number of candidate matrices P,
- decreases the determinant of the full matrix of errors,
- decreases the statistical errors of the unfolded distribution.
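The χ² cut above can be sketched as a simple filter over the training sample. The comparison below uses only the experimental errors (a simplification) and illustrative toy histograms; the talk's actual χ² test for two histograms may differ in detail.

```python
import numpy as np

def chi2_per_bin(h_exp, err_exp, h_out):
    """Simplified chi^2 per bin between an output histogram and the
    experimental one, using only the experimental errors."""
    return float(np.sum((h_exp - h_out) ** 2 / err_exp**2)) / len(h_exp)

def select_trainings(h_exp, err_exp, outputs, a):
    # Keep indices of training distributions whose output passes chi^2 < a.
    return [k for k, h in enumerate(outputs)
            if chi2_per_bin(h_exp, err_exp, h) < a]

h_exp = np.array([100.0, 200.0, 100.0])
err_exp = np.sqrt(h_exp)                      # Poisson-like errors
outputs = [np.array([105.0, 195.0, 98.0]),    # close to the data
           np.array([300.0, 50.0, 10.0])]     # far from the data
kept = select_trainings(h_exp, err_exp, outputs, a=2.0)
```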
(Diagram: the Monte-Carlo simulation maps the set of training distributions to the set of output distributions, which are compared with the experimental distribution.)
A numerical example
We take a true distribution φ(x) with parameters A1, A2, B1, B2, C1 and C2. An experimentally measured distribution is defined as

f(y) = ∫ A(x) R(x, y) φ(x) dx,

where A(x) is the acceptance and R(x, y) is the detector resolution function, a Gaussian with σ = 1.5.
An example of the true distribution φ(x), the acceptance function A(x) and the resolution function R(x, 10)
An example of the measured distribution f
A numerical example (cont.)
A histogram of the measured distribution was obtained by simulating 10^4 events. Random parameters are generated uniformly on the intervals [1,3] for A1; [0.5,1.5] for A2; [8,12] for B1; [10,18] for B2; [0.5,1.5] for C1; [0.5,1.5] for C2. Each draw of these parameters defines a training distribution for the identification.
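Generating the training sample's parameters can be sketched as uniform draws from the intervals listed above. The sample size of 100 draws is an assumption for illustration; the talk does not state how many training distributions were used.

```python
import numpy as np

rng = np.random.default_rng(7)

# Intervals from the slide; each draw of the six parameters defines
# one training distribution for the identification step.
intervals = {"A1": (1.0, 3.0), "A2": (0.5, 1.5),
             "B1": (8.0, 12.0), "B2": (10.0, 18.0),
             "C1": (0.5, 1.5), "C2": (0.5, 1.5)}

def draw_parameters():
    return {name: float(rng.uniform(lo, hi))
            for name, (lo, hi) in intervals.items()}

training_params = [draw_parameters() for _ in range(100)]
```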
Training distributions generated for system identification, and the unfolded distribution for different χ² cuts
Conclusions
- The proposed method uses a set of a priori distributions for identification to obtain a stable solution of the unfolding problem.
- D-optimization together with the Least Squares Method makes it possible to minimize the statistical errors of the solution.
- The χ² selection criterion reduces the possible bias of the procedure.
- The procedure has no restriction due to the dimensionality of the problem.
- The procedure can be applied to unfolding problems with smooth as well as non-smooth solutions.
- Being based purely on a statistical approach, the method has a clear statistical interpretation.
References
1. V. Blobel, Unfolding methods in high-energy physics experiments, CERN (1985).
2. V.P. Zhigunov, Improvement of resolution function as an inverse problem, Nucl. Instrum. Meth. 216 (1983) 183.
3. A. Höcker, V. Kartvelishvili, SVD approach to data unfolding, Nucl. Instrum. Meth. A372 (1996) 469.
4. N.D. Gagunashvili, Unfolding of true distributions from experimental data distorted by detectors with finite resolutions, Nucl. Instrum. Meth. A451 (1993) 657.
5. N.D. Gagunashvili, Unfolding with system identification, Proceedings of PHYSTAT05, Oxford, UK.