Data exchange, data merging and common storage format for NEWS
Valeri Tioukov
13/06/2019
Proposed data flow (26/07/2016)
Diagram boxes: European scanning system, Japanese scanning system, Monte Carlo; Converter (x2); Common Data Format; Common Analysis tools; Compatible results
LNGS 13/06/2019
Current data flow (presented in 2017)
Diagram boxes: European scanning system, Japanese scanning system, Monte Carlo; EU data, JP data, MC data
Presented in 2017
The first version of the data exchange library is ready and available here: http://emulsion.na.infn.it/svn/DMDS/
The project is called dm2root and contains one library, libDMRoot.
DMDS - Revision 5: /dm2root/src/libDMRoot
  ..
  DMRCluster.cpp  DMRCluster.h
  DMRGrain.cpp  DMRGrain.h
  DMRImage.cpp  DMRImage.h
  DMRLog.cpp  DMRLog.h
  DMRMicrotrack.cpp  DMRMicrotrack.h
  DMRRun.cpp  DMRRun.h
  DMRRunHeader.cpp  DMRRunHeader.h
  DMRView.cpp  DMRView.h
  DMRViewHeader.cpp  DMRViewHeader.h
  DMRootLinkDef.h
  Makefile
  libDMRoot.h  libDMRoot.sln  libDMRoot.vcproj
Next step toward a common(?) data format (4 days ago)
A new, additional storage library appeared (an almost exact copy of libDMRoot).
Question: Why did you decide to make a copy of libDMRoot instead of using it directly, as was suggested?
Answer: We have a different scanning system; some parameters and some algorithms are different, so we need our own classes => the data format should be different.
We have 3 different scanning systems in Italy
Polarizing system, Color system, GrayScale system
  Scanning systems differ
  Some parameters differ
  Some algorithms differ
Do we need different storage libraries with different formats for them?
Following this logic, each microscope could have its own format: libDMR_Color, libDMR_Polar, libDMR_Gs, ….
And if we modify the scanning system? libDMR_Polar_configuration1
Creating a NEW, DIFFERENT storage format because of a small difference in the algorithms or parameters: is it really a good idea?
Transient data model (Data Processing): classes and algorithms are different. OK!
  Color system => Color classes and Color algorithms
  Polar system => Polar classes and Polar algorithms
  B/W system => BW classes and BW algorithms
  (Japanese system => Japanese classes and Japanese algorithms)
Persistent data model (Data Storage): BUT we use a common storage format
  Color system: DMRCluster { x,y,z ……. Icol != 0 Ipol = 0 }
  Polar system: DMRCluster { x,y,z ……. Icol = 0 Ipol != 0 }
  BW system: DMRCluster { x,y,z ……. Icol = 0 Ipol = 0 }
  Japanese system: DMRCluster { x,y,z ……. Icol = 0 Ipol = 0 IellipticFit != 0 }
DMRCluster holds: common parameters, Color specific, Polar specific, Japanese specific
That's all! No duplication of storage classes.
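The idea of one persistent class with system-specific fields left at zero can be sketched in plain C++. This is only an illustration, not the actual libDMRoot class: the field names x, y, z, Icol, Ipol and IellipticFit are taken from the slide, while the types, the name ClusterSketch and the helper guessSource are assumptions.

```cpp
// Hypothetical, simplified stand-in for the persistent cluster record.
// Field names come from the slide; types and defaults are assumptions.
struct ClusterSketch {
    float x = 0, y = 0, z = 0;  // common parameters, filled by every system
    int Icol = 0;               // color-specific, stays 0 when not filled
    int Ipol = 0;               // polarization-specific, stays 0 when not filled
    int IellipticFit = 0;       // elliptic-fit specific, stays 0 when not filled
};

enum class Source { BW, Color, Polar, Elliptic };

// Infer the producing system from which optional fields are non-zero,
// following the four cases listed on the slide.
inline Source guessSource(const ClusterSketch& c) {
    if (c.IellipticFit != 0) return Source::Elliptic;
    if (c.Icol != 0) return Source::Color;
    if (c.Ipol != 0) return Source::Polar;
    return Source::BW;
}
```

Every system writes the same record layout, so a reader needs one class only; fields that a given system cannot provide simply keep their default value.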
Data flow through the common format (diagram)
Data Production: Color SS, Polar SS, Elliptical SS, Phase Contrast SS
Preprocessing: Calg, Palg, Ealg, PHalg
Writing to Common format
Common format: Images, Clusters, Grains, Microtracks, etc.
Reading from Common format
Postprocessing, Crosscheck, Analysis: Calg, Palg, Ealg, PHalg, any other algorithm
Direct data check
Common format: what is it?
  Is it a raw format? Not really.
  Is it a final format? Not necessarily.
  Is it an almost final format? Not necessarily.
Most important properties of any common format:
  The information is sufficient to perform the complete data analysis
  It is documented and clear to everybody
  All relevant experimental data are available in this format
Example 1: raw images only
A legal common format, but very inconvenient: huge files, slow processing
Diagram: SS Data Production -> Preprocessing (assume that no algorithms are available here) -> Writing to Common format
Common format: Raw Images (of Images, Clusters, Grains, Microtracks, etc.)
Reading from Common format -> Postprocessing, Crosscheck, Analysis: Clustering, other processing
Example 2: images related to clusters, and the clusters themselves
Already a good common format
Diagram: SS Data Production -> Preprocessing (a clustering algorithm is available): Clustering -> Writing to Common format
Common format: Cluster Images, Clusters (of Images, Clusters, Grains, Microtracks, etc.)
Reading from Common format -> Postprocessing, Crosscheck, Analysis: other processing
Data providers and data consumers in the collaboration
Providers: scanning, preprocessing, writing data into the common format
Consumers: reading data from the common format, postprocessing, analysis
  Nagoya produces Elliptical data
  Napoli produces Polarization data
  ..etc…
  Napoli consumes Elliptical data
  Nagoya consumes Polarization data
  Machine learning can consume any data
New algorithms can be developed by both data providers and data consumers.
Once a new algorithm is available, tested and working well, you may want to make its results available to the Collaboration and provide them as part of the Common Format.
Example 3: images related to clusters, clusters and grains
An even better common format
Diagram: SS Data Production -> Preprocessing: Clustering and graining -> Writing to Common format
Common format: Cluster Images, Clusters, Grains (of Images, Clusters, Grains, Microtracks, etc.)
Reading from Common format -> Postprocessing, Crosscheck, Analysis: other processing
Example 4: images related to clusters, clusters and grains
An even better common format
Diagram: SS Data Production -> Preprocessing: Clustering and graining -> Writing to Common format
Common format: Cluster Images, Clusters, Grains (of Images, Clusters, Grains, Microtracks, etc.)
Reading from Common format -> Postprocessing, Crosscheck, Analysis: Barshift polarization analysis, other processing
Barshift polarization analysis: code exported to SVN, so it becomes available to the Collaboration
Example 5 (near future): images related to clusters, clusters and grains, microtracks
A rich common format
Diagram: SS Data Production -> Preprocessing: Clustering and graining, Barshift polarization analysis -> Writing to Common format
Common format: Cluster Images, Clusters, Grains, Microtracks (of Images, Clusters, Grains, Microtracks, etc.)
Reading from Common format -> Postprocessing, Crosscheck, Analysis: other processing
Barshift polarization analysis: code exported to SVN, so it becomes available to the Collaboration
Filling the common format
In libDMRoot we have already prepared the structures for most of the basic objects (Cluster Images, Clusters, Grains, Microtracks, etc.).
  The data provider has some information at the preprocessing phase? => Fill it
  Do not have this information? => Do not fill it
It is not necessary to wait until a complete and ideal processing chain is established before starting to use a common format.
It is not too early to start now. It is already quite late, because we need to perform the common analysis immediately.
If some structure is not sufficient to accommodate some information, we can extend it.
  What is missing in DMRCluster to fit the Japanese Elliptical data?
Different algorithms can produce somewhat different results. Two solutions:
  Keep in the data the information (a flag) about the algorithm applied
  Export the algorithm itself in a way that other people can run it on Common Format data
Practical steps
  Define the libDMRoot extensions necessary (if any) to fit Nagoya data
  Extend libDMRoot
  Drop libJPData to avoid code duplication and start to export data in libDMRoot. They are practically identical now, so this is straightforward.
libDMRoot, the storage library, is conservative and should be updated only when it is really necessary and in agreement with the other data providers.
For processing (not for storage), instead, any new classes and new libraries can be created at both the preprocessing and the postprocessing level.
The only constraints are:
  Preprocessing algorithms must be able to write data into the common format (directly or via a converter)
  Postprocessing reads data from the common format
What is data merging?
  The sample was scanned in Japan => Elliptic selection done
  The same sample was scanned in Napoli => Polarization analysis done
To merge data we do not need to put them together into the same tree.
To merge data we do not need to put them together into the same file.
We need to find the one-to-one correspondence between the clusters (grains) obtained by the two systems.
The basic result of the data merging is this correspondence table, together with both original data files.
Example of merged data
  File A: grains (clusters) in common format (ViewA, grA)
  File B: grains (clusters) in common format (ViewB, grB)
  File 3: list of matched couples, ViewA,GrA <-> ViewB,GrB
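A list of matched couples like the one in File 3 could be built as in the following sketch. It is not the dmalign/dmmerge implementation: the structures Grain and Couple, the function matchGrains and the greedy nearest-neighbour strategy are all assumptions made for illustration, and both grain lists are assumed to be already expressed in the same reference frame.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal grain record: view number, grain index, position.
struct Grain { int view; int id; double x; double y; };

// One row of the correspondence table: ViewA,GrA <-> ViewB,GrB.
struct Couple { int viewA; int grA; int viewB; int grB; };

// Greedy brute-force matching: each grain of A takes the nearest unused
// grain of B lying within +-acceptance in both x and y. O(N*M), so it is
// only meant for small samples; it is not guaranteed to be globally optimal.
std::vector<Couple> matchGrains(const std::vector<Grain>& a,
                                const std::vector<Grain>& b,
                                double acceptance) {
    std::vector<Couple> couples;
    std::vector<bool> used(b.size(), false);
    for (const Grain& ga : a) {
        bool found = false;
        std::size_t bestIdx = 0;
        double bestD2 = 0.0;
        for (std::size_t j = 0; j < b.size(); ++j) {
            if (used[j]) continue;
            const double dx = b[j].x - ga.x;
            const double dy = b[j].y - ga.y;
            if (std::fabs(dx) > acceptance || std::fabs(dy) > acceptance) continue;
            const double d2 = dx * dx + dy * dy;
            if (!found || d2 < bestD2) {
                found = true;
                bestIdx = j;
                bestD2 = d2;
            }
        }
        if (found) {
            used[bestIdx] = true;
            couples.push_back({ga.view, ga.id, b[bestIdx].view, b[bestIdx].id});
        }
    }
    return couples;
}
```

The table stores only identifiers, so both original files stay untouched and any branch of either file can be reached through the couple indices.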
The same area scanned on two Napoli systems
Sample A (NSSna2)
  C60keV_test/color_camera/dm_tracks.dm.root
  5 mm x 1 mm area scanned
  1703 views
  About 1300000 grains
Sample B (NSSna1)
  C60keV_test/polarized_light/dm_tracks.dm.root
  5 mm x 1 mm area scanned
  1827 views
  About 900000 grains
Global alignment
dmalign.Aff: 1.000516 0.012561 -0.013973 0.998641 11.60 22.77
Result of the global alignment procedure, 5 mm² vs 5 mm²:
  325000 coincidences found with ±1.5 μm acceptance
  About 2/3 of them are in the peak core
Matching accuracy (3σ of the peak):
  X: ±1.1 μm
  Y: ±0.65 μm
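How such six numbers act on coordinates can be sketched as a 2D affine transformation. The parameter order (a11 a12 a21 a22 b1 b2) is an assumption about the dmalign.Aff line, not something stated on the slide, and the names Affine2D, Point2D, applyAffine and withinAcceptance are hypothetical.

```cpp
#include <cmath>

// Hypothetical holder of a 2D affine transformation.
// ASSUMED order of the dmalign.Aff values: a11 a12 a21 a22 b1 b2.
struct Affine2D {
    double a11, a12, a21, a22, b1, b2;
};

struct Point2D { double x, y; };

// x' = a11*x + a12*y + b1 ;  y' = a21*x + a22*y + b2
inline Point2D applyAffine(const Affine2D& t, const Point2D& p) {
    return { t.a11 * p.x + t.a12 * p.y + t.b1,
             t.a21 * p.x + t.a22 * p.y + t.b2 };
}

// True if two points agree within +-acc in both coordinates,
// e.g. acc = 1.5 for the acceptance quoted on the slide.
inline bool withinAcceptance(const Point2D& p, const Point2D& q, double acc) {
    return std::fabs(p.x - q.x) <= acc && std::fabs(p.y - q.y) <= acc;
}
```

Under this assumed ordering the matrix part is close to the identity with a small rotation, and the last two values are the offsets between the two scans.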
Merging procedure: the one-to-one correspondence is established
  dmmerge –par=align.rootrc
Input:
  a.dm.root: scanning data
  b.dm.root: scanning data
  a_b.cp.root: couples (result of dmalign)
Output:
  a_b.mrg.root: contains the "match" tree, made of selected branches for the selected couples of both scanned samples
The signal is selected here
  root -l check_mrg.C
  TCut cut("cut","abs(s2.eX-s1.eX)<0.4&&abs(s2.eY-s1.eY)<0.25"); //peak
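The same peak selection can be written outside ROOT as a plain C++ predicate. This is a sketch only: the branch names s1, s2, eX and eY and the cut values 0.4 and 0.25 are taken from the TCut above, while the structures Seg and MatchedCouple and the functions inPeak and selectPeak are hypothetical.

```cpp
#include <cmath>
#include <vector>

// Hypothetical mirror of the two matched objects stored in one tree entry.
struct Seg { double eX, eY; };
struct MatchedCouple { Seg s1, s2; };

// Same condition as the TCut: |s2.eX - s1.eX| < 0.4 && |s2.eY - s1.eY| < 0.25
inline bool inPeak(const MatchedCouple& c, double cutX = 0.4, double cutY = 0.25) {
    return std::fabs(c.s2.eX - c.s1.eX) < cutX &&
           std::fabs(c.s2.eY - c.s1.eY) < cutY;
}

// Keep only the couples that fall inside the peak.
inline std::vector<MatchedCouple> selectPeak(const std::vector<MatchedCouple>& all) {
    std::vector<MatchedCouple> peak;
    for (const MatchedCouple& c : all) {
        if (inPeak(c)) peak.push_back(c);
    }
    return peak;
}
```

The cut is tighter in Y than in X, in line with the better matching accuracy quoted for Y in the global alignment.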
Effect of polarization on the cluster direction and the barycenter shift
Peak couples, Ag40nmNP
Without filter:
  No dependence of the cluster direction on the polarization
  No barshifts > 0.04
With filter:
  Clear dependence of the cluster direction on the polarization
  Very few barshifts > 0.04
For nanoparticles there is no correlation between the cluster angle and the barshift.
File sharing
We got 100 TB of disk space at CNAF (the computing center of INFN).
The WebDAV protocol is to be set up for accessing this space.
Once this is done, all data providers will have write access and all data consumers will have read access to the data.
The request for WebDAV access was made several weeks ago.
Meanwhile, in Napoli we export data through our group's Apache web server to make them available for download.
Is this possible also for the Japanese data? Some Cloud solution?