Data exchange, data merging and common storage format for NEWS

Data exchange, data merging and common storage format for NEWS Valeri Tioukov 13/06/2019

Proposed data flow (26/07/2016)
Sources: European scanning system, Japanese scanning system, Monte Carlo.
Converters write the data into a Common Data Format, which is read by common analysis tools to give compatible results.
LNGS 13/06/2019

Current data flow (presented in 2017)
European scanning system => EU data
Japanese scanning system => JP data
Monte Carlo => MC data

Presented in 2017
The first version of the data exchange library is ready and available here: http://emulsion.na.infn.it/svn/DMDS/
The project is called dm2root and contains one library, libDMRoot.
DMDS - Revision 5: /dm2root/src/libDMRoot
DMRCluster.cpp DMRCluster.h DMRGrain.cpp DMRGrain.h DMRImage.cpp DMRImage.h DMRLog.cpp DMRLog.h DMRMicrotrack.cpp DMRMicrotrack.h DMRRun.cpp DMRRun.h DMRRunHeader.cpp DMRRunHeader.h DMRView.cpp DMRView.h DMRViewHeader.cpp DMRViewHeader.h DMRootLinkDef.h Makefile libDMRoot.h libDMRoot.sln libDMRoot.vcproj

Next step toward a common(?) data format (4 days ago)
A new, additional storage library (an almost exact copy of libDMRoot).
Question: Why did you decide to make a copy of libDMRoot instead of using it directly, as was suggested?
Answer: We have a different scanning system; some parameters and some algorithms are different, so we need our own classes => the data format should be different.

We have 3 different scanning systems in Italy: Polarizing system, Color system, GrayScale system
The scanning systems differ, some parameters differ, some algorithms differ.
Do we need different storage libraries with different formats for them?
Following this logic, each microscope could have its own format: libDMR_Color, libDMR_Polar, libDMR_Gs, ...
And if we make some modification to the scanning system? libDMR_Polar_configuration1
Creating a NEW, DIFFERENT storage format because of a small difference in the algorithms or parameters: is it really a good idea?

Data processing (transient data model): the classes and algorithms are different. OK!
Color system => Color classes and Color algorithms
Polar system => Polar classes and Polar algorithms
B/W system => BW classes and BW algorithms
(Japanese system => Japanese classes and Japanese algorithms)
Data storage (persistent data model): BUT we use a common storage format.
Color system: DMRCluster { x,y,z ... Icol != 0 Ipol = 0 }
Polar system: DMRCluster { x,y,z ... Icol = 0 Ipol != 0 }
BW system: DMRCluster { x,y,z ... Icol = 0 Ipol = 0 }
Japanese system: DMRCluster { x,y,z ... Icol = 0 Ipol = 0 IellipticFit != 0 }
The persistent data model holds the common parameters plus the Color-specific, Polar-specific and Japanese-specific ones.
That's all! No duplication of storage classes.
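The idea of one persistent record with common and system-specific fields can be sketched in plain C++ as follows. This is only an illustration: ClusterRecord, makeColorCluster and producedBy are hypothetical names, not the real DMRCluster class from libDMRoot, and the field types are assumptions. Only the field names (Icol, Ipol, IellipticFit) and the zero / non-zero convention come from the slide.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a persistent cluster record (NOT the real DMRCluster).
struct ClusterRecord {
    // Common parameters, filled by every scanning system.
    double x = 0.0, y = 0.0, z = 0.0;
    // System-specific parameters; 0 means "not filled by this provider".
    int icol = 0;          // Color system
    int ipol = 0;          // Polar system
    int iellipticFit = 0;  // Japanese (elliptical) system
};

// A Color provider fills the common part and its own field, nothing else.
ClusterRecord makeColorCluster(double x, double y, double z, int icol) {
    assert(icol != 0 && "by the slide's convention a Color record has Icol != 0");
    ClusterRecord c;
    c.x = x;
    c.y = y;
    c.z = z;
    c.icol = icol;
    return c;
}

// Tell which system filled a record, using the zero / non-zero convention.
std::string producedBy(const ClusterRecord& c) {
    if (c.icol != 0) return "Color";
    if (c.ipol != 0) return "Polar";
    if (c.iellipticFit != 0) return "Japanese";
    return "BW";
}
```

With such a layout every consumer reads the same structure, and a provider that has nothing to say about a field simply leaves it at zero.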

Data production: Color SS, Polar SS, Elliptical SS, Phase Contrast SS (SS = scanning system)
Preprocessing: Calg, Palg, Ealg, PHalg => writing to the common format
Common format: Images, Clusters, Grains, Microtracks, etc.
Reading from the common format => postprocessing, crosscheck, analysis: Calg, Palg, Ealg, PHalg, any other algorithm
Direct data check

Common format: what is it?
Is it a raw format? Not really.
Is it a final format? Not necessarily.
Is it an almost final format? Not necessarily.
Most important properties of any common format:
The information is sufficient to perform the complete data analysis.
It is documented and clear to everybody.
All relevant experimental data are available in this format.

Example 1: raw images only
A legal common format, but very inconvenient: huge files, slow processing.
SS data production => preprocessing (assume that no algorithms are available here) => writing to the common format.
Common format (Raw Images, Clusters, Grains, Microtracks, etc.): only the raw images are provided.
Reading from the common format => postprocessing, crosscheck, analysis: clustering, other processing.

Example 2: images related to clusters, and the clusters themselves
Already a good common format.
SS data production => preprocessing (a clustering algorithm is available) => writing to the common format.
Common format (Cluster Images, Clusters, Grains, Microtracks, etc.): cluster images and clusters are provided.
Reading from the common format => postprocessing, crosscheck, analysis: other processing.

Data providers and data consumers in the collaboration
Providers: scanning, preprocessing, writing data into the common format.
Consumers: reading data from the common format, postprocessing, analysis.
Nagoya produces Elliptical data, Napoli produces Polarization data, etc.
Napoli consumes Elliptical data, Nagoya consumes Polarization data.
Machine learning can consume any data.
New algorithms can be developed by both data providers and data consumers.
Once a new algorithm is available, tested and working well, you may want to make its results available to the Collaboration and provide them as a part of the Common Format.

Example 3: images related to clusters, clusters and grains
An even better common format.
SS data production => preprocessing (clustering and graining) => writing to the common format.
Common format (Cluster Images, Clusters, Grains, Microtracks, etc.): cluster images, clusters and grains are provided.
Reading from the common format => postprocessing, crosscheck, analysis: other processing.

Example 4: images related to clusters, clusters and grains
An even better common format.
SS data production => preprocessing (clustering and graining) => writing to the common format.
Common format (Cluster Images, Clusters, Grains, Microtracks, etc.): cluster images, clusters and grains are provided.
Reading from the common format => postprocessing, crosscheck, analysis: barshift polarization analysis, other processing.
The barshift polarization analysis code is exported to SVN and becomes available to the Collaboration.

Example 5 (near future): images related to clusters, clusters and grains, microtracks
A rich common format.
SS data production => preprocessing (clustering and graining) => writing to the common format.
Common format (Cluster Images, Clusters, Grains, Microtracks, etc.): cluster images, clusters, grains and microtracks are provided.
Reading from the common format => postprocessing, crosscheck, analysis: barshift polarization analysis, other processing.
The barshift polarization analysis code is exported to SVN and becomes available to the Collaboration.

In libDMRoot we have already prepared the structures for most of the basic objects: Cluster Images, Clusters, Grains, Microtracks, etc.
Does the data provider have some information from the preprocessing phase? => Fill it. Do not have this information? => Do not fill it.
It is not necessary to wait until the complete and ideal processing chain is established before starting to use a common format.
It is not too early to start now; it is quite late, because we need to perform the common analysis immediately.
If some structure is not enough to accommodate some information, we can extend it. What is missing in DMRCluster to fit the Japanese Elliptical data?
Different algorithms can produce some difference in the result. Two solutions:
Keep in the data the information (flag) about the algorithm applied.
Export the algorithm itself in a way that other people can run it on Common Format data.
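The first of the two solutions, a flag stored with the data, could look roughly like the sketch below. Everything here is hypothetical: the Algorithm enumeration, the version field and the helper functions are illustrative names and are not part of libDMRoot.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical list of algorithms that may have produced a record.
enum class Algorithm : std::uint8_t { Unknown = 0, Color, Polar, Elliptical, PhaseContrast };

// Hypothetical cluster record carrying the flag of the algorithm applied.
struct TaggedCluster {
    double x, y, z;
    Algorithm alg;          // which algorithm produced this record
    std::uint16_t version;  // assumed version number of that algorithm
};

// Providers are expected to declare the algorithm when they write a record.
TaggedCluster tag(double x, double y, double z, Algorithm alg, std::uint16_t version) {
    assert(alg != Algorithm::Unknown && "the provider should declare its algorithm");
    return TaggedCluster{x, y, z, alg, version};
}

// Two records can be compared directly only if the same algorithm
// (and the same version of it) produced both.
bool comparable(const TaggedCluster& a, const TaggedCluster& b) {
    return a.alg != Algorithm::Unknown && a.alg == b.alg && a.version == b.version;
}

// A consumer can restrict the analysis to the records of one algorithm.
std::vector<TaggedCluster> selectByAlgorithm(const std::vector<TaggedCluster>& in, Algorithm alg) {
    std::vector<TaggedCluster> out;
    for (const TaggedCluster& c : in) {
        if (c.alg == alg) out.push_back(c);
    }
    return out;
}
```

The flag costs a few bytes per record and lets a consumer notice when a difference in the results may come from the algorithm rather than from the sample.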

Practical steps to do
Define the extensions of libDMRoot that are necessary (if any) to fit the Nagoya data.
Extend libDMRoot.
Drop libJPData to avoid code duplication and start exporting data in libDMRoot; the two are practically identical now, so this is straightforward.
libDMRoot, the storage library, is conservative and should be updated only when it is really necessary and in agreement with the other data providers.
For processing (not for storage), instead, any new classes and new libraries can be created at both the preprocessing and the postprocessing level.
The only constraints are: preprocessing algorithms must be able to write data into the common format (directly or via a converter); postprocessing reads data from the common format.

What is data merging?
The sample was scanned in Japan => elliptic selection done.
The same sample was scanned in Napoli => polarization analysis done.
To merge the data we do not need to put them together into the same tree.
To merge the data we do not need to put them together into the same file.
We need to find the one-to-one correspondence between the clusters (grains) obtained by both systems.
The basic result of the data merging is this correspondence table, together with both original data files.

Example of merged data
File A: grains (clusters) in common format (ViewA, grA)
File B: grains (clusters) in common format (ViewB, grB)
File 3: list of matched couples (ViewA,grA <-> ViewB,grB)
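The three-file layout can be sketched in plain C++ as below, assuming that a grain is addressed by a (view index, grain index) pair. All type and function names are hypothetical; the real files are ROOT files written with libDMRoot, which this sketch does not use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical in-memory picture of one data file: views, each holding grains.
struct Grain { double x; double y; };
using View = std::vector<Grain>;
using DataFile = std::vector<View>;

// A grain is referenced by its position in the original file; it is never copied.
struct GrainRef { std::size_t view; std::size_t grain; };

// One row of "File 3": ViewA,grA <-> ViewB,grB.
struct MatchedCouple { GrainRef a; GrainRef b; };
using MatchTable = std::vector<MatchedCouple>;

// Resolve a reference against the original file it points into.
const Grain& lookup(const DataFile& file, const GrainRef& ref) {
    assert(ref.view < file.size());
    assert(ref.grain < file[ref.view].size());
    return file[ref.view][ref.grain];
}

// Position difference (B minus A) for one matched couple.
struct Residual { double dx; double dy; };

Residual residual(const DataFile& fileA, const DataFile& fileB, const MatchedCouple& c) {
    const Grain& ga = lookup(fileA, c.a);
    const Grain& gb = lookup(fileB, c.b);
    return Residual{gb.x - ga.x, gb.y - ga.y};
}
```

Because the table stores only references, both original files stay untouched and any quantity of either system can be recovered for a matched couple.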

The same area scanned on two Napoli systems
Sample A (NSSna2): C60keV_test/color_camera/dm_tracks.dm.root, 5 mm x 1 mm area scanned, 1703 views, about 1300000 grains
Sample B (NSSna1): C60keV_test/polarized_light/dm_tracks.dm.root, 5 mm x 1 mm area scanned, 1827 views, about 900000 grains

Global alignment
dmalign.Aff: 1.000516 0.012561 -0.013973 0.998641 11.60 22.77
Result of the global alignment procedure (5 mm² vs 5 mm²):
325000 coincidences found with ±1.5 μm acceptance.
About 2/3 of them are in the peak core.
Matching accuracy (3σ of the peak): X: ±1.1 μm, Y: ±0.65 μm
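How such an affine transformation and acceptance window might be applied can be sketched as follows. Several things here are assumptions and are not stated on the slide: that the six dmalign.Aff numbers are ordered a11 a12 a21 a22 b1 b2, that the offsets are in μm, and that the acceptance is a square window applied separately in X and Y. The types and function names are hypothetical and are not the dmalign implementation.

```cpp
#include <cassert>
#include <cmath>

struct Point { double x; double y; };

// 2D affine transformation: x' = a11*x + a12*y + b1, y' = a21*x + a22*y + b2.
// The parameter order is an assumption about the dmalign.Aff line.
struct Affine2D { double a11, a12, a21, a22, b1, b2; };

// Bring a point from one system's coordinates into the other's.
Point applyAffine(const Affine2D& t, const Point& p) {
    return Point{t.a11 * p.x + t.a12 * p.y + t.b1,
                 t.a21 * p.x + t.a22 * p.y + t.b2};
}

// Two points coincide if they agree within the acceptance in both X and Y
// (square window, an assumption; a circular window would be another choice).
bool coincide(const Point& a, const Point& b, double acceptance) {
    assert(acceptance >= 0.0);
    return std::fabs(a.x - b.x) <= acceptance &&
           std::fabs(a.y - b.y) <= acceptance;
}
```

Under these assumptions the values on the slide would describe a small rotation (off-diagonal terms of about 0.013), a scale close to 1 and an offset of (11.60, 22.77); a grain of one sample is first transformed with applyAffine and then compared with the grains of the other sample using coincide with an acceptance of 1.5.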

Merging procedure: the one-to-one correspondence is established
dmmerge –par=align.rootrc
Input:
a.dm.root: scanning data
b.dm.root: scanning data
a_b.cp.root: couples (the result of dmalign)
Output:
a_b.mrg.root: contains the “match” tree, made of selected branches for the selected couples of both scanned samples

The signal is selected here
root -l check_mrg.C
TCut cut("cut","abs(s2.eX-s1.eX)<0.4&&abs(s2.eY-s1.eY)<0.25"); //peak
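For readers without ROOT at hand, the same peak selection can be written in plain C++. The sketch assumes that s1 and s2 are the two members of a matched couple and that eX, eY are their coordinates; the Seg and Couple types and both functions are hypothetical, and only the two thresholds are taken from the cut above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for one member of a matched couple.
struct Seg { double eX; double eY; };
using Couple = std::pair<Seg, Seg>;  // first = s1, second = s2

// Same condition as the TCut: |s2.eX - s1.eX| < 0.4 and |s2.eY - s1.eY| < 0.25.
bool inPeak(const Seg& s1, const Seg& s2, double maxDx = 0.4, double maxDy = 0.25) {
    assert(maxDx > 0.0 && maxDy > 0.0);
    return std::fabs(s2.eX - s1.eX) < maxDx &&
           std::fabs(s2.eY - s1.eY) < maxDy;
}

// Count the couples that fall inside the peak window.
std::size_t countPeak(const std::vector<Couple>& couples) {
    std::size_t n = 0;
    for (const Couple& c : couples) {
        if (inPeak(c.first, c.second)) ++n;
    }
    return n;
}
```

The window is asymmetric, narrower in Y than in X, in line with the different matching accuracies quoted for the two axes.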

Effect of polarization on the cluster direction and the barycenter shift
Peak couples, Ag40nmNP
Without filter: no dependence of the cluster direction on the polarization; no barshifts > 0.04.
With filter: clear dependence of the cluster direction on the polarization; very few barshifts > 0.04.
For nanoparticles there is no correlation between the cluster angle and the barshift.

File sharing
We got 100 TB of disk space at CNAF (the computing center of INFN).
The WebDAV protocol is to be set up for accessing this space.
Once that is done, all data providers will have write access and all data consumers will have read access to the data.
The request for WebDAV access was made several weeks ago.
Meanwhile, in Napoli we export the data through our group's Apache web server to make them available for download.
Is this also possible for the Japanese data? Some cloud solution?