Parallel Algorithms for GEM Detectors Systems Postprocessing Rafał Krawczyk, Paweł Linczuk, Krzysztof Poźniak, Ryszard Romaniuk, Grzegorz Kasprowicz, Wojciech.

Slides:

Advertisements

Similar presentations

DEVELOPMENT OF ONLINE EVENT SELECTION IN CBM DEVELOPMENT OF ONLINE EVENT SELECTION IN CBM I. Kisel (for CBM Collaboration) I. Kisel (for CBM Collaboration)

Advertisements

Enhanced matrix multiplication algorithm for FPGA Tamás Herendi, S. Roland Major UDT2012.

International Symposium on Low Power Electronics and Design Qing Xie, Mohammad Javad Dousti, and Massoud Pedram University of Southern California ISLPED.

Kondo GNANVO Florida Institute of Technology, Melbourne FL.

1 SECURE-PARTIAL RECONFIGURATION OF FPGAs MSc.Fisnik KRAJA Computer Engineering Department, Faculty Of Information Technology, Polytechnic University of.

1 Geometry layout studies of the RICH detector in the CBM experiment Elena Belolaptikova Dr. Claudia Hoehne Moscow State Institute of Radioengineering,

ARM-DSP Multicore Considerations CT Scan Example.

Authors: Raphael Polig, Kubilay Atasu, and Christoph Hagleitner Publisher: FPL, 2013 Presenter: Chia-Yi, Chu Date: 2013/10/30 1.

Development of a track trigger based on parallel architectures Felice Pantaleo PH-CMG-CO (University of Hamburg) Felice Pantaleo PH-CMG-CO (University.

Zheming CSCE715.  A wireless sensor network (WSN) ◦ Spatially distributed sensors to monitor physical or environmental conditions, and to cooperatively.

Data Acquisition System for 2D X-Ray Detector Beijing Synchrotron Radiation Facility (BSRF) located at Institute of High Energy Physics is the first synchrotron.

A Search for Point Sources of High Energy Neutrinos with AMANDA-B10 Scott Young, for the AMANDA collaboration UC-Irvine PhD Thesis:

A Performance and Energy Comparison of FPGAs, GPUs, and Multicores for Sliding-Window Applications From J. Fowers, G. Brown, P. Cooke, and G. Stitt, University.

1 Presenter: Chien-Chih Chen Proceedings of the 2002 workshop on Memory system performance.

Computing Platform Benchmark By Boonyarit Changaival King Mongkut’s University of Technology Thonburi (KMUTT)

GPGPU platforms GP - General Purpose computation using GPU

Studying Visual Attention with the Visual Search Paradigm Marc Pomplun Department of Computer Science University of Massachusetts at Boston

Designing a CODAC for Compass Presented by: André Sancho Duarte.

1 Electronics Lab, Physics Dept., Aristotle Univ. of Thessaloniki, Greece 2 Micro2Gen Ltd., NCSR Demokritos, Greece 17th IEEE International Conference.

Computationally Efficient Histopathological Image Analysis: Use of GPUs for Classification of Stromal Development Olcay Sertel 1,2, Antonio Ruiz 3, Umit.

Niko Neufeld, CERN/PH-Department

Test Of Distributed Data Quality Monitoring Of CMS Tracker Dataset H->ZZ->2e2mu with PileUp - 10,000 events ( ~ 50,000 hits for events) The monitoring.

The GANDALF Multi-Channel Time-to-Digital Converter (TDC)  GANDALF module  TDC concepts  TDC implementation in the FPGA  measurements.

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY Accelerating Simulation of Agent-Based Models on Heterogeneous Architectures.

Elad Hadar Omer Norkin Supervisor: Mike Sumszyk Winter 2010/11, Single semester project. Date:22/4/12 Technion – Israel Institute of Technology Faculty.

Artdaq Introduction artdaq is a toolkit for creating the event building and filtering portions of a DAQ. A set of ready-to-use components along with hooks.

Upgrade to Real Time Linux Target: A MATLAB-Based Graphical Control Environment Thesis Defense by Hai Xu CLEMSON U N I V E R S I T Y Department of Electrical.

GPUs and Accelerators Jonathan Coens Lawrence Tan Yanlin Li.

Helmholtz International Center for CBM – Online Reconstruction and Event Selection Open Charm Event Selection – Driving Force for FEE and DAQ Open charm:

1 The GEM Readout Alternative for XENON Uwe Oberlack Rice University PMT Readout conversion to UV light and proportional multiplication conversion to charge.

14 Sep 2005DAQ - Paul Dauncey1 Tech Board: DAQ/Online Status Paul Dauncey Imperial College London.

Advanced Computer Architecture, CSE 520 Generating FPGA-Accelerated DFT Libraries Chi-Li Yu Nov. 13, 2007.

Offline Coordinators  CMSSW_7_1_0 release: 17 June 2014  Usage:  Generation and Simulation samples for run 2 startup  Limited digitization and reconstruction.

1 © 2012 The MathWorks, Inc. Parallel computing with MATLAB.

School of Electrical Engineering & Computer Science National University of Sciences & Technology (NUST), Pakistan Research Profile Rehan Hafiz.

Advanced Computer Architecture 0 Lecture # 1 Introduction by Husnain Sherazi.

Querying Large Databases Rukmini Kaushik. Purpose Research for efficient algorithms and software architectures of query engines.

XFEL The European X-Ray Laser Project X-Ray Free-Electron Laser Tomasz Czarski, Maciej Linczuk, Institute of Electronic Systems, WUT, Warsaw LLRF ATCA.

Embedding Constraint Satisfaction using Parallel Soft-Core Processors on FPGAs Prasad Subramanian, Brandon Eames, Department of Electrical Engineering,

Parallelization and Characterization of Pattern Matching using GPUs Author: Giorgos Vasiliadis 、 Michalis Polychronakis 、 Sotiris Ioannidis Publisher:

Geant4 based simulation of radiotherapy in CUDA

Experiences Accelerating MATLAB Systems Biology Applications Heart Wall Tracking Lukasz Szafaryn, Kevin Skadron University of Virginia.

Next Generation Operating Systems Zeljko Susnjar, Cisco CTG June 2015.

Face Detection Ying Wu Electrical and Computer Engineering Northwestern University, Evanston, IL

Ionization Detectors Basic operation

Parallel & Distributed Systems and Algorithms for Inference of Large Phylogenetic Trees with Maximum Likelihood Alexandros Stamatakis LRR TU München Contact:

StrideBV: Single chip 400G+ packet classification Author: Thilan Ganegedara, Viktor K. Prasanna Publisher: HPSR 2012 Presenter: Chun-Sheng Hsueh Date:

AMB HW LOW LEVEL SIMULATION VS HW OUTPUT G. Volpi, INFN Pisa.

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY HPCDB Satisfying Data-Intensive Queries Using GPU Clusters November.

Detection of electromagnetic showers along muon tracks Salvatore Mangano (IFIC)

Implementing algorithms for advanced communication systems -- My bag of tricks Sridhar Rajagopal Electrical and Computer Engineering This work is supported.

Pre-calculated Fluid Simulator States Tree Marek Gayer and Pavel Slavík C omputer G raphics G roup Department of Computer Science and Engineering Faculty.

New L2cal hardware and CPU timing Laura Sartori. - System overview - Hardware Configuration: a set of Pulsar boards receives, preprocess and merges the.

Goddard February 2003 R.Bellazzini - INFN Pisa A new X-Ray Polarimeter based on the photoelectric effect for Black Holes and Neutron Stars Astrophysics.

XLV INTERNATIONAL WINTER MEETING ON NUCLEAR PHYSICS Tiago Pérez II Physikalisches Institut For the PANDA collaboration FPGA Compute node for the PANDA.

Calorimeter Digitisation Prototype (Material from A Straessner, C Bohm et al) L1Calo Collaboration Meeting Cambridge 23-Mar-2011 Norman Gee.

Collection of Photoelectrons from a CsI Photocathode in Triple GEM Detectors C. Woody B.Azmuon 1, A Caccavano 1, Z.Citron 2, M.Durham 2, T.Hemmick 2, J.Kamin.

Some GPU activities at the CMS experiment Felice Pantaleo EP-CMG-CO EP-CMG-CO 1.

XFEL The European X-Ray Laser Project X-Ray Free-Electron Laser Wojciech Jalmuzna, Technical University of Lodz, Department of Microelectronics and Computer.

Date of download: 6/1/2016 Copyright © 2016 SPIE. All rights reserved. Triangulated shapes of human head layer boundaries employed in simulations: (a)

Recent development of gaseous position-sensitive thermal neutron detectors for the IBR-2 spectrometers A.V.Belushkin, A.A.Bogdzel, A.N.Chernikov, A.V.Churakov,

Accelerating particle identification for high-speed data-filtering using OpenCL on FPGAs and other architectures for FPL 2016 Srikanth Sridharan CERN 8/31/2016.

FPGAs for next gen DAQ and Computing systems at CERN

WP18, High-speed data recording Krzysztof Wrona, European XFEL

Parallel Plasma Equilibrium Reconstruction Using GPU

Commissioning of the ALICE HLT, TPC and PHOS systems

Leiming Yu, Fanny Nina-Paravecino, David Kaeli, Qianqian Fang

Implementing Boosting and Convolutional Neural Networks For Particle Identification (PID) Khalid Teli .

HARPO, a Micromegas+GEM TPC for gamma polarimetry above 1 MeV

Peng Jiang, Linchuan Chen, and Gagan Agrawal

Presentation transcript:

Parallel Algorithms for GEM Detectors Systems Postprocessing Rafał Krawczyk, Paweł Linczuk, Krzysztof Poźniak, Ryszard Romaniuk, Grzegorz Kasprowicz, Wojciech Zabołotny, Paweł Zienkiewicz, Piotr Kolasiński, Andrzej Wojeński, Tomasz Czarski, Maryna Chernyshova

Agenda Overview of Soft X–Ray Radiation (SXR) detection systems Issue of postprocessing in the systems Algorithmic and hardware challenge Preliminary results Ongoing research Future plans

Tokamaks Rus. Toroidalnaja Kamiera s Magnitnymi Katuszkami Fusion reactor responsible for accelerating and maintaining plasma within magnetic boundaries to allow fusion reaction Source: fusionforenergy.europa.eu

The challenge of maintaining plasma purity The critical problem to fusion to proceed efficiently is sustaining plasma purity Accumulated impurities of plasma significantly hinder reaction The concentration of impurities must be investigated to optimize performance

Why the SXR detection ? Local impurities can be measured through SXR measurement Also: information about photon temperature, shape of plasma and magnetic axis Magnetic topology possible

GEM detector GEM - Gas Electron Multiplier Impervious to neutrons in tokamak High spatial and time resolution Possible two-dimensional detection Manifold detection topologies possible

GEM detector – principle of work Generation of electron as a result of radiation absorption Multiplication in gas The cloud falls on the anode plate Henceforth electrical signal- amplification and A/D conversion

Systems with GEM detectors Tremendous amounts of data and throughput– hundreds of Gbits/s Achieving highest spatial and temporal resolution with hardware constraints The preprocessing phase done by FPGAs

Legacy system Data saved to disk Preprocessing online in FPGAs, postprocessing offline in MATLAB The MATLAB algorithms hard or impossible to implement in FPGAs

The development of system For 512 channels real-time processing of above 500Gbit/s of data – increase of resolution and throughput Increase of functionality – further processing of data on-line (so-called postprocessing) Postprocessing of data of reduced though significant throughput (several GB/s) after preprocessing

Postprocessing- the primary objective Optimizing data analysis Development of platform to implement and to test the both algorithms Finding an alternative for FPGAs Introducing loopback to plasma control mechanisms in tokamaks

Posprocessing- algorithmic challenge Merging and sorting of data from different boards Subsequent histogramming of collected data with cluster and event detection Reconstruction of 2D charge distribution in the detector Planned 3D reconstruction

Postprocessing- hardware challenge The necessity of investigating architectural capabilities of available high-end electronics GPU(CUDA) MIC Intel Xeon Phi + CPU Intel Xeon DSP FPGA

Preliminary results Basic goal- achieving speedup on MATLAB algorithms with offline data Achieved by implementing algorithms in C accessible via MATLAB mex API Redesigning algorithms while retaining their functionality Up to 1000 x speedup achieved

Preliminary results cont’d Preliminary parallelization led to further speedup on dual-core CPU Format of data and patterns of memory access strongly influence performance Conclusion – attempt to optimize via manipulating the data format and parallelizing seems legitimate

Ongoing research Reverse- engineering the collected data to format at a stage of PCIe transfer to PC Study of known methods to parallelize the given algorithms Finding parallel patterns Necessary implementation of crucial parts of algorithms and scrutinizing the influence of various forms of data representation and parallelization

Ongoing research cond’d Investigating achieved speedup on Intel Xeon, Intel Xeon Phi and NVIDIA GPU Major problem – Computation-to data transfers ratio, data of variable length, memory access patterns Investigating and assessing which technology will allow to achieve highest speedup Necessary deep insight into architecture and justifying the reason of achieved speedups The collected information crucial for further studies

Future plans Basing on the achieved speedups, propose a platform for further development of algorithms Implement more elaborate algorithms, e.g. 3-d reconstruction Pinpointing algorithm and hardware features that led to such decisions Developing a heuristics to choose appropriate hardware for given algorithms Develop a loopback to the tokamak

Thank you for your attention