Lic. Thesis Presentation in ICT/ECS KTH

Presentation transcript:

A High-End Reconfigurable Computation Platform for Particle Physics Experiments
Lic. Thesis Presentation in ICT/ECS, KTH
Under the collaboration between KTH & JLU
by Ming Liu
Supervisors: Prof. Axel Jantsch (KTH), Dr. Zhonghai Lu (KTH), Prof. Wolfgang Kuehn (JLU, Germany)
Royal Institute of Technology, Sweden

Contributions
The thesis is mainly based on the following contributions:
- Ming Liu, Johannes Lang, Shuo Yang, Tiago Perez, Wolfgang Kuehn, Hao Xu, Dapeng Jin, Qiang Wang, Lu Li, Zhenan Liu, Zhonghai Lu, and Axel Jantsch, “ATCA-based Computation Platform for Data Acquisition and Triggering in Particle Physics Experiments”, in Proc. of the International Conference on Field Programmable Logic and Applications (FPL’08), Sep. 2008. (System architecture)
- Ming Liu, Wolfgang Kuehn, Zhonghai Lu and Axel Jantsch, “System-on-an-FPGA Design for Real-time Particle Track Recognition and Reconstruction in Physics Experiments”, in Proc. of the 11th EUROMICRO Conference on Digital System Design (DSD’08), Sep. 2008. (Algorithm implementation and evaluation)
- Ming Liu, Wolfgang Kuehn, Zhonghai Lu, Axel Jantsch, Shuo Yang, Tiago Perez and Zhenan Liu, “Hardware/Software Co-design of a General-Purpose Computation Platform in Particle Physics”, in Proc. of the 2007 IEEE International Conference on Field Programmable Technology (ICFPT’07), Dec. 2007. (HW/SW co-design)

Overview
- Background in Physics Experiments
- Computation Platform for DAQ and Triggering
  - Network architecture
  - Compute Node (CN) architecture
- HW/SW Co-design of the System-on-an-FPGA
  - Partitioning strategy
  - HW design
  - SW design
- Algorithm Implementation and Evaluation
- Conclusion and Future Work

Background
- Nuclear & particle physics: a branch of physics that studies the constituents and interactions of atomic nuclei and particles.
- Some elementary particles do not occur under normal circumstances in nature. Many can be created and detected during energetic collisions of others: Beam → Target, or Beam ↔ Beam.
- Produced particles are studied with huge, complex detector systems.
- Examples:
  - HADES & PANDA @ GSI, Germany
  - ATLAS, CMS, LHCb, ALICE at the LHC @ CERN, Switzerland & France
  - BES III @ IHEP, China
  - WASA @ FZ-Juelich, Germany
  - ...

Detector Systems
HADES sub-detectors:
- RICH (Ring Imaging CHerenkov)
- MDC (Mini Drift Chamber)
- TOF (Time-Of-Flight)
- TOFino (small TOF)
- Shower (electromagnetic shower detector)
- RPC (Resistive Plate Chamber, to be added as a replacement for TOFino)
(Detector pictures: HADES, BES III, WASA, PANDA)

HADES Detector System

Challenge & Motivation
Challenge: high reaction rates and high data rates (PANDA: 10-20 MHz event rate, data rate up to 200 GB/s!)
- It is not possible to store all the data, due to storage capacity limitations.
- Only a rare fraction (e.g. 1/10^6) of the events is of interest for extensive offline analysis. The background can be discarded on the fly.
- Pattern recognition algorithms are used to identify the interesting events.
Motivation: a reconfigurable and scalable computation platform for high-data-rate processing.

Data Flow
- Pattern recognition algorithms
- Data correlation
- Largely reduced data rate for storage

Related Work
- Previously, commercial bus systems such as VMEbus, FASTbus, and CAMAC were used for DAQ and triggering. Time-multiplexing of the system bus degrades data exchange efficiency and cannot meet high-performance requirements.
- Existing reconfigurable computers appear attractive, but are not well suited to physics experiment applications:
  - Some are computer clusters augmented with FPGAs attached to the system bus as accelerators (bandwidth bottleneck between the microprocessor and the accelerator).
  - Some are standalone boards (not straightforward to scale the system to a large size, due to the lack of efficient inter-board connectivity). Flexible and massive communication channels are required to interface with detectors and the PC farm.
  - An all-board-switched or tree-like topology may introduce communication penalties between algorithm steps (direct P2P links are preferred).

Overview
- Background in Physics Experiments
- Computation Platform for DAQ and Triggering
  - Network architecture
  - Compute Node (CN) architecture
- HW/SW Co-design of the System-on-an-FPGA
  - Partitioning strategy
  - HW design
  - SW design
- Algorithm Implementation and Evaluation
- Conclusion and Future Work

DAQ and Trigger Systems
- Detectors detect particles and generate signals.
- Signals are digitized by ADCs.
- Data are buffered in concentrators/buffers.
- Pattern recognition algorithms extract features from events.
- Interesting events are stored in mass storage; background is discarded on the fly.
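To make the filtering step concrete, the sketch below shows the general shape of such an online event filter in C. The event structure and function names are illustrative only and not taken from the thesis; the real decision logic comes from the pattern recognition algorithms described later.

/*
 * Conceptual sketch of the online filtering step: events are read from a
 * buffer, a pattern recognition function scores them, and only interesting
 * events are forwarded to event building/storage while background events
 * are dropped on the fly. All types and names here are hypothetical.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t id;
    size_t   size;
    uint8_t  data[4096];
} event_t;

/* Placeholder for a pattern recognition algorithm (ring finder, tracker, ...). */
static bool is_interesting(const event_t *ev)
{
    (void)ev;
    return false;   /* in reality: derived from the recognized patterns */
}

void filter_events(event_t *events, size_t n,
                   void (*store)(const event_t *))
{
    for (size_t i = 0; i < n; i++) {
        if (is_interesting(&events[i]))
            store(&events[i]);   /* keep: forward to event building/storage */
        /* else: background, discarded on the fly */
    }
}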

Network Topology
- Compute Nodes (CNs) interconnected for parallel/pipelined processing
- Hierarchical network topology:
  - External channels: optical links, Gigabit Ethernet
  - Internal interconnections: on-board I/O connections, inter-board backplane, inter-chassis switching

ATCA Backplane
- Advanced Telecommunications Computing Architecture (ATCA)
- Full-mesh, direct point-to-point (P2P) backplane
- High flexibility to correlate results from different algorithms
- High performance compared to shared buses

Compute Node
- Prototype board with 5 Xilinx Virtex-4 FX60 FPGAs:
  - 4 FPGAs as algorithm processors
  - 1 FPGA as a switch
- 2 GB DDR2 per FPGA, IPMC, Flash, CPLD, ...
- Full-mesh on-board communication via GPIOs & RocketIOs
- RocketIO-based backplane channels
- External channels: optical links & Gigabit Ethernet

Compute Node PCB
- 14-layer PCB design
- Standard 12U size of 280 x 322 mm

Performance Summary (1 ATCA chassis ~ 14 CNs)
- 1890 Gbps on-board connections
- 1456 Gbps inter-board backplane connections
- 728 Gbps full-duplex optical bandwidth
- 70 Gbps Ethernet
- 140 GB DDR2 SDRAM
- Computing resources of 70 Virtex-4 FX60 FPGAs (140 PowerPC 405 microprocessors + programmable resources)
- Power consumption evaluation: max. 170 W per CN (ATCA slot budget: 200 W)
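As a cross-check, the chassis-level resource totals follow from simple arithmetic on the per-CN figures given on the Compute Node slide, plus the fact that each Virtex-4 FX60 embeds two PowerPC 405 cores; the link-bandwidth figures (1890/1456/728 Gbps) come from the slide itself and are not re-derived here:

\begin{align*}
14\ \text{CNs} \times 5\ \text{FPGAs/CN} &= 70\ \text{Virtex-4 FX60 FPGAs}\\
70\ \text{FPGAs} \times 2\ \text{PPC405/FPGA} &= 140\ \text{PowerPC 405 cores}\\
70\ \text{FPGAs} \times 2\ \mathrm{GB\ DDR2/FPGA} &= 140\ \mathrm{GB\ DDR2\ SDRAM}\\
70\ \mathrm{Gbps} \div 14\ \text{CNs} &= 5\ \mathrm{Gbps\ Ethernet\ per\ CN}
\end{align*}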

Overview
- Background in Physics Experiments
- Computation Platform for DAQ and Triggering
  - Network architecture
  - Compute Node (CN) architecture
- HW/SW Co-design of the System-on-an-FPGA
  - Partitioning strategy
  - HW design
  - SW design
- Algorithm Implementation and Evaluation
- Conclusion and Future Work

Partitioning Strategy
- Multiple tasks during experiment operation (data processing, control tasks, ...)
- Partitioned between the FPGA HW fabric & the embedded PowerPC CPUs

Partitioning Strategy
Concrete strategy:
- All pattern recognition algorithms are customized in the FPGA fabric as parallel/pipelined HW processing modules.
- Slow control tasks (e.g. monitoring the system status, modifying experimental parameters, ...) are implemented in SW (applications + OS).
- Soft TCP/IP stack in the Linux OS.

HW Design
Old bus-based architecture (PLB & OPB):
- CPU & fast peripherals on the PLB
- Slow peripherals on the OPB
- Customized processing modules (e.g. TPU) on the PLB
Improved MPMC/LocalLink-based architecture:
- Multi-Port Memory Controller (MPMC, 8 ports)
- Direct access to the memory from the device
- Customized processing units interfaced to the MPMC directly

SW Design
- Open-source embedded Linux on the embedded PowerPC CPUs
- Device drivers:
  - Standard devices (Ethernet, RS232, Flash memory, etc.)
  - Customized modules
- Applications for slow control:
  - High-level scripts
  - C/C++ programs
  - Apache web server
  - Java programs on the VM
- Software cost: low budget!
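To make the "customized modules" driver item concrete, here is a minimal sketch of a Linux character driver exposing a memory-mapped custom processing module (here called "tpu"). It is in the spirit of the thesis but not taken from it: the base address, register offset and all names are hypothetical, and the actual drivers on the Compute Node may be organized differently.

/*
 * Minimal sketch of a Linux character driver for a custom FPGA processing
 * module exposed as memory-mapped registers. Base address and offsets are
 * assumptions for illustration only.
 */
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/cdev.h>
#include <linux/io.h>
#include <linux/uaccess.h>

#define TPU_BASE   0xC0000000UL   /* assumed physical base of the TPU registers */
#define TPU_SIZE   0x1000
#define TPU_STATUS 0x00           /* assumed offset of a status register        */

static void __iomem *tpu_regs;
static dev_t tpu_devt;
static struct cdev tpu_cdev;

/* Reading the device returns the 32-bit status register. */
static ssize_t tpu_read(struct file *f, char __user *buf,
                        size_t len, loff_t *off)
{
    u32 status = ioread32(tpu_regs + TPU_STATUS);

    if (len < sizeof(status))
        return -EINVAL;
    if (copy_to_user(buf, &status, sizeof(status)))
        return -EFAULT;
    return sizeof(status);
}

static const struct file_operations tpu_fops = {
    .owner = THIS_MODULE,
    .read  = tpu_read,
};

static int __init tpu_init(void)
{
    int ret = alloc_chrdev_region(&tpu_devt, 0, 1, "tpu");
    if (ret)
        return ret;

    tpu_regs = ioremap(TPU_BASE, TPU_SIZE);
    if (!tpu_regs) {
        unregister_chrdev_region(tpu_devt, 1);
        return -ENOMEM;
    }

    cdev_init(&tpu_cdev, &tpu_fops);
    return cdev_add(&tpu_cdev, tpu_devt, 1);
}

static void __exit tpu_exit(void)
{
    cdev_del(&tpu_cdev);
    iounmap(tpu_regs);
    unregister_chrdev_region(tpu_devt, 1);
}

module_init(tpu_init);
module_exit(tpu_exit);
MODULE_LICENSE("GPL");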

Remote Reconfigurability
- Remote reconfigurability (HW & SW) is desired due to the spatial constraints in experiments.
- Both the OS and the FPGA bitstream are stored in NOR flash memories.
- With the support of the MTD driver, the bitstream and the OS kernel can be overwritten/upgraded from within Linux. After a reboot, the updated system takes effect.
- A backup mechanism guarantees that the system stays alive.
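As an illustration of this MTD-based update path, the sketch below rewrites a NOR flash partition from user space with the standard Linux MTD ioctls. The partition /dev/mtd2, the image path and the overall procedure are assumptions; in practice a tool such as flashcp from mtd-utils could serve the same purpose.

/*
 * Hedged sketch: write a new FPGA bitstream (or kernel image) into a NOR
 * flash partition through the Linux MTD user-space interface.
 */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <mtd/mtd-user.h>

int main(void)
{
    const char *mtd_dev = "/dev/mtd2";              /* assumed bitstream partition */
    const char *image   = "/tmp/new_bitstream.bin"; /* assumed image location      */

    int mtd = open(mtd_dev, O_RDWR);
    int img = open(image, O_RDONLY);
    if (mtd < 0 || img < 0) { perror("open"); return 1; }

    mtd_info_t info;
    if (ioctl(mtd, MEMGETINFO, &info) < 0) { perror("MEMGETINFO"); return 1; }

    /* Erase the whole partition, one erase block at a time. */
    erase_info_t erase;
    for (erase.start = 0; erase.start < info.size; erase.start += info.erasesize) {
        erase.length = info.erasesize;
        if (ioctl(mtd, MEMERASE, &erase) < 0) { perror("MEMERASE"); return 1; }
    }

    /* Copy the new image into the erased partition. */
    char buf[4096];
    ssize_t n;
    while ((n = read(img, buf, sizeof(buf))) > 0) {
        if (write(mtd, buf, (size_t)n) != n) { perror("write"); return 1; }
    }

    close(img);
    close(mtd);
    printf("Image written; reboot to activate the updated system.\n");
    return 0;
}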

Overview
- Background in Physics Experiments
- Computation Platform for DAQ and Triggering
  - Network architecture
  - Compute Node (CN) architecture
- HW/SW Co-design of the System-on-an-FPGA
  - Partitioning strategy
  - HW design
  - SW design
- Algorithm Implementation and Evaluation
- Conclusion and Future Work

Pattern Recognition in HADES
- New DAQ & trigger system for the HADES upgrade (≤10 GB/s)
- Pattern recognition algorithms:
  - Cherenkov ring recognition (RICH)
  - Particle track reconstruction (MDCs)
  - Time-of-flight processing (TOF & RPC)
  - Electromagnetic shower recognition (Shower)
- Partitioned and distributed on FPGA nodes
- Algorithms correlated by hierarchical connections

Pattern Recognition in HADES
- Correlation
- Event building & storage

Particle Track Reconstruction
- Particle tracks are bent in the magnetic field between the coils; they are approximately straight lines before & after the coils.
- Inner and outer track segments point to the RICH and TOF detectors respectively and help them find patterns (correlation).
- The principle is similar for the inner and outer segments; only the inner part is discussed here.
- The particle track reconstruction algorithm for HADES was previously implemented in SW, due to its complexity. It is now implemented and investigated in HW as a case study.

Basic Principle
- Wires are fired by particles flying through the chambers.
- Fired wires are projected onto a projection plane.
- The overlap area is recognized and tracks originating from the target are reconstructed.
- 6 sectors, 2110 wires per sector (inner), 6 wire orientations


Hardware Implementation
- PLB interface (slave)
- MPMC interface (master)
- Algorithm processor: Tracking Processing Unit (TPU)

Modular Design
- TPU for the track reconstruction computation
  - Input: fired wire numbers
  - Output: positions of track candidates on the projection plane
- Sub-modules:
  - Wire number write FIFO
  - Projection LUT & address LUT
  - Bus master
  - Accumulation unit
  - Peak finder
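For reference, a software-equivalent sketch of the same processing chain (wire FIFO -> projection LUT -> accumulation -> peak finding) is given below. The projection-plane size, LUT layout and peak threshold are illustrative assumptions, not the parameters used in the HW implementation.

/*
 * Software-equivalent sketch of the TPU processing chain. The HW module
 * works on the same principle but with different data formats and widths.
 */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define N_WIRES   2110          /* inner MDC wires per sector             */
#define PLANE_W   128           /* assumed projection-plane resolution    */
#define PLANE_H   128
#define MAX_BINS  64            /* assumed max. plane bins per wire (LUT) */
#define THRESHOLD 4             /* assumed peak threshold (wire count)    */

/* proj_lut[w] lists the projection-plane bins covered by wire w;
 * proj_len[w] gives how many entries are valid. Filled from the geometry. */
static uint16_t proj_lut[N_WIRES][MAX_BINS];
static uint8_t  proj_len[N_WIRES];

static uint8_t plane[PLANE_H][PLANE_W];   /* accumulation array */

/* Accumulation unit: add the projection of every fired wire onto the plane. */
static void accumulate(const uint16_t *fired, int n_fired)
{
    memset(plane, 0, sizeof(plane));
    for (int i = 0; i < n_fired; i++) {
        uint16_t w = fired[i];
        for (int j = 0; j < proj_len[w]; j++) {
            uint16_t bin = proj_lut[w][j];
            plane[bin / PLANE_W][bin % PLANE_W]++;
        }
    }
}

/* Peak finder: report local maxima above the threshold as track candidates. */
static void find_peaks(void)
{
    for (int y = 1; y < PLANE_H - 1; y++)
        for (int x = 1; x < PLANE_W - 1; x++) {
            uint8_t v = plane[y][x];
            if (v >= THRESHOLD &&
                v >= plane[y][x - 1] && v >= plane[y][x + 1] &&
                v >= plane[y - 1][x] && v >= plane[y + 1][x])
                printf("track candidate at (%d, %d), count %u\n",
                       x, y, (unsigned)v);
        }
}

int main(void)
{
    uint16_t fired[] = { 10, 11, 12, 500, 501 };  /* example fired wire numbers */
    accumulate(fired, 5);
    find_peaks();
    return 0;
}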

Implementation Results
- Resource utilization on the Virtex-4 FX60: acceptable
- Timing limit: 125 MHz without optimization effort
- Clock frequency fixed at 100 MHz, to match the PLB speed

Performance Evaluation
Experimental setup:
- The MPMC-based structure is used for the measurements.
- Measurement points at different wire multiplicities (10, 30, 50, 200, 400 fired wires out of 2110)
- Projection LUT: 5.7 Kbits per wire on average (1.5 MB / 2110 wires)
- A C program running on a 2.4 GHz Xeon computer as the reference
Results:
- Speedups of 10.8 - 24.3 times per module have been observed compared to the software solution.
- Considering the FPGA resource consumption, multiple TPU modules may be integrated in the system for parallel processing and higher performance.
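A quick consistency check of the quoted LUT size, assuming 1 MB = 10^6 bytes:

\[
\frac{1.5\,\mathrm{MB}}{2110\ \text{wires}}
  = \frac{1.5\times 10^{6}\ \text{bytes} \times 8\ \text{bits/byte}}{2110\ \text{wires}}
  \approx 5.7\ \text{Kbits per wire}
\]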

Performance Analysis
Non-TPU factors introduce overhead and restrict the performance:
- Complex DDR2 addressing mechanism (large latency)
- Data transfer bursts of only 8 beats (clock cycles wasted)
- MPMC arbitration of memory accesses among multiple ports (clock cycles wasted)
- ...
The TPU module itself is powerful, but the memory holding the LUTs is slow.
Solution: add SRAM to enhance the memory bandwidth and reduce the access latency; speedups of 20 to 50 times per module compared to software are expected.

Overview
- Background in Physics Experiments
- Computation Platform for DAQ and Triggering
  - Network architecture
  - Compute Node (CN) architecture
- HW/SW Co-design of the System-on-an-FPGA
  - Partitioning strategy
  - HW design
  - SW design
- Algorithm Implementation and Evaluation
- Conclusion and Future Work

Conclusion
- An FPGA- and ATCA-based computation platform is being constructed for the DAQ and trigger systems of modern nuclear and particle physics experiments. The platform features high performance, scalability, reconfigurability, and the potential to be used for different application projects (physics & non-physics).
- A co-design approach is proposed to develop applications on the platform:
  - HW: system design + customized processing modules
  - SW: Linux OS + device drivers + application programs
- A case study, the particle track reconstruction algorithm, has been implemented and evaluated on the system. A speedup of one order of magnitude per module has been observed compared to the software solution (multiple modules can be integrated for parallel processing).

Future Work
- The network communication will be investigated with multiple CN PCBs.
- All pattern recognition algorithms are to be implemented and parallelized.
- Study of more efficient memory addressing mechanisms for multiple modules
- More advanced features, e.g. dynamic partial reconfiguration for adaptive computing

Thank you!