SAMPA and CRU simulations and further ideas Johan Alme – 4th February

Read-out Simulations
The presented work was done by Damian K. Wejnerowski and Håvard Rustad Olsen (M.Sc. students, Bergen University College).
› Supervisors: Håvard Helstrup, Johan Alme
The idea: make a simulation model of the complete read-out chain of the ALICE TPC for Run 3.
› Based on the TDR and on ongoing development
Range:
› From the digital part of the SAMPA chip
› To the output of the CRU (DDL3?)
Tool: SystemC
Johan Alme – CRU planning meeting 4th February 2015

The Run 3 Simulation Model – Basic Ideas
Simulate the data throughput from the SAMPA to the CRU output.
Software model = hardware component
› However: no detailed design of the interfaces
Easily configurable
› Easy to add more functionality
Customizable input
› Changing the data source should be easy
› Current data source: random generator giving a configurable occupancy
› Note: little or no variation from channel to channel
Customizable output
[Figure: block diagram of the read-out chain, with the point marked "Data generator inputs data here"]
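The current data source (a random generator with configurable occupancy and no channel-to-channel variation) can be sketched in a few lines. This is a Python illustration, not the SystemC model itself; the function name `generate_samples`, the 10-bit ADC value range and the use of one independent random trial per channel and time bin are assumptions.

```python
import random


def generate_samples(n_channels, n_time_bins, occupancy, seed=None):
    """Return (time_bin, channel, adc) tuples from a uniform random source.

    Hypothetical sketch: each channel/time-bin pair holds a sample with
    probability `occupancy`, identical for every channel (no clusters, no
    channel-to-channel variation). The 0..1023 ADC range is an assumption.
    """
    if not 0.0 <= occupancy <= 1.0:
        raise ValueError("occupancy must be between 0 and 1")
    rng = random.Random(seed)
    samples = []
    for t in range(n_time_bins):
        for ch in range(n_channels):
            if rng.random() < occupancy:
                samples.append((t, ch, rng.randrange(1024)))
    return samples


# One FEC = 5 SAMPAs x 32 channels, 1000 time bins, 30% occupancy
samples = generate_samples(5 * 32, 1000, 0.30, seed=1)
print(len(samples) / (5 * 32 * 1000))  # fraction of filled bins, close to 0.30
```

Swapping the data source then only means replacing this function, for example with one that reads real or zero-suppressed data.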

Simulation Model
Data generator:
› Randomly generated in space/time with selectable occupancy and fed to the digital part of the SAMPA; no clusters
SAMPA:
› Digital part with all buffers in place
GBTx with GBT protocol:
› Only forwards data; assumes no extra delay
CRU:
› The model assumes no sorting on the FEC
Note:
› More FECs can easily be added
› More CRUs can easily be added
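The component structure above (SAMPA with buffers, a GBTx that only forwards, a CRU at the end, and a configurable number of FECs) can be sketched as plain Python classes. This is a hypothetical illustration, not the SystemC code: each SAMPA is reduced to a single data buffer, one forwarding stage per FEC stands in for the GBT links, and `drain_per_step` (words read out per step) is an invented parameter.

```python
from collections import deque


class Sampa:
    """Digital part of one SAMPA, reduced to a single data buffer (capacity in 10-bit words)."""

    def __init__(self, capacity_words):
        self.capacity = capacity_words
        self.buffer = deque()
        self.dropped = 0  # words lost because the buffer was full

    def push(self, words):
        for w in words:
            if len(self.buffer) < self.capacity:
                self.buffer.append(w)
            else:
                self.dropped += 1

    def drain(self, max_words):
        n = min(max_words, len(self.buffer))
        return [self.buffer.popleft() for _ in range(n)]


class Gbtx:
    """Only forwards data and adds no delay, as assumed in the model."""

    def forward(self, words):
        return words


class Cru:
    def __init__(self):
        self.received = []

    def receive(self, words):
        self.received.extend(words)


class ReadoutChain:
    """n_fecs FECs (5 SAMPAs each) feeding one CRU."""

    def __init__(self, n_fecs, capacity_words=4096, drain_per_step=100):
        self.sampas = [[Sampa(capacity_words) for _ in range(5)] for _ in range(n_fecs)]
        self.gbtx = [Gbtx() for _ in range(n_fecs)]  # one forwarding stage per FEC (simplification)
        self.cru = Cru()
        self.drain_per_step = drain_per_step  # hypothetical read-out rate per step

    def step(self, inputs):
        """inputs[fec][sampa] is the list of words arriving at that SAMPA in this step."""
        for fec, sampas in enumerate(self.sampas):
            for i, sampa in enumerate(sampas):
                sampa.push(inputs[fec][i])
                self.cru.receive(self.gbtx[fec].forward(sampa.drain(self.drain_per_step)))


chain = ReadoutChain(n_fecs=2, capacity_words=10, drain_per_step=3)
chain.step([[list(range(5))] * 5] * 2)
print(len(chain.cru.received))  # 30 = 2 FECs x 5 SAMPAs x 3 words
```

Adding FECs is a constructor argument; adding CRUs would mean routing each FEC's forwarding stage to a different `Cru` instance.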

SAMPA simulations
Setup:
› 1 FEC (5 SAMPAs)
› Expected buffer sizes: data 4k × 10 bit / 8k × 10 bit; header 256 × 10 bit / 1k × 10 bit
› Randomly generated samples, depending only on the occupancy
Scenarios:
› 1. Increasing occupancy with simple fluctuation: 8 time windows, 4k × 10 bit buffers
› 2. Globally distributed fluctuation: 26 time windows, 4k × 10 bit buffers
› 3. When do the buffers stabilize? 8k × 10 bit buffers, 30 time windows
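For reference, the buffer sizes above can be converted to bits and bytes. The sketch assumes that "4k", "8k" and "1k" mean 4096, 8192 and 1024 words.

```python
WORD_BITS = 10  # SAMPA buffer words are 10 bits wide


def buffer_bits(words):
    return words * WORD_BITS


# "4k"/"8k"/"1k" are read here as 4096/8192/1024 words (an assumption)
buffers = {
    "data, 4k": 4 * 1024,
    "data, 8k": 8 * 1024,
    "header, 256": 256,
    "header, 1k": 1024,
}
for name, words in buffers.items():
    bits = buffer_bits(words)
    print(f"{name}: {bits} bits = {bits / 8 / 1024:.2f} KiB")
```

With these assumptions the data buffers come to 5 KiB and 10 KiB, and the header buffers to 0.31 KiB and 1.25 KiB.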

1. Increasing Occupancy
Peak buffer usage:
› 30%: 369 × 10 bit
› 60%: … × 10 bit
› 90%: … × 10 bit
[Plots: spot check of 1 channel (SAMPA 0, channel 0) and average over all channels after 100 µs, 200 µs, 500 µs and 800 µs; annotation: "buffer overflow"]
The simulation should have been run for a longer time period, but it was long enough to see the trends and verify the model.

2. Some Variation on the Input
[Figures: input pattern used (occupancy per 100 µs) and the resulting buffer usage]

3. Will the buffer usage stabilize?
› Neither test stabilized after a period of 30 × 100 µs
› 50% peaked at 3615 × 10 bit
› 70% peaked at 9126 × 10 bit
› This is as expected, and serves as a validation of the simulation model
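A buffer can only settle if, on average, it is drained at least as fast as it is filled. The fluid-queue sketch below illustrates this; the rates (100 words drained per window, 90 or 110 arriving) are made-up numbers, not the SAMPA's actual input or link rates.

```python
def simulate_buffer(arrivals_per_window, drain_per_window, capacity=None):
    """Fluid-queue sketch: return (fill level after each window, overflow flag).

    Each window adds `arrivals` words, clips at `capacity` (excess words are
    dropped), then removes up to `drain_per_window` words.
    """
    level, history, overflowed = 0, [], False
    for arrivals in arrivals_per_window:
        level += arrivals
        if capacity is not None and level > capacity:
            level, overflowed = capacity, True
        level = max(0, level - drain_per_window)
        history.append(level)
    return history, overflowed


# Hypothetical rates: 100 words drained per 100 us window, 30 windows
below, _ = simulate_buffer([90] * 30, 100)   # input slower than output
above, _ = simulate_buffer([110] * 30, 100)  # input faster than output
print(max(below), above[-1])  # 0 300: the second buffer keeps growing
```

Replacing the constant lists with a per-window occupancy pattern gives the kind of input variation used in scenario 2.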

CRU Model
Simple model:
› One FIFO per channel → 1920/2560 FIFOs (12/16 FECs)
› Assumes input data unsorted in time and geometry!
› Performs geometrical sorting and time sorting
Setup for the simulation:
› Samples generated by the random generator for each channel and each time window
› 30% occupancy = 30% probability
› On average 340 samples per 100 µs
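The simple CRU model (one FIFO per channel, data unsorted in geometry and time, sorted on read-out) can be sketched as below. This is a hypothetical illustration: channels are flat integer indices (the pad-to-channel mapping is not modelled), and a software sort stands in for whatever sorting logic the hardware would use.

```python
from collections import defaultdict, deque

CHANNELS_PER_FEC = 5 * 32  # 5 SAMPAs x 32 channels


def n_fifos(n_fecs):
    """One FIFO per channel: 12 FECs -> 1920, 16 FECs -> 2560."""
    return n_fecs * CHANNELS_PER_FEC


class CruSorter:
    """Per-channel FIFOs that accept data unsorted in geometry and time."""

    def __init__(self):
        self.fifos = defaultdict(deque)

    def push(self, channel, time_bin, adc):
        self.fifos[channel].append((time_bin, adc))

    def read_out(self):
        """Empty all FIFOs: geometrical sort (by channel), then time sort within a channel."""
        out = []
        for ch in sorted(self.fifos):
            out.extend((ch, t, adc) for t, adc in sorted(self.fifos[ch]))
            self.fifos[ch].clear()
        return out


cru = CruSorter()
for ch, t, adc in [(7, 2, 40), (3, 5, 11), (7, 1, 39), (3, 0, 10)]:
    cru.push(ch, t, adc)
print(cru.read_out())  # [(3, 0, 10), (3, 5, 11), (7, 1, 39), (7, 2, 40)]
```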

CRU – Buffer Estimation
[Figures: total memory usage on the CRU under these two conditions]

Further Work on the Simulation
Include a more realistic data source:
› real data/black events
› zero-suppressed/Huffman-encoded
Include a better model of the CRU
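As a first step towards a zero-suppressed data source, a generic threshold-based zero suppression can be sketched as below. This is not the SAMPA's actual algorithm or data format; the threshold and the (start time, samples) output layout are assumptions for illustration.

```python
def zero_suppress(adc_values, threshold):
    """Keep only runs of consecutive samples above `threshold`.

    Returns a list of (start_time, [adc, ...]) pairs. Generic sketch; the
    SAMPA's real zero-suppression format is not modelled.
    """
    out = []
    run_start, run = None, []
    for t, v in enumerate(adc_values):
        if v > threshold:
            if run_start is None:
                run_start = t
            run.append(v)
        elif run_start is not None:
            out.append((run_start, run))
            run_start, run = None, []
    if run_start is not None:  # close a run that reaches the end of the data
        out.append((run_start, run))
    return out


print(zero_suppress([0, 0, 5, 7, 3, 0, 0, 9, 0], 2))  # [(2, [5, 7, 3]), (7, [9])]
```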

CRU mock-up test – is it possible to build 2560 FIFOs in one FPGA?
Assumes no sorting at the FEC level.
Target device: Xilinx Virtex UltraScale XCVU095 (largest available)
Structure:
› 2600 × 18 Kb FIFOs
› Some «stupid» logic
The build took ~40 hours (on a weak laptop) and succeeded. Inspecting the design afterwards failed because the laptop ran out of memory.
We are pushing the limit of memory resources even on state-of-the-art FPGAs.
Conclusion:
› It would be better if some level of sorting could be done at the FEC level
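The memory footprint of the mock-up follows from simple arithmetic. The sketch also counts 36 Kb block-RAM equivalents under the assumption that two 18 Kb FIFOs can share one 36 Kb block RAM; it ignores any extra logic and routing cost.

```python
def fifo_memory_kbit(n_fifos, kbit_per_fifo=18):
    """Total FIFO memory in Kb."""
    return n_fifos * kbit_per_fifo


mock_up = fifo_memory_kbit(2600)          # 46800 Kb, about 46.8 Mb
one_per_channel = fifo_memory_kbit(2560)  # 46080 Kb, about 46 Mb
# Assumption: two 18 Kb FIFOs share one 36 Kb block RAM (round up)
bram36_equivalent = (2600 + 1) // 2       # 1300
print(mock_up, one_per_channel, bram36_equivalent)
```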

How can sorting at the FEC level be done? (1)
1. The pads and the FECs are distributed such that full pad rows belong to one CRU.
› The figure shows the new IROC pad plane: 63 pad rows, 5440 pads (- 64 pads), 34 FECs (2 partitions, 3 sectors), dx_min = 8.7 mm, dx_max = 13.5 mm
› For rp1, where we have 20 FECs, we can split the data from the 40 GBT links over 2 CRUs: the upper half of each FEC goes to one CRU and the lower half to the second CRU → a CRU with fewer than 40 GBT inputs can be used
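The rp1 split can be written down as a link-to-CRU mapping. The sketch assumes, hypothetically, that each of the 20 FECs has 2 GBT links (40 / 20), that link 0 carries the upper half of the FEC and link 1 the lower half, and that link k goes to CRU k.

```python
from collections import defaultdict


def split_links(n_fecs, links_per_fec=2):
    """Hypothetical rp1 mapping: link 0 (upper half of each FEC) -> CRU 0,
    link 1 (lower half) -> CRU 1. Returns {cru: [(fec, link), ...]}."""
    cru_inputs = defaultdict(list)
    for fec in range(n_fecs):
        for link in range(links_per_fec):
            cru_inputs[link].append((fec, link))
    return dict(cru_inputs)


m = split_links(20)
print({cru: len(links) for cru, links in m.items()})  # {0: 20, 1: 20}
```

Each CRU then receives 20 GBT inputs instead of 40.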

How can sorting at the FEC level be done? (2)
2. Design the FECs so that the pads are connected to the SAMPA inputs in a logical order. This might not give the easiest physical routing of signals.
The routing from the pad plane to the transfer points may not be symmetrical for all FECs, so a FEC may become dependent on its geographic position.
› The routing from the pad plane to the transfer points (Kapton connectors!) may not be the same for all FECs
› A FEC whose routing logically follows the pad row may end up with a geographic constraint
› Can there be an intermediate connector (with fully characterized parasitic R, L, C at high/low frequency, etc.)?
Idea of Attiq ur Rehman

How can sorting at the FEC level be done? (3)
3. By adding a programmable sorting matrix to the SAMPA.
› This needs a small amount of memory for the programmable routing table
› It gives us full freedom to match pads to pad rows regardless of the partition
[Figure: channels 1 … 32 → sorting matrix → 4 × e-links]
Idea of Attiq ur Rehman
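The sorting matrix amounts to applying a programmable permutation (the routing table) to the 32 channels before they go out on the e-links. In the sketch below, the table format and the assignment of contiguous output blocks to the 4 e-links are assumptions; channels are 0-indexed.

```python
def apply_routing(channel_samples, routing_table, n_elinks=4):
    """Reorder channels through a programmable routing table, then split them over e-links.

    routing_table[i] = input channel that is read out at output position i.
    Output positions go to e-links in contiguous blocks (an assumption).
    """
    n = len(channel_samples)
    if sorted(routing_table) != list(range(n)):
        raise ValueError("routing table must be a permutation of all channels")
    if n % n_elinks:
        raise ValueError("channels must divide evenly over the e-links")
    ordered = [channel_samples[src] for src in routing_table]
    per_link = n // n_elinks
    return [ordered[k * per_link:(k + 1) * per_link] for k in range(n_elinks)]


channels = [f"ch{i}" for i in range(32)]
# Hypothetical table: read the channels out in reverse order
links = apply_routing(channels, list(range(31, -1, -1)))
print(links[0])  # ['ch31', 'ch30', 'ch29', 'ch28', 'ch27', 'ch26', 'ch25', 'ch24']
```

The routing table is the only state, which is why the memory cost is small: 32 entries of 5 bits each would be enough for a full permutation.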

What does this mean for the CRU? (1)
All data on a GBT link will arrive sorted, but time-multiplexed. We decide the readout order so that all data for one pad row comes first, then the next pad row, and so on.
I.e., we would need as many FIFOs as there are pads in a row.
› 150 FIFOs
The depth of the FIFOs depends on the time-frame. To do parallel cluster finding, the data from all pads must be present.
› So if you multiplex M channels, each with a time-frame of N samples, you need to store N × M 10-bit words.
Idea of Torsten Alt/Attiq ur Rehman
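The read-out order and the N × M storage requirement can be sketched as follows. The function names are hypothetical; the frame lengths 1000, 250 and 100 are the examples from the "Further ideas" slide, and M = 150 is the number of pads in a row.

```python
def padrow_readout_order(pads_per_row, frame_len):
    """Yield (padrow, pad, time_bin): every pad of a row sends frame_len samples, then the next row."""
    for row, n_pads in enumerate(pads_per_row):
        for pad in range(n_pads):
            for t in range(frame_len):
                yield row, pad, t


def cru_storage_words(m_channels, n_frame):
    """N x M 10-bit words for one time-frame of M multiplexed channels."""
    return n_frame * m_channels


print(list(padrow_readout_order([2, 1], 2)))
# [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1)]
for n in (1000, 250, 100):
    print(n, cru_storage_words(150, n))  # 150000, 37500, 15000 words
```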

What does this mean for the CRU? (2)
The search for clusters is done on all pads of a pad row in parallel, in a search matrix.
Some key numbers from Torsten Alt's calculations (I don't go through them in detail here):
› Number of FIFO buffers = number of pads in a row = 150
› FIFO depth: minimum 3 full events → 3K 10-bit words; one Xilinx BRAM is 36 kb, so 1 BRAM block per FIFO → 150 BRAM blocks
› Minimum clock speed 230 MHz
› Minimum 7500 registers for the search matrix
› + extra logic for the calculations
This has not been simulated, and the simulation models must be adjusted for it!
Idea of Torsten Alt
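The BRAM count can be checked by raw bit counting: 3K × 10 bit = 30,720 bits, which fits in one 36 Kb block (36,864 bits). This sketch assumes 3K = 3072 words and ignores BRAM aspect ratios (the supported word widths), which could raise the real count for 10-bit words.

```python
import math

WORD_BITS = 10   # SAMPA sample width
BRAM_KBIT = 36   # one Xilinx block RAM, as quoted on the slide


def brams_needed(depth_words, n_fifos):
    """Raw bit count only: ignores BRAM aspect ratios, which may raise the real count."""
    bits_per_fifo = depth_words * WORD_BITS
    per_fifo = math.ceil(bits_per_fifo / (BRAM_KBIT * 1024))
    return per_fifo, per_fifo * n_fifos


per_fifo, total = brams_needed(3 * 1024, 150)
print(per_fifo, total)                 # 1 150
print(total * BRAM_KBIT / 1000, "Mb")  # 5.4 Mb
```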

Further ideas
The SAMPA needs to time-multiplex the data from its channels onto the GBT.
› However, how it does so should be up to us. Ideally it would be configurable: instead of a fixed time-frame size, the size could be programmable.
› So we can decide whether we want 1000 samples from channel 0, then 1000 samples from channel 1, etc.
› Or we could decide to take just 250, or 100, or any other number.
› This would allow us to fine-tune the required CRU buffers to a large degree.
› For the SAMPA, not much would change: the data is sampled and then stored in an internal buffer/memory.
With minimal overhead we could also solve the problem of "time-frame" borders: if the continuous data stream is chopped up into time-frames, we create artificial borders for the cluster finder.
› One easy solution would be to let the time-frames overlap.
Idea of Torsten Alt
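A programmable time-frame size means the SAMPA reads `frame_len` samples from each channel in turn before moving on to the next time-frame. In this sketch, `time_multiplex` and `frame_len` are hypothetical names, and the per-channel buffers are plain lists of equal length.

```python
def time_multiplex(channel_buffers, frame_len):
    """Yield (channel, samples): frame_len samples from channel 0, then channel 1, ...,
    then the next time-frame. frame_len is the programmable parameter."""
    n_samples = min(len(b) for b in channel_buffers)
    for start in range(0, n_samples, frame_len):
        for ch, buf in enumerate(channel_buffers):
            yield ch, buf[start:min(start + frame_len, n_samples)]


buffers = [list(range(10)), list(range(10, 20))]
print(list(time_multiplex(buffers, 4)))
# [(0, [0, 1, 2, 3]), (1, [10, 11, 12, 13]), (0, [4, 5, 6, 7]),
#  (1, [14, 15, 16, 17]), (0, [8, 9]), (1, [18, 19])]
```

A smaller `frame_len` shortens the stretch of data the CRU has to hold per channel before every pad of a row is present, which is where the buffer savings come from.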

Further ideas, cont.
Overlapping time-frames:
› This gives a bit of overhead, but not much.
› Example: we choose a time-frame of 250 samples. The SAMPA sends out the samples for all pads, time-multiplexed. Instead of sending the samples for the next time-frame, we send … → …% of the data is sent twice.
› This allows the cluster finder to avoid any border issues. It would create double clusters, but these can easily be identified and filtered out in software. Because of the overlap, we would not miss clusters.
This is easily realized in hardware, and it would give enormous flexibility at a very low price.
Idea of Torsten Alt
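The effect of overlapping time-frames on clusters at a frame border can be sketched in one dimension (time only). The overlap of 20 samples and the cluster positions are assumed values for illustration, and "a frame finds a cluster" is simplified to "the cluster lies fully inside the frame".

```python
def overlapping_frames(n_samples, frame_len, overlap):
    """Split [0, n_samples) into time-frames of frame_len samples.

    Each frame starts `overlap` samples before the previous one ends, so
    those samples are sent twice. Frames are half-open (start, end) ranges.
    """
    if not 0 <= overlap < frame_len:
        raise ValueError("need 0 <= overlap < frame_len")
    frames, start = [], 0
    while True:
        end = min(start + frame_len, n_samples)
        frames.append((start, end))
        if end == n_samples:
            return frames
        start = end - overlap


def clusters_found(frames, clusters):
    """A frame finds a cluster (t_first, t_last) only if it lies fully inside the frame."""
    return [c for (f_start, f_end) in frames for c in clusters
            if f_start <= c[0] and c[1] < f_end]


clusters = [(235, 245), (245, 255)]  # hypothetical; the second straddles sample 250
print(clusters_found(overlapping_frames(1000, 250, 0), clusters))  # [(235, 245)]
found = clusters_found(overlapping_frames(1000, 250, 20), clusters)
print(found)               # [(235, 245), (235, 245), (245, 255)]
print(sorted(set(found)))  # duplicates filtered in software: [(235, 245), (245, 255)]
```

Without overlap, the cluster across the border is lost. With the assumed overlap it is found, at the cost of a duplicate that software removes. With these numbers, 20 / 250 = 8% of the samples are sent twice.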

Conclusion
Simulations have so far shown that the current scheme works, given 30% occupancy and zero-suppressed data.
However, if no sorting is done on the FEC, the CRU implementation might be large and clumsy.
› It may be impossible to realize with commercially available solutions
› i.e. a custom hardware solution would be needed
Torsten Alt has devised an elegant scheme for buffering and 2D cluster finding that uses a minimum of resources on the CRU.
› This relies on the data from the FECs being sorted correctly
› Following Attiq ur Rehman's scheme and with careful design, this should be straightforward
› This solution adds no extra latency and needs only very few resources
However, this idea should be simulated as a sanity check of the estimates.

Answers to Questionnaire
FEC partitioning compatible with the cluster finder?
› Pads can be ordered inside the SAMPA by programmable routing, but the problem of sending incomplete pad rows to the 2 CRUs remains; it might be solvable at the FEC level (under discussion)
Plans for data compression in the SAMPA chip?
› Zero suppression and (alternatively) Huffman coding
Number of CRUs needed if using the PCIe form factor?
› As described in the TDR. This means 3 partitions per OROC.
› [Table 6.1]
› Assuming 24 GBTs per CRU, the O2 count is 324 (IROC1 = 2, IROC2 = 2, OROC1 = 2, OROC2 = 2, OROC3 = 1; sum = 9; 9 × 36 = 324)
› Assuming 32 GBTs per CRU, the O2 count is 288 (IROC1 = 1, IROC2 = 2, OROC1 = 2, OROC2 = 2, OROC3 = 1; sum = 8; 8 × 36 = 288)
What is the size of the CRU internal buffer needed for each GBT?
› Worst case: ~18 kb × 2560 = 46 Mb
› With SAMPA pad ordering and the 2D cluster finder: 36 kb × 150 = 5.4 Mb
Data size needed to represent a cluster in the upgraded TPC (pad, row, time, charge, etc.)?
› To represent a cluster for the current ROCs, 7 parameters and 77 bits in total are needed in uncompressed format. For the upgrade this will change if there are more pads per pad row and/or if better precision is required.
› The compression factor with the cluster finder is at minimum …; it can be increased to about 3 by optimizing the design
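The CRU counts and buffer estimates above can be reproduced as follows. The per-partition numbers are read here as CRUs per sector for each partition, and the factor 36 as the number of TPC sectors (18 per side).

```python
SECTORS = 36  # ALICE TPC: 18 sectors on each side

# CRUs per sector for each readout partition, copied from the answer above
crus_per_partition = {
    24: {"IROC1": 2, "IROC2": 2, "OROC1": 2, "OROC2": 2, "OROC3": 1},
    32: {"IROC1": 1, "IROC2": 2, "OROC1": 2, "OROC2": 2, "OROC3": 1},
}


def total_crus(gbts_per_cru):
    return sum(crus_per_partition[gbts_per_cru].values()) * SECTORS


print(total_crus(24), total_crus(32))  # 324 288

# Buffer per CRU: one 18 kb FIFO per channel vs. 150 x 36 kb with pad ordering
worst_kbit, ordered_kbit = 18 * 2560, 36 * 150
print(worst_kbit, ordered_kbit, round(worst_kbit / ordered_kbit, 1))  # 46080 5400 8.5
```

Sorting on the FEC side therefore shrinks the estimated CRU buffer by a factor of about 8.5.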