Proposal for a “Switchless” Level-1 Trigger Architecture. Jinyuan Wu, Mike Wang. June 2004.

L1 Block Diagram (baseline). Pixel data flow through the Time Stamp Ordering (TSO), Pixel Pre-processor (PP) and Segment Tracker (ST) stages, then through the L1 Switch to the Buffer Manager (BM)/CPU farm nodes. Processing steps: time stamp ordering, cluster processing, raw data to L1B, data sharing, segment finding, event building, track & vertex processing. Raw data, triplets, and tracks & vertices are sent to the L1B servers; trigger primitives go to the GL1 node.

Pentlet Finding: An Example of a Possible Algorithm. Use 4 bit arrays (spanning planes A through E). There are 3 constraints in total; more constraints help to eliminate fake tracks. It is possible to use bit-wise majority logic (such as 3-out-of-4) to accommodate detector inefficiency.
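A minimal sketch of the bit-wise 3-out-of-4 majority logic mentioned above, assuming the four bit arrays each flag candidate roads passing one test; the names and the 8-bit width are illustrative, not the actual firmware interface:

```python
# Hedged sketch: bit-wise 3-out-of-4 majority logic for pentlet finding.
# Each integer is a bit array with one bit per candidate road.

def majority_3_of_4(a: int, b: int, c: int, d: int) -> int:
    """Return a mask with 1s wherever at least 3 of the 4 constraint bits are set."""
    return (a & b & c) | (a & b & d) | (a & c & d) | (b & c & d)

# Hypothetical constraint masks derived from planes A-E (one bit per candidate road).
c_ab = 0b10110110
c_bc = 0b10110100
c_cd = 0b00110110
c_de = 0b10110111

# Roads set in at least 3 of the 4 arrays survive, tolerating one missing hit.
print(f"{majority_3_of_4(c_ab, c_bc, c_cd, c_de):08b}")
```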

Must Take Care of Everything:
– Event building (an easy part).
– Rescaling flexibility.
– To L1B: raw data & cluster indices; triplets; tracks & vertices.
– Interface with COT farm nodes.
– To GL1.
– Types of modules to build or buy: the fewer, the better.
– Types of links: the fewer, the better.
– Anything else?

Gang Things Together & Reroute Cables. Events are built during TSO & PP processing. The L1 Switch is gone; its functions are absorbed into the TSO & PP stages. Triple links are not needed. The L1B holds raw data, triplets, etc. (Diagram: the baseline block diagram with the switch removed, ST and L1B combined into ST/L1B modules, and cables rerouted.)

System Interconnection. (Diagram: Time Stamp Ordering Module, Pixel Pre-processor Module, Segment Tracker & L1B Module, Buffer Manager Module, L1B Server PC, Worker Farm Node.)

TSO & PP: Event Building.
– Time Stamp Ordering Module: 12 fibers in, 16 outputs (4 pairs/output).
– Pixel Pre-processor Module: 10 inputs, 8 outputs (4 pairs/IO).
– Each highway contains 10 Time Stamp Ordering Modules (receiving 10 cables, 12 fibers/cable, 120 fibers in total).
– Each TSO module combines 12 fibers and sends data to its 16 outputs based on BCO.
– A Pixel Pre-processor reads 10 inputs from 10 TSO modules, combines them, and sends data to its 8 outputs based on BCO.
– Each PP output contains whole-detector (120-fiber) information, but only 1/1024 of the BCOs (1/128 within its highway).
(Diagram: the BCO number split into highway, TSO-output and PP-output bit fields.)
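A minimal sketch of the BCO-based routing above (a toy Python model; which BCO bits select the TSO output and which select the PP output is an assumption here, as are all names):

```python
# Hedged sketch of BCO-based event-building routing through the TSO and PP stages.
# With 16 TSO outputs and 8 PP outputs, successive BCO bit fields select the
# destination, so all fragments of one BCO converge on the same PP output.

def tso_output(bco: int) -> int:
    """Select one of the 16 TSO outputs from the BCO number (illustrative bit field)."""
    return bco % 16

def pp_output(bco: int) -> int:
    """Select one of the 8 PP outputs from the next BCO bit field (illustrative)."""
    return (bco // 16) % 8

# Every fiber carrying hits for this BCO ends up on the same TSO and PP outputs,
# so the event is assembled without a central switch; each PP output sees
# 1/(16*8) = 1/128 of the BCOs within its highway.
bco = 0x3A7
print(tso_output(bco), pp_output(bco))
```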

Questions:
Will the switch functions increase the cost of TSO & PP?
– The data rate at TSO & PP (raw data + cluster indices) is indeed several times higher than after the ST (triplets).
– Functional blocks have been compiled and simulated for Altera Cyclone devices; see Doc # 2907.
– The switch functions can be absorbed into TSO & PP without a noticeable cost increase.
Is it a problem for the FPGA to receive many independent serial data sources?
– It is not a problem.
– Functional blocks have been compiled and simulated for Altera Cyclone devices.

Merging Switch & User Functions. (Diagram: the switching fabric is distributed across the user-function blocks rather than kept as a separate stage.)

Segment Tracker, L1B, etc. (Diagram: Segment Tracker & L1B Module, Buffer Manager Module, L1B Server PC, Worker Farm Node.) An ST/L1B module receives two inputs from the PP, each containing whole-detector information. An L1B Server PC hosts 4 ST/L1B modules. Triplets are sent through the BM to the Worker Farm Nodes. See the next page for detailed operations.

ST/L1B and BM Interconnection. (Diagram: four ST 2-in 2-out blocks, each with SDRAM, SRAM-ZBT 128K x 32 and L1B logic, connected to a BM with hash sorter, SDRAM and SRAM-ZBT 128K x 32.)
(1) Raw data and cluster (x,y) are input.
(2) Raw data and cluster (x,y) are stored in SDRAM.
(3) Triplets are produced in the ST and stored in SDRAM.
(4) Triplets are sent to the BM and Worker Node.
(5) The Worker Node finds tracks, vertices, etc.
(6) Tracks, vertices, etc. are sent back to L1B.
(7) Raw data, cluster (x,y), triplets, tracks, vertices, etc. are read out from L1B.
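As a rough illustration of steps (1)-(7), a toy Python model of a per-BCO L1B record follows; the dictionary layout and function names are invented for this sketch, while the real buffers live in SDRAM on the ST/L1B modules:

```python
# Hedged sketch of the per-BCO L1 buffer (L1B) flow described in steps (1)-(7).
from collections import defaultdict

l1b = defaultdict(dict)  # BCO number -> buffered records for that crossing

def store_raw(bco, raw_hits, clusters):
    """(1)-(2): buffer raw data and cluster (x, y) for this BCO."""
    l1b[bco]["raw"] = raw_hits
    l1b[bco]["clusters"] = clusters

def store_triplets(bco, triplets):
    """(3)-(4): buffer triplets before they are forwarded to the BM/worker node."""
    l1b[bco]["triplets"] = triplets

def store_results(bco, tracks, vertices):
    """(6): worker-node results are written back to the same BCO record."""
    l1b[bco]["tracks"] = tracks
    l1b[bco]["vertices"] = vertices

def readout(bco):
    """(7): on a Level-1 accept, everything buffered for this BCO is read out together."""
    return l1b.pop(bco, None)
```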

A Possible GL1 Interconnection. (Diagram: ST/L1B modules and farm nodes feeding the GL1 node through a BM module.)
(1) Trigger primitives for GL1 are sent out by the Worker Nodes.
(2) The TSO modules can be used as concentrators.
(3) The same BM module design is used as the GL1 interface.

Building Blocks: Option 1. (Diagram: Time Stamp Ordering Module, Pixel Pre-processor Module, Segment Tracker & L1B Module, Buffer Manager Module.)

Building Blocks: Option 2. (Diagram: Time Stamp Ordering Module, Pixel Pre-processor Module, Segment Tracker & L1B Module, Buffer Manager Module.)

(Diagram: Time Stamp Ordering Module and Pixel Pre-processor Module.)

Rescaling Available Processing Power.
Original baseline:
– Switch located after the segment tracker hardware.
– Primarily so that event building could be accomplished (routing packets belonging to the same BCO to one CPU).
– With this design, it was also easy to rescale the processing power of the track/vertex farm by adding or subtracting nodes, to suit processing requirements and to accommodate a fault-tolerant design.
Proposed “switchless” design:
– Also allows rescaling of the track/vertex farm.
– In addition, even the pixel pre-processors (PP) and segment trackers (ST) can easily be rescaled.

Rescaling: Pixel Pre-processor. Pixel Pre-processors can be added or removed one at a time. A minimum system at the early commissioning stage needs only 1 PP module. Broken PP cards during operation can be tolerated: simply reroute their data to other outputs on the TSO modules. The ST and farm nodes in later stages scale accordingly.

Rescaling: Relative Numbers of Modules. In addition to rescaling the PP modules, the relative numbers of PP, ST and BM modules can be adjusted. Numbers of I/O:
– PP module: 8 outputs.
– ST module: 2 inputs, up to 3 outputs.
– BM module: up to 4 inputs.
Different interconnection configurations allow different ratios of module counts. (Diagram: ST 2-in 2-out block with SDRAM, SRAM-ZBT 128K x 32 and L1B logic, connected to a BM with hash sorter, SDRAM and SRAM-ZBT 128K x 32.)

Rescaling: Relative Numbers of ST & Nodes. The possible BM/ST ratios: 1/4, 2/4, 3/4, 4/4, 6/4, 8/4, 12/4. (Diagram: two example configurations, 64 ST with 32 nodes and 64 ST with 48 nodes.)
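A small enumeration sketch, assuming a regular wiring pattern in which every ST uses the same number of its outputs and every BM the same number of its inputs (this assumption and all names are illustrative), reproduces the listed ratios from the I/O counts on the previous slide, along with a couple of extra ratios such as 1/3 that the slide does not use:

```python
# Hedged sketch: enumerate BM-to-ST ratios reachable from the stated I/O counts
# (ST: up to 3 outputs toward BMs, BM: up to 4 inputs), assuming a regular tiling
# where every ST drives the same number of BMs and every BM reads the same number of STs.
from fractions import Fraction

ratios = sorted({
    Fraction(st_outputs, bm_inputs)   # link count = N_ST*st_outputs = N_BM*bm_inputs,
    for st_outputs in (1, 2, 3)       # so N_BM/N_ST = st_outputs/bm_inputs
    for bm_inputs in (1, 2, 3, 4)
})
print(ratios)  # includes 1/4, 1/2, 3/4, 1, 3/2, 2, 3 (i.e. 1/4 ... 12/4)
```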

Rescaling: If One Node Breaks. If one farm node breaks, its data should be rerouted to other nodes, and its load shared among them. Both “push-only” and “request-read” schemes are possible. (Diagram: when one node is broken, the ST/L1B modules feeding it reroute their data, and the neighboring nodes run with 25% additional load.)
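A minimal sketch of such rerouting in a static “push-only” style; the table sizes, names, and round-robin policy are illustrative assumptions, not the actual BTeV scheme:

```python
# Hedged sketch: a reroute table maps each ST output to a worker node; when a node
# is flagged broken, its traffic is spread round-robin over the surviving nodes.

def build_routes(st_outputs, nodes, broken=frozenset()):
    """Return {st_output: node}, skipping broken nodes and sharing their load."""
    alive = [n for n in nodes if n not in broken]
    return {st: alive[i % len(alive)] for i, st in enumerate(st_outputs)}

st_outputs = [f"ST{i}" for i in range(8)]
nodes = [f"node{i}" for i in range(4)]

print(build_routes(st_outputs, nodes))                    # normal running
print(build_routes(st_outputs, nodes, broken={"node2"}))  # node2's share redistributed
```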

Conclusion:
– Event building: more algorithms are supported.
– Rescaling: extra flexibility is supported.
– To L1B: everything is sent to one location (raw data & cluster indices, triplets, tracks & vertices).
– Interface with COT nodes: PCI(-Express).
– To GL1: dedicated connections are possible.
– Types of modules: build 2, buy 2 (types of PC).
– Types of links: 1 (FiniteBand?).
– Anything else?
– Happiness of everyone.

Rescaling: Using Cable Bundles. Each cable has 4 pairs. Use cable bundles to achieve extra routing flexibility: 4x4 single-directional, or 2x2 bi-directional.

Time Stamp Ordering Module (board layout): input from PDCB, 12 fibers; 16 outputs to the Pixel Pre-processor, 4 pairs each; control & monitor port; free format or Euro Card/VME format.

Pixel Pre-processor Module (board layout): 10 inputs from TSO, 4 pairs each; 8 outputs to Segment Tracker & L1B, 4 pairs each; control & monitor port; free format or Euro Card/VME format.

Segment Tracker/L1B Module (block diagram): ST 2-in 2-out, SDRAM, SRAM-ZBT 128K x 32, L1B logic.

Buffer Manager Module (block diagram): BM hash sorter, SDRAM, SRAM-ZBT 128K x 32.