Published by Crystal Singleton. Modified over 9 years ago.
1
LVDS switch: failure tolerance in 2D torus topology
Gloria Torralba
Kirchhoff-Institut für Physik, University of Heidelberg
Dept. of Electronic Engineering, University of Valencia
L-1 review, Lausanne, 14 June 2002
2
Contents
- Specifications of the current architecture of the L-1 trigger of LHCb
- The ring topology is not failure tolerant
- Possible solution: an LVDS switch that bypasses the SCI signals when the node is out of order
- Scheme of the proposed switch:
  - Node activity checker module
  - De-Skew module
3
Introduction
The future LHCb experiment will look at proton-proton interactions occurring at a rate of 40 MHz, producing a total data rate of about 40 TByte/sec.
The architecture of the Level-1 trigger is a network farm of about 250 CPUs. It has an input rate of 1.17 MHz and performs pattern recognition on an input data stream of 4 GByte/sec.
It has been shown that the system is capable of sending data blocks of less than 200 Byte at a rate of more than 1 MHz.
Level-1 trigger for the future LHCb experiment at CERN
4
Architecture
[Figure: Level-1 data flow. Labels: From ODE, RUs, Trigger Processor Farm, Decision Unit, ~4 GByte/sec, 1.17 MHz, < 200 Byte subevent.]
1. Synchronous input rate of 1.17 MHz (< 200 Byte/subevent, about 20 sources).
2. Asynchronous output rate of 1.17 MHz (128 bit/event).
3. Maximum latency for an event < 2 ms.
4. Scalability: the total bandwidth and the number of required computer nodes may increase.
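The figures on this slide can be cross-checked with a little arithmetic. The sketch below is only an illustration: it treats the 200 Byte limit as the actual subevent size and takes "about 20 sources" as exactly 20, so the result is an upper bound, not a measured value. The function names are made up for this example.

```python
# Numbers quoted on the slide. The subevent size is an upper bound
# ("< 200 Byte"), and 20 sources is an approximation ("about 20").
EVENT_RATE_HZ = 1.17e6
MAX_SUBEVENT_BYTES = 200
NUM_SOURCES = 20
RESULT_BITS_PER_EVENT = 128


def input_bandwidth(rate_hz, subevent_bytes, sources):
    """Aggregate input bandwidth in bytes per second."""
    return rate_hz * subevent_bytes * sources


def output_bandwidth(rate_hz, result_bits):
    """Output bandwidth towards the decision unit in bytes per second."""
    return rate_hz * result_bits / 8


peak_in = input_bandwidth(EVENT_RATE_HZ, MAX_SUBEVENT_BYTES, NUM_SOURCES)
peak_out = output_bandwidth(EVENT_RATE_HZ, RESULT_BITS_PER_EVENT)
print(f"input  <= {peak_in / 1e9:.2f} GByte/s")
print(f"output  = {peak_out / 1e6:.2f} MByte/s")
```

The input bound comes out slightly above the quoted ~4 GByte/sec, which is consistent with subevents being smaller than 200 Byte on average.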
5
Processor farm: 2D torus topology
[Figure: 2D torus of source nodes, compute nodes, result nodes and a scheduler. Labels: SCI, TAGNet, detector data links.]
SCI (IEEE Std. 1596-1992), used as the interconnect, guarantees low latency and high throughput.
The ring topology is not failure tolerant.
6
SCI (IEEE Std. 1596-1992) protocol
Transmission protocol: SEND-ECHO.
A normal transaction consists of two sub-actions, a request sub-action and a response sub-action. Each sub-action consists of a send packet and an echo packet returned to its sender.
A node connected to an SCI interconnect may send many requests (up to 64) before a response is received. This transaction pipelining can cause responses to be returned out of order, so a sequence number is needed to match each response with the corresponding request (the sequence number in the control word is a label that identifies a packet).
[Figure: requester and responder. Request sub-action: request SEND (source to target), request ECHO. Response sub-action: response SEND (source to target), response ECHO.]
SCI used as interconnection
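The matching of out-of-order responses by sequence number can be sketched as follows. This is a toy model and not the SCI implementation: the class `Requester`, its methods, and the policy for choosing a free sequence number are assumptions made for illustration. Only the limit of 64 outstanding requests comes from the slide.

```python
MAX_OUTSTANDING = 64  # limit quoted on the slide


class Requester:
    """Toy requester that tracks outstanding requests by sequence number."""

    def __init__(self):
        self.pending = {}   # sequence number -> request payload
        self.next_seq = 0   # next candidate sequence number

    def send_request(self, payload):
        """Register a request and return the sequence number labelling it."""
        if len(self.pending) >= MAX_OUTSTANDING:
            raise RuntimeError("all sequence numbers are in use")
        # Skip sequence numbers that still wait for a response.
        while self.next_seq in self.pending:
            self.next_seq = (self.next_seq + 1) % MAX_OUTSTANDING
        seq = self.next_seq
        self.pending[seq] = payload
        self.next_seq = (seq + 1) % MAX_OUTSTANDING
        return seq

    def receive_response(self, seq, result):
        """Pair a response with its request, whatever the arrival order."""
        request = self.pending.pop(seq)  # KeyError for an unknown label
        return request, result


requester = Requester()
first = requester.send_request("read A")
second = requester.send_request("read B")
# The response to the second request arrives before the first one.
print(requester.receive_response(second, "data B"))
print(requester.receive_response(first, "data A"))
```

Keeping the pending requests in a dictionary keyed by sequence number makes the arrival order of the responses irrelevant.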
7
SCI (IEEE Std. 1596-1992) signaling
[Figure: SCI signal timing. Lines: CLOCK (4 ns, 250 MHz), FLAG, DATA (16 bits, 500 MByte/sec). Request or response packet: idle, targetID, flow control + command, sourceID, time of death + transactionID, address, data (0, 16, 64, 128 or 256 bytes), CRC, idle. Echo packet: idle, target, source, command + flow, CRC, idle.]
Point-to-point unidirectional communication from node to node.
Symbols of 16 bits. Header (7 symbols) + CRC (1 symbol).
The target symbol and the address symbols define the 64-bit SCI address.
The data part may be absent (0 bytes) or contain from 16 to 256 bytes.
Transmission starts by putting the target word onto the output and setting the output flag high. The output flag stays high while the packet is being transmitted. A CRC is appended to the end of the packet when the output flag goes low.
8
Hardware (Dolphin 66/64 SCI cards)
The processor farm uses Dolphin D339 adapter cards: a PSB66 PCI bus interface and an SCI Link Controller 3 (LC3), connected via the B-Link bus.
When there is no traffic, a node receives idle symbols. The bypass FIFO is empty and the output consists of idle symbols only.
When a node receives a packet, it checks the packet's destination (target ID):
- Packets addressed to other nodes are routed to the bypass FIFO and transmitted onward.
- A packet addressed to this node is routed to the input FIFO (RX). The packet's header information is also used to generate a short "echo" packet, which is routed to the bypass FIFO, to be received by the packet's sender.
[Figure: D339 card block diagram. Labels: PCI 64/66, PSB66, SCI LC3™, RX and TX FIFOs, ATT, SCI in, SCI out, 16 buffer streams of 128 bytes for WRITE, 16 buffer streams of 128 bytes for READ, Interface SCI-PCI or SCI linc™.]
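The routing decision described on this slide can be sketched in a few lines. This is a simplified model under stated assumptions: the types `Packet` and `Echo`, the class `LinkNode`, and the choice to simply consume an echo at its destination are illustrative and do not describe the LC3 internals.

```python
from collections import deque, namedtuple

# Hypothetical packet types; real SCI packets carry more header fields.
Packet = namedtuple("Packet", "target_id source_id payload")
Echo = namedtuple("Echo", "target_id source_id")


class LinkNode:
    """Toy model of the per-node routing between RX FIFO and bypass FIFO."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.rx_fifo = deque()      # packets accepted by this node
        self.bypass_fifo = deque()  # packets and echoes sent onward

    def receive(self, packet):
        if packet.target_id != self.node_id:
            # Not for this node: pass it on along the ring.
            self.bypass_fifo.append(packet)
            return
        if isinstance(packet, Echo):
            # An echo that reached the original sender is consumed here
            # (simplification made for this sketch).
            return
        # Addressed to this node: accept it and answer with an echo.
        self.rx_fifo.append(packet)
        self.bypass_fifo.append(
            Echo(target_id=packet.source_id, source_id=self.node_id)
        )


node = LinkNode(node_id=3)
node.receive(Packet(target_id=5, source_id=1, payload=b"pass through"))
node.receive(Packet(target_id=3, source_id=1, payload=b"for me"))
print(list(node.rx_fifo))
print(list(node.bypass_fifo))
```

A node that is switched off performs none of these steps, which is why traffic on its ring stops.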
9
But
The 2D torus topology used in the processor farm is not failure tolerant. The failure of one processor unit (PU) makes the row and the column containing the faulty PU unavailable.
Nevertheless, it has been shown that this configuration has less overhead, is less expensive, and meets the total bandwidth and latency requirements of the LHCb application (work of A. Walsch).
Failure tolerance can be achieved in the 2D torus farm by inserting external hardware between the SCI ring and the Dolphin SCI PCI cards. We have called it the LVDS switch.
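To give a feeling for the cost of a single failure, the sketch below counts the nodes that sit on a ring broken by one faulty PU. The 16 x 16 grid is an assumption chosen to match a farm of roughly 256 nodes, and `nodes_cut_off` is a name made up for this example.

```python
def nodes_cut_off(rows, cols, failed):
    """Nodes on the row ring or the column ring that contains the failed node.

    The failed node itself is included in the result.
    """
    failed_row, failed_col = failed
    return {
        (r, c)
        for r in range(rows)
        for c in range(cols)
        if r == failed_row or c == failed_col
    }


affected = nodes_cut_off(rows=16, cols=16, failed=(3, 5))
print(f"{len(affected)} of {16 * 16} nodes sit on a broken ring")
```

With a bypass switch in front of every node, the same failure would remove only the faulty node from the farm.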
10
LVDS switch
Failure tolerance in 2D torus topology: switch
How to know that the node is off? It could be:
- PCI off
- PCI-SCI board off
It does not matter which:
- There is no SCI output (is checking output activity enough?).
- Check the flag signal or check the clock signal?
[Figure: SCI signaling. Clock (4 ns, 250 MHz), flag, 16 data bits, 500 MByte/sec. Packet fields: targetID, flow control + command, sourceID, time of death + transactionID, address, data (0, 16, 64, 128 or 256 bytes), CRC, idle.]
[Figure: node of the processor farm with the D339 SCI card and the proposed switch attached.]
11
Proposed LVDS switch
[Figure: Dolphin SCI PCI adapter card (64/66 PCI bus, LC3 connector: 16 bits + clock + flag, LVDS) with the LVDS switch attached. Switch blocks: node activity checker (time_check, t = 10 µs, 1 ms, 10 ms timeout), bypass path with de-skew delay module (adjustable delay), and a MUX with inputs S0 and S1, driven by select, feeding SCI out. Only the flag signals of the SCI input and output are represented.]
The switch is externally powered. It has to run reliably.
It bypasses all the information arriving at the node if the node fails.
The simplest switch is a multiplexer selected by the node activity:

    process node_activity (time_check)
    begin
        -- check S0 after the timeout
        if S0[flag] = '1' then   -- node ON
            select <= '0';       -- S0 => SCI OUT
        else                     -- node OFF
            select <= '1';       -- S1 => SCI OUT
        end if;
    end process;
12
Node activity checker module
This module has to detect whether the node is ON or OFF.
The simplest check is to analyze SCI OUT during t = time_check.
- If the node is OFF (because of PCI off, SCI link off, etc.), it will produce no output signal.
Another possibility is to check the input FIFO (RX) of the Link Controller.
- If the node is OFF, this memory will be full and saturated ("deadlock situation"). This option requires implementing the SCI protocol, which is more complicated.
Can node activity be checked in time_check = 10 µs, 1 ms or 10 ms? (The latency is < 2 ms and the CPU time is 0.5 ms, so 1 ms could be enough.)
This module will also implement the control of the LVDS switch.
Components of the proposed switch
13
Node activity checker module
Should the timeout be fixed?
- Time to receive data, process data (accept or re-send), send echo packet.
- More?
Active flag signal for several data sizes:
DATA (bytes)   Symbols x clk   Flag time
0              8 x 4 ns        32 ns
16             16 x 4 ns       64 ns
64             40 x 4 ns       160 ns
128            72 x 4 ns       288 ns
256            136 x 4 ns      544 ns
echo           4 x 4 ns        16 ns
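The flag times in the table follow from the packet layout: 8 overhead symbols (7 header + 1 CRC) plus one 16-bit symbol per 2 data bytes, at 4 ns per symbol. The sketch below reproduces them. The function names are illustrative, and the last function is an added back-of-the-envelope comparison with a candidate time_check window, not a figure from the slides.

```python
SYMBOL_BYTES = 2        # 16-bit symbols
SYMBOL_TIME_NS = 4      # one symbol per 4 ns clock period, as in the table
OVERHEAD_SYMBOLS = 8    # 7 header symbols + 1 CRC symbol
ECHO_SYMBOLS = 4        # echo packet length given in the table


def flag_time_ns(data_bytes):
    """Time the flag stays high for a packet carrying data_bytes of data."""
    symbols = OVERHEAD_SYMBOLS + data_bytes // SYMBOL_BYTES
    return symbols * SYMBOL_TIME_NS


def echo_flag_time_ns():
    """Time the flag stays high for an echo packet."""
    return ECHO_SYMBOLS * SYMBOL_TIME_NS


def packets_per_window(window_ns, data_bytes):
    """How many back-to-back packets of this size fit in one check window."""
    return window_ns // flag_time_ns(data_bytes)


for size in (0, 16, 64, 128, 256):
    print(f"{size:3d} bytes: {flag_time_ns(size)} ns")
print(f"echo: {echo_flag_time_ns()} ns")
```

Even the longest packet keeps the flag high for well under a microsecond, so any of the candidate time_check values spans many packet times.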
14
De-Skew Module
Parallel signals transmitted simultaneously will not reach the receiving node at the same time because of the capacitive load of the material they cross, even more so when the distance between nodes is > 2 meters, as in the ~256-node SCI 2D torus topology.
[Figure: signal path PCB, connector, cable, connector, PCB + package. 4 ns (80 cm cable).]
The maximum skew allowed in the standard is 600 ps, since the bit width is nominally 2 ns.
This problem is normally handled by de-skewing circuitry at the receiving SCI link, but not in the Dolphin chips.
The proposed de-skew module will match the signal transitions by adding an individual delay to each input signal. Granularity: 500 ps.
[Figure: DE-SKEW module. Inputs: SCI input [17:0], delay [max:0]. Output: SCI input delayed [17:0].]
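One way to picture the de-skew idea is to delay every line until it lines up with the latest one, in steps of the available granularity. The sketch below is an illustration only: the arrival times are invented, four lines are used instead of 18 for brevity, and rounding to the nearest step is an assumption about how the delays would be chosen.

```python
GRANULARITY_PS = 500  # delay step quoted on the slide


def deskew_steps(arrival_ps, granularity_ps=GRANULARITY_PS):
    """Delay steps per line that bring each line close to the latest one."""
    latest = max(arrival_ps)
    # Integer rounding to the nearest step (ties round up).
    return [
        (latest - t + granularity_ps // 2) // granularity_ps
        for t in arrival_ps
    ]


def residual_skew_ps(arrival_ps, steps, granularity_ps=GRANULARITY_PS):
    """Skew that remains after the per-line delays are applied."""
    aligned = [t + s * granularity_ps for t, s in zip(arrival_ps, steps)]
    return max(aligned) - min(aligned)


# Invented arrival times in picoseconds for four of the parallel lines.
arrivals = [0, 300, 900, 1400]
steps = deskew_steps(arrivals)
print("delay steps:", steps)
print("skew before:", max(arrivals) - min(arrivals), "ps")
print("skew after: ", residual_skew_ps(arrivals, steps), "ps")
```

With nearest-step rounding the remaining skew cannot exceed one granularity step, so a 500 ps step fits inside the 600 ps allowance quoted above, with little margin.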
15
State diagram
ON state: waiting for a trigger signal (time_check, events coming, etc.) to begin the check.
OFF state: entered when it has been detected that the node is OFF.
[Figure: node activity checker driving select. SCI out in normal mode or bypass mode. The switch avoids glitches on SCI out. Multiplexer inputs A0 and A1, output Z.]
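The two states and the multiplexer can be modelled as a small state machine. This is a behavioural sketch and not the hardware design: the names `NodeState`, `ActivityChecker` and `mux` are invented, the check looks only at flag samples collected during one time_check window, and returning to ON as soon as activity reappears is an assumption.

```python
from enum import Enum


class NodeState(Enum):
    ON = 0    # normal mode: the node output drives SCI out
    OFF = 1   # bypass mode: SCI in is forwarded to SCI out


class ActivityChecker:
    """Toy node activity checker evaluated once per time_check window."""

    def __init__(self):
        self.state = NodeState.ON

    def check(self, flag_samples):
        """Update the state from the flag samples of one window.

        Returns the select value for the multiplexer (0 = S0, 1 = S1).
        """
        active = any(flag_samples)
        self.state = NodeState.ON if active else NodeState.OFF
        return self.state.value


def mux(select, s0, s1):
    """Two-input multiplexer: S0 is the node output, S1 the bypass path."""
    return s1 if select else s0


checker = ActivityChecker()
select = checker.check([0, 1, 1, 0])   # flag seen: node is ON
print(checker.state, mux(select, "node output", "bypassed SCI in"))
select = checker.check([0, 0, 0, 0])   # no flag in the window: node is OFF
print(checker.state, mux(select, "node output", "bypassed SCI in"))
```

A flag-only check would also declare an idle but healthy node OFF if no packet passes during the window, which is one reason the slides ask whether the clock signal should be checked instead.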
16
Others
a. What happens if the node simply doesn't work correctly?
   1) There are output signals (flag, clk, data) but they are corrupted.
   2) First, the error rate of the SCI link should be established.
   3) Is it enough to check electrical signals?
b. What happens to data directed to a node which is OFF? (TargetID = NodeID; the data keep crossing the ring and no node will accept them.)
   1) Loss of data (after the time of death).
c. About the technology:
   1) The design is expected to have very critical path requirements because of the de-skew process.
   2) In order to test process granularity, a simple version of the De-Skew Module (with no programmable delays) is being simulated and synthesized in several processes (AMS technology 0.8, 0.6 and 0.35 microns).
17
Others
d. Some preliminary results of granularity in several processes:
                    0.8 µm    0.6 µm    0.35 µm
(Delay/unit) LH     540 ps    280 ps    220 ps
(Delay/unit) HL     570 ps    300 ps    220 ps
For a bit width of 3 ns (SCI is running at 166 MHz in the processor farm), the minimum delay that could be added is 300 ps.
G. Torralba. L-1 review, Lausanne 14 June 2002
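The preliminary delay figures can be related to the 3 ns bit width by asking how many delay units fit into one bit period for each process. The sketch below is only a reading aid: taking the slower of the two transitions (LH, HL) as the effective step is an assumption, and the names are invented.

```python
# Delay per unit in picoseconds, (LH, HL), copied from the table.
DELAY_PER_UNIT_PS = {
    "0.8 um": (540, 570),
    "0.6 um": (280, 300),
    "0.35 um": (220, 220),
}
BIT_WIDTH_PS = 3000  # 3 ns bit width quoted for the processor farm


def worst_case_step_ps(process):
    """Slower of the two transitions, used here as the effective step."""
    return max(DELAY_PER_UNIT_PS[process])


def steps_per_bit(process, bit_width_ps=BIT_WIDTH_PS):
    """Whole delay units that fit into one bit period."""
    return bit_width_ps // worst_case_step_ps(process)


for process in DELAY_PER_UNIT_PS:
    print(f"{process}: {worst_case_step_ps(process)} ps step, "
          f"{steps_per_bit(process)} steps per bit")
```

A finer process gives more steps per bit and therefore a finer alignment of the parallel signals.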