The CMS Event Builder Demonstrator based on Myrinet
Frans Meijers, CERN/EP, on behalf of the CMS DAQ group
CHEP 2000, Padova, Italy, Feb 2000

Outline
- Introduction
- Myrinet Overview
- Tests of the Switching Fabric
- Event Building Studies
- Future Work and Conclusions

Introduction
- DAQ architecture and EVB parameters
- Event building by switches: crossbar
- EVB traffic shaping: barrel shifter
- Banyan network
- A multistage 1024 port switch
- The CMS DAQ system

DAQ architecture and EVB parameters
[Figure: DAQ architecture. Blocks: Detector Front-end, Level 1 Trigger, Readout Systems, Event Manager, Builder Networks, Builder and Filter Systems, Run Control, Computing Services]
- Level-1 maximum trigger rate: 100 kHz
- Average event size: 1 Mbyte
- Builder network (512x512 port) aggregate throughput: 1 Tbps
- Number of Readout Units: [...]
- Average event fragment size: [...] kbyte
- High Level Trigger acceptance: [...] %

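The aggregate throughput follows from the trigger rate and the event size. A minimal sketch of that arithmetic, assuming 1 Mbyte means 10^6 bytes (the slide does not say which convention is used); the function name is illustrative only:

```python
# Back-of-envelope check of the builder network throughput.
# Assumption: 1 Mbyte is taken as 10**6 bytes.
LEVEL1_RATE_HZ = 100_000        # maximum Level-1 trigger rate (from the slide)
EVENT_SIZE_BYTES = 1_000_000    # average event size (from the slide)


def aggregate_throughput_tbps(rate_hz, event_size_bytes):
    """Aggregate event builder throughput in terabits per second."""
    return rate_hz * event_size_bytes * 8 / 1e12


print(aggregate_throughput_tbps(LEVEL1_RATE_HZ, EVENT_SIZE_BYTES))
```

Under this assumption the result is 0.8 Tbps, which the slide rounds to 1 Tbps.
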
Event building by switches: crossbar
- NxN matrix: the number of crosspoints is N^2
- The maximum switch load for random traffic is about 63% (large N limit), due to head-of-line blocking
- Higher efficiency: queues at input and/or output ports; traffic shaping (example: barrel shifter, 100%)

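One simple estimate that reproduces the 63% limit is the probability that a given output of an NxN crossbar is addressed by at least one input, when every input sends to a uniformly random output and colliding packets are not queued. The slides do not state which formula was used, so the model below is an assumption; it also gives the 68% quoted for the 4x4 case in the random traffic test.

```python
import math


def crossbar_efficiency(n_ports):
    """Expected fraction of busy outputs when every input sends one packet
    to a uniformly random output and colliding packets are not queued."""
    return 1.0 - (1.0 - 1.0 / n_ports) ** n_ports


for n in (4, 16, 1024):
    print(n, round(crossbar_efficiency(n), 3))
print("large-N limit:", round(1.0 - math.exp(-1.0), 3))
```
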
EVB traffic shaping: barrel shifter
- sources emit to mutually exclusive destinations in a cycle
- works only for fixed-size chunks
- needs synchronisation

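A barrel-shifter schedule can be sketched as a rotation: in time slot t, source s sends to destination (s + t) mod N. The rotation rule and the function names are assumptions chosen for illustration; the slide only requires that destinations be mutually exclusive within a slot.

```python
def barrel_shifter_destination(source, slot, n_ports):
    """Destination addressed by `source` during time slot `slot`."""
    return (source + slot) % n_ports


def schedule(n_ports):
    """One full cycle: schedule(n)[slot][source] is the destination."""
    return [
        [barrel_shifter_destination(s, t, n_ports) for s in range(n_ports)]
        for t in range(n_ports)
    ]


for slot, destinations in enumerate(schedule(4)):
    print(slot, destinations)
```

Every row of the schedule is a permutation of the destinations, so no two sources compete for the same output in any slot, and each source reaches every destination once per cycle.
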
Banyan network
- Example: 8x8 made of 3 stages of 2x2 elements (8 = 2^3)
- single path per connection
- suffers from internal blocking
- number of crosspoints: N log2 N
- For random traffic (no intermediate input queues and no output queues): efficiency drops with s and N; for "infinite" N, eff. 20%
- There exists a non-blocking barrel-shifting pattern

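The drop in efficiency with the number of stages can be illustrated with a commonly used approximation for unbuffered multistage networks: the load leaving one stage becomes the load offered to the next. This recursion is not taken from the slides and ignores details of the real hardware, so treat it as a rough sketch only.

```python
def banyan_efficiency(element_ports, n_stages, input_load=1.0):
    """Approximate output load of an unbuffered banyan network built from
    element_ports x element_ports crossbars, under uniform random traffic."""
    load = input_load
    for _ in range(n_stages):
        # Probability that an element output receives at least one packet.
        load = 1.0 - (1.0 - load / element_ports) ** element_ports
    return load


for stages in (1, 2, 3):
    print(stages, round(banyan_efficiency(4, stages), 3))
```

For two stages of 4x4 elements this approximation gives roughly 53%, which can be compared with the 51% measured on the 16x16 demonstrator.
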
A multistage 1024 port switch
- Banyan topology: an NxN switch built out of nxn elements, with N = n^s
- basic unit: 8x8 crossbars
- 3 stages: 512x512, needing 192 crossbars in total
- Important to study multistage switches

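The crossbar count follows from N = n^s: there are s stages, each with N/n elements. A short sketch of that count (the function name is hypothetical):

```python
def banyan_crossbar_count(n_ports, element_ports):
    """Return (stages, total crossbars) for an n_ports x n_ports banyan
    network built from element_ports x element_ports crossbars."""
    stages = 0
    size = 1
    while size < n_ports:
        size *= element_ports
        stages += 1
    if size != n_ports:
        raise ValueError("n_ports must be a power of element_ports")
    return stages, stages * (n_ports // element_ports)


print(banyan_crossbar_count(512, 8))   # 3 stages of 64 crossbars each
```
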
The CMS DAQ system
[Figure: CMS DAQ system. Labels: Detector front-end readout, LV1, RU, EVM, FU, Ctrl, Computing and Communication Services]

Myrinet overview
- Myrinet features
- Myrinet switches
- Network Interface Card

Myrinet features
- Myrinet is a System Area Network (SAN)
- point-to-point links: byte wide, full duplex, 1.3 Gbps per direction, very low error rate
- packet structure: routing header, payload and tail
- each crossbar switch strips the leading byte from the routing header
- wormhole routing (versus store-and-forward): no buffering, low latency, arbitrary length packets
- byte-based flow control (STOP/GO): no packet loss inside the switching fabric
- 3Q 2000: link speed from 1.3 Gbps to 2.6 Gbps
[Figure: packet layout with ROUTING HEADER, PAYLOAD and CRC; flow-control symbols STOP and GO]

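The header-stripping rule means the route is chosen entirely by the sender: one routing byte is consumed per crossbar traversed. The sketch below treats each routing byte as a plain output port number, which is a simplification made for illustration; the actual Myrinet byte encoding is not described on the slide, and the function names are hypothetical.

```python
def traverse_switch(packet):
    """Consume the leading routing byte.

    Returns (output_port, remaining_packet).
    """
    if not packet:
        raise ValueError("packet has no routing header left")
    return packet[0], packet[1:]


def route(packet, n_hops):
    """Pass a packet through n_hops crossbars; return (ports taken, what arrives)."""
    ports = []
    for _ in range(n_hops):
        port, packet = traverse_switch(packet)
        ports.append(port)
    return ports, packet


ports, delivered = route(bytes([2, 1]) + b"payload", n_hops=2)
print(ports, delivered)
```
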
Myrinet switches
- M2M-OCT-SW8: 32 ports, 8 times 4x4 crossbars
- Large switch fabric built out of 4x4 crossbar elements
- 8x8 crossbar now available as basic element

Network interface card
[Figure: M2M-PCI64 block diagram. Labels: Myrinet SAN link, 8 + 8 bit (80 MHz, NRZ); PCI, 32 or 64 bit (33 or 66 MHz); PCI Bridge; LANai7 with RISC, host DMA, Send DMA, Recv DMA and Pkt Interface; Memory, 2 MByte; Address/Data, 64 bit (66 MHz)]
- Developed a custom Myrinet Control Program, which controls the DMA engines and implements the low-level communication protocol

Switch tests
- Set-up for switch test
- Traffic conditions tested
- Point-to-point 1x1
- Parameters point-to-point 1x1
- Point-to-point NxN: mutually exclusive paths
- Block on output port
- Block on internal switch
- Random traffic

Demonstrator set-up for switch tests
- 32 nodes, Linux PCs
- PC: 450 MHz PII BX, PCI 33 MHz/32 bit
- Myrinet switch: M2M-OCT-SW8; NIC: M2M-PCI64[A]
- two-stage Banyan network out of 4x4 crossbars
[Figure: sources and destinations connected through the switch]

Traffic conditions tested
- Random traffic
- Point-to-point traffic (fixed destinations)

Point-to-point 1x1
- full host-NIC DMA: limited by PCI (33 MHz/32 bit)
- partial host-NIC DMA: full packet between NIC memory and link, only headers between host and NIC; limited by the SAN link. This allows the switch to be loaded to its maximum.
[Figure labels: PCI, link]

Parameters point-to-point 1x1
- time per packet = overhead + size / speed
- above 1 kbyte: linear behaviour
- below 1 kbyte: plateau at 5 µs (NIC-host communication)
- Full host-NIC DMA: speed 128 Mbyte/s -> PCI speed
- Partial host-NIC DMA: speed 141 Mbyte/s -> 92% link eff.

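The two-parameter model (time per packet = overhead + size / speed) can be turned into a throughput curve as sketched below. The 5 µs overhead and 141 Mbyte/s speed are used here only as illustrative inputs taken from the slide, and 1 Mbyte is assumed to be 10^6 bytes; the function names are hypothetical.

```python
def packet_time_us(size_bytes, overhead_us, speed_mbyte_per_s):
    """Time per packet in microseconds: overhead + size / speed.

    With 1 Mbyte = 10**6 bytes, 1 Mbyte/s is exactly 1 byte per microsecond.
    """
    return overhead_us + size_bytes / speed_mbyte_per_s


def throughput_mbyte_per_s(size_bytes, overhead_us, speed_mbyte_per_s):
    """Effective throughput for back-to-back packets of a given size."""
    return size_bytes / packet_time_us(size_bytes, overhead_us, speed_mbyte_per_s)


for size in (256, 1024, 4096, 16384):
    print(size, round(throughput_mbyte_per_s(size, 5.0, 141.0), 1))
```

In this model the fixed overhead dominates for small packets, and the throughput approaches the asymptotic speed only for packets of several kbyte.
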
Point-to-point NxN: mutually exclusive paths
- destination mapping: d = 4*(s%4) + s/4, s = 0-15
- As expected: aggregate throughput through the switch is linear in N

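The mapping d = 4*(s%4) + s/4 (with integer division) can be checked to be conflict-free. The sketch assumes that sources 4k to 4k+3 share first-stage crossbar k and that destinations 4k to 4k+3 share second-stage crossbar k; this wiring is an assumption, as the slides do not spell it out.

```python
def destination(source):
    """Destination for a source under the mapping d = 4*(s % 4) + s // 4."""
    return 4 * (source % 4) + source // 4


mapping = {s: destination(s) for s in range(16)}
print(mapping)

# Every destination is used exactly once: no output port is shared.
assert sorted(mapping.values()) == list(range(16))

# The four sources of each first-stage crossbar go to four different
# second-stage crossbars, so no internal link is shared either.
for crossbar in range(4):
    second_stage = {mapping[s] // 4 for s in range(4 * crossbar, 4 * crossbar + 4)}
    assert second_stage == {0, 1, 2, 3}
```
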
Block on output port
- Force m (= 1, 2, 3, 4) sources on the same destination
- Each source gets 1/m of V_max
- measured at source #0

Block on internal switch
- Force 2 sources on different destinations, but through the same intermediate path
- As expected: plateau at V_max/2
- measured at source #0

Random traffic
- sources send, independently, to a random destination according to a uniform distribution
- measured at destinations
- Efficiency: 4x4: 69% (expect 68%); 16x16: 51%
- limited by head-of-line blocking
[Figure: curves labelled 1x1, 4x4, 16x]

Event building studies
- EVB demonstrator set-up
- Event building protocol
- Variable size event fragments
- Event building performance
- Event building: scaling behaviour
- Traffic shaping
- EVB performance with traffic shaping
- Performance for variable size event fragments
- EVB with traffic shaping: scaling behaviour
- Traffic shaping: time evolution

EVB demonstrator set-up
- 32+1 Linux PCs [450 MHz PII BX, PCI 33 MHz/32 bit]
- Myrinet switch: M2M-OCT-SW8; NIC: M2M-PCI64[A]
- 16x16 two-stage Banyan network out of 4x4 crossbars
- Myrinet between RUs and BUs (full duplex): N-to-N traffic
- Fast Ethernet between BUs and EVM: N-to-1 traffic
- No emulation of Level-1 trigger

Event building protocol
[Figure: message sequence between BU, EVM and RU. Messages: Request EvtId, EvtId, Request Data, Send Data, Clear EvtId. Labels: level1, Myrinet]
- Several EvtId messages are grouped in a single Ethernet packet

Variable size event fragments
- Log-normal distribution; example: average = 2 kbyte, RMS = 2 kbyte
- mimics CMS data readout
[Figure labels: Readout Units, EVB, Builder Units]

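Fragment sizes with a given average and RMS can be drawn from a log-normal distribution by converting the two moments into the parameters of the underlying normal distribution. The sketch assumes that "RMS" on the slide denotes the standard deviation and that 2 kbyte means 2048 bytes; neither is stated in the slides.

```python
import math
import random


def lognormal_parameters(mean, rms):
    """Return (mu, sigma) of the underlying normal distribution for a
    log-normal with the given mean and standard deviation."""
    sigma_sq = math.log(1.0 + (rms / mean) ** 2)
    mu = math.log(mean) - 0.5 * sigma_sq
    return mu, math.sqrt(sigma_sq)


def sample_fragment_sizes(n, mean=2048.0, rms=2048.0, seed=1):
    """Draw n fragment sizes in bytes; the seed only makes runs repeatable."""
    mu, sigma = lognormal_parameters(mean, rms)
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


sizes = sample_fragment_sizes(100_000)
print(sum(sizes) / len(sizes))  # sample mean, should be near 2048
```
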
Event building performance
- No traffic shaping; fixed size event fragments
- Results: 1x1 is close to point-to-point
- Results: performance decreases from 4x4 to 8x8 to 16x16, as expected
- Results: overhead of 7 µs, from small sizes
- Fragment rate per node† for 16x16 with 2 kbyte fragments: 30 kHz
† Fragment rate per node = level-1 rate
[Figure: curves for 1x1, 4x4, 8x8 and 16x16; annotations "2k" and "unstable"]

Event building: scaling behaviour
- take average fragment size of 2 kbyte; also variable size fragments
- Results: for variable size, reduced performance, as expected
- Results: no scaling in N
- Need simulation for large N?

Traffic shaping
- Sources divide fragments into fixed size packets (blocks) and cycle through all destinations
- Inspired by ATM rate division (block size is 53 bytes)
- Should work for large N multistage switch as well
Implementation:
- Performed by NIC control program
- Block size set to 4 kbyte (30 µs cycle)
- Barrel shifter without external synchronisation (Myrinet back pressure by HW flow control)
- Packets can be (partially) empty

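The shaping rule above can be sketched as a source that keeps a byte count per destination and, in every slot, sends exactly one fixed-size block to the destination whose turn it is, filled with as much pending data as fits. The class, its fields and the rotation rule are hypothetical; the real logic lives in the NIC control program and is not shown in the slides.

```python
BLOCK_SIZE = 4096  # bytes, the 4 kbyte block size quoted on the slide


class ShapingSource:
    """One source with a count of pending bytes per destination."""

    def __init__(self, source_id, n_destinations):
        self.source_id = source_id
        self.n = n_destinations
        self.pending = [0] * n_destinations

    def enqueue(self, destination, fragment_size):
        """Queue a fragment of fragment_size bytes for a destination."""
        self.pending[destination] += fragment_size

    def emit(self, slot):
        """Send one block in this slot; return (destination, payload bytes).

        The payload may be smaller than BLOCK_SIZE, or zero: the block is
        then partially or completely empty.
        """
        destination = (self.source_id + slot) % self.n
        payload = min(self.pending[destination], BLOCK_SIZE)
        self.pending[destination] -= payload
        return destination, payload


source = ShapingSource(source_id=0, n_destinations=4)
source.enqueue(destination=1, fragment_size=6000)
print([source.emit(slot) for slot in range(8)])
```

A 6000 byte fragment is delivered in two visits to its destination, one full block and one partially filled block, while the blocks sent to the other destinations go out empty.
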
EVB performance with traffic shaping
- fixed size event fragments
- Results: close to point-to-point
- Fragment rate per node for 16x16 with 2 kbyte fragments: 65 kHz
[Figure annotations: 2k, 4k]

Performance for variable size event fragments
- decrease of efficiency with larger RMS of fragment size distribution (in agreement with Monte Carlo)
- Fragment rate per node for nominal average of 2k and RMS 2k†: 60 kHz
† with full host-NIC DMA about 80 Mbyte/s or 40 kHz
[Figure annotation: 2k]

EVB traffic shaping: scaling behaviour
- EVB with traffic shaping: approximate scaling

Traffic shaping: time evolution (I)
- traffic shaping barrel shifter stayed in sync
- ? 2 hours (= [...] cycles, 10 Tbyte moved)
- 23:00 ? throughput dropped
[Figure: BS cycling rate * block size versus time]

Traffic shaping: time evolution (II)
- traffic shaping barrel shifter stays in sync
- 1 hour (= 10^8 cycles)
- perturb system, 1: slow down RU1: all BUs reduced rate
- perturb system, 2: slow down BU1: only BU1 reduced rate
[Figure: BS cycling rate * block size versus time, with perturbations 1 and 2 marked; labels EVM, RU, BU]

Future work and conclusions
- Future work
- Conclusions

Future work
- Evaluate Myrinet 2000, available 3Q 2000: link speed from 1.3 Gbps to 2.6 Gbps; switches based on 8x8 crossbars as elementary units
- Further study of traffic shaping
- Simulation: extrapolate to large systems

Conclusions
- Event builder demonstrator 16x16, based on Myrinet multistage switch and Linux PCs, established.
- Performed systematic switch studies; results as expected.
- Measured event building performance without traffic shaping: no scaling, as expected
- Measured event building performance with traffic shaping: approximate scaling
- For nominal event fragment sizes with average and RMS of 2 kbyte, achieved about 60 kHz trigger rate, or 120 Mbyte/s per node (almost 2 Gbyte/s aggregate). That is, today, a factor two off from CMS needs, assuming scaling.
- Measurements provide parameters for simulation of large scale (512x512) systems

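The per-node and aggregate figures follow directly from the fragment rate and fragment size. A minimal sketch of the arithmetic, assuming 1 kbyte = 1000 bytes and the 16 nodes of the demonstrator; the constant and function names are illustrative:

```python
TRIGGER_RATE_HZ = 60_000       # achieved fragment rate per node
FRAGMENT_SIZE_BYTES = 2_000    # nominal 2 kbyte, assuming 1 kbyte = 1000 bytes
N_NODES = 16                   # 16x16 demonstrator
TARGET_RATE_HZ = 100_000       # maximum Level-1 trigger rate


def per_node_throughput(rate_hz, fragment_bytes):
    """Throughput per node in bytes per second."""
    return rate_hz * fragment_bytes


per_node = per_node_throughput(TRIGGER_RATE_HZ, FRAGMENT_SIZE_BYTES)
aggregate = per_node * N_NODES
print(per_node / 1e6, "Mbyte/s per node")
print(aggregate / 1e9, "Gbyte/s aggregate")
print(round(TARGET_RATE_HZ / TRIGGER_RATE_HZ, 2), "x short of the 100 kHz target")
```

The ratio of target to achieved rate is about 1.7, which the slide rounds to a factor of two.
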
Extra Material

Multi-step Event Building
- Step 1, at 100 kHz: build 0.25 of the data; rejection factor 10 from the High Level Trigger
- Step 2, at 10 kHz: remaining 0.75 of the data
- Throughput reduced to 0.25 + 0.1 x 0.75 = 0.33, i.e. by a factor 3
- At the cost of control complexity and increased latency
- With link speed of 1 Gbps, need factor 2 from multi-step event building for 100 kHz level-1 rate (assuming 100% efficient switch)
- If higher speed links in [...], then single-step event builder

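The throughput reduction from two-step building can be written as a small formula: the first step moves a fraction of the data for every event, and the second step moves the rest only for events that survive the rejection. The sketch below uses the 0.25 fraction and rejection factor 10 from the slide; the function name is hypothetical.

```python
def multistep_throughput_fraction(first_step_data_fraction, rejection_factor):
    """Throughput of two-step event building relative to single-step building."""
    remaining = 1.0 - first_step_data_fraction
    return first_step_data_fraction + remaining / rejection_factor


fraction = multistep_throughput_fraction(0.25, 10)
print(fraction)        # about 0.33
print(1.0 / fraction)  # about a factor 3
```
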