A Demonstration of a Time-Multiplexed Trigger for CMS
Rob Frazier, Simon Fayer, Geoff Hall, Christopher Hunt, Greg Iles, Dave Newbold, Andrew Rose (Imperial College & University of Bristol)
28 September 2011
Overview
– How is a Time-Multiplexed Trigger different from a conventional trigger?
– The demonstration system
– Communication over Ethernet: IPbus
Concepts: Time-Multiplexed Trigger
Exploring an alternative method of triggering: time-multiplex the incoming data so that the entire calorimeter (~5 Tb/s) can be processed in a single FPGA. Akin to the DAQ event builders, but everything must finish in ~1 μs.
Time Multiplexing
Can time-multiplex in either η (strips) or φ (rings); both are feasible.
[Figure: events streamed in time/φ over the η optical links into the new processing engine, then on to the next stage]
Events are streamed into the processing engine. Intrinsically a very efficient use of logic (a new calculation on every clock cycle, no waiting for data). Essentially a 5 Tb/s, low-latency (~1 μs) image processor.
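The streaming scheme above can be sketched in a few lines of Python: whole events are dealt round-robin to processing nodes, so each node sees complete-detector data for the events it owns. The node count of 14 follows the 14-bx multiplexing period quoted later in the talk; the function names are illustrative.

```python
def assign_node(bx, n_nodes=14):
    """Round-robin: bunch crossing bx is processed entirely by one node,
    so that node sees the whole calorimeter for its events."""
    return bx % n_nodes

def route(bxs, n_nodes=14):
    """Group a stream of bunch crossings by the node that will process them.
    Each node receives a complete new event every n_nodes bunch crossings,
    so its logic can start a fresh calculation as each event arrives."""
    events = {n: [] for n in range(n_nodes)}
    for bx in bxs:
        events[assign_node(bx, n_nodes)].append(bx)
    return events
```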
Demonstrator System Hardware: Part 1
NAT Europe MCH
– Provides GbE and IPMI
Vadatech VT892 crate
– Dual-star topology with 12 full-size, double-width cards
– Modified for vertical airflow
– MCH slot 1: standard services (GbE, IPMI, etc.)
– MCH slot 2: experimental services (clock distribution, fast control & feedback, DAQ)
See Eric Hazen’s talk on the AMC13.
Demonstrator System Hardware: Part 2
MINI-T5-R2, a double-width AMC card
– FPGA: Virtex-5 TX240T
– Optics: in 160 Gb/s (32× 5 Gb/s), out 100 Gb/s (20× 5 Gb/s)
– RAM: dual QDR II, 2× 72 Mb, 2× 9 Gb/s on each port (read and write)
– MMC: Atmel AT32UC3A3256
– AMC connections: 2× Ethernet, 1× SATA, 4× FatPipe, 1× extended FatPipe
Thanks to Jean-Pierre Cachemiche for the MMC code, ported to the Atmel AVR32 microcontroller by Simon Fayer.
Test System
[Diagram: crate layout with MCH (standard and custom services), AMC13, patch panel, pre-processor and main-processor cards PP0-PP5]
– 2× MINI-T5 main processors (14 in the full system)
– 24 pre-processors (36 in the full system), each MINI-T5 simulating 6 pre-processor cards
– ECAL & HCAL trigger-primitive data would enter at the patch panel in the final system
[Photo: the crate, annotated]
– Each MINI-T5 simulates 6 pre-processor cards; possible because only 2 main processors are needed
– Main-processor cards
– Clock distributed via the MCH
– Patch panel: one fibre from each pre-processor routed to a main processor
– Location for the AMC13 (clock, fast control & feedback, DAQ)
Test System Internals
[Block diagram: 24 pre-processors → time mux → link alignment → algorithm, with buffer RAMs #0-#9 and DAQ output]
– 24 pre-processors simulate 240 input links (¼ of the CMS calorimeter trigger)
– 24 links into the main processor, where the data are aligned and buffered before the algorithm
– Output to the DAQ via the AMC13 (or alternatively Ethernet) and to the global trigger
[Capture: all 24 channels aligned. Each frame carries a header with the pre-processor identifier, RAM payloads of 3 words each (RAM #0, RAM #1, …), and a CRC, checked then zeroed by firmware; frames are delimited by 8B/10B commas]
Where Next?
The current MINI-T5 can handle ¼ of the CMS calorimeter trigger
– Assumes a time-multiplexing period of 14 bx
– Requires ×4 bandwidth to place the entire calorimeter trigger into a single FPGA
– Link speed ×2 (Virtex-7 with 10 Gb/s links); number of links ×2 (48-72 Rx)
MAXI-T7 with Virtex-7 optics:
– MicroPOD™: 8.2×7.8 mm with LGA electrical interface
– MiniPOD™: 22×18.5 mm with 9×9 MegArray™ connector
IPbus: a method to communicate with cards over Ethernet
Communication Requirements
The primary control path in MicroTCA is Ethernet. What protocol should we use: UDP, TCP, or something else?
Requirements:
– Robust
– Scalable
– Reasonable bandwidth (make good use of the 1 Gb/s crate interface)
– Relatively simple: not too onerous (10% of the design effort, not 90%), and maintainable over 10 years across different tool versions and different people
– Portable from one card to the next
Communication Ideas
The primary advantage of TCP is not reliability but throughput:
– Imagine UDP with a retry capability. Ethernet has large latency (packet framing, CRCs, etc.), so single transactions are relatively slow.
– TCP allows multiple packets in flight simultaneously and ensures that all packets arrive, in the correct order. Ideal, but it needs a powerful CPU or a complex firmware core to reach 1 Gb/s, and separate commands are still slow (i.e. do A, then B, then C).
An embedded CPU on the card, either within the FPGA or external, is another option, but can get quite complex (requires CPU, RAM, flash, etc.).
IPbus
Originally created by Jeremy Mans et al. in 2009/2010 (John Jones implemented something similar).
The protocol describes the basic transactions needed to control hardware:
– A32/D32 read/write, block transfers, auto address incrementing
– Simple concatenation of commands: a single packet may contain a write followed by a block read
[Block diagram: UDP or TCP → EMAC/PHY → transaction engine → I2C, GTX and DAQ cores]
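The concatenation idea can be sketched as packing transactions into one datagram payload. The field layout below is invented for illustration and is not the real IPbus wire format; only the shape of the idea (header word, address, data, transactions appended back to back) follows the slide.

```python
import struct

READ, WRITE = 0, 1  # illustrative type codes, not the IPbus ones

def transaction(ttype, addr, data=()):
    """Pack one A32/D32 transaction as big-endian 32-bit words:
    a header word (type + word count), the address, then any write data."""
    words = [(ttype << 28) | max(len(data), 1), addr] + list(data)
    return b"".join(struct.pack(">I", w) for w in words)

def packet(*transactions):
    """A single packet may concatenate e.g. a write then a block read."""
    return b"".join(transactions)
```

Batching several transactions per packet is what lets a simple UDP transport amortise Ethernet's per-packet latency.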
IPbus Firmware
Resource usage on a Xilinx SP601 demo board (£200/$350, small Spartan-6 XC6SLX16-CSG324):
– 7% of registers, 18% of LUTs, 25% of block RAM
– Block RAM usage may increase slightly for the v2.0 protocol
Additional features:
– The firmware also includes an interface to the IPMI controller
Firmware by Jeremy Mans and Dave Newbold.
The IPbus Suite: Overview
MicroHAL (based on Redwood)
– C++ hardware access library; highly scalable and fast
– Hierarchical, with array capability, to mimic the firmware structure
– Automatic segmentation of large commands (e.g. a block read is split up)
– Software has full knowledge of the registers, which can map onto a database
Redwood Explorer provides access to any register via a simple web interface.
Software by Andy Rose & Christopher Hunt.
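The hierarchical, firmware-mirroring access that MicroHAL provides in C++ can be illustrated with a toy register tree in Python (the node names and addresses here are invented, not MicroHAL's API):

```python
class Node:
    """A register-tree node; children mirror the firmware hierarchy."""
    def __init__(self, addr=0, children=None):
        self.addr = addr
        self.children = children or {}

    def __getitem__(self, path):
        """Look up a dotted path, e.g. tree['ttc.bc0']."""
        node = self
        for part in path.split("."):
            node = node.children[part]
        return node

# Toy hierarchy: a 'ttc' block at 0x100 containing a 'bc0' register at 0x104.
tree = Node(children={"ttc": Node(0x100, {"bc0": Node(0x104)})})
```

Because the tree mirrors the firmware, the same description can drive both address decoding in HDL and register lookup in software, which is what makes a database mapping natural.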
The IPbus Suite: Overview
Control Hub
– Single point of contact with the hardware, allowing multiple applications/clients to access a single board
– Reliable and scalable
– Built on Erlang, a concurrent telecoms language that scales across multiple CPU cores
– Automatic segmentation of large commands (e.g. a block read is split up)
Software by Rob Frazier.
Scalability with Redwood and the Control Hub
[Chart]
IPbus Suite Overview
PyChips
– Python-based, user-facing hardware access library
– Simple, easy interface; great for very small or single-board projects
– Cross-platform: Windows, Linux, OS X, etc.
– No dependencies except the Python interpreter itself
Software by Rob Frazier.
IPbus Test System
A substantial test system:
– 3 MicroHAL PCs, 1 Control Hub PC, 20 IPbus clients
– Currently 40 Mb/s per card, 480 Mb/s per crate
– Plan to increase this to 100-200 Mb/s per card with jumbo frames (×6 payload) and firmware improvements; considering a move to TCP for 1 Gb/s
Reliability: 1 in 189 million UDP packets lost. OK for a lab system, but v2.0 of IPbus will add a retry mechanism.
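The bandwidth figures combine as simple arithmetic (the 12-card crate size is taken from the Vadatech VT892 slide):

```python
per_card_mbps = 40           # measured throughput per card
cards_per_crate = 12         # VT892 holds 12 double-width cards
crate_mbps = per_card_mbps * cards_per_crate  # aggregate per crate

frame_gain = 9000 // 1500    # jumbo frames: 1.5 kB -> 9 kB, x6 payload
# A x6 larger frame amortises per-packet overhead, consistent with the
# 100-200 Mb/s per-card target quoted above.
```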
Links
IPbus SVN & wiki are hosted on the CACTUS project:
– Website: http://projects.hepforge.org/cactus/index.php
– HepForge repository: http://projects.hepforge.org/cactus/trac/browser/trunk
– MicroHAL user manual, instant-start tutorials and developers' guide: http://projects.hepforge.org/cactus/trac/browser/trunk/doc/user_manual/Redwood.pdf?rev=head&format=txt
Firmware chief: Dave Newbold, dave.newbold@cern.ch
Software chief: Rob Frazier, robert.frazier@cern.ch
MicroHAL & Redwood: andrew.rose01@imperial.ac.uk
Next Steps
– Load the RAMs with real events from CMS and pass them through the algorithms under development
– Virtex-7 design with the necessary 10 Gb/s links is underway
– Develop and release IPbus v2.0: allows access to IPbus via IPMI, and implements a retry mechanism for UDP transport
– IPbus software suite v2.0: the code is fairly mature but still relatively new; improvements and bug fixes will continue as it becomes more widely used. Feedback is welcome, e.g. on performance and the user interface.
Extra
Just one example: hierarchical design
[Diagram]
Performance
– Currently 40 Mb/s with the full structure, albeit with multiple MicroHAL instances and the Control Hub on the same PC (the likely scenario in CMS); scales linearly with the number of cards, i.e. 480 Mb/s for a crate
– Target of 100+ Mb/s with UDP (TCP/IP required for 1 Gb/s): reducing copy stages in the firmware from 5 to 3, and moving to jumbo frames (1.5 kB to 9 kB, ×6)
(Measured on the PC used to synthesise firmware.)
Reliability
Tested on a private network with all unnecessary network protocols switched off (spanning tree, etc.):
– Sent 5 billion block-read requests (350 × 32-bit words each) from 19 IPbus clients: ~10 billion packets in total, of which 53 went missing
– 7 TB of IPbus payload data received
– Packet loss averages 1 in 189 million UDP packets, motivating the v2.0 retry mechanism
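The quoted loss rate follows directly from the test numbers:

```python
requests = 5_000_000_000    # block-read requests sent
packets = 2 * requests      # request + reply -> ~10 billion packets
lost = 53                   # packets that went missing

loss_rate = packets / lost  # ~1 lost per 189 million packets
```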