Department of Particle & Particle Astrophysics
Modular Data Acquisition: Introduction and applicability to LCLS DAQ
Michael Huffer, Stanford Linear Accelerator Center
December 14, 2006
Representing: Ryan Herbst, Chris O’Grady, Amedeo Perazzo, Leonid Sapozhnikov, Eric Siskind, Dave Tarkington, Matt Weaver
Outline
– Introduction: concepts, architecture, implementation
– Examples:
  – Petabyte-scale, low-access-latency storage for the SLAC Computer Center
  – LSST camera data acquisition system
– Application design
– Discussion: applicability to LCLS data acquisition?
The Module
– Is the basic building block of the architecture
– Specified as:
  – A hardware design (schematics, BOM and layout guidelines)
  – A series of base services, implemented as:
    – VHDL (interfaced through core IP libraries)
    – Software (an OO interface, provided through header files and shared libraries)
  – Documentation
– The module neither specifies nor constrains the application’s physical partitioning model
– The architecture specifies three types of module:
  – CEM (Cluster Element Module)
    – Provides a processor + RTOS (the Cluster Element)
    – Provides many channels of generic, high-speed serial I/O
    – Provides a commodity network interface (10 GE and 100Base-T Ethernet)
  – fCIM (Fast Cluster Interconnect Module)
    – Provides 10 GE connectivity for up to 64 Cluster Elements
  – sCIM (Slow Cluster Interconnect Module)
    – Provides 100Base-T and 1 GE connectivity for up to 64 Cluster Elements
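The stated fan-out (up to 64 Cluster Elements per fCIM or sCIM) determines how many interconnect modules a cluster needs. A minimal Python sketch, assuming a flat topology with one port per CE and no ports spent on links between CIMs; the names `CIM_CAPACITY` and `interconnects_needed` are illustrative and not part of the module software:

```python
# Hypothetical sizing helper: maximum Cluster Elements per interconnect
# module, taken from the "up to 64 Cluster Elements" figures above.
CIM_CAPACITY = {"fCIM": 64, "sCIM": 64}


def interconnects_needed(n_elements: int, cim: str) -> int:
    """Smallest number of `cim` modules that gives every CE one port.

    Assumes a flat topology: one port per CE and no ports reserved
    for links between interconnect modules.
    """
    if n_elements < 0:
        raise ValueError("element count must be non-negative")
    capacity = CIM_CAPACITY[cim]
    return -(-n_elements // capacity)  # integer ceiling division


if __name__ == "__main__":
    for n in (32, 64, 65, 200):
        print(n, interconnects_needed(n, "fCIM"), interconnects_needed(n, "sCIM"))
```

A 32- or 64-element cluster fits on one fCIM and one sCIM; a 65th element forces a second module of each type.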
Cluster Element Module (CEM)
– A Cluster Element (CE) is a processor
– Two variants: one-channel and two-channel
– Footprint: ~50 cm²
– Power: ~7 W total + ~3/4 W/port
– Each lane operates at up to 10 Gb/s
– Lanes may be mixed and matched to each CE
[Figure: CEM (1 channel) and CEM (2 channel): CE with JTAG, reset and reset options; PHYs (4-20) and PHYs (0-16); 10 GE to fCIM; 100Base-T to sCIM]
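The power figure can be turned into a rough per-module estimate. This sketch reads “~7 watts total + ~3/4 Watt/port” as a fixed base plus a per-port increment; that reading is an assumption, as is the name `cem_power`:

```python
# Rough CEM power model. Reading "~7 watts total + ~3/4 Watt/port" as a
# fixed base plus a per-port increment is an assumption.
BASE_WATTS = 7.0
WATTS_PER_PORT = 0.75


def cem_power(active_ports: int) -> float:
    """Estimated CEM power draw in watts for a given number of active ports."""
    if active_ports < 0:
        raise ValueError("port count must be non-negative")
    return BASE_WATTS + WATTS_PER_PORT * active_ports


if __name__ == "__main__":
    for ports in (0, 4, 8):
        print(ports, cem_power(ports))
```

Under this reading, eight active ports would put a CEM at roughly 13 W.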
Fast Cluster Interconnect Module (fCIM)
– Is a collection of managed switches
– Footprint: ~144 cm²
– Power: ~1½ W/port; ~110 W for 64 elements
– Supports a variety of electromechanical standards:
  – X2 and XENPAK MSA
  – CX4
  – Long-haul and short-haul fibers
[Figure: fCIM with links to CEs, 1 GE to the management network, and 10 GE (0-8) ports]
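The quoted ~110 W for 64 elements is somewhat more than 64 × 1.5 W = 96 W. One possible reconciliation, not confirmed by the slide, is that the per-port figure also covers the up to eight 10 GE ports on the figure. The sketch below checks that arithmetic; `fcim_power` is a hypothetical name:

```python
# fCIM power check. Treating every port, including the 10 GE ports
# drawn as "(0 - 8)" on the figure, as drawing ~1.5 W is an assumption.
WATTS_PER_PORT = 1.5
CE_PORTS = 64


def fcim_power(ce_ports: int, uplinks: int) -> float:
    """Per-port power estimate in watts for an fCIM."""
    if not 0 <= ce_ports <= 64:
        raise ValueError("an fCIM serves at most 64 Cluster Elements")
    if not 0 <= uplinks <= 8:
        raise ValueError("the figure shows 0 to 8 uplinks")
    return WATTS_PER_PORT * (ce_ports + uplinks)


if __name__ == "__main__":
    print(fcim_power(CE_PORTS, 0))  # CE-facing ports only
    print(fcim_power(CE_PORTS, 8))  # CE-facing ports plus all uplinks
```

Counting all eight uplinks gives 72 ports × 1.5 W = 108 W, close to the quoted ~110 W.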
Slow Cluster Interconnect Module (sCIM)
– Is a collection of unmanaged switches
– Footprint: TBD (less than fCIM)
– Power: TBD (much less than fCIM)
– Supports a variety of electromechanical standards
[Figure: sCIM with 100Base-T (2-64) to CEs and 1 GE to the management or control network]
32-Element Cluster
[Figure: 32 CEs connected to an sCIM (1 GE to the management or control network) and to an fCIM (10 GE to the data network); the fCIM is managed over the control network]
CEM Block Diagram
[Figure: Xilinx XC4VFX60 FPGA (SOC) with lots of gates, 200 DSPs, PPC-405 (450 MHz), fabric clock and MGT clock; Multi-Gigabit Transceivers (MGT), 8 lanes; memory (512 Mbytes), Micron RLDRAM II; configuration memory (128 Mbytes), Samsung K9F5608; 100Base-T; reset, reset options, JTAG; MFD]
Base Services Provided by the CEM
– “Fat” memory subsystem:
  – 512 Mbytes of RAM
  – Sustains 8 Gbytes/sec
  – “Plug-In” DMA interface (PIC):
    – Designed as a set of IP cores
    – Designed to work in conjunction with MGT and protocol cores
– Bootstrap loader (with up to 16 boot options and images)
– Interface to configuration memory
– Open-source real-time kernel (RTEMS)
– 10 GE Ethernet interface
– 100Base-T Ethernet interface
– Full network stack
– Utility software to manage I/O
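Two consequences of these figures can be checked with arithmetic: how long the 512 Mbyte RAM takes to fill at the sustained 8 Gbytes/sec, and how that rate compares with eight MGT lanes of up to 10 Gb/sec each. The sketch assumes decimal units and 8b/10b line coding on the lanes; neither is stated on the slide, and the function names are illustrative:

```python
# Compare the CEM memory subsystem with its serial I/O, in decimal units.
# ASSUMPTION: the 10 Gb/s per-lane figure is a line rate with 8b/10b coding.
MEMORY_BYTES = 512_000_000           # 512 Mbytes of RAM
MEMORY_BYTES_PER_S = 8_000_000_000   # sustains 8 Gbytes/sec
LANES = 8                            # MGT lanes on the CEM block diagram
LANE_LINE_RATE_BPS = 10_000_000_000  # up to 10 Gb/sec per lane


def memory_fill_ms() -> float:
    """Time to write all of RAM once at the sustained rate, in milliseconds."""
    return MEMORY_BYTES * 1000 / MEMORY_BYTES_PER_S


def lane_payload_bytes_per_s(lanes: int = LANES) -> int:
    """Aggregate payload rate of `lanes` lanes after 8b/10b coding."""
    payload_bps = LANE_LINE_RATE_BPS * 8 // 10  # 8 data bits per 10 line bits
    return lanes * payload_bps // 8             # bits to bytes


if __name__ == "__main__":
    print(memory_fill_ms())
    print(lane_payload_bytes_per_s())
```

Under those assumptions RAM fills in about 64 ms, and eight fully loaded lanes carry 8 Gbytes/sec of payload, the same as the memory subsystem’s sustained rate.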
Extended Services Provided by the CEM
– Pretty Good Protocol (PGP):
  – Physical interface is serial, with 2 LVDS pairs per lane
  – Point-to-point connectivity
  – Allows clock recovery
  – Full duplex, with symmetric capabilities in either direction from either end
  – Provides reliable frame (packet) transmission and reception
  – Deterministic (and small) latency
  – Lightweight “on the wire” overhead
  – Specifies 4 virtual channels (VCs) to provide QoS
  – Implemented as an IP core:
    – Small footprint
    – Interface hides protocol details and implementation from the user
    – Implemented on the CE (through the canonical model described above)
  – Asynchronous
  – Extensible in both bit rate and number of lanes
– Flash Storage Module (FSM):
  – Provides as much as 256 Gbytes/CE of persistent storage
  – Low-latency, high-bandwidth access (1 Gbyte/sec)
  – Interfaced using PGP
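PGP’s frame format is not given on this slide, so the following is not PGP. It is a hypothetical Python sketch of two ideas the bullets name: frames that carry a VC number plus an integrity check (CRC-32 via the standard library’s `zlib.crc32`), and a receiver that demultiplexes them into four per-VC queues. The header layout and the strict-priority transmit policy are assumptions made for illustration:

```python
import struct
import zlib

NUM_VCS = 4  # PGP specifies four virtual channels (VCs)
HEADER = struct.Struct(">BI")  # hypothetical: 1-byte VC tag, 4-byte length
TRAILER = struct.Struct(">I")  # hypothetical: CRC-32 over header + payload


def encode_frame(vc: int, payload: bytes) -> bytes:
    """Wrap a payload in a hypothetical VC-tagged, CRC-protected frame."""
    if not 0 <= vc < NUM_VCS:
        raise ValueError(f"vc must be in 0..{NUM_VCS - 1}")
    body = HEADER.pack(vc, len(payload)) + payload
    return body + TRAILER.pack(zlib.crc32(body))


def decode_frame(frame: bytes) -> tuple[int, bytes]:
    """Check length and CRC, then return (vc, payload); raise on any error."""
    if len(frame) < HEADER.size + TRAILER.size:
        raise ValueError("frame too short")
    vc, length = HEADER.unpack_from(frame)
    end = HEADER.size + length
    if len(frame) != end + TRAILER.size:
        raise ValueError("length mismatch")
    (crc,) = TRAILER.unpack_from(frame, end)
    if zlib.crc32(frame[:end]) != crc:
        raise ValueError("CRC mismatch")
    if vc >= NUM_VCS:
        raise ValueError("unknown VC")
    return vc, frame[HEADER.size:end]


def transmit_order(pending: dict[int, list[bytes]]):
    """Strict priority (an assumed QoS policy): drain VC 0, then VC 1, ..."""
    for vc in range(NUM_VCS):
        for payload in pending.get(vc, []):
            yield encode_frame(vc, payload)


def demux(frames) -> dict[int, list[bytes]]:
    """Receiver side: validate each frame and queue its payload by VC."""
    queues: dict[int, list[bytes]] = {vc: [] for vc in range(NUM_VCS)}
    for frame in frames:
        vc, payload = decode_frame(frame)
        queues[vc].append(payload)
    return queues


if __name__ == "__main__":
    wire = list(transmit_order({2: [b"bulk data"], 0: [b"trigger"]}))
    print(demux(wire))
```

The sketch only detects corruption; reliable delivery as the slide describes it would also need acknowledgement and retransmission, which are outside this example.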
Cluster Element as Used in PetaCache
– Called a SAM (Storage Access Module)
– 65 Gbyte flash memory (Flash Storage Module), connected to the CE over PGP
– CE carries the PGP core and interface plus application-specific logic
[Figure: SAM: FSM linked to the CE over PGP; 10 GE to/from the fCIM, serving client nodes on the client network; 100Base-T to/from the sCIM, from the management network]
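The SAM figures allow two quick sanity checks: how long a full pass over the 65 Gbyte FSM takes at the FSM’s quoted 1 Gbyte/sec, and whether the CE’s 10 GE client link can keep up. A sketch in decimal units that ignores Ethernet and IP overhead; the function names are illustrative:

```python
# SAM capacity and bandwidth arithmetic, decimal units.
FLASH_BYTES = 65 * 10**9               # 65 Gbyte flash memory (FSM)
FSM_BYTES_PER_S = 1 * 10**9            # 1 Gbyte/sec FSM access
TEN_GE_BYTES_PER_S = 10 * 10**9 // 8   # 10 Gb/s, before protocol overhead


def full_scan_seconds() -> float:
    """Time to read the entire FSM once at its quoted bandwidth."""
    return FLASH_BYTES / FSM_BYTES_PER_S


def ten_ge_headroom() -> float:
    """Raw 10 GE capacity divided by FSM bandwidth (> 1: link is not the limit)."""
    return TEN_GE_BYTES_PER_S / FSM_BYTES_PER_S
```

A full scan takes about 65 s, and raw 10 GE capacity (1.25 Gbytes/sec) exceeds the FSM rate by 25% before overhead.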
Cluster Element as Used in the LSST DAQ
– Called an RNA (Raft Network Adapter)
– Services a 9-CCD mosaic at 288 Mbytes/sec
– CE carries the PGP core and interface plus application-specific logic
– Raft Readout System: in the cryostat (replicated 25 times), linked to the RNA over PGP (fiber, 300 m)
[Figure: RNA: 10 GE to/from the fCIM, serving client nodes on the DAQ network; 100Base-T to/from the sCIM, from the Camera Control System on the CCS network]
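Scaling the per-RNA numbers to the whole camera is simple arithmetic. The sketch assumes one RNA per Raft Readout System and that all 25 rafts stream at 288 Mbytes/sec concurrently; the slide gives the per-raft rate and the replication count but not these two assumptions:

```python
RAFTS = 25                  # Raft Readout System replicated 25 times
CCDS_PER_RAFT = 9           # each RNA services a 9-CCD mosaic
RNA_MBYTES_PER_S = 288      # per-RNA data rate
TEN_GE_MBYTES_PER_S = 1250  # 10 Gb/s / 8, before protocol overhead


def total_ccds(rafts: int = RAFTS) -> int:
    """Number of CCDs across all rafts."""
    return rafts * CCDS_PER_RAFT


def aggregate_mbytes_per_s(rafts: int = RAFTS) -> int:
    """Total rate if every raft streams concurrently (an assumption)."""
    return rafts * RNA_MBYTES_PER_S


def rna_link_utilization() -> float:
    """Fraction of one RNA's raw 10 GE link used by its raft data."""
    return RNA_MBYTES_PER_S / TEN_GE_MBYTES_PER_S
```

Under those assumptions the camera produces about 7,200 Mbytes/sec from 225 CCDs, and each RNA uses roughly 23% of its raw 10 GE link.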
The “Chassis”
– Accepts DC power
– Passive backplane
[Figure: chassis with passive backplane (8U), X2 (XENPAK MSA) modules, 1U fan tray, 1U air outlet, 1U air inlet, high-speed network card (8U) and daughter-board card (4U)]
Chassis Physical Interfaces (19”)
[Figure: CCS network card (1 GE) to CCS; client network card (10 GE) to DAQ network; links to odd raft and to even raft; daughter-card banks: Science Array (12), Guider Array (2), WFS Array (2); 8U; number is TBD]
– Daughter cards are replicated twice, for redundancy and simulation
A Prescription for Application Design
– Partition the problem into three domains:
  – Device/sensor-specific Read-Out (RO)
  – Device/sensor monitoring and configuration
  – Data transport and processing
– Define a consistent and regular interface between the RO and CE systems, independent of device/sensor
– Define CE customization:
  – How many lanes of I/O are necessary between RO and CE?
  – What are the protocols on these lanes?
– Specify data processing:
  – How should this processing be partitioned between software and hardware?
  – What is the underlying, inherent parallelism of the data (if any)? (sets the CE number)
  – How many CPU cycles and gates should be dedicated per data byte? (sets the processing effort/byte)
– Define the physical partitioning of the design:
  – How many boards?
  – What type and number of modules on a board?
  – Incorporate with custom logic?
– The latter two domains are within the realm of the CE
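The two processing questions (inherent parallelism and cycles per byte) combine into a first-cut CE count. A minimal sketch, assuming that work moved into FPGA gates simply lowers `cycles_per_byte`, that the stream splits perfectly across CEs, and that the 450 MHz PPC-405 is the only software resource per CE; `ce_count` and its 70% utilization default are hypothetical:

```python
# Hypothetical CE-count estimate for the "processing effort/byte" question.
CE_CLOCK_HZ = 450_000_000  # PPC-405 clock from the CEM block diagram


def ce_count(bytes_per_s: int, cycles_per_byte: int, usable_percent: int = 70) -> int:
    """Minimum number of CEs needed to process a data stream in software.

    Assumes the data splits evenly across CEs (perfect parallelism) and that
    only `usable_percent` of each CE's cycles are available for processing;
    70% is an illustrative default, not a measured figure.
    """
    if not 0 < usable_percent <= 100:
        raise ValueError("usable_percent must be in (0, 100]")
    required = bytes_per_s * cycles_per_byte * 100  # cycles/s, scaled by 100
    available = CE_CLOCK_HZ * usable_percent        # cycles/s per CE, scaled by 100
    return -(-required // available)                # integer ceiling division


if __name__ == "__main__":
    # Example: 288 Mbytes/sec at a hypothetical 10 cycles/byte.
    print(ce_count(288_000_000, 10))
    print(ce_count(288_000_000, 10, usable_percent=100))
```

For 288 Mbytes/sec at 10 cycles/byte the sketch asks for 10 CEs at 70% utilization, or 7 at 100%; imperfect parallelism or I/O overhead would raise the count.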
Typical Usage Patterns
[Figure: three arrangements of CEs and ROs]
– Many different types of devices; physically separated; processing/byte/device is high
– Homogeneous devices; perhaps physically separated; processing/byte is high
– Many different types of devices; physically separated; processing/byte/device is low