Readout Processing and Noise Elimination Firmware for the Fermilab Beam Loss Monitor System. Jinyuan Wu, C. Drennan, R. Thurman-Keup, Z. Shi, A. Baumbaugh and J. Lewis. Fermilab, April 2007.
The Digitizer Card for the Fermilab Beam Loss Monitor System. Beam loss input signals from ion chambers are integrated and digitized. Sliding sums are accumulated and compared with pre-loaded thresholds. An over-threshold condition in several places causes a beam abort, based on pre-defined settings. Beam loss signals are filtered and “de-rippled” for display purposes. The sequence is controlled by the “Seq128” block. [Block diagram: Ion Chamber Input, ADC (21 µs/sample), RAM; Immediate, Fast, Slow and Very Slow Sliding Sums, each compared (A>B) with Threshold I, F, S and V and feeding the Abort Logic; CIC Sums feeding the De-ripple Process; Seq128.]
The Problem: 3-Phase 60 Hz AC. Rectification noise from power supplies that use 3-phase 60 Hz AC is picked up by the input cables lying in the accelerator tunnel. [Figure: the noise at the ADC (21 µs/sample), shown in the time domain and in the frequency domain.]
Filter Functions: Sliding Sum and Cascaded Integrator Comb (CIC) Sum of 2nd Order. The CIC sum is a sliding sum of sliding sums. The frequency response of the CIC sum is a sinc^2(x) function, which has 2nd-order zeros and better stop-band suppression. [Figure: frequency responses of the two filter functions; labels: First, 360 Hz, Frequency, 21 µs/sample, 124 samples.]
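The relationship between the two filter functions can be sketched in a few lines of Python. This is only an illustration: the function names and the short window length K = 4 are assumptions made for the example (the slide quotes 124 samples). It shows that a sliding sum of sliding sums weights the input with a triangular window, which is the time-domain counterpart of the sinc^2 response.

```python
def sliding_sum(x, k):
    """Sum of the most recent k samples; samples before the start count as 0."""
    return [sum(x[max(0, n - k + 1): n + 1]) for n in range(len(x))]


def cic_sum(x, k):
    """Second-order CIC sum: a sliding sum of sliding sums."""
    return sliding_sum(sliding_sum(x, k), k)


# Impulse responses for a hypothetical window length K = 4.
K = 4
impulse = [1] + [0] * (2 * K)
box = sliding_sum(impulse, K)   # rectangular window
tri = cic_sum(impulse, K)       # triangular window
print(box)  # [1, 1, 1, 1, 0, 0, 0, 0, 0]
print(tri)  # [1, 2, 3, 4, 3, 2, 1, 0, 0]
```

The direct definition is used here on purpose, since it is the easiest form to check by hand; it is not how the firmware computes the sums.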
Filtering Works, But Only Partially. Noise components >360 Hz, the dominant portion, are filtered out by both filter functions. The CIC sum is much smoother than the sliding sum, but small signals are still buried under the 60 Hz and 180 Hz ripples. [Figure: sliding sum, CIC sum and signals.]
Why Not Filter Further? Filtering is an averaging process over many periods, and there is not much time after reset. The noise before the accelerator ramp and the noise after it have different amplitudes and shapes. A “De-Ripple” algorithm has been developed instead. [Figure label: Ramping.]
De-ripple Process (1.1): Waveform Extraction, Storage and Validation. The CIC sum is stored into the waveform buffer and accumulated for the waveform mean. [Diagram: Waveform Buffer Page 0 and Page 1, each with its Waveform Mean.]
De-ripple Process (1.2): Waveform Extraction, Storage and Validation. When the data shows good periodicity, the waveform becomes valid.
De-ripple Process (1.3): Waveform Extraction, Storage and Validation. If the data is non-periodic, the waveform becomes invalid.
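The extraction and validation steps might look roughly like the sketch below. The slides do not say how periodicity is judged, so the rule used here (the two buffer pages must agree sample by sample within a tolerance) is an assumption, as are the function name, the `tolerance` parameter and the made-up ripple data.

```python
def extract_waveform(cic, period, tolerance):
    """Store two consecutive periods of the CIC sum and check periodicity.

    Returns (waveform, mean, valid). The two-page comparison and the
    tolerance test are assumptions for this sketch, not the firmware's rule.
    """
    page0 = cic[:period]                 # waveform buffer page 0
    page1 = cic[period:2 * period]       # waveform buffer page 1
    mean = sum(page1) / period           # accumulated waveform mean
    # Assumed validity test: the newer page repeats the older one closely.
    valid = all(abs(a - b) <= tolerance for a, b in zip(page0, page1))
    return page1, mean, valid


ripple = [0, 3, 5, 3, 0, -3, -5, -3]           # one made-up ripple period
periodic = ripple * 2
disturbed = ripple + [v + 10 for v in ripple]  # second period is shifted

print(extract_waveform(periodic, 8, 1))        # valid waveform
print(extract_waveform(disturbed, 8, 1)[2])    # False: not periodic
```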
De-ripple Process (2): Waveform Subtraction. The stored waveform is subtracted from the CIC sum to give the de-rippled sum. The waveform mean is first subtracted from the stored waveform, so that the DC component is preserved in the final result. [Diagram: Waveform Buffer Page 0 and Page 1, each with its Waveform Mean, feeding two subtractions that produce the de-rippled sum.]
Results of De-ripple Process. Small signals that were otherwise hard to see now become visible. DC and very slow signals are also preserved.
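A minimal sketch of the subtraction step follows. It assumes that the stored waveform covers exactly one ripple period and that the CIC sum is phase-aligned with it starting at index 0; the function name and the sample values are made up for illustration.

```python
def de_ripple(cic, waveform, mean):
    """Subtract the mean-free stored waveform from the CIC sum, sample by sample."""
    period = len(waveform)
    return [v - (waveform[n % period] - mean) for n, v in enumerate(cic)]


waveform = [12, 15, 12, 9]            # made-up ripple riding on a DC level of 12
mean = sum(waveform) / len(waveform)  # 12.0
cic = waveform * 3
cic[5] += 2                           # small loss signal hidden in the ripple
out = de_ripple(cic, waveform, mean)
print(out)
```

In the output the ripple is gone, the DC level of 12 is kept, and the small signal at index 5 stands out as 14.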
Filter Implementation. Recursive != IIR: a Finite Impulse Response (FIR) filter can also be implemented recursively. [Table: FIR and Infinite Impulse Response (IIR) filters versus Recursive and Non-Recursive Implementation; entries: Possible, Yes, NO; annotation: Resource Friendly.] Sliding sum, non-recursive: s[n] = x[n] + x[n-1] + ... + x[n-K+1], which needs 124 memory fetches and 124 additions, and more operations for longer sum lengths. Sliding sum, recursive: s[n] = s[n-1] + x[n] - x[n-K], which needs 1 memory fetch and 2 add/sub operations regardless of the sum length.
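The two implementations of the sliding sum can be compared with a short sketch. The function names and the test data are assumptions for illustration; the recursion itself is s[n] = s[n-1] + x[n] - x[n-K], with samples before the start of the record taken as zero.

```python
def sliding_sum_direct(x, k):
    """Non-recursive form: re-add the last k samples for every output."""
    return [sum(x[max(0, n - k + 1): n + 1]) for n in range(len(x))]


def sliding_sum_recursive(x, k):
    """Recursive form: s[n] = s[n-1] + x[n] - x[n-K]."""
    out, s = [], 0
    for n, sample in enumerate(x):
        oldest = x[n - k] if n >= k else 0   # the single memory fetch
        s = s + sample - oldest              # one add, one subtract
        out.append(s)
    return out


data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
print(sliding_sum_recursive(data, 4))  # [3, 4, 8, 9, 11, 19, 17, 22, 22, 16]
```

Because the arithmetic is on integers, the recursive form gives exactly the same output as the direct form, and the impulse response stays finite.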
Recursive Implementation of CIC Sum. The non-recursive implementation (y[n] formed from x[n] through the taps h1, h2, ..., h[K]) needs 248 memory fetches, 248 multiplications and 248 additions, and more operations for longer sum lengths. The CIC sum constructed as a sliding sum of sliding sums, s[n] = s[n-1] + x[n] - x[n-K] and y[n] = y[n-1] + s[n] - s[n-K], needs 2 memory fetches, 0 multiplications and 4 add/sub operations for any sum length. The re-formulated CIC sum, u[n] = u[n-1] + x[n] - 2x[n-K] + x[n-2K] and y[n] = y[n-1] + u[n], uses the raw data buffer rather than a separate buffer.
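The equivalence of the two recursive forms can be checked with the sketch below. The function names are hypothetical, and samples before the start of the record are taken as zero. In the re-formulated version the term 2x[n-K] is a shift rather than a multiplication, so each output costs two fetches (x[n-K] and x[n-2K]) and four add/sub operations.

```python
def cic_cascade(x, k):
    """Sliding sum of sliding sums; needs a second buffer holding s[n-K]."""
    s_hist, out, s, y = [], [], 0, 0
    for n, sample in enumerate(x):
        s += sample - (x[n - k] if n >= k else 0)   # s[n] = s[n-1] + x[n] - x[n-K]
        y += s - (s_hist[n - k] if n >= k else 0)   # y[n] = y[n-1] + s[n] - s[n-K]
        s_hist.append(s)
        out.append(y)
    return out


def cic_reformulated(x, k):
    """Same output using only the raw data buffer."""
    out, u, y = [], 0, 0
    for n, sample in enumerate(x):
        mid = x[n - k] if n >= k else 0             # fetch x[n-K]
        old = x[n - 2 * k] if n >= 2 * k else 0     # fetch x[n-2K]
        u += sample - 2 * mid + old                 # u[n] = u[n-1] + x[n] - 2x[n-K] + x[n-2K]
        y += u                                      # y[n] = y[n-1] + u[n]
        out.append(y)
    return out


impulse = [1] + [0] * 8
data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
print(cic_reformulated(impulse, 4))  # [1, 2, 3, 4, 3, 2, 1, 0, 0]
print(cic_cascade(data, 3) == cic_reformulated(data, 3))  # True
```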
Process Sequencing. [Diagram: the processing blocks Sum1, Sum2, Sum3, Sum4, CIC1, CIC2, WF SUB and WF E,S,V for channels CH0 to CH3, arranged as a flat design and as sequenced designs.] A flat design is fast but uses a lot of logic elements. Sequencing the process saves logic elements significantly. A partially flat, partially sequenced design is sometimes a better arrangement in an FPGA.
BLM DC Process Sequencing. The processes that calculate the sliding sums and the CIC sums are fully sequenced. The de-ripple processor is flat along the process path, but it operates sequentially over the 4 channels. [Figure labels: Fully Sequencing, Partially Flat.]
FPGA Process Sequencing Options
Option | Program Type | Program Length (CLK cycles) | Reprogram | Resource Usage
Finite State Machine (FSM) | Fixed Wired | 10 | Hard | Small
Enclosed Loop Micro-Sequencer (ELMS) | Memory Stored Program | | Easy | Small
Microprocessor (MP) | Memory Stored Program | >1000 | Easy | Large
ELMS: Enclosed Loop Micro-Sequencer. [Block diagram: Program Counter addressing a ROM (128 x 36 bits) that outputs the Control Signals, with Reset and CLK inputs; Conditional Branch Logic; Loop & Return Logic + Stack.] PC + ROM is a good sequencer in an FPGA. Adding Conditional Branch Logic allows the program to jump back, as in microprocessors. Loop & Return Logic + Stack is a special feature of ELMS that supports FOR loops at the machine code level.
  PC  Label  Operation
             LD R1, #n
             LD R2, #addr_a
             LD R3, #addr_X
             LD R7, #
      BckA1  LD R4, (R2) INC R
             LD R5, (R3) INC R
             MUL R6, R4, R5
  0a  EndA1  ADD R7, R7, R6
  0b         DEC R1
  0c         BRNZ BckA1
ELMS: Detailed Block Diagram. [Block diagram; label: User Control Signals.] The Stack supports nested loops, up to 128 layers.
             FOR BckA1 EndA1 #n
             LD R2, #addr_a
             LD R3, #addr_X
             LD R7, #0
      BckA1  LD R4, (R2) INC R2
             LD R5, (R3) INC R3
             MUL R6, R4, R5
      EndA1  ADD R7, R7, R6
             LD R8, R7
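The loop and return logic can be modeled in software to see how a FOR instruction replaces the decrement and conditional branch. The sketch below is a behavioral toy, not the ELMS design: the ROM word encoding, the mnemonics LDI, LDM and MOV (standing in for the different uses of LD on the slide), the memory layout and the choice to emit no control word for FOR are all assumptions. The sequencer only produces addresses; a separate data-path function does the arithmetic.

```python
def run_sequencer(rom):
    """Yield the ROM addresses visited by a hypothetical ELMS-like sequencer.

    ("FOR", back, end, count) pushes a loop frame (count >= 1 assumed);
    every other word is handed to the data path as a control word.
    """
    stack = []  # loop frames: [back address, end address, passes left]
    pc = 0
    while pc < len(rom):
        word = rom[pc]
        if word[0] == "FOR":
            stack.append([word[1], word[2], word[3]])
            pc += 1
            continue
        yield pc                              # emit control signals for this word
        next_pc = pc + 1
        while stack and pc == stack[-1][1]:   # end of the innermost loop reached
            stack[-1][2] -= 1
            if stack[-1][2] > 0:
                next_pc = stack[-1][0]        # loop back, no conditional branch
                break
            stack.pop()                       # loop done; check an enclosing loop
        pc = next_pc


def execute(word, reg, mem):
    """Toy data path for the control words used in the example program."""
    op = word[0]
    if op == "LDI":                           # load immediate
        reg[word[1]] = word[2]
    elif op == "LDM":                         # load via pointer, then increment it
        reg[word[1]] = mem[reg[word[2]]]
        reg[word[2]] += 1
    elif op == "MUL":
        reg[word[1]] = reg[word[2]] * reg[word[3]]
    elif op == "ADD":
        reg[word[1]] = reg[word[2]] + reg[word[3]]
    elif op == "MOV":                         # register-to-register copy
        reg[word[1]] = reg[word[2]]


N = 3
ROM = [
    ("FOR", 4, 7, N),    #        FOR BckA1 EndA1 #n
    ("LDI", 2, 0),       #        LD R2, #addr_a
    ("LDI", 3, 3),       #        LD R3, #addr_X
    ("LDI", 7, 0),       #        LD R7, #0
    ("LDM", 4, 2),       # BckA1: LD R4, (R2) INC R2
    ("LDM", 5, 3),       #        LD R5, (R3) INC R3
    ("MUL", 6, 4, 5),    #        MUL R6, R4, R5
    ("ADD", 7, 7, 6),    # EndA1: ADD R7, R7, R6
    ("MOV", 8, 7),       #        LD R8, R7
]
mem = [1, 2, 3, 4, 5, 6]  # a = [1, 2, 3] at address 0, X = [4, 5, 6] at address 3
reg = [0] * 9
trace = []
for pc in run_sequencer(ROM):
    trace.append(pc)
    execute(ROM[pc], reg, mem)
print(trace)   # [1, 2, 3, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 8]
print(reg[8])  # 32
```

Keeping the loop frames on a stack is what allows a FOR placed inside another loop body to nest: the inner frame is pushed on each outer pass and popped when its count runs out.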
Software: Using a Spreadsheet as Compiler
What’s Good About ELMS: FOR Loops at Machine Code Level. In this example the looping sequence is known before entering the loop, yet a regular microprocessor treats the sequence as unknown. ELMS supports FOR loops with a pre-defined number of iterations at the machine code level. Execution time is saved, and the micro-complexities associated with conditional branches (branch penalty, pipeline bubbles, etc.) are avoided. [Figure: the two listings side by side, with the annotation 25%.]
Microprocessor (Conditional Branch):
             LD R1, #n
             LD R2, #addr_a
             LD R3, #addr_X
             LD R7, #0
      BckA1  LD R4, (R2) INC R2
             LD R5, (R3) INC R3
             MUL R6, R4, R5
      EndA1  ADD R7, R7, R6
             DEC R1
             BRNZ BckA1
The ELMS:
             FOR BckA1 EndA1 #n
             LD R2, #addr_a
             LD R3, #addr_X
             LD R7, #0
      BckA1  LD R4, (R2) INC R2
             LD R5, (R3) INC R3
             MUL R6, R4, R5
      EndA1  ADD R7, R7, R6
Conclusion. The de-ripple algorithm is a useful alternative method for eliminating low-frequency periodic noise. The ELMS is a handy sequence controller for FPGAs that uses a small amount of resources.
The End Thanks
What’s Good about ELMS: No ALU => Small Resource Usage. [Diagram: Princeton Architecture (Program Control and ALU sharing one Program/DATA Memory); Harvard Architecture (Program Control and ALU with separate Program Memory and DATA Memory); “Fermilab Architecture(?)” (Sequencer (ELMS) with Program Memory, Data Processor with DATA Memory).] The Princeton Architecture is more suitable at the system level, while the Harvard Architecture is better suited to the micro-structure level. Regular microprocessors cannot run a looped program without an ALU. The ALU takes a large amount of resources, yet it may not be used efficiently for data processing tasks in an FPGA. The ELMS can run nested-loop programs without an ALU, so further separation of program and data becomes possible and the ELMS is kept small.