A Unified Approach to Fast Digital Processing for Beam Dampers, Instrumentation, & Controls Bill Foster Beam Instrumentation Workshop May 6, 2004
A Digital Manifesto Or, Example Application: 3-coordinate Bunch-by-Bunch Beam Damper for Fermilab Main Injector Implemented on a Single Altera Stratix FPGA Five other applications using this same hardware
Once Upon a Time, there was a job called: “Audio Frequency Analog Engineer” Their products: Mixers Equalizers Crossover networks Reverbs Fuzz Boxes…. Bob Widlar, “inventor of the IC Op-Amp” and other analog gems
Nowadays, an “Audio Frequency Analog Engineer” is any high-school kid with a PC and SoundBlaster Card Their products: Mixers Equalizers Crossover networks Reverbs Fuzz Boxes Emulated on a PC PLUS: Synthesizers that can fool my ears Time compressors that squeeze 20% off of a song’s play time without altering the pitch. Real-time tone substitution makes even Leonard Cohen sing on key …try doing that with an op-amp!
What Unemployed the Audio Analog Engineers? ADC Sampling Rates and Accuracies exceeded requirements –Audio requirements set by human ear Digital Processing capability exceeded requirements at reasonable cost –2 GHz CPU executes 50,000 instructions per audio waveform sample
Once Upon a Time, there was a job called: “Low-Level Radio Frequency Analog Engineer” Their products: Mixers Equalizers Phase Shifters Down converters Phase-locked Loops Fermilab’s Booster Low-Level RF system as it exists today!
Nowadays, a “LLRF Analog Engineer” is (or should be) any old programmer with a fast digitizer and an FPGA Their products: Mixers Equalizers Phase Shifters Down converters PLLs Implemented in FPGA’s PLUS: Direct Digital Synthesis of complex RF waveforms Built-in system diagnostics Digital Reproducibility (&spares!) High Speed Serial Links Multi-user support
What Unemploys the Analog RF Engineers? ADC Sampling Rates and Accuracies exceed requirements ~ 4 samples per RF clock gives bunch-by-bunch phase and amplitude Digital Processing capability exceeded requirements at reasonable cost –FPGAs and DSP’s For ~100 MHz or less, it is: GAME OVER
Generic Hardware Concept for Accelerator Instrumentation & Control Monster FPGA Clock, control,... Minimal Analog Filter FAST ADC Cables from Tunnel Minimal Analog Filter FAST ADC FAST DAC CPU Bus VME/ VXI/ PCI/ PMC etc. OR SERIAL LINK INPUTS: BPM Stripline Pickup Resistive Wall Flying Wire PMT RF Fanback Kicker Monitor …etc. OUTPUTS: Stripline Kicker RF Fanout Analog Monitor …etc.
53 MHz, TCLK, MDAT,... All-Coordinate Digital Damper Monster FPGA Minimal Analog Filter FAST ADC Stripline Pickup Minimal Analog Filter FAST ADC MHz Stripline Kicker Power Amp Minimal Analog Filter FAST ADC Resistive Wall Monitor Broadband Cavity FAST DACs > 27 MHz FAST DACs Power Amp Transverse Dampers Identical X & Y Longitudinal (Z) Damper MHz CPU: VME/ VXI/ PCI/ PMC etc. OR Serial LINK
The Board Alexi Seminov, Sten Hansen, Bill Ashmanskas, Dennis Nicklaus, Hyejoo Kang…
Some Example Applications using this same basic hardware: 1) Universal Beam Position Monitors (BPMs) –Handles full variety of FNAL beam RF structure 2) Generic instrumentation readout “Scope” –ex: Flying Wire readout for arbitrary bunches 3) Beam Loading Compensation 4) Universal Beam Dampers / Beamline Tuner 5) Entire Low-Level RF system
Fast, High Precision Pipelined ADC’s AD6645: 14 Bits, 105 MHz AD9430: 12 Bits, 210 MHz AD12500: 12 Bits, 500 MHz (hybrid) Several : 8 bits, ~1-2 GHz (‘scopes) Private opinion: it appears that ADCs are about to fall off of Moore’s Law curve the same way that CPU’s have…
AD6645 Functional Block Diagram Two-Stage Pipelined ADC Internal Track & Hold Differential Analog Inputs
This ADC can sample 53 MHz signals at 4 samples per cycle to measure both In-Phase and Quadrature on each cycle
Board Layout for High-Speed ADCs is a Lot Easier Than it Used to Be LVDS signals eliminate digital noise –0.25V differential swing far quieter than TTL –Direct “glueless” interface to FPGAs Fast input op-amps and surface mount components with small parasitics –Front-end layout is not critical since it is physically small
Clock Distribution for ADCs is a Lot Easier Than it Used to Be Clock and Signal timing can be fixed ex post facto via FPGA firmware timing adjustments Some A/Ds & D/As have internal PLLs to reduce or eliminate effects of clock jitter FPGAs have high-quality clock distribution which can be used to drive external A/D & D/As FPGA clock distribution can challenge Dedicated Clock Distribution Chips (or not…L. Dolittle)
Q: What ADC Clock Speed is needed? A: 4x RF Bunch Frequency Minimum needed for bunch-by-bunch Phase and Amplitude measurement In frequency domain, 4x RF sampling measures both in-phase and quadrature components. For Fermilab’s 53 MHz RF => 212 MHz ADCs
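A minimal sketch of why 4 samples per RF period give both components, assuming an ideal sine wave with a constant offset. The function names and the sign convention for Q are illustrative assumptions, not the damper firmware.

```python
import math

def sample_rf_period(amplitude, phase_rad, offset=0.0):
    """Four samples of offset + A*cos(wt + phase) taken at wt = 0, 90, 180, 270 degrees."""
    return [offset + amplitude * math.cos(k * math.pi / 2 + phase_rad) for k in range(4)]

def iq_from_four_samples(s0, s1, s2, s3):
    """Samples half a period apart are subtracted, so any constant offset cancels."""
    i = (s0 - s2) / 2.0   # equals  A*cos(phase)
    q = (s1 - s3) / 2.0   # equals -A*sin(phase) with this sampling convention
    return i, q

samples = sample_rf_period(amplitude=1.5, phase_rad=math.radians(30), offset=0.2)
i, q = iq_from_four_samples(*samples)
amplitude = math.hypot(i, q)                    # vector sum sqrt(I**2 + Q**2)
phase_deg = math.degrees(math.atan2(-q, i))     # recovered phase in degrees
```

With fewer than 4 samples per period, the two differences above cannot both be formed within a single RF bucket.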
212 MHz Sampling of RWM Pulse Low-pass Filter Spreads signal +/-5ns in time so it will not be missed by ADC Filter Reduces ADC Dynamic Range requirement, since spike does not have to be digitized
212 MHz Sampling of Stripline Signal Roles of “Phase” and “Amplitude” signals are reversed from unipolar case.
Repetitive Waveform looks like simple sine wave, but contains bunch-by-bunch phase and amplitude “A - B” gives bunch-by-bunch “in-phase” signal Vector Sum sqrt(I**2 +Q**2) is insensitive to clock jitter “D - (C+E)/2” gives bunch-by-bunch “out-of-phase” or “quadrature” signal
Bunch-By-Bunch Phase vs. Turn Number Measured with MI Digital Damper Damper Output comes from derivative of individual bunch phase errors
Bunch-By-Bunch Intensity
Synchronous vs. Asynchronous ADC Sampling The choice is between –N*53 MHz beam phase locked sampling, or –Asynchronous sampling at a (possibly) lower rate Asynchronous sampling of a waveform will allow you to recover all the information, IF: –you know that the input is a pure sine wave, or –you know the input is repetitive (stored beam), or –the sampling rate is much higher than f_MAX My belief is, undersampling is just a bad idea…
The Perils of Undersampling a Single-Pass Beam If a single-pass beam does not have uniform bunch populations, the ADC input is NOT a good sine wave and an undersampled waveform can give an erroneous picture of the beam signal. The signal CAN be reconstructed with many passes of stored beam.
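A toy model of the point above, assuming (for illustration only) that the undersampled ADC lands on every 5th RF bucket of a 588-bucket ring. The stride and the function name are hypothetical; a real asynchronous clock would not land exactly on bucket centers.

```python
def buckets_seen(num_buckets, stride, num_turns):
    """Bucket indices hit when sampling every `stride`-th bucket over num_turns turns."""
    seen = set()
    for position in range(0, num_buckets * num_turns, stride):
        seen.add(position % num_buckets)
    return seen

single_pass = buckets_seen(num_buckets=588, stride=5, num_turns=1)
stored_beam = buckets_seen(num_buckets=588, stride=5, num_turns=5)

# Fraction of buckets whose population was ever looked at
coverage_single = len(single_pass) / 588
coverage_stored = len(stored_beam) / 588
```

In a single pass most buckets are never sampled, so a non-uniform fill is invisible. With stored beam the sample positions walk through the ring and eventually cover every bucket, provided the stride and the bucket count share no common factor.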
… dealing with this variety of beams would be painful in Analog…
Digital Filter looking at many samples can still extract individual bunch transverse positions
Advantages of Digital Processing Digital filters more reproducible (=>spares!) Inputs and Outputs clearly defined (& stored!) –filters can be developed & debugged offline Digital filter can also operate at multiple lower frequencies...simultaneously if desired. Re-use Standard hardware with new FPGA code –or same code with different filter coefficients
Conclusions on ADC Clock Rate A Bunch-by-Bunch processing system must sample the raw waveform at a minimum of 4x the Bunch frequency You can never be: –too rich –too thin –or have too many ADC samples
What is an FPGA? Reconfigurable Logic Array ~10^6 logic gates Pre-built logic subassemblies: “Megafunctions” –Multipliers/Accumulators –Multi-port RAMs –Gigabit serial links –Entire CPUs –Phase Locked Loops –Complex I/O pads More transistors than a Pentium Impressive Support software
XILINX And ALTERA Are the Industry Leaders
What is an FPGA Good At? Big Synchronous Arithmetic Pipelines –400 MHz multiply/accumulators, filters.. High-Speed Interface with Modern Parts –ADCs, DACs, Serial Links Built-in system diagnostics –“Digital Scope on every signal” Flexibility and Multiple Applications –Use one board design for many applications –Add features without hardware changes
SYNCHRONOUS PIPELINES When people say “Analog is simple”, they are often referring to the deterministic execution time (“propagation delay”). Analog circuits never fail to respond in time because they are off servicing an interrupt. FPGA synchronous pipelines provide dedicated logic which responds at a deterministic time. This captures a big advantage of Analog.
FPGA Programming Languages: Graphical vs. Text-mode Graphical Schematic Entry is useful for: – diagramming data flow – giving talks Text Mode features –Text is faster to enter and more concise –Can “diff” two files to see what’s changed –Code management systems can handle text well I PERSONALLY RECOMMEND TEXT MODE
FPGA Programming Languages: Proprietary vs. Industry Standard Proprietary languages often lock you into a single vendor. –I use one (Altera AHDL) anyway. The “industry standard” VHDL language –It is extremely verbose & repetitive –Translating AHDL into VHDL increases the text length by a factor of ~2. LIKE TRANSLATING A DOCUMENT TO FRENCH
Some Development Models How do you Download & Talk to this Board that you’ve just built? 1. FPGA Programming Cable 2. Crate Backplane Bus Access 3. Firmware Serial Port Model 4. On-Board CPU w/Ethernet 5. Compiled-in On-chip CPU w/ Ethernet
CPU Access to FPGA Registers Usually want ADDRESS/DATA R/W model for CPU access to Control Registers These “address and data busses” are synthesized in firmware Example (AHDL) for 32-bit read/write register: (Bus[],Outputs[]) = BUS_REG( ) WITH( ADDRESS = H“ ”, WIDTH = 32 );
Development Model #1: FPGA Programming Cable The programming cable needed to program the FPGA (usually through the PC printer port) can also be used for limited communication Not clear how useful this is for real-time response since it works through serial port driver Altera provides “compiled-in logic analyzer” which provides output that can be compared with simulation.
Development Model #2: Crate Backplane Bus Access Crate Backplane Bus connections to FPGA can provide CPU access to registers in internal address space of FPGA Internal Address Space is defined in firmware Requires many bits of bus buffers, etc. Be Careful of Backplane Noise (TTL)
Development Model #3: Serial Port to PC A firmware-defined Serial Port can be used for 2-wire communication with the COM port of a PC Terminal emulator can provide simple read/write access to the internal address space of the FPGA Can also connect to spreadsheets, etc. via Visual Basic access to the COM port
Development Model #4: On-Board CPU w/Ethernet “Postage Stamp” Ethernet CPU or homebuilt DSP can provide Ethernet and Web access to FPGA registers Firewire is an alternative Remote update of firmware possible NIM-like modules without need of crate backplane
Development Model #5: Compiled-in On-chip CPU with Firmware-defined Ethernet 1. High-end FPGAs have built-in or firmware-defined CPUs fast enough to support IP stack, Web Servers, etc. 2. These are available on Demo Boards 3. C-language programming of these is integrated into FPGA development environment (no new software!).
Adding a new ACNET Device Takes about 10 minutes from concept to Fast-Time Plot 1) Add register(s) to FPGA Firmware 2) Start Recompile (takes ~6 minutes) 3) Meanwhile, use DABBEL/D80 to define properties of new ACNET device 4) Download Firmware & Reboot Crate (~2 min.)
Application #1: Universal BPM (Beam Position Monitor) Measures position of each bunch on each pass around the ring with full-bandwidth FIR filter (R-L)/(R+L) for each bunch measurement. Multi-bunch averages available for lower noise –per batch, per turn, many turns, different bandwidths Multiple users can share hardware w/o conflicts –ADC is always active, FPGA stores data many ways Same Hardware OK for Booster, MI, RR, TeV, & beamlines.
FPGA Based “Universal BPM” Monster FPGA(s) 53 MHz, TCLK, MDAT,... Minimal Analog Filter ADC Split Plate Pickup #1 Minimal Analog Filter FAST DAC 14 CPU VME/ VXI/ PCI/ PMC Ethernet etc. Minimal Analog Filter Minimal Analog Filter 212 MHz Pickup #4 Analog Position Monitor Test Point (Optional) OR Modulation Output for Synchronous Lock-in Detection Technique ADC R L T B Serial Link to Real Time Orbit Control System
“Universal BPM” Application: Signal Processing Steps 1) Bandwidth-Limit input signal to ~53 MHz 2) 12 Bit Digitization at 212 MHz 3) FIR filter(s) to get single-bunch signal(s) 4) Sum & Difference of plate signals 5) (Difference / Sum) gives position 6) Linearization lookup table or polynomials 7) Bunch, Batch, Multiturn Averaging 8) “Scope Trace Buffers” on every signal Multiple users can be acquiring and filtering data multiple ways without conflicts Inside FPGA Front End
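Steps 4 to 6 above can be sketched as follows. The scale factor and the polynomial coefficients are placeholders chosen for illustration; real values would come from a pickup response map.

```python
def bpm_position_mm(right, left, scale_mm=1.0, poly=(0.0, 1.0)):
    """Position from two plate signals via difference over sum.

    scale_mm and poly are hypothetical calibration inputs:
    poly[k] multiplies ratio**k (linearization polynomial).
    """
    total = right + left
    if total == 0:
        raise ValueError("no beam signal: sum of plate signals is zero")
    ratio = (right - left) / total                          # steps 4 and 5
    linearized = sum(c * ratio ** k for k, c in enumerate(poly))   # step 6
    return scale_mm * linearized

low_intensity = bpm_position_mm(3.0, 1.0, scale_mm=20.0)
high_intensity = bpm_position_mm(6.0, 2.0, scale_mm=20.0)
```

Dividing by the sum makes the result independent of bunch intensity: doubling both plate signals leaves the reported position unchanged.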
Main Injector BPM Response Map Linearization can be done in FPGA or readout software J. Crisp
“Universal BPM” Signal Processing Step #7: Averaging and Filtering Many Types of averaging possible: Position Averaging over Bunches in a Batch Multi-Turn Averaging of Positions Multi-turn averaging of Raw Signals Fitting to betatron frequency (injection errors) - this gives info for β-function measurement Emulation of DDC chip functions Spectrum analysis of position & phase Different filters can be simultaneously active
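Two of the averaging modes listed above, sketched on a small table of positions indexed as positions_by_turn[turn][bunch]. The data layout and function names are assumptions made for this example.

```python
def batch_average(turn_positions, first_bunch, last_bunch):
    """Average position over the bunches of one batch on a single turn."""
    batch = turn_positions[first_bunch:last_bunch + 1]
    return sum(batch) / len(batch)

def multi_turn_average(positions_by_turn, bunch):
    """Average position of one bunch over all recorded turns."""
    return sum(turn[bunch] for turn in positions_by_turn) / len(positions_by_turn)

# Toy data: 2 turns, 3 bunches, positions in mm
positions_by_turn = [
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
]
batch_mm = batch_average(positions_by_turn[0], first_bunch=0, last_bunch=2)
bunch0_mm = multi_turn_average(positions_by_turn, bunch=0)
```

Both functions read the same stored samples without modifying them, which is the software analogue of several filters being active simultaneously on one ADC stream.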
MAIN INJECTOR VERTICAL BPM (8 Bits) DIGITAL DAMPER POSITION SIGNAL (Batch Average) 1mm
Single-Bunch BPM Measurement was tested by blowing out nearby bunches during Stacking Cycle
MAIN INJECTOR VERTICAL BPM (8 Bits) DIGITAL DAMPER POSITION FOR SINGLE 53 MHz BUNCH SINGLE-TURN (non-averaged) 1mm BPM Resolution for 212 MHz Digitization of Single 53 MHz Bunch
Multi-User Support FPGA CODE SUPPORTS: –31 different users on different machine cycles –Different averaging algorithms, simultaneously active –Each user can sit and observe a different single bunch –Different bunch frequencies on each cycle No User Interferences since Separate Dedicated Logic is used for each purpose
Application #2: Generic Instrumentation Readout Scope What we want in a “Generic ‘Scope”: 1) Ability to trigger on TCLK events, Beam Synch Events, analog threshold crossings of different channels, etc. 2) Multiple Users Sharing without conflicts - separate copies of trigger logic - separate buffers to store captured signals - separate filter algorithms run simultaneously 3) Common hardware & software among systems
Example Application of Generic ‘Scope: Flying Wire PMT Readout Monster FPGA 53 MHz, TCLK, MDAT,... Minimal Analog Filter FAST ADC PMT(s) in Tunnel Minimal Analog Filter FAST ADC DAC 14 CPU Bus VME/ VXI/ PCI/ PMC etc. Minimal Analog Filter FAST ADC Minimal Analog Filter FAST ADC 106 MHz Encoder Signals Motor Motor Drive
Example Application of Generic ‘Scope: Flying Wire PMT Readout Photomultiplier Tube (PMT) pulses presented to Analog filter to limit BW Summing circuits in FPGA give total PMT pulse height in narrow and wide gates Individual gates report signals for 36x36 or more bunches, average over many turns, etc. FPGA can be used to control & trigger the fly Raw PMT pulses can be simultaneously looked at via “multi-user” hooks
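A sketch of the narrow-gate and wide-gate sums described above, assuming digitized PMT samples in a list and gates centered on a known sample index. Gate widths and the sample values are made up for the example.

```python
def gate_sum(samples, center, half_width):
    """Sum ADC samples within +/- half_width of `center`, clipped to the record."""
    start = max(0, center - half_width)
    stop = min(len(samples), center + half_width + 1)
    return sum(samples[start:stop])

pmt_pulse = [0, 0, 1, 4, 9, 4, 1, 0, 0]    # toy digitized pulse, peak at index 4
narrow = gate_sum(pmt_pulse, center=4, half_width=1)
wide = gate_sum(pmt_pulse, center=4, half_width=3)
```

Comparing the two sums gives a quick check on pulse tails or baseline: for a clean pulse the wide gate adds little beyond the narrow one.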
Application #3: God’s Own Beam Loading Compensation 1) Digital Pipeline to reproduce I&Q signals from RW bunch monitor with N-turn delay. (N=1...1/ν_S) 2) Digital filters for transients and synchrotron osc. Inputs: Resistive-wall monitor & RF fanback. Digitization: bunch-by-bunch I & Q signals Outputs: I&Q to damper cavity, or LLRF –frequency swing issues for LLRF drive –Antiproton vs. Proton timing
Application #4: “Universal Damper” A single FPGA has enough capability to do damping calculations for X,Y, & longitudinal. Digital Filter which operates on I & Q signals from individual 53 MHz bunches can also be reprogrammed to operate at lower frequencies. Frequency swing during acceleration introduces some timing complications, which can be fixed by components (FIFOs, Dual-Port RAMS) inside of FPGA.
“Universal-Damper” Application: Signal Processing Steps 1) Bandwidth-Limit input signal to ~53 MHz 2) 12 Bit Digitization at 212 MHz 3) FIR filter to get single-bunch signal 4) Sum & Difference of plate signals 5) Multi turn difference filter (FIR) w/delay 6) Pickup Mixing for correct Betatron Phase 7) Bunch-by-bunch gain, dead band etc. 8) Timing Corrections for Frequency Sweep 9) Pre-Emphasis for Kicker Power Amp 10) Power Amp for Kicker Inside FPGA Front End Buy
Longitudinal Beam Instability in FMI Driven by cavity wake fields within bunch train First Bunch ~ OK 7th Bunch Trashed Occurs with 7 bunches filled (out of 588) Prevents low emittance bunch coalescing
Longitudinal Damper FPGA Logic ADC Resistive Wall Pickup 14 8-Turn FIR calculates derivative of bunch phase Bunch-by-Bunch Digital Phase Detector +THRESH -THRESH Multi-Turn Memory Bunch Intensity FIR Filter +THRESH +/- KICK to DAMPER Individual Bunches are kicked + or – depending on whether they are moving right or left in phase option (currently unused)
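The logic in this diagram can be sketched as below. The least-squares slope coefficients, the threshold values, and the sign of the kick are all assumptions for illustration; the actual firmware coefficients are not given on the slide.

```python
# 8-turn least-squares slope filter (assumed form): index 0 is the oldest turn.
NUM_TURNS = 8
_CENTER = (NUM_TURNS - 1) / 2.0
_NORM = sum((k - _CENTER) ** 2 for k in range(NUM_TURNS))
SLOPE_COEFFS = [(k - _CENTER) / _NORM for k in range(NUM_TURNS)]   # sum to zero

def phase_derivative(last_phases):
    """Estimated phase change per turn from the last 8 phase readings of one bunch."""
    if len(last_phases) != NUM_TURNS:
        raise ValueError("need exactly 8 turns of phase data")
    return sum(c * p for c, p in zip(SLOPE_COEFFS, last_phases))

def kick_sign(last_phases, intensity, phase_thresh, intensity_thresh):
    """Return +1, -1 or 0: kick against the direction of phase motion."""
    if intensity < intensity_thresh:
        return 0                      # empty or weak bucket: no kick
    slope = phase_derivative(last_phases)
    if slope > phase_thresh:
        return -1
    if slope < -phase_thresh:
        return +1
    return 0                          # inside the dead band

drifting = [0.1 * turn for turn in range(8)]    # phase advancing 0.1 per turn
slope = phase_derivative(drifting)
kick = kick_sign(drifting, intensity=100, phase_thresh=0.05, intensity_thresh=10)
```

Because the coefficients sum to zero, a constant phase offset produces no kick; only motion in phase does.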
FPGA Code for Universal Damper (8-turn Filter)
Transverse Damper 3 - Turn Filter Damper kick is calculated from single BPM position reading on 3 successive turns. Arbitrary Betatron Phase of Kicker can be accommodated
HERA-P Damper uses a 3-turn Digital FIR Filter Digital Bunch by 96ns Spacing Immediate digitization following peak detection Klute, Kohaupt et al. EPAC ‘96
3 Turn Filter Coefficients Damper kick is weighted sum of beam positions on the 3 previous turns. 3 Filter Coefficients Uniquely Determined by: –System Gain –Betatron Phase Desired at Kicker –Constraint that sum of filter coefficients = 0 (so that filter does not respond to DC offsets.)
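One way to see that the three constraints fix the three coefficients is to require the filter response at the betatron frequency to equal gain times exp(i*phase), plus the zero-sum condition. This is a sketch under that assumption, with coeffs[k] applied to the position k turns back; the tune 0.43 and the 90 degree phase are example inputs, and the function names are hypothetical.

```python
import cmath
import math

def three_turn_coefficients(gain, kick_phase_rad, fractional_tune):
    """Solve for (c0, c1, c2) with c0 + c1 + c2 = 0 and H(tune) = gain*exp(i*phase)."""
    mu = 2.0 * math.pi * fractional_tune          # betatron phase advance per turn
    # Substitute c0 = -(c1 + c2); real and imaginary parts give a 2x2 system.
    a11, a12 = math.cos(mu) - 1.0, math.cos(2 * mu) - 1.0
    a21, a22 = -math.sin(mu), -math.sin(2 * mu)
    b1 = gain * math.cos(kick_phase_rad)
    b2 = gain * math.sin(kick_phase_rad)
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("tune too close to an integer or half integer")
    c1 = (b1 * a22 - a12 * b2) / det
    c2 = (a11 * b2 - a21 * b1) / det
    return -(c1 + c2), c1, c2

def filter_response(coeffs, fractional_tune):
    """Complex response of the FIR filter to an oscillation at the given tune."""
    mu = 2.0 * math.pi * fractional_tune
    return sum(c * cmath.exp(-1j * k * mu) for k, c in enumerate(coeffs))

coeffs = three_turn_coefficients(gain=1.0, kick_phase_rad=math.pi / 2, fractional_tune=0.43)
response = filter_response(coeffs, 0.43)
```

The determinant vanishes at integer and half-integer tunes, where three successive readings of one pickup carry no independent phase information.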
Frequency Sweep Issues Machines with frequency sweep (Booster!) must adjust ADC input clock and DAC output clock phases as frequency sweeps. This can be generated with Phase-locked loops and delay-locked loops present inside FPGAs. This requires access to both the RF clock, and a cable-delayed version of the RF clock, as timing references. One-turn digital delay using FIFOs in FPGAs. Same hardware can be used for Booster thru Tevatron.
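A software model of a one-turn FIFO delay, assuming one entry per RF bucket so that the FIFO depth equals the number of buckets per turn. The class name is hypothetical, and a depth of 4 is used only to keep the example short.

```python
from collections import deque

class OneTurnDelay:
    """Fixed-depth FIFO: each pushed sample comes back out `depth` pushes later."""

    def __init__(self, depth, fill=0):
        self._fifo = deque([fill] * depth, maxlen=depth)

    def push(self, sample):
        oldest = self._fifo[0]        # sample written one turn ago
        self._fifo.append(sample)     # maxlen discards the oldest entry
        return oldest

delay = OneTurnDelay(depth=4)
delayed = [delay.push(x) for x in range(1, 9)]
```

Counting the delay in bucket clocks rather than nanoseconds is what lets it track the revolution period automatically as the RF frequency sweeps.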
RF Clocking with acceleration Equal Length Cable Fanout so Beam Sees Same RF Phase at all Cavities as RF Frequency Sweeps During Acceleration # of Clock Cycles per Turn is harmonic number h
ADC Clocking during frequency sweep Round-Trip Cable Delay on ADC Clock ensures ADC Clock & Beam Input Stay in Phase as Beam Accelerates May need additional phase adjustment to track phase jumps at transition, etc.
DAC Clocking during frequency sweep Propagation Delay: CK -> DAC -> Cable -> Kicker should match RF Fanout Delay so Kick Stays in Phase as Beam Accelerates
Generic Damper tolerating frequency sweep FIFO needed due to phase shifts between DAC and ADC clocks as beam accelerates All Logic Inside FPGA
Damper Output Precompensation 53 MHz Bunch-by-bunch kicker wants a 19ns square pulse with a good flat top to minimize timing sensitivity Power Amp and Cable have non-ideal response for square wave (ringing and tail) DAC operating at 424 MHz is used to produce specially sculpted pulse necessary to convince Amp & Cable to make a nice flat pulse at kicker.
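A sketch of the pre-emphasis idea, assuming the amplifier and cable can be modeled as a single-pole low-pass filter. That model and the value alpha = 0.4 are assumptions for illustration; a real response with ringing would need a longer FIR. At 424 MHz there are 8 DAC samples per 53 MHz bucket.

```python
def amp_model(drive, alpha):
    """Assumed amp + cable response: y[n] = alpha*x[n] + (1 - alpha)*y[n-1]."""
    output, y = [], 0.0
    for x in drive:
        y = alpha * x + (1.0 - alpha) * y
        output.append(y)
    return output

def pre_emphasize(target, alpha):
    """Inverse of amp_model: the drive that makes the output equal `target`."""
    drive, previous = [], 0.0
    for t in target:
        drive.append((t - (1.0 - alpha) * previous) / alpha)
        previous = t
    return drive

# Flat-top pulse: 8 DAC samples high, with quiet samples before and after
target = [0.0] * 2 + [1.0] * 8 + [0.0] * 4
drive = pre_emphasize(target, alpha=0.4)
at_kicker = amp_model(drive, alpha=0.4)
uncorrected = amp_model(target, alpha=0.4)
```

The drive overshoots on the leading edge and undershoots on the trailing edge, which is the "sculpted" shape; the uncorrected pulse rises slowly and leaves a tail into the next bucket.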
Echotek Card Used for Initial Dampers 105 MSPS AD6645 Echotek Board Originally Built to SLAC Design Specification 65MHz DDC version to be used for RR BPM upgrade 105 MHz version (with DAC “daughter card”) used for Dampers 212 MHz DAC Daughter Card (S. Hansen/ PPD)
Butchering the Echotek Board Scorched-Earth FPGA rewrite (GWF) –~65 pages of firmware since Jan ‘… 212 MHz DAC “Daughtercard” –Sten Hansen & T. Wesson (PPD) –3 channels for X,Y,Z 212 MHz Output FIR (W. Schappert, RFI) –Pre-emphasis compensation for analog outputs –Prototype for 424 MHz output on final board Input Buffer Amp/Splitter Box (Brian Fellenz,RFI)
Multi-batch w/o and with transverse dampers [Figure: two panels, “with dampers” and “w/o dampers”, for 1 to 11 Booster turns, annotated with X and Y machine parameters at 8.9 GeV/c and 11.7 GeV/c]
Pushing up the intensity… [Figure: Transverse Dampers ON; beam extracted at 0.65 s; annotations at 8.9 GeV/c: X = 26.43, Y = 25.47 and at 11.7 GeV/c: X = 26.36, Y = 25.46]
What if… I turn dampers off? [Plot: intensity, horizontal, and vertical traces, with the point marked “turn dampers off”]
DON’T TELL BOB MAU (or the rest of the green-yellow color blind operators) THAT WE DID THIS
Filter for Undamped, Damped, and Anti-Damped Bunches
Blowing Selected Bunches out of the Machine (in X,Y, or both) Neutrino Communications! … …
Application #5: “LLRF System on a Chip” The single FPGA on the damper board has 12 Phase-locked loops and enough capability to synthesize all the signals needed for a complete LLRF system. The Damper Board has already been used (in a simple way) to drive the FNAL Debuncher Ring LLRF system. A more ambitious (but achievable) goal is to replace the entire Booster LLRF with a Damper.
MOTIVATION Booster Low-Level RF. The Final Frontier.
Booster Low-Level RF
2. RPOS 3. TCLK 1. WCM 5. MI RF 4. BDOT A&B DRIVE OUT Notching and Cogging
Booster LLRF External Connections ~5 Inputs: 1.Wall-Current Monitor (Phase) 2.Transverse Pickup (RPOS) (BNL Uses two…) 3.Start Pulse (TCLK) 4.BDOT (Low bandwidth… replace w/lookup?) 5.MI AA Marker (Phase lock & notch cogging) Two Outputs: Cavity A&B Drives (Optional?) Beam Clock Output
TCLK, 53 MHz, MI AA, MDAT,... Digital Booster LLRF Concept Monster FPGA DDS Beam Synched Clock MHz (4x Booster RF) Minimal Analog Filter FAST ADC Wall Current Monitor (PHASE) FAST DAC 12 ETHERNET Minimal Analog Filter FAST ADC BPM Minimal Analog Filter FAST ADC 12 RPOS [crystal] 400 MHz FAST DAC “A” Drive FAST DAC “B” Drive 12
CONCLUSIONS Fast ADCs and Huge FPGAs are revolutionizing Accelerator Instrumentation The same basic hardware can perform a large number of Instrumentation & Control functions A good first application of this technology is the 3-coordinate bunch-by-bunch beam damper