
1 BittWare Overview March 2007

2 Agenda Corporate Overview Hardware Technical Overview
Software Technology Overview Demo

3 Who is BittWare? A leading COTS signal processing vendor, focused on providing the "Essential building blocks" (DSPs, FPGAs, I/O, SW tools, IP, & integration) that our customers can use to build "Innovative solutions"

4 BittWare Corporate Overview
Private company founded in 1989 by Jim Bittman (hence the spelling) Essential building blocks for innovative signal processing solutions Focused on doing one thing extremely well #2 in recognition for DSP boards (source: VDC Merchant Board Survey 2004) Committed to providing leading-edge, deployable products, produced with timely & consistently high quality Tens of thousands of boards shipped Hundreds of active customers Financially strong: profitable & growing Headquartered in Concord, New Hampshire, USA Engineering/sales offices in: Belfast, Northern Ireland (UK) (formerly EZ-DSP, acquired Sept. 2004); Leesburg, Virginia (Washington DC); Phoenix, Arizona 15 international distributors representing 38 countries

5 BittWare’s Building Blocks
High-end Signal Processing Hardware (HW) Altera FPGAs & TigerSHARC DSPs High-Speed I/O Board formats: CompactPCI (cPCI), PMC, PCI, VME, Advanced Mezzanine Card (AMC or AdvancedMC) Silicon & IP Framework SharcFIN ATLANTiS Development Tools BittWorks Trident Systems & Services

6 BittWare Business Model & Markets
BittWare provides essential building blocks for innovative signal processing solutions at every stage of the OEM life cycle Application-specific products: system integration; custom FPGA design (interfacing, processing); tailored signal processing boards; specialized/custom I/O; application software integration/implementation; technology & intellectual property licensing COTS signal processing HW: Altera FPGAs; TigerSHARC DSPs; high-performance I/O; development/deployment formats (PCI, PMC, cPCI, VME, AdvancedMC/AMC) Silicon & IP frameworks: SharcFIN; ATLANTiS Development tools: BittWorks tools; function libraries; Trident MP-RTOE Markets: • Defense/Aerospace • Communications • High-End Instrumentation • Life Sciences

7 Hardware Technology Overview
Hybrid Signal Processing T2 Family SharcFIN ATLANTiS T2 Boards (PCI, PMC, cPCI, VME) GT and GX Family FINe GT Boards (cPCI, VME) GX Boards (AMC, VME)

8 Hybrid Signal Processing Concept

9 Hybrid Signal Processing Architecture

10 BittWare’s T2 Board Family
TigerSHARC multiprocessing boards for ultra-high performance applications, using a common architecture across multiple platforms and formats Clusters of 4 ADSP-TS201S up to 600MHz 14,400 MFLOPS per cluster Xilinx Virtex-II Pro FPGA interface/coprocessor ATLANTiS™ Architecture: up to 8.0 GB/sec I/O 2 Links per DSP 125MB/sec each routed via FPGA to DIO/RocketIO SERDES Ring of link ports interconnected within cluster SharcFIN ASIC (SFIN-201) providing: 64-bit, 66 MHz PCI bus 8MB Boot Flash FPGA Control Interface PMC+ expansion site(s) Large shared SDRAM memory (up to 512MB)

11 T2 Architecture Block Diagram
ATLANTiS FPGA implements link routing; configured & controlled via SharcFIN; accessible via TigerSHARCs and host; can also be used for pre/post-processing
8 full-duplex link ports from DSPs to FPGA (2 from each DSP): each link provides 125 MB/sec transmit and 125 MB/sec receive; total link I/O bandwidth = 2.0 GB/sec
Basic architecture is the same as before (HH & TS) except that the two I/O links per DSP are routed (transferred) via the ATLANTiS FPGA
SharcFIN-201 bridge provides a powerful, easy-to-use PCI/host command & control interface (64-bit, 66 MHz PCI local bus)
Up to 3 separate 64-pin DIO (digital I/O) ports can be used to implement link ports, parallel buses, and/or other interconnects
8 channels of RocketIO at 2.5 GHz each; each channel provides ~250 MB/sec each way; total SerDes I/O bandwidth is 4.0 GB/sec; connected via two 4x Infiniband-type HW connectors, or backplane
(Diagram also shows: four TS201 DSPs, SDRAM (SO-DIMM, up to 512MB), boot flash on an 8-bit bus, 8 interrupts & 8 flags, and the 64-bit, 83.3 MHz cluster bus)

12 SharcFIN 201 Features 64/66MHz PCI bus master interface (rev. 2.2)
528MB/sec burst; 460MB/sec sustained writes (SF to SF); 400MB/sec sustained reads (SF to SF) Cluster bus interface to 83.3MHz Access DSP internal memory & SDRAM from PCI 2 independent PCI bus-mastering DMA engines 6 independent FIFOs (2.4KB total): 2 for PCI target to/from DSP DMA (fly-by to SDRAM); 2 for PCI target to/from DSP internal memory; 2 for PCI bus-mastering DMA to/from DSP DMA General-purpose peripheral bus: 8 bits wide, 22 address bits, 16MB/sec; reduces cluster bus loading, increasing cluster bus speed; accessible from DSP cluster bus & PCI bus Flash interface for DSP boot & non-volatile storage I2O V1.5 compliant I2S serial controller Programmable interrupt & flag multiplexer: 10 inputs, 7 outputs; 1 input/1 output dedicated to PCI Extensive SW support via BittWorks HIL & DSP21K

13 SharcFIN-201 Block Diagram

14 What is ATLANTiS? A Generic FPGA Framework for I/O, Routing & Processing An I/O routing device in which every I/O can be dynamically connected to any other I/O! Like a Software programmable ‘cable’ – but better! ATLANTiS provides communication between the TigerSHARC link ports and all other I/Os connected to the FPGA/Board Off-board I/O defined by board architecture Communication can be point-to-point, or broadcast to various outputs Devices can be connected or disconnected as requirements dictate w/o recompiling or changing cables A configurable FPGA Pre/Post/Co-Processing engine Standard IP blocks Customer/Custom developed blocks

15 T2 ATLANTiS Detail Diagram
External I/O & connectors dependent on specific board implementation

16 8 x 8 ATLANTiS Switch Diagram

17 Other Major ATLANTiS Components

18 ATLANTiS Put Together *Links, DIO, & SerDes are now routed by Switch

19 How is ATLANTiS Used? FPGA Configuration; Run-Time Set-up and Control
1) BittWare Standard Implementations (Loads) Works out-of-the-box (doesn’t require any FPGA design capabilities) Fixed interfaces & connections define switch I/Os Variety of I/O configuration options are available with boards 2) Developer’s kit Fully customizable (by BittWare and/or end user) All component cores in kit Requires FPGA Development Tools & design capabilities Run-Time Set-up and Control 1) Powerful, easy to use GUI (Navigator) Set up for any and all possible routings 2) Use DSP or Host to program Control Registers Initial configuration Change routing at any time by re-programming Control Registers

20 ATLANTiS Configurator

21 T2 Board Family T2PC: Quad PCI TigerSHARC Board
T2PM: Quad PMC TigerSHARC Board T26U: Octal 6U cPCI TigerSHARC Board T2V6: Octal 6U VME TigerSHARC Board

22 T2-PCI Features One cluster of four ADSP-TS201S TigerSHARC® DSP processors running at 600 MHz each; 24 Mbits of on-chip SRAM per DSP; static superscalar architecture; fixed- or floating-point operations; 14.4 GFLOPS (floating point) or 58 GOPS (16-bit) of DSP processing power Xilinx Virtex-II Pro FPGA interface/coprocessor; ATLANTiS architecture: up to 4.0 GB/sec I/O; eight external links, 250MB/sec each, routed via Virtex-II Pro RocketIO SerDes transceivers, PMC+, and DIO headers; two link ports per DSP dedicated to interprocessor communications Sharc®FIN (SFIN201) 64/66 PCI interface PMC site with PMC+ extensions for BittWare's PMC+ I/O modules 64 MB-512 MB SDRAM 8 MB FLASH memory (boots DSPs & FPGA) Complete software support, including remote control and debug, support for multiple run-time and host operating systems, and optimized function libraries Standalone operation

23 T2PC Block Diagram
(Block diagram: four TS201 DSPs with SDRAM (SO-DIMM, up to 512MB) on a 64-bit, 83.3 MHz cluster bus; SharcFIN SF201 with boot flash on an 8-bit bus; ATLANTiS Virtex-II Pro FPGA with 64-signal DIO header, 20-signal J4 DIO headers, and 8-channel RocketIO SerDes; PMC+ site; 64-bit, 66 MHz PCI local bus through a PCI-PCI bridge to the PCI connector; JTAG header, external power, 8 interrupts & 8 flags)

24 T2PM Features One cluster of four ADSP-TS201S TigerSHARC® DSP processors running at up to 600 MHz each; 24 Mbits of on-chip SRAM per DSP; static superscalar architecture; fixed- or floating-point operations; 14.4 GFLOPS (floating point) or 58 GOPS (16-bit) of DSP processing power Xilinx Virtex-II Pro FPGA interface/coprocessor; ATLANTiS architecture: up to 4.0 GB/sec I/O; eight external links, 250MB/sec each, routed via Virtex-II Pro RocketIO SerDes transceivers, PMC+, and DIO header; two link ports per DSP dedicated to interprocessor communications Sharc®FIN (SFIN201) 64/66 PCI interface PMC format with BittWare's PMC+ extensions 64 MB-256 MB SDRAM 8 MB FLASH memory (boots DSPs & FPGA) Complete software support, including remote control and debug, support for multiple run-time and host operating systems, and optimized function libraries Standalone operation

25 T2PM Block Diagram
(Block diagram: four TS201 DSPs with SDRAM (up to 256MB) on a 64-bit, 83.3 MHz cluster bus; SharcFIN SF201 with boot flash on an 8-bit bus; ATLANTiS Virtex-II Pro FPGA with J4 PMC+ connector and 8-channel RocketIO SerDes; 64-bit, 66 MHz PCI local bus to the J1-J3 PMC connectors; optional front-panel JTAG header; 8 interrupts & 8 flags)

26 T26U cPCI Features Two clusters of four ADSP-TS201S TigerSHARC® DSP processors (8 total) running at 500 MHz each; 24 Mbits of on-chip SRAM per DSP; static superscalar architecture; fixed- or floating-point operations; 24 GFLOPS (floating point) or 96 GOPS (16-bit) of DSP processing power Two Xilinx Virtex-II Pro FPGA interface/coprocessors; ATLANTiS architecture: up to 6.0 GB/sec I/O; sixteen external links, 250MB/sec each, routed via Virtex-II Pro RocketIO SerDes transceivers, PMC+, and DIO (cross-cluster); two link ports per DSP dedicated to interprocessor communications Sharc®FIN (SFIN201) 64/66 PCI interface Two PMC sites with PMC+ extensions for BittWare's PMC+ I/O modules 128 MB-512 MB SDRAM 16 MB FLASH memory (boots DSPs & FPGAs) Complete software support, including remote control and debug, support for multiple run-time and host operating systems, and optimized function libraries Standalone operation

27 T26U Block Diagram
(Block diagram: two mirrored clusters, A and B, each with four TS201 DSPs, SDRAM (up to 256MB), a SharcFIN SF201 with boot flash on an 8-bit bus, and an ATLANTiS FPGA with high-speed SerDes (4 RocketIO channels), J4 PMC+ site, and 64-signal rear-panel DIO; each cluster on its own 64-bit, 83.3 MHz cluster bus; PCI-PCI bridges join the 64-bit, 66 MHz PCI local buses to the cPCI 64/66 backplane; 8 interrupts & 8 flags per cluster)

28 T2 6U VME/VXS Features Two clusters of four ADSP-TS201S TigerSHARC® DSP processors (8 total) running at 500 MHz each; 24 Mbits of on-chip SRAM per DSP; static superscalar architecture; fixed- or floating-point operations; 24 GFLOPS (floating point) or 96 GOPS (16-bit) of DSP processing power Two Xilinx Virtex-II Pro FPGA interface/coprocessors; ATLANTiS architecture: up to 8.0 GB/sec I/O; sixteen external links, 250MB/sec each, routed via Virtex-II Pro RocketIO SerDes transceivers, PMC+, and DIO (cross-cluster); two link ports per DSP for the interprocessor ring Sharc®FIN (SFIN201) 64/66 PCI interface Tundra TSI-148 PCI-VME bridge with 2eSST support VITA-41 VXS switched-fabric interface PMC site with PMC+ extensions for BittWare's PMC+ I/O modules 128 MB-512 MB SDRAM 16 MB FLASH memory (boots DSPs & FPGAs) Complete software support, including remote control and debug, support for multiple run-time and host operating systems, and optimized function libraries Standalone operation

29 T2V6 Block Diagram
(Block diagram: clusters A and B, each with four TS201 DSPs, SDRAM (up to 256MB), a SharcFIN SF201 with boot flash on an 8-bit bus, and an ATLANTiS FPGA with high-speed SerDes; 8 RocketIO channels to VXS/P0 and 4 to factory options; J4 PMC+ site; 32 P2 user pins; 64-bit, 83.3 MHz cluster buses; 64-bit, 66 MHz PCI local bus through a VME-PCI bridge to VME64/2eSST; JTAG header; 8 interrupts & 8 flags)

30 T2V6 Heat Frame - Transparent

31 T2V6 Heat Frame

32 T2V6 Thermal Model

33 BittWare Levels of Ruggedization

34 Hardware Technology Overview
PMC+ Extensions Barracuda High-Speed 2-ch ADC Tetra High-Speed 4-ch ADC

35 BittWare PMC+ Extensions
BittWare’s PMC+ boards are an extension of the standard PMC specification (user- defined J4 connector) Provides tightly coupled I/O and processing to BittWare’s DSP boards: Hammerhead Family 4 links, Serial TDM, flags, irqs, reset, I2C TS Family 4 links, flags, irqs, reset, I2C T2 Family 64 signals, routed as 32 diff pairs to ATLANTiS Standard use is 4 links, plus flags and irqs Can be customized for 3rd party PMCs

36 Barracuda PMC+ Features
2-channel 14-bit A/D, 105 MHz (AD6645); 78 dB SFDR; 67 dB SNR (real-world in-system performance); AC (transformer) or DC (op-amp) coupled options 64-bit, 66 MHz bus-mastering PCI interface via SharcFIN 64 MB-512 MB SDRAM for large snapshot acquisitions Virtex-II 1000 FPGA, reconfigurable over PCI, used for A/D control and data distribution, plus configurable preprocessing of high-speed A/D data, such as digital filtering, decimation, digital down-conversion, etc. Developer's kit available with VHDL source code Optional IP cores and integration from 3rd parties for DDR/DDC/SDR/comms applications Plethora of other IP cores available PMC+ links (4) in FPGA, configurable for use with Hammerhead or Tiger PMC+ carrier boards Internal/external clock and triggering Optional oven-controlled oscillator/high-stability clock Onboard programmable clock divider & decimator Large snapshot acquisition to SDRAM (4K-256M samples): 1 channel at 105 MHz or 2 channels at 75 MHz Continuous acquisition: 2 channels at 105 MHz to TigerSHARC links, or 1 channel at 105 MHz (or lower, system dependent) to PCI

37 Barracuda PMC+ Block Diagram

38 Tetra PMC+ Features 4 channel 14 bit A/D, 105 MHz (AD6645)
78 dB SFDR; 67 dB SNR (real-world in-system performance) DC (op-amp) coupled 32 bit, 66 MHz bus mastering PCI interface via SharcFIN Cyclone-II 20/35/50 FPGA reconfigurable over PCI used for A/D control and data distribution configurable preprocessing of high speed A/D data, such as digital filtering, decimation, digital down conversion, etc. Developer’s kit available with VHDL source code Optional IP cores and integration from 3rd Parties including DDC PMC+ links (4) in FPGA configurable for use with TigerSHARC/ATLANTiS Internal/external clock and triggering Can source clock for chaining Onboard programmable clock divider & decimator

39 Tetra PMC+ (TRPM) Block Diagram

40 Hardware Technology Overview
New FINe New ATLANTiS

41 FINe Host Interface Bridge
Host/Control Side (Control Plane) Signal Processing Side (Data Plane)

42 New ATLANTiS - Putting it all Together
FINe

43 New Product Families B2 Family B2AM GT Family GT3U-cPCI
GTV6-Vita41/VXS GX Family GXAM

44 B2AM Features Full-height, single wide AMC (Advanced Mezzanine Card)
ATLANTiS/ADSP-TS201 Hybrid Signal Processing cluster Altera Stratix II FPGA for I/O routing and processing 4 ADSP-TS201S TigerSHARC® DSP processors up to 600 MHz 57.5 GOPS (16-bit) or 14.4 GFLOPS (floating point) of DSP processing power Fat Pipes & Common Options interfaces for data & control Module Management Controller implementing IPMI: monitors temperature and power usage of major devices; supports hot swapping SharcFINe bridge providing GigE and PCI Express ATLANTiS provides Fat Pipes switch-fabric interfaces: Serial RapidIO™, PCI Express, GigE, XAUI™ (10 GigE) System synchronization via AMC system clocks Front-panel I/O: 10/100 Ethernet, LVDS & general-purpose digital I/O, JTAG port for debug support, fiber-optic 2.5GHz (optional) Booting of DSPs and FPGA via flash nonvolatile memory

45 B2-AMC Block Diagram

46 GT Cluster Architecture

47 BittWare Memory Module (BMM)
Convection or conduction cooled 67 mm x 40 mm 240-pin connector 160 usable signals (plus 80 power/ground) Capability to address TBytes Can be implemented today as: 1 bank of SDRAM up to 1GB (x64); 2 banks of SDRAM up to 512MB each (x32); 1 bank of SRAM up to 64MB (x64); or 1 bank of SDRAM up to 512MB (x32) and 1 bank of SRAM up to 32MB (x32) (Figure: top and back-side views; 240-pin connector to carrier)

48 GT3U cPCI Features
Altera® Stratix® II GX FPGA for I/O, routing, and processing One cluster of four ADSP-TS201S TigerSHARC® DSPs 57.5 GOPS 16-bit fixed-point or 14.4 GFLOPS floating-point processing power Four link ports per DSP: two routed to the ATLANTiS FPGA; two routed for interprocessor communications 24 Mbits of on-chip RAM per DSP; static superscalar architecture ATLANTiS architecture: 4 GB/s of simultaneous external input and output; eight links at up to 500 MB/s routed from the on-board DSPs; 36 LVDS pairs (72 pins), comprising 16 inputs and 20 outputs; four channels of high-speed SerDes transceivers BittWare Memory Module: up to 1 GB of on-board DDR2 SDRAM or 64 MB of QDR SRAM BittWare's SharcFINe PCI bridge: 32-bit/66 MHz PCI; 10/100 Ethernet; two UARTs, software-configurable as RS232 or RS422; one link port routed to ATLANTiS 64 MB of flash memory for booting of DSPs and FPGA 3U CompactPCI form factor, air-cooled or conduction-cooled Complete software support

49 GT3U Block Diagram

50 GTV6 Block Diagram Available Q2 2007

51 GT3U/GTV6 BittWare Levels of Ruggedization

52 GXAM Features Available Q2 2007
Mid-size, single-wide AMC (Advanced Mezzanine Card) Common Options region: Port 0 GigE; Ports 1, 2 & 3 connect to BittWare's ATLANTiS framework Fat Pipes region has eight ports (4-11), configurable to support Serial RapidIO™, PCI Express™, GigE, and XAUI™ (10 GigE) Rear-panel I/O has eight ports (8 LVDS in, 8 LVDS out) System synchronization via AMC system clocks (all connected) High-density Altera Stratix II GX FPGA (2S90/130) BittWare's ATLANTiS framework for control of I/O, routing, and processing BittWare's FINe bridge provides control-plane processing and interface: GigE, 10/100 Ethernet, and RS-232 Over 1 GByte of bulk memory: two banks of DDR2 SDRAM (up to 512 MBytes each); one bank of QDR2 SRAM (up to 9 MBytes) Front-panel I/O: 10/100 Ethernet, RS-232, JTAG port for debug support, 4x SerDes providing Serial RapidIO™, PCI Express™, GigE, and XAUI™ (10 GigE) BittWare I/O Module: 72 LVDS pairs, 4x SerDes, clocks, I2C, JTAG, reset Booting of FINe and FPGA via flash

53 GXAM Block Diagram Available Q2 2007 PRELIMINARY

54 IFFM Features - Preliminary
The IFFM is an IF transceiver in a front-panel module (FM) format; combined with a GXAM, it forms an integrated IF/FPGA interface & processing AMC board 2 channels of high-speed (HS) ADCs (AD9640: 14-bit, 150 MHz) with good SFDR specs (target is 80 dB); dual package to better sync channels; fast detect (non-pipelined upper 4 bits) helps with AGC control 2 channels of HS DACs (AD9777: 16-bit, 400 MHz); built-in up-conversion; interpolation of 1x, 2x, 4x, and 8x High-performance clock generation via PLL/VCO (AD9516): inputs reference clock (e.g. 10MHz) from front panel or baseboard; generates programmable clocks for HS ADCs and HS DACs; sources reference clock to baseboard (for system distribution) General-purpose (GP) 12-bit ADCs & DACs: GP ADCs can be used for driving AGC on an RF front-end; GP DACs can be used for other utility signals such as GPS, positions, ... Available Q3 2007

55 IFFM Block Diagram - Preliminary
Available Q3 2007

56 Software Technology Overview
BittWorks TS Libs Trident MPOE GEDAE

57 Software Products Analog Devices Family Development Tools
VisualDSP++: C++, C, assembler, linker, debugger, simulator, VDK kernel JTAG emulators (ADI/White Mountain) BittWorks: DSP21k Toolkit (DOS, Windows, Linux & VxWorks); VDSP Target; Remote VDSP Target & DSP21k Toolkit via Ethernet (combined in 8.0 Toolkit); board support packages/libraries & I/O GUIs; SpeedDSP (ADSP-21xxx only, no TS); FPGA developer's kits; porting kit Function libraries: TS-Lib Float; TS-Lib Fixed; algorithmic design, implementation, & integration Real-time operating systems: BittWare's Trident; Enea's OSEck Graphical development tools: GEDAE; MATLAB/Simulink/RTW Complete list of SW tools we sell except the MathWorks stuff

58 Software Products Diagram

59 DSP21k-SF Toolkit Host Interface Library (HIL)
Provides C callable interface to BittWare boards from host system Download, upload, board and processor control, interrupts Symbol table aware, converts DSP based addresses Full featured, mature application programming interface (API) Supports all BittWare boards, including FPGA and I/O Configuration Manager (BwConfig) Find, track, and manage all BittWare devices in your system Diag21k – Command line diagnostic utility All the power of the HIL at a command prompt Built-in scripting language with conditionals and looping Assembly level debug with breakpoints stdio support (printf, etc). BitLoader Dynamically load FPGAs via PCI bus (or Ethernet) Reprogram FPGA configuration EEPROM DspBAD/DspTest Automated diagnostic tests for PCI, onboard memory, DSP memory & execution DspGraph Graphing utility for exploring board memory (Windows only)

60 BittWare Target Software Debug Target for VisualDSP++
VisualDSP++ source level debugging via PCI bus Supports most features of the debugger Only Software Target for COTS Sharc Boards Other board vendors require JTAG emulator for VisualDSP debug Multiprocessor Debug Sessions on All DSPs in a System Any processor in the system can be included in a debug session Not limited to the board-level JTAG chain Virtually Transparent to Application No special code, instrumentation, or build required Only uses a maximum of 8 words of program memory - user selectable location Some restrictions compared to JTAG debug For very low level debugging (e.g. interrupt service routines), an ICE is still nice

61 Remote Toolkit & Target
Allows Remote Code Development, Debug, & Control Client-Server using RPC (remote procedure calls) Server on system with BittWare hardware in it (Windows, Linux, VxWorks) Client on Windows machine connected via TCP/IP to server Run All BittWare Tools on Remote PC via Ethernet Diag21k, configuration manager, DspGraph, DspBad, Target Great for remote technical support Run All User Applications on Remote PC Just rebuild user app with Remote HIL instead of regular HIL Run VisualDSP++ Debug Session on Remote PC! No need to plug in JTAG emulator Don’t need Windows on target platform! Toolkit 8.0 Combines Remote and Standard Dsp21k-SF Allows you to access boards in local machine and remote machine No need to rebuild application to use remote board

62 Board Support Libraries & Examples
All Boards Ship with Board Support Libraries & Examples Actual contents specific to each board Provides interface to standard hardware Examples of how to use key features of the board Same code as used by BittWare for validation & production test Examples include: PCI, links, SDRAM, FLASH, UART, utilities, ... Royalty free on BittWare hardware Source Provided for User Customization Users may tailor to their specific needs Hard to create “generic” optimal library as requirements vary greatly PCI Library for All DSP Boards Bus mastering DMA read/write Single access read/write Windows GUIs for All I/O Boards Allow user to learn board control and operation IOBarracuda, AdcPerf These products need to be cleaned up and formalized.

63 FPGA Developer’s Kits For Users Customizing FPGAs on BittWare Boards
Source for standard FPGA loads or examples Royalty free on BittWare hardware Mainly VHDL with some schematic (usually top level) Uses standard Xilinx (ISE Foundation) and Altera (Quartus) tools B2/T2 ATLANTiS FPGA Developer’s Kit TS-201 link transmit and receive ATLANTiS Switches Control registers on peripheral bus (TigerSharc and PCI accessible) Digital I/O SerDes I/O (Aurora, SerialLite, Serial Rapid IO in works) Pre/Post/Co-Processing shells

64 TS-Libs Hand optimised, C-callable TigerSHARC Function Libraries
Floating Point Library Over 450 optimised 32-bit floating point signal processing routines With over 200 extra striding versions Integer Library Over 100 optimised 32-bit integer routines With over 80 extra striding versions Fixed point (16-bit) Library Over 120 optimised 16-bit fixed point signal processing routines Fastest, most optimised library for TS (up to 10x faster than C) Uses latest algorithm theory Well documented, easy to use, and proven over wide user base Allows customers to focus on application (not implementation) Supported & maintained by highly experienced TS programmers Additional routines & functions available upon request

65 TS-Libs Function Coverage
FFT & DCTs: 1- & 2-dimension, real/complex Filters: convolution, correlation, IIR, FIR Trigonometric Vector mathematics Matrix mathematics Logic-test-sort operations Statistics Windowing functions Compander Distribution and pseudo-random number generation Scalar/vector log/cubes, etc. Memory move (matrix/vector) Other routines: Doppler, signal-to-noise density, Cholesky decomposition

66 Software Technology Overview
Trident Multi Processor Operating Environment

67 BittWare’s Trident - MPOE
Multi-Processor Operating Environment Designed specifically for BittWare's TigerSHARC boards Built on top of Analog Devices' VDK Provides an easy-to-use 'Virtual Single Processor' programming model Optimized for determinism, low latency, & high throughput Trident's 3 prongs: Multi-tasking: multiple threads of execution on a single processor; Multi-processor: transparent coordination of multiple threads on multiple processors in a system; Data-flow management: managing high-throughput, low-latency data transfer throughout the system

68 Why is Trident Needed? Ease of Programming
Multiprocessor DSP programming is complicated Many customers don’t have this background/experience Higher-level Tool Integration Need underlying support for higher level software concepts (Corba, MPI, etc) Lack of Alternatives Most RTOSs focus on control and single processor, not data flow and multiprocessor VDK is multiprocessor limited multiprocessor messaging but limited to 32 DSPs no multiprocessor synchronization limited data flow management

69 Transparent Multiprocessing
The key feature Trident provides is Transparent Multiprocessing Allows programmer to concentrate on developing threads of sequential execution (more traditional programming style) Provides for messaging between threads and synchronization of threads over processor boundaries transparently Programmer does not need to know where a thread is located in the system when coding Tools allow for defining system configuration and partitioning threads onto the available processors (at build time) Similar to “Virtual Single Processor” model of Virtuoso/VspWorks

70 Trident Threads Multiple threads spread over single or multiple processors: allows user to split application into logical units of operation; provides for a more familiar linear programming style, i.e. one thread deals with one aspect of the system; locate threads at build time on appropriate processors Priority-based preemptive scheduler (per processor): multiple levels of priority for threads; round-robin (time slice) or run-to-completion within a level; preemption between levels based on a system event (e.g. an interrupt) Synchronization & control of threads: messaging between threads within a processor or spanning multiple processors; semaphores for resource control, available for access anywhere in the system

71 Trident Runtime Device drivers for underlying board components
Framework: message passing core responsible for addressing, topology and boot-time synchronization to support up to 65k processors Initial Modules CDF, MPSync, MPMQ Optional Modules Future functionality User expansion User API

72 Trident Modules - CDF Continuous Data Flow module provides raw link port support Suitable for device I/O at the system/processing edge, e.g. ADC Simple-to-use interface for reading and writing data blocks across link ports Supports Single data block transfer Vector data block transfer Continuous data block transfers User-supplied call-back Mix-and-match approach Continuous Data Flows API Trident_RegisterCallbackFunction Trident_UnregisterCallbackFunction Trident_Write Trident_Read Trident_WriteV Trident_ReadV Trident_WriteC Trident_ReadC

73 Trident Modules - MPSync
Multiprocessor Synchronization Synchronization methods are essential in any distributed system to protect shared resources or coordinate activities Allows threads to synchronize across processor boundaries Semaphores: counting and binary Barriers: a simple group synchronization method

74 Trident Modules - MPMQ Multiprocessor Message Queues
Provides for messaging between threads anywhere in the system transparently Extends the native VDK channel-based messaging into multiprocessor space Provides point-to-point and broadcast capabilities

75 VDSP++ IDE Integration
Trident Plugin fully integrated within VDSP++ Configures The boards and their interconnections The VDK projects Any Trident objects Builds the configuration files Configures VDK kernel to support Trident runtime

76 Trident – to Market Beta released Summer 2006
First full release November 7 Pricing ~$10k per project (max 3 developers) when purchased with BittWare Hardware Royalty free on BittWare hardware 30 day trials available

77 Trident – Future Directions
Extend debug and config tools Add support for buses (cluster, PCI) Add support for switch fabrics (RapidIO, ?) Incorporate FPGAs as processing elements “Threads” located in FPGAs as sources/sinks for messaging Port to other processors Trident designed to use basic features of a kernel, so could port to other platforms and kernels

78 BittWare’s Gedae BSP for TigerSHARC

79 What Gedae says Gedae is

80 What is Gedae? Graphical Entry Distributed Application Environment
Originally developed by Martin Marietta (now Lockheed Martin) under DARPA's RASSP initiative to 'abstract' HW-level implementation A graphical software development tool for signal processing algorithm design and implementation on real-time embedded multiprocessor systems A tool designed to reduce software development costs and build reusable designs A tool that can help analyze the performance of the embedded implementation and optimize it to the hardware

81 System Development in Gedae
1) Develop Algorithm that runs on the workstation - A tool for algorithm development - Design hardware independent systems - Design reusable components 2) Implement systems on the embedded hardware - Port designs to any supported hardware - Re-port to new hardware

82 Designing Data Flow Graphs (DFG)
Basic Gedae interface: design systems from standard function units in the hardware-optimized embeddable library Function blocks represent the function units (FFT, sin, FIR, etc.) Optimized routines/blocks form the Gedae "e_" library; 200 routines taken from TS-Libs for the BittWare BSP The underlying code that each function block calls for execution is called a Primitive (written similarly to C)

83 Designing Data Flow Graphs (DFG)
Create sub-blocks to define your own function units (add to e_ library for component reuse) Connecting lines represent the token streams. The underlying communications are handled by the hardware BSP

84 Gedae Data Communications
Uses data flow via token streams Communication is handled when transfers cross hardware boundaries Scalar values (or structures) Vectors Matrices

85 Run-time Schedules Static Scheduling Dynamic (Runtime) Scheduling
Static scheduling: the execution sequence and memory layout are specified by the DFG; a schedule boundary is forced by dynamic queues. Dynamic (runtime) scheduling: static schedule boundaries are forced when variable token streams can only be determined at runtime, and queues are used to separate the two static schedules when this occurs. Functions require a defined number of tokens to run; blocks such as branch, valve, merge, and switch affect the token flow. Gedae produces one static schedule for each part separated by a queue. (In the graph display, a black square indicates a queue.)
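The queue that separates two static schedules can be pictured as a simple ring buffer: the producing schedule pushes tokens and blocks when the queue is full, and the consuming schedule pops tokens and blocks when it is empty. This is an illustrative model only; the names and the fixed capacity are assumptions, not the Gedae runtime's actual structures:

```c
#include <stddef.h>

/* Toy dynamic queue separating two static schedules. */
#define QCAP 8

typedef struct {
    float  buf[QCAP];
    size_t head, tail, count;
} token_queue;

static int q_push(token_queue *q, float tok)
{
    if (q->count == QCAP) return 0;   /* producer schedule must wait */
    q->buf[q->tail] = tok;
    q->tail = (q->tail + 1) % QCAP;
    q->count++;
    return 1;
}

static int q_pop(token_queue *q, float *tok)
{
    if (q->count == 0) return 0;      /* consumer schedule must wait */
    *tok = q->buf[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return 1;
}
```

Each side of the queue runs its own fixed static schedule; only the queue itself needs runtime bookkeeping.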

86 Run-time Schedules – Memory Usage
One of the primary resources available on a DSP is memory, and memory scheduling dramatically reduces the amount of memory used by a static schedule. Gedae offers memory-packer modes: with no packer, Gedae uses different memory for each output (wasteful); with packing, memory is reused when the function using it has finished. Other packers trade off the time spent packing against the optimality of the packing. (In the memory-usage display, the static schedule runs vertically and the memory used runs horizontally.)
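The saving from packing can be shown with a toy buffer-lifetime model: without a packer, total memory is the sum of every output buffer; with packing, memory only needs to cover the peak of concurrently live buffers at any step of the static schedule. The structures and numbers below are illustrative assumptions, not Gedae's packer implementation:

```c
#include <stddef.h>

/* Each output buffer is live from the schedule step that produces
 * it through the last step that consumes it. */
typedef struct { int first_step, last_step; size_t bytes; } buf_life;

/* "No packer": every output gets its own storage. */
static size_t no_packer_bytes(const buf_life *b, int n)
{
    size_t total = 0;
    for (int i = 0; i < n; i++) total += b[i].bytes;
    return total;
}

/* Packed: only the peak of simultaneously live bytes is needed. */
static size_t packed_peak_bytes(const buf_life *b, int n, int steps)
{
    size_t peak = 0;
    for (int s = 0; s < steps; s++) {      /* walk the static schedule */
        size_t live = 0;
        for (int i = 0; i < n; i++)
            if (b[i].first_step <= s && s <= b[i].last_step)
                live += b[i].bytes;
        if (live > peak) peak = live;
    }
    return peak;
}
```

For a three-stage pipeline where each buffer is live for two adjacent steps, packing needs two buffers' worth of memory instead of three.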

87 Create parallelism in DFG
The function blocks of a simple flow graph can be distributed across multiple processors. A “family” of function blocks can likewise be distributed across multiple processors: a family creates multiple instances of a function block, which can express parallelism. Gedae treats family members as separate function blocks, each referenced with a vector index (1, 2, 3, …, n).
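A family can be pictured as n indexed copies of the same block, each working on its own slice of the data, so the instances can be mapped onto n processors independently. The function names and the even-slice partitioning below are illustrative assumptions, not Gedae's family mechanism:

```c
#include <stddef.h>

/* The underlying function block: scale a vector by k. */
static void scale_block(const float *in, float *out, size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        out[i] = k * in[i];
}

/* Instance `idx` of an nfam-way family fires on its own slice of the
 * input, so the nfam instances can run on nfam processors in parallel. */
static void family_fire(int idx, int nfam,
                        const float *in, float *out, size_t total, float k)
{
    size_t chunk = total / nfam;          /* assumes total % nfam == 0 */
    scale_block(in + (size_t)idx * chunk,
                out + (size_t)idx * chunk, chunk, k);
}
```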

88 Partitioning a Graph Partitioning a Graph to multiple processors
To run function blocks on separate processors, partition the DFG into parts; a separate executable is created for each part. Partitions are independent of schedules: Gedae creates a static schedule for each partition. Extensive group controls facilitate the management of partitions.
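A partition can be thought of as a table assigning each function block to a part, where each part then gets its own executable and its own static schedule. The sketch below is purely illustrative; the block names and table layout are made up for this example:

```c
/* Toy partition table: each function block is assigned to a part;
 * each part becomes one executable with its own static schedule. */
typedef struct { const char *block; int part; } assignment;

/* Count how many blocks a given part (executable) will contain. */
static int blocks_in_part(const assignment *a, int n, int part)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (a[i].part == part)
            count++;
    return count;
}
```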

89 Visualization Tools: Trace table
Gedae has powerful visualization tools to view the timings of the processor schedules. (Trace-table legend: Receive, Operation, Send, Blocked.)

90 Trace table – Function Details
Gedae has powerful visualization tools to view the trace details of a given function

91 Trace table - Parallel Operation
Parallel DSP Operation

92 BittWare’s Gedae BSP for TigerSHARC
What does the BittWare Gedae BSP provide? Optimized routines for the Gedae embeddable “e_” library: 200 TS-Libs functions, with more portable if needed. A memory handler. Communication methods with support for the hardware’s capabilities: link ports and cluster bus. Multi-DSP board capability: up to 128 clusters. Networking support: development and control of a distributed network of BittWare boards, with remote debug capabilities. BSP support backed by over 12 man-years of TigerSHARC expertise.

93 BSP Data Transfer Methods
SHARED_WORD (cluster bus word-sync transfers) SHMBUF (cluster bus buffered transfers) LINK (link port transfers) DSA_LINK (DMA over the link ports) DSA_SHMBUF (DSA DMA over the cluster bus)

94 Data Transfer Rates – Shared Memory
[Chart: dsa_shmbuf data transfer rate (MBytes/sec, best_send_ready) vs. data transfer size, 32–8192 bytes.] Hardware max rate: ~ MBytes/second; for 1k data packets: ~450 MBytes/second (on-board).

95 Data Transfer Rates – Link Ports
Hardware max rate: ~250 MBytes/second; for 1k data packets: ~230 MBytes/second.

96 Gedae/BSP Summary Allows Gedae to target BittWare’s TigerSHARC Boards
Provides portable designs for embedded multi-DSP systems; scheduling, communication, and memory handling are provided, and optimized functions are provided for each supported board. BittWare’s Gedae BSP for TigerSHARC: allows Gedae to target BittWare’s TigerSHARC boards; compiles onto multiple DSPs (up to 8 per board) and multiple boards (currently up to 128); provides an optimized TigerSHARC function library and multiple communication methods (with efficient, high data rates); removes the need for TigerSHARC specialist engineering.

97 Additional Slides/Info

98 Demo Description Dual B2-AMC hybrid signal processing boards
Each board: Stratix II 2S90 FPGA, quad TigerSHARC DSPs, FINe control interface via GigE. ATLANTiS framework: reconfigurable data routing and ‘patch-able’ processing. 4x Serial RapidIO endpoint implemented in the FPGA: 12.5 Gb/s inter-board transfer rate, 10 Gb/s max payload rate (~90% efficiency). Housed in a MicroTCA-like “Pico Box”.

99 Demo Hardware BittWare’s B2-AMC CorEdge’s PicoTCA

100 Demo System Architecture

101 ATLANTiS – B2

102 ATLANTiS – SRIO Switch 1

103 ATLANTiS – Connecting to FPGA Filters
Switch 2

