ENG6530 Reconfigurable Computing Systems Hardware Software Co-design


1 ENG6530 Reconfigurable Computing Systems Hardware Software Co-design
ENG6530 RCS School of Engineering

2 Topics
H/S Co-Design: definition and motivation
Design steps: profiling, partitioning, allocation
Xilinx EDK


4 Definition – Hardware/Software Co-Design
The design of computer systems that incorporate both standardized off-the-shelf processors (running software) and specialized hardware.
The cooperative design of hardware and software components.
The unification of currently separate hardware and software paths.
The movement of functionality between hardware and software.
Notes: Co-design has been a buzzword in the electronics industry for over a decade now… The best definition I’ve found… Other “co-design” studies have focused on the last element, co-verification… Co-verification is testing blocks devoted to software on a virtual prototype of the system hardware, using a co-simulation tool or a real-time operating system (RTOS). Co-design tools are not yet an integral part of most design flows. This is most likely due to the high level of abstraction at which the blocks emerge from many co-design processes, making it difficult for today's synthesis technologies to convert them into realizable hardware.

5 H/S Co-design: Example
Optical wheel speed sensor.
System constraints: area 40 units, time 100 cycles.
This could be implemented using standardized processors, specialized hardware, or a combination of both.
Figure (processing blocks, in slide order): Input Decoding FIR Filter Tick to Speed Inversion Output Encoding

6 H/S Co-design: Software
Design implemented in software (Processor #1 and Processor #2).
Area: 48 units > 40 units (constraint violated).
Time: 132 cycles > 100 cycles (constraint violated).
Design time: 2 months.

7 H/S Co-design: Hardware
Design implemented in custom RTL hardware.
Area: 24 units < 40 units.
Time: 52 cycles < 100 cycles.
Beats both the area and the timing constraint by at least 40%.
Design time: 9 months. Such a delay in design is unacceptable in a competitive world.

8 H/S Co-design
Design implemented in hardware and software (custom hardware plus Processor #1).
Area: 37 units < 40 units.
Time: 95 cycles < 100 cycles.
Design time: 3.5 months.
Not as efficient as design II (the all-hardware design); however, it strikes a balance between the two extremes.
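
The comparison across the three designs can be sketched in a few lines of Python. The area, cycle, and design-time figures are the ones quoted on the slides; the class and function names are illustrative, and treating a design that exactly meets a limit as feasible is an assumption.

```python
from dataclasses import dataclass

AREA_LIMIT = 40    # units, from the example's system constraints
TIME_LIMIT = 100   # cycles, from the example's system constraints


@dataclass(frozen=True)
class Design:
    name: str
    area: int       # units
    cycles: int     # execution time in cycles
    months: float   # design time

    def feasible(self) -> bool:
        # Assumption: meeting a limit exactly still counts as feasible.
        return self.area <= AREA_LIMIT and self.cycles <= TIME_LIMIT


DESIGNS = [
    Design("software", area=48, cycles=132, months=2),
    Design("hardware", area=24, cycles=52, months=9),
    Design("hardware/software", area=37, cycles=95, months=3.5),
]


def fastest_feasible(designs):
    """Return the feasible design with the shortest design time."""
    candidates = [d for d in designs if d.feasible()]
    return min(candidates, key=lambda d: d.months)


if __name__ == "__main__":
    for d in DESIGNS:
        print(d.name, "feasible" if d.feasible() else "violates constraints")
    print("chosen:", fastest_feasible(DESIGNS).name)
```

Under these numbers the software design is rejected, and the mixed design is preferred over the all-hardware one purely because of its shorter design time.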

9 Motivations
Achieve performance by moving software bottlenecks to hardware.
Use hardware to meet time and area constraints that general-purpose processors alone cannot meet.
It is not possible to put everything in hardware, because resources are limited.
Some code is more appropriate for a sequential implementation (i.e., software retains flexibility).
Today’s designs focus on embedded systems, which require both hardware and software modules.

10 Motivations … cont
The complexity and functionality of computer systems are increasing at a dramatic rate, leading to System-on-Chip (SoC) designs.
It is difficult to design, build, and verify custom systems within an acceptable time, even with advanced CAD tools, unless standardized parts are used.
Solution? Take advantage of previously designed and tested IP cores and processors to reduce design time and improve reliability.

11 Trade-offs/Decisions
Given a set of specified goals, an implementation technology, constraints, … designers consider trade-offs in how hardware and software components work together.
Decisions, constraints and evaluations?
Performance
Area
Power
Flexibility (programmability)
Development and manufacturing costs
Reliability
Robustness
Maintenance
Design evolution

12 Hw/Sw Co-Design: Research
Research in hardware-software co-design encompasses many areas, such as:
System specification and modeling
Design exploration
System co-verification and co-simulation
Code generation for hardware/software
Hardware/software interfacing
Partitioning
Scheduling
However, the most important objective is to develop a unified design methodology/tool for creating systems that contain both hardware and software.

13 A Simple Approach
Figure (flow diagram) with boxes: Application, Profiling, Partitioning, Evaluation, Decision, H/W, S/W, Schedule tasks.

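14 Profiling and Partitioning
Benefits of moving the critical regions found by the profiler into hardware:
Speedups of 2x to 10x are typical, far more than dynamic software optimizations offer (about 1.2x).
Energy reductions of 25% to 95% are typical.
Figure: a profiler identifies the critical regions of the software; those regions are implemented in hardware (ASIC/FPGA) while the rest of the software stays on the processor.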

15 Profiling
Profiling allows you to learn where your program spent its time and which functions called which other functions while it was executing.
The profiler uses information collected during the actual execution of your program, so it can be used on programs that are too large or too complex to analyze by reading the source.
This information can show you which pieces of your program are slower than you expected. These might be candidates for either:
Rewriting code to make your program execute faster.
Moving these functions to hardware.

16 Profiling: Steps
You must compile and link your program with profiling enabled:
    cc -o myprog.exe myprog.c utils.c -g -pg
You must then execute your program to generate a profile data file. Your program will write the profile data into a file called gmon.out just before exiting.
You must run gprof to analyze the profile data:
    gprof options myprog.exe gmon.out > outfile
The gprof program prints a flat profile and a call graph.

17 Profiling: Useful Hints
Options:
-e function_name : tells gprof NOT to print information about the function function_name (and its children …) in the call graph.
-f function_name : causes gprof to limit the call graph to the function function_name and its children.
-b : gprof does not print the verbose blurbs that try to explain the meaning of all of the fields in the tables.

18 Profiling: Flat Profile
% time: the percentage of the total execution time your program spent in this function.
cumulative seconds: the cumulative total number of seconds the computer spent executing this function, plus the time spent in all the functions above it in the table.
self seconds: the number of seconds accounted for by this function alone.
calls: the total number of times the function was called.
self ms/call: the average number of milliseconds spent in this function per call.
total ms/call: the average number of milliseconds spent in this function and its descendants per call.
name: the name of the function.
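
As a rough illustration of how the flat-profile columns could feed the partitioning decision, the sketch below parses flat-profile rows and lists the functions whose share of execution time exceeds a threshold. The sample rows, the function names in them, and the 20% threshold are made up for illustration; the parser assumes each row carries all seven columns described above and simply skips anything else (headers, rows with blank call counts).

```python
from typing import NamedTuple, Optional


class FlatEntry(NamedTuple):
    percent_time: float
    cumulative_s: float
    self_s: float
    calls: int
    self_ms_per_call: float
    total_ms_per_call: float
    name: str


def parse_flat_line(line: str) -> Optional[FlatEntry]:
    """Parse one flat-profile row; return None for headers or partial rows."""
    parts = line.split(None, 6)
    if len(parts) != 7:
        return None
    try:
        return FlatEntry(float(parts[0]), float(parts[1]), float(parts[2]),
                         int(parts[3]), float(parts[4]), float(parts[5]),
                         parts[6].strip())
    except ValueError:
        return None


def hot_functions(lines, min_percent=20.0):
    """Names of functions at or above min_percent of total time, hottest first."""
    entries = [e for e in map(parse_flat_line, lines) if e is not None]
    entries.sort(key=lambda e: e.percent_time, reverse=True)
    return [e.name for e in entries if e.percent_time >= min_percent]


# Hypothetical flat-profile rows (not measured data).
SAMPLE = [
    "  %   cumulative   self              self     total",
    " 62.50      0.50     0.50     1000     0.50     0.50  fir_filter",
    " 25.00      0.70     0.20      500     0.40     0.40  decode_input",
    " 12.50      0.80     0.10       10    10.00    80.00  main",
]

if __name__ == "__main__":
    print(hot_functions(SAMPLE))
```

Functions reported this way are only candidates: whether they map well onto hardware still has to be judged separately.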

19 Simple Approach: Drawbacks
Some functions might not be easily mapped onto hardware.
Decisions taken very early, at the profiling phase, might not be optimal.
There is no consideration of interfacing and communication.
If the application changes slightly, we need to re-profile and re-partition.

20 Applications Not suitable for RCS
Not all applications are suitable for reconfigurable computing. Poor matches include:
Applications that involve extensive recursion, because the synthesized “hardware” must be of fixed size.
Applications in which only a small percentage of the work (1-5%) can be parallelized; these will not benefit from RCS.
Applications that are I/O bound, which suffer from memory and I/O transfer overhead.
Applications that require floating-point arithmetic.
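
The point about small parallel fractions follows from Amdahl's law, which is not named on the slide but is the standard way to bound the gain. The sketch below computes the overall speedup when a fraction of the run time is accelerated by some factor; the function names are illustrative.

```python
def overall_speedup(accelerated_fraction: float, local_speedup: float) -> float:
    """Amdahl's law: overall speedup when a fraction of run time is sped up."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / local_speedup)


def speedup_limit(accelerated_fraction: float) -> float:
    """Upper bound on the speedup, reached as the local speedup grows without limit."""
    if accelerated_fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - accelerated_fraction)


if __name__ == "__main__":
    # Only 5% of the run time can move to hardware: the bound is about 1.05x.
    print(round(speedup_limit(0.05), 3))
    # 90% of the run time accelerated 10x gives roughly 5.26x overall.
    print(round(overall_speedup(0.9, 10), 3))
```

Even an infinitely fast accelerator cannot push a program with 5% accelerable work past about 1.05x, which is why such applications are listed as poor matches.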

21 Design Space Exploration
Which architecture is better suited for our application?
Figure: a library of templates from which candidate architectures are assembled.
Computation templates: DSP, µE, Cipher, SDRAM, RISC, FPGA, LookUp.
Communication templates.
Scheduling/arbitration: proportional share, WFQ, static, dynamic, fixed priority, EDF, TDMA, FCFS.
Architecture #1 and Architecture #2 combine these templates and scheduling policies in different ways.

22 H/S Codesign: A Framework
System Representation
System Evaluation
Co-design
Refinement (produce a hardware/software alternative via evaluation)
Decomposition (break down system functions into a collection of sub-functions)
H/S Partitioning (determine which of the sub-functions should be implemented in hardware and which in software)
System Integration

23 Co-Synthesis/Co-Design

24 Partitioning & Scheduling
Task partitioning and task scheduling are required in many applications, for instance co-design systems, multiprocessing systems, and high-level synthesis.
Sub-tasks extracted from the input description should be implemented:
Where? In the right place (using the partitioner/placer).
When? At the right time (using the scheduler).
It is well known that such scheduling and partitioning problems are NP-complete in their decision form (NP-hard as optimization problems).
Optimization techniques based on heuristic methods are therefore generally employed to explore the search space so that feasible and near-optimal solutions can be found.

25 System Partitioning
Figure: a specification is captured into a model, partitioned, and synthesized onto an FPGA and a processor joined by an interface. The specification fragments shown are:
    process (a, b, c)
      in port a, b;
      out port c;
    { read(a); … write(c); }
    Line () { a = … detach }
Good partitioning mechanism:
Minimize communication across the bus.
Allow parallelism: both hardware (FPGA) and processor operating concurrently.
Load balancing: near-peak processor utilization at all times (performing useful work).

26 Terminology: Hypergraphs
A hypergraph $H = \langle V, E_h \rangle$: $V$ is a set of vertices, and each hyperedge $h \in E_h$ is a subset of the vertices, so $E_h \subseteq 2^V$.
A graph $G = \langle V, E \rangle$: $V$ is a set of vertices, and each edge $e \in E$ is a pair of vertices $(u, v)$.
A netlist is a hypergraph.
Hypergraphs can be approximated as graphs by breaking each hyperedge into a clique of edges.
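
The clique approximation mentioned above can be sketched as follows. The representation (hyperedges as collections of vertex names, graph edges as sorted pairs) is an assumption made for illustration.

```python
from itertools import combinations


def clique_expansion(hyperedges):
    """Approximate a hypergraph by a graph: each hyperedge becomes a clique.

    hyperedges: iterable of collections of vertex names.
    Returns a set of undirected edges, each stored as a sorted pair (u, v).
    """
    edges = set()
    for hyperedge in hyperedges:
        # Every pair of vertices sharing a hyperedge gets a graph edge.
        for u, v in combinations(sorted(set(hyperedge)), 2):
            edges.add((u, v))
    return edges


if __name__ == "__main__":
    # A three-pin net and a two-pin net, as they might appear in a netlist.
    nets = [{"a", "b", "c"}, {"c", "d"}]
    print(sorted(clique_expansion(nets)))
```

A hyperedge with k vertices turns into k(k-1)/2 graph edges, so large nets inflate the graph and distort cut counts, which is one reason algorithms that work on hypergraphs directly are attractive.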

27 Bi-partitioning Problem
Given a (hyper)graph $G$, find a partition $P$ of $V$ into $V_1, V_2$ such that $V_1 \cap V_2 = \emptyset$ and $V_1 \cup V_2 = V$,
minimizing the number of edges that cross the cut:
$\min c(P) = \sum_{h} w(h)$, summed over all edges $h$ that connect some $u \in V_1$ to some $v \in V_2$,
subject to a capacity constraint with parameter $\alpha$:
$\alpha^{-1} > |V_1| / |V_2| > \alpha$
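
A small sketch of the two quantities in this formulation: the cut cost of a given partition and the capacity check. The data layout (a dict mapping each vertex to side 0 or 1, hyperedges as tuples) and the function names are assumptions for illustration.

```python
def cut_cost(hyperedges, side, weights=None):
    """Sum the weights of hyperedges whose vertices lie on both sides of the cut.

    hyperedges: list of tuples of vertex names.
    side: dict mapping each vertex to 0 (V1) or 1 (V2).
    weights: optional list of per-hyperedge weights; unit weights if omitted.
    """
    total = 0
    for index, hyperedge in enumerate(hyperedges):
        sides_touched = {side[v] for v in hyperedge}
        if len(sides_touched) > 1:
            total += 1 if weights is None else weights[index]
    return total


def is_balanced(side, alpha):
    """Check the capacity constraint alpha < |V1| / |V2| < 1 / alpha."""
    size_v1 = sum(1 for s in side.values() if s == 0)
    size_v2 = len(side) - size_v1
    if size_v2 == 0:
        return False
    return alpha < size_v1 / size_v2 < 1.0 / alpha


if __name__ == "__main__":
    nets = [("a", "b"), ("b", "c", "d"), ("c", "d")]
    partition = {"a": 0, "b": 0, "c": 1, "d": 1}
    print(cut_cost(nets, partition), is_balanced(partition, 0.5))
```

In the usage example only the three-pin net spans the cut, so the cost is 1, and a 2/2 split satisfies the constraint for alpha = 0.5.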

28 Bipartitioning Approaches
Exact methods:
  Mixed integer programming (using branch and bound)
  Min-cut / max-flow (Ford-Fulkerson 1962): the maximum flow through the graph equals the minimum cut; useful for establishing an unconstrained bound
Heuristics (local search):
  Kernighan-Lin (1970): operates on graphs; swaps all nodes once, in pairs that yield the maximum gain; keeps the greatest gain over the pass; repeats until there is no improvement; $O(n^2 \log n)$
  Fiduccia-Mattheyses (1982): operates on hypergraphs; $O(p)$ per pass, i.e. linear time
Metaheuristics (avoid getting stuck in local minima):
  Simulated annealing: selects some random moves based on “temperature”; the design hopefully “cools” into an optimal solution; computationally intensive
  Tabu search
  Genetic algorithms
  Particle swarm optimization
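
To make the simulated-annealing entry concrete, here is a minimal sketch for graph bipartitioning. It is one common formulation, not necessarily the one the lecture has in mind: moves are swaps of one vertex from each side (which keeps the partition sizes fixed), worsening moves are accepted with probability exp(-delta / T), and the temperature follows a geometric cooling schedule. All parameter values are arbitrary assumptions.

```python
import math
import random


def cut_size(edges, side):
    """Number of graph edges whose endpoints lie on different sides."""
    return sum(1 for u, v in edges if side[u] != side[v])


def anneal_bipartition(edges, side, t_start=2.0, t_end=0.01,
                       cooling=0.95, moves_per_temp=50, seed=0):
    """Simulated annealing over balanced swaps; returns (best_side, best_cost)."""
    rng = random.Random(seed)
    side = dict(side)
    left = [v for v in side if side[v] == 0]
    right = [v for v in side if side[v] == 1]
    cost = cut_size(edges, side)
    best, best_cost = dict(side), cost
    temperature = t_start
    while temperature > t_end:
        for _ in range(moves_per_temp):
            i, j = rng.randrange(len(left)), rng.randrange(len(right))
            u, v = left[i], right[j]
            side[u], side[v] = 1, 0            # tentatively swap u and v
            delta = cut_size(edges, side) - cost
            # Always accept improvements; accept worsening moves with
            # a probability that shrinks as the temperature drops.
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                left[i], right[j] = v, u
                cost += delta
                if cost < best_cost:
                    best, best_cost = dict(side), cost
            else:
                side[u], side[v] = 0, 1        # undo the swap
        temperature *= cooling
    return best, best_cost


# Two triangles joined by a single edge; the start partition cuts 5 edges.
EDGES = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
START = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}

if __name__ == "__main__":
    print(anneal_bipartition(EDGES, START))
```

Recomputing the full cut after every move keeps the sketch short; a practical implementation would update the cost incrementally.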

29 Fiduccia-Mattheyses
    generate initial partition
    calculate gain g(c) of moving each cell
    while improvement {
        clear cells being locked;
        while max g(c) > 0 | c ∉ locked {
            select cell with max g(c) | c ∉ locked;
            move c across the cut;
            c → locked;
            update g(c) for all of c’s neighbors;
        }
    }
One pass is O(p).
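
The inner loop of the pseudocode can be sketched in Python as below. This follows the simplified version on the slide (stop once no unlocked cell has positive gain) and adds a size-difference limit as the balance rule; both the limit and the function names are assumptions. For brevity it recomputes gains from scratch, so it does not reproduce the O(p) bucket-array bookkeeping of the real algorithm.

```python
def cut_size(nets, side):
    """Number of nets (hyperedges) with cells on both sides of the cut."""
    return sum(1 for net in nets if len({side[c] for c in net}) > 1)


def gain(nets, side, cell):
    """Reduction in cut size obtained by moving `cell` to the other side."""
    before = cut_size(nets, side)
    side[cell] ^= 1
    after = cut_size(nets, side)
    side[cell] ^= 1          # restore the original side
    return before - after


def fm_pass(nets, side, max_imbalance=2):
    """One simplified pass: move the best unlocked cell while its gain is positive."""
    side = dict(side)
    locked = set()
    while True:
        size0 = sum(1 for s in side.values() if s == 0)
        candidates = []
        for cell in side:
            if cell in locked:
                continue
            new_size0 = size0 - 1 if side[cell] == 0 else size0 + 1
            new_size1 = len(side) - new_size0
            if abs(new_size0 - new_size1) <= max_imbalance:
                candidates.append((gain(nets, side, cell), cell))
        if not candidates:
            break
        best_gain, cell = max(candidates)
        if best_gain <= 0:
            break
        side[cell] ^= 1      # move the cell across the cut
        locked.add(cell)     # lock it for the rest of the pass
    return side, cut_size(nets, side)


# Two triangles joined by one two-pin net; the start partition cuts 5 nets.
NETS = [("a", "b"), ("b", "c"), ("a", "c"),
        ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
START = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}

if __name__ == "__main__":
    print(fm_pass(NETS, START))
```

On this input the pass moves e and then b, reducing the cut from 5 to 1. The full Fiduccia-Mattheyses algorithm also makes negative-gain moves and then rolls back to the best point of the pass, which this sketch omits.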

30 Example
Goal: partition the graph into two disjoint halves so as to minimize the number of hyperedges that span the cut.
All edges have unit weight.
Given balance criterion: $|V_1| + 1 \ge |V_2| \ge |V_1| - 1$
Figure: graph with cells a, b, c, d, e, f.

31 Example (cont’d)
Step 1. A random partition is assigned so as to keep balance.
Number of cuts = 5.
Figure: cells a, b, c, d, e, f split across the cut.

32 Example (cont’d)
Step 2. Initial gains are calculated for each cell, and the results are placed into a bucket array.
Number of cuts = 5.
Figure: each cell is annotated with its gain (labels shown include +2, +1 and -1).

33 Example (cont’d)
Step 3. A cell is selected, the gains of critical nets are updated, and the cell is locked from further movement.
Number of cuts = 3.

34 Example (cont’d)
Step 3 (repeated). Another cell is selected, the gains of critical nets are updated, and the cell is locked from further movement.
Number of cuts = 2.

35 Co-design: Tools
Ideally, co-design tools would provide an almost automatic framework for producing a balanced and optimized design from some initial high-level specification.
In practice, the goal of co-design tools and platforms is not to push towards this kind of total automation: designer interaction and continuous feedback are considered essential.
The main goal is to build into the co-design tools support for shifting functionality and implementation between HW and SW, with effective and efficient evaluation.

36 H/S Co-Design: Approaches
Opposite strategies:
Vulcan (“primal” approach): functionality is all in HW (HardwareC) initially; some is moved to the CPU to reduce architecture cost.
Cosyma (“dual” approach): functionality is all in SW (Cx) initially; some is moved to an ASIC to meet performance goals.
Lycos: converts all functionality to a neutral form.

37 Partitioning Algorithms
Figure: a list of software tasks and a list of hardware tasks, with a task moving between them.
Assume everything is initially in software.
Select a task for swapping.
Migrate it to hardware and evaluate the cost: timing, hardware resources, program and data storage, synchronization overhead.
Cost evaluation and move evaluation are similar to what we have seen for the min-cut FM algorithm.
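
One way to picture this loop is a greedy migration sketch: start with every task in software, and while the deadline is missed, move the task that saves the most cycles per unit of hardware area. The greedy rule, the sequential timing model, the task names, and every number below are assumptions made for illustration; they are not taken from the lecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    sw_cycles: int        # execution time on the processor
    hw_cycles: int        # execution time in custom hardware
    hw_area: int          # area needed if moved to hardware (must be > 0)
    sync_cycles: int = 0  # synchronization overhead once in hardware


def partition(tasks, area_budget, deadline):
    """Greedy HW/SW partitioning; returns (tasks_in_hw, area_used, total_cycles)."""
    in_hw = set()
    area_used = 0
    total = sum(t.sw_cycles for t in tasks)   # everything starts in software
    while total > deadline:
        best = None
        for t in tasks:
            if t.name in in_hw or area_used + t.hw_area > area_budget:
                continue
            saving = t.sw_cycles - (t.hw_cycles + t.sync_cycles)
            if saving <= 0:
                continue                       # migration would not help
            ratio = saving / t.hw_area         # cycles saved per area unit
            if best is None or ratio > best[0]:
                best = (ratio, t, saving)
        if best is None:
            break                              # no feasible move remains
        _, chosen, saving = best
        in_hw.add(chosen.name)
        area_used += chosen.hw_area
        total -= saving
    return in_hw, area_used, total


# Hypothetical task set.
TASKS = [
    Task("decode", sw_cycles=20, hw_cycles=8, hw_area=6, sync_cycles=2),
    Task("fir", sw_cycles=70, hw_cycles=10, hw_area=15, sync_cycles=4),
    Task("speed", sw_cycles=25, hw_cycles=9, hw_area=10, sync_cycles=2),
    Task("encode", sw_cycles=17, hw_cycles=7, hw_area=8, sync_cycles=2),
]

if __name__ == "__main__":
    print(partition(TASKS, area_budget=20, deadline=100))
```

With these made-up numbers a single migration (the "fir" task) is enough to meet the deadline. A real partitioner would also model concurrency between hardware and processor, and could undo moves in the way an FM pass does.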

38 Automation
A compiler/profiler determines dependences and rough performance estimates.
The result of compilation is synthesizable HDL plus assembly code for the processor.

39 Interfacing
Interfacing between software and hardware modules is crucial for successful co-design:
How data is passed between sub-modules efficiently.
The rate of exchange of information between modules.
Figure (design flow): System Description, Hw/Sw Partitioning, Co-Synthesis (Software, Interface, Hardware), System Integration, Co-Simulation.

40 Interface Models: FIFO
Synchronization through a FIFO.
The FIFO can be implemented either in hardware or in software.
The hardware (FPGA) can effectively be reconfigured to allocate buffer space as needed.
Interrupts are used for the software version of the FIFO.
Figure: processes p1, p2, p3 exchange data (d1, d2, d3) and requests (r2, r3) with the FPGA through a control/data FIFO.
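
A software analogue of FIFO synchronization can be sketched with two threads and a bounded queue: the producer blocks when the FIFO is full and the consumer blocks when it is empty, which is the flow control a hardware FIFO provides with its full/empty flags. The thread roles, the FIFO depth, and the sentinel used to end the stream are illustrative assumptions.

```python
import queue
import threading


def run_fifo_demo(items, depth=4):
    """Pass items from a producer thread to a consumer thread through a bounded FIFO."""
    fifo = queue.Queue(maxsize=depth)   # bounded buffer, like a fixed-depth FIFO
    received = []
    done = object()                     # sentinel marking the end of the stream

    def producer():
        for item in items:
            fifo.put(item)              # blocks while the FIFO is full
        fifo.put(done)

    def consumer():
        while True:
            item = fifo.get()           # blocks while the FIFO is empty
            if item is done:
                break
            received.append(item)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


if __name__ == "__main__":
    print(run_fifo_demo(list(range(10)), depth=2))
```

Neither side needs to know the other's speed; the FIFO depth only determines how far the producer can run ahead before it stalls.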

41 Warp Processors
1. Initially execute the application in software only.
2. Profile the application to determine critical regions.
3. Partition critical regions to hardware.
4. Program the configurable logic and update the software binary.
5. The partitioned application executes faster with lower energy consumption.
Figure: a MIPS/ARM processor with I$ and D$, a profiler, configurable logic, and the Dynamic Partitioning Module (DPM).

42 Summary
Hardware/software co-design is becoming the common design style for building systems.
H/S co-design allows the majority of a system to be designed quickly with standardized parts, while special-purpose hardware is used for the time-critical portions of the system.
Xilinx and Altera provide complete flows for H/S co-design.
Issues:
How to partition the system?
Communication overhead!
Platforms to be used.
Languages that support this paradigm.

43 Extra Slides

44 Embedded CPUs
PowerPC 405 (hard core): 32-bit embedded PowerPC RISC architecture; up to 450 MHz; 2 x 16 kB instruction and data caches; memory management unit (MMU); embedded in Virtex-II Pro and Virtex-4 FX devices (the Virtex-5 FXT family uses the PowerPC 440 instead).
ARM Cortex-A9 (hard core): 32-bit multicore processor; up to 900 MHz; Xilinx Zynq-7000 processing platform; the device is processor-based with an attached FPGA fabric; high level of performance; reduces power, cost, and size.
MicroBlaze (soft core): 32-bit RISC architecture; 2 x 64 kB instruction and data caches; hardware multiply and divide; OPB and LMB bus interfaces...

45 Embedded Processors: Hard core vs. Soft core
Hard core (PowerPC): faster; fixed position; available in few devices.
Soft core (PicoBlaze, MicroBlaze): slower; can be placed anywhere; applicable to many devices.
Virtex-4 Processors:
Embedded Processor     Core Type  Max Clock Frequency  Slices  PLBs  Block RAMs
PowerPC                Hard       222 MHz              1000    250   9
MicroBlaze             Soft       180 MHz              940     235   (not given)
PicoBlaze              Soft       221 MHz              333     84    3
PicoBlaze (optimized)  Soft       233 MHz              274     69    (not given)

46 Soft and Hard cores in current FPGAs
Figure: a board-level system built from separate components: CPU (uP / DSP), co-processor, memory controller with SDRAM and SRAM, UART, LC display controller, interrupt controller, timer, audio codec, GP I/O, address decode unit, Ethernet MAC, custom IF-logic, clock, and power supply.

47 Next Step...
Figure: the same system redrawn around an FPGA. Blocks shown: CPU (uP / DSP), co-processor, memory controller, UART, display controller, interrupt controller, timer, GP I/O, address decode unit, Ethernet MAC, custom IF-logic, clock, SDRAM, SRAM, LC display, audio codec, and power supply.

48 Configurable System on a Chip (CSoC)
Figure: a configurable system on a chip; the components shown around it are the power supply, SDRAM, SRAM, LC display, audio codec, and EPROM.

49 Soft CPU Core: "MicroBlaze" (Xilinx Inc.)

50 PowerPC-based Embedded Design
Full system customization to meet performance, functionality, and cost goals.
Figure: the PowerPC 405 core (dedicated hard IP) with RocketIO and with DSOCM and ISOCM block RAM, surrounded by flexible soft IP on the IBM CoreConnect™ on-chip bus standard (PLB, OPB, and DCR). The Processor Local Bus (instruction and data sides, PLB arbiter) carries high-speed peripherals, e.g. Gb Ethernet and the memory controller; a bus bridge links it to the On-Chip Peripheral Bus (OPB arbiter), which carries on-chip peripherals such as UART and GPIO; the DCR bus is also shown. Off-chip memory: ZBT SRAM, DDR SDRAM, SDRAM.
Notes:
DSOCM and ISOCM: The PowerPC 405 core is provided with non-bused, non-cacheable, low-latency data and instruction memory interfaces, known as the 32-bit data-side On-Chip Memory and the 64-bit instruction-side On-Chip Memory. Typical uses of data-side OCM include scratch-pad memory and using the dual-port feature of block RAM to enable a bidirectional data transfer between the processor and the FPGA. The typical use for instruction-side OCM is storage of interrupt service routines. One of the primary advantages of OCM is that it guarantees a fixed latency of execution, because no bus arbitration is required for the OCM interface. It also reduces cache pollution and thrashing, because the cache remains available for caching code from other memory resources.
PLB: The processor local bus (PLB) interface provides a 32-bit address and three 64-bit data buses attached to the instruction-cache and data-cache units. Two of the 64-bit buses are attached to the data-cache unit, one supporting read operations and the other supporting write operations. The third 64-bit bus is attached to the instruction-cache unit to support instruction fetching. The prime goal of the PLB is to provide a high-bandwidth, low-latency connection between bus agents that are the main producers and consumers of the bus transaction traffic. This is the bus to which you connect your higher-speed peripherals (e.g., Gigabit Ethernet MAC) and memory.
OPB: The On-chip Peripheral Bus (OPB) provides a fully synchronous 32-bit address and 32-bit data bus. The prime goal of the OPB is to provide a flexible connection path to peripherals and memory with minimal performance impact on the PLB. Put your slower peripherals on this bus, such as UARTs, GPIO, and the 10/100 Ethernet MAC.
DCR: The DCR (Device Control Register) bus is a 32-bit bus for removing device configuration slave loads, memory address resource use, and configuration transaction traffic from the main system buses. Most traffic on the DCR bus occurs during the system initialization period; however, some elements, such as the DMA controller and the interrupt controller cores, use the DCR bus to access normal functional registers used during operation.

51 MicroBlaze-based Embedded Design
Figure: the MicroBlaze 32-bit RISC core (flexible soft IP) with I-cache and D-cache in BRAM (configurable sizes), a Local Memory Bus to BRAM, LocalLink™ FIFO channels (0, 1 … 32) to custom functions, and the On-Chip Peripheral Bus (OPB, with arbiter) carrying on-chip peripherals such as UART and 10/100 Ethernet; off-chip memory is FLASH/SRAM. A connection to the PowerPC is possible in Virtex-II Pro.
Notes:
MicroBlaze: Because MicroBlaze is a soft-logic processor, it runs on all current FPGA families.
Caches: With MicroBlaze, you can select whether to use an instruction cache and a data cache, and their sizes. FPGA BRAM is used for these caches.
LMB: MicroBlaze has a 32-bit Local Memory Bus (LMB) that is used for low-latency access to on-chip BRAM. The LMB provides single-cycle access to on-chip dual-port block RAM and is split into an instruction-side LMB and a data-side LMB.
Off-chip memory: EDK includes memory controllers for off-chip flash, SDRAM, or DDR SDRAM.
LocalLink: MicroBlaze contains zero to eight input LocalLink interfaces and zero to eight output LocalLink interfaces. The LocalLink channels are dedicated unidirectional point-to-point data-streaming interfaces, 32 bits wide. The same LocalLink channels can be used to transmit or receive either control or data words; a separate bit indicates whether the transmitted (received) word is control or data information. There are two two-cycle assembly instructions: get and put. Wrapping these in custom C functions, with appropriate custom hardware, also provides an effective way to implement custom instructions for the processor.
OPB: MicroBlaze shares the On-chip Peripheral Bus (OPB) with the PowerPC. The same peripherals that work on the OPB with the PowerPC can also be used with MicroBlaze.
PPC: Finally, because the OPB is shared, one can hang a MicroBlaze as an OPB peripheral connected to a PowerPC system on Virtex-II Pro.

52 MicroBlaze: Architecture & Features
Features:
RISC
Thirty-two 32-bit general-purpose registers
32-bit instruction word with three operands and two addressing modes
Separate 32-bit instruction and data buses: OPB (On-chip Peripheral Bus)
Separate 32-bit instruction and data buses: LMB (Local Memory Bus)

53 MicroBlaze: Bus Configurations
Figure: six bus configurations (1 to 6) of the MicroBlaze core.
LMB: memory controller (BRAMs).
OPB: external memory controller, interrupt controller, UART, timer, watchdog, SPI, JTAG-UART, etc.

54 Embedded Development Tool Flow Overview
Figure: two parallel flows.
Standard FPGA HW development flow: VHDL/Verilog, Synthesizer, Simulator, Place & Route, Download to FPGA.
Standard embedded SW development flow: C code, Compiler/Linker, (Simulator), Object code, Debugger; CPU code in on-chip or off-chip memory; Download to Board & FPGA.
Notes:
This slide illustrates a typical development flow for both the software (left side) and hardware (right side). The ultimate objective here is to use an FPGA in a real application.
Hardware developers start with the HDL coding, use a synthesis tool of their choice, simulate the design, and verify that the synthesis tool is working accurately. They then use an implementation tool to place and route the design and generate a bitstream, and a programming tool, such as iMPACT, to download the bitstream.
Software developers start by writing the application in a high-level language (for example, C), compile the code, generate the object code, link it to create an executable, and program it into an off-chip memory sitting next to the FPGA.
Data2MEM improves productivity by updating the block RAM contents with changed software code without re-running place and route or regenerating the software and hardware IP.
In the center, you see a tool called the Embedded Development Kit (EDK), which facilitates all the functions identified so far on this slide. This tool “ties” the hardware implementation and software implementation flows together by automatically generating both the hardware components and the software components for a processor system.

55 EDK
The Embedded Development Kit (EDK) consists of the following:
Xilinx Platform Studio (XPS)
Base System Builder (BSB)
Create and Import Peripheral Wizard
Hardware generation tool (PlatGen)
Library generation tool (LibGen)
Simulation generation tool (SimGen)
GNU software development tools
System verification tool (XMD)
Virtual Platform generation tool (VPgen)
Software Development Kit (Eclipse)
Processor IP
Drivers for IP
Documentation
Use the GUI or the shell command tool to run EDK.
Notes:
Xilinx Platform Studio (XPS) provides an integrated environment for creating the software and hardware specification flows for an embedded processor system. XPS also provides an editor and a project management interface to create and edit source code, offers customization of tool flow configuration options, and provides a graphical system editor to connect processors, peripherals, and buses.
Base System Builder (BSB) is a software tool that helps you quickly build a working hardware system for a specific development board. For the boards supported by BSB, you only need to input minimal information, while BSB automatically fills in the rest based on a board description file and intelligent defaults. After BSB is done, you can go back to XPS and further modify and enhance the system.
Hardware generation is performed by the Platform Generator (PlatGen) tool from an MHS file. It constructs the embedded processor system in the form of hardware netlists (HDL and implementation netlist files). PlatGen generates the necessary banks of memory and the initialization files for the BRAM block (bram_block).
To configure libraries and device drivers, the Library Generator (LibGen) is generally the first tool to run. LibGen takes an MSS file, created by you, as input.
The Simulation Model Generator (SimGen) creates and configures various VHDL and Verilog simulation models for a specified hardware. SimGen takes as input an MHS file that describes the hardware.
The Xilinx Microprocessor Debugger (XMD) provides a unified GDB interface and a Tcl interface for debugging programs and verifying systems that use the IBM PowerPC or MicroBlaze microprocessors.
The Virtual Platform Generator (VPgen) generates a cycle-accurate simulation model of the hardware system. The Virtual Platform can be used to debug and profile software application code on the host machine, eliminating the need to get the actual hardware working on a prototype board.
The Xilinx Platform Studio Software Development Kit (SDK) is a complementary GUI to XPS and provides a development environment for software application projects. SDK is based on the Eclipse open source standard. Platform Studio SDK features include: a feature-rich C/C++ code editor and compilation environment, project management, application build configuration and automatic Makefile generation, error navigation, a well-integrated environment for seamless debugging of embedded targets, and source code version control.

56 EDK Files
MHS = Microprocessor Hardware Specification
MSS = Microprocessor Software Specification
MPD = Microprocessor Peripheral Description
PAO = Peripheral Analyze Order
BBD = Black-Box Definition
MDD = Microprocessor Driver Description
BMM = BRAM Memory Map

57 Design Flow: Hardware I
Figure (EDK / Xilinx Platform Studio): Platform Definition (peripherals, configuration, connectivity, address space), captured in *.mhs, feeds Generate Netlist.

58 Design Flow: Hardware II, ISE Env
Figure: Platform Definition (peripherals, configuration, connectivity, address space), captured in *.mhs, feeds Generate Netlist in XPS; ISE (Platform Ext., Proj.Nav. / VHDL) then runs Generate Bitstream using *.ucf and produces *.bit.
Legend:
EDK: Embedded Development Kit
XPS: Xilinx Platform Studio
ISE: Integrated Software Environment
MHS: Microprocessor Hardware Specification

59 Design Flow: Software
Figure, hardware side: Platform Definition (peripherals, configuration, connectivity, address space), captured in *.mhs, feeds Generate Netlist in XPS; ISE (Platform Ext., Proj.Nav. / VHDL) then runs Generate Bitstream using *.ucf and produces *.bit.
Figure, software side: Gen. Libs produces *.h; the *.c and *.asm sources go through Compile & Link to produce *.elf.

60 Design Flow: Combine HW + SW
Figure, hardware side: Platform Definition (peripherals, configuration, connectivity, address space), captured in *.mhs, feeds Generate Netlist in XPS; ISE (Platform Ext., Proj.Nav. / VHDL) then runs Generate Bitstream using *.ucf and produces *.bit and *.bmm.
Figure, software side: Gen. Libs produces *.h; the *.c and *.asm sources go through Compile & Link to produce *.elf.
Figure, combined: Update Bitstream merges the *.elf into the bitstream using *.bmm, producing the final *.bit.

61 Summary
Xilinx provides CAD tools, in the form of EDK/ISE, to implement a soft core and manage the whole hardware/software development process.
A soft core such as a single MicroBlaze enables hardware/software co-design: sequential code runs on the processor, and bottlenecks run on a dedicated hardware accelerator attached to the MicroBlaze.

