1
ALTERA FPGAs and NIOS II
ELG6158 Computer Systems Architecture Miodrag Bolic
2
Presentation Outline
- Basic description of Stratix Altera Devices
- NIOS II processor architecture
- How to design a system using NIOS II processor
3
Stratix EP1S10 [2]
6
TriMatrix™ Memory [1]
Dedicated External Memory Interface

M512 Blocks (512 bits per block + parity)
- Small FIFOs
- Shift Register
- Rake Receiver Correlator
- FIR Filter Delay Line

M4K Blocks (4 Kbits per block + parity)
- Header / Cell Storage
- Channelized Functions
- ATM cell-packet processing
- Nios Program Memory

M-RAM (512 Kbits per block + parity)
- Packet / Data Storage
- Nios Program Memory
- System Cache
- Video Frame Buffers
- Echo Canceller Data Storage

Other listed uses: Look-Up Schemes, Packet & Cell Buffering, Cache

Larger blocks give more bits for larger memory buffering; more, smaller blocks give more data ports for greater memory bandwidth.
7
Memory Bandwidth Summary: Stratix Device Family [1]

Device   Total RAM Bits   M-RAM Blocks   M4K Blocks   M512 Blocks   Maximum Bandwidth (Mbps)
EP1S10        920,448          1              60           94            1,245,024
EP1S20      1,669,248          2              82          194            2,096,928
EP1S25      1,944,576          2             138          224            2,894,400
EP1S30      3,317,184          4             171          295            3,750,192
EP1S40      3,423,744          4             183          384            4,384,800
EP1S60      5,215,104          6             292          574            6,762,528
EP1S80      7,427,520          9             364          767            8,784,720
9
Logic Array Blocks (LAB) [2]
[Figure: one LAB containing 10 LEs (LE1 to LE10), the local interconnect, and LAB-wide control signals.]
10
LAB Arrangement
LABs communicate directly with each other and with other blocks, both horizontally and vertically.
[Figure: grid of LABs arranged in LAB rows and LAB columns, with M512 blocks interleaved.]
11
Logic Elements
- Smallest units of logic
- Used for combinatorial/registered logic
[Figure: Stratix™ LE. Inputs: carry-in, register chain input, LUT chain input. Outputs: carry-out, LUT chain output, register chain output. The LE connects to general routing and local routing.]
12
Total LE Resources

Device   Total LEs
EP1S10   10,570
EP1S20   18,460
EP1S25   25,660
EP1S30   32,470
EP1S40   41,250
EP1S60   57,120
EP1S80   79,040
13
LE Datasheet Image
14
LE Features
- 4-Input Look-Up Table (LUT)
- Configurable Register
- 2 Operation Modes
- Dynamic Add/Subtract Control
- Carry-Select Chain Logic
- Performance-Enhancing Features: LUT & Register Chain
- Area-Enhancing Features: Register Packing & Feedback
15
LE Inputs/Outputs
Inputs
- 4 Data
- 2 LE Carry-Ins & 1 LAB Carry-In
- 1 Dynamic Addition/Subtraction Control
- Register Controls
Outputs
- 2 LE Carry-Outs
- 2 Row/Column/DirectLink Outputs
- 1 Local Output
- 1 LUT Chain & 1 Register Chain
16
Operation Modes
Normal
- General combinatorial or registered logic
Dynamic Arithmetic
- Used for adders, counters, accumulators, comparators
- Uses carry chain for faster operation
Chosen automatically by Quartus® II & NativeLink® synthesis tools based on design & design constraints
17
LE Register Controls
- Clock/Clock Enable
- Synchronous & Asynchronous Clear
- Synchronous & Asynchronous Load & Data
- Asynchronous Preset
  - Preset function loads a '1'
[Figure: LE register with ports ALD/PRE, ADATA, D, Q, ENA, CLRN.]
18
Normal Mode
[Figure: LE in normal mode. Inputs: LUT chain input, register chain input, register control signals, addnsub, cin, data1 to data4. The 4-input LUT feeds the sync load & clear logic and the register (D, DATA), with register feedback. Outputs: row, column & DirectLink routing; local routing; LUT chain output; register chain output.]
Note: Functional diagram only. Please see the datasheet for more details. addnsub and data1 are connected via XOR logic.
19
Combinatorial Logic Only
[Figure: LE functional diagram with the same signals as the Normal Mode diagram, used for combinatorial logic only.]
20
Sequential Logic Only
[Figure: LE functional diagram with the same signals as the Normal Mode diagram, used for sequential logic only.]
21
Dynamic Arithmetic Mode
[Figure: LE in dynamic arithmetic mode. Inputs: LAB carry-in, carry-in0, carry-in1, register chain input, register control signals, addnsub, data1 to data3. Blocks: carry-in logic, sum calculator, carry calculator, sync load & clear logic, register (D, DATA), carry-out logic. Outputs: row, column & DirectLink routing; local routing; register chain output; carry-out0 and carry-out1.]
Note: Functional diagram only. Please see the datasheet for more details.
22
Carry-Select Logic
- Each cell pre-calculates sum and carry-out for carry = 1 and carry = 0
- Carry-in selects which pre-calculation is used
[Figure: a single LUT computes A0+B0+1 and A0+B0+0; CIN selects SUMOUT, and COUT is selected from COUT1 and COUT0.]
23
Carry Chain Details
- Carry chains begin & end in any LE
- 2 carry chains can exist in any LAB
- Carry-select generated in LEs 5 & 10
- Every LE not in critical timing path
[Figure: 10-bit carry chain in one LAB. LE1 to LE10 take inputs A1/B1 to A10/B10 and produce Sum1 to Sum10, running from LAB carry-in to LAB carry-out.]
24
LUT & Register Chains
LUT Chain
- Output of LUT connects directly to LUT below
- Available only in normal mode
- Ex. wide fan-in functions
Register Chain
- Output of register connects directly to register below (shift register)
- LUT can be used for unrelated function
- Ex. LE shift register
Both chains end at LAB boundary
[Figure: LE1 and LE2, each with a LUT and a register (D, Q), linked by the LUT chain and the register chain.]
25
Stratix Interconnects
- Global Signals
- LE & Register Chains
- Carry Chains
- Local Interconnect
- DirectLink™
- MultiTrack Interconnects
  - Row Interconnects
  - Column Interconnects
26
Local Interconnect
- Groups 10 LEs together
- Provides input signals to blocks (LABs, memory, DSP blocks)
- # of local lines depends on block
[Figure: local interconnects feeding an M512 block and a LAB.]
27
DirectLink
Allows blocks to drive local interconnects of neighboring blocks in the same row
[Figure: two LABs (LE1 to LE10 each) and an M512 block, each with its own local interconnect.]
28
DirectLink (cont.)
- Provides fast communication between neighboring blocks
- One LE has fast access to up to 29 other LEs in area (the 9 others in its own LAB plus 10 in each of the two neighboring LABs)
- Saves row resources
29
MultiTrack Interconnect Architecture
- Provides connections between all device blocks
- Series of 3 types of continuous row & column interconnects
- Each has a fixed speed and length
- Constant performance across family members within given area
- Simplifies block design
- Same routing resources available regardless of location
30
Row Resources
3 row interconnect lengths:
- R4: 160 lines wide
- R8: 48 lines wide
- R24: 24 lines wide
[Figure: an R4 line spanning 4 LABs.]
31
Row Resources (cont.)
Each block has its own row resource to drive right and left
[Figure: a block with one R4 routing line driving left and one R4 routing line driving right.]
32
Row Resource Details
R4 / R8
- Terminate at M-RAM
- Only connect to local & R8/C8 interconnects
- Faster than 2 R4s
R24
- Do not interface with blocks directly
- Can cross M-RAM
- Fastest resource for long connections (ex. design block to design block)
33
Column Resources
- 3 interconnect lengths
- Features similar to row interconnects
- Each block has column resource to drive up and down
- Interconnects are staggered
- Interconnects can drive end-to-end
[Figure: C4 and C8 column interconnects, with a span of 4 LABs marked.]
34
Presentation Outline
- Basic description of Stratix Altera Devices
- NIOS II processor architecture
- How to design a system using NIOS II processor
36
NIOS II Overview [3]
Soft IP Core
- A soft-core processor is a microprocessor fully described in software, usually in an HDL, which can be synthesized in programmable hardware, such as FPGAs.
- Reduced Instruction Set Computer (RISC)
- Pipeline configurations: no pipeline, 5 stages, or 6 stages
- Full 32-bit instruction set, data path, and address space
- 32 general-purpose registers
- 32 external interrupt sources
- Access to a variety of on-chip peripherals, and interfaces to off-chip memories and peripherals
- Software development environment based on the GNU C/C++ tool chain and Eclipse IDE
37
NIOS II Scalability Powerful multiprocessing systems can be built
38
NIOS II Processor Core [3]
How do we build
39
Implementation
- The functional units of the Nios II architecture form the foundation for the Nios II instruction set.
- The Nios II architecture describes an instruction set, not a particular hardware implementation.
Trade-offs:
- More or less of a feature: amount of instruction cache memory
- Inclusion or exclusion of a feature: the JTAG debug module
- Hardware implementation or software emulation: divider
40
Types of Processors
41
Memory Organization
What is the name of the technique for accessing peripherals?
42
Cache Performance

Memory   I-Cache   D-Cache   Normalised Performance
SDRAM    No        No        40.2%
SDRAM    No        Yes       55.2%
SDRAM    Yes       No        64.3%
SDRAM    Yes       Yes       96.4%
OnChip   No        No        [missing]
OnChip   No        Yes       98.0%
OnChip   Yes       No        [missing]
OnChip   Yes       Yes       [missing]

Performance relative to on-chip RAM with no cache, running dhry.c modified for unbuffered I/O.
43
Tightly Coupled Memory
- Fast data buffers
- Fast sections of code
- Fast interrupt handler
- Critical loop
- Constant access time; guaranteed not to have arbitration delays
- Up to 4 tightly coupled memories
Software Guidelines
- Software accesses tightly-coupled memory addresses just like any other addresses.
- Cache operations have no effect when targeting tightly-coupled memory.
44
Pipelining
Static branch prediction is implemented using the branch offset direction:
- a negative offset is predicted as taken
- a positive offset is predicted as not-taken
46
Presentation Outline
- Basic description of Stratix Altera Devices
- NIOS II processor architecture
  - Review pipelining techniques
  - Review memory access techniques
- How to design a system using NIOS II processor
48
Hardware Abstraction Layer (HAL) [4]
- Isolates the application software from hardware modifications.
- Applications are device-independent because the HAL abstracts the details of devices such as:
  - Character mode devices: UART core, JTAG UART core, LCD display controller
  - Flash memory devices
  - Timer devices
  - DMA controller core
  - Ethernet MAC/PHY controller
- The HAL application program interface (API) is integrated with the ANSI C standard library.
49
Layers of HAL API [4]
HAL library generation:
- SOPC Builder generates a hardware system
- Nios II IDE generates a custom HAL system library to match the hardware configuration
- Changes in the hardware configuration automatically propagate to the HAL device driver configuration
NIOS II is programmed in C
50
Programming NIOS II Processor [4]
Programming UART: standard input and standard output routines in C

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char *msg = "hello world";
        FILE *fp;

        /* Open the UART through its device name. */
        fp = fopen("/dev/uart1", "w");
        if (fp) {
            /* Write and close only if the open succeeded. */
            fprintf(fp, "%s", msg);
            fclose(fp);
        }
        return 0;
    }
51
References
[1] Altera Corp., Stratix & Stratix II Module 3: Using TriMatrix Memories, 2004.
[2] Altera Corp., Stratix Module 2: Logic Structure & MultiTrack Interconnect, 2004.
[3] Altera Corp., Nios II Processor Reference Handbook, 2005.
[4] Altera Corp., Nios II Software Developer's Handbook, 2005.