Module 5: Programmable Components in SoC I


1 Module 5: Programmable Components in SoC I
Chanho Lee (Soongsil University, School of Information, Communication and Electronic Engineering)

2 Contents
- Introduction
  - RISC machine
  - About the ARM architecture
  - Architecture versions
  - Performance comparison
- Processor architecture
  - Processor modes
  - Registers
  - Instruction format
  - About Thumb instructions
  - Memory model
Copyrightⓒ2003

3 Contents
- Organization
  - 3-stage pipeline organization
  - Multiplier
- Processor cores
  - Architecture evolutions
  - ARM7 Thumb family
  - StrongARM
  - ARM9 family
  - ARM9E family
  - ARM10 family
  - ARM11 family
  - X-Scale

4 Contents
- ARM development environment
  - Real-time debug and trace
  - On-chip debug technology
  - RealView development tools
- IP solutions
  - AMBA
  - PrimeCell peripherals
- ARM applications
  - Network microcontroller
  - The Psion Series 5MX
  - GSM system
  - OneC VWS22100 GSM chip

5 Contents
- Introduction
  - RISC machine
  - About the ARM architecture
  - Architecture versions
  - Performance comparison
- Processor architecture
- Organization
- Processor cores
- ARM development environment
- IP solutions
- ARM applications

6 1. Introduction: 1.1 RISC machine (1/3)
RISC architecture [1]
- Fixed instruction size (e.g., 32 bits)
- Load-store architecture
  - Operands must be located in registers
  - The operation result is put into a register
- Large register file
- Simple addressing modes
RISC organization
- Hard-wired instruction decoding logic
- Pipelined execution
- Single-cycle execution

7 1.1 RISC machine (2/3)
Advantages
- Simple hardware
  - Small die size
  - Low power consumption
  - Simple decoding
- Higher performance
  - Easy to implement an effective pipelined structure
Disadvantages
- Poor code density
  - RISC has a fixed-size instruction format
  - Small number of instructions

8 1.1 RISC machine (3/3)
Summary of the Intel 80386 and MIPS R2000 architectures [17]

| Feature | MIPS R2000 | Intel 80386 |
|---|---|---|
| Date announced | 1986 | 1985 |
| Instruction size (bits) | 32 | Variable |
| Address space (size, model) | 32 bits, flat | 32 bits, segmented with paging support |
| Data alignment | Aligned | Not required |
| Data addressing modes | 2 | 11 |
| Protection | Page | Segmented scheme |
| Integer registers (number, model, size) | 31 GPR × 32 bits | 8 GPR × 32 bits, 6 segment registers × 16 bits, 2 other × 16 bits |
| Separate floating-point registers | 16 × 32 or 16 × 64 bits | 8 × 80 bits |
| Floating-point format | IEEE 754 single, double | IEEE 754 single, double, extended |

9 1.2 About the ARM architecture
- RISC + additional features
- Occupies almost 75% of the 32-bit embedded RISC microprocessor market
- Additional features of ARM
  - Auto-increment/decrement addressing modes
  - A single data-processing instruction can perform both ALU and shifter operations
  - Load/store multiple instructions
  - Conditional execution

10 1.3 Architecture versions [3] (1/3)

11 1.3 Architecture versions (2/3)
v4
- The oldest version of the architecture supported today
- 32-bit address space
- T variant: 16-bit Thumb instruction set
- M variant: long multiply (64-bit result)
v5
- Improved ARM/Thumb interworking
- CLZ instruction
- E variant: enhanced DSP instruction set
- J variant: acceleration of Java byte-code execution
v6
- Improved memory system
- Support for Single Instruction Multiple Data (SIMD)

12 1.3 Architecture versions (3/3)
Architecture variants
- T: 16-bit Thumb instruction set
- D: on-chip Debug support
- M: hardware long Multiplier
- I: EmbeddedICE
- E: DSP extension
- S: Synthesizable core
- J: Jazelle Java accelerator
- ~20: with cache and MMU
- ~40: with cache, protection unit rather than MMU
- ~22: smaller cache than ~20

13 1.4 Performance comparison

14 Contents
- Introduction
- Processor architecture
  - Processor modes
  - Registers
  - Instruction format
  - About Thumb instructions
  - Memory model
- Organization
- Processor cores
- ARM development environment
- IP solutions
- ARM applications

15 2. Processor Architecture [2]: 2.1 Processor modes

| Mode | Registers | CPSR[4:0] | Use |
|---|---|---|---|
| User | usr | 10000 | Normal program execution mode with restricted system resources |
| FIQ | fiq | 10001 | Processing fast interrupts |
| IRQ | irq | 10010 | Processing general-purpose interrupts |
| Supervisor | svc | 10011 | Processing software interrupts |
| Abort | abt | 10111 | Processing memory faults |
| Undefined | und | 11011 | Handling undefined instruction traps |
| System | sys (= usr) | 11111 | Running privileged OS tasks (ARM architecture v4 and above) |
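The mode table can be read as a lookup on the low five bits of the CPSR. The following is a minimal Python sketch of that lookup; the function names `decode_mode` and `is_privileged` are illustrative and not part of any ARM tool, and the rule that every mode except User is privileged comes from the ARM architecture rather than from the slide itself.

```python
# CPSR[4:0] encodings copied from the processor-mode table.
CPSR_MODES = {
    0b10000: "usr",
    0b10001: "fiq",
    0b10010: "irq",
    0b10011: "svc",
    0b10111: "abt",
    0b11011: "und",
    0b11111: "sys",
}


def decode_mode(cpsr: int):
    """Return the mode abbreviation for a CPSR value, or None if unlisted."""
    return CPSR_MODES.get(cpsr & 0x1F)


def is_privileged(cpsr: int) -> bool:
    """Every listed mode except User counts as privileged."""
    mode = decode_mode(cpsr)
    return mode is not None and mode != "usr"
```

For example, a CPSR value whose low byte is 0xD3 decodes to Supervisor mode, because 0xD3 & 0x1F is 10011.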

16 2.2 Registers (1/3)

17 2.2 Registers (2/3)
Visible registers
- 31 general-purpose registers, 6 program status registers
- At any time, 16 general-purpose registers and one or two status registers are visible, depending on the processor mode
General-purpose registers (GPR)
- Unbanked registers, R0-R7, R15
  - The same physical registers in all processor modes
- Banked registers, R8-R14
  - The physical register referred to by each of them depends on the current processor mode
- Special functions of R13-R15
  - Stack pointer (R13)
  - Link register (R14): saves the return address
  - Program counter (R15): points to the address of the instruction to be fetched
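The register banking can be sketched as a mapping from (mode, register number) to a physical register. This Python sketch is illustrative only: the naming scheme `R13_svc` and the function names are assumptions, and it relies on a detail from the ARM Architecture Reference Manual that the slide does not spell out, namely that R8-R12 are banked only for FIQ while R13 and R14 are banked for every exception mode.

```python
MODES = ["usr", "sys", "fiq", "irq", "svc", "abt", "und"]


def physical_register(mode: str, n: int) -> str:
    """Name the physical register that Rn refers to in the given mode."""
    if mode not in MODES or not 0 <= n <= 15:
        raise ValueError("unknown mode or register number")
    bank = "usr" if mode == "sys" else mode  # System mode shares the User registers
    if n <= 7 or n == 15:
        return f"R{n}"  # unbanked in every mode
    if n <= 12:
        return f"R{n}_fiq" if bank == "fiq" else f"R{n}"  # banked for FIQ only
    return f"R{n}" if bank == "usr" else f"R{n}_{bank}"  # R13 and R14


def count_physical_registers() -> int:
    """Count the distinct physical general-purpose registers."""
    return len({physical_register(m, n) for m in MODES for n in range(16)})
```

Counting the distinct names reproduces the figure of 31 general-purpose registers quoted on the slide.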

18 2.2 Registers (3/3)
Program status registers (PSR)
- CPSR (Current PSR)
- SPSR (Saved PSR)
  - Each exception mode has an SPSR
  - It preserves the value of the CPSR when the exception occurs

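19 2.3 Instruction format
Example: ADDEQS Rd, Rn, Rm, LSL #2

| Bits | 31-28 | 27-26 | 25 | 24-21 | 20 | 19-16 | 15-12 | 11-7 | 6-5 | 4 | 3-0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Field | cond | 00 | # | opcode | S | Rn | Rd | #shift | Sh | 0 | Rm |

Features
- 3-address format
- Conditional execution
- Specification of flag update
- A shifted operand
Datapath (figure): Rn and Rm are read from the register bank; Rm passes through the shifter (controlled by #shift and Sh); the ALU applies the opcode; the condition evaluation enables loading of the result into Rd; the flags are updated if S == 1.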
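To make the field layout concrete, here is a minimal Python sketch that splits a 32-bit instruction word into the fields shown on this slide. It handles only the register-operand form with an immediate shift amount (bit 25 = 0, bit 4 = 0); the function name and the returned dictionary keys are illustrative, and the mnemonic tables follow the standard ARM condition and opcode numbering. The sketch does not filter out the few non-data-processing instructions that share this bit pattern.

```python
CONDITIONS = ["EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
              "HI", "LS", "GE", "LT", "GT", "LE", "AL", "NV"]
OPCODES = ["AND", "EOR", "SUB", "RSB", "ADD", "ADC", "SBC", "RSC",
           "TST", "TEQ", "CMP", "CMN", "ORR", "MOV", "BIC", "MVN"]
SHIFTS = ["LSL", "LSR", "ASR", "ROR"]


def decode_data_processing(word: int) -> dict:
    """Split a register-operand data-processing word into its fields."""
    if (word >> 26) & 0x3 != 0 or (word >> 25) & 1 or (word >> 4) & 1:
        raise ValueError("not a register operand with immediate shift")
    return {
        "cond": CONDITIONS[(word >> 28) & 0xF],   # bits 31-28
        "opcode": OPCODES[(word >> 21) & 0xF],    # bits 24-21
        "s": bool((word >> 20) & 1),              # bit 20: update flags
        "rn": (word >> 16) & 0xF,                 # bits 19-16
        "rd": (word >> 12) & 0xF,                 # bits 15-12
        "shift_amount": (word >> 7) & 0x1F,       # bits 11-7
        "shift": SHIFTS[(word >> 5) & 0x3],       # bits 6-5
        "rm": word & 0xF,                         # bits 3-0
    }
```

With Rd = r0, Rn = r1 and Rm = r2, the slide's example ADDEQS r0, r1, r2, LSL #2 assembles to the word 0x00910102, which this function decodes back into the same fields.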

20 2.4 About Thumb instructions (1/3)
Thumb instruction set
- Re-encoded subset of the most commonly used ARM instructions
- 16-bit format: allows better code density
- 32-bit performance at 8/16-bit system cost
- At least a few 32-bit ARM instructions are still needed
  - On an exception, the processor switches to ARM state
  - PSR-manipulating instructions can be executed only in ARM state
Thumb state
- T bit in CPSR == 1
Thumb entry
- By executing a BX instruction

21 2.4 About Thumb instructions (2/3)
Registers
- Visible GPRs
  - Lo registers (r0-r7)
- Special-purpose registers, accessed by some Thumb instructions
  - Program counter (r15)
  - Link register (r14)
  - Stack pointer (r13)
- Restricted register access
  - A few instructions allow the 'High' registers (r8-r15) to be specified

22 2.4 About Thumb instructions (3/3)
Thumb-ARM similarities
- Load-store architecture
- Support for 8-bit bytes, 16-bit half-words and 32-bit words
  - Half-words are aligned on 2-byte boundaries
  - Words are aligned on 4-byte boundaries
- A 32-bit unsegmented memory
Thumb-ARM differences
- All Thumb instructions except branches are executed unconditionally
- 2-address format
- Fewer addressing modes than ARM

23 2.4.1 Thumb implementation [1] (1/2)
- Implementation in a 3-stage pipeline
- The 5-stage pipeline implementations are trickier

24 2.4.1 Thumb implementation (2/2)
Instruction mapping
- Thumb code: ADD|SUB Rd, #<imm8>
- Equivalent ARM code: ADDS|SUBS Rd, Rd, #<imm8>
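The mapping on this slide can be sketched as a small decompressor that turns one 16-bit Thumb halfword into the equivalent 32-bit ARM word. This Python sketch is only an illustration of the idea, not the hardware decompressor described in [1]; the function name is made up, and the bit positions assume the standard Thumb add/subtract 8-bit immediate encoding (bits 15-13 = 001, op in bits 12-11, Rd in bits 10-8, imm8 in bits 7-0).

```python
ARM_ADD = 0b0100  # ARM data-processing opcode for ADD
ARM_SUB = 0b0010  # ARM data-processing opcode for SUB


def thumb_addsub_imm8_to_arm(halfword: int) -> int:
    """Map Thumb ADD|SUB Rd, #imm8 to ARM ADDS|SUBS Rd, Rd, #imm8."""
    if (halfword >> 13) & 0x7 != 0b001:
        raise ValueError("not a Thumb immediate move/compare/add/subtract")
    op = (halfword >> 11) & 0x3
    if op == 0b10:
        arm_opcode = ARM_ADD
    elif op == 0b11:
        arm_opcode = ARM_SUB
    else:
        raise ValueError("MOV and CMP are not handled in this sketch")
    rd = (halfword >> 8) & 0x7
    imm8 = halfword & 0xFF
    return (
        (0xE << 28)            # condition AL: Thumb code runs unconditionally
        | (1 << 25)            # immediate operand
        | (arm_opcode << 21)
        | (1 << 20)            # S bit: this Thumb form always updates the flags
        | (rd << 16)           # Rn = Rd (2-address format)
        | (rd << 12)           # Rd
        | imm8                 # 8-bit immediate with zero rotation
    )
```

For instance, the Thumb halfword 0x3105 (ADD r1, #5) maps to the ARM word 0xE2911005 (ADDS r1, r1, #5).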

25 Contents
- Introduction
- Processor architecture
- Organization
  - 3-stage pipeline organization
  - 5-stage pipeline organization
  - Multiplier
- Processor cores
- ARM development environment
- IP solutions
- ARM applications

26 3. Organization: 3.1 3-stage pipeline (1/3)
Organization
- Address generating block
  - Address register
  - Incrementer
  - Address selector
- Register bank
  - 31 GPRs, 6 PSRs
  - 2 read ports, 1 write port
  - Additional 1 read port and 1 write port for the PC
- Barrel shifter
- ALU
- IO registers
  - Instruction pipeline
  - Read data register
  - Byte replicator
- Control logic
  - External interface
  - Instruction decoder
  - Datapath control

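27 3.1 3-stage pipeline (2/3)
Pipeline stages
- Fetch
  - Instruction fetch from memory
- Decode
  - Instruction decoding
  - Datapath control signals are prepared for the next cycle
- Execute
  - Reading registers
  - Shift and ALU operations
  - Writing back to the register bank
Timing (figure): a data-processing (DP) instruction passes through F, D, E, followed by the instructions at PC+i and PC+2i.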
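The overlap of the three stages can be sketched in a few lines of Python. The sketch assumes an idealized pipeline in which every instruction spends exactly one cycle per stage, and it reads the `i` in the timing figure as the instruction size; both are assumptions, and the function names are illustrative. Under that model, while an instruction executes, the fetch stage is already two instructions ahead, which is why reading R15 in ARM state yields the instruction's address plus 8.

```python
STAGES = ["F", "D", "E"]


def stage_at(instr_index: int, cycle: int):
    """Stage occupied by instruction number instr_index in a given cycle.

    Instruction k is fetched in cycle k; returns None when the
    instruction is not in the pipeline during that cycle.
    """
    offset = cycle - instr_index
    return STAGES[offset] if 0 <= offset < len(STAGES) else None


def pc_seen_by_execute(instr_addr: int, instr_size: int = 4) -> int:
    """Address being fetched while the instruction at instr_addr executes."""
    return instr_addr + 2 * instr_size
```

In cycle 2, instruction 0 is in Execute while instruction 2 is being fetched, so an ARM instruction at 0x8000 sees PC = 0x8008.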

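28 3.1 3-stage pipeline (3/3)
Branch (figure): B passes through F, D, E1, E2, E3; the instructions fetched at PC+i and PC+2i are discarded, and execution continues at the target T and T+i.
LDR (figure): LDR passes through F, D, E1 (Calc), E2 (xfer), E3 (move).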

29 3.1.1 Multiple load/store instruction
LDM timing (figure): F, D, A1, A2, A3 ... An, L1, L2 ... Ln-1, Ln, M1 ... Mn-2, Mn-1, Mn, E.

30 3.2 5-stage pipeline organization (1/4)
To increase performance [1]
- Increase the clock rate
  - Simplify each pipeline stage
  - Increase the number of pipeline stages
- Reduce the average number of clock cycles per instruction (CPI)
  - Avoid the von Neumann bottleneck
  - Exploit a Harvard architecture

31 3.2 5-stage pipeline organization (2/4)
Organization
- Harvard architecture
  - Separate caches
- Register bank
  - 3 read ports, 2 write ports
- Additional address incrementer for multiple load/store
- Forwarding paths to resolve data dependencies

32 3.2 5-stage pipeline organization (3/4)
Pipeline comparison
Interlock [6]
    LDR rN, [..]    ; load rN from somewhere
    ADD r2, r1, rN  ; and use it immediately
- The ADD instruction cannot start until the data is returned from the load
- The ADD instruction has to delay entering the execute stage of the pipeline by one cycle
PC behavior [1]
- The 5-stage pipeline emulates the behavior of the 3-stage designs
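The load-use interlock described on this slide can be modeled with a simple cycle counter. The Python sketch below is an idealized model, not a description of the ARM9TDMI control logic: it assumes every instruction otherwise takes one cycle per stage, that all memory accesses hit, and that a stall of exactly one cycle occurs only when the instruction immediately after an LDR reads the loaded register. The tuple layout and the function name are illustrative.

```python
from typing import List, Sequence, Tuple

# (mnemonic, destination register, source registers)
Instr = Tuple[str, str, Sequence[str]]


def count_cycles(program: List[Instr]) -> Tuple[int, int]:
    """Return (total cycles, stall cycles) for an ideal 5-stage pipeline."""
    if not program:
        return 0, 0
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        # Load-use hazard: the consumer directly follows the load.
        if prev[0] == "LDR" and prev[1] in cur[2]:
            stalls += 1
    # The first instruction needs 5 cycles; each later one adds a cycle.
    total = len(program) + 4 + stalls
    return total, stalls
```

Running it on the slide's LDR/ADD pair gives one stall cycle; moving an independent instruction between the two removes the stall, which is the usual scheduling remedy.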

33 3.2 5-stage pipeline organization (4/4)
Pipeline timing (figure): LDR and ADD each pass through F, D, E, M, W; the branch B is shown with E1, E2, E3.
Separated cache
- Instruction and data caches are accessible at the same time

34 3.3 Multiplier [1] (1/2)
Low-cost multiplication hardware
- 32-bit results for multiply and multiply-accumulate
- No longer used in recent cores
- Shift and add: uses the barrel shifter and ALU to generate a 2-bit product in each cycle → 16 cycles in the worst case
- Early termination logic
- Employs the modified Booth's algorithm (radix-4)
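The radix-4 modified Booth scheme with early termination can be illustrated in software. The Python sketch below is a behavioral model only, not the ARM datapath: it consumes two multiplier bits per iteration, recodes them into a digit in {-2, -1, 0, 1, 2}, and stops as soon as the remaining multiplier bits can contribute nothing more. It returns the full product rather than truncating to 32 bits, and the function name and the `width` parameter are assumptions made for this example.

```python
def booth_radix4_multiply(multiplicand: int, multiplier: int, width: int = 32):
    """Return (product, iterations) using radix-4 Booth recoding."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not lo <= multiplier <= hi:
        raise ValueError("multiplier does not fit in the given width")
    acc = 0
    prev = 0        # the multiplier bit just below the current pair
    cycles = 0
    m = multiplier  # Python's >> on int is an arithmetic shift
    # Early termination: remaining bits all 0 (prev 0) or all 1 (prev 1)
    # recode to zero digits only, so no further work is needed.
    while not ((m == 0 and prev == 0) or (m == -1 and prev == 1)):
        b0 = m & 1
        b1 = (m >> 1) & 1
        digit = b0 + prev - 2 * b1           # Booth digit in -2..2
        acc += (digit * multiplicand) << (2 * cycles)
        prev = b1
        m >>= 2
        cycles += 1
    return acc, cycles
```

Small multipliers finish in a few iterations, while a multiplier that uses all 32 bits needs the full 16, matching the worst case quoted on the slide.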

35 3.3 Multiplier (2/2)
High-performance multiplication
- 64-bit results for multiply and multiply-accumulate
- Employs a 32x8 multiplier
  - 4 layers of carry-save adder array, each handling two multiplier bits
  - Multiplies eight bits per cycle
  - 4 cycles in the worst case
- Early termination logic

36 Contents
- Introduction
- Processor architecture
- Organization
- Processor cores
  - Architecture evolutions
  - ARM7 Thumb family
  - ARM9 family
  - ARM9E family
  - X-Scale
- ARM development environment
- IP solutions
- ARM applications

37 4. Processor Cores: 4.1 Architecture evolutions

38 4.2 ARM7 Thumb family [7] (1/4)
ARM7 Thumb family (v4T)
- Low-power, 32-bit RISC cores optimized for cost- and power-sensitive applications
- 3-stage pipeline
- Unified bus interface

39 4.2 ARM7 Thumb family (2/4)
ARM7TDMI [1]
- Base integer core (hard macrocell)
  - A 3-volt-compatible rework of the ARM6 32-bit integer core
  - Low-power, fully static design
  - 3-stage pipeline
  - Unified bus interface
- The Thumb 16-bit compressed instruction set
- On-chip Debug support
  - Interface for direct connection to the Embedded Trace Macrocell
  - JTAG interface unit
- Enhanced Multiplier yielding a full 64-bit result
- EmbeddedICE hardware to give on-chip breakpoint and watchpoint support

40 4.2 ARM7 Thumb family (3/4)
ARM7TDMI-S
- A synthesizable version of the ARM7TDMI
- Delivered as a high-level language module
- The core can be synthesized with reduced functionality
ARM720T macrocell
- High-performance processor for systems requiring full virtual memory management and protected execution spaces
- Additional features
  - 8K unified cache
  - Memory Management Unit
  - Write buffer
  - AMBA AHB bus interface
- The ARM720T is an ARM7TDMI core that includes CP15, the coprocessor that controls the memory management unit and the cache

41 4.2 ARM7 Thumb family (4/4)
ARM7EJ-S
- Enhanced core
  - ARM v5TEJ
- Jazelle technology
  - Hardware acceleration of Java byte-code execution
- DSP extensions
  - 16-bit data operations
  - Saturating, signed arithmetic
  - Enhanced MAC operations
Performance

42 4.4 ARM9 family [8] (1/4)
ARM9 family (v4T)

43 4.4 ARM9 family (2/4)
ARM9 family (v4T)
- Very high-performance, low-power optimized 32-bit RISC cores for a wide variety of cost- and power-sensitive applications
- ARM and Thumb instruction sets
- 5-stage pipeline
- Up to 300 MIPS (Dhrystone 2.1) in a typical 0.13 µm process
- Single 32-bit AMBA interconnect interface
- MMU supporting a virtual memory system
- Harvard architecture
- 8-entry write buffer

44 4.4 ARM9 family (3/4)
ARM920T and ARM922T macrocells
- Support platform operating systems such as Linux
- 16K I-cache and 16K D-cache (ARM920T) or 8K I-cache and 8K D-cache (ARM922T)
- MMU
- AMBA bus interface
- Embedded Trace Macrocell
ARM940T
- Applications such as DSL modem chipsets
- 4K I-cache and 4K D-cache
- Protection unit rather than MMU

45 4.4 ARM9 family (4/4): Performance

46 4.5 ARM9E family [10] (1/4)

47 4.5 ARM9E family (2/4)
ARM9E family (v5TE)
- Single-core solutions for microcontroller, DSP and Java applications
- Synthesizable soft IP
- 5-stage integer pipeline
- Harvard architecture
- ARM, Thumb and DSP instruction sets
- ARM Jazelle technology for Java acceleration (ARM926EJ-S)
- Up to 300 MIPS (Dhrystone 2.1) in a typical 0.13 µm process
- Integrated real-time trace and debug support
- Optional VFP9 coprocessor for floating-point operations
- High-performance AHB system
- Memory management unit
- 16-entry write buffer

48 4.5 ARM9E family (3/4)
The DSP extensions
- Single-cycle 16x16 and 32x16 MAC (multiply-accumulate) operations
- Enhanced saturation arithmetic behavior and performance
Tightly Coupled Memory
- TCMs are intended for storing real-time code and data
- Accesses to TCMs are deterministic and do not incur access penalties
- Cache preloads instructions
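The saturating arithmetic listed among the DSP extensions on this slide can be illustrated with a signed 32-bit saturating add. The Python sketch below models the behavior of an instruction such as QADD only at the level of its result: a sum that overflows is clamped to the nearest representable value, and the returned flag stands in for the sticky Q flag. The function name and the tuple return value are choices made for this example.

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)


def qadd(a: int, b: int):
    """Signed 32-bit saturating add; returns (result, saturated)."""
    total = a + b
    if total > INT32_MAX:
        return INT32_MAX, True   # clamp on positive overflow
    if total < INT32_MIN:
        return INT32_MIN, True   # clamp on negative overflow
    return total, False
```

Clamping instead of wrapping around is what makes this form of arithmetic attractive for signal processing, where a wrapped result would appear as a large error of the opposite sign.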

49 4.8 XScale (1/2)
- Intel, ARM v5TE architecture
- Intel superpipelined RISC technology
  - 7-stage integer pipeline
  - MAC pipeline with early termination
  - 8-stage memory pipeline
- Branch target buffer (BTB)
- Separate caches and MMUs
  - 32K I-cache, 32K D-cache
- Multiply-accumulate coprocessor
  - Provides 40-bit accumulation of 16x16, dual 16x16 (SIMD) and 16x32 signed multiplies

50 4.8 XScale (2/2)
- Clock and power management
  - Supports dynamic clock and voltage scaling
- Performance monitoring unit
  - Two 32-bit event counters and one 32-bit clock counter

51 Contents
- Introduction
- Processor architecture
- Organization
- Processor cores
- ARM development environment
- IP solutions
- ARM applications

52 5. ARM development environment
ARM Developer Suite (ADS)
- Integrated Development Environment (IDE)
  - CodeWarrior IDE: editing, compilation, ...
  - AXD debugger: GUI debug environment
  - ARMulator (software emulator)
Debug hardware
- Multi-ICE
  - JTAG-based in-circuit emulator
  - Controls the EmbeddedICE-RT and ETM logic
- MultiTrace
  - Trace port analyzer unit
  - Passively collects information from the ETM

53 Contents
- Introduction
- Processor architecture
- Organization
- Processor cores
- ARM development environment
- IP solutions
  - AMBA
  - PrimeCell peripherals
- ARM applications

54 6. IP Solutions [14]: 6.1 AMBA
The de facto standard for on-chip buses
- AMBA is an open standard on-chip bus specification
- The Advanced High-performance Bus (AHB)
  - Connects high-performance system modules
  - Single clock edge
  - Supports burst and split transactions
  - Centrally multiplexed bus scheme
- AHB-Lite
  - A subset of the full AHB specification
  - A single bus master is used
- Multi-layer AHB
  - Multiple bus masters
- The Advanced Peripheral Bus (APB)
  - Simpler bus protocol designed for peripherals
  - Connection to the system bus via a bridge

55 6.2 PrimeCell Peripherals [14]
- Re-usable soft IP macrocells developed to enable the rapid assembly of SoC designs
- Ready to use, fully verified and compliant with the AMBA on-chip bus standard
- Fully packaged, ready-to-use soft IP macrocells
- Rapid and easy integration into AMBA-based SoC designs
- Royalty-free license for single or multiple use
- Supplied in VHDL and Verilog HDL with synthesis scripts
- Software device drivers are included as source code

56 Contents
- Introduction
- Processor architecture
- Organization
- Processor cores
- ARM development environment
- IP solutions
- ARM applications
  - The Psion Series 5MX

57 7.1 The Psion Series 5MX (1/2)

58 7.2 The Psion Series 5MX (2/2): ARM7100

59 Summary
- The Advanced RISC Machine
  - Enhanced RISC architecture
  - Simple hardware but effective instruction sets
- Thumb instruction set
  - High-density code on ARM cores
- ARM offers a wide range of processor cores
- ARM Ltd. provides designers with a fully integrated development environment
- ARM cores are widely used in embedded markets

60 References
[1] S. Furber, "ARM System-on-Chip Architecture", 2nd ed., Addison-Wesley, 2000
[2] "ARM Architecture Reference Manual", ARM Ltd., June 2000
[3] "ARM Architecture Version 6 (v6) White Paper", ARM Ltd., January 2002
[4] "Improving ARM code density and performance", ARM Ltd., June 2003
[5] Application Note 04, "Programmer's Model for Big-Endian ARM", ARM Ltd., December 1994
[6] "ARM9TDMI Rev3 Technical Reference Manual", March 2000
[7] "ARM7 Family Flyer", ARM Ltd.
[8] "ARM9 Family Flyer", ARM Ltd.
[9] "ARM9E Family Flyer", ARM Ltd.
[10] "ARM10E Family Flyer", ARM Ltd.
[11] "White paper - The ARM11 Microarchitecture", ARM Ltd., April 2002

61 References
[12] "Intel XScale Microarchitecture Technical Summary", Intel Co., 2000
[13] "ARM debugging techniques for embedded systems using real-time software trace", ARM Ltd., 2002
[14] "ARM Product Backgrounder", ARM Ltd., November 2003
[15] "Samsung communication MCU S3C4510", Samsung Electronics Co., Ltd.
[16] "Sceptre HPE EDGE/GPRS/GSM High performance solution", Agere Systems Inc., November 2003
[17] Yi Gao, Shilang Tang, Zhongli Ding, "Comparison between CISC and RISC", University of Maryland
[18] Application Note 29, "Interfacing a memory system to the ARM7TDMI without using AMBA", ARM Ltd., December 1995
[19] Arvind Krishnaswamy, Rajiv Gupta, "Profile guided selection of ARM and Thumb instructions", The University of Arizona

