ARM Architecture

Architecture levels shown on the slide (stars), with the major enhancements over the previous level and example products (boxes):

Early ARM architectures: 1, 2, 3
4: Halfword and signed halfword / byte support; System mode. Examples: SA-110, SA-1110
4T: Thumb instruction set. Examples: ARM7TDMI, ARM9TDMI, ARM720T, ARM940T
5TE: Improved ARM/Thumb Interworking; CLZ; Saturated maths; DSP multiply-accumulate instructions. Examples: ARM1020E, XScale, ARM9E-S, ARM966E-S
5TEJ: Jazelle Java bytecode execution. Examples: ARM9EJ-S, ARM926EJ-S, ARM7EJ-S, ARM1026EJ-S
6: SIMD Instructions; Multi-processing; V6 Memory architecture (VMSA); Unaligned data support. Example: ARM1136EJ-S

Notes:

This slide is aimed at showing the development of the ARM Architecture. The “Stars” mark each relevant Architecture Level. The “Boxes” give examples of ARM products implementing each particular Architecture level. This is not meant to be a complete list of products, what they offer, or a product roadmap. Within each Architecture, the “Notes by the Stars” give the major enhancements specified by this particular Architecture over the previous one.

Note architectures 1, 2, 3 have been removed - these are obsolete (the only part which contains an arch 3 core is ARM7500FE).

ARM1020T was architecture v5T, however we are rapidly transitioning to ARM1020E and 1022E.

Jazelle adds Java bytecode execution, which increases Java performance by 5-10x and also reduces power consumption accordingly.
9EJ - Harvard - 200MIPS
7EJ - Von Neumann - 70MIPS

Brief notes on V6:
- SIMD instructions provide greatly increased audio/video codec performance
- LDREX/STREX instructions improve multi-processing support
- VMSA (Virtual Memory System Architecture): complete L1 cache and TCM definition; physically-tagged cache; ASID for improved task-switching
- SRS and RFE instructions to improve exception handling performance
- Hardware and instruction set support for mixed-endianness
- 1136JF-S has integral VFP coprocessor
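As a rough illustration of two of the 5TE additions named above, the sketch below models the arithmetic result of a saturating 32-bit add and of CLZ (count leading zeros) in Python. The function names qadd and clz are chosen here for illustration only; this is not ARM code, and the sticky Q flag that the real saturating instructions set on overflow is not modelled.

```python
# Illustrative model only: arithmetic results of a saturating signed
# 32-bit add and of count-leading-zeros. Names are hypothetical.

INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)


def qadd(a: int, b: int) -> int:
    """Signed 32-bit add that clamps to the representable range
    instead of wrapping around on overflow."""
    return max(INT32_MIN, min(INT32_MAX, a + b))


def clz(x: int) -> int:
    """Number of leading zero bits in a 32-bit value (32 for zero)."""
    return 32 - (x & 0xFFFFFFFF).bit_length()


if __name__ == "__main__":
    print(qadd(INT32_MAX, 1) == INT32_MAX)  # clamps at the top of the range
    print(clz(1), clz(0))
```

Saturation matters for DSP-style code because a wrapped overflow turns a loud sample into a large value of the opposite sign, while a clamped one only clips.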
ARM10E Product Roadmap
ARM7TDMI
ARM9TDMI
SA-1110
ETM10C Interface
ARM920T
ARM Thumb - AT91F40816
LPC21xx
Bulverde
Exception Handling and the Vector Table

When an exception occurs, the core:
- Copies CPSR into SPSR_<mode>
- Sets appropriate CPSR bits:
    - If the core implements ARM Architecture 4T and is currently in Thumb state, then ARM state is entered
    - Mode field bits
    - Interrupt disable flags if appropriate
- Maps in appropriate banked registers
- Stores the “return address” in LR_<mode>
- Sets PC to vector address

To return, the exception handler needs to:
- Restore CPSR from SPSR_<mode>
- Restore PC from LR_<mode>

Notes:

Exceptions, in order serviced, are:
- Reset - supervisor mode
- Data abort - abort mode
- External Fast Interrupt Request - FIQ mode (e.g. DMA)
- External Interrupt Request - IRQ mode
- Instruction Prefetch abort - abort mode
- Software Interrupt (SWI) - supervisor mode (typically used to extend the operating system)
- Undefined instruction - undefined mode

There is only one memory location for each vector. Each vector contains a branch to that particular exception handler.

The FIQ vector is the last one. This allows its handler to run sequentially from that address, removing the need for a branch and its associated delays. This is important because speed is essential for FIQ.

It is the interrupt routine’s responsibility to clear the interrupt condition.

The handler can return using one instruction.

See the exception handling module for details.
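The entry and return steps above can be sketched as a small Python model. It is a simplified illustration, not a description of any particular core: the class and method names are hypothetical, the vector addresses assume the standard low vector table starting at 0x00, the return address is passed in by the caller rather than derived from the PC (the real offset depends on the exception type), and banked registers are reduced to one SPSR and one LR per mode.

```python
# Simplified model of ARM exception entry and return. All names are
# illustrative; per-exception return-address offsets are not modelled.

MODE_MASK = 0x1F   # CPSR[4:0] mode field
T_BIT = 1 << 5     # Thumb state
F_BIT = 1 << 6     # FIQ disable
I_BIT = 1 << 7     # IRQ disable

MODE_BITS = {"fiq": 0x11, "irq": 0x12, "svc": 0x13, "abt": 0x17, "und": 0x1B}
MODE_NAMES = {bits: name for name, bits in MODE_BITS.items()}

# exception name -> (vector address, mode entered, also disable FIQ?)
EXCEPTIONS = {
    "reset":          (0x00, "svc", True),
    "undefined":      (0x04, "und", False),
    "swi":            (0x08, "svc", False),
    "prefetch_abort": (0x0C, "abt", False),
    "data_abort":     (0x10, "abt", False),
    "irq":            (0x18, "irq", False),
    "fiq":            (0x1C, "fiq", True),   # last vector in the table
}


class Core:
    def __init__(self, cpsr: int = 0x10, pc: int = 0):
        self.cpsr = cpsr   # 0x10 = user mode, ARM state, interrupts enabled
        self.pc = pc
        self.spsr = {}     # one saved status register per exception mode
        self.lr = {}       # one banked link register per exception mode

    def take_exception(self, name: str, return_address: int) -> None:
        vector, mode, disable_fiq = EXCEPTIONS[name]
        self.spsr[mode] = self.cpsr                # copy CPSR into SPSR_<mode>
        cpsr = self.cpsr & ~(MODE_MASK | T_BIT)    # clear mode, enter ARM state
        cpsr |= MODE_BITS[mode] | I_BIT            # new mode, IRQs disabled
        if disable_fiq:
            cpsr |= F_BIT                          # reset and FIQ also mask FIQ
        self.cpsr = cpsr
        self.lr[mode] = return_address             # store the return address
        self.pc = vector                           # jump to the vector

    def return_from_exception(self) -> None:
        mode = MODE_NAMES[self.cpsr & MODE_MASK]
        self.pc = self.lr[mode]                    # restore PC from LR_<mode>
        self.cpsr = self.spsr[mode]                # restore CPSR from SPSR_<mode>


if __name__ == "__main__":
    core = Core(cpsr=0x10 | T_BIT, pc=0x8000)      # user mode, Thumb state
    core.take_exception("irq", return_address=0x8004)
    print(hex(core.pc), hex(core.cpsr))
    core.return_from_exception()
    print(hex(core.pc), hex(core.cpsr))
```

In the model, taking an IRQ from Thumb state leaves the core in ARM state with IRQs masked, and the return restores both the original PC value and the Thumb bit from the saved status register.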
Intel® IXA – The Next Generation

Intel® IXA characteristics:
- Definable: Intel® IXA is Intel’s packet processing architecture focused on our network processors
- Measurable: Architectural core is the microengine technology + Intel® XScale™ microarchitecture
- Lasting: Software portability across multiple product generations

[Diagram layers, top to bottom:]
- Customer Applications
- Intel® IXA portability framework: enables software portability
- Intel® IXA Network Processor (Intel® XScale™ microarchitecture + Microengine technology): enables low power, high density processing

Notes:

Definable: IXA is a packet processing architecture focused on our network processors
- XScale™ Architecture + Microengine Technology

Achievable: Architectural core is the microengine cluster + XScale™ Architecture
- Intel proprietary, parallel processing, RISC processor cluster at the heart of NPU data plane processing
- Competitive/valuable as a multiprocessing, multiprocessor, multithreaded architecture

Measurable: Intergenerational SW portability
- Hardware Abstraction Layer (HAL) enables consistency across microengine generations
- Customer code reuse between generations and across product family

Enables high-performance, programmable network processing
IXP2800

[Block diagram: Intel® XScale™ Core (32K IC, 32K DC) behind a gasket; 16 MEv2 microengines (numbered 1 to 16); Rbuf (64 @ 128B) and Tbuf; Hash (64/48/128); 16KB Scratch; QDR SRAM; RDRAM; PCI (64b) 66 MHz; SPI4 or CSIX interface; CSRs: Fast_wr, UART, Timers, GPIO, BootROM/SlowPort]
Intel® IXP2850

Pin and Software Compatible with Intel® IXP2800

[Block diagram: XScale™ Core (32K IC, 32K DC) behind a gasket; 16 MEv2 microengines (numbered 1 to 16); Rbuf (64 @ 128B) and Tbuf; Hash (64/48/128); 16KB Scratch with Rings; QDR SRAM; RDRAM; PCI (64b) 66 MHz; SPI4 or CSIX interface with stripe/byte align; CSRs: Fast_wr, UART, Timers, GPIO, BootROM/SlowPort; Crypto 1 and Crypto 2]

Notes:

Crypto units sit right on the internal bus for easy, efficient, low latency communication to internal resources.
10Gbps SONET line card

[Diagram: a 10GbE / OC-192c line interface (CDR, DEMUX) connects over an SPI I/F to an Intel IXP2800 Ingress Processor and an Intel IXP2800 Egress Processor, which connect over a CSIX I/F, with fabric flow control, to the Fabric Interface Chip (FIC). Link labels: 10 Gbps and 15 Gbps. Each processor has RDR DRAM packet memory and QDR SRAM for queues and tables. An optional TCAM and a Control Plane Processor on PCI 64/66 are also shown. Line side: 10 GbE WAN / PPP / ATM / OTN / SONET / SDH.]

Ingress Processor:
- SAR’ing
- Classification
- Metering
- Policing
- Initial Congestion Management

Egress Processor:
- Traffic Shaping
- Flexible Choices: DiffServ, TM 4.1, …

Notes:

IXP2800 is used in a half duplex 10 Gbps system configuration with a CSIX switch fabric behind the network processors. The ingress IXP2800 network processor and the egress IXP2800 network processor are the exact same silicon with different software running on them. Typical features/services are highlighted on the right side of the slide.

OPG is developing the Calypso solution and will make it available prior to IXP2800 sampling. For more information on Calypso please contact Ed Pullin (PME).
10-Port 1Gbps Ethernet line card

[Diagram: a 10x1GbE line interface connects over an SPI I/F to an Intel IXP2800 Ingress Processor and an Intel IXP2800 Egress Processor, which connect over a CSIX I/F, with fabric flow control, to the Fabric Interface Chip (FIC). Link labels: 10 Gbps and 15 Gbps. Each processor has RDR DRAM packet memory and QDR SRAM for queues and tables. An optional TCAM and a Control Plane Processor on PCI 64/66 are also shown. Line side: 10 x 1 GbE LAN.]

Ingress Processor:
- SAR’ing
- Classification
- Metering
- Policing
- Initial Congestion Management

Egress Processor:
- Traffic Shaping
- Flexible Choices: DiffServ, TM 4.1, …

Notes:

Same as the previous slide, with the difference being the substitution of Ben Nevis for Calypso. This configuration is excellent for Metropolitan Area Network aggregation switches/routers.

Ben Nevis is being developed by Intel’s OPG. For more details regarding Ben Nevis please see Ron Thornburg.
10Gbps Ethernet to SONET card

[Diagram: an Ethernet interface (10GbE or 10x1Gb) facing server or disk farms and a SONET interface (OC-192 or 4xOC48) facing the metro or WAN network each connect over an SPI I/F, with flow control, to an Intel IXP2800 Ingress Processor and an Intel IXP2800 Egress Processor. Link labels: 10 Gbps. Each processor has RDR DRAM packet memory and QDR SRAM for queues and tables. An optional TCAM and a Control Plane Processor on PCI 64/66 are also shown.]

Notes:

A POS framer has been substituted for the CSIX switch fabric. This configuration is ideal for edge applications connecting Ethernet and SONET networks.