Published by Sebastian Bednarek. Modified over 5 years ago.
1
ARM920T Processor
This training module provides an introduction to the ARM920T processor embedded in the AT91RM9200 microcontroller. We’ll identify the basic components of the processor architecture.
2
ARM920T Processor
The ARM920T processor is a member of the ARM9TDMI family of general-purpose microprocessors
  Includes the ARM9TDMI core plus cache and MMU
ARM9TDMI processor:
  Harvard architecture
    Increases available memory bandwidth
    Simultaneous access to instruction and data memory can be achieved
  5-stage pipeline
  32-bit ARM instruction set and 16-bit Thumb instruction set
The ARM920T is a member of the ARM (Advanced RISC Machines) ARM9TDMI family of general-purpose 32-bit microprocessors, which includes the ARM9TDMI core plus cache and MMU. The ARM9TDMI processor architecture is a Harvard load/store architecture implementing a five-stage pipeline. The ARM9TDMI processor within the ARM920T executes both the 32-bit ARM and 16-bit Thumb instruction sets.
3
ARM920T Processor
[Figure: pipeline comparison. ARM7TDMI, 3 stages: FETCH (instruction fetch), DECODE (Thumb to ARM decompress, ARM decode, register select), EXECUTE (register read, shift, ALU, register write). ARM9TDMI, 5 stages: FETCH (instruction fetch), DECODE (ARM or Thumb instruction decode, register decode, register read), EXECUTE (shift + ALU), MEMORY (memory access), WRITE (register write).]
The ARM9TDMI processor is a Harvard architecture device implementing a five-stage pipeline consisting of fetch, decode, execute, memory and write stages. A Harvard architecture has separate data and instruction buses, allowing transfers to be performed simultaneously on both buses. A Von Neumann architecture (ARM7TDMI) has only one bus for both data transfers and instruction fetches, so the two cannot be performed at the same time.
4
ARM920T Processor
Cached Processor for Platform OS Applications
  16K Instruction & Data Caches
  ARMv4 MMU for: PalmOS, EPOC, Linux & WindowsCE
  Includes support for Coprocessor and ETM
[Figure: ARM920T block diagram showing the ARM9TDMI core, 16K I cache, 16K D cache, MMU, write buffer, control logic & BIU, and the coprocessor, ETM9 & AMBA ASB interfaces.]
The ARM920T cached processor is a member of the ARM9 Thumb family of high-performance 32-bit processors. It provides a complete high-performance CPU subsystem including: ARM9TDMI CPU, 16KB instruction and data caches, instruction and data Memory Management Unit, write buffer and Embedded Trace Macrocell interface.
5
ARM920T MMU
What is an MMU?
Memory Management Unit
  Controls memory access permissions
  Translates virtual addresses into physical addresses
MMU consists of
  Translation Look-aside Buffer (TLB)
    Cache of recently used page translations
  Hardware for page table walks
    Updates TLB
  Access control logic
If MMU is disabled
  External address bus will output virtual addresses directly
The MMU translates virtual addresses generated by the CPU core into physical addresses to access external memory. It also derives and checks the access permissions, using a Translation Look-aside Buffer (TLB). The MMU table-walking hardware is used to add entries to the TLB. The translation information, which comprises both the address translation data and the access permission data, resides in a translation table located in physical memory.
6
ARM920T MMU
Virtual to Physical Address Mapping
[Figure: virtual memory (Process A to Process D) passes through the MMU translation and checking mechanism (I TLB, D TLB, translation tables, protection & aborts) and is mapped onto physical memory (RAM, ROM, VRAM).]
When the ARM generates a memory access, the MMU first looks up the virtual address of the access in the TLB. If the TLB does not contain an entry for the virtual address, the translation table walk hardware is invoked to retrieve the translation and access permission information from the translation table held in physical memory. Once retrieved, the information is placed in the TLB.
7
ARM920T MMU
Translation Look-aside Buffer acts as a cache of recent VA to PA translations
  Provides translation and access permission information for most memory accesses
  For TLB misses, the translation table walking hardware retrieves the information from the translation table in physical memory and the TLB is updated
  If the TLB is full, an entry will be overwritten using a cyclic scheme
Translation tables reside in physical memory
  Level 1 table is a list of 4096 translations, indexed by bits 31:20 of the virtual address
  Entries contain a pointer to a 1MB section of physical memory along with attribute information, or …
  A pointer to the base address of another table, containing pointers to smaller pages of physical memory
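The level 1 indexing described above can be sketched in a few lines of Python. This is only an illustration of the bit arithmetic for a section mapping: the function names and the `section_base` parameter are hypothetical, and the real descriptor format (attribute bits, second-level pointers) is not modelled.

```python
SECTION_SHIFT = 20                        # 1MB sections: the low 20 bits are the offset
SECTION_MASK = (1 << SECTION_SHIFT) - 1   # 0x000FFFFF
L1_ENTRIES = 1 << 12                      # 4096 entries, indexed by VA[31:20]


def l1_index(va):
    """Return the level 1 table index, i.e. bits 31:20 of a 32-bit virtual address."""
    return (va >> SECTION_SHIFT) & (L1_ENTRIES - 1)


def translate_section(va, section_base):
    """Combine a 1MB-aligned physical section base (hypothetical input)
    with the offset inside the section, VA[19:0]."""
    base = section_base & ~SECTION_MASK & 0xFFFFFFFF
    return base | (va & SECTION_MASK)


if __name__ == "__main__":
    va = 0xC0123456
    print(hex(l1_index(va)))                        # index into the level 1 table
    print(hex(translate_section(va, 0x20000000)))   # resulting physical address
```

With 12 index bits and 20 offset bits, the 4096 entries together cover the full 4GB virtual address space in 1MB steps.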
8
ARM920T MMU
Translation Process
  Performed by hardware and is transparent to the user
  Translation tables created by software
[Figure: translation flowchart. Start from the virtual address and check if the TLB contains it. Yes: get physical address. No: do translation table walk, get physical address, update TLB.]
The process of doing a full translation table lookup is known as a translation table walk. It is performed automatically by hardware and has a significant execution time cost. To reduce the average cost of a memory access, the results of translation table walks are cached in one or more structures known as translation look-aside buffers.
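The hit/miss flow can be mimicked with a small software model. This is a sketch under simplifying assumptions, not the hardware's behaviour: `TinyTLB` and its fields are hypothetical names, every mapping is treated as a 1MB section, permissions are ignored, a plain dictionary stands in for the translation table in physical memory, and a missing dictionary key stands in for a translation fault.

```python
class TinyTLB:
    """Toy TLB with cyclic (round-robin) replacement and a walk counter."""

    def __init__(self, capacity, page_table, page_shift=20):
        self.capacity = capacity
        self.page_table = page_table        # dict: virtual page number -> physical base
        self.page_shift = page_shift        # 20 bits of offset = 1MB sections
        self.entries = [None] * capacity    # each slot holds (vpn, phys_base)
        self.next_victim = 0                # cyclic replacement pointer
        self.walks = 0                      # number of table walks performed

    def translate(self, va):
        vpn = va >> self.page_shift
        offset = va & ((1 << self.page_shift) - 1)
        for entry in self.entries:
            if entry is not None and entry[0] == vpn:
                return entry[1] | offset    # TLB hit: no table walk needed
        # TLB miss: walk the table, then cache the result in the TLB.
        self.walks += 1
        phys_base = self.page_table[vpn]    # KeyError models a translation fault
        self.entries[self.next_victim] = (vpn, phys_base)
        self.next_victim = (self.next_victim + 1) % self.capacity
        return phys_base | offset


if __name__ == "__main__":
    table = {0x000: 0x20000000, 0x001: 0x20100000}
    tlb = TinyTLB(capacity=2, page_table=table)
    print(hex(tlb.translate(0x00000010)), tlb.walks)  # first access misses
    print(hex(tlb.translate(0x00000020)), tlb.walks)  # same section hits
```

Counting walks makes the cost argument visible: repeated accesses to a section already in the TLB do not increase `walks`.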
9
ARM920T MMU
Two Translation Look-aside Buffers (TLBs)
  64-entry Instruction TLB
  64-entry Data TLB
Two-level hardware table walking:
  Address translation
  Access control logic with permissions
Highly flexible mapping scheme, supports:
  1MB sections (with permissions)
  64KB large pages (permissions for each subpage of a page)
  4KB small pages (permissions for each subpage of a page)
  1KB tiny pages (with permissions)
The ARM920T MMU implements two 64-entry translation look-aside buffers, one for instructions and one for data. The MMU is controlled from a single set of two-level page tables stored in main memory, providing a single address translation and protection scheme. The MMU supports memory accesses based on sections or pages: sections are 1MB blocks of memory, and three different page sizes are supported: tiny pages (1KB), small pages (4KB) and large pages (64KB).
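Each mapping size fixes how many low address bits pass through untranslated as the offset. The sketch below shows only that arithmetic; the names `MAPPING_SIZES`, `offset_bits` and `split_address` are hypothetical, and descriptor formats and subpage permissions are not modelled.

```python
# Mapping sizes listed on the slide, in bytes.
MAPPING_SIZES = {
    "section": 1 << 20,       # 1MB
    "large page": 64 << 10,   # 64KB
    "small page": 4 << 10,    # 4KB
    "tiny page": 1 << 10,     # 1KB
}


def offset_bits(size):
    """Number of low address bits that form the offset for a power-of-two size."""
    return size.bit_length() - 1


def split_address(va, kind):
    """Split a 32-bit virtual address into (aligned base, offset) for a mapping kind."""
    size = MAPPING_SIZES[kind]
    return va & ~(size - 1) & 0xFFFFFFFF, va & (size - 1)


if __name__ == "__main__":
    for kind, size in MAPPING_SIZES.items():
        base, offset = split_address(0x12345678, kind)
        print(f"{kind:10s} offset bits={offset_bits(size):2d} "
              f"base={base:#010x} offset={offset:#x}")
```

The smaller the page, the more address bits have to be translated, which is why the finer-grained mappings need the second table level.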
10
ARM920T MMU
MMU usage in Linux
  Allows memory mapping (physical to virtual address)
  Allows memory allocation
  Allows several processes to run safely, each in its own protected memory area
  Provides protection against direct access to a peripheral’s physical address
11
ARM920T Caches
What is a cache?
  Small fast memory, local to the processor
  Holds copies of recently accessed memory locations
  Relies on memory re-use
  Only improves performance for slow memory or narrow memory
  Reduces bus bandwidth requirements
  Reduces power consumption
A cache is a block of high-speed memory locations whose addresses can be changed, and whose purpose is to increase the average speed of a memory access.
[Figure: CPU, cache, bus interface and external memory connected by address and data buses.]
12
ARM920T Caches
16KB instruction and data caches
  512 lines of 8 words arranged as a 64-way set-associative cache
  MMU must be enabled to enable the Dcache
[Figure: cache organization. Eight segments (1 to 8), each holding 64 lines; every line stores a TAG = MVA[31:8] together with 8 words of data.]
The ARM920T caches have 512 lines of 32 bytes (8 words) arranged as a 64-way set-associative cache. The caches are organized as eight segments, each containing 64 lines, and each line containing eight words. Bits [31:8] of the MVA of each cache line are called the TAG. The MVA TAG is stored in the cache, along with the 8 words of data.
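The geometry above determines how an MVA is split on a cache lookup. The sketch below takes the tag field, MVA[31:8], from the slide; the segment, word and byte fields are an assumption derived from the stated geometry (8 segments, 8 words per line, 4 bytes per word give 3 + 3 + 2 = 8 low bits). The names are hypothetical.

```python
WORD_BYTES = 4
LINE_WORDS = 8
SEGMENTS = 8
LINES_PER_SEGMENT = 64   # the 64 "ways" searched within a segment

# 8 segments x 64 lines x 8 words x 4 bytes = 16KB
CACHE_BYTES = SEGMENTS * LINES_PER_SEGMENT * LINE_WORDS * WORD_BYTES


def split_mva(mva):
    """Split a 32-bit modified virtual address into cache lookup fields.
    Field positions below bit 8 are inferred from the geometry (assumption)."""
    return {
        "tag": mva >> 8,              # MVA[31:8], compared against the stored TAGs
        "segment": (mva >> 5) & 0x7,  # MVA[7:5], selects one of 8 segments
        "word": (mva >> 2) & 0x7,     # MVA[4:2], selects one of 8 words in the line
        "byte": mva & 0x3,            # MVA[1:0], byte within the word
    }


if __name__ == "__main__":
    print(CACHE_BYTES)
    print(split_mva(0x000012A4))
```

Because only three bits choose the segment, any given address can live in any of the 64 lines of that one segment, which is what makes the cache 64-way set-associative.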
13
ARM920T Caches
Cache hit and miss
  Cache hit: if the region is cacheable, data is returned from the cache
  Cache miss: an eight-word linefill is performed, replacing another entry
Replacement algorithm
  Random by default
  Round-robin: entries of each segment are replaced sequentially. More efficient.
Caches operate at processor speed
  Max processor speed is 180 MHz
  Max AMBA ASB speed is 60 MHz
A memory access that can be processed at high speed because the data it addresses is already in the cache is known as a cache hit. Other memory accesses are called cache misses. Random replacement is selected at reset. Round-robin replacement means that entries are replaced sequentially in each segment.
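The two replacement policies can be compared with a toy model of a single cache segment. This is a simplified sketch, not the hardware algorithm: the `Segment` class is hypothetical, it tracks tags only (no data, no dirty state), and it does not prefer invalid lines over valid ones when choosing a victim.

```python
import random


class Segment:
    """Toy model of one cache segment holding only line tags."""

    def __init__(self, ways=64, policy="random", seed=0):
        self.tags = [None] * ways
        self.policy = policy              # "random" or "round-robin"
        self.victim = 0                   # next way to replace in round-robin mode
        self.rng = random.Random(seed)    # seeded so runs are repeatable

    def access(self, tag):
        """Return True on a hit; on a miss, model a linefill and return False."""
        if tag in self.tags:
            return True
        if self.policy == "round-robin":
            way = self.victim
            self.victim = (self.victim + 1) % len(self.tags)
        else:
            way = self.rng.randrange(len(self.tags))
        self.tags[way] = tag              # the linefill replaces the chosen entry
        return False


if __name__ == "__main__":
    seg = Segment(ways=4, policy="round-robin")
    for tag in [1, 2, 3, 4, 1, 5, 1]:
        print(tag, "hit" if seg.access(tag) else "miss")
    print(seg.tags)
```

With round-robin the victim is predictable, so the same access trace always evicts the same lines; with random replacement the outcome depends on the generator.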
14
ARM920T Caches
16-word (64-byte) write buffer
Lockdown features
  Lock down instruction and data caches independently, with a granularity of 1/64th of the cache
  Must also lock down the associated TLB entry to avoid page table walks during accesses to the locked data or instructions
  Provides optimum and predictable execution time
The ARM920T features a 16-word write buffer, a block of high-speed memory whose purpose is to optimize stores to main memory. Part of the cache may be locked down to avoid eviction, with a granularity of 1/64th of the cache. This might be needed to provide guaranteed real-time performance.
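The lockdown granularity can be turned into concrete sizes with simple arithmetic on the figures given in this deck (16KB cache, 32-byte lines, 1/64th granularity). The constant and function names below are hypothetical; the sketch only does the division and does not model how lockdown is programmed.

```python
CACHE_BYTES = 16 * 1024       # one 16KB cache (instruction or data)
LINE_BYTES = 32               # 8 words of 4 bytes
LOCKDOWN_FRACTION = 64        # granularity of 1/64th of the cache

UNIT_BYTES = CACHE_BYTES // LOCKDOWN_FRACTION   # bytes per lockdown unit
UNIT_LINES = UNIT_BYTES // LINE_BYTES           # cache lines per lockdown unit


def units_needed(code_bytes):
    """Smallest number of lockdown units that can hold code_bytes (ceiling division)."""
    return -(-code_bytes // UNIT_BYTES)


if __name__ == "__main__":
    print(UNIT_BYTES, UNIT_LINES)
    print(units_needed(1000))   # units required for a 1000-byte routine
```

A routine that must stay resident therefore costs cache capacity in steps of one unit, so rounding up matters when budgeting what is left for normal caching.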