Presentation transcript:

ARM920T Processor
This training module provides an introduction to the ARM920T processor embedded in the AT91RM9200 microcontroller. We’ll identify the basic components of the processor architecture.

ARM920T Processor
The ARM920T is a member of the ARM9TDMI family of general-purpose 32-bit microprocessors from ARM (Advanced RISC Machines). It includes the ARM9TDMI core plus caches and an MMU.
The ARM9TDMI core is a Harvard load/store architecture implementing a five-stage pipeline. The Harvard arrangement increases the available memory bandwidth, because instruction and data memory can be accessed simultaneously. The ARM9TDMI core within the ARM920T executes both the 32-bit ARM and the 16-bit Thumb instruction sets.

ARM920T Processor
[Figure: pipeline comparison. ARM7TDMI, three stages: FETCH (instruction fetch), DECODE (Thumb-to-ARM decompress, ARM decode, register select), EXECUTE (register read, shift, ALU, register write). ARM9TDMI, five stages: FETCH (instruction fetch), DECODE (ARM or Thumb instruction decode, register decode, register read), EXECUTE (shift and ALU), MEMORY (memory access), WRITE (register write).]
The ARM9TDMI processor is a Harvard architecture device implementing a five-stage pipeline consisting of fetch, decode, execute, memory and write stages. A Harvard architecture has separate data and instruction buses, allowing transfers to be performed simultaneously on both buses. A von Neumann architecture (ARM7TDMI) has a single bus for both data transfers and instruction fetches, so the two cannot be performed at the same time.

ARM920T Processor
A cached processor for platform OS applications: 16KB instruction and data caches; an ARMv4 MMU for PalmOS, EPOC, Linux and Windows CE; support for a coprocessor and ETM.
[Figure: block diagram showing the ARM9TDMI core, 16K I cache, 16K D cache, MMU, write buffer, control logic and BIU, and the coprocessor, ETM9 and AMBA ASB interfaces.]
The ARM920T cached processor is a member of the ARM9 Thumb family of high-performance 32-bit processors. It provides a complete high-performance CPU subsystem, including the ARM9TDMI CPU, 16KB instruction and data caches, instruction and data Memory Management Units, a write buffer and an Embedded Trace Macrocell interface.

ARM920T MMU
What is an MMU? The Memory Management Unit controls memory access permissions and translates virtual addresses into physical addresses. It consists of a Translation Look-aside Buffer (TLB), which is a cache of recently used page translations; hardware for page table walks, which updates the TLB; and access control logic. If the MMU is disabled, the external address bus outputs virtual addresses directly.
The MMU translates virtual addresses generated by the CPU core into physical addresses to access external memory. It also derives and checks the access permissions, using the TLB. The MMU table-walking hardware adds entries to the TLB. The translation information, which comprises both the address translation data and the access permission data, resides in a translation table located in physical memory.

ARM920T MMU: Virtual to Physical Address Mapping
[Figure: virtual-to-physical address mapping. Labels: Virtual Memory (Process A, Process B, Process C, Process D, RAM Manager); MMU translation and checking mechanism (Translation Tables, I TLB, D TLB, Protection & Aborts); Physical Memory (RAM, VRAM, ROM).]
When the ARM core generates a memory access, the MMU first looks up the virtual address of the access in the TLB. If the TLB does not contain an entry for the virtual address, the translation table walk hardware is invoked to retrieve the translation and access permission information from the translation table held in physical memory. Once retrieved, the information is placed in the TLB.

ARM920T MMU
The Translation Look-aside Buffer acts as a cache of recent virtual address (VA) to physical address (PA) translations. It provides translation and access permission information for most memory accesses. On a TLB miss, the translation table walking hardware retrieves the information from the translation table in physical memory, and the TLB is updated. If the TLB is full, an existing entry is overwritten using a cyclic scheme.
The translation tables reside in physical memory. The level 1 table is a list of 4096 translations, indexed by bits 31:20 of the virtual address. Each entry contains either a pointer to a 1MB section of physical memory along with attribute information, or a pointer to the base address of another table, which contains pointers to smaller pages of physical memory.
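The level 1 indexing described above is plain bit arithmetic, so a small model may help. The following Python sketch is illustrative only: the table contents and the example addresses are hypothetical, and real level 1 descriptors also carry attribute and permission bits that are left out here.

```python
SECTION_SHIFT = 20      # a 1MB section leaves 20 offset bits
L1_ENTRIES = 4096       # 4096 entries x 1MB cover the 4GB address space


def l1_index(va: int) -> int:
    """Return bits 31:20 of a 32-bit virtual address."""
    return (va >> SECTION_SHIFT) & (L1_ENTRIES - 1)


def translate_section(va: int, section_base: int) -> int:
    """Combine a 1MB-aligned physical base with the offset inside the section."""
    offset_mask = (1 << SECTION_SHIFT) - 1
    return (section_base & ~offset_mask) | (va & offset_mask)


# Hypothetical mapping: virtual megabyte 0xC00 -> physical base 0x20000000.
l1_table = {0xC00: 0x20000000}

va = 0xC0012345
pa = translate_section(va, l1_table[l1_index(va)])
print(hex(l1_index(va)), hex(pa))
```

The dictionary stands in for the 4096-entry table in physical memory; only the index and offset arithmetic is meant to mirror the text.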

ARM920T MMU: Translation Process
The translation is performed by hardware and is transparent to the user. The translation tables themselves are created by software.
[Figure: flowchart. Virtual address -> check if TLB contains virtual address. If yes: get physical address. If no: do translation table walk, get physical address, update TLB.]
The process of doing a full translation table lookup is known as a translation table walk. It is performed automatically by hardware and has a significant execution time cost. To reduce the average cost of a memory access, the results of translation table walks are cached in one or more structures known as translation look-aside buffers.
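The hit, miss and update flow can be sketched as a toy software model. This is an assumption-laden illustration, not the hardware's implementation: the class name, the dictionary used as a page table and the walk counter are all hypothetical, and the cyclic overwrite simply follows the scheme mentioned for a full TLB.

```python
class Tlb:
    """Toy TLB: caches page translations and overwrites entries cyclically."""

    def __init__(self, entries: int = 64):
        self.entries = entries
        self.slots = [None] * entries   # each slot holds (virtual page, physical page)
        self.next_victim = 0            # cyclic replacement pointer
        self.walks = 0                  # number of simulated table walks

    def lookup(self, vpage: int, page_table: dict) -> int:
        # Step 1: check whether the TLB already holds this virtual page.
        for slot in self.slots:
            if slot is not None and slot[0] == vpage:
                return slot[1]          # hit: no table walk needed
        # Step 2: miss, so "walk" the table held in memory.
        self.walks += 1
        ppage = page_table[vpage]
        # Step 3: update the TLB, overwriting entries in cyclic order.
        self.slots[self.next_victim] = (vpage, ppage)
        self.next_victim = (self.next_victim + 1) % self.entries
        return ppage


# Tiny two-entry TLB so that the overwrite is easy to observe.
tlb = Tlb(entries=2)
table = {1: 10, 2: 20, 3: 30}
for page in (1, 1, 2, 3, 1):
    tlb.lookup(page, table)
print(tlb.walks)
```

Repeated accesses to the same page cost one walk, which is the point of caching the walk results.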

ARM920T MMU
Two Translation Look-aside Buffers (TLBs): a 64-entry instruction TLB and a 64-entry data TLB. Two-level hardware table walking provides address translation, and access control logic checks permissions. The highly flexible mapping scheme supports 1MB sections (with permissions), 64KB large pages (permissions for each subpage of a page), 4KB small pages (permissions for each subpage of a page) and 1KB tiny pages (with permissions).
The ARM920T MMU implements two translation look-aside buffers of 64 entries each. The MMU is controlled from a single set of two-level page tables stored in main memory, providing a single address translation and protection scheme. The MMU supports memory accesses based on sections or pages: sections are 1MB blocks of memory, and three page sizes are supported: tiny pages (1KB), small pages (4KB) and large pages (64KB).
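To make the four mapping sizes concrete, the sketch below splits a virtual address into the part that is translated and the offset that passes through unchanged. The function and dictionary names are hypothetical, and the example address is arbitrary.

```python
# Mapping sizes listed in the text, in bytes.
MAPPING_SIZES = {
    "section": 1 << 20,      # 1MB
    "large page": 64 << 10,  # 64KB
    "small page": 4 << 10,   # 4KB
    "tiny page": 1 << 10,    # 1KB
}


def split_address(va: int, size: int) -> tuple:
    """Return (translated upper bits, untranslated offset) for a power-of-two size."""
    offset_bits = size.bit_length() - 1
    return va >> offset_bits, va & (size - 1)


va = 0xC0012345
for name, size in MAPPING_SIZES.items():
    base, offset = split_address(va, size)
    print(f"{name:10s} base={base:#x} offset={offset:#x}")
```

Larger mappings translate fewer address bits, so a single TLB entry covers more memory.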

ARM920T MMU: MMU usage in Linux
In Linux, the MMU allows memory mapping (physical to virtual address) and memory allocation. It allows several processes to run safely, each in its own protected memory area, and it provides protection against direct access to a peripheral’s physical address.

ARM920T Caches
What is a cache? A small, fast memory, local to the processor, that holds copies of recently accessed memory locations. It relies on memory re-use and only improves performance when the main memory is slow or narrow. It reduces bus bandwidth requirements and power consumption.
[Figure: the CPU connects to the cache, and the cache connects through the bus interface to external memory, with address and data paths.]
A cache is a block of high-speed memory locations whose addresses can be changed, and whose purpose is to increase the average speed of a memory access.

ARM920T Caches
16KB instruction and data caches, each with 512 lines of 8 words arranged as a 64-way set-associative cache. The MMU must be enabled to enable the Dcache.
[Figure: cache organization. Eight segments, numbered 1 to 8, each holding 64 lines; each line stores a TAG (MVA[31:8]) and 8 words of data.]
The ARM920T caches have 512 lines of 32 bytes (8 words) arranged as a 64-way set-associative cache. The caches are organized as eight segments, each containing 64 lines, and each line containing eight words. Bits [31:8] of the modified virtual address (MVA) of each cache line are called the TAG. The MVA TAG is stored in the cache along with the 8 words of data.
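The geometry above can be checked with a few lines of arithmetic. In this sketch the TAG field (bits 31:8) comes from the text, while the segment field (bits 7:5) and word field (bits 4:2) are inferred from the stated eight segments and eight words per line, so treat those two as assumptions. The function names are hypothetical.

```python
LINE_BYTES = 32           # 8 words of 4 bytes
SEGMENTS = 8
LINES_PER_SEGMENT = 64    # 64 ways


def cache_size_bytes() -> int:
    """Total capacity implied by the segment, line and word counts."""
    return SEGMENTS * LINES_PER_SEGMENT * LINE_BYTES


def split_mva(mva: int) -> dict:
    """Split a 32-bit modified virtual address into cache lookup fields."""
    return {
        "tag": mva >> 8,             # MVA[31:8], stored with the line
        "segment": (mva >> 5) & 0x7, # assumed MVA[7:5]: one of 8 segments
        "word": (mva >> 2) & 0x7,    # assumed MVA[4:2]: one of 8 words in the line
        "byte": mva & 0x3,           # byte within the word
    }


print(cache_size_bytes())
print(split_mva(0x000012F4))
```

Eight segments of 64 lines of 32 bytes give 16384 bytes, which agrees with the 16KB figure.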

ARM920T Caches: Cache hit and miss, replacement algorithm
On a cache hit, if the region is cacheable, data is returned from the cache. On a cache miss, an eight-word linefill is performed, replacing another entry. The replacement algorithm is random by default; round-robin, in which the entries of each segment are replaced sequentially, is more efficient. The caches operate at processor speed: the maximum processor speed is 180 MHz and the maximum AMBA ASB speed is 60 MHz.
A memory access that can be processed at high speed because the data it addresses is already in the cache is known as a cache hit. Other memory accesses are called cache misses. Random replacement is selected at reset. Round-robin replacement means that entries are replaced sequentially in each segment.
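The difference between the two replacement policies can be illustrated with a small model of victim selection inside one segment. This is only a sketch: the class name and parameters are hypothetical, and the hardware's actual random source and counter start value are not described in the text.

```python
import random


class SegmentReplacement:
    """Chooses which of a segment's lines to replace on a linefill."""

    def __init__(self, lines: int = 64, round_robin: bool = False, seed=None):
        self.lines = lines
        self.round_robin = round_robin
        self.counter = 0                  # used only in round-robin mode
        self.rng = random.Random(seed)    # stand-in for the hardware random source

    def next_victim(self) -> int:
        if self.round_robin:
            victim = self.counter
            self.counter = (self.counter + 1) % self.lines
            return victim
        return self.rng.randrange(self.lines)


rr = SegmentReplacement(lines=4, round_robin=True)
print([rr.next_victim() for _ in range(6)])

rnd = SegmentReplacement(lines=4)
print([rnd.next_victim() for _ in range(6)])
```

With round-robin selection, a line is only replaced again after every other line in the segment has been replaced, which makes eviction order predictable.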

ARM920T Caches
16-word (64-byte) write buffer. Lockdown features: the instruction and data caches can be locked down independently, with a granularity of 1/64th of the cache. The associated TLB entry must also be locked down to avoid page table walks during accesses to the locked data or instructions. Lockdown provides optimum and predictable execution time.
The ARM920T features a 16-word write buffer, a block of high-speed memory whose purpose is to optimize stores to main memory. Part of the cache may be locked down to avoid eviction, with a granularity of 1/64th of the cache. This might be needed to provide guaranteed real-time performance.
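As a rough sizing aid, the lockdown granularity and write buffer size can be worked out from the figures already given (16KB cache, 32-byte lines, 4-byte words). The helper below is a hypothetical sketch for estimating how many lockdown granules a routine would need; it does not model how lockdown is actually programmed.

```python
CACHE_BYTES = 16 * 1024
LINE_BYTES = 32
WORD_BYTES = 4
LOCKDOWN_FRACTION = 64        # granularity is 1/64th of the cache
WRITE_BUFFER_WORDS = 16


def lockdown_granule_bytes() -> int:
    """Smallest amount of cache that can be locked down."""
    return CACHE_BYTES // LOCKDOWN_FRACTION


def granules_needed(code_bytes: int) -> int:
    """Granules required to hold code_bytes, rounded up."""
    granule = lockdown_granule_bytes()
    return (code_bytes + granule - 1) // granule


print(lockdown_granule_bytes())                  # bytes per granule
print(lockdown_granule_bytes() // LINE_BYTES)    # cache lines per granule
print(WRITE_BUFFER_WORDS * WORD_BYTES)           # write buffer size in bytes
print(granules_needed(1000))                     # granules for a 1000-byte routine
```

Each granule locked down reduces the cache left for normal replacement by the same amount, so the trade-off is predictability against average hit rate.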