Simon Jackson James Sleeman Pete Hemery
Simon Jackson
Physical memory is divided into a number of “Zones”:
ZONE_DMA: 0 – 16MB
ZONE_NORMAL: 16MB – 896MB
ZONE_HIGHMEM: 896MB – 4GB
Most kernel operations may only take place in ZONE_NORMAL.
Memory is organised into pages; x86 has 4KB pages.
include/linux/mm_types.h
Each page has a struct page associated with it. The kernel maintains one or more arrays of these that track all of the physical memory on the system. Functions and macros are defined for translating between struct page pointers and virtual addresses:

struct page *virt_to_page(void *kaddr);
struct page *pfn_to_page(int pfn);
void *page_address(struct page *page);
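A minimal sketch of the round trip between these translations, assuming a page allocated from low memory with __get_free_page() (the demo function name is illustrative):

#include <linux/mm.h>
#include <linux/gfp.h>

static void page_round_trip_demo(void)
{
	unsigned long kaddr = __get_free_page(GFP_KERNEL);
	struct page *pg;

	if (!kaddr)
		return;
	pg = virt_to_page((void *)kaddr);	/* virtual address -> struct page */
	/* page_address() recovers the virtual address; for lowmem pages the
	 * two are identical */
	WARN_ON(page_address(pg) != (void *)kaddr);
	free_page(kaddr);
}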
0 – 16MB
Used for Direct Memory Access. Legacy ISA devices can only address the first 16MB of memory, so the kernel tries to dedicate this area to them.
16MB – 896MB, AKA Low Memory
The normally addressable region for the kernel. Kernel addresses that map it are called logical addresses and have a constant offset from their physical addresses.
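On x86 that constant offset is PAGE_OFFSET, so the translation is plain arithmetic; a sketch of what the kernel's __pa()/__va() helpers boil down to (the function name is illustrative):

#include <asm/page.h>

/* Logical addresses in ZONE_NORMAL sit at a fixed offset above their
 * physical addresses, so translation is a single subtraction. */
static unsigned long logical_to_phys(const void *kaddr)
{
	return (unsigned long)kaddr - PAGE_OFFSET;	/* what __pa() does */
}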
896MB – 4GB
The kernel can only access it by mapping pages into ZONE_NORMAL, which results in a virtual address, not a logical one. kmap() first checks whether the page is already in low memory. kmap() uses a page table to track mapped memory, called pkmap_page_table, which is located at PKMAP_BASE and set up during system initialisation.
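A short sketch of the kmap()/kunmap() pairing, assuming a driver that needs to zero a page that may live in high memory (the helper name is illustrative):

#include <linux/highmem.h>
#include <linux/string.h>

/* kmap() may sleep; for lowmem pages it simply returns the existing
 * logical address instead of consuming a pkmap slot. */
static void zero_any_page(struct page *page)
{
	char *vaddr = kmap(page);

	memset(vaddr, 0, PAGE_SIZE);
	kunmap(page);		/* release the temporary mapping */
}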
Virtual addresses are mapped to physical memory by page tables, and each process has its own. Once the MMU is enabled, virtual memory applies to all programs, including the kernel. The kernel doesn't necessarily use that much physical memory; it just has that address space available to map physical memory into.
Kernel space is constantly present and maps the same physical memory in all processes – it is resident. It is marked in the page tables as exclusive to privileged code, i.e. kernel only. The mapping for the user-land VM changes whenever a process switch happens.
Used for devices that cannot access the full address range, such as 32-bit devices on 64-bit systems. A bounce buffer sits in memory low enough for the device to address and serves as the buffer page for DMA to and from the device; the data is then copied to or from the desired page in high memory. The copy through the bounce buffer happens differently depending on whether it is a read or a write buffer, and the buffer can be reclaimed once the I/O is done.
In 2.4, the high memory manager was the only subsystem that maintained emergency pools of pages. In 2.6, memory pools are implemented as a generic concept for cases where a minimum amount of memory is needed even when memory is low. Two emergency pools are maintained for the express use of bounce buffers.
The kernel maintains a three-level, architecture-independent page table to handle 64-bit addresses. Architectures that manage their MMU differently emulate three-level page tables. Each process has a pointer to its own Page Global Directory (PGD), which is a physical page. Each active PGD entry points to a page containing an array of Page Middle Directory (PMD) entries, and each PMD entry points to a page of Page Table Entries (PTEs), which in turn point at pages of actual data.
Linear addresses may be broken up into parts to yield offsets within these three page-table levels and an offset within the actual page. Macro definitions on x86 extract each of these fields; a sketch of how they combine follows.
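A hedged sketch of walking the three levels for one linear address, assuming the classic 2.6-era three-level layout described above (later kernels insert PUD/P4D levels between PGD and PMD):

#include <linux/mm.h>
#include <asm/pgtable.h>

/* Locking (mm->page_table_lock) is omitted for brevity; the caller must
 * also pte_unmap() the returned entry. */
static pte_t *walk_three_levels(struct mm_struct *mm, unsigned long addr)
{
	pgd_t *pgd = pgd_offset(mm, addr);	/* offset into the PGD */
	pmd_t *pmd;

	if (pgd_none(*pgd))
		return NULL;
	pmd = pmd_offset(pgd, addr);		/* offset into the PMD page */
	if (pmd_none(*pmd))
		return NULL;
	return pte_offset_map(pmd, addr);	/* offset into the PTE page */
}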
James Sleeman
Slab allocation
Buddy allocation
Mempools
Lookaside buffers
The main motivation for slab allocation is that the cost of initialising and freeing kernel data objects can outweigh the cost of allocating them. With slab allocation, memory chunks suitable to fit data objects of a certain type or size are preallocated, as in the sketch below.
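A minimal sketch of a dedicated slab cache, using the current five-argument kmem_cache_create(); the object type and cache name are illustrative:

#include <linux/slab.h>
#include <linux/errno.h>

struct my_obj {
	int id;
	char payload[120];
};

static struct kmem_cache *my_cache;

static int slab_demo(void)
{
	struct my_obj *obj;

	/* Preallocate slabs sized for struct my_obj */
	my_cache = kmem_cache_create("my_obj_cache", sizeof(struct my_obj),
				     0, 0, NULL);
	if (!my_cache)
		return -ENOMEM;
	obj = kmem_cache_alloc(my_cache, GFP_KERNEL);
	if (obj)
		kmem_cache_free(my_cache, obj);	/* returns to the slab, not the page allocator */
	kmem_cache_destroy(my_cache);
	return 0;
}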
Buddy allocation is a fast memory allocation technique that divides memory into power-of-2 partitions and attempts to allocate memory with a best-fit approach. When memory is freed by the user, the buddy block is checked to see whether its contiguous neighbour has also been freed; if so, the blocks are combined to minimise fragmentation.
A memory pool has the type mempool_t, defined in <linux/mempool.h>; a sketch follows.
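A minimal sketch of a mempool layered over a slab cache (reusing the illustrative my_cache from the slab example; the reserve size is arbitrary):

#include <linux/mempool.h>
#include <linux/slab.h>
#include <linux/errno.h>

#define MY_MIN_RESERVE 4	/* objects guaranteed even under memory pressure */

static int mempool_demo(struct kmem_cache *cache)
{
	mempool_t *pool = mempool_create_slab_pool(MY_MIN_RESERVE, cache);
	void *obj;

	if (!pool)
		return -ENOMEM;
	obj = mempool_alloc(pool, GFP_KERNEL);	/* falls back to the reserve */
	if (obj)
		mempool_free(obj, pool);
	mempool_destroy(pool);
	return 0;
}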
kmalloc() is a memory allocation function that returns contiguous memory from kernel space, and kfree() releases it; both are declared in <linux/slab.h>.

void *kmalloc(size_t size, int flags);
void kfree(const void *ptr);

buf = kmalloc(BUF_SIZE, GFP_DMA | GFP_KERNEL);
kfree(buf);
On-stack allocation versus kmalloc():

#define BUF_LEN 2048

void function(void)
{
	char buf[BUF_LEN];	/* lives on the small kernel stack */
	/* Do stuff with buf */
}

#define BUF_LEN 2048

void function(void)
{
	char *buf;

	buf = kmalloc(BUF_LEN, GFP_KERNEL);
	if (!buf)
		return;		/* error! */
	/* Do stuff with buf */
	kfree(buf);
}
All flags are listed in include/linux/gfp.h. Type flags:
GFP_ATOMIC
GFP_NOIO
GFP_NOFS
GFP_KERNEL
GFP_USER
GFP_HIGHUSER
GFP_DMA
unsigned long get_zeroed_page(int flags);
unsigned long __get_free_page(int flags);
unsigned long __get_free_pages(int flags, unsigned long order);
unsigned long __get_dma_pages(int flags, unsigned long order);
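A short usage sketch: order is log2 of the page count, and the same order must be passed back when freeing (the function name is illustrative):

#include <linux/gfp.h>

static void pages_demo(void)
{
	/* order 3 -> 2^3 = 8 physically contiguous pages from the buddy allocator */
	unsigned long buf = __get_free_pages(GFP_KERNEL, 3);

	if (!buf)
		return;
	/* ... use the 8-page buffer ... */
	free_pages(buf, 3);	/* order must match the allocation */
}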
#include <linux/percpu.h>

DEFINE_PER_CPU(type, name);
get_cpu_var(sockets_in_use)++;
put_cpu_var(sockets_in_use);
per_cpu(variable, int cpu_id);
cpu = get_cpu();
ptr = per_cpu_ptr(per_cpu_var, cpu);
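Tying those calls together, a minimal sketch of a per-CPU event counter (the variable and function names are illustrative):

#include <linux/percpu.h>

DEFINE_PER_CPU(long, my_events);

static void count_event(void)
{
	/* get_cpu_var() disables preemption, so the increment cannot be
	 * migrated to another CPU mid-update */
	get_cpu_var(my_events)++;
	put_cpu_var(my_events);
}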
sudo cat /proc/slabinfo | awk '{printf "%5d MB %s\n", $3*$4/(1024*1024), $1}' | sort -n

0 MB vm_area_struct
1 MB dentry
2 MB ext4_inode_cache
2 MB inode_cache
8 MB buffer_head
Some of the causes of OOM:
The kernel is really out of memory: it has used more memory than the system has in RAM and swap.
Kernel memory leaks.
Deadlocks, in a sense: writing data to disk may itself require memory allocation.
The OOM killer lives in linux/mm/oom_kill.c:
vm_enough_memory();
out_of_memory();
Thomas Habets had an unfortunate experience recently. His Linux system ran out of memory, and the dreaded "OOM killer" was loosed upon the system's unsuspecting processes. One of its victims turned out to be his screen locking program.
DMA is a feature inside modern microcontrollers that allows other hardware subsystems to access system memory independently of the CPU. Without DMA, large numbers of CPU cycles are taken up, and the CPU can be tied up with PIO for the entire duration of the read or write.
Useful websites:
kmalloc and more: lwn.net/images/pdf/LDD3/ch08.pdf
slab-allocator/
Pete Hemery
How does the CPU know when a device is ready?
Programmed I/O (polling): the simplest method, but inefficient.
Interrupt-driven I/O: an Interrupt Service Routine in the device driver.
Direct Memory Access: bypasses the CPU to get to system memory.
DMA deals with physical addresses, so:
Programming a DMA transfer requires retrieving a physical address at some point (virtual addresses are usually used elsewhere).
The memory accessed by the DMA controller must be physically contiguous.
The CPU can access memory through a data cache, and using the cache can be more efficient (accesses to the cache are faster than to the bus).
But the DMA controller does not access the CPU cache, so care needs to be taken for cache coherency (cache content vs. memory content): either flush or invalidate the cache lines corresponding to the buffer accessed by the DMA controller and the processor at strategic times.
DMA needs contiguous memory in physical space. You can use any memory allocated by kmalloc() (up to 128KB) or __get_free_pages() (up to 8MB), as well as block I/O and networking buffers, which are designed to support DMA. You cannot use vmalloc() memory (you would have to set up DMA on each individual physical page). A coherent-buffer sketch follows.
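A minimal sketch of a coherent (consistent) DMA buffer, where the kernel hands back both the CPU virtual address and the bus address to program into the device; dev and the size are illustrative:

#include <linux/dma-mapping.h>

static void *coherent_demo(struct device *dev, dma_addr_t *bus_addr)
{
	/* Coherent memory needs no explicit cache maintenance */
	void *vaddr = dma_alloc_coherent(dev, 4096, bus_addr, GFP_KERNEL);

	/* later: dma_free_coherent(dev, 4096, vaddr, *bus_addr); */
	return vaddr;
}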
Memory caching can interfere with DMA:
Before DMA to the device, the driver needs to make sure that all writes to the DMA buffer are committed to memory.
After DMA from the device, before the driver reads from the DMA buffer, it needs to make sure that the relevant cache lines are flushed/invalidated.
Bidirectional DMA needs the caches flushed both before and after the transfer.
The streaming mapping API takes care of this, as sketched below.
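A sketch of a streaming mapping, where dma_map_single() performs the cache maintenance appropriate for the transfer direction (dev, buf and len are illustrative):

#include <linux/dma-mapping.h>
#include <linux/errno.h>

static int stream_demo(struct device *dev, void *buf, size_t len)
{
	/* Flushes/invalidates cache lines for a device-to-memory transfer */
	dma_addr_t bus = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);

	if (dma_mapping_error(dev, bus))
		return -EIO;
	/* ... program the device with 'bus' and wait for completion ... */
	dma_unmap_single(dev, bus, len, DMA_FROM_DEVICE);
	return 0;	/* the CPU may now safely read buf */
}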
The ARM Cortex™-A8 processor is based on the ARMv7 architecture and can scale in speed from 600MHz to greater than 1GHz. The Cortex-A8 processor can meet the requirements of power-optimized mobile devices needing operation in less than 300mW, and of performance-optimized consumer applications requiring 2000 Dhrystone MIPS.
Cortex-A8 Netbook
Arbitration “The process by which the parties to a dispute submit their differences to the judgment of an impartial person or group appointed by mutual consent or statutory provision.”
Sitara™ ARM® Microprocessors
Welcome to the Sitara™ ARM® Microprocessors section of the TI E2E Support Community. Ask questions, share knowledge, explore ideas, and help solve problems with fellow engineers. This group contains forums for discussion of the Cortex-A8 based AM35x, AM37x and AM335x processors and the ARM9-based AM1x processors.
I am currently working on getting WLAN up and running. It seems that the SDIO driver is broken for libertas_sdio:

libertas_sdio: probe of mmc1:0001:1 failed with error -16

A second problem is the USB host interface; it seems to be completely broken. Hotplugging a USB mouse:

[ ] drivers/hid/usbhid/hid-core.c: can't reset device, ehci-omap.0-2.3/input0, status -71

Adding a webcam:

[ ] Linux video capture interface: v2.00
[ ] gspca: main v2.9.0 registered
[ ] gspca: probing 046d:08da
[ ] twl_rtc twl_rtc: rtc core: registered twl_rtc as rtc0
[ ] lib80211: common routines for IEEE drivers
[ ] lib80211_crypt: registered algorithm 'NULL'
[ ] ads7846 spi1.0: touchscreen, irq 274
[ ] input: ADS7846 Touchscreen as /devices/platform/omap2_mcspi.1/spi1.0/input/input1
[ ] cfg80211: Calling CRDA to update world regulatory domain
[ ] libertas_sdio: Libertas SDIO driver
[ ] libertas_sdio: Copyright Pierre Ossman
[ ] zc3xx: probe 2wr ov vga 0x0000
[ ] zc3xx: probe sensor -> 0011
[ ] zc3xx: Find Sensor HV7131R(c)
[ ] input: zc3xx as /devices/platform/ehci-omap.0/usb1/1-2/1-2.3/input/input2
[ ] gspca: video0 created
[ ] gspca: found int in endpoint: 0x82, buffer_len=8, interval=10
[ ] kernel BUG at arch/arm/mm/dma-mapping.c:409!
[ ] Unable to handle kernel NULL pointer dereference at virtual address
[ ] libertas_sdio: probe of mmc1:0001:1 failed with error -16
[ ] cfg80211: World regulatory domain updated:
[ ] (start_freq - bandwidth), (max_antenna_gain, max_eirp)
[ ] ( KHz KHz), (300 mBi, 2000 mBm)
[ ] ( KHz KHz), (300 mBi, 2000 mBm)
[ ] ( KHz KHz), (300 mBi, 2000 mBm)
[ ] ( KHz KHz), (300 mBi, 2000 mBm)
[ ] ( KHz KHz), (300 mBi, 2000 mBm)
[ ] pgd = cff58000
[ ] *pgd=8ff36031, *pte= , *ppte=
[ ] Internal error: Oops: 817 [#1] PREEMPT
[ ] last sysfs file: /sys/devices/platform/ehci-omap.0/usb1/1-2/1-2.3/bcdDevice
[ ] Modules linked in: libertas_sdio libertas cfg80211 joydev rfkill ads7846 mailbox_mach lib80211 mailbox rtc_twl gspca_zc3xx(+) rtc_core gspca_main videodev v4l1_compat
[ ] CPU: 0 Not tainted ( #1)
This is a case where a thorough knowledge of the hardware is essential to making the software work. DMA is almost impossible to troubleshoot without using a logic analyzer.

No matter what mode the transfers will ultimately use, and no matter what the source and destination devices are, I always first write a routine to do a memory-to-memory DMA transfer. This is much easier to troubleshoot than DMA to a complex I/O port. You can use your ICE to see if the transfer happened (by looking at the destination block), and to see if exactly the right number of bytes were transferred.

At some point you'll have to recode to direct the transfer to your device. Hook up a logic analyzer to the DMA signals on the chip to be sure that the addresses and byte count are correct. Check this even if things seem to work – a slight mistake might trash part of your stack or data space.

Some high-integration CPUs with internal DMA controllers do not produce any sort of cycle that you can flag as being associated with DMA. This drives me nuts – one lousy extra pin would greatly ease debugging. The only way to track these transfers is to trigger the logic analyzer on address ranges associated with the transfer, but unfortunately these ranges may also have non-DMA activity in them.

Be aware that DMA will destroy your timing calculations. Bit-banging UARTs will not be reliable; carefully crafted timing loops will run slower than expected. In the old days we all counted T-states to figure how long a loop ran, but DMA, prefetchers, cache, and all sorts of modern exoticness make it almost impossible to calculate real execution time.
linux/drivers/mmc/host/omap_hsmmc.c
Modified version of omap_hsmmc_start_dma_transfer()