Chin-Hsien Wu & Tei-Wei Kuo

An Adaptive Two-Level Management for the Flash Translation Layer in Embedded Systems
Chin-Hsien Wu & Tei-Wei Kuo. IEEE/ACM International Conference on Computer-Aided Design (ICCAD '06), November 2006. Advisor: Hung Shih-Hao. Presenter: Chen Yu-Jen.

Outline
Characteristics of Flash Memory. FTL versus NFTL. Our Approach – AFTL. Performance Evaluation. Conclusion. FTL and NFTL are two translation layers.

Flash Memory Characteristics
Media access times:
DRAM: read and write 60ns (2B), 2.56μs (512B); no erase.
NOR Flash: read 150ns (1B), 14.4μs (512B); write 211μs (1B), 3.52ms (512B); erase 1.2s (16KB).
NAND Flash: read 10.2μs (1B), 35.9μs (512B); write 201μs (1B), 226μs (512B); erase 2ms (16KB).
Disk: read and write 12.4ms (512B) (average).
Source: J. Kim, J. M. Kim, S. H. Noh, S. L. Min, and Y. Cho. A space-efficient flash translation layer for compact-flash systems. IEEE Transactions on Consumer Electronics, 48(2):366–375, May 2002.
Ratios annotated on the slide: 100X, 15X, 400X, 50X.
For DRAM and disk, read and write times are the same; flash is different. NAND reads are somewhat slower than NOR reads, but NAND writes and erases are much faster, so large-scale flash memory today generally uses NAND, which is also widely used in embedded systems.

NAND Flash Memory Characteristics
This is a typical organization of NAND flash memory. The device consists of blocks (Block 0, Block 1, Block 2, Block 3, …), and each block contains 32 pages. Each page is 512B + 16B: the 512B user area stores data, and the 16B spare area stores metadata (e.g., LBA, ECC). A page is the unit of read and write, and a written page cannot be overwritten until its block is erased. A block is the unit of erasure.

Flash Memory Characteristics
Write-once: a page cannot be written again unless its residing block is erased, so updates are written out of place. Pages are therefore classified as valid, invalid, or free. Bulk erasing: pages are erased a block at a time to recycle used but invalid pages. Wear-leveling: each block tolerates only a limited number of erases. The garbage collection example on the next slides illustrates these characteristics.
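The write-once rule and the out-place update can be sketched as a small model. This is only an illustration: the class name `Block`, the state constants, the helper `update`, and the 8-page block size are assumptions made for brevity and do not come from the slides.

```python
FREE, VALID, INVALID = "free", "valid", "invalid"


class Block:
    """A toy flash block: a page is write-once until the whole block is erased."""

    def __init__(self, pages_per_block=8):
        self.state = [FREE] * pages_per_block
        self.data = [None] * pages_per_block
        self.erase_count = 0  # wear-leveling would try to balance this across blocks

    def write(self, page, value):
        if self.state[page] != FREE:
            raise ValueError("page already written; erase the block first")
        self.state[page] = VALID
        self.data[page] = value

    def invalidate(self, page):
        self.state[page] = INVALID

    def erase(self):
        n = len(self.state)
        self.state = [FREE] * n
        self.data = [None] * n
        self.erase_count += 1


def update(block, old_page, value):
    """Out-place update: write the new version to a free page, then invalidate the old one."""
    new_page = block.state.index(FREE)  # first free page; ValueError if the block is full
    block.write(new_page, value)
    block.invalidate(old_page)
    return new_page


blk = Block()
blk.write(0, "v1")
new_page = update(blk, 0, "v2")
print(new_page, blk.state[:2])
```

In this model the old copy stays in place as an invalid page until the block is erased, which is why invalid pages accumulate over time.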

Example: Garbage Collection
Legend: L = a live page, D = a dead page, F = a free page. This block is to be recycled (3 live pages and 5 dead pages). After a period of use, free pages run low and dead pages accumulate, so garbage collection is needed to recycle the dead pages. A block that has filled up is selected for recycling.

Live data are copied elsewhere. Once the live pages have been copied, every page of the block is a dead page (D D D D D D D D).

The block is then erased, after which all of its pages are free pages (F F F F F F F F). Overheads: live data copying and block erasing. Garbage collection is therefore time-consuming, because it can consist of erase, read, and write operations.
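The overhead of recycling the example block can be estimated with a few lines of code. This is a rough sketch under stated assumptions: it uses the 512B NAND read and write times and the 16KB erase time from the access-time table, the page layout of the victim block is hypothetical (only the 3 live / 5 dead counts come from the slide), and the function name `recycle` is illustrative.

```python
# NAND timings in microseconds, taken from the access-time table (512B read/write, 16KB erase).
READ_US, WRITE_US, ERASE_US = 35.9, 226.0, 2000.0


def recycle(block, free_pages_elsewhere):
    """Recycle one block given as a list of 'L' (live), 'D' (dead), 'F' (free) pages.

    Returns (erased_block, pages_copied, estimated_cost_us).
    """
    live = block.count("L")
    if live > free_pages_elsewhere:
        raise ValueError("not enough free pages to relocate the live data")
    # Each live page is read once and written once elsewhere, then the block is erased.
    cost = live * (READ_US + WRITE_US) + ERASE_US
    return ["F"] * len(block), live, cost


victim = ["L", "D", "D", "L", "D", "D", "L", "D"]  # 3 live, 5 dead; the order is hypothetical
erased, copied, cost_us = recycle(victim, free_pages_elsewhere=4)
print(copied, round(cost_us, 1))
```

Under these numbers the erase dominates the cost, but the copy cost grows with the number of live pages, so victims with few live pages are cheaper to recycle.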


System Architecture – FTL
FTL adopts a page-level address translation mechanism. Its main problem is the large memory space needed to store the address translation table. For example, a 256MB NAND flash with a page size of 512 bytes needs 524,288 (256*1024*1024/512) entries. Assuming that an entry needs 4 bytes, the address translation information of FTL requires 2,048KB of memory space.
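The arithmetic can be checked with a short calculation. The function name `page_level_table_bytes` and its parameter names are illustrative, and the 4-byte entry size is the assumption stated on the slide.

```python
def page_level_table_bytes(flash_bytes, page_bytes=512, entry_bytes=4):
    """Return (number of entries, table size in bytes) for a page-level mapping table."""
    entries = flash_bytes // page_bytes  # one entry per page
    return entries, entries * entry_bytes


entries, table_bytes = page_level_table_bytes(256 * 1024 * 1024)
print(entries, table_bytes // 1024, "KB")
```

The table size scales linearly with the flash capacity, so doubling the capacity doubles the memory needed for the table.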

System Architecture – NFTL
NFTL was proposed to reduce the memory space requirements. A logical address under NFTL is divided into a virtual block address (VBA) and a block offset. For example, with 8 pages per block, LBA = 1011 gives VBA = 1011 / 8 = 126 and block offset = 1011 % 8 = 3. NFTL keeps an address translation table in main memory, where each entry records two addresses: a primary block address and a replacement block address. In the example, the entry for VBA = 126 is (9, 23). To write data to LBA = 1011, NFTL checks whether the page at block offset 3 in the primary block is free. If that page has already been written, the data are written to the first free page of the replacement block.
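The write path in this example can be sketched as follows. It is a minimal model, not the NFTL implementation: the class `NFTL`, the method names, and the use of `None` for a free page are assumptions, while the values (8 pages per block, VBA 126, primary block 9, replacement block 23) come from the slide.

```python
PAGES_PER_BLOCK = 8  # as in the slide's example


def split_lba(lba):
    """Split a logical address into (virtual block address, block offset)."""
    return lba // PAGES_PER_BLOCK, lba % PAGES_PER_BLOCK


class NFTL:
    def __init__(self):
        self.table = {}   # VBA -> (primary block address, replacement block address)
        self.blocks = {}  # physical block address -> list of pages; None means free

    def add_mapping(self, vba, primary, replacement):
        self.table[vba] = (primary, replacement)
        for pba in (primary, replacement):
            self.blocks.setdefault(pba, [None] * PAGES_PER_BLOCK)

    def write(self, lba, data):
        """Write to the primary block if the page at the offset is free, else to the replacement block."""
        vba, offset = split_lba(lba)
        primary, replacement = self.table[vba]
        if self.blocks[primary][offset] is None:
            self.blocks[primary][offset] = (lba, data)
            return primary, offset
        pages = self.blocks[replacement]
        free = pages.index(None)  # ValueError here means the replacement block is full
        pages[free] = (lba, data)
        return replacement, free


nftl = NFTL()
nftl.add_mapping(126, primary=9, replacement=23)
first = nftl.write(1011, "old")
second = nftl.write(1011, "new")
print(split_lba(1011), first, second)
```

The first write lands in the primary block at offset 3; the second finds that page used and goes to the first free page of the replacement block.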

Memory Space Requirements - NFTL
NFTL needs far less memory space than FTL because it adopts block-level address translation. Assume that a block consists of 32 pages. A 256MB NAND flash then has 16,384 (256*1024/16, i.e. 256*1024*1024/(32*512)) blocks, so NFTL needs 64KB of memory space to store 16,384 entries at 4 bytes per entry. Because NFTL needs less space, it is the usual choice in embedded systems. Next, consider how NFTL performs in other respects.
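The block-level figure can be checked the same way as the page-level one. The function name `block_level_table_bytes` is illustrative, and the 4-byte entry size is an assumption chosen because it reproduces the 64KB on the slide.

```python
def block_level_table_bytes(flash_bytes, page_bytes=512, pages_per_block=32, entry_bytes=4):
    """Return (number of entries, table size in bytes) for a block-level mapping table."""
    entries = flash_bytes // (page_bytes * pages_per_block)  # one entry per block
    return entries, entries * entry_bytes


entries, table_bytes = block_level_table_bytes(256 * 1024 * 1024)
print(entries, table_bytes // 1024, "KB")
```

With 32 pages per block, the block-level table has 32 times fewer entries than a page-level table for the same flash size.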

Address Translation Time - NFTL
That was NFTL's advantage; its drawback is that it is slow. The address translation performance of read and write requests can deteriorate because physical addresses are found by linear search. Assume that each block contains 8 pages, and let LBAs A, B, C, D, and E be written 5, 5, 1, 1, and 1 times, respectively. Their data could be distributed as in the figure on the slide. For example, it might be necessary to scan 9 spare areas for LBA B.
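The cost of the linear search can be illustrated with a small read routine. The layout below is hypothetical: it is one distribution consistent with the write counts above (first versions in the primary block, the four later versions of A and B interleaved in the replacement block), and the slide's actual figure may differ. The function name `read_page` and the forward scan order are also assumptions.

```python
def read_page(primary, replacement, offset):
    """Return (latest data, number of spare areas scanned) for the page at `offset`.

    primary: list indexed by block offset (None = free page).
    replacement: list of (offset, data) tuples in write order (None = free page).
    """
    scanned = 1  # spare area of the page in the primary block
    latest = primary[offset]
    for entry in replacement:  # forward scan; the last match is the newest version
        if entry is None:      # first free page: nothing was written after it
            break
        scanned += 1
        entry_offset, data = entry
        if entry_offset == offset:
            latest = data
    return latest, scanned


primary = ["A1", "B1", "C1", "D1", "E1", None, None, None]
replacement = [(0, "A2"), (1, "B2"), (0, "A3"), (1, "B3"),
               (0, "A4"), (1, "B4"), (0, "A5"), (1, "B5")]
data, scanned = read_page(primary, replacement, offset=1)  # LBA B sits at offset 1
print(data, scanned)
```

With this layout a read of B touches one spare area in the primary block and eight in the replacement block, which matches the count of 9 given above.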

Garbage Collection Overhead - NFTL
1. Copy the most-recent content to the new primary block. 2. Erase the old primary block and the replacement block. 3. The overhead is 2 block erases and 5 page writes.
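The merge can be sketched in code to show where the 2 erases and 5 page writes come from. The layout is hypothetical but consistent with the earlier write counts (A and B written 5 times, C, D, and E once, 8 pages per block); the function name `merge` and the tuple representation are assumptions.

```python
def merge(primary, replacement):
    """Merge a primary/replacement pair into a new primary block.

    primary: list indexed by block offset (None = free page).
    replacement: list of (offset, data) tuples in write order (None = free page).
    Returns (new_primary, page_writes, block_erases, free_pages_erased_unused).
    """
    new_primary = list(primary)  # start from the versions in the primary block
    for entry in replacement:    # later entries override earlier ones
        if entry is not None:
            offset, data = entry
            new_primary[offset] = data
    page_writes = sum(p is not None for p in new_primary)  # one copy per live LBA
    block_erases = 2                                       # old primary + replacement
    unused = sum(p is None for p in primary)               # free pages lost to the erase
    return new_primary, page_writes, block_erases, unused


primary = ["A1", "B1", "C1", "D1", "E1", None, None, None]
replacement = [(0, "A2"), (1, "B2"), (0, "A3"), (1, "B3"),
               (0, "A4"), (1, "B4"), (0, "A5"), (1, "B5")]
new_primary, writes, erases, unused = merge(primary, replacement)
print(new_primary[:5], writes, erases, unused)
```

In this layout the old primary block still has 3 free pages when it is erased, so those pages are never used.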

Space Utilization - NFTL
3 free pages are wasted.

Motivation
AFTL is an adaptive two-level management design for a flash translation layer. It exploits the advantages of both the fine-grained and the coarse-grained address translation mechanisms. Comparison of FTL, NFTL, and AFTL:
Memory space requirements: FTL large; NFTL small; AFTL a little larger than NFTL.
Address translation time: FTL short; NFTL long; AFTL much better than NFTL.
Garbage collection overhead: FTL less; NFTL more.
Space utilization: FTL high; NFTL low.

Two-Level Management
Fine-grained AddrTM: has a fine-grained hash table, where each entry is a linked list of fine-grained slots of the form (LBA, PBA). To keep the memory space requirements under control, the total number of fine-grained slots should be bounded. Coarse-grained AddrTM: has a coarse-grained hash table, where each entry is a linked list of coarse-grained slots of the form (VBA, PPBA, RPBA). Any given LBA is first passed to the fine-grained AddrTM for PBA look-up. If there is no match, the LBA is sent to the coarse-grained AddrTM.
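The two-level look-up order can be sketched as follows. This is a simplification under stated assumptions: Python dictionaries stand in for the hash tables with linked-list chaining, the class name `AFTL` and method `lookup` are illustrative, the 8-page block size is borrowed from the NFTL example, and the PBA value 187 is made up.

```python
PAGES_PER_BLOCK = 8  # assumed, as in the NFTL example


class AFTL:
    def __init__(self):
        self.fine = {}    # LBA -> PBA (fine-grained slots)
        self.coarse = {}  # VBA -> (PPBA, RPBA) (coarse-grained slots)

    def lookup(self, lba):
        """Try the fine-grained table first, then fall back to the coarse-grained table."""
        if lba in self.fine:
            return "fine", self.fine[lba]
        vba = lba // PAGES_PER_BLOCK
        if vba in self.coarse:
            # The caller would still search the primary/replacement blocks as NFTL does.
            return "coarse", self.coarse[vba]
        return None


aftl = AFTL()
aftl.fine[1011] = 187       # hypothetical physical page address
aftl.coarse[126] = (9, 23)  # primary and replacement block addresses
print(aftl.lookup(1011), aftl.lookup(1012), aftl.lookup(5))
```

A fine-grained hit returns the physical page directly, while a coarse-grained hit only narrows the search to a pair of blocks.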

AFTL – Coarse-to-Fine Switching
1. AFTL doesn't erase the two blocks immediately. 2. AFTL moves the mapping information of the replacement block to the fine-grained hash table by adding fine-grained slots. 3. The RPBA field of the corresponding mapping information is nullified. Because memory space must be saved, the total number of fine-grained slots has to be bounded. The blocks are to be erased together if A, B, C, D, and E all appear in the replacement block.

AFTL – Fine-to-Coarse Switching
The number of fine-grained slots is limited, so some least recently used mapping information in fine-grained slots must be moved to the coarse-grained hash table. Assume that the fine-grained slot (F, PBA) is to be replaced. The data stored in the page at the given physical address are copied to the primary or replacement block of the corresponding coarse-grained slot, as defined by NFTL. If no corresponding coarse-grained slot exists, a new one is created. This kind of switch does not happen because the data have become useless, so the valid-page copying it causes is unnecessary overhead, and such switches must be limited.
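The bounded, least-recently-used behavior of the fine-grained slots can be sketched with an ordered dictionary. This is an illustration only: the class name `FineGrainedTable` and its methods are assumptions, and the data copy into the coarse-grained side is left to the caller.

```python
from collections import OrderedDict


class FineGrainedTable:
    """A bounded set of (LBA, PBA) slots with least-recently-used replacement."""

    def __init__(self, max_slots):
        self.max_slots = max_slots
        self.slots = OrderedDict()  # LBA -> PBA, least recently used first

    def lookup(self, lba):
        if lba not in self.slots:
            return None
        self.slots.move_to_end(lba)  # mark as most recently used
        return self.slots[lba]

    def insert(self, lba, pba):
        """Add a slot; return the evicted (lba, pba) pairs that need a fine-to-coarse switch."""
        self.slots[lba] = pba
        self.slots.move_to_end(lba)
        evicted = []
        while len(self.slots) > self.max_slots:
            evicted.append(self.slots.popitem(last=False))  # drop the least recently used
        return evicted


table = FineGrainedTable(max_slots=2)
table.insert("A", 10)
table.insert("B", 11)
table.lookup("A")                # A becomes the most recently used slot
evicted = table.insert("C", 12)  # B is now the least recently used
print(evicted)
```

Each evicted pair still refers to valid data, so the caller has to copy that page into a primary or replacement block, which is the overhead the slide warns about.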

AFTL – Fine-to-Coarse Switching
Because the number of fine-grained slots is limited, coarse-to-fine switches eventually force fine-to-coarse switches, which add overhead in valid page copying. AFTL therefore stops any coarse-to-fine switch once a frequency bound on coarse-to-fine switches is reached. The experiments use a parameter to control the frequency of switches and to explore the behavior of the proposed mechanism.

The Advantages of AFTL
Improved address translation performance: the mapping information of replacement blocks is moved to the fine-grained hash table. Reduced garbage collection overhead: delaying the recycling of a replacement block reduces the potential number of valid data copies and block erases. Improved space utilization: delaying the recycling of a primary block makes it likely that its free pages will be used in the future.

Performance Evaluation
Performance Setup. The experiments were run on a trace collected over a 20GB disk. Characteristics of the trace:
CPU: Intel Celeron 750MHz
RAM: 320 MB
OS: Windows XP
File system: NTFS
Applications: Web Applications, Clients, MP3 Player, MSN Messenger, Word, Excel, PowerPoint, Media Player, Programming, and Virtual Memory Activities
Duration: one week
Total write / read requests: 13,198,805 / 2,797,996 sectors
Different LBAs: 1,669,228

Performance Evaluation
Performance Setup. The maximum number of fine-grained slots is controlled by a parameter MFS. A parameter ST controls the frequency of switches between the two address translation mechanisms (n/ST). ST = 0 means no constraint on the number of switches. A smaller ST means more switches; a larger ST means fewer switches.

Memory Space Requirements
MFS ranged over 2,500, 5,000, 7,500, 10,000, 12,500, and 15,000. Because AFTL adopts a two-level address translation mechanism, it needs extra memory space, and the extra space grows as MFS increases. Even so, AFTL uses only a little more memory space than NFTL.

Address Translation Performance
1. A larger MFS gives a shorter address translation time, because more address translations go through the fine-grained address translation mechanism. 2. A smaller ST gives a longer address translation time, because a smaller ST value encourages a significant number of coarse-to-fine switches. The mapping information of LBAs then rotates quickly between the two address translation mechanisms, so fine-grained slots are not used effectively before they face fine-to-coarse switches again.

Garbage Collection Overhead
AFTL outperforms NFTL when garbage collection overhead is considered. Coarse-to-fine switches avoid the immediate recycling of primary and replacement blocks and the related valid data copying. A larger MFS allows more coarse-to-fine switches, and a smaller ST value also increases their number, so the garbage collection overhead can be lower.

Space Utilization
The space utilization might be better under AFTL. Coarse-to-fine switches can delay the recycling of replacement blocks. Because the recycling of primary blocks is delayed as well, the free pages of primary blocks might be used in the future.

Conclusion
AFTL is proposed to exploit the advantages of the fine-grained and coarse-grained address translation mechanisms, and to switch mapping information dynamically and adaptively between the two. AFTL provides good performance in address mapping and space utilization, and it keeps garbage collection overhead and memory space requirements under proper management.

Thank You

System Architecture
This is a block diagram of a flash-memory storage system. Applications and processes call the file system (e.g., fwrite(file, data)). The file system (FAT, EXT2, NTFS, ...) issues block writes (LBA, size) to the FTL/NFTL layer, which performs address translation and garbage collection. The device driver turns flash I/O requests into control signals for the physical devices (flash memory banks). FTL/NFTL manages the flash memory. Address translation resolves the "out-place update" problem, and garbage collection recycles the invalidated data in the flash memory.
