Author: Tzi-Cker Chiueh, Prashant Pradhan. Publisher: High-Performance Computer Architecture, 2000. Presenter: Jo-Ning Yu. Date: 2010/11/03.


Outline
  - Introduction
  - Baseline: Host Address Cache (HAC)
  - Host Address Range Cache (HARC)
  - Intelligent Host Address Range Cache (IHARC)

Introduction
  - Rather than blindly pushing the performance of packet-processing hardware, an alternative approach is to avoid repeated computation by applying the time-tested architectural idea of caching to network packet processing.
  - Given caches of a fixed configuration, the only way to improve cache performance is to increase their effective coverage of the IP address space, i.e., to have each cache entry cover a larger portion of the IP address space.

Host Address Cache (HAC)
  - Architecture

Host Address Cache (HAC)
  - A distinct difference between network packet streams and program reference streams is that the former lack spatial locality, as evidenced by the fact that, for a given cache size and degree of associativity, decreasing the block size monotonically decreases the cache miss ratio.
  - Caches with larger block sizes perform worse because a larger block size leads to inefficient cache space utilization when references to addresses within the same block are not temporally correlated.
  - We conclude that the block size of network processor caches should always be small, preferably one entry wide.

Host Address Cache (HAC)
  - Unlike a CPU cache, temporal inconsistency in the host address cache is tolerable, because the routing protocol itself takes time to converge to new routes. There is therefore much more latitude in the timing of consistency-maintenance actions.
  - As the flush interval increases, the miss ratio decreases, as expected.
  - But the performance difference due to flushing, as shown by the ratio of the miss rates at the 100K and ∞ flush intervals, increases with the cache size.
  - The reason for this behavior is that larger caches require a longer cold-start time, and therefore tend to suffer more than smaller caches when the flush interval is small.
  ※ Assume the cache is direct-mapped and its block size is one entry wide.

Host Address Range Cache (HARC)
  - Each routing table entry corresponds to a contiguous range of the IP address space.
  - For example, a routing table entry with a network address field of 0x82f50000 and a network mask field of 0xffff0000 corresponds to the contiguous range 0x82f50000 through 0x82f5ffff.
  - Network addresses need to go through two additional processing steps before the host address range cache (HARC) can be put to practical use.
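The mapping from a (network address, mask) entry to the contiguous range it covers can be sketched as follows (a minimal illustration; the function name is ours, not from the paper):

```python
def prefix_to_range(network, mask):
    """Map a routing-table entry (network address, network mask) to the
    contiguous IPv4 address range it covers, as an inclusive pair."""
    start = network & mask            # clear host bits
    end = start | (~mask & 0xFFFFFFFF)  # set all host bits
    return start, end

# The slide's example entry: network 0x82f50000, mask 0xffff0000.
lo, hi = prefix_to_range(0x82F50000, 0xFFFF0000)
print(hex(lo), hex(hi))  # 0x82f50000 0x82f5ffff
```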

Host Address Range Cache (HARC)
  - First, with the longest-prefix-match requirement, it is possible that one routing table entry's address range covers another's. The former is called an encompassing entry and the latter an encompassed entry; an encompassing entry's network address is a prefix of those of the entries it encompasses.
  - The address range associated with each encompassed routing table entry needs to be "culled" away from the address ranges of all the entries that encompass it, so that every address range in the IP address space is covered by exactly one routing table entry.

Host Address Range Cache (HARC)
  - Second, adjacent address ranges that share the same output interface should be merged into larger ranges as much as possible.
  - The minimum of all resulting address range sizes is then calculated; this minimum becomes the minimum_range_granularity parameter of the HARC.
  - Range size, defined as log2(minimum_range_granularity), is thus the number of least significant bits of an IP address that can be ignored during routing-table lookup, since destination addresses falling within a minimum-size address range are guaranteed to have the same lookup result.
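The merge step and the derivation of range size can be sketched like this (function names and the toy 4-bit address space are ours, chosen only for illustration):

```python
import math

def merge_ranges(ranges):
    """Merge adjacent address ranges that share the same output interface.
    `ranges` is a sorted list of (start, end, interface), end inclusive."""
    merged = [ranges[0]]
    for start, end, intf in ranges[1:]:
        last_start, last_end, last_intf = merged[-1]
        if intf == last_intf and start == last_end + 1:
            merged[-1] = (last_start, end, intf)  # coalesce with predecessor
        else:
            merged.append((start, end, intf))
    return merged

def range_size_bits(merged):
    """log2 of the smallest merged range: the number of low-order address
    bits that can be ignored during lookup."""
    min_granularity = min(end - start + 1 for start, end, _ in merged)
    return int(math.log2(min_granularity))

# Hypothetical 4-bit address space with two output interfaces:
ranges = [(0x0, 0x3, 1), (0x4, 0x7, 1), (0x8, 0xB, 2), (0xC, 0xF, 2)]
merged = merge_ranges(ranges)
print(merged)                   # [(0, 7, 1), (8, 15, 2)]
print(range_size_bits(merged))  # 3
```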

Host Address Range Cache (HARC)
  - Architecture
  - The destination address of an incoming packet is logically right-shifted by range size before being fed to the baseline cache.
  - Because each address range corresponds to a cacheable entity, HARC's effective coverage of the IP address space is increased by a factor of minimum_range_granularity.
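The shifted lookup described above can be sketched as follows (a simplified front end; the constant and function name are ours):

```python
RANGE_SIZE = 3  # log2(minimum_range_granularity), computed offline

def harc_index_and_tag(dest_addr, index_bits):
    """Right-shift the destination address by RANGE_SIZE before splitting
    it into cache index and tag, as in the HARC front end."""
    shifted = dest_addr >> RANGE_SIZE
    index = shifted & ((1 << index_bits) - 1)
    tag = shifted >> index_bits
    return index, tag

# Two addresses inside the same minimum-size range map to the same entry:
print(harc_index_and_tag(0x82F50000, 10) == harc_index_and_tag(0x82F50007, 10))  # True
```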

Host Address Range Cache (HARC)
  - HAC's miss ratio is 1.68 to 2.11 times higher than HARC's.
  - In terms of average routing-table lookup time, HARC is between 58% and 78% faster than HAC, assuming a hit access time of one cycle and a miss penalty of 120 cycles.
  - The miss-ratio gap between HAC and HARC widens with the degree of associativity, because HARC benefits more from higher degrees of associativity, eliminating more conflict misses than HAC.
  ※ Assume the block size is one entry wide.

Intelligent Host Address Range Cache (IHARC)
  - A traditional CPU cache of size 2^K and block size 1 directly takes the K least significant bits of a given address to index into the data and tag arrays.
  - In this section, we show that by choosing a more appropriate hash function for cache lookup, it is possible to further increase every cache entry's coverage of the IP address space.
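The conventional indexing the slide contrasts against is just a low-order bit extraction (a trivial sketch; K and the function name are ours):

```python
K = 10  # cache has 2**K sets, block size 1

def classic_index(addr):
    """A traditional cache indexes with the K least significant bits."""
    return addr & ((1 << K) - 1)

print(classic_index(0x82F503FF))  # 1023
# Addresses that agree in the low K bits always collide, regardless of
# how the routing table actually partitions the address space:
print(classic_index(0x00000012) == classic_index(0xFFFFFC12))  # True
```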

Intelligent Host Address Range Cache (IHARC)
  - In this case, the total number of address ranges is 8, because the minimum range granularity is 2.
  - To further grow the address range that a cache entry can cover, one can choose the index bits carefully such that, when the index bits are ignored, some of the identically labeled address ranges become "adjacent" and can thus be combined.
  - Example: with bit 1 chosen as the index bit (and thus ignored), the four host addresses 000 through 011 become adjacent; since their output interfaces are all 1, they can be merged.
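The bit-dropping effect in the example can be sketched as follows (the helper name and the tiny 3-bit address space are ours, mirroring the slide's example):

```python
def drop_bit(addr, bit):
    """Remove one bit position from an address and concatenate the halves,
    which is what conceptually happens to the tag when that bit is chosen
    as a cache index bit."""
    high = addr >> (bit + 1)
    low = addr & ((1 << bit) - 1)
    return (high << bit) | low

# 3-bit addresses 000, 001, 010, 011, all routed to output interface 1.
# With bit 1 used as an index bit (and thus ignored in the tag), they
# collapse to the 2-bit values 00, 01, 00, 01 -- adjacent and mergeable.
print([drop_bit(a, 1) for a in [0b000, 0b001, 0b010, 0b011]])  # [0, 1, 0, 1]
```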

Intelligent Host Address Range Cache (IHARC)
  - The K index bits divide the IP address space into 2^K partitions, each of which is mapped to one cache set.
  - Each partition contains a number of address ranges, and each range is associated with an output interface that differs from those of its neighboring address ranges.

Intelligent Host Address Range Cache (IHARC)

Intelligent Host Address Range Cache (IHARC)
  - Architecture
  - Since distinct address ranges in a cache set need unique tags, the number of distinct address ranges in a cache set represents the degree of contention in that set.
  - Thus, the index bits are selected so that, after the merging operation, both the total number of address ranges and the variation in the number of address ranges across cache sets are minimized.

Intelligent Host Address Range Cache (IHARC)

Intelligent Host Address Range Cache (IHARC)
  - M_i(S) is the number of ranges in the i-th partition resulting from the set of index bits S.
  - M(S) is the average of the metric M_i(S) over all partitions i.
  - The first term of Equation 1 represents the total number of cacheable entities competing for the entire cache.
  - The second term is called the deviation term. It quantifies the deviation in the number of cacheable entities across all the partitions induced by the set of index bits S.
  - In other words, the first and second terms measure the extents of capacity and conflict misses, respectively. The weighting factor w in Equation 1 determines the relative importance of conflict-miss reduction with respect to capacity-miss reduction.
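Equation 1 itself did not survive the transcript. From the description of its two terms, a plausible reconstruction is (our notation, not a verbatim copy of the paper's):

```latex
\mathrm{Cost}(S) \;=\; \sum_{i=0}^{2^{K}-1} M_i(S) \;+\; w \sum_{i=0}^{2^{K}-1} \bigl|\, M_i(S) - M(S) \,\bigr|
```

The first sum counts all cacheable entities competing for the cache (capacity pressure); the second penalizes imbalance across the 2^K cache sets (conflict pressure), weighted by w.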

Intelligent Host Address Range Cache (IHARC)
  - However, a general range check is still too expensive to be incorporated into caching hardware.
  - By guaranteeing that each address range size is a power of two and that the starting address of each range is aligned to a multiple of its size during the merge step, the range check can be performed by a simple mask-and-compare operation.
  - Therefore, each tag memory entry in the IHARC includes a tag field as well as a mask field, which specifies the bits of the address to be used in the tag match.
  - To put these numbers in perspective, the number of entries in the original routing table is 39,681, and the number of address ranges from HARC is 2^27, or 134,217,728.
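The mask-and-compare check can be sketched as follows (the function name and bit widths are ours; each entry's mask marks the tag bits that must match, with low-order "don't care" bits cleared):

```python
def tag_match(stored_tag, stored_mask, incoming_tag):
    """IHARC tag check: compare only the bits selected by the per-entry
    mask, so one entry can cover a power-of-two-sized, aligned range."""
    return (incoming_tag & stored_mask) == (stored_tag & stored_mask)

# An entry whose low 2 tag bits are "don't care" covers 4 adjacent tags:
print(tag_match(0b101000, 0b111100, 0b101011))  # True  (inside the range)
print(tag_match(0b101000, 0b111100, 0b100111))  # False (outside the range)
```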

Intelligent Host Address Range Cache (IHARC)
  - Compared to HAC, IHARC reduces the average routing-table lookup time by up to a factor of 5.
  - In terms of average routing-table lookup time, HARC is between 2.24 and 3.18 times slower than IHARC, because HARC's miss ratios are 2.91 to 7.09 times larger than IHARC's.
  - In addition, the miss-ratio gap between HARC and IHARC increases with the degree of associativity.
  - This result conclusively demonstrates that there is significant performance improvement to be gained from IHARC over HARC.
  ※ Assume the block size is one entry wide.