ECE 526 – Network Processing Systems Design: Network Processor Architecture and Scalability (Chapters 13, 14: D. E. Comer)

NP Architectures (Ning Weng, ECE 526)
Last class:
─ Key requirements of network processors: flexibility and scalability
─ Optimized instruction sets and parallel processing using multiprocessors
This class:
─ Internal organization of an NP:
   Computation, storage, and communication
   Operating support
   Content addressable memory (CAM)
─ NP scaling issues

NP Architecture Characteristics
─ Computation
   Processor hierarchy
   Special-purpose functional units
─ Storage
   Memory hierarchy
   Content addressable memory (CAM)
─ Communication
   Internal buses
   External interfaces
─ Operation support
   Concurrent/parallel execution support
   Programming models
   Dispatch mechanisms

Processor Functionality

Processor Pyramid

Packet Flow through the Hierarchy
Accommodating tasks of different complexity and frequency:
─ Low level: simple, frequent processing
─ High level: occasional, complex processing
Computation scaling:
─ Faster processors
─ More concurrent threads
─ More processors
─ More processor types

Memory Hierarchy
Different memory technologies trade off performance, cost, and area.
Conventional approach:
─ Registers + cache + off-chip DRAM
─ Exploits locality, temporal and spatial
─ Optimized for the average case
─ Transparent to the programmer
Network processors:
─ Registers, scratchpad, control store, onboard RAM, CAM/TCAM, SRAM, and SDRAM
─ Specialized for network processing applications, which exhibit little temporal locality
─ Explicit to the application developer: more difficult to program, but more control
─ The memory hierarchy is not managed transparently as a cache; it is used explicitly

Memory Technology
Characterized by access latency and area:
─ SRAM: 2-10 ns, 4-6 transistors per cell
─ DRAM: tens of ns, 1 or 3 transistors per cell
What data should be stored where?
─ Instructions
─ Packet data: header, payload, and meta-data
─ Temporary data: data structures allocated on the stack
─ Application data: persistent data, e.g., a routing table or rule file

Memory Size Example
Consider a network system that processes IP datagrams. Assume the system executes 5,000 instructions per packet, each instruction occupies 4 bytes, 10% of instructions access a 4-byte value in memory, each datagram consists of 1,500 bytes, a lookup examines ten 4-byte values in an IP routing table on average, and a datagram arrives and leaves in an Ethernet frame. Compute the total number of memory locations accessed to process one datagram. Assume no memory caching.
─ Instruction memory:
─ Packet memory:
─ Application memory:
─ Temporary memory:
Total:
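One plausible accounting for the example above, sketched in Python. The interpretation is an assumption: memory traffic is counted in bytes, every instruction is fetched from instruction memory, and the datagram crosses packet memory twice (written on arrival, read on departure).

```python
# Worked sketch of the memory-size example (assumptions noted in the lead-in).
INSTR_COUNT = 5_000      # instructions executed per packet
INSTR_SIZE = 4           # bytes per instruction
DATA_FRACTION = 0.10     # fraction of instructions accessing a 4-byte value
DATAGRAM_SIZE = 1_500    # bytes per datagram
LOOKUP_WORDS = 10        # 4-byte routing-table values examined per lookup

instruction_mem = INSTR_COUNT * INSTR_SIZE            # every instruction fetched
packet_mem = 2 * DATAGRAM_SIZE                        # written on arrival, read on send
application_mem = LOOKUP_WORDS * 4                    # routing-table lookup
temporary_mem = int(INSTR_COUNT * DATA_FRACTION) * 4  # stack data accesses

total = instruction_mem + packet_mem + application_mem + temporary_mem
print(instruction_mem, packet_mem, application_mem, temporary_mem, total)
# instruction: 20,000 B; packet: 3,000 B; application: 40 B; temporary: 2,000 B
```

Note how instruction fetch dominates: a consequence of assuming no caching, which is exactly why NPs keep a dedicated on-chip control store.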

Memory Scaling
Memory access time: raw access speed
─ Technology dependent
─ Important for random access
Memory bandwidth:
─ Important for overall system performance
─ Scales with multiple ports, multiple banks, and a wider bus
─ Limited by pin count and packaging cost

Content Addressable Memory
A CAM does not use an address to locate content; instead, the content itself is presented as the input to a query.
─ Organized as an array of slots
─ Combines two mechanisms: random-access storage and exact-match pattern search
─ Rapid search enabled by parallel hardware

Lookup Using a Conventional CAM
Given:
─ A pattern for which to search, known as the key
The CAM returns:
─ The first slot that matches the key, or
─ All slots that match the key
Algorithm:
for each slot do {
    if (key == slot) {
        declare key matches slot;
    } else {
        declare key does not match slot;
    }
}
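A minimal software model of the algorithm above. The hardware compares the key against all slots in parallel; the loop here only models the result, not the parallelism. The slot contents are made-up example values.

```python
def cam_lookup(slots, key, first_match=True):
    """Model a conventional CAM: the key is compared against every slot
    (in parallel in hardware; modeled sequentially here). Returns the
    first matching slot index, or all matching indices, or None."""
    matches = [i for i, slot in enumerate(slots) if slot == key]
    if first_match:
        return matches[0] if matches else None
    return matches

# Example: a 4-slot CAM holding 32-bit values
slots = [0xDEADBEEF, 0x0A000001, 0xCAFEBABE, 0x0A000001]
print(cam_lookup(slots, 0x0A000001))                     # -> 1 (first match)
print(cam_lookup(slots, 0x0A000001, first_match=False))  # -> [1, 3]
```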

Ternary CAM (TCAM)
Regular CAM:
─ Binary values: 0 and 1
─ Requires the key to match the entire content of a slot
─ Not flexible
TCAM:
─ Ternary values: 0, 1, and "don't care"
─ Implemented by masking entries
─ Well suited to flow classification in network processors

TCAM Lookup
Each slot has a bit mask; the hardware uses the mask to decide which bits to test.
Algorithm:
for each slot do {
    if ((key & mask) == (slot & mask)) {
        declare key matches slot;
    } else {
        declare key does not match slot;
    }
}
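The masked comparison above can be sketched as a runnable model. The entries here (a /8 and a /16 IPv4 prefix, encoded as 32-bit integers) are illustrative values, not from the slides.

```python
def tcam_lookup(entries, key):
    """Model a TCAM lookup: each entry is (value, mask). Mask bits set
    to 1 must match; mask bits set to 0 are "don't care". Returns the
    index of the first matching entry, or None."""
    for i, (value, mask) in enumerate(entries):
        if (key & mask) == (value & mask):
            return i
    return None

# Example entries: match 10.0.0.0/8 in slot 0, 192.168.0.0/16 in slot 1
entries = [
    (0x0A000000, 0xFF000000),   # 10.x.x.x
    (0xC0A80000, 0xFFFF0000),   # 192.168.x.x
]
print(tcam_lookup(entries, 0x0A010203))  # 10.1.2.3     -> 0
print(tcam_lookup(entries, 0xC0A80101))  # 192.168.1.1  -> 1
```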

Partial Matching Using TCAM
[Figure: the key matches slot 1; the packet belongs to the flow ID given by the "additional information" stored in each slot.]

Classification Using TCAM
Flexibility: the "additional information" can be stored in a separate memory.
─ Extract values from fields in the headers
─ Concatenate the values into a contiguous string
─ Use the string as the key for the TCAM lookup
─ Store the classification in the slot
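The four steps above can be sketched end to end. The key layout (src IP, dst IP, protocol, dst port packed into one 88-bit integer) and the two rules are assumptions for illustration, not rules from the slides.

```python
import ipaddress

def make_key(src_ip, dst_ip, proto, dst_port):
    # Steps 1-2: extract header fields and concatenate them into one
    # contiguous key: [src IP (32) | dst IP (32) | protocol (8) | port (16)]
    return (int(ipaddress.IPv4Address(src_ip)) << 56) \
         | (int(ipaddress.IPv4Address(dst_ip)) << 24) \
         | (proto << 16) | dst_port

# Each rule: (value, mask, classification); mask bit 0 = don't care.
RULES = [
    ((6 << 16) | 80,  (0xFF << 16) | 0xFFFF, "web"),  # any IPs, TCP port 80
    ((17 << 16) | 53, (0xFF << 16) | 0xFFFF, "dns"),  # any IPs, UDP port 53
]

def classify(key):
    # Steps 3-4: TCAM-style lookup returning the stored classification
    for value, mask, label in RULES:
        if key & mask == value & mask:
            return label
    return "default"

print(classify(make_key("10.0.0.1", "192.168.1.5", 6, 80)))  # -> web
```

In a real NP the labels would live in the TCAM slots (or an associated SRAM, for the flexibility the slide mentions); the Python tuple stands in for both.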

Communication
Internal interfaces: channels between processing elements and memories
─ Internal bus
─ Hardware FIFO: sequential access
─ Transfer registers: random access
─ Onboard shared memory: shared random access
External interfaces:
─ Memory interfaces: access to larger off-chip memory
─ Direct I/O interfaces: e.g., access to link interfaces
─ Bus interfaces: access to other devices, e.g., a control CPU
─ Switching fabric interface: access to the switching fabric; several standards exist (e.g., CSIX from the NP Forum)

Communication Cost Example
Consider a second-generation network system that forwards IP datagrams. The system has 16 interfaces, each connected to an OC-192 line (data rate 10 Gbps), interconnected by a shared communication channel. Packet sizes range from 40 bytes to 1,500 bytes. What aggregate bandwidth is needed on the communication channel for each of two design scenarios?
─ Every bit of a packet transfers through the shared communication channel.
─ Only a 4-byte packet memory address transfers through the shared communication channel.
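One way to work the example, under two stated assumptions: each bit crosses the shared channel once (double the results if every transfer crosses it on both input and output), and the worst case for the second scenario is the minimum 40-byte packet, which maximizes the packet rate.

```python
# Aggregate-bandwidth sketch for the communication cost example.
NUM_PORTS = 16
LINE_RATE = 10e9   # bits/s per OC-192 port
MIN_PKT = 40       # bytes, worst case (highest packet rate)
ADDR = 4           # bytes per packet-address transfer

# Scenario 1: every packet bit traverses the shared channel
scenario1 = NUM_PORTS * LINE_RATE                 # 160 Gb/s

# Scenario 2: only a 4-byte address traverses the channel per packet
pps_per_port = LINE_RATE / (MIN_PKT * 8)          # 31.25 Mpps per port
scenario2 = NUM_PORTS * pps_per_port * ADDR * 8   # 16 Gb/s

print(scenario1 / 1e9, scenario2 / 1e9)  # -> 160.0 16.0
```

The 10x gap is the motivation for pointer-passing designs: moving a 4-byte handle instead of the packet body slashes the fabric bandwidth requirement.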

NP Operating Support
─ Programming model: interrupt/event-based vs. thread-based
─ Parallel and concurrent execution support
─ Dispatch mechanism: how threads are initiated

Summary
NP scaling is achieved by:
─ Heterogeneous multiprocessors structured hierarchically
─ Mixed memory technologies explicitly available to the programmer
─ Different communication mechanisms
─ Operating support, which is important for achieving high system performance
NP scaling is limited by:
─ Physical space: chip area (less than 400 mm²)
─ Pin limits and packaging technology
─ Power consumption and heat dissipation

For Next Class and Reminders
─ Read Comer: chapters 15 and 16
─ Homework solution online by Friday
─ Midterm: 10/6
─ Project: topic finalized 10/5 (group leaders: email me); proposal presentation 10/22