Network Processors Harsh Chilwal.

Presentation transcript:

Network Processors Harsh Chilwal

Evolution: Cellular phone generations
• 1G: 900 MHz; voice
• 2G: 900 MHz, 1800 MHz; voice
• 2.5G: 900-1800 MHz; voice, tiny Internet
• 3G: 900-1800-1900 MHz; smartphone, full web service
[Figure label: data rate, 12 kb/s]

Evolution: 3G cellular phones
[Figure: mobile stations (MS) connect to base stations (BS), which connect through a base station controller (BSC) to the network. Labels: 100 MS, 10 BS; rates shown: 12 kb/s, 5 Mb/s, 100 Mb/s.]

Evolution: 3G cellular phones
[Figure: mobile stations (MS) connect to base stations (BS), which connect through a base station controller (BSC) to the network, with an NP marked. Labels: 500 MS, 100 BS; rates shown: 1 Mb/s, 500 Mb/s, 50 Gb/s.]

Evolution: Networks
Signal    Bandwidth    Step from previous
DS0       64 kb/s
DS1       1.5 Mb/s     x24
DS3       44 Mb/s      x28
OC-12     622 Mb/s     x12
OC-192    10 Gb/s      x16
OC-768    40 Gb/s      x4
[Figure: bandwidth (Mb/s) versus year, through 2005, with NP marked.]
DS = Digital Signal, OC = Optical Carrier
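The practical meaning of these line rates is easier to see as a per-packet time budget. The following is a rough Python sketch, not something from the slides: it assumes the standard nominal line rates and a 40-byte minimum packet, and it ignores framing overhead. The function names are made up for illustration.

```python
# Nominal line rates in bits per second (standard values; the slide rounds them).
LINE_RATES_BPS = {
    "DS0": 64e3,
    "DS1": 1.544e6,
    "DS3": 44.736e6,
    "OC-12": 622.08e6,
    "OC-192": 9953.28e6,
    "OC-768": 39813.12e6,
}


def packets_per_second(rate_bps, packet_bytes=40):
    """Upper bound on packet rate; framing overhead is ignored (assumption)."""
    return rate_bps / (packet_bytes * 8)


def time_budget_ns(rate_bps, packet_bytes=40):
    """Time available to handle one packet at line rate, in nanoseconds."""
    return 1e9 / packets_per_second(rate_bps, packet_bytes)


if __name__ == "__main__":
    for name, rate in LINE_RATES_BPS.items():
        mpps = packets_per_second(rate) / 1e6
        print(f"{name:7s} {mpps:10.4f} Mpps {time_budget_ns(rate):14.1f} ns/packet")
```

Under these assumptions OC-192 leaves roughly 32 ns per 40-byte packet, which is on the order of a single external memory access.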

Networking Trends
• Increasing network traffic.
• New, sophisticated protocols are being introduced at a rapid pace.
• Need to support new applications in order to provide new services.
• Convergence of voice and data networks is driving many changes in the communication industry.
• Increasing time-to-market (TTM) pressure.
• Decreasing product life cycles.

General Purpose Processor based Software Router
• Benefits
  – Flexible for upgrading the system
  – Easy to support additional interfaces
  – Quick to develop new products with a short TTM
  – The core processor performs all the routing functions
• Drawbacks
  – Does not scale to higher bandwidths; the maximum is about OC-12 speeds
  – Can support complex network operations (traffic engineering, QoS, etc.) only with a major reduction in performance

ASIC based Routers
» Benefits
  – Provide wire-speed performance
» Drawbacks
  – Lack flexibility; difficult to meet changing market needs and demands
  – Long design cycles increase TTM and reduce the product life cycle (PLC)
  – A change or failure in the design involves more risk
  – The ASIC must be replaced to provide new functionality
  – Complex network operations are still executed in software

Network Processor based boxes
• Promise to provide both performance and flexibility
• Comprise many packet processing elements, each supporting multiple threads
• Achieve higher performance through pipelining and parallel processing, across both threads and packet processing elements
• Bring flexibility through software programmability
• Easy to add features

Network Processor

Basic Architecture of Network Processors

Basic architecture (contd.)
[Figure: multiple streams enter a dispatcher and leave through a merger. Other blocks shown: RISC Com-Engine and look-aside co-processors CP1, CP2, CP3, CP4.]
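One way to read the dispatcher and merger blocks is that packets are spread across parallel processing elements and then put back into arrival order. The slides do not describe the actual policy, so the Python sketch below is only an illustration: the round-robin policy, the sequence-number tagging, and the names `dispatch`, `process`, `merge`, and `run` are all hypothetical.

```python
import heapq
from collections import deque


def dispatch(packets, num_engines):
    """Tag each packet with a sequence number and spread them round-robin."""
    queues = [deque() for _ in range(num_engines)]
    for seq, pkt in enumerate(packets):
        queues[seq % num_engines].append((seq, pkt))
    return queues


def process(queue, work):
    """Stand-in for one processing element: apply `work` to every packet."""
    return [(seq, work(pkt)) for seq, pkt in queue]


def merge(results_per_engine):
    """Restore arrival order from per-engine outputs sorted by sequence number."""
    return [pkt for _, pkt in heapq.merge(*results_per_engine)]


def run(packets, num_engines=4, work=str.upper):
    queues = dispatch(packets, num_engines)
    results = [process(q, work) for q in queues]  # engines modelled sequentially
    return merge(results)


if __name__ == "__main__":
    print(run(["a", "b", "c", "d", "e"], num_engines=3))
```

The sequence numbers are unique, so the merge step never has to compare packet contents. The engines run one after another in this sketch, whereas real hardware would run them concurrently.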

Intro: Systems and Protocols: Relation with Standards
[Figure labels]
Systems
• IETF / ForCES WG: Data / Forwarding Plane, Control Plane
• NPF:
  – Service Layer: system wide, no awareness of where things are
  – Functional Layer: awareness of where things are
  – Operational Layer
  – Interface Management
Protocols
• ITU-T / ANSI / ATM Forum: ATM
• IEEE: Ethernet
• IETF: IPv4, MPLS, PPP/L2TP, IPv6, MIBs

OSI Network Architecture
[Figure: hosts A and B each run the seven-layer stack (Application, Presentation, Session, Transport, Network, Data Link, Physical) and are connected through the network. Each layer adds its own header to the DATA on the way down: AH, PH, SH, TH, NH, DH, PH.]

Typical Applications
• WAN/LAN switching and routing, multi-service switches, multi-layer switches, aggregators
• Web caching, load balancing, web switching, content-based load balancers
• QoS solutions
• VoIP gateways
• 2.5G and 3G wireless infrastructure equipment
• Security: firewall, VPN, encryption, access control
• Storage solutions
• Residential gateways

Software Framework

Scene setting - why specs are not enough
• 2 NPU vendors want to promote their solution with some ‘numbers’
• Both chip architectures comprise
  – RISC engines
  – Hardware support engines
  – Various types of interfaces
  – Support for internal and external memory
• They report the following data
  – Aggregate MIPS
  – Max number of lookups per second
  – ...
Commonalities in building blocks
Commonalities in specifications
Commonalities in interpretation?

Specifications

Test scenario
• What is measured? Performance in packets per second versus a forwarding information base (FIB) that is increased in size.
• The starting application is IPv4.
• Next, counters are added for per-flow billing purposes.
• Next, load balancing is introduced as an additional feature.
• Finally, encryption becomes an additional requirement for 2% of the data that is being forwarded.
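The core operation in this scenario is a FIB lookup, with counters added on top. As a minimal sketch in Python, assuming IPv4 only and one hash table per prefix length (a textbook scheme chosen for illustration, not the method used by either NPU), a longest-prefix-match lookup with per-next-hop counters might look like this. The class name `Fib` and its methods are hypothetical.

```python
import ipaddress


class Fib:
    """Toy IPv4 forwarding information base with longest-prefix match."""

    def __init__(self):
        # One dict per prefix length: {network address as int: next hop}.
        self.tables = {length: {} for length in range(33)}
        # Per-next-hop packet counters, standing in for billing counters.
        self.hits = {}

    def add_route(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        self.tables[net.prefixlen][int(net.network_address)] = next_hop

    def lookup(self, address):
        addr = int(ipaddress.ip_address(address))
        # Probe from the most specific prefix length to the least specific.
        for length in range(32, -1, -1):
            mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
            next_hop = self.tables[length].get(addr & mask)
            if next_hop is not None:
                self.hits[next_hop] = self.hits.get(next_hop, 0) + 1
                return next_hop
        return None


if __name__ == "__main__":
    fib = Fib()
    fib.add_route("10.0.0.0/8", "port-A")
    fib.add_route("10.1.0.0/16", "port-B")
    print(fib.lookup("10.1.2.3"), fib.hits)
```

Each lookup may probe up to 33 tables, and the counter update adds another write. That gives a sense of why every added feature costs memory references.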

Performance curves: IPv4
[Figure: performance (Mpps) versus FIB size (K entries) for NPU A and NPU B.]

Performance curves: IPv4 + counters
[Figure: performance (Mpps) versus FIB size (K entries) for NPU A and NPU B.]
Requires more memory references.

Performance curves: IPv4 + counters + load balancing
[Figure: performance (Mpps) versus FIB size (K entries) for NPU A and NPU B.]
Requires even more memory references.

Performance curves: IPv4 + counters + load balancing + encryption
[Figure: performance (Mpps) versus FIB size (K entries) for NPU A and NPU B.]
No extra references and resources available. A does not have sufficient resources.

Architecture A
[Figure: block diagram running IPv4 + counters + LB + crypto. Blocks: OC-192 POS, Key extract, Hash, LU, Count, Int. mem (x3), 3 MIPS cores, Sched, External Buffer Mem.]

Architecture B
[Figure: block diagram running IPv4 + counters + LB + crypto. Blocks: 10GE, LB, 10 MIPS cores, IMEM, Memory interface, External Buffer Mem.]

Specifications - revisited

So
• No clear value statement could be made in favor of either NPU solution
  – NPU A achieves higher throughput but with limited flexibility
  – NPU B achieves lower throughput but is more flexible
• Were the provided specs accurate?
  – Yes.
  – The devices performed up to spec.
  – Although NPU B looks better on paper at first sight, more resources have to be consumed for less performant results.
  – There is a cost associated with flexibility
• Were the provided specs relevant?
  – No. They represent granular maximum performances.
  – For ‘real world’ applications,
    · some resources could not be maximally consumed
    · some resources were over-consumed

Benchmarking considerations
• Processor core metrics are not always relevant for networking applications
  – They might be relevant for NPU B, since functionality relies almost totally on those cores.
  – They are definitely not relevant for NPU A, since there is extensive additional hardware support for specific functions.
GRANULARITY: Highly granular specifications, data or benchmarking information can give a misleading picture of the actual performance capabilities of the DUT. Since Network Processing Devices are designed with specific applications in mind, benchmarks must exist for those specific applications.

Benchmarking considerations
• External factors affect NPD performance (where you don’t always suspect it)
  – A forwarding application relies on FIB lookups to determine the destination of a packet
  – The size of the FIB table can influence performance in many ways
    · Usage of multiple memory banks
    · Increasing number of hash collisions
EXTERNAL FACTORS: Benchmarks should include parameters that take into account external factors that are relevant to the particular applications that are being benchmarked.
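The hash-collision effect can be illustrated with a small simulation. The Python sketch below assumes random 32-bit keys and a fixed-size bucket array indexed by a simple modulo. Both are simplifications chosen for illustration, and the function name `collision_stats` is hypothetical.

```python
import random


def collision_stats(num_entries, num_buckets, seed=0):
    """Insert random 32-bit keys into a fixed bucket array.

    Returns (colliding_entries, longest_chain), where colliding_entries
    counts every entry that landed in an already occupied bucket.
    """
    rng = random.Random(seed)
    chains = [0] * num_buckets
    for _ in range(num_entries):
        key = rng.getrandbits(32)
        chains[key % num_buckets] += 1
    colliding = sum(c - 1 for c in chains if c > 1)
    return colliding, max(chains)


if __name__ == "__main__":
    buckets = 4096
    for entries in (1000, 2000, 4000, 8000):
        colliding, longest = collision_stats(entries, buckets)
        print(f"{entries:5d} entries: {colliding:5d} collisions, "
              f"longest chain {longest}")
```

With the bucket count held fixed, a larger FIB means longer chains. Each extra chain step is typically another memory reference per lookup.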

Benchmarking considerations
• Interfaces present performance boundary conditions
  – Ethernet applications require inter-frame gaps that result in more relaxed pps numbers
INTERFACES: Benchmarks should also specify the types of interfaces that are being used, since those interfaces have an impact all by themselves on maximum performance figures.
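The inter-frame gap effect is easy to quantify. The Python sketch below uses the standard Ethernet per-frame overhead on the wire: 8 bytes of preamble and start-of-frame delimiter, plus a 12-byte inter-frame gap. The `raw_max_pps` function is a hypothetical overhead-free baseline for comparison and does not model any particular interface.

```python
PREAMBLE_SFD_BYTES = 8      # preamble plus start-of-frame delimiter
INTER_FRAME_GAP_BYTES = 12  # minimum idle time between frames, in byte times


def ethernet_max_pps(link_bps, frame_bytes=64):
    """Maximum frame rate on Ethernet, including per-frame wire overhead."""
    wire_bytes = frame_bytes + PREAMBLE_SFD_BYTES + INTER_FRAME_GAP_BYTES
    return link_bps / (wire_bytes * 8)


def raw_max_pps(link_bps, frame_bytes=64):
    """Hypothetical baseline: the same link with no per-frame overhead."""
    return link_bps / (frame_bytes * 8)


if __name__ == "__main__":
    link = 10e9  # 10 Gigabit Ethernet
    print(f"with preamble and gap: {ethernet_max_pps(link) / 1e6:.2f} Mpps")
    print(f"no per-frame overhead: {raw_max_pps(link) / 1e6:.2f} Mpps")
```

For minimum-size 64-byte frames on 10GE this gives about 14.88 Mpps instead of about 19.53 Mpps, so the same device faces a noticeably lower worst-case packet rate on Ethernet.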

Benchmarking considerations
• Combinations of applications or minor extensions have a completely different impact on both network processing devices
  – NPU A has a lot of well-engineered hardware support that can offer additional services BUT fails almost completely when additional computing resources are required
  – NPU B is very ‘soft’; performance degrades slowly when additional services are requested and shows no abrupt peaks in the performance curves.
HEADROOM: Benchmarks should combine applications as they occur in the real world to give a ‘sense’ of headroom that is available to support real world scenarios. It is however very hard to define a metric for headroom.

CommBench – A Telecommunication Benchmark For NPs
• HPAs (header-processing applications): RTR, FRAG, DRR, TCP
• PPAs (payload-processing applications): CAST, ZIP, REED, JPEG

Benchmark Characteristics – Code & Computational Kernel Sizes

Benchmark Characteristics – Computational Complexity
N_{a,l}: number of instructions per byte required for application a operating on a packet of length l
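Given a value of N_{a,l}, the processing power needed to keep up with a link follows from simple arithmetic. In the Python sketch below, the instructions-per-byte figure is a made-up placeholder and not a CommBench measurement. The function names are hypothetical.

```python
def instructions_per_packet(instr_per_byte, packet_bytes):
    """Instructions needed for one packet of length l: N_{a,l} * l."""
    return instr_per_byte * packet_bytes


def required_mips(instr_per_byte, link_bps):
    """Million instructions per second needed to process every byte at line rate."""
    bytes_per_second = link_bps / 8
    return instr_per_byte * bytes_per_second / 1e6


if __name__ == "__main__":
    n_al = 2.0          # placeholder value, NOT measured data
    oc12 = 622.08e6     # nominal OC-12 line rate in bits per second
    print(f"{instructions_per_packet(n_al, 40):.0f} instructions per 40-byte packet")
    print(f"{required_mips(n_al, oc12):.2f} MIPS needed at OC-12")
```

Because N_{a,l} is expressed per byte, the same figure can be compared across applications and packet lengths before any hardware is chosen.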

Benchmark Characteristics – Instruction Set Characteristics

Benchmark Characteristics – Memory Hierarchy

Example System: Cisco Toaster
• Almost all data plane operations execute on the programmable XMC
• Pipeline stages are assigned tasks, e.g. classification, routing, firewall, MPLS
  – Classic SW load balancing problem
• External SDRAM shared by common pipe stages
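The load balancing problem mentioned here can be made concrete. A pipeline runs only as fast as its slowest stage, so tasks should be grouped into stages with the smallest possible bottleneck. The Python sketch below is one textbook formulation (contiguous partition with binary search) and not Cisco's actual method. The task costs are hypothetical integer cycle counts.

```python
def stages_needed(costs, limit):
    """Greedy count of contiguous stages when no stage may exceed `limit`."""
    stages, current = 1, 0
    for cost in costs:
        if current + cost > limit:
            stages += 1
            current = 0
        current += cost
    return stages


def min_bottleneck(costs, num_stages):
    """Smallest achievable cost of the slowest stage (integer costs, in order)."""
    lo, hi = max(costs), sum(costs)
    while lo < hi:
        mid = (lo + hi) // 2
        if stages_needed(costs, mid) <= num_stages:
            hi = mid        # feasible: try a tighter limit
        else:
            lo = mid + 1    # infeasible: the limit must grow
    return lo


if __name__ == "__main__":
    # Hypothetical per-packet cycle costs for tasks executed in this order.
    task_costs = [30, 50, 20, 40, 10]
    for stages in (1, 2, 3, 5):
        print(stages, "stages -> bottleneck", min_bottleneck(task_costs, stages))
```

Adding stages stops helping once the bottleneck equals the single most expensive task. Past that point, further speedup means splitting the task itself.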

Example System: IXP 2400
• XScale core replaces StrongARM
• Microengines
  – Faster
  – More: 2 clusters of 4 microengines each
• Local memory
• Next-neighbor routes added between microengines
• Hardware to accelerate CRC operations and random number generation
• 16-entry CAM
[Figure: block diagram with microengines ME0 to ME7, Scratch/Hash/CSR, MSF unit, DDR DRAM controller, XScale core, QDR SRAM controller, PCI.]

References
• Network Processor Design, Patrick Crowley et al.
• CommBench - A Telecommunications Benchmark for Network Processors, Tilman Wolf and Mark Franklin. Proceedings of IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
• Network Processing Forum - Benchmarking