Architectural Considerations for CPU and Network Interface Integration C. D. Cranor; R. Gopalakrishnan; P. Z. Onufryk IEEE Micro Volume: 201, Jan.-Feb.

Slides:

Advertisements

Similar presentations

Accessing I/O Devices Processor Memory BUS I/O Device 1 I/O Device 2.

Advertisements

Khaled A. Al-Utaibi  Computers are Every Where  What is Computer Engineering?  Design Levels  Computer Engineering Fields  What.

CS-334: Computer Architecture

FIU Chapter 7: Input/Output Jerome Crooks Panyawat Chiamprasert

Chapter 1 Computer System Overview Patricia Roy Manatee Community College, Venice, FL ©2008, Prentice Hall Operating Systems: Internals and Design Principles,

1 Lecture 2: Review of Computer Organization Operating System Spring 2007.

Computational Astrophysics: Methodology 1.Identify astrophysical problem 2.Write down corresponding equations 3.Identify numerical algorithm 4.Find a computer.

IXP1200 Microengines Apparao Kodavanti Srinivasa Guntupalli.

Computer System Overview

1 Computer System Overview OS-1 Course AA

Soft Timers: Efficient Microsecond Software Timer Support For Network Processing Mohit Aron and Peter Druschel Rice University Presented by Reinette Grobler.

1 CSIT431 Introduction to Operating Systems Welcome to CSIT431 Introduction to Operating Systems In this course we learn about the design and structure.

Computer System Overview

Chapter 7 Interupts DMA Channels Context Switching.

Chapter 1 and 2 Computer System and Operating System Overview

Computer System Overview Chapter 1. Basic computer structure CPU Memory memory bus I/O bus diskNet interface.

Lecture 7 Lecture 7: Hardware/Software Systems on the XUP Board ECE 412: Microcomputer Laboratory.

Group 5 Alain J. Percial Paula A. Ortiz Francis X. Ruiz.

1 Computer System Overview Chapter 1 Review of basic hardware concepts.

Hardware Overview Net+ARM – Well Suited for Embedded Ethernet

Input / Output CS 537 – Introduction to Operating Systems.

Input/Output. Input/Output Problems Wide variety of peripherals —Delivering different amounts of data —At different speeds —In different formats All slower.

Chapter 7 Input/Output Luisa Botero Santiago Del Portillo Ivan Vega.

Chapter 10: Input / Output Devices Dr Mohamed Menacer Taibah University

Chapter 1 Computer System Overview Patricia Roy Manatee Community College, Venice, FL ©2008, Prentice Hall Operating Systems: Internals and Design Principles,

Computer Systems Overview. Page 2 W. Stallings: Operating Systems: Internals and Design, ©2001 Operating System Exploits the hardware resources of one.

1 Computer System Overview Chapter 1. 2 n An Operating System makes the computing power available to users by controlling the hardware n Let us review.

Computer System Overview Chapter 1. Operating System Exploits the hardware resources of one or more processors Provides a set of services to system users.

Chapter 1 Computer System Overview Dave Bremer Otago Polytechnic, N.Z. ©2008, Prentice Hall Operating Systems: Internals and Design Principles, 6/E William.

A Reconfigurable Processor Architecture and Software Development Environment for Embedded Systems Andrea Cappelli F. Campi, R.Guerrieri, A.Lodi, M.Toma,

Samsung ARM S3C4510B Product overview System manager

Contact Information Office: 225 Neville Hall Office Hours: Monday and Wednesday 12:00-1:00 and by appointment.

CHAPTER 3 TOP LEVEL VIEW OF COMPUTER FUNCTION AND INTERCONNECTION

1-1 Embedded Network Interface (ENI) API Concepts Shared RAM vs. FIFO modes ENI API’s.

Make Hosts Ready for Gigabit Networks. Hardware Requirement To allow a host to fully utilize Gbps bandwidth, its hardware system must be ready for Gbps.

Operating Systems and Networks AE4B33OSS Introduction.

Chapter 1 Computer System Overview Patricia Roy Manatee Community College, Venice, FL ©2008, Prentice Hall Operating Systems: Internals and Design Principles,

Ihr Logo Operating Systems Internals & Design Principles Fifth Edition William Stallings Chapter 1 Computer System Overview.

1 Chapter 2: Computer-System Structures  Computer System Operation  I/O Structure  Storage Structure  Storage Hierarchy  Hardware Protection  General.

I/O Interfacing A lot of handshaking is required between the CPU and most I/O devices. All I/O devices operate asynchronously with respect to the CPU.

I/O Computer Organization II 1 Interconnecting Components Need interconnections between – CPU, memory, I/O controllers Bus: shared communication channel.

Accessing I/O Devices Processor Memory BUS I/O Device 1 I/O Device 2.

Chapter 2 Processes and Threads Introduction 2.2 Processes A Process is the execution of a Program More specifically… – A process is a program.

Overview of Super-Harvard Architecture (SHARC) Daniel GlickDaniel Glick – May 15, 2002 for V (Dewar)

CH10 Input/Output DDDData Transfer EEEExternal Devices IIII/O Modules PPPProgrammed I/O IIIInterrupt-Driven I/O DDDDirect Memory.

Computer Science/Ch.3 Data Manipulation 3-1 Chapter 3 Data Manipulation.

Operating System Isfahan University of Technology Note: most of the slides used in this course are derived from those of the textbook (see slide 4)

Different Microprocessors Tamanna Haque Nipa Lecturer Dept. of Computer Science Stamford University Bangladesh.

We will focus on operating system concepts What does it do? How is it implemented? Apply to Windows, Linux, Unix, Solaris, Mac OS X. Will discuss differences.

Concurrency, Processes, and System calls Benefits and issues of concurrency The basic concept of process System calls.

EECB 473 Data Network Architecture and Electronics Lecture 1 Conventional Computer Hardware Architecture

Lecture 1: Review of Computer Organization

Chapter 6 Storage and Other I/O Topics. Chapter 6 — Storage and Other I/O Topics — 2 Introduction I/O devices can be characterized by Behaviour: input,

Input/Output Problems Wide variety of peripherals —Delivering different amounts of data —At different speeds —In different formats All slower than CPU.

IT3002 Computer Architecture

Different Microprocessors Tamanna Haque Nipa Lecturer Dept. of Computer Science Stamford University Bangladesh.

Exploiting Task-level Concurrency in a Programmable Network Interface June 11, 2003 Hyong-youb Kim, Vijay S. Pai, and Scott Rixner Rice Computer Architecture.

1 Computer Architecture. 2 Basic Elements Processor Main Memory –volatile –referred to as real memory or primary memory I/O modules –secondary memory.

1 Device Controller I/O units typically consist of A mechanical component: the device itself An electronic component: the device controller or adapter.

Amdahl’s Law & I/O Control Method 1. Amdahl’s Law The overall performance of a system is a result of the interaction of all of its components. System.

Computer Systems Overview. Lecture 1/Page 2AE4B33OSS W. Stallings: Operating Systems: Internals and Design, ©2001 Operating System Exploits the hardware.

1 Computer System Overview Chapter 1. 2 Operating System Exploits the hardware resources of one or more processors Provides a set of services to system.

Chapter 1 Computer System Overview

Chapter 1 Computer System Overview

Low Overhead Interrupt Handling with SMT

Presentation transcript:

Architectural Considerations for CPU and Network Interface Integration C. D. Cranor; R. Gopalakrishnan; P. Z. Onufryk IEEE Micro Volume: 201, Jan.-Feb. 2000, Page(s): Advisor: Ing-Jer Huang Student: Ming-Chih Chen 10/30/2000

What’s the Problem ? Due to the demands for internet and consumer appliances. Communication processor: integrates processing, networking, and system functions into a SOC. The primary challenges are minimizing cost and time to market. Reducing on-chip buffer is also to decreasing die size of a network processor.

Design Approaches Complex DMA controller and buffer management unit are the important considerations of designing network equipments, such as NIC, and network processor. UNUM supports extremely fast event processing and high performance data movement. The design of a communication processor often integrates IPs by licensing standard CPU and network interface cores. A multi-channel DMA design is both complex and time consuming.

Design Approaches (Cont.) Only one DMA channel can be active at a time. A DMA state machine is used to store the state information associated with each DMA channel (SA, DA, byte count, and descriptor pointer). When a DMA channel becomes active, the controller transfer its state from RAM to the state machine. After the operation completes, the controller writes back the update channel state to RAM.

Design Approaches (Cont.) The DMA controller must often be designed from scratch due to highly system dependencies. These system dependencies include: Performance requirements On-chip bus architecture Memory controller design The type and number of supported network Interfaces

Design Approaches (Cont.) Due to the transfer requirements of diverse DMA control and status information, channel customization interface-specific processing is used for data movement. Due to the complexity of a multi-channel DMA controller, a dedicated processor replaces data transfers of DMA’s. A communication processor for low-cost consumer applications should contain a single processor that performs all data movement, interface-specific processing, and application processing.

Multithreaded CPU for Event Processing Minimizing on-chip buffers results in small bus transfer thus increasing the number of events. Current processors service external events, using either polling or interrupts. A major component of interrupt latency is saving and restoring the state of the interrupted context.

Techniques used to reduce this overhead include: -- coding interrupt service routines in assembly language to use a small number of registers, -- switching to an alternate register set for interrupt processing, -- saving processor registers to unused floating-point registers, and -- providing on-chip memory for saving and restoring state. Multithreaded CPU for Event Processing (Cont.)

UNUM consists of three major components: -- An event mapper: used to initiate event service routine execution. -- A context scheduler: used to issue instructions to the CPU pipeline. -- A CPU pipeline: used to support concurrent execution of instructions from multiple contexts. 31n x 32 register file: n is the number of supported hardware contexts. UNUM employs a traditional single-issue RISC pipeline and multithreading which used to reduce event service latency.

Data Movement Instructions Programmed I/O (PIO) generates memory-to- memory transfers that require twice the bus bandwidth of fly-by DMA operations. PIO operations: -- Non-cacheable loads and stores result in single-word data transfers that achieve poor bus utilization. -- Cacheable loads and stores that generate burst transfers result in data cache pollution.

Data Movement Instructions (Cont.) IBIU (Internal Bus Interface Unit) incorporates a data mover and aligner that segments the on-chip bus into a memory bus and an I/O bus. Data movement operation occurs: -- Issue data movement instruction. -- Either stall the CPU pipeline until the operation completes or allow the pipeline to continue execution from on-chip caches as long as they are no misses.

Fly-by transfers: data bypasses the data cache and does not pollute it. TM2D: transfers data from memory to an interface. TD2 M: transfers data from an interface to memory. To maintain cache consistency, the cache supplies dirty data during TM2D processing, and performs cache invalidates during TD2M. Data Movement Instructions (Cont.)

TD2C: loads data directly into the data cache from an interface, eliminating an unnecessary transfer through memory. TM2DD: discards dirty data from the data cache as it is written to a network interface, potentially eliminating an unnecessary future write-back. Data Movement Instructions (Cont.)

Aligner Unaligned data transfers from memory to I/O device. The aligner uses a holding register, shifter, and multiplexer to align data as it flows from one bus to the other.

Simulation Results Simulation constraints: -- Cycle-accurate simulator of a UNUM-based communication processor MHz UNUM processor. -- Instruction cache: 8-Kbye, two-way set-associative data cache. -- Data cache: 2-Kbyte, two-way set-associative data cache bit system bus, 100-MHz SDRAM. -- Benchmarks were written in C.

Data Movement Three UNUM configurations: UNUM (hardware context switch with state preservation), fast interrupts (alternate register set with no interrupt state preservation), and normal processor interrupts. 40 Mbytes/sec, the cost of PIO dominates all other overheads x x x x Data cache miss had little effect The state of the service routine fits within a UNUM context, no state information needs to be loaded from memory

ATM Soft-SAR ATM (AAL-5) Segmentation and Reassembly (SAR): -- Datagram, connectionless, variable transfer rate, unreal-time transfer service, such as FTP, and Telnet. -- Transmit and receive processing like MAC. The SAR software uses three hardware contexts: -- ATM receive event processing -- ATM transmit event processing -- ATM transmit cell scheduling

ATM Soft-SAR Transmit throughput is lower because of the extra overhead associated with cell scheduling

Conclusion UNUM communication processor: bit RISC processor (Single-issue) n x 32 register file, where n is the number of supported hardware contexts. -- Data movement instructions replace DMA processing. (Fly-by processing) Low cost and time-to-market are the primary challenges of SOC design. The trend of Net+ Processor is irresistible.