Chapter 1 Introduction to the Systems Approach

Slides:



Advertisements
Similar presentations
Memory.
Advertisements

Philips Research ICS 252 class, February 3, The Trimedia CPU64 VLIW Media Processor Kees Vissers Philips Research Visiting Industrial Fellow
EZ-COURSEWARE State-of-the-Art Teaching Tools From AMS Teaching Tomorrow’s Technology Today.
Multiprocessors— Large vs. Small Scale Multiprocessors— Large vs. Small Scale.
Lecture 38: Chapter 7: Multiprocessors Today’s topic –Vector processors –GPUs –An example 1.
Lecture 8 Dynamic Branch Prediction, Superscalar and VLIW Advanced Computer Architecture COE 501.
CA 714CA Midterm Review. C5 Cache Optimization Reduce miss penalty –Hardware and software Reduce miss rate –Hardware and software Reduce hit time –Hardware.
Lecture 6: Multicore Systems
AMD OPTERON ARCHITECTURE Omar Aragon Abdel Salam Sayyad This presentation is missing the references used.
KeyStone ARM Cortex A-15 CorePac Overview
The University of Adelaide, School of Computer Science
TIE Extensions for Cryptographic Acceleration Charles-Henri Gros Alan Keefer Ankur Singla.
CSCI 8150 Advanced Computer Architecture Hwang, Chapter 1 Parallel Computer Models 1.2 Multiprocessors and Multicomputers.
Computational Astrophysics: Methodology 1.Identify astrophysical problem 2.Write down corresponding equations 3.Identify numerical algorithm 4.Find a computer.
State Machines Timing Computer Bus Computer Performance Instruction Set Architectures RISC / CISC Machines.
Vacuum tubes Transistor 1948 –Smaller, Cheaper, Less heat dissipation, Made from Silicon (Sand) –Invented at Bell Labs –Shockley, Brittain, Bardeen ICs.
RISC. Rational Behind RISC Few of the complex instructions were used –data movement – 45% –ALU ops – 25% –branching – 30% Cheaper memory VLSI technology.
Memory: Virtual MemoryCSCE430/830 Memory Hierarchy: Virtual Memory CSCE430/830 Computer Architecture Lecturer: Prof. Hong Jiang Courtesy of Yifeng Zhu.
Trend towards Embedded Multiprocessors Popular Examples –Network processors (Intel, Motorola, etc.) –Graphics (NVIDIA) –Gaming (IBM, Sony, and Toshiba)
Feb 14 th 2005University of Utah1 Microarchitectural Wire Management for Performance and Power in partitioned architectures Rajeev Balasubramonian Naveen.
CS 7810 Lecture 24 The Cell Processor H. Peter Hofstee Proceedings of HPCA-11 February 2005.
Feb 14 th 2005University of Utah1 Microarchitectural Wire Management for Performance and Power in Partitioned Architectures Rajeev Balasubramonian Naveen.
Lect 13-1 Lect 13: and Pentium. Lect Microprocessor Family  Microprocessor  Introduced in 1989  High Integration  On-chip 8K.
Single-Chip Multi-Processors (CMP) PRADEEP DANDAMUDI 1 ELEC , Fall 08.
CS 152 Computer Architecture and Engineering Lecture 23: Putting it all together: Intel Nehalem Krste Asanovic Electrical Engineering and Computer Sciences.
Computer System Architectures Computer System Software
ECE 526 – Network Processing Systems Design Network Processor Architecture and Scalability Chapter 13,14: D. E. Comer.
Simultaneous Multithreading: Maximizing On-Chip Parallelism Presented By: Daron Shrode Shey Liggett.
Lecture#14. Last Lecture Summary Memory Address, size What memory stores OS, Application programs, Data, Instructions Types of Memory Non Volatile and.
1 Chapter 1 Parallel Machines and Computations (Fundamentals of Parallel Processing) Dr. Ranette Halverson.
1 Lecture: Virtual Memory, DRAM Main Memory Topics: virtual memory, TLB/cache access, DRAM intro (Sections 2.2)
Parallel Processing - introduction  Traditionally, the computer has been viewed as a sequential machine. This view of the computer has never been entirely.
Chapter 2 Parallel Architecture. Moore’s Law The number of transistors on a chip doubles every years. – Has been valid for over 40 years – Can’t.
Chapter 8 – Main Memory (Pgs ). Overview  Everything to do with memory is complicated by the fact that more than 1 program can be in memory.
CAPS project-team Compilation et Architectures pour Processeurs Superscalaires et Spécialisés.
Chapter 8 CPU and Memory: Design, Implementation, and Enhancement The Architecture of Computer Hardware and Systems Software: An Information Technology.
Super computers Parallel Processing By Lecturer: Aisha Dawood.
1 Introduction CEG 4131 Computer Architecture III Miodrag Bolic.
Advanced Processor Technology Architectural families of modern computers are CISC RISC Superscalar VLIW Super pipelined Vector processors Symbolic processors.
CHAPTER 4 The Central Processing Unit. Chapter Overview Microprocessors Replacing and Upgrading a CPU.
1 - CPRE 583 (Reconfigurable Computing): Reconfigurable Computing Architectures Iowa State University (Ames) Reconfigurable Architectures Forces that drive.
Virtual Memory 1 1.
A few issues on the design of future multicores André Seznec IRISA/INRIA.
Lecture 2: Computer Architecture: A Science ofTradeoffs.
Pipelining and Parallelism Mark Staveley
Introduction to Virtual Memory and Memory Management
Soc 5.1 Chapter 5 Interconnect Computer System Design System-on-Chip by M. Flynn & W. Luk Pub. Wiley 2011 (copyright 2011)
On-chip Parallelism Alvin R. Lebeck CPS 221 Week 13, Lecture 2.
Chapter 5 Memory III CSE 820. Michigan State University Computer Science and Engineering Miss Rate Reduction (cont’d)
Multilevel Caches Microprocessors are getting faster and including a small high speed cache on the same chip.
1 Introduction ELG 6158 Digital Systems Architecture Miodrag Bolic.
EKT303/4 Superscalar vs Super-pipelined.
1 Adapted from UC Berkeley CS252 S01 Lecture 18: Reducing Cache Hit Time and Main Memory Design Virtucal Cache, pipelined cache, cache summary, main memory.
Multiprocessor SoC integration Method: A Case Study on Nexperia, Li Bin, Mengtian Rong Presented by Pei-Wei Li.
1 - CPRE 583 (Reconfigurable Computing): Reconfigurable Computing Architectures Iowa State University (Ames) CPRE 583 Reconfigurable Computing Lecture.
On-chip Parallelism Alvin R. Lebeck CPS 220/ECE 252.
Memory COMPUTER ARCHITECTURE
CS161 – Design and Architecture of Computer
Visit for more Learning Resources
Chapter 8: Main Memory.
Lecture 24: Memory, VM, Multiproc
Chapter 1 Introduction.
Lecture 24: Virtual Memory, Multiprocessors
Lecture 23: Virtual Memory, Multiprocessors
Lecture 8: Efficient Address Translation
Main Memory Background
Fundamentals of Computing: Computer Architecture
The University of Adelaide, School of Computer Science
COMPUTER ORGANIZATION AND ARCHITECTURE
Presentation transcript:

Chapter 1 Introduction to the Systems Approach Computer System Design System-on-Chip by M. Flynn & W. Luk Pub. Wiley 2011 (copyright 2011)

SOC architecture and design system-on-chip (SOC) processors: become components in a system SOC covers many topics processors, cache, memory, interconnect, design tools need to know user view: variety of processors basic information: technology and tools processor internals: effect on performance storage: cache, embedded and external memory interconnect: buses, network-on-chip evaluation: processor, cache, memory, interconnect advanced: specialized processors, reconfiguration design productivity: system modelling, design exploration

System on a Chip: driven by semiconductor advances

Basic system-on-chip model

SOC vs processors on chip with lots of transistors, designs move in 2 ways: complete system on a chip multi-core processors with lots of cache System on chip Processors on chip processor multiple, simple, heterogeneous few, complex, homogeneous cache one level, small 2-3 levels, extensive memory embedded, on chip very large, off chip functionality special purpose general purpose interconnect wide, high bandwidth often through cache power, cost both low both high operation largely stand-alone need other chips

iPhone: has System-on-Chip Source: UC Berkeley

iPhone SOC Memory Processor I/O 1 GHz ARM Cortex A8 Source: UC Berkeley 7

AMD’s Barcelona Multicore Processor 4 out-of-order cores 1.9 GHz clock rate 65nm technology 3 levels of caches integrated Northbridge Core 1 Core 2 512KB L2 512KB L2 2MB shared L3 Cache Northbridge Core 3 512KB L2 512KB L2 Core 4 http://www.techwarelabs.com/reviews/processors/barcelona/

SOC design: key ideas to design and evaluate an SOC, designers need to understand: its components: processors, memory, interconnect applications that it targets SOC economics heavily dependent on: costs: initial design, marginal production volume: applicability, lifetime reducing design complexity Intellectual Property (IP) reconfigurable technology

SOC processors usually a mix of special and general purpose (GP) can be proprietary design or purchased IP commonly GP processor is purchased IP includes OS and compiler support GP processor optimized for an application additional instructions vector units

Some processors for SOCs Basic ISA Processor description Freescale c600: signal processing PowerPC Superscalar with vector extension ClearSpeed CSX600: general Proprietary Array processor with 96 processing elements PlayStation 2: gaming MIPS Pipelined with 2 vector coprocessors ARM VFP11: general ARM Configurable vector coprocessor

Processor types: overview Architecture / Implementation approach SIMD Single instruction applied to multiple functional units Vector Single instruction applied to multiple pipelined registers VLIW Multiple instructions issued each cycle under compiler control Superscalar Multiple instructions issued each cycle under hardware control

Adding instructions additional instructions to support specialized resources exception: superscalar, with hardware control instructions can be added to base processor for coprocessor control VLIW: Very Large Instruction Word Array Vector

Sequential and parallel machines basic single stream processors pipelined: basic sequential superscalar: transparently concurrent VLIW: compiler generated concurrency multiple stream array processors vector processors multiprocessors

Sequential processors operation generally transparent to sequential programmer appear as in order instruction execution pipeline processor execution in order limited to one instruction execution / cycle superscalar processor multi instructions / cycle, managed by hardware VLIW multi op execution / cycle, managed by compiler

Pipelined processor IF: Instruction Fetch ID: Instruction Decode AG: Address Generation DF: Data Fetch EX: Execution WB: Write Back

Superscalar and VLIW processors

Superscalar VLIW

Superscalar VLIW

Parallel processors execution managed by programmer array processors single instruction stream, multiple data streams: SIMD vector processors SIMD multiprocessors multiple instruction streams, multiple data streams: MIMD

Array processors perform op if condition = mask operand can come from neighbour mask op dest sr1 sr2 n PEs, each with memory; neighbour communications one instruction issued to all PEs

Vector processors vector registers, eg 8 regs x 64 words x 64 bits vector instructions: VR3 <- VR2 VOP VR1

Array Processors Vector Processors

SOC multiprocessors

Memory and addressing many SOC memory designs use simple embedded memory a single level cache real (rather than virtual) addressing as SOC become more complex their designs are expected to use more complex memory and addressing configurations

Three levels of addressing

User view of memory: addressing a program: process address (offset + base + index) virtual address: process address + process id a process: assigned a segment base and bound system address: segment base + process address pages: active localities in main/real memory virtual address: translated by table lookup to real address page miss: virtual pages not in page table TLB (translation look-aside buffer): recent translations TLB entry: corresponding real and (virtual, id) address a few hashed virtual address bits address TLB entries if virtual, id = TLB (virtual, id) then use translation

The TLB and The MMU

SOC interconnect interconnecting multiple active agents requires bus bandwidth: capacity to transmit information (bps) protocol: logic for non-interfering message transmission bus AMBA (adv. Microcontroller bus architecture) from ARM, widely used for SOC bus performance: can determine system performance network on chip array of switches statically switched: eg mesh dynamically switched: eg crossbar

Bus based SOC

Network on a Chip

SOC design approach understand application (compiler, OS, memory and real time constrains) select initial die area, power, performance targets; select initial processors, memory, interconnect assume target processor and interconnect performance, design and evaluate memory evaluate and redesign processors with memory design interconnect to support processors and memory repeat and iterate to optimize

SOC design approach

Processor optimization example given embedded ARM processor in an SOC chip 1 IALU vs 2 IALU vs 3 IALU vs 4 IALU instructions per cycle? 16k L1 instruction cache vs 32k L1 i-cache how much improvement? less power? branch predictor: taken vs not-taken misprediction rate? aim: explore this large design space

Design cost: product economics increasingly product cost determined by design costs, including verification not marginal cost to produce manage complexity in die technology by engineering effort engineering cleverness

Design complexity

Cost: product program vs engineering

Two scenarios fixed costs Kf, support costs 0.1 x fct(n), and variable costs Kv*n, so design get more complex while production costs decrease K1 increases while K2 decreases implicitly requires higher volumes to break even when compared with 1995, in 2005 K1 increased by 10 times K2 decreased by the same amount

Two scenarios

Product volume dictates design effort

Reduce complexity: use IP IP: Intellectual Property

Reduce complexity: reconfig. tech. reconfigurable technology: no fabrication costs lower non-recurring engineering (NRE) costs reconfigurable design: faster and cheaper improve time-to-market reconfigurability in-system upgrade: improve time-in-market run-time adaptation: respond to run-time conditions compile-time reconfiguration: retarget accelerator overhead: performance, area, energy efficiency less effective than ASIC (application-specific IC)

Summary to design and evaluate an SOC, designers need to understand: its components: processors, memory, interconnect applications that it targets SOC economics heavily dependent on: costs: initial design, marginal production volume: applicability, lifetime reducing design complexity Intellectual Property (IP) reconfigurable technology