Physical Design of FabScalar Generated Superscalar Processors EE6052 Class Project Wei Zhang.



Outline
- Heterogeneous Multi-Core Processors
- FabScalar
- Physical design of FabScalar generated cores
- Things we can do
- Conclusion

Heterogeneous Multi-Core Processors

Heterogeneous multi-core processor
- Contains multiple, differently designed superscalar core types that can streamline the execution of diverse programs.
- The core types differ from each other and target different applications.

Superscalar processor
- Exploits instruction-level parallelism by executing more than one instruction per clock cycle.
- Dispatches instructions to redundant functional units on the processor, such as ALUs, multipliers, and bit shifters.

"Achilles' heel" of heterogeneous multi-core processor design
- Design and verification effort is multiplied by the number of different core types, which limits the amount of architectural diversity that can be practically implemented.

FabScalar
- A toolset developed to automatically compose RTL designs of arbitrary cores within a canonical superscalar template.
- Frames superscalar processors in a canonical template that defines canonical pipeline stages and the interfaces among them.
- A Canonical Pipeline Stage Library (CPSL) provides many implementations of each canonical pipeline stage that differ in their superscalar dimensions.
- An RTL generation tool uses the template and the CPSL to automatically generate an overall core of the desired configuration.
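The composition step can be pictured with a small Python sketch. It is only an illustration of the idea of picking one library variant per canonical stage, not FabScalar's actual generator: the stage names, widths, module names, and the `compose_core` function are all hypothetical.

```python
# Hypothetical canonical template: an ordered list of stage names.
# These three stages are illustrative, not the full FabScalar template.
TEMPLATE = ["fetch", "rename", "issue"]

# Hypothetical stand-in for the CPSL: for each stage, several
# implementations keyed by superscalar width.
CPSL = {
    "fetch": {1: "fetch_w1", 2: "fetch_w2", 4: "fetch_w4"},
    "rename": {1: "rename_w1", 2: "rename_w2", 4: "rename_w4"},
    "issue": {1: "issue_w1", 2: "issue_w2", 4: "issue_w4"},
}


def compose_core(config):
    """Select one library variant per canonical stage, in template order.

    `config` maps each stage name to the desired width.
    """
    modules = []
    for stage in TEMPLATE:
        width = config[stage]
        variants = CPSL[stage]
        if width not in variants:
            raise ValueError(f"no {stage} variant with width {width}")
        modules.append(variants[width])
    return modules


if __name__ == "__main__":
    print(compose_core({"fetch": 4, "rename": 4, "issue": 2}))
```

Because every variant of a stage is assumed to share the same interface, swapping one variant for another does not affect the neighboring stages. That is the property the canonical template is meant to provide.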

Canonical Pipeline Stages
- The pipeline stages and interfaces are the same for all the superscalar processors.
- Each pipeline stage is composed from the Canonical Pipeline Stage Library.

Canonical Pipeline Stage Library

FabMem: A Multiported RAM and CAM Compiler
- Estimates read/write delays, read/write energies, and areas of user-specified multi-ported RAMs/CAMs.
- Generates layouts of desired RAMs/CAMs.

Limitations
- FabMem is tied to a specific technology (FreePDK45).
- The largest supported RAM is 512 words.
- FabMem can generate RAMs for only 2XR-XW and XR-XW configurations. The maximum number of read ports is 16. The maximum number of write ports is 8.
- The degree of column muxing in RAMs is limited to 1, 2, and 4.
- The largest supported CAM is 256 words.
- FabMem can generate CAMs for only XR-XW configurations. The maximum number of read ports is 8. The maximum number of write ports is 8.
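The limits listed above can be collected into a quick feasibility check, sketched below in Python. This is not part of FabMem: the function names and parameters are hypothetical, and the sketch assumes that "2XR-XW" means twice as many read ports as write ports and that "XR-XW" means equal numbers of each.

```python
def check_ram(words, read_ports, write_ports, col_mux=1):
    """Return a list of limit violations for a requested RAM (empty if none)."""
    problems = []
    if words > 512:
        problems.append("RAM larger than 512 words")
    if read_ports > 16:
        problems.append("more than 16 read ports")
    if write_ports > 8:
        problems.append("more than 8 write ports")
    # Assumed reading of 2XR-XW / XR-XW: reads == 2 * writes or reads == writes.
    if read_ports not in (write_ports, 2 * write_ports):
        problems.append("port mix is neither XR-XW nor 2XR-XW")
    if col_mux not in (1, 2, 4):
        problems.append("column muxing must be 1, 2, or 4")
    return problems


def check_cam(words, read_ports, write_ports):
    """Return a list of limit violations for a requested CAM (empty if none)."""
    problems = []
    if words > 256:
        problems.append("CAM larger than 256 words")
    if read_ports > 8:
        problems.append("more than 8 read ports")
    if write_ports > 8:
        problems.append("more than 8 write ports")
    # Assumed reading of XR-XW: equal read and write port counts.
    if read_ports != write_ports:
        problems.append("port mix is not XR-XW")
    return problems


if __name__ == "__main__":
    print(check_ram(words=128, read_ports=8, write_ports=4, col_mux=2))
    print(check_cam(words=512, read_ports=4, write_ports=4))
```

A structure that fails such a check would need a different implementation, for example one synthesized from standard cells or produced by another memory compiler.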

Physical Design – by FabScalar Group
- Use FabMem to generate custom designs of critical memory structures.
- Coarse-grained floorplanning.
- Little consideration of power, area, and performance issues.
- Use FreePDK45, a 45 nm standard cell library, for logic synthesis and place-and-route.

Physical Design – Things We Can Do
- Floorplanning
- Data path design
- Memory design – comparison between FabMem and other memory compilers
- Cadence vs. Synopsys
- Power planning

Conclusion
- Heterogeneous multi-core processors have many advantages.
- FabScalar uses canonical pipeline stages for superscalar processor generation.
- Physical design of FabScalar generated cores remains to be further investigated.