FPGA-based Prototyping of the Multi-Level Computing Architecture
Presented by Davor Capalija. Supervisor: Prof. Tarek S. Abdelrahman.
Connections 2006, June 9, 2006

Presentation transcript:

Slide 1: FPGA-based Prototyping of the Multi-Level Computing Architecture
Presented by Davor Capalija
Supervisor: Prof. Tarek S. Abdelrahman

Slide 2: A modern processor
Superscalar, out-of-order and speculative execution
[Diagram labels: Control Unit, Instruction Queue, Register File, Memory, Execution units (XU)]

Slide 3: Multi-level Computing Architecture
Control program:
    while(…) {
        Allocate (out frame)
        Preprocess(…)
        Analyze(…)
        Output(…)
    }
[Diagram labels: Control Program, Task instruction, Control Processor, Task Scheduler, Universal Register File, PU (two shown), Tasks (Allocate(), Preprocess(), Analyze()), Shared Memory]
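The slide places a loop of task instructions next to a task scheduler and several PUs. As a rough illustration of why that arrangement can overlap work, here is a minimal Python sketch of dependence-driven task issue. It is not the MLCA scheduler: the register names (`frame`, `pre`, `res`), the unit task latency, the `Task` record, and the `iteration` and `schedule` helpers are all assumptions made for this example, and the argument lists that the slide elides with "…" are filled with hypothetical registers only so that the code runs.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    reads: tuple = ()   # registers the task consumes (hypothetical names)
    writes: tuple = ()  # registers the task produces (hypothetical names)


def raw_dependences(tasks):
    """For each task, the indices of earlier tasks whose outputs it reads."""
    last_writer = {}
    deps = []
    for i, task in enumerate(tasks):
        deps.append({last_writer[r] for r in task.reads if r in last_writer})
        for r in task.writes:
            last_writer[r] = i
    return deps


def schedule(tasks, num_pus):
    """Issue up to num_pus ready tasks per step; every task takes one step."""
    if num_pus < 1:
        raise ValueError("need at least one PU")
    deps = raw_dependences(tasks)
    done, steps = set(), []
    while len(done) < len(tasks):
        # A task is ready once every producer it depends on has finished.
        ready = [i for i in range(len(tasks))
                 if i not in done and deps[i] <= done]
        issued = ready[:num_pus]
        steps.append([tasks[i].name for i in issued])
        done.update(issued)
    return steps


def iteration(k):
    """One pass of the loop body, with per-iteration (already renamed) registers."""
    return [
        Task(f"Allocate#{k}", writes=(f"frame{k}",)),
        Task(f"Preprocess#{k}", reads=(f"frame{k}",), writes=(f"pre{k}",)),
        Task(f"Analyze#{k}", reads=(f"pre{k}",), writes=(f"res{k}",)),
        Task(f"Output#{k}", reads=(f"res{k}",)),
    ]


program = iteration(0) + iteration(1)
for step, names in enumerate(schedule(program, num_pus=2), start=1):
    print(step, names)
```

Under these assumptions the two iterations are independent, so two PUs finish the eight tasks in four steps, while a single PU needs eight.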

Slide 4: Previous work in the MLCA group
Automatic task formation
    – Kirk Stewart
Compile-time optimizations to extract parallelism
    – Utku Aydonat
Task memory management
    – Ahmed Abdelkhalek
Power optimization using dynamic voltage scaling
    – Ivan Matosevic
Work done using a high-level functional simulator

Slide 5: Motivation and goal
Realistic cycle-accurate evaluation using an FPGA-based prototype
    – Feasibility of hardware implementation
Deliver scalable performance
    – The control processor is expected to be a bottleneck
Custom hardware design of the control processor
    – Contribution: microarchitecture of the control processor

Slide 6: Challenges
Mapping the architecture to FPGA device resources
High requirements for on-chip memory: blocks, capacity & ports
    – System: shared memory, URF
    – PUs: caches, private and instruction memories
    – CP: renaming tables, task queues
Control processor microarchitecture design space
    – Performance vs. area trade-offs
Support for speculative execution of tasks
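The slide lists renaming tables among the control processor's on-chip memory needs but does not describe them. The sketch below shows textbook register renaming applied to task arguments, as one plausible reading: each write to a named register gets a fresh physical register, so a later loop iteration does not have to wait for an earlier one that reuses the same name. The `RenameTable` class, its method names, and the register names are hypothetical, and freeing of physical registers is left out to keep the example short.

```python
class RenameTable:
    """Maps architectural register names to physical register numbers."""

    def __init__(self, num_physical):
        self.free = list(range(num_physical))  # unallocated physical registers
        self.mapping = {}                      # name -> current physical register

    def rename(self, reads, writes):
        # Sources use the mapping in effect before this task's own writes.
        srcs = [self.mapping[r] for r in reads]
        dsts = []
        for r in writes:
            if not self.free:
                raise RuntimeError("no free physical register; issue would stall")
            phys = self.free.pop(0)
            self.mapping[r] = phys
            dsts.append(phys)
        return srcs, dsts


table = RenameTable(num_physical=4)
first_alloc = table.rename(reads=[], writes=["frame"])
first_pre = table.rename(reads=["frame"], writes=["pre"])
second_alloc = table.rename(reads=[], writes=["frame"])  # same name, new register
second_pre = table.rename(reads=["frame"], writes=["pre"])
print(first_alloc, first_pre, second_alloc, second_pre)
```

Because the second `frame` lands in a different physical register than the first, the two iterations share no storage and only true producer-to-consumer dependences remain. The table size bounds how many tasks can be in flight, which is one way the performance vs. area trade-off named on the slide could show up.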

Slide 7: Status
Initial FPGA-based prototype
    – Nios II Development Board, Stratix Pro Edition (1S40)
    – Based on initial implementation by David Han
PUs: Altera Nios II/f processors
Interconnect: Altera Avalon interconnect
Memory: both on-chip & off-chip
Software-based control processor
    – Emulated on one Nios II/f processor
Determining and removing bottlenecks
Next step: microarchitecture of the Control Processor

Slide 8: Bonus
[Block diagram of the prototype on the FPGA device. Labels: PU1 to PU4, each with I$, D$, an instruction memory (Ins1 M to Ins4 M) and a private memory (Priv1 M to Priv4 M); CP with I$, D$ and CP's mem; CP TQRT; Comm1 to Comm4; Shared memory; Universal Register File]