SoCrates: A Multiprocessor SoC in 40 Days
Mikael Collin, Mälardalen University
Co-authors: Raimo Haukilahti, Mladen Nikitovic, Joakim Adomat

Computer Architecture Lab (CAL), MRTC, Mälardalen University, Västerås, Sweden

Outline
- Introduction & Motivation
- System overview
- Platform description
- Data prefetch functionality
- Application development flow
- Results & Conclusions
- Future work

Introduction & Motivation
Introduction
- A parameterizable MSoC platform, implemented as a master's thesis project by three students
Motivation
- Challenges of SoC design:
  - Design time
  - Verification time
  - Time-to-market
- Predictability (real-time aspects)
- Scalability

System overview
Hardware
- Distributed shared memory (DSM)
- Hardware OS support (RTU)
- Single-FPGA implementation
Software
- Thread-level parallelism (TLP)
- Software-initiated data prefetch
- All-GNU design flow

Platform description
- Generic VHDL description
- Interchangeable components
- Scalable number of processing elements (PEs)
Figure: PEs, the RTU and I/O attached to a common interconnect.

Processing Element (CPU node)
Figure: a CPU/DSP, its local memory and a network interface form one node. The node connects to the interconnect alongside the RTU and I/O.
- Processor types: CPU, DSP or other
- Local memory: fast access, no coherence problem
- Network interface: hides the architectural complexity and acts as an MMU

Processor
- Synthesizable VHDL ARM7TDMI clone, chosen because of the ARM7TDMI's popularity and wide industrial use
- Runs a subset of the ARM instruction set
- Predictability enhancement: no cache and no pipeline
- Prefetch mechanism:
  - Software-initiated
  - Prefetch instruction added to the instruction set
  - Increases predictability

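Data prefetch functionality

```c
extern int d;                            /* shared variable, possibly on another node */
extern void prefetch(const void *addr);  /* the added prefetch instruction (declaration assumed) */
extern int read_sensor(void);            /* sensor routines (declarations assumed) */
extern int read_sensor2(void);

int main(void)
{
    int var1, var2, sum;
    prefetch(&d);                        /* start fetching d via the NI */
    var1 = read_sensor();
    var2 = read_sensor2();
    sum = var1 + var2 + d;               /* d has been fetched in the meantime */
    return 0;
}
```

Figure: the CPU issues pre(&d) to its network interface (NI). The NI fetches the data at &d from memory over the interconnect.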

Application development flow
Figure: the thread code for each node (Node1, Node2) calls createThread(..). It is compiled with gcc and linked with ld together with io.o and OSkernel.o, using ld scripts.

SoCrates system today
Configuration
- 2 CPUs (ARM clone)
- Shared bus (round-robin arbitration)
- 8192 bytes RAM per node
Technology
- Xilinx XCV1K (XCV1000), 1,124,022 gates
Figure: two CPU nodes running threads, the RTU and I/O on the shared bus; 16,384 bytes of shared memory.

Results & Conclusions
Results
- In just 40 days, a multiprocessor SoC was built on a single FPGA
- The system uses 58% of the XCV1000
- A test application runs threads on two CPUs
Conclusions
- It is possible to implement an MSoC on a single FPGA
- A small team working closely together can achieve strong results, because every member keeps a view of the whole system

Future work
- A more scalable interconnect (switches/point-to-point)
- Support for several CPU architectures, including DSPs
- Enhanced prefetch functionality
- Task migration
- GUI-style platform generator (compiler)