Mikael Collin, Mälardalen University
SoCrates - A Multiprocessor SoC in 40 days
Mikael Collin. Co-authors: Raimo Haukilahti, Mladen Nikitovic, Joakim Adomat.
Computer Architecture Lab (CAL), MRTC, Mälardalen University, Västerås, Sweden
Outline
    Introduction & Motivation
    System overview
    Platform description
    Data prefetch functionality
    Application development flow
    Results & Conclusions
    Future work
Introduction & Motivation
  Introduction
    A parameterizable multiprocessor SoC (MSoC) platform, implemented as a master's thesis project by three students.
  Motivation
    Challenges of SoC design
      –Design time
      –Verification time
      –Time-to-market
    Predictability (real-time aspects)
    Scalability
System overview
  Hardware
    Distributed shared memory (DSM)
    Hardware OS support (RTU)
    Single-FPGA implementation
  Software
    Thread-level parallelism (TLP)
    Software-initiated data prefetch
    All-GNU design flow
Platform description
    [Figure: processing elements (PE), RTU, and I/O attached to the interconnect]
    Generic VHDL description
    Interchangeable components
    Scalable number of processing elements
Processing Element (CPU-node)
    [Figure: a CPU node containing CPU/DSP, memory, and network interface, attached to the interconnect alongside the RTU and I/O]
  Processor types
    CPU/DSP
    Other
  Local memory
    Fast access
    No coherence problem
  Network interface
    Hides architectural complexity
    Acts as an MMU
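One way to picture a network interface that "acts as an MMU" is as an address decoder that decides whether an access is served by local memory or forwarded over the interconnect. The sketch below is only an illustration under stated assumptions: the flat address layout, the `ni_route_t` type, and the `ni_decode` function are hypothetical and do not come from the slides; only the 8192-byte node memory size is taken from the configuration slide.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (not from the slides): a flat global address space in
   which node k owns addresses [k * NODE_MEM_BYTES, (k + 1) * NODE_MEM_BYTES). */
#define NODE_MEM_BYTES 8192u

typedef struct {
    int      is_local;   /* 1: served by local memory, 0: sent over the interconnect */
    uint32_t owner_node; /* node whose memory holds the address */
    uint32_t offset;     /* byte offset inside that node's memory */
} ni_route_t;

/* Hypothetical decoder: decide where a global address is served,
   as seen from node `self` in a system of `num_nodes` nodes. */
static ni_route_t ni_decode(uint32_t self, uint32_t num_nodes, uint32_t addr)
{
    ni_route_t r;
    r.owner_node = addr / NODE_MEM_BYTES;
    r.offset     = addr % NODE_MEM_BYTES;
    assert(r.owner_node < num_nodes); /* address must fall in some node's memory */
    r.is_local   = (r.owner_node == self);
    return r;
}
```

With two nodes, `ni_decode(0, 2, 100)` resolves to node 0's local memory, while `ni_decode(0, 2, 8196)` resolves to offset 4 in node 1 and would go over the interconnect. The CPU issues the same kind of access in both cases, which is the sense in which the interface hides the architecture.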
Processor
    Synthesizable VHDL clone of the ARM7TDMI, chosen for its popularity and wide industrial use
    Runs a subset of the ARM instruction set
    Predictability enhancement (no cache or pipeline)
    Prefetch mechanism
      –Software-initiated
      –Prefetch instruction added to the instruction set
      –Increases predictability
Data prefetch functionality
    extern int d;

    int main(void){
        int var1, var2, sum;
        prefetch(&d);
        var1=read_sensor();
        var2=read_sensor2();
        sum=var1+var2+d;
    }
    [Figure: CPU, memory, and network interface (NI) on the interconnect; the CPU issues pre(&d), the NI sends &d over the interconnect and receives the data]
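The slide's example can be turned into a small host-side simulation to show why issuing the prefetch early helps: the transfer overlaps with the sensor reads, so the value is already local when the sum is computed. This is a sketch under assumptions, not the SoCrates implementation. The stub bodies, the values 5, 10, and 20, and the names `remote_d`, `local_copy`, `ni_complete_transfer`, and `sum_with_prefetch` are all hypothetical; only `prefetch`, `read_sensor`, and `read_sensor2` appear on the slide, and their real implementations are not shown there.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the slides do not show how these are implemented. */
static int remote_d = 5;        /* models `d` living in another node's memory */
static int local_copy;          /* models the NI's local buffer for `d` */
static const int *pending_addr; /* address of an outstanding prefetch, or NULL */

/* Start fetching *addr. On SoCrates this is an instruction added to the
   instruction set; here it only records the request. */
static void prefetch(const int *addr) { pending_addr = addr; }

/* Models the NI finishing the transfer while the CPU does other work. */
static void ni_complete_transfer(void)
{
    if (pending_addr != NULL) {
        local_copy = *pending_addr;
        pending_addr = NULL;
    }
}

/* Sensor stubs with made-up readings; the first one also gives the
   simulated NI time to complete the outstanding transfer. */
static int read_sensor(void)  { ni_complete_transfer(); return 10; }
static int read_sensor2(void) { return 20; }

/* Same shape as the slide's main(): prefetch first, use the data last. */
int sum_with_prefetch(void)
{
    int var1, var2, sum;
    prefetch(&remote_d);
    var1 = read_sensor();
    var2 = read_sensor2();
    sum = var1 + var2 + local_copy; /* data is already local by now */
    return sum;
}
```

Calling `sum_with_prefetch()` from a `main` function exercises the whole sequence. The point of the ordering is that the remote access latency is hidden behind work the thread has to do anyway.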
Application development flow
    [Figure: the thread code for each node (Node1, Node2), each calling createThread(..), is compiled with gcc and linked by ld, using ld scripts, together with io.o and OSkernel.o]
SoCrates system today
  Configuration
    2 CPUs (ARM clones)
    Shared bus (round-robin arbitration)
    8192 bytes RAM per node
  Technology
    Xilinx XCV1K
    1,124,022 gates
    16,384 bytes
    [Figure: two CPU nodes running threads, the RTU, and I/O connected by a shared bus]
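Round-robin arbitration on the shared bus can be sketched in a few lines of C. This is a behavioural illustration of the general technique, not the SoCrates VHDL arbiter; the function name `rr_arbitrate`, the bit-mask encoding of requests, and the return convention are assumptions made for the example.

```c
#include <assert.h>

/* Pick the next bus master under round-robin arbitration.
   `requests` is a bit mask (bit i set: master i wants the bus),
   `last` is the master granted most recently.
   Returns the granted master, or -1 if nobody is requesting. */
static int rr_arbitrate(unsigned requests, int last, int num_masters)
{
    assert(num_masters > 0 && num_masters <= 32);
    assert(last >= 0 && last < num_masters);
    /* Start with the master after `last` and wrap around, so the most
       recently granted master has the lowest priority this round. */
    for (int step = 1; step <= num_masters; step++) {
        int candidate = (last + step) % num_masters;
        if (requests & (1u << candidate))
            return candidate;
    }
    return -1;
}
```

With two CPU nodes both requesting (`requests == 0x3`), the grant alternates between them: after master 0 the arbiter picks master 1, and after master 1 it picks master 0. That bounded waiting time is what makes this scheme attractive when predictability matters.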
Results & Conclusions
  Results
    In just 40 days, a multiprocessor SoC on a single FPGA has been constructed
    The system uses 58% of the XCV1000
    A test application runs threads on two CPUs
  Conclusions
    It is possible to implement an MSoC on a single FPGA
    A tight group working closely together can achieve great results thanks to its view of the total system
Future work
    A more scalable interconnect (switches/p2p)
    Support for several CPU architectures, including DSPs
    Enhanced prefetch functionality
    Support for task migration
    GUI-style platform generator (compiler)