Published by Alyson Marshall. Modified over 8 years ago.
1
Multicore CPU with Multi-Threading Operating Environment
2
Overview
Team
Goals
Job Distribution
Multicore
IRQ
WriteBackBuffer
BlockRAM CLK4x
Generic Memory Arbiter
3
Team
Marcel Schaal
Benedikt Weber
Bastian Reitschuster
Rudolf Netzel
4
Goals
Dual-core CPU
WriteBackBuffer
Double Mandelbrot in split screen
5
Job Distribution
Bastian + Rudolf: WriteBackBuffer, interrupts
Marcel + Benedikt: multicore CPU implementation, OS + Hase research
6
Problems
Not enough BlockRAMs
Overclocking the register file
Number of read/write ports
Timing is everything
Clock skew and routing delay
Only 24h a day
Software environment
7
Multicore Architecture
8
WriteBackBuffer
Why? To speed up store instructions.
How? Buffer data and addresses in a FIFO and write the buffered elements into memory whenever possible.
9
WBB implementation
Uses one BlockRAM for the WriteBackBuffer
Two read and two write ports needed
Two read and two write operations per cycle
Internal read and write pointers
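The FIFO with internal read and write pointers can be sketched as a small behavioral model. This is only an illustration in Python, not the team's hardware description: the class name, method names, buffer depth, and the dictionary standing in for memory are all assumptions, and the model handles one store per call rather than the two per cycle mentioned on the slide.

```python
class WriteBackBuffer:
    """Behavioral sketch of a store FIFO (all names are hypothetical)."""

    def __init__(self, depth=8):
        self.depth = depth
        self.slots = [None] * depth  # each slot holds an (address, data) pair
        self.wr_ptr = 0              # next slot to fill
        self.rd_ptr = 0              # oldest buffered store
        self.count = 0               # number of buffered stores

    def full(self):
        return self.count == self.depth

    def push(self, address, data):
        """Core side: buffer a store. Returns False if the core would have to stall."""
        if self.full():
            return False
        self.slots[self.wr_ptr] = (address, data)
        self.wr_ptr = (self.wr_ptr + 1) % self.depth
        self.count += 1
        return True

    def drain_one(self, memory, memory_free):
        """Memory side: retire the oldest store when the memory port is free."""
        if not memory_free or self.count == 0:
            return False
        address, data = self.slots[self.rd_ptr]
        memory[address] = data
        self.rd_ptr = (self.rd_ptr + 1) % self.depth
        self.count -= 1
        return True
```

The core only stalls when the buffer is full, which is where the speedup for store instructions comes from. The sketch leaves out what happens when a load targets an address that is still buffered.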
10
WBB – first design
11
WBB - FSM
12
BlockRAM 2W2R
Why? 2*32 bit of data to write/read, but one BlockRAM has only 2*16 bit.
How? Overclock the BlockRAM by a factor of four: write A, write B, read A, read B.
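The four-phase scheme can be illustrated with a behavioral model in which one system cycle is split into four fast-clock accesses in the order given on the slide. This is a sketch under assumptions: the class and parameter names are hypothetical, and the 2*16 bit port-width detail of the real BlockRAM is not modeled.

```python
class MultiPumpedRam:
    """Sketch of a RAM that serves 2 writes and 2 reads per system cycle
    by running four accesses on a 4x clock (names are hypothetical)."""

    def __init__(self, size):
        self.cells = [0] * size
        self.fast_ticks = 0  # counts fast-clock phases, four per system cycle

    def system_cycle(self, write_a=None, write_b=None, read_a=None, read_b=None):
        """write_a/write_b are (address, data) pairs or None;
        read_a/read_b are addresses or None.
        Returns the two read results as a tuple."""
        # Phases 0 and 1: write A, then write B.
        for request in (write_a, write_b):
            self.fast_ticks += 1
            if request is not None:
                address, data = request
                self.cells[address] = data
        # Phases 2 and 3: read A, then read B.
        results = []
        for address in (read_a, read_b):
            self.fast_ticks += 1
            results.append(None if address is None else self.cells[address])
        return tuple(results)
```

In this model the writes come before the reads, so a read in the same system cycle returns the newly written value, and if both writes hit the same address, write B wins.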
13
ERROR
Because of the limited number of BlockRAMs and even harder timing constraints:
Implementation as Distributed RAM with far fewer entries
14
BlockRAM 2R1W
Why? A simple register file needs four BlockRAMs, which is too many for more than four cores.
How? Again overclock the BlockRAM by a factor of four: register input, read A, read B, and write.
15
Generic Memory Arbiter
Every core needs access to memory
The GMA handles memory requests
Generic in the number of ports
Similar to the memory arbiter of task 2
Needed for the instruction fetch and load/store units
Round-robin implementation
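Round-robin arbitration over a configurable number of ports can be sketched as follows. This is a generic textbook formulation in Python, not the team's implementation; the class name, the method name, and the choice to give port 0 the first grant are assumptions.

```python
class RoundRobinArbiter:
    """Sketch of a round-robin arbiter, generic in the number of ports
    (names are hypothetical)."""

    def __init__(self, num_ports):
        if num_ports < 1:
            raise ValueError("need at least one port")
        self.num_ports = num_ports
        # Start so that port 0 is checked first on the first grant.
        self.last_granted = num_ports - 1

    def grant(self, requests):
        """requests: one boolean per port. Returns the granted port index,
        or None if no port is requesting."""
        if len(requests) != self.num_ports:
            raise ValueError("one request flag per port expected")
        # Search starts just after the previously granted port, so every
        # requesting port is served within num_ports grants.
        for offset in range(1, self.num_ports + 1):
            port = (self.last_granted + offset) % self.num_ports
            if requests[port]:
                self.last_granted = port
                return port
        return None
```

With two cores that each have an instruction fetch unit and a load/store unit, such an arbiter would be instantiated with four ports.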
16
System stuff
Only the master core can handle interrupts
Every core is able to execute another core
The brancher handles the execute opcode (in/out)
17
Hase
Adding new opcodes
Don't let anybody see the source code
Hase miscalculates the destination for labels
18
Sample Video Please wait a moment
19
Surprise: Multi-Threading
Why?
Multipliers need ~10% of the board area
SIMD doesn't utilize the multipliers enough
Load/store unit utilization < 1% (Mandelbrot)
Less board area used than with multi-core
Still two weeks of time available
It is possible, so why not?
20
Features
Generic number of cores, threads and SIMD
Synchronization unit (lock, unlock, access data)
Shared multipliers and load/store unit
Load/store scheduler, similar to the memory arbiters
21
Multi-Threading-Core
22
No final version
Everything went wrong
Too many timing problems
10 ns is hard to achieve (somebody said impossible)
Two weeks are shorter than thought
Strange behaviour of XST, ModelSim, boards
Old working example doesn't work anymore
23
Thank You Questions?