Published by Solomon York. Modified over 9 years ago.
1
Christian Steinle, University of Heidelberg, Institute of Computer Engineering
L1 Hough Tracking on a Sony Playstation III
Christian Steinle, Andreas Kugel, Reinhard Männer
Computer Engineering, University of Heidelberg
Contents
–Motivation for the Playstation implementation
–Algorithm adaption
–Job restrictions
–Algorithm implementation
–Status
2
Motivation for the Playstation implementation
Hough space: one dimension for each parameter of a track
–bending 1/P_z and the angles P_x/P_z and P_y/P_z
–Hough space dimensions: 95 x 31 (cells) x 191 (layers)
–A detector slice with constant angle corresponds to one 2D Hough histogram
–Detector slices overlap (multiple scattering)
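To make the sizes concrete, the following C++ sketch computes the cell counts from the dimensions quoted above and shows one possible flat indexing of the histogram. The names (`kCellsA`, `kCellsB`, `cellIndex`) and the layer-major ordering are assumptions for illustration only; the slides do not say how the original code lays out the histogram.

```cpp
#include <cassert>
#include <cstddef>

// Dimensions quoted on the slide: 95 x 31 cells per layer, 191 layers.
// Which of the two cell dimensions is "A" or "B" is an assumption.
constexpr std::size_t kCellsA = 95;
constexpr std::size_t kCellsB = 31;
constexpr std::size_t kLayers = 191;

// Number of cells in one 2D Hough histogram (one layer).
constexpr std::size_t cellsPerLayer() { return kCellsA * kCellsB; }

// Number of cells in the whole 3D Hough space.
constexpr std::size_t totalCells() { return cellsPerLayer() * kLayers; }

// Hypothetical flat index with the layer as the slowest-varying coordinate.
inline std::size_t cellIndex(std::size_t layer, std::size_t b, std::size_t a) {
    assert(layer < kLayers && b < kCellsB && a < kCellsA);
    return (layer * kCellsB + b) * kCellsA + a;
}
```

With these dimensions one layer holds 2,945 cells and the full Hough space holds 562,495 cells.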
3
Motivation for the Playstation implementation
L1 Hough Tracking algorithm development in HDL for FPGA
–HDL descriptions for all critical functional units
–Synthesis of a single histogram layer implemented with registers requires about 30,000 logic cells
–About 50% of the FPGA's registers (XC4VLX60: 59,904) are used for one layer
Conclusion: Develop a multi-chip solution
4
Motivation for the Playstation implementation
Conclusion: Use the Cell BE of a Sony Playstation III as a cheap and flexible rapid prototyping system
Figure labels: Multi-chip algorithm, Multi-core platform
5
Algorithm adaption
Conclusion: Two parallelism levels:
–multiple processors work on input hit data in parallel
–vector-capable ALUs process multiple layers in parallel
Creation of input data packages as so-called jobs
6
Job restrictions
1. Balance workload for the processors
Number of histogram layers: 191
Number of processors for a Sony Playstation III: 6
–Cell BE: 8, but 1 disabled and 1 reserved for the OS
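One simple way to balance 191 layers over 6 processors is an even contiguous split, sketched below in C++. This is not taken from the slides: the function `layersForWorker` and the struct `LayerRange` are hypothetical, and the original implementation may balance by hit count or job size instead.

```cpp
#include <cassert>
#include <cstddef>

// Contiguous block of histogram layers assigned to one processor.
struct LayerRange {
    std::size_t first;  // index of the first layer in the block
    std::size_t count;  // number of layers in the block
};

// Even split: the first (layers % workers) workers get one extra layer.
inline LayerRange layersForWorker(std::size_t worker, std::size_t workers,
                                  std::size_t layers) {
    assert(workers > 0 && worker < workers);
    const std::size_t base = layers / workers;
    const std::size_t extra = layers % workers;
    const std::size_t count = base + (worker < extra ? 1 : 0);
    const std::size_t first = worker * base + (worker < extra ? worker : extra);
    return LayerRange{first, count};
}
```

For 191 layers and 6 workers this gives five blocks of 32 layers and one block of 31 layers (191 = 5 * 32 + 31).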
7
Job restrictions
2. Optimal use of the processor's vector-capable ALU
Each histogram cell contains a 5 bit signature (1 bit per detector)
The vector-capable ALU is 128 bit wide
–128 bit / (5 bit per signature) = 25 signatures in parallel
Speed gain by using a native type instead of bit manipulation
–unsigned char (8 bit) is usable by both the ALU and the memory
–Each histogram cell then contains 3 unused bits
–128 bit / (8 bit per signature) = 16 signatures in parallel
Random access requires composing the ALU vector from 16 separate elements in memory
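The lane arithmetic above can be checked with a short, portable C++ sketch. It models the 128 bit vector as an array of 16 bytes rather than using SPU intrinsics, and it assumes that a hit from detector d sets bit d of the signature via a bitwise OR; neither the names (`SignatureVector`, `setDetectorBit`, `orInto`) nor the OR update come from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kVectorBits = 128;   // width of the vector ALU
constexpr std::size_t kSignatureBits = 5;  // 1 bit per detector
constexpr std::size_t kStorageBits = 8;    // unsigned char per histogram cell

// Number of elements of a given width that fit into one vector.
constexpr std::size_t lanes(std::size_t vectorBits, std::size_t elemBits) {
    return vectorBits / elemBits;
}

// Portable stand-in for a 128 bit vector of 16 one-byte signatures.
struct SignatureVector {
    std::uint8_t lane[16];
};

// Assumed update rule: mark the given detector in a signature.
inline std::uint8_t setDetectorBit(std::uint8_t signature, unsigned detector) {
    assert(detector < kSignatureBits);
    return static_cast<std::uint8_t>(signature | (1u << detector));
}

// OR all 16 lanes at once; on a vector ALU this would be a single instruction.
inline void orInto(SignatureVector& dst, const SignatureVector& src) {
    for (std::size_t i = 0; i < 16; ++i) {
        dst.lane[i] = static_cast<std::uint8_t>(dst.lane[i] | src.lane[i]);
    }
}
```

Packing at 5 bit would give 25 lanes but needs shifts and masks per access; storing one signature per byte gives 16 lanes and leaves 3 bits of each byte unused.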
8
Job restrictions
A Hough-transformed hit contains γ_min and γ_max
The hit is inserted into γ_max - γ_min + 1 consecutive layers
Speed gain by optimizing the histogram memory structure so that the consecutive histogram layer entries of a single hit lie together
Direct memory access on up to 16 consecutive layers in parallel
Each hit in a job is processed systolically just once, in parallel
A single hit can contribute to more than one job (γ_max - γ_min > 16, or γ_max of the hit > γ_max of the current job)
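A minimal C++ sketch of how a hit's layer range could map onto jobs follows. It assumes that job j covers the 16 aligned layers 16j to 16j + 15; the slides only state that up to 16 consecutive layers are accessed in parallel, so this alignment and all names (`Hit`, `firstJob`, `laneMask`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLayersPerJob = 16;  // layers processed in parallel

// Layer range of one Hough-transformed hit (inclusive bounds).
struct Hit {
    std::size_t gammaMin;
    std::size_t gammaMax;
};

// Number of consecutive layers the hit is inserted into.
inline std::size_t layerCount(const Hit& h) {
    assert(h.gammaMin <= h.gammaMax);
    return h.gammaMax - h.gammaMin + 1;
}

// First and last job touched by the hit, assuming aligned 16-layer jobs.
inline std::size_t firstJob(const Hit& h) { return h.gammaMin / kLayersPerJob; }
inline std::size_t lastJob(const Hit& h) { return h.gammaMax / kLayersPerJob; }

// Bit i is set if lane i of the given job (layer job*16 + i) receives the hit.
inline std::uint16_t laneMask(const Hit& h, std::size_t job) {
    const std::size_t lo = job * kLayersPerJob;
    std::uint16_t mask = 0;
    for (std::size_t lane = 0; lane < kLayersPerJob; ++lane) {
        const std::size_t layer = lo + lane;
        if (layer >= h.gammaMin && layer <= h.gammaMax) {
            mask = static_cast<std::uint16_t>(mask | (1u << lane));
        }
    }
    return mask;
}
```

Under this assumption a hit with γ_min = 10 and γ_max = 20 spans 11 layers and contributes to two jobs, even though its span is below 16, because it crosses the upper boundary of the first job.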
9
Job restrictions
3. Optimal use of the processor's local storage
Local storage contains program code, static data, heap and stack
Conclusion:
–Used memory for job = codingTable + histogram + input data
–Available memory = stackPointer - heapPointer - securityRegion
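The two formulas above translate directly into a fit check, sketched here in C++. The struct and function names are hypothetical, all sizes are in bytes, and the example numbers in the note below are made up for illustration; they are not measurements from the original code.

```cpp
#include <cassert>
#include <cstddef>

// Memory a single job needs in the local storage (bytes).
struct JobMemory {
    std::size_t codingTable;
    std::size_t histogram;
    std::size_t inputData;
};

// Snapshot of the local storage layout (addresses and sizes in bytes).
struct LocalStore {
    std::size_t stackPointer;    // current top of the downward-growing stack
    std::size_t heapPointer;     // current end of the upward-growing heap
    std::size_t securityRegion;  // safety margin kept free between the two
};

// Used memory for job = codingTable + histogram + input data.
constexpr std::size_t usedBytes(const JobMemory& j) {
    return j.codingTable + j.histogram + j.inputData;
}

// Available memory = stackPointer - heapPointer - securityRegion,
// clamped to zero so the unsigned subtraction cannot wrap around.
inline std::size_t availableBytes(const LocalStore& ls) {
    assert(ls.heapPointer <= ls.stackPointer);
    const std::size_t gap = ls.stackPointer - ls.heapPointer;
    return gap > ls.securityRegion ? gap - ls.securityRegion : 0;
}

inline bool jobFits(const JobMemory& j, const LocalStore& ls) {
    return usedBytes(j) <= availableBytes(ls);
}
```

For example, a histogram of 16 layers with 95 x 31 one-byte cells needs 47,120 bytes; together with an assumed 1,024 byte coding table and 8,192 bytes of input data, the job uses 56,336 bytes.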
10
Algorithm implementation
Configuration part on the PPU
11
Algorithm implementation
Configuration part on the SPU
12
Algorithm implementation
Different versions of the job creation are possible
–CODEVERSION 1: Create all jobs with a loop over all hits before the SPU processing
–CODEVERSION 2: Create all jobs with a loop over all jobs before the SPU processing
–CODEVERSION 3: Create the jobs in a pipeline that runs in parallel with the SPU processing
–CODEVERSION 4: Create the jobs for dedicated SPUs when they are ready
13
Algorithm implementation
Processing part one on the PPU
14
Algorithm implementation
Processing part two on the PPU
15
Algorithm implementation
Processing part three on the PPU
16
Algorithm implementation
Different versions of the histogram memory usage
–MEMORYVERSION 1: Use vector type for memory and computation
–MEMORYVERSION 2: Use smallest type for a single histogram cell in the memory and build vector type for computation
–MEMORYVERSION 3: Use smallest type for histogram cells in parallel layers and build vector type for computation
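The difference between building the vector from scattered cells and from cells of parallel layers can be sketched in portable C++. This is one reading of the memory versions, not the original code: it assumes that a layer-major byte array requires a strided gather of 16 elements, while a cell-major array keeps the 16 parallel layers of one cell adjacent. `Lanes`, `gatherLayerMajor` and `loadCellMajor` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kLanes = 16;  // layers processed in parallel
using Lanes = std::array<std::uint8_t, kLanes>;

// Layer-major storage: index = layer * cells + cell.
// The 16 layer entries of one cell are 'cells' bytes apart (strided gather).
inline Lanes gatherLayerMajor(const std::vector<std::uint8_t>& hist,
                              std::size_t cells, std::size_t cell) {
    assert(cell < cells && hist.size() >= kLanes * cells);
    Lanes v{};
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        v[lane] = hist[lane * cells + cell];
    }
    return v;
}

// Cell-major storage: index = cell * kLanes + layer.
// The 16 layer entries of one cell are contiguous (one 16 byte load).
inline Lanes loadCellMajor(const std::vector<std::uint8_t>& hist,
                           std::size_t cell) {
    assert(hist.size() >= (cell + 1) * kLanes);
    Lanes v{};
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        v[lane] = hist[cell * kLanes + lane];
    }
    return v;
}
```

Both functions return the same 16 values for the same logical histogram; the cell-major layout simply replaces 16 scattered reads with one contiguous block, which is the kind of access a vector load handles well.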
17
Algorithm implementation
Processing part on the SPU
18
Status
Algorithm adaption is implemented
Hough processing still has to be implemented:
–Current step in scalar version: peakfinding2D
–Missing steps in scalar version: peakfinding3D
–Current step in vector version: diagonalization
–Missing steps in vector version: peakfinding2D, peakfinding3D
Measure the timing
Compare the timing with
–the timing of the C++ framework implementation
–the estimated timing of the FPGA implementation