“Low-Power, Real-Time Object-Recognition Processors for Mobile Vision Systems”, IEEE Micro
Jinwook Oh; Gyeonghoon Kim; Injoon Hong; Junyoung Park; Seungjin Lee; Joo-Young Kim; Jeong-Ho Woo; Hoi-Jun Yoo
Presenter: Juseong Lee
Outline
–Introduction
–Background
–Main Idea
–Implementation
–Conclusion
–Evaluation
Object Recognition by Juseong Lee
Introduction
(Image source: MBN News)
Introduction
Object-recognition systems
–Require real-time operation: high performance at low power in mobile systems
How can this be implemented?
–Find a suitable algorithm: SIFT
–Hardware optimization: algorithm optimization and a dedicated processor
–Parallel computation: multi-threading and an NoC
SIFT: Scale-Invariant Feature Transform
NoC: Network-on-Chip
(Image source: VOLVO)
Background Knowledge
What is the SIFT algorithm?
–Scale-Invariant Feature Transform
–The most popular candidate for extracting interest points from an object and describing them
–Robust against changes in translation, scaling, and rotation
(Figure: image matching by SIFT)
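To make the interest-point step concrete, here is a minimal sketch of SIFT's keypoint detection in one octave, assuming a grayscale float image: build a difference-of-Gaussians (DoG) stack and keep pixels that are extrema among their 26 neighbours in space and scale. Sub-pixel refinement, edge-response rejection, orientation assignment and the 128-D descriptor are omitted. The values `sigma0=1.6` and `k=√2` follow common SIFT practice, and `threshold` is an illustrative contrast cut-off; none of these are BONE-V parameters.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur (zero padding at the borders, same output size)."""
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

def dog_keypoints(img, sigma0=1.6, k=2 ** 0.5, levels=5, threshold=0.03):
    """Return (x, y, scale_index) of DoG extrema in a single octave.

    Assumes the image is larger than the largest Gaussian kernel.
    """
    blurred = [blur(img, sigma0 * k ** i) for i in range(levels)]
    dogs = np.stack([blurred[i + 1] - blurred[i] for i in range(levels - 1)])
    n_scales, h, w = dogs.shape
    keypoints = []
    for s in range(1, n_scales - 1):          # need a DoG level above and below
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                v = dogs[s, y, x]
                if abs(v) < threshold:        # discard weak, low-contrast responses
                    continue
                cube = dogs[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
                if v == cube.max() or v == cube.min():  # extremum among 26 neighbours
                    keypoints.append((x, y, s))
    return keypoints
```

Applied to a synthetic bright Gaussian blob (σ = 4) centred at (32, 32) in a 64 × 64 image, the detector should report a DoG minimum at the blob centre at an intermediate scale. The triple loop is written for clarity; it also shows why this stage is so costly on a full high-resolution frame.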
Background Knowledge
What is the problem with SIFT-based object recognition?
–It consumes a lot of power, owing to the heavy computation required for descriptor generation and matching
–Today’s high-resolution image sensors and tight power budgets make a real-time SIFT implementation on mobile devices even harder
–In short: a scarce-resources problem
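Why matching is expensive can be seen from a minimal software sketch of exhaustive descriptor matching, assuming 128-dimensional SIFT descriptors and Lowe's nearest/second-nearest ratio test (the 0.8 ratio is a commonly used value, assumed here). This is a generic baseline, not BONE-V's matching hardware; `matching_cost` is a hypothetical helper that counts multiply-accumulates.

```python
import numpy as np

def match_descriptors(query, train, ratio=0.8):
    """Brute-force nearest-neighbour matching with Lowe's ratio test.

    query: (M, D) array; train: (N, D) array with N >= 2.
    Returns a list of (query_index, train_index) pairs.
    """
    # Squared Euclidean distance for every (query, train) pair: M * N * D work.
    d2 = ((query[:, None, :] - train[None, :, :]) ** 2).sum(axis=2)
    matches = []
    for i, row in enumerate(d2):
        nearest, second = np.argsort(row)[:2]
        # Accept only if the best match is clearly closer than the runner-up.
        if np.sqrt(row[nearest]) < ratio * np.sqrt(row[second]):
            matches.append((i, int(nearest)))
    return matches

def matching_cost(m, n, d=128):
    """Multiply-accumulate count for exhaustive matching of m vs n descriptors."""
    return m * n * d
```

The cost grows as M × N × 128: matching 1,000 query descriptors against a 1,000-entry database already takes `matching_cost(1000, 1000)` = 128,000,000 multiply-accumulates for a single frame.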
Main Idea
How can we solve the problem? Build an object-recognition processor with
–An attention-based recognition algorithm, for energy efficiency
–A heterogeneous multicore architecture, for data and thread parallelism
–Network-on-Chip (NoC) communication, for high bandwidth
The processor determines the regions of interest (ROIs) of the image
–To minimize unnecessary computation
The heterogeneous multicore architecture
–Provides several types of parallelism
–Achieves high throughput at low power consumption
The high-bandwidth NoC serves as the communication backbone
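ROI determination can be sketched as tile selection, assuming a saliency map is already available and that the image is split into fixed-size square tiles. The 40-pixel tile size, the top-k rule, and the names `select_roi_tiles` and `pixels_processed` are illustrative assumptions, not BONE-V's actual tiling or ROI criterion; only the returned tiles would be passed on to feature extraction.

```python
import numpy as np

def select_roi_tiles(saliency, tile=40, max_tiles=8):
    """Return (tile_row, tile_col) of the `max_tiles` most salient tiles, best first.

    saliency: 2-D array whose height and width are multiples of `tile`.
    """
    h, w = saliency.shape
    rows, cols = h // tile, w // tile
    # Mean saliency of each tile: split both axes into (tile index, offset).
    scores = saliency.reshape(rows, tile, cols, tile).mean(axis=(1, 3))
    order = np.argsort(scores, axis=None)[::-1][:max_tiles]
    return [(int(r), int(c)) for r, c in zip(*np.unravel_index(order, scores.shape))]

def pixels_processed(tiles, tile=40):
    """Pixels that feature extraction touches when only ROI tiles are processed."""
    return len(tiles) * tile * tile
```

With 40 × 40 tiles, a 480 × 360 frame has 108 tiles; forwarding 27 of them (25%) means feature extraction touches 43,200 of the 172,800 pixels.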
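Why find ROIs?
–A conventional image-processing algorithm processes the whole image without regard for throughput
–Example: edge detection on a 480 × 360 image requires 480 × 360 = 172,800 per-pixel computations!
–Objects have distinctive features, so you can select only the relevant parts of the image to reduce computation!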
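The arithmetic above can be checked with a small sketch: a Sobel edge detector (the operator is an assumption; the slide only says "edge detection") applied either to the whole 480 × 360 frame or to a rectangular ROI, returning how many pixels it actually processed. The ROI coordinates in the example are hypothetical.

```python
import numpy as np

def sobel_magnitude(img, roi=None):
    """Sobel gradient magnitude over the whole image or only inside an ROI.

    roi: optional (y0, y1, x0, x1) half-open pixel rectangle.
    Returns (magnitude map, number of pixels actually processed).
    """
    h, w = img.shape
    y0, y1, x0, x1 = roi if roi is not None else (0, h, 0, w)
    p = np.pad(img.astype(float), 1, mode="edge")  # replicate borders

    def shifted(dy, dx):
        # Neighbour (dy, dx) of every ROI pixel; pixel (y, x) is p[y + 1, x + 1].
        return p[y0 + 1 + dy:y1 + 1 + dy, x0 + 1 + dx:x1 + 1 + dx]

    gx = (shifted(-1, 1) + 2 * shifted(0, 1) + shifted(1, 1)
          - shifted(-1, -1) - 2 * shifted(0, -1) - shifted(1, -1))
    gy = (shifted(1, -1) + 2 * shifted(1, 0) + shifted(1, 1)
          - shifted(-1, -1) - 2 * shifted(-1, 0) - shifted(-1, 1))
    mag = np.zeros((h, w))
    mag[y0:y1, x0:x1] = np.hypot(gx, gy)
    return mag, (y1 - y0) * (x1 - x0)
```

On the full frame the pixel count is 480 × 360 = 172,800; restricting the same detector to a 100 × 80 ROI such as `roi=(100, 200, 200, 280)` cuts it to 8,000 pixel operations, about 4.6% of the frame, while giving identical results inside the ROI.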
Main Idea – BONE-V
(Figure: using the conventional method vs. using the main idea)
Main Idea – Algorithm
Attention-based object recognition
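The "attention" step can be illustrated with a simple bottom-up saliency map. The sketch below assumes a single intensity channel and a center-surround difference of two Gaussian blurs, in the spirit of classic Itti-Koch saliency; the attention model actually used in BONE-V may rely on different features, and `center_sigma` and `surround_sigma` are hypothetical parameters.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with reflected borders (output keeps the input shape)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    k /= k.sum()
    p = np.pad(img.astype(float), radius, mode="reflect")
    # "valid" convolution on the padded rows/columns restores the original size.
    p = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, p)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, p)

def intensity_saliency(img, center_sigma=2.0, surround_sigma=8.0):
    """Center-surround intensity contrast, normalised to [0, 1]."""
    contrast = np.abs(gaussian_blur(img, center_sigma) - gaussian_blur(img, surround_sigma))
    peak = contrast.max()
    return contrast / peak if peak > 0 else contrast
```

A small bright patch on a dark background produces a saliency peak on the patch; the regions of interest would then be the image tiles with the highest average saliency, and only those would receive full feature extraction.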
Main Idea – Architecture
–Pixel-level parallelism
–Very long instruction word (VLIW)
–3-stage task-level pipeline: 1.5x lower power consumption
–5-stage fine-grained pipeline: 3.45x higher pipeline throughput
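Why a finer-grained pipeline raises throughput can be seen with a back-of-the-envelope model: in a synchronously clocked pipeline the clock period is set by the slowest stage, so splitting long stages shortens the period. The stage latencies below are made-up numbers for illustration only; they are not BONE-V measurements and are not meant to reproduce the 3.45x figure.

```python
def clock_period_ns(stage_latencies_ns):
    """A synchronous pipeline is clocked by its slowest stage."""
    return max(stage_latencies_ns)

def throughput_per_us(stage_latencies_ns):
    """Items completed per microsecond once the pipeline is full."""
    return 1000.0 / clock_period_ns(stage_latencies_ns)

def latency_ns(stage_latencies_ns):
    """Time for one item to traverse every stage."""
    return len(stage_latencies_ns) * clock_period_ns(stage_latencies_ns)

# Hypothetical stage latencies (ns); both pipelines do 200 ns of total work.
coarse = [40, 100, 60]          # 3 stages, unbalanced
fine = [40, 50, 50, 30, 30]     # the 100 ns and 60 ns stages split in two
speedup = throughput_per_us(fine) / throughput_per_us(coarse)
```

Splitting the 100 ns stage halves the clock period and doubles throughput in this toy model (10 to 20 items per microsecond). Real gains also depend on how well the stages balance and on pipeline-register overhead, which this model ignores.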
SMT-enabled heterogeneous multicore processor (BONE-V5)
Throughput-optimized SFEC
–Finds ROI tiles for energy efficiency
–Exploits memory locality for high bandwidth utilization
Latency-optimized FMP
–ROI tiles and the NoC help reduce latency
Power-optimized MLE
–Dynamically changes the cores’ thread allocation, operating voltage, and frequency
SMT: Simultaneous Multithreading
SFEC: SMT-enabled Feature Extraction Cluster
FMP: Feature Matching Processor
MLE: Machine Learning Engine
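The MLE's power-optimization role can be illustrated with a minimal dynamic voltage and frequency scaling (DVFS) sketch, assuming the per-frame workload in cycles can be predicted (for example, from the number of ROI tiles) and that dynamic power scales as C·V²·f. The operating-point table, the 33 ms frame deadline used in the example, and all names are hypothetical; thread allocation is not modelled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OperatingPoint:
    voltage_v: float
    freq_mhz: float

# Hypothetical operating points, slowest first (not BONE-V's real table).
LEVELS = [
    OperatingPoint(0.8, 100.0),
    OperatingPoint(1.0, 200.0),
    OperatingPoint(1.2, 400.0),
]

def choose_level(cycles, deadline_ms, levels=LEVELS):
    """Slowest operating point that still finishes `cycles` within the deadline."""
    for level in levels:
        time_ms = cycles / (level.freq_mhz * 1e3)   # 1 MHz = 1e3 cycles per ms
        if time_ms <= deadline_ms:
            return level
    return levels[-1]  # deadline cannot be met: fall back to the fastest point

def relative_dynamic_power(level, reference):
    """Dynamic power ~ C * V^2 * f; the switched capacitance C cancels."""
    return (level.voltage_v ** 2 * level.freq_mhz) / (
        reference.voltage_v ** 2 * reference.freq_mhz
    )
```

At a 30 fps budget of roughly 33 ms per frame, a light frame of 2 million cycles can run at the lowest point, which under the V²·f model draws about 11% of the dynamic power of the fastest point; a heavier frame of 5 million cycles moves up to the middle point.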
Implementation
Implementation – Comparison
Implementation – Comparison (continued)
Conclusion
–An energy-efficient system is essential for improving performance
–Algorithm and architecture must be optimized together
–BONE-V multicore processors can be applied to real-time object-recognition systems
–Future BONE-V processors will further lower power consumption
Evaluation
–Table 3 should include results comparing BONE-V with other recognition processors
–Hardware optimization requires optimizing not only the overall algorithm but also particular algorithm blocks
–e.g., CORDIC-based gradient and magnitude computation
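The CORDIC suggestion can be illustrated with a floating-point sketch of CORDIC in vectoring mode, which returns both the magnitude and the orientation of an image gradient (the quantities SIFT's orientation histograms need) using only additions, scaling by powers of two, and a small arctangent table. A hardware block would use fixed-point right shifts instead of `2.0 ** -i`; the 16-iteration count is an assumption chosen for an angle error below about 1e-4 rad.

```python
import math

N_ITER = 16
# Micro-rotation angles atan(2^-i); in hardware this is a small ROM table.
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]
# Inverse of the CORDIC gain prod(sqrt(1 + 2^-2i)), applied once at the end.
K = 1.0
for _i in range(N_ITER):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * _i))

def cordic_vector(gx, gy):
    """Return (magnitude, angle in radians) of the gradient vector (gx, gy)."""
    x, y, z = float(gx), float(gy), 0.0
    # Pre-rotate by 180 degrees into the right half-plane, where the
    # iterations converge, and record that rotation in the angle accumulator.
    if x < 0:
        x, y, z = -x, -y, (math.pi if y >= 0 else -math.pi)
    for i in range(N_ITER):
        d = -1.0 if y > 0 else 1.0          # rotate toward the x-axis
        shift = 2.0 ** -i                   # a right shift by i bits in fixed point
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * ANGLES[i]                  # accumulate the rotation applied
    return x * K, z
```

For a gradient of (3, 4) this yields a magnitude close to 5 and an angle close to `math.atan2(4, 3)`, without any multiplier or square-root unit in the iteration loop, which is what makes CORDIC attractive for a dedicated gradient block.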
Thank you for listening!