Published by Chloe Sanders. Modified over 9 years ago.
1
Lu Hao. Profiling-Based Hardware/Software Co-Exploration for the Design of Video Coding Architectures. Heiko Hübert and Benno Stabernack
2
Contents 1. Background 2. MEMTRACE profiler 3. Software/Hardware Optimization 4. Conclusion
3
Background -- profiling Profiling is used to understand the run-time behavior of applications.
4
Efficient profiling approaches
Software profiling: sampling, instrumentation. Flexible, but with high overhead.
Hardware profiling: performance counters. Inexpensive, but more rigid and possibly not universally available.
Hybrid: combinations of the above. These hold great potential since they combine the advantages of both without the drawbacks.
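To make the software-profiling case concrete, here is a minimal sketch of instrumentation: each function is wrapped so that every call updates a call counter and an elapsed-time total. The names `stats`, `instrumented`, and `sum_abs_diff` are hypothetical and chosen only for illustration; they are not part of any tool discussed in these slides. The extra work done on every call is also where the high overhead of this approach comes from.

```python
import functools
import time
from collections import defaultdict

# Per-function statistics gathered by the instrumentation (hypothetical layout).
stats = defaultdict(lambda: {"calls": 0, "seconds": 0.0})


def instrumented(func):
    """Wrap func so that every call updates its call count and elapsed time."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            entry = stats[func.__name__]
            entry["calls"] += 1
            entry["seconds"] += time.perf_counter() - start
    return wrapper


@instrumented
def sum_abs_diff(a, b):
    """Toy workload: sum of absolute differences of two equal-length blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Run the instrumented function a few times, then inspect the statistics.
for _ in range(3):
    sum_abs_diff([1, 2, 3, 4], [4, 3, 2, 1])
print(stats["sum_abs_diff"]["calls"])
```

A sampling profiler would instead interrupt the program periodically and record where it currently is, trading accuracy for lower overhead.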
5
An example of hardware profiling PC – Performance Counter
6
Background – system analysis Why do we need profiling? To find an efficient solution, it is very important to adapt the system to the application. Video coding
7
Contents 1. Background 2. MEMTRACE profiler 3. Software/Hardware Optimization 4. Conclusion
8
MEMTRACE profiler MEMTRACE delivers cycle-accurate profiling results at the C function level. The results include clock cycles, various memory access statistics, and optionally an energy consumption estimate for reduced instruction set computer (RISC)-based processors. The focus is on memory access analysis, because for data-intensive applications this aspect has a high potential for increasing system efficiency.
9
MEMTRACE profiling toolflow
10
MEMTRACE -- Initialization
11
MEMTRACE – Performance Analysis
12
MEMTRACE – Post Processing
13
MEMTRACE backend
14
MEMTRACE -- Profiling data acquisition
15
init(): Initializes the profiler. Creates a list of all functions and global variables.
nextInstruction(): Checks whether program execution has changed from one function to another. If so, the cycle count of the previous function is recalculated and the call count of the new function is incremented.
memoryAccess(): Determines whether a load or a store access was performed, and which bit width (8, 16, or 32 bits) was used.
16
MEMTRACE -- Profiling data acquisition
busActivity(): Identifies the bus status (idle cycle, core access, or DMA access) and increments the appropriate counter of the current function.
cacheMiss(): Is called each time a cache miss occurs.
finish(): Is called when the ISS terminates the simulation.
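The callback interface described on these two slides can be sketched roughly as follows. This is not MEMTRACE's actual code: the class name `ProfilerBackend`, the snake_case method names, the argument lists, the address-based symbol table, and the counter names are all assumptions made for illustration. The slides only name the callbacks and describe what each one does.

```python
from bisect import bisect_right
from collections import Counter


class ProfilerBackend:
    """Hypothetical per-function statistics collector driven by simulator callbacks."""

    def init(self, functions):
        # functions: list of (start_address, name); an assumed symbol-table format.
        self.symbols = sorted(functions)
        self.starts = [start for start, _ in self.symbols]
        self.stats = {name: Counter() for _, name in self.symbols}
        self.current = None
        self.entry_cycle = 0

    def _lookup(self, pc):
        # Map a program counter to the function whose start address precedes it.
        index = bisect_right(self.starts, pc) - 1
        if index < 0:
            raise ValueError("pc below first known function")
        return self.symbols[index][1]

    def next_instruction(self, pc, cycle):
        name = self._lookup(pc)
        if name != self.current:
            # Function change: close the previous function's cycle interval.
            if self.current is not None:
                self.stats[self.current]["cycles"] += cycle - self.entry_cycle
            # Simplification: every change counts as a call, including returns.
            self.stats[name]["calls"] += 1
            self.current = name
            self.entry_cycle = cycle

    def memory_access(self, is_store, bits):
        kind = "store" if is_store else "load"
        self.stats[self.current][f"{kind}{bits}"] += 1

    def bus_activity(self, status):
        # status is one of "idle", "core", "dma" (assumed encoding).
        self.stats[self.current][f"bus_{status}"] += 1

    def cache_miss(self):
        self.stats[self.current]["cache_misses"] += 1

    def finish(self, cycle):
        # Close the interval of the function that was running at the end.
        if self.current is not None:
            self.stats[self.current]["cycles"] += cycle - self.entry_cycle
        return self.stats


backend = ProfilerBackend()
backend.init([(0x100, "main"), (0x200, "idct")])
backend.next_instruction(0x100, 0)
backend.next_instruction(0x204, 5)
backend.memory_access(False, 32)
backend.cache_miss()
backend.bus_activity("core")
backend.next_instruction(0x104, 12)
result = backend.finish(15)
```

In this toy trace, `main` accumulates 8 cycles and `idct` 7. A real backend would need to distinguish calls from returns, which this sketch deliberately does not do.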
17
Processor model generator
18
Interconnection
19
What can we do by using the result of MEMTRACE profiler?
20
Contents 1. Background 2. MEMTRACE profiler 3. Software/Hardware Optimization 4. Conclusion
21
System partitioning Computationally intensive functions are well suited for hardware acceleration in a coprocessor. Control-intensive functions are better suited for software implementation on ASIPs (application-specific instruction set processors).
22
Software Optimization Loop unrolling. For computationally intensive parts, arithmetic optimizations or SIMD instructions can be applied if the processor provides such instructions. Video applications
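As an illustration of loop unrolling, the sketch below unrolls a sum-of-absolute-differences loop, a kernel that is common in video coding, by a factor of four and handles the leftover elements in a short tail loop. The function names and the unroll factor are assumptions chosen for the example. Python is used only to show the shape of the transformation; the actual benefit (fewer loop-control instructions and more room for instruction scheduling) appears in compiled code on the target processor.

```python
def sad_rolled(a, b):
    """Reference version: one element per loop iteration."""
    total = 0
    for i in range(len(a)):
        total += abs(a[i] - b[i])
    return total


def sad_unrolled4(a, b):
    """Same computation, with the loop body unrolled by a factor of four."""
    n = len(a)
    limit = n - n % 4  # largest multiple of 4 not exceeding n
    total = 0
    for i in range(0, limit, 4):
        total += abs(a[i] - b[i])
        total += abs(a[i + 1] - b[i + 1])
        total += abs(a[i + 2] - b[i + 2])
        total += abs(a[i + 3] - b[i + 3])
    # Tail loop for the remaining 0 to 3 elements.
    for i in range(limit, n):
        total += abs(a[i] - b[i])
    return total


block_a = list(range(10))
block_b = [3] * 10
print(sad_rolled(block_a, block_b), sad_unrolled4(block_a, block_b))
```

Both versions must return the same result for any input length, which is why the tail loop is needed when the length is not a multiple of the unroll factor.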
23
Hardware Optimization
Memory Subsystem Optimizations: External memory. Cache (cache miss). The data areas with the most cache misses and the smallest size should be stored in on-chip memory (SRAM).
Instruction Set Architecture Optimizations: Frequently used instructions should be considered as targets for optimization during the processor architecture development.
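The placement rule for on-chip memory (many cache misses, small size) can be sketched as a greedy selection. Ranking by cache misses per byte is one possible reading of that rule and is an assumption, not a method stated in the slides. The function name `choose_on_chip`, the data-area names, and all numbers below are invented for illustration.

```python
def choose_on_chip(areas, capacity_bytes):
    """Greedily pick data areas for on-chip SRAM by cache misses per byte.

    areas: list of (name, size_bytes, cache_misses) with size_bytes > 0.
    Returns the chosen names and the number of bytes used.
    """
    ranked = sorted(areas, key=lambda area: area[2] / area[1], reverse=True)
    chosen, used = [], 0
    for name, size, _misses in ranked:
        if used + size <= capacity_bytes:
            chosen.append(name)
            used += size
    return chosen, used


# Invented profiling results: (data area, size in bytes, cache misses).
areas = [
    ("frame_buffer", 101376, 50000),
    ("quant_table", 128, 4000),
    ("mv_cache", 2048, 9000),
    ("bitstream", 16384, 3000),
]
chosen, used = choose_on_chip(areas, capacity_bytes=4096)
print(chosen, used)
```

With these made-up numbers, the small but frequently missed tables are placed on chip, while the large frame buffer stays in external memory even though it has the most misses in absolute terms.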
24
Conclusion Profiling and system analysis. MEMTRACE architecture: initialization, performance analysis, post-processing. Hardware/software optimization: software, hardware.
25
Lu Hao And questions?
26
References
[1] H. Hübert and B. Stabernack, "Profiling-based hardware/software co-exploration for the design of video coding architectures," IEEE Transactions on Circuits and Systems for Video Technology, 2009, pp. 1680-1691.
[2] ST Microelectronics: Nomadik STn8820 Mobile Multimedia Application Processor (2008, Feb.). Data brief. [Online]. Available: www.st.com
[3] Broadcom: BCM2820 Low Power, High Performance Application Processor (2006, Sep.). Product brief. [Online]. Available: www.broadcom.com
[4] G. de Micheli and L. Benini, Network on Chips. San Francisco, CA: Morgan Kaufmann, 2006.
[5] H. Hübert, "MEMTRACE: A memory, performance and energy profiler targeting RISC-based embedded systems for data-intensive applications," Ph.D. dissertation, Dept. Elect. Eng. Comput. Sci., Tech. Univ. Berlin, Germany, 2009. [Online]. Available: http://opus.kobv.de/tuberlin/volltexte/2009/2261