1
“Early Estimation of Cache Properties for Multicore Embedded Processors” ISERD ICETM 2015 Bangkok, Thailand May 16, 2015
2
Presenter: Dr. Abu Asaduzzaman, Assistant Professor
Prepared by: Mr. Kishore K. Chidella, PhD Student
Computer Architecture and Parallel Programming Laboratory (CAPPLab)
Department of Electrical Engineering and Computer Science (EECS)
Wichita State University (WSU), USA
3
Outline
■ Introduction
  Embedded systems with multicore processors
  Pros and cons due to cache
■ Background and Motivation
  Impact of cache on performance and power consumption
  Optimized cache improves the performance to power ratio
■ Proposed Cache Modeling Strategy
  Multicore architecture for embedded systems
  Work-flow diagram
■ Experimental Results
■ Discussion
QUESTIONS? Any time!
4
Authors
■ Kishore K. Chidella, PhD Student
  EECS Department, Wichita State University (WSU), USA
■ Muhammad F. Mridha, Assistant Professor
  CSE Department, University of Asia Pacific (UAP), Bangladesh
■ Abu Asaduzzaman, Assistant Professor
  EECS Department, Wichita State University (WSU), USA
  Director, Computer Arch & Parallel Prog Lab (CAPPLab)
5
Introduction
■ Multicore Embedded Systems
  Future embedded systems should have multicore processors.
  Currently available single-core simulation techniques are not adequate for designing multicore embedded systems [1-4].
  Software applications use more and more threads to take advantage of the available cores [5-8].
  Multicore processors are frequently deployed with multilevel cache memories [9].
  Executing threads in parallel to achieve the best performance in such a multicore system is difficult because the threads share cache.
  Complex embedded systems design methodology needs support from early estimation techniques.
6
Background and Motivation
■ Some Early Work
  The technical challenges associated with integrating homogeneous and heterogeneous multiple cores in embedded systems are elucidated in [1]. However, a viable way to make early estimations for future embedded systems design is not provided.
  According to the experimental results published in [4], cache parameters and application code size have an impact on total power consumption and mean delay per task. This approach is not focused on designing embedded systems and does not cover cache locking.
7
Background and Motivation (+)
■ Some Early Work
  Issues related to cache locking at level-1 and level-2 caches are discussed in [11, 12].
  In [14], various algorithms for selecting a set of instructions to be locked in cache are compared.
  Cache locking may improve performance.
  Locking the entire level-1 cache (100% of the cache size) is not efficient for some applications, especially when the data to be locked is small compared to the cache size.
  Worst-case performance with locked caches may degrade with large cache lines due to cache pollution [12].
8
Background and Motivation (+)
■ Some Early Work
  These techniques were developed for single-core systems and are not suitable for contemporary multicore embedded systems.
  These techniques are also not useful for estimating power consumption, a crucial design factor for embedded systems.
  Therefore, an early estimation technique to evaluate cache properties for multicore embedded systems is required.
10
Proposed Cache Modeling Strategy
■ Multicore Cache Organization
  Level-1: private; split into I1 and D1
  Level-2: private or shared; unified
  Level-3: optional (or shared)
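The organization above can be written down as a small configuration record. This is only a sketch: the class and function names are hypothetical, and the concrete values (4 cores, 2 KB I1 and D1) are taken from the Input / Output Parameters slide, with a shared unified CL2 as in the reported experiments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CacheLevel:
    name: str       # e.g. "I1", "D1", "CL2"
    shared: bool    # True if all cores share this cache
    unified: bool   # True if it holds both instructions and data
    size_kb: int


@dataclass(frozen=True)
class CacheHierarchy:
    cores: int
    i1: CacheLevel                      # private level-1 instruction cache
    d1: CacheLevel                      # private level-1 data cache
    cl2: CacheLevel                     # private or shared, unified
    cl3: Optional[CacheLevel] = None    # level-3 is optional


def quad_core_shared_cl2(cl2_size_kb: int) -> CacheHierarchy:
    """Build the quad-core, shared-CL2 configuration (hypothetical helper)."""
    return CacheHierarchy(
        cores=4,
        i1=CacheLevel("I1", shared=False, unified=False, size_kb=2),
        d1=CacheLevel("D1", shared=False, unified=False, size_kb=2),
        cl2=CacheLevel("CL2", shared=True, unified=True, size_kb=cl2_size_kb),
    )
```

A private CL2 or an added level-3 cache would be expressed by changing `shared` or filling in `cl3`.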
11
Proposed Cache Modeling Strategy (+)
■ Cache Locking
  Private first level cache?
  Shared last level cache?
  Entire locking or partial/way locking?
12
Proposed Cache Modeling Strategy (+)
■ Work-Flow
  Master Core: select jobs; assign jobs; pre-load cache memory; mean delay and total power
  Core x: select cache size; lock? (yes or no); assign task
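A rough sketch of how this work-flow could be organized in code. Everything here is an assumption made for illustration: the round-robin job assignment, the function names, and the `estimate` callback that stands in for the simulator's delay and power model (the slides do not describe that model).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical callback: (task, cl2_size_kb, lock) -> (delay, power)
Estimator = Callable[[str, int, bool], Tuple[float, float]]


@dataclass
class Core:
    core_id: int
    tasks: List[str] = field(default_factory=list)


def run_workflow(jobs: List[str], num_cores: int, cl2_size_kb: int,
                 lock: bool, estimate: Estimator) -> Tuple[float, float]:
    """Return (mean delay per task, total power) for one configuration."""
    cores = [Core(i) for i in range(num_cores)]

    # Master core: select jobs and assign them to cores.
    # Round-robin assignment is an assumption, not taken from the slides.
    for i, job in enumerate(jobs):
        cores[i % num_cores].tasks.append(job)

    # Each core: run its tasks with the selected cache size and lock choice.
    delays: List[float] = []
    total_power = 0.0
    for core in cores:
        for task in core.tasks:
            delay, power = estimate(task, cl2_size_kb, lock)
            delays.append(delay)
            total_power += power

    mean_delay = sum(delays) / len(delays) if delays else 0.0
    return mean_delay, total_power
```

The two returned values correspond to the outputs named on the slide: mean delay per task and total power consumption.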
13
Simulation
■ Simulation Tool
  VisualSim tool to develop the modeling platform
■ Applications to Run the Simulation Program
  FFT (Fast Fourier Transform)
  GIF (Graphics Interchange Format)
  JPEG (Joint Photographic Experts Group)
  MPEG (Moving Picture Experts Group)-3
  MPEG-4
  Here, FFT is the smallest application (code size 2.34 KB) and MPEG-4 is the largest (code size 91.83 KB).
14
Input / Output Parameters
■ Inputs
  Number of cores: 4 (fixed)
  I1 / D1 size (KB): 2 / 2 (fixed)
  Line size (Byte): 128 (fixed)
  Associativity level (n-way): 8 (fixed)
  CL2 cache size (KB): 32, 64, 128, 256, or 512
  Locked CL2 cache size (%): 0.0, 12.5, 25.0, 37.5, 50.0
■ Outputs
  Mean delay per task
  Total power consumption
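The input parameters define a sweep of 5 CL2 sizes by 5 locking percentages. The sketch below enumerates that sweep and derives a few quantities from the fixed parameters. It assumes 1 KB = 1024 bytes and the usual relation sets = size / (line size × associativity); the names are hypothetical. Note that with 8 ways, each 12.5% step equals exactly one way, which would fit way locking, though the slides do not state which locking granularity was simulated.

```python
from itertools import product

LINE_SIZE_BYTES = 128
ASSOCIATIVITY = 8
CL2_SIZES_KB = [32, 64, 128, 256, 512]
LOCKED_PERCENT = [0.0, 12.5, 25.0, 37.5, 50.0]


def describe(cl2_kb: int, locked_pct: float) -> dict:
    """Derive set count and locked capacity for one sweep point."""
    total_bytes = cl2_kb * 1024  # assumes 1 KB = 1024 bytes
    num_sets = total_bytes // (LINE_SIZE_BYTES * ASSOCIATIVITY)
    return {
        "cl2_kb": cl2_kb,
        "locked_pct": locked_pct,
        "num_sets": num_sets,
        "locked_kb": cl2_kb * locked_pct / 100.0,
        # Equivalent number of ways if locking were done per way (assumption).
        "locked_ways": ASSOCIATIVITY * locked_pct / 100.0,
    }


# Every (CL2 size, locked percentage) combination in the sweep.
configs = [describe(s, p) for s, p in product(CL2_SIZES_KB, LOCKED_PERCENT)]
```

For example, a 32 KB CL2 has 32 sets, and locking 25% of it reserves 8 KB, the equivalent of 2 of its 8 ways.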
16
Experimental Results
■ Shared L2 Cache Size
  JPEG behaves almost like GIF, and MPEG-3 behaves almost like MPEG-4.
  For CL2 cache sizes from 32 KB to 128 KB, mean delay per task and total power consumption for MPEG-4 decrease significantly when the cache size is increased and/or locking is raised from 0% to 25%.
  The impact of shared CL2 on power consumption is more significant than its impact on delay.
17
Experimental Results (+)
■ Shared L2 Cache Size
  For GIF, mean delay per task and total power consumption decrease with 25% locking only at a CL2 cache size of 32 KB.
  For FFT, neither CL2 cache size nor locking has a positive impact on mean delay per task or total power consumption.
  Increasing CL2 size beyond 128 KB has no positive impact (it consumes more power without reducing the delay).
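One possible way to read these results is to compare application code size against CL2 capacity. The sketch below uses only the two code sizes given on the Simulation slide and finds the smallest simulated CL2 that holds the whole code. This is an interpretation, not the authors' stated explanation: it ignores data footprint and the fact that four cores share CL2, and the function name is hypothetical.

```python
from typing import Optional

# Code sizes stated on the Simulation slide (KB); other workloads are not given.
APP_CODE_KB = {"FFT": 2.34, "MPEG-4": 91.83}
CL2_SIZES_KB = [32, 64, 128, 256, 512]


def smallest_fitting_cl2(code_kb: float) -> Optional[int]:
    """Return the smallest simulated CL2 size (KB) that holds the code, if any."""
    for size_kb in CL2_SIZES_KB:
        if code_kb <= size_kb:
            return size_kb
    return None


fit = {app: smallest_fitting_cl2(kb) for app, kb in APP_CODE_KB.items()}
```

Under this simplification, FFT already fits in the smallest CL2 (32 KB), and MPEG-4 first fits at 128 KB, which is at least consistent with the observation that sizes beyond 128 KB bring no further benefit.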
18
Experimental Results (+)
■ Shared L2 Cache Locking
  Cache locking at shared CL2 has a more significant impact on mean delay per task and total power consumption for large applications (like MPEG-4) than for small applications (like FFT).
  According to the shared CL2 cache locking results, the optimal performance (delay)/power ratio is obtained at 25% cache locking for all the workloads.
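The slides do not define the performance (delay)/power ratio precisely. A minimal sketch, assuming performance is the reciprocal of mean delay so that the ratio becomes 1 / (delay × power), shows how the best locking percentage could be picked from measured results. The function names and the input layout are hypothetical.

```python
from typing import Dict, Tuple


def performance_per_power(mean_delay: float, total_power: float) -> float:
    """Assumed metric: (1 / delay) / power. Higher is better."""
    if mean_delay <= 0 or total_power <= 0:
        raise ValueError("delay and power must be positive")
    return 1.0 / (mean_delay * total_power)


def best_locking(results: Dict[float, Tuple[float, float]]) -> float:
    """Pick the locked percentage with the highest assumed ratio.

    results maps locked CL2 percentage -> (mean delay per task, total power).
    """
    return max(results, key=lambda pct: performance_per_power(*results[pct]))
```

Feeding in the simulator's measured (delay, power) pairs for one workload and one CL2 size returns the locking percentage that maximizes the assumed ratio.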
19
Conclusions
■ A simulation methodology is presented for early estimation of the effective cache properties (parameters and locked cache size) for multicore embedded systems.
■ A quad-core system with shared CL2 is simulated using FFT, GIF, JPEG, MPEG-3, and MPEG-4 workloads.
■ Although both mean delay per task and total power consumption decrease when shared CL2 cache size is increased and/or cache locking is applied, the impact of shared CL2 on power consumption is more significant than its impact on delay.
20
Thank You! QUESTIONS?
Contact: Abu Asaduzzaman
E-mail: abuasaduzzaman@ieee.org
Phone: +1-316-978-5261
CAPPLab: http://www.cs.wichita.edu/~capplab/