Shouqing Hao Institute of Computing Technology, Chinese Academy of Sciences Processes Scheduling on Heterogeneous Multi-core Architecture.

Presentation transcript:

Shouqing Hao Institute of Computing Technology, Chinese Academy of Sciences Processes Scheduling on Heterogeneous Multi-core Architecture with Hardware Support

Contents
− Introduction
− Hardware support for LLC-miss latency
− LA-ACMP scheduling algorithm
− Evaluation and analysis

Introduction
Heter-CMP: Heterogeneous Chip Multi-Processor
− Composed of some big cores and some small cores
    Big cores: large area, high power, high performance
        Suited to CPU-bound programs, serial programs, ...
    Small cores: small area, low power, low performance
        Suited to memory-bound programs, parallel programs, ...
− Advantage
    Makes good use of chip resources
    Reduces wasted power and performance
− Challenge
    Identify applications' behavior during execution
    Schedule each program to a suitable core

Hardware Support (1)
Identify programs' behavior
− Last-level cache (LLC) miss latency
    LLC miss → memory access
    Memory accesses incur high latency
    This lowers a program's efficiency during execution
    The program cannot make full use of the core's performance
− Scheduling rules
    Programs with high LLC miss latency should be scheduled to small cores
    Programs with low LLC miss latency should be scheduled to big cores

Hardware Support (2)
Identify programs' behavior
− Last-level cache (LLC) miss latency
    Mechanism
        LLC miss delay is the period between the miss request and the miss response
            – Two cases: un-overlapped and overlapped
        Record the LLC miss latency for each core, with hardware support
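The slide separates un-overlapped from overlapped misses but does not say how overlapping misses are counted. The sketch below shows one possible accounting and is an assumption, not the Godson-3A hardware logic: cycles during which several misses are outstanding are counted once. The function name `outstanding_miss_cycles` and the `(request_cycle, response_cycle)` input format are hypothetical.

```python
def outstanding_miss_cycles(misses):
    """Count cycles during which at least one LLC miss is outstanding.

    `misses` is an iterable of (request_cycle, response_cycle) pairs.
    Assumption: overlapped misses share their common cycles, so those
    cycles are counted once rather than once per miss.
    """
    total = 0
    window_start = window_end = None
    for start, end in sorted(misses):
        if window_end is None or start > window_end:
            # Un-overlapped miss: close the previous window, open a new one.
            if window_end is not None:
                total += window_end - window_start
            window_start, window_end = start, end
        else:
            # Overlapped miss: extend the current window if it ends later.
            window_end = max(window_end, end)
    if window_end is not None:
        total += window_end - window_start
    return total


# Two overlapped misses (cycles 0-120) plus one isolated miss (200-260).
example_delay = outstanding_miss_cycles([(0, 100), (50, 120), (200, 260)])
```

A per-core hardware counter that increments whenever any miss is pending would produce the same total without storing the individual intervals.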

Hardware Support (3)
− Implemented on Godson-3A
    Record LLC miss requests and responses for each core, with hardware support

Hardware Support (4)

LA-ACMP Scheduling Algorithm (1)
LA-ACMP: Latency-Aware Asymmetry CMP
− Identify heterogeneity of cores
    Based on the Linux kernel
    Calculate the BogoMIPS value of each core to evaluate its performance
− Workload assignment balance
    Uses the scaled load method
    L = N/P: each core's scaled load
        – N: number of workloads in the core's queue
        – P: the processor's performance
    If Lmax - Lmin <= 1, the workload assignment is balanced
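The scaled-load test can be written out in a few lines. This is a minimal sketch of the formula on the slide, not the kernel code. The function names are hypothetical, and the performance values are normalized so that a small core has P = 1.0, which is an assumption (the slides derive P from BogoMIPS).

```python
def scaled_loads(queue_lengths, performances):
    """Return L = N / P for each core."""
    return [n / p for n, p in zip(queue_lengths, performances)]


def is_balanced(queue_lengths, performances, threshold=1.0):
    """True when Lmax - Lmin <= threshold (1 on the slide)."""
    loads = scaled_loads(queue_lengths, performances)
    return max(loads) - min(loads) <= threshold


# Assumed normalization: one big core twice as fast as three small cores.
performances = [2.0, 1.0, 1.0, 1.0]

balanced = is_balanced([4, 2, 2, 2], performances)    # loads: 2, 2, 2, 2
unbalanced = is_balanced([2, 4, 1, 1], performances)  # loads: 1, 4, 1, 1
```

Dividing by P lets a faster core hold proportionally more tasks before it counts as more loaded than a slower one.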

LA-ACMP Scheduling Algorithm (2)
− LLC-delay buffer
    Attach an LLC-delay buffer to each run queue
    The buffer saves each task's LLC miss latency

LA-ACMP Scheduling Algorithm (3)
− Update the LLC-delay buffer
    When a thread starts running, clear its LLC-delay value
    When the thread exhausts its time slice, save its LLC-delay value
    When a thread migrates from queue A to queue B, migrate its LLC-delay value as well
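The three update rules can be modelled with a small per-queue dictionary. This is an illustrative sketch only: `RunQueue`, the event function names, and the dictionary layout are hypothetical and do not reflect the Linux kernel's run-queue structures.

```python
class RunQueue:
    """Hypothetical run queue with an attached LLC-delay buffer."""

    def __init__(self, name):
        self.name = name
        self.tasks = []      # task ids assigned to this queue
        self.llc_delay = {}  # LLC-delay buffer: task id -> saved latency

    def add(self, task):
        self.tasks.append(task)
        self.llc_delay.setdefault(task, 0)


def on_start_running(queue, task):
    # Rule 1: clear the thread's LLC-delay value when it starts running.
    queue.llc_delay[task] = 0


def on_time_slice_expired(queue, task, measured_delay):
    # Rule 2: save the latency measured during the slice.
    queue.llc_delay[task] = measured_delay


def migrate(src, dst, task):
    # Rule 3: the saved LLC-delay value travels with the thread.
    src.tasks.remove(task)
    dst.tasks.append(task)
    dst.llc_delay[task] = src.llc_delay.pop(task, 0)


queue_a, queue_b = RunQueue("A"), RunQueue("B")
queue_a.add("t1")
on_start_running(queue_a, "t1")
on_time_slice_expired(queue_a, "t1", measured_delay=4200)
migrate(queue_a, queue_b, "t1")
```

Moving the value during migration means the destination queue can compare the thread with its other tasks without waiting for a fresh measurement.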

LA-ACMP Scheduling Algorithm (4)
− LA-ACMP algorithm
    Executed when load balance is checked
    Does not disturb the existing balance
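The slide does not spell out the algorithm's steps, so the following is only a guess at one way to satisfy both constraints, not the authors' method. It exchanges tasks pairwise between a big-core queue and a small-core queue, which keeps each queue's length, and therefore its scaled load, unchanged. The function `swap_by_latency` and the dictionary representation are hypothetical.

```python
def swap_by_latency(big_queue, small_queue):
    """Move high-latency tasks to the small core by pairwise swaps.

    Both arguments map task id -> saved LLC miss latency. Each swap
    exchanges one task in each direction, so queue lengths never change.
    Returns new dictionaries; the inputs are not modified.
    """
    big, small = dict(big_queue), dict(small_queue)
    while big and small:
        worst_on_big = max(big, key=big.get)        # most memory-bound
        best_on_small = min(small, key=small.get)   # least memory-bound
        if big[worst_on_big] <= small[best_on_small]:
            break  # every big-core task already has the lower latency
        small[worst_on_big] = big.pop(worst_on_big)
        big[best_on_small] = small.pop(best_on_small)
    return big, small


big_after, small_after = swap_by_latency(
    {"a": 900, "b": 100},   # big-core queue
    {"c": 50, "d": 800},    # small-core queue
)
```

In this example task `a` (latency 900) and task `c` (latency 50) trade places, after which no further swap lowers the latency on the big core.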

Evaluation and analysis (1)
Platform
− Godson-3A-heter
    Four cores: one runs at 1 GHz, three run at 500 MHz
    Asynchronous FIFOs are used for synchronization
Benchmark
− SPEC CPU2000

Evaluation and analysis (2)
Applications' execution speedup
− Compared to the original OS
− LLC miss rate: 15.4% performance improvement
− LLC miss delay: 19.8% performance improvement
− Application groups with higher heterogeneity get a higher performance improvement
    The third group has the highest improvement
    The second group has the lowest improvement

Thanks!