Scheduling Issues on a Heterogeneous Single ISA Multicore
Robert Guziolowski, André Seznec
IRISA, France


Known issues and objective

Scheduling processes on a single core or on a homogeneous multicore processor is a relatively easy task, using for instance a round-robin (RR) algorithm.

[Figure: processes P1, P2, ..., Pn scheduled round-robin (RR) on the cores of a CPU]

Scheduling processes on a heterogeneous multicore processor is much more complicated, because the cores can differ in the number of functional units, the L1 data and instruction caches, the hierarchy of the lower-level caches, the issue width, and more. Thus, simple round-robin scheduling may not be sufficient. Additional dynamic mechanisms have to be used to better utilize all the available cores.

Our objective is to investigate possible scheduling mechanisms for a single ISA heterogeneous multicore.

[Figure: processes P1, P2, ..., Pn and a round-robin (RR) scheduler in front of a CPU with core A and core B; the assignment is marked "?"]

Proposed mechanism

We define a core class as a non-empty set of cores having the same characteristics. Processes are scheduled on the available cores by a global round-robin scheduler with the period schedule_period.

We define an exchanger as a mechanism that migrates processes between the core classes when specific conditions are met. Exchangers check the conditions and migrate the selected processes between the core classes with the period exchange_period, where exchange_period < schedule_period.

The proposed mechanism is based on [1], but it makes architectures with higher heterogeneity easier to investigate.

[Figure: round-robin (RR) scheduler dispatching P1, P2, ..., Pn to a CPU organized as CoreClass0 (core A), CoreClass1 (core B), ..., CoreClassN (core N), with Exchanger0, Exchanger1, ..., ExchangerN]

Exchangers

We investigate the following conditions, each implemented as a different exchanger:

Exchanger   Compared values
IPC         Values of IPC of the processes [1]
IPC%        Percent usage of the theoretically achievable IPC of the core (exploiting phase behavior [2])
ONEIL       Ratios of the oldest not-executed instructions of type load which cause the core to stall (ongoing work)
others      Branch predictor characteristics, utilization of FUs in SMT cores, composition of the above parameters, etc.

Testbed

All simulations are conducted with the SESC simulator [4] and the SPEC2000 benchmarks. There are 3 types of cores (A, B, and C) with different characteristics, gathered into up to 3 core classes depending on the configuration.

4 test configurations of cores: 4A, 1A4B4C, 2A2B4C, 2A4B

[Figure: the four core configurations, each drawn with its L2 and L3 caches]

Results

[Result plots are not preserved in the transcript.]

Future

We plan to investigate the other exchangers presented above, as well as the use of a mix of exchangers between the core classes. We also want to introduce SMT cores into our research. Moreover, to get a clearer view of the effects of shared caches, we plan to introduce fair cache sharing and partitioning into our simulations, as proposed in [3]. We also plan to investigate scheduling issues with parallel workloads.

References

1. M. Becchi and P. Crowley. Dynamic thread assignment on heterogeneous multiprocessor architectures. Proceedings of the 3rd conference on Computing frontiers, pages 29–40.
2. T. Sherwood, S. Sair, and B. Calder. Phase tracking and prediction. Proceedings of the 30th Annual International Symposium on Computer Architecture, IEEE CS Press, pages 336–349.
3. S. Kim, D. Chandra, and Y. Solihin. Fair cache sharing and partitioning in a chip multiprocessor architecture. Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, pages 111–122.
4. SESC.

This work was partially supported by an Intel research grant, Intel research equipment, and the European Commission in the context of the SARC integrated project.
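To make the proposed mechanism more concrete, the following is a minimal Python sketch of two core classes, a round-robin step taken every schedule_period, and an IPC exchanger that runs every exchange_period. It is not the authors' implementation or the SESC setup. The names Process, CoreClass, ipc_exchange, round_robin_step, and simulate, the per-class queues, the fixed IPC values, and the swap rule (exchange the lowest-IPC process of one class with the highest-IPC process of the other when the latter is higher) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    name: str
    ipc: float  # last measured IPC; fixed here, an illustrative assumption


@dataclass
class CoreClass:
    name: str
    n_cores: int
    processes: list = field(default_factory=list)  # run queue of this class


def ipc_exchange(fast: CoreClass, slow: CoreClass) -> bool:
    """Hypothetical IPC exchanger: swap one pair of processes if the slow
    class holds a process with a higher IPC than the weakest one in fast."""
    if not fast.processes or not slow.processes:
        return False
    low = min(fast.processes, key=lambda p: p.ipc)
    high = max(slow.processes, key=lambda p: p.ipc)
    if high.ipc <= low.ipc:
        return False  # condition not met, no migration
    fast.processes.remove(low)
    slow.processes.remove(high)
    fast.processes.append(high)
    slow.processes.append(low)
    return True


def round_robin_step(cls: CoreClass) -> list:
    """Pick the first n_cores processes of the queue and rotate them to the back."""
    running = cls.processes[:cls.n_cores]
    cls.processes = cls.processes[cls.n_cores:] + running
    return running


def simulate(fast, slow, ticks, schedule_period=10, exchange_period=2):
    """Run the exchanger and the round-robin step at their own periods.
    Returns the number of migrations (pair swaps) performed."""
    # The poster requires the exchanger to run more often than the scheduler.
    assert exchange_period < schedule_period
    migrations = 0
    for t in range(1, ticks + 1):
        if t % exchange_period == 0 and ipc_exchange(fast, slow):
            migrations += 1
        if t % schedule_period == 0:
            round_robin_step(fast)
            round_robin_step(slow)
    return migrations


if __name__ == "__main__":
    a = CoreClass("CoreClass0", 1, [Process("p1", 0.5), Process("p2", 1.8)])
    b = CoreClass("CoreClass1", 1, [Process("p3", 1.2), Process("p4", 0.4)])
    n = simulate(a, b, ticks=20)
    print(n, sorted(p.name for p in a.processes))
```

Because the IPC values never change in this sketch, the exchanger settles after the first swap. In a real study the IPC would be re-measured on each core class, so migrations could recur as process phases change.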