The Interaction of Simultaneous Multithreading processors and the Memory Hierarchy: some early observations
James Bulpin, Computer Laboratory, University of Cambridge

Simultaneous Multithreading (SMT)
- An extension to dynamic out-of-order superscalar processors
- Very fine-grain hardware multithreading
- Converts thread-level parallelism into instruction-level parallelism
- Latency hiding
- Functional unit utilisation
- Mutual effect of threads

Intel® Hyper-Threading®
- First commercial implementation of SMT
- 2 heavyweight threads
- Combination of static and dynamic sharing of resources
- Now standard on desktop and newer mobile Pentium 4 chips
- Originally marketed as two processors for the price of one
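
A quick way to see the two logical processors of a Hyper-Threaded package on a modern Linux system is the sysfs topology interface. This is only an illustrative sketch: the interface post-dates the original study, and the output shown in the comment is an example rather than a guaranteed format.

```c
/* Hedged sketch: list which logical CPUs are hyperthread siblings of the
 * same physical core, using the Linux sysfs topology files.  Purely an
 * illustration of "two logical processors per physical package". */
#include <stdio.h>

int main(void)
{
    char path[128], line[64];

    /* Each cpuN directory reports the logical CPUs that share its core. */
    for (int cpu = 0; ; cpu++) {
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list",
                 cpu);
        FILE *f = fopen(path, "r");
        if (!f)
            break;                          /* no such CPU: stop          */
        if (fgets(line, sizeof(line), f))   /* e.g. "0-1" or "0,4"        */
            printf("cpu%d shares a core with: %s", cpu, line);
        fclose(f);
    }
    return 0;
}
```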

Multiprogramming Performance
- SPEC CPU2000 benchmarks
- Run the cross-product of benchmark pairs
- Compare to non-SMT and to SMP
- Use processor performance counters
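
The measurement setup can be approximated by pinning one benchmark of a pair to each logical CPU of a single Hyper-Threaded package. The sketch below is a hedged illustration only: the CPU numbers 0 and 1 and the benchmark binary names are placeholders, not the numbering or pairing used in the study.

```c
/* Hedged sketch: launch two benchmark binaries pinned to the two logical
 * CPUs of one physical package.  Real sibling numbering should be taken
 * from the topology files shown earlier. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

static void run_on(int cpu, char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        sched_setaffinity(0, sizeof(set), &set);  /* pin the child */
        execvp(argv[0], argv);
        perror("execvp");
        exit(1);
    }
}

int main(void)
{
    /* One pair out of the SPEC CPU2000 cross-product (names illustrative). */
    char *a[] = { "./gzip_base", NULL };
    char *b[] = { "./mcf_base",  NULL };

    run_on(0, a);            /* first logical CPU of the package */
    run_on(1, b);            /* its hyperthread sibling          */

    while (wait(NULL) > 0)   /* wait for both benchmarks         */
        ;
    return 0;
}
```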

Memory hierarchy interactions
- Better latency hiding => more tolerant of cache misses
- Cooperating threads can share data in a common cache
- Multiple threads sharing a cache => more cache contention
- Want to avoid thrashing
- No explicit partitioning
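
The contention case is easy to reproduce: give two threads private working sets that each fit in the shared L2 but together do not, and pin them to sibling logical CPUs. The sketch below is illustrative only; the 512 KB L2 size, the CPU numbers and the stride are assumptions to be adjusted for the machine at hand.

```c
/* Hedged sketch: two threads streaming over private buffers on sibling
 * hyperthreads.  Each working set fits in the (assumed) shared L2 on its
 * own, but the two together do not, so the threads evict each other.
 * Compile with -pthread; compare run time against two separate cores. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

#define L2_SIZE   (512 * 1024)      /* assumed shared L2 capacity         */
#define WS_SIZE   (3 * L2_SIZE / 4) /* per-thread working set: 3/4 of L2  */
#define ITERS     200

struct arg { int cpu; volatile char *buf; };

static void *worker(void *p)
{
    struct arg *a = p;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(a->cpu, &set);                        /* pin to one logical CPU */
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    for (int i = 0; i < ITERS; i++)               /* repeatedly walk the    */
        for (size_t j = 0; j < WS_SIZE; j += 64)  /* buffer at cache-line   */
            a->buf[j]++;                          /* stride                 */
    return NULL;
}

int main(void)
{
    struct arg a0 = { 0, malloc(WS_SIZE) };       /* CPU numbers assumed to */
    struct arg a1 = { 1, malloc(WS_SIZE) };       /* be hyperthread siblings */
    pthread_t t0, t1;
    pthread_create(&t0, NULL, worker, &a0);
    pthread_create(&t1, NULL, worker, &a1);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return 0;
}
```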

How can the OS help?
- Avoid inter-thread aliasing
  - Linux offsets user stacks
  - Use cunning page placement
- Try to ensure the core “critical” working set stays resident in the (L2) cache [Lo ISCA 98]
- Know when you’re fighting a losing battle
  - Back off to single-threaded execution
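
The aliasing point can be made concrete with a toy set-index calculation: memory regions whose addresses share the same set-index bits (such as two threads' stacks created at the same alignment) compete for the same cache sets, and a small per-thread offset spreads them across different sets. The cache geometry and addresses below are invented for illustration, not the Pentium 4's actual parameters.

```c
/* Hedged sketch of inter-thread aliasing: same set-index bits => same
 * cache sets => the threads evict each other's stack lines. */
#include <stdint.h>
#include <stdio.h>

#define LINE_SIZE  64                              /* assumed geometry */
#define CACHE_SIZE (512 * 1024)
#define WAYS       8
#define NUM_SETS   (CACHE_SIZE / (LINE_SIZE * WAYS))

static unsigned set_index(uintptr_t addr)
{
    return (addr / LINE_SIZE) % NUM_SETS;
}

int main(void)
{
    uintptr_t stack_a = 0xbf800000;                /* hypothetical stack bases */
    uintptr_t stack_b = 0xbf000000;
    uintptr_t offset  = 4 * LINE_SIZE;             /* small per-thread offset  */

    /* Without an offset both stacks hit the same set; with it they don't. */
    printf("no offset: set %u vs set %u\n",
           set_index(stack_a), set_index(stack_b));
    printf("offset   : set %u vs set %u\n",
           set_index(stack_a), set_index(stack_b + offset));
    return 0;
}
```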

The Plan
- Currently using processor performance counters to influence scheduling
- Want to use performance counters and statistics from OS memory management to influence page placement
- Keep the virtual addressing abstraction for the programmer
- Support legacy applications
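
A minimal sketch of the counter-driven part of the plan, using today's perf_event interface rather than the raw Pentium 4 counters the original work read: sample cache misses and instructions on one logical CPU, derive a miss rate, and use it to decide whether the paired threads should back off to single-threaded execution. The 20 MPKI threshold is purely illustrative.

```c
/* Hedged sketch: measure misses-per-kilo-instruction on one logical CPU
 * and flag heavy cache contention.  System-wide counting usually needs
 * root, CAP_PERFMON, or a permissive perf_event_paranoid setting. */
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>
#include <stdint.h>
#include <stdio.h>

static int open_counter(uint64_t config, int cpu)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = config;
    attr.exclude_kernel = 1;
    /* pid = -1, cpu = cpu: count everything running on that logical CPU */
    return syscall(__NR_perf_event_open, &attr, -1, cpu, -1, 0);
}

int main(void)
{
    int cpu = 0;                               /* one logical CPU of the pair */
    int misses = open_counter(PERF_COUNT_HW_CACHE_MISSES, cpu);
    int insns  = open_counter(PERF_COUNT_HW_INSTRUCTIONS, cpu);
    if (misses < 0 || insns < 0) {
        perror("perf_event_open");
        return 1;
    }

    sleep(1);                                  /* sample for one second */

    uint64_t m = 0, i = 0;
    read(misses, &m, sizeof(m));
    read(insns,  &i, sizeof(i));

    double mpki = i ? 1000.0 * m / i : 0.0;    /* misses per 1000 instructions */
    printf("cpu%d: %.2f MPKI\n", cpu, mpki);

    /* A scheduler built on this would compare the metric against a
     * threshold and, when the two threads are clearly fighting over the
     * cache, run them on the core one at a time instead. */
    if (mpki > 20.0)                           /* illustrative threshold */
        printf("high contention: prefer single-threaded execution\n");
    return 0;
}
```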