Hyperthread Support in OpenVMS V8.3

Presentation transcript:

Hyperthread Support in OpenVMS V8.3: What to do about Montecito?
September 17, 2018

Pre-Summary
We added some features to help you manage hyperthreads:
- SHOW CPU/BRIEF displays thread info
- SET CPU/NOCOTHREAD
- [SYSTEST]HTHREAD.EXE
We added some features to keep hyperthreads from hurting or confusing you:
- Scheduler change
- Accounting change
You need to experiment with your own application mix to see whether hyperthreads help you.

Definitions of terms
- Processor: a chip or package
- Core: a 'thing' within a processor that physically executes programs
- Hyperthread: a 'thing' within a core that logically executes programs
- CPU: the OpenVMS abstraction for a 'thing' that executes programs
- Thread of execution: the software concept of what a CPU executes

What is "Hyperthreading" vs "Dual Core"?
- Both are features of the new "Montecito" Itanium chips
- Both are abstracted as CPUs on OpenVMS
- Very different in implementation

Dual Core
- Two (nearly) complete CPUs on one chip; think of two older CPU chips glued together :-)
- Separate caches, separate processing units, separate state (they share the bus interface)
- Both cores execute simultaneously

Montecito Micrograph
[Chip micrograph, for people who like pictures: 2-way multi-threading, dual core, 1MB L2I caches, 2x12MB L3 caches with Pellston soft error detection/correction, power management/frequency boost (Foxton), and an arbiter. Thing to note: the left and right halves are flipped mirror images of each other.]

Dual Cores
[Block diagram of the two cores.] If you prefer block diagrams, the thing to note is that everything is duplicated except the system interface logic.

Hyperthreading
- A hyperthread is a set of state (e.g. user registers, control registers, IP, etc.) within a core
- It shares execution resources with the other thread
- Only one hyperthread is active (i.e. executing a program) at a time on Montecito
- When a hyperthread blocks, the other hyperthread activates
- Hyperthreads also swap on a timer

Montecito Multi-threading
[Timeline diagram. "Serial Execution": Ai, idle, Ai+1, then Bi, idle, Bi+1. "Montecito Multi-threaded Execution": B's segments fill A's idle slots.]
"A" is one thread of execution, "B" is another. In the Montecito multi-threaded section, the top line is one hyperthread and the bottom line is the other. The serial-execution section assumes there is only a single hyperthread (or an un-threaded core): thread A executes both part i and part i+1, then the OS swaps in thread B, which executes its part i and part i+1. The Montecito multi-threaded section assumes one hyperthread is ready to execute thread A and the other is ready to execute thread B. Since A's stall time can overlap B's execution time, we get increased performance.
Multi-threading decreases stalls and increases performance.
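The overlap above can be sketched with a toy timing model. This is illustrative only: the segment lengths are made-up numbers, not Montecito measurements, and the "hidden stall" calculation is an idealized bound, not how the hardware schedules.

```python
# Toy model: each thread alternates compute and stall (memory-wait) phases.
# Serial execution on one hyperthread pays every stall; an idealized
# two-hyperthread core can hide one thread's stalls behind the other
# thread's compute time.

def serial_time(threads):
    # One hyperthread: run each thread to completion, stalls included.
    return sum(compute + stall for compute, stall in threads)

def multithreaded_time(threads):
    # Idealized bound: only one hyperthread executes at a time, so total
    # compute is serialized, but stalls can be overlapped with the other
    # thread's compute.
    total_compute = sum(c for c, s in threads)
    total_stall = sum(s for c, s in threads)
    exposed_stall = max(0, total_stall - total_compute)  # stalls nothing can hide
    return total_compute + exposed_stall

# Threads A and B as (compute, stall) pairs, in arbitrary time units.
threads = [(6, 4), (6, 4)]
print(serial_time(threads))         # 20: every stall is paid
print(multithreaded_time(threads))  # 12: 8 units of stall hidden under compute
```

With plenty of stall time to hide, the multi-threaded core finishes well ahead of serial execution, which is the "decreases stalls, increases performance" claim on the slide.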

Dynamic Thread Switching
- The core speculates that a long-latency event will stall execution: an L3 miss, uncached accesses
- Time-outs ensure fairness
- hint@pause gives software control
- The OS has no knowledge or control of hyperthread switches

Hyperthread Abstraction in VMS
Reminder: 1 processor (or package or chip) has 2 cores and 4 threads.
- Each hyperthread appears in OpenVMS as a CPU
- CPUs that share the same core are called "cothread CPUs"
- Note: cores that share a processor (or package or chip) are not named or treated differently

Identifying CoThread CPUs on OpenVMS
$ show cpu/brief
System: XXXXXX, HP rx4640 (1.40GHz/12.0MB)
CPU 0  State: RUN  CPUDB: 8202A000  Handle: 00005D70
       Owner: 000004C8  Current: 000004C8  Partition 0  Cothd: 8
CPU 1  State: RUN  CPUDB: 820FDF80  Handle: 00005E80  Cothd: 9
CPU 2  State: RUN  CPUDB: 820FFC80  Handle: 00005F90  Cothd: 10
CPU 3  State: RUN  CPUDB: 82101A80  Handle: 000060A0  Cothd: 11

Tradeoffs with Hyperthreads: Basics
- One core with two threads MAY perform better than one core with one thread (but not always)
- One core with two threads NEVER performs as well as two cores

Montecito Multi-threading (repeat of the earlier timeline slide: A's stall time overlaps B's execution time, so multi-threading decreases stalls and increases performance)

Montecito Multi-threading (No Stalls)
[Timeline diagram: with no idle time, serial execution runs Ai, Ai+1, Bi, Bi+1 back to back, while the multi-threaded execution interleaves them with a switch cost between segments.]
But suppose A and B don't have much stall time, for example because they are carefully designed to stay within their caches. In that case the two hyperthreads swap because of the timer rather than because of stalling. Since the swap takes some time, serial execution could be faster. (Note that we are not assuming any time for the OS to swap the threads in the serial-execution case: not realistic, but illustrative only.)
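The no-stall case can also be sketched as a toy timing model. Again, the numbers are made up for illustration, and the per-switch cost is a hypothetical constant, not a measured Montecito figure.

```python
# Toy model: with no stalls, serial execution is just the sum of the
# compute segments, while timer-driven hyperthread switching adds a
# switch cost at every swap with no stall time to hide.

SWITCH_COST = 1  # hypothetical cost per hyperthread switch, arbitrary units

def serial_time(segments):
    # One hyperthread runs every segment back to back.
    return sum(segments)

def timer_switched_time(segments, switches):
    # Same total compute, plus the overhead of each timer-driven switch.
    return sum(segments) + switches * SWITCH_COST

segs = [5, 5, 5, 5]  # Ai, Ai+1, Bi, Bi+1: pure compute, no stalls
print(serial_time(segs))             # 20
print(timer_switched_time(segs, 3))  # 23: switching only adds overhead
```

This is the slide's point in miniature: when there are no stalls to hide, every hyperthread switch is pure overhead, so serial execution wins.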

Multi-threading vs Two Cores
[Timeline diagram: on two cores, A and B execute simultaneously; in the Montecito multi-threaded execution they interleave on one core.]
We said two cores is always faster. That's because two cores can execute A and B simultaneously. (There is less difference between the cases if we use the stalling version of the threads.)

VMS Support for Hyperthreading
Three categories of support:
- Managing / getting info
- Reducing "waste" of hyperthread cycles
- Scheduling

Managing / Getting Info
- Hyperthread-to-CPU mapping: the first thread of every core, followed by the second threads. Example: on a 2-processor system, CPUs 0,1,2,3 are all separate cores and CPUs 4,5,6,7 are the cothreads of 0,1,2,3.
- SHOW CPU/BRIEF and /FULL note the CPU that is the cothread of the displayed CPU
- SET CPU/[NO]COTHREAD stops one of the cothreads on the core associated with this CPU
- Accounting: a process is charged only half the CPU time if the CPU's cothread is busy
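The numbering and accounting rules above can be sketched in a few lines. This is a model of the rules as the slide states them, not OpenVMS code, and the function names are mine.

```python
# Sketch of the cothread numbering rule: CPUs 0..ncores-1 are the first
# hyperthread of each core; CPUs ncores..2*ncores-1 are the second.

def cothread(cpu, ncores):
    """Return the CPU id sharing a core with `cpu` on a system of `ncores` cores."""
    return cpu + ncores if cpu < ncores else cpu - ncores

def charged_time(raw_cpu_time, cothread_busy):
    """Accounting rule: charge half the CPU time when the cothread was busy."""
    return raw_cpu_time / 2 if cothread_busy else raw_cpu_time

# 2-processor example from the slide: 4 cores, so 8 CPUs.
print(cothread(0, 4))            # 4: CPU 0 and CPU 4 share a core
print(cothread(6, 4))            # 2: the mapping is symmetric
print(charged_time(10.0, True))  # 5.0: cothread was busy, so half charge
```

The halved charge reflects that a busy cothread was consuming roughly half of the core's execution resources during that interval.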

Managing
- EFI command: cpuconfig threads on/off. A supported part of EFI; requires two resets: one to get to EFI, one to make the threads command take effect.
- [SYSTEST]HTHREAD.EXE: like RADCHECK, an unsupported but helpful little utility to check and modify the firmware state of hyperthreading:
  $ hthread -show
  $ hthread -on
  $ hthread -off
  The change takes effect after the next reboot (i.e. only a single reset).

Reducing Hyperthread Cycle Waste
Main point: a hyperthread spinning in halt or the idle loop still uses cycles that its cothread might have used.
- Idle loop: hint@pause between each check for busy; power-saver mode as usual
- STOP/CPU: hint@pause while halted
- Future possibility: hint@pause while spinning on locks? Tradeoffs abound!
Why does this reduce waste? Because if a hyperthread is doing anything, even spinning in a loop, its cothread cannot be doing useful work. Thus we want to reduce spinning.

Scheduler Changes
- Two cores are always better than two hyperthreads on the same core, so the scheduler attempts to place processes on CPUs without a busy cothread.
- This ties in with waste reduction, since an idle hyperthread will give up its cycles to its cothread.
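The placement preference can be sketched as a toy model (not the OpenVMS scheduler; the cothread numbering follows the first-threads-then-second-threads layout described earlier, and the tie-breaking is an arbitrary choice of mine):

```python
# Toy scheduler preference: among idle CPUs, prefer one whose cothread is
# also idle, so a process gets a whole core to itself when possible.

def cothread(cpu, ncores):
    # First hyperthreads are CPUs 0..ncores-1, second are ncores..2*ncores-1.
    return cpu + ncores if cpu < ncores else cpu - ncores

def pick_cpu(idle_cpus, ncores):
    """Pick an idle CPU, preferring one whose cothread is also idle."""
    idle = set(idle_cpus)
    for cpu in sorted(idle):
        if cothread(cpu, ncores) in idle:
            return cpu  # whole core free: no cothread competition
    return min(idle) if idle else None  # fall back to any idle hyperthread

# 4 cores, 8 CPUs. CPUs 1 and 5 share a core and both are idle;
# CPU 0 is idle but its cothread (CPU 4) is busy.
print(pick_cpu([0, 1, 5], 4))  # 1: prefer the CPU with an idle cothread
```

Placing work on CPU 1 leaves CPU 0 available without doubling up on a core, which is exactly the "schedule onto CPUs without a busy cothread" behavior the slide describes.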

Question You Are Too Polite to Ask
Why didn't you change the scheduler to make good use of hyperthreads?
Answer: we don't know how. Seriously, it is VERY dependent on the application mix.

Tradeoffs with Hyperthreading
Imagine you want to make the best use of hyperthreads: which threads of execution do you run on the same core?

Who Shares a Core?
- Threads that share the same memory space (e.g. kernel threads within a process)? They might share some cache, require fewer cache fills, and thus perform better! But if they stall less, hyperthreads are less advantageous.
- Threads that have nothing to do with each other? More cache misses, so hyperthreads help more. But more cache misses mean poorer individual performance!
Clearly there is a tradeoff somewhere, but we can't make it automatically.

My Recommendation
- Even without threads, Montecito works well. Try it with threads off; you will likely be happy.
- Experiment with processes on threads: use affinity to group different processes on cothreads, or to avoid cothreads.
- Experiment with Fastpath CPUs on threads: do you get better throughput spreading I/O across all threads, or using only one thread per core?

Other Features (Soon)
- ar.ruc
- NUMA
- Power control

Other Features (Further Out)
- User-mode rfi: might allow one to go to an instruction within a bundle; useful for AST returns (maybe?)
