1
CS 286 Computer Organization and Architecture
Hyper-Threading (HT) and μOPs (Micro-Operations)
Department of Computer Science, Southern Illinois University Edwardsville
Fall 2018
Dr. Hiroshi Fujinoki
New_Technologies/001
2
Technologies in recent processors
3
Hyper-Threading (HT)
- A technology that makes one processor look as if it were multiple processors
- Uses otherwise unutilized function units in a pipelined datapath
- Invented by Intel and used for the first time in the Pentium 4 (3.0 GHz or faster)
4
The problem in multi function-unit and super-scalar pipeline processors
Problem
- Resource utilization is low ("up to 35%" according to Intel)
Why?
- The clock rate needed to increase, so the "depth" of the pipeline increased (20 stages in the Pentium 4)
- In super-scalar, multi function-unit designs the number of pipes increased (e.g., 6 pipes)
- Pipeline flushes caused by branches
- Data dependencies
5
The problem in multi function-unit and super-scalar pipeline processors
- However, low resource utilization does not really make sense
- A large number of processes are running (more than 50 processes)
- Resource utilization is low even though a large number of processes need those resources!
6
Concept of HT
[Figure: time lines of function units FU-1 to FU-4 executing processes A, B, C, and D, first without HT and then with HT]
- Without HT: utilization = 35/96 = about 36.5%
- With HT: all processes completed, new utilization = 35/48 = 72.9%
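The two utilization figures on this slide are simple ratios of busy function-unit slots to available slots. The sketch below only reproduces that arithmetic. The function name `fu_utilization` is hypothetical, and reading 96 and 48 as 4 function units over 24 and 12 clock cycles is an assumption inferred from the four units in the figure, not something the slide states.

```c
#include <assert.h>

/* Fraction of function-unit slots that did useful work.
 * busy_slots:  slots in which a function unit executed an instruction
 * total_slots: number of function units multiplied by clock cycles
 * (hypothetical helper, written only to illustrate the slide's ratios) */
double fu_utilization(int busy_slots, int total_slots)
{
    assert(total_slots > 0);
    return (double)busy_slots / (double)total_slots;
}
```

Calling `fu_utilization(35, 4 * 24)` gives the ratio 35/96 and `fu_utilization(35, 4 * 12)` gives 35/48. The amount of work (35 busy slots) is the same in both cases; only the number of cycles needed to finish it changes.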
7
Concept of HT
- Why is this technology not called "Hyper Processing"?
- From the OS point of view, there are two (virtual) processors
[Figure: one physical processor with FU-1 to FU-4, presented as two virtual processors, each appearing to have FU-1 to FU-4]
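Because the OS sees each virtual processor as a processor of its own, a program that asks the OS for the processor count normally gets the number of logical processors, not physical ones. A minimal sketch, assuming a POSIX-like system (Linux or macOS) where `sysconf` accepts `_SC_NPROCESSORS_ONLN`; the wrapper name `logical_processor_count` is hypothetical.

```c
#include <assert.h>
#include <unistd.h>

/* Number of processors the OS currently schedules on.
 * With HT enabled, each virtual processor is typically counted here.
 * Assumption: a POSIX-like system that supports _SC_NPROCESSORS_ONLN. */
long logical_processor_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    if (n < 1)
        n = 1;          /* query failed: report at least one processor */
    assert(n >= 1);
    return n;
}
```

The value returned depends on the machine and on whether HT is enabled in firmware, so no particular result should be expected.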
8
Hardware Implementation in HT
[Figure: without HT, two processor cores, each with its own L1 cache, connect to the bus and run Process A and Process B; with HT, two virtual processors connect to the bus through a single L1 cache]
- The L1 cache is shared!
9
Concept of HT
[Figure: time lines of function units FU-1 to FU-4 executing processes A, B, C, and D, first without HT and then with HT]
- Without HT: utilization = 35/96 = about 36.5%
- With HT: new utilization = 35/48 = 72.9%
10
The problem in multi function-unit and super-scalar pipeline processors
[Figure: memory address spaces. A single-threaded process has its own code and data. A multithreaded process runs Thread 1 to Thread 4 over one shared code and data region.]
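The figure's point, that the threads of one process live in the same address space and see the same data, can be sketched with POSIX threads. This is an illustration only: the names `shared_data`, `worker`, and `run_threads` are hypothetical, and POSIX threads are an assumption (the deck itself does not name a thread library here).

```c
#include <assert.h>
#include <pthread.h>

#define NUM_THREADS 4

/* One copy of the data in the process address space, visible to every thread. */
static int shared_data[NUM_THREADS];

/* Each thread writes only its own slot, so no lock is needed in this sketch. */
static void *worker(void *arg)
{
    int id = *(int *)arg;
    shared_data[id] = id + 1;
    return NULL;
}

/* Start four threads, wait for them, and sum what they wrote. */
int run_threads(void)
{
    pthread_t tid[NUM_THREADS];
    int ids[NUM_THREADS];
    int sum = 0;

    for (int i = 0; i < NUM_THREADS; i++) {
        ids[i] = i;
        int rc = pthread_create(&tid[i], NULL, worker, &ids[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        sum += shared_data[i];
    return sum;
}
```

The main thread can read the values the workers wrote without any copying, because all of them address the same `shared_data` array. Separate processes would each have had their own copy.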
11
Concept of HT
- Why is this technology not called "Hyper Processing"?
[Figure: Thread 1 to Thread 4, which share the same data, are scheduled onto the virtual processors (each appearing to have FU-1 to FU-4) of one physical processor]
12
Hardware Implementation in HT
[Figure: without HT, two processor cores, each with its own L1 cache, connect to the bus and run Process A and Process B; with HT, two virtual processors run Thread A and Thread B and connect to the bus through a single L1 cache]
- The L1 cache is shared!
13
Problems in HT
Low performance gain
- After HT is used, there is only a 5 to 30% improvement
- Intel explained that this is still a good improvement relative to the cost of implementing HT (HT requires only 5% more transistors)
- HT requires a new chipset (i.e., a new motherboard) and faster main memory modules (Intel doesn't have to pay for this cost, but you do)
Security is still a problem
- Some network applications use a separate thread to process each client (multithreaded network server)
14
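Problems in HT
Multithreaded web servers (e.g., "Apache")

    void main (void)
    {
        while (TRUE)
        {
            accept ( ……. );        /* wait for the next browser connection */
            beginthread (…… );     /* start a new thread to serve that client */
        }
    }

[Figure: several browsers connect to one web server; the server threads T1, T2, and T3 share the same data]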
15
The problem in multi function-unit and super-scalar pipeline processors
- A thread can monitor the access frequency to memory addresses owned by a process executing SSL encryption
- It is not easy to decode this information for actual encryption cracking
- However, it has been proven to be logically possible
- At the least, it lets a thread understand what is going on in its neighbor threads
16
Other two technologies used in Intel's processors
- SIMD (Single Instruction stream over Multiple Data stream) parallel instructions
  - MMX (Multiple Math or Matrix Math eXtension)
  - SSE (Streaming SIMD Extensions) parallel instructions (improved from MMX, first introduced in the Pentium III)
- UMA multiprocessor architecture and MESI cache coherence protocol (first introduced in the Pentium processor)
[Figure: Uniform Memory Access (UMA) parallel architecture. On a dual-processor motherboard, Processor 1 and Processor 2, each with its own L1 cache, share one main memory.]
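To make "single instruction stream over multiple data stream" concrete, the sketch below adds two float arrays four elements at a time with SSE intrinsics. The function name `add_float_arrays` is hypothetical. It assumes a compiler that provides `<xmmintrin.h>` when `__SSE__` is defined and falls back to an ordinary scalar loop elsewhere, so it shows the idea rather than any specific code from the course.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics: __m128, _mm_add_ps, ... */
#endif

/* out[i] = a[i] + b[i] for i in [0, n). */
void add_float_arrays(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    assert(a && b && out);
#if defined(__SSE__)
    /* One _mm_add_ps adds four pairs of floats with a single instruction. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar loop: handles the leftover elements, or everything without SSE. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

With six elements, the SSE path handles the first four in one vector addition and the scalar loop handles the remaining two.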
17
Other two technologies used in Intel's processors
- SIMD (Single Instruction stream over Multiple Data stream) parallel instructions
  - MMX (Multiple Math or Matrix Math eXtension)
  - SSE (Streaming SIMD Extensions) parallel instructions (improved from MMX, first introduced in the Pentium III)
- UMA multiprocessor architecture and MESI cache coherence protocol (first introduced in the Pentium processor)
[Figure: dual-processor motherboard. Processor 1 and Processor 2, each with its own L1 cache, share one main memory; a read brings data from main memory into an L1 cache.]
18
Other two technologies used in Intel's processors
- SIMD (Single Instruction stream over Multiple Data stream) parallel instructions
  - MMX (Multiple Math or Matrix Math eXtension)
  - SSE (Streaming SIMD Extensions) parallel instructions (improved from MMX, first introduced in the Pentium III)
- UMA multiprocessor architecture and MESI cache coherence protocol (first introduced in the Pentium processor)
[Figure: dual-processor motherboard. Both processors have read the same data from main memory into their L1 caches, and then one processor modifies its copy.]
Cache coherency problem
- The MESI cache coherence protocol is a solution for this problem
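The read-then-modify scenario in the figure can be traced with a very small model of the four MESI states (Modified, Exclusive, Shared, Invalid). The sketch below tracks only the state of one memory line in each processor's L1 cache. It does not model the data, the bus transactions, or write-backs, and the names `mesi_state`, `mesi_read`, and `mesi_write` are hypothetical.

```c
#include <assert.h>

typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_state;

#define NUM_CACHES 2   /* dual-processor motherboard, one L1 cache each */

/* Processor `who` reads the line. state[i] is the line's state in cache i. */
void mesi_read(mesi_state state[NUM_CACHES], int who)
{
    assert(who >= 0 && who < NUM_CACHES);
    if (state[who] != INVALID)
        return;                    /* read hit: state does not change */

    int others_have_copy = 0;
    for (int i = 0; i < NUM_CACHES; i++) {
        if (i != who && state[i] != INVALID) {
            /* A Modified copy would be written back first (not modeled);
             * Modified and Exclusive copies both drop to Shared. */
            state[i] = SHARED;
            others_have_copy = 1;
        }
    }
    state[who] = others_have_copy ? SHARED : EXCLUSIVE;
}

/* Processor `who` writes the line: every other copy is invalidated. */
void mesi_write(mesi_state state[NUM_CACHES], int who)
{
    assert(who >= 0 && who < NUM_CACHES);
    for (int i = 0; i < NUM_CACHES; i++)
        if (i != who)
            state[i] = INVALID;
    state[who] = MODIFIED;
}
```

Starting with both caches Invalid, a read by processor 0 makes its copy Exclusive, a read by processor 1 makes both copies Shared, and a write by processor 0 makes its copy Modified while processor 1's copy becomes Invalid. That invalidation is what prevents processor 1 from continuing to use a stale value.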
19
SIMD Vector Computer: Cray (multiple parallel processors on a motherboard)