Hyper-Threading (HT) and μOPs (Micro-Operations)
CS 312 Computer Organization and Architecture
Dr. Hiroshi Fujinoki
Department of Computer Science, Southern Illinois University Edwardsville
Summer, 2015
Technologies in recent processors
Hyper-Threading (HT)
- A technology that makes one processor look as if it were multiple processors
- Uses otherwise unutilized function units in a pipeline datapath
- Invented by Intel and used for the first time in the Pentium 4 (3.0 GHz or faster)
The problem in multi-function-unit and superscalar pipeline processors
- Superscalar, multi-function-unit design: the number of pipes increased (e.g., 6 pipes)
- The “depth” of the pipeline increased (20 stages in the Pentium 4); this was needed to increase the clock rate
- Problem: resource utilization is low (“up to 35%” according to Intel)
- Why?
  - Pipeline flushes caused by branches
  - Data dependencies
The problem in multi-function-unit and superscalar pipeline processors
- However, low resource utilization does not really make sense
- A large number of processes are running (more than 50 processes)
- We have low resource utilization while a large number of processes need those resources!
Concept of HT
[Figure: Processes A, B, C, and D scheduled over time on four function units (FU-1 to FU-4), without HT and with HT.]
- Utilization without HT = 35/96 ≈ 36.5%
- New utilization with HT = 35/48 ≈ 72.9% (all processes completed)
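The two utilization figures are busy function-unit slots divided by all available slots. The following is a minimal sketch of that arithmetic in C. It assumes (the slide does not say so explicitly) that 96 slots means 4 function units over 24 time steps and 48 slots means 4 function units over 12 time steps; the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Fraction of function-unit slots that did useful work.
 * busy_slots: slots in which some FU executed an instruction
 * num_fus:    number of function units (4 in the figure)
 * time_steps: clock steps until all work is done (assumed, see lead-in) */
double fu_utilization(int busy_slots, int num_fus, int time_steps)
{
    assert(num_fus > 0 && time_steps > 0);
    return (double)busy_slots / (double)(num_fus * time_steps);
}

/* Prints the two cases from the slide as percentages. */
void print_slide_cases(void)
{
    /* Without HT: 35 busy slots out of 4 FUs x 24 steps = 96 slots. */
    printf("without HT: %.1f%%\n", 100.0 * fu_utilization(35, 4, 24));
    /* With HT: the same 35 busy slots packed into 4 FUs x 12 steps = 48. */
    printf("with HT:    %.1f%%\n", 100.0 * fu_utilization(35, 4, 12));
}
```

The amount of useful work (35 slots) is the same in both cases; utilization doubles only because the same work finishes in half the time.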
Concept of HT
[Figure: one physical processor with four function units (FU-1 to FU-4) appears as two processors, each with FU-1 to FU-4.]
- Two (virtual) processors from the OS point of view
- Why is this technology not called “Hyper-Processing”?
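Because the OS sees each virtual processor as a processor of its own, a program can observe this view by asking the OS how many processors it schedules on. The sketch below is one way to do that on Unix-like systems. It assumes `sysconf` with `_SC_NPROCESSORS_ONLN`, which is a widely supported extension (Linux, macOS, BSD) rather than strict POSIX; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Number of processors the OS currently schedules on.
 * With HT enabled, a physical core is counted once per virtual processor. */
long logical_processors(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return (n < 1) ? 1 : n;   /* fall back to 1 if the query is unsupported */
}

/* Prints the count as the OS reports it. */
void report_processors(void)
{
    long n = logical_processors();
    assert(n >= 1);           /* guaranteed by the fallback above */
    printf("processors seen by the OS: %ld\n", n);
}
```

The reported number alone does not tell physical cores and virtual processors apart; it only shows what the OS schedules on.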
Hardware Implementation in HT
[Figure: processor cores with L1 caches attached to a bus; in the HT processor, two virtual processors (running Process A and Process B) sit on one processor core and use the same L1 cache.]
- L1 cache is shared!
The problem in multi-function-unit and superscalar pipeline processors
[Figure: a process occupies a memory address space that holds its code and data; within one process, Thread 1 to Thread 4 share the process's data.]
Concept of HT
[Figure: one physical processor with four function units (FU-1 to FU-4) appears as two processors; Thread 1 to Thread 4, which share the same data, run on them.]
- Why is this technology not called “Hyper-Processing”?
Hardware Implementation in HT
[Figure: processor cores with L1 caches attached to a bus; in the HT processor, two virtual processors sit on one processor core and use the same L1 cache. They run Process A and Process B, or Thread A and Thread B.]
- L1 cache is shared!
Problems in HT
Low performance gain
- With HT enabled, performance improves by only 5% to 30%
- Intel explained that this is still a good improvement relative to the cost of implementing HT (HT requires only 5% more transistors)
- HT requires a new chipset (i.e., a new motherboard) and faster main memory modules (Intel doesn’t have to pay for this cost, but you do)
Security is still a problem
- Some network applications use a separate thread to process each client (multithreaded network servers)
Problems in HT
Multithreaded web servers (e.g., “Apache”)
[Figure: a web server whose threads T1, T2, and T3 serve browser clients and share the server's data.]
    int main (void)
    {
        while (TRUE)
        {
            accept ( ……. );        /* wait for the next client (browser) to connect */
            beginthread (…… );     /* start a new thread to serve that client */
        }
    }
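The thread-per-client structure on this slide can be sketched in runnable form. The version below is an illustration under stated assumptions, not the slide's actual server: it swaps in POSIX threads (`pthread_create`) for the slide's `beginthread`, it simulates a fixed number of clients instead of calling `accept` on a real socket, and all names (`run_server`, `serve_client`, `requests_served`) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Data shared by all worker threads (the "Data" box in the figure). */
static int requests_served = 0;
static pthread_mutex_t data_lock = PTHREAD_MUTEX_INITIALIZER;

/* Worker thread: serves one simulated client, then exits. */
static void *serve_client(void *arg)
{
    (void)arg;  /* a real server would receive the client socket here */
    pthread_mutex_lock(&data_lock);
    requests_served++;               /* every thread updates the shared data */
    pthread_mutex_unlock(&data_lock);
    return NULL;
}

/* Thread-per-client loop: one new thread for each simulated client.
 * Returns the number of clients served, or -1 on failure. */
int run_server(int num_clients)
{
    assert(num_clients > 0);
    pthread_t *workers = malloc((size_t)num_clients * sizeof *workers);
    if (workers == NULL)
        return -1;

    requests_served = 0;
    int started = 0;
    for (int i = 0; i < num_clients; i++) {   /* stands in for while (TRUE) */
        /* accept(...) would block here until a browser connects */
        if (pthread_create(&workers[i], NULL, serve_client, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)         /* wait for all workers */
        pthread_join(workers[i], NULL);
    free(workers);
    return (started == num_clients) ? requests_served : -1;
}
```

All worker threads live in one address space and touch the same data, which is why they need the mutex, and also why threads placed on the two virtual processors of an HT core end up sharing the same L1 cache.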
The problem in multi-function-unit and superscalar pipeline processors
- A thread can monitor how frequently the memory addresses owned by a process executing SSL encryption are accessed
- It is not easy to decode this information to actually crack the encryption
- The attack has been proven to be logically possible
- At the least, it reveals what is going on in your neighbor threads
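How a neighbor thread can learn about another thread's memory accesses through a shared cache can be illustrated with a toy simulation. The slide does not name a technique; the sketch below assumes a prime-and-probe style observation on a tiny direct-mapped cache model. It is a simulation only: on real hardware the observer would infer hits and misses from access time, whereas here the model simply reports them. All names and the cache size are hypothetical.

```c
#include <assert.h>

#define NUM_SETS 8   /* toy direct-mapped cache: one line per set */

enum owner { ATTACKER, VICTIM };

struct line { int valid; enum owner who; unsigned addr; };

static struct line cache[NUM_SETS];   /* the cache both threads share */

/* Access `addr` on behalf of `who`. Returns 1 on a hit, 0 on a miss.
 * On a miss the line is loaded and evicts whatever was in that set. */
static int cache_access(enum owner who, unsigned addr)
{
    struct line *l = &cache[addr % NUM_SETS];
    if (l->valid && l->who == who && l->addr == addr)
        return 1;                    /* hit: fast on real hardware */
    l->valid = 1;
    l->who = who;
    l->addr = addr;
    return 0;                        /* miss: slow on real hardware */
}

/* Returns a bitmask of the cache sets the victim touched. */
unsigned spy_on_victim(const unsigned *victim_addrs, int n)
{
    assert(n >= 0);
    for (unsigned s = 0; s < NUM_SETS; s++)   /* prime: fill every set */
        cache_access(ATTACKER, s);
    for (int i = 0; i < n; i++)               /* victim runs */
        cache_access(VICTIM, victim_addrs[i]);
    unsigned touched = 0;
    for (unsigned s = 0; s < NUM_SETS; s++)   /* probe: a miss means eviction */
        if (!cache_access(ATTACKER, s))
            touched |= 1u << s;
    return touched;
}
```

The observer learns only which cache sets were used, not the data itself, which matches the slide's point that turning such observations into an actual break of the encryption is not easy.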
Other two technologies used in Intel’s processors
SIMD (Single Instruction stream over Multiple Data streams) parallel instructions
- MMX (Multiple Math or Matrix Math eXtension)
- SSE (Streaming SIMD Extension) parallel instructions (improved from MMX, first introduced in Pentium III)
UMA multiprocessor architecture and MESI cache coherence protocol (first introduced in Pentium processor)
- Uniform Memory Access (UMA) parallel architecture
[Figure: dual-processor motherboard; Processor 1 and Processor 2 each have an L1 cache and share the main memory.]
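The SIMD idea, one instruction operating on several data elements at once, can be sketched with SSE intrinsics in C. This is a minimal illustration, not code from the lecture: it assumes a compiler that defines `__SSE__` and provides `<xmmintrin.h>` (GCC and Clang on x86-64 do), and it falls back to a plain scalar loop elsewhere. The function name `add_floats` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics */
#endif

/* Computes dst[i] = a[i] + b[i] for i in [0, n). */
void add_floats(float *dst, const float *a, const float *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    size_t i = 0;
#if defined(__SSE__)
    /* One SSE add instruction processes four packed floats at once. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar loop: handles the tail, or everything when SSE is unavailable. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

With six elements, the SSE path handles the first four in a single vector add and the scalar loop handles the remaining two.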
Other two technologies used in Intel’s processors
UMA multiprocessor architecture and MESI cache coherence protocol
[Figure: dual-processor motherboard; Processor 1 and Processor 2 each have an L1 cache in front of the shared main memory. The cached copies are labeled Read and Modified: one processor modifies a block that the other has read, so the two copies no longer agree.]
- Cache coherency problem
- The MESI cache coherence protocol is a solution for this problem
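The way MESI avoids the stale copy in the figure can be sketched as a small state machine. The model below is a simplified illustration, not a description of Intel's implementation: it tracks a single memory block in two L1 caches, ignores bus timing and write-back traffic, and uses hypothetical names throughout. The four states are Modified, Exclusive, Shared, and Invalid.

```c
#include <assert.h>

enum mesi { INVALID, SHARED, EXCLUSIVE, MODIFIED };

#define NUM_CPUS 2

/* State of the one tracked memory block in each processor's L1 cache. */
struct mesi_system { enum mesi state[NUM_CPUS]; };

void mesi_init(struct mesi_system *s)
{
    for (int i = 0; i < NUM_CPUS; i++)
        s->state[i] = INVALID;
}

/* Processor `cpu` reads the block. */
void mesi_read(struct mesi_system *s, int cpu)
{
    assert(cpu >= 0 && cpu < NUM_CPUS);
    if (s->state[cpu] != INVALID)
        return;                      /* read hit: state unchanged */
    int others_have_copy = 0;
    for (int i = 0; i < NUM_CPUS; i++) {
        if (i == cpu || s->state[i] == INVALID)
            continue;
        others_have_copy = 1;
        /* A snooped read demotes Modified (after write-back) and
         * Exclusive copies to Shared. */
        s->state[i] = SHARED;
    }
    s->state[cpu] = others_have_copy ? SHARED : EXCLUSIVE;
}

/* Processor `cpu` writes the block. */
void mesi_write(struct mesi_system *s, int cpu)
{
    assert(cpu >= 0 && cpu < NUM_CPUS);
    /* All other copies are invalidated so no stale data can be read. */
    for (int i = 0; i < NUM_CPUS; i++)
        if (i != cpu)
            s->state[i] = INVALID;
    s->state[cpu] = MODIFIED;
}
```

In the figure's scenario, both processors read the block (both Shared), then one writes it: the writer's copy becomes Modified and the other copy becomes Invalid, so the next read by the other processor has to fetch the up-to-date data instead of using its stale copy.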
SIMD Vector Computer: Cray (multiple parallel processors on a motherboard)