Hyper Threading (HT) and μOPs (Micro-Operations). Department of Computer Science, Southern Illinois University Edwardsville, Summer 2015, Dr. Hiroshi Fujinoki.

Presentation transcript:

New_Technologies/001 Hyper Threading (HT) and μOPs (Micro-Operations)
CS 312 Computer Organization and Architecture
Department of Computer Science, Southern Illinois University Edwardsville, Summer 2015, Dr. Hiroshi Fujinoki

New_Technologies/002 Technologies in recent processors

New_Technologies/003 Hyper-Threading (HT)
- A technology that makes one processor look as if it were multiple processors
- Uses otherwise unutilized function units in a pipelined datapath
- Introduced by Intel, first on Xeon and then on the Pentium 4 (3.06 GHz or faster)

New_Technologies/004 The problem in multi function-unit and super-scalar pipeline processors
- Super-scalar, multi function-unit designs: the number of pipes increased (e.g., 6 pipes)
- The “depth” of the pipeline increased (20 stages in Pentium 4), which was needed to raise the clock rate
- Problem: resource utilization is low (“up to 35%” according to Intel)
- Why? Pipeline flushes caused by branches, and data dependencies

New_Technologies/005 The problem in multi function-unit and super-scalar pipeline processors
- However, low resource utilization does not really make sense
- A large number of processes are running (more than 50 processes)
- We have low resource utilization while a large number of processes need those resources!

New_Technologies/006 Concept of HT
[Diagram: Processes A, B, C and D scheduled over time on function units FU-1 to FU-4]
- Without HT: utilization = 35/96 ≈ 36.5%
- With HT: new utilization = 35/48 ≈ 72.9%, all processes completed
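The utilization figures on this slide are plain ratios of busy function-unit slots to available slots. The sketch below reproduces that arithmetic. It assumes, since the slide does not say so explicitly, that 96 and 48 are the four function units multiplied by 24 and 12 time steps respectively; `utilization_percent` and `print_ht_example` are illustrative names, not part of the lecture.

```c
#include <stdio.h>

/* Percentage of function-unit time slots that did useful work.
 * busy_slots:  slots in which a function unit executed an operation
 * total_slots: number of function units multiplied by elapsed time steps
 *              (assumed reading of the slide: 4 x 24 = 96 and 4 x 12 = 48) */
double utilization_percent(int busy_slots, int total_slots)
{
    if (total_slots <= 0) {
        return 0.0;  /* avoid dividing by zero for an empty schedule */
    }
    return 100.0 * (double)busy_slots / (double)total_slots;
}

/* Prints the two cases from the slide: the same 35 busy slots,
 * first spread over 96 available slots, then packed into 48. */
void print_ht_example(void)
{
    printf("without HT: %.1f%%\n", utilization_percent(35, 96));
    printf("with HT:    %.1f%%\n", utilization_percent(35, 48));
}
```

The amount of work (35 busy slots) is the same in both cases. Only the elapsed time shrinks, which is why utilization doubles when the schedule is packed into half as many time steps.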

New_Technologies/007 Concept of HT
[Diagram: one physical processor with function units FU-1 to FU-4, shown as two processors that each appear to have FU-1 to FU-4]
- Two (virtual) processors from the OS point of view
- Why is this technology not called “Hyper Processing”?

New_Technologies/008 Hardware Implementation in HT
[Diagram: processor cores, each with an L1 cache attached to a bus; with HT, one processor core and its L1 cache host two virtual processors, running Process A and Process B]
- L1 cache is shared!

New_Technologies/009 Concept of HT
[Diagram: Processes A, B, C and D scheduled over time on function units FU-1 to FU-4]
- Without HT: utilization = 35/96 ≈ 36.5%
- With HT: new utilization = 35/48 ≈ 72.9%

New_Technologies/010 The problem in multi function-unit and super-scalar pipeline processors
[Diagram labels: two processes, each with its own memory address space holding code and data; one process whose data is shared by Thread 1, Thread 2, Thread 3 and Thread 4]

New_Technologies/011 Concept of HT
[Diagram labels: one physical processor with function units FU-1 to FU-4, shown as two processors that each appear to have FU-1 to FU-4; Thread 1 to Thread 4 sharing one process's data]
- Why is this technology not called “Hyper Processing”?

New_Technologies/012 Hardware Implementation in HT
[Diagram: processor cores, each with an L1 cache attached to a bus; with HT, one processor core and its L1 cache host two virtual processors, running Process A and Process B, or Thread A and Thread B]
- L1 cache is shared!

New_Technologies/013 Problems in HT
Low performance gain
- With HT enabled, the improvement is only 5 ~ 30%
- Intel explained that this is still a good improvement relative to the cost of implementing HT (HT requires only 5% more transistors)
- HT requires a new chipset (i.e., a new motherboard) and faster main memory modules (Intel doesn’t have to pay for this cost, but you do)
Security is still a problem
- Some network applications use a separate thread to serve each client (multithreaded network server)

New_Technologies/014 Problems in HT
Multithreaded web servers (e.g., “Apache”)
[Diagram: a browser connects to the web server; server threads T1, T2 and T3 share the same data]

    int main (void)
    {
        while (TRUE) {
            accept ( ……. );        /* wait for the next client connection */
            beginthread (…… );     /* start a new thread to serve that client */
        }
    }

New_Technologies/015 The problem in multi function-unit and super-scalar pipeline processors
- A thread can monitor the access frequency to memory addresses owned by a process executing SSL encryption
- It is not easy to decode this information for actual encryption cracking
- However, it has been proven to be logically possible
- At the least, it reveals what is going on in neighboring threads

New_Technologies/016 Other two technologies used in Intel’s processor
SIMD (Single Instruction stream over Multiple Data stream) parallel instructions
- MMX (Multiple Math or Matrix Math eXtension) (first introduced in Pentium processor)
- SSE (Streaming SIMD Extension) parallel instructions (improved from MMX, first introduced in Pentium III)
UMA multiprocessor architecture and MESI Cache Coherence protocol
- Uniform Memory Access (UMA) parallel architecture
[Diagram: dual-processor motherboard; Processor 1 and Processor 2, each with an L1 cache, share main memory]

New_Technologies/017 Other two technologies used in Intel’s processor
SIMD (Single Instruction stream over Multiple Data stream) parallel instructions
- MMX (Multiple Math or Matrix Math eXtension) (first introduced in Pentium processor)
- SSE (Streaming SIMD Extension) parallel instructions (improved from MMX, first introduced in Pentium III)
UMA multiprocessor architecture and MESI Cache Coherence protocol
[Diagram: dual-processor motherboard; Processor 1 and Processor 2, each with an L1 cache, share main memory; an arrow labeled “Read”]

New_Technologies/018 Other two technologies used in Intel’s processor
SIMD (Single Instruction stream over Multiple Data stream) parallel instructions
- MMX (Multiple Math or Matrix Math eXtension) (first introduced in Pentium processor)
- SSE (Streaming SIMD Extension) parallel instructions (improved from MMX, first introduced in Pentium III)
UMA multiprocessor architecture and MESI Cache Coherence protocol
[Diagram: dual-processor motherboard; Processor 1 and Processor 2, each with an L1 cache, share main memory; labels “Read” and “Modified”]
Cache Coherency Problem
- MESI cache coherence protocol is a solution for this problem
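The slide names MESI as the answer to the cache coherency problem without showing its states. As a rough illustration, the sketch below models the state of one cache line in one processor's L1 cache, following the textbook MESI transitions (Modified, Exclusive, Shared, Invalid). The type and function names are hypothetical, and write-back traffic and bus arbitration are left out, so this is a simplified model and not Intel's actual implementation.

```c
/* State of one cache line in one processor's cache. */
typedef enum { MESI_INVALID, MESI_SHARED, MESI_EXCLUSIVE, MESI_MODIFIED } mesi_state;

/* Events seen by that cache: accesses by its own processor, and
 * accesses by another processor observed (snooped) on the shared bus. */
typedef enum { LOCAL_READ, LOCAL_WRITE, SNOOP_READ, SNOOP_WRITE } mesi_event;

/* Returns the next state of the line.
 * others_have_copy matters only for a read miss (LOCAL_READ from INVALID). */
mesi_state mesi_next(mesi_state s, mesi_event e, int others_have_copy)
{
    switch (e) {
    case LOCAL_READ:
        if (s == MESI_INVALID) {
            /* Read miss: Shared if another cache holds the line, else Exclusive. */
            return others_have_copy ? MESI_SHARED : MESI_EXCLUSIVE;
        }
        return s;                 /* read hit: state unchanged */
    case LOCAL_WRITE:
        return MESI_MODIFIED;     /* writer ends up owning the only valid copy */
    case SNOOP_READ:
        /* Another processor reads: a valid copy is demoted to Shared
         * (a Modified line would be written back first; not modeled here). */
        return (s == MESI_INVALID) ? MESI_INVALID : MESI_SHARED;
    case SNOOP_WRITE:
        return MESI_INVALID;      /* another processor writes: our copy is stale */
    }
    return s;
}
```

Walking the slide's scenario through this function: Processor 1 reads a line (Invalid to Exclusive), Processor 2 reads the same line (both copies become Shared), then Processor 1 modifies it (its copy becomes Modified and Processor 2's copy becomes Invalid). Processor 2 therefore cannot keep using a stale value.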

New_Technologies/019 SIMD Vector Computer: Cray (multiple parallel processors on a motherboard)
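To make "single instruction stream over multiple data stream" concrete, here is a minimal sketch of an element-wise float addition that uses SSE intrinsics when the compiler targets SSE and falls back to an ordinary loop otherwise. The function name `add_floats` is made up for illustration. The intrinsics `_mm_loadu_ps`, `_mm_add_ps` and `_mm_storeu_ps` come from `<xmmintrin.h>`, and each one operates on four packed single-precision values at a time.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics; only available on x86 targets */
#endif

/* Computes out[i] = a[i] + b[i] for i in [0, n). */
void add_floats(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    /* SIMD path: one add instruction processes four floats at once. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar path: handles the leftover elements, or the whole array
     * when SSE is not available. */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

With six elements, the SIMD loop handles the first four in a single vector add and the scalar loop finishes the remaining two. The result is the same either way; only the number of instructions issued changes.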