Is SC + ILP = RC? Chris Gniady, Babak Falsafi, and T.N. Vijaykumar

Presentation transcript:

Is SC + ILP = RC? Chris Gniady, Babak Falsafi, and T.N. Vijaykumar. Presented by Jacob Harer

Idea: Exploit large amounts of memory ILP to speed up sequential consistency (SC). Each core speculatively relaxes all memory ordering, while appearing non-speculative to all other cores.

Implementation: The design needs to speculate on both loads and stores, to hold a large speculative state, and to add no overhead for well-behaved programs. All speculative instructions are stored in a Speculative History Queue (SHiQ). Data is rolled back if speculative data is accessed before it commits.
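The history-queue idea can be illustrated with a toy model. This is only a sketch under stated assumptions: the class `SpeculativeHistoryQueue`, its method names, and the dictionary standing in for memory are hypothetical, and the sketch logs only the old values of stores, whereas the slide says all instructions are stored in the SHiQ.

```python
from collections import deque

_MISSING = object()  # marks an address that had no value before the store


class SpeculativeHistoryQueue:
    """Toy SHiQ (hypothetical model): logs the value each speculative store overwrites."""

    def __init__(self):
        self._log = deque()  # entries are (address, old_value), oldest first

    def record_store(self, memory, address, new_value):
        # Save the value being overwritten, then apply the speculative store.
        self._log.append((address, memory.get(address, _MISSING)))
        memory[address] = new_value

    def commit_oldest(self):
        # The oldest store is no longer speculative, so its history is dropped.
        if self._log:
            self._log.popleft()

    def roll_back(self, memory):
        # Undo stores youngest-first so that the oldest saved value wins.
        while self._log:
            address, old_value = self._log.pop()
            if old_value is _MISSING:
                memory.pop(address, None)
            else:
                memory[address] = old_value

    def __len__(self):
        return len(self._log)


memory = {"A": 1, "B": 2}
shiq = SpeculativeHistoryQueue()
shiq.record_store(memory, "A", 10)
shiq.record_store(memory, "A", 11)
shiq.record_store(memory, "B", 20)
shiq.roll_back(memory)
print(memory)  # {'A': 1, 'B': 2}
```

Undoing the log in reverse order matters when the same address is stored to more than once: the last entry undone is the oldest one, which holds the non-speculative value.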

Roll Back: A rollback is triggered on invalidation of speculatively loaded or stored data, on a read of speculatively stored data, or on replacement due to a miss. The speculatively accessed blocks are stored in the Block Lookup Table (BLT). Rollback restores state from the SHiQ. No speculation takes place until the store completes.
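The three rollback triggers can be written as a small predicate over a lookup table. This is a minimal sketch, not the hardware design: the `Event` enum, the `BlockLookupTable` class, and its methods are hypothetical names, and it assumes that the "read of speculatively stored data" comes from another core.

```python
from enum import Enum, auto


class Event(Enum):
    INVALIDATION = auto()   # another core requests write access to the block
    EXTERNAL_READ = auto()  # assumed: another core requests to read the block
    REPLACEMENT = auto()    # the block is evicted because of a miss


class BlockLookupTable:
    """Toy BLT (hypothetical model): remembers which blocks were touched speculatively."""

    def __init__(self):
        self._loaded = set()
        self._stored = set()

    def note_load(self, block):
        self._loaded.add(block)

    def note_store(self, block):
        self._stored.add(block)

    def must_roll_back(self, event, block):
        touched = block in self._loaded or block in self._stored
        if event in (Event.INVALIDATION, Event.REPLACEMENT):
            # Any speculatively loaded or stored block forces a rollback.
            return touched
        if event is Event.EXTERNAL_READ:
            # A read only matters if the block holds a speculative store.
            return block in self._stored
        return False


blt = BlockLookupTable()
blt.note_load(0x40)
blt.note_store(0x80)
print(blt.must_roll_back(Event.INVALIDATION, 0x40))   # True
print(blt.must_roll_back(Event.EXTERNAL_READ, 0x40))  # False
print(blt.must_roll_back(Event.EXTERNAL_READ, 0x80))  # True
```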

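Example: (1) Processor 1 issues a speculative load to a block. (2) Processor 1 does some other work. (3) Processor 1 issues a speculative store to the block. (4) Processor 2 issues a load to the block; Processor 1 rolls back from the SHiQ and sends the old, non-speculative data. The coherence requests shown are Get Shared and Get Exclusive.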
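The example sequence can be traced in a few lines. This is a sketch under assumptions: the block name `X`, the values 5 and 99, and the function `run_example` are made up for illustration, a list stands in for the SHiQ, and a set stands in for the BLT.

```python
def run_example():
    memory = {"X": 5}      # committed, non-speculative value of the block
    cache_p1 = {}          # Processor 1's view of the block
    history = []           # stand-in for the SHiQ: (block, old_value) pairs
    speculative = set()    # stand-in for the BLT
    trace = []

    # (1) Processor 1 speculatively loads the block.
    cache_p1["X"] = memory["X"]
    speculative.add("X")
    trace.append("P1 speculative load")

    # (2) Processor 1 does unrelated work (nothing to model here).

    # (3) Processor 1 speculatively stores to the block, logging the old value.
    history.append(("X", cache_p1["X"]))
    cache_p1["X"] = 99
    trace.append("P1 speculative store")

    # (4) Processor 2 loads the block: the speculative store must be undone
    #     so that Processor 2 sees only non-speculative data.
    if "X" in speculative:
        while history:
            block, old_value = history.pop()
            cache_p1[block] = old_value
        speculative.clear()
        trace.append("P1 roll back")
    p2_value = cache_p1["X"]
    trace.append("P2 load")
    return p2_value, trace


p2_value, trace = run_example()
print(p2_value)  # 5
```

Processor 2 receives 5, the value from before the speculative store, which is the behavior the last step of the example describes.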

Conclusions: The results are good. There is potential for many pathological cases in which blocks are loaded far ahead of time, reducing the effectiveness of speculation. This is mitigated by speculatively storing only once.

Questions: How many workloads are “well behaved”? Could RC benefit from the same ILP exploitation? Can you speculatively load across cores? Does the additional hardware slow down the processor?