Caltech CS184 Spring2005 -- DeHon 1 CS184b: Computer Architecture (Abstractions and Optimizations) Day 25: May 27, 2005 Transactional Computing.

Slides:



Advertisements
Similar presentations
L.N. Bhuyan Adapted from Patterson’s slides
Advertisements

The University of Adelaide, School of Computer Science
Memory Consistency Models Kevin Boos. Two Papers Shared Memory Consistency Models: A Tutorial – Sarita V. Adve & Kourosh Gharachorloo – September 1995.
SE-292 High Performance Computing
Thoughts on Shared Caches Jeff Odom University of Maryland.
Is SC + ILP = RC? Presented by Vamshi Kadaru Chris Gniady, Babak Falsafi, and T. N. VijayKumar - Purdue University Spring 2005: CS 7968 Parallel Computer.
The Stanford Directory Architecture for Shared Memory (DASH)* Presented by: Michael Bauer ECE 259/CPS 221 Spring Semester 2008 Dr. Lebeck * Based on “The.
1 Lecture 21: Transactional Memory Topics: consistency model recap, introduction to transactional memory.
CS 7810 Lecture 19 Coherence Decoupling: Making Use of Incoherence J.Huh, J. Chang, D. Burger, G. Sohi Proceedings of ASPLOS-XI October 2004.
1 Performed By: Khaskin Luba Einhorn Raziel Einhorn Raziel Instructor: Rivkin Ina Spring 2004 Spring 2004 Virtex II-Pro Dynamical Test Application Part.
[ 1 ] Agenda Overview of transactional memory (now) Two talks on challenges of transactional memory Rebuttals/panel discussion.
1 Multiprocessors. 2 Idea: create powerful computers by connecting many smaller ones good news: works for timesharing (better than supercomputer) bad.
1 Lecture 23: Transactional Memory Topics: consistency model recap, introduction to transactional memory.
1 Lecture 8: Transactional Memory – TCC Topics: “lazy” implementation (TCC)
Configurable System-on-Chip: Xilinx EDK
The Xilinx EDK Toolset: Xilinx Platform Studio (XPS) Building a base system platform.
CS 7810 Lecture 24 The Cell Processor H. Peter Hofstee Proceedings of HPCA-11 February 2005.
1 Computer Science, University of Warwick Architecture Classifications A taxonomy of parallel architectures: in 1972, Flynn categorised HPC architectures.
Lecture 7 Lecture 7: Hardware/Software Systems on the XUP Board ECE 412: Microcomputer Laboratory.
System Architecture A Reconfigurable and Programmable Gigabit Network Interface Card Jeff Shafer, Hyong-Youb Kim, Paul Willmann, Dr. Scott Rixner Rice.
A Flexible Architecture for Simulation and Testing (FAST) Multiprocessor Systems John D. Davis, Lance Hammond, Kunle Olukotun Computer Systems Lab Stanford.
DDM - A Cache-Only Memory Architecture Erik Hagersten, Anders Landlin and Seif Haridi Presented by Narayanan Sundaram 03/31/2008 1CS258 - Parallel Computer.
Simultaneous Multithreading: Maximizing On-Chip Parallelism Presented By: Daron Shrode Shey Liggett.
RiceNIC: A Reconfigurable and Programmable Gigabit Network Interface Card Jeff Shafer, Dr. Scott Rixner Rice Computer Architecture:
Research on Reconfigurable Computing Using Impulse C Carmen Li Shen Mentor: Dr. Russell Duren February 1, 2008.
CALTECH cs184c Spring DeHon CS184c: Computer Architecture [Parallel and Multithreaded] Day 8: April 26, 2001 Simultaneous Multi-Threading (SMT)
Performance Prediction for Random Write Reductions: A Case Study in Modelling Shared Memory Programs Ruoming Jin Gagan Agrawal Department of Computer and.
ECE200 – Computer Organization Chapter 9 – Multiprocessors.
COMP 111 Threads and concurrency Sept 28, Tufts University Computer Science2 Who is this guy? I am not Prof. Couch Obvious? Sam Guyer New assistant.
Caltech CS184 Spring DeHon 1 CS184b: Computer Architecture (Abstractions and Optimizations) Day 12: May 3, 2003 Shared Memory.
LAB1 Summary Zhaofeng SJTU.SOME. Embedded Software Tools CPU Logic Design Tools I/O FPGA Memory Logic Design Tools FPGA + Memory + IP + High Speed IO.
Spring 2003CSE P5481 Issues in Multiprocessors Which programming model for interprocessor communication shared memory regular loads & stores message passing.
CALTECH cs184c Spring DeHon CS184c: Computer Architecture [Parallel and Multithreaded] Day 10: May 8, 2001 Synchronization.
Shared Memory Consistency Models. SMP systems support shared memory abstraction: all processors see the whole memory and can perform memory operations.
Memory Consistency Models. Outline Review of multi-threaded program execution on uniprocessor Need for memory consistency models Sequential consistency.
Transactional Coherence and Consistency Presenters: Muhammad Mohsin Butt. (g ) Coe-502 paper presentation 2.
FPL Sept. 2, 2003 Software Decelerators Eric Keller, Gordon Brebner and Phil James-Roxby Xilinx Research Labs.
Introduction to Operating Systems and Concurrency.
Virtual Application Profiler (VAPP) Problem – Increasing hardware complexity – Programmers need to understand interactions between architecture and their.
Caltech CS184 Spring DeHon 1 CS184b: Computer Architecture (Abstractions and Optimizations) Day 22: May 20, 2005 Synchronization.
CALTECH cs184c Spring DeHon CS184c: Computer Architecture [Parallel and Multithreaded] Day 9: May 3, 2001 Distributed Shared Memory.
Execution Replay and Debugging. Contents Introduction Parallel program: set of co-operating processes Co-operation using –shared variables –message passing.
Multiprocessor  Use large number of processor design for workstation or PC market  Has an efficient medium for communication among the processor memory.
Network On Chip Cache Coherency Final presentation – Part A Students: Zemer Tzach Kalifon Ethan Kalifon Ethan Instructor: Walter Isaschar Instructor: Walter.
Transactional Memory Coherence and Consistency Lance Hammond, Vicky Wong, Mike Chen, Brian D. Carlstrom, John D. Davis, Ben Hertzberg, Manohar K. Prabhu,
Evaluating System-wide Monitoring Capsule Design Using Xilinx Virtex-II Pro FPGA Taeweon Suh Hsien-Hsin S. Lee Sally A. Mckee Taeweon Suh §, Hsien-Hsin.
CS5102 High Performance Computer Systems Thread-Level Parallelism
Transactional Memory : Hardware Proposals Overview
Memory Consistency Models
The University of Adelaide, School of Computer Science
The University of Adelaide, School of Computer Science
Memory Consistency Models
The University of Adelaide, School of Computer Science
CMSC 611: Advanced Computer Architecture
Transactional Memory Coherence and Consistency
The University of Adelaide, School of Computer Science
Changing thread semantics
Taeweon Suh §, Hsien-Hsin S. Lee §, Sally A. Mckee †,
Lecture 22: Consistency Models, TM
/ Computer Architecture and Design
Design and Implementation Issues for Atomicity
High Performance Computing
Chapter 4 Multiprocessors
The University of Adelaide, School of Computer Science
Lecture 17 Multiprocessors and Thread-Level Parallelism
Lecture 23: Virtual Memory, Multiprocessors
Lecture 17 Multiprocessors and Thread-Level Parallelism
Programming with Shared Memory Specifying parallelism
The University of Adelaide, School of Computer Science
Lecture 17 Multiprocessors and Thread-Level Parallelism
Presentation transcript:

Caltech CS184 Spring DeHon 1 CS184b: Computer Architecture (Abstractions and Optimizations) Day 25: May 27, 2005 Transactional Computing

Caltech CS184 Spring DeHon 2 Today Transactional Model Hardware Support Course Wrapup

Caltech CS184 Spring DeHon 3 Transactional Compute Model Computation is a collection of atomic transactions Transactions are preformed in some ordering –Specified sequential ordering –Some consistent sequential interleaving Looks like sequential (multithreaded) computation

Caltech CS184 Spring DeHon 4 Atomic Blocks Indivisible Cannot get interleaving of operations among blocks Each block –reads state at some point in time; –provides all its updates (writes) at that point

Caltech CS184 Spring DeHon 5 How Get? How do we get atomic blocks in our current models? Get parallelism with this model?

Caltech CS184 Spring DeHon 6 Simple/Sequential Version Acquire a mutex Perform transaction Release mutex

Caltech CS184 Spring DeHon 7 Parallel/Locking Version Read inputs, locking each as encounter Lock all outputs Perform operations Write outputs Unlock inputs/outputs OneChip Vector case is a single source/sink version of this

Caltech CS184 Spring DeHon 8 Parallel Locking Dangers What do we have to worry about in this version? Lock acquisition order  deadlock Parallelism?

Caltech CS184 Spring DeHon 9 Optimistic (Speculative) Transactions Pattern: speculate/rollback –Common case: non-interacting Read inputs Perform computations Check no one changed inputs –Got correct inputs?  write outputs atomicly –Inputs changed  rollback and try again

Caltech CS184 Spring DeHon 10 Optimistic Transactions Compare –advanced load –ll/sc –branch prediction Doesn’t need to be sequentiallized Just need to look sequential It is adequate to –detect sequentialization violation

Caltech CS184 Spring DeHon 11 Hardware Microarchitecture

Caltech CS184 Spring DeHon 12 How support in hardware? What need? –Keep/mark inputs –Hold on to outputs –Check inputs –Commit outputs

Caltech CS184 Spring DeHon 13 Microarchitecture Mark inputs in processor-local cache Writes into processor-local cache Write buffer for outputs –Hold until end of transaction Commit bus Snoop on bus for invalidations Dump write buffer to commit bus when turn to commit

Caltech CS184 Spring DeHon 14 Stanford TCC uArch [Hammond et al./ISCA2004]

Caltech CS184 Spring DeHon 15 uArch Concerns? Impact of commit sequentialization? Time spent rolling back? Size of cache/write-buffer needed?

Caltech CS184 Spring DeHon 16 Commit Traffic [Hammond et al./ISCA2004]

Caltech CS184 Spring DeHon 17 Where time go? [Hammond et al./ISCA2004]

Caltech CS184 Spring DeHon 18 Read State [Hammond et al./ISCA2004]

Caltech CS184 Spring DeHon 19 Write State Fig 7 [Hammond et al./ISCA2004]

Caltech CS184 Spring DeHon 20 Execution Time [Hammond et al./ASPLOS2004]

Caltech CS184 Spring DeHon 21 Other Patterns? Double Buffer Parallel Prefix for Commits

Caltech CS184 Spring DeHon 22 Programming Explicit –Parallel/transactional looping constructs –forks Add like futures –Force ordering to match sequential language spec and is “safe” Automatically add –Support “Mostly Functional” programming –“Make the fast case common”

Caltech CS184 Spring DeHon 23 Performance Issues Too many violations? –Force some transactions to wait on others Explicit/pragma/program construct Feedback-based –Emprical dataflow discovery?

Caltech CS184 Spring DeHon 24 Stanford TCC Prototyping (from Workshop on Architecture Research using FPGA Platforms at HPCA 2004)Workshop on Architecture Research using FPGA Platforms

25 C. Kozyrakis, WARFP, Feb ATLAS Overview A multi-board emulator for transactional parallel systems A multi-board emulator for transactional parallel systems Goals Goals 16 to 64 CPUs (8 to 32 boards) 16 to 64 CPUs (8 to 32 boards) 50 to 100MHz 50 to 100MHz Stand-alone full-feature system Stand-alone full-feature system OS, IDE disks, 100Mb Ethernet, … OS, IDE disks, 100Mb Ethernet, … ATLAS architecture space ATLAS architecture space Small, medium, and large-scale CMPs and SMPs Small, medium, and large-scale CMPs and SMPs UMA and NUMA UMA and NUMA Flexible transactional memory hierarchy & protocol Flexible transactional memory hierarchy & protocol Flexible network model Flexible network model Flexible clocking, latency, and bandwidth settings Flexible clocking, latency, and bandwidth settings

26 C. Kozyrakis, WARFP, Feb Building Block: Xilinx ML310 Board XC2VP30 FPGA features XC2VP30 FPGA features 2 PowerPC 405 cores 2 PowerPC 405 cores 2.4Mb dual-ported SRAM 2.4Mb dual-ported SRAM 30K logic cells 30K logic cells 8 RocketIO 3.125Gbps transceivers 8 RocketIO 3.125Gbps transceivers System features System features 256MB DDR, 512MB CompactFlash 256MB DDR, 512MB CompactFlash Ethernet, PCI, USB, IDE, … Ethernet, PCI, USB, IDE, … Design and development tools Design and development tools Foundation ISE for design entry, synthesis, … Foundation ISE for design entry, synthesis, … For the transactional memory hierarchy and network For the transactional memory hierarchy and network Chipscope Pro logic analyzer for debugging Chipscope Pro logic analyzer for debugging EDK for system simulation, system SW development, configuration, … EDK for system simulation, system SW development, configuration, … Montavista Linux 3.1 Pro Montavista Linux 3.1 Pro

27 C. Kozyrakis, WARFP, Feb Example: 2-way bus-based transactional CMP PowerPC 405 BRAM OCM BRAM OCM Logic BRAM PLB Logic Macro PLB

Caltech CS184 Spring DeHon 28 Big Ideas [Transactional] Model/Implementation –Only needs to appear sequentialized Speculation/Rollback Common Case

Caltech CS184 Spring DeHon 29 Model Review

Caltech CS184 Spring DeHon 30 Models Threads single multiple Single data Conv. Processors Multiple data Data Parallel (SIMD, Vector) No SM Shared Memory Side Effects No Side Effects (S)MTDataflow SM MP DSM Fine-grained threading Pure MP Streaming Dataflow (e.g. SCORE) Trans- actional

Caltech CS184 Spring DeHon 31 Big Ideas Model Implementation support model semantics –But may be very different, optimized Scaling and change Common case fast; fast case common –Measure to understand