ECE/CS 752: Midterm 2 Review ECE/CS 752 Fall 2017

Slides:



Advertisements
Similar presentations
Advanced Caches Prof. Mikko H. Lipasti University of Wisconsin-Madison Lecture notes based on notes by John P. Shen and Mark Hill Updated by Mikko Lipasti.
Advertisements

AMD OPTERON ARCHITECTURE Omar Aragon Abdel Salam Sayyad This presentation is missing the references used.
1 MacSim Tutorial (In ISCA-39, 2012). Thread fetch policies Branch predictor Thread fetch policies Branch predictor Software and Hardware prefetcher Cache.
Scalable Load and Store Processing in Latency Tolerant Processors Amit Gandhi 1,2 Haitham Akkary 1 Ravi Rajwar 1 Srikanth T. Srinivasan 1 Konrad Lai 1.
Single-Chip Multiprocessor Nirmal Andrews. Case for single chip multiprocessors Advances in the field of integrated chip processing. - Gate density (More.
Transactional Memory (TM) Evan Jolley EE 6633 December 7, 2012.
SMT Parallel Applications –For one program, parallel executing threads Multiprogrammed Applications –For multiple programs, independent threads.
Review of CS 203A Laxmi Narayan Bhuyan Lecture2.
CS Lecture 24 Exceeding the Dataflow Limit via Value Prediction M.H. Lipasti, J.P. Shen Proceedings of MICRO-29 December 1996.
Simultaneous Multithreading: Multiplying Alpha Performance Dr. Joel Emer Principal Member Technical Staff Alpha Development Group Compaq.
The Vector-Thread Architecture Ronny Krashinsky, Chris Batten, Krste Asanović Computer Architecture Group MIT Laboratory for Computer Science
18-447: Computer Architecture Lecture 30B: Multiprocessors Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/22/2013.
ECE/CS 752: Midterm 2 Review Instructor:Mikko H Lipasti Spring 2012 University of Wisconsin-Madison.
Advanced Microarchitecture Prof. Mikko H. Lipasti University of Wisconsin-Madison Lecture notes based on notes by Ilhyun Kim Updated by Mikko Lipasti.
CPE731: Advanced Computer Architecture Course Introduction Dr. Gheith Abandah د. غيث علي عبندة.
SafetyNet: improving the availability of shared memory multiprocessors with global checkpoint/recovery Daniel J. Sorin, Milo M. K. Martin, Mark D. Hill,
Niagara: a 32-Way Multithreaded SPARC Processor
YEAR 2006 The University of Auckland | New Zealand PRESENTATION Computer Science 703 Advance Computer Architecture 2006 Semester 1 Preparation for Test.
© Wen-mei Hwu and S. J. Patel, 2005 ECE 511, University of Illinois Lecture 4: Microarchitecture: Overview and General Trends.
Computer Architecture: Multithreading (I) Prof. Onur Mutlu Carnegie Mellon University.
Computer Structure 2015 – Intel ® Core TM μArch 1 Computer Structure Multi-Threading Lihu Rappoport and Adi Yoaz.
Computer Architecture Lecture 27: Multiprocessors Prof. Onur Mutlu Carnegie Mellon University Spring 2015, 4/6/2015.
15-740/ Computer Architecture Lecture 12: Issues in OoO Execution Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 10/7/2011.
Computer Architecture: Multi-Core Processors: Why? Prof. Onur Mutlu Carnegie Mellon University.
Elec/Comp 526 Spring 2015 High Performance Computer Architecture Instructor Peter Varman DH 2022 (Duncan Hall) rice.edux3990 Office Hours Tue/Thu.
Fall 2012 Parallel Computer Architecture Lecture 4: Multi-Core Processors Prof. Onur Mutlu Carnegie Mellon University 9/14/2012.
18-447: Computer Architecture Lecture 30B: Multiprocessors
Prof. Onur Mutlu Carnegie Mellon University
Microarchitecture.
Computer Architecture: Parallel Processing Basics
Prof. Onur Mutlu Carnegie Mellon University 10/12/2012
The Multikernel: A New OS Architecture for Scalable Multicore Systems
Multiscalar Processors
ECE/CS 757: Advanced Computer Architecture II Midterm 2 Review
Prof. Onur Mutlu Carnegie Mellon University
Simultaneous Multithreading
Simultaneous Multithreading
CS 704 Advanced Computer Architecture
Computer Structure Multi-Threading
Main Memory ECE/CS 752 Fall 2017
The University of Adelaide, School of Computer Science
CPE731: Advanced Computer Architecture Course Introduction
Lecture 18: Coherence and Synchronization
ECE 4100/ Advanced Computer Architecture Sudhakar Yalamanchili
Jason F. Cantin, Mikko H. Lipasti, and James E. Smith
Why we use banked Instruction Cache
Advanced Microarchicture ECE/CS 752 Fall 2017
Executing Multiple Threads ECE/CS 752 Fall 2017
Gokul Ravi, Mikko H. Lipasti Electrical and Computer Engineering
Advanced Caches ECE/CS 752 Fall 2017
Computer Architecture: Multithreading (I)
Instruction scheduling
Lecture 11: Memory Data Flow Techniques
ECE/CS 757: Advanced Computer Architecture II
Coe818 Advanced Computer Architecture
Advanced Microarchitecture
15-740/ Computer Architecture Lecture 14: Runahead Execution
/ Computer Architecture and Design
Prof. Onur Mutlu Carnegie Mellon University
Lecture 25: Multiprocessors
EE 4xx: Computer Architecture and Performance Programming
15-740/ Computer Architecture Lecture 14: Prefetching
The Vector-Thread Architecture
Overview Prof. Eric Rotenberg
Advanced Architecture +
Lecture 17 Multiprocessors and Thread-Level Parallelism
Lois Orosa, Rodolfo Azevedo and Onur Mutlu
Transparent Control Independence (TCI)
Handling Stores and Loads
Lecture 17 Multiprocessors and Thread-Level Parallelism
Presentation transcript:

ECE/CS 752: Midterm 2 Review ECE/CS 752 Fall 2017 Prof. Mikko H. Lipasti University of Wisconsin-Madison Lecture notes based on notes by John P. Shen Updated by Mikko Lipasti

Midterm 2 Review Topics Advanced Caches (Lect 11) Main Memory (Lect 12) Advanced Microarchitecture (Lect 13) Multiple Threads (Lect 14) ECE/CS 752: Advanced Computer Architecture I

Readings-Advanced Caches Read on your own: Review: Shen & Lipasti Chapter 3 W.-H. Wang, J.-L. Baer, and H. M. Levy. “Organization of a two-level virtual-real cache hierarchy,” Proc. 16th ISCA, pp. 140-148, June 1989 (B6) Online PDF Read Sec. 1, skim Sec. 2, read Sec. 3: Bruce Jacob, “The Memory System: You Can't Avoid It, You Can't Ignore It, You Can't Fake It,” Synthesis Lectures on Computer Architecture 2009 4:1, 1-77. Online PDF To be discussed in class: Review #1 due 11/1/2017: Andreas Sembrant, Erik Hagersten, David Black-Schaffer, “The Direct-to-Data (D2D) cache: navigating the cache hierarchy with a single lookup,” Proc. ISCA 2014, June 2014.. Online PDF Review #2 due 11/3/2017: Jishen Zhao, Sheng Li, Doe Hyun Yoon, Yuan Xie, and Norman P. Jouppi. 2013. Kiln: closing the performance gap between systems with and without persistence support. In Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-46). ACM, New York, NY, USA, 421-432. Online PDF Review #3 due 11/6/2017: T. Shaw, M. Martin, A. Roth, “NoSQ: Store-Load Communication without a Store Queue,” in Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, 2006. Online PDF

Advanced Memory Hierarchy Coherent Memory Interface Evaluation methods Analytical, trace-driven, execution-driven. Simpoint selection Better miss rate: skewed associative caches, victim caches Reducing miss costs through software restructuring Beyond simple blocks Two level caches

Readings-Main Memory Read on your own: To be discussed in class: Review: Shen & Lipasti Chapter 3 W.-H. Wang, J.-L. Baer, and H. M. Levy. “Organization of a two-level virtual-real cache hierarchy,” Proc. 16th ISCA, pp. 140-148, June 1989 (B6) Online PDF Read Sec. 1, skim Sec. 2, read Sec. 3: Bruce Jacob, “The Memory System: You Can't Avoid It, You Can't Ignore It, You Can't Fake It,” Synthesis Lectures on Computer Architecture 2009 4:1, 1-77. Online PDF To be discussed in class: Review #1 due 11/1/2017: Andreas Sembrant, Erik Hagersten, David Black-Schaffer, “The Direct-to-Data (D2D) cache: navigating the cache hierarchy with a single lookup,” Proc. ISCA 2014, June 2014.. Online PDF Review #2 due 11/3/2017: Jishen Zhao, Sheng Li, Doe Hyun Yoon, Yuan Xie, and Norman P. Jouppi. 2013. Kiln: closing the performance gap between systems with and without persistence support. In Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-46). ACM, New York, NY, USA, 421-432. Online PDF Review #3 due 11/6/2017: T. Shaw, M. Martin, A. Roth, “NoSQ: Store-Load Communication without a Store Queue,” in Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, 2006. Online PDF

Outline: Main Memory DRAM chips Memory organization Interleaving Banking Memory controller design Hybrid Memory Cube Persistent Memory Virtual memory TLBs Interaction of caches and virtual memory (Wang et al.) Large pages, virtualization

Readings-Adv. Microarch. Read on your own: I. Kim and M. Lipasti, “Understanding Scheduling Replay Schemes,” in Proceedings of the 10th International Symposium on High-performance Computer Architecture (HPCA-10), February 2004. Srikanth Srinivasan, Ravi Rajwar, Haitham Akkary, Amit Gandhi, and Mike Upton, “Continual Flow Pipelines”, in Proceedings of ASPLOS 2004, October 2004. Ahmed S. Al-Zawawi, Vimal K. Reddy, Eric Rotenberg, Haitham H. Akkary, “Transparent Control Independence,” in Proceedings of ISCA-34, 2007. To be discussed in class: Review by 11/6/2017: T. Shaw, M. Martin, A. Roth, “NoSQ: Store-Load Communication without a Store Queue, ” in Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, 2006. Review by 11/8/2017: Andreas Sembrant et al., “Long term parking (LTP): criticality-aware resource allocation in OOO processors,” Proc of MICRO-48, December 2015. Review by 11/10/2017: Arthur Perais and André Seznec. 2014. EOLE: paving the way for an effective implementation of value prediction. In Proceeding of the 41st Annual International Symposium on Computer Architecture (ISCA '14). IEEE Press, Piscataway, NJ, USA, 481-492.  Online PDF ECE/CS 752: Advanced Computer Architecture I

Advanced Microarchitecture Memory Data Flow Scalable Load/Store Queues Memory-level parallelism (MLP) Register Data Flow Instruction scheduling overview Scheduling atomicity Speculative scheduling Scheduling recovery EOLE: Effective Implementation of Value Prediction Instruction Flow Revolver: efficient loop execution Transparent Control Independence ECE/CS 752: Advanced Computer Architecture I

Readings-Multiple Threads Read on your own: Shen & Lipasti Chapter 11 G. S. Sohi, S. E. Breach and T.N. Vijaykumar. Multiscalar Processors, Proc. 22nd Annual International Symposium on Computer Architecture, June 1995. Dean M. Tullsen, Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo, and Rebecca L. Stamm. Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor, Proc. 23rd Annual International Symposium on Computer Architecture, May 1996 (B5) To be discussed in class: Review #6 due 11/17/2017: Y.-H. Chen, J. Emer, V. Sze, "Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks," International Symposium on Computer Architecture (ISCA), pp. 367-379, June 2016. Online PDF ECE/CS 752: Advanced Computer Architecture I

Executing Multiple Threads Thread-level parallelism Synchronization Multiprocessors: coherence, consistency Explicit multithreading Data parallel architectures Multicore interconnects Implicit multithreading: Multiscalar Niagara case study ECE/CS 752: Advanced Computer Architecture I