SYNAR Systems Networking and Architecture Group
CMPT 886: Architecture of Niagara I Processor
Dr. Alexandra Fedorova, School of Computing Science, SFU

Overview
– 8 cores
– 4 threads per core
– 3MB L2 cache (4 banks), 12-way, write-back
– One FPU per chip
© David Yen

Memory Latency Limits Performance
© David Yen

Hardware Multithreading
– While one thread is blocked on memory, the other threads continue computing
– The result is a higher number of instructions per cycle
© David Yen
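The claim above can be illustrated with a back-of-the-envelope model. The sketch below is not from the slides: it assumes an idealized single-issue pipeline in which every thread alternates between a fixed number of compute cycles and a fixed number of memory-stall cycles, and in which stalls from different threads overlap perfectly. The function name and the cycle counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def pipeline_utilization(num_threads, compute_cycles, stall_cycles):
    """Fraction of cycles in which a single-issue pipeline issues an instruction.

    Idealized assumption: each thread repeats `compute_cycles` of useful work
    followed by `stall_cycles` waiting on memory, and other threads can always
    fill the gap when they are ready.
    """
    per_thread = compute_cycles / (compute_cycles + stall_cycles)
    # A single-issue pipeline cannot exceed one instruction per cycle.
    return min(1.0, num_threads * per_thread)


# Hypothetical workload: 25 compute cycles for every 75 stall cycles.
for n in (1, 2, 4):
    print(n, pipeline_utilization(n, compute_cycles=25, stall_cycles=75))
```

Under these assumptions one thread keeps the pipeline busy a quarter of the time, while four threads are enough to hide the stalls completely. Real workloads have irregular stall patterns, so measured gains would be smaller.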

Eight Multithreaded Cores
© David Yen

Niagara Chip
© Poonacha Kongetira

Niagara Core
– 4 threads per core
– Multithreading increases core area by 20%
– 6-stage, single-issue, in-order pipeline
– IFU: instruction fetch unit
– LSU: load/store unit
– EXU: execution unit
– L1 D-cache: 4-way, 8KB, 16-byte line
– L1 I-cache: 4-way, 16KB, 32-byte line
– Why a simple in-order core? Why small caches?
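The L1 parameters listed above determine how many sets each cache has and how an address splits into offset and index bits. The following sketch works through that arithmetic. The helper name is hypothetical, and it assumes power-of-two sizes and a conventional set-associative layout, which the slides do not spell out.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Derive line count, set count, and address-bit split for a cache.

    Assumes size, associativity, and line size give power-of-two line and
    set counts, so bit_length() - 1 equals log2.
    """
    lines = size_bytes // line_bytes
    sets = lines // ways
    return {
        "lines": lines,
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,
        "index_bits": sets.bit_length() - 1,
    }


# Parameters taken from the Niagara Core slide.
l1_dcache = cache_geometry(8 * 1024, ways=4, line_bytes=16)
l1_icache = cache_geometry(16 * 1024, ways=4, line_bytes=32)
print("L1 D-cache:", l1_dcache)
print("L1 I-cache:", l1_icache)
```

Both caches come out at 512 lines in 128 sets. With four threads sharing a core, that leaves roughly 128 lines per thread if they were divided evenly, which makes the slide's question about small caches concrete.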

Switching Threads
– Fine-grained multithreading: the core switches between available threads every cycle, giving priority to the least recently executed thread
– Threads become unavailable due to:
  – Long-latency operations such as loads, branches, multiplies, and divides
  – Pipeline stalls such as cache misses, traps, and resource conflicts
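The selection policy described above can be sketched in a few lines. This is a simplified software model, not the hardware's actual thread-select logic. The function names and the availability trace are hypothetical, and breaking ties in favor of the lowest thread id is an assumption made here for determinism.

```python
def select_thread(available, last_issued):
    """Pick the available thread that issued longest ago, or None if none is ready.

    Ties (for example, threads that have never issued) go to the lowest
    thread id; that tie-break is an assumption of this sketch.
    """
    if not available:
        return None
    return min(sorted(available), key=lambda t: last_issued[t])


def simulate(availability_per_cycle, num_threads=4):
    """Return the thread chosen in each cycle (None marks an idle cycle)."""
    last_issued = [-1] * num_threads  # -1 means the thread has not issued yet
    schedule = []
    for cycle, available in enumerate(availability_per_cycle):
        chosen = select_thread(available, last_issued)
        if chosen is not None:
            last_issued[chosen] = cycle
        schedule.append(chosen)
    return schedule


# Hypothetical trace: thread 1 becomes unavailable (say, on a load miss)
# from cycle 5 onward, and no thread is ready in cycle 7.
trace = [
    {0, 1, 2, 3}, {0, 1, 2, 3}, {0, 1, 2, 3}, {0, 1, 2, 3},
    {0, 1, 2, 3}, {0, 2, 3}, {0, 2, 3}, set(),
]
schedule = simulate(trace)
print(schedule)  # [0, 1, 2, 3, 0, 2, 3, None]
```

While all four threads are ready, the policy degenerates to round-robin. Once thread 1 drops out, the remaining threads share the pipeline, and the pipeline idles only when no thread is available.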