A Research-Oriented Advanced Multicore Architecture Course
Julio Sahuquillo, Salvador Petit, Vicent Selfa, and María E. Gómez
May 25, 2015

A Research-Oriented Advanced Multicore Architecture Course
Julio Sahuquillo, Salvador Petit, Vicent Selfa, and María E. Gómez
May 25, 2015, Hyderabad, India

EduPar
Outline
– Introduction
– Proposed course
– Teaching methodologies
– Course contents
– Conclusions

Introduction
Typical organization of Computer Architecture courses:
– 1 introductory course
– ≥ 1 advanced courses, e.g., Parallel Computer Architectures, Memory Subsystems, …
Teaching up-to-date topics in advanced courses is key to motivating students.
Main problem: rapid technological advances → a huge effort is required of the instructor.

Introduction
Interesting courses are currently being proposed:
– Research in Parallel Computer Architecture, by Onur Mutlu at Carnegie Mellon
Instructor profile: broad and active research experience
– Key point: some of the instructor's own papers (recent or not) are discussed in class
→ Contents are continuously updated
→ Preparing the course does not require excessive effort

Proposed Course
Advanced Multicore Architectures (AMA)
– Inspired by the previous approach
– Fully research-oriented approach that tries to:
  1) provide up-to-date contents
  2) capture the students' interest
  3) enable students to do research
– 2015, UPV, Spain
– 16 classes × 2.5 hours = 40 h, in a classroom equipped with PCs

Proposed course: 5 commandments
1. Concentrate on a few topics and cover them in detail
2. Topics should focus on the key components of multicores: cores, caches, and main memory
3. Performance evaluation (for multicores)
4. For each studied component, instructors highlight the hot research topics from both academia and industry
5. Select the proper methodology

Choosing the teaching methodologies
They should help achieve the course goals:
– Make the course attractive to students
– Provide a sound understanding of the studied topics
– Enable students to do research → study a few topics
Teaching methods used:
– Research-oriented lectures
– Practical exercises
– Realistic labs
– Course work
[Chart: 60%, 15%, 25%; the segment labels were not recovered]

Teaching methods
Lectures
– Review and study theoretical concepts → enable students to discuss papers
Research-based exercises: problems that students will likely face after graduation
– e.g., energy consumption (CACTI), confidence intervals, …
– ½ hour to 1 hour
Lab sessions: provide students with the skills to work on a research simulation framework (used in academia and industry)
– Guided by instructors
– 2.5 hours
Course work: a complex implementation in the simulator
– Students should show their autonomy → not guided
– 20% of the final grade
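
One of the research-based exercises is computing confidence intervals. As a minimal sketch, assuming a student has collected IPC results from several simulation runs with different random seeds (the values below are hypothetical), the 95% confidence interval for the mean is mean ± t·s/√n with the Student-t critical value for n − 1 degrees of freedom. The function name `confidence_interval_95` and the small hard-coded t table are our own choices for illustration.

```python
import math
import statistics

# Two-sided 95% Student-t critical values t_{0.975, df} for small samples.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def confidence_interval_95(samples):
    """Return (mean, low, high) of a 95% CI for the mean of 2..11 samples."""
    n = len(samples)
    if not 2 <= n <= len(T_975) + 1:
        raise ValueError("this sketch only tabulates t values for 2..11 samples")
    mean = statistics.mean(samples)
    s = statistics.stdev(samples)  # sample standard deviation (divides by n - 1)
    half_width = T_975[n - 1] * s / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


if __name__ == "__main__":
    # Hypothetical IPC values from five simulation runs with different seeds.
    ipc_runs = [1.42, 1.38, 1.45, 1.40, 1.44]
    mean, low, high = confidence_interval_95(ipc_runs)
    print(f"mean IPC = {mean:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

If the intervals of two configurations overlap, that is a hint that more runs are needed before claiming a difference; the interval narrows roughly as 1/√n.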

Course contents
– MODULE 0. Course Presentation
– MODULE 1. Core review and multicores
– MODULE 2. Performance
– MODULE 3. Caching
– MODULE 4. Main Memory

Module 1. Core review and multicores
– Topic 1.1. Advanced Microarchitectural Concepts (review)
– Topic 1.2. Multicore Processors: Why?
– Topic 1.3. Multicore Evolution and Design
[Figure: commercial multicores: IBM Cell BE (8+1 cores), Intel Core i7 (8 cores), Tilera TILE-Gx (100 cores, networked), IBM POWER7 (8 cores), Intel SCC (48 cores, networked), Nvidia Fermi (448 “cores”), AMD Barcelona (4 cores), Sun Niagara II (8 cores)]
Topic 1.1
– Commercial processors: the Alpha pipeline (physical register file, ROB, load/store unit)
– SMT processors (focus: where stalls can arise)
– Paper assignment, to be discussed in the next class:
  – The Case for a Single-Chip Multiprocessor
  – Complexity-Effective Superscalar Processors
Topic 1.2
– Bigger cores vs. more cores; superscalar complexity
– Alternative architectures
– Paper discussion
Topic 1.3
– Different multicores: simple cores, complex cores, etc.
– Amdahl's Law for multicores, based on Mark Hill's talk “Amdahl's Law in the Multicore Era”
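
Topic 1.3 builds on Hill and Marty's "Amdahl's Law in the Multicore Era". A short sketch of its three chip models follows, assuming, as in that work, a chip budget of n base core equivalents (BCEs), a core built from r BCEs with sequential performance perf(r) = √r, and a parallelizable fraction f. The function names are hypothetical.

```python
import math


def perf(r: float) -> float:
    """Sequential performance of a core built from r BCEs (assumed sqrt(r))."""
    return math.sqrt(r)


def speedup_symmetric(f: float, n: int, r: int) -> float:
    """n/r identical cores of r BCEs each; one core runs the serial part."""
    return 1.0 / ((1.0 - f) / perf(r) + f * r / (perf(r) * n))


def speedup_asymmetric(f: float, n: int, r: int) -> float:
    """One big core of r BCEs plus (n - r) one-BCE cores; all run the parallel part."""
    return 1.0 / ((1.0 - f) / perf(r) + f / (perf(r) + n - r))


def speedup_dynamic(f: float, n: int, r: int) -> float:
    """r BCEs fused for serial phases; all n BCEs act as small cores in parallel phases."""
    return 1.0 / ((1.0 - f) / perf(r) + f / n)


if __name__ == "__main__":
    f, n = 0.9, 16
    print(" r  symmetric  asymmetric  dynamic")
    for r in (1, 2, 4, 8, 16):
        print(f"{r:2d}  {speedup_symmetric(f, n, r):9.2f}  "
              f"{speedup_asymmetric(f, n, r):10.2f}  {speedup_dynamic(f, n, r):7.2f}")
```

With f = 0.9 and n = 16, `speedup_symmetric(0.9, 16, 1)` reproduces classic Amdahl's Law (6.4), while r = 16 (a single big core) gives √16 = 4 regardless of f; sweeping r exposes the bigger-cores vs. more-cores trade-off from Topic 1.2.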

Module 2. Performance
– Topic 2.1. Performance Evaluation Metrics
– Topic 2.2. Performance Accounting Architectures
Metrics illustrated on the slide: Individual Slowdown (quantifies the performance degradation due to multicore execution) and Unfairness
Topic 2.1
– Multicores are not evaluated like single cores → specific performance metrics are required
– S. Eyerman and L. Eeckhout, “Restating the case for weighted-IPC metrics to evaluate multiprogram workload performance,” IEEE Comput. Archit. Lett., vol. 99, p. 1
– P. Michaud, “Demystifying multicore throughput metrics,” IEEE Comput. Archit. Lett., vol. 12, no. 2, pp. 63–66
Topic 2.2
– Accounting architectures allow students to gain a sound understanding of where performance can be lost
– Focus: single-threaded cores, multicores, and SMT cores
– S. Eyerman et al., “A performance counter architecture for computing accurate CPI components,” in ASPLOS, 2006
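
The formulas behind Individual Slowdown and Unfairness were not recoverable from the slide. The sketch below uses one common set of definitions, assuming each program's IPC is measured running alone (IPC_alone) and in the multicore mix (IPC_shared): slowdown = IPC_alone / IPC_shared and unfairness = max slowdown / min slowdown. It also computes weighted speedup and the harmonic mean of normalized IPCs, throughput metrics of the kind discussed in the papers cited under Topic 2.1. The function names and IPC values are hypothetical.

```python
from typing import List, Sequence


def slowdowns(ipc_alone: Sequence[float], ipc_shared: Sequence[float]) -> List[float]:
    """Individual slowdown of each program: IPC alone / IPC in the multicore mix."""
    return [alone / shared for alone, shared in zip(ipc_alone, ipc_shared)]


def unfairness(ipc_alone: Sequence[float], ipc_shared: Sequence[float]) -> float:
    """Largest over smallest individual slowdown (1.0 means perfectly fair)."""
    sd = slowdowns(ipc_alone, ipc_shared)
    return max(sd) / min(sd)


def weighted_speedup(ipc_alone: Sequence[float], ipc_shared: Sequence[float]) -> float:
    """Sum of per-program normalized IPCs (IPC_shared / IPC_alone)."""
    return sum(shared / alone for alone, shared in zip(ipc_alone, ipc_shared))


def harmonic_speedup(ipc_alone: Sequence[float], ipc_shared: Sequence[float]) -> float:
    """Harmonic mean of the per-program normalized IPCs."""
    return len(ipc_alone) / sum(slowdowns(ipc_alone, ipc_shared))


if __name__ == "__main__":
    # Hypothetical IPCs for a two-program workload on a dual-core chip.
    alone, shared = [2.0, 1.0], [1.0, 0.8]
    print("slowdowns:", slowdowns(alone, shared))
    print("unfairness:", unfairness(alone, shared))
    print("weighted speedup:", weighted_speedup(alone, shared))
    print("harmonic speedup:", harmonic_speedup(alone, shared))
```

For these hypothetical IPCs the slowdowns are 2.0 and 1.25, so unfairness is 1.6 and weighted speedup is 1.3.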

Module 3. Caching
– Topic 3.1. Advanced Caching: Concepts and Problems
– Topic 3.2. Advanced Caching: Papers
Topic 3.1
– Concepts: support for outstanding misses, lockup-free caches, MLP, …
– Problems: shared caches, QoS management, sharing vs. partitioning
Topic 3.2
– Cache partitioning: the utility-based partitioning paper [19]
– Insertion policies: The evicted-address filter: A unified mechanism to address both cache pollution and thrashing [20]
– Replacement: A case for MLP-aware cache replacement [21]
[Figure: four cores (CORE 0 to CORE 3) sharing an L2 cache, connected to the DRAM memory controller]
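
Utility-based cache partitioning [19] assigns ways according to each application's marginal utility, i.e., the misses saved per extra way. Below is a sketch of a lookahead-style allocation, assuming the per-application miss curves are already known (in hardware they would be estimated by utility monitors). The curves and the function name are hypothetical, and this is our reading of the idea rather than the paper's exact hardware algorithm.

```python
from typing import List


def lookahead_partition(misses: List[List[float]], total_ways: int) -> List[int]:
    """Split total_ways cache ways among applications.

    misses[i][w] is the (estimated) number of misses of application i when it
    owns w ways, for w = 0..total_ways (a non-increasing miss curve).
    """
    if any(len(curve) != total_ways + 1 for curve in misses):
        raise ValueError("each miss curve needs total_ways + 1 points")
    alloc = [0] * len(misses)
    balance = total_ways
    while balance > 0:
        best_app, best_mu, best_extra = None, 0.0, 0
        for i, curve in enumerate(misses):
            a = alloc[i]
            # Look ahead over every possible extra allocation, not just one way.
            for extra in range(1, balance + 1):
                mu = (curve[a] - curve[a + extra]) / extra  # misses saved per way
                if mu > best_mu:  # strict '>': ties keep the smaller extra / earlier app
                    best_app, best_mu, best_extra = i, mu, extra
        if best_app is None:
            # No application gains from more ways: give the leftovers to app 0
            # (an arbitrary choice made for this sketch).
            alloc[0] += balance
            break
        alloc[best_app] += best_extra
        balance -= best_extra
    return alloc


if __name__ == "__main__":
    # Hypothetical miss curves for two applications sharing a 4-way cache.
    curves = [
        [100, 50, 40, 35, 33],  # app 0: most benefit comes from its first way
        [100, 90, 80, 20, 10],  # app 1: needs 3 ways before its working set fits
    ]
    print(lookahead_partition(curves, 4))
```

With these curves the function returns [1, 3] (50 + 20 = 70 misses). A naive greedy that hands out one way at a time would end at [2, 2] (120 misses), because application 1 only benefits once it owns three ways; looking ahead over larger allocations avoids this trap.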

Module 4. Main memory
– Topic 4.1. Main Memory Organization
– Topic 4.2. Main Memory Scheduling
Topic 4.1
– Bottom-up approach: cell → cell array → bank → chip → rank → DIMM
Topic 4.2
– Row buffer and the memory request queue at the memory controller (MC)
– Policies: FCFS, FR-FCFS
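
To contrast FCFS and FR-FCFS, the following toy model assumes a single bank, all requests already waiting in the MC queue, an open-page policy, and two hypothetical fixed latencies for row-buffer hits and misses. It ignores the DRAM timing constraints and starvation limits that real controllers must enforce; the names `Request`, `service_time`, `fcfs`, and `fr_fcfs` are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical latencies (in MC cycles) for a single DRAM bank.
ROW_HIT_CYCLES = 15   # column access to the already-open row
ROW_MISS_CYCLES = 45  # row not open: (precharge +) activate + column access


@dataclass
class Request:
    req_id: int  # arrival order: lower id = older request
    row: int     # DRAM row targeted by the request


def service_time(order: List[Request], open_row: Optional[int] = None) -> int:
    """Total cycles to serve requests in the given order, tracking the row buffer."""
    total = 0
    for req in order:
        total += ROW_HIT_CYCLES if req.row == open_row else ROW_MISS_CYCLES
        open_row = req.row  # open-page policy: the accessed row stays open
    return total


def fcfs(queue: List[Request]) -> List[Request]:
    """First-Come First-Served: strictly arrival order."""
    return list(queue)


def fr_fcfs(queue: List[Request], open_row: Optional[int] = None) -> List[Request]:
    """First-Ready FCFS: oldest row-buffer hit first, otherwise the oldest request."""
    pending = list(queue)  # kept oldest-first
    order = []
    while pending:
        hit = next((r for r in pending if r.row == open_row), None)
        pick = hit if hit is not None else pending[0]
        pending.remove(pick)
        order.append(pick)
        open_row = pick.row
    return order


if __name__ == "__main__":
    queue = [Request(i, row) for i, row in enumerate([1, 2, 1, 2, 1])]
    for name, policy in (("FCFS", fcfs), ("FR-FCFS", fr_fcfs)):
        order = policy(queue)
        print(name, [r.req_id for r in order], service_time(order), "cycles")
```

For the row sequence 1, 2, 1, 2, 1, FCFS pays five row misses (225 cycles), whereas FR-FCFS serves requests 0, 2, 4, 1, 3 with only two row misses (135 cycles), at the cost of delaying the older requests to row 2.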

Conclusions
– The fully research-oriented AMA course has been presented
  – Methodology: lectures, exercises, labs, and course work
– It is important to train students with exercises and simulators
– We should reduce the time devoted to theoretical concepts, but... theoretical concepts should still be studied in detail
– Results: some of the course works (or extended versions of them) have been published in top research conferences such as IPDPS or PACT
– Most students of the course have gone on to pursue their PhD with us
– However… the approach is only applicable to a relatively small number of students

Thanks!