MV5: A RECONFIGURABLE SIMULATOR FOR HETEROGENEOUS MULTICORE ARCHITECTURES
Jiayuan Meng*, Kevin Skadron, University of Virginia (* now at Argonne National Laboratory)

Presentation transcript:

MV5: A RECONFIGURABLE SIMULATOR FOR HETEROGENEOUS MULTICORE ARCHITECTURES
Jiayuan Meng*, Kevin Skadron
University of Virginia
* Now at Argonne National Laboratory

Simulators for Today's Architectures
- Out-of-Order (OO) core: SimpleScalar
- Simultaneous Multithreading: SMTSIM
- Chip-multiprocessor with OO cores: SESC
- Chip-multiprocessor with In-Order (IO), OO cores: Simics+Gems+Garnet, SimFlex
- Intel's Microarchitecture: PTLSim
- GPU: GPGPUSim

But Future is Unpredictable…
- General purpose / Heterogeneous / Integrated Accelerators?
- Diversity
- Modularity
- Scalability
- Co-design
- M5 provides such a platform
- MV5 is based on M5

So what is MV5?
- GPU-like SIMD (SIMT)
- Hardware thread scheduler
- API for SIMD threads
- Directory-based coherence (MESI, MSI)
- On-chip Network (Mesh)
- Simulation Management for space exploration
- M5 based

Single-Instruction, Multiple-Threads
[Figure: (a) The example program, with blocks A, B, C, D and the label "Post-dominator". (b) Branch divergence and re-convergence over time, annotated A / 11111111, B / 11111001, C / 00000110, D / 11111111.]

Break it, Use it!
- Separate basic cache functionalities with coherence protocols
[Figure labels: Core, L1 Cache, L2 Cache, DRAM, BlkState, DirState]

Potential Configurations
- MESI/MSI Dual Core
- Tiled Cores
- OO+SIMD
Note: SIMD cores can share the same address space with other cores over coherent caches!
[Figure labels: OO, SIMD, In-order, On-chip Network, caches, DRAM]

Do you need it?
If you want to explore:
- SIMD + IO/OO
- SIMD + coherent caches
- SIMD + OCN
If you are OK with:
- System emulation
- Kernels
[Other labels in this panel: Simple Banked Cache, Underlying middleware]

MV5 Website: https://sites.google.com/site/mv5sim/home
MV5 Mailing list: http://groups.google.com/group/mv5sim

Acknowledgements
This work was supported in part by SRC grant No. 1607, NSF grant nos. IIS-0612049 and CNS-0615277, a grant from Intel Research, and a professor partnership award from NVIDIA Research. We would like to thank Jeremy W. Sheaffer, David Tarjan, Shuai Che, and Jiawei Huang for their helpful inputs in power modeling, area estimation, and benchmark implementations.
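
Illustrative sketch of the SIMT figure: the transcript's branch divergence example (blocks A, B, C, D annotated 11111111, 11111001, 00000110, 11111111, with a post-dominator label) reads naturally as a set of per-block active-thread masks. The toy Python model below reproduces those four masks using a re-convergence stack. It is not MV5 code. The function names, the stack layout, and the choice to run B before C are assumptions made only for illustration.

```python
# Toy model of SIMT branch divergence with re-convergence at the
# post-dominator. Hypothetical sketch; none of these names come from MV5.

WIDTH = 8                      # SIMD width, matching the 8-bit masks on the poster
FULL = (1 << WIDTH) - 1        # mask with every thread active


def fmt(mask):
    """Render a thread mask as a fixed-width bit string."""
    return format(mask, "0{}b".format(WIDTH))


def run_diverging_branch(active, taken):
    """Trace the A -> {B, C} -> D example as (block, mask) pairs.

    active: mask of threads that execute block A
    taken:  mask of threads whose branch at the end of A goes to B
            (the remaining active threads go to C)
    """
    trace = [("A", active)]
    mask_b = active & taken
    mask_c = active & ~taken & FULL

    # Re-convergence stack. D post-dominates the branch, so it is pushed
    # first (popped last) with the full pre-branch mask. Each non-empty
    # side of the branch is pushed on top and runs with its own mask.
    stack = [("D", active)]
    if mask_c:
        stack.append(("C", mask_c))
    if mask_b:
        stack.append(("B", mask_b))   # assumption: B runs before C

    while stack:
        trace.append(stack.pop())
    return trace


if __name__ == "__main__":
    for block, mask in run_diverging_branch(0b11111111, 0b11111001):
        print(block, "/", fmt(mask))
```

Running the script with the poster's values prints one line per block in execution order. If every thread takes the same side of the branch, the other side is skipped entirely and only A, the taken block, and D appear in the trace.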