SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY Ocelot and the SST-MacSim Simulator Genie.

Slides:



Advertisements
Similar presentations
Automatic Data Movement and Computation Mapping for Multi-level Parallel Architectures with Explicitly Managed Memories Muthu Baskaran 1 Uday Bondhugula.
Advertisements

Accelerators for HPC: Programming Models Accelerators for HPC: StreamIt on GPU High Performance Applications on Heterogeneous Windows Clusters
Vectors, SIMD Extensions and GPUs COMP 4611 Tutorial 11 Nov. 26,
Parallel Programming and Algorithms : A Primer Kishore Kothapalli IIIT-H Workshop on Multi-core Technologies International Institute.
Multiprocessors— Large vs. Small Scale Multiprocessors— Large vs. Small Scale.
Lecture 6: Multicore Systems
©2009 HP Confidential template rev Ed Turkel Manager, WorldWide HPC Marketing 4/7/2011 BUILDING THE GREENEST PRODUCTION SUPERCOMPUTER IN THE.
1 MacSim Tutorial (In ISCA-39, 2012). Thread fetch policies Branch predictor Thread fetch policies Branch predictor Software and Hardware prefetcher Cache.
System Simulation Of 1000-cores Heterogeneous SoCs Shivani Raghav Embedded System Laboratory (ESL) Ecole Polytechnique Federale de Lausanne (EPFL)
GPU System Architecture Alan Gray EPCC The University of Edinburgh.
Early Linpack Performance Benchmarking on IPE Mole-8.5 Fermi GPU Cluster Xianyi Zhang 1),2) and Yunquan Zhang 1),3) 1) Laboratory of Parallel Software.
GPGPU Introduction Alan Gray EPCC The University of Edinburgh.
2015/6/21\course\cpeg F\Topic-1.ppt1 CPEG 421/621 - Fall 2010 Topics I Fundamentals.
1 Dr. Frederica Darema Senior Science and Technology Advisor NSF Future Parallel Computing Systems – what to remember from the past RAMP Workshop FCRC.
CUDA Programming Lei Zhou, Yafeng Yin, Yanzhi Ren, Hong Man, Yingying Chen.
The PTX GPU Assembly Simulator and Interpreter N.M. Stiffler Zheming Jin Ibrahim Savran.
IBM RS/6000 SP POWER3 SMP Jari Jokinen Pekka Laurila.
Heterogeneous Computing Dr. Jason D. Bakos. Heterogeneous Computing 2 “Traditional” Parallel/Multi-Processing Large-scale parallel platforms: –Individual.
1 petaFLOPS+ in 10 racks TB2–TL system announcement Rev 1A.
HPArch Research Group. |Part 2. Overview of MacSim Introduction For black box approach users |Part 3: Details of MacSim For computer architecture researchers.
GPGPU overview. Graphics Processing Unit (GPU) GPU is the chip in computer video cards, PS3, Xbox, etc – Designed to realize the 3D graphics pipeline.
Scheduling of Tiled Nested Loops onto a Cluster with a Fixed Number of SMP Nodes Maria Athanasaki, Evangelos Koukis, Nectarios Koziris National Technical.
Synergy.cs.vt.edu Power and Performance Characterization of Computational Kernels on the GPU Yang Jiao, Heshan Lin, Pavan Balaji (ANL), Wu-chun Feng.
Operating Systems Should Manage Accelerators Sankaralingam Panneerselvam Michael M. Swift Computer Sciences Department University of Wisconsin, Madison,
GPU Programming with CUDA – Accelerated Architectures Mike Griffiths
Technion - Israel institute of technology department of Electrical Engineering High speed digital systems laboratory Super Computer System Midterm presentation.
An approach for solving the Helmholtz Equation on heterogeneous platforms An approach for solving the Helmholtz Equation on heterogeneous platforms G.
MacSim Tutorial (In ICPADS 2013) 1. |The Structural Simulation Toolkit: A Parallel Architectural Simulator (for HPC) A parallel simulation environment.
(1) ECE 8823: GPU Architectures Sudhakar Yalamanchili School of Electrical and Computer Engineering Georgia Institute of Technology NVIDIA Keplar.
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY Accelerating Simulation of Agent-Based Models on Heterogeneous Architectures.
Extracted directly from:
By Arun Bhandari Course: HPC Date: 01/28/12. GPU (Graphics Processing Unit) High performance many core processors Only used to accelerate certain parts.
COMPUTER SCIENCE &ENGINEERING Compiled code acceleration on FPGAs W. Najjar, B.Buyukkurt, Z.Guo, J. Villareal, J. Cortes, A. Mitra Computer Science & Engineering.
Software Pipelining for Stream Programs on Resource Constrained Multi-core Architectures IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEM 2012 Authors:
High Performance Computing Processors Felix Noble Mirayma V. Rodriguez Agnes Velez Electric and Computer Engineer Department August 25, 2004.
Taking the Complexity out of Cluster Computing Vendor Update HPC User Forum Arend Dittmer Director Product Management HPC April,
1 Towards Optimal Custom Instruction Processors Wayne Luk Kubilay Atasu, Rob Dimond and Oskar Mencer Department of Computing Imperial College London HOT.
(1) ECE 3056: Architecture, Concurrency and Energy in Computation Lecture Notes by MKP and Sudhakar Yalamanchili Sudhakar Yalamanchili (Some small modifications.
HPC Business update HP Confidential – CDA Required
VTU – IISc Workshop Compiler, Architecture and HPC Research in Heterogeneous Multi-Core Era R. Govindarajan CSA & SERC, IISc
GPU Architecture and Programming
HPArch Research Group. |Part III: Overview of MacSim Features of MacSim Basic MacSim architecture How to simulate architectures with MacSim |Part IV:
1 Latest Generations of Multi Core Processors
Harmony: A Run-Time for Managing Accelerators Sponsor: LogicBlox Inc. Gregory Diamos and Sudhakar Yalamanchili.
GPUs: Overview of Architecture and Programming Options Lee Barford firstname dot lastname at gmail dot com.
ARCHES: GPU Ray Tracing I.Motivation – Emergence of Heterogeneous Systems II.Overview and Approach III.Uintah Hybrid CPU/GPU Scheduler IV.Current Uintah.
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY HPCDB Satisfying Data-Intensive Queries Using GPU Clusters November.
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY MANIFOLD Manifold Execution Model and System.
An Execution Model for Heterogeneous Multicore Architectures Gregory Diamos, Andrew Kerr, and Sudhakar Yalamanchili Computer Architecture and Systems Laboratory.
EECS 583 – Class 21 Research Topic 3: Compilation for GPUs University of Michigan December 12, 2011 – Last Class!!
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. HP Update IDC HPC Forum.
Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation,
Program Optimizations and Recent Trends in Heterogeneous Parallel Computing Dušan Gajić, University of Niš Program Optimizations and Recent Trends in Heterogeneous.
SSQSA present and future Gordana Rakić, Zoran Budimac Department of Mathematics and Informatics Faculty of Sciences University of Novi Sad
University of Michigan Electrical Engineering and Computer Science Adaptive Input-aware Compilation for Graphics Engines Mehrzad Samadi 1, Amir Hormati.
EU-Russia Call Dr. Panagiotis Tsarchopoulos Computing Systems ICT Programme European Commission.
CS203 – Advanced Computer Architecture Computer Architecture Simulators.
BLUE GENE Sunitha M. Jenarius. What is Blue Gene A massively parallel supercomputer using tens of thousands of embedded PowerPC processors supporting.
Heterogeneous Processing KYLE ADAMSKI. Overview What is heterogeneous processing? Why it is necessary Issues with heterogeneity CPU’s vs. GPU’s Heterogeneous.
APE group Many-core platforms and HEP experiments computing XVII SuperB Workshop and Kick-off Meeting Elba, May 29-June 1,
Towards a High Performance Extensible Grid Architecture Klaus Krauter Muthucumaru Maheswaran {krauter,
CS203 – Advanced Computer Architecture
Lynn Choi School of Electrical Engineering
课程名 编译原理 Compiling Techniques
Structural Simulation Toolkit / Gem5 Integration
An Overview of the ITTC Networking & Distributed Systems Laboratory
Coe818 Advanced Computer Architecture
Introduction to Heterogeneous Parallel Computing
ECE 8823: GPU Architectures
Presentation transcript:

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY Ocelot and the SST-MacSim Simulator Genie Hsieh §, Andrew Kerr, Hyesoon Kim, Jaekyu Lee, Nagesh Lakshminarayana, Arun Rodrigues §, Sudhakar Yalamanchili School of Computer Science and School of Electrical and Computer Engineering Georgia Institute of Technology Atlanta, GA § Scalable Computer Architecture Department Sandia National Laboratories Albuquerque, NM

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY System Diversity Keeneland System Tianhe-1A Amazon EC2 GPU Instances Heterogeneity is Mainstream Mobile Platforms 2

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY Heterogeneity On-Chip Vector Extensions AES Instructions Programmable Pipeline (GEN6) Sandy Bridge Programmable Accelerator PowerEN 16, PowerPC cores Accelerators Crypto Engine RegEx Engine XML Engine ARM Style Memory Denver Multiple models of Computation Multi-ISA 3

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY Heterogeneous Systems: Keeneland 201 TFLOPS in 7 racks (90 sq ft incl service area)677 MFLOPS per watt on HPL (#9 on Green500, Nov 2010)Final delivery system planned for early 2012 Keeneland System (7 Racks) ProLiant SL390s G7 (2CPUs, 3GPUs) S6500 Chassis (4 Nodes) Rack (6 Chassis) M2070 Xeon Series Director Switch Integrated with NICS Datacenter GPFS and TG Full PCIe X16 bandwidth to all GPUs 67 GFLOPS 515 GFLOPS 1679 GFLOPS 24/18 GB 6718 GFLOPS GFLOPS GFLOPS Courtesy J. Vetter (GT/ORNL) 4

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY Heterogeneous Architecture & Systems Research Lexical Analyzer Parser Semantic analysis Optimization Code generation Post pass optimization VLIW (Caymen)SIMT (Fermi)New Designs Microarchitecture Memory systems Network on Chip Power Management + Many more Memory Optimizations Program Transformations Control Flow Optimizations + Many more Common Research Themes Instruction set architecture Focus on explicitly data parallel languages – bulk synchronous models 5

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY Research Infrastructure Challenges 6 Microarch Simulator Power & Thermal Models Compiler Open source Compiler infrastructures for GPU computing Microarchitecture cycle-level timing simulators for heterogeneous architectures Integration between compiler, simulators, and models Scalable simulation infrastructures Simulation wall! Ability to integrate point tools Tile

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY Tutorial Overview 7 Low level Compiler Infrastructure for GPU Computing Ocelot Dynamic Execution Infrastructure Andrew Kerr, Sudhakar Yalamanchili Heterogeneous Cycle-level Architecture Models Parallel Simulation Infrastructure MacSim Heterogeneous Architecture Simulator SST: Structural Simulation Toolkit J. Lee, N. Lakshminarayana, H. Kim G. Hsieh, A. Rodrigues

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY Tutorial Schedule 8 Topical Description Part 1 (90 min.)Ocelot Overview: Architecture Ocelot: Supported Devices Part II (90 min.)Structural Simulation Toolkit MacSim: Overview Lunch Part III (90 min)MacSim: Simulator Architecture MacSim: Configuration Part IV (90 min.)Case Studies using Ocelot and SST-MacSIm