How to Measure Multi-Instruction, Multi-Core Processor Performance Using Simulation
Deepak Shankar, Darryl Koivisto
Mirabilis Design Inc.
6/14/2015

Slide 2: Mirabilis Design
- Company: Concept Engineering software and services
  - Founded in 2003, based in Sunnyvale, USA
  - 19 customers and 18 projects completed
- Product: two major product lines, Architect™ and Explorer (third generation)
- Applications: high-performance systems, custom semiconductors, and real-time software
- Comprehensive system design software provider

Slide 3: Trends
- Semiconductor companies are migrating to 2-, 4-, and 8-way multi-instruction, multi-core processors to improve performance and reduce power
- Processors are migrating from SIMD to MIMD architectures
- Compute system vendors are packaging multi-processor systems to improve performance and reduce cost
- Scaling from single-processor, single-core to multi-processor, multi-core requires more inter-processor communication: Amdahl's Law of parallel computing
- Many benchmarks are limited to a single processor and a single core, so a new methodology is needed
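The inter-processor communication cost cited above is the serial fraction in Amdahl's Law. A minimal sketch with hypothetical numbers (a 95% parallelizable workload) shows why speedup flattens as cores are added:

```python
def amdahl_speedup(n_cores: int, parallel_fraction: float) -> float:
    """Amdahl's Law: speedup is capped by the serial (non-parallelizable)
    fraction of the workload, e.g. inter-processor communication."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cores)

# With 5% of the work serialized, 8 cores deliver well under 8x,
# and scaling flattens quickly as core count grows.
for n in (1, 2, 4, 8):
    print(n, round(amdahl_speedup(n, 0.95), 2))
```

This is why a benchmark methodology built for a single processor and core cannot simply be multiplied by the core count.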

Slide 4: Benchmark Challenges
- Different configurations lead to remarkably different performance and power metrics
  - Multi-processors vs. multi-cores
  - Shared vs. distributed caches
  - Communication bandwidth between processors/cores
- Multi-threaded applications make benchmarking more difficult
  - How are applications partitioned: one thread per node or one application per node?
  - How are applications distributed and controlled?
- The speed of processors, memory, and communication is not scalable in all cases

Slide 5: Multi-Core/Processor Benchmarking Solution
- Speedup equations that take parallel execution into account: Speedup_Factor, Multi_Instr_Mhz, Multi_Core_Mhz
- Modeling platform that supports multi-instruction, multi-core topologies using routing tables
  - Create models for different processor, memory, and connectivity configurations
  - Scalable infrastructure for end users to experiment with their specific variations
- Processor models that support published benchmarks
  - Key parameters: multiple instructions per cycle, shared cache
  - Multiple instances, scriptable pipeline
  - Ability to correlate baseline results with hardware tests
- Interactive and graphical demonstration of the benchmark results
  - Hundreds of built-in statistics

Slide 6: Speedup Equations
Multi-Instruction:
  Speedup_Factor = Single_Instruction_App_Time / Multi_Instruction_App_Time
  Multi_Instr_Mhz = Single_Instruction_Mhz * Speedup_Factor
Multi-Core:
  Speedup_Factor = Single_Core_App_Time / Multi_Core_App_Time
  Multi_Core_Mhz = Single_Core_Mhz * Speedup_Factor
Note: Multi_Core_App_Time assumes the work of Single_Core_App_Time is distributed among N cores, plus communication time.
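The equations above can be applied directly once the application times are measured. The sketch below uses hypothetical inputs (a 10-second single-core run, 4 cores, 1 second of communication time, a 500 MHz baseline) to derive an effective multi-core MHz rating:

```python
def speedup_factor(baseline_app_time: float, new_app_time: float) -> float:
    """Speedup_Factor = baseline application time / new application time,
    both measured for the same workload."""
    return baseline_app_time / new_app_time

def effective_mhz(baseline_mhz: float, factor: float) -> float:
    """Scale the baseline clock rating by the measured speedup factor."""
    return baseline_mhz * factor

def multi_core_app_time(single_core_app_time: float, n_cores: int,
                        comm_time: float) -> float:
    """Per the slide's note: single-core work distributed among N cores,
    plus communication time."""
    return single_core_app_time / n_cores + comm_time

single_core_time = 10.0  # hypothetical: seconds on one core
t4 = multi_core_app_time(single_core_time, n_cores=4, comm_time=1.0)  # 3.5 s
factor = speedup_factor(single_core_time, t4)  # ~2.86x, not the ideal 4x
print(round(effective_mhz(500.0, factor), 1))  # Multi_Core_Mhz
```

Note that the communication term keeps the factor below the ideal N-times speedup, consistent with the Amdahl's Law trend on slide 3.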

Slide 7: Performance, Power and Functional Exploration
[Concept engineering and design optimization flow: Idea Discussion → Customer Requirements (e.g., need to design a Traffic Manager with variable sizes and priority, over 1000 concurrent ports processed) → Analysis (bottlenecks, throughput, capacity) → Golden Reference → Architecture Definition (component size, function mapping)]

Slide 8: VisualSim Provides
- Graphical and hierarchical modeling based on the Ptolemy simulation engine
  - Modeling blocks to quickly construct a custom/platform model
  - Mixed abstractions and mixed-signal modeling
  - Over 2000 power, buffering, and performance statistics generators built into architecture blocks
- Interface and documentation
  - Built-in documentation capability
  - Embedding models in documents for remote execution
  - Wizard for native code (C/C++/Java/RTL/SystemC)
- Modeling libraries, mixed abstraction, and hierarchical development

Slide 9: Multi-Processor Benchmark Model [screenshot]

Slide 10: Detailed Benchmark Results [screenshot]

Slide 11: Benchmark Drives Business
- Use the benchmark for marketing, product specification, and early system prototyping
- VisualSim is the medium for universal communication between the architect, field support, test & diagnostics, and the customer
[Diagram: customer input and field-support requests are mapped graphically into a knowledge base that auto-generates specifications and communicates requirements; marketing is involved early in the design, and the customer decision is driven by application-based selection]

Closing slide: Concept Engineering
[Diagram: functional, performance, and power modeling and simulation spanning hardware (SoC/IC/FPGA), embedded software, and network]