Structural Simulation Toolkit / Gem5 Integration

Slides:



Advertisements
Similar presentations
Issues of HPC software From the experience of TH-1A Lu Yutong NUDT.
Advertisements

INTRODUCTION TO SIMULATION WITH OMNET++ José Daniel García Sánchez ARCOS Group – University Carlos III of Madrid.
Technology Drivers Traditional HPC application drivers – OS noise, resource monitoring and management, memory footprint – Complexity of resources to be.
MINJAE HWANG THAWAN KOOBURAT CS758 CLASS PROJECT FALL 2009 Extending Task-based Programming Model beyond Shared-memory Systems.
Kjeld v.d. Schaaf DS3-T2 DS3 T2: Data Handling, Control and Distributed Computing Kjeld v.d. Schaaf 4 September 2006.
Our approach! 6.9% Perfect L2 cache (hit rate 100% ) 1MB L2 cache Cholesky 47% speedup BASE: All cores are used to execute the application-threads. PB-GS(PB-LS)
The Stanford Directory Architecture for Shared Memory (DASH)* Presented by: Michael Bauer ECE 259/CPS 221 Spring Semester 2008 Dr. Lebeck * Based on “The.
1 COMP 206: Computer Architecture and Implementation Montek Singh Mon, Dec 5, 2005 Topic: Intro to Multiprocessors and Thread-Level Parallelism.
1 Multi - Core fast Communication for SoPC Multi - Core fast Communication for SoPC Technion – Israel Institute of Technology Department of Electrical.
1 Dr. Frederica Darema Senior Science and Technology Advisor NSF Future Parallel Computing Systems – what to remember from the past RAMP Workshop FCRC.
Andreas Sandberg ARM Research
Trigger and online software Simon George & Reiner Hauser T/DAQ Phase 1 IDR.
MIT Lincoln Laboratory XYZ 3/11/2005 Automatic Extraction of Software Models for Exascale Hardware/Software Co-Design Damian Dechev 1,2, Amruth.
Introduction to Symmetric Multiprocessors Süha TUNA Bilişim Enstitüsü UHeM Yaz Çalıştayı
Peter S. Magnusson, Magnus Crhistensson, Jesper Eskilson, Daniel Forsgren, Gustav Hallberg, Johan Högberg, Frederik larsson, Anreas Moestedt. Presented.
Authors: Tong Li, Dan Baumberger, David A. Koufaty, and Scott Hahn [Systems Technology Lab, Intel Corporation] Source: 2007 ACM/IEEE conference on Supercomputing.
RSC Williams MAPLD 2005/BOF-S1 A Linux-based Software Environment for the Reconfigurable Scalable Computing Project John A. Williams 1
4.x Performance Technology drivers – Exascale systems will consist of complex configurations with a huge number of potentially heterogeneous components.
© 2008 The MathWorks, Inc. ® ® Parallel Computing with MATLAB ® Silvina Grad-Freilich Manager, Parallel Computing Marketing
CASTNESS‘11 Computer Architectures and Software Tools for Numerical Embedded Scalable Systems Workshop & School: Roma January 17-18th 2011 Frédéric ROUSSEAU.
MacSim Tutorial (In ICPADS 2013) 1. |The Structural Simulation Toolkit: A Parallel Architectural Simulator (for HPC) A parallel simulation environment.
LiNK: An Operating System Architecture for Network Processors Steve Muir, Jonathan Smith Princeton University, University of Pennsylvania
CCA Common Component Architecture Manoj Krishnan Pacific Northwest National Laboratory MCMD Programming and Implementation Issues.
Active Network Node in Silicon-Based L3 Gigabit Routing Switch Active Network Node in Silicon-Based L3 Gigabit Routing Switch 1 UC Berkeley Engineering.
Copyright © 2002, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners
4.2.1 Programming Models Technology drivers – Node count, scale of parallelism within the node – Heterogeneity – Complex memory hierarchies – Failure rates.
Parallel Programming on the SGI Origin2000 With thanks to Igor Zacharov / Benoit Marchand, SGI Taub Computer Center Technion Moshe Goldberg,
System-level power analysis and estimation September 20, 2006 Chong-Min Kyung.
Data Management for Decision Support Session-4 Prof. Bharat Bhasker.
Data Center & Large-Scale Systems (updated) Luis Ceze, Bill Feiereisen, Krishna Kant, Richard Murphy, Onur Mutlu, Anand Sivasubramanian, Christos Kozyrakis.
HyperThreading ● Improves processor performance under certain workloads by providing useful work for execution units that would otherwise be idle ● Duplicates.
B5: Exascale Hardware. Capability Requirements Several different requirements –Exaflops/Exascale single application –Ensembles of Petaflop apps requiring.
HPC F ORUM S EPTEMBER 8-10, 2009 Steve Rowan srowan at conveycomputer.com.
Hybrid Multi-Core Architecture for Boosting Single-Threaded Performance Presented by: Peyman Nov 2007.
Embedded System Lab. 오명훈 Addressing Shared Resource Contention in Multicore Processors via Scheduling.
Extreme Computing’05 Parallel Graph Algorithms: Architectural Demands of Pathological Applications Bruce Hendrickson Jonathan Berry Keith Underwood Sandia.
CDA-5155 Computer Architecture Principles Fall 2000 Multiprocessor Architectures.
SYNAR Systems Networking and Architecture Group CMPT 886: Computer Architecture Primer Dr. Alexandra Fedorova School of Computing Science SFU.
Towards a High Performance Extensible Grid Architecture Klaus Krauter Muthucumaru Maheswaran {krauter,
Introduction to Operating Systems Concepts
The Post Windows Operating System
These slides are based on the book:
Modularity Most useful abstractions an OS wants to offer can’t be directly realized by hardware Modularity is one technique the OS uses to provide better.
Presented by: Nick Kirchem Feb 13, 2004
Introduction to Parallel Computing: MPI, OpenMP and Hybrid Programming
Multiprocessing.
Ph.D. in Computer Science
Welcome: Intel Multicore Research Conference
Alternative system models
THE PROCESS OF EMBEDDED SYSTEM DEVELOPMENT
Assembly Language for Intel-Based Computers, 5th Edition
CMPE419 Mobile Application Development
Derek Chiou The University of Texas at Austin
Many-core Software Development Platforms
Cray Announces Cray Inc.
Kai Bu 13 Multiprocessors So today, we’ll finish the last part of our lecture sessions, multiprocessors.
Shadow: Scalable and Deterministic Network Experimentation
Objective Understand the concepts of modern operating systems by investigating the most popular operating system in the current and future market Provide.
Scalable Parallel Interoperable Data Analytics Library
Embedded Computer Architecture 5SIA0 Overview
MPJ: A Java-based Parallel Computing System
Wide Area Workload Management Work Package DATAGRID project
BigSim: Simulating PetaFLOPS Supercomputers
Co-designed Virtual Machines for Reliable Computer Systems
Chip&Core Architecture
Lecture 17 Multiprocessors and Thread-Level Parallelism
The University of Adelaide, School of Computer Science
On the Role of Burst Buffers in Leadership-Class Storage Systems
Objective Understand the concepts of modern operating systems by investigating the most popular operating system in the current and future market Provide.
CMPE419 Mobile Application Development
Presentation transcript:

Structural Simulation Toolkit / Gem5 Integration + ARM Research, Sandia National Labs

View of the Simulation Problem Scale..... Many Cores + Memory Nodes Threads X Application writers purchasers designers system procurement algorithm co-design architecture research language research Multiple Audiences..... Network Processor System present systems future systems X Simulation needed to address this. But Simulation is hard. The problem Starts with scale... ...Goes beyond... Multi-Physics Apps Informatics Apps Complexity..... Communication Libraries Run-Times OS Effects Existing Languages New Languages X Constraints..... Performance Cost Power Reliability Cooling Usability Risk Size

SST Simulation Project Overview Goals Status Parallel core, standard components Current Release (4.0) Improved documentation More component models Become the standard architectural simulation framework for HPC Be able to evaluate future systems on DOE/DOD workloads Use supercomputers to design supercomputers Technical Approach Consortium “Best of Breed” simulation suite Combine Lab, academic, & industry Parallel Parallel Discrete Event core with conservative optimization over MPI Multiscale Detailed and simple models for processor, network, and memory Interoperability gem5, DRAMSim, cache models routers, NICs, schedulers, GPGPU Open Open Core, non viral, modular

Detailed Execution-based Core Model SST Components Processors Network drivers Ariel – PIN-based Ember – Pattern-based Prospero – Trace-based Firefly – communication protocols Miranda – Pattern-based Hermes - MPI-like driver interface Memory MemHierarchy – Caches, memory Zodiac – trace-based Network models VaultSimC - Stacked memory Merlin – Network simulator Cassini – Cache prefetchers Scheduler Missing: Detailed Execution-based Core Model

Integration Goals Provide gem5 functionality in the SST Framework Interoperability w/ other components Parallelism Previous integration (2011) was with a branch of Gem5 Emulation mode only Not sustainable New integration is with the main Gem5 stable release Ability to run full-system Testing of the new integration is underway

Integration Issues gem5 Clock / Events Untimed (functional) accesses Can’t “cheat” - everything must be timed Memory events Forwarding snoop requests to core Compilation issues (the usual)

Studies Processing-in-memory Multi-Level Memory Better Processing-in-memory Multi-Level Memory HW Tradeoffs: capacity ratios, prefetching/transfer SW Tradeoffs: application, runtime, OS, HW control Scalable Network Studies Network on Chip Coherent system interconnect NIC Mixed Mode Simulation

Summary Gem5 will be able to use SST’s infrastructure and components Provides SST components with excellent core model Provides gem5 with a parallel discrete event framework Sustainable Current code committed Testing underway