JavaTile: CMP Simulation with a Twist
Dan Greenfield
Computer Architecture Group Internal Presentation, 16th February 2007

Aim of Talk
- Introduce JavaTile
- Show benefits and problems of the approach
- Spark interest in collaboration
- Invite expertise from multiple areas to solve CMP problems

Quick Background: Exciting Times!
- Intel 80-core (1+ TFLOPS) [1]
- Cisco 188-core (50 BIPS) [2]

Parts of a CMP
- Q: How well does each of the components run?
- Q: How well does the network run?
(Figure from Pestana et al., 2004 [3])

Parts of a CMP (continued)
- The real Q: How well do applications run?

Motivations
- Need more realistic NoC traffic
  - Current methods: synthetic traffic, limited applications, low PE count, coarse-grained, OO superscalar internals
  - How is the network used?
  - What is needed in the NoC for future CMPs?
- Want a system-level view of performance, power and fault tolerance
  - Most current metrics concern the NoC and 'guess' what this means at the system level
- Want to explore solutions at all levels
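For contrast with application-driven traffic, "synthetic" NoC traffic usually means packets injected with randomly chosen destinations. The sketch below is a minimal illustration of that approach. It assumes a k x k 2D mesh with dimension-ordered (XY) routing and uniform-random destinations. The mesh shape, routing scheme and class/method names are assumptions for illustration only; none of them describes JavaTile.

```java
import java.util.Random;

/** Hypothetical uniform-random synthetic traffic on a k x k mesh with XY routing. */
public final class SyntheticTraffic {
    private final int k;          // mesh is k x k tiles, numbered row-major
    private final Random rng;

    public SyntheticTraffic(int k, long seed) {
        this.k = k;
        this.rng = new Random(seed);
    }

    /** Hops under XY routing: travel along X first, then Y (Manhattan distance). */
    public int hops(int src, int dst) {
        int sx = src % k, sy = src / k;
        int dx = dst % k, dy = dst / k;
        return Math.abs(sx - dx) + Math.abs(sy - dy);
    }

    /** Picks a destination uniformly at random among all tiles other than src. */
    public int randomDestination(int src) {
        int n = k * k;
        int d = rng.nextInt(n - 1);   // 0 .. n-2
        return d >= src ? d + 1 : d;  // skip over src itself
    }

    /** Average hop count over the given number of randomly generated packets. */
    public double averageHops(int packets) {
        long total = 0;
        for (int i = 0; i < packets; i++) {
            int src = rng.nextInt(k * k);
            total += hops(src, randomDestination(src));
        }
        return (double) total / packets;
    }

    public static void main(String[] args) {
        SyntheticTraffic t = new SyntheticTraffic(4, 42);
        System.out.printf("average hops on 4x4 mesh: %.2f%n", t.averageHops(100_000));
    }
}
```

On a 4x4 mesh, the exact mean over all ordered pairs of distinct tiles is 640/240 = 8/3, about 2.67 hops, so the printed estimate should land close to that. Traffic like this carries no information about which tiles actually talk to each other in a real application. That is the gap the slide points at.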

Some Existing CMP Approaches
- SimpleScalar-based CMP simulators
  - Hydra: 4-core MIPS CMP simulator
  - CMP-SIM (extension of SimpleScalar)
- SESC: superscalar (1.5 MIPS on a 3 GHz P4)
- GEMS (based on commercial Simics)
- ML-RSIM (SPARC, RSIM-based)

Java Virtual Machine
- Platform with a standard library
- Virtual processor executing the Java instruction set ('bytecode')
- Compilable to the native platform
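The JVM's "virtual processor" is a stack machine. Instructions such as bipush and sipush push operands onto an operand stack, and arithmetic instructions pop their inputs and push the result. The toy interpreter below is a sketch only. Its opcode subset is borrowed by name from the JVM, but the Insn class and the in-memory encoding are hypothetical and unrelated to the real class-file format or to how any production JVM works.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Toy operand-stack interpreter for a tiny, JVM-inspired instruction subset. */
public final class ToyStackMachine {
    enum Op { BIPUSH, SIPUSH, IADD, ISUB, IMUL, IRETURN }

    /** One instruction: an opcode plus an optional immediate operand. */
    static final class Insn {
        final Op op;
        final int operand;
        Insn(Op op, int operand) { this.op = op; this.operand = operand; }
        Insn(Op op) { this(op, 0); }
    }

    /** Executes straight-line code and returns the value popped by IRETURN. */
    static int run(List<Insn> code) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (Insn in : code) {
            switch (in.op) {
                case BIPUSH:
                case SIPUSH:
                    stack.push(in.operand);               // push the immediate operand
                    break;
                case IADD:
                    stack.push(stack.pop() + stack.pop());
                    break;
                case ISUB: {
                    int b = stack.pop(), a = stack.pop(); // top of stack is the subtrahend
                    stack.push(a - b);
                    break;
                }
                case IMUL:
                    stack.push(stack.pop() * stack.pop());
                    break;
                case IRETURN:
                    return stack.pop();
            }
        }
        throw new IllegalStateException("fell off the end without IRETURN");
    }

    public static void main(String[] args) {
        // (2 + 3) * 7 expressed as stack code
        List<Insn> code = List.of(
            new Insn(Op.BIPUSH, 2), new Insn(Op.BIPUSH, 3), new Insn(Op.IADD),
            new Insn(Op.BIPUSH, 7), new Insn(Op.IMUL), new Insn(Op.IRETURN));
        System.out.println(run(code)); // prints 35
    }
}
```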

Java Advantages
- A widely deployed standard platform
- Its 'machine code' (bytecode) is itself object-oriented and carries type information
- Amenable to static code analysis
- Tools exist to run it efficiently or to compile it to a native executable
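Compiled Java retains full type information. A method's signature survives in the class file as a descriptor such as (II)V, meaning two int parameters and a void return. The sketch below rebuilds such descriptors from reflected types using the JVM descriptor grammar. The class name, the helper names and the sample methodStart method are illustrative assumptions.

```java
import java.lang.reflect.Method;

/** Rebuilds JVM method descriptors (e.g. "(II)V") from reflected type information. */
public final class Descriptors {
    /** Field descriptor for one type, per the JVM class-file descriptor grammar. */
    static String of(Class<?> c) {
        if (c.isArray()) return "[" + of(c.getComponentType());
        if (c == int.class) return "I";
        if (c == long.class) return "J";
        if (c == boolean.class) return "Z";
        if (c == byte.class) return "B";
        if (c == char.class) return "C";
        if (c == short.class) return "S";
        if (c == float.class) return "F";
        if (c == double.class) return "D";
        if (c == void.class) return "V";
        return "L" + c.getName().replace('.', '/') + ";";   // reference type
    }

    /** Method descriptor: parameter descriptors in parentheses, then the return type. */
    static String of(Method m) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> p : m.getParameterTypes()) sb.append(of(p));
        return sb.append(')').append(of(m.getReturnType())).toString();
    }

    // Hypothetical hook with the same shape as an (II)V instrumentation method.
    static void methodStart(int methodId, int arg) {}

    public static void main(String[] args) throws Exception {
        Method m = Descriptors.class.getDeclaredMethod("methodStart", int.class, int.class);
        System.out.println(of(m)); // prints (II)V
        System.out.println(of(String.class.getMethod("substring", int.class, int.class)));
        // prints (II)Ljava/lang/String;
    }
}
```

Reflection reads this information at run time. A static analyser or bytecode instrumenter reads the same descriptors directly from the class file.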

JavaTile Processing Element

JavaTile System

Bytecode Instrumentation
- Hook into all instructions that may cause NoC traffic

    Fibonacci2();
      Code:
         0: bipush 0
         2: bipush -33
         4: invokestatic #23; //Method monitor/Monitor.methodStart:(II)V
         7: sipush …
        10: sipush 0
        13: invokestatic #26; //Method monitor/Monitor.jumpMarker:(II)V
        16: aload_0
        17: sipush 1
        20: invokestatic #30; //Method monitor/Monitor.syncCycleCount:(I)V
        23: invokespecial #32; //Method java/lang/Object."<init>":()V
        26: sipush …
        29: sipush 4
        32: invokestatic #35; //Method monitor/Monitor.postMethodCall:(II)V
        35: return
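The instrumented listing calls four static hooks on monitor/Monitor. The sketch below shows what such a class might look like. It assumes the hooks simply accumulate a cycle count and an event trace. Only the method names and descriptors ((II)V and (I)V) are taken from the listing. The meaning of each argument, the trace format and the cycles/trace/reset helpers are hypothetical. The class sits in the default package rather than in package monitor so that the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for monitor/Monitor from the instrumented listing.
 * Method names and descriptors match the listing; argument meanings are assumed.
 */
public final class Monitor {
    private static long cycles;                                   // simulated cycles charged so far
    private static final List<String> trace = new ArrayList<>();  // hook events, in call order

    private Monitor() {}

    /** Called on method entry (assumed: a method id plus one extra argument). */
    public static synchronized void methodStart(int methodId, int arg) {
        trace.add("methodStart " + methodId + " " + arg);
    }

    /** Called at branch targets (assumed: identifies the basic block reached). */
    public static synchronized void jumpMarker(int blockId, int arg) {
        trace.add("jumpMarker " + blockId + " " + arg);
    }

    /** Assumed: charge n cycles of local work before an operation that may leave the tile. */
    public static synchronized void syncCycleCount(int n) {
        cycles += n;
        trace.add("syncCycleCount " + n);
    }

    /** Called after a method call returns (assumed: a call-site id plus one extra argument). */
    public static synchronized void postMethodCall(int callSiteId, int arg) {
        trace.add("postMethodCall " + callSiteId + " " + arg);
    }

    public static synchronized long cycles() { return cycles; }

    public static synchronized List<String> trace() { return new ArrayList<>(trace); }

    public static synchronized void reset() { cycles = 0; trace.clear(); }

    public static void main(String[] args) {
        // Replays the hook calls made by the instrumented constructor in the listing;
        // the two sipush operands missing from the listing are replaced by 0 here.
        methodStart(0, -33);
        jumpMarker(0, 0);
        syncCycleCount(1);
        postMethodCall(0, 4);
        System.out.println(cycles() + " cycles, trace = " + trace());
    }
}
```

Keeping the hooks as plain static calls keeps the injected bytecode small, as in the listing: push the arguments, then issue a single invokestatic.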

Current Flow

Problems
- Garbage collection
- Local memory vs global memory allocation
- Passing by pointers (ownership)
- Push versus pull
- No inlining
- Auto-parallelization
- Debugging

Auto-Parallelization
- Software pipelining
  - e.g. MIT Raw compiler [4]
  - e.g. Princeton DSWP (Decoupled Software Pipelining) [5]
- Thread-level speculation
  - Loop-level (e.g. Stanford Jrpm) [6]
  - Method-level (e.g. SableSpMT) [7]
- Affine partitioning
  - e.g. incorporated in Stanford SUIF [8]
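Decoupled software pipelining [5] splits one loop into stages that run on different cores and communicate through queues. A latency-bound stage, such as pointer chasing, then no longer stalls the stage doing the work. The sketch below is a hand-partitioned two-stage example, assuming a BlockingQueue can stand in for the inter-core channel. The Node list, the work function and the queue capacity of 64 are illustrative assumptions. A DSWP compiler derives the stages automatically from the loop's dependence graph rather than by hand.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hand-partitioned two-stage pipeline in the style of DSWP. */
public final class DswpSketch {
    /** Singly linked list node (hypothetical workload). */
    static final class Node {
        final int value;
        final Node next;
        Node(int value, Node next) { this.value = value; this.next = next; }
    }

    private static final Node END = new Node(0, null);  // sentinel marking end of stream

    /** Stand-in for the per-element work done in stage 2. */
    static long work(int v) { return (long) v * v; }

    /** Sequential reference loop: traverse and process in one thread. */
    static long sequential(Node head) {
        long sum = 0;
        for (Node n = head; n != null; n = n.next) sum += work(n.value);
        return sum;
    }

    /** Stage 1 chases pointers, stage 2 does the work; a bounded queue decouples them. */
    static long pipelined(Node head) throws InterruptedException {
        BlockingQueue<Node> q = new ArrayBlockingQueue<>(64);
        Thread traverse = new Thread(() -> {
            try {
                for (Node n = head; n != null; n = n.next) q.put(n);  // stage 1
                q.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        traverse.start();
        long sum = 0;
        for (Node n = q.take(); n != END; n = q.take()) sum += work(n.value);  // stage 2
        traverse.join();
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        Node head = null;
        for (int i = 1000; i >= 1; i--) head = new Node(i, head);
        System.out.println(sequential(head) + " " + pipelined(head)); // prints 333833500 333833500
    }
}
```

Communication between the stages flows in one direction only, from traversal to work. This acyclic communication is the property DSWP relies on to tolerate inter-core latency, and it is also the kind of traffic a CMP simulator like JavaTile would place on the NoC.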

References
[1] Intel Polaris, from IDF 2006 slides, photo at:
[2] W. Eatherton, "The Push of Network Processing to the Top of the Pyramid," keynote slides at:
[3] Pestana et al., "Cost-Performance Trade-Offs in Networks on Chip: A Simulation-Based Approach," DATE 2004.
[4] Waingold et al., "Baring It All to Software: Raw Machines," Computer, Vol. 30, No. 9, 1997.
[5] Ottoni et al., "Automatic Thread Extraction with Decoupled Software Pipelining," MICRO 2005.
[6] Chen et al., "The Jrpm System for Dynamically Parallelizing Sequential Java Programs," IEEE Micro, Vol. 23, No. 6, Nov/Dec 2003.
[7] Pickett et al., "SableSpMT: A Software Framework for Analysing Speculative Multithreading in Java," PASTE Workshop 2006.
[8] Lim et al., "An Affine Partitioning Algorithm to Maximize Parallelism and Minimize Communications," ACM SIGARCH 1999.