Java Flowpaths: Efficiently Generating Circuits for Embedded Systems from Java WorldComp ESA 2006 Las Vegas, Nevada EXCERPT Darrin Hanna, Michael DuChene,

Presentation transcript:

Java Flowpaths: Efficiently Generating Circuits for Embedded Systems from Java
WorldComp ESA 2006, Las Vegas, Nevada (EXCERPT)
Darrin Hanna, Michael DuChene, Girma Tewolde, Jay Sattler
Oakland University, Rochester, Michigan; Kettering University, Flint, Michigan
June 27, 2006

Overview
– Motivation
– Background
– Some examples
– Class Instantiation in Flowpaths
– Implementing Parallel Flowpaths
– Results

Background
Flowpaths – a type of SPP
Generated using a Particular Method for Translating Stack-based Programs Directly to FPGAs
– Java → Java Byte Codes (Stack-based IR)
– Forth
Words as Flowpaths: Ops, Connections, and State Machines
Converting Flowpaths to VHDL
– Euclid’s Greatest Common Divisor Algorithm
Sieve of Eratosthenes: A performance comparison

Executing the algorithm as an SPP without a Microprocessor
Flowpaths – a type of SPP
Generated using a Particular Method for Translating Stack-based Programs Directly to FPGAs

Java Byte Codes as Flowpaths: Ops, Connections, and State Machines
FRAME: Operand Stack, Local Variable Array, Constant Pool from the class invoking the method
– Each thread has a JVM stack that stores frames
– A frame is created each time a method is invoked
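The frame described on this slide can be modelled in a few lines of Java. This is only a minimal sketch for illustration: the class name `FrameSketch` and the helper `addLocals` are hypothetical, the constant pool is omitted, and only a few integer bytecodes are modelled. It is not the authors' tooling.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified model of a JVM frame: an operand stack plus a
// local variable array. The constant pool reference is left out for brevity.
final class FrameSketch {
    final Deque<Integer> operandStack = new ArrayDeque<>();
    final int[] locals;

    FrameSketch(int maxLocals) {
        locals = new int[maxLocals];
    }

    // iload: push a local variable onto the operand stack.
    void iload(int index) {
        operandStack.push(locals[index]);
    }

    // istore: pop the top of the operand stack into a local variable.
    void istore(int index) {
        locals[index] = operandStack.pop();
    }

    // iadd: pop two ints and push their sum.
    void iadd() {
        int b = operandStack.pop();
        int a = operandStack.pop();
        operandStack.push(a + b);
    }

    // isub: pop two ints and push (first pushed) minus (last pushed).
    void isub() {
        int b = operandStack.pop();
        int a = operandStack.pop();
        operandStack.push(a - b);
    }

    // A new frame is created per invocation, mirroring "a frame is created
    // each time a method is invoked". Computes x + y through the stack.
    static int addLocals(int x, int y) {
        FrameSketch frame = new FrameSketch(3);
        frame.locals[1] = x;
        frame.locals[2] = y;
        frame.iload(1);
        frame.iload(2);
        frame.iadd();
        frame.istore(1);
        return frame.locals[1];
    }

    public static void main(String[] args) {
        System.out.println(addLocals(3, 4));
    }
}
```

Every value passes through the operand stack on its way between locals and an operation, which is the data movement that Flowpaths turn into wiring.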

Java Byte Codes as Flowpaths: Ops, Connections, and State Machines
[Figure labels: FRAME (Operand Stack, Local Variable Array); LOAD-EXECUTE-STORE STACK MANIPULATION; Flowpath (Datapath, Controller); OP, MUX, Operand Stack, Local Variables]

Java Byte Codes as Flowpaths: Ops, Connections, and State Machines
TRADITIONAL: LOAD-EXECUTE-STORE STACK MANIPULATION
FLOWPATHS:
– Operations – Sequential: isub, iadd, etc.
– Data Manipulation – Connections: iload, istore (ZERO clock cycles)
Example: … iload_1 iload_2 iadd istore_1 …
– Traditional: 4 clock cycles
– Flowpaths: only 1 clock cycle
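The cycle counts on this slide follow from a simple cost model, sketched below. The sketch assumes one clock cycle per bytecode in the traditional load-execute-store scheme, and zero cycles for loads and stores in a Flowpath because they become connections. The class and method names are hypothetical, and real timing depends on the generated circuit.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical cost model for the slide's example; not the authors' tool.
final class CycleCountSketch {

    // Loads and stores only move data, so a Flowpath realizes them as wires.
    static boolean isDataMovement(String op) {
        return op.startsWith("iload") || op.startsWith("istore");
    }

    // Assumed traditional model: every bytecode costs one clock cycle.
    static int traditionalCycles(List<String> ops) {
        return ops.size();
    }

    // Assumed Flowpath model: only operations (iadd, isub, ...) cost a cycle.
    static int flowpathCycles(List<String> ops) {
        int cycles = 0;
        for (String op : ops) {
            if (!isDataMovement(op)) {
                cycles++;
            }
        }
        return cycles;
    }

    public static void main(String[] args) {
        List<String> ops = Arrays.asList("iload_1", "iload_2", "iadd", "istore_1");
        System.out.println("traditional: " + traditionalCycles(ops)); // 4
        System.out.println("flowpath:    " + flowpathCycles(ops));    // 1
    }
}
```

For the four-instruction sequence on the slide, the model reproduces the stated 4 cycles versus 1 cycle.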

Converting Flowpaths to VHDL Euclid’s GCD Algorithm:

Converting Flowpaths to VHDL
Euclid’s GCD Algorithm: A method that implements each variable as a register results in over-crowded routing

Converting Flowpaths to VHDL Euclid’s GCD Algorithm
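The source listing shown on the GCD slides did not survive in the transcript. For reference, here is a minimal sketch of Euclid's algorithm in its subtraction form, written in Java. Which variant the authors actually compiled is an assumption, and the class name `GcdSketch` is hypothetical.

```java
// Hypothetical reference version of Euclid's GCD (subtraction form).
// The loop body uses only comparisons and integer subtraction, the kind of
// operation (isub) that the slides treat as a sequential Flowpath operation.
final class GcdSketch {

    static int gcd(int a, int b) {
        if (a <= 0 || b <= 0) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        while (a != b) {
            if (a > b) {
                a = a - b;
            } else {
                b = b - a;
            }
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(gcd(48, 18)); // 6
    }
}
```

The two variables `a` and `b` are the only state, so a direct translation needs a comparison, a subtractor, and a small state machine for the loop.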

Sieve of Eratosthenes
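The benchmark program itself is not reproduced in the transcript. The following is a minimal Java sketch of the Sieve of Eratosthenes for orientation only. The class name `SieveSketch`, the method `countPrimes`, and the choice to return a prime count are assumptions rather than the authors' code.

```java
// Hypothetical reference version of the Sieve of Eratosthenes.
final class SieveSketch {

    // Counts the primes in the range 2..limit (inclusive).
    static int countPrimes(int limit) {
        if (limit < 2) {
            return 0;
        }
        boolean[] composite = new boolean[limit + 1];
        int count = 0;
        for (int i = 2; i <= limit; i++) {
            if (!composite[i]) {
                count++;
                // Mark multiples of i starting at i*i; long avoids overflow.
                for (long j = (long) i * i; j <= limit; j += i) {
                    composite[(int) j] = true;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countPrimes(100)); // 25
    }
}
```

The array accesses inside the nested loops are what distinguish this benchmark from the GCD example, which uses scalar variables only.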

A circuit and state machine developed “by hand” by observing the behavior of the algorithm
Serves as an optimal implementation

Sieve of Eratosthenes
Experiments using a Xilinx Spartan IIE FPGA
– FPGA-VHDL (hand implementation) took 233 Slices
– Flowpath took 295 Slices

Experimental Results
Quick Sort algorithm sorting 4096 random numbers

Experimental Results
Genetic Algorithm: population size of 50, mutation probability of 10%, and crossover probability of 20%, run for 10 generations

Multi-threaded Experimental Results (Parallel)
Pentium 4 PC
Module            Clock Cycles (rounded)
Prod/Cons Test    314,400,000
Producer 1        145,000
Producer 2        1,926,000
Consumer          9,600,000
The Producer/Consumer Test took 40 clock cycles in the Flowpath!
JStamp (microcontroller that executes Java bytecode directly): 121,200 clock cycles
~20,000 gates

Conclusion
– Hardware can be generated directly from Java programs using Flowpaths
– There are enormous performance benefits to using Flowpaths instead of a JVM on a microprocessor
– Parallel algorithms with or without shared resources can easily be developed. These will truly execute in parallel, in the hardware sense

Thank You! Darrin Hanna, Michael DuChene, Girma Tewolde, Jay Sattler Oakland University Rochester, Michigan June 27, 2006