Java Flowpaths: Efficiently Generating Circuits for Embedded Systems from Java
WorldComp ESA 2006, Las Vegas, Nevada (EXCERPT)
Darrin Hanna, Michael DuChene, Girma Tewolde, Jay Sattler
Oakland University, Rochester, Michigan; Kettering University, Flint, Michigan
June 27, 2006
Overview
– Motivation
– Background
– Some examples
– Class instantiation in Flowpaths
– Implementing parallel Flowpaths
– Results
Background
– Flowpaths: a type of SPP
– Generated using a particular method for translating stack-based programs directly to FPGAs
   – Java: Java byte codes (a stack-based IR)
   – Forth: words
– Java byte codes as Flowpaths: ops, connections, and state machines
– Converting Flowpaths to VHDL: Euclid's greatest common divisor algorithm
– Sieve of Eratosthenes: a performance comparison
Executing the algorithm as an SPP, without a microprocessor
– Flowpaths: a type of SPP, generated using a particular method for translating stack-based programs directly to FPGAs
Java Byte Codes as Flowpaths: Ops, Connections, and State Machines
– Each thread has a JVM stack that stores frames; a frame is created each time a method is invoked.
– A frame contains an operand stack, a local variable array, and a reference to the run-time constant pool of the class that defines the invoked method.
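To make the frame structure concrete, here is a minimal Java sketch of the model described on this slide. It is an illustrative simplification, not the JVM's real internals: the class and method names (FrameModel, Frame, invoke, returnFromMethod) are hypothetical, and a plain String stands in for the constant pool reference. The main method also walks through the iload_1, iload_2, iadd, istore_1 sequence by hand, one stack operation per step.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class FrameModel {
    // Simplified stand-in for a JVM frame (hypothetical model, not real JVM internals).
    static final class Frame {
        final Deque<Integer> operandStack = new ArrayDeque<>();
        final int[] localVariables;
        final String constantPoolOwner; // placeholder for the run-time constant pool reference

        Frame(int maxLocals, String constantPoolOwner) {
            this.localVariables = new int[maxLocals];
            this.constantPoolOwner = constantPoolOwner;
        }
    }

    // Each thread gets its own stack of frames.
    static final ThreadLocal<Deque<Frame>> JVM_STACK = ThreadLocal.withInitial(ArrayDeque::new);

    // A new frame is pushed when a method is invoked...
    static Frame invoke(int maxLocals, String owner) {
        Frame f = new Frame(maxLocals, owner);
        JVM_STACK.get().push(f);
        return f;
    }

    // ...and popped when it returns.
    static void returnFromMethod() {
        JVM_STACK.get().pop();
    }

    public static void main(String[] args) {
        Frame f = invoke(3, "Example");
        f.localVariables[1] = 5;
        f.localVariables[2] = 7;
        f.operandStack.push(f.localVariables[1]);                       // iload_1
        f.operandStack.push(f.localVariables[2]);                       // iload_2
        f.operandStack.push(f.operandStack.pop() + f.operandStack.pop()); // iadd
        f.localVariables[1] = f.operandStack.pop();                     // istore_1
        System.out.println(f.localVariables[1]); // 12
        returnFromMethod();
        System.out.println(JVM_STACK.get().size()); // 0
    }
}
```

Each of the four bytecodes touches the operand stack separately, which is the load-execute-store behavior that Flowpaths replace with direct connections.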
Java Byte Codes as Flowpaths: Ops, Connections, and State Machines
[Figure: FRAME (operand stack, local variable array); load-execute-store stack manipulation; Flowpath consisting of a datapath (OP, MUX, operand stack, local variables) and a controller.]
Java Byte Codes as Flowpaths: Ops, Connections, and State Machines
Traditional load-execute-store stack manipulation vs. Flowpaths:
– Operations (isub, iadd, etc.) remain sequential.
– Data manipulation (iload, istore) becomes connections, taking zero clock cycles.
– Example: the sequence iload_1, iload_2, iadd, istore_1 takes 4 clock cycles with traditional stack manipulation, but only 1 clock cycle as a Flowpath.
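The cycle counts above follow from a simple rule, sketched below as a hypothetical cost model (not the authors' tool): a traditional stack machine spends one clock cycle per bytecode, while a Flowpath spends cycles only on operations, because loads and stores become wires. The sequence shown is what javac typically emits for a statement such as `a = a + b;` when a and b occupy local slots 1 and 2. The class name and the prefix test for data-movement opcodes are assumptions for illustration.

```java
import java.util.List;

public class CycleModel {
    // Hypothetical rule: iload*/istore* are pure data movement (connections in a Flowpath).
    static boolean isDataMovement(String op) {
        return op.startsWith("iload") || op.startsWith("istore");
    }

    // Traditional stack manipulation: one clock cycle per bytecode.
    static int traditionalCycles(List<String> ops) {
        return ops.size();
    }

    // Flowpath: only operations such as iadd or isub cost a cycle.
    static int flowpathCycles(List<String> ops) {
        int cycles = 0;
        for (String op : ops) {
            if (!isDataMovement(op)) {
                cycles++;
            }
        }
        return cycles;
    }

    public static void main(String[] args) {
        List<String> seq = List.of("iload_1", "iload_2", "iadd", "istore_1");
        System.out.println("traditional: " + traditionalCycles(seq)); // 4
        System.out.println("flowpath: " + flowpathCycles(seq));       // 1
    }
}
```

The model ignores timing details such as combinational delay through chained operations; it only reproduces the 4-versus-1 count on the slide.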
Converting Flowpaths to VHDL: Euclid's GCD Algorithm
Converting Flowpaths to VHDL: Euclid's GCD Algorithm
– Methods that implement each variable as a register result in overcrowded routing.
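The slides do not reproduce the Java source that was converted, so the following is only a plausible stand-in: a subtraction-based variant of Euclid's algorithm, which needs nothing beyond comparison and isub-style subtraction. The class name Gcd is hypothetical, and the method assumes both inputs are positive (a zero input would loop forever).

```java
public class Gcd {
    // Subtraction-based Euclid's algorithm; assumes a > 0 and b > 0.
    static int gcd(int a, int b) {
        while (a != b) {
            if (a > b) {
                a = a - b; // compiles to an isub
            } else {
                b = b - a;
            }
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(gcd(48, 18)); // 6
    }
}
```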
Sieve of Eratosthenes
A circuit and state machine developed “by hand” from observing the behavior of the algorithm; this serves as an optimal implementation for comparison.
Sieve of Eratosthenes: experiments using a Xilinx Spartan-IIE FPGA
– FPGA-VHDL (hand implementation): 233 slices
– Flowpath: 295 slices (about 27% more than the hand implementation)
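For reference, the algorithm compared above can be written in Java as the classic sieve. The slides do not show the benchmark's source, its array size, or its output format, so the bound and the prime-counting helper here are assumptions; the class name Sieve is hypothetical.

```java
public class Sieve {
    // isComposite[i] becomes true once i is known to be a multiple of a smaller prime.
    static boolean[] sieve(int n) {
        boolean[] isComposite = new boolean[n + 1];
        for (int p = 2; (long) p * p <= n; p++) {
            if (!isComposite[p]) {
                // Start at p * p: smaller multiples were marked by smaller primes.
                for (int m = p * p; m <= n; m += p) {
                    isComposite[m] = true;
                }
            }
        }
        return isComposite;
    }

    // Counts the primes in [2, n].
    static int countPrimes(int n) {
        boolean[] isComposite = sieve(n);
        int count = 0;
        for (int i = 2; i <= n; i++) {
            if (!isComposite[i]) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countPrimes(100)); // 25
    }
}
```

The inner loop uses only addition and array stores, which is part of why the sieve suits a compact hand-built circuit as a baseline.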
Experimental Results: Quicksort algorithm sorting 4096 random numbers
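The benchmark's source is not included in the excerpt; as a sketch of the kind of program being measured, here is an in-place quicksort on 4096 random integers. The Lomuto partition scheme, the random seed, and the class name QuickSort are assumptions, not details from the slides.

```java
import java.util.Arrays;
import java.util.Random;

public class QuickSort {
    // Recursive in-place quicksort over a[lo..hi] (Lomuto partitioning is an assumption).
    static void quickSort(int[] a, int lo, int hi) {
        if (lo >= hi) {
            return;
        }
        int p = partition(a, lo, hi);
        quickSort(a, lo, p - 1);
        quickSort(a, p + 1, hi);
    }

    // Moves the pivot a[hi] to its final sorted position and returns that index.
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                int t = a[i]; a[i] = a[j]; a[j] = t;
                i++;
            }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;
        return i;
    }

    public static void main(String[] args) {
        int[] data = new Random(42).ints(4096).toArray(); // 4096 random numbers, as on the slide
        int[] expected = data.clone();
        Arrays.sort(expected);
        quickSort(data, 0, data.length - 1);
        System.out.println(Arrays.equals(data, expected)); // true
    }
}
```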
Experimental Results: Genetic algorithm with a population size of 50, a mutation probability of 10%, and a crossover probability of 20%, run for 10 generations
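Only the parameters are given on this slide, so the sketch below wires them into a generic genetic algorithm and labels everything else as hypothetical: 32-bit integer genomes, a "count the 1 bits" (OneMax) fitness function, binary tournament selection, single-point crossover, and single-bit mutation. None of these choices come from the source; they are there only to show where population size, mutation and crossover probabilities, and generation count enter the loop.

```java
import java.util.Random;

public class GeneticSketch {
    // Parameters from the slide.
    static final int POPULATION = 50;
    static final double P_MUTATION = 0.10;
    static final double P_CROSSOVER = 0.20;
    static final int GENERATIONS = 10;

    // Hypothetical fitness: number of 1 bits in a 32-bit genome ("OneMax").
    static int fitness(int genome) {
        return Integer.bitCount(genome);
    }

    // Hypothetical selection: binary tournament.
    static int select(int[] pop, Random rng) {
        int a = pop[rng.nextInt(pop.length)];
        int b = pop[rng.nextInt(pop.length)];
        return fitness(a) >= fitness(b) ? a : b;
    }

    static int[] evolve(long seed) {
        Random rng = new Random(seed);
        int[] pop = new int[POPULATION];
        for (int i = 0; i < POPULATION; i++) {
            pop[i] = rng.nextInt();
        }
        for (int g = 0; g < GENERATIONS; g++) {
            int[] next = new int[POPULATION];
            for (int i = 0; i < POPULATION; i++) {
                int child = select(pop, rng);
                if (rng.nextDouble() < P_CROSSOVER) {
                    // Single-point crossover: keep child's high bits, take the other parent's low bits.
                    int other = select(pop, rng);
                    int cut = 1 + rng.nextInt(31);
                    int lowMask = (1 << cut) - 1;
                    child = (child & ~lowMask) | (other & lowMask);
                }
                if (rng.nextDouble() < P_MUTATION) {
                    child ^= 1 << rng.nextInt(32); // flip one random bit
                }
                next[i] = child;
            }
            pop = next;
        }
        return pop;
    }

    static int best(int[] pop) {
        int best = 0;
        for (int genome : pop) {
            best = Math.max(best, fitness(genome));
        }
        return best;
    }

    public static void main(String[] args) {
        int[] finalPop = evolve(1L);
        System.out.println("best fitness after " + GENERATIONS + " generations: " + best(finalPop));
    }
}
```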
Multi-threaded Experimental Results (Parallel)
Pentium 4 PC:
Module | Clock Cycles (rounded)
Prod/Cons Test | 314,400,000
Producer 1 | 145,000
Producer 2 | 1,926,000
Consumer | 9,600,000
The Producer/Consumer Test took 40 clock cycles in the Flowpath!
JStamp (a microcontroller that executes Java bytecode directly): 121,200 clock cycles
~20,000 gates
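The test program itself is not shown. A minimal Java sketch with the same module structure (two producers and one consumer) might look like the following, assuming a bounded `java.util.concurrent.ArrayBlockingQueue` as the shared resource and a hypothetical workload of 100 integers per producer; the class name, queue capacity, and workload are all illustrative assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProdConsTest {
    static final int ITEMS_PER_PRODUCER = 100; // hypothetical workload size

    // Each producer puts the integers 1..ITEMS_PER_PRODUCER into the shared queue.
    static Thread producer(BlockingQueue<Integer> q) {
        return new Thread(() -> {
            try {
                for (int i = 1; i <= ITEMS_PER_PRODUCER; i++) {
                    q.put(i); // blocks while the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    static long run() throws InterruptedException {
        BlockingQueue<Integer> q = new ArrayBlockingQueue<>(8);
        long[] sum = new long[1];
        Thread p1 = producer(q);
        Thread p2 = producer(q);
        // The consumer takes every item from both producers and sums them.
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < 2 * ITEMS_PER_PRODUCER; i++) {
                    sum[0] += q.take(); // blocks while the queue is empty
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        p1.start();
        p2.start();
        consumer.start();
        p1.join();
        p2.join();
        consumer.join(); // join() makes the consumer's write to sum[0] visible here
        return sum[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // 10100
    }
}
```

On a PC these threads are time-sliced by the JVM and operating system, whereas the slide's point is that the equivalent Flowpaths run as separate hardware that executes concurrently.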
Conclusion
– Hardware can be generated directly from Java programs using Flowpaths.
– Using Flowpaths instead of a JVM on a microprocessor gives large performance benefits: the producer/consumer test took 40 clock cycles as a Flowpath, compared with 121,200 clock cycles on JStamp.
– Parallel algorithms, with or without shared resources, can easily be developed, and they truly execute in parallel, in the hardware sense.
Thank You!