Reconfigurable Computing (EN2911X, Fall07): Lab 2 presentations. Prof. Sherief Reda, Division of Engineering, Brown University



Runtimes by different teams: 4.2 seconds, 14 seconds, 33 seconds, 300 seconds, 305 seconds, 320 seconds

Palindrome Checker: Cesare Ferri, Rotor Le

Part I: Verilog Module

    ... CLOCK_50) begin
        not_palindrome = 1'd0; len = 0; tmp = number; // reset

        // Decompose the number into digits (room for loop unrolling here)
        for (i = 0; i < 9; i = i + 4'd1) begin
            if (tmp > 0) begin
                modulo = tmp % 4'd10;
                tmp = tmp / 10;
                vector[len % 9] = modulo;
                len = len + 1;
            end
        end

        // Compare mirrored digit pairs up to the middle of the number
        th = (len >> 1);
        for (j = 0; j < th; j = j + 4'd1) begin
            tmp2 = (len - 1) - j;
            tmp3 = vector[j]; tmp4 = vector[tmp2];
            if (tmp3 != tmp4) not_palindrome = 1'b1;
        end
        result = ~(not_palindrome);
    end

Part I: Verilog Module (same code as on the previous slide)
Callout on the second loop, for (j = 0; j < th; j = j + 4'd1): it compares the digits stored in the vector. Loop unrolling applies here as well.

Optimized Verilog Code
Do loop unrolling to compare digits:

    if (digits[0] == digits[3] && digits[1] == digits[2]) not_palindrome = 1'd1; //reset

Unsolved things
Our running time now depends on the way that we extract digits from the number
Some ideas to improve?
– Using a shift register
– Using non-blocking assignments

Palindrome Homework Summary (ENGN2911X): Aaron Mandle, Bryant Mairs

Setup
– Two-cycle fixed length custom instruction
– Operates on 20 numbers at a time
– Returns total palindromes in that 20-number block

Process
– Combinational conversion from binary to BCD
– Check number of digits
– Compare digits based on length
– Total up number of valid palindromes

Binary to BCD Conversion
Built using blocks of conditional add-3 modules and shifts
Add-3 modules:
– 4-bit input
– Adds 3 if input was 5 or greater
Based on adding 6 to numbers > 9: adding 3 before the left shift has the same effect as adding 6 after it
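The add-3-and-shift scheme above can be modelled in software. The following is a minimal C sketch, not the team's Verilog: the names `add3_if_ge5` and `bin_to_bcd` are hypothetical, and the 30-bit input width and 10-digit output are assumptions taken from the other slides.

```c
#include <assert.h>
#include <stdint.h>

/* Model of one add-3 module: 4-bit input, adds 3 if the input is 5 or greater. */
static uint8_t add3_if_ge5(uint8_t digit)
{
    return (uint8_t)(digit >= 5 ? digit + 3 : digit);
}

/* Convert a binary value below 2^30 to packed BCD, one decimal digit per nibble. */
uint64_t bin_to_bcd(uint32_t value)
{
    assert(value < (1u << 30));
    uint64_t bcd = 0;
    for (int bit = 29; bit >= 0; --bit) {
        /* Correct every digit before the shift doubles it. */
        uint64_t adjusted = 0;
        for (int d = 0; d < 10; ++d) {
            uint8_t digit = (uint8_t)((bcd >> (4 * d)) & 0xF);
            adjusted |= (uint64_t)add3_if_ge5(digit) << (4 * d);
        }
        /* Shift left by one and bring in the next input bit, MSB first. */
        bcd = (adjusted << 1) | ((value >> bit) & 1u);
    }
    return bcd;
}
```

In hardware the two loops disappear: each iteration becomes a fixed layer of add-3 blocks and wiring, which is why the conversion can be purely combinational.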

    module checkPalindrome(data, result);
        input [31:0] data;
        output [31:0] result;

        wire [3:0] digits [10:0];
        wire [3:0] digCount;

        // Instance name u_bin2bcd is assumed; the slide omits it.
        bin2bcd u_bin2bcd({digits[9], digits[8], digits[7], digits[6], digits[5],
                           digits[4], digits[3], digits[2], digits[1], digits[0]}, data);

        // Number of decimal digits: index of the highest non-zero digit, plus one
        assign digCount = digits[9] != 0 ? 10 :
                          digits[8] != 0 ? 9 :
                          digits[7] != 0 ? 8 :
                          digits[6] != 0 ? 7 :
                          digits[5] != 0 ? 6 :
                          digits[4] != 0 ? 5 :
                          digits[3] != 0 ? 4 :
                          digits[2] != 0 ? 3 :
                          digits[1] != 0 ? 2 : 1;

        // Compare mirrored digit pairs, selected by length
        assign result =
            digCount == 1 ||
            digCount == 2 && (digits[0] == digits[1]) ||
            digCount == 3 && (digits[0] == digits[2]) ||
            digCount == 4 && (digits[0] == digits[3] && digits[1] == digits[2]) ||
            digCount == 5 && (digits[0] == digits[4] && digits[1] == digits[3]) ||
            digCount == 6 && (digits[0] == digits[5] && digits[1] == digits[4] && digits[2] == digits[3]) ||
            digCount == 7 && (digits[0] == digits[6] && digits[1] == digits[5] && digits[2] == digits[4]) ||
            digCount == 8 && (digits[0] == digits[7] && digits[1] == digits[6] && digits[2] == digits[5] && digits[3] == digits[4]) ||
            digCount == 9 && (digits[0] == digits[8] && digits[1] == digits[7] && digits[2] == digits[6] && digits[3] == digits[5]);
    endmodule

Yossi

For all solutions
Finding the length of the decimal representation (# digits) by:

    typedef unsigned long UINT;

    /* Index of the most significant digit: 0 for one-digit numbers, up to 8. */
    inline UINT GetMSDFIndx(UINT n) {
        return (n >= 100000000 ? 8 :
               (n >= 10000000 ? 7 :
               (n >= 1000000 ? 6 :
               (n >= 100000 ? 5 :
               (n >= 10000 ? 4 :
               (n >= 1000 ? 3 :
               (n >= 100 ? 2 :
               (n >= 10 ? 1 : 0))))))));
    }

Software Only Solutions
Times:
– On laptop (Intel 2333 MHz): 8 secs.
– On NIOS (100 MHz): 3500 secs.
Inherently sequential
– Early false detection: quit the computation if we find two digits that do not match.
  → Brings down expected # divide operations to less than 2.2

Software Only Solutions
Observations:
1. Detect whether the MSD is a given number without division
   – MSD test: d is the MSD of a number n of length L if and only if $d \cdot 10^{L-1} \le n < (d+1) \cdot 10^{L-1}$
     E.g. $4 \cdot 10^{3} \le 4765 < 5 \cdot 10^{3}$
2. "Cut out" the MSD: $4765 - 4 \cdot 10^{3} = 765$ and continue.
Algorithm: find one LSD after another, compare with MSDs, quit early if not a palindrome.
Runs in 8 seconds on laptop
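As an illustration of the MSD test with early exit, here is a minimal C sketch. It is not the presenter's code: `POW10`, `msd_index` and `is_palindrome_msd_test` are hypothetical names, and the digit position is tracked explicitly so that inner zeros (as in 1001) are handled after the MSD is cut out.

```c
#include <stdbool.h>
#include <stdint.h>

static const uint64_t POW10[10] = {
    1ULL, 10ULL, 100ULL, 1000ULL, 10000ULL, 100000ULL,
    1000000ULL, 10000000ULL, 100000000ULL, 1000000000ULL
};

/* Index of the most significant digit (0 for a one-digit number). */
static int msd_index(uint32_t n)
{
    int idx = 0;
    while (idx < 9 && n >= POW10[idx + 1]) ++idx;
    return idx;
}

/* Peel off one LSD at a time and test it against the MSD without dividing
   to find the MSD; return as soon as a pair does not match. */
bool is_palindrome_msd_test(uint32_t number)
{
    uint64_t n = number;
    int hi = msd_index(number);            /* remaining digits occupy positions 0..hi */
    while (hi >= 1) {
        uint64_t lsd = n % 10;             /* the only division-based step per pair */
        uint64_t low = lsd * POW10[hi];
        /* MSD test: lsd is the MSD iff lsd*10^hi <= n < (lsd+1)*10^hi. */
        if (n < low || n - low >= POW10[hi]) return false;
        n = (n - low) / 10;                /* cut out the MSD, drop the LSD */
        hi -= 2;
    }
    return true;
}
```

For 4765 the first pair already fails (5 * 10^3 > 4765), so only one remainder operation is performed.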

Software Only Solutions
On NIOS, division is really expensive
Division-free algorithm: don't test the MSD, find it with binary search

Software Only Solutions
On NIOS, division is really expensive
Algorithm:
– Start from left
– Find half of the digits
– Compute the palindrome whose left half matches these digits
– Compare to the tested number
Lose the early false detection, but still better than division.
Runs in 3500 secs on NIOS 100 MHz.
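The division-free algorithm can be sketched in C as follows. This is one possible reading of the slide under stated assumptions, not the presenter's implementation: each leading digit is located by binary search over the multiples d * 10^k, and the names `leading_digit` and `is_palindrome_no_div` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

static const uint64_t POW10[10] = {
    1ULL, 10ULL, 100ULL, 1000ULL, 10000ULL, 100000ULL,
    1000000ULL, 10000000ULL, 100000000ULL, 1000000000ULL
};

/* Largest d in 0..9 with d * p <= r, found by binary search (no division). */
static int leading_digit(uint64_t r, uint64_t p)
{
    int lo = 0, hi = 9;
    while (lo < hi) {
        int mid = (lo + hi + 1) >> 1;
        if (r >= (uint64_t)mid * p) lo = mid;
        else hi = mid - 1;
    }
    return lo;
}

bool is_palindrome_no_div(uint32_t n)
{
    int len = 1;                               /* number of decimal digits */
    while (len < 10 && n >= POW10[len]) ++len;

    uint64_t rest = n;                         /* part of n not yet consumed from the left */
    uint64_t mirror = 0;                       /* palindrome sharing n's left half */
    int half = (len + 1) >> 1;                 /* left half, including the middle digit */
    for (int i = 0; i < half; ++i) {
        int pos = len - 1 - i;                 /* digit position counted from the right */
        int d = leading_digit(rest, POW10[pos]);
        rest -= (uint64_t)d * POW10[pos];
        mirror += (uint64_t)d * POW10[pos];
        if (i != pos) mirror += (uint64_t)d * POW10[i];   /* mirrored copy */
    }
    return mirror == n;                        /* single comparison at the end */
}
```

Because the comparison happens only once at the end, a mismatch in the outermost pair is no longer detected early, which matches the trade-off noted on the slide.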

Using the Hardware
A general trick to divide by a constant without using division.
Based on a trick I read in "Hacker's Delight" of how to divide by 3.
Demonstrate on divide by 10:
– Given: number $n < 2^{30}$
– Needed: floor(n/10)
– Algorithm: Multiply n by $(2^{31} + 2)/10$ = 0xCCCCCCD, and then shift right 31 positions.

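Division Free divide by 10
Algorithm: Multiply $n < 2^{30}$ by $(2^{31} + 2)/10$ = 0xCCCCCCD, and then shift right 31 positions.
Proof: The above algorithm outputs:
$\left\lfloor n \cdot \frac{2^{31} + 2}{10} \cdot \frac{1}{2^{31}} \right\rfloor = \left\lfloor \frac{n}{10} + \frac{2n}{10 \cdot 2^{31}} \right\rfloor$
$n < 2^{30}$ implies $2n < 2^{31}$, so $\frac{2n}{10 \cdot 2^{31}} < \frac{1}{10}$
$\lfloor n/10 \rfloor \le \frac{n}{10} \le \lfloor n/10 \rfloor + \frac{9}{10}$
$\lfloor n/10 \rfloor \le \frac{n}{10} + \frac{2n}{10 \cdot 2^{31}} < \lfloor n/10 \rfloor + 1$
Hence $\left\lfloor \frac{n}{10} + \frac{2n}{10 \cdot 2^{31}} \right\rfloor = \lfloor n/10 \rfloor$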
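The multiply-and-shift step is short enough to check directly in C. This is a minimal sketch with a hypothetical function name `div10`; the 64-bit intermediate is an assumption of the sketch, needed because the product can reach roughly 2^58.

```c
#include <assert.h>
#include <stdint.h>

/* floor(n / 10) for n < 2^30, using a multiply and a shift instead of a divide. */
uint32_t div10(uint32_t n)
{
    assert(n < (1u << 30));                          /* precondition from the proof */
    return (uint32_t)(((uint64_t)n * 0xCCCCCCDu) >> 31);
}
```

The last decimal digit then follows without a second division, as `n - 10 * div10(n)`.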

Divide by Constant
Similarly, to divide n by a constant C, we need to find P and R such that:
– $2^{P} + R \equiv 0 \pmod{C}$
– $R \cdot n < 2^{P}$
And then multiply n by $(2^{P} + R)/C$, and shift right P positions.
Found the constants for all powers of 10 needed.
Algorithm worst register-to-register delay: 25 ns.
Run Time: 33 secs.
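One way to search for suitable P and R is sketched below in C. The struct `DivConst` and the functions `find_div_const` and `div_by_const` are hypothetical names, and picking the smallest P that satisfies both conditions is an assumption of this sketch; the slide does not say how the constants were chosen.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned shift;        /* P */
    uint64_t remainder;    /* R */
    uint64_t multiplier;   /* (2^P + R) / C */
} DivConst;

/* Smallest P such that 2^P + R is divisible by c and R * n < 2^P for all n < limit. */
DivConst find_div_const(uint32_t c, uint32_t limit)
{
    assert(c >= 2 && limit >= 1 && limit <= (1u << 30));
    DivConst k = {0, 0, 0};
    for (unsigned p = 0; p < 63; ++p) {
        uint64_t pow2 = 1ULL << p;
        uint64_t r = (c - pow2 % c) % c;             /* smallest R with 2^P + R = 0 mod C */
        if (r * (uint64_t)(limit - 1) < pow2) {
            k.shift = p;
            k.remainder = r;
            k.multiplier = (pow2 + r) / c;           /* done once, ahead of time */
            return k;
        }
    }
    assert(0 && "no suitable constants found below 2^63");
    return k;
}

/* floor(n / C) using only a multiply and a shift; n must be below the limit used above. */
uint32_t div_by_const(uint32_t n, DivConst k)
{
    return (uint32_t)(((uint64_t)n * k.multiplier) >> k.shift);
}
```

For C = 10 and n < 2^30 this search returns P = 31 and R = 2, giving the multiplier 0xCCCCCCD used in the divide-by-10 example.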

EN2911X Lab 2: Palindromes. Brian Reggiannini and Chris Erway

Checking a palindrome
All combinational logic!
Step 1: Convert 30-bit integer to 37-bit binary-coded decimal (BCD) format
Step 2: Detect the length of the decimal number
Step 3: Compare pairs of digits with XOR
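The three steps can be mirrored in a small C model. This is a sketch only: `to_packed_bcd` uses ordinary division as a stand-in for the hardware converter, and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the hardware converter: packs decimal digits into nibbles. */
static uint64_t to_packed_bcd(uint32_t n)
{
    uint64_t bcd = 0;
    for (int d = 0; d < 10; ++d) {
        bcd |= (uint64_t)(n % 10) << (4 * d);
        n /= 10;
    }
    return bcd;
}

static unsigned digit_at(uint64_t bcd, int i)
{
    return (unsigned)((bcd >> (4 * i)) & 0xF);
}

bool is_palindrome_bcd(uint32_t n)
{
    uint64_t bcd = to_packed_bcd(n);                   /* step 1: binary to BCD */

    int len = 10;                                      /* step 2: detect the length */
    while (len > 1 && digit_at(bcd, len - 1) == 0) --len;

    unsigned diff = 0;                                 /* step 3: XOR mirrored pairs */
    for (int i = 0; i < len / 2; ++i)
        diff |= digit_at(bcd, i) ^ digit_at(bcd, len - 1 - i);
    return diff == 0;                                  /* any set bit means a mismatch */
}
```

In hardware the loop in step 3 would be a fixed set of XOR gates per length, with the length selecting which result is used.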

Binary to BCD converter


Integration with Nios II
Worst-case propagation delay: 43 ns, 5 cycles
Don't want to wait! Use 32-bit PIO interface
Array of 25 palindrome-checking units
Write out 32-bit start value…
– Read back # of total palindromes found (from next 25)
– While Nios is waiting: increment loop counter
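The write/read-back pattern can be illustrated with a pure software model in C. This is not the team's Nios code: `check_block` stands in for the 25-unit hardware array, no real PIO registers are accessed, and the assumption that a block covers the start value and the 24 numbers after it is mine.

```c
#include <stdbool.h>
#include <stdint.h>

enum { UNITS = 25 };                       /* palindrome-checking units in the array */

/* Reference check used by the model: reverse the decimal digits and compare. */
static bool is_palindrome(uint32_t n)
{
    uint64_t reversed = 0;
    for (uint32_t t = n; t > 0; t /= 10)
        reversed = reversed * 10 + t % 10;
    return reversed == n;
}

/* Model of the hardware array: palindromes in [start, start + UNITS). */
static uint32_t check_block(uint32_t start)
{
    uint32_t found = 0;
    for (uint32_t i = 0; i < UNITS; ++i)
        found += is_palindrome(start + i) ? 1u : 0u;
    return found;
}

/* Driver loop: issue a start value, do the loop bookkeeping, then read the count. */
uint32_t count_palindromes(uint32_t first, uint32_t blocks)
{
    uint32_t total = 0;
    uint32_t start = first;
    for (uint32_t b = 0; b < blocks; ++b) {
        uint32_t issued = start;           /* "write out" the 32-bit start value */
        start += UNITS;                    /* work that overlaps the hardware latency */
        total += check_block(issued);      /* "read back" the number found */
    }
    return total;
}
```

On the real system the increment would run while the combinational logic settles, so the 5-cycle propagation delay is hidden behind work the loop has to do anyway.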

Nios Software

Results
Original C program: 49.59 s/billion
Unoptimized Nios C program: 7842 s/100 million
Final result: 4.2 s/billion ( MHz)
– Total logic elements: 23,039 / 33,216 (69%)