Instructor: Dr. Phillip Jones

Presentation transcript:

CPRE 583 Reconfigurable Computing. Lecture 10: Wed 9/24/2010 (High-level Acceleration Approaches). Instructor: Dr. Phillip Jones (phjones@iastate.edu), Reconfigurable Computing Laboratory, Iowa State University, Ames, Iowa, USA. http://class.ee.iastate.edu/cpre583/

Announcements/Reminders: HW2 is due Wed 10/6; Problem 2 will have a separate deadline (to be announced). MP2 is due Fri 10/1 (you can work in pairs). Make sure to read the README file in the MP2 distribution; it contains info on how to fix a Gigabit core licensing issue that ISE has. Start thinking of class projects and forming teams. Submit teams and project ideas: Mon 10/11, midnight. Project proposal presentations: Wed 10/20.

Projects: Expectations. Working system. Write-up that can potentially be submitted to a conference; the DAC format will be used as the write-up guideline. 15-20 minute PowerPoint presentation. DAC (Design Automation Conference): http://www2.dac.com/ Conference papers due 5pm (MT) Thur 11/18/2010. Student Design Contest due 5pm (MT) Wed 11/24/2010. Cash prizes!

Project Ideas: Relevant conferences: FPL, FPT, FCCM, FPGA, DAC, ICCAD, Reconfig, RTSS, RTAS, ISCA, Micro, Super Computing, HPCA, IPDPS.

Initial Project Proposal Slides (5-10 slides): Project team list: name, responsibility (who is project leader). Project idea. Motivation (why is this interesting, useful?). What will be the end result? High-level picture of the final product. High-level plan: break the project into milestones and provide an initial schedule. I would initially schedule aggressively to have the project complete by Thanksgiving; issues will pop up to cause the schedule to slip. System block diagrams. High-level algorithms (if any). Concerns: implementation, conceptual. Research papers related to your project idea.

Weekly Project Updates: The current state of your project write-up: even in the early stages of the project you should be able to write a rough draft of the Introduction and Motivation section. The current state of your final presentation: your initial project proposal presentation (due Wed 10/20) should make for a starting point for your final presentation. What things are working & not working? What roadblocks are you running into?

Projects: Target Timeline. Teams formed and idea: Mon 10/11. Project idea in PowerPoint, 3-5 slides: motivation (why is this interesting, useful?); what will be the end result; high-level picture of the final product; project team list (name, responsibility). High-level plan/proposal: Wed 10/20. PowerPoint, 5-10 slides: system block diagrams; high-level algorithms (if any); concerns (implementation, conceptual); related research papers (if any).

Projects: Target Timeline. Work on projects: 10/22 - 12/8, with weekly update reports (more information on updates will be given). Presentations: last Wed/Fri of class; present / demo what is done at this point; 15-20 minutes (depends on number of projects). Final write-up and software/hardware turned in: day of final (TBD).

Common Questions

Overview: First 15 minutes of Google FPGA lecture. How to run gprof. Discuss some high-level approaches for accelerating applications.

What you should learn: Start to get a feel for approaches for accelerating applications.

Why Use Customized Hardware? Great talk about the benefits of heterogeneous computing: http://video.google.com/videoplay?docid=-4969729965240981475#

Profiling Applications: Finding bottlenecks. Profiling tools: gprof (http://www.cs.nyu.edu/~argyle/tutorial.html), Valgrind.
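To make "finding bottlenecks" concrete, here is a minimal sketch in Python. It does not use gprof or Valgrind; it swaps in Python's standard cProfile module as a stand-in that yields a comparable per-function flat profile. The functions slow_kernel, fast_setup, application and hottest_function are hypothetical names invented for this illustration.

```python
import cProfile
import pstats


def slow_kernel(n):
    """Hypothetical compute-heavy routine (the intended bottleneck)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_setup(n):
    """Hypothetical cheap routine."""
    return list(range(n))


def application():
    """Hypothetical program: a little setup, then the heavy kernel."""
    fast_setup(1000)
    return slow_kernel(200_000)


def hottest_function(func):
    """Profile func() and return the name of the function with the most self time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (cc, nc, tottime, cumtime, callers)
    (_, _, name), _ = max(stats.stats.items(), key=lambda kv: kv[1][2])
    return name


if __name__ == "__main__":
    print(hottest_function(application))
```

The function that dominates the self-time column is the natural first candidate to move into hardware.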

Pipelining. [Figure: input vector <A,B,C,D> feeds a chain of four 4-LUTs that produces the output; in the pipelined version each 4-LUT is followed by a DFF.] Unpipelined: how many ns to process 100 input vectors, assuming each LUT has a 1 ns delay? Pipelined: how many ns to process 100 input vectors, assuming a 1 ns clock (1 DFF delay per output)?
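One way to reason about the two questions is sketched below in Python. It rests on assumptions the slide does not spell out: the four LUTs sit in series, the unpipelined circuit accepts a new vector only after the previous one has rippled through all of them, and DFF setup/clock-to-out overhead is ignored. The function names are made up for this sketch.

```python
def unpipelined_ns(num_vectors, num_stages=4, lut_delay_ns=1):
    """Each vector ripples through every LUT before the next vector starts."""
    return num_vectors * num_stages * lut_delay_ns


def pipelined_ns(num_vectors, num_stages=4, clock_ns=1):
    """The first result appears after num_stages clocks, then one result per clock."""
    return (num_stages + num_vectors - 1) * clock_ns


if __name__ == "__main__":
    print(unpipelined_ns(100))  # 400
    print(pipelined_ns(100))    # 103
```

Under these assumptions the pipelined design approaches one result per clock once the pipeline is full, so the advantage grows toward the number of stages as the input stream gets longer.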

Pipelining (Systolic Arrays): Dynamic Programming. 1. Start with the base case (lower left corner). 2. Formula for computing the numbered cells. 3. Final result in the upper right corner.

Example grid, filled in step by step across the slide builds:

1 3 6
1 2 3
1 1 1

Questions: How many ns to process the grid if a CPU can process one cell per clock (1 ns clock)? How many ns to process it if an FPGA can obtain maximum parallelism each clock (1 ns clock)? What speedup would an FPGA obtain (assuming maximum parallelism) for a 100x100 matrix? (Hint: find a formula for an NxN matrix.)
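A possible way to work the three questions is sketched below in Python. The cell formula is not stated in the transcript; the sketch assumes each cell is the sum of its left and lower neighbours, which reproduces the example values (1 1 1 / 1 2 3 / 1 3 6). It also assumes "maximum parallelism" means all cells on one anti-diagonal are computed in the same clock, since they do not depend on each other. All function names are hypothetical.

```python
def fill_grid_wavefront(n):
    """Fill an n x n grid one anti-diagonal per step; return (grid, steps).

    Row index 0 is the bottom row, so grid[0][0] is the lower-left base case
    and grid[n - 1][n - 1] is the upper-right result.
    """
    grid = [[0] * n for _ in range(n)]
    steps = 0
    for d in range(2 * n - 1):  # one wavefront (anti-diagonal) per clock
        for r in range(max(0, d - n + 1), min(d, n - 1) + 1):
            c = d - r
            if r == 0 and c == 0:
                grid[r][c] = 1  # base case
            else:
                left = grid[r][c - 1] if c > 0 else 0
                below = grid[r - 1][c] if r > 0 else 0
                grid[r][c] = left + below  # assumed formula
        steps += 1
    return grid, steps


def cpu_ns(n, clock_ns=1):
    """Sequential processor: one cell per clock."""
    return n * n * clock_ns


def fpga_ns(n, clock_ns=1):
    """Wavefront hardware: one anti-diagonal per clock."""
    return (2 * n - 1) * clock_ns


def speedup(n):
    return cpu_ns(n) / fpga_ns(n)


if __name__ == "__main__":
    grid, steps = fill_grid_wavefront(3)
    print(grid[2][2], steps)                   # 6 5
    print(cpu_ns(100), fpga_ns(100))           # 10000 199
    print(round(speedup(100), 2))              # 50.25
```

Under these assumptions the speedup for an NxN matrix is N^2 / (2N - 1), which is roughly N/2 for large N.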

Dr. James Moscola (Example). [Figure: example RNA model. Nodes: ROOT0, MATP1, MATL2, END3. States: S0, IL1, IR2, MP3, ML4, MR5, D6, IL7, IR8, ML9, D10, IL11, E12. Residues c, g, a at positions 1, 2, 3.]

Example RNA Model. [Figure: nodes ROOT0, MATP1, MATL2, END3 at positions 1, 2, 3, with states S0, IL1, IR2, MP3, ML4, MR5, D6, IL7, IR8, ML9, IL11, E12 and residues c, g, a.]

Baseline Architecture Pipeline. [Figure: nodes END3, MATL2, MATP1, ROOT0; state pipeline E12, IL11, D10, ML9, IR8, IL7, D6, MR5, ML4, MP3, IR2, IL1, S0; residue pipeline carrying u g g c g a c a c c c.]

Processing Elements. [Figure: processing element for state ML4. Inputs IL7,3,2, IR8,3,2, ML9,3,2 and D10,3,2 are each added to the matching transition score ML4_t(7), ML4_t(8), ML4_t(9), ML4_t(10); emission scores ML4_e(A), ML4_e(C), ML4_e(G), ML4_e(U) are selected by the input residue x_i; sample values shown are .40, -INF, .22, .72, .30, .44; output ML4,3,3 = .22.]

Baseline Results for Example Model: Comparison to Infernal software. Infernal run on an Intel Xeon 2.8 GHz. Baseline architecture run on a Xilinx Virtex-II 4000: occupied 88% of logic resources, run at 100 MHz. Input database of 100 million residues. Bulk of time spent on I/O (41.434 s).

Expected Speedup on Larger Models

Name     Num PEs  Width  Depth  Latency (ns)  HW time (s)  Total time (s)  Infernal (s)  Infernal QDB (s)  Speedup  Speedup w/QDB
RF00001  3539545  39492  195    19500         1.0000195    42.4340195      349492        128443            8236     3027
RF00016  5484002  43256  282    28200         1.0000282    42.4340282      336000        188521            7918     4443
RF00034  3181038  38772  187    18700         1.0000187    42.4340187      314836        87520             7419     2062
RF00041  4243415  44509  206    20600         1.0000206    42.4340206      388156        118692            9147     2797
Example  81       26     6      600           1.0000006    42.4340006      1039          868               25       20

Width and Depth are the pipeline width and depth; HW time is the hardware processing time; Total time includes the measured I/O; Speedup is the expected speedup over Infernal, without and with QDB.

Speedup estimated ... using 100 MHz clock for processing database of 100 million residues. Speedups range from 500x to over 13,000x. Larger models with more parallelism exhibit greater speedups.
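The timing columns on this slide can be reproduced with a short Python sketch. It assumes, as an interpretation rather than something the slide states, that one residue enters the pipeline per clock and that the pipeline latency is paid once; the 41.434 s I/O figure is taken from the baseline results. The function names are hypothetical.

```python
CLOCK_HZ = 100e6       # 100 MHz clock (from the slide)
NUM_RESIDUES = 100e6   # 100 million residues (from the slide)
IO_SECONDS = 41.434    # measured I/O time (from the baseline results)


def hw_seconds(latency_ns):
    """Streaming time at one residue per clock (assumed) plus pipeline latency."""
    return NUM_RESIDUES / CLOCK_HZ + latency_ns * 1e-9


def total_seconds(latency_ns):
    """Hardware processing time plus the measured I/O time."""
    return hw_seconds(latency_ns) + IO_SECONDS


def expected_speedup(infernal_seconds, latency_ns):
    """Ratio of Infernal's run time to the total hardware time."""
    return infernal_seconds / total_seconds(latency_ns)


if __name__ == "__main__":
    # RF00001 row: latency 19500 ns, Infernal 349492 s, Infernal with QDB 128443 s
    print(round(hw_seconds(19500), 7))               # 1.0000195
    print(round(expected_speedup(349492, 19500)))    # 8236
    print(round(expected_speedup(128443, 19500)))    # 3027
```

Because the I/O time dominates the total, the expected speedup in the table is governed almost entirely by how long Infernal takes on each model.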

Distributed Memory. [Figure: an ALU with a cache, and a PE with four BRAMs.]

Next Class: Models of Computation (Design Patterns)

Questions/Comments/Concerns. Write down: the main point of the lecture; one thing that's still not quite clear, OR, if everything is clear, an example of how to apply something from the lecture.