CS294-6 Reconfigurable Computing Day 9 September 22, 1998 Project Startup: Mediabench With annotations from class discussion.

Today Project –Goals –Tuning (get feedback from class) –Benchmark set –…more architecture/compute model context

Pedagogical Goal Give students an appreciation for tradeoffs in designing “post-fabrication” programmable computing devices –focus on spatial architectures: benefits, design

Ideal Design computing array for benchmark set and quantify benefits –too much for one class?

Pragmatic How do we get most of the pedagogical value within the scope of this class?

High level things want to see Where spatial computing better than processor? –Worse? How optimize a design for spatial execution? –What’s different about? How tune/optimize spatial architecture?

Can’t “Do it All” Pick focused pieces

What do you want? What are your most burning, unanswered questions in this area? –What should we try to answer? –What would you like to learn?

Burning Questions Real numbers for compute array versus processor –larger than Day 5 examples –computational density –energy/op How exploit real-time reconfiguration –swapping efficiently –spatio/temporal tradeoffs –virtualization (decompose, run-time manage)
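A minimal sketch of how the "real numbers" comparison above might be tabulated. The definitions used here (computational density as operations per second per unit area, energy/op as power divided by operation rate) are assumptions for illustration, and the `Device` fields and the figures in the usage note are hypothetical, not measurements from the class.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical device description; all fields are illustrative."""
    name: str
    ops_per_cycle: float  # operations completed per clock cycle
    clock_hz: float       # clock frequency in Hz
    area_mm2: float       # die area devoted to the computation
    power_w: float        # average power while computing


def ops_per_second(d: Device) -> float:
    """Sustained operation rate of the device."""
    return d.ops_per_cycle * d.clock_hz


def computational_density(d: Device) -> float:
    """Operations per second per mm^2 (assumed definition)."""
    return ops_per_second(d) / d.area_mm2


def energy_per_op(d: Device) -> float:
    """Joules per operation (power divided by operation rate)."""
    return d.power_w / ops_per_second(d)
```

For example, a made-up `Device("array", ops_per_cycle=2, clock_hz=100e6, area_mm2=50, power_w=4)` works out to 4e6 ops/s/mm^2 and 2e-8 J/op. The same two functions can be applied to a processor entry for a side-by-side comparison.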

Burning Questions Design memory for embedding with array –size distribution memory hierarchy –interconnect to –interfacing physical control

Burning Questions Good automatic compilation possible? –How describe ease mapping ease user job Costs and overheads to –upgradability –portability

Burning Questions What’s wrong w/ fixed length (processor) word model? –Architecturally –description –work knowledge of data size req. into specification –annotations requested by compiler?

Burning Questions How should µP and FPGA talk to each other? –change ISA? –How does presence of RC change workload (requirements) for processor? Beneficial use RC as GP platform? –How attack? Computational model?

Burning Questions Homogeneous/Heterogeneous architecture? Generation n+1 mainstream processor? Minimum array size to be useful? –Embedded cost sensitivity –general: benefit vs. array size

Burning Questions Applications –make new things viable which are not viable today? Anything other than putting 10x to 100x computation in affordable package? –Building a better mousetrap isn’t enough? Lag to exploit technology? Low on innovation side?

Burning Questions System-on-a-chip designs? –Design complexity (up) –IP –…surely do something innovative with (?) –place for spatial building blocks

AMD (Benefit) Where beneficial? –(expand day 5 comparisons) Power implications –Spatial have benefit? –When/where?

AMD (How Use) Area Time tradeoffs –What look like? –How achieve? –Importance? Specialization –What opportunities exist? –Importance of exploiting? –How exploit?

AMD (How Use) How do we build programs? Including: –Convenience? –Abstraction? –Application longevity? –Virtualization

AMD (Architecture) How compute (media apps) requirements differ from random logic? –Interconnect less? More stylized? –Retiming depth, hierarchy?

Original Plan mediabench kernels start with HSRA architecture and tools series of weekly projects –(to be tuned based on student feedback and areas of interest) final writeup

Original Plan: Exercises Analyze sequential implementation Build spatial (HSRA) implementation –compare yielded density w/ sequential Model power –compare spatial/sequential

Original Plan: Exercises Interconnect –c,p for application –“right” amount of interconnect –non-Rent structure in application interconnect? –Quality of hand autoplacement ? –??? Hierarchical vs mesh style ???
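The "c,p for application" item reads as the two parameters of Rent's rule, IO = c * N^p, which relates the number of external connections of a subcircuit to the number of blocks N it contains. A minimal sketch, assuming the exercise amounts to fitting c and p to (block count, IO count) samples taken from partitions of a mapped design: the function name `fit_rent` and the sample format are hypothetical, and a log-log least-squares fit is one common way to estimate the parameters, not necessarily the method used in the class.

```python
import math


def fit_rent(samples):
    """Fit IO = c * N**p by least squares on log-log data.

    samples: list of (n_blocks, n_io) pairs, all values > 0,
             with at least two distinct n_blocks values.
    Returns (c, p).
    """
    xs = [math.log(n) for n, _ in samples]
    ys = [math.log(io) for _, io in samples]
    count = len(samples)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx                      # slope of the log-log line
    c = math.exp(mean_y - p * mean_x)  # intercept, mapped back from log space
    return c, p
```

Samples that fall far from the fitted line at some partition sizes would be one sign of the "non-Rent structure" the slide asks about.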

Original Plan Retiming –depth distribution –hierarchy when use memories, what sizes? –?? Output vs. input ??

Original Plan: Exercises Specialization –opportunities in your application –binding times –benefits Programming –How fit into stream model of full computation? –Scheduling/virtualization?

Benchmark Set No need to hype multimedia as computational driver? See many “band-aids” on conventional architectures to handle –MMX, VIS Desire for “programmable” solutions –multi/evolving standards –one device does it all

Benchmarks Audio –adpcm –g.721 –gsm Still Image –epic –JPEG Video –MPEG-2 Encryption –pegwit –pgp Rendering –mesa –ghostscript(?) Speech Recognition –rasta

Hypothesis Spatial processing a better solution than “tweaking” ISAs –broader applicability –greater computational density –lower power Special-Purpose processing units brittle –e.g. FIR

FIR and MMX …but you saw on SPACE2/CYCLE that not everything works this well (at least easily).
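For reference, a minimal sketch of the FIR kernel that these two slides use as their example, written as a plain sequential loop. The function name `fir` and its argument names are hypothetical. Each output is the sum y[n] = sum over k of taps[k] * samples[n-k], with samples before the start of the input treated as zero.

```python
def fir(samples, taps):
    """Direct-form FIR filter: y[n] = sum_k taps[k] * samples[n - k].

    Inputs before the first sample are treated as zero.
    """
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, h in enumerate(taps):
            if n - k >= 0:          # skip taps that reach before the input
                acc += h * samples[n - k]
        out.append(acc)
    return out
```

The inner loop has a fixed trip count and independent multiplies, which is why this kernel maps well both to packed SIMD instructions and to a spatial pipeline with one multiply-add per tap.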

DCT and MMX? Claim from processor crowd that FPGA version only 2x better than MMX? –AMD skeptical

Project Goals Learn about architectural design –pedagogy build intuition on key characteristics of arch. how attack architectural design –expand what is known (research) What’s good where? Why? Application characteristics? How tune architectures?