CALTECH CS137 Spring2002 -- DeHon CS137: Electronic Design Automation Day 13: May 20, 2002 Page Generation (Area and IO Constraints) [working problem with.

Presentation transcript:

CALTECH CS137 Spring DeHon CS137: Electronic Design Automation Day 13: May 20, 2002 Page Generation (Area and IO Constraints) [working problem with Eylon Caspi]

Today
Cover/clustering
  – Minimize weight
  – W/ area and IO constraints
Motivation: SCORE page generation
  – Also energy minimization
Techniques
Current results
FPGA/hardware implementation?

Abstract Problem
Given: graph $(V, E)$ with a single weight (area) on each node and two weights (IO, cost) on each edge.
Cluster nodes into subsets $V_i$ such that:
  – $\sum_i \mathrm{Cost}(V_i)$ is minimized
  – $\mathrm{IO}(V_i) < \mathrm{IO}_{\mathrm{limit}}$
  – $A(V_i) < \mathrm{Area}_{\mathrm{limit}}$
  – $\mathrm{Cost}(V_i) = \sum \{\, \mathrm{cost}(e) \mid e \in E \text{ s.t. } e_1 \in V_i \text{ and } e_2 \notin V_i \,\}$
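As a rough illustration of the abstract problem, the sketch below evaluates a given clustering against the three quantities on the slide. The edge tuple format, the function names, and the toy graph are all hypothetical. It also assumes that a cut edge consumes IO on both clusters it touches and that its cost is charged once, to the cluster containing its source node; the slide does not spell out either convention.

```python
from collections import defaultdict

def evaluate_clustering(area, edges, cluster_of):
    """Return per-cluster (area, IO, cost) totals.

    area:       {node: area}
    edges:      [(src, dst, io, cost)] directed edges (hypothetical format)
    cluster_of: {node: cluster id}
    """
    total_area = defaultdict(float)
    total_io = defaultdict(float)
    total_cost = defaultdict(float)
    for node, a in area.items():
        total_area[cluster_of[node]] += a
    for src, dst, io, cost in edges:
        c_src, c_dst = cluster_of[src], cluster_of[dst]
        if c_src == c_dst:
            continue  # edge is hidden inside one cluster
        # Assumption: a cut edge uses IO on both clusters it touches...
        total_io[c_src] += io
        total_io[c_dst] += io
        # ...but its cost is charged once, to the source cluster.
        total_cost[c_src] += cost
    return dict(total_area), dict(total_io), dict(total_cost)

def is_feasible(total_area, total_io, area_limit, io_limit):
    """Strict limits, as written on the slide."""
    return (all(a < area_limit for a in total_area.values())
            and all(io < io_limit for io in total_io.values()))

# Toy 4-node cycle split into two clusters of two nodes each.
area = {"a": 2, "b": 3, "c": 1, "d": 4}
edges = [("a", "b", 1, 0.9), ("b", "c", 2, 0.1),
         ("c", "d", 1, 0.8), ("d", "a", 1, 0.2)]
cluster_of = {"a": 0, "b": 0, "c": 1, "d": 1}

areas, ios, costs = evaluate_clustering(area, edges, cluster_of)
total_cost = sum(costs.values())  # objective: sum of exposed edge costs
```

Here the two high-cost edges (0.9 and 0.8) stay inside clusters, so only the two low-cost edges are exposed.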

SCORE Compilation
Programming Model: graph of TDF FSMD operators
  – unlimited size, # IOs
  – no timing constraints
Execution Model: graph of page configs
  – fixed size, # IOs
  – timed, single-cycle firing
[Figure labels: Compile; memory segment, TDF operator, stream; memory segment, compute page, stream]

How Big is an Operator?
[Figure: operator sizes; panel labels: Wavelet Decode, Wavelet Encode, JPEG Encode, MPEG Encode, JPEG Encode, JPEG Decode, MPEG (I), MPEG (P), Wavelet Encode, IIR]

Clustering is Critical
Inter-page comm. latency may be long
Inter-page feedback loops are slow
Cluster to:
  – Fit feedback loops within page
  – Fit feedback loops on device

Pipeline Extraction
Hoist uncontrolled FF data-flow out of FSMD
Benefits:
  – Shrink FSM cyclic core
  – Extracted pipeline has more freedom for scheduling and partitioning
Example (extract the multiply by 2):
  – Before: state foo(i): acc=acc+2*i
  – After: state foo(two_i): acc=acc+two_i
[Figure labels: state, DF, CF, *2, two_i, i, pipeline]
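The extraction example on this slide can be mimicked in software. This is only an analogy under stated assumptions: Python generators stand in for streams, and the function names are hypothetical. The multiply by 2 does not depend on the state `acc`, so it can be moved into a separate feed-forward stage, leaving only the accumulation in the cyclic core.

```python
def fsmd_original(stream):
    """State foo(i): acc = acc + 2*i (multiply sits inside the feedback loop)."""
    acc = 0
    for i in stream:
        acc = acc + 2 * i
    return acc

def extracted_pipeline(stream):
    """Feed-forward stage hoisted out of the FSMD: two_i = i * 2."""
    for i in stream:
        yield i * 2

def fsmd_after_extraction(two_i_stream):
    """State foo(two_i): acc = acc + two_i (only the accumulate remains)."""
    acc = 0
    for two_i in two_i_stream:
        acc = acc + two_i
    return acc

inputs = [1, 2, 3]
before = fsmd_original(inputs)
after = fsmd_after_extraction(extracted_pipeline(inputs))
```

Both versions compute the same result; the second simply exposes the multiply as an independent stage that could be scheduled or partitioned separately.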

Pipeline Extraction – Extractable Area
[Figure: extractable area; panel labels: JPEG Encode, JPEG Decode, MPEG (I), MPEG (P), Wavelet Encode, IIR]

Page Generation
Pipeline extraction
  – removes dataflow that can be freely extracted from FSMD control
Still have to partition potentially large FSMs
  – approach: turn into a clustering problem

State Clustering
Start: consider each state to be a unit
Cluster states into page-size sub-FSMDs
  – Inter-page transitions become streams
Possible clustering goals:
  – Minimize delay (inter-page latency)
  – Minimize IO (inter-page BW)
  – Minimize area (fragmentation)
[Figure labels: IA, IB, OA, OB]

State Clustering to Minimize Inter-Page State Transfer
Inter-page state transfer is slow
Cluster to:
  – Contain feedback loops
  – Minimize frequency of inter-page state transfer
Previously used in:
  – VLIW trace scheduling [Fisher '81]
  – FSM decomposition for low power [Benini/DeMicheli ISCAS '98]
  – VM/cache code placement
  – GarpCC code selection [Callahan '00]

Clustering Problem
SCORE page
  – Fixed area (# of LUTs)
  – Fixed IO
Cost on edges is the probability of taking the state transition
Clustering goal is to minimize page-to-page transitions
  – Maximize expected transitions within same page
  – Find page-count/page-transition tradeoff curve

Abstract Problem
Given: graph $(V, E)$ with a single weight (area) on each node and two weights (IO, cost) on each edge.
Cluster nodes into subsets $V_i$ such that:
  – $\sum_i \mathrm{Cost}(V_i)$ is minimized
  – $\mathrm{IO}(V_i) < \mathrm{IO}_{\mathrm{limit}}$
  – $A(V_i) < \mathrm{Area}_{\mathrm{limit}}$
  – $\mathrm{Cost}(V_i) = \sum \{\, \mathrm{cost}(e) \mid e \in E \text{ s.t. } e_1 \in V_i \text{ and } e_2 \notin V_i \,\}$
[Annotations on slide: Pages (the subsets $V_i$); Inter-Page Communication Frequency (the cost)]

DSM
Possibly relevant for minimizing delay in DSM
Previously discussed:
  – Larger area → longer wires, slower
  – Want to cluster logic locally
Maybe:
  – Cluster common computations together
  – Make distant computation transfer uncommon

Island Packing for Energy
Note: modern FPGAs pack a cluster of LUTs into an endpoint
  – e.g. Altera LAB

Island Packing for Energy
Modern FPGAs pack a cluster of LUTs into an endpoint
  – e.g. Altera LAB
Local wiring has lower energy cost than long wiring
Covering for energy:
  – minimize exposed activity factor
  – same covering problem

Abstract Problem
Given: graph $(V, E)$ with a single weight (area) on each node and two weights (IO, cost) on each edge.
Cluster nodes into subsets $V_i$ such that:
  – $\sum_i \mathrm{Cost}(V_i)$ is minimized
  – $\mathrm{IO}(V_i) < \mathrm{IO}_{\mathrm{limit}}$
  – $A(V_i) < \mathrm{Area}_{\mathrm{limit}}$
  – $\mathrm{Cost}(V_i) = \sum \{\, \mathrm{cost}(e) \mid e \in E \text{ s.t. } e_1 \in V_i \text{ and } e_2 \notin V_i \,\}$
[Annotations on slide: Clusters/Islands (the subsets $V_i$); Switching Activity (the cost)]

First Try
Use FBB (flow cut) [Wong/cs137a:day7]
Pick seed element
Compute mincut
  – On mix of IO, cost edge weights?
If too small,
  – Cluster in node and repeat
Else
  – Cluster out node and repeat

Mincut Lessons
Couldn’t consistently control IO
  – Non-monotonic results when adjusting weight
Not clear what to cluster in

Idea #2
If we had an ordering of nodes
  – (wishful thinking)
Then easy to know how to include more
  – Just pick the next node
Order: 1D list of nodes
Cluster: a contiguous sequence of nodes in list
  – Specify start, finish

From Sequence to Clusters
Easy to know if a contiguous subsequence
  – Meets area constraints
  – Meets IO constraints
Cover
  – Set of (non-overlapping) subsequences
  – Include all nodes
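A minimal sketch of why the check is easy once a linear order exists. The helper name, the edge tuple format, and the toy data are hypothetical. Area of any contiguous range comes from a prefix sum; IO is counted here as the total IO weight of edges with exactly one endpoint inside the range, which is an assumption about how IO is measured.

```python
from itertools import accumulate

def make_feasibility_check(order, area, edges, area_limit, io_limit):
    """Build feasible(start, end) for half-open ranges of positions in `order`.

    order: list of nodes (the 1D ordering)
    area:  {node: area}
    edges: [(u, v, io, cost)] (hypothetical format; direction is ignored here)
    """
    pos = {node: k for k, node in enumerate(order)}
    # prefix[k] = total area of the first k nodes, so any range is one subtraction.
    prefix = [0] + list(accumulate(area[node] for node in order))

    def feasible(start, end):
        range_area = prefix[end] - prefix[start]
        # An edge needs IO if exactly one endpoint lies inside [start, end).
        range_io = sum(
            io for u, v, io, _cost in edges
            if (start <= pos[u] < end) != (start <= pos[v] < end)
        )
        # Strict inequalities, as written on the Abstract Problem slide.
        return range_area < area_limit and range_io < io_limit

    return feasible

order = ["a", "b", "c", "d"]
area = {"a": 2, "b": 3, "c": 1, "d": 4}
edges = [("a", "b", 1, 0.9), ("b", "c", 2, 0.1),
         ("c", "d", 1, 0.8), ("d", "a", 1, 0.2)]
feasible = make_feasibility_check(order, area, edges, area_limit=6, io_limit=4)
```

With these toy limits, `feasible(0, 2)` holds (area 5, IO 3) while `feasible(0, 3)` fails on area. The IO scan above is linear in the number of edges per query; a real implementation would likely tabulate it incrementally.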

Feasible Clusters (mult16a)

Covering
Not clear when to put more or less stuff in a cluster…versus leave with next cluster
  – Can’t build clusters greedily
Like the associative/parenthesization problem we saw earlier [day 5]

Parenthesis Matching
Similar
But compute from all breaks across a diagonal
  – Not just nearest neighbor
Hence extra O(N)
[Day 5]

Dynamic Programming
For each subsequence (start, end):
  – Either the area and IO meet the constraints
  – OR want to find a breakpoint between cluster sets
Cluster sets start → midpoint and midpoint → end may each be either a single cluster or multiple clusters
Different splits may:
  – Minimize number of clusters
  – Minimize cost
  – Keep dominator set [day11]
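The recurrence on this slide can be sketched as a memoized interval dynamic program. This is an illustrative simplification, not the course's implementation: it minimizes total cost only and does not keep the dominator set of (cluster count, cost) tradeoffs. The `feasible` and `cluster_cost` callables and the toy chain are hypothetical, and edge costs are assumed non-negative.

```python
import math
from functools import lru_cache

def cover(n, feasible, cluster_cost):
    """Min-cost cover of positions 0..n-1 by feasible contiguous clusters.

    feasible(start, end) and cluster_cost(start, end) take half-open ranges;
    both are hypothetical callables supplied by the caller.
    Returns (total cost, tuple of (start, end) clusters).
    """
    @lru_cache(maxsize=None)
    def best(start, end):
        if feasible(start, end):
            # One cluster: splitting it could only expose more edges
            # (assuming non-negative edge costs).
            return cluster_cost(start, end), ((start, end),)
        result = (math.inf, ())
        for mid in range(start + 1, end):  # try every breakpoint
            left_cost, left = best(start, mid)
            right_cost, right = best(mid, end)
            if left_cost + right_cost < result[0]:
                result = (left_cost + right_cost, left + right)
        return result

    return best(0, n)

# Toy chain: node k feeds node k+1 with transition cost chain_cost[k].
areas = [2, 2, 2, 2, 2]
chain_cost = [0.9, 0.1, 0.8, 0.2]
n = len(areas)

def feasible(start, end):
    return sum(areas[start:end]) < 5  # at most two nodes per cluster

def cluster_cost(start, end):
    # Only the edge leaving the cluster to the right is exposed.
    return chain_cost[end - 1] if end < n else 0.0

total, clusters = cover(n, feasible, cluster_cost)
```

On this chain the cover breaks at the two cheapest transitions, giving clusters (0, 2), (2, 4), (4, 5). Trying every midpoint for every (start, end) pair is the source of the extra O(N) factor compared with a nearest-neighbor recurrence.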

Algorithm
Compute linear order
Compute IO, area on each subsequence
  – Think NxN table (but sparse)
Use dynamic programming to cover

Compute Order?
Could experiment with various techniques
Considering: spectral ordering
  – [Hall/cs137a:day7]
How to weight edges?
  – IO, cost, mix?
  – Try linear mix…vary mix weighting
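One way to picture a spectral ordering with a linear weight mix is sketched below. Everything here is an assumption made for illustration: the mixing convention (`w = 1` means IO only, `w = 0` means cost only), the function names, and the use of plain power iteration in pure Python rather than a library eigensolver. The idea is to sort nodes by the Laplacian eigenvector with the second-smallest eigenvalue (the Fiedler vector) of the mixed-weight graph, assumed connected with positive weights.

```python
def mixed_weight(io, cost, w):
    """Linear mix of the two edge weights (assumed convention: w=1 is IO only)."""
    return w * io + (1.0 - w) * cost

def spectral_order(nodes, edges, w, iterations=500):
    """Order nodes by an approximate Fiedler vector of the weighted Laplacian.

    edges: [(u, v, io, cost)] treated as undirected (hypothetical format).
    """
    index = {node: k for k, node in enumerate(nodes)}
    n = len(nodes)
    adj = [[0.0] * n for _ in range(n)]
    for u, v, io, cost in edges:
        a, b = index[u], index[v]
        weight = mixed_weight(io, cost, w)
        adj[a][b] += weight
        adj[b][a] += weight
    degree = [sum(row) for row in adj]
    # 2 * max degree bounds the largest Laplacian eigenvalue (Gershgorin).
    shift = 2.0 * max(degree)
    x = [float(k) for k in range(n)]  # arbitrary deterministic start vector
    for _ in range(iterations):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # remove the constant (eigenvalue 0) component
        # y = (shift*I - L) x with L = D - A, so the smallest nonzero
        # eigenvalue of L becomes the dominant one under power iteration.
        y = [(shift - degree[i]) * x[i]
             + sum(adj[i][j] * x[j] for j in range(n))
             for i in range(n)]
        norm = sum(yi * yi for yi in y) ** 0.5
        x = [yi / norm for yi in y]
    return [node for _value, node in sorted(zip(x, nodes))]

# Path a-b-c-d-e given in scrambled node order; all weights equal.
nodes = ["c", "a", "e", "b", "d"]
edges = [("a", "b", 1, 1), ("b", "c", 1, 1), ("c", "d", 1, 1), ("d", "e", 1, 1)]
order = spectral_order(nodes, edges, w=0.5)
```

For a simple path the ordering recovers the path itself (possibly reversed, since an eigenvector's sign is arbitrary). Sweeping `w` over a range would produce a family of orderings to feed into the covering step.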

Weight Mix
Why unclear?
  – IO weight → good for clustering connectivity
      If IOs are limited, allows using fewer clusters
      Pack more stuff into page → fewer cases need to transition
  – Cost weight → what we’re minimizing
      Cluster high-cost edges together
      Hide in page
  – But cost ordering may get less stuff in page if poorly IO clustered…

spp results [see HTML]

Versus Weighting (w by 0.01)

Discussion
Promising results
  – New capability: not clear what to compare to
      Maybe LUT clustering to validate algorithm
  – Absolutes look promising
Weighting
  – Not clear how to search for best
  – Maybe should try other ways of weighting?
      [Michael suggests trying log(trans)]

Spatial/Hdw Implementation?
Compute linear order
  – Use 1D FDSA?
Compute IO, area on each subsequence
  – Parallel prefix sum scan
      One for each start point?
Use dynamic programming to cover
  – Like parenthesis
  – Maybe 1D and combine with area/IO scan?

Promising Ideas
Compute good ordering
  – Easy to vary inclusion when know what’s next to include/exclude
Mix weights
Cluster to minimize exposed (cut) costs