Burleson, UMASS

Using System-on-a-Chip as a Vehicle for VLSI Design Education
Andrew Laffely and Wayne Burleson
Electrical and Computer Engineering
University of Massachusetts Amherst

This material is based upon work supported by the National Science Foundation under Grant No and SRC Tasks 766 and 1075

Challenges in VLSI Education
- Advancing Processing Technology
- Higher level design tools
- Realistic yet tractable design projects
- Preparation for jobs in semiconductor and other sectors
- Making best use of faculty/student time and university resources

ECE 559/659: VLSI Design Project (10 grads, 20 seniors)
Course Objectives:
- Learn the design process for a complex VLSI chip in deep sub-micron CMOS
- Learn VLSI design skills and tools, including working in teams
- Learn about a particular application component and its VLSI implementation
- Learn to present formal design reviews using oral, written, graphical and web-based techniques

Key Aspects of the Course
- aSoC (home-grown SoC platform)
  - Provides a unifying framework to class
  - Allows for subdivision but inter-relation of projects
  - Interesting cutting edge architecture based on NSF- and SRC-funded research at UMASS and elsewhere
  - Covers many aspects of VLSI Design
  - Realistic constraints on area, timing, power and I/O
- Graduate and undergraduate teamwork
  - Graduate students provide leadership, motivation and experience
- Commercial tools and design flow
- Review-based evaluation
  - Oral and web-based reports for 4 different reviews: proposal, feasibility, implementation, integration

Adaptive System-on-a-Chip (aSoC)
- Tiled architecture with mesh interconnect
  - Point to point communication pipeline
  - Allows for heterogeneous cores
    - Differing sizes, clock rates, voltages
- Low-overhead core interface
  - On-chip bus substitute for streaming applications
- Based on static scheduling
  - Fast and predictable
[Figure: aSoC tile array with heterogeneous cores (Proc, Multiplier, FPGA); each tile holds a core, control (ctrl), and a communication interface with North, South, East, and West links]

Communication Interface
- Custom design to maximize speed and reduce power
  - Core-ports
  - Crossbar
  - Controller
  - Instruction memory
  - Local frequency and voltage supply
[Figure: communication interface block diagram; North, South, East, and West inputs and outputs pass through the crossbar; the controller uses a PC, instruction memory, and decoder (example instruction: "North to South & East"); core-ports connect the core; Local Config. and Local Frequency & Voltage blocks]

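The statically scheduled crossbar described on this slide can be illustrated with a toy model. This is a minimal sketch, not the aSoC hardware: the class name `CommInterface`, the dictionary encoding of an instruction (output port mapped to input port), and the two-entry schedule are all assumptions made for illustration. The first instruction mirrors the "North to South & East" example from the figure.

```python
from dataclasses import dataclass


@dataclass
class CommInterface:
    """Toy model of one tile's communication interface (hypothetical names)."""

    # Instruction memory: one dict per cycle, mapping output port -> input port.
    schedule: list
    pc: int = 0  # program counter into the instruction memory

    def step(self, inputs):
        """Route one cycle of inputs through the crossbar, then advance the PC."""
        config = self.schedule[self.pc]
        # An input with no data this cycle is modeled as None.
        outputs = {out: inputs.get(src) for out, src in config.items()}
        # The schedule is fixed at compile time and repeats cyclically.
        self.pc = (self.pc + 1) % len(self.schedule)
        return outputs


iface = CommInterface(schedule=[
    {"south": "north", "east": "north"},  # cycle 0: North to South & East
    {"core": "west", "west": "core"},     # cycle 1: exchange data with the core
])

out0 = iface.step({"north": 11})
out1 = iface.step({"west": 22, "core": 44})
out2 = iface.step({"north": 55})  # PC has wrapped back to instruction 0
print(out0, out1, out2)
```

Because the routing for every cycle is known ahead of time, no arbitration or address decoding happens at run time, which is the sense in which the slides call the scheme fast and predictable.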
Class Projects
SoC Infrastructure (1,3)
- Communication Interface
- Interconnect (3)
- Power Distribution
- Clock System
- Power Management
Cores
- Motion estimation for video encoding (2,3)
- AES Cryptography (3)
- Cache (2,3)
- Huffman Coding
- 3D Graphics (1,2,3)
- Discrete Cosine Transform (2,3)
- Smart Card (2,3)
(1) Used in PhD Dissertation
(2) Used in Masters Thesis
(3) Used in Publications

Design Flow
Architecture to Layout
- Architecture: Block diagram of system and behavioral description
- Logic: Gate level or schematic description
- Circuit: Transistor sizing
- Layout: Floorplanning, clock and power distribution
Tools
- VerilogXL: behavioral representation
- VTVT: standard cell library
- Synopsys: standard cell gate level netlist generation
- Silicon Ensemble: standard cell netlist to layout
- Cadence LayoutPlus: schematic and layout design
- NCSU CDK: design and extraction rules
- Cadence Layout vs. Schematic: layout verification
- HSPICE: circuit simulator

aSoC Implementation and Integration
- TSMC technology
- Full custom

Advanced Signaling Techniques (building on SRC-funded work)
- Differential current sensing
- Booster Insertion
- Multi-level current signaling
- Phase coding

Circuit Level Simulation (HSPICE)
- Evaluating Subsystems with realistic models
  - Capacitance, resistance and inductance
  - Process variations
  - Process generations

Interconnect Characterization
- Comparing delay and power of signaling techniques for different tile sizes at 250nm, 180nm, 130nm, 100nm

Voltage Scaling Approach
- Core-ports
  - Single buffer for each stream to cross clock/voltage barrier between core and interface
  - Reading/Writing success rates indicate core utilization
    - Input blocked: Core too slow
    - Output blocked: Core too fast
- Controller
  - Interprets core-port success rates to adjust local clock and voltage
[Figure: processing pipeline; the interconnect feeds a buffer and input core-port, the core, then an output core-port; "Blocked" signals from the core-ports drive the Clock and Supply Controller, which sets Local Vdd and Local Clock]

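The control policy on this slide can be sketched as a small step function. The four speed levels are taken from the Vdd Selection Criteria slide; the 20% blocking threshold, the one-step-at-a-time adjustment, the tie-break in favor of speeding up, and all names are assumptions for illustration, not the actual controller design.

```python
# Speed levels relative to the maximum clock: max, 1/2, 1/4, 1/8.
SPEED_LEVELS = [1.0, 0.5, 0.25, 0.125]


def next_level(level, input_blocked_rate, output_blocked_rate, threshold=0.2):
    """Return the next index into SPEED_LEVELS (0 is the fastest setting).

    input_blocked_rate:  fraction of attempts where the input core-port was
                         blocked, meaning the core is too slow.
    output_blocked_rate: fraction of attempts where the output core-port was
                         blocked, meaning the core is too fast.
    threshold:           hypothetical blocking rate that triggers a change.
    """
    if input_blocked_rate > threshold:
        # Core too slow: raise clock and voltage one step if possible.
        return max(level - 1, 0)
    if output_blocked_rate > threshold:
        # Core too fast: lower clock and voltage one step if possible.
        return min(level + 1, len(SPEED_LEVELS) - 1)
    return level  # utilization is balanced, keep the current setting


level = 1  # start at 1/2 speed
level = next_level(level, input_blocked_rate=0.5, output_blocked_rate=0.0)
print(SPEED_LEVELS[level])
```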
Vdd Selection Criteria
- As Vdd decreases delay increases exponentially
- Use curve to match available clock frequencies to voltages
- The voltage and frequency change reduces power by 79%, 96%, and 98.7%
  - $P = C\,V_{dd}^{2}\,f$
[Figure: Normalized Core Critical Path Delay vs. Vdd; axes: Voltage, Normalized Delay; marked operating points: Max Speed, 1/2 Speed, 1/4 Speed, 1/8 Speed; labeled voltages: 1.16 and 0.73]

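The quoted power reductions follow from the dynamic power formula on this slide. The sketch below assumes a pairing that the slides do not state outright: 1.8 V at maximum speed, 1.16 V at 1/2 speed, 0.73 V at 1/4 speed, and 0.6 V at 1/8 speed (the 1.8 V and 0.6 V values come from the Power Distribution slide). It also assumes the switched capacitance C is unchanged, so it cancels in the ratio.

```python
def relative_power(vdd, speed, vdd_max=1.8):
    """Dynamic power relative to full speed at vdd_max, from P = C * Vdd^2 * f.

    `speed` is the clock frequency as a fraction of the maximum frequency.
    """
    return (vdd / vdd_max) ** 2 * speed


# Assumed (Vdd in volts, fraction of maximum clock) operating points.
operating_points = [(1.16, 1 / 2), (0.73, 1 / 4), (0.6, 1 / 8)]

for vdd, speed in operating_points:
    saving = 1 - relative_power(vdd, speed)
    print(f"Vdd = {vdd} V at {speed} x max clock: power reduced by {saving:.1%}")
```

Under these assumptions the computed savings land close to the 79%, 96%, and 98.7% quoted on the slide.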
Clock Distribution

64 tile aSoC    70nm         100nm        130nm        180nm
Chip Area       (9.24mm)^2   (13.3mm)^2   (17.2mm)^2   (23.8mm)^2
Frequency       5 GHz        2 GHz        1 GHz        0.5 GHz
Power           126 mW       240 mW       445 mW       784 mW
Mean Skew       41 ps        50 ps        92 ps        70.6 ps
Percent Skew    21 %         10 %         9 %          4 %

- Tiled architecture extends life of globally synchronous systems
- Precise H-tree implementation
  - Load is small and equal at each branch
- Skew can be reduced by 70% with advanced deskew circuits [1]

[1] S. Tan et al., "Clock Generation and Distribution for the First IA-64 Microprocessor," IEEE JSSC, Nov. 2000

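The Percent Skew row appears to be the mean skew expressed as a fraction of the clock period. That definition is an assumption, but it agrees with the table values to within rounding, as this short check suggests.

```python
def percent_skew(freq_ghz, skew_ps):
    """Mean skew as a percentage of the clock period (assumed definition)."""
    period_ps = 1000.0 / freq_ghz  # 1 GHz corresponds to a 1000 ps period
    return 100.0 * skew_ps / period_ps


# (frequency in GHz, mean skew in ps) copied from the table above.
nodes = {
    "70nm": (5.0, 41.0),
    "100nm": (2.0, 50.0),
    "130nm": (1.0, 92.0),
    "180nm": (0.5, 70.6),
}

for node, (freq_ghz, skew_ps) in nodes.items():
    print(f"{node}: {percent_skew(freq_ghz, skew_ps):.1f} % of the clock period")
```

The absolute skew stays within the same order of magnitude across nodes, but the shrinking clock period makes it a much larger share of the cycle at 70nm.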
Power Distribution

64 tile aSoC       Vh       Vmh      Vml      Vl
Voltage            1.8V     1.16V    0.73V    0.6V
Current per Core   110mA    25mA     13mA     7mA
Total Power        12.1 W   1.86 W   607 mW   269 mW

- Heterogeneous cores may require multiple power supply voltages
- Tile structure enables uniform interwoven grid
- Larger grid for higher current demands
  - Reduced resistance
  - Higher capacitance
[Figure: interwoven power grid with Gnd, Vh, Vmh, Vml, and Vl rails]

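The Total Power row can be cross-checked as tiles times voltage times current per core. This is a sketch under the assumption that all 64 cores run from the same supply in each column. The three lower-voltage columns reproduce the table; the Vh column computes to about 12.7 W rather than the quoted 12.1 W, a gap the slides do not explain.

```python
TILES = 64


def total_power_w(vdd, current_a, tiles=TILES):
    """Total power in watts if every tile draws `current_a` amps at `vdd` volts."""
    return tiles * vdd * current_a


# (supply voltage in V, current per core in A) copied from the table above.
supplies = {
    "Vh": (1.8, 0.110),
    "Vmh": (1.16, 0.025),
    "Vml": (0.73, 0.013),
    "Vl": (0.6, 0.007),
}

for name, (vdd, current_a) in supplies.items():
    print(f"{name}: {total_power_w(vdd, current_a):.3f} W")
```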
Architecture Evaluation (Motion Estimation)
- Array-based architecture
- Pipelined ME
- Parameterized search window size
- Full search
- Choose 16x16 or 8x8 windows
  - Reduce power
[Figure: motion estimation architecture with Address Generation Unit, Processing Element Array, Memory, and FIFOs]

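For readers unfamiliar with full-search motion estimation, the following software sketch shows the computation that such a processing element array parallelizes. It uses the sum of absolute differences (SAD) as the matching cost, which is a common choice but an assumption here, since the slides do not name the metric. The function names, the frame contents, and the small search range are all illustrative.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(cur, ref, bx, by, n=8, search=2):
    """Try every displacement within +/- `search` pixels and return
    (best_sad, dx, dy). `n` is the block size, e.g. 8 or 16."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate blocks that fall outside the reference frame.
            if by + dy < 0 or by + dy + n > height:
                continue
            if bx + dx < 0 or bx + dx + n > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Synthetic 16x16 frames: `cur` is `ref` shifted by 2 pixels in x and 1 in y.
ref = [[3 * x + 5 * y for x in range(16)] for y in range(16)]
cur = [[3 * (x + 2) + 5 * (y + 1) for x in range(16)] for y in range(16)]

result = full_search(cur, ref, bx=4, by=4, n=8, search=2)
print(result)  # (0, 2, 1): zero cost at displacement dx=2, dy=1
```

The cost of a full search grows with both the block area and the number of candidate displacements, which is why parameterizing the block and search window sizes gives a direct handle on power.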
Modify Existing Designs
- Take existing Verilog code or hardware and improve or change functionality (e.g. add motion estimation algorithms, provide AES key-length flexibility)
- Evaluate changes in performance and overhead
[Figure: Old PE Layout and New PE Layout]

Conclusions
- Advancing Process Technology
  - Target 0.18 µm for affordable fab but also do scaling studies
- Higher level design tools
  - Combine synthesis and custom techniques
- Realistic yet tractable design projects
  - Re-use existing projects and provide unifying themes
- Preparation for jobs in semiconductor and other sectors
  - Focus on system design and appropriate levels of abstraction
  - Teach how to learn new tools
- Making best use of faculty/student time and university resources
  - Leverage research
  - Combine grad and undergrad
  - Re-use materials, tools