Giga-Scale System-On-A-Chip International Center on System-on-a-Chip (ICSOC) Jason Cong University of California, Los Angeles Tel: 310-206-2775, Email:

Presentation transcript:

Giga-Scale System-On-A-Chip: International Center on System-on-a-Chip (ICSOC). Jason Cong, University of California, Los Angeles. (Other participants are listed inside.)

Slide 2 (Jason Cong): Background: “Double Exponential” Growth of Design Complexity
- C1: complexity due to exponential increase of chip capacity
  – More devices
  – More power
  – Heterogeneous integration, ...
- C2: complexity due to exponential decrease of feature size
  – Interconnect delay
  – Coupling noise
  – EMI, ...
- Design Complexity ∝ C1 × C2

Slide 3 (Jason Cong): Motivation: Productivity Gap
[Figure: Chip Capacity and Designer Productivity. Axes: Logic Transistors/Chip (K) and Transistors/Staff-Month, by year. Complexity growth rate: 58%/yr; productivity growth rate: 21%/yr. Source: NTRS’97]
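The two growth rates on this slide can be turned into a rough estimate of how fast the gap widens. The sketch below assumes both rates compound at a constant annual pace, which is a simplification; the function name `gap_factor` is made up for illustration.

```python
# Growth rates quoted on the slide (NTRS'97).
COMPLEXITY_GROWTH = 0.58    # logic transistors per chip, per year
PRODUCTIVITY_GROWTH = 0.21  # transistors per staff-month, per year


def gap_factor(years: int) -> float:
    """Factor by which complexity outgrows productivity after `years` years.

    Assumes both rates compound annually at a constant pace.
    """
    return ((1 + COMPLEXITY_GROWTH) / (1 + PRODUCTIVITY_GROWTH)) ** years


if __name__ == "__main__":
    for years in (1, 5, 10):
        print(f"after {years:2d} years the gap has grown by {gap_factor(years):.1f}x")
```

Under that assumption the gap grows by roughly 30% per year, so the design effort needed per chip keeps rising unless methodology improves.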

Slide 4 (Jason Cong): Project Summary
- Develop a new design methodology to enable efficient giga-scale integration for system-on-a-chip (SOC) designs
- The project includes three major components
  – SOC synthesis tools and methodologies
  – SOC verification, test, and diagnosis
  – SOC design driver: network processor

Slide 5 (Jason Cong): Research Team by Institutions
- US
  – UCLA: Jason Cong
  – UC Santa Barbara: Tim Cheng
- Taiwan
  – NTHU: Shi-Yu Huang, Tingting Hwang, J. K. Lee, Youn-Long Lin, C. L. Liu, Cheng-Wen Wu, Allen Wu
  – NCTU: Jing-Yang Jou
- China
  – Tsinghua Univ.: Jinian Bian, Xianlong Hong, Zeyi Wang, Hongxi Xue
  – Peking Univ.: Xu Cheng
  – Zhejiang Univ.: Xiaolang Yan

Slide 6 (Jason Cong): Current Research Team
- US
  – UCLA: Jason Cong
  – UC Santa Barbara: Tim Cheng
- Taiwan
  – NTHU: Shi-Yu Huang, Tingting Hwang, J. K. Lee, Youn-Long Lin, C. L. Liu, Cheng-Wen Wu, Allen Wu
  – NCTU: Jing-Yang Jou
- China
  – Tsinghua Univ.: Jinian Bian, Xianlong Hong, Zeyi Wang, Hongxi Xue
  – Peking Univ.: Xu Cheng
  – Zhejiang Univ.: Xiaolang Yan
- Several new faculty members in the 7 institutions
- Guest members from National University of Singapore, Purdue Univ., and UCLA (EE Dept)

Slide 7 (Jason Cong): Thrust 1 -- SOC Synthesis Environment/Methodology (Led by Jason Cong)
[Diagram labels:]
- Design Spec
- VHDL/C Co-Simulation
- Design Partitioning
- Code Generation for Retargetable Compiler and Assembler Generator
- DSP Synthesis and Optimization
- FPGA Synthesis and Technology Mapping
- ASIC Synthesis
- Interconnect-Driven High-level Synthesis
- Synthesis for IP Reuse
- Physical Synthesis for Full-Chip Assembly
- Targets: Embedded Processors, DSPs, Embedded FPGAs, Customized Logic

Slide 8 (Jason Cong): Interconnect Bottleneck in Nanometer Designs
- ITRS’ um Tech
- 5.63 GHz across-chip clock
- 800 mm² (28.3 mm x 28.3 mm)
- IPEM BIWS estimations
  – Buffer size: 100x
  – Driver/receiver size: 100x
- On semi-global layer (tier 3):
  – Can travel up to 11.4 mm in one cycle
  – Need 5 clock cycles from corner to corner
- Challenge: single-cycle full-chip communication is no longer possible
- Not supported by the current CAD toolset
[Figure: chip regions reachable in 1, 2, 3, 4, and 5 cycles]
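The corner-to-corner figure on this slide can be checked with a few lines of arithmetic. The sketch assumes signals are routed rectilinearly (Manhattan distance), which is the assumption that reproduces the slide's 5 cycles; the function names are illustrative only.

```python
import math

DIE_EDGE_MM = 28.3         # edge of the 800 mm^2 die, from the slide
REACH_PER_CYCLE_MM = 11.4  # semi-global layer (tier 3) reach, from the slide


def corner_to_corner_mm(edge_mm: float = DIE_EDGE_MM) -> float:
    """Rectilinear (Manhattan) distance between opposite corners of a square die."""
    return 2 * edge_mm


def cycles_for_distance(distance_mm: float,
                        reach_mm: float = REACH_PER_CYCLE_MM) -> int:
    """Clock cycles a signal needs to cover `distance_mm` (never fewer than one)."""
    return max(1, math.ceil(distance_mm / reach_mm))


if __name__ == "__main__":
    distance = corner_to_corner_mm()
    print(f"{distance:.1f} mm needs {cycles_for_distance(distance)} cycles")
```

With these numbers the 56.6 mm rectilinear path divided by 11.4 mm per cycle rounds up to 5 cycles, matching the slide.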

Slide 9 (Jason Cong): Regular Distributed Register Architecture
[Figure labels: Global Interconnect; Island (width Wi, height Hi); Local Computational Cluster (LCC); Register File; FSM; ADD, MUL, MUX; cluster with area constraint; 1-cycle, 2-cycle, ..., k-cycle banks]
- Use register banks:
  – Registers in each island are partitioned into k banks for 1-cycle, 2-cycle, ..., k-cycle interconnect communication in each island
- Highly regular

Slide 10 (Jason Cong): MCAS: Architectural Synthesis for Multi-Cycle Communication Using RDR Architecture
[Flow diagram, MCAS (Multi-Cycle Architectural Synthesis). Labels: C program; CDFG generation; CDFG; Resource allocation & functional unit binding; ICG; Scheduling-driven placement; Locations; Placement-driven rescheduling & rebinding; Register and port binding; Datapath & FSM generation; RTL VHDL; Floorplan constraints; Multi-cycle path constraints]

Slide 11 (Jason Cong): MCAS flow vs. Synopsys Behavioral Compiler (on Virtex-II)
- Synopsys Behavioral Compiler setting: default (optimizing latency)
- Average latency ratio of MCAS vs. BC: 69%
[Charts: Latency; Resource]

Slide 12 (Jason Cong): Optimality Study of Large-Scale Circuit Placement
Construction of Placement Example with Known Optimal (PEKO) [C. Chang et al, 2003]
- Construct instances with known optimal using the characteristic of the original problem
- First quantitative evaluation of the optimality of circuit placement problem
- Existing placement algorithms can be 70% to 150% away from the optimal
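The idea of a benchmark with a known optimum can be illustrated with a toy construction. This is not the PEKO generator: it uses only 2-pin nets between neighbouring grid slots and does not model the "characteristic of the original problem" mentioned on the slide. Every net joins two distinct slots, so its length is at least 1, and the identity placement achieves exactly 1 per net. The optimal wirelength is therefore the net count. All names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
Net = Tuple[Cell, Cell]


def build_instance(rows: int, cols: int) -> Tuple[List[Cell], List[Net]]:
    """Cells on a rows x cols grid; one 2-pin net per pair of grid neighbours."""
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    nets: List[Net] = []
    for r, c in cells:
        if c + 1 < cols:
            nets.append(((r, c), (r, c + 1)))
        if r + 1 < rows:
            nets.append(((r, c), (r + 1, c)))
    return cells, nets


def wirelength(nets: List[Net], placement: Dict[Cell, Cell]) -> int:
    """Total Manhattan length of all nets under `placement` (cell -> slot)."""
    total = 0
    for a, b in nets:
        (ra, ca), (rb, cb) = placement[a], placement[b]
        total += abs(ra - rb) + abs(ca - cb)
    return total


def random_placement(cells: List[Cell], seed: int = 0) -> Dict[Cell, Cell]:
    """Stand-in for a placer under test: assign cells to slots at random."""
    slots = cells[:]
    random.Random(seed).shuffle(slots)
    return dict(zip(cells, slots))


if __name__ == "__main__":
    cells, nets = build_instance(8, 8)
    optimal = len(nets)  # known by construction
    achieved = wirelength(nets, random_placement(cells))
    print(f"placer is {100 * (achieved - optimal) / optimal:.0f}% above optimal")
```

Because the optimum is known by construction, any placer's result can be reported as a percentage above optimal, which is the kind of gap the slide quotes.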

Slide 13 (Jason Cong): High Interest in the Community
- Coverage in three EE Times articles
  – Placement tools criticized for hampering IC designs [Feb’03]
  – IC placement benchmarks needed, researchers say [April’03]
  – FPGA placement performance [Nov’03]
- More than 150 downloads from our website
  – Cadence, IBM, Intel, Magma, Mentor Graphics, Synopsys, etc.
  – CMU, SUNY, UCB, UCSB, UCSD, UIC, UMichigan, UWaterloo, etc.
- Used in every placement since its publication

Slide 14 (Jason Cong): Floorplanning & Interconnect Planning
- Based on the proposed Corner Block List (CBL) representation, proposed several extended Corner Block Lists (ECBL, CCBL, and SUB-CBL) to speed up floorplanning and handle more complicated L/T-shaped and rectilinear blocks
- Proposed floorplanning algorithms with geometric constraints such as boundary, abutment, and L/T-shaped blocks
- Proposed integrated floorplanning and buffer planning algorithms that take congestion into account, using research results from UCLA on interconnect planning
- About 30 papers published in DAC, ICCAD, ISPD, ASPDAC, ISCAS, and Transactions

Slide 15 (Jason Cong): P/G Network Analysis & Optimization
- Proposed area minimization of power distribution networks using efficient nonlinear programming techniques (ICCAD2001, accepted by IEEE Trans. on CAD)
- Proposed a decoupling capacitance optimization algorithm for robust on-chip power delivery (ASPDAC2004, ASICON2003)

Slide 16 (Jason Cong): Parasitic R/L/C Extraction
- 3-D R/C extraction using the Boundary Element Method (BEM)
  – Quasi-Multiple Medium (QMM) BEM algorithms
  – Hierarchical Block BEM (HBBEM) technique
- Fast 3-D Inductance Extraction (FIE)
- Papers were published in ASPDAC, ASICON, and IEEE Transactions on MTT

Slide 17 (Jason Cong): Thrust 2 -- SOC Verification, Test, and Diagnosis (Led by Tim Cheng)
[Diagram: Verification and Testing]
- Enabling techniques for semi-formal functional verification
- Integrated framework for simulation, vector generation and model checking
- Scalable constraint-solving techniques
- Automatic/semi-automatic functional vector generation from HDL code
- Testing and diagnosis for heterogeneous SOC
- Self-testing using on-chip programmable components
- Self-testing for on-chip analog/mixed-signal components
- New test techniques for deep-submicron embedded memories

Slide 18 (Tim Cheng): Key Results - Verification
- Developed and released ATPG-based SAT solvers for circuits (Univ. of California, Santa Barbara)
  – Integrating structural ATPG and SAT techniques with new conflict learning
  – CSAT: Fast combinational solver (released in March 2003)
    - Demonstrated X speedup over state-of-the-art SAT solvers on industrial test cases (reported by Intel and Calypto)
    - Has been integrated into Intel’s FV verification system and a startup’s verification engine
    - Publications: DATE2003 and DAC2003
  – Satori2: Fast sequential solver (released in Dec. 2003)
    - Demonstrated 10X-200X speedup over a commercial, sequential ATPG engine on public benchmark circuits
    - Publications: ICCAD2003, HLDVT2003 and ASPDAC2004
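To make "SAT for circuits" concrete, the sketch below encodes a few gates as CNF clauses (the standard Tseitin encoding) and asks whether an output can be driven to 1. It is only an illustration of the problem being solved. CSAT's structural ATPG techniques and conflict learning are not reproduced, and the exhaustive search in `solve` stands in for a real solver. Variable numbering and function names are assumptions.

```python
from itertools import product
from typing import List, Optional, Tuple

Clause = List[int]  # positive int = variable, negative int = its negation


def and_gate(out: int, a: int, b: int) -> List[Clause]:
    """Clauses forcing out == (a AND b)."""
    return [[-out, a], [-out, b], [out, -a, -b]]


def or_gate(out: int, a: int, b: int) -> List[Clause]:
    """Clauses forcing out == (a OR b)."""
    return [[out, -a], [out, -b], [-out, a, b]]


def not_gate(out: int, a: int) -> List[Clause]:
    """Clauses forcing out == (NOT a)."""
    return [[-out, -a], [out, a]]


def solve(clauses: List[Clause], num_vars: int) -> Optional[Tuple[bool, ...]]:
    """Exhaustive search; returns a satisfying assignment or None."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return bits
    return None


if __name__ == "__main__":
    # Variables: 1 = a, 2 = b, 3 = a AND b, 4 = NOT a, 5 = circuit output.
    shared = and_gate(3, 1, 2) + not_gate(4, 1)
    contradictory = shared + and_gate(5, 3, 4) + [[5]]  # (a AND b) AND (NOT a)
    reachable = shared + or_gate(5, 3, 4) + [[5]]       # (a AND b) OR (NOT a)
    print("AND output can be 1:", solve(contradictory, 5) is not None)
    print("OR output can be 1:", solve(reachable, 5) is not None)
```

Exhaustive search is exponential in the number of variables, which is why industrial-size circuits need the learning and structural techniques the slide refers to.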

Slide 19 (Tim Cheng): Key Results - Testing
A new Statistical Delay Testing and Diagnosis framework consisting of five major components (UCSB):
[Framework diagram labels: Defect Injection & Simulation; Statistical Timing Analysis Framework (cell-based characterization); Static Timing Analysis; Dynamic Timing Simulator; Path Filtering; Critical Path Selection; Diagnosis; ATPG/Pattern Selection]
- Selection/generation of high-quality tests for target paths [ITC’01][DATE 2004]
  – Identifying tests that activate longer delay along the target path
- Delay fault diagnosis based on statistical timing model [DATE’03, VTS’03, DAC’03]
  – Ref: Krstic, Wang, Cheng, & Abadir, DATE’03 – Best Paper Award in Test
- Statistical timing analysis
- Statistical critical path selection [DAC’02, ICCAD’02]
  – Selecting statistical long & true paths whose tests maximize detection of parametric failures
- Path coverage metric [ASPDAC’03]
  – Estimating the quality of a path set
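A small example may help show why path selection changes once delays are treated statistically. The sketch assumes each gate delay is an independent Gaussian, which is a modelling assumption not stated on the slide, and ranks paths by the probability that their total delay exceeds the clock period. The gate values and names are invented for illustration.

```python
import math
from typing import Dict, List, Tuple

Gate = Tuple[float, float]  # (mean delay, standard deviation), hypothetical ns


def path_stats(path: List[Gate]) -> Tuple[float, float]:
    """Mean and standard deviation of a path delay (independent Gaussian gates)."""
    mean = sum(m for m, _ in path)
    sigma = math.sqrt(sum(s * s for _, s in path))
    return mean, sigma


def exceed_probability(path: List[Gate], clock_ns: float) -> float:
    """Probability that the path delay is longer than the clock period."""
    mean, sigma = path_stats(path)
    if sigma == 0.0:
        return 1.0 if mean > clock_ns else 0.0
    return 0.5 * math.erfc((clock_ns - mean) / (sigma * math.sqrt(2)))


def rank_paths(paths: Dict[str, List[Gate]], clock_ns: float) -> List[str]:
    """Path names ordered from most to least likely to violate timing."""
    return sorted(paths,
                  key=lambda name: exceed_probability(paths[name], clock_ns),
                  reverse=True)


if __name__ == "__main__":
    paths = {
        "A": [(3.0, 0.1)] * 3,  # longer nominal delay, tight spread
        "B": [(2.2, 0.4)] * 4,  # shorter nominal delay, wide spread
    }
    print(rank_paths(paths, clock_ns=10.0))
```

In this made-up case path B has the shorter nominal delay but the wider spread, so it is the likelier source of a parametric failure; a purely nominal ranking would have picked A.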

Slide 20 (Tim Cheng): Key Results - Testing
- On-Chip Jitter Extraction for Bit-Error-Rate (BER) Testing of Multi-GHz Signal (UCSB)
  – Using on-chip, single-shot measurement unit to sample signal periods for spectral analysis
  – Demonstrated, through simulation, accurate extraction of multiple sinusoids and random jitter components for a 3GHz signal
  – Publications: ASPDAC2004 and DATE2004
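The spectral-analysis step can be sketched as follows. The slide does not say which algorithm is used, so a plain discrete Fourier transform over the sampled periods is assumed here. The synthetic data contains two sinusoidal jitter tones with made-up amplitudes and bin positions, and random jitter is left out to keep the example deterministic.

```python
import cmath
import math
from typing import List


def period_spectrum(periods_ps: List[float]) -> List[float]:
    """Single-sided amplitude spectrum (in ps) of the period deviations."""
    n = len(periods_ps)
    mean = sum(periods_ps) / n
    deviations = [p - mean for p in periods_ps]
    spectrum = []
    for k in range(n // 2):
        acc = sum(d * cmath.exp(-2j * math.pi * k * i / n)
                  for i, d in enumerate(deviations))
        spectrum.append(2 * abs(acc) / n)
    return spectrum


def strongest_tones(spectrum: List[float], count: int) -> List[int]:
    """Indices of the `count` largest bins, strongest first."""
    order = sorted(range(len(spectrum)), key=lambda k: spectrum[k], reverse=True)
    return order[:count]


def synthetic_periods(n: int = 128) -> List[float]:
    """Sampled periods of a 3 GHz signal with two hypothetical jitter tones."""
    nominal_ps = 1e12 / 3e9  # about 333.3 ps
    return [nominal_ps
            + 5.0 * math.sin(2 * math.pi * 8 * i / n)    # 5 ps tone at bin 8
            + 2.0 * math.sin(2 * math.pi * 20 * i / n)   # 2 ps tone at bin 20
            for i in range(n)]


if __name__ == "__main__":
    spectrum = period_spectrum(synthetic_periods())
    for k in strongest_tones(spectrum, 2):
        print(f"bin {k}: amplitude {spectrum[k]:.2f} ps")
```

Subtracting the mean period removes the nominal 3 GHz component, so the remaining peaks correspond to the periodic jitter tones.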

Slide 21 (Jason Cong): Thrust 3 – Design Driver: Network Security Processor (Led by Prof. C. W. Wu & Xu Cheng)
- Applications: IPSec, SSL, VPN, etc.
- Functionalities:
  – Public key: RSA, ECC
  – Secret key: AES
  – Hashing (message authentication): HMAC (SHA-1/MD5)
  – Truly random number generator (FIPS 140-1, 140-2 compliant)
- Target technology: 0.18 μm or below
- Clock rate: 200MHz or higher (internal)
- 32-bit data and instruction word
- 10Gbps (OC192)
- Power: 1 to 10mW/MHz at 3V (LP to HP)
- Die size: 50 mm²
- On-chip bus: AMBA (Advanced Microcontroller Bus Architecture)
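For readers unfamiliar with the message-authentication function listed above, here is a software sketch of HMAC with SHA-1 or MD5 using Python's standard `hmac` and `hashlib` modules. It shows what the function computes, not how the processor implements it; the helper names, key, and message are made up.

```python
import hashlib
import hmac

# The two hash functions named on the slide.
_DIGESTS = {"sha1": hashlib.sha1, "md5": hashlib.md5}


def make_tag(key: bytes, message: bytes, digest: str = "sha1") -> bytes:
    """Compute the HMAC authentication tag of `message` under `key`."""
    return hmac.new(key, message, _DIGESTS[digest]).digest()


def verify_tag(key: bytes, message: bytes, tag: bytes, digest: str = "sha1") -> bool:
    """Check a received tag using a constant-time comparison."""
    return hmac.compare_digest(make_tag(key, message, digest), tag)


if __name__ == "__main__":
    key = b"hypothetical-shared-key"
    packet = b"payload of an IPSec-style packet"
    tag = make_tag(key, packet)
    print("tag length in bytes:", len(tag))
    print("authentic packet accepted:", verify_tag(key, packet, tag))
    print("tampered packet accepted:", verify_tag(key, packet + b"!", tag))
```

The constant-time comparison avoids leaking how many leading bytes of a forged tag were correct.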

Slide 22 (Jason Cong): Encryption Modules (PKEM)
- Public key encryption module
  – Operations:
    - 32-bit word-based modular multiplication
    - Multiplication over GF(p) and GF(2^m)
- An RSA cryptography engine with small area overhead and high speed
  – Scalable word-width
  – TSMC 0.35μm
  – 34K gates (1.7×1.8 mm²)
  – 100MHz clock
  – Scalable key length
- Throughput
  – 512-bit key: 1.79Kbps/MHz
  – 1024-bit key: 470bps/MHz
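The relationship between word-based modular multiplication and RSA can be sketched in software. The slide does not say which multiplication algorithm the module uses, so this example assumes a simple word-serial (Horner-style) reduction that consumes one 32-bit word of the multiplier per step, and builds square-and-multiply exponentiation on top of it. The function names and the toy RSA key are illustrative.

```python
def mod_mul(a: int, b: int, modulus: int, word_bits: int = 32) -> int:
    """Compute (a * b) % modulus, consuming b one word at a time, MSB first."""
    mask = (1 << word_bits) - 1
    words = []
    while b:
        words.append(b & mask)
        b >>= word_bits
    result = 0
    for word in reversed(words):
        # Shift the running result by one word, add the partial product, reduce.
        result = ((result << word_bits) + a * word) % modulus
    return result


def mod_exp(base: int, exponent: int, modulus: int) -> int:
    """Left-to-right square-and-multiply built on mod_mul."""
    result = 1 % modulus
    base %= modulus
    for bit in bin(exponent)[2:]:
        result = mod_mul(result, result, modulus)
        if bit == "1":
            result = mod_mul(result, base, modulus)
    return result


if __name__ == "__main__":
    # Toy RSA key (far too small for real use): n = 61 * 53.
    n, e, d = 3233, 17, 2753
    message = 65
    cipher = mod_exp(message, e, n)
    print("round trip ok:", mod_exp(cipher, d, n) == message)
```

An exponentiation with a k-bit exponent needs on the order of k to 2k modular multiplications of k-bit operands. That is consistent with throughput per MHz dropping as the key length doubles.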

Slide 23 (Jason Cong): Encryption Modules (SKEM)
- Secret key encryption module
  – Operations: matrix operations, manipulation
- AES cryptography
  – 32-bit external interface
  – 58K gates
  – Over 200MHz clock
  – Throughput: 2Gbps
  – Supports key lengths of 128/192/256 bits

Technology    TSMC 0.25 μm CMOS
Package       128CQFP
Core Size     1,279 x 1,271 μm²
Gate Count    63.4K
Max. Freq.    250MHz
Throughput    Gbps (128-bit key), Gbps (192-bit key), Gbps (256-bit key)

Slide 24 (Jason Cong): International Collaborations
- Joint NSF/NSC workshop in Aug on SOC (Hsin-Chu, Taiwan)
- First team preparation meeting for the proposed center in Jan (Yokohama, Japan)
- 2nd planning meeting held in April 2000 (Hawaii, US)
- 3rd planning meeting in Aug (Chengde, China)
- Proposal submitted to NSF in Aug and funded in Dec
- Workshops
  – March 30-31, 2001 in Taipei, Taiwan
  – June 23-24, 2001 in Los Angeles, USA
  – August 31-September 1, 2001 in HangZhou, China
  – March 28-29, 2002, National Tsing Hua University, Hsinchu, Taiwan
  – August 20-21, 2002, Peking University, Beijing, China
  – November 15-16, 2002, University of California, Santa Barbara
  – March 27-29, 2003, National Taiwan University, Taipei, Taiwan
  – December 19-21, 2003, Yunnan University, Kunming, China

Slide 25 (Jason Cong): Publications
- 56 research publications up to this point
- 17 in top conferences/journals in the field (DAC, ICCAD, ASPDAC, ITC, etc.)

Slide 26 (Jason Cong): People & Education
- Many interactions among participants from different institutes
- Two new IEEE fellows:
  – Prof. Xianlong Hong, Tsinghua Univ.
  – Prof. Cheng-Wen Wu, National Tsing Hua Univ.
- Involved many young faculty members and researchers
- Trained an army of graduate students