1
Giga-Scale System-On-A-Chip
International Center on System-on-a-Chip (ICSOC)
Jason Cong, University of California, Los Angeles
Tel: 310-206-2775, Email: cong@cs.ucla.edu
(Other participants are listed inside)
2
Background: "Double Exponential" Growth of Design Complexity
C1: complexity due to exponential increase of chip capacity
– More devices
– More power
– Heterogeneous integration, ...
C2: complexity due to exponential decrease of feature size
– Interconnect delay
– Coupling noise
– EMI, ...
Design complexity: C1 x C2
3
Motivation: Productivity Gap
[Figure: Chip Capacity and Designer Productivity. Log-scale axes: Logic Transistors/Chip (K) and Transistors/Staff-Month. Complexity growth rate: 58%/yr. Productivity growth rate: 21%/yr. Source: NTRS'97]
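The two growth rates on this slide compound quickly. The following is a minimal sketch of that arithmetic, assuming both rates stay constant and compound annually; the function names are illustrative and do not come from the slides.

```python
COMPLEXITY_GROWTH = 0.58    # complexity growth rate per year (from the slide)
PRODUCTIVITY_GROWTH = 0.21  # productivity growth rate per year (from the slide)


def compound(rate: float, years: int) -> float:
    """Growth factor after `years` of annual compounding at `rate`."""
    return (1.0 + rate) ** years


def gap_factor(years: int) -> float:
    """Ratio of complexity growth to productivity growth after `years`."""
    return compound(COMPLEXITY_GROWTH, years) / compound(PRODUCTIVITY_GROWTH, years)


if __name__ == "__main__":
    for years in (1, 5, 10):
        print(f"after {years:2d} years the gap has grown {gap_factor(years):.1f}x")
```

Under these assumptions the gap widens by roughly 30% per year, which amounts to more than an order of magnitude over a decade.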
4
Project Summary
Develop a new design methodology to enable efficient giga-scale integration for system-on-a-chip (SOC) designs
The project includes three major components
– SOC synthesis tools and methodologies
– SOC verification, test, and diagnosis
– SOC design driver: network processor
5
Research Team by Institutions
US
– UCLA: Jason Cong
– UC Santa Barbara: Tim Cheng
Taiwan
– NTHU: Shi-Yu Huang, Tingting Hwang, J. K. Lee, Youn-Long Lin, C. L. Liu, Cheng-Wen Wu, Allen Wu
– NCTU: Jing-Yang Jou
China
– Tsinghua Univ.: Jinian Bian, Xianlong Hong, Zeyi Wang, Hongxi Xue
– Peking Univ.: Xu Cheng
– Zhejiang Univ.: Xiaolang Yan
6
Current Research Team
US
– UCLA: Jason Cong
– UC Santa Barbara: Tim Cheng
Taiwan
– NTHU: Shi-Yu Huang, Tingting Hwang, J. K. Lee, Youn-Long Lin, C. L. Liu, Cheng-Wen Wu, Allen Wu
– NCTU: Jing-Yang Jou
China
– Tsinghua Univ.: Jinian Bian, Xianlong Hong, Zeyi Wang, Hongxi Xue
– Peking Univ.: Xu Cheng
– Zhejiang Univ.: Xiaolang Yan
Several new faculty members in the 7 institutions
Guest members from National University of Singapore, Purdue Univ., and UCLA (EE Dept)
7
Thrust 1 -- SOC Synthesis Environment/Methodology (Led by Jason Cong)
[Diagram; recoverable labels:]
– Design Spec
– VHDL/C Co-Simulation
– Design Partitioning
– Code Generation for Retargetable Compiler and Assembler Generator
– DSP Synthesis and Optimization
– FPGA Synthesis and Technology Mapping
– ASIC Synthesis
– Interconnect-Driven High-level Synthesis
– Synthesis for IP Reuse
– Physical Synthesis for Full-Chip Assembly
– Targets: Embedded Processors, DSPs, Embedded FPGAs, Customized Logic
8
Interconnect Bottleneck in Nanometer Designs
ITRS'01 0.07 um technology
5.63 GHz across-chip clock
800 mm² die (28.3 mm x 28.3 mm)
IPEM BIWS estimations
– Buffer size: 100x
– Driver/receiver size: 100x
On semi-global layer (tier 3):
– Can travel up to 11.4 mm in one cycle
– Need 5 clock cycles from corner to corner
Challenge: single-cycle full-chip communication is no longer possible
– Not supported by the current CAD toolset
[Figure: die map with distance ticks at 0, 11.4, 22.8, and 28.3 mm, showing the regions reachable in 1 to 5 cycles]
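The corner-to-corner figure follows from the die size and the per-cycle reach. The sketch below reproduces it under the assumption that global wires are routed in Manhattan (horizontal plus vertical) fashion; the helper names are hypothetical.

```python
import math

CHIP_EDGE_MM = 28.3        # die edge length (from the slide)
REACH_MM_PER_CYCLE = 11.4  # semi-global wire reach per clock cycle (from the slide)


def manhattan_mm(p, q):
    """Manhattan distance in mm between two (x, y) points on the die."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def cycles_needed(distance_mm, reach_mm=REACH_MM_PER_CYCLE):
    """Clock cycles a signal needs to cover `distance_mm` (at least one)."""
    return max(1, math.ceil(distance_mm / reach_mm))


if __name__ == "__main__":
    corner_to_corner = manhattan_mm((0.0, 0.0), (CHIP_EDGE_MM, CHIP_EDGE_MM))
    print(f"{corner_to_corner:.1f} mm needs {cycles_needed(corner_to_corner)} cycles")
```

With a 56.6 mm Manhattan path and 11.4 mm of reach per cycle, the quotient is just under 5, so the result rounds up to the 5 cycles quoted on the slide.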
9
Regular Distributed Register Architecture
[Figure: grid of islands joined by global interconnect. Each island (width Wi, height Hi) holds a Local Computational Cluster (LCC) with functional units such as ADD, MUL, and MUX, a register file, and an FSM. Each cluster is subject to an area constraint.]
Use register banks:
– Registers in each island are partitioned into k banks for 1-cycle, 2-cycle, ..., k-cycle interconnect communication
Highly regular
10
MCAS: Architectural Synthesis for Multi-Cycle Communication Using RDR Architecture
[Flow diagram of MCAS (Multi-Cycle Architectural Synthesis); recoverable labels:]
– C program
– CDFG generation
– CDFG
– Resource allocation & functional unit binding
– ICG
– Scheduling-driven placement
– Locations
– Placement-driven rescheduling & rebinding
– Register and port binding
– Datapath & FSM generation
– RTL VHDL
– Floorplan constraints
– Multi-cycle path constraints
11
MCAS Flow vs. Synopsys Behavioral Compiler (on Virtex-II)
Synopsys Behavioral Compiler (BC) setting: default (optimizing latency)
Average latency ratio of MCAS vs. BC: 69%
[Charts: Latency, Resource]
12
Optimality Study of Large-Scale Circuit Placement
Construction of Placement Examples with Known Optimal (PEKO) [C. Chang et al., 2003]
Construct instances with known optimal solutions using the characteristics of the original problem
First quantitative evaluation of the optimality of circuit placement algorithms
Existing placement algorithms can be 70% to 150% away from the optimal
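Once an instance has a known optimal wirelength, a placer's result can be scored as a distance from that optimum. The sketch below illustrates the measurement on a toy instance. It assumes half-perimeter wirelength (HPWL) as the metric, which the slide does not state, and every name and coordinate in it is made up for illustration.

```python
def hpwl(net, positions):
    """Half-perimeter wirelength of one net: bounding-box width plus height."""
    xs = [positions[cell][0] for cell in net]
    ys = [positions[cell][1] for cell in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_wirelength(nets, positions):
    """Sum of HPWL over all nets for a given placement."""
    return sum(hpwl(net, positions) for net in nets)


def gap_from_optimal(wirelength, optimal_wirelength):
    """Fractional distance from the optimum (0.7 means 70% away)."""
    return wirelength / optimal_wirelength - 1.0


if __name__ == "__main__":
    nets = [("a", "b"), ("b", "c")]                      # toy netlist (hypothetical)
    optimal = {"a": (0, 0), "b": (1, 0), "c": (2, 0)}    # known-optimal placement
    candidate = {"a": (0, 0), "b": (2, 0), "c": (1, 0)}  # a placer's answer
    best = total_wirelength(nets, optimal)
    found = total_wirelength(nets, candidate)
    print(f"gap from optimal: {gap_from_optimal(found, best):.0%}")
```

In these terms, "70% to 150% away from the optimal" corresponds to a wirelength between 1.7 and 2.5 times the known optimum.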
13
High Interest in the Community
Coverage in three EE Times articles
– Placement tools criticized for hampering IC designs [Feb. '03]
– IC placement benchmarks needed, researchers say [April '03]
– FPGA placement performance [Nov. '03]
More than 150 downloads from our website
– Cadence, IBM, Intel, Magma, Mentor Graphics, Synopsys, etc.
– CMU, SUNY, UCB, UCSB, UCSD, UIC, UMichigan, UWaterloo, etc.
Used in every placement since its publication
http://ballade.cs.ucla.edu/~pubbench
14
Floorplanning & Interconnect Planning
Based on the proposed Corner Block List (CBL) representation, proposed several extended Corner Block List representations (ECBL, CCBL, and SUB-CBL) to speed up floorplanning and handle more complicated L/T-shaped and rectilinear blocks
Proposed floorplanning algorithms with geometric constraints such as boundary, abutment, and L/T-shaped blocks
Proposed integrated floorplanning and buffer planning algorithms that take congestion into account
– Using research results from UCLA on interconnect planning
About 30 papers published in DAC, ICCAD, ISPD, ASPDAC, ISCAS, and Transactions
15
P/G Network Analysis & Optimization
Proposed an area minimization method for power distribution networks using efficient nonlinear programming techniques (ICCAD2001, accepted by IEEE Trans. on CAD)
Proposed a decoupling capacitance optimization algorithm for robust on-chip power delivery (ASPDAC2004, ASICON2003)
16
Parasitic R/L/C Extraction
3-D R/C extraction using the Boundary Element Method (BEM)
Quasi-Multiple Medium (QMM) BEM algorithms
Hierarchical Block BEM (HBBEM) technique
Fast 3-D Inductance Extraction (FIE)
Papers were published in ASPDAC, ASICON, and IEEE Transactions on MTT
17
Thrust 2 -- SOC Verification, Test, and Diagnosis (Led by Tim Cheng)
[Diagram: Verification and Testing; recoverable topics:]
– Enabling techniques for semi-formal functional verification
– Integrated framework for simulation, vector generation and model checking
– Scalable constraint-solving techniques
– Automatic/semi-automatic functional vector generation from HDL code
– Testing and diagnosis for heterogeneous SOC
– Self-testing using on-chip programmable components
– Self-testing for on-chip analog/mixed-signal components
– New test techniques for deep-submicron embedded memories
18
Tim Cheng
Key Results - Verification
Developed and released ATPG-based SAT solvers for circuits (Univ. of California, Santa Barbara)
– Integrating structural ATPG and SAT techniques with new conflict learning
– CSAT: fast combinational solver (released in March 2003)
  Demonstrated 10-100X speedup over state-of-the-art SAT solvers on industrial test cases (reported by Intel and Calypto)
  Has been integrated into Intel's FV verification system and a startup's verification engine
  Publications: DATE2003 and DAC2003
– Satori2: fast sequential solver (released in Dec. 2003)
  Demonstrated 10X-200X speedup over a commercial, sequential ATPG engine on public benchmark circuits
  Publications: ICCAD2003, HLDVT2003 and ASPDAC2004
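For readers unfamiliar with circuit satisfiability, the sketch below shows the problem such solvers address: gates are encoded as CNF clauses and the solver asks whether a chosen signal can be driven to 1. It is not the CSAT or Satori2 algorithm. An exhaustive search stands in for the solver, and the gate helpers and the example circuit are hypothetical.

```python
from itertools import product


def and_gate(out, a, b):
    """CNF clauses for out <-> (a AND b); literals are signed variable ids."""
    return [[-out, a], [-out, b], [out, -a, -b]]


def not_gate(out, a):
    """CNF clauses for out <-> NOT a."""
    return [[-out, -a], [out, a]]


def satisfiable(clauses, num_vars):
    """Exhaustive search: return a satisfying assignment tuple, or None."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return bits
    return None


# Example circuit: 1 = a, 2 = b, 3 = a AND b, 4 = NOT a, 5 = (3 AND 4)
CIRCUIT = and_gate(3, 1, 2) + not_gate(4, 1) + and_gate(5, 3, 4)

if __name__ == "__main__":
    print("signal 3 can be 1:", satisfiable(CIRCUIT + [[3]], 5) is not None)
    print("signal 5 can be 1:", satisfiable(CIRCUIT + [[5]], 5) is not None)
```

Signal 5 can never be 1 because it requires both a and NOT a. Proving that kind of unsatisfiability on large circuits is where structural knowledge and conflict learning matter, since exhaustive search grows exponentially with the number of signals.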
19
Key Results - Testing
A new Statistical Delay Testing and Diagnosis framework consisting of five major components (UCSB):
[Framework diagram; recoverable labels: Defect Injection & Simulation; Statistical Timing Analysis Framework (cell-based characterization); Static Timing Analysis; Dynamic Timing Simulator; Path Filtering; Critical Path Selection; ATPG/Pattern Selection; Diagnosis]
Statistical timing analysis
Statistical critical path selection [DAC'02, ICCAD'02]
– Selecting statistical long & true paths whose tests maximize detection of parametric failures
Path coverage metric [ASPDAC'03]
– Estimating the quality of a path set
Selection/generation of high quality tests for target paths [ITC'01][DATE 2004]
– Identifying tests that activate longer delay along the target path
Delay fault diagnosis based on statistical timing model [DATE'03, VTS'03, DAC'03]
– Ref: Krstic, Wang, Cheng, & Abadir, DATE'03 (Best Paper Award in Test)
20
Key Results - Testing
On-chip jitter extraction for bit-error-rate (BER) testing of multi-GHz signals (UCSB)
– Using an on-chip, single-shot measurement unit to sample signal periods for spectral analysis
– Demonstrated, through simulation, accurate extraction of multiple sinusoidal and random jitter components for a 3 GHz signal
– Publications: ASPDAC2004 and DATE2004
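The idea of extracting sinusoidal jitter from sampled periods can be illustrated with a small simulation. The sketch below is not the UCSB method. It synthesizes periods of a nominal 3 GHz signal with one sinusoidal and one Gaussian jitter component, then locates the sinusoid with a plain DFT. The amplitudes, the sample count, and all names are assumptions chosen for illustration.

```python
import cmath
import math
import random


def synth_periods(n, nominal_ps, sin_amp_ps, sin_bin, rand_sigma_ps, seed=0):
    """Periods (ps) with one sinusoidal and one Gaussian jitter component."""
    rng = random.Random(seed)
    return [nominal_ps
            + sin_amp_ps * math.sin(2 * math.pi * sin_bin * i / n)
            + rng.gauss(0.0, rand_sigma_ps)
            for i in range(n)]


def dft_magnitudes(samples):
    """Magnitudes of DFT bins 0..n/2 of the mean-removed samples."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    return [abs(sum(c * cmath.exp(-2j * math.pi * k * i / n)
                    for i, c in enumerate(centered)))
            for k in range(n // 2 + 1)]


N = 256
periods = synth_periods(N, nominal_ps=333.3, sin_amp_ps=5.0,
                        sin_bin=20, rand_sigma_ps=1.0)
mags = dft_magnitudes(periods)
peak_bin = max(range(1, len(mags)), key=mags.__getitem__)
est_amp_ps = 2.0 * mags[peak_bin] / N  # amplitude of a real sinusoid at that bin

if __name__ == "__main__":
    print(f"peak at bin {peak_bin}, estimated amplitude {est_amp_ps:.2f} ps")
```

The sinusoidal component appears as a sharp spectral peak, while the random component spreads across all bins, which is what makes the two separable in the frequency domain.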
21
Thrust 3 – Design Driver: Network Security Processor (Led by Prof. C. W. Wu & Xu Cheng)
Applications: IPSec, SSL, VPN, etc.
Functionalities:
– Public key: RSA, ECC
– Secret key: AES
– Hashing (message authentication): HMAC (SHA-1/MD5)
– True random number generator (FIPS 140-1, 140-2 compliant)
Target technology: 0.18 μm or below
Clock rate: 200 MHz or higher (internal)
32-bit data and instruction word
10 Gbps (OC192)
Power: 1 to 10 mW/MHz at 3 V (LP to HP)
Die size: 50 mm²
On-chip bus: AMBA (Advanced Microcontroller Bus Architecture)
22
Encryption Modules (PKEM)
Public key encryption module
– Operations: 32-bit word-based modular multiplication
Multiplication over GF(p) and GF(2^m)
An RSA cryptography engine with small area overhead and high speed
Scalable word width
TSMC 0.35 μm
34K gates (1.7 × 1.8 mm²)
100 MHz clock
Scalable key length
Throughput
– 512-bit key: 1.79 Kbps/MHz
– 1024-bit key: 470 bps/MHz
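The per-MHz throughput figures can be turned into absolute numbers at the stated 100 MHz clock. The sketch below does that arithmetic, assuming that 1 Kbps means 1000 bps and that each operation processes one key-length block; neither assumption is stated on the slide, and the names are illustrative.

```python
CLOCK_MHZ = 100.0                                   # clock rate (from the slide)
THROUGHPUT_BPS_PER_MHZ = {512: 1790.0, 1024: 470.0}  # key bits -> bps per MHz


def throughput_bps(key_bits, clock_mhz=CLOCK_MHZ):
    """Absolute throughput in bits per second at the given clock."""
    return THROUGHPUT_BPS_PER_MHZ[key_bits] * clock_mhz


def cycles_per_operation(key_bits):
    """Cycles to process one key-length block (1 MHz is 1e6 cycles per second)."""
    return key_bits * 1e6 / THROUGHPUT_BPS_PER_MHZ[key_bits]


if __name__ == "__main__":
    for bits in (512, 1024):
        print(f"{bits}-bit key: {throughput_bps(bits) / 1e3:.0f} Kbps, "
              f"about {cycles_per_operation(bits):,.0f} cycles per operation")
```

Under these assumptions, doubling the key length from 512 to 1024 bits costs about 7.6 times as many cycles per operation.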
23
Encryption Modules (SKEM)
Secret key encryption module
– Operations: matrix operations, manipulation
AES cryptography
32-bit external interface
58K gates
Over 200 MHz clock
Throughput: 2 Gbps
Supports key lengths of 128/192/256 bits
Technology: TSMC 0.25 μm CMOS
Package: 128CQFP
Core size: 1,279 × 1,271 μm²
Gate count: 63.4K
Max. freq.: 250 MHz
Throughput: 2.977 Gbps (128-bit key), 2.510 Gbps (192-bit key), 2.169 Gbps (256-bit key)
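The three throughput figures imply a cycle count per AES block. The sketch below derives it, assuming the figures were measured at the 250 MHz maximum frequency and matching them to the supported key lengths of 128, 192, and 256 bits. The 128-bit block size and the 10/12/14 round counts are properties of AES itself; the names are illustrative.

```python
AES_BLOCK_BITS = 128  # AES block size is fixed regardless of key length
MAX_FREQ_HZ = 250e6   # maximum frequency (from the slide)
THROUGHPUT_BPS = {128: 2.977e9, 192: 2.510e9, 256: 2.169e9}  # key bits -> bps
AES_ROUNDS = {128: 10, 192: 12, 256: 14}                      # rounds per key length


def cycles_per_block(key_bits):
    """Clock cycles spent per 128-bit block, implied by the throughput."""
    return AES_BLOCK_BITS * MAX_FREQ_HZ / THROUGHPUT_BPS[key_bits]


if __name__ == "__main__":
    for bits in (128, 192, 256):
        print(f"{bits}-bit key: {cycles_per_block(bits):.2f} cycles per block "
              f"for {AES_ROUNDS[bits]} rounds")
```

Each result sits about 0.75 cycles above the round count, which would be consistent with roughly one cycle per round plus a small fixed overhead, although the slide does not describe the pipeline.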
24
International Collaborations
Joint NSF/NSC workshop on SOC in Aug. 1999 (Hsin-Chu, Taiwan)
First team preparation meeting for the proposed center in Jan. 2000 (Yokohama, Japan)
2nd planning meeting held in April 2000 (Hawaii, US)
3rd planning meeting in Aug. 2000 (Chengde, China)
Proposal submitted to NSF in Aug. 2000 and funded in Dec. 2000
Workshops
– March 30-31, 2001, Taipei, Taiwan
– June 23-24, 2001, Los Angeles, USA
– August 31-September 1, 2001, HangZhou, China
– March 28-29, 2002, National Tsing Hua University, Hsinchu, Taiwan
– August 20-21, 2002, Peking University, Beijing, China
– November 15-16, 2002, University of California, Santa Barbara
– March 27-29, 2003, National Taiwan University, Taipei, Taiwan
– December 19-21, 2003, Yunnan University, Kunming, China
25
Publications
56 research publications up to this point
17 in top conferences/journals in the field (DAC, ICCAD, ASPDAC, ITC, etc.)
26
People & Education
Many interactions among participants from different institutes
Two new IEEE fellows:
– Prof. Xianlong Hong, Tsinghua Univ.
– Prof. Cheng-Wen Wu, National Tsing Hua Univ.
Involved many young faculty members and researchers
Trained an army of graduate students