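An Introduction to Electronic System Level Design. 錢偉德, Design Service Division, National Chip Implementation Center (CIC); Video Communication Lab, Department of Computer Science, National Tsing Hua University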
Trend: Fashion-Driven Applications. Question 1: How do we push out a brand new application every 3 months? Question 2: How do we help our customers build application-driven SoCs?
HW Problems: HW is getting more complicated: multiple processors/autonomous engines for parallelism; sophisticated algorithms for acceleration; high throughput and low latency; management of dynamic and static power; smaller chip size.
SW Problems: SW becomes a massive task. SW/HW engineer ratios: Multimedia 2:1, Networking 3:1, Wireless 4:1. SW engineers need to ensure the HW spec is what they want, and they need to program for the complicated HW. 80% of the design is determined when the project is only 20% complete, so it is better to start early.
Design Team ► Hardware Team Components, devices, memory Glue logic, clock tree, bus, PLL, etc. ► FW/SW Team Device drivers RTOS, application porting ► System Team Application/algorithm analysis Architecture design
System Team ► Comprehend the system at transaction level ► Application oriented ► It is good to understand hardware design, but it is not a must. ► Solve big problems at the design phase, not the verification phase
System Design Flow HW Implementation System Integration Algorithm Design & Analysis Simulink, SPW, ADS, etc. Matlab Constraints Architecture Design Dataflow Analysis System Verification SW/FW Implementation
Algorithm & Architecture Design ► Algorithm Design: Dataflow Analysis, Memory access, Low-power ► Architecture Design: Memory infrastructure, Bus architecture, IP Reuse, Cache/DMA, Multi-Vdd/Multi-Frequency, Platform design, Performance evaluation, Multi-Core SoC
Design Flow ▶ Algorithm Design ▶ Architecture Design ▶ Cycle-Accurate System Modeling ▶ Transaction-Level and Cycle-Accurate Modeling ▶ RTL Design ▶ High-Level Synthesis ▶ FPGA Implementation ▶ Logic Synthesis ▶ Place & Route ▶ Signal Integrity/IR Drop
Typical Project Schedule Time to Market System Design Hardware Design Prototype Build Hardware Debug Software Design Software Coding Software Debug Project Complete
HW/SW Co-design Benefits. Schedule: System Design, Hardware Design, Prototype Build, Hardware Debug, Software Design, Software Coding, Software Debug, Project Complete. Benefits: integrate earlier; debug SW sooner; iterate changes faster; reduce project risk. Debug starts on a co-verification environment. Early architecture closure reduces risk by 80%. Start software development 6 months earlier.
Simulation Speed Issue ▶ For a language to qualify as system-level, simulation SPEED is the key. ▶ The simulation should run no more than 1,000 times slower than the real HW. In other words, 1 second of HW execution time should take at most 16 minutes and 40 seconds (1,000 seconds) of simulation time. ▶ To achieve this kind of performance, the system is best modeled at the transaction level.
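The 1,000x budget above is simple arithmetic, and a few lines make it easy to try other slowdown factors. This is only an illustrative sketch; the function names are made up for this example.

```python
def simulation_time(hw_seconds, slowdown=1000.0):
    """Wall-clock seconds needed to simulate hw_seconds of real hardware time."""
    return hw_seconds * slowdown


def as_min_sec(seconds):
    """Split a duration in seconds into whole (minutes, seconds)."""
    return divmod(int(round(seconds)), 60)


budget = simulation_time(1.0)      # 1 second of HW at a 1,000x slowdown
print(budget, as_min_sec(budget))  # 1000.0 (16, 40)
```

At a slowdown of 1,000,000x (a hypothetical figure for a much slower model), the same 1 second of hardware time would need about 11.6 days, which is why the slide treats speed as the deciding property.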
Solution: Virtual Platform. High-Speed Simulation; SystemC-Based Models; Transaction-Level Modeling Methodology; Abstraction Levels range from Programmer's View to Cycle-Accurate
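To make the speed argument for transaction-level modeling concrete, the sketch below moves the same burst of data through two toy bus models and counts how many simulation events each one needs. It is plain Python, not SystemC, and the class names and the cost of two events per word in the cycle-level model are assumptions chosen purely for illustration.

```python
class Memory:
    def __init__(self, size):
        self.data = [0] * size


class CycleLevelBus:
    """Toy cycle-level model: every word costs separate simulated events."""

    def __init__(self, mem):
        self.mem = mem
        self.events = 0

    def write_burst(self, addr, words):
        for offset, word in enumerate(words):
            self.events += 2  # assumed: one address-phase and one data-phase event
            self.mem.data[addr + offset] = word


class TransactionLevelBus:
    """Toy transaction-level model: the whole burst is one function call."""

    def __init__(self, mem):
        self.mem = mem
        self.events = 0

    def write_burst(self, addr, words):
        self.events += 1  # a single transaction event for the entire burst
        self.mem.data[addr:addr + len(words)] = words


burst = list(range(8))
cycle_mem, tlm_mem = Memory(64), Memory(64)
cycle_bus, tlm_bus = CycleLevelBus(cycle_mem), TransactionLevelBus(tlm_mem)
cycle_bus.write_burst(16, burst)
tlm_bus.write_burst(16, burst)
print(cycle_bus.events, tlm_bus.events)  # 16 1
```

Both models leave the memory in the same state; only the amount of simulation work differs, which is the trade-off between the Programmer's View and Cycle-Accurate ends of the abstraction range.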
Current System Design Methodology Refine C, C++ System-level Modeling Simulation & Analysis Results Verilog/VHDL Simulation Synthesis Done To tape out, test and product delivery
Conventional Hardware Modeling
Transaction Level Modeling
Transfers
Abstraction Level of Hardware Models * Willamette HDL, Inc.
SystemC 2.1 Language Architecture (IEEE 1666). Methodology-Specific Libraries: Master/Slave Library, etc. Layered Libraries: Verification Library, TLM Library, etc. Primitive Channels: Signal, Mutex, Semaphore, FIFO, etc. Core Language: Modules, Ports, Interfaces, Channels. Data Types: 4-valued Logic Type, 4-valued Logic Vectors, Bits and Bit Vectors, Arbitrary Precision Integers, Fixed-Point Types. Event-Driven Simulation: Events, Processes. C++ Language Standards.
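The 4-valued logic types in the list above extend 0 and 1 with X (unknown) and Z (high impedance). As a rough illustration of how such values combine, here is a plain-Python sketch of a bitwise AND over the four values. It follows the usual convention that 0 dominates and that any other combination involving X or Z yields X; the function names are hypothetical, and this is not the SystemC API.

```python
VALUES = ("0", "1", "X", "Z")


def logic_and(a, b):
    """AND of two 4-valued logic symbols ('0', '1', 'X', 'Z')."""
    if a not in VALUES or b not in VALUES:
        raise ValueError("expected one of 0, 1, X, Z")
    if a == "0" or b == "0":
        return "0"  # a known 0 forces the result regardless of the other input
    if a == "1" and b == "1":
        return "1"
    return "X"      # any remaining mix of 1, X, Z is unknown


def vector_and(left, right):
    """Element-wise AND of two equal-length 4-valued logic vectors."""
    if len(left) != len(right):
        raise ValueError("vectors must have the same length")
    return "".join(logic_and(a, b) for a, b in zip(left, right))


print(vector_and("01XZ", "1111"))  # 01XX
print(vector_and("01XZ", "0000"))  # 0000
```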
SystemC & C++ ▶ SystemC is a set of C++ class definitions and a methodology for using these classes. ▶ C++ class definition means systemc.h and the matching library. ▶ Methodology means the use of simulation kernel and modeling. ▶ You can use all of the C++ syntax, semantics, run time library, STL and such. ▶ However you need to follow SystemC methodology closely to make sure the simulation executes correctly.
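The "simulation kernel" mentioned above is, at its core, an event scheduler: processes ask to be run at a future simulated time, and the kernel executes them in time order. The following is a minimal sketch of that idea in plain Python. The `Kernel` class and its methods are invented for this example and greatly simplify what a real SystemC kernel does (there are no delta cycles, ports, or channels here).

```python
import heapq


class Kernel:
    """A tiny event-driven scheduler: runs callbacks in simulated-time order."""

    def __init__(self):
        self.now = 0
        self._queue = []
        self._seq = 0  # tie-breaker so events at the same time keep insertion order

    def schedule(self, delay, callback):
        heapq.heappush(self._queue, (self.now + delay, self._seq, callback))
        self._seq += 1

    def run(self):
        while self._queue:
            self.now, _, callback = heapq.heappop(self._queue)
            callback()


log = []
kernel = Kernel()


def producer():
    log.append((kernel.now, "produce"))
    kernel.schedule(5, consumer)  # notify the consumer 5 time units later


def consumer():
    log.append((kernel.now, "consume"))


kernel.schedule(10, producer)
kernel.schedule(0, lambda: log.append((kernel.now, "reset")))
kernel.run()
print(log)  # [(0, 'reset'), (10, 'produce'), (15, 'consume')]
```

Because simulated time only advances when the kernel pops the next event, model code that blocks or keeps its own notion of time outside the kernel would break the ordering, which is one reason the methodology has to be followed closely.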
SystemC & HDL ▶ SystemC is a Hardware Description Language (HDL) from system level down to gate level. ▶ Modules written in traditional HDLs like Verilog and VHDL can be translated into SystemC, but not vice versa. Reason: Verilog and VHDL do not support transaction-level modeling. ▶ SystemVerilog is Verilog plus assertions, an idea borrowed from programming languages. SystemC supports assertions as well, through C++ syntax and semantics.
SystemVerilog vs. SystemC ▶ SystemVerilog is Verilog plus verification (assertions). ▶ Admittedly, the statement above is not entirely fair, but it reflects how the language is used today. ▶ SystemVerilog and SystemC work together to complete the design platform from system level to gate level. ▶ SystemC deals with everything above RTL. ▶ SystemVerilog deals with RTL and below.
Simulation Speed Up. Designs: ME, AES, DWT. Columns for each design: Verilog Platform, Verilog Module, VHM Module. Times: 4246 Sec, 64 Sec, 20 Sec, 431 Sec, 15 Sec, 4 Sec, 214 Sec, 7 Sec, 2 Sec. Environment: CPU: Pentium IV 1.6GHz; RAM: 512MB; OS: RedHat Linux 8.0; ConvergenSC Ver; NC-SIM Ver. 4.0; Carbon Ver. C SP2
Analysis Case Studies: JPEG Decoding Platform
Configuration 1: ARM926EJ-S Core; Dual Master Port DMA Controller; Interrupt Controller; Static Memory Interface; APB AMBA bus; Input JPEG; Single layer AHB AMBA bus. [Block diagram labels: Instruction, Data, AHB, Int. ROM, Int. RAM, IRQ, FIQ, DMA_Int, Master1, Master2, Slave, APB, AHB2APB, Display Ctrl, Input Device, Interrupt Ctrl, APB_cfg, Reset Ctrl, Clock Gen., SMI, XB, ROM, RAM0, RAM1, External Memory, ARM Core, DMA Ctrl, Display Controller]
Bus Contention Analysis. [Chart: maximum and average bus contention over time.] Notice there is always bus contention. Callouts: "What is this activity?" "This is probably DMA input to external memory." "And this?" There are several different problems. Let's start by zooming in on a suspected DMA transfer.
[Chart: maximum and average bus contention.] During these time intervals there is contention approximately 60 and 90 percent of the time. So our next step is to examine the transaction counts by initiators and targets in this time period.
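A contention percentage like the one quoted above can be derived from a log of bus requests. The sketch below shows one plausible way to compute it: count the cycles in a window during which two or more initiators hold a request at the same time. The request tuples and initiator names are invented sample data, and the analysis tool used in the slides may define contention differently.

```python
def contention_ratio(requests, start, end):
    """Fraction of cycles in [start, end) with two or more active initiators.

    requests: iterable of (initiator, first_cycle, end_cycle), end exclusive.
    """
    contended = 0
    for cycle in range(start, end):
        active = {name for name, begin, finish in requests
                  if begin <= cycle < finish}
        if len(active) >= 2:
            contended += 1
    return contended / (end - start)


# Hypothetical window of 10 cycles: the CPU requests the bus throughout,
# and a DMA master joins from cycle 4 onward.
requests = [("ARM", 0, 10), ("DMA_Master1", 4, 10)]
print(contention_ratio(requests, 0, 10))  # 0.6
```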
Bus Utilization Target Count Initiator Count ARM to ROM ARM to RAM AHB to Int ROM AHB to Int RAM APB to AHB AHB to Ext Mem APB to DMA Master2 DMA Master1 to Ext Mem Utilization is down
APB to DMA Master2 DMA Master1 to Ext Mem AHB to Int ROM AHB to Int RAM APB to AHB AHB to Ext Mem Target Avg Initiator Avg Utilization Avg Read time Write time
DMA Activity. We have confirmed that: it is DMA activity; there is contention approximately 60 to 90% of the time over these time intervals; the DMA is contending with the CPU for AHB access. We determined the transaction counts and their durations. Our next step is to examine the activity in this time period.
[Chart: maximum and average bus contention.] During these time intervals there is contention approximately 33 percent of the time. Our next step is to examine the target and initiator counts. In these views we are zoomed in at the very beginning of the time period of interest. We can see the increase in activity.
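Counting transactions per initiator and target within a time window, as described above, amounts to a filtered tally over the bus trace. Here is a small sketch of that step. The trace format `(time, initiator, target)` and the sample entries are assumptions made for this example, not the output format of any particular tool.

```python
from collections import Counter


def count_transactions(trace, start, end):
    """Tally (initiator, target) pairs for transactions starting in [start, end)."""
    counts = Counter()
    for time, initiator, target in trace:
        if start <= time < end:
            counts[(initiator, target)] += 1
    return counts


# Hypothetical bus trace: (start time in cycles, initiator, target)
trace = [
    (100, "ARM", "Int ROM"),
    (104, "DMA Master1", "Ext Mem"),
    (108, "ARM", "Int RAM"),
    (112, "DMA Master1", "Ext Mem"),
    (116, "DMA Master1", "Ext Mem"),
    (300, "ARM", "Int ROM"),  # outside the window examined below
]

window = count_transactions(trace, 100, 200)
for (initiator, target), n in window.most_common():
    print(initiator, "->", target, n)
```

Sorting by count puts the dominant initiator and target pair first, which is the question the zoomed-in views are trying to answer.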
ARM to Int ROM ARM to Int RAM AHB to Int ROM AHB to Int RAM Initiator Count Bus Utilization Target Count An increase in CPU to RAM activity due to...
Function Trace, Bus Contention Average. (These are low-level routines called from the process_sos function as it prepares for the Huffman decoding activity.) The increase is due to the initialization of memory allocation.
Increased CPU to RAM Activity. The contention and utilization problems are primarily due to an increase in CPU to RAM activity. There is contention approximately 33% of the time over this time interval. The software activity is the initialization of memory allocation. Our next step is to examine the activity in this time period.
Initiator Count Bus Utilization Target Count ARM to Display ARM to RAM Primarily an increase in ARM to ROM activity
The contention and utilization problems are primarily due to: ARM core to ROM and RAM activity; dual DMA activity. [Block diagram of the platform.] Next, examine two possible solutions.
Configuration 2 Instruction Data AHB4 IRQ FIQ DMA_Int Master1 Master2 Slave APB AHB2APB Display Ctrl Input Device Interrupt Ctrl AHB APB_cfg APB Reset Ctrl Clock Gen. SMI XB ROM RAM0 RAM1 External Memory ARM Core DMA Ctrl Int. ROM Int. RAM Bus Matrix AHB2 AHB3 Input and Output Stages I O in2 out1 I in1 I in0 O out2 O out0
Configuration 3 Instruction Data AHB2 IRQ FIQ DMA_Int Master1 Master2 Slave APB Display Ctrl Input Device Interrupt Ctrl AHB APB_cfg APB Reset Ctrl Clock Gen. SMI XB ROM RAM0 RAM1 External Memory ARM Core DMA Ctrl Int. ROM Int. RAM Input and Output Stages Bus Matrix I I O I I I O out0 Bus Matrix O in2 in3 in4 out2 out1 in0 in1
Bus Contention Analysis of Three Configurations
Configuration 1: Single AHB; CPU to Memory Contention; DMA Contention
Configuration 2: 3 AHB with 1 Multi-layer; No CPU to Memory Contention; Less DMA Contention
Configuration 3: Single AHB with 2 Multi-layers; No Bus Contention
Cache Analysis. [Block diagram of the platform.] We begin with the cache disabled and examine the software execution.
IDCT activity Huffman Decoding
Large number of accesses to the Internal ROM The ARM core does not have the cache enabled. We will build a system with cache enabled and compare the results.
We will now compare the Master to Slave Access views. Cache disabled Cache enabled
Notice the striking reduction in ROM access. Cache disabled Cache enabled
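The drop in ROM accesses is what one would expect when a tight decoding loop fits in the instruction cache. The sketch below reproduces the effect with a toy direct-mapped cache model. The cache geometry, the addresses, and the loop are all made-up values for illustration and do not correspond to the ARM926EJ-S configuration used in the case study.

```python
def rom_accesses(addresses, cache_lines=0, line_size=16):
    """Count fetches that reach ROM.

    cache_lines=0 models a disabled cache: every fetch goes to ROM.
    Otherwise a direct-mapped cache is modeled and only misses
    (counted as one line fill each) reach ROM.
    """
    if cache_lines == 0:
        return len(addresses)
    tags = [None] * cache_lines
    misses = 0
    for addr in addresses:
        line = addr // line_size
        index = line % cache_lines
        if tags[index] != line:
            tags[index] = line  # fill the line from ROM
            misses += 1
    return misses


# Hypothetical 8-instruction loop (4 bytes per instruction) executed 100 times.
loop = [0x100 + 4 * i for i in range(8)] * 100

print(rom_accesses(loop))                 # cache disabled: 800
print(rom_accesses(loop, cache_lines=4))  # cache enabled: 2
```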
The Best Settings for the Cache: Memory Access Trace. The default cache setting is "write-through." From the Memory Access Trace view, we can zoom in on the cached memory region.
Zoomed Out View The cached memory region of interest is Start addr: 0x Stop addr: 0x IDCT activity Huffman Decoding click release Zoomed in View
Memory Access Trace From the Memory Access Trace view, we notice less write activity.
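One plausible reading of the reduced write activity is that the cache was switched from the default write-through setting to write-back; the slides do not say so explicitly, so treat that as an assumption. The sketch below shows why the two policies differ in memory traffic. It assumes a cache large enough that no line is evicted during the trace and that dirty lines are flushed once at the end.

```python
def memory_writes(write_addresses, policy, line_size=16):
    """Number of writes that reach external memory under a given cache policy."""
    if policy == "write-through":
        # Every store is forwarded to memory immediately.
        return len(write_addresses)
    if policy == "write-back":
        # Stores only mark lines dirty; each dirty line is written once on flush.
        dirty_lines = {addr // line_size for addr in write_addresses}
        return len(dirty_lines)
    raise ValueError("unknown policy: " + policy)


# Hypothetical trace: 64 stores cycling over four words in one cache line.
writes = [0x2000 + 4 * (i % 4) for i in range(64)]

print(memory_writes(writes, "write-through"))  # 64
print(memory_writes(writes, "write-back"))     # 1
```

Write-back pays off when the same addresses are rewritten many times, as in the IDCT working buffers of a decoder; for data written only once, the two policies generate similar traffic.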
IDCT activity Huffman Decoding click release Compare the Memory Access Views. Group the views and compare as shown on the next page. The cached memory region of interest is Start addr: 0x Stop addr: 0x Zoomed Out View Zoomed in View
The Simulation and Analysis Cycle Simulation Processor Interface RAM ROM Intr Addr Data Analysis Hardware Processors Bus Memory Peripherals Interfaces Software Startup code Device drivers RTOS Application code
ESL Future Work: Verification Between TLM & RTL Models; High-Level Synthesis