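An Introduction to Electronic System Level Design. 錢偉德, Design Service Division, National Chip Implementation Center (CIC); Video Communication Laboratory, Department of Computer Science, National Tsing Hua University.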

Trend
▶ Fashion-driven applications
▶ Question 1: How do we push out a brand-new application every 3 months?
▶ Question 2: How do we help our customers build application-driven SoCs?

HW Problems
HW is getting more complicated:
▶ Multiple processors/autonomous engines for parallelism
▶ Sophisticated algorithms for acceleration
▶ High throughput and low latency
▶ Management of dynamic and static power
▶ Smaller chip size

SW Problems
SW becomes a massive task:
▶ SW/HW engineer ratios: multimedia 2:1, networking 3:1, wireless 4:1
▶ Need to ensure the HW spec is what they want.
▶ Need to program for the complicated HW.
▶ 80% of the design is determined when the project is only 20% complete, so it is better to start early.

Design Team
► Hardware Team: components, devices, memory; glue logic, clock tree, bus, PLL, etc.
► FW/SW Team: device drivers; RTOS, application porting
► System Team: application/algorithm analysis; architecture design

System Team
► Comprehends the system at transaction level
► Application oriented
► Understanding hardware design is helpful, but it is not a must-have.
► Solves big problems in the design phase, not the verification phase

System Design Flow [Figure: flow diagram with blocks Algorithm Design & Analysis (Matlab, Simulink, SPW, ADS, etc.), Constraints, Dataflow Analysis, Architecture Design, HW Implementation, SW/FW Implementation, System Integration, System Verification]

Algorithm & Architecture Design
► Algorithm Design: dataflow analysis, memory access, low power
► Architecture Design: memory infrastructure, bus architecture, IP reuse, cache/DMA, multi-Vdd/multi-frequency, platform design, performance evaluation, multi-core SoC

Design Flow ▶ Algorithm Design ▶ Architecture Design ▶ Cycle-Accurate System Modeling ▶ Transaction-Level and Cycle-Accurate Modeling ▶ RTL Design ▶ High-Level Synthesis ▶ FPGA Implementation ▶ Logic Synthesis ▶ Place & Route ▶ Signal Integrity/IR Drop

Typical Project Schedule [Figure: project timeline ending at Project Complete (time to market), with phases System Design, Hardware Design, Prototype Build, Hardware Debug, Software Design, Software Coding, Software Debug]

HW/SW Co-design Benefits [Figure: project timeline with phases System Design, Hardware Design, Prototype Build, Hardware Debug, Software Design, Software Coding, Software Debug, Project Complete]
▶ Integrate earlier
▶ Debug SW sooner
▶ Iterate changes faster
▶ Reduce project risk
▶ Debug starts on a co-verification environment
▶ Early architecture closure reduces risk by 80%
▶ Start software development 6 months earlier

Simulation Speed Issue ▶ For a language to be categorized as system-level, simulation SPEED is the key. ▶ The simulation should run no more than 1,000 times slower than the real HW. In other words, 1 second of HW execution time should take at most 16 minutes and 40 seconds of simulation time. ▶ To achieve this kind of performance, the system is best modeled at transaction level.
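
The 1,000x budget on this slide is plain arithmetic. The following is a minimal sketch of the conversion; the function name `simulation_wall_time` is an illustrative assumption, not something taken from the slides.

```python
def simulation_wall_time(hw_seconds, slowdown):
    """Return (minutes, seconds) of simulation needed to cover
    hw_seconds of real hardware execution at the given slowdown."""
    total_seconds = hw_seconds * slowdown
    return divmod(total_seconds, 60)


# 1 second of HW time at the 1,000x limit quoted on the slide
minutes, seconds = simulation_wall_time(1, 1000)
print(f"{minutes} min {seconds} s")  # 16 min 40 s
```

The same function can be used to check what a slower model would cost, for example a cycle-accurate model with a much larger slowdown factor.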

Solution: Virtual Platform
▶ High-speed simulation
▶ SystemC-based models
▶ Transaction-level modeling methodology
▶ Abstraction levels range from Programmer's View to Cycle-Accurate

Current System Design Methodology [Figure: flow diagram with steps C/C++ System-level Modeling, Simulation & Analysis, Results, Refine, Verilog/VHDL, Simulation, Synthesis, Done; final step: to tape-out, test, and product delivery]

Conventional Hardware Modeling

Transaction Level Modeling
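
The contrast between conventional (cycle-by-cycle) modeling and transaction-level modeling can be sketched as follows. This is not SystemC code; it is a plain Python illustration, and the `Memory` class, both function names, and the two-cycles-per-beat timing are all assumptions made for the example. The point is only that a transaction-level model moves the same data in far fewer simulator steps.

```python
class Memory:
    """Simple word-addressed memory used as the bus target (hypothetical)."""
    def __init__(self, size):
        self.words = [0] * size


def write_burst_cycle_level(mem, addr, data, cycles_per_beat=2):
    """Model every beat of a burst as separate clock cycles.
    Returns the number of simulated steps."""
    steps = 0
    for offset, word in enumerate(data):
        for phase in range(cycles_per_beat):
            steps += 1                           # one simulator step per clock cycle
            if phase == cycles_per_beat - 1:
                mem.words[addr + offset] = word  # data lands in the last phase
    return steps


def write_burst_transaction_level(mem, addr, data):
    """Model the whole burst as one function call (one transaction)."""
    mem.words[addr:addr + len(data)] = data
    return 1                                     # a single simulator step


burst = list(range(8))
m1, m2 = Memory(16), Memory(16)
print(write_burst_cycle_level(m1, 4, burst))        # 16
print(write_burst_transaction_level(m2, 4, burst))  # 1
print(m1.words == m2.words)                         # True
```

Both models leave the memory in the same state; only the amount of simulation work differs.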

Transfers

Abstraction Level of Hardware Models * Willamette HDL, Inc.

SystemC 2.1 Language Architecture (IEEE 1666), from top layer to bottom:
▶ Methodology-Specific Libraries: Master/Slave Library, etc.
▶ Layered Libraries: Verification Library, TLM Library, etc.
▶ Primitive Channels: Signal, Mutex, Semaphore, FIFO, etc.
▶ Core Language: Modules, Ports, Interfaces, Channels
▶ Data Types: 4-valued Logic Type, 4-valued Logic Vectors, Bits and Bit Vectors, Arbitrary Precision Integers, Fixed-Point Types
▶ Event-Driven Simulation: Events, Processes
▶ C++ Language Standards

SystemC & C++ ▶ SystemC is a set of C++ class definitions and a methodology for using these classes. ▶ The C++ class definitions are systemc.h and the matching library. ▶ The methodology covers the use of the simulation kernel and modeling. ▶ You can use all of C++: syntax, semantics, runtime library, STL, and so on. ▶ However, you need to follow the SystemC methodology closely to make sure the simulation executes correctly.
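
To make the idea of a simulation kernel with events and processes concrete, here is a minimal event-driven scheduler. It is a Python sketch of the general technique, not the SystemC kernel or its API; the `Kernel` class and the `producer`/`consumer` processes are hypothetical names chosen for the example.

```python
import heapq


class Kernel:
    """Tiny event-driven scheduler: processes are callbacks run at a timestamp."""
    def __init__(self):
        self.now = 0
        self._queue = []   # entries: (time, sequence number, callback)
        self._seq = 0      # tie-breaker keeps same-time events in FIFO order

    def schedule(self, delay, callback):
        heapq.heappush(self._queue, (self.now + delay, self._seq, callback))
        self._seq += 1

    def run(self):
        # Always advance to the earliest pending event, never tick by tick.
        while self._queue:
            self.now, _, callback = heapq.heappop(self._queue)
            callback()


log = []
kernel = Kernel()


def producer():
    log.append((kernel.now, "produce"))
    kernel.schedule(5, consumer)   # notify the consumer 5 time units later


def consumer():
    log.append((kernel.now, "consume"))


kernel.schedule(10, producer)
kernel.run()
print(log)  # [(10, 'produce'), (15, 'consume')]
```

Simulated time jumps directly from one event to the next, which is why nothing is computed for the idle time units in between.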

SystemC & HDL ▶ SystemC is a Hardware Description Language (HDL) that spans from system level down to gate level. ▶ Modules written in traditional HDLs like Verilog and VHDL can be translated into SystemC, but not vice versa. Reason: Verilog and VHDL do not support transaction-level modeling. ▶ SystemVerilog is Verilog plus assertions, an idea borrowed from programming languages. SystemC supports assertions as well, through C++ syntax and semantics.

SystemVerilog vs. SystemC ▶ SystemVerilog is Verilog plus verification (assertions). ▶ This description is not entirely fair, but it reflects how the language is used today. ▶ SystemVerilog and SystemC work together to complete the design platform from system level to gate level. ▶ SystemC handles everything above RTL. ▶ SystemVerilog handles RTL and below.

Simulation Speed Up
Benchmark   Verilog Platform   Verilog Module   VHM Module
ME          4246 s             64 s             20 s
AES         431 s              15 s             4 s
DWT         214 s              7 s              2 s
Test environment: CPU: Pentium IV 1.6 GHz; RAM: 512 MB; OS: RedHat Linux 8.0; ConvergenSC Ver; NC-SIM Ver. 4.0; Carbon Ver. C SP2
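
The timings on this slide can be turned into ratios with a few lines of code. The sketch below assumes the nine values map, in order, to the ME, AES, and DWT benchmarks and to the Verilog Platform, Verilog Module, and VHM Module columns; that mapping is inferred from the flattened slide text, and the `speedups` helper is a hypothetical name.

```python
# Simulation times in seconds (mapping of values to cells is an assumption).
times = {
    "ME":  {"Verilog Platform": 4246, "Verilog Module": 64, "VHM Module": 20},
    "AES": {"Verilog Platform": 431,  "Verilog Module": 15, "VHM Module": 4},
    "DWT": {"Verilog Platform": 214,  "Verilog Module": 7,  "VHM Module": 2},
}


def speedups(row, baseline="Verilog Platform"):
    """Ratio of the baseline time to each other column's time."""
    base = row[baseline]
    return {name: base / t for name, t in row.items() if name != baseline}


for bench, row in times.items():
    for name, factor in speedups(row).items():
        print(f"{bench}: {name} runs in 1/{factor:.1f} of the Verilog Platform time")
```

Under that reading, the VHM Module column is two orders of magnitude below the Verilog Platform column for all three benchmarks.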

Analysis Case Studies: JPEG Decoding Platform

Configuration 1: ARM926EJ-S core, dual-master-port DMA controller, interrupt controller, static memory interface, display controller, JPEG input, single-layer AHB AMBA bus, APB AMBA bus. [Figure: block diagram with labels ARM Core (Instruction, Data, IRQ, FIQ), DMA Ctrl (Master1, Master2, Slave, DMA_Int), AHB, Int. ROM, Int. RAM, AHB2APB, APB, APB_cfg, Display Ctrl, Input Device, Interrupt Ctrl, Reset Ctrl, Clock Gen., SMI, XB, External Memory (ROM, RAM0, RAM1)]

Bus Contention Analysis [Figure: bus contention over time, max and average; callouts: "This is probably DMA input to external memory", "What is this activity?", "And this?"] Notice there is always bus contention. There are several different problems. Let's start by zooming in on a suspected DMA transfer.

[Figure: bus contention, max and average, zoomed in] During these time intervals there is contention approximately 60 to 90 percent of the time, so our next step is to examine the transaction counts by initiators and targets in this time period.

Bus Utilization [Figure: Bus Utilization, Target Count, and Initiator Count views; labels: ARM to ROM, ARM to RAM, AHB to Int ROM, AHB to Int RAM, APB to AHB, AHB to Ext Mem, APB to DMA Master2, DMA Master1 to Ext Mem] Utilization is down.

[Figure: Target Avg, Initiator Avg, and Utilization Avg views with read time and write time; labels: APB to DMA Master2, DMA Master1 to Ext Mem, AHB to Int ROM, AHB to Int RAM, APB to AHB, AHB to Ext Mem]

DMA Activity
We have confirmed that:
▶ It is DMA activity.
▶ There is contention approximately 60 to 90% of the time over these time intervals.
▶ The DMA is contending with the CPU for AHB access.
▶ We determined the transaction counts and their duration.
Our next step is to examine the activity in this time period.
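
A contention percentage like the one quoted above can be derived from a per-cycle record of which masters request the bus. The sketch below uses one simple definition (a cycle is contended when two or more masters request the shared bus); the definition, the function names, and the 10-cycle trace are assumptions for illustration, not data from the analysis tool.

```python
def contention_ratio(trace):
    """trace: one set of requesting masters per bus cycle.
    A cycle counts as contended when two or more masters request the bus."""
    if not trace:
        return 0.0
    contended = sum(1 for requesters in trace if len(requesters) >= 2)
    return contended / len(trace)


def requests_per_master(trace):
    """Count how many cycles each master spent requesting the bus."""
    counts = {}
    for requesters in trace:
        for master in requesters:
            counts[master] = counts.get(master, 0) + 1
    return counts


# Hypothetical 10-cycle window: the CPU and one DMA master share a single AHB.
window = [{"CPU"}, {"CPU", "DMA1"}, {"CPU", "DMA1"}, {"DMA1"}, {"CPU", "DMA1"},
          {"CPU", "DMA1"}, {"CPU", "DMA1"}, set(), {"CPU", "DMA1"}, {"CPU"}]
print(contention_ratio(window))      # 0.6
print(requests_per_master(window))
```

Breaking the counts down per master is what identifies who is contending, which mirrors the step from the contention view to the initiator and target counts.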

[Figure: bus contention, max and average] During these time intervals there is contention approximately 33 percent of the time. Our next step is to examine the target and initiator counts. In these views we are zoomed in at the very beginning of the time period of interest. We can see the increase in activity.

[Figure: Bus Utilization, Initiator Count, and Target Count views; labels: ARM to Int ROM, ARM to Int RAM, AHB to Int ROM, AHB to Int RAM] An increase in CPU to RAM activity due to...

[Figure: Function Trace and Bus Contention (average) views] The increase is due to the initialization of memory allocation. (These are low-level routines called from the process_sos function as it prepares for the Huffman decoding activity.)

Increased CPU to RAM activity
The contention and utilization problems are primarily due to:
▶ An increase of CPU to RAM activity
▶ There is contention approximately 33% of the time over this time interval.
▶ The software activity is the initialization of memory allocation.
Our next step is to examine the activity in this time period.

[Figure: Initiator Count, Bus Utilization, and Target Count views; labels: ARM to Display, ARM to RAM] Primarily an increase in ARM to ROM activity.

The contention and utilization problems are primarily due to:
▶ ARM core to ROM and RAM activity
▶ Dual DMA activity
[Figure: platform block diagram] Next, examine two possible solutions.

Configuration 2 [Figure: block diagram with one Bus Matrix (input stages in0, in1, in2; output stages out0, out1, out2) and buses AHB2, AHB3, AHB4, plus ARM Core, DMA Ctrl, Int. ROM, Int. RAM, AHB2APB, APB peripherals, SMI, XB, and External Memory (ROM, RAM0, RAM1)]

Configuration 3 [Figure: block diagram with two Bus Matrix blocks (input stages in0 to in4; output stages out0, out1, out2) and bus AHB2, plus ARM Core, DMA Ctrl, Int. ROM, Int. RAM, APB peripherals, SMI, XB, and External Memory (ROM, RAM0, RAM1)]

Bus Contention Analysis of Three Configurations
▶ Configuration 1 (single AHB): CPU-to-memory contention; DMA contention
▶ Configuration 2 (3 AHB with 1 multi-layer): no CPU-to-memory contention; less DMA contention
▶ Configuration 3 (single AHB with 2 multi-layers): no bus contention
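
Why a multi-layer bus matrix reduces contention can be shown with a small model. The sketch assumes the usual behavior of a multi-layer interconnect: masters conflict only when they address the same slave in the same cycle, whereas on a single shared bus any two simultaneous masters conflict. The function name, the slave names, and the four-cycle trace are hypothetical.

```python
def contended_cycles(trace, multilayer):
    """trace: per cycle, a dict {master: target slave}.
    Single shared bus: any two simultaneous masters contend.
    Multi-layer matrix: masters contend only when they address the same slave."""
    count = 0
    for accesses in trace:
        targets = list(accesses.values())
        if multilayer:
            conflict = len(targets) != len(set(targets))
        else:
            conflict = len(targets) >= 2
        count += conflict
    return count


# Hypothetical trace: CPU fetches from internal memory while DMA fills external memory.
trace = [
    {"CPU": "IntROM", "DMA1": "ExtMem"},
    {"CPU": "IntRAM", "DMA1": "ExtMem"},
    {"CPU": "ExtMem", "DMA1": "ExtMem"},
    {"CPU": "IntROM"},
]
print(contended_cycles(trace, multilayer=False))  # 3
print(contended_cycles(trace, multilayer=True))   # 1
```

The remaining conflict in the multi-layer case is the cycle where both masters target the same memory, which is the kind of contention only a further split of the slaves can remove.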

Cache Analysis [Figure: platform block diagram] We begin with the cache disabled and examine the software execution.

[Figure: software execution view; labeled regions: Huffman decoding, IDCT activity]

There is a large number of accesses to the internal ROM because the ARM core does not have the cache enabled. We will build a system with the cache enabled and compare the results.

We will now compare the Master to Slave Access views. [Figure: Master to Slave Access views, cache disabled vs. cache enabled]

Notice the striking reduction in ROM access. [Figure: Master to Slave Access views, cache disabled vs. cache enabled]
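
The reduction in ROM access can be reproduced with a toy model of an instruction cache in front of the ROM. This is a minimal sketch assuming a direct-mapped cache with 16-byte lines; the cache geometry, the function name, and the instruction stream are illustrative assumptions and do not describe the ARM926EJ-S cache.

```python
def rom_accesses(addresses, cache_lines=0, line_size=16):
    """Count accesses that reach ROM for a stream of byte addresses.
    cache_lines=0 models the cache-disabled case (every fetch hits ROM);
    otherwise a direct-mapped cache is used and only line fills reach ROM."""
    if cache_lines == 0:
        return len(addresses)
    tags = [None] * cache_lines
    misses = 0
    for addr in addresses:
        line = addr // line_size
        index = line % cache_lines
        if tags[index] != line:     # miss: fetch the line from ROM
            tags[index] = line
            misses += 1
    return misses


# Hypothetical inner loop: 64 bytes of code (4-byte instructions) run 100 times.
loop = [pc for _ in range(100) for pc in range(0, 64, 4)]
print(rom_accesses(loop))                  # 1600
print(rom_accesses(loop, cache_lines=8))   # 4
```

Loop-heavy code such as the decoding kernels benefits most, because after the first pass every fetch is served from the cache.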

The Best Settings for the Cache: Memory Access Trace. The default cache setting is "write-through." From the Memory Access Trace view, we can zoom in on the cached memory region.

The cached memory region of interest is Start addr: 0x Stop addr: 0x [Figure: Memory Access Trace, zoomed-out and zoomed-in views (click and release to zoom); labeled regions: Huffman decoding, IDCT activity]

Memory Access Trace From the Memory Access Trace view, we notice less write activity.
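
One plausible reason for less write activity is a change of write policy. The sketch below compares write-through with write-back, assuming that write-back is the setting being compared against (the slides name write-through as the default but do not name the alternative). The function name, the line size, and the store pattern are hypothetical.

```python
def memory_writes(stores, policy, line_size=16):
    """Count writes reaching memory for a list of store addresses.
    write-through: every store is forwarded to memory.
    write-back: a dirty line is written once when flushed
    (this sketch assumes a cache big enough that no line is evicted early)."""
    if policy == "write-through":
        return len(stores)
    if policy == "write-back":
        dirty_lines = {addr // line_size for addr in stores}
        return len(dirty_lines)     # one line write per dirty line at flush
    raise ValueError(f"unknown policy: {policy}")


# Hypothetical 8x8 block of 16-bit coefficients updated in place 3 times.
stores = [0x1000 + 2 * i for _ in range(3) for i in range(64)]
print(memory_writes(stores, "write-through"))  # 192
print(memory_writes(stores, "write-back"))     # 8
```

Data that is rewritten several times in place, as in block transforms, is where the two policies differ most.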

Compare the Memory Access Views. Group the views and compare as shown on the next page. The cached memory region of interest is Start addr: 0x Stop addr: 0x [Figure: Memory Access Trace, zoomed-out and zoomed-in views (click and release to zoom); labeled regions: Huffman decoding, IDCT activity]

The Simulation and Analysis Cycle [Figure: cycle between Simulation and Analysis. Simulation labels: Processor, Interface, RAM, ROM, Intr, Addr, Data. Hardware: processors, bus, memory, peripherals, interfaces. Software: startup code, device drivers, RTOS, application code]

ESL Future Work
▶ Verification between TLM and RTL models
▶ High-level synthesis