HW/SW Co-design Lecture 4: Lab 2 – Passive HW Accelerator Design Course material designed by Professor Yarsun Hsu, EE Dept, NTHU RA: Yi-Chiun Fang, EE.


HW/SW Co-design Lecture 4: Lab 2 – Passive HW Accelerator Design Course material designed by Professor Yarsun Hsu, EE Dept, NTHU RA: Yi-Chiun Fang, EE Dept, NTHU

Outline: Introduction to AMBA Bus System; Passive Hardware Design; Interrupt Service Routine; Environment Configuration; Co-designed System with GHDL Simulation; Co-designed System on FPGA.

INTRODUCTION TO AMBA BUS SYSTEM

AMBA 2.0 Bus System (1/7): Established by ARM. Advanced High-performance Bus (AHB): for high-performance, high-clock-frequency system modules such as the embedded processor, DMA controller, and memory controller. Advanced Peripheral Bus (APB): optimized for minimal power consumption and reduced interface complexity to support peripheral functions. For more details, please refer to the following documents: AMBA 2.0 Specification; Introduction to AMBA Bus System; GRLIB AHBCTRL - AMBA AHB controller with plug&play support.

AMBA 2.0 Bus System (2/7): [Figure: AMBA system block diagram. The AHB-to-APB bridge is annotated as a slave on AHB and the only master on APB.]

AMBA 2.0 Bus System (3/7): AMBA AHB is designed to be used with a central multiplexor interconnection scheme, which avoids a tri-state bus.

AMBA 2.0 Bus System (4/7): An AHB transfer consists of two distinct sections: the address phase, which lasts only a single cycle, and the data phase, which may require several cycles. The data phase is extended using the HREADY signal.

AMBA 2.0 Bus System (5/7): A slave may insert wait states into any transfer. For write operations, the bus master holds the data stable throughout the extended cycles. For read transfers, the slave does not have to provide valid data until the transfer is about to complete.
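The timing rules on the last two slides can be turned into a small cycle-count model. The sketch below is illustrative only: the function names are hypothetical, and the pipelining assumption (the address phase of one transfer overlaps the data phase of the previous one) comes from the AMBA 2.0 specification rather than from these slides.

```c
#include <assert.h>
#include <stddef.h>

/* One isolated transfer: 1 address cycle + 1 data cycle + inserted wait states. */
static unsigned ahb_single_transfer_cycles(unsigned wait_states)
{
    return 1u + 1u + wait_states;
}

/* Back-to-back transfers, assuming each address phase overlaps the previous
 * data phase. wait_states[i] is the number of cycles for which the slave
 * holds HREADY low during the data phase of transfer i. */
static unsigned ahb_back_to_back_cycles(const unsigned *wait_states, size_t n)
{
    unsigned cycles;
    size_t i;

    assert(wait_states != NULL || n == 0);
    if (n == 0) {
        return 0u;
    }
    cycles = 1u;                        /* address phase of the first transfer */
    for (i = 0; i < n; i++) {
        cycles += 1u + wait_states[i];  /* data phase, stretched by wait states */
    }
    return cycles;
}
```

Under this model, four zero-wait-state writes to a slave take five cycles in total.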

AMBA 2.0 Bus System (6/7): GRLIB implements AMBA AHB with slight modifications. Please refer to the GRLIB User's Manual and the GRLIB IP Cores Manual for detailed information.

AMBA 2.0 Bus System (7/7): The GRLIB implementation of AHB includes a mechanism to provide plug&play support. The implementation is located at grlib-gpl b3188/lib/grlib/amba/. The configuration record from each AHB unit is sent to the AHB bus controller via the HCONFIG signal. It covers identification of attached units, address mapping of slaves, and interrupt routing. The record type is declared as:
    type ahb_config_type is array (0 to NAHBCFG-1) of amba_config_word;

PASSIVE HARDWARE DESIGN

Passive HW Accelerators: The accelerator (bus slave) does not actively send signals to the bus; it only responds to the master. The master gives commands to the slave via its control registers and probes its status registers. [Figure: master connected to slave]

Passive 1-D IDCT HW Acc. (1/4): A simple 2-stage design. Gate delay: stage 1 is about 1 multiplication, stage 2 is about 3 additions. Action register: write '1' to start; the accelerator resets it to 0 automatically when done. Mode register: selects row or column mode. No wait states: the accelerator responds immediately.

Passive 1-D IDCT HW Acc. (2/4): Data packing. Since the 8x8 blocks are of type short (16-bit), each value occupies only half of the 32-bit data bus. We pack two values together to increase data bus utilization and reduce the communication overhead. The action bit and mode bit are also packed together. [Figure: control register layout. Bit 0 = action, bit 1 = mode, bits 2 to 31 = UNUSED, matching the driver's (mode << 1) | 0x1 write.]
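The packing scheme can be sketched in portable C as follows. The helper names are hypothetical. Placing the first value in the upper half-word is an assumption chosen to match what a short-to-long pointer cast produces on a big-endian processor such as LEON3; the actual register layout is defined by idct_1x8.vhd.

```c
#include <assert.h>
#include <stdint.h>

/* Pack two 16-bit samples into one 32-bit bus word.
 * Assumption: the first sample goes in bits 31..16 (big-endian order). */
static uint32_t pack_pair(int16_t first, int16_t second)
{
    return ((uint32_t)(uint16_t)first << 16) | (uint32_t)(uint16_t)second;
}

/* Sign-extend the low 16 bits of v without relying on
 * implementation-defined narrowing conversions. */
static int16_t sign16(uint32_t v)
{
    v &= 0xFFFFu;
    return (int16_t)(v >= 0x8000u ? (int32_t)v - 0x10000 : (int32_t)v);
}

static int16_t unpack_first(uint32_t word)  { return sign16(word >> 16); }
static int16_t unpack_second(uint32_t word) { return sign16(word); }

/* Control word: action in bit 0, mode in bit 1, remaining bits unused. */
static uint32_t control_word(unsigned mode)
{
    assert(mode <= 1u);
    return ((uint32_t)mode << 1) | 0x1u;
}
```

With this packing, one 8-sample row needs four bus writes instead of eight.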

Passive 1-D IDCT HW Acc. (3/4): 1-D IDCT calculation. STEP 1: write the Y registers (4 transfers). STEP 2: write the mode bit and action bit. STEP 3: poll the action bit. STEP 4: read the x registers after the action bit is reset.
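The four steps can be exercised on a host machine against a software stand-in for the register block. Everything in this sketch is hypothetical: the mock_idct_regs layout, the mock_tick() model (which copies Y to x instead of computing an IDCT), and the poll limit, which the lab code does not have.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CTRL_ACTION 0x1u  /* bit 0: write 1 to start, cleared when done */
#define NUM_WORDS   4     /* 8 shorts packed two per 32-bit word */

/* Hypothetical register block; the real address map comes from the VHDL. */
typedef struct {
    volatile uint32_t y[NUM_WORDS];
    volatile uint32_t x[NUM_WORDS];
    volatile uint32_t ctrl;
    unsigned busy_cycles;  /* mock only: ticks left before completion */
} mock_idct_regs;

/* Mock hardware: after busy_cycles ticks, copy Y to x (a placeholder for
 * the IDCT datapath) and clear the action bit. */
static void mock_tick(mock_idct_regs *dev)
{
    int i;

    if (!(dev->ctrl & CTRL_ACTION)) {
        return;
    }
    if (dev->busy_cycles > 0) {
        dev->busy_cycles--;
        return;
    }
    for (i = 0; i < NUM_WORDS; i++) {
        dev->x[i] = dev->y[i];
    }
    dev->ctrl &= ~CTRL_ACTION;
}

/* Returns the number of polls used, or -1 if the poll limit was reached. */
static int run_1d_pass(mock_idct_regs *dev, const uint32_t in[NUM_WORDS],
                       uint32_t out[NUM_WORDS], unsigned mode, int max_polls)
{
    int i;
    int polls = 0;

    assert(dev != NULL && in != NULL && out != NULL);
    for (i = 0; i < NUM_WORDS; i++) {          /* STEP 1: write Y registers */
        dev->y[i] = in[i];
    }
    dev->ctrl = (mode << 1) | CTRL_ACTION;     /* STEP 2: mode + action */
    while (dev->ctrl & CTRL_ACTION) {          /* STEP 3: poll action bit */
        if (polls >= max_polls) {
            return -1;
        }
        polls++;
        mock_tick(dev);                        /* on target, the HW runs by itself */
    }
    for (i = 0; i < NUM_WORDS; i++) {          /* STEP 4: read x registers */
        out[i] = dev->x[i];
    }
    return polls;
}
```

Bounding the poll loop is a design choice for the sketch: it keeps a test from hanging if the action bit never clears.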

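Passive 1-D IDCT HW Acc. (4/4):
    static void hw_idct_1d(short *dst, short *src, unsigned int mode)
    {
        long *long_ptr = (long *)src;

        /* STEP 1: write the packed inputs to the Y registers */
        Y_array_base[0] = long_ptr[0];
        Y_array_base[1] = long_ptr[1];
        ...
        /* STEP 2: write the mode bit and set the action bit */
        *c_reg = (long)((mode << 1) | 0x1);
        /* STEP 3: busy-wait until the accelerator clears the action bit */
        while (*c_reg & 0x1) { /* busy waiting loop */ }
        /* STEP 4: read the results; note the stride of 8 on the destination */
        dst[ 0] = ((short *)x_array_base)[0];
        dst[ 8] = ((short *)x_array_base)[1];
        ...
    }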
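The stride-8 writes to dst suggest that each 1-D pass stores its result transposed, so two rounds of eight passes would yield a 2-D transform with the block back in its original orientation. The sketch below illustrates only that data movement. The idct_2d() wrapper, the mode values 0 (row) and 1 (column), and the passthrough_1d() stand-in are assumptions, not the lab's hw_idct_2d().

```c
#include <assert.h>
#include <stddef.h>

#define N 8

typedef void (*idct_1d_fn)(short *dst, const short *src, unsigned int mode);

/* Hypothetical stand-in for one 1-D pass: no arithmetic, it only copies
 * 8 inputs to dst with a stride of N (transposed write-back). */
static void passthrough_1d(short *dst, const short *src, unsigned int mode)
{
    int i;

    (void)mode;
    for (i = 0; i < N; i++) {
        dst[i * N] = src[i];
    }
}

/* Two rounds of N passes; each round transposes, so the result ends up
 * in the original orientation. */
static void idct_2d(short *block, idct_1d_fn pass)
{
    short tmp[N * N];
    int r;

    assert(block != NULL && pass != NULL);
    for (r = 0; r < N; r++) {
        pass(&tmp[r], &block[r * N], 0u);  /* row r of block -> column r of tmp */
    }
    for (r = 0; r < N; r++) {
        pass(&block[r], &tmp[r * N], 1u);  /* row r of tmp -> column r of block */
    }
}
```

With the pass-through stand-in, the block comes back unchanged, which confirms that the two transposes cancel.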

INTERRUPT SERVICE ROUTINE

GRLIB GPTIMER (1/2): General Purpose Timer Unit. Timers are present in almost any electronic device that needs timing functions (e.g. timekeeping and time measurement). GPTIMER acts as a slave on AMBA APB. It provides a common decrementing prescaler (clocked by the system clock) and decrementing timers, and it can assert an interrupt on timer underflow. We initialize timer 2 for 1 ms resolution (i.e. an interrupt is asserted every 1 ms).
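Reload values for a given resolution can be worked out with a short calculation. This sketch assumes the usual underflow-and-reload behaviour, in which a counter loaded with R fires once every R + 1 input ticks; that should be checked against the GRLIB IP Cores Manual. The 40 MHz system clock in the usage note is an example value, not a figure from the lab.

```c
#include <assert.h>
#include <stdint.h>

/* Reload value for a decrementing counter that should underflow once
 * every `count` input ticks (assumed period: reload + 1). */
static uint32_t reload_for_count(uint32_t count)
{
    assert(count > 0u);
    return count - 1u;
}

/* System-clock cycles between two timer interrupts for a prescaler and a
 * timer chained as described on the slide. */
static uint64_t interrupt_period_cycles(uint32_t scaler_reload,
                                        uint32_t timer_reload)
{
    return ((uint64_t)scaler_reload + 1u) * ((uint64_t)timer_reload + 1u);
}
```

For a hypothetical 40 MHz clock, a prescaler reload of 39 gives a 1 MHz tick, and a timer reload of 999 then gives 40000 cycles per interrupt, i.e. 1 ms.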

GRLIB GPTIMER (2/2): Please refer to the GRLIB IP Cores Manual for detailed information.

eCos ISR (1/3): When an interrupt occurs, the processor jumps to a specific address to execute the Interrupt Service Routine (ISR). One of the key concerns in embedded systems with respect to interrupts is latency, which is the interval of time from when an interrupt occurs until the ISR begins to execute. [Figure: timeline marking the interrupt latency]

eCos ISR (2/3): Basic API for implementing an ISR. Please refer to the eCos Reference Manual for detailed information.
    #include <cyg/kernel/kapi.h>
    void cyg_interrupt_create(cyg_vector_t vector, cyg_priority_t priority,
                              cyg_addrword_t data, cyg_ISR_t* isr, cyg_DSR_t* dsr,
                              cyg_handle_t* handle, cyg_interrupt* intr);
    void cyg_interrupt_delete(cyg_handle_t interrupt);
    void cyg_interrupt_attach(cyg_handle_t interrupt);
    void cyg_interrupt_detach(cyg_handle_t interrupt);
    void cyg_interrupt_acknowledge(cyg_vector_t vector);
    void cyg_interrupt_mask(cyg_vector_t vector);
    void cyg_interrupt_unmask(cyg_vector_t vector);

eCos ISR (3/3): An ISR is a C function of the following form. An ISR should complete as soon as possible.
    cyg_uint32 isr_function(cyg_vector_t vector, cyg_addrword_t data)
    {
        ... /* do the service routine */
        return CYG_ISR_HANDLED;
    }

Program Profiling (1/2): We use GPTIMER for time measurement. Every time the timer asserts an interrupt, the timer ISR increments a global variable, time_tick.
    cyg_uint32 timer_isr(cyg_vector_t vector, cyg_addrword_t data)
    {
        /* data carries the address of the tick counter */
        unsigned long *time_tick = (unsigned long *) data;
        (*time_tick)++;
        cyg_interrupt_acknowledge(vector);
        return CYG_ISR_HANDLED;
    }

Program Profiling (2/2): We record the latency of every function block by monitoring the time_tick variable.
    void func()
    {
        unsigned long local_timer = time_tick;  /* tick count on entry */
        ...
        time_elapsed += (time_tick - local_timer);  /* ticks spent in this block */
    }
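The tick-difference pattern stays correct when the counter wraps around, as long as the subtraction is done in unsigned arithmetic of the counter's width. The host-side sketch below shows this; the profile_section type and the helper names are hypothetical, and the volatile qualifier is an assumption about how a counter shared with an ISR would normally be declared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* On target this would be incremented by the timer ISR every 1 ms. */
static volatile uint32_t time_tick;

/* Hypothetical per-section accumulator. */
typedef struct {
    uint32_t total_ticks;
    uint32_t calls;
} profile_section;

/* Ticks elapsed since `start`; modulo-2^32 subtraction handles wrap-around. */
static uint32_t ticks_since(uint32_t start)
{
    return (uint32_t)(time_tick - start);
}

/* Add the time spent since `start` to the section's running total. */
static void profile_end(profile_section *sec, uint32_t start)
{
    assert(sec != NULL);
    sec->total_ticks += ticks_since(start);
    sec->calls++;
}
```

With a 1 ms tick, the measurement resolution is also 1 ms, so function blocks that finish well within one tick will mostly report zero.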

ENVIRONMENT CONFIGURATION

Build SW Application: Copy the files in lab_pkg/lab2/sw to your original Lab 1 directory. Replace the Makefile and modify the path for ECOSDIR in the Makefile. Type "make" to build. The -D_HW_ACC_ flag links the co-designed version of hw_idct_2d() in idct_hw.c with the testbench; without this flag, hw_idct_2d() is identical to sw_idct_2d(). The -D_PROFILING_ flag enables profiling using the timer interrupt and reports the results at the end.

Install IDCT Accelerator: Copy lab_pkg/lab2/hw/devices.vhd to grlib-gpl b3188/lib/grlib/amba/ and replace the original file. Copy lab_pkg/lab2/hw/libs.txt and the whole lab_pkg/lab2/hw/esw folder to grlib-gpl b3188/lib/. The 1-D IDCT passive accelerator is located at lab_pkg/lab2/hw/esw/idct_acc/idct_1x8.vhd. Copy lab_pkg/lab2/hw/leon3mp.vhd to grlib-gpl b3188/designs/leon3-gr-xc3s-1500/ and replace the original file.

CO-DESIGNED SYSTEM WITH GHDL SIMULATION

GHDL Simulation (1/6): We compile our program as a virtual SDRAM for the LEON3 processor. LEON3 fetches the instructions and performs the corresponding operations. All the hardware signals can be recorded and dumped by GHDL.

GHDL Simulation (2/6): To perform GHDL simulation, the program must not be linked with eCos. Remove -D__ECOS and -I$(ECOSDIR)/include from CFLAGS. Remove -Ttarget.ld, -nostdlib, and -L$(ECOSDIR)/lib from LFLAGS. Remove the -D_PROFILING_ flag. You can remove -D_VERBOSE_ for faster simulation, and you can modify the NUM_BLKS macro in idct_test.c to reduce the number of testbench iterations. Type "make" to build. You should see a file named sdram.srec.

GHDL Simulation (3/6): Start Cygwin and run:
    cd grlib-gpl b3188/designs/leon3-gr-xc3s-1500/
    make distclean
    make soft
Copy the sdram.srec we built into this directory, replacing the original one, then run:
    make ghdl
You can check for syntax errors through GHDL.

GHDL Simulation (4/6): Type "./testbench.exe --vcd=waveform.vcd" after compilation to begin simulation. You should see an AHB slave with "Unknown vendor" appear, which is our IDCT accelerator.

GHDL Simulation (5/6): The dump file waveform.vcd can be viewed on-the-fly using GTKWave. Drag waveform.vcd and drop it over the gtkwave.exe icon to open it; you can also open it from the Windows cmd prompt. Use "File → Reload Waveform" in GTKWave to update the dump file.

GHDL Simulation (6/6): [Figure: GTKWave waveform annotated with the address phase, the data phase, accelerator stage 1 and stage 2, and the probing of the control register.]

CO-DESIGNED SYSTEM ON FPGA

Build FPGA Bitstream (1/2): After you install the accelerator, type "make ise | tee ise_log" under grlib-gpl b3188/designs/leon3-gr-xc3s-1500/. It is strongly suggested that you verify the hardware with GHDL simulation first. It is also suggested that you take a look at ise_log for more information. After generating the bitstream, configure your FPGA with leon3mp.bit.

Build FPGA Bitstream (2/2): After entering GRMON, check the system configuration using "info sys". You should see a device with "Unknown vendor" appear.

Profiling Results: Build the program with the -D_PROFILING_ flag on. Compare the computation results of sw_idct_2d() and hw_idct_2d(). Compare the computation results with and without the -D_VERBOSE_ flag.