System-on-Chip Design Homework Solutions Hao Zheng Comp Sci & Eng U of South Florida
HW 4
[CIS 6930] 9.1 CPU1 -> CPU2 communication speed/delay: Each bus transaction on the high-speed bus takes 50 ns. Each bus transaction on the low-speed bus takes 200 ns. Each memory access (read or write) takes 80 ns. Each bridge transfer takes 100 ns. The CPUs are much faster than the bus system, and can read/write data on the bus at any chosen data rate.
[CIS 6930] 9.3 Total runtime in co-processor implementation
A C function takes 1000 cycles to execute in SW. It has 10 inputs and 10 outputs, each of which is an integer (word). Evaluate under what conditions a co-processor implementation brings a performance benefit.
K: cycles needed to execute the function in the HW co-processor.
Q: cycles needed to transfer 1 word from SW to the co-processor (the same cost is used for reading each result back).
Total runtime in co-processor implementation: 10*Q + K + 10*Q
[CIS 6930] 9.3 The co-processor implementation is better if 10*Q + K + 10*Q <= 1000, i.e., 20*Q + K <= 1000.
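The break-even condition can be checked numerically. The following is a minimal sketch in C; the function and constant names are illustrative and not part of the original solution, and the values of Q and K passed in are hypothetical.

```c
#include <stdbool.h>

/* Figures taken from the problem statement. */
enum { SW_CYCLES = 1000, WORDS_IN = 10, WORDS_OUT = 10 };

/* Total co-processor runtime: send inputs, compute, read back outputs. */
static int coproc_cycles(int q, int k) {
    return WORDS_IN * q + k + WORDS_OUT * q;
}

/* True when the co-processor version is no slower than the SW version. */
static bool coproc_is_better(int q, int k) {
    return coproc_cycles(q, k) <= SW_CYCLES;
}
```

For example, with Q = 10 and K = 500 the co-processor needs 700 cycles and wins; with Q = 50 the transfers alone cost 1000 cycles, so no value of K > 0 helps.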
[CIS 6930] 9.4 Draw an FSM implementing the 2-way handshake protocol. Optimize it by sending two data items in a single transaction. (Figure: two modules connected by req and ack signals.)
[CIS 6930] 9.4 Draw an FSM implementing the 2-way handshake protocol.
State 1 -> State 2: ack=0 / send data; req <= 1
State 2 -> State 1: ack=1 / req <= 0
[CIS 6930] 9.4 Optimize it by sending two data items in a single transaction.
State 1 -> State 2: ack=0 / send data1; req <= 1
State 2 -> State 1: ack=1 / send data2; req <= 0
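The two sender FSMs above can be sketched as a single step function in C. This is only an illustration under assumptions: the type and function names, the `words` buffer, and the `two_per_txn` switch are hypothetical, and the receiver side is not modeled.

```c
#include <stdbool.h>

typedef enum { S1, S2 } hs_state;   /* states 1 and 2 of the FSM */

typedef struct {
    hs_state state;
    bool req;    /* request line driven by the sender */
    int  data;   /* value currently on the data lines */
    int  sent;   /* number of words sent so far */
} hs_sender;

/* One clock step of the sender. `ack` is sampled from the receiver,
 * `words` is a hypothetical buffer of values to send, and `two_per_txn`
 * selects the optimized variant that sends a second word on ack=1. */
static void hs_sender_step(hs_sender *s, bool ack, const int *words,
                           bool two_per_txn) {
    switch (s->state) {
    case S1:
        if (!ack) {                      /* ack=0 / send data; req <= 1 */
            s->data = words[s->sent++];
            s->req = true;
            s->state = S2;
        }
        break;
    case S2:
        if (ack) {                       /* ack=1 / req <= 0 */
            if (two_per_txn)             /* optimized: also send data2 */
                s->data = words[s->sent++];
            s->req = false;
            s->state = S1;
        }
        break;
    }
}
```

In the optimized variant both edges of req carry a data item, so one full req/ack cycle moves two words instead of one.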
P2: 1- or 2-Way Handshake Protocols. Read section 9.2.3 in the CoDesign book. A 1-way HS assumes that the sender runs faster than the receivers. A 2-way HS can handle senders and receivers of different speeds.
P3: Steps of Basic Bus Transfers. Read section 10.2.1 in the CoDesign book.
Steps:
1. Master gets bus access by negotiating with the bus arbiter.
2. Master issues address/data/command.
3. Slave acknowledges the transfer.
4. Master releases the bus.
10.3 Memory address of i: 0x3F68. Memory address of a[0]: 0x3F6C.
11.1 Address Decoder
A decoder that maps the register to the range 0x3F000000 – 0x3F00FFFF: a 16-bit AND gate (only the upper 16 address bits are checked).
A decoder that maps the register to 0x3F000000: a 32-bit AND gate (all 32 address bits are checked).
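The two decoders can be mirrored in software to see which address bits each one inspects. This is a minimal sketch in C; the function names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Range decoder: selects 0x3F000000 - 0x3F00FFFF by comparing only the
 * upper 16 address bits, which is what a 16-input AND gate checks
 * (with inverters on the bits that must be 0). */
static bool sel_range(uint32_t addr) {
    return (addr >> 16) == 0x3F00u;
}

/* Exact decoder: all 32 bits must match, i.e. a 32-input AND gate. */
static bool sel_exact(uint32_t addr) {
    return addr == 0x3F000000u;
}
```

The range decoder is cheaper but makes the register appear at every address in the 64 KB window; the exact decoder leaves the rest of the window free for other registers.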
11.2 Double the data transfer rate for the following design.
P5: Complete the Figure. (Figure: CPU, FIFO, and HW Module connected in sequence. Signals on the FIFO input side: reqin, din, ackin. Signals on the FIFO output side: reqout, dout, ackout. Labels: Master, Slave.)
P5: FSMs for Interfaces
Slave interface FSM (FIFO input side):
reqin & !full / store din; ackin <= '1'
!reqin / ackin <= '0'
Master interface FSM (FIFO output side):
!ackout & !empty / load dout; reqout <= '1'
ackout / reqout <= '0'
[CIS 6930] P6 A custom HW module is connected to a CPU through a 32-bit bus. The HW has a single 128-bit input port.
How does the CPU send data to the HW before the HW is activated? Use a FIFO as in P5 to buffer the data, or have the CPU wait for a ready signal from the HW.
P6 Interface design of the HW module: use a serial-in parallel-out shift register and a counter.
FSM: start / read d1; read d2; read d3; read d4; read_done <= 1
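The shift register and counter can be modeled in C as follows. This is a sketch under assumptions: the names `sipo128`, `sipo_start`, and `sipo_write` are hypothetical, and the word order (first word ends up in the highest slot) is one possible choice, not something the original solution specifies.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t word[4];   /* 128-bit parallel output as four 32-bit slots */
    unsigned count;     /* counter: words received in this transfer */
    bool read_done;     /* asserted once all four words are in place */
} sipo128;

/* start: clear the counter and the done flag. */
static void sipo_start(sipo128 *r) {
    r->count = 0;
    r->read_done = false;
}

/* Accept one 32-bit bus write: shift the register by one word and
 * insert the new word at slot 0. After four writes, word[3] holds d1
 * and word[0] holds d4. */
static void sipo_write(sipo128 *r, uint32_t d) {
    if (r->read_done) return;          /* ignore writes until restarted */
    for (int i = 3; i > 0; --i)
        r->word[i] = r->word[i - 1];
    r->word[0] = d;
    if (++r->count == 4)
        r->read_done = true;
}
```

The HW can sample the 128-bit value whenever read_done is set; a new sipo_start call begins the next transfer.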
P6 Draw the timing diagram for the data transfer: the answer varies depending on the interface and protocol used to connect the CPU and the HW.
HW 3
Problem 4.1 (Figure: DFG with nodes 1 to 5.) The longest path length in the DFG is 4.
Problem 4.2 (Figure: CFG and DFG over nodes 1 to 4, with DFG edges labeled by the variables a and b.)
Problem 4.3
int mysqrt(int N) {
  int x = 0, j;
  for (j = 1<<7; j != 0; j >>= 1) {
    x = x + j;
    if (x*x > N) x = x - j;
  }
  return x;
}
(Figure: the code annotated with node numbers; the initial value of j, 1<<7, is marked as 0x80.)
Problem 4.3 (Figure: controller FSM for mysqrt, with transitions conditioned on flag_zero, !flag_zero, flag_if, and !flag_if.)
Problem 4.7
int mysqrt(int N) {
  int x = 0, j;
  for (j = 1<<7; j != 0; j >>= 1) {
    x = x + j;
    if (x*x > N) x = x - j;
  }
  return x;
}
Problem 4.7 (Figure not recovered; surviving labels: 0x80, flag_zero, flag_if.)
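One way to read the labels 0x80, flag_zero, and flag_if is as a controller/datapath split of mysqrt. The following C sketch simulates such a split cycle by cycle. The state names and the two-states-per-iteration schedule are assumptions made for illustration, not the original solution; the loop update is written as j >>= 1 so that the loop terminates.

```c
#include <stdbool.h>

/* Reference SW version. */
static int mysqrt(int N) {
    int x = 0, j;
    for (j = 1 << 7; j != 0; j >>= 1) {
        x = x + j;
        if (x * x > N) x = x - j;
    }
    return x;
}

/* Hypothetical controller states. */
typedef enum { ST_INIT, ST_ADD, ST_TEST, ST_DONE } sqrt_state;

/* Cycle-by-cycle model: the datapath holds x and j and produces the
 * status flags; the controller picks the register update per state. */
static int mysqrt_fsmd(int N, int *cycles) {
    int x = 0, j = 0;
    sqrt_state st = ST_INIT;
    *cycles = 0;
    while (st != ST_DONE) {
        bool flag_zero = (j == 0);        /* loop exit condition */
        bool flag_if   = (x * x > N);     /* overshoot condition */
        switch (st) {
        case ST_INIT:                     /* x <= 0; j <= 0x80 */
            x = 0; j = 0x80; st = ST_ADD;
            break;
        case ST_ADD:                      /* !flag_zero / x <= x + j */
            if (flag_zero) st = ST_DONE;
            else { x = x + j; st = ST_TEST; }
            break;
        case ST_TEST:                     /* flag_if / x <= x - j */
            if (flag_if) x = x - j;
            j >>= 1;
            st = ST_ADD;
            break;
        case ST_DONE:
            break;
        }
        ++*cycles;
    }
    return x;
}
```

Under this schedule every call takes the same number of cycles, because the loop always runs for the eight bit positions of j regardless of N.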
HW 2
P2.1 Figure 2.24. The values of the tokens into the snk actor are 2, 4, 8, …, i.e., successive powers of two.
P2.2 Fibonacci Sequence Figure 2.24
P2.3: Original SDF
P2.3: Transformed SDF (Figure labels: snk2, fork, fork, add, fork, 1.)
P2.4: Accumulator with Adder (Figure labels: src, add.)
P2.5 The topology matrix (Figure 2.26). For a PASS to exist, the rank must be 2. Set X = 2 and Y = 1; then a combination of the first two columns gives the third one.
HW 1
P1: Structural Models at System and Processor Levels
P2: Why is a system-level structural model more abstract than a processor-level structural model? Each component in a system-level structural model represents a design at the processor level, which can take many forms, such as a behavioral model, the structural model shown before, or a different structural model. Communication over buses is defined in terms of messages, not bits.
P2: Why is a system-level structural model more abstract than a processor-level structural model? Components in a processor-level structural model are described at the more detailed, cycle-accurate register-transfer level. Component interfaces and buses are bit-accurate and …
P3: Differences between the behavioral models at the cycle-accurate level and the instruction level. In a cycle-accurate model, the design behavior captures how registers are updated on individual clock edges. Each instruction typically takes a number of cycles to execute. An instruction-accurate model captures how memory and registers are updated after the execution of each individual instruction.
P4: Benefits of using instruction-accurate models compared to cycle-accurate models. Each instruction typically takes a number of cycles to execute. An instruction-accurate model captures how memory and registers are updated after the execution of each individual instruction, without considering the register updates at each cycle. Therefore, simulating an instruction-accurate model is much faster. It also allows early development and validation of SW.
P4: Benefits of using transaction-level models compared to instruction-accurate models. Each transaction represents a sequence of instructions, e.g., printf(). Simulating a transaction-accurate model is much faster, which is important for early exploration of the system design space. It also provides a function-accurate system prototype for early development and evaluation of SW.
P5 What is system-level synthesis? A process that converts a system behavioral model into a system-level structural model.
P5 What is the input model to the synthesis like?
P5 What are the key elements in the generated models? Processing elements such as CPUs, DSPs, memory controller, buses, communication/peripheral interfaces, custom HW logic components, etc.
P6(a) What would the design model look like if the system behavioral model is implemented completely in software? (Figure labels: CPU, DSP.)
P6(b) What would the design model look like if the system behavioral model is implemented completely in hardware? (Figure labels: HW/ASIC, HW/ASIC, HW/ASIC.)
P7 Pros and cons of pure software or pure hardware implementations for a given system. See Fig. 1.6, Driving factors in HW/SW co-design.
P8 Differences between concurrency and parallelism.
Concurrency is the ability to execute simultaneous operations because these operations are completely independent. It relates to the behavior of applications.
Parallelism is the ability to execute simultaneous operations because the operations can run on different processors or circuit elements. It relates to HW implementations.