1
Asynchronous Pipelined Ring Interconnection for SoC
Final Presentation
One-semester project, Spring 2005
Supervisor: Nitzan Miron
Students: Ziv Zeev Shwaitser Chen Damishian
Based on the article: “SELF-TIMED COMMUNICATION PLATFORM FOR SYSTEM-ON-CHIP DESIGN”, Pasi Liljeberg, Juha Plosila, and Jouni Isoaho, Electronics and Communication Systems, Dept. of Information Technology, University of Turku, Finland
2
Agenda
Presenting the system.
–Preface.
–Project purposes.
–The system architecture.
Implementation steps.
–Implementing the processing elements in VHDL.
–Simulating the system in ModelSim.
–Defining the system in EDK.
System architecture.
Implementing protocols.
Implementing the code.
Debugging the system.
Simulation results.
Summary.
3
Presenting the system
The system is implemented using an Asynchronous Pipelined Ring Interconnection that was designed in the VLSI lab.
Each processing element works in a different clock domain, and the synchronization is done using asynchronous transfer stages.
4
Presenting the system
Each transfer stage has three pipe stages.
–The request from the bus is accepted in the Input Control unit.
–The request is transferred to the processing element or to the next transfer stage, according to the address, in the Forward Control unit.
–The request from the previous transfer stage or from the processing element is transferred to the bus in the Output Control unit.
[Figure labels: BUS, HOST]
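The forwarding decision described on this slide can be sketched in a few lines of Python. This is only an illustration, not the VHDL from the project: the integer addresses, the `ring_size` parameter, and the function names are assumptions, and the ring is assumed to pass requests in one direction only.

```python
def forward_control(stage_addr, dest_addr):
    """Decide where a request goes at one transfer stage (hypothetical model)."""
    if stage_addr == dest_addr:
        return "to_processing_element"
    return "to_next_stage"


def route(src_addr, dest_addr, ring_size):
    """Return the list of stage addresses a request visits on its way around the ring."""
    if not (0 <= src_addr < ring_size and 0 <= dest_addr < ring_size):
        raise ValueError("address is not on the ring")
    path = [src_addr]
    addr = src_addr
    while True:
        # Assumption: each stage hands the request to its successor in address order.
        addr = (addr + 1) % ring_size
        path.append(addr)
        if forward_control(addr, dest_addr) == "to_processing_element":
            return path
```

For example, `route(1, 0, 5)` walks through stages 2, 3, and 4 before the request is delivered at stage 0.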
5
Presenting the system
The purposes of the project are:
1) Design a system that is composed of different processing elements.
2) Implement the system using the Asynchronous Pipelined Ring Interconnection system.
3) Synthesize the system on an FPGA card.
4) Debug the system and check its feasibility.
6
Presenting the system
The proposed system is a calculator with a stack architecture.
–The data and the operations are presented in postfix order. ((1 + 2) * 3) is represented as (1 2 + 3 *).
The input can be a number or an opcode.
When a number is received, it is pushed to the stack.
When an opcode is received, its arguments are popped from the stack, and its result is pushed to the stack.
For example, for the input 1 2 +:
–At first, the stack is empty.
–The first argument, 1, is written to the stack.
–The second argument, 2, is written to the stack.
–The arguments for the operation are popped from the stack.
–The result of the operation, 3, is pushed to the stack.
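The push and pop rules above can be tried out with a small software model. The sketch below is an assumption-laden illustration in Python, not the hardware design: the function name `evaluate` is made up, and only the two operators that appear on the slide (`+` and `*`) are included.

```python
def evaluate(tokens):
    """Evaluate a postfix token list and return the final stack contents."""
    operations = {
        "+": lambda a, b: a + b,
        "*": lambda a, b: a * b,
    }
    stack = []
    for token in tokens:
        if token in operations:
            # An opcode pops its two arguments and pushes the result.
            if len(stack) < 2:
                raise ValueError("not enough elements in the stack")
            second = stack.pop()
            first = stack.pop()
            stack.append(operations[token](first, second))
        else:
            # Anything else is treated as a number and pushed.
            stack.append(int(token))
    return stack
```

Calling `evaluate("1 2 + 3 *".split())` follows the example from the slide: 1 and 2 are pushed, `+` replaces them with 3, then 3 is pushed and `*` leaves a single result on the stack.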
7
Presenting the system
The system is composed of five processing elements:
–Stack.
–Memory.
–ALU.
–Input.
–Output.
Each processing element has a different address in the ring.
The data bus width was chosen to be 80 bits, which are divided as follows:
–Destination address: bits 15:0.
–Source address: bits 31:16.
–Opcode: bits 47:32.
–Data: bits 79:48.
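The 80-bit layout (destination in bits 15:0, source in 31:16, opcode in 47:32, data in 79:48) can be checked with a short packing sketch. The function names and the numeric values used for addresses and opcodes are hypothetical; the slides give the field positions only, not the encodings.

```python
def pack(data, opcode, source, dest):
    """Build an 80-bit bus word from its four fields."""
    if not 0 <= data < 1 << 32:
        raise ValueError("data must fit in 32 bits")
    for name, field in (("opcode", opcode), ("source", source), ("dest", dest)):
        if not 0 <= field < 1 << 16:
            raise ValueError(name + " must fit in 16 bits")
    return (data << 48) | (opcode << 32) | (source << 16) | dest


def unpack(word):
    """Split an 80-bit bus word back into its fields."""
    return {
        "data": (word >> 48) & 0xFFFFFFFF,
        "opcode": (word >> 32) & 0xFFFF,
        "source": (word >> 16) & 0xFFFF,
        "dest": word & 0xFFFF,
    }
```

With made-up encodings, `unpack(pack(1, 0x0002, 0x0005, 0x0001))` returns the same four field values, and the packed word always fits in 80 bits.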
8
Presenting the system
The Stack:
–Performs POP, PUSH and CLEAR operations.
The Memory:
–Upon a LOAD, performs: PUSH(Memory[POP]).
–Upon a STORE, performs: POP(Addr). POP(Data). Memory[Addr] <= Data.
The ALU performs the following operation:
–PUSH(POP OP POP).
The following error conditions cause an abort of the operation, followed by an indication to the user:
–Stack empty (or not enough elements in the Stack).
–Operation error (Memory address not in range, division by 0).
–Stack full.
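A behavioural sketch of these operations and their error conditions is given below, assuming a Python model rather than the actual VHDL. The class and constant names, the stack depth, and the memory size are hypothetical. The operand order in `alu` is a literal reading of PUSH(POP OP POP); the slides do not state which operand is popped first. The sketch also does not restore popped values when an operation aborts, which the slides do not specify either.

```python
class CalcError(Exception):
    """Raised when an operation must be aborted."""


STACK_DEPTH = 4   # hypothetical capacity
MEMORY_SIZE = 8   # hypothetical number of memory words


class Machine:
    def __init__(self):
        self.stack = []
        self.memory = [0] * MEMORY_SIZE

    def push(self, value):
        if len(self.stack) >= STACK_DEPTH:
            raise CalcError("stack full")
        self.stack.append(value)

    def pop(self):
        if not self.stack:
            raise CalcError("stack empty")
        return self.stack.pop()

    def _check_address(self, addr):
        if not 0 <= addr < MEMORY_SIZE:
            raise CalcError("memory address not in range")

    def load(self):
        # PUSH(Memory[POP])
        addr = self.pop()
        self._check_address(addr)
        self.push(self.memory[addr])

    def store(self):
        # POP(Addr), POP(Data), Memory[Addr] <= Data
        addr = self.pop()
        data = self.pop()
        self._check_address(addr)
        self.memory[addr] = data

    def alu(self, op):
        # PUSH(POP OP POP), read literally: first popped value is the left operand.
        left = self.pop()
        right = self.pop()
        if op == "+":
            self.push(left + right)
        elif op == "/":
            if right == 0:
                raise CalcError("division by 0")
            self.push(left // right)
        else:
            raise CalcError("unknown opcode")
```

Pushing 3 and then 5 and calling `store()` writes 3 to address 5, which matches the `1 2 + 5 store` example used later in the slides.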
9
Presenting the system
The Input:
–Gets requests from the PLB bus, and transfers them to the appropriate processing element.
–Gets an indication from the Output that a request has finished and another request can be processed.
The Output:
–Gets results from the processing elements, and returns them on the PLB bus.
–Gets results, and sends an indication to the Input that a request has finished and another request can be processed.
10
Presenting the system
For example: 1 2 + 5 store.
[Figure labels: Input, Bus, Output, Stack, Memory, ALU]
1 & PUSH & OUTPUT & STACK
1 & SUCCESS & STACK & OUTPUT
1 & NEXT & OUTPUT & INPUT
11
Presenting the system
For example: 1 2 + 5 store.
[Figure labels: Input, Bus, Output, Stack, Memory, ALU]
2 & PUSH & OUTPUT & STACK
2 & SUCCESS & STACK & OUTPUT
2 & NEXT & OUTPUT & INPUT
12
Presenting the system
For example: 1 2 + 5 store.
[Figure labels: Input, Bus, Output, Stack, Memory, ALU]
0 & PLUS & OUTPUT & ALU
2 & POP & ALU & STACK
2 & SUCCESS & STACK & ALU
1 & SUCCESS & STACK & ALU
3 & PUSH & ALU & STACK
3 & SUCCESS & STACK & ALU
3 & (SUCCESS PLUS) & ALU & OUTPUT
3 & NEXT & OUTPUT & INPUT
13
Presenting the system
For example: 1 2 + 5 store.
[Figure labels: Input, Bus, Output, Stack, Memory, ALU]
5 & PUSH & OUTPUT & STACK
5 & SUCCESS & STACK & OUTPUT
5 & NEXT & OUTPUT & INPUT
14
Presenting the system
For example: 1 2 + 5 store.
[Figure labels: Input, Bus, Output, Stack, Memory, ALU]
0 & STORE & OUTPUT & MEMORY
2 & POP & MEMORY & STACK
5 & SUCCESS & STACK & MEMORY
3 & SUCCESS & STACK & MEMORY
3 & (SUCCESS STORE) & MEMORY & OUTPUT
3 & NEXT & OUTPUT & INPUT
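The message sequences on the last five slides can be reproduced with a small trace generator. This Python sketch is a reading of the slides rather than the project's code: the function name `trace` is made up, each message is printed as data & opcode & source & destination, and only the three request types shown in the example (a number, `+`, and `store`) are handled.

```python
def trace(tokens):
    """Return (messages, memory) for a postfix program, mimicking the slide traces."""
    stack, memory, log = [], {}, []

    def send(data, opcode, source, dest):
        log.append(f"{data} & {opcode} & {source} & {dest}")

    for token in tokens:
        if token == "+":
            send(0, "PLUS", "OUTPUT", "ALU")
            send(2, "POP", "ALU", "STACK")       # data field 2: two values requested
            top = stack.pop()
            send(top, "SUCCESS", "STACK", "ALU")
            below = stack.pop()
            send(below, "SUCCESS", "STACK", "ALU")
            result = below + top
            send(result, "PUSH", "ALU", "STACK")
            stack.append(result)
            send(result, "SUCCESS", "STACK", "ALU")
            send(result, "(SUCCESS PLUS)", "ALU", "OUTPUT")
            send(result, "NEXT", "OUTPUT", "INPUT")
        elif token == "store":
            send(0, "STORE", "OUTPUT", "MEMORY")
            send(2, "POP", "MEMORY", "STACK")
            addr = stack.pop()
            send(addr, "SUCCESS", "STACK", "MEMORY")
            data = stack.pop()
            send(data, "SUCCESS", "STACK", "MEMORY")
            memory[addr] = data
            send(data, "(SUCCESS STORE)", "MEMORY", "OUTPUT")
            send(data, "NEXT", "OUTPUT", "INPUT")
        else:
            value = int(token)
            send(value, "PUSH", "OUTPUT", "STACK")
            stack.append(value)
            send(value, "SUCCESS", "STACK", "OUTPUT")
            send(value, "NEXT", "OUTPUT", "INPUT")
    return log, memory
```

Running `trace("1 2 + 5 store".split())` yields the same messages as the slides, in the same order, and leaves the value 3 stored at address 5. Note that the source field of each request from the Input carries OUTPUT, so the reply is routed to the Output element.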
16
Implementation steps
The VHDL files were imported into HDL Designer.
The Ring was built from five transfer stage elements. Each transfer stage element was assigned a different address.
Each processing element was implemented using graphical views such as flow diagrams or state machines.
17
Simulation with ModelSim
[Figure: simulation waveform; labels: Different clock domain; Output, Memory, ALU, Stack, Input; PUSH, STORE]
18
Simulation with ModelSim
Since the interconnection ring is asynchronous, the data is transferred from one transfer stage to the next immediately, without waiting for a clock edge.
19
Defining the system in EDK
The system is composed of two buses:
–The PLB (Processor Local Bus) is faster and closer to the PPC. The PPC and the instruction controller are located on it.
–The OPB (On-chip Peripheral Bus) is slower and farther from the PPC. The IO peripherals are located on it.
–The two buses are connected through a bridge (plb2opb).
The Asynchronous Pipelined Ring system is located on the PLB.
20
Defining the system in EDK
The system was allocated addresses on the PLB bus.
–A 32-bit write to address 0x000 means a number that should be pushed to the Stack.
–A 16-bit write to address 0x010 means an opcode that needs to be executed (such as POP, LOAD, STORE, ‘+’, etc.).
–A 32-bit read from address 0x020 means getting the result of the previous operation.
–A 16-bit read from address 0x030 means getting the returned error code of the previous operation.
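The four PLB accesses listed above amount to a small register map. The lookup below is a Python illustration of that map only: the table name, the function name, and the description strings are assumptions, and the offsets are taken as relative to whatever base address the system was given.

```python
# (offset, access kind) -> (meaning, width in bits), as listed on the slide.
REGISTER_MAP = {
    (0x000, "write"): ("push number onto the Stack", 32),
    (0x010, "write"): ("execute opcode", 16),
    (0x020, "read"): ("result of the previous operation", 32),
    (0x030, "read"): ("error code of the previous operation", 16),
}


def decode_access(offset, kind, width):
    """Return the meaning of a bus access, or raise ValueError if it is not in the map."""
    try:
        meaning, expected_width = REGISTER_MAP[(offset, kind)]
    except KeyError:
        raise ValueError("no such register access") from None
    if width != expected_width:
        raise ValueError("unexpected access width")
    return meaning
```

For instance, `decode_access(0x010, "write", 16)` identifies an opcode write, while a read from 0x000 is rejected because the slide defines only a write at that offset.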
21
Implementing protocols
To implement the interface with the PLB, we used the EDK tool to generate an instance of an IPIF unit, with which our system can communicate easily.
22
Implementing protocols
The IPIF protocol with the IP was implemented according to the specifications.
23
Implementing the code
The code was written in C.
The code reads requests from the user, sends them to the system, and returns results to the user.
DIP switches – used to enter a number between 0 and 255.
Push button – the input is a number to be pushed to the Stack.
Push button – the input is an opcode, which should be sent to the appropriate element.
Push button – used to end the program.
Serial port – used to send the results to the user, through HyperTerminal.
LCD screen – used to display the project information.
25
Debugging the system
The resource allocation problem.
Adding buffers.
Synchronizing two different clock domains.
26
The resource allocation problem
The first version of the project did not fit on the FPGA card.
Analysis traced the problem to a conversion function from std_logic_vector to integer.
27
Adding buffers
The first test of the system on the card gave results that differed from the simulation.
The first suspicion was that the data was not yet valid at the time the request arrived.
28
Adding buffers
29
The synthesis tool removed the buffers because they were not logically essential.
The way to overcome this limitation was to create a buffer with enable bits, driven by a register that can hold different values. The synthesis tool therefore does not treat the enable as a constant, even though it remains 1 throughout the simulation.
30
Adding buffers
An alternative, delay-independent solution to the problem is to use a dual-rail representation for the data.
This would require additional logic for the request-valid indication.
This would also double the registers.
For the above reasons, this solution was abandoned.
[Figure labels: bit 0, bit 1, bit n, bit m, OR, C, Done, ack, A, B]
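To make the cost of the abandoned alternative concrete, here is a minimal Python sketch of dual-rail encoding with completion detection. It assumes one common convention, which the slides do not spell out: each bit uses two wires, (1, 0) for logic 1, (0, 1) for logic 0, and (0, 0) as the empty spacer. The function names are made up for illustration.

```python
def encode(value, width):
    """Encode an integer as (true_rail, false_rail) pairs, least significant bit first."""
    return [((value >> i) & 1, 1 - ((value >> i) & 1)) for i in range(width)]


def spacer(width):
    """The empty code word sent between data words: both rails low on every bit."""
    return [(0, 0)] * width


def done(rails):
    """Completion detection: every bit pair has one rail high (an OR per bit, then all combined)."""
    return all(t | f for t, f in rails)


def decode(rails):
    """Recover the integer once the word is complete."""
    if not done(rails):
        raise ValueError("data word is not complete yet")
    if any(t and f for t, f in rails):
        raise ValueError("both rails high is not a legal code")
    return sum(t << i for i, (t, f) in enumerate(rails))
```

Every data bit needs two wires and an extra OR gate for the validity check, which is the doubling of registers and the additional logic mentioned on the slide.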
31
Synchronizing two different clock domains
Some combinations of clock frequencies gave unexpected results.
The problem was the lack of synchronization between pairs of synchronous elements that operate in different clock domains:
–The user_logic (clk_100) and the Input.
–The user_logic (clk_100) and the Output.
[Figure labels: Input, Output]
32
Synchronizing two different clock domains
The solution: adding a pair of flip-flops (FF) at each input.
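The pair of flip-flops can be pictured with a cycle-level model. The Python sketch below is only a behavioural illustration with a made-up class name: it shows the two-clock latency a signal incurs on its way into the receiving domain, and it does not model metastability, which is the physical effect the second flip-flop is there to contain.

```python
class TwoFlopSynchronizer:
    """Two flip-flops in series, both clocked by the receiving clock domain."""

    def __init__(self):
        self.ff1 = 0  # first stage, samples the asynchronous input
        self.ff2 = 0  # second stage, the value the receiving logic sees

    def clock(self, async_in):
        """Advance one receiving-clock edge and return the synchronized output."""
        # Both flip-flops update on the same edge, so ff2 takes the old ff1.
        self.ff2, self.ff1 = self.ff1, async_in
        return self.ff2
```

Feeding the input sequence 0, 1, 1, 1, 0, 0 one value per clock edge produces 0, 0, 1, 1, 1, 0 at the output: each change is seen by the receiving logic on the second clock edge after it first appears at the input.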
34
Simulation results Input: Output:
36
Summary
We enjoyed working on this project, and we learned a lot.
We gained experience in:
–VHDL design and simulation.
–Designing a system on an FPGA card, and the synthesis flow.
–Debugging a complicated system.
–Debugging asynchronous logic on an FPGA.