Technion Digital Lab Project Performance evaluation of the Xilinx Virtex-II Pro embedded solution Students: Igor Tsimerman, Leonid Firdman Supervisors: Ina Rivkin, Alexander Bergman
Agenda Project goals Abstract Project resources System overview System implementation & results Results summary Possible improvements
Project goals Creation, integration & testing of a Xilinx PLB Master core using the standard IPIF. Comparison between hardware (FPGA) and software (PowerPC-based) implementations. Estimation of the performance level of the Virtex-II Pro embedded solution on a real digital design example.
Abstract [Block diagram: HOST, Virtex-II Pro test platform, PowerPC 405, monitored bus] Some designs require an external logic analyzer for testability purposes. The Virtex-II Pro may have the capability to serve as a programmable on-chip logic analyzer. To achieve modularity and a unified design, it is preferable to build the design around one standard bus. Either the PowerPC or a hardware IP may serve as the analyzing unit within the Virtex-II Pro; therefore their performance for this task must be evaluated on the same standard bus (PLB).
Project resources
Project resources Virtex-II Pro XC2VP30 – FF896:
– ~30K logic cells
– 136 18×18-bit multipliers
– 2,448 Kb of BRAM (18 Kb per block)
– 2 PowerPC 405 CPU cores (up to 300 MHz each)
– 8 DCM (digital clock manager) units
– 8 RocketIO transceivers (MGTs)
System Overview [System block diagram: Generator, PLB, Master core, PowerPC 405, SW_EN block, DCM 0 (CLKDV), event counters 0/1, reset block (SYS_RST), timer on OPB (non-critical interrupt), OCM] The Generator creates a mass stream of random and sequential data patterns. The PowerPC and the Master core perform the same logical data-analysis function and are compared to each other at the end of the test. The event counters count the random events from the Generator (reference) and from the Master core.
System Overview – Start sequence [system block diagram]
System Overview – Stop sequence [system block diagram]
System Overview Random pattern: a random event on the PLB is data that is not the previous data incremented by one. [Diagram: Master core and PowerPC 405 observing random events on the PLB]
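The event definition above can be sketched in C (a minimal sketch; the function and variable names are illustrative, not from the original lab code):

```c
#include <stdint.h>

/* A "random event" on the PLB: the new word is NOT the previous
 * word incremented by one. `prev` holds the last word seen. */
static uint32_t prev = 0;

int is_random_event(uint32_t data)
{
    int event = (data != prev + 1u);
    prev = data;
    return event;
}
```

In the real system this comparison is carried out in parallel by the PowerPC (in software) and by the Master core (in hardware) on the same PLB data stream.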
System Overview [Generator block diagram: pseudo-random generator (32-bit Din, load), random-delay block (5-bit max, load), counter with controlled clock, count output to the PLB, synchronized random-event output]
System Overview Pseudo-random generator: a shift-right register with XOR feedback, shifted each clock. The placement and number of XORs may vary. A maximal-length 32-bit LFSR has a cycle of 2^32 − 1 states. The initial pattern is constant.
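A shift-right LFSR of this kind can be modeled in C. This is a sketch only: the tap set (32, 22, 2, 1) is one common maximal-length choice listed in Xilinx application note XAPP052, and the actual XOR placement in the design may differ, as the slide notes.

```c
#include <stdint.h>

/* One clock of a 32-bit shift-right LFSR with XOR feedback.
 * Taps 32, 22, 2, 1 are one maximal-length choice (XAPP052);
 * the real design's XOR placement may vary. */
uint32_t lfsr_step(uint32_t s)
{
    uint32_t fb = (s ^ (s >> 1) ^ (s >> 21) ^ (s >> 31)) & 1u;
    return (s >> 1) | (fb << 31);
}
```

From any nonzero seed the register never reaches the all-zeros lock-up state, which is why the maximal cycle is 2^32 − 1 rather than 2^32.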
System Overview ModelSim simulation results
System Overview ChipScope results
System Overview ChipScope results
System Overview ChipScope results
System implementation & results Power PC: code and data in OCM; 32-bit data reads; PowerPC frequency = 300 MHz, PLB frequency = 100 MHz. The results were displayed at the end of the test via UART (Tera-Pro). Code example:
System implementation & results Power PC – ChipScope results Note: all ChipScope results are measured at a PLB sys_clk frequency of 100 MHz and a Generator frequency of 100/12 MHz (8.33 MHz).
System implementation & results Power PC – ChipScope results
System implementation & results Power PC – ChipScope results
System implementation & results Power PC – ChipScope results: 20 sys_clks between PPC read requests; max frequency ≈ 5 MHz.
System implementation & results Power PC – statistics results
System implementation & results Master core: single-transaction configuration; connected through the standard IPIF 2.01a; performs a data-analysis operation similar to the PPC's. Code example:
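The Master core itself is HDL behind the IPIF; as a rough behavioral model only (state names and handshake flags here are illustrative, not the actual IPIF port list), its single-read analysis loop can be sketched in C:

```c
#include <stdint.h>

/* Illustrative 3-state model: request a single read, wait for the
 * IPIF data acknowledge, compare against previous word + 1. */
typedef enum { M_IDLE, M_REQUEST, M_WAIT_ACK } mstate_t;

typedef struct {
    mstate_t state;
    uint32_t prev;
    uint32_t events;
    int      first;   /* no comparison on the very first word */
} master_t;

void master_tick(master_t *m, int start, int data_ack, uint32_t data)
{
    switch (m->state) {
    case M_IDLE:                       /* wait for SW_EN-style start */
        if (start) m->state = M_REQUEST;
        break;
    case M_REQUEST:                    /* assert the read request */
        m->state = M_WAIT_ACK;
        break;
    case M_WAIT_ACK:                   /* data valid from the bus */
        if (data_ack) {
            if (!m->first && data != m->prev + 1u)
                m->events++;
            m->first = 0;
            m->prev  = data;
            m->state = M_REQUEST;      /* issue the next single read */
        }
        break;
    }
}
```

Because the core returns to issue a new single read only after each acknowledge, IPIF handshake latency adds directly to the clocks-per-read figure measured below.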
System implementation & results Master core – ChipScope results Note: all ChipScope results are measured at a PLB sys_clk frequency of 100 MHz and a Generator frequency of 100/12 MHz (8.33 MHz).
System implementation & results Master core – ChipScope results
System implementation & results Master core – ChipScope results: 24 sys_clks between Master core read requests; max frequency ≈ 4.16 MHz.
System implementation & results Master core – statistics results
System implementation & results PPC & Master core – ChipScope results
System implementation & results PPC & Master core – ChipScope results PPC: 18–22 sys_clks between PPC read requests; average max frequency ≈ 5 MHz. Master: 24 sys_clks between Master core read requests; max frequency ≈ 4.16 MHz.
System implementation & results PPC & Master core – statistics results Note: the statistics results cover only PPC transactions.
Results summary PPC alone: 20 system clocks between PPC read requests; max frequency ≈ 5 MHz. Master alone: 24 system clocks between Master core read requests; max frequency ≈ 4.16 MHz. PPC & Master together: PPC: 18–22 system clocks between PPC read requests, average max frequency ≈ 5 MHz; Master: 24 system clocks between Master core read requests, max frequency ≈ 4.16 MHz.
Results summary The previous results are valid only for this particular design. For example, additional statements in the PPC code would add delay between PPC read requests and therefore lower the read frequency. Power PC – alternative test implementation
Results summary Power PC – alternative test implementation: 28 sys_clks between PPC read requests; max frequency ≈ 3.57 MHz.
Results summary Conclusion: in the current configuration the Master core's throughput is lower than the PPC's. Possible reasons: PLB protocol limitations on single read transactions (max frequency: sys_clk / 6 = 16.66 MHz); IPIF latency.
Results summary PLB protocol limitations
Results summary IPIF latency: in this example the Master core initiates a single read transaction from a slave on the PLB through the IPIF. There are 23 clocks from the Master's request until valid data appears on the BUS2IP_DATA signals.
Possible improvements PPC: code optimization (assembly level); avoid single read transactions to the same address whenever possible; use burst-mode and cached transactions. Master: avoid the standard Xilinx IPIF when using a Master core on the PLB (design a dedicated PLB interface instead); use FIFO and burst-mode transactions.
That's it!